That part struck me too. On the other hand, agile is the anathema of good data m...

wpietri · on Aug 22, 2012

On the other hand, agile is the anathema of good data modelling and it causes productivity gains early on which lead to ossification and inflexibility later

If you stop refactoring. So you know what? Don't stop.

At the beginning is the project when you know the least. About your tech, about your product, about your customers, about your competitors and partners. The best design decisions are made with the most information. Ergo, all design decisions should be delayed as long as possible.

The only question is how we responsibly delay design decisions. The answer for me is lots of automated tests and a willingness to refactor as we see ways to improve our designs. Plus a bunch of other agile technical practices.

einhverfr · on Aug 23, 2012

This works only if you use an RDBMS (instead of a NoSQL solution) and you use it as a very dumb data store. For large classes of applications you lose far more than you gain. The reason is that you are FORCED to refactor your data with an RDBMS.

As soon as you move outside the RDMBS storage model with agile development something very icky happens. Old data structures stick around until actively cleaned up using processes which may take a long time to complete. If you have half a billion documents in Mongo DB created and used by your agile app, you are going to have to have your application handle those data-structure-wise forever. Every new change then creates an additional corner case that must be perpetually handled.

wpietri · on Aug 23, 2012

I don't get why you think NoSQL solutions are exempt from refactoring and maintaining data quality. I've happily done that with non-SQL stores.

No matter what development approach you use, you will eventually have to deal with change. No interesting application hits v 1.0 and sticks there. The only question is whether you embrace change, which is the Agile approach, or whether you resist it. If you resist it, you change in big lurches when circumstances force it. Having tried both, I favor the former regardless of what storage mechanism I'm using.

einhverfr · on Aug 23, 2012

Refactoring your data structures is different though, is it not? In an RDBMS, you have to migrate your data when the schema changes, correct?

If you have, say, 100 million records in Mongo DB and you decide you need to refactor your data schemas, how long will it take you to do that?

The problem is that an RDBMS is very rigid typically on data input but very flexible on data output. if you change data input you MUST migrate data with it. MongoDB and others are very flexible with data input but what you get out is more or less what you have told it to store. This is a very different problem. You lose the ability to continue to refactor your data structure semantics beyond a certain point. This is not really the case with an RDBMS.

BTW, I have started to get PostgreSQL object-relational modelling and it is amazing. I suspect that although the barrier to entry (due to knowledge requried) is high, multiple table inheritance ought to allow far more agile development of database schemata than standard relational models allow, and at the same time do full intelligent database stuff.

wpietri · on Aug 23, 2012

Again, I'm not seeing the problem.

100 million records may take a lot of time to update, but so what? It's not like you make major schema changes to the same data every day. If you write your code to read either format v1 or v2 and to output v2, the migration can begin gradually. When you're sure it's working well, you start a low-priority job that reads old records and migrates them. Eventually, all data is in v2, so you can ditch the reader code for v1.

einhverfr · on Aug 24, 2012

When you're sure it's working well,

Translation: We cut out the time on managing our schema and sped up our development, and this meant we got to spend it all back and more managing schemaless storage!

wpietri · on Aug 25, 2012

If you don't like NoSQL stores, don't use 'em. Either toolset is fine with an agile approach, though.

_ea1k · on Aug 22, 2012

I'm not really following your argument. Why would Agile shut doors to future change? It seems like one of the core tenets of Agile is to embrace the concept of change and willfully eliminate legacy models as you are able.

I also think it's very naive to think of NoSQL as purely OODB 2.0. It is getting used for a lot of different things, that IMO are inappropriate for it. On the other hand, the movement is intended to solve a very different set of problems (DBs with various ACID levels depending upon scalability and durability requirements), from those solved by OODBs (API improvements that weren't), and traditional RDBMS systems.

einhverfr · on Aug 23, 2012

But NoSQL is pretty much based on the idea that you store data from your application in a way which is close to the internal data structures, is it not? Since you aren't dealing with the ability to do on-the-fly, ad hoc transformation (the RDBMS and ORDBMS model) then you are limited more or less to a transactional state persistence layer, which is what the OODBMs's were intended to be. It doesn't really solve any of the problems that beset the OODBMS and other app-specific storage layer. It does solve all the problems that OODBMSs solved and a few more.

I can't see why one wouldn't see NoSQL as essentially a continued outgrowth of the OODBMS movement to be honest.

In essence for complex data it seems to me you have two choices:

1) Insist on rigidly mathematically defined input, allow ad-hoc transformation on output. Input is rigid, output is flexible, or

2) Be flexible with input, and allow some transformation both in input and calling on output. Input is flexible but output is pretty rigid.

the first is an RDBMS, and possibly the array-native databases in the NoSQL world, while the second is the OODBMS, and the vast majority of NoSQL products.

I think this fundamentally gets us to Stonebraker's 4-fold division, where on the left hand side you have simple data:

With query = RDBMS Without Query = Filesystem

On the right hand side you have complex data (nested structures and the like):

With query: ORDBMS Without Query: OODBMS

I don't think there is any question that the key problems of operating without declarative queries that can transform data reliably on data output (and this requires fixed schemas for data input) are the same whether you are dealing with OODBMS's or NoSQL db's.

_ea1k · on Aug 23, 2012

> But NoSQL is pretty much based on the idea that you store data from your application in a way which is close to the internal data structures, is it not?

IMO, it's not. Try BigTable on Google App Engine, or HBase for some examples. IMO, these are often more difficult to relate to a programming model than a traditional RDBMS due to their inherent limitations. Of course, their structure is one of Columns/Rows as well.

One of the problems with describing NoSQL as a movement is that there are a broad range of solutions to a broad range of problems that are being covered by the same terminology.

On the other hand, I suppose there are quite a few NoSQL solutions that do not require structured input data and therefore may appear attractive to people who don't want to maintain rigid structures for data input. That's always seemed like more hype than reality to me, though. I've had pretty good experiences with evolving schemas within PostgreSQL and other SQL systems, and the challenges faced (data migration, dealing with old data in the model, etc) were essentially the same that I would have faced with something like a MongoDB system. I don't really see the advantage of NoSQL there at all. I can also see how they could be viewed as essentially OODB 2.0, and they may very well fall by the wayside for a lot of the same reasons.