Passionate about data

Data and its implications on software design and development.

Migrations in NoSQL Databases

In relational database usage, the pattern of migrations is well understood and has gained widespread acceptance, supported by frameworks such as DBDeploy, DBMaintain, MyBatis migrations, Flyway, Liquibase, Active Record Migrations and many others. These tools let us migrate the database and maintain the version history of the database in the database itself.

With the rise of NoSQL databases and their adoption by development teams, we face the problem of migrations in NoSQL databases. What patterns of data migration work in NoSQL databases? Because NoSQL databases are schema-free and the database does not enforce any schema validation, the schema of the data lives in the application, which allows for different techniques of data migration.

Migrate all the data in one go

In this pattern, we write a script that accesses all the objects in the database and migrates them to the latest version of the schema in the code. This pattern assumes that

  • It is possible to access all the objects in the database, modify them and persist them back. In key-value stores, retrieving all the keys is an expensive operation.
  • During the update of all the objects, the application must not modify the objects, since concurrent writes would create collisions.
  • All the existing objects are at the same version of the schema.

If the above assumptions are valid, we can use the same pattern used in relational databases: maintain a list of the versions applied to the database in a changelog table, and then decide which versions need to be applied for this deployment of the application. mongodb migrations, mutagen cassandra, cdeploy and Mongoid rails migration are examples of this approach.
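The changelog idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store is a plain in-memory dict standing in for a real NoSQL database, the changelog is a plain list, and the names `MIGRATIONS`, `split_name`, `add_status` and `migrate_all` are hypothetical and not taken from any of the tools mentioned above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for a NoSQL store: key -> document.
Store = Dict[str, dict]


def split_name(doc: dict) -> dict:
    """Migration 1: replace 'name' with 'first_name' and 'last_name'."""
    first, _, last = doc.pop("name").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc


def add_status(doc: dict) -> dict:
    """Migration 2: give every document a default 'status'."""
    doc.setdefault("status", "active")
    return doc


# Ordered list of (version, migration function), kept in the code base.
MIGRATIONS: List[Tuple[int, Callable[[dict], dict]]] = [
    (1, split_name),
    (2, add_status),
]


def migrate_all(store: Store, changelog: List[int]) -> List[int]:
    """Apply every migration missing from the changelog to all documents.

    Assumes no other writer touches the store while this runs and that
    all documents start at the same schema version.
    """
    applied = []
    for version, migration in MIGRATIONS:
        if version in changelog:
            continue  # already applied in an earlier deployment
        for key in list(store):
            store[key] = migration(dict(store[key]))
        changelog.append(version)  # record the version in the changelog
        applied.append(version)
    return applied
```

Running `migrate_all` a second time against the same changelog applies nothing, which is the property that makes it safe to run on every deployment.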

Migrate data during read

In this pattern, the data is read as is when the application requires it. It is then migrated to the latest version needed by the application, used by the application, and written back in its upgraded form when the user is done with the operation. This pattern assumes that

  • The application can read the oldest version of the data and upgrade it to the latest. Keeping all this upgrade code in the code repository may clutter the code and reduce programmer productivity.
  • Each object needs to know which version of the migration was applied to it, which means an additional attribute on the object.
  • There may be some objects that never get accessed and thus will never get migrated.

Given the above assumptions, we need to write code that can deal with multiple versions of the data, upgrade any of those versions to the required version, and persist the object back at the latest version. Curator is an example for Rails that allows objects to be migrated to the latest version on read.
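A minimal sketch of migrate-on-read, again in Python with an in-memory dict standing in for the database. The `schema_version` attribute, the `UPGRADES` table and the `load`/`save` functions are hypothetical names chosen for illustration; this is not how Curator is implemented.

```python
from typing import Callable, Dict

LATEST_VERSION = 3


def v1_to_v2(doc: dict) -> dict:
    """Split 'name' into 'first_name' and 'last_name'."""
    first, _, last = doc.pop("name").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc


def v2_to_v3(doc: dict) -> dict:
    """Add a default 'status' field."""
    doc.setdefault("status", "active")
    return doc


# Maps a version to the function that upgrades it to the next version.
UPGRADES: Dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(doc: dict) -> dict:
    """Return a copy of doc upgraded step by step to LATEST_VERSION."""
    doc = dict(doc)
    # Assumption: documents without the attribute are at version 1.
    version = doc.get("schema_version", 1)
    while version < LATEST_VERSION:
        doc = UPGRADES[version](doc)
        version += 1
        doc["schema_version"] = version
    return doc


def load(store: Dict[str, dict], key: str) -> dict:
    """Read the stored document as is, then upgrade it in memory."""
    return upgrade(store[key])


def save(store: Dict[str, dict], key: str, doc: dict) -> None:
    """Persist the (already upgraded) document when the operation is done."""
    store[key] = doc
```

Chaining single-step upgrades keeps each function small, at the cost of retaining every step for as long as documents at that version may still exist.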

Hybrid approach

In this pattern, we migrate the objects one at a time during read operations while a background job runs constantly and migrates the remaining objects one at a time. This approach allows all the objects to be migrated within a known period of time, without the need to keep all the code around to migrate the oldest versions of the objects, thus allowing us to clean out code that is no longer needed.
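The hybrid approach might look roughly like the following Python sketch. It assumes an in-memory dict as the store, a hypothetical `schema_version` attribute on each document, and a hypothetical `sweep` function that a scheduler would call repeatedly. A real store would likely need some way to find stale documents without scanning every key.

```python
from typing import Dict

LATEST_VERSION = 2


def upgrade(doc: dict) -> dict:
    """Hypothetical upgrade: version 1 documents gain a 'status' field."""
    doc = dict(doc)
    if doc.get("schema_version", 1) < LATEST_VERSION:
        doc.setdefault("status", "active")
        doc["schema_version"] = LATEST_VERSION
    return doc


def read(store: Dict[str, dict], key: str) -> dict:
    """Read path: upgrade the document on access and persist it."""
    doc = upgrade(store[key])
    store[key] = doc
    return doc


def sweep(store: Dict[str, dict], batch_size: int) -> int:
    """Background path: migrate up to batch_size stale documents.

    Returns the number of documents migrated; 0 means the sweep is done
    and the upgrade code for older versions can be removed.
    """
    migrated = 0
    for key in list(store):
        if migrated >= batch_size:
            break
        if store[key].get("schema_version", 1) < LATEST_VERSION:
            store[key] = upgrade(store[key])
            migrated += 1
    return migrated
```

Calling `sweep` with a small batch size on a timer bounds the load on the database while still guaranteeing that objects nobody reads are eventually migrated.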