Passionate about data

Data and its implications on software design and development.

Migrations in NoSQL Databases

In relational database usage the pattern of migrations is well understood and has gained widespread acceptance. It is supported by frameworks such as DBDeploy, DBMaintain, MyBatis Migrations, Flyway, Liquibase, Active Record Migrations and many others. These tools let us migrate the database and maintain its version history in the database itself.

With the rise of NoSQL databases and their adoption in development teams, we are faced with the problem of migrations in NoSQL databases. What are the patterns of data migration that work in NoSQL databases? As NoSQL databases are schema-free and the database does not enforce any schema validation, the schema of the data lives in the application, which allows for different techniques of data migration.
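One possible technique, sketched here only as an illustration, is to upgrade documents in application code as they are read. This is a minimal sketch and not necessarily the approach the post goes on to describe; the `schema_version` field, the `name`/`first_name`/`last_name` fields and all function names are hypothetical.

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: split a single 'name' field into first and last name."""
    upgraded = dict(doc)  # copy so the caller's document is not mutated
    first, _, last = upgraded.pop("name", "").partition(" ")
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version number to the function that upgrades to the next version.
UPGRADES = {1: upgrade_v1_to_v2}


def migrate_on_read(doc):
    """Bring a document up to the current version, one step at a time.

    Documents without a 'schema_version' field are assumed to be version 1.
    """
    version = doc.get("schema_version", 1)
    while version < CURRENT_SCHEMA_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc
```

With this shape, old and new documents can coexist in the database, and the application decides whether to write the upgraded document back after reading it.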

10 Node Mongodb ReplicaSet on a Single Machine

While evaluating NoSQL databases, we had a 10 node Riak cluster and wanted to check how a similar setup would work with MongoDB, so we started to set up a 10 node MongoDB cluster. Since this was for initial spikes, we decided to set it up on a single machine, as with the other test setup using Riak.

Before I explain how we set up a 10 node MongoDB replica set, let me talk about replica sets. MongoDB implements replication, providing high availability, using replica sets. In a replica set, there are two or more nodes participating in asynchronous master-slave replication. The replica-set nodes elect the master node, or primary node, among themselves, and when the primary node goes down, the rest of the nodes elect a new primary node.

Moved My Blog to Octopress

It's been about a month since my blog moved to Octopress, and I wanted to write about my experience. I had been running my blog for some time using Movable Type, upgrading as and when new versions were released. Over time I realized that upgrading was fraught with errors, as a lot of steps had to be done manually. Customizing the layout was risky, as there was no way to preview my changes and commit only when I was comfortable. With the release of Movable Type 6 there is no longer a free version to download.

Usage of Mixed Case Database Object Names Is Dangerous

Some versions back, Oracle would not let us create database object names in mixed case; even if we tried to create them, we could not. In newer versions of Oracle we can create tables, columns, indexes, etc. using mixed case or lower case when the names are put inside double quotes. For example:

Create mixed case Customer table and Index on the table
CREATE TABLE "Customer" (
"CustomerID" number(10)
);

CREATE INDEX "IDX_Customer_CustomerID"
on "Customer"("CustomerID");

10 Node Riak Cluster on a Single Machine

When trying to evaluate NoSQL databases, it's usually better to try them out. While trying them out, it's better to use multi-node configurations, such as clusters in Riak or replica sets in MongoDB (maybe even a sharded setup), instead of running a single node. On our project we evaluated a 10 node Riak cluster so that we could experiment with N, R and W values and decide which values were optimal for us. In Riak, here is what N, R and W mean.

N = Number of Riak nodes to which data will be replicated
R = Number of Riak nodes which have to return results for the read to be considered successful
W = Number of Riak nodes which have to return a write success before the write is considered successful
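To get a feel for how the three values interact, here is a minimal sketch in Python. It does not use Riak's API; the function names are hypothetical. It only encodes the general quorum arithmetic: with strict quorums, a read is guaranteed to touch at least one replica that acknowledged the latest write when R + W > N, and a read or write can still succeed with N - R or N - W replicas down, respectively.

```python
def validate_nrw(n, r, w):
    """Reject settings that can never succeed: R and W must be between 1 and N."""
    if n < 1:
        raise ValueError("N must be at least 1")
    if not 1 <= r <= n:
        raise ValueError("R must be between 1 and N")
    if not 1 <= w <= n:
        raise ValueError("W must be between 1 and N")


def read_overlaps_write(n, r, w):
    """True when any R replicas must share at least one node with any W replicas."""
    validate_nrw(n, r, w)
    return r + w > n


def tolerated_failures(n, r, w):
    """Replicas (out of N) that can be down while reads and writes still reach R and W."""
    validate_nrw(n, r, w)
    return {"read": n - r, "write": n - w}
```

For example, with N=3, R=2 and W=2 the read and write sets always overlap and one replica can be down for either operation, while N=3, R=1, W=1 tolerates two failures but gives up the overlap.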

Transactions Using Groovy.SQL With Spring Annotations and Connection Pools

When using Groovy with the Spring framework, interacting with the database can be done using the Groovy.SQL class, which provides an easy-to-use interface. When using Groovy.SQL, if we need transactions, we have the .withTransaction method, which accepts a closure containing the code to execute within the transaction. In our project, since we were using Spring already, using annotations to define transactions would be great. Standard @Transactional annotations will not work with Groovy.SQL, since every place where Groovy.SQL is used acquires a new connection from the connection pool, causing the database work to span multiple connections, which can result in deadlocks on the database. What we really want is for the database connection to be the same across all invocations of Groovy.SQL within the same transaction started by the annotated method.

Backup in Mongodb Replica-set Configurations

There are multiple ways to take backups of MongoDB in different configurations; one of the configurations that I have been involved with recently is replica sets. When MongoDB is running in a replica-set configuration, there is a single primary node and multiple secondary nodes. To take a backup of the replica set we can either do a mongodump of one of the nodes, or shut down one of the secondary nodes and take file copies, since in a replica set all nodes have the same data (except the arbiter). Let's see how we could deal with the mongodump method of taking a backup.

MSSQL JDBC Driver Behavior

My latest project involves talking to MS-SQL Server using the JDBC driver and Java. While doing this we set up the database connection and had a simple SQL query to get the first_name and last_name for a unique user_id from the application_user table in the database.

SELECT first_name,last_name 
FROM application_user 
WHERE user_id = ?

Back to Blogging

There has been a long pause in my blogging activity. I was trying to finish off my latest writing engagement on NoSQL. Working with Martin Fowler on NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence was really fun. This book will provide a concise and easy way for everyone to understand the rise of the NoSQL movement, and help with the kinds of trade-offs that need to be made while working with NoSQL.

The book should soon be in print and e-book formats. Martin has written more about it here

With So Much Pain, Why Are Stored Procedures Used So Much

I keep encountering situations where all the business logic for the application is in stored procedures, and the application layer just calls the stored procedures to get the work done and return the data. There are many problems with this approach; some of them are:

  • Writing stored procedure code is fraught with danger, as there are no modern IDEs that support refactoring or flag code smells such as “variable not used” or “variable out of scope”.