Passionate about data

Data and its implications on software design and development.

Using Liquibase to Load Data and Ignore Some Columns

Loading data into tables is needed many times on projects to load test, Liquibase provides a method to load data into tables with lots of customization. In the example shown below, I’m loading zip code data with the following column layout


Two Factor Authentication to Authorize Credit

Simple flow of credit file usage by Individual Citizens

In todays environment, identity theft and related crimes are frequent and common place. Most of these incidents take place because our identity can be easily stolen will little bits of information such as name, address, birthdate, social security number or maybe just your card number.

What if, credit file or credit report or credit score even credit cards had two factor authentication, just like we have, two factor authentication on gmail, chase and many other services.

Automatically Adding Columns to Rails Migrations

Many projects need addition of identical columns to all the tables created by the project. Audit columns are an example of such a requirement. The requirement is to add columns such as created_by, created_date, modified_by and modified_date to all the tables, these columns store, who created the row, when the row was created, who modified the row last and when was it modified. created_by and created_date are required to be present when the row is inserted and thus are required to be not nullable. Adding these columns to each and every table is a lot of work for developers.

Synonyms as Abstraction Layer

In many development shops, developers are not allowed to access the database schema directly, and are not allowed to create tables, indexes, views etc, instead are given access via a different schema that allows SELECT, UPDATE and DELETE access on data. The general reason is to avoid developers creating database objects without

Behavior Driven Database Development

When Behavior Driven Development BDD was introduced, some of the key principles were

  • Requirements are behavior,
  • Provides “ubiquitous language” for analysis,
  • Acceptance criteria should be executable.
  • Design constraints should be made into executable tests.

All of these principles can be applied to database development. When interacting with the database, we tend to assume certain behavior of the database. Some of this is universal, like when a row is inserted in a table, the same row can later be retrieved. There are other behaviors of the database on every project that are not that universal, like Person table should have at least firstname or lastname. This behavior changes based on the functionality being developed and thus needs to be properly specified and executed. The database lends itself very well to the new way of thinking in the BDD space, where the behavior of the objects is considered. BDD is similar to describing requirements in code.

Using Rake and Activerecord to Generate Boilerplate DB Code

IN many projects, there are tables which need default audit columns such as Created_By, Created_Date, Modified_By, Modified_date and other columns that need to be updated everytime some actions are done againt the tables and/or columns. This type of functionality can be implemented using triggers. Why have triggers? when the application can do the updating, this is a good argument, but like all application databases eventually other users, applications and scripts will get access to the applications database and end up wanting to read from the database and write to the database. During these times its better to have triggers updating the data independent of the application to ensure audit columns and other columns are updated with appropriate values.

Database Naming Conventions in Different Environments

In every enterprise and every project we end up having multiple environments, especially the database side of the enterprise tends to stick around for a longer period of time and has much more dependencies or application integration as opposed to application urls etc. Given this, how to name the servers, databases and schemas becomes a very important decision, do these names provide for an easy way to use the application and not make it harder or the developers to access the database.

8 Techniques for Testing Migration of Data From Legacy Systems

Many of the projects we end up working on are replacing existing systems with existing data either wholly or in part. In all of the above projects we end up writing data migration or data conversion code to move the data from legacy systems to the new systems. Many stake holders of the project such as business users, project managers, business analysts really care about the data conversion scripts and the quality of the conversion. Since this conversion is business entity related and matters a lot as future business/functionality depends on the data being logically equivalent to the legacy system.

Over the years we have found many techniques that help in testing the quality of the data conversion. Here are the 8 techniques that encompass our learnings when converting data over from legacy databases.

Migrations in NoSQL Databases

In relational database usage the pattern of migrations is well understood and has gained widespread acceptance. Frameworks such as DBDeploy, DBMaintain, MyBatis migrations, Flyway, Liquibase, Active Record Migrations and many others. These tools allow to migrate the database and maintain the version history of the database in the database.

With the rise of NoSQL Databases and their adoption in development teams we are faced with the problem of migrations in NoSQL databases. What are the patterns of data migrations that work in NoSQL databases? as NoSQL databases are schema free and the database does not enforce any schema validation, the schema of the data is in the application and thus allows for different techniques of data migration.

10 Node Mongodb ReplicaSet on a Single Machine

While doing evalauation of NoSQL databases, we had a 10 node riak cluster and wanted check how a similar setup would work with mongodb. So started to setup a 10 node mongodb cluster. Since this was for initial spikes, we decided to set this up on a single machine as with the other test setup using Riak.

Before I explain how we setup 10 node mongodb ReplicaSet, let me talk about replica sets. MongoDB implements replication, providing high availability using replica sets. In a replica set, there are two or more nodes participating in an asynchronous master-slave replication. The replica-set nodes elect the master node, or primary node, among themselves and when the primary node goes down, the rest of the node elect the new primary node.