Architecture and Data Blog

Thoughts about intersection of data, devops, design and software architecture

8 Techniques for testing migration of data from legacy systems

How to use stored procedures as interface to the data

Many of the projects we end up working on are replacing existing systems with existing data either wholly or in part. In all of the above projects we end up writing data migration or data conversion code to move the data from legacy systems to the new systems. Many stake holders of the project such as business users, project managers, business analysts really care about the data conversion scripts and the quality of the conversion. Since this conversion is business entity related and matters a lot as future business/functionality depends on the data being logically equivalent to the legacy system.


10 node mongodb ReplicaSet on a Single Machine"

While doing evalauation of NoSQL databases, we had a 10 node riak cluster and wanted check how a similar setup would work with mongodb. So started to setup a 10 node mongodb cluster. Since this was for initial spikes, we decided to set this up on a single machine as with the other test setup using Riak.

Before I explain how we setup 10 node mongodb ReplicaSet, let me talk about replica sets. MongoDB implements replication, providing high availability using replica sets. In a replica set, there are two or more nodes participating in an asynchronous master-slave replication. The replica-set nodes elect the master node, or primary node, among themselves and when the primary node goes down, the rest of the node elect the new primary node.


Usage of mixed case database object names is dangerous

Just because the database allows mixed case naming, does not mean you should use it

Some versions back, Oracle would not allow to create database object names with mixed cases, even if we tried to create them, we could not. In newer versions of Oracle we can create tables, columns, indexes etc using mixed case or lower case, when the names are put inside double quotes. For example


10 node Riak cluster on a single machine

When trying to evaluate NoSQL databases, its usually better to try them out. While trying them out, its better to use them with multiple node configurations instead of running single node. Such as clusters in Riak or Replica-set in mongodb maybe even a sharded setup. On our project we evaluated a 10 node Riak cluster so that we could experiment with N, R and W values and decide which values where optimal for us. In Riak here is what N, R and W mean.


With so much pain, why are stored procedures used so much

Deciding when to used stored procedures is an important design decision that needs to be made with lots of thought

I keep encountering situations where all the business logic for the applications is in stored procedures and the application layer is just calling the stored procedures to get the work done and return the data. There are many problems with this approach some of them are.

  • Writing stored procedure code is fraught with danger as there are no modern IDE’s that support refactoring, provide code smells like “variable not used”, “variable out of scope”.


Replica sets in MongoDB

How to setup high availability replicaset in mongodb

Replica sets is a feature of MongoDB for Automatic Failover, in this setup there is a primary server and the rest are secondary servers. If the primary server goes down, the rest of the secondary servers choose a new primary via an election process, each server can also be assigned number of votes, so that you can decide the next primary based on data-center location, machine properties etc, you can also start mongo database processes that act only as election tie-breakers these are known as arbiters, these arbiters will never have data, but just act as agents that break the tie.


Create an Index for all FK Columns in the database

Generating indexes for all foreign key constraints in the database

Most of the time I have seen database foreign key constraints on tables without indexes on those columns. Lets say the application is trying to delete a row from the CUSTOMER table

DELETE FROM CUSTOMER WHERE CUSTOMERID = 1000;

When the database goes about deleting the customerId of 1000, if there are foreign key constraints defined on customerId, then the database is going to try to find if the customerId of 1000 is used in any of those tables. Lets say ORDER table has the customerId column, the database is going to issue


Explicitly rollback when you encounter a deadlock.

Deadlock transactions raise an exception, but don't automatically get reversed

Dead lock is caused in the database when you have resources (connections) waiting for other connections to release locks on the rows that are needed by the session, resulting in all session being blocked. Oracle automatically detects deadlocks are resolves the deadlock by rolling back the statement in the transaction that detected the deadlock. Thing to remember is that last statement is rolled back and not the whole transaction, which means that if you had other modifications, those rows are still locked and the application should make sure that it does a explicit rollback on the connection.


Oracle Metadata can be mis-leading

Oracle has metadata about all its objects in various tables/views. One such view is the USER_OBJECTS or ALL_OBJECTS, this view has a column named as STATUS which shows you if the given object is VALID or INVALID. The status applies to DB Code (Stored Procedures, Functions, Triggers etc).

To find all the INVALID objects in the schema, issue

SELECT * FROM USER_OBJECTS WHERE STATUS='INVALID';

One problem with the way oracle maintains this metadata is, changing the underlying table on which the DB Code depends, oracle marks the objects are INVALID even though the underlying table may have changed in such a way, that it does not affect the DB Code at all (like adding a new column, or making a colum nullable). Here is some code which shows you what I mean. Run it through SQLPlus.


Storing just the time in Oracle

How to store time, without the data component

We came across a need to save just the Time in the database, the requirement is to store time of the day, like say the user likes to have Breakfast at 8.15AM and Lunch at 12.32PM etc. Off course oracle does not have a time only data type. So we ended up using DATE as the data type and just setting the time. for example:

CREATE TABLE FOO (PREFERRED_TIME DATE NULL);
INSERT INTO FOO (TO_DATE('11:34','HH24:MI'));

oracle automatically sets the date to the first day of the current month. so when you do a select from the FOO table the data would be