June 2007 Archives

Long Running Data Migrations during Database Refactorings.

When you are refactoring large databases, you will have certain tables that have millions of rows, so lets say we are doing the Move Column refactoring, moving the TaxAmount column from Charge table which has millions of rows to TaxCharge table. Create the TaxAmount column in the TaxCharge table. Then have to move the data from the TaxAmount column in the Charge table to the TaxAmount column you created in the TaxCharge table.

One way to move the data would be to write a single update statement as shown below.

UPDATE taxCharge set taxAmount=
(SELECT taxAmount FROM charge
WHERE charge.chargeid = taxCharge.chargeID);

Now this update will run for some time and will be run as one transaction, the Update operation will need a lot of UNDO space (UNDO space in oracle other databases may need other types of space, but the general idea is some space is needed by the database server ), so there is a probability of the update failing if the database server runs out of UNDO space, another side effect of moving large amounts of data using a single update (transaction) is that all the updated rows in the TaxCharge table will be locked till the whole update transaction is done and a commit is issued.

Getting around this problem is to use a programmatic update, like the PL/SQL code shown below. This code can be implemented in the database procedural code like TSQL or any other language appropriate for the task.

DECLARE
CURSOR allCharges IS
    SELECT chargeId, taxAmount FROM charge;
numberOfRowsToCommit NUMBER :=1000;
BEGIN
FOR thisCharge IN allCharges LOOP
    UPDATE taxCharge SET taxamount=thisCharge.taxAmount 
        WHERE chargeid = thisCharge.chargeid;
    IF allCharges%rowcount MOD numberOfRowsToCommit = 0 THEN
        COMMIT;
    END IF;
END LOOP;
END;
/

The above PL/SQL code basically loops through all the rows in the Charge table and updates the TaxCharge table with the taxAmount and commits after every 1000 rows (as defined by numberOfRowsToCommit). This approach allows us to work with a smaller UNDO size, releases (not lock) rows once they are committed after every 1000 rows. Off course there are many variations of this technique and many ways to implement this particular update. The point I'm trying to get to is that, depending on the situation, I will choose different ways to update the data and maybe not always use UPDATE statements.

XP2007 Tutorial

I will presenting about Database Refactoring: Evolutionary Database Design at XP 2007 here is the Tutorial schedule

Enforcing your assumptions about database functionality

When you are writing xUnit tests you are in certain ways trying to make sure that the test breaks when the code that is being tested changes the assumptions you made when writing the Test and Production code.

Similarly if you are relying on the database to throw a error when you put invalid data, then you should write a test around this assumption, so that when someone changes the database to not behave the way you assumed it to behave, the test you wrote will break and it will force the team to think about the change to the database that is being undertaken. If the change is really required, then the team would fix the test else rollback the change being made.

Here is a example.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        boolean hadException = false;
        try {
            employeeGateway.insert(employee);
        } catch (SQLException e) {
            hadException = true;
        }
        assertTrue("Termination date before hiredate should not be allowed by the database", hadException);
    }

On the Employee table, we have a Check constraint that checks if the TerminationDate is before HireDate. When a employee with hiredata greater than termination date is entered the database will throw a exception. This functionality provided by the database can be assumed by the application. If this database design changes the team will need to change all the affected code, hence its better to put a test around this assumption and fail the test suite when the database design changes.

The failed test will force the team to look at why the database design was the way it was designed and what other places in the code need to change if the database design changes. Obviously you can do this validation in the application before you persist the Employee object to the database. In situations where the database is also being accessed by other applications, its better to delegate the data quality constraints to the database, instead of relying on other apps to do the right thing.

Another way to do the above test would be as suggested by Sudhindra.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        try {
            employeeGateway.insert(employee);
            fail("Termination date before hiredate should not be allowed by the database");
        } catch (SQLException e) {
            // expected SQLException
        }
    }