Main

September 3, 2009

Create an Index for all FK Columns in the database

Most of the time I have seen database foreign key constraints on tables without indexes on those columns. Lets say the application is trying to delete a row from the CUSTOMER table
DELETE FROM CUSTOMER WHERE CUSTOMERID = 1000;
When the database goes about deleting the customerId of 1000, if there are foreign key constraints defined on customerId, then the database is going to try to find if the customerId of 1000 is used in any of those tables. Lets say ORDER table has the customerId column, the database is going to issue
SELECT ... FROM ORDER WHERE CUSTOMERID = 1000;
now if there is no index on ORDER.CUSTOMERID, the database will have to do a full Table scan which is very expensive in terms of IO and resources, imagine customerId being used in lots of tables, the problem just multiplies significantly. In an multiuser scenario, this will lead to deadlocks, since the same tables are being read and locks being applied to find dependend children. Introducing an index on all the columns that are foreign key referenced helps a lot in this case.

October 3, 2007

Frameworks should not constrain your table design

Recently talking to someone about a persistence framework that they are using,
this framework creates a rows in the database table with just Primary Key value
and then later on updates the table row with values for other columns. Because
of this framework, the tables cannot have any NOT-NULL constraints defined.

For example, the framework is doing

INSERT INTO customer (customerId, name, birthdate) VALUES (1,null,null);
UPDATE customer set name = 'FOO Name', birthdate = '12/12/1978' WHERE customerId = 1;

You cannot have NON-NULL constraint defined on name or birhdate column, since
the INSERT statement would blow up, forcing you to change table design

CREATE TABLE Customer (
CustomerId NUMBER NOT NULL,
Name VARCHAR2(40) NULL,
birthdate DATE NULL,
CONSTRAINT PK_Customer
      PRIMARY KEY (CustomerId)
)
/

What can you do, so that table design does not suffer? and have better data
quality constraints? one argument we see is that, application will constrain the
data and not allow customers with null name or birthdate to be persisted. I have
seen way too many projects where this argument fails after some time when
someone starts to import customers from other system or starts to implement a
service to push customers and now the constraint in the application layer is
pretty useless allowing bad data to get in, since there is nothing stopping
these bad rows from getting in the Customer table.

Differed constraints come to the rescue here in oracle (bet there is equivalent
types of differed constraint checking for other database vendors) Using the
Customer table example, create a differed constraint on Name and BirthDate
columns, as shown below

ALTER TABLE Customer
    ADD (CONSTRAINT CHK_Customer_Name_NotNull
    CHECK (Name IS NOT NULL) DEFERRABLE INITIALLY DEFERRED)
/
ALTER TABLE Customer
    ADD (CONSTRAINT CHK_Customer_BirthDate_NotNull
    CHECK (BirthDate IS NOT NULL) DEFERRABLE INITIALLY DEFERRED)
/

now run the same Insert and Update shown above, they would run fine, but you
would get a constraint violation error when you try to commit with a null Name
or BirthDate.

INSERT INTO customer (customerId, name, birthdate) VALUES (1,null,null);
UPDATE customer set name = 'FOO Name', birthdate = '12/12/1978' WHERE customerId = 1;

The radical Idea, I'm trying to highlight is to be creative in you database
design and use the features of the database to be able to enforce data quality
constraints or other constraints and still not have to suffer on application
development velocity.

August 28, 2007

When does evolutionary design happen?

A question I get, mostly related to the evolutionary database design and development. When the pair (team) gets a new feature (story) to work on, the team looks at the existing table/database design and sees if the current design
is enough to implement the feature they are working on. If the currency database design does support the feature they are trying to implement, then they do not have to change the database at all, they will move on to implement the feature and change the application code as necessary.

If the current database/table design does not allow them to implement the feature that they are trying to complete, then they start looking at how to change the database/table design so that they can implement the feature.

Without getting into the discussion of what is the correct design, I wanted to highlight how the design decision is made.

Lets take a example: The feature Joe and Andy (pair) are working on is "As a Customer I should be able to have billing and shipping address". Now to implement this feature Joe and Andy see if the Customer has the ability to store any addresses, if we can store addresses, if we have address table etc.

Joe and Andy find out that they have a Customer table and an Address table, also that Customer table has a AddressID on it, which shows the CustomerAddress. Joe and Andy decide to create two more attributes/columns on the Customer table named BillingAddressID and ShippingAddressID and remove the previous AddressID, copying the data from AddressID to BillingAddressID as well as ShippingAddressID, changing the AddressID makes for better column name.

So Joe and Andy implemented the following refactorings.

Introduce New Column (ShippingAddressID)
Introduce New Column (BillingAddressID)
Move Data (AddressID to ShippingAddressID and BillingAddressID)
Add Foreign Key Constraint (ShippingAddressID depends on Address table)
Add Foreign Key Constraint (BillingAddressID depends on Address table)
Drop Column (AddressID)

Now they go on to implement the rest of the feature in the Application, I find that one pair (Joe and Andy in our example) working from the UI layer down to the database layer touching all the layers of the application makes for much more sense than having one pair work on the database side of things, another on the domain layer and then another on the UI layer. This allows for the team to learn all aspects of the application and also makes everyone productive and not wait on team to finish their part before starting to work on our feature. When you are lacking in certain skills, you can always pair with the Expert in the given area. If Joe and Andy thought they need help writing the data migration script for the above refactorings, then they would pull the data expert on the team and then pair with him. This is how collaborative and evolutionary design happens.

June 8, 2007

Enforcing your assumptions about database functionality

When you are writing xUnit tests you are in certain ways trying to make sure that the test breaks when the code that is being tested changes the assumptions you made when writing the Test and Production code.

Similarly if you are relying on the database to throw a error when you put invalid data, then you should write a test around this assumption, so that when someone changes the database to not behave the way you assumed it to behave, the test you wrote will break and it will force the team to think about the change to the database that is being undertaken. If the change is really required, then the team would fix the test else rollback the change being made.

Here is a example.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        boolean hadException = false;
        try {
            employeeGateway.insert(employee);
        } catch (SQLException e) {
            hadException = true;
        }
        assertTrue("Termination date before hiredate should not be allowed by the database", hadException);
    }

On the Employee table, we have a Check constraint that checks if the TerminationDate is before HireDate. When a employee with hiredata greater than termination date is entered the database will throw a exception. This functionality provided by the database can be assumed by the application. If this database design changes the team will need to change all the affected code, hence its better to put a test around this assumption and fail the test suite when the database design changes.

The failed test will force the team to look at why the database design was the way it was designed and what other places in the code need to change if the database design changes. Obviously you can do this validation in the application before you persist the Employee object to the database. In situations where the database is also being accessed by other applications, its better to delegate the data quality constraints to the database, instead of relying on other apps to do the right thing.

Another way to do the above test would be as suggested by Sudhindra.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        try {
            employeeGateway.insert(employee);
            fail("Termination date before hiredate should not be allowed by the database");
        } catch (SQLException e) {
            // expected SQLException
        }
    }

December 15, 2006

Implementing Make Column Non Nullable

While working on a Legacy Application with Legacy Database design as part of fixing a bug, I thought this bug would not have ever happened if a particular column was defined as Non Nullable since this particular column was the identifier to the parent table.

We had a Customer table and the all the names a customer could have like LegalName, LongName, ShortName etc are stored in the CustomerName table. CustomerName cannot exist without Customer hence its logical that the CustomerName.CustomerID column cannot be nullable and should also have a Foreign Key constraint enforcing the relationship. Implementing just the Foreign Key constraint is not enough since the application could potentially be inserting null values in the CustomerName.CustomerID creating orphan records.

So we had to implement Add Foreign Key Constraint and later on implement Make column non nullable

The changes we had to make to the legacy application and legacy database involved

1. Find and fix the place where the application creates or updates CustomerName so that it does not create a CustomerName with Null or Invalid CustomerID values. I did it by using the Applications Customer and CustomerName domain objects and the business logic that created these objects and wrote a unit test that inteacts with the database. So that the object was persisted and I could see the resulting rows in the database. So once I do Step 2 and 3 below, this unit test will fail. The failing test will point me to the problem in the application code.

2.Write a SQL Script to fix the data in CustomerName, such that we delete all CustomerName rows where the CustomerName.CustomerID is null and also where CustomerName.CustomerID does not exist in Customer.CustomerID. The reason I did this was all the CustomerName rows where CustomerID was null where orphan rows that could not be reached too by the application anyway and hence could be removed.

3.Create a Foreignkey Constraint between Customer and CustomerName on the CustomerID column.

4.Now that we have no nulls in the CustomerName.CustomerID column and all the refrentially improper data has been removed. We run our unit test from Step 1. THe failure will point us to the places in the application code where we need to fix the code.

5.Now our applcation code has been changed and we improved it, we have improved the the database design, in the process we also have a unit test that will break when anyone else makes any application code changes that break the Refrentially Integrity and/or the Not Null rule.

Overall a satisfying day at work I think.