Recently in Improving Design Category

With so much pain, why are stored procedures used so much

I keep encountering situations where all the business logic for the applications is in stored procedures and the application layer is just calling the stored procedures to get the work done and return the data. There are many problems with this approach some of them are.
  • Writing stored procedure code is fraught with danger as there are no modern IDE's that support refactoring, provide code smells like "variable not used", "variable out of scope".
  • Finding usages of a given stored procedure or function usually means doing a text search of the whole code base for the name of the function or stored procedure, so refactoring to change name is painful, which means names that do not make any sense are propagated, causing pain and loss of developer productivity
  • When coding of stored procedures is done, you need a database to compile the code, this usually means a large database install on your desktop or laptop the other option being to connect to the central database server, again this leads to developers having to carry a lot of dependent systems just to compile their code, this can to solved by database vendors providing a way to compile the code outside of the database.
  • Code complexity tools, PMD metrics, Checkstyle etc type of tools are very rare to find for stored procedures, thus making the visualization of metrics around the stored procedure code almost impossible or very hard
  • Unit testing stored procedures using *Unit testing frameworks out there like pl/sql unit, ounit, tsql unit is hard, since these frameworks need to be run inside the database and integrating them with Continuous Integration further exasperates the problems
  • Order or creation of stored procedures becomes important as you start creating lots of stored procedures and they become interdependent. While creating them in a brand new database, there are false notifications thrown around about missing stored procedures, usually to get around this problem, I have seen a master list of ordered stored procedures for creation maintained by the team or just recompile all stored procedures once they are created "ALTER RECOMPILE" was built for this. Both of these solutions have their own overhead.
  • While running CPU intensive stored procedures, the database engine is the only machine (like JVM) available for the code to run, so if you want to start more processes so that we can handle more requests, its not possible without a database engine. So the only solution left is to get a bigger box (Vertical Scaling)
There certainly are lots of other problems associated with using stored procedures, which I will not get into.

Create an Index for all FK Columns in the database

Most of the time I have seen database foreign key constraints on tables without indexes on those columns. Lets say the application is trying to delete a row from the CUSTOMER table
DELETE FROM CUSTOMER WHERE CUSTOMERID = 1000;
When the database goes about deleting the customerId of 1000, if there are foreign key constraints defined on customerId, then the database is going to try to find if the customerId of 1000 is used in any of those tables. Lets say ORDER table has the customerId column, the database is going to issue
SELECT ... FROM ORDER WHERE CUSTOMERID = 1000;
now if there is no index on ORDER.CUSTOMERID, the database will have to do a full Table scan which is very expensive in terms of IO and resources, imagine customerId being used in lots of tables, the problem just multiplies significantly. In an multiuser scenario, this will lead to deadlocks, since the same tables are being read and locks being applied to find dependend children. Introducing an index on all the columns that are foreign key referenced helps a lot in this case.

Frameworks should not constrain your table design

Recently talking to someone about a persistence framework that they are using,
this framework creates a rows in the database table with just Primary Key value
and then later on updates the table row with values for other columns. Because
of this framework, the tables cannot have any NOT-NULL constraints defined.

For example, the framework is doing

INSERT INTO customer (customerId, name, birthdate) VALUES (1,null,null);
UPDATE customer set name = 'FOO Name', birthdate = '12/12/1978' WHERE customerId = 1;

You cannot have NON-NULL constraint defined on name or birthdate column, since
the INSERT statement would blow up, forcing you to change table design

CREATE TABLE Customer (
CustomerId NUMBER NOT NULL,
Name VARCHAR2(40) NULL,
birthdate DATE NULL,
CONSTRAINT PK_Customer
      PRIMARY KEY (CustomerId)
)
/

What can you do, so that table design does not suffer? and have better data
quality constraints? one argument we see is that, application will constrain the
data and not allow customers with null name or birthdate to be persisted. I have
seen way too many projects where this argument fails after some time when
someone starts to import customers from other system or starts to implement a
service to push customers and now the constraint in the application layer is
pretty useless allowing bad data to get in, since there is nothing stopping
these bad rows from getting in the Customer table.

Differed constraints come to the rescue here in oracle (bet there is equivalent
types of differed constraint checking for other database vendors) Using the
Customer table example, create a differed constraint on Name and BirthDate
columns, as shown below

ALTER TABLE Customer
    ADD (CONSTRAINT CHK_Customer_Name_NotNull
    CHECK (Name IS NOT NULL) DEFERRABLE INITIALLY DEFERRED)
/
ALTER TABLE Customer
    ADD (CONSTRAINT CHK_Customer_BirthDate_NotNull
    CHECK (BirthDate IS NOT NULL) DEFERRABLE INITIALLY DEFERRED)
/

now run the same Insert and Update shown above, they would run fine, but you
would get a constraint violation error when you try to commit with a null Name
or BirthDate.

INSERT INTO customer (customerId, name, birthdate) VALUES (1,null,null);
UPDATE customer set name = 'FOO Name', birthdate = '12/12/1978' WHERE customerId = 1;

The radical Idea, I'm trying to highlight is to be creative in you database
design and use the features of the database to be able to enforce data quality
constraints or other constraints and still not have to suffer on application
development velocity.

When does evolutionary design happen?

A question I get, mostly related to the evolutionary database design and development. When the pair (team) gets a new feature (story) to work on, the team looks at the existing table/database design and sees if the current design
is enough to implement the feature they are working on. If the currency database design does support the feature they are trying to implement, then they do not have to change the database at all, they will move on to implement the feature and change the application code as necessary.

If the current database/table design does not allow them to implement the feature that they are trying to complete, then they start looking at how to change the database/table design so that they can implement the feature.

Without getting into the discussion of what is the correct design, I wanted to highlight how the design decision is made.

Lets take a example: The feature Joe and Andy (pair) are working on is "As a Customer I should be able to have billing and shipping address". Now to implement this feature Joe and Andy see if the Customer has the ability to store any addresses, if we can store addresses, if we have address table etc.

Joe and Andy find out that they have a Customer table and an Address table, also that Customer table has a AddressID on it, which shows the CustomerAddress. Joe and Andy decide to create two more attributes/columns on the Customer table named BillingAddressID and ShippingAddressID and remove the previous AddressID, copying the data from AddressID to BillingAddressID as well as ShippingAddressID, changing the AddressID makes for better column name.

So Joe and Andy implemented the following refactorings.

Introduce New Column (ShippingAddressID)
Introduce New Column (BillingAddressID)
Move Data (AddressID to ShippingAddressID and BillingAddressID)
Add Foreign Key Constraint (ShippingAddressID depends on Address table)
Add Foreign Key Constraint (BillingAddressID depends on Address table)
Drop Column (AddressID)

Now they go on to implement the rest of the feature in the Application, I find that one pair (Joe and Andy in our example) working from the UI layer down to the database layer touching all the layers of the application makes for much more sense than having one pair work on the database side of things, another on the domain layer and then another on the UI layer. This allows for the team to learn all aspects of the application and also makes everyone productive and not wait on team to finish their part before starting to work on our feature. When you are lacking in certain skills, you can always pair with the Expert in the given area. If Joe and Andy thought they need help writing the data migration script for the above refactorings, then they would pull the data expert on the team and then pair with him. This is how collaborative and evolutionary design happens.

Enforcing your assumptions about database functionality

When you are writing xUnit tests you are in certain ways trying to make sure that the test breaks when the code that is being tested changes the assumptions you made when writing the Test and Production code.

Similarly if you are relying on the database to throw a error when you put invalid data, then you should write a test around this assumption, so that when someone changes the database to not behave the way you assumed it to behave, the test you wrote will break and it will force the team to think about the change to the database that is being undertaken. If the change is really required, then the team would fix the test else rollback the change being made.

Here is a example.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        boolean hadException = false;
        try {
            employeeGateway.insert(employee);
        } catch (SQLException e) {
            hadException = true;
        }
        assertTrue("Termination date before hiredate should not be allowed by the database", hadException);
    }

On the Employee table, we have a Check constraint that checks if the TerminationDate is before HireDate. When a employee with hiredata greater than termination date is entered the database will throw a exception. This functionality provided by the database can be assumed by the application. If this database design changes the team will need to change all the affected code, hence its better to put a test around this assumption and fail the test suite when the database design changes.

The failed test will force the team to look at why the database design was the way it was designed and what other places in the code need to change if the database design changes. Obviously you can do this validation in the application before you persist the Employee object to the database. In situations where the database is also being accessed by other applications, its better to delegate the data quality constraints to the database, instead of relying on other apps to do the right thing.

Another way to do the above test would be as suggested by Sudhindra.

    public void testShouldNotCreateEmployeeWhenHireDateGreaterThanTerminationDate()  {
        employee.setEmployeeID(employeeId);
        employee.setHireDate(hireDate);
        employee.setTerminatedDate(terminatedDate);
        try {
            employeeGateway.insert(employee);
            fail("Termination date before hiredate should not be allowed by the database");
        } catch (SQLException e) {
            // expected SQLException
        }
    }

Implementing Make Column Non Nullable

While working on a Legacy Application with Legacy Database design as part of fixing a bug, I thought this bug would not have ever happened if a particular column was defined as Non Nullable since this particular column was the identifier to the parent table.

We had a Customer table and the all the names a customer could have like LegalName, LongName, ShortName etc are stored in the CustomerName table. CustomerName cannot exist without Customer hence its logical that the CustomerName.CustomerID column cannot be nullable and should also have a Foreign Key constraint enforcing the relationship. Implementing just the Foreign Key constraint is not enough since the application could potentially be inserting null values in the CustomerName.CustomerID creating orphan records.

So we had to implement Add Foreign Key Constraint and later on implement Make column non nullable

The changes we had to make to the legacy application and legacy database involved

1. Find and fix the place where the application creates or updates CustomerName so that it does not create a CustomerName with Null or Invalid CustomerID values. I did it by using the Applications Customer and CustomerName domain objects and the business logic that created these objects and wrote a unit test that inteacts with the database. So that the object was persisted and I could see the resulting rows in the database. So once I do Step 2 and 3 below, this unit test will fail. The failing test will point me to the problem in the application code.

2.Write a SQL Script to fix the data in CustomerName, such that we delete all CustomerName rows where the CustomerName.CustomerID is null and also where CustomerName.CustomerID does not exist in Customer.CustomerID. The reason I did this was all the CustomerName rows where CustomerID was null where orphan rows that could not be reached too by the application anyway and hence could be removed.

3.Create a Foreignkey Constraint between Customer and CustomerName on the CustomerID column.

4.Now that we have no nulls in the CustomerName.CustomerID column and all the refrentially improper data has been removed. We run our unit test from Step 1. THe failure will point us to the places in the application code where we need to fix the code.

5.Now our applcation code has been changed and we improved it, we have improved the the database design, in the process we also have a unit test that will break when anyone else makes any application code changes that break the Refrentially Integrity and/or the Not Null rule.

Overall a satisfying day at work I think.

All Content Copyright © 2012 Pramod Sadalage. All Rights Reserved.