February 2007 Archives

Nulls need special love

The following SELECT statement in code

stmt = DB.prepare("select id,name,state,zip " +
        "from customer " +
        "where " +
        "phone = ? " +
        "and active = ?");
stmt.setString(1, customerPhone);
stmt.setBoolean(2, isActive);
stmt.execute();

where customerPhone and isActive are values you would pass in to the SELECT before its executed. Everything is fine when one day the value passed for customerPhone is NULL. For a database (Oracle is what I know most) a NULL will never be equal to NULL , the SELECT will not return rows where the customer.phone is NULL, leading to wrong results. The SELECT will have to be changed to

stmt = DB.prepare("select id,name,state,zip " +
        "from customer " +
        "where " +
        "(phone IS NULL or phone = ?) " +
        "and " +
        "active = ?");
stmt.setString(1, customerPhone);
stmt.setBoolean(2, isActive);
stmt.execute();

We could dynamically write the SELECT so that we don't have to do the OR in the where clause, which could be expensive.

boolean hasPhone = false;
StringBuffer sqlQuery = new StringBuffer();
sqlQuery.append("select id,name,state,zip " +
                "from customer " +
                "where 1=1 ");
if (phoneNumber != null) {
    sqlQuery.append("and phone=? ");
    hasPhone = true;
} else {
    sqlQuery.append("and phone IS NULL ");
}
sqlQuery.append("and isActive=?");
stmt = DB.prepare(sqlQuery.toString());
if (hasPhone) {
    stmt.setString(1, phoneNumber);
    stmt.setBoolean(2, isActive);
} else {
    stmt.setBoolean(1, isActive);
}
stmt.execute();

Data Quality and Code Quality

Recently we had peculiar problem. Some of the data in the database was not being created in a proper fashion. Once we found that out we fixed the problem in the application. The customer still had the perception that the code is still broken, because the fixed code was now interacting with the data that was broken (since it was created much earlier by code that was broken). Data has a life of its own (more on this later)

So I started thinking ( this happens very rarely) about data quality. Does improved data quality give a perception of improved code quality? Does poor data quality drag down the perception of code quality? Anyway the effort invested in code quality can be negated if Data Quality is not thought about!.

When unit tests could be written to flush out design/code quality problems in the application. what could be done to improve data quality, what kind of tests could be written? Would these tests apply to the design of the database or to the quality of the data?