Main

November 18, 2009

Testing in conversion projects

When working on projects involving Conversion of data or Migration/Moving of data from a legacy database. The testing effort is enormous and testing takes a lot of time, some test automation can help this effort.

Since data is moved/changed from a source database to destination database, we can write sql which should provide results for the types of tests you want to perform, for example: write a sql to give us number of customers, write a sql to give us account balance for a specific account.

These sqls can be run on your source database as well as your destination database and the results can be compared programmatically, providing us an easy way to compare the state of the database before and after conversion/migration. This testing can be run through a CI engine to make it a regression test suite.

Here is an example implementation using ruby,

We have two databases SOURCE and DESTINATION and two sql files names source.sql and destination.sql. The ruby program picks up sql from these two files and runs them against their database i.e. sql from source.sql is run against the SOURCE database and sql from destination.sql is run against DESTINATION database. The results of both of those sqls is compared and an failure is raised when the results do not match.

 results
  statement = get_sql_statement_to_execute
    begin
      source_statement = statement[0]
      destination_statement = statement[1]
      source_rows = exec_sql_in_source_return_rows(source_statement)
      destination_rows = exec_sql_in_destination_return_rows(destination_statement)
      result = compare_rows(source_rows, destination_rows, destination_statement, source_statement)
      results << result
    rescue
      Log.log("Could not process: "+statement)
    end
    if (results.size > 0)
      Log.log("Results do not match in source and destination")
    end

The sample ruby code above shows how the solution can be implemented, thus enabling automation of database conversion/migration testing

May 26, 2009

Explicitly rollback when you encounter a deadlock.

Dead lock is caused in the database when you have resources (connections) waiting for other connections to release locks on the rows that are needed by the session, resulting in all session being blocked. Oracle automatically detects deadlocks are resolves the deadlock by rolling back the statement in the transaction that detected the deadlock. Thing to remember is that last statement is rolled back and not the whole transaction, which means that if you had other modifications, those rows are still locked and the application should make sure that it does a explicit rollback on the connection.

For example.
Lets assume there are two tables Parent(ParentID) and Child(ChildID)

SESSION_A >create table parent (parentId number(10));
Table created.
SESSION_A >create table child (childId number(10));
Table created.
SESSION_A >insert into parent values (100);
1 row created.
SESSION_A >insert into child values (200);
1 row created.
SESSION_A >commit;
Commit complete.
SESSION_A >select * from parent;
  PARENTID
----------
       100

SESSION_A >select * from child;
CHILDID
----------
200
SESSION_A >

Now lets create a situation where a deadlock happens. There are two sessions connected to the same database and same user, SESSION_A and SESSION_B are the two sessions in question.

SESSION_A >update parent set parentid = 1000 where parentid=100;
1 row updated.
SESSION_B >update child set childid = 2000 where childid = 200;
1 row updated.
SESSION_B >update parent set parentid = 2001 where parentid=100;
--Waiting For Lock on Row in Parent Table, held by SESSION_A
SESSION_A >update child set childid = 1001 where childid = 200;
update child set childid = 1001 where childid = 200
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
--SESSION_A requesting lock on row, held by SESSION_B causing deadlock.
SESSION_A >

After you get the ORA-00060 error the statement update child set childid = 1001 where childid = 200; is rolled back.. but SESSION_B is still waiting for the lock on the Parent table to be released.

So when your application get the ORA-00060 or any deadlock exception in any other database, explicitly rollback your transaction (not just the current statement) so that all the changes made in the transaction and all the locks held by the transaction are released.

March 26, 2009

Presentation on Database Refactoring

My Presentation on Database Refactoring at QCon was recorded and is live now on infoQ here

February 3, 2009

Considerations when using stored procedures for doing CRUD.

Some environments like to have access to the database tables routed via stored procedures.
Instead of using Create/Read/Update/Delete (CRUD) with DML, stored procedures are invoked with the parameters to perform the required operation. I'm not arguing about the benefits/pitfalls of this approach, if you have to do stored procedures, here are some things to look at.

1. Make the stored procedure handle one object/table only and not multiple objects or tables.
2. Do not commit open transactions inside the stored procedures.
3. Do not do business logic in stored procedures.
4. If they are straight CRUD stored procedures, see if you can you generate the stored procedure code using some metadata?
5. Make sure creation and execution of the stored procedures is part of your Continuous Integration build and developer build.
6. Make sure stored procedures (or the metadata used to generate them) is under Version Control, have seen many problems when the stored procedure version does not match application code version
7. Develop against the production stack database.
8. Make sure exceptions thrown by the database are passed back to the application.

September 30, 2008

Using Explicit Order By in your SQL Statements

Recently when our test databases where upgraded new version of Oracle, we started noticing that the order in which some drop down lists were being displayed was not correct. It turns out that the SELECT statement we had, did not have a ORDER BY clause and the data was being returned in the ORDER of the creation of rows (in the order of ROWID) when the database was upgraded these ROWID's got changed and hence the ORDER of the data being shown in the drop down lists.

Lesson learnt: have a EXPLICIT ORDER BY in every SQL that provides data to be shown on the screen. In other words do not rely on the order of the result set provided by the current version of the database server. if you prefer a certain order make that choice explicit in the SQL SELECT statement.

July 15, 2008

Create a Index for every Foreign Key constraint created

When creating a Foreign Key constraint on the database as shown below

ALTER TABLE BOOK ADD 
    (CONSTRAINT FK_BOOK_AUTHOR FOREIGN KEY (AUTHORID) 
     REFERENCES AUTHOR)
/

In the above example we are telling the database to check if the BOOK.AUTHORID is a valid value in the Author.AuthorID. When the Author table is being changed, the database does data verification on the BOOK table using SELECT against the BOOK table for the AUTHORID some thing like this

SELECT count(*) FROM BOOK WHERE AUTHORID = nnnn

Basically the database server is trying to check if it has children rows for the row that just changed (inserted or deleted). While doing this if there is not index on BOOK.AUTHORID, the database will have to scan the whole table which is slow. Hence when creating a Foreign Key constraint, remember to create a corresponding INDEX on the table, so that the performance does not degrade, or when observing slow performance on a database after you put in Foreign Key constraints. Make sure to look for Indexes on the columns that are constrained.

June 6, 2008

Version Control your work..

So we version control/source control everything on our project.. code/data/artifacts/diagrams etc. yesterday I said why not extend it to my writings to everything I have. So I started this long journey of refactoring my folder layout and making a nice folder structure to hold all the things I have written about have other artifacts in the process of writing and moved them all to subversion, now all my example code and writings are all under version control that gets backed up everyday.... feels liberating

November 9, 2006

Database Testing revisited

Some time ago I wrote about what it means to do database testing.. more I think about this and having had some strange situations recently I want to add more to the list of things we should be testing.

Persistence Layer. We should persist the objects to the database using the applications persistence layer and retrieve the objects using the same mechanism and test that we get the same object back. If we have a lot of business logic in out persistence layer we may also want to retrieve the object using Direct SQL and test that the correct values got persisted.

Database Structure. The application assumes a certain database structure and objects to be around for it to function properly and we want to make sure that all the database objects we depend on are there and are defined as we expect them to be, for example if we need a Auto Incrementing column on a table, then we should test that the database has such a column using the database metadata. In other situations you are expecting the Customer.Name column to be 128 characters wide and the database does not have the Customer.Name column at 128 characters, these kinds of errors can be found out using the database metadata.

Database Code Stored procedures need to be tested, using unit testing frameworks like utPlSql or whatever is equivalent to your situation

Database Objects Views are database objects that the application will usually depend on and the views have business logic in them. We need to test these views and you can use your application code to test them or you can use the database unit testing frame work. One I have found is easy to do is to test them using JUnit (or whatever that works for). Setup the correct data in your tables and use the view to retrieve the data and test of the view returns the correct rows for the data in the database. These kinds of test could get really complicated if you have no control over what data is present in the database. DBUnit can help here by providing re-loadable data.

May 23, 2006

Move your DBAs to the Project team locations

Many IT organizations I have seen have groups of specialists, typical are UNIX Group, DBA group etc.

When a project starts the developers on the project have to meet with all the Groups (I have more experience with the DBA group, so I write with the DBA group in mind) that they need to interact with and explain to the groups their design, the projects operational needs and other requirements, later when development starts they have to email these groups about all the setup that needs to be done and also the day to day changes that are needed. This way of working slows down the productivity of the team and the organization.

What I have found works best is when one of the member (this member could/should be rotated) of the DBA group is placed on the development team. When that happens, the team has a expert on hand they can ask him all kinds of questions, run different scenarios, design options by him, the member of the DBA group will also understand in depth what is happening on the team, what are their needs and how to tackle them, it also makes use of his link back to the group (since he is part of the DBA group, the DBA group is more likely to not ask any questions of the requirements/issues raised by him), since he knows all the members of his group, he can communicate better with the DBA group and make the tasks easier for the development team and also make it easier on the DBA group to execute the tasks expected of them. DBA's will also learn a lot from this, since now they are not sitting behind the email wall and guessing about the needs of the teams.

May 1, 2006

Automated Tablespace deployment

In development mode you don't want to worry about which table goes into what Tablespace in production as it complicates development environments. The production DBA's want to have their input and control over deciding what table goes into what Tablespace. To allow for this I used a mapping scheme as shown below.

Lets assume we have 3 tables in our system Customer, CustomerOrder, OrderStatus. Where we are expecting Customer table to have large numbers of rows and CustomerOrder to have significanly large number of rows while OrderStatus would have few rows and not change as much. In development environments all these tables and their indexes will be put under the same tablespace. In production like environments we want to put them into seperate tablespaces.

So we had 3 types of tablespaces Small, Medium and Large to hold tables of Small, Medium and Large Sizes. We also had 3 tablespaces SmallIndex, MediumIndex and LargeIndex to hold the tables Indexes. The production DBA provided a MAP of tables and tablespace names that the table needed to reside in, when we created the tables and their indexes in production like environements we created all the objects in the same tablespace and later on a pl/sql script moved all the tables into their respective tablespaces using the MAP provided by the production DBA's.

Now lets look at the code. The tablespace MAP looks as below, where we map a table to the tablespace where the table needs to reside. By naming convention we can also move all the indexes of the table. For example the indexes of OrderStatus table will be created in tablespace named as SmallIndex

Table_NameTablespace_Name
ORDERSTATUSSmall
CUSTOMERMedium
CUSTOMERORDERLarge

The pl/sql code to check if all the tables are mapped, if all tables are not mapped then we get a exception.

DECLARE
non_mapped_tables number;
BEGIN
 SELECT count(*) INTO non_mapped_tables FROM user_tables 
 WHERE table_name
    NOT IN (SELECT table_name FROM tablespace_map)
    AND table_name != 'TABLESPACE_MAP';
  IF non_mapped_tables != 0 THEN
    THROW_ERROR(non_mapped_tables,' All table/s have not been mapped to TABLESPACES');
  END IF;
END;
/

Now we are ready to move the tables and their indexes to their respective tablespaces.

DECLARE
CURSOR tabs IS
  SELECT 'ALTER TABLE ' || segment_name || ' MOVE tablespace ' stmt,m.tablespace_name
  FROM user_segments u,tablespace_map m
  WHERE
    u.segment_type = 'TABLE'
    AND u.segment_name = m.table_name
    ORDER BY u.segment_name;
CURSOR idx IS
  SELECT 'ALTER INDEX '|| segment_name || ' REBUILD tablespace ' stmt,m.tablespace_name
  FROM user_segments u,user_indexes i,tablespace_map m
  WHERE
  u.segment_type = 'INDEX'
  AND u.segment_name = i.index_name
  AND i.table_name=m.table_name
  ORDER BY u.segment_name;
BEGIN
--Move all the tables
  FOR tabsrec IN tabs LOOP
    EXECUTE IMMEDIATE tabsrec.stmt||tabsrec.tablespace_name;
  END LOOP;
--Move all the indexes
  FOR idxrec IN idx LOOP
    EXECUTE IMMEDIATE idxrec.stmt||idxrec.tablespace_name;
  END LOOP;
END;
/

Deployments done this way will sheild the development team worrying about production database specifics and allows the production DBA's to fine tune the production deployment.

February 8, 2006

To allow NULLs or NOT

In any greenfield application (existing production application is a topic for
another post). When you design table(s) lets say you have Item and Manufacturer
table as shown below

CREATE TABLE Item
(ItemID NUMBER NOT NULL,
ManufacturerID NUMBER,
Name VARCHAR2(128),
Rate NUMBER,
CONSTRAINT PK_ITEM
PRIMARY KEY (ItemID)
);

CREATE TABLE Manufacturer
(ManufacturerID NUMBER NOT NULL,
Name VARCAHR2(128),
CONSTRAINT PK_MANUFACTURER
PRIMARY KEY (ManufacturerID)
);

Databases I have seen tend to not constrain the data in the database. For example make the Item.ManufacturerID non-nullable and make it a foreign key to the Manufacturer
table. Similarly Manufacturer.Name and Item.Rate as non-nullable columns.

For now let's talk about NOT NULL constraints. Many say they don't make columns
non-nullable, because they don't know the requirements at design time, or they don't want their tests to create elaborate sets of data, or that the application has the constraint then why enforce the constraint on the database?

When this design goes into production, bad data gets into the database over time. Since databases are used as integration points, users or other applications put data in to the database without using the application, you don't have data entry screens for many setup kind of tables so data has to be put into the database without using the application. When this happens your data in no longer protected by the constraints on the application side and you are forced to counter the effects of bad data with more application changes like making joins with child tables as outer joins.

SELECT item.Name,Manufacturer.Name FROM Item, Manufacturer
WHERE Item.ManufacturerID = Manufacturer.ManufacturerID(+)

Having null checks in the application code like

if (item.getManufacturer() != null) {
return item.getManufacturer().getName();
}

You also have the very difficult task of fixing data later on. Like what Item should have what ManufacturerID?

So why not make the columns non-nullable to begin with. If you don't have the requirements when desiging the tables, then you should make them non-nullable when the requirement becomes clear later on or when you start enforcing constraints on the application side. Unit tests can be changed or made to work with not-null constraints on the database.

Making good decisions about Data Quality in your applications database design will better serve you in the long term maintainability of the application.