Recently in Best Practices Category

With so much pain, why are stored procedures used so much

I keep encountering situations where all the business logic for the applications is in stored procedures and the application layer is just calling the stored procedures to get the work done and return the data. There are many problems with this approach some of them are.
  • Writing stored procedure code is fraught with danger as there are no modern IDE's that support refactoring, provide code smells like "variable not used", "variable out of scope".
  • Finding usages of a given stored procedure or function usually means doing a text search of the whole code base for the name of the function or stored procedure, so refactoring to change name is painful, which means names that do not make any sense are propagated, causing pain and loss of developer productivity
  • When coding of stored procedures is done, you need a database to compile the code, this usually means a large database install on your desktop or laptop the other option being to connect to the central database server, again this leads to developers having to carry a lot of dependent systems just to compile their code, this can to solved by database vendors providing a way to compile the code outside of the database.
  • Code complexity tools, PMD metrics, Checkstyle etc type of tools are very rare to find for stored procedures, thus making the visualization of metrics around the stored procedure code almost impossible or very hard
  • Unit testing stored procedures using *Unit testing frameworks out there like pl/sql unit, ounit, tsql unit is hard, since these frameworks need to be run inside the database and integrating them with Continuous Integration further exasperates the problems
  • Order or creation of stored procedures becomes important as you start creating lots of stored procedures and they become interdependent. While creating them in a brand new database, there are false notifications thrown around about missing stored procedures, usually to get around this problem, I have seen a master list of ordered stored procedures for creation maintained by the team or just recompile all stored procedures once they are created "ALTER RECOMPILE" was built for this. Both of these solutions have their own overhead.
  • While running CPU intensive stored procedures, the database engine is the only machine (like JVM) available for the code to run, so if you want to start more processes so that we can handle more requests, its not possible without a database engine. So the only solution left is to get a bigger box (Vertical Scaling)
There certainly are lots of other problems associated with using stored procedures, which I will not get into.

Testing in conversion projects

When working on projects involving Conversion of data or Migration/Moving of data from a legacy database. The testing effort is enormous and testing takes a lot of time, some test automation can help this effort.

Since data is moved/changed from a source database to destination database, we can write sql which should provide results for the types of tests you want to perform, for example: write a sql to give us number of customers, write a sql to give us account balance for a specific account.

These sqls can be run on your source database as well as your destination database and the results can be compared programmatically, providing us an easy way to compare the state of the database before and after conversion/migration. This testing can be run through a CI engine to make it a regression test suite.

Here is an example implementation using ruby,

We have two databases SOURCE and DESTINATION and two sql files names source.sql and destination.sql. The ruby program picks up sql from these two files and runs them against their database i.e. sql from source.sql is run against the SOURCE database and sql from destination.sql is run against DESTINATION database. The results of both of those sqls is compared and an failure is raised when the results do not match.

 results
  statement = get_sql_statement_to_execute
    begin
      source_statement = statement[0]
      destination_statement = statement[1]
      source_rows = exec_sql_in_source_return_rows(source_statement)
      destination_rows = exec_sql_in_destination_return_rows(destination_statement)
      result = compare_rows(source_rows, destination_rows, destination_statement, source_statement)
      results << result
    rescue
      Log.log("Could not process: "+statement)
    end
    if (results.size > 0)
      Log.log("Results do not match in source and destination")
    end

The sample ruby code above shows how the solution can be implemented, thus enabling automation of database conversion/migration testing

Explicitly rollback when you encounter a deadlock.

Dead lock is caused in the database when you have resources (connections) waiting for other connections to release locks on the rows that are needed by the session, resulting in all session being blocked. Oracle automatically detects deadlocks are resolves the deadlock by rolling back the statement in the transaction that detected the deadlock. Thing to remember is that last statement is rolled back and not the whole transaction, which means that if you had other modifications, those rows are still locked and the application should make sure that it does a explicit rollback on the connection.

For example.
Lets assume there are two tables Parent(ParentID) and Child(ChildID)

SESSION_A >create table parent (parentId number(10));
Table created.
SESSION_A >create table child (childId number(10));
Table created.
SESSION_A >insert into parent values (100);
1 row created.
SESSION_A >insert into child values (200);
1 row created.
SESSION_A >commit;
Commit complete.
SESSION_A >select * from parent;
  PARENTID
----------
       100

SESSION_A >select * from child;
CHILDID
----------
200
SESSION_A >

Now lets create a situation where a deadlock happens. There are two sessions connected to the same database and same user, SESSION_A and SESSION_B are the two sessions in question.

SESSION_A >update parent set parentid = 1000 where parentid=100;
1 row updated.
SESSION_B >update child set childid = 2000 where childid = 200;
1 row updated.
SESSION_B >update parent set parentid = 2001 where parentid=100;
--Waiting For Lock on Row in Parent Table, held by SESSION_A
SESSION_A >update child set childid = 1001 where childid = 200;
update child set childid = 1001 where childid = 200
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
--SESSION_A requesting lock on row, held by SESSION_B causing deadlock.
SESSION_A >

After you get the ORA-00060 error the statement update child set childid = 1001 where childid = 200; is rolled back.. but SESSION_B is still waiting for the lock on the Parent table to be released.

So when your application get the ORA-00060 or any deadlock exception in any other database, explicitly rollback your transaction (not just the current statement) so that all the changes made in the transaction and all the locks held by the transaction are released.

Presentation on Database Refactoring

My Presentation on Database Refactoring at QCon was recorded and is live now on infoQ here

Considerations when using stored procedures for doing CRUD.

Some environments like to have access to the database tables routed via stored procedures.
Instead of using Create/Read/Update/Delete (CRUD) with DML, stored procedures are invoked with the parameters to perform the required operation. I'm not arguing about the benefits/pitfalls of this approach, if you have to do stored procedures, here are some things to look at.

1. Make the stored procedure handle one object/table only and not multiple objects or tables.
2. Do not commit open transactions inside the stored procedures.
3. Do not do business logic in stored procedures.
4. If they are straight CRUD stored procedures, see if you can you generate the stored procedure code using some metadata?
5. Make sure creation and execution of the stored procedures is part of your Continuous Integration build and developer build.
6. Make sure stored procedures (or the metadata used to generate them) is under Version Control, have seen many problems when the stored procedure version does not match application code version
7. Develop against the production stack database.
8. Make sure exceptions thrown by the database are passed back to the application.

Using Explicit Order By in your SQL Statements

Recently when our test databases where upgraded new version of Oracle, we started noticing that the order in which some drop down lists were being displayed was not correct. It turns out that the SELECT statement we had, did not have a ORDER BY clause and the data was being returned in the ORDER of the creation of rows (in the order of ROWID) when the database was upgraded these ROWID's got changed and hence the ORDER of the data being shown in the drop down lists.

Lesson learnt: have a EXPLICIT ORDER BY in every SQL that provides data to be shown on the screen. In other words do not rely on the order of the result set provided by the current version of the database server. if you prefer a certain order make that choice explicit in the SQL SELECT statement.

Create a Index for every Foreign Key constraint created

When creating a Foreign Key constraint on the database as shown below

ALTER TABLE BOOK ADD 
    (CONSTRAINT FK_BOOK_AUTHOR FOREIGN KEY (AUTHORID) 
     REFERENCES AUTHOR)
/

In the above example we are telling the database to check if the BOOK.AUTHORID is a valid value in the Author.AuthorID. When the Author table is being changed, the database does data verification on the BOOK table using SELECT against the BOOK table for the AUTHORID some thing like this

SELECT count(*) FROM BOOK WHERE AUTHORID = nnnn

Basically the database server is trying to check if it has children rows for the row that just changed (inserted or deleted). While doing this if there is not index on BOOK.AUTHORID, the database will have to scan the whole table which is slow. Hence when creating a Foreign Key constraint, remember to create a corresponding INDEX on the table, so that the performance does not degrade, or when observing slow performance on a database after you put in Foreign Key constraints. Make sure to look for Indexes on the columns that are constrained.

Version Control your work..

So we version control/source control everything on our project.. code/data/artifacts/diagrams etc. yesterday I said why not extend it to my writings to everything I have. So I started this long journey of refactoring my folder layout and making a nice folder structure to hold all the things I have written about have other artifacts in the process of writing and moved them all to subversion, now all my example code and writings are all under version control that gets backed up everyday.... feels liberating

Database Testing revisited

Some time ago I wrote about what it means to do database testing.. more I think about this and having had some strange situations recently I want to add more to the list of things we should be testing.

Persistence Layer. We should persist the objects to the database using the applications persistence layer and retrieve the objects using the same mechanism and test that we get the same object back. If we have a lot of business logic in out persistence layer we may also want to retrieve the object using Direct SQL and test that the correct values got persisted.

Database Structure. The application assumes a certain database structure and objects to be around for it to function properly and we want to make sure that all the database objects we depend on are there and are defined as we expect them to be, for example if we need a Auto Incrementing column on a table, then we should test that the database has such a column using the database metadata. In other situations you are expecting the Customer.Name column to be 128 characters wide and the database does not have the Customer.Name column at 128 characters, these kinds of errors can be found out using the database metadata.

Database Code Stored procedures need to be tested, using unit testing frameworks like utPlSql or whatever is equivalent to your situation

Database Objects Views are database objects that the application will usually depend on and the views have business logic in them. We need to test these views and you can use your application code to test them or you can use the database unit testing frame work. One I have found is easy to do is to test them using JUnit (or whatever that works for). Setup the correct data in your tables and use the view to retrieve the data and test of the view returns the correct rows for the data in the database. These kinds of test could get really complicated if you have no control over what data is present in the database. DBUnit can help here by providing re-loadable data.

Move your DBAs to the Project team locations

Many IT organizations I have seen have groups of specialists, typical are UNIX Group, DBA group etc.

When a project starts the developers on the project have to meet with all the Groups (I have more experience with the DBA group, so I write with the DBA group in mind) that they need to interact with and explain to the groups their design, the projects operational needs and other requirements, later when development starts they have to email these groups about all the setup that needs to be done and also the day to day changes that are needed. This way of working slows down the productivity of the team and the organization.

What I have found works best is when one of the member (this member could/should be rotated) of the DBA group is placed on the development team. When that happens, the team has a expert on hand they can ask him all kinds of questions, run different scenarios, design options by him, the member of the DBA group will also understand in depth what is happening on the team, what are their needs and how to tackle them, it also makes use of his link back to the group (since he is part of the DBA group, the DBA group is more likely to not ask any questions of the requirements/issues raised by him), since he knows all the members of his group, he can communicate better with the DBA group and make the tasks easier for the development team and also make it easier on the DBA group to execute the tasks expected of them. DBA's will also learn a lot from this, since now they are not sitting behind the email wall and guessing about the needs of the teams.

All Content Copyright © 2012 Pramod Sadalage. All Rights Reserved.