Recently in Agile DBA Category

Setup and Teardown of database during testing

When doing Performance Testing or running Unit/Functional tests on a database, there is a need to periodically get the database to a known state, so that the tests behave in a predictable way and to get rid of all the data created by the tests. Some of the ways to get a clean database are.

Using Scripts: Recreate the database using scripts, the same scripts that are used in development environment.

Using DB Backup: Especially when the database (and the data set) is large (using the scripts approach above will be slow) is to make a backup of the database in its pristine state and then run the tests, once the tests are done running, restore the database with the backup that was done before the tests corrupted the data.

Using Virtual Machine: The DB backup approach can be improved by using Virtual Machine (VM). Setup a VM and run the database server inside the VM, get the database and data so that the tests can run. Now make a image of the VM and run the tests, when the tests are done all that needs to be done is to restore the image of the VM.

Exprerience using DBDeploy on my project

We have been using DBDeploy on my project for more than 6 months now and wanted to show how things are going. First lets talk about set up, we are using dbdeploy in our Java development environment with ANT as our build scripting tool, against a Oracle 10g database.

Define the ANT task first

<taskdef name="dbdeploy" classname="net.sf.dbdeploy.AntTarget" classpath="lib/dbdeploy.jar"/>;

Now we create the main dbinitialize task a ANT task to create you database schema, using the upgrade generated by the dbdeploy file shown below. The thing to note is that dbdeploy generates the upgrade file but does not run it against your database, so we have to make sure we call the generated upgrade file via a sql ANT task.

<target name="dbinit" depends="init,dbclean">
    <echo message="Working UserName: ${db.user}"/>
    <mkdir dir="${migrationfolder}"/>
    <dbdeploy driver="${driver.name}" url="${db.url}"
              userid="${db.user}" password="${db.password}" deltaset="couger"
              dir="db/migration" outputfile="${migrationfolder}/upgrade.sql"
              dbms="ora" undoOutputfile="${migrationfolder}/undo.sql"/>
    <!--Now run the generate upgrade file -->
    <sql password="${db.password}" userid="${db.user}" url="${db.url}" driver="${driver.name}"
         classpath="${driver.classpath}" onerror="abort">
        <fileset includes="upgrade.sql" dir="${migrationfolder}"/>
    </sql>
</target>

So lets say we want to write the first migration (migration or delta is the same in this context) we will create a new file in the db/migration folder named as 001_CreateCustomerTable.sql the 001 is just a number to sequence the migrations, dbdeploy only cares that the numbers increment and CreateCustomerTable is used to give it a meaningful name, you can name the migration as 1.sql but thats not meaningful is it? neither does the name really say what its doing. When we are done writing the
migration and confirm that it works locally, we check in the file (we used subversion). When <a href="http://cruisecontrol.sourceforge.net/">CruiseControl</a> build was done we also published all the
migrations from db/migration folder on CruiseControl artifacts page using the onsuccess event

<publishers>
    <onsuccess>
            ...                
            <artifactspublisher dest="artifacts/${project.name}" dir="projects/${project.name}/db/migration"/>
    </onsuccess>
</publishers>

Publishing the migrations allowed us to know what all migrations are needed for this particular build. we have deployed to production twice already and have found this process to be smooth, doing migrations has allowed us to test our migrations hundreds of times and also test them against a copy of the production database before hand so that we can tune the migrations for performance if needed. Delivering the migrations to the client is also easy since its pure SQL that the client DBA's can look at and be comfortable about the migration/upgrade of their database.

Parsing mapping files for usage information

Currently working on a legacy application, thats been in production for a long time now. I wanted to find out what are the Tables and Columns being used by the application. Since we could see that some table columns where not being used.
We are using a Object Relational mapping framework on the project, so we decided to write some code that would parse all the mapping files and gives us a list of table names and columns. We used this list to create rows in a table with two columns tablename and columnname. Once the table had this data, we just ran one more SELECT against the metadata of the database and our table which pretty much gave us a list of Table and Columns that we are not using

The SQL we used to get the Tables and Columns not used, from Oracles metadata was

SELECT table_name,column_name FROM user_tab_columns
MINUS
SELECT usedTableName, usedColumnName FROM usedTableColumns

I thought that was a pretty easy way to find out all the tables, columns used by the application and not to have to do manual analysis

Long Running Data Migrations during Database Refactorings.

When you are refactoring large databases, you will have certain tables that have millions of rows, so lets say we are doing the Move Column refactoring, moving the TaxAmount column from Charge table which has millions of rows to TaxCharge table. Create the TaxAmount column in the TaxCharge table. Then have to move the data from the TaxAmount column in the Charge table to the TaxAmount column you created in the TaxCharge table.

One way to move the data would be to write a single update statement as shown below.

UPDATE taxCharge set taxAmount=
(SELECT taxAmount FROM charge
WHERE charge.chargeid = taxCharge.chargeID);

Now this update will run for some time and will be run as one transaction, the Update operation will need a lot of UNDO space (UNDO space in oracle other databases may need other types of space, but the general idea is some space is needed by the database server ), so there is a probability of the update failing if the database server runs out of UNDO space, another side effect of moving large amounts of data using a single update (transaction) is that all the updated rows in the TaxCharge table will be locked till the whole update transaction is done and a commit is issued.

Getting around this problem is to use a programmatic update, like the PL/SQL code shown below. This code can be implemented in the database procedural code like TSQL or any other language appropriate for the task.

DECLARE
CURSOR allCharges IS
    SELECT chargeId, taxAmount FROM charge;
numberOfRowsToCommit NUMBER :=1000;
BEGIN
FOR thisCharge IN allCharges LOOP
    UPDATE taxCharge SET taxamount=thisCharge.taxAmount 
        WHERE chargeid = thisCharge.chargeid;
    IF allCharges%rowcount MOD numberOfRowsToCommit = 0 THEN
        COMMIT;
    END IF;
END LOOP;
END;
/

The above PL/SQL code basically loops through all the rows in the Charge table and updates the TaxCharge table with the taxAmount and commits after every 1000 rows (as defined by numberOfRowsToCommit). This approach allows us to work with a smaller UNDO size, releases (not lock) rows once they are committed after every 1000 rows. Off course there are many variations of this technique and many ways to implement this particular update. The point I'm trying to get to is that, depending on the situation, I will choose different ways to update the data and maybe not always use UPDATE statements.

Moscow

Last week I was at SD Best Practices in Moscow, doing a presentation on "Refactoring Databases: Evolutionary Database Design". Moscow seems like a interesting place, loads of huge buildings, squares, fountains and roads. Things some how feel rundown, feels like a player trying to regain his former ability or glory.

Opening Keynote by Jim McCarthy about how teams should operate was interesting, he proposed 11 principals or protocols as he calls them, to be followed by members in a team so that the team becomes more productive, many of these protocols are about avoiding waste and promoting clear communication channels.

While doing my presentation I had live translators which I experienced for the very first time, I also had a hard time trying to understand questions from the audience, since there was no reverse translator who would translate from Russian to English. The audience would try and ask the questions in English which would invariably get confusing.Anyway I think I had a good time meeting all the folks out in Moscow and was pleasantly surprised to find out that ThoughtWorks has a following in Russia.

Database Testing

What does it mean to test your Database? usually when someone mentions database testing, what is that they want to test. The application code that interacts with the database, or the sql code the resides in the database like stored procedures and triggers etc. I see all these aspects to database testing as important.

Testing the applications persistence mechanism We should test that the application persists what its supposed to save and retirive the data using SQL and see if the database contains the same information that is being saved, this kind of testing makes sense when the application has complex persistence layer. This type of testing can be achived using unit tests, functional tests etc.

Testing the Database Code The database code like Views, Stored procedures etc, contains business logic that you want tested, since a lot of your application code will depend on it since database code is the API that the database provides for the application to use. You can use your application unit tests(or other test frameworks that your are using) to test the database API. You can achive this using unit tests that actually hit the database, these tests will be slow, but having some tests is better than nothing. You can also use unit testing frameworks for the database code like Ounit, PL/SQL Unit etc (more about how to use these in later posts).

Refactoring Data

Many a times Refactoring is talked about in the context of code, recently I finished working with Scott Ambler on Database Refactoring

Lately I have been working on changing data in an production database, and have been wondering how do I define it, Data Refactoring? what are the patterns of Data Refactoring. First let me talk about what I mean by Data Refactoring.

When a given application goes into production, and starts life as a live application we find bugs with the application, these bugs create a weird data in the database, also with the way people change data through the app and some times through the database (yikes) and these data changes do lead to bad data. How do you go about fixing these data problems, are there patterns to these fixes.

Well as I work through this I think I will have some thing to report on or maybe its such a situational problem that maybe there are no patterns to this.

All Content Copyright © 2012 Pramod Sadalage. All Rights Reserved.