<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Agile DBA</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/" />
    <link rel="self" type="application/atom+xml" href="http://www.sadalage.com/atom.xml" />
    <id>tag:www.sadalage.com,2010-04-13://2</id>
    <updated>2012-04-22T17:40:31Z</updated>
    <subtitle>My thoughts on evolutionary design in regards to databases. Database administration. Best Practices, NoSQL data stores, Database utilities and other things software</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.3-en</generator>

<entry>
    <title>MSSQL JDBC Driver behavior</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2012/04/mssql-jdbc-driver-behavior.html" />
    <id>tag:www.sadalage.com,2012://2.71</id>

    <published>2012-04-22T16:57:27Z</published>
    <updated>2012-04-22T17:40:31Z</updated>

    <summary>My latest project involves talking to MS-SQL Server using the JDBC driver and Java. While doing this we set up the database connection and had a simple SQL query to get the first_name and last_name for a unique user_id from the application_user...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="JDBC" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="development" label="development" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="jdbc" label="jdbc" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>My latest project involves talking to MS-SQL Server using the JDBC driver and Java. While doing this we set up the database connection and had a simple SQL query to get the <strong>first_name</strong> and <strong>last_name</strong> for a unique <strong>user_id</strong> from the <strong>application_user</strong> table in the database.</p>

<pre class="codeexample">
SELECT first_name,last_name 
FROM application_user 
WHERE user_id = ?
</pre>

<p>Given the above SQL, we did not think too much about performance, as the <strong>user_id</strong> column was indexed. The Java code below was used to run the SQL.</p>

<pre class="codeexample">
Connection conn = db.conn(DATABASE_URL);
try {
    PreparedStatement stmt = prepare(conn,
            "SELECT first_name, last_name " +
            "FROM application_user " +
            "WHERE user_id = ?");
    stmt.setString(1, username);
    ResultSet resultSet = stmt.executeQuery();
    return extractResults(resultSet);
} catch (SQLException e) {
    e.printStackTrace();
}
return null;
</pre>

<p>When writing integration tests we noticed that the SQL was taking about 6 seconds to execute. The same SQL would execute in under 100 milliseconds in the MSSQL query analyzer. The friendly DBAs on our team pointed out that the SQL was doing a data type conversion: the user_id column was of type VARCHAR, but the JDBC driver sent the parameter as NVARCHAR. Because of this mismatch the index was not being used, and the SQL took more than 6 seconds to execute. After researching this further, we decided to cast the parameter to VARCHAR as shown below.</p>

<pre class="codeexample">
Connection conn = db.conn(DATABASE_URL);
try {
    PreparedStatement stmt = prepare(conn,
            "SELECT first_name, last_name " +
            "FROM application_user " +
            "WHERE user_id = cast(? AS VARCHAR)");
    stmt.setString(1, username);
    ResultSet resultSet = stmt.executeQuery();
    return extractResults(resultSet);
} catch (SQLException e) {
    e.printStackTrace();
}
return null;
</pre>

<p>The above code executed in under 100 milliseconds and showed us that the data type being sent did not match the data type in the database. We later found out that the MS-SQL JDBC driver does this to properly deal with Unicode characters. This behavior can be turned off using the <strong>sendStringParametersAsUnicode</strong> property on the database connection. Once this property is set to false on the connection, none of the SQL we issue needs the cast.</p>

<pre class="codeexample">
Connection conn = db.conn(DATABASE_URL
                + ";sendStringParametersAsUnicode=false");
try {
    PreparedStatement stmt = prepare(conn,
            "SELECT first_name, last_name " +
            "FROM application_user " +
            "WHERE user_id = ? ");
    stmt.setString(1, username);
    ResultSet resultSet = stmt.executeQuery();
    return extractResults(resultSet);
} catch (SQLException e) {
    e.printStackTrace();
}
return null;
</pre>

<p>Of course, this only works if there is no Unicode data in your database. If there is any Unicode data in the database, we will have to revert to casting in individual SQL statements.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Back to blogging</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2012/04/back-to-blogging.html" />
    <id>tag:www.sadalage.com,2012://2.70</id>

    <published>2012-04-22T16:46:06Z</published>
    <updated>2012-04-22T16:52:49Z</updated>

    <summary>There has been a long pause in my blogging activity. I was trying to finish off my latest writing engagement in regards to NoSQL. Working with Martin Fowler on NoSQL Distilled was really fun and will provide a concise text...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Broadcast" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="books" label="books" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="nosql" label="nosql" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>There has been a long pause in my blogging activity. I was trying to finish off my latest writing engagement, on NoSQL. Working with <a href="http://martinfowler.com">Martin Fowler</a> on NoSQL Distilled was really fun. The book aims to be a concise text that gives everyone an easy way to understand the rise of the NoSQL movement and the kinds of trade-offs that need to be made while working with NoSQL.<br />
The book should soon be available in print and e-book formats. Martin has written more about it <a href="http://martinfowler.com/bliki/NosqlDistilled.html">here</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>With so much pain, why are stored procedures used so much</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2011/01/with-so-much-pain-why-do-we-st.html" />
    <id>tag:www.sadalage.com,2011://2.69</id>

    <published>2011-01-20T04:18:42Z</published>
    <updated>2011-01-20T04:39:16Z</updated>

    <summary>I keep encountering situations where all the business logic for the applications is in stored procedures and the application layer is just calling the stored procedures to get the work done and return the data. There are many problems with...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Best Practices" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Improving Design" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="agiledba" label="agile dba" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="storedprocedures" label="stored procedures" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="technicaldebt" label="technical debt" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[I keep encountering situations where all the business logic for the
applications is in stored procedures and the application layer is just calling
the stored procedures to get the work done and return the data. There are many
problems with this approach; some of them are listed below.


<ul>
	<li>Writing stored procedure code is fraught with danger, as there are no modern IDEs that support refactoring or flag code smells such as "variable not used" or "variable out of scope".</li>
	<li>Finding usages of a given stored procedure or function usually means doing a text search of the whole code base for its name. Refactoring to change a name is therefore painful, which means names that do not make any sense are propagated, causing pain and loss of developer productivity.</li>
	<li>You need a database to compile stored procedure code. This usually means a large database install on your desktop or laptop, the other option being to connect to the central database server. Either way, developers have to carry a lot of dependent systems just to compile their code. This could be solved by database vendors providing a way to compile the code outside of the database.</li>
	<li>Code complexity tools and tools of the PMD or Checkstyle type are very rare for stored procedures, which makes visualizing metrics around stored procedure code very hard, if not impossible.</li>
	<li>Unit testing stored procedures using the *Unit testing frameworks out there, like pl/sql unit, ounit and tsql unit, is hard, since these frameworks need to be run inside the database. Integrating them with Continuous Integration further exacerbates the problem.</li>
	<li>Order of creation becomes important as you start creating lots of stored procedures and they become interdependent. While creating them in a brand new database, false notifications are thrown about missing stored procedures. To get around this problem, I have seen teams either maintain a master list of stored procedures ordered for creation, or just recompile all stored procedures once they are created ("ALTER &lt;STORED PROCEDURE NAME&gt; RECOMPILE" was built for this). Both of these solutions have their own overhead.</li>
	<li>While running CPU-intensive stored procedures, the database engine is the only runtime (like a JVM) available for the code to run in. If you want to start more processes to handle more requests, it's not possible without another database engine, so the only solution left is to get a bigger box (vertical scaling).</li>
</ul>
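The creation-order problem above is essentially a dependency sort. As a rough sketch (not tied to any database or to a real deployment tool), Ruby's standard TSort module can derive a creation order from a map of which procedure calls which. The procedure names, the ProcedureGraph class and the creation_order helper are made up for illustration.

<pre class="codeexample">
require 'tsort'

# Hypothetical dependency map: procedure name => procedures it calls.
class ProcedureGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort asks for every node in the graph.
  def tsort_each_node
    @dependencies.each_key { |name| yield name }
  end

  # TSort asks for the procedures a given procedure depends on.
  def tsort_each_child(name)
    @dependencies.fetch(name, []).each { |dependency| yield dependency }
  end
end

# Returns procedure names with every dependency ahead of its callers.
def creation_order(dependencies)
  ProcedureGraph.new(dependencies).tsort
end

dependencies = {
  'calculate_invoice' => ['get_customer', 'get_tax_rate'],
  'get_tax_rate'      => ['get_customer'],
  'get_customer'      => []
}
puts creation_order(dependencies)
</pre>

A circular dependency makes tsort raise TSort::Cyclic, which is at least a clearer signal than a list of false "missing procedure" notifications.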
There certainly are lots of other problems associated with using stored procedures, which I will not get into.]]>
        
    </content>
</entry>

<entry>
    <title>Replica sets in MongoDB</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/10/replica-sets-in-mongodb.html" />
    <id>tag:www.sadalage.com,2010://2.68</id>

    <published>2010-10-31T20:29:30Z</published>
    <updated>2010-11-01T00:13:47Z</updated>

    <summary>Replica sets is a feature of MongoDB for Automatic Failover, in this setup there is a primary server and the rest are secondary servers. If the primary server goes down, the rest of the secondary servers choose a new primary...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="MongoDB" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="NoSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="failover" label="failover" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mongodb" label="mongodb" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="replicasets" label="replica sets" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>Replica sets are a MongoDB feature for automatic failover. In this setup there is a primary server and the rest are secondary servers. If the primary server goes down, the secondary servers choose a new primary via an election process. Each server can also be assigned a number of votes, so that you can influence the choice of the next primary based on data-center location, machine properties, etc. You can also start mongod processes that act only as election tie-breakers. These are known as arbiters; they never hold data, but just act as agents that break the tie.</p>

<p>All operations are directed at the primary server. The primary writes the operations to its operation log (also known as the oplog), and the secondary servers get their updates from the primary. Because data is written to the primary first and replicated to the secondaries later, if the primary goes down after a write happens but before it is replicated, you will lose the data that was written to the primary but never reached the secondaries. You can get around this by specifying how many servers should have the data before the write is considered good.</p>

<pre class="codeexample">
db.runCommand( { getlasterror : 1 , w : 3 } )
</pre>
<p>In the above command, you are saying that the write is considered good only if it has been propagated to at least 3 servers. Of course, doing this for every write is going to be very expensive, so you should batch all your writes for a user action and then issue getlasterror.</p>
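<p>To make the meaning of w concrete, here is a toy model in plain Ruby. It does not talk to MongoDB at all; the member names and the write_good? helper are invented for the illustration. The only rule it encodes is that a write counts as good once at least w members (the primary included) have it.</p>

<pre class="codeexample">
# Toy model only, no MongoDB involved. Each entry records whether that
# member of the replica set (primary included) has the write yet.
def write_good?(member_has_write, w)
  member_has_write.values.count(true) >= w
end

members = {
  'node1 (primary)' => true,
  'node2'           => true,
  'node3'           => false   # replication has not caught up yet
}

puts write_good?(members, 2)   # two members have the write
puts write_good?(members, 3)   # node3 is still behind
</pre>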

<p>This is how you start the mongod servers in a replica set. They can run on any machine and any port, as long as they can talk to each other over the network and all of them have the same "--replSet" parameter; in the example below it's "prod".</p>

<pre class="codeexample">
mongod --replSet prod --port 27017 --dbpath /data/node1 
mongod --replSet prod --port 27027 --dbpath /data/node2 
mongod --replSet prod --port 27037 --dbpath /data/node3 
</pre>

<p>Once the three servers are up, you have to create a replica set configuration as shown below. If you use localhost as a server name, then all the members of the replica set have to be on localhost. If the mongod processes are on different machines, you should use distinct machine names and not localhost for any one of them. Once the replica set config is defined, you initiate the replica set using that configuration.</p>

<pre class="codeexample">
replica_config = {_id: 'prod', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27027'},
                          {_id: 2, host: 'localhost:27037'}]}
// Now initiate the replica set using replica_config
rs.initiate(replica_config);
</pre>
<p>When you are connecting to a replica set, you have to connect to at least one server that is alive. Using the Ruby driver you can connect to more than one server using the "multi" method. One thing you should be careful about: let's say you define all the servers in the replica set in your connection string, but one of the members is down. You will get connection failures. So the best thing to do is to give the members of the replica set that are up, and the driver will discover the other servers as they come online or go offline. Here is a sample Ruby program that finds a document in a loop.</p>

<pre class="codeexample">
#!/usr/bin/env ruby
require 'mongo'
begin
  @connection = Mongo::Connection.multi([
                                       ['localhost',27017],
                                       ['localhost',27027],
                                       ['localhost',27037]])
  @collection = @connection.db("sales").collection("products")
  product = { "name" => "Refactoring", 
              "code" => "023XX3",
              "type" => "book", 
              "in_stock" => 100}
  @collection.insert(product)
  100.times do
    sleep 0.5
    begin
      product = @collection.find_one "code" => "023XX3"
      puts "Found Book: " + product["name"]
    rescue StandardError => e
      # Connection errors show up here while a new primary is being elected
      puts e.message
      next
    end
  end
end
</pre>
<p>While the Ruby program is running, you can kill the current primary. You will see that the program gets connection exceptions while the replica set is electing the next primary. Once the next primary is picked, the program goes about its way, finding the same data from the newly elected primary. Here is a screencast of replica sets in action: <a href="http://sadalage.com/screencast/replicaset/">Replica Sets screencast</a>.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Schema less databases and its ramifications.</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/10/schema-less-databases-and-its.html" />
    <id>tag:www.sadalage.com,2010://2.67</id>

    <published>2010-10-12T14:13:31Z</published>
    <updated>2010-11-28T22:55:03Z</updated>

    <summary>In the No-SQL land schema-less is a powerful feature that is advertised a lot, schema-less basically means you don&apos;t have to worry about column names and table names in a traditional sense, if you want to change the column...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="MongoDB" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="NoSQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="mongodb" label="mongodb" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="nosql" label="NoSQL" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="refactoring" label="refactoring" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="schemaless" label="Schema less" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>In the NoSQL land, schema-less is a powerful feature that is advertised a lot. Schema-less basically means you don't have to worry about column names and table names in the traditional sense: if you want to change a column name, you just start saving the data using the new column name. Let's say you have a document database like MongoDB and you have a JSON document as shown below.</p>

<pre class="codeexample">
{  "_id":"4bc9157e201f254d204226bf",
   "FIRST_NAME":"JOHN",
   "MIDDLE_NAME":"D",
   "LAST_NAME":"DOE",
   "CREATED":"2010-10-12"
}
</pre>

<p>You have some corresponding code to read the documents from the database, and let's say you have lots of data in the database, on the order of millions of documents. Suppose you want to change the names of some attributes or columns at this point, so that the new JSON would look like this:</p>

<pre class="codeexample">
{  "_id":"4bc9157e201f254d204226bf",
   "first_name":"JOHN",
   "middle_name":"D",
   "last_name":"DOE",
   "created":"2010-10-12"
}
</pre>

<p>You will either have to change every document in the database to match the new attribute names, or you have to make sure your code can handle both types of attribute names, like this:</p>

<pre class="codeexample">
   # Prefer the new attribute name, fall back to the old one
   first_name = doc["first_name"] || doc["FIRST_NAME"]
   middle_name = doc["middle_name"] || doc["MIDDLE_NAME"]
   last_name = doc["last_name"] || doc["LAST_NAME"]
</pre>
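
<p>The first option, changing every document, boils down to a key rename. A minimal sketch on plain Ruby hashes follows. No MongoDB driver is involved, and the RENAMES table and rename_attributes helper are made-up names; a real migration would also have to read and write the documents in batches.</p>

<pre class="codeexample">
# Hypothetical mapping from old attribute names to new ones.
RENAMES = {
  'FIRST_NAME'  => 'first_name',
  'MIDDLE_NAME' => 'middle_name',
  'LAST_NAME'   => 'last_name',
  'CREATED'     => 'created'
}.freeze

# Returns a copy of the document with old attribute names replaced.
# Attributes that are not in RENAMES (such as _id) keep their name.
def rename_attributes(doc)
  doc.transform_keys { |key| RENAMES.fetch(key, key) }
end

old_doc = {
  '_id'         => '4bc9157e201f254d204226bf',
  'FIRST_NAME'  => 'JOHN',
  'MIDDLE_NAME' => 'D',
  'LAST_NAME'   => 'DOE',
  'CREATED'     => '2010-10-12'
}
p rename_attributes(old_doc)
</pre>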

<p>This attribute name change also affects the indexes created in MongoDB. Since the attribute name change has not been applied across all the documents, an index created with</p>

<pre class="codeexample">
   db.people.ensureIndex({first_name:1})
</pre>

<p>will not cover documents where the attribute name is still FIRST_NAME, so you have to create another index for that attribute name as well:</p>

<pre class="codeexample">
   db.people.ensureIndex({FIRST_NAME:1})
</pre>

<p>As you can see, this gets really complicated if you do multiple refactorings over a period of time. So when you hear "schema-less", make sure you understand the ramifications of refactoring attribute names at will, and the effect on the code base and the database.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Effective use of data for better customer experience.</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/08/effective-use-of-data-for-bett.html" />
    <id>tag:www.sadalage.com,2010://2.66</id>

    <published>2010-08-31T20:15:05Z</published>
    <updated>2010-08-31T20:32:53Z</updated>

    <summary>For more than seven years I have been getting offers for credit cards from Airlines and Banks. One particular bank has been sending me these solicitations for more than seven years. That is 12 mailings per year, more than 72...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="BI" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="bi" label="BI" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="design" label="design" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>For more than seven years I have been getting offers for credit cards from airlines and banks. One particular bank has been sending me these solicitations for more than seven years. That is 12 mailings per year, more than 72 mailings so far. Remember, these are physical paper mailings, not the electronic kind. I don't like the junk, it hurts the environment, and worst of all I think it's not a good use of the data they have. How hard is it to design a system around the data they have?</p>

<p>Let's say they have a table of all the targeted customers they want to send credit card applications to. Why not have an attribute on the table for counting how many times the solicitation was sent? Or they could even store the date the first solicitation was sent.</p>

<pre class="codeexample">
Customer
    Name
    Address
    City
    FirstSolicitationSent
</pre>

<p>or</p>

<pre class="codeexample">
Customer
    Name
    Address
    City
    Solicitations
</pre>

<p>So they could say: if the number of days between today and FirstSolicitationSent is more than 90, do not send another solicitation; or, if the number of solicitations is more than three, do not send another solicitation.</p>
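
<p>That rule is small enough to sketch in a few lines of Ruby. This is only an illustration: the hash keys mirror the two table sketches above, the thresholds (90 days, three solicitations) are the ones from the previous paragraph, and send_solicitation? is a made-up name.</p>

<pre class="codeexample">
require 'date'

MAX_DAYS_SINCE_FIRST = 90
MAX_SOLICITATIONS = 3

# customer is a plain Hash standing in for a row of the Customer table.
def send_solicitation?(customer, today)
  first_sent = customer[:first_solicitation_sent]
  if first_sent
    days_since_first = (today - first_sent).to_i
    return false if days_since_first > MAX_DAYS_SINCE_FIRST
  end
  return false if customer.fetch(:solicitations, 0) > MAX_SOLICITATIONS
  true
end

today    = Date.new(2010, 8, 31)
recent   = { name: 'A', first_solicitation_sent: Date.new(2010, 8, 1), solicitations: 1 }
stale    = { name: 'B', first_solicitation_sent: Date.new(2003, 8, 1), solicitations: 1 }
pestered = { name: 'C', solicitations: 4 }

puts send_solicitation?(recent, today)    # first mailing was 30 days ago
puts send_solicitation?(stale, today)     # first mailing was seven years ago
puts send_solicitation?(pestered, today)  # already mailed four times
</pre>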

<p>This allows them to avoid sending solicitations for years and ultimately losing the customer. I understand the argument that the customer needs time to react to a solicitation, but seven years of trying to convert a prospect is a pure waste of time and effort. The data available can be used in better ways.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Schema design in a document database</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/04/schema-design-in-a-document-da.html" />
    <id>tag:www.sadalage.com,2010://2.65</id>

    <published>2010-04-28T23:45:03Z</published>
    <updated>2010-05-05T14:24:25Z</updated>

    <summary>We are using MongoDB on our project, since mongo is document store, schema design is somewhat different, when you are using traditional RDBMS data stores, one thinks about tables and rows, while using a document database you have to think...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="MongoDB" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="design" label="design" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="documentdb" label="documentdb" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mongodb" label="mongodb" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[We are using MongoDB on our project. Since mongo is a document store, schema
design is somewhat different. When you are using traditional RDBMS data
stores, you think about tables and rows, while with a document database you
have to think about the schema in a somewhat different way. Let's say we want
to save a customer object. When using an RDBMS we would come up with Customer,
Address, Phone and Email tables. They are related to each other as shown below.
<img alt="customer.jpg" src="http://www.sadalage.com/images/customer.jpg" width="448" height="367" class="mt-image-none" style="" />
When using a document database, the schema design actually does not change
much: the Customer document contains an array of Addresses, a one-to-many
relationship. You will not need the FK columns or the Primary Key
columns on the child tables, since the child rows are embedded in the parent
object. The JSON object below shows how the data would look.

<pre class="codeexample">
{
"_id" : ObjectId("4bd8ae97c47016442af4a580"),
"customerid" : 99999,
"name" : "Foo Sushi Inc",
"type" : "Good",
"since" : "12/12/2001",
"addresses" : [{
		"address" : "4821 Big Street",
		"city" : "Stone",			
		"state" : "IL",
		"country" : "USA"
	},
	{	"address" : "1248 Barlow Ln",
		"city" : "Hedgestone",			
		"country" : "UK"
	}		
],
"emails" : [ 
	{"email" : "foousa@sushi.com"},
	{"email" : "foouk@sushi.com"}
],
"phones" : [ 
	{"phone" : "773-7777-7777"},
	{"phone" : "020-6666-6666"}
]
}
</pre>
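
Because the addresses live inside the customer document, a question about a customer's addresses can be answered from that one document. A rough illustration in plain Ruby, with hashes standing in for documents (no database involved; the second customer and the customers_in helper are made up for the example):

<pre class="codeexample">
customers = [
  { 'name' => 'Foo Sushi Inc',
    'addresses' => [{ 'city' => 'Stone', 'country' => 'USA' },
                    { 'city' => 'Hedgestone', 'country' => 'UK' }] },
  { 'name' => 'Bar Noodles Ltd', # made-up second customer
    'addresses' => [{ 'city' => 'Leeds', 'country' => 'UK' }] }
]

# Names of customers that have at least one address in the given country.
def customers_in(customers, country)
  matching = customers.select do |customer|
    customer.fetch('addresses', []).any? { |address| address['country'] == country }
  end
  matching.map { |customer| customer['name'] }
end

p customers_in(customers, 'USA')
</pre>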

So instead of one row for the customer and two rows each for address, phone and email, you get one Customer document. Suppose you want to query for customers in the USA. Using an RDBMS you would do
<pre class="codeexample">
SELECT customer.name FROM customer, address 
WHERE customer.customerid = address.customerid 
AND address.country = 'USA'
</pre>
The same query in mongo would look like
<pre class="codeexample">
db.customers.find({"addresses.country":"USA"},{"name":true})
</pre>
where customers is the collection in which we store our customers.]]>
        
    </content>
</entry>

<entry>
    <title>My experience with MongoDB</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/04/my-experience-with-mongodb.html" />
    <id>tag:www.sadalage.com,2010://2.64</id>

    <published>2010-04-18T17:27:28Z</published>
    <updated>2010-04-19T03:32:15Z</updated>

    <summary>The current project I&apos;m on is using MongoDB. MongoDB is a document based database, it stores JSON objects as BSON (Binary JSON objects). MongoDB provides a middle ground between the traditional RDBMS and the NOSql databases out there, it provides...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="MongoDB" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="java" label="java" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mongodb" label="mongodb" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="nosql" label="nosql" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[The current project I'm on is using MongoDB. MongoDB is a document-based database; it stores JSON objects as BSON (Binary JSON objects). MongoDB provides a middle ground between the traditional RDBMS and the NoSQL databases out there: it provides indexes, dynamic queries, replication, map reduce and auto sharding. It's open source and can be downloaded <a href="http://www.mongodb.org">here</a>. Starting up MongoDB is pretty easy.

<pre class="codeexample">
./mongod --dbpath=/user/data/db
</pre>
is all you need, where /user/data/db is the path where you want mongo to create its data files. There are many other options that you can use to customize the mongo instance.

Each mongo instance has databases and each database has many collections. Mapping back to Oracle, a mongo database is like an Oracle schema and a mongo collection is like an Oracle table.
The difference is that each collection can hold any type of object; basically every row can be different.

An example connection to the database using Java looks like this:

<pre class="codeexample">
   Mongo mongo = new Mongo("localhost");
   DB db = mongo.getDB("mydatabase");
</pre>
If the "mydatabase" does not exist, it will be created. When you want to put objects in the database, you need to have a collection which holds the objects.

<pre class="codeexample">
   DBCollection users = db.getCollection("applicationusers");
</pre>
If the "applicationusers" collection does not exist, it will be created. At this point you are ready to put objects into the collection.

<pre class="codeexample">
    BasicDBObject userDocument = new BasicDBObject();
    userDocument.put("name", "jack");
    userDocument.put("type", "super");
    users.insert(userDocument);
</pre>

You create a document by using BasicDBObject and putting attribute names and their values into it; in the above example "name" is the attribute and "jack" is the value. <strong>users.insert</strong> takes the document and inserts it into the "applicationusers" collection. At this point you have a JSON object in the database.

You can query for the object using the mongo query tool or the RESTful API that mongo provides when you start mongod with the --rest flag. Visiting <em>http://127.0.0.1:28017/mydatabase/applicationusers/</em> should give you

<pre class="codeexample">
{
  "offset" : 0,
  "rows": [
    { "_id" : { "$oid" : "4bc9157e201f254d204226bf" }, "name" : "jack", "type" : "super" }
  ],
  "total_rows" : 1 ,
  "query" : {} ,
  "millis" : 0
}
</pre>

Every object you insert gets an auto-generated id. More about update, delete and complex objects in the next blog post.]]>
        
    </content>
</entry>

<entry>
    <title>Workshop at Enterprise Data World 2010</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2010/01/workshop-at-enterprise-data-wo.html" />
    <id>tag:www.sadalage.com,2010://2.62</id>

    <published>2010-01-29T20:47:25Z</published>
    <updated>2010-04-20T03:52:49Z</updated>

    <summary>Doing a workshop on Agile Database Development at Enterprise Data World 2010 at SF. See you there....</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Broadcast" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="agiledata" label="Agile Data" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="edw" label="EDW" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>Doing a workshop on <a href="http://edw2010.wilshireconferences.com/sessionPop.cfm?confid=38&proposalid=2204">Agile Database Development</a> at <a href="http://edw2010.wilshireconferences.com/">Enterprise Data World 2010</a> at SF. See you there.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Testing in conversion projects</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/11/automated-data-compare-in-conv.html" />
    <id>tag:www.sadalage.com,2009://2.61</id>

    <published>2009-11-18T22:06:33Z</published>
    <updated>2010-04-20T03:53:49Z</updated>

    <summary>When working on projects involving Conversion of data or Migration/Moving of data from a legacy database. The testing effort is enormous and testing takes a lot of time, some test automation can help this effort. Since data is moved/changed from...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Best Practices" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Ruby" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="agiledba" label="agile dba" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="automation" label="automation" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="dataconversion" label="data conversion" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>When working on projects that involve converting data or migrating it from a legacy database, the testing effort is enormous and takes a lot of time. Some test automation can help.</p>

<p>Since data is moved or changed from a source database to a destination database, we can write SQL that produces the result for each kind of test we want to perform. For example, one query can return the number of customers and another the account balance for a specific account.</p>

<p>These queries can be run on the source database as well as the destination database and the results compared programmatically, which gives us an easy way to compare the state of the database before and after the conversion or migration. The tests can be run through a <a href="http://martinfowler.com/articles/continuousIntegration.html">CI</a> engine to make them a regression test suite.</p>

<p>Here is an example implementation using Ruby.</p>

<p>We have two databases, SOURCE and DESTINATION, and two SQL files named source.sql and destination.sql. The Ruby program picks up the SQL from these two files and runs each against its own database: the SQL from source.sql is run against the SOURCE database and the SQL from destination.sql is run against the DESTINATION database. The results of the two queries are compared and a failure is raised when they do not match.</p>

<pre class="codeexample">
results = []
# get_sql_statement_to_execute returns the pair [source_sql, destination_sql]
statement = get_sql_statement_to_execute
begin
  source_statement = statement[0]
  destination_statement = statement[1]
  source_rows = exec_sql_in_source_return_rows(source_statement)
  destination_rows = exec_sql_in_destination_return_rows(destination_statement)
  # Assumption: compare_rows returns nil when the rows match,
  # otherwise a description of the mismatch
  result = compare_rows(source_rows, destination_rows, destination_statement, source_statement)
  results << result unless result.nil?
rescue => e
  Log.log("Could not process: #{statement.inspect} (#{e.message})")
end
if results.size > 0
  Log.log("Results do not match in source and destination")
end
</pre>
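
<p>The comparison step itself is not shown above. As a minimal sketch, assuming rows come back as arrays of column values and that <code>compare_rows</code> returns nil on a match and a description of the difference otherwise (this return convention and the hash keys are illustrative assumptions, not part of the original program), it could look like this:</p>

```ruby
# Hypothetical compare_rows: returns nil when both result sets hold the
# same rows, otherwise a hash describing the mismatch.
def compare_rows(source_rows, destination_rows, destination_statement, source_statement)
  # Sort both sides so that row order, which SQL does not guarantee
  # without an ORDER BY, does not cause a false mismatch.
  source_sorted = source_rows.sort_by(&:inspect)
  destination_sorted = destination_rows.sort_by(&:inspect)
  return nil if source_sorted == destination_sorted

  {
    source_statement: source_statement,
    destination_statement: destination_statement,
    missing_in_destination: source_sorted - destination_sorted,
    unexpected_in_destination: destination_sorted - source_sorted
  }
end
```

<p>A caller can then keep only the non-nil results and fail the build when any are left.</p>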

<p>The sample Ruby code above shows how the solution can be implemented, enabling automation of database conversion and migration testing.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Ruby OCI 2.0 Array binding</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/10/ruby-oci-20-array-binding.html" />
    <id>tag:www.sadalage.com,2009://2.60</id>

    <published>2009-10-09T01:16:11Z</published>
    <updated>2010-04-18T17:52:08Z</updated>

    <summary>We have been doing some data moving lately using Ruby and Ruby-OCI. We started with Ruby OCI 1.0 and did use prepared statements with bind variables (since we are using oracle database and pulling data from an oracle database and...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Oracle" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Ruby" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="oci" label="oci" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="performance" label="performance" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="ruby" label="ruby" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>We have been moving some data around lately using Ruby and <a href="http://ruby-oci8.rubyforge.org">Ruby-OCI</a>. We started with Ruby OCI 1.0 and used prepared statements with bind variables (we are pulling data from an Oracle database and pushing it to an Oracle database). Later we found a really cool feature in Ruby-OCI8 2.0: you can bind a whole array and make just one database trip for many database operations.</p>

<p>Let's say you want to insert 10 rows. Inserting one row at a time means 10 trips to the database.</p>

<pre class="codeexample">
def save_accounts(accounts)
  stmt = $connection.parse "INSERT INTO account (accountid,name) values (:account_id,:name)"
  accounts.each do |account|
    stmt.bind_param(:account_id, account[0], Float)
    stmt.bind_param(:name, account[1], String)
    stmt.exec # one trip to the database per row
  end
  $connection.commit
end
</pre>

<p>Using the array bind feature, it's actually just one trip to the database (of course this depends on the size of the array you bind, but you get the picture: it reduces database trips).</p>

<pre class="codeexample">
def save_accounts(account_ids, account_names)
  stmt = $connection.parse "INSERT INTO account (accountid,name) values (:account_id,:name)"
  # The array size has to be set before the arrays are bound
  stmt.max_array_size = account_ids.size
  stmt.bind_param_array(:account_id, account_ids)
  stmt.bind_param_array(:name, account_names)
  stmt.exec_array # all rows go to the database together
  $connection.commit
end
</pre>
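
<p>The array-bind version takes one array per column, while the row-at-a-time version took a list of [id, name] pairs. A small sketch of how the rows could be reshaped, and split into batches so that the bound array size stays bounded, is shown below. <code>column_batches</code> and the batch size are hypothetical names introduced for illustration; the sketch uses plain Ruby only and does not touch the database.</p>

```ruby
# Split row-oriented account data into batches of column arrays, ready to
# hand to a method such as save_accounts(account_ids, account_names).
# batch_size must be a positive integer.
def column_batches(accounts, batch_size)
  accounts.each_slice(batch_size).map do |batch|
    # [[id1, name1], [id2, name2]] becomes [[id1, id2], [name1, name2]]
    batch.transpose
  end
end

accounts = [[1, "Savings"], [2, "Checking"], [3, "Brokerage"]]
column_batches(accounts, 2).each do |account_ids, account_names|
  puts "#{account_ids.inspect} #{account_names.inspect}"
end
```

<p>Each batch then costs one database trip instead of one trip per row.</p>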
<p>We saw a 100% improvement in performance by changing the way we bind the variables in just one place. Looks like a feature to look out for.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Create an Index for all FK Columns in the database</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/09/create-an-index-for-all-fk-col.html" />
    <id>tag:www.sadalage.com,2009://2.59</id>

    <published>2009-09-04T04:15:58Z</published>
    <updated>2010-04-20T03:55:06Z</updated>

    <summary>Most of the time I see foreign key constraints defined on tables without indexes on the constrained columns. Let's say the application is trying to delete a row from the CUSTOMER table DELETE FROM CUSTOMER WHERE CUSTOMERID = 1000; When...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Improving Design" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Oracle" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="database" label="database" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="design" label="design" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="practice" label="practice" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[Most of the time I see foreign key constraints defined on tables without indexes on the constrained columns. Let's say the application is trying to delete a row from the CUSTOMER table:
<pre class="codeexample">
DELETE FROM CUSTOMER WHERE CUSTOMERID = 1000;
</pre>
When the database deletes the row with customerId 1000 and foreign key constraints reference customerId, it first has to check whether customerId 1000 is used in any of the referencing tables. Let's say the ORDER table has a customerId column; the database is going to issue
<pre class="codeexample">
SELECT ... FROM ORDER WHERE CUSTOMERID = 1000;
</pre>
Now, if there is no index on ORDER.CUSTOMERID, the database has to do a full table scan, which is very expensive in terms of I/O and other resources. Imagine customerId being used in lots of tables: the problem multiplies. In a multiuser scenario this can lead to deadlocks, since the same tables are being read and locked to find dependent children. Creating an index on every column that carries a foreign key constraint helps a lot in this case.]]>
        
    </content>
</entry>

<entry>
    <title>Materialized views and database links in oracle.</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/08/materialized-views-and-databas.html" />
    <id>tag:www.sadalage.com,2009://2.58</id>

    <published>2009-08-10T15:54:12Z</published>
    <updated>2010-04-20T03:57:01Z</updated>

    <summary>Recently one of my colleagues, Jeff Norris, ran into a weird error. He was trying to build a materialized view over some tables in his local database and some tables in a remote database, using database links. The SQL to create...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Oracle" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="databaselink" label="database link" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="oracle" label="oracle" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>Recently one of my colleagues, <a href="http://blog.norrissoftware.com/">Jeff Norris</a>, ran into a weird error. He was trying to build a materialized view over some tables in his local database and some tables in a remote database, using database links. The SQL behind the view ran fine and returned the expected results, but when put inside a CREATE MATERIALIZED VIEW statement it failed with ORA-00942 errors.</p>

<p>Let's say the two databases in question are local and remote. The SQL to create a materialized view that loads immediately and refreshes every day is:</p>

<pre class="codeexample">
CREATE MATERIALIZED VIEW MV_CUSTOMERBALANCE 
BUILD IMMEDIATE
REFRESH FORCE START WITH ROUND(SYSDATE) + 23/24
NEXT SYSDATE + 1
AS
SELECT customer.name , account.balance, accounttype.name 
FROM customer , account@remotedb account, accounttype@remotedb accounttype
WHERE
customer.id = account.customerid
AND account.accounttyppeid = accounttype.id
/
</pre>
Oracle complained when creating the above materialized view, raising <strong>ORA-00942: table or view does not exist</strong>, but the same SQL without the CREATE MATERIALIZED VIEW command ran fine and gave the expected results.

<pre class="codeexample">
SELECT customer.name , account.balance, accounttype.name 
FROM customer , account@remotedb account, accounttype@remotedb accounttype
WHERE
customer.id = account.customerid
AND account.accounttyppeid = accounttype.id
/
</pre>
After some searching and experimenting, I found that in the CREATE MATERIALIZED VIEW statement a database link name can be used only once, which meant the "remotedb" name could appear only once. We got around this restriction by creating two database links to the remote database, <strong>REMOTEACCOUNT</strong> and <strong>REMOTEACCOUNTTYPE</strong>, and using them in the creation of the materialized view as shown below.

<pre class="codeexample">
CREATE MATERIALIZED VIEW MV_CUSTOMERBALANCE 
BUILD IMMEDIATE
REFRESH FORCE START WITH ROUND(SYSDATE) + 23/24
NEXT SYSDATE + 1
AS
SELECT customer.name , account.balance, accounttype.name 
FROM customer , account@remoteaccount account, accounttype@remoteaccounttype accounttype
WHERE
customer.id = account.customerid
AND account.accounttyppeid = accounttype.id
/
</pre>]]>
        
    </content>
</entry>

<entry>
    <title>Perfectly good data.. wasted</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/08/use-the-data-you-have-already.html" />
    <id>tag:www.sadalage.com,2009://2.57</id>

    <published>2009-08-06T04:27:28Z</published>
    <updated>2010-04-20T03:57:54Z</updated>

    <summary>Okay this is kind of a rant, maybe I&apos;m too picky or just that I hate to see perfectly good data not being used. This is how it goes.. I go regularly to this store to get Horizon organic milk...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="BI" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="bi" label="BI" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="data" label="Data" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>Okay, this is kind of a rant. Maybe I'm too picky, or maybe I just hate to see perfectly good data not being used. This is how it goes.</p>

<p>I regularly go to this <a href="http://www.target.com/">store</a> to get Horizon organic milk for my family. About 60% of the time the milk I need is NOT in stock. Okay, I could live with that once in a while; maybe lots of folks are buying organic milk. But not when it happens this frequently, especially when the store knows how much milk was ordered (or supplied from the warehouse) and how much was sold. The store should be able to figure out that organic milk sells out pretty fast. Putting my Business Intelligence (BI) hat on, I think the store should be able to predict when it is going to run out of organic milk (or any product, for that matter). It's especially frustrating because they have all the data they need to get it done.</p>

<p>One more case of unused data that really makes me see red is when the organic milk on the shelf is already expired (past the sell-by date). I mean, how hard is it for someone to generate a list of all the products that expire today and ask the store associates to remove them from the shelves by the end of the day, especially when they are edible items?</p>

]]>
        
    </content>
</entry>

<entry>
    <title>Explicitly rollback when you encounter a deadlock.</title>
    <link rel="alternate" type="text/html" href="http://www.sadalage.com/2009/05/explicitly-rollback-when-you-e.html" />
    <id>tag:www.sadalage.com,2009://2.56</id>

    <published>2009-05-26T22:06:56Z</published>
    <updated>2010-04-20T03:58:51Z</updated>

    <summary>A deadlock occurs in the database when sessions (connections) wait on each other to release locks on rows that each of them needs, so that none of the sessions involved can proceed. Oracle automatically detects deadlocks and resolves...</summary>
    <author>
        <name>Pramod Sadalage</name>
        
    </author>
    
        <category term="Best Practices" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="design" label="design" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="transaction" label="transaction" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.sadalage.com/">
        <![CDATA[<p>A deadlock occurs in the database when sessions (connections) wait on each other to release locks on rows that each of them needs, so that none of the sessions involved can proceed. Oracle automatically detects deadlocks and resolves them by rolling back the statement that was waiting for the lock in the transaction that detected the deadlock. The thing to remember is that <strong>only that last statement is rolled back, not the whole transaction</strong>. If the transaction made other modifications, those rows are still locked, so the application should make sure it does an explicit rollback on the connection.</p>

<p>For example, let's assume there are two tables, <strong>Parent(ParentID)</strong> and <strong>Child(ChildID)</strong>:</p>

<pre class="codeexample">
SESSION_A >create table parent (parentId number(10));
Table created.
SESSION_A >create table child (childId number(10));
Table created.
SESSION_A >insert into parent values (100);
1 row created.
SESSION_A >insert into child values (200);
1 row created.
SESSION_A >commit;
Commit complete.
SESSION_A >select * from parent;
  PARENTID
----------
       100

SESSION_A >select * from child;
   CHILDID
----------
       200
SESSION_A >
</pre>

<p>Now let's create a situation where a deadlock happens. Two sessions, SESSION_A and SESSION_B, are connected to the same database as the same user.</p>

<pre class="codeexample">
SESSION_A >update parent set parentid = 1000 where parentid=100;
1 row updated.
SESSION_B >update child set childid = 2000 where childid = 200;
1 row updated.
SESSION_B >update parent set parentid = 2001 where parentid=100;
--Waiting For Lock on Row in Parent Table, held by SESSION_A
SESSION_A >update child set childid = 1001 where childid = 200;
update child set childid = 1001 where childid = 200
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
--SESSION_A requesting lock on row, held by SESSION_B causing deadlock.
SESSION_A >
</pre>

<p>After SESSION_A gets the ORA-00060 error, the statement <em>update child set childid = 1001 where childid = 200;</em> is rolled back, but SESSION_B is still waiting for SESSION_A to release its lock on the row in the Parent table.</p>
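
<p>In application code, the explicit rollback can be wrapped around every unit of work. The Ruby sketch below is an illustration under assumptions, not code from a real driver: <code>DatabaseError</code>, <code>ORA_DEADLOCK</code> and <code>run_in_transaction</code> are hypothetical names, and the connection is only assumed to respond to <code>commit</code> and <code>rollback</code>.</p>

```ruby
# Hypothetical stand-in for a driver error that carries the vendor error code.
class DatabaseError < StandardError
  attr_reader :code

  def initialize(code, message)
    super(message)
    @code = code
  end
end

ORA_DEADLOCK = 60 # ORA-00060: deadlock detected while waiting for resource

# Runs the block as one transaction on the given connection.
def run_in_transaction(connection)
  result = yield connection
  connection.commit
  result
rescue DatabaseError => e
  # The database has undone only the failing statement. Roll back the whole
  # transaction so the locks taken by earlier statements are released.
  connection.rollback if e.code == ORA_DEADLOCK
  raise
end
```

<p>With a real driver you would rescue the driver's own error class instead of the stand-in; check its documentation for how the Oracle error number is exposed. Many applications simply roll back on any database error, which covers the deadlock case as well.</p>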

<p>So when your application gets ORA-00060, or the equivalent deadlock exception in any other database, explicitly roll back the transaction (not just the current statement) so that all the changes made in the transaction are undone and all the locks it holds are released.</p>]]>
        
    </content>
</entry>

</feed>

