Passionate about data

Data and its implications on software design and development.

Static Analysis of PL/SQL Code

In all new development, and sometimes during legacy codebase modernization, developers tend to add code quality checks and static analysis of the codebase, such as style checks, bug finders and cyclomatic complexity checks, to the CI/CD pipeline. When we inherit a codebase with a lot of PL/SQL and there is a desire to put the PL/SQL code through the same types of analysis, what options does a developer or DBA have?

There are a few options we can explore.

ID Generation With ORM’s Table, Sequence or Identity Strategy

In this blog post, I will discuss ID generation techniques using Object-Relational Mapping frameworks such as Hibernate, TopLink, ActiveRecord and Entity Framework. When using Hibernate or any other ORM framework, there is a need to generate primary key values for the "id" columns. These values can be generated using the IDENTITY, SEQUENCE or TABLE strategies. Generating custom values for primary keys or other columns is a topic for another blog post.

The examples in this blog post use Hibernate, JPA, Java 8 and Oracle 12c.

TABLE Id generation strategy:

When using this strategy, a table in the database stores the current id value; the next value is generated and written back to the table.

CREATE TABLE hibernate_sequences(
    sequence_name VARCHAR2(40) NOT NULL,
    next_val NUMBER(18) NOT NULL
);

Hibernate uses the table above to generate next_val for the sequence_name specified in the entity mapping; if no sequence_name is specified, the default is used.
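
For reference, a minimal entity mapping using the TABLE strategy might look like the sketch below; the generator name and the pkColumnValue are illustrative, not required:

@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "person_table_gen")
@TableGenerator(name = "person_table_gen",
                table = "hibernate_sequences",
                pkColumnName = "sequence_name",
                valueColumnName = "next_val",
                pkColumnValue = "person") // row key in hibernate_sequences
private Long id;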

ID generation runs in a transaction of its own and locks the row in the table, so care has to be taken when multiple processes are generating IDs concurrently. This strategy is the least efficient and should be avoided.

IDENTITY Id generation strategy:

Identity columns in databases such as MySQL, MSSQL, PostgreSQL and Oracle 12c auto-generate values for a numeric column. There can be only one identity column in a table; the column has to be not-null and cannot accept any other default values. Identity columns are generally used as synthetic primary key columns. Shown below is an example of creating a table with an identity column id in Oracle 12c.

CREATE TABLE person
(
  id NUMBER(18) GENERATED ALWAYS AS IDENTITY
                (START WITH 1 INCREMENT BY 1) NOT NULL,
  name VARCHAR2(40) NOT NULL
);

ALTER TABLE person ADD CONSTRAINT pk_person PRIMARY KEY (id);

The id column from the person table can now be mapped to the Person entity. The generation type we are using is IDENTITY, and Hibernate now understands that the database will provide the value for id at insert time.

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;

Mapping in Hibernate is simple, with no sequence names to specify. All inserts into the table automatically use the same method for getting IDs, whether they come through the application or directly from other database users.

Since IDs are assigned at insert time, JDBC batching does not work when using identity columns. In Oracle 12c, dropping a table does not drop the sequence associated with its identity column, leaving stale sequences in the database. In a parent-child relationship, such as a Person table and a table holding all their Addresses, the child.parent_id column cannot be declared not null, because Hibernate (the ORM framework) cannot get the ID of the parent before the parent row is inserted. It therefore performs these operations:

insert into parent_table
insert into child_table
select parentId from parent_table
select childId from child_table
update child_table with parentId for childId

Identity ID generation does not support preallocation, so it requires a select after each insert, increasing round trips to the database and degrading insert performance.

SEQUENCE Id generation strategy:

Sequences in Oracle, Postgres and SQL Server 2012 are database objects that generate unique numbers independent of transactions. Sequences improve the concurrency of number generation. A sequence can increment by any number, start at any number, cycle after the maximum number is reached, enforce an upper limit on the numbers generated and cache a certain number of values. Shown below is an example of creating a sequence in Oracle 12c.

CREATE SEQUENCE seq_person
  START WITH 10000 INCREMENT BY 1;

CREATE TABLE person
(
  id NUMBER(18) NOT NULL,
  name VARCHAR2(40)
);

ALTER TABLE person ADD CONSTRAINT pk_person PRIMARY KEY (id);

Using the sequence above and the table definition, we can map the table to the object using the SEQUENCE generation type and the sequence name seq_person. When a new Person is created, Hibernate will get the next value from the sequence and assign it to the id attribute on the object.

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "person_generator")
@SequenceGenerator(name = "person_generator", sequenceName = "seq_person",
                   allocationSize = 1) // JPA defaults to 50; match the sequence's INCREMENT BY 1
private Long id;

Sequences provide an easy way to create an object that supplies unique numbers, and they are not bound by transaction boundaries. They offer various configuration options, such as the start number, increment interval, cache size, maximum and minimum numbers, and whether to cycle the numbers.
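
For illustration, here is a sequence that uses several of these options; the sequence name and the specific values are made up for the example:

CREATE SEQUENCE seq_order
  START WITH 1000
  INCREMENT BY 10
  MINVALUE 1000
  MAXVALUE 999999
  CACHE 100
  CYCLE;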

ORM frameworks can use batch operations for inserts, since they can get sequence values for object IDs without inserting the data into the tables.
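
Note that JDBC batching has to be switched on explicitly; a minimal sketch of the Hibernate settings involved (the batch size of 50 is just an example value):

import java.util.Properties;

// Batching works with the SEQUENCE strategy because IDs are
// fetched from the sequence before the insert statements run.
Properties properties = new Properties();
properties.put("hibernate.jdbc.batch_size", "50");  // example batch size
properties.put("hibernate.order_inserts", "true");  // group inserts by entity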

Applications have to remember to use the correct sequence when inserting data into the table, and the Hibernate mapping is somewhat verbose. Other utilities inserting data directly into the database also have to remember to use the correct sequence.

Recommendation:

Based on the above observations and usage:

If the database provides sequences, they should be used. In databases that do not provide sequences, identity columns have to be used.

If the database provides both identity and sequence features, the decision depends on the amount of batch operations being performed and on the need for consistent primary key generation for new data coming from the application and from other utilities loading data directly into the database.

Using Liquibase to Load Data and Ignore Some Columns

Loading data into tables is needed on many projects, for example to load test data. Liquibase provides a method to load data into tables with lots of customization. In the example shown below, I'm loading zip code data with the following column layout:

"Zipcode","ZipCodeType","City","State","LocationType","Lat","Long","Location","Decommisioned","TaxReturnsFiled","EstimatedPopulation","TotalWages"

Two Factor Authentication to Authorize Credit

Simple flow of credit file usage by Individual Citizens

In today's environment, identity theft and related crimes are frequent and commonplace. Most of these incidents take place because our identity can be easily stolen with little bits of information such as name, address, birthdate, social security number or maybe just a card number.

What if the credit file, credit report, credit score or even credit cards had two-factor authentication, just like we have on Gmail, Chase and many other services?

Automatically Adding Columns to Rails Migrations

Many projects need identical columns added to all the tables created by the project; audit columns are an example of such a requirement. The requirement is to add columns such as created_by, created_date, modified_by and modified_date to all tables. These columns store who created the row, when the row was created, who last modified the row and when it was modified. created_by and created_date must be present when the row is inserted and so are required to be not nullable. Adding these columns to each and every table by hand is a lot of work for developers; one way to automate it is sketched below.
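
A possible approach, assuming ActiveRecord migrations (the module name, file location and method name here are made up for the sketch), is to extend the table definition object that create_table yields:

# config/initializers/audit_columns.rb (hypothetical location)
module AuditColumns
  # adds the four audit columns to any table definition
  def audit_columns
    column :created_by, :string, null: false
    column :created_date, :datetime, null: false
    column :modified_by, :string
    column :modified_date, :datetime
  end
end

ActiveRecord::ConnectionAdapters::TableDefinition.include(AuditColumns)

# usage inside a migration:
#   create_table :people do |t|
#     t.string :name
#     t.audit_columns
#   end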

Synonyms as Abstraction Layer

In many development shops, developers are not allowed to access the database schema directly and are not allowed to create tables, indexes, views and so on; instead they are given access via a different schema that allows SELECT, UPDATE and DELETE access on the data. The general reason is to avoid developers creating database objects without oversight.
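
A sketch of how such a layer can be set up in Oracle; the schema names data_owner and app_user are illustrative:

-- granted by (or on behalf of) the schema that owns the data
GRANT SELECT, UPDATE, DELETE ON data_owner.person TO app_user;

-- a synonym in the application schema hides the owning schema name
CREATE SYNONYM app_user.person FOR data_owner.person;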

Behavior Driven Database Development

When Behavior Driven Development (BDD) was introduced, some of the key principles were:

  • Requirements are behavior.
  • It provides a "ubiquitous language" for analysis.
  • Acceptance criteria should be executable.
  • Design constraints should be made into executable tests.

All of these principles can be applied to database development. When interacting with the database, we tend to assume certain behavior. Some of it is universal: when a row is inserted into a table, the same row can later be retrieved. Other behaviors are project-specific, for example that a Person should have at least a firstname or a lastname. Such behavior changes with the functionality being developed and thus needs to be properly specified and executed. The database lends itself very well to the BDD way of thinking, where the behavior of the objects is considered; BDD amounts to describing requirements in code.
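
The last principle in the list above can be made concrete: the firstname-or-lastname rule, for instance, can be expressed as an executable design constraint (table and column names are illustrative):

ALTER TABLE person ADD CONSTRAINT chk_person_has_name
  CHECK (firstname IS NOT NULL OR lastname IS NOT NULL);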

Using Rake and Activerecord to Generate Boilerplate DB Code

In many projects, there are tables that need default audit columns such as Created_By, Created_Date, Modified_By and Modified_Date, and other columns that need to be updated every time certain actions are performed against the tables and/or columns. This type of functionality can be implemented using triggers. Why have triggers when the application can do the updating? That is a good argument, but as with all application databases, eventually other users, applications and scripts will get access to the application's database and end up reading from and writing to it. At that point it is better to have triggers updating the data independently of the application, to ensure audit columns and other columns are updated with appropriate values.
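
As an illustration of the kind of boilerplate being generated, here is what one such trigger might look like in Oracle; the table and trigger names are assumptions:

-- keeps the audit columns current no matter which application writes the row
CREATE OR REPLACE TRIGGER trg_person_audit
  BEFORE INSERT OR UPDATE ON person
  FOR EACH ROW
BEGIN
  IF INSERTING THEN
    :NEW.created_by   := USER;
    :NEW.created_date := SYSDATE;
  END IF;
  :NEW.modified_by   := USER;
  :NEW.modified_date := SYSDATE;
END;
/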

Database Naming Conventions in Different Environments

In every enterprise and every project, we end up having multiple environments. The database side of the enterprise in particular tends to stick around for a long time and has many more dependencies and application integrations than, say, application URLs. Given this, how to name the servers, databases and schemas becomes a very important decision: do these names make the application easy to use, and do they make it easier, not harder, for developers to access the database?

8 Techniques for Testing Migration of Data From Legacy Systems

Many of the projects we end up working on replace existing systems, along with their existing data, either wholly or in part. On all of these projects we end up writing data migration or data conversion code to move the data from the legacy systems to the new systems. Many stakeholders of the project, such as business users, project managers and business analysts, care deeply about the data conversion scripts and the quality of the conversion, since the conversion concerns business entities and future business functionality depends on the data being logically equivalent to the legacy system.

Over the years we have found many techniques that help in testing the quality of the data conversion. Here are the 8 techniques that encompass what we have learned when converting data over from legacy databases.