Tough Business Problems

A couple of business problems plague companies that maintain large amounts structured database data.

The first is how to expire older records from systems once they are no longer needed for compliance reasons. This becomes difficult because running DELETE statments via SQL can incur performance problems, especially on very large tables. Another issue is being able to verify that deletes actually took place, and that they took place at the desired time.

The second is how easily and inexpensively to fulfill requests from auditors who ask “I’d like to see the exact state of your database X number of years ago.” The larger the database, the more difficult this process becomes. To restore from a database backup, users need to supply sufficient disk space for the newly restored system. They need to supply the exact database version in use in the past, as well as the exact operating system.

The RainStor Advantage

RainStor’s database is tailored for Big Data amounts of structured and semi-structured data. RainStor routinely compresses data at levels of 20x to 40x. And it has outstanding features that address the two business problems mentioned at the start of this peice. RainStor is a database oriented toward archival and data warehouse use cases. It is an append only database into which users bulk load data. There are no SQL INSERT or UPDATE statements, for example. However, RainStor ingests bulk loaded data extremely rapidly, compresses it enormously, scales out to hundreds of nodes if needed. It’s licensing model makes for very attractive pricing compared to other database vendors. And when users put their data in RainStor, they get tremendous control over when data is expired and how they present older states of the database.

Record Retention and Expiration with RainStor

Let’s first talk about expiring older records within RainStor.

A RainStor database system is divided up into archives, schemas and tables. An archive is similar to a database within Sybase, SQL Server or DB2. An archive can contain many schemas. And each schema may own many tables.

Within RainStor, users can set a retention period for how many days or months records within a table may be kept. The retention period may be set at the archive or table level. In fact, it can also be set a system-wide level. Archive level retention period settings override the system level. And table level retention period settings override the archive ones.

LOGICAL DATE: The Key

How does RainStor know how old a record is? It checks the LOGICAL DATE set for the record. The LOGICAL DATE is automatically set when the group of records get ingested into RainStor. If the LOGICAL DATE is not set explicitly by the user, it will simply match the wall clock date. To set the value explicity for LOGICAL DATE, all one has to do is alter the name of the file to be loaded. Instead of naming a table my.table.bcp, just name it my.table.2013-10-17.bcp. RainStor will take care of the rest from there. In fact, in RainStor, one does not even have to run a utility to get data loaded, but that’s a story for another day.

Now to finish up how records get expired. RainStor simply compares the logical date of records against the retention period value that was set. If the records are older then the retention period, they are removed from the system.

The best part is that the expiration is super fast. It is not a SQL DELETE statement. Rather, in RainStor, all data files for tables contain records with the same LOGICAL DATE. Therefore, expiring records means only removing a file at the operating system level. No SQL DELETE. Super fast.

All That and Easy Reporting Too

Lastly, all expiration of records are tracked within RainStor within internal system tables that are easy to report upon. Users can furnish auditors with timely reports that prove outdated date gets removed from RainStor in a timely fashion.

RainStor and Database Backup and Recovery

A common use case for deploying RainStor is to eliminate or vastly reduce dependancy on traditional database backup. In some RainStor deployments, users replace an existing database system with RainStor. In other cases, they feed RainStor with data from an on line transcation processing (OLTP) database system. In either case, RainStor can help eliminate or reduce backups.

LOGICAL DATE Again

Earlier, this article discussed how LOGICAL DATE makes it possible to expire database records. LOGICAL DATE is also the linchpin for running queries that show the state of the database in the past.

In order to see only the rows that were present in the past, users need only run a single line of code.

SET ARCHIVE TIME 2011-11-01;

Doing so will mean that any subsequent query will not return records that have LOGICAL DATES more recent than November 1, 2011.

The SET ARCHIVE TIME functionality is available both in RainStor’s command line query tool and also in RainStor’s JDBC and ODBC drivers.

But What About Schema Changes?

RainStor also takes this concept a step further in a way that is unique on the market today.

Whenever users make a DDL change to add/drop/modify columns or add/drop/modify tables, RainStor also attaches a LOGICAL DATE value to the DDL change. This gives RainStor the amazing ability to run point-in-time queries in the past that not only return just the records from that time but also the combination of columns and tables from that time.

The syntax for doing this? It’s as easy as:

SET ARCHIVE TIME AND SCHEMA 2011-11-01;

Wrapping Up

In summary, RainStor is a uniquely valuable database product. It makes possible enormous reductions in cost on database storage. It enables companies to enforce easily data retention policies so that businesses can enforce data retention SLAs. And it also dramatically reduces cost and complexity in database backup and recovery infrastructure.