back to _why's Estate

A Quick Guide to SQLite and Ruby

I spent last night playing with SQLite and am convinced that this is a tool which could prove incredibly useful to coders and a great tool for learners to check out SQL. The problem is that there isn't enough documentation for Ruby users who want to take advantage of SQLite's features.

So, let's talk about SQLite's handsome features:

SQLite is swift. In my own testing, I have found it to be speedy. Some speed comparisons with MySQL and PostgreSQL are here.

SQLite is not a large database server, such as MySQL. You don't connect to the database. Using SQLite, you access a database file. Everything happens in-process.

SQLite is an ACID database. Supports transactions, triggers.

SQLite is public domain. Absolutely no licensing issues.

SQLite is typeless. Any type or length of data may be stored in a column, regardless of the declared type. This allows extreme flexibility and avoidance of type errors.

SQLite allows custom functions and aggregates. This is my favorite feature of SQLite, which we will explore shortly.

Getting Started with SQLite

SQLite is available for most platforms (Linux, BSD, Windows) from the download page. SQLite comes with a command-line tool for managing databases. You can find a decent tutorial for starting with SQLite here.

In RAA, you'll find several Ruby libraries for using SQLite. The ruby-dbi module is great if you want your code to work if you switch databases, but you're hampered in using some of SQLite's features. (If you plan on using ruby-dbi, I would be aware of how SQLite compares to SQL-92, so your queries can be portable as well.)

The other two libraries (ruby-sqlite and sqlite-ruby) have custom APIs for accessing SQLite, which will allow us to add custom functions and aggregates, as well as set table metadata. I suggest sqlite-ruby, as it is a bit more feature complete. Either will work fine, but the rest of this tutorial will focus on using sqlite-ruby.

Creating a Database

To open a new database (or an existing one), simply instantiate a SQLite::Database object with the name of the database file:

 require 'sqlite'
 db = SQLite::Database.new( 'sample.db', 0644 )

According to SQLite docs, the second argument passed to the constructor is "intended to signal whether the database is going to be used for reading and writing or just for reading." But in current implementations, this argument is ignored. All databases are opened for both reading and writing, though it is anticipated that readonly databases could be added in the future.

SQLite stores all of the data for a database inside a single file. This encompasses all indices, tables and schemas for the entire database. The advantage is that this single file can be easily transported wherever you like. The same database file can be included with your software and accessed on Windows, Linux, or any other supported platform.

The disadvantage to a single database file is that this file can grow quite large. Even after you've deleted rows or entire tables, your file may not decrease in size. To free the disk space once again, you'll need to execute the VACUUM statement, which cleans up tables and indices. The VACUUM statement can be run alone to clean the whole database.

Passing Queries to SQLite

The execute method can be used to pass queries to your database, once it is open.

 db.execute <<SQL

  CREATE TABLE sites (
   idx INTEGER PRIMARY KEY,
   url VARCHAR(255)
  );

 SQL

You can also test the completeness of your SQL statements with the complete? methods.

 >> db.complete? "SELECT *" 
 => false
 >> db.complete? "SELECT * FROM email;" 
 => true

On its own, execute will simply return an Array of Hashes as the resultset. Passing a block into execute will cause the block to be called on each successive loading of a row. In such a case, it becomes a sort of "each_row" for a query, each time receiving a Hash of field-value pairs.

 db.execute( "SELECT * FROM sites;" ) do |site|
  puts "-> Site #%d %s" % [ site['idx'], site['url'] ]
 end

Vital Pragma

SQLite has a few features enabled by default that you might consider disabling. These are optimizations that have consequences and I present them for your careful thought. I am giving you the basics. Futher optimizations can be had at the SQLite Optimization FAQ.

The cache_size setting determines how many database pages can be kept in memory. The default settings is 2000, counted in 1KB chunks. Consider increasing this before executing queries on large sets of data. (Especially updates to large tables.) This setting can dramatically speedup such situations. Use PRAGMA cache_size to set.

By calling PRAGMA default_synchronous=OFF;, you can turn off the intensive synchronization of the database. When set, queries will wait for a database to be completely written before executing. On truly mission-critical apps, this may be necessary, but generally you can turn this off.

If you're not worried about how many rows are affected following an UPDATE or INSERT, consider using PRAGMA count_changes=OFF;, which will disable counting of affected rows. A smaller speedup in this case, but still worth noting.

Custom Functions

SQLite comes with a variety of common functions for forming expressions. For example, you may want to uppercase a field you are reading:

 db.execute( "SELECT UPPER(url) FROM sites;" )

You can add your own Ruby functions to SQLite by using the create_function method. To make our own function for reversing a field's contents:

 db.create_function( 'revers', 1, 
   proc { |ctx,s| s.to_s.reverse }, nil )

The first parameter we pass in is the name of the function to create. SQLite will ignore casing of this string. The second parameter indicates the number of parameters to send to the function. The third parameter is a Proc object. The fourth parameter should allow you to pass further data into the Proc, but doesn't appear to be implemented at the time of this writing.

The proc object you create should receive an extra initial argument, listed above as ctx. This is a SQLite::Context object, which allows you store data between calls. I've found this object to be quite buggy when used in functions. But, hey, it's there.

To call our new revers function:

 db.execute( "SELECT REVERS(url) FROM sites;" )

One thing to note about the create_function method is that your proc should not return any sort of object which is a collection (Array, Hash, etc.) The object won't make the translation in and out of the database.

Like Ruby, you may also override the current set of functions. For example, the @Y LIKE X@ syntax is syntactical sugar for the like(X,Y)= function. If you want to support regular expressions in your =LIKE statement, you could override LIKE to do so:

 like_function = proc do |ctx, x, y|
  1 if /#{ x }/ =~ y
 end

 db.create_function( 'like', 2, like_function, nil )
 db.execute( "SELECT url FROM sites WHERE url LIKE '^http:'" )

Custom Aggregates

Aggregates are similiar to functions, but their return is totaled for a set of rows. If you've used much SQL, you've seen these before in the form of count, avg, or sum functions.

To create an aggregate, you provide two procs. One which is called for each row like a function. The other proc is called upon completion of the query and provides a final total.

 sum_up_1 = proc do |ctx, a|
  ctx.properties["sum"] ||= 0
  ctx.properties["sum"] += a.length
 end

 sum_up_2 = proc do |ctx|
  ctx.properties["sum"]
 end

 db.create_aggregate( 'letter_count', 1, 
            sum_up_1, sum_up_2, nil )
 db.execute( "SELECT LETTER_COUNT(address) FROM email" )

The above code totals the letter count for all of the address fields in a set of rows.

So how does SQLite do this? Remember that since SQLite is executed in-process, you can pass memory addresses to it. A function pointer is passed inside the SQLite extension, which calls your proc. I haven't done any benchmarking, but I imagine the figures are pretty tight for these calls.

Storing Binary Data

Storing binary data is a big use case for SQLite. If I was going to write an adventure game in Ruby, I would lodge all my scenes and characters in an SQLite database.

But remember I said that SQLite was typeless? This means that you can't get away with storing binary data in a BLOB. BLOBs, CHARs, TEXTs are all the same datatypes which only store null-terminated strings. SQLite comes with two API functions, sqlite_encode_binary and sqlite_decode_binary, but these aren't implemented in any Ruby APIs currently.

A quick solution is to use Ruby's base64 library. Really, base64 is a bit much, since we really only need to escape '\000' (which is what sqlite_encode_binary does). Until we can get those function exposed, though, certainly use base64.

Let's declare our table with a BLOB to indicate that we plan to store binary data and to give our table some degree of portability.

 db.execute << SQL

  CREATE TABLE scenes (
   idx INTEGER PRIMARY KEY,
   background_png BLOB
  );

 SQL

To store binary data in our table:

 require 'base64'
 background_png = File.open( 'background.png' ).read
 db.execute( "INSERT INTO scenes (background_png) VALUES " + 
       "('#{ encode64( background_png ) }');" )

To read binary data from our table and write it out to files:

 db.execute( "SELECT * FROM scenes" ) do |scene|
  background_png = decode64( scene['background_png'] )
  File.open( "back-#{ idx }.png", "w" ) do |back_out|
   back_out << background_png
  end
 end

Alternatively (if you're mental), you could load the schema for your database and parse out the blobs. Try this query, after creating the scenes table:

 SELECT sql FROM
  (SELECT * FROM sqlite_master UNION ALL
   SELECT * FROM sqlite_temp_master)
 WHERE tbl_name = 'scenes' AND type != 'meta'

You'll receive the CREATE TABLE statement we used to create the table. BLOBs could be parsed out when the database is loaded and handled differently. (To myself: why am I even suggesting this?! Probably to demonstrate metadata access without having to write a new section on it!)

Conclusion

Hopefully this is a fitting introduction to SQLite in Ruby. If not, please contact me and spew wisdom.

by why the lucky stiff

july 2, 2003