In many situations, auxiliary tables are faster and more appropriate than SQL with computations. To illustrate, consider a classic problem. You have a theater and a bunch of seats you wish to sell for a performance (or think of seats for an airline flight). The seats have a sequential serial number from 1 to (n) for inventory. But in the theater building and on the tickets, the seats are arranged in rows of (k) seats and referenced by the pair (row_nbr, seat_within_row_nbr).

Joe Celko, Contributor

November 5, 2007

6 Min Read

In many situations, auxiliary tables are faster and more appropriate than SQL with computations (a topic I address at length this week in "Celko on SQL: Auxiliary Tables vs. Declarative Coding"). To illustrate, consider a classic problem. You have a theater and a bunch of seats you wish to sell for a performance (or think of seats for an airline flight). The seats have a sequential serial number from 1 to (n) for inventory. But in the theater building and on the tickets, the seats are arranged in rows of (k) seats and referenced by the pair (row_nbr, seat_within_row_nbr). You could construct an auxiliary table like this:CREATE TABLE Theater - rows of six seats (serial_nbr INTEGER NOT NULL PRIMARY KEY, row_nbr INTEGER NOT NULL CHECK (MOD(serial_nbr, 6) +1 = row_nbr), seat_within_row_nbr INTEGER NOT NULL, CHECK (serial_nbr / 6) = seat_within_row_nbr), UNIQUE (row_nbr, seat_within_row_nbr), seat_status INTEGER DEFAULT 0 NOT NULL CHECK (seat_status IN (0, 1)), - 0= available, 1= sold);

This is hardwired for rows of six seats, so it is not flexible. I also have two keys for the table. It is fine to have more than one key, defined by a PRIMARY KEY and some UNIQUE-NOT NULL columns. But it is a symptom of a possible design problem. In this case, the (row_nbr, seat_within_row_nbr) key is computed from (serial_nbr). The computation is not that complicated - we are using basic integer math, not calculus here!

CREATE VIEW TheaterLayout (serial_nbr, row_nbr, seat_within_row_nbr, seat_status) AS SELECT serial_nbr, (MOD(serial_nbr, 6) +1, (serial_nbr / 6), seat_status FROM Seats;

Where we have a simple table:

CREATE TABLE Seats (serial_nbr INTEGER NOT NULL PRIMARY KEY, seat_status INTEGER DEFAULT 0 NOT NULL CHECK (seat_status IN (0, 1)), - 0= available, 1= sold);

The seat_status can be extended to include 'permanent reservations', 'out for repairs' and so forth. That is why it is an integer and not an assembly-language style bit flag (BIT and BIT VARYING are deprecated in Standard SQL and were never a good idea). The idea is that any seat_status > 0 is not available.

The classic problem with these seats is to find blocks of (k) contiguous seats where (k BETWEEN 1 AND 6) so you can sell some tickets. What could we do for a table-driven approach? Well, there are 2^6 = 64 possible ways to mix available and unavailable seats in a single row. You can build a small table of patterns with that small a sample using 1s and 0s.

Doing that we would get {(000000)} for an entire available row, {(100000), (000001)} for a row with five available seats and so forth. But look at the patterns (001100) and (010011) which have multiple groups of one and two contiguous seats, and {(000111), (100011), (110001), (11100)} which have four groups of three contiguous seats.

What if we introduce the convention that in patterns, 0= available, 1= not available and ?= do not care? The patterns for

One seat = {(01????), (101???), (?101??), (??101?), (???101). (????10)} Two seats = {(001???), (1001??), (?1001?), (??1001), (???100)} Three seats = {(0001??), (10001?), (?10001), (??1000)} Four seats = {(00001?), (100001), (?10000)} Five seats = {(000001), (100000)} Six seats = {(000000)}

That is such a small set (6+5+4+3+2+1 = 21 patterns) that you could hardwire it into a (rather ugly) CASE expression that uses the desired block size as a parameter. Or I could build a normalized table like this:

CREATE TABLE SeatPatterns (pattern_id INTEGER NOT NULL PRIMARY KEY, seat_within_row_nbr INTEGER NOT NULL CHECK (seat_within_row_nbr BETWEEN 1 AND 6), pattern_status INTEGER DEFAULT -1 NOT NULL CHECK(pattern_status IN (-1, 0, +1));

I am using -1 for '?' so we can compare the pattern and seat status codes with the expression (ABS(pattern_status) * seat_status) = SIGN(seat_status)) in a predicate that you have to figure out as this month's second puzzle.

The SIGN() or signum function will convert all the non-zero seat status codes to <> one. The multiplication and ABS() is a Boolean/bit AND operator in disguise for speed. You can make it run really, really fast! Those low-level math operators are one machine instruction in most modern hardware. Having said all of this, I will now tell you that we are being too clever for our own good. Your code will be a pain to write, read and maintain. Any speed gains will be lost the first time you have to maintain this code. I assigned it as a learning experience (teacher-speak for "torture the students"). It might be really fast, but what if we want to rearrange the seating to have row with more or less than six seats across? This thing is like a Polar Bear in the Caribbean - very well adapted to its original environment but a total failure outside of it. The patterns tables will not scale very well. For (n) seats per row, you will have ((n * (n+1))/2) patterns. That is better than (n!) but it is still a burden when you have to generate separate tables for all possible values of (n). And be ready to add the (n+1) table without warning. Having told you that pattern tables are a bad idea, here are three puzzler challenges: 1. Write a lookup table for (n = 10) that finds all runs of seats between 1 and 10 on a row. 2. Write a stored procedure or parameterized query for a theater with 10 seats per row. 3. Can you generalize that query for any value of (n)? I'll post my own responses to these challenges next week. In the meantime, the first to post decent, working responses to all three challenges in the comment field below will get a free copy of one my books on SQL. Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).In many situations, auxiliary tables are faster and more appropriate than SQL with computations. To illustrate, consider a classic problem. You have a theater and a bunch of seats you wish to sell for a performance (or think of seats for an airline flight). The seats have a sequential serial number from 1 to (n) for inventory. But in the theater building and on the tickets, the seats are arranged in rows of (k) seats and referenced by the pair (row_nbr, seat_within_row_nbr).

About the Author(s)

Never Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.

You May Also Like


More Insights