Creating Sequences using SDB

Thanks to the new consistent read and conditional put functionality in Amazon's SimpleDB, we can now choose consistency over raw scalability when we need it. Previously, if we wanted a database that was guaranteed to be consistent right away, our only choices were to run our own SQL database or to use RDS.

But why would you want to trade performance and availability for consistency? It's quite simple: if you've ever tried to generate sequential numbers for any reason (typically because people don't like using random UUIDs), then you know that doing it safely requires some kind of locking mechanism. Since SDB was previously only eventually consistent, it was impossible to use such a database for that purpose.

Thanks largely to Mitch Garnaat's post about how to create Counters, I've been able to create a "Sequence" object for boto that will allow you to persist a SequenceGenerator into SDB, and use it reliably across multiple locations, threads, and processes. This new functionality is now in boto.
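
To see why consistency matters here, this is a minimal sketch of the "read, increment, write only if nothing changed" loop that this kind of counter relies on. It doesn't touch SDB at all: InMemoryStore, put_if, next_value, and run_workers are names invented for illustration, and the in-memory dict only stands in for a store that offers consistent reads and conditional writes. It is not how boto implements Sequence.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for a consistent key-value store."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        # A consistent read: always returns the latest written value.
        with self._lock:
            return self._data.get(key)

    def put_if(self, key, new_value, expected):
        """Write new_value only if the stored value still equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False  # someone else got there first
            self._data[key] = new_value
            return True


def next_value(store, key):
    """Read, increment, and conditionally write; retry on conflict."""
    while True:
        current = store.get(key)
        candidate = 0 if current is None else current + 1
        if store.put_if(key, candidate, expected=current):
            return candidate


def run_workers(store, key, workers=4, per_worker=50):
    """Draw numbers from several threads and return them sorted."""
    handed_out = []
    guard = threading.Lock()

    def work():
        for _ in range(per_worker):
            value = next_value(store, key)
            with guard:
                handed_out.append(value)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(handed_out)
```

With four threads drawing fifty numbers each, run_workers(InMemoryStore(), "orders") hands out every number from 0 to 199 exactly once. Without the conditional write, two threads could read the same value and both hand out the same number.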

Using this new sequence object is relatively simple. First, if you have a [DB] section already in your boto.cfg, it's easy to set up a default domain for your sequences. The Sequence object will look first for a key "sequence_db", and if that doesn't exist it will fall back to "db_name" which is used by the rest of the boto.sdb.db module as well. A sample config section would look like this:


[DB]
db_name = default
sequence_db = sequences
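
The fallback rule described above can be sketched with nothing but the standard library. This is only an illustration of the lookup order ("sequence_db" first, then "db_name"); resolve_sequence_domain and SAMPLE_CFG are hypothetical names, and boto reads its own config through its own machinery rather than this function.

```python
import configparser

# The sample [DB] section from above, inlined so the sketch is self-contained.
SAMPLE_CFG = """
[DB]
db_name = default
sequence_db = sequences
"""


def resolve_sequence_domain(cfg_text):
    """Return sequence_db if set, else db_name, else None."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("DB"):
        return None
    # Prefer the sequence-specific key, then the shared default domain.
    default_domain = parser.get("DB", "db_name", fallback=None)
    return parser.get("DB", "sequence_db", fallback=default_domain)
```

With the sample section, resolve_sequence_domain(SAMPLE_CFG) gives "sequences"; remove the sequence_db line and it gives "default" instead.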


Next, it's time to launch Python and start playing around.



>>> from boto.sdb.db.sequence import Sequence
>>> s = Sequence() # Note that we can pass in an optional name
>>> s.id # but if we don't it just uses a UUID
'1ce3eb7b-3fdd-4c60-b243-ec33019090bd'
>>> s.val # The value is set to the first value in our set
0
>>> s.next() # Let's get the next value in this set
1
>>> s2 = Sequence(s.id) # Let's load up this set in another object
>>> s2.val # The value should be the same, even if this was somewhere else
1
>>> s.next() # We increment our first object
2
>>> s2.val # And when we look at the second object it's also incremented
2
>>> s.delete()

So that's all fine and dandy if we're just using a simple sequence, but what if we want to do something more complicated, like a Fibonacci sequence? Lucky for us, this is built into the sequence module:



>>> from boto.sdb.db.sequence import fib
>>> s = Sequence(fnc=fib)
>>> s.val
1
>>> s.next()
1
>>> s.next()
2
>>> s.next()
3
>>> s.next()
5
>>> s.next()
8
>>> s.delete()

But what is this "fnc" argument, you ask? Quite simply, the Sequence object allows you to pass in a custom function that determines how to get the next value in the sequence. This function is passed both the current value and the previous value in the sequence. The Fibonacci function, which you could have written yourself, looks like this:

def fib(cv=1, lv=0):
    """The Fibonacci sequence; this incrementer uses the
    last value as well as the current one."""
    # None means "no value yet", so fall back to the seed values
    if cv is None:
        cv = 1
    if lv is None:
        lv = 0
    return cv + lv


The important thing to remember is that the function must return the first value in the sequence when both values passed into it are "None". The first value passed in is the "current" value of the sequence, and the second is the "last" or "previous" value, the one that was in the sequence just before the current value.
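
As an illustration of that contract, here's a small custom incrementer together with a throwaway driver. powers_of_two and take are hypothetical names, and take only approximates how a Sequence would feed the current and previous values back into the function; it keeps everything in memory and never talks to SDB. The fib function is repeated from above so the sketch runs on its own.

```python
def powers_of_two(cv=None, lv=None):
    """Hypothetical incrementer: 1, 2, 4, 8, ..."""
    if cv is None:
        return 1  # nothing stored yet, so return the first value
    return cv * 2


def fib(cv=1, lv=0):
    """The Fibonacci incrementer from the post, repeated for the demo."""
    if cv is None:
        cv = 1
    if lv is None:
        lv = 0
    return cv + lv


def take(fnc, count):
    """Hypothetical driver: return the first `count` values (count >= 1)
    by feeding (current, previous) back into fnc."""
    current, previous = fnc(None, None), None
    values = [current]
    for _ in range(count - 1):
        current, previous = fnc(current, previous), current
        values.append(current)
    return values
```

Here take(powers_of_two, 5) gives [1, 2, 4, 8, 16], and take(fib, 6) gives [1, 1, 2, 3, 5, 8], the same values as the session above.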


So this is great if you're dealing with integers, but what if you want to increment a string, a double, or for that matter any arbitrary sequence? Lucky for us, the cast type is determined automatically, so whatever types you put into your sequence are the types that come back out of it. So, for example, if you have a string sequence that you want to increment easily, you can use the "increment_string" function:



>>> from boto.sdb.db.sequence import increment_string
>>> s = Sequence(fnc=increment_string)
>>> s.val
'A'
>>> s.next()
'B'
>>> s.next()
'C'
>>> s.val = "Z"
>>> s.next()
'AA'
>>> s.delete()
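
SimpleDB stores every attribute value as a string, so something has to cast values back on the way out. One plausible way to pick the cast type automatically is to look at the type of the first value the incrementer produces. The sketch below assumes exactly that; infer_item_type, restore, and the three sample incrementers are hypothetical names, and this shouldn't be read as a description of boto's internals.

```python
def infer_item_type(fnc):
    """Guess the value type from the first value the incrementer yields."""
    return type(fnc(None, None))


def restore(stored_text, fnc):
    """Cast a stored string back to the sequence's native type."""
    return infer_item_type(fnc)(stored_text)


def count_up(cv=None, lv=None):
    """Integer sequence: 0, 1, 2, ..."""
    return 0 if cv is None else cv + 1


def half_steps(cv=None, lv=None):
    """Float sequence: 0.0, 0.5, 1.0, ..."""
    return 0.0 if cv is None else cv + 0.5


def letters(cv=None, lv=None):
    """String sequence starting at 'A' (no wrap-around handling)."""
    return "A" if cv is None else chr(ord(cv[-1]) + 1)
```

Under that assumption, restore("41", count_up) comes back as the integer 41, restore("2.5", half_steps) as the float 2.5, and restore("B", letters) stays the string "B".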

So what's the magic of this "increment_string" function? Let's take a look:

increment_string = SequenceGenerator("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

What's this SequenceGenerator stuff? Quite simply, you can pass in either a string or a list, and it'll use that to determine what the next value in the sequence should be. You can also pass in an optional value called "rollover". When it's true, the sequence wraps back to the initial value once it reaches the end instead of growing by another character, so instead of going from "Z" to "AA", it would go back to "A":

>>> from boto.sdb.db.sequence import SequenceGenerator
>>> s = Sequence(fnc=SequenceGenerator("ABC", True))
>>> s.val
'A'
>>> s.next()
'B'
>>> s.next()
'C'
>>> s.next()
'A'
>>> s.delete()
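
If you're curious how that kind of generator can work, here's a simplified sketch of the idea. make_generator is a hypothetical name, it assumes every symbol is a single character, and it's a rough re-creation of the behavior shown above rather than boto's actual SequenceGenerator code.

```python
def make_generator(alphabet, rollover=False):
    """Hypothetical, simplified SequenceGenerator-style callable.

    Assumes each symbol in `alphabet` is a single character.
    """
    symbols = list(alphabet)

    def step(val=None, last=None):
        if not val:
            return symbols[0]  # empty or None: start of the sequence
        head, tail = val[:-1], val[-1]
        if tail != symbols[-1]:
            # Ordinary case: bump the final symbol.
            return head + symbols[symbols.index(tail) + 1]
        if rollover:
            # Wrap around in place without growing the string.
            return head + symbols[0]
        # Carry: advance the prefix (or start one) and reset the last symbol.
        return step(head) + symbols[0]

    return step
```

With make_generator("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), "Z" steps to "AA" and "AZ" steps to "BA"; with make_generator("ABC", rollover=True), "C" steps back to "A".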

With the new SequenceGenerator and the Sequence object now available in boto.sdb.db, you should be able to generate just about any sequence you can think of and keep it in SDB.


Comments

Unknown said…
Just what I needed, thanks.

using sequence to generate consistent order numbers between redundant instances behind an ELB.

Shared file space seemed like overkill just to keep consistent order numbers, and this fit the bill perfectly.

just had to keep consistent Sequence name and remove the clean up routine so it wouldn't delete the current sequence on exit.

I've tried to blow it up, running several at a time getting numbers and so far has not skipped a beat.

Thanks for a cool module.