Friday, December 18, 2009

Aeron Chairs

Newstex has a tradition of giving all of its employees something related to the business we do (Authoritative Content Aggregation) every Christmas. One year it was a Kindle, once an Apple TV, and once an iPhone. This year, however, Newstex bought us the most expensive desk chairs ever produced.

It took a while to adjust this chair (there are about 400 different settings to change), but once it's set up it can be relatively comfortable. It's one of those things we all look at yet never want to buy (who spends $850 on a chair?). In any event, this is the exact same chair we used to have at the old RIT DataCenter, back when we had to stay up late running scripts that should have been handled by a good crontab.

I realized today that I've moved quite a way from that original job in the DCO. Back then I was running scripts by hand that could easily have been automated; today I work for a company automating processes that most people would consider far too complex for a computer to handle. Now that we've migrated our entire operations to Amazon Web Services, we have the scalability and availability of the largest companies without any big investment in hardware, and without having to worry about hardware upgrades at all. Just today I performed a live upgrade on our websites without a single second of downtime.

Saturday, December 12, 2009

Paging SDB results in boto

Boto can be a great tool if you're querying SDB: it manages paging automatically, so you don't have to keep querying for the next set of results yourself. If you're building a web-based application, however, you need to handle paging on your own, because simply iterating over a large result set in a single request will eventually time out your connections. To solve this, you can use the paging support built into boto.

Every time you query using "db.select" in boto, you get back a result set. Most people probably think of this as just an iterator, since it does all the magic behind the scenes and only queries SDB once you start iterating. It also stores that magical "next_token" within itself so it can ask SDB for the next page of results. Normally you wouldn't even notice this attribute, but if you're dealing with a service that needs to respond quickly, it can be quite useful.
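To make that lazy behavior concrete, here is a simplified model of a result set that only fetches when you iterate and then remembers the token for the following page. It is a hypothetical sketch, not boto's actual result-set class: `LazyPage`, `fake_fetch`, and the string-offset tokens are stand-ins chosen so it runs without SDB.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: given a token (None means "first page"), return
# the rows on that page and the token for the following page (None = no more).
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


class LazyPage:
    """Simplified model of a lazy, token-aware result set (not boto's class)."""

    def __init__(self, fetch_page: FetchPage, next_token: Optional[str] = None):
        self._fetch_page = fetch_page
        self.next_token = next_token  # None for a fresh query

    def __iter__(self) -> Iterator[str]:
        # Nothing is fetched until iteration starts.
        rows, token = self._fetch_page(self.next_token)
        yield from rows
        # Record the next page's token once this page has been consumed.
        self.next_token = token


# Stand-in backend: 25 rows served 10 at a time; tokens are string offsets.
ROWS = ["item%d" % i for i in range(25)]


def fake_fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    start = int(token) if token else 0
    end = min(start + 10, len(ROWS))
    return ROWS[start:end], (str(end) if end < len(ROWS) else None)


rs = LazyPage(fake_fetch)
print(rs.next_token)   # None: no query has run yet
print(list(rs))        # the first 10 rows
print(rs.next_token)   # 10: where the second page starts
```

Handing `rs.next_token` to a new `LazyPage(fake_fetch, rs.next_token)` yields item10 through item19, which mirrors passing `next_token=` back into boto's select.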

Additionally, there are two important keyword arguments you can pass to the "select" method on any domain: max_items and next_token. The max_items keyword tells boto to stop after it has yielded that many results, instead of handling the paging automatically for you. It's also quite important to add an SDB LIMIT clause with the same value to your query; otherwise boto can stop in the middle of a page of results, and you will lose the results between where it stopped and where the next page begins!
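Why the LIMIT matters can be shown with a small simulation. It assumes, as with SDB, that each page's token points just past the last row the server returned, and it models a client that keeps only max_items rows from each page but still resumes from that token. `server_page`, `client_fetch`, and `collect` are hypothetical names; this simulates the failure mode described above rather than reproducing boto's code.

```python
from typing import List, Optional, Tuple

ROWS = list(range(30))  # a pretend domain holding rows 0..29


def server_page(limit: int, token: Optional[int]) -> Tuple[List[int], Optional[int]]:
    # Hypothetical server: return up to `limit` rows starting at `token`, plus a
    # token pointing just past the last row it returned (None when done).
    start = token or 0
    end = min(start + limit, len(ROWS))
    return ROWS[start:end], (end if end < len(ROWS) else None)


def client_fetch(limit: int, max_items: int,
                 token: Optional[int]) -> Tuple[List[int], Optional[int]]:
    # Hypothetical client: keep only the first `max_items` rows of the page,
    # but hold on to the server's token for the next request.
    rows, next_token = server_page(limit, token)
    return rows[:max_items], next_token


def collect(limit: int, max_items: int) -> List[int]:
    # Page through everything, resuming from each returned token.
    seen: List[int] = []
    token: Optional[int] = None
    while True:
        rows, token = client_fetch(limit, max_items, token)
        seen.extend(rows)
        if token is None:
            return seen


print(len(collect(10, 10)))  # 30: LIMIT matches max_items, nothing lost
print(len(collect(20, 10)))  # 20: rows 10-19 were fetched but never seen
```

With LIMIT 20 and max_items 10, the first page returns rows 0 to 19 but the client only keeps 0 to 9, and the token already points at row 20, so rows 10 to 19 silently disappear.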

Ok, now to the code:


>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.get_domain("default")
>>> rs = db.select("SELECT * FROM `default` LIMIT 10", max_items=10)
Notice that we set "LIMIT" and "max_items" both to 10.

Also note that "rs" is the result set of your select query, but the query only runs once you start iterating, so rs.next_token should be empty for now:

>>> rs.next_token
>>> for i in rs:
...     print(i)
Your first 10 results will print out, and now rs.next_token is set:

>>> rs.next_token
u'r........'
Now you can pass that next_token back to the SAME select. It must be the EXACT same query for next_token to work:
>>> rs2 = db.select("SELECT * FROM `default` LIMIT 10", max_items=10, next_token=rs.next_token)
>>> for i in rs2:
...     print(i)
Your next 10 results will print out.


From this point on, it's a simple process to rinse and repeat. Once you run out of results, rs.next_token will be empty.
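For a web application, the rinse-and-repeat step usually happens once per request: fetch one page, send the token back to the client, and accept it on the next request. Below is a minimal sketch, assuming boto's `Domain.select(query, max_items=..., next_token=...)` keywords as used in the session above; `page_query`, `fetch_page`, and `PAGE_SIZE` are hypothetical helpers for illustration, and `domain` can be any object with that `select` method.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 10  # hypothetical page size; LIMIT and max_items both use it


def page_query(domain_name: str, page_size: int = PAGE_SIZE) -> str:
    # Build the query once per listing and reuse this exact string for every
    # page, since a next_token only works with the query that produced it.
    return "SELECT * FROM `%s` LIMIT %d" % (domain_name, page_size)


def fetch_page(domain, query: str, next_token: Optional[str] = None,
               page_size: int = PAGE_SIZE) -> Tuple[List[object], Optional[str]]:
    """Return one page of results plus the token for the following page.

    A None token in the result means there are no more pages."""
    rs = domain.select(query, max_items=page_size, next_token=next_token)
    items = list(rs)  # iterating runs the query and fills in rs.next_token
    return items, (rs.next_token or None)


# Real usage (needs AWS credentials, so it is not run here):
#   import boto
#   domain = boto.connect_sdb().get_domain("default")
#   query = page_query("default")
#   items, token = fetch_page(domain, query)          # first request
#   items, token = fetch_page(domain, query, token)   # next request
```

Keeping the query builder separate makes it harder to accidentally change the query string between requests, which would break the token.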