Boto can be a great tool if you're querying against SDB (Amazon SimpleDB), and it helps you out by managing paging automatically, so you don't have to keep querying for the next set of results yourself. If you're building a web-based application, however, you have to handle paging on your own, because simply iterating over a large result set until it ends will eventually time out your connections. To solve this, you can use the paging support built into boto.
Every time you query using "db.select" in boto, you get back a result set. Most people probably just think of this as an iterator, since it does all the magic behind the scenes and only sends the query once you start iterating. It also stores that magical "next_token" attribute, which it uses to request the next page of results from SDB. Normally you wouldn't even notice this attribute, but if you're dealing with a service that needs to respond quickly, it can be quite useful.
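To make the "lazy iterator that remembers next_token" idea concrete, here is a toy model in plain Python. It is a simplified sketch, not boto's actual code: LazyResultSet, make_fake_fetch and fetch_page are hypothetical names, and the fake "tokens" are just string offsets, whereas real SDB tokens are opaque strings.

```python
class LazyResultSet(object):
    """Toy model of a lazily evaluated, paged result set (hypothetical).

    fetch_page(next_token) stands in for one SDB Select request and must
    return (items, next_token), with next_token None on the last page.
    """

    def __init__(self, fetch_page, next_token=None):
        self.fetch_page = fetch_page
        self.next_token = next_token  # where the next request will resume
        self.calls = 0                # number of simulated requests sent

    def __iter__(self):
        while True:
            # No request is made until iteration actually starts.
            items, self.next_token = self.fetch_page(self.next_token)
            self.calls += 1
            for item in items:
                yield item
            if self.next_token is None:
                return  # no more pages


def make_fake_fetch(data, page_size):
    """Hypothetical stand-in for SDB: serves `data` in fixed-size pages."""
    def fetch_page(token):
        start = int(token) if token else 0
        end = start + page_size
        token = str(end) if end < len(data) else None
        return data[start:end], token
    return fetch_page
```

Creating the object sends nothing; only iterating it triggers the (simulated) requests, one per page, and next_token ends up None once everything has been read.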
Additionally, there are two important keyword arguments you can pass to the "select" method on any domain: max_items and next_token. The max_items keyword tells boto to stop after it has yielded that many results, instead of handling the paging automatically for you. It's also quite important to add a matching LIMIT clause to your SDB query; otherwise SDB can send back a page larger than max_items, boto will stop in the middle of that page, and you will lose the rest of that page's results!
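One way to make the "LIMIT must match max_items" rule hard to get wrong is a small helper that builds the LIMIT clause from the same page size it passes as max_items. This is a minimal sketch assuming a boto SDB Domain whose select accepts max_items and next_token keywords, as described above; select_page is a hypothetical name, and the query you pass in must not already contain a LIMIT clause.

```python
def select_page(domain, query, page_size, next_token=None):
    """Fetch one page of at most page_size items (hypothetical helper).

    Appends LIMIT itself so it always equals max_items, and builds the
    query string identically on every call so next_token stays valid.
    Returns (items, next_token); next_token is empty once results run out.
    """
    paged_query = "%s LIMIT %d" % (query, page_size)
    rs = domain.select(paged_query, max_items=page_size,
                       next_token=next_token)
    items = list(rs)  # iterating is what actually sends the request
    return items, rs.next_token
```

In a web handler you could call select_page once per request, return the items, and hand next_token back to the client so the following request can resume from it.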
Ok, now to the code:
>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.get_domain("default")
>>> rs = db.select("SELECT * FROM `default` LIMIT 10", max_items=10)
Notice that we set "LIMIT" and "max_items" both to 10.
Also note that "rs" is the result set of your select query, but the query only runs once you start iterating, so rs.next_token is still empty (None) at this point:
>>> rs.next_token
>>> for i in rs:
...     print(i)
...
Your first 10 results will print out, and now rs.next_token is set:
>>> rs.next_token
u'r........'
Now you can pass that next_token back to the SAME select. It must be the EXACT same query for next_token to work:
>>> rs2 = db.select("SELECT * FROM `default` LIMIT 10", max_items=10, next_token=rs.next_token)
>>> for i in rs2:
...     print(i)
...
Your next 10 results will print out.
After you get to this point, it's a simple process to rinse and repeat. Once you run out of results, rs.next_token will be empty.
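The rinse-and-repeat step can be written as a loop that keeps feeding next_token back into the same query until it comes back empty. This is a sketch under the same assumptions as the session above (a boto SDB Domain whose select takes max_items and next_token); iter_pages is a hypothetical name, and the query passed in must not already contain LIMIT.

```python
def iter_pages(domain, query, page_size):
    """Yield one list of items per SDB page until next_token runs out.

    Hypothetical helper: `domain` is assumed to be a boto SDB Domain (or
    anything with a compatible select method).
    """
    # Build the query once so every request sends the exact same string,
    # with LIMIT always equal to max_items.
    paged_query = "%s LIMIT %d" % (query, page_size)
    next_token = None
    while True:
        rs = domain.select(paged_query, max_items=page_size,
                           next_token=next_token)
        page = list(rs)  # iterating sends the request
        if page:
            yield page
        next_token = rs.next_token
        if not next_token:  # None or empty: no more results
            return
```

Looping like this inside a single web request brings back the timeout problem from the start of the post, so in a web app you would typically run just one iteration per request and return next_token to the client instead.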