Monitor your SDB Domains

This recently came up in the Boto-Users mailing list, so I thought I'd post here a few quick details on how to monitor your SimpleDB domains to prevent them from hitting maximum capacity before you know it.

As you should be aware, SimpleDB has a limit of 10GB per domain. This limit is calculated as the sum of the bytes used by Item Names, unique Attribute Names, and Attribute Values. Fortunately, each Attribute Name is only stored once, so you don't pay for a name being used multiple times; you are only charged per unique name.
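To make that arithmetic concrete, here is a minimal sketch. The function name is hypothetical, and treating "10GB" as 10 * 1024**3 bytes is my assumption; adjust the constant if you prefer to count in decimal gigabytes.

```python
# Assumption: "10GB" means 10 * 1024**3 bytes. Change this if you count in decimal GB.
DOMAIN_LIMIT_BYTES = 10 * 1024 ** 3


def domain_usage(item_names_size, attr_names_size, attr_values_size):
    """Return (total_bytes, fraction_of_limit) for a single domain.

    The three sizes may be ints or numeric strings; they are simply summed.
    """
    total = int(item_names_size) + int(attr_names_size) + int(attr_values_size)
    return total, float(total) / DOMAIN_LIMIT_BYTES
```

A returned fraction of 0.5 would mean the domain is halfway to the limit under that assumption.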

You can get all of the Usage information about your SDB Domain using the get_metadata function of a boto domain.

>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.lookup("my-domain")
>>> md = db.get_metadata()


This "md" object then contains the following elements:


md.item_names_size
md.attr_names_size
md.attr_values_size
md.item_count


I wrote a simple script to check my domains, which takes an optional list of domain names to check as arguments. If you don't pass in a domain name, it will iterate over all of them and show you any domain that uses more than 3GB:



#!/usr/bin/env python
"""
Check script to make sure none of our domains are close to the size limit
"""
from __future__ import print_function  # print() then works on Python 2 and 3
import sys

import boto

if __name__ == "__main__":
    sdb = boto.connect_sdb()
    if len(sys.argv) > 1:
        # Domains named on the command line: limit of 0, so any non-empty one is shown
        query = [sdb.lookup(n) for n in sys.argv[1:]]
        limit = 0
    else:
        # No arguments: scan every domain and show only those over 3GB
        query = sdb.get_all_domains()
        limit = 3000000000
    for db in query:
        if db is None:
            # lookup() returns None when the domain does not exist
            continue
        md = db.get_metadata()
        total = int(md.item_names_size) + int(md.attr_names_size) + int(md.attr_values_size)
        if total > limit:
            print(db.name)
            print("\tItems:", md.item_count)
            print("\tItem Name Size:", md.item_names_size)
            print("\tAttribute Name Size:", md.attr_names_size)
            print("\tAttribute Values Size:", md.attr_values_size)
            print("\t---------------------------------------------")
            print("\tTOTAL:", total)




If your domain is getting close to the 10GB limit, you can use this to track down what's using the most space. In my case, I was adding a lot of unnecessary items that were almost completely blank, so my item_count was huge and my item_names_size was over 7GB.
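If you want the breakdown spelled out rather than eyeballing the raw numbers, something like the following could help. This is only a sketch: the size_breakdown helper is hypothetical, not part of boto, and it works on plain numbers so you can feed it the four values from the "md" object.

```python
def size_breakdown(item_count, item_names_size, attr_names_size, attr_values_size):
    """Return (shares, avg_bytes_per_item, largest) for one domain.

    shares maps each component to its percentage of the total size,
    largest is the name of the component using the most bytes.
    """
    sizes = {
        "item_names": int(item_names_size),
        "attr_names": int(attr_names_size),
        "attr_values": int(attr_values_size),
    }
    total = sum(sizes.values())
    # Guard against empty domains so we never divide by zero.
    shares = dict((k, 100.0 * v / total if total else 0.0) for k, v in sizes.items())
    items = int(item_count)
    avg_bytes_per_item = float(total) / items if items else 0.0
    largest = max(sizes, key=sizes.get)
    return shares, avg_bytes_per_item, largest
```

With boto you would call it as size_breakdown(md.item_count, md.item_names_size, md.attr_names_size, md.attr_values_size). A very low average per item combined with a large item_names share points to the kind of nearly blank items described above.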


Of course, if you do need all these items, you should consider sharding your domain into multiple sub-domains. This is usually done by picking one attribute that splits your items cleanly into different segments, and using that value as the domain name. Unfortunately, you cannot query across multiple domains, so you have to be very careful about what you choose as your shard key.
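As a rough illustration of that idea, here is a sketch that maps each item to a domain name derived from its shard key. Everything here is an assumption for the example: the helper names, the "base-value" naming scheme, and the conservative character set (letters, digits, "_", "-", "."), which I believe matches SimpleDB's domain name rules but is worth checking against the documentation.

```python
import re


def shard_domain_name(base, shard_value):
    """Build a per-shard domain name, e.g. 'orders-2010' (hypothetical scheme)."""
    # Replace anything outside a conservative character set with '_'.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", str(shard_value))
    return "%s-%s" % (base, safe)


def group_by_shard(items, shard_key, base):
    """Group item dicts by the domain each one would be written to."""
    groups = {}
    for item in items:
        name = shard_domain_name(base, item[shard_key])
        groups.setdefault(name, []).append(item)
    return groups
```

Each key of the returned dict is a domain you would create and write to separately. Since queries cannot span domains, a good shard key is one that every query you run already filters on.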

