Amazon DynamoDB

Last week, Amazon announced the launch of a new product, DynamoDB. That same day, Mitch Garnaat released support for DynamoDB in Boto. I worked with Mitch to add some additional features and to work out some of DynamoDB's more interesting quirks, such as provisioned throughput and what exactly it means to read from and write to the database.

One very interesting and confusing part that I discovered was how Amazon actually measures this provisioned throughput. When creating a table (or at any time afterward), you set a provisioned number of "Read" and "Write" units individually. You must provision at least 5 Read and 5 Write units. What isn't as clear, however, is that Read and Write units are measured in terms of 1KB operations. That is, if you're reading a single value that's 5KB, that counts as 5 Read units (same with Write). If you choose eventually consistent reads, you're charged half a Read unit per 1KB read, so you can essentially double your provisioned read throughput if you're willing to put up with only eventually consistent results. (Consistency mode is a read option; writes are always charged in full.)
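Here's a rough sketch of that arithmetic. It assumes the 1KB unit size described above and that item sizes round up to the next whole KB (the rounding is my assumption). `read_units` is a hypothetical helper for illustration, not part of Boto.

```python
import math

KB = 1024  # assumption: 1KB is counted as 1,024 bytes


def read_units(item_size_bytes, eventually_consistent=False):
    """Estimate the Read units consumed by reading one item."""
    # Assumption: sizes round up, and even a tiny item costs one unit.
    units = max(1, math.ceil(item_size_bytes / KB))
    if eventually_consistent:
        # Eventually consistent reads are charged at half the rate.
        return units / 2
    return units


print(read_units(5 * KB))                              # 5
print(read_units(5 * KB, eventually_consistent=True))  # 2.5
```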

OK, so read operations are essentially just look-up operations. This is a database after all, so we're probably not going to be looking up only items whose keys we already know, right?


Amazon does offer a "Scan" operation, but they state that it is very "expensive". This isn't just in terms of speed, but also in terms of provisioned throughput. A Scan operation iterates over every item in the table, then filters the returned results using some very crude filtering options that are nothing close to what SDB or any relational database offers. What's worse, a single Scan operation can operate on up to 1MB of data at a time. Since Scan operates only in eventually consistent mode, that means it will use up to 500 Read units in a single operation (1,000KB / 2 for eventual consistency = 500). If you have 5 provisioned Read units per second, you're going to have to wait 100 seconds (almost 2 minutes) before you can perform another Read operation of any sort.
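To make the numbers concrete, here is the same calculation as a small sketch. The function names are hypothetical, and I'm rounding 1MB to 1,000KB as above.

```python
SCAN_LIMIT_KB = 1000  # 1MB, rounded to 1,000KB as in the text


def scan_read_units(data_kb=SCAN_LIMIT_KB):
    """Read units used by one Scan call (always eventually consistent)."""
    return data_kb / 2


def seconds_to_recover(units_used, provisioned_per_second):
    """Seconds of provisioned read capacity that the call consumed."""
    return units_used / provisioned_per_second


units = scan_read_units()
print(units)                         # 500.0
print(seconds_to_recover(units, 5))  # 100.0
```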

So, if you have 1 million 1KB records in your table, that's approximately 1,000 Scan operations to perform. Assuming you provisioned 1,000 Read units per second, that's roughly 8 minutes to iterate through the entire database at 500 Read units per Scan (or about 17 minutes if every 1KB were charged a full Read unit). Now yes, you could easily increase your provisioned Read units to cut that time down significantly, but let's assume that a single Scan operation takes at least 10ms. That still means the fastest you could get through your meager 1 million records is 10 seconds. Now extend that out to a billion records. Scan just isn't effective.
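As a back-of-the-envelope sketch, the estimate can be worked through like this. `full_scan_estimate` is a hypothetical helper, and the 10ms per Scan is the assumed latency from the paragraph above. With the half-price eventually consistent rate, the throughput-bound time comes out to 500 seconds (a bit over 8 minutes); charging every 1KB as a full Read unit gives 1,000 seconds (about 17 minutes).

```python
def full_scan_estimate(records, record_kb, provisioned_reads_per_s,
                       scan_limit_kb=1000, scan_latency_ms=10,
                       eventually_consistent=True):
    """Estimate how long it takes to Scan an entire table."""
    total_kb = records * record_kb
    scans = -(-total_kb // scan_limit_kb)  # ceiling division
    read_units = total_kb / 2 if eventually_consistent else total_kb
    return {
        "scans": scans,
        # Time if provisioned throughput is the bottleneck.
        "throughput_bound_s": read_units / provisioned_reads_per_s,
        # Time if per-call latency is the bottleneck.
        "latency_bound_s": scans * scan_latency_ms / 1000,
    }


est = full_scan_estimate(1_000_000, 1, 1000)
print(est["scans"])               # 1000
print(est["throughput_bound_s"])  # 500.0
print(est["latency_bound_s"])     # 10.0
```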

So what's the alternative? Well, DynamoDB has another, rather obscure ability: you may set your Primary Key to a Hash and Range key. With the Query operation you always need to provide your Hash Key, but you may also provide a condition on the Range Key: Greater than, Less than, Equal to, Greater than or equal to, Less than or equal to, Between, or Starts With.

Unlike Scan, Query only operates on matching records, not all records. This means that you only pay for the throughput of the items that match, not for everything scanned.
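To illustrate why this is cheaper, here's a toy in-memory stand-in for a Hash and Range table. This is only a sketch of the idea, not how DynamoDB is implemented and not the Boto API; the class, its methods, and the sample keys are all hypothetical. Range keys are kept sorted within each Hash Key, so a Between or Starts With query only touches the matching slice.

```python
import bisect


class HashRangeTable:
    """Toy model of a table whose Primary Key is a Hash and Range key."""

    def __init__(self):
        self._range_keys = {}  # hash key -> sorted list of range keys
        self._items = {}       # (hash key, range key) -> item

    def put(self, hash_key, range_key, item):
        keys = self._range_keys.setdefault(hash_key, [])
        if (hash_key, range_key) not in self._items:
            bisect.insort(keys, range_key)
        self._items[(hash_key, range_key)] = item

    def query_between(self, hash_key, low, high):
        """Items whose range key is in [low, high], inclusive."""
        keys = self._range_keys.get(hash_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [self._items[(hash_key, k)] for k in keys[start:end]]

    def query_starts_with(self, hash_key, prefix):
        """Items whose range key begins with the given prefix."""
        keys = self._range_keys.get(hash_key, [])
        matches = []
        for k in keys[bisect.bisect_left(keys, prefix):]:
            if not k.startswith(prefix):
                break  # sorted order: nothing later can match
            matches.append(self._items[(hash_key, k)])
        return matches


table = HashRangeTable()
table.put("user-1", "2012-01-18", "first post")
table.put("user-1", "2012-01-20", "second post")
table.put("user-1", "2012-02-01", "third post")
print(table.query_between("user-1", "2012-01-01", "2012-01-31"))
print(table.query_starts_with("user-1", "2012-02"))
```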

So how do you effectively use this operation? Simply put, you have to build your own special indexes. This lends itself to the concept of "Ghost Records", which simply point back to the original record, letting you keep a separate index of the original for specific attributes. Let's assume we're dealing with a record representing a Person. This Person may have several things that identify it, but let's use a unique identifier as their Hash key, with no Range key. Then we'll create several separate Ghost records in a different table. Let's call this table "PersonIndex".

Now if we want to search for someone by their First Name, we simply issue a query with a Hash Key of property = "First Name" and a Range Key of the first name we're looking for, or even use "Starts With" so that "Sam" also matches "Samuel". We can also insert "alias" records, so that "Dick" matches "Richard". Once we retrieve the Index Record, we can use the "Stories" property to go back and retrieve the Person records.
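A minimal sketch of that lookup, using plain dictionaries in place of the two tables. The sample people, the IDs, and the `find_people` helper are hypothetical, and I'm assuming the "Stories" property holds a list of Person Hash keys. Note that this toy version loops over every index entry; a real Query would only touch the matching ones.

```python
# Person table: Hash key (a unique ID) -> record. Sample data is made up.
people = {
    "p1": {"first_name": "Samuel", "last_name": "Smith"},
    "p2": {"first_name": "Richard", "last_name": "Jones"},
}

# PersonIndex ghost records, keyed by (Hash key, Range key).
person_index = {
    ("First Name", "Samuel"): {"Stories": ["p1"]},
    ("First Name", "Richard"): {"Stories": ["p2"]},
    ("First Name", "Dick"): {"Stories": ["p2"]},  # alias record
}


def find_people(prop, prefix):
    """Starts With lookup on the index, then fetch each Person record."""
    ids = []
    for (hash_key, range_key), ghost in sorted(person_index.items()):
        if hash_key == prop and range_key.startswith(prefix):
            for person_id in ghost["Stories"]:
                if person_id not in ids:  # avoid fetching a person twice
                    ids.append(person_id)
    return [people[person_id] for person_id in ids]


print(find_people("First Name", "Sam"))   # Samuel's record
print(find_people("First Name", "Dick"))  # Richard's record, via the alias
```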

So now to search for a record it takes us 1 Read operation to search, and 1 Read operation for each matching record, which is a heck of a lot cheaper than one million! The only negative is that you also have to maintain this secondary table of indexes. Keeping these indexes up to date is the hardest part of maintaining your own separate indexes. However, if you can do this, you can search and return records within milliseconds instead of seconds, or even minutes.
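Here's one way the upkeep might look, again sketched with plain dictionaries rather than real tables. All names here are hypothetical. Every save touches the index as well as the Person record, and a rename has to remove the stale ghost entry. In this sketch the two updates are separate steps, so a failure between them would leave the index out of date.

```python
people = {}        # Person table: ID -> record
person_index = {}  # PersonIndex: (property, value) -> ghost record


def _index_add(prop, value, person_id):
    ghost = person_index.setdefault((prop, value), {"Stories": []})
    if person_id not in ghost["Stories"]:
        ghost["Stories"].append(person_id)


def _index_remove(prop, value, person_id):
    ghost = person_index.get((prop, value))
    if ghost is None:
        return
    if person_id in ghost["Stories"]:
        ghost["Stories"].remove(person_id)
    if not ghost["Stories"]:
        del person_index[(prop, value)]  # drop empty ghost records


def save_person(person_id, first_name):
    """Write the Person record and keep its ghost record in sync."""
    old = people.get(person_id)
    if old is not None and old["first_name"] != first_name:
        _index_remove("First Name", old["first_name"], person_id)
    people[person_id] = {"first_name": first_name}
    _index_add("First Name", first_name, person_id)


save_person("p1", "John")
save_person("p2", "John")  # two people can share one ghost record
save_person("p1", "Jon")   # a rename moves p1 to a different ghost record
print(person_index)
```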

How are you using or planning to use Amazon DynamoDB?


Ben said…
Thanks for the write up. I'm new to noSQL. Could you give a practical example of how you would use the "Stories" property. Thanks! Ben.

BTW ... posting on is a PITA with the requirement to login with google account...and it obviously doesn't keep spam away!
deime said…
Having two persons named "John" is not uncommon. In this case, there will be a need for two identical primary keys in Person Index (hash_key='first name', range_key='John'), but this is not possible from what I read in the docs.

Am I wrong? Are there solutions to this issue?
Unknown said…

Yes, it is not uncommon, but you can save one or more comma-separated person IDs

HashKey = FirstName

where 10,100,5,20 list of people with name John
Unknown said…
You don't need to use a comma separated value for multiple values within a single property. You can actually store a list to any property within DynamoDB. Since there's no limit to the number of properties within a single item on DynamoDB, this is a real win-win situation.
Unknown said…
There is a maximum item size of 64kb though isn't there? If you had a lot of records then this wouldn't scale?