Amazon DynamoDB

Last week, Amazon announced the launch of a new product, DynamoDB. Within the same day, Mitch Garnaat released support for DynamoDB in Boto. I worked with Mitch to add some additional features and to work out some of DynamoDB's more interesting quirks, such as provisioned throughput and what exactly it means to read and write to the database.

One very interesting and confusing part that I discovered was how Amazon actually measures this provisioned throughput. When creating a table (or at any time in the future), you set up a provisioned amount of "Read" and "Write" units individually. At a minimum, you must have at least 5 Read and 5 Write units provisioned. What isn't as clear, however, is that read and write units are measured in terms of 1KB operations. That is, if you're reading a single value that's 5KB, that counts as 5 Read units (same with Write). If you choose eventually consistent reads, you're charged for half of a read operation, so you can essentially get double your provisioned read throughput if you're willing to put up with only eventually consistent reads. Writes are always charged in full.
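To make the 1KB accounting concrete, here is a minimal sketch of the read-side arithmetic. The function name `read_units` is made up for illustration, and rounding an item's size up to the next whole KB is an assumption on my part, not something spelled out above.

```python
import math


def read_units(item_size_bytes, eventually_consistent=False):
    """Estimate the Read units consumed by reading one item.

    Assumption: sizes are rounded up to the next whole KB, and an
    eventually consistent read costs half as much.
    """
    kb = max(1, math.ceil(item_size_bytes / 1024))
    return kb / 2 if eventually_consistent else float(kb)


# A single 5KB value read with full consistency, then eventually consistent.
print(read_units(5 * 1024))
print(read_units(5 * 1024, eventually_consistent=True))
```

With 5 provisioned Read units per second, that works out to one consistent 5KB read per second, or two eventually consistent ones.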

OK, so read operations are essentially just look-up operations. This is a database after all, so we're probably not just going to be looking up items we already know, right?

Wrong.

Amazon does offer a "Scan" operation, but they state that it is very "expensive". This isn't just in terms of speed, but also in terms of provisioned throughput. A Scan operation iterates over every item in the table, then filters the results using some very crude filtering options that are nothing close to the SQL-like queries that SDB or any relational database offers. What's worse, a single Scan operation can operate on up to 1MB of data at a time. Since Scan operates only in eventually consistent mode, that means it can use up to 500 Read units in a single operation (1,000KB / 2 for eventual consistency = 500). If you have 5 provisioned Read units per second, that means you're going to have to wait 100 seconds (almost 2 minutes) before you can perform another Read operation of any sort.
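The arithmetic in that paragraph can be sketched in a few lines. Both helper names are hypothetical, and the model simply assumes that consumed units have to be "paid back" at the provisioned rate before another read goes through.

```python
def scan_read_units(scanned_kb, eventually_consistent=True):
    """Read units consumed by a Scan that touches `scanned_kb` of data."""
    return scanned_kb / 2 if eventually_consistent else float(scanned_kb)


def seconds_to_recover(units_consumed, provisioned_units_per_second):
    """Seconds of provisioned capacity used up by one operation."""
    return units_consumed / provisioned_units_per_second


units = scan_read_units(1000)        # one full 1MB (1,000KB) Scan
wait = seconds_to_recover(units, 5)  # table provisioned at 5 Read units/sec
print(units, wait)
```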

So, if you have 1 million 1KB records in your table, that's approximately 1,000 Scan operations to perform. Assuming you provisioned 1,000 Read units per second, that's roughly 17 minutes to iterate through the entire database if every 1KB is billed as a full Read unit (1,000,000 / 1,000 = 1,000 seconds), or about half that with the eventually consistent discount. Now yes, you could easily increase your provisioned reads to cut that time down significantly, but let's assume that a single Scan operation takes at least 10ms. That still means the fastest you could get through your meager 1 million records is 10 seconds. Now extend that out to a billion records. Scan just isn't effective.
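Here is a rough sketch of that estimate, so you can plug in your own numbers. The function and its parameters are hypothetical, 1MB is treated as 1,000KB to match the figures above, and the 10ms floor per Scan is the same assumption made in the text.

```python
import math

SCAN_LIMIT_KB = 1000  # 1MB per Scan, treated as 1,000KB as in the post


def full_scan_estimate(record_count, record_kb, provisioned_read_units,
                       min_latency_s=0.010, eventually_consistent=True):
    """Return (scans needed, throughput-bound seconds, latency-bound seconds)."""
    total_kb = record_count * record_kb
    scans = math.ceil(total_kb / SCAN_LIMIT_KB)
    units = total_kb / 2 if eventually_consistent else float(total_kb)
    throughput_bound_s = units / provisioned_read_units
    latency_bound_s = scans * min_latency_s
    return scans, throughput_bound_s, latency_bound_s


# 1 million 1KB records at 1,000 provisioned Read units per second.
print(full_scan_estimate(1_000_000, 1, 1000, eventually_consistent=False))
print(full_scan_estimate(1_000_000, 1, 1000, eventually_consistent=True))
```

Billed at full price the throughput bound is 1,000 seconds (roughly 17 minutes); with the eventually consistent halving it is 500 seconds. Either way, the latency floor stays at about 10 seconds no matter how much throughput you buy.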


So what's the alternative? Well, there's this other, rather obscure ability that DynamoDB has: you may set your Primary Key to a Hash and Range key. Using the Query operation, you always need to provide your Hash Key, but you may also provide a condition on the Range Key: Greater than, Less than, Equal to, Greater than or equal to, Less than or equal to, Between, or Starts With.

Unlike Scan, Query only operates on matching records, not all records. This means that you only pay for the throughput of the items that match, not for everything scanned.
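To see why that matters, here is a toy in-memory model of a hash-and-range table. This is not the DynamoDB or Boto API; `ToyTable` and its methods are invented purely to show that a Query can jump straight to the matching range keys, while a Scan has to look at every item.

```python
from bisect import bisect_left


class ToyTable:
    """In-memory stand-in for a hash-and-range table (illustration only)."""

    def __init__(self):
        self._partitions = {}  # hash key -> list of (range key, item), sorted

    def put(self, hash_key, range_key, item):
        rows = self._partitions.setdefault(hash_key, [])
        keys = [r for r, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)      # overwrite an existing key
        else:
            rows.insert(i, (range_key, item))

    def query_starts_with(self, hash_key, prefix):
        """Return (matches, items examined); only matching rows are touched."""
        rows = self._partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        i = bisect_left(keys, prefix)
        matches = []
        while i < len(rows) and rows[i][0].startswith(prefix):
            matches.append(rows[i][1])
            i += 1
        return matches, len(matches)

    def scan(self, predicate):
        """Return (matches, items examined); every row is touched."""
        examined, matches = 0, []
        for rows in self._partitions.values():
            for range_key, item in rows:
                examined += 1
                if predicate(range_key, item):
                    matches.append(item)
        return matches, examined


table = ToyTable()
for name in ["Alice", "Bob", "Sam", "Samuel", "Zed"]:
    table.put("First Name", name, {"name": name})

print(table.query_starts_with("First Name", "Sam"))
print(table.scan(lambda range_key, item: range_key.startswith("Sam")))
```

Both calls return the same two items, but the Query examines 2 rows and the Scan examines all 5.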

So how do you effectively use this operation? Simply put, you have to build your own special indexes. This lends itself to the concept of "Ghost Records", which simply point back to the original record, letting you keep a separate index of the original for specific attributes. Let's assume we're dealing with a record representing a Person. This Person may have several things that identify it, but let's use a unique identifier as their Hash key, with no Range key. Then we'll create several separate Ghost records in a different table. Let's call this table "PersonIndex".


Now if we want to search for someone by their First Name, we simply issue a query with a Hash Key of property = "First Name" and a Range Key of the first name we're looking for, or even use "Starts With" so that "Sam" matches "Samuel". We can also insert "alias" records, for things like "Dick" to match "Richard". Once we retrieve the Index Record, we can use the "Stories" property to go back and retrieve the Person records.
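As a rough illustration of the Ghost Record idea, here is an in-memory sketch. The sample people, the `person_ids` attribute (standing in for the pointer property on the index record), and the `find_people` helper are all hypothetical. The loop over the index is only a stand-in for a real Query, which would not walk the whole table.

```python
# Person table: the hash key is a unique identifier, no range key.
people = {
    "p1": {"first_name": "Samuel"},
    "p2": {"first_name": "Samantha"},
    "p3": {"first_name": "Richard"},
}

# PersonIndex table: (hash key, range key) -> ghost record pointing back.
person_index = {
    ("First Name", "Samuel"): {"person_ids": ["p1"]},
    ("First Name", "Samantha"): {"person_ids": ["p2"]},
    ("First Name", "Richard"): {"person_ids": ["p3"]},
    ("First Name", "Dick"): {"person_ids": ["p3"]},  # alias record
}


def find_people(prop, prefix):
    """Look up ghost records by prefix, then fetch each Person they point to."""
    ids = []
    for (hash_key, range_key), ghost in sorted(person_index.items()):
        if hash_key == prop and range_key.startswith(prefix):
            for person_id in ghost["person_ids"]:
                if person_id not in ids:
                    ids.append(person_id)
    return [people[person_id] for person_id in ids]


print(find_people("First Name", "Sam"))
print(find_people("First Name", "Dick"))
```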

So now to search for a record it takes us one Read operation to search, and 1 Read operation for each matching record, which is a heck of a lot cheaper than one million! The only negative is that you also have to maintain this secondary table of indexes. Keeping these indexes up to date is the hardest part of maintaining your own separate indexes. However, if you can do this, you can search and return records within milliseconds instead of seconds, or even minutes.
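Keeping the index in step with the main table might look something like the sketch below. Everything here (`save_person`, `index_add`, `index_remove`, the in-memory dictionaries) is hypothetical. Against a real database the record write and the index writes are separate operations that can fail independently, which is exactly what makes this the hard part.

```python
def index_add(index, prop, value, person_id):
    """Point the ghost record for (prop, value) at person_id."""
    index.setdefault((prop, value), set()).add(person_id)


def index_remove(index, prop, value, person_id):
    """Drop person_id from a ghost record, deleting the record if now empty."""
    ids = index.get((prop, value))
    if ids is None:
        return
    ids.discard(person_id)
    if not ids:
        del index[(prop, value)]


def save_person(people, index, person_id, attrs, indexed=("First Name",)):
    """Write a Person and update only the index entries that changed."""
    old = people.get(person_id, {})
    for prop in indexed:
        if old.get(prop) != attrs.get(prop):
            if prop in old:
                index_remove(index, prop, old[prop], person_id)
            if prop in attrs:
                index_add(index, prop, attrs[prop], person_id)
    people[person_id] = dict(attrs)


people, index = {}, {}
save_person(people, index, "p1", {"First Name": "Samuel"})
save_person(people, index, "p2", {"First Name": "Samuel"})
save_person(people, index, "p1", {"First Name": "Sam"})  # rename p1
print(index)
```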


How are you using or planning to use Amazon DynamoDB?

Comments

Ben said…
Thanks for the write-up. I'm new to NoSQL. Could you give a practical example of how you would use the "Stories" property? Thanks! Ben.

BTW ... posting on blogger.com is a PITA with the requirement to login with google account...and it obviously doesn't keep spam away!
deime said…
Having two persons named "John" is not uncommon. In this case, there will be a need for two identical primary keys in Person Index (hash_key='first name', range_key='John'), but this is not possible from what I read in the docs.

Am I wrong? Are there solutions to this issue?
Unknown said…
deime,

yes, it is not uncommon, but you can save one or more comma-separated person IDs

i.e.
HashKey = FirstName
Range=John
persons=10,100,5,20

where 10,100,5,20 is the list of people with the name John
Unknown said…
You don't need to use a comma separated value for multiple values within a single property. You can actually store a list to any property within DynamoDB. Since there's no limit to the number of properties within a single item on DynamoDB, this is a real win-win situation.
Unknown said…
There is a maximum item size of 64KB though, isn't there? If you had a lot of records, then this wouldn't scale?