Last week, Amazon announced the launch of a new product, DynamoDB. Within the same day, Mitch Garnaat released support for DynamoDB in Boto. I worked with Mitch to add some additional features and to work out some of DynamoDB's more interesting quirks, such as provisioned throughput and what exactly it means to read from and write to the database.
One very interesting and confusing part that I discovered was how Amazon actually measures this provisioned throughput. When creating a table (or at any time afterwards), you provision separate amounts of "Read" and "Write" units. At a minimum, you must have at least 5 Read and 5 Write units provisioned. What isn't as clear, however, is that read and write units are measured in terms of 1KB operations. That is, if you're reading a single value that's 5KB, that counts as 5 Read units (and likewise for writes). If you choose to operate in eventually consistent mode, reads are charged at half a unit, so you can essentially double your provisioned read throughput if you're willing to put up with eventually consistent reads.
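To make that accounting concrete, here's a small sketch of the unit math described above. The function name and the round-up-to-1KB behavior are my own illustration of the rules, not an AWS API:

```python
import math

def read_units(item_size_kb, eventually_consistent=False):
    """Read units consumed by a single item read, under the 1KB-per-unit
    accounting described above. Item size rounds up to the next whole KB."""
    units = math.ceil(item_size_kb)
    # Eventually consistent reads are billed at half the rate.
    return units / 2 if eventually_consistent else units

print(read_units(5))        # a 5KB read costs 5 units
print(read_units(5, True))  # or 2.5 units if eventually consistent
print(read_units(0.5))      # even a tiny item costs a full unit
```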
OK, so read operations are essentially just look-up operations. This is a database, after all, so we're probably not only going to be looking up items we already know, right?
Wrong.
Amazon does offer a "Scan" operation, but they state that it is very "expensive". This isn't just in terms of speed, but also in terms of provisioned throughput. A Scan operation iterates over every item in the table, then filters the returned results using some very crude filtering options, nothing close to full SQL (or to what SimpleDB or any relational database offers). What's worse, a single Scan operation can read up to 1MB of data at a time. Since Scan operates only in eventually consistent mode, that means it can consume up to 500 Read units in a single operation (1,000KB / 2 for eventual consistency = 500). If you have 5 provisioned Read units per second, that means you're going to have to wait 100 seconds (almost 2 minutes) before you can perform another Read operation of any sort.
So, if you have 1 million 1KB records in your table, that's approximately 1,000 Scan operations to iterate over everything. At up to 500 Read units each, that's about 500,000 Read units in total; with 1,000 provisioned Read units per second, it would take roughly 8 minutes to iterate through the entire table. Now yes, you could increase your provisioned reads to cut that time down significantly, but let's assume that at a minimum a single Scan operation takes at least 10ms. That still means the fastest you could get through your meager 1 million records is 10 seconds. Now extend that out to a billion records. Scan just isn't effective.
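The back-of-the-envelope arithmetic for the full-table Scan works out like this (all figures are the assumptions from the paragraphs above, not measured numbers):

```python
# Cost of scanning 1,000,000 x 1KB items, under the assumptions above.
record_count = 1_000_000
record_kb = 1
page_kb = 1000                       # a Scan page covers up to ~1MB
units_per_page = page_kb / 2         # Scan is eventually consistent: half cost
pages = record_count * record_kb / page_kb
total_units = pages * units_per_page
provisioned = 1000                   # provisioned Read units per second
seconds = total_units / provisioned

print(pages)        # 1000 Scan operations
print(total_units)  # 500,000 Read units consumed
print(seconds)      # ~500 seconds, i.e. roughly 8 minutes
```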
So what's the alternative? DynamoDB has another, rather obscure ability: you may define your Primary Key as a Hash key plus a Range key. With the Query operation, you must always provide the Hash Key, and you may also constrain the Range Key with Greater Than, Less Than, Equal To, Greater Than or Equal To, Less Than or Equal To, Between, or Starts With.
Unlike Scan, Query only operates on matching records, not all records. This means that you only pay for the throughput of the items that match, not for everything scanned.
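To make the Query semantics concrete, here's a toy plain-Python model (not the DynamoDB or Boto API) of how range-key conditions narrow down the sorted items under one hash key, so only the matches are returned and billed:

```python
def query(table, hash_key, condition):
    """Toy Query: `table` maps a hash key to a sorted list of range keys;
    only range keys satisfying `condition` are returned (and paid for)."""
    return [rk for rk in table.get(hash_key, []) if condition(rk)]

# Range keys are kept sorted under each hash key, as in DynamoDB.
table = {"person": ["Richard", "Sam", "Samuel", "Sandra"]}

# Two of the supported condition styles, modeled as predicates.
begins_with = lambda prefix: (lambda rk: rk.startswith(prefix))
between = lambda lo, hi: (lambda rk: lo <= rk <= hi)

print(query(table, "person", begins_with("Sam")))  # ['Sam', 'Samuel']
print(query(table, "person", between("R", "Sammy")))  # ['Richard', 'Sam']
```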
So how do you use this operation effectively? Simply put, you have to build your own indexes. This lends itself to the concept of "Ghost Records": records that simply point back to the original record, letting you keep a separate index of the original for specific attributes. Let's assume we're dealing with a record representing a Person. This Person may have several identifying attributes, but let's use a unique identifier as the Hash key, with no Range key. Then we'll create several separate Ghost records in a different table; let's call this table "PersonIndex".
Now if we want to search for someone by first name, we simply issue a Query with a Hash Key of property = "First Name" and a Range Key of the first name we're looking for, or even a "Starts With" condition so that "Sam" also matches "Samuel". We can also insert "alias" records, so that "Dick" matches "Richard". Once we retrieve the index record, we can use the "Stories" property to go back and retrieve the Person records.
So now a search costs us 1 Read operation for the index lookup, plus 1 Read operation for each matching record, which is a heck of a lot cheaper than a million! The only negative is that you also have to maintain this secondary table of indexes, and keeping those indexes up to date is the hardest part of the scheme. If you can manage it, however, you can search for and return records in milliseconds instead of seconds, or even minutes.
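Here's a minimal sketch of the ghost-record pattern, with in-memory dicts standing in for the two DynamoDB tables. The table layout, the "ids" pointer attribute, and the alias entry are all illustrative assumptions, not a real API:

```python
# Primary table: hash key = unique person id, no range key.
person = {
    "p1": {"first_name": "Samuel"},
    "p2": {"first_name": "Richard"},
}

# PersonIndex: hash key = indexed property name, range key = value.
# Each ghost record just points back at the primary records via "ids".
person_index = {
    ("first_name", "Samuel"): {"ids": ["p1"]},
    ("first_name", "Richard"): {"ids": ["p2"]},
    ("first_name", "Dick"): {"ids": ["p2"]},  # alias record
}

def find_by_first_name(name):
    # One read against the index, then one read per matching person record.
    ghost = person_index.get(("first_name", name), {"ids": []})
    return [person[pid] for pid in ghost["ids"]]

print(find_by_first_name("Dick"))  # the alias resolves to Richard's record
```

The write path is the hard part the post alludes to: every insert or update of a Person record must also create, update, or delete the corresponding ghost records, or the index silently drifts out of date.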
How are you using or planning to use Amazon DynamoDB?
Comments
BTW ... posting on blogger.com is a PITA with the requirement to login with google account...and it obviously doesn't keep spam away!
Am I wrong? Are there solutions to this issue?
Yes, it's not uncommon, but you can save one or more comma-separated person IDs, i.e.:
HashKey = FirstName
Range = John
persons = 10,100,5,20
where 10,100,5,20 is the list of people named John.