Thursday, March 22, 2012

Indexing with DynamoDB

One of the coolest services Amazon has recently announced is Amazon DynamoDB. With this new service, you can use the massive power of Amazon's cluster of solid-state disks, along with its computational power, to store and search your data. What's interesting about DynamoDB is that it doesn't index any of the fields you provide (outside of the ID), so if you want to be able to retrieve your data, it will most likely have to be via ID lookups, not by scanning or querying. If you want to really take advantage of DynamoDB, you have to rethink how you store your data.

The Concept

Whenever Amazon releases a new product this interesting, I always try to figure out how it would fit into my current workflow. For DynamoDB, I realized it could very easily solve my indexing problems. Taking a few notes from other search systems, I created an algorithm which indexes the most common versions of any given string you provide to it. For example, say you want to index the following string:

Learning by Doing:

This first gets split into its word tuples: each contiguous combination of words that appears in the string. This starts with the full string:


  • LEARNING BY DOING


Then we go to two word tuples:


  • LEARNING BY
  • BY DOING


Then we go into single word tuples:


  • LEARNING
  • BY
  • DOING


Then we also index each "stemmed" version of each tuple string:

  • LEARN BY DO
  • LEARN BY
  • BY DO
  • LEARN
  • BY
  • DO


This leaves us with a list, from which we remove any duplicates before inserting the corresponding records into DynamoDB. The "id" field is whatever we pass into the add function when indexing. Let's see how this works in the new botoweb.db.index.Index class.
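
To make the tuple generation above concrete, here is a minimal sketch in plain Python. It is not the botoweb implementation: index_terms and toy_stem are hypothetical names, and toy_stem is a crude stand-in (it only strips a trailing "ING") for whatever real stemmer you would actually use.

```python
import re


def toy_stem(word):
    """Crude stand-in for a real stemmer (assumption): strip a trailing "ING"."""
    if word.endswith("ING") and len(word) > 4:
        return word[:-3]
    return word


def index_terms(text):
    """Return the de-duplicated index terms for text, longest tuples first."""
    # Upper-case the text and treat anything that is not a letter or digit
    # as a word separator.
    words = re.findall(r"[A-Z0-9]+", text.upper())
    terms = []
    # Every contiguous run of words, from the full string down to single words,
    # each followed by its stemmed version.
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            run = words[start:start + size]
            terms.append(" ".join(run))
            terms.append(" ".join(toy_stem(w) for w in run))
    # Remove duplicates while keeping first-seen order.
    seen = set()
    unique = []
    for term in terms:
        if term not in seen:
            seen.add(term)
            unique.append(term)
    return unique
```

Calling index_terms("Learning by Doing") yields the tuples listed above, with the repeated "BY" collapsed into a single entry. Each resulting term would then become one record pointing at the id being indexed.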

Indexing with Botoweb

I decided that this isn't just something that might be useful for Newstex, but could instead be generally useful. So I made a generic Index class which can be used to generate full and complex indexes. To get started, simply create a new Index object:


>>> from botoweb.db.index import Index
>>> index = Index("test-search")

Then let's add something to the index:

>>> index.add("Learning by Doing", "learnbydoing.com")


Note that if this is the first time you've created this index, it may take a minute or two for the first item to be added. This is because the Index class is actually creating your DynamoDB table before adding the item to Dynamo. After the record is added, you'll be able to search for it by almost any of the terms you would think of:


>>> for item in index.search("learn"):
...     print item['id']
... 
learnbydoing.com

What's even more fun is when you start indexing longer "collapsed" words:


>>> index.add("coredumped.org", "cdump")
>>> for item in index.search("core dumped"):
...     print item['id']
... 
cdump

What you don't see is that, behind the scenes, multiple different searches take place (with fallbacks, so your primary search is always given precedence). In this search, since there is no match for "core dumped" as two separate words, it also checks for "coredumped" as a single collapsed version. Additionally, the indexer treats the "." as a word separator, so the ".org" part isn't required for the search to match. However, what will not match is just using the word "core".
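
The fallback behavior described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual Index.search code: search_candidates and search are hypothetical names, and a plain dict stands in for the DynamoDB table.

```python
import re


def search_candidates(query):
    """Return lookup keys in precedence order: as typed, then collapsed."""
    words = re.findall(r"[A-Z0-9]+", query.upper())
    candidates = [" ".join(words)]
    collapsed = "".join(words)
    if collapsed not in candidates:
        candidates.append(collapsed)
    return candidates


def search(table, query):
    """Return the ids stored under the first candidate key that matches."""
    for key in search_candidates(query):
        ids = table.get(key)
        if ids:
            return ids
    return []


# Hypothetical in-memory stand-in for the table: index term -> list of ids.
# These are the terms that indexing "coredumped.org" would produce.
table = {
    "COREDUMPED ORG": ["cdump"],
    "COREDUMPED": ["cdump"],
    "ORG": ["cdump"],
}
```

With this table, search(table, "core dumped") misses on "CORE DUMPED", falls back to "COREDUMPED", and finds the record, while search(table, "core") finds nothing because "CORE" was never stored as a term of its own.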

I'm still trying to find a good way to match partial words typed in (short of indexing each letter), so if anyone has any ideas there please let me know!

Thursday, March 15, 2012

Why Kindle will always live on

There's a lot that Amazon has done wrong recently with Kindle, most notably releasing the Kindle Fire. Although the Fire has been a much larger seller than most of us would have anticipated, it's constantly receiving negative reviews. There's a lot wrong with it, but the number one thing that's wrong is that it isn't a Kindle.

I recently purchased a new Kindle Touch, just to see what all the hype is really about.

Simple No-Frills Interface

The simpler, the more specific, the better. Imagine giving an iPad to your non-technical grandmother who's just used to reading books, you know, those things on paper. Paper, you remember, the stuff made out of dead trees? Well guess what, it's not nearly as intuitive as a book. A book you just open and start reading. An iPad, well, you have to click on all these things. First of all, hopefully you already pre-installed iBooks for her, and then you have to guide her through how to browse books and purchase them... and you can only purchase them on that device and read them on that device....

Then there's Kindle. Kindle not only looks and acts like a book when you're reading on it, you can also purchase books on your computer, iPad, iPhone, Kindle, whatever, and just read them. Kindle isn't designed to be fancy, it's plain and simple. When you order one from Amazon, it's already activated on the account you purchased it from. It starts up and guides you through everything. From the moment you turn it on, its goal is to help you use it, not to confuse you with all the different features you can use on it. And guess what, if you forget to turn it off? Oh well, its battery life is measured in months, not hours.

E-Ink

The E-Ink display behaves exactly like you would expect normal paper to. It's not backlit, there's no strain on your eyes, and it's like reading a regular textbook. The only negatives so far are that the current version doesn't do color (although that is rumored to be released in the next refresh of Kindle in the middle of 2012), and that it has a horrible refresh rate. While backlit displays measure refresh times in single-digit milliseconds, the E-Ink display measures them in the hundreds of milliseconds.

Even the Color E-Ink display that is coming out doesn't have very good stats. It only displays just over 4,000 different colors (compared to the millions/billions that a traditional display can do). So what makes the E-Ink displays so much better?

Battery life. These things last months longer because they soak up just a tiny fraction of the battery that a traditional backlit display does. There's nothing else on the market that even comes close.

You can read it in direct sunlight. That's right, on the beach, where you'd normally need a traditional book, you can use an E-Ink display. What's even weirder, you actually cannot view them without a light source. They don't provide their own light, so you need something like this if you want to read your Kindle in the dark.

No backlit display means less eye strain. How many times do you get headaches from constantly staring at an LCD? Right now I'm typing this up on a traditional display, and already my eyes are hurting. No amount of styling can fix it; it's the fundamental flaw of a backlit display.

Come out from the dungeon! Perhaps the most important thing these E-Ink displays will do is bring us out of our dungeons. Right now I'm sitting in a room that's mostly dark, because that's the only way I can see my screens. E-Ink displays actually encourage you to get outside and see actual sunlight. Imagine the difference between "Work at home" and "Work outside wherever the hell you want".

It will get better. Remember when regular CRT monitors first came out? How many colors did they have? How bad was the refresh rate? For my money, within 10 years, I expect that we'll have E-Ink displays that can compete with the best LCDs out there. Yes, there will still be some who hold out on the backlit displays, and I suspect the backlit displays will always be around in some manner, but E-Ink will start to creep into our lives more and more over the next decade, and it'll be a change for the better.


Free 3G

No, it doesn't work with the experimental browser or any third-party stuff, but for syncing and most book-related things, the free 3G is an excellent selling point. You don't have to be on WiFi to buy a new book or sync your old books (remember there is a limited amount of space on the Kindle, but everything is archived to your "book cloud"). And there's no contract, no metering, no billing. You don't ever even have to talk to AT&T (unless you want to use it overseas). Yes, it's slow, but it's not designed for streaming videos, it's designed to synchronize your text documents, bookmarks, and notes.


Tuesday, March 13, 2012

Before you name it, Google it.

I noticed two major screw-ups today by companies that chose names which are already very well established by existing applications or technology.

Kismet


Kismet, as most technology and network enthusiasts know, is a well-established and highly popular network analysis tool. Yet a company decided to make an iOS application called Kismet, which deals with the other kind of networking. What's worse, people have been trying to get Kismet on iOS for a while, and other apps are appearing as "Kismet-like" for iOS. Now that's obscured by this false app, which isn't at all what most people are going to want when they search for Kismet.

What do you see when you Google for Kismet? Certainly not the social networking tool.

Google


Google recently announced the new Google Play; nothing really revolutionary, but essentially bringing it up to par with iTunes. This is actually a pretty big and important step for Google. They updated their privacy policies to allow them to share your user information between their own apps, and now they're unifying their purchase systems into one. Unfortunately, they chose a very poor name for their former Google Books application, which is now called "Play Books".

There are two things wrong with this name. First, it doesn't at all say Google anymore. OK, the market center is now called Google Play, but you can still call the app Google Books. Secondly, there already is a thing called a Play Book. Yes, it's from BlackBerry, and now people are going to associate that app with a failing company that's losing market share like a leaky faucet.

What do you get for a Google search for Play Books? Even though BlackBerry may be on the downward spiral, they still show up as the top results, not Google's app.

For a company like Google, you'd think they'd have searched on that name first.


Please Google it first


If it's already something well established, pick a different name. Stop giving things names that can be confused with other products or companies; and please, stop adding "get" to the front of your domain name because someone already owns the original.
