Wednesday, April 23, 2014

CloudSearch vs Algolia - Battle of the Search engines

It's pretty clear the new CloudSearch system isn't heading in the right direction. In fact, it's starting to lose some of its important features and is becoming a commodity instead. In addition to doing a lot of research on the new CloudSearch to see if it had any redeeming qualities, I also started looking around for alternatives. One in particular stood out as having a lot of potential: Algolia. Silly name, but amazing product.

Let's break it down.


CloudSearch v1 allows you to send arbitrary JSON fields to it even if they aren't in the index. Anything not configured won't be searchable until you configure it, but every field can be multi-valued, and every field can be added to the index later if needed. CloudSearch v2 does not let you send extra fields; it returns an error and refuses to index any document that contains them. That means that if you want to start searching another field, you have to re-submit all your documents after adding it to the index.

Algolia, on the other hand, accepts any arbitrary JSON, including nested data. Yes, you can send something like this to Algolia and it will just figure it out:

  {
    "author": {
      "first_name": "Chris",
      "last_name": "Moyer",
      "id": 12345
    },
    "title": "Building Applications in the Cloud",
    "toc": [
      {
        "number": 1,
        "title": ...
      }
    ]
  }

The previous example would index "author.first_name", "author.last_name", "author.id", "title", "toc.number", and "toc.title". You can even go multiple levels deep and it just works.
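To make the dotted attribute names concrete, here is a minimal Python sketch that walks a nested record and lists the paths it contains. It only illustrates the naming scheme described above and is not Algolia's actual implementation; the function name `attribute_paths` and the chapter title are made up for the example.

```python
def attribute_paths(value, prefix=""):
    """Collect the dotted attribute paths found in a JSON-like value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= attribute_paths(child, child_prefix)
    elif isinstance(value, list):
        # Lists add no path segment: every element shares the parent's path.
        for child in value:
            paths |= attribute_paths(child, prefix)
    elif prefix:
        # A scalar value: the path that led here is one attribute.
        paths.add(prefix)
    return paths


record = {
    "author": {"first_name": "Chris", "last_name": "Moyer", "id": 12345},
    "title": "Building Applications in the Cloud",
    # Placeholder chapter title; the original example elides it.
    "toc": [{"number": 1, "title": "Introduction"}],
}

for path in sorted(attribute_paths(record)):
    print(path)
```

Treating list elements as sharing the parent's path is what makes "toc.number" come out without an array index in it.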

All of this without having to pre-configure an index. Yes, you can choose fields and how things are indexed, but you don't have to do so. It tries to figure everything out for you automatically, and it does a pretty good job.

Winner: Algolia


Both versions of CloudSearch allow complex boolean queries. CloudSearch v1 uses a prefix-notation (Lisp-like) format for searching:

   (and (or author:'Chris' title:'Building Applications*') content:'Cloud Computing' (not 'Weather'))

This lets you build some very complex logic, and gives you full power to search full-text throughout your records. You can also do simple text-based searches, and define which fields those text-based searches cover by default. You can combine wildcards or full words, as well as phrases grouped by quotes ("this is a phrase"). With v2, this syntax changes slightly, but still allows you to do some very complex querying, and even adds location-based searching (Lat-Lon).
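Because the v1 syntax is just nested, parenthesized clauses, it is easy to build programmatically. The helpers below are a hypothetical sketch (the names `term` and `combine` are mine, not part of any AWS SDK), and the backslash escaping of single quotes is an assumption rather than something confirmed here.

```python
def term(value, field=None):
    """Format one quoted term, optionally scoped to a field."""
    # Assumption: a backslash escapes backslashes and single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"{field}:'{escaped}'" if field else f"'{escaped}'"


def combine(operator, *clauses):
    """Wrap clauses in a prefix-notation group such as (and ...) or (not ...)."""
    return f"({operator} {' '.join(clauses)})"


query = combine(
    "and",
    combine("or", term("Chris", "author"), term("Building Applications*", "title")),
    term("Cloud Computing", "content"),
    combine("not", term("Weather")),
)
print(query)
```

Run as written, this reproduces the example query shown above.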

Algolia does not allow for complex boolean query searching. You can make full-text searches, and filter against facets. You cannot group searches the way you can in CloudSearch, and you cannot do negation searches. You CAN do some OR logic with facet filters, but nothing nearly as complex as what CloudSearch offers.

Algolia also allows you to search multiple indexes at once; with CloudSearch you do not get that option.

Both systems offer Lat-Lon Searching, Faceting, and Numeric Range filtering. Both systems also return results relatively fast (within a few milliseconds).

Winner: CloudSearch


CloudSearch added support for Search Analytics. These analytics come in three different reports, Search Count, Top Searches, and Top Documents.

The most interesting one is Search Count:

All reports are also downloadable to a CSV which can be used for further analytics. Most of the data is very raw and not very useful right out of the console.

Algolia, on the other hand, provides a weekly email that shows many more stats, and the console for their system includes quite a few bits of Eye Candy.

They also provide a nice dashboard which contains a lot of useful performance stats, as well as a general "health" of your indexes:

There's also a full set of stats available on each index including the number of operations, searches, and records, all by time series.

Winner: Algolia


CloudSearch requires quite a bit of initial setup. You have to provision your domain, initialize some indexes, and then wait about 30 minutes for each domain to be created. You also have to configure IP addresses that can access the domains. This is quite contrary to other Amazon Web Services, and does not support IAM or Credentials at all.

Algolia, on the other hand, does support Access Tokens, and even supports setting up custom Credentials with varying levels of permissions on different indexes. It does not allow you to edit the permissions after the credentials are generated, but you can always revoke credentials and send out new ones. As for setup? There is almost none. You can create a new index in seconds, and you don't need to start with anything. You can even do so from the API by just sending documents, and then configuring a few things like default sort order, facets, and default search fields.

Additionally, when you change an index in Algolia, it happens nearly instantaneously. With CloudSearch you have to re-issue an "Index Documents" request, which temporarily puts your domain in a partially-working state (searches might return outdated results), and takes anywhere from a few minutes to a few hours. It also costs you.

Algolia lets you clear an index instantly, and those records are gone immediately. This makes resetting an index very simple. With CloudSearch, you have to remove each document individually, and then issue a new Index Documents request to get the size of your domains down again.

Winner: Algolia


CloudSearch v1 was entirely based on the A9 search system. It was built to run on large servers, and designed around speed of search results. It works very well, but requires a lot of resources, and thus is costly. You also can't tell ahead of time how much storage you'll need, and the transparency is very low on how much you're using. The domains automatically scale and you don't have much control over it.

CloudSearch v2 is based on a different system, and does significantly reduce the costs, however it still is expensive, and doesn't really let you know how much storage you're using. You can give the domain hints to how large of a domain you want to start with, but you don't get any control over where the domain goes.

With CloudSearch, all you can ever see is how many documents are in the domain, and how many servers of what size are being used. It automatically scales with you, but the cost is very high.

With Algolia, you pick a plan. Up until you decide to go with an Enterprise Plan, you're paying for the number of documents in your domain. About $450/month gets you 5 million documents. For me, that's about an XXLarge domain on CloudSearch, which is about $800/month on AWS, plus indexing costs. Want to make sure it's reliable? Then you have to turn on Multi-AZ, doubling the cost to $1600/month. CloudSearch v2 has been known to reduce sizes by up to 50%, but even at that with Multi-AZ enabled you're looking at about $800/month. Plus you pay for Batch Uploads, and Document Indexes if you need to run those.

Algolia also shows you right off the top how much of your quota you're using, and you can easily remove documents. When you remove a document it's gone right away, you don't have to fiddle about trying to get your domains to scale down in size. If you want to go Enterprise, you pay by the Storage Size, but you can get 150GB of storage index, mirrored onto 3 dedicated servers, for about $1,750/month. In my example, that will fit about 30 Million records pretty easily, which costs us right now about $6k/month. That's a pretty big difference.
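The arithmetic behind these comparisons is simple enough to write down. The sketch below uses only the approximate monthly figures quoted in this post. It assumes cost scales linearly with domain size and leaves out batch upload and indexing charges, so treat it as a rough illustration, not a pricing calculator.

```python
ALGOLIA_5M_DOCS = 450       # approx. $/month for 5 million documents
CLOUDSEARCH_XXLARGE = 800   # approx. $/month for one XXLarge domain
ALGOLIA_ENTERPRISE = 1750   # approx. $/month for 150GB on 3 dedicated servers
CLOUDSEARCH_CURRENT = 6000  # approx. $/month for about 30 million records


def cloudsearch_monthly(base, multi_az=False, size_reduction=0.0):
    """Estimate a CloudSearch bill; Multi-AZ doubles it, a smaller index shrinks it."""
    cost = base * (1 - size_reduction)
    return cost * 2 if multi_az else cost


v1_multi_az = cloudsearch_monthly(CLOUDSEARCH_XXLARGE, multi_az=True)
v2_multi_az = cloudsearch_monthly(CLOUDSEARCH_XXLARGE, multi_az=True, size_reduction=0.5)

print(f"CloudSearch v1, Multi-AZ: ${v1_multi_az:,.0f}/month")
print(f"CloudSearch v2, Multi-AZ, 50% smaller: ${v2_multi_az:,.0f}/month")
print(f"Algolia, 5M documents: ${ALGOLIA_5M_DOCS:,}/month")
print(f"Enterprise scale: CloudSearch costs {CLOUDSEARCH_CURRENT / ALGOLIA_ENTERPRISE:.1f}x Algolia")
```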

Winner: Algolia


In total, that brings Algolia to 4 wins, with CloudSearch at only 1. Still, that one win is on search capabilities themselves. Algolia was designed around making things fast and requiring very few resources. They're slick, powerful, and new. They have a long way to go but they're already winning over CloudSearch. For most of my needs, Algolia wins easily over CloudSearch, even without the complex querying capabilities.

If for nothing other than cost alone, Algolia is vastly better than CloudSearch. The team is small, but the product is solid, and I can't wait to see where it goes next.

Have you worked with Search as a Service solutions? What other systems have you found useful?

Wednesday, April 2, 2014

Is the new Amazon CloudSearch an April Fools' Joke?

Anyone who knows me knows I'm a pretty big fan of AWS. I come out in defense of them more often than not, and my Twitter feed is always buzzing with how much better they (typically) are than other wanna-be cloud providers. I tend to love any new service they come out with, and I try just about everything they make available to me.

I don't usually rant about AWS. I use them for my everyday life. They've built an amazing number of services. My favorite quote (and many of your favorites as well):

AWS is the Python of the CloudSpace. Everything is in the STDLib… #AWSSummit

So when I do rant, it's because they've done something really out of character. In this case, it's something so bad I think they need to re-think even deploying this until it's fixed.

The "New" CloudSearch

Not too long ago, Amazon released a new version of CloudSearch, with a lot of anticipated features, some of which are very enticing, such as:

  • Geographic Search (Lat-Lon)
  • Search Highlighting
  • Multi-AZ
  • Autocomplete

Unfortunately, some of these features are implemented very poorly, and so much of the release is a backward step that I have to wonder if this was some sort of early April Fools' joke. This "new" CloudSearch feels more like a pre-beta version, taking several leaps in the opposite direction of progress.

Hey Amazon, this is a joke right?

A long awaited step was to support multiple languages, however, they also removed your ability to specify the language of the document. Instead they suggest what you really wanted was to specify the language of each field. What..

To upload your data to a 2013-01-01 domain, you need to:
Omit the version and lang attributes from your document batches. You can use cs-import-documents to convert 2011-02-01 SDF batches to the 2013-01-01 format.

Another very important backwards step is the removal of the ability to upload documents with more fields than you need indexed initially. This was incredibly useful because your backend could simply dump all of its objects to your SDF and upload them to the domain; if you later wanted to add a new field, you didn't need to re-upload all of your documents. This has also been removed:
Make sure all of the document fields correspond to index fields configured for your domain. Unrecognized fields are no longer ignored, they will generate an error.
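In practice this means a backend that used to dump whole objects now has to filter them before building a batch. Here is a minimal sketch of that filtering step. The field names and the `split_fields` helper are hypothetical, and the real list of index fields would come from your own domain configuration.

```python
def split_fields(document, index_fields):
    """Separate the fields a domain knows about from the ones it would reject."""
    known = {name: value for name, value in document.items() if name in index_fields}
    unknown = sorted(set(document) - set(index_fields))
    return known, unknown


# Hypothetical index configuration and backend object.
INDEX_FIELDS = {"title", "author", "content"}
story = {
    "title": "Building Applications in the Cloud",
    "author": "Chris Moyer",
    "content": "...",
    "internal_notes": "not indexed yet",
}

uploadable, dropped = split_fields(story, INDEX_FIELDS)
print("upload:", sorted(uploadable))
print("dropped:", dropped)
```

The cost of this workaround is exactly the one described above: anything dropped here has to be re-uploaded if the field is added to the index later.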

What's worse, they added support for indexing from DynamoDB, but if you don't put every single field directly in your domain, you have to hand edit the SDFs or put everything into your CloudSearch domain:
The DynamoDB uploads are also not a pipeline, it only helps with the initial upload.

The need to specify the exact document format you're going to upload is also very demanding. Before, any field could be multi-valued; now you have to tell it explicitly whether a field is multi-valued or not, and if you want to use something like a "suggester", it only works with single-valued fields.

You also can't make a single field that maps in multiple source fields unless each of those source fields is included in the index itself. There is no more merging of multiple source fields into a single field to save on space.

Not just a new API, a whole new (incompatible) system

Perhaps worst of all, the new version of CloudSearch is entirely incompatible with the old version. This means that if you want to try out any of the new features, you basically have to start all over and re-design all of your systems, as well as re-create and re-upload your existing indexed data. Amazon provides no automated tools to do so either, you're pretty much on your own.

If you are an existing user of CloudSearch, you won't want to switch to this new system. It's not nearly as advanced as your existing implementation. You'll be missing quite a bit of functionality. If you're just starting out, you might not notice, and you'll probably be happy with some of the different (not new) features they're providing, such as pre-scaling and multi-AZ support.

Hopefully this is not indicative of the direction that CloudSearch, and Amazon in general, is moving in. This is the first time they've released a new product that has completely frustrated me, to the point of wondering what they're thinking. This is not the path forward; this is a complete rework of an existing system for some specific use case, not a general need.

Friday, December 6, 2013

Half-pops: The "real" corn-based-nut

Recently a friend of mine introduced me to Corn Nuts. After scouring the stores looking for them and finally locating them at a CVS about 15 miles away, I tried them. They were OK, but they didn't have the intense crunch I'd grown to love from eating the little half-popped kernels of corn you get when you make a big batch of popcorn.

I love popcorn, and I've even gone so far as to find my perfect type of popcorn, the softest and smallest I've found, Ladyfinger Popcorn. I particularly love the kind from Wabash Farms, on Amazon.

I also discovered HalfPops, which appeared to be almost exactly what I was looking for, but without a local store selling them, and facing a very large sum to get them shipped to me, I set out to find a way to make them on my own. I'd already found that making Corn Nuts was as simple as deep-frying hominy, so these new Half-Pops (sometimes called "Old Maids") couldn't be that hard to make. The makers would like you to think it's an incredibly complex process; it's not.

First, let's take a look at what we're aiming for:

This is what I wanted to make, but not just a few with a lot of wasted popcorn, a whole bunch of them, something like this:

That's a whole lot of half-popped popcorn! Because I used ladyfinger popcorn, they also aren't incredibly hard, but they still are quite crunchy. Enough that they could probably break your teeth if you're not too careful, and you might occasionally still get a few that are just too hard to eat, so make these at your own risk... but oh are they incredibly delicious.

So what's the secret? It's much simpler than you might think. The issue with popcorn is that the coating on the outside holds all of the moisture inside until the kernel finally explodes, so the key is to remove some of that coating before you pop the kernels (this also has the added effect of softening the kernels). There are a lot of ways you could try to do this, but the easiest I've found is soaking the kernels in water.

What you need:

You'll only need a few simple household items to make these, plus the Ladyfinger popcorn. It might work with others, but I've tried Mushroom popcorn with no success at all, so steer clear of that. Here is what I used

The process is quite simple, although it does take a while due to the soaking process. Take a lot of popcorn, say about a cup at least, and add it to an air-tight container. Add water, enough to cover it. Some of the kernels will float, don't worry about that, eventually you'll stir it around and they will mostly all sink. Add about the same amount of salt you would do for brining meat, so if you've got a cup of kernels, you should have about a cup of water, and a quarter cup of salt. Seal, then shake the container (like the old shake-and-bake bags) until all the salt, kernels, and water are mixed.

Then comes the long part, let it sit in a cool area (but not the fridge!) for at least 4 days, occasionally shaking it up to make sure the salt doesn't settle too much.

After the 4 days, you're ready to start. First, spin up your popcorn kettle (you can also try this with a traditional pot, but it might be tough to make sure the kernels don't burn at the bottom). You'll need to add a good quantity of Popping oil, about 3-4x what you normally would add. I usually squeeze in about 4-5 tablespoons, but you can experiment with more depending on how much popcorn you're using. You need to add enough that the top of the kernels will be covered with oil.

Next, turn the kettle on. While it is warming up a bit, drain the popcorn that's been soaking for at least 4 days. DO NOT dry it with a paper towel. You don't need to get all the water out, you just don't want to dump in a whole ton of the stuff.

You don't need to wait for the kettle to completely come up to temperature like with regular popcorn, in fact that seems to harm the process, so by the time you're done draining the popcorn, it's time to add it to the pot. Add about 3x what you normally would, as long as all the kernels are covered with oil. It's ok (and even good) if a little extra water gets into the pot.

Wait for the kernels to start popping (slightly) and then listen closely. The kernels won't pop out of the kettle, so you have to be very careful to watch that they don't burn. It will start slowly, then pick up, and then start to slow down again. It takes a little longer than regular popcorn, so just be patient. If you start to see kernels popping out of the top, then you've left it in too long, or didn't soak it long enough.

Once the popping starts to really slow down, about a second between each pop, turn off the kettle and let it sit for at least a minute. This is very important as the oil is still very hot. If you dump the kernels out as they are, you'll end up burning yourself, and probably melt any plastic bowl you might be putting them into (I did this the first time I made it!).

After a minute or so sitting off, dump the kernels out into a bowl, and add whatever seasoning you want. If you added enough oil, there will still be quite a bit left with the kernels, so you shouldn't need to add any butter to get the seasonings to stick. Personally, I like mine hot, so I always add this:

You also probably want to add some salt. I typically also add Gourmet Fries Seasonings, Salt and Vinegar. The combination of these two adds quite an amazing little punch to these things.

Make sure you still let them breathe a bit, and you can even dry them off with a paper towel like you would fries.

They aren't quite the same as the half-pops, in fact usually they're a little less popped. The perfect snack, and it doesn't take a whole lot of new equipment to make it!

What do you like adding to your half-pops?


Tuesday, August 27, 2013

Alfred 2: Workflows Review

Recently, a co-worker of mine discovered Alfred. (They have an app in the App Store as well, but version 2 is only available on their website.) The second iteration of this application has introduced a pro version which includes a powerful new feature, Workflows. While there are many pre-built workflows, the most powerful feature I've discovered is the ability to create your own workflow, which can be incredibly advanced, calling PHP, Python, or any generic script to populate the list of choices in Alfred, and even execute custom actions.

Before starting with Workflows, I suggest you download the very powerful and useful Workflow Searcher. It ties into the api behind and lets you very easily download and install custom workflows right from Alfred.

Example: Searching for Spotify workflows

While I've enjoyed many workflows built by others, I also wanted to contribute by making some of my own. So far I've managed to make a custom workflow that searches our internal systems, and lets me access any record within our database very easily.

My next challenge was re-building the list_instances command from boto and combining it into an SSH script. The Python alp library has proven exceptionally useful for this. I've chosen to bundle this library with my workflows since the version installed via pip or easy_install is too old and doesn't contain the latest functionality. This also means that in order to install the workflow, you don't need to have alp, just boto itself installed and configured properly.

The result is something like this:

Selecting one of the items automatically copies the appropriate SSH commands to your clipboard.

I plan to improve this workflow to automatically perform the SSH commands in the future; however, I've currently set it up to also allow me to SSH into instances behind a VPC. This is done by first running "ssh vpc" and then sshing directly into the instance's private IP. This means two commands are copied to your clipboard, which I haven't yet found a good way to execute via a terminal directly.
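As a rough illustration of that logic (not the workflow's actual code), the sketch below builds the commands for an instance described by a plain dictionary. The keys `public_dns` and `private_ip` are made up for the example, and "vpc" is assumed to be an SSH host alias for the gateway. The second helper shows one possible way to collapse the two hops into a single command using ssh's `-t` flag, which I have not wired into the workflow.

```python
def ssh_commands(instance, gateway="vpc"):
    """Return the commands to copy: one hop for public hosts, two for VPC hosts."""
    public_dns = instance.get("public_dns")
    if public_dns:
        return [f"ssh {public_dns}"]
    # Instances inside the VPC are only reachable through the gateway host.
    return [f"ssh {gateway}", f"ssh {instance['private_ip']}"]


def chained_command(instance, gateway="vpc"):
    """Single-command alternative: -t allocates a terminal for the second hop."""
    return f"ssh -t {gateway} ssh {instance['private_ip']}"


# Hypothetical instances for illustration.
web = {"name": "web-1", "public_dns": "web-1.example.com"}
db = {"name": "db-1", "private_ip": "10.0.1.25"}

print("\n".join(ssh_commands(web)))
print("\n".join(ssh_commands(db)))
print(chained_command(db))
```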

You can find the full workflow here. Or a direct download link here.

Other boto commands to come… let me know if you'd like to help, or have any ideas, by posting issues to the workflow on GitHub!

Monday, August 12, 2013

Backing up your RDS instance to Dropbox using Sequel Pro

After searching around for a third-party backup solution for RDS, and realizing that many companies just want to charge way too much for something so simple, I decided it would be much easier to simply back up our SQL databases ourselves. While this process is manual, you could easily use a script to automate backing up (something like mysqldump, or just write a simple Python script to create the dump file and then upload it to Dropbox).
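For anyone who prefers the scripted route, here is a minimal sketch of what such a script could look like. It assumes mysqldump is installed, that the password comes from a MySQL option file such as ~/.my.cnf, and that "uploading" just means writing into a locally synced Dropbox folder. The host, user, database, and folder names in the usage note are placeholders.

```python
import bz2
import subprocess
from datetime import date
from pathlib import Path


def dump_command(host, user, database):
    """Build the mysqldump invocation; the password is left to ~/.my.cnf."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        database,
    ]


def backup_path(folder, database, day):
    """Name the backup after the database and the date it was taken."""
    return Path(folder) / f"{database}-{day.isoformat()}.sql.bz2"


def run_backup(host, user, database, folder):
    """Dump the database and write a bz2-compressed copy into the folder."""
    target = backup_path(folder, database, date.today())
    # The whole dump is held in memory, which is fine for small databases only.
    dump = subprocess.run(dump_command(host, user, database),
                          check=True, capture_output=True)
    target.write_bytes(bz2.compress(dump.stdout))
    return target
```

A call would look something like `run_backup("mydb.example.rds.amazonaws.com", "admin", "shop", Path.home() / "Dropbox" / "backups")`, with the Dropbox client taking care of the actual upload.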

For our purposes, we wanted to take snapshots of the databases after a certain task was completed, so it's easiest to just take manual snapshots. Fortunately, Sequel Pro makes backing up databases so simple anyone can do it.

Here's how.

Step 1: Download Sequel Pro:

It's free, and available here:

Yes, it's Mac only.

Step 2: Authorize your IP

Make sure your IP is authorized to access your RDS server. You can do this pretty easily from the AWS Management console. Click on Security Groups:

Then choose your security group; most likely you just have the default one. Click the icon next to the checkbox to dive deeper into the group.

Lastly, enter your IP followed by /32 if it doesn't already appear in your list, and hit add.

Note that it can take a few minutes for your IP to be authorized.

Step 3: Connect to your server

It's pretty simple to connect to your server, just fill in the Host, Username, and password. Make sure to select "Connect using SSL". You can save this connection for later by pushing the Plus button in the bottom left and naming the connection.

Step 4: Select the database you want to export

Make sure you select the database you want to export from the menu located at the top of the Sequel Pro dialog:

Step 5: Choose "Export" from the File menu

This will bring up a dialog box that will let you choose what to export. Make sure to choose "SQL", and you will probably want to select a compression method. You can access this option by clicking on the arrow at the bottom of the dialog box.

Make sure you choose the path you want to save to. Since we've got a large Dropbox account, this is the best place to go for us. It's easy to simply export the compressed BZ2 (which takes longer, but is much more compact) right to Dropbox, and have it automatically backed up to an external system.


Recovering from this backup is also just as easy. All you have to do instead of choosing "export" on step 5 is choose "import". This will let you choose the location of the file to import, and your database will be replaced with the content of that latest backup.

Wednesday, June 26, 2013

blink(1) - Completely Programmable USB LED

Blink(1) USB Light
Ever since I missed the chance to get The Woot Lights, I've been looking for any sort of simple programmable USB light. Yes, my Phone is hooked up to PagerDuty, and I've got Geckoboard to show me ongoing status, as well as FlowDock set up to get non-critical alerts from our various systems as well, but there's something about having an actual light somewhere set up to go off or simply show status colors for various things. Call it my Delta Flyer.

After a lot of searching, I ran across the Blink(1) by ThingM.

The guys over at ThingM really know what they're doing; even the box is well designed. Simple, clean, and very Apple-esque:

Blink(1) Box
This little USB "Status Light" comes with some pretty cool software, but the most impressive part is that it's entirely open source. They even have a Node.js library, which works quite well. Even better, for simple applications, they have a very simple control program which hooks into services like IFTTT, or even works off of simple shell scripts which can return a status color code.
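To give a sense of how little such a status script needs to do, here is a sketch of the idea in Python. It assumes the control program accepts a hex color code as the script's output, which is my reading of "return a status color code". The thresholds and the `status_color` name are made up for the example.

```python
# Hypothetical mapping from alert level to the color the light should show.
STATUS_COLORS = {
    "ok": "#00FF00",        # green
    "warning": "#FFA500",   # orange
    "critical": "#FF0000",  # red
}


def status_color(failing_checks, warn_at=1, critical_at=3):
    """Pick a color code from the number of failing checks."""
    if failing_checks >= critical_at:
        return STATUS_COLORS["critical"]
    if failing_checks >= warn_at:
        return STATUS_COLORS["warning"]
    return STATUS_COLORS["ok"]
```

A wrapper script would fetch the number of failing checks from whatever monitoring system you use and print the result of `status_color(...)` as its only output.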

Overall my experience with the Blink(1) has been great. It's simple to get started with, and open enough that it's easy to make it do exactly what you want. Even if you don't want to do development, it's super simple to make it respond to things like searches on Twitter, getting emails, or even just responding to music. Being from the DAMMIT era, this is certainly a worthy piece of technology.

Sunday, April 7, 2013

Amazon CloudDrive review

Amazon recently split its Cloud Drive from its Cloud Player, and released both a Windows and a Mac sync application, entering the same space as Google Drive, Microsoft SkyDrive, and Dropbox.

Dropbox is the de-facto standard in the market, having the most impressive user experience, and seamlessly integrating into OSX and Windows. It also has a lot of additional features that make it very useful when you're using multiple computers in the same network, and even more improvements recently for sharing, and for using Dropbox in a team environment.

So how does this new addition stack up to the competition?

Unfortunately, it falls just short of being as good as Dropbox, for a few very important reasons.

The UI

Dropbox made its waves by integrating seamlessly into OSX. Specifically, it makes it very easy to tell what is in sync, what's shared, and what is still being synced, right from within Finder.

Here's a folder in Dropbox:

And here's that same folder in CloudDrive (as well as any of the other "drive" applications):

Notice the green checkmark in the lower left corner of the Dropbox folder? That tells you the folder is "in sync" and available on Dropbox. There is no such integration with the other drive applications; all you can tell from there is whether your entire folder is in sync or not.


There is also better integration for "sharing" folders directly from within Finder. In Dropbox, you can right-click on any folder to open up a sharing link to share that folder with any other Dropbox user, or to generate a link to share with someone on the web. With CloudDrive, there doesn't appear to be any sharing option at all, let alone something as simple as Dropbox's options.

Dropbox is about more than just keeping your files in sync with your own devices; it's also about sharing and collaboration. CloudDrive completely misses this aspect.


Perhaps the most important difference between these applications is how they synchronize your files between your computers. Dropbox intelligently uploads your files to its servers and your other devices. Specifically, if you have a device that contains files locally, and a device that needs to download them, instead of downloading from the internet, your device will download from the closest local system. This means that it's not unnecessarily traveling over your internet connection when it can more quickly grab those files from another device on your local network.
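The decision described here boils down to "ask the local network first, fall back to the cloud". The sketch below shows only that decision and says nothing about how Dropbox actually discovers peers or verifies files. The peer names, the hash strings, and the `pick_source` helper are all hypothetical.

```python
def pick_source(file_hash, lan_peers, cloud="cloud"):
    """Prefer any peer on the local network that already holds the file."""
    for peer, hashes in lan_peers.items():
        if file_hash in hashes:
            return peer
    # Nobody nearby has it, so the download goes over the internet.
    return cloud


# Hypothetical view of which file hashes each local device already holds.
lan_peers = {
    "laptop.local": {"a1b2", "c3d4"},
    "desktop.local": {"e5f6"},
}

print(pick_source("e5f6", lan_peers))
print(pick_source("ffff", lan_peers))
```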

CloudDrive and most of the other Drive applications do not do this. They simply download the files directly from the "Cloud Server" and do not intelligently choose where to download the files from if they are available locally on another device. This means that your internet connection is much more saturated, and if you're trying to watch your favorite baseball team while you're trying out CloudDrive, you're going to be disappointed.


While CloudDrive does appear to be the most promising so far, it still has a long way to go before I would switch from Dropbox. Dropbox has had the lead in the market for so long that they are a very hard target to chase, but hopefully with this new addition the gap between competitors will start to close, and we can begin to see more improvements in the sync'd drive space.