It's pretty clear the new CloudSearch system isn't heading in the right direction, in fact they're starting to lose some of their important features and instead it's becoming a commodity. In addition to doing a lot of searching on the new CloudSearch to see if it had any redeeming factors, I also started looking around for alternatives. One in particular came out as having a lot of potential, Algolia. Silly name, but amazing product.
Lets break it down.
Lets break it down.
CloudSearch v1 allows you to send random JSON fields to it even if they aren't in the index. However, anything not configured won't be searchable until you do configure it. Still, every field can be multiple, and every field can be added to the index later if needed. CloudSearch v2 does not let you send extra fields, and instead tosses an error and refuses to index anything with extra fields. That means that if you want to start searching another field, you have to re-submit all your documents after adding it to the index.
Algolia, on the other hand, accepts any arbitrary JSON, including nested data. Yes, you can send something like this to Algolia and it will just figure it out:
"title": "Building Applications in the Cloud",
The previous example would index "author.first_name", "author.last_name", "author.id", "title", "toc.number", and "toc.title". You can even go multiple levels deep and it just works.
All of this without having to pre-configure an index. Yes you can choose fields and how things are indexed, but you don't have to do so. It tries to figure everything out for you automatically, and it does a pretty good job.
Both versions of CloudSearch allow complex boolean queries. CloudSearch v1 allows for a prologue-like format of searching:
(and (or author:'Chris' title:'Building Applications*') content:'Cloud Computing' (not 'Weather'))
This lets you combine for some very complex logic, and gives you full power to search full-text throughout your records. You can also do simple text-based searches, and define what fields those text-base searches by default. You can combine using wildcards or full-words, as well as phrases grouped by quotes ("this is a phrase"). With v2, this syntax changes slightly, but still allows you to do some very complex querying, and even adds in Location-based searching (Lat-Lon).
Algolia does not allow for complex boolean query searching. You can make full-text searches, and filter against Facets. You can not group searches in the way you can in CloudSearch. You can not do negation searches. You CAN do some OR logic with Facet Filters, but not nearly as complex as CloudSearch offers.
Algolia also allows you to search multiple indexes at once, with CloudSearch you do not get that option.
Both systems offer Lat-Lon Searching, Faceting, and Numeric Range filtering. Both systems also return results relatively fast (within a few milliseconds).
CloudSearch added support for Search Analytics. These analytics come in three different reports, Search Count, Top Searches, and Top Documents.
The most interesting one is Search Count:
All reports are also downloadable to a CSV which can be used for further analytics. Most of the data is very raw and not very useful right out of the console.
Algolia, on the other hand, provides a weekly email that shows many more stats, and the console for their system includes quite a few bits of Eye Candy.
They also provide a nice dashboard which contains a lot of useful performance stats, as well as a general "health" of your indexes:
There's also a full set of stats available on each index including the number of operations, searches, and records, all by time series.
CloudSearch requires quite a bit of initial setup. You have to provision your domain, initialize some indexes, and then wait about 30 minutes for each domain to be created. You also have to configure IP addresses that can access the domains. This is quite contrary to other Amazon Web Services, and does not support IAM or Credentials at all.
Algolia, on the other hand, does support Access Tokens, and even supports setting up custom Credentials with varying levels of permissions on different indexes. It does not allow you to edit the permissions after the credentials are generated, but you can always revoke credentials and send out new ones. As for setup? There is almost none. you can create a new index in seconds, you don't need to start with anything. You can even do so from the API by just sending documents, and then configuring a few things like default sort order, facets, and default search fields.
Additionally, when you change an index in Algolia, it happens nearly instantaneously. With CloudSearch you have to re-issue an "Index Documents" request, which temporarily puts your domain in a partially-working state (searches might return out dated results), and takes anywhere from a few minutes to a few hours. It also costs you.
Algolia lets you clear a domain instantly and those records are gone immediately. This makes resetting an index very simple. With CloudSearch, you have to remove each story individually, and then issue a new Index Documents request to get the size of your domains down again.
CloudSearch v1 was entirely based on the A9 search system. It was built to run on large servers, and designed around speed of search results. It works very well, but requires a lot of resources, and thus is costly. You also can't tell ahead of time how much storage you'll need, and the transparency is very low on how much you're using. The domains automatically scale and you don't have much control over it.
CloudSearch v2 is based on a different system, and does significantly reduce the costs, however it still is expensive, and doesn't really let you know how much storage you're using. You can give the domain hints to how large of a domain you want to start with, but you don't get any control over where the domain goes.
With CloudSearch, all you can ever see is how many documents are in the domain, and how many servers of what size are being used. It automatically scales with you, but the cost is very high.
With Algolia, you pick a plan. Up until you decide to go with an Enterprise Plan, you're paying for the number of documents in your domain. About $450/month gets you 5 million documents. For me, that's about an XXLarge domain on CloudSearch, which is about $800/month on AWS, plus indexing costs. Want to make sure it's reliable? Then you have to turn on Multi-AZ, doubling the cost to $1600/month. CloudSearch v2 has been known to reduce sizes by up to 50%, but even at that with Multi-AZ enabled you're looking at about $800/month. Plus you pay for Batch Uploads, and Document Indexes if you need to run those.
Algolia also shows you right off the top how much of your quota you're using, and you can easily remove documents. When you remove a document it's gone right away, you don't have to fiddle about trying to get your domains to scale down in size. If you want to go Enterprise, you pay by the Storage Size, but you can get 150GB of storage index, mirrored onto 3 dedicated servers, for about $1,750/month. In my example, that will fit about 30 Million records pretty easily, which costs us right now about $6k/month. That's a pretty big difference.
In total, that brings Algolia to 4 wins, with CloudSearch only at 1 win. Still, that one win is on Search capabilities itself. Algolia was designed around making things fast, and require very few resources. They're slick, powerful, and new. They have a long way to go but they're already winning over CloudSearch. For most of my needs, Algolia wins easily over CloudSearch, even without the complex querying capabilities.
If for nothing other then Cost alone, Algolia is vastly better then CloudSearch. The team is small, but the product is solid, and I can't wait to see where it goes next.
Have you worked with Search as a Service solutions? What other systems have you found useful?