Thursday, July 17, 2014

How NOT to do 2 Factor authentication (MailChimp, this means you!)

UPDATE: You can enable QR code/Authy for AlterEgo

Thanks go to a co-worker who discovered how to enable QR-code-based authentication for AlterEgo. After logging into AlterEgo via the website, you can go to "Integrations":


Under "Google Authenticator" choose "Connect":


This will generate a QR code you can scan with Authy, or any other standard software MFA app!

How NOT to do 2 Factor authentication

Two factor authentication is great. It's the latest craze, but it's also a good idea. In general, the password is obsolete. Anyone can guess or brute force a static password, and forcing people to change a password regularly is lame. They forget, which means you need to have a way to let them reset it.

If it's something they're typing on mobile devices, it's probably going to be pretty weak, and the more you have to type it, the less secure it will be.

A multi-factor (or two-factor) authentication token solves many of these problems. People will always create insecure passwords, so a second form of authentication is key. There are three main factors of authentication:


  • Knows Something (Password)
  • Has Something (Authentication Token)
  • Is Something (Biometrics)

Adding the "Has Something" factor is critical, but it's also important to make sure it's not an obstacle. There are standards out there for how to do authentication tokens. Almost everyone generates a QR code that you can scan with your mobile app, and/or just uses SMS.
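
To give a sense of how simple the standard approach is, here's a minimal sketch of the TOTP flow using the pyotp library (the account name and issuer are made-up placeholders):

import pyotp

# Generate a shared secret once, server-side, and store it with the user's account.
secret = pyotp.random_base32()

# Build the standard otpauth:// provisioning URI. Render this as a QR code and
# Authy/Google Authenticator can scan it directly.
totp = pyotp.TOTP(secret)
print(totp.provisioning_uri("user@example.com", issuer_name="ExampleService"))

# Later, verify whatever 6-digit code the user types in. It's only valid within the
# current time window, and a fresh code appears as soon as the window rolls over.
print(totp.verify("123456"))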

Yes, this does mean that there's a QR code out there that someone could hijack, but hopefully that QR code is not printed, and is instead kept securely on the user's device. If you're like me, you use Authy, which does back up your MFA tokens, but also requires you to input more information when you need to restore, only allows use on one device at a time, and requires a secondary form of MFA (such as an SMS) if you do need to restore.

Other providers, such as RSA, allow for physical MFA tokens. These are by far the most secure, but also expensive, and a hassle if you have a bunch of them. I have one for my 401k, PayPal, and AWS account. Everything else is a Software Auth Token.

Google Authenticator does not do backups, and if you upgrade your phone you lose it all. Not ideal, but still not as bad as....

MailChimp: You're doing it wrong

MailChimp introduced MFA. Pretty great, right? You don't want someone getting ahold of your client list; that could be pretty bad.

But they don't use a standard like a QR code, a physical token, or just SMS. Nope, they use a third-party company called AlterEgo.

First off, when you search for "Alter Ego" in the app store, this app isn't what comes up. That's pretty bad itself, but not the worst part.

The worst part? They don't do two-factor authentication like anyone else. The app is a web page wrapped in a mobile shell, and you can tell. It is NOT optimized for touch screens, let alone small devices. It requires a login with a username and password... wait, isn't this what the MFA was supposed to be solving for us?

Worse yet, while it DOES have time-based codes, those codes are also one-time use. The interface gives you no simple way to generate a new code until the old one expires, even if you've already used it. In MailChimp, you often have to re-login all over again (another issue), including when you add new people or are setting up your account for the first time. This means you end up needing your AlterEgo token multiple times within the one-minute window it takes for a token to "expire". So you have to wait... you can't just generate a new token, even though the one on the screen no longer works.


PLEASE, MAILCHIMP, DROP ALTEREGO!

It does not make me feel more secure. In fact it breaks your normal workflow, and makes your service difficult to use. There is no reason you can't generate a QR code and support every other type of MFA out there, or even just use SMS. You already offer SMS as a backup, but you can't set up the account to use SMS alone.

Please please please, don't continue to require AlterEgo.




Wednesday, July 2, 2014

Google Cloud, still not ready for the real world

With all of the new buzz around the fancy features that Google has been launching lately, I decided to take a stab at using Google Cloud for one of my new projects at Newstex. The project is simple and isolated, so there was nothing to hinder me or tie me into AWS, and nothing to prevent me from testing the waters.

What I discovered, however, is the reason why many people are still primarily focused on AWS instead of Google Cloud; there are a lot of shiny features, but it's missing the important parts.

Security

Let's start with the most important part of any Cloud Infrastructure: Security. It's absolutely paramount that you can control access to resources, and control who (and what) has access to different elements of your cloud environment. For example, you don't want to allow a process to start and stop servers if all it needs is access to read files from your storage system. You also don't want to give a new client access to every storage bucket just so they can download their files, only the specific files they should have access to. And you don't want to give the new DevOps hire full write-level access to kill all your servers and horrendously screw things up before they know what they're doing.

For all of these things, Amazon developed IAM. You can securely control exactly what access any given set of credentials provides, in some cases down to an incredibly fine level of granularity, such as limiting someone to a specific prefix in an S3 bucket, or a certain subset of items within a DynamoDB table. The granularity is extreme, and it's very easy to construct access control rules that prevent abuse, as well as plain misuse.
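
As a rough sketch of what that granularity looks like in practice with boto (the user, bucket, and prefix names here are made up for illustration):

import json
import boto

iam = boto.connect_iam()

# Read-only access to a single prefix in a single bucket, and nothing else.
policy = {
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::my-bucket/reports/*",
    }]
}

iam.put_user_policy("new-client", "reports-read-only", json.dumps(policy))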

Google Cloud lets you split access apart by Project. That's it.

APIs and Documentation

Here's an interesting test for you: try to figure out how to write a row of data into a BigQuery table from a script in Python. Better yet, try finding out how to do it in Node.js. While Google does consider Python a primary programming language, if you're just trying to access an API through a script, it takes a lot of bootstrapping just to get that to work. There's no simple API like "boto" to handle all the automatic magic required to connect. The API keys don't even work with BigQuery, and they aren't even secure (or really secret, since you're sending them along with every request). There's no simple signed URL scheme; it all relies on OAuth2.

Have you ever worked with OAuth2? It's a complete pain in the ass. Not to mention it's designed for a web-based workflow, not a server-side script. So when you do finally manage to get that script working, you have to go to a browser to authorize your request, then store the access keys for future use. Oh, and those expire. That requires more manual intervention.
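
For comparison, here's roughly the amount of ceremony it took at the time to run a single BigQuery query from a server-side script using the service-account flow (library names are from the oauth2client/apiclient stack of the era; the project, key file, and table names are placeholders, and the details should be treated as illustrative):

import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

# Requires creating a "service account" in the API console and downloading its
# private key first; none of this is needed with a boto-style access key.
with open("service-account-key.pem") as f:
    private_key = f.read()

credentials = SignedJwtAssertionCredentials(
    "1234567890@developer.gserviceaccount.com",
    private_key,
    scope="https://www.googleapis.com/auth/bigquery",
)
http = credentials.authorize(httplib2.Http())
bigquery = build("bigquery", "v2", http=http)

result = bigquery.jobs().query(
    projectId="my-project",
    body={"query": "SELECT COUNT(*) FROM [my_dataset.my_table]"},
).execute()
print(result)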

It's not a usable API if it requires manual intervention. Yes, you can use it from within AppEngine, or within Google Cloud (with some work), but what if I don't want to run it within the Google Environment? Why am I so tied down to using their systems? Even then, it's not very obvious how to get things to work.

This is the fault of two things: the API is not well designed, and the documentation does not describe it well. A simple REST API with JSON would solve this issue, provided there was also decent documentation. There should be documentation and API wrappers for at least the most popular languages out there. I shouldn't have to resort to digging through Python code to find out how to authenticate myself.

Oh, and if you offer an API Key, make it work with everything, and make it secure.

Consistency

Google is in a state of transition right now. It's very obvious if you ever talk to a Google engineer that the teams at Google are in a state of unhealthy competition. The BigQuery folks don't like the Cloud SQL folks, and the Compute Engine folks are at war with the AppEngine folks. This is very obvious if you look at the design of each system. The "Cloud Console" doesn't have everything in it yet. You have to go out to the old consoles a lot to get the full functionality you want (like certain APIs, or certain features of BigQuery).

Yes, this will change, but it is a symptom of a larger problem at Google. There is a lot of internal politics going on that bleeds out and hits the consumer. It's unfortunate that the boys at Google don't know how to cooperate and realize that when one team succeeds, the entire company does (just look at how long it took for Android to adopt Chrome as the default browser).

With Material Design being released, hopefully this will change soon. This is simply a growing pain. Google will solve it eventually, but for now we'll have to wait for them to resolve their internal disputes before we, as consumers, get a good experience.

Quantity of Services

The quantity of services available at Google is pitiful. Google is very single-minded, and they have specific tools designed around an exact workflow. AppEngine is designed around very specific workflows (web applications and background processes). BigQuery is designed around a very specific process (a write-only datastore). What if you want to queue messages? What if you want to store versioned files? What if you need to store petabytes of metadata in an SQL-like environment? How about DNS with location-based routing?

For those solutions, you're on your own. Yes, there are open-source solutions you can run, and you can even run them on Google Cloud, but Google doesn't really give you any simple way to do so.

Shiny features, not hard-level power

In conclusion, there are a lot of shiny features that Google Cloud offers (such as hot-VM Migrations), but this is not enough. They did not focus enough on the core requirements, just the differentiating factors. Yes, there are some very nice features of Google Cloud, but it is not a complete solution. There is still too much missing to make it a real competitor to AWS.

What does Google need to do?

To get me to switch, there are a few things Google needs:
  • Easier access (Signed API Keys)
  • Granular Access Control (by API Key, by service, by access type)
  • Better documentation
  • Better API Wrappers (Node.js, Python, Go)
  • More services, or easy-access to open-source alternatives such as:
    • Redis
    • Memcache
    • RabbitMQ
  • More tutorials, with different use-cases, such as:
    • Video Upload/Encoding
    • Translation
    • Image Manipulation
    • Background Processing of text files
    • Google+ Sentiment analysis
    • etc...
And don't assume everyone is using Java. Do examples in other languages, even in Go, to show off how nice a language it really can be.

My Plea

Please Google, keep on innovating, and make the developer tools better. BigQuery is an awesome tool, but accessing it is not.

I am not an AWS fanboy; it's just the only service that works. I would gladly use Google Cloud if it offered a competitive alternative. It just doesn't right now.

Wednesday, April 23, 2014

CloudSearch vs Algolia - Battle of the Search engines

It's pretty clear the new CloudSearch system isn't heading in the right direction; in fact, it's starting to lose some of its important features and is instead becoming a commodity. In addition to doing a lot of digging on the new CloudSearch to see if it had any redeeming factors, I also started looking around for alternatives. One in particular stood out as having a lot of potential: Algolia. Silly name, but amazing product.

Let's break it down.

Indexing

CloudSearch v1 allows you to send arbitrary JSON fields to it even if they aren't in the index. However, anything not configured won't be searchable until you do configure it. Still, every field can be multi-valued, and every field can be added to the index later if needed. CloudSearch v2 does not let you send extra fields, and instead throws an error and refuses to index any document with extra fields. That means that if you want to start searching on another field, you have to re-submit all your documents after adding it to the index.

Algolia, on the other hand, accepts any arbitrary JSON, including nested data. Yes, you can send something like this to Algolia and it will just figure it out:

{
  "author": {
     "first_name": "Chris",
     "last_name": "Moyer",
     "id": 12345
  },
  "title": "Building Applications in the Cloud",
  "toc": [
     {
        "number": 1,
        "title": "..."
     }
  ]
}

The previous example would index "author.first_name", "author.last_name", "author.id", "title", "toc.number", and "toc.title". You can even go multiple levels deep and it just works.

All of this without having to pre-configure an index. Yes, you can choose fields and how things are indexed, but you don't have to do so. It tries to figure everything out for you automatically, and it does a pretty good job.
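
Pushing a record like the one above takes only a few lines with their Python client; here's a minimal sketch (the app ID, API key, index name, and chapter title are placeholders, and call names may vary by client version):

from algoliasearch import algoliasearch

client = algoliasearch.Client("YourApplicationID", "YourAdminAPIKey")
index = client.init_index("books")

# Nested JSON goes in as-is; no schema or index configuration needed first.
index.add_object({
    "author": {"first_name": "Chris", "last_name": "Moyer", "id": 12345},
    "title": "Building Applications in the Cloud",
    "toc": [{"number": 1, "title": "Chapter One"}],
})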

Winner: Algolia


Searching

Both versions of CloudSearch allow complex boolean queries. CloudSearch v1 allows for a Prolog-like prefix format of searching:

   (and (or author:'Chris' title:'Building Applications*') content:'Cloud Computing' (not 'Weather'))

This lets you combine clauses into some very complex logic, and gives you full power to search full-text throughout your records. You can also do simple text-based searches, and define which fields those text-based searches use by default. You can combine using wildcards or full words, as well as phrases grouped by quotes ("this is a phrase"). With v2, this syntax changes slightly, but it still allows you to do some very complex querying, and even adds location-based searching (lat-lon).
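
For example, the query above translates into the v2 structured syntax as something roughly like this:

   (and (or (term field=author 'Chris') (prefix field=title 'Building Applications')) (term field=content 'Cloud Computing') (not (term field=content 'Weather')))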

Algolia does not allow for complex boolean query searching. You can make full-text searches, and filter against facets. You cannot group searches the way you can in CloudSearch. You cannot do negation searches. You CAN do some OR logic with facet filters, but nothing nearly as complex as CloudSearch offers.

Algolia also allows you to search multiple indexes at once; with CloudSearch you do not get that option.

Both systems offer Lat-Lon Searching, Faceting, and Numeric Range filtering. Both systems also return results relatively fast (within a few milliseconds).

Winner: CloudSearch


Analytics

CloudSearch added support for Search Analytics. These analytics come in three different reports, Search Count, Top Searches, and Top Documents.

The most interesting one is Search Count:

All reports are also downloadable as a CSV, which can be used for further analysis. Most of the data is very raw and not very useful right out of the console.

Algolia, on the other hand, provides a weekly email that shows many more stats, and the console for their system includes quite a few bits of Eye Candy.


They also provide a nice dashboard which contains a lot of useful performance stats, as well as a general "health" of your indexes:


There's also a full set of stats available on each index including the number of operations, searches, and records, all by time series.

Winner: Algolia


Setup/Administration

CloudSearch requires quite a bit of initial setup. You have to provision your domain, initialize some indexes, and then wait about 30 minutes for each domain to be created. You also have to configure which IP addresses can access the domains. This is quite contrary to other Amazon Web Services; CloudSearch does not support IAM credentials at all.

Algolia, on the other hand, does support access tokens, and even supports setting up custom credentials with varying levels of permissions on different indexes. It does not allow you to edit the permissions after the credentials are generated, but you can always revoke credentials and send out new ones. As for setup? There is almost none. You can create a new index in seconds; you don't need to start with anything. You can even do so from the API by just sending documents, and then configuring a few things like default sort order, facets, and default search fields.

Additionally, when you change an index in Algolia, it happens nearly instantaneously. With CloudSearch you have to re-issue an "Index Documents" request, which temporarily puts your domain in a partially-working state (searches might return outdated results), and takes anywhere from a few minutes to a few hours. It also costs you.

Algolia lets you clear an index instantly, and those records are gone immediately. This makes resetting an index very simple. With CloudSearch, you have to remove each document individually, and then issue a new Index Documents request to get the size of your domains down again.
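
As a quick illustration, wiping an index with their Python client is a one-liner (the credentials and index name are placeholders):

from algoliasearch import algoliasearch

client = algoliasearch.Client("YourApplicationID", "YourAdminAPIKey")

# Deletes every record immediately; the index settings (facets, ranking) remain.
client.init_index("books").clear_index()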

Winner: Algolia



Pricing

CloudSearch v1 was entirely based on the A9 search system. It was built to run on large servers and designed around speed of search results. It works very well, but requires a lot of resources, and thus is costly. You also can't tell ahead of time how much storage you'll need, and there's very little transparency into how much you're using. Domains automatically scale, and you don't have much control over it.

CloudSearch v2 is based on a different system and does significantly reduce the costs; however, it is still expensive, and it doesn't really let you know how much storage you're using. You can give the domain hints about how large a domain you want to start with, but you don't get any control over where it goes from there.

With CloudSearch, all you can ever see is how many documents are in the domain, and how many servers of what size are being used. It automatically scales with you, but the cost is very high.

With Algolia, you pick a plan. Up until you decide to go with an Enterprise plan, you're paying for the number of documents in your index. About $450/month gets you 5 million documents. For me, that's about an XXLarge domain on CloudSearch, which is about $800/month on AWS, plus indexing costs. Want to make sure it's reliable? Then you have to turn on Multi-AZ, doubling the cost to $1600/month. CloudSearch v2 has been known to reduce sizes by up to 50%, but even at that, with Multi-AZ enabled you're looking at about $800/month. Plus you pay for batch uploads and Index Documents requests if you need to run those.

Algolia also shows you right off the top how much of your quota you're using, and you can easily remove documents. When you remove a document it's gone right away; you don't have to fiddle about trying to get your domains to scale down in size. If you want to go Enterprise, you pay by storage size, but you can get a 150GB index, mirrored onto 3 dedicated servers, for about $1,750/month. In my case, that will fit about 30 million records pretty easily; the same data currently costs us about $6k/month on CloudSearch. That's a pretty big difference.


Winner: Algolia


Conclusion

In total, that brings Algolia to 4 wins, with CloudSearch at only 1. Still, that one win is on search capabilities themselves. Algolia was designed around making things fast while requiring very few resources. They're slick, powerful, and new. They have a long way to go, but they're already beating CloudSearch. For most of my needs, Algolia wins easily, even without the complex querying capabilities.

If for nothing other than cost alone, Algolia is vastly better than CloudSearch. The team is small, but the product is solid, and I can't wait to see where it goes next.

Have you worked with Search as a Service solutions? What other systems have you found useful?

Wednesday, April 2, 2014

Is the new Amazon CloudSearch April Fools Joke?


Anyone who knows me knows I'm a pretty big fan of AWS. I come out in defense of them more often than not, and my Twitter feed is always buzzing with how much better they (typically) are than other wanna-be cloud providers. I tend to love any new service they come out with, and I try just about everything they make available to me.

I don't usually rant about AWS. I use them in my everyday life. They've built an amazing number of services. My favorite quote (and many of your favorites as well):

AWS is the Python of the CloudSpace. Everything is in the STDLib… #AWSSummit

So when I do rant, it's because they've done something really out of character. In this case, it's something so bad I think they need to re-think even deploying this until it's fixed.

The "New" CloudSearch

Not too long ago, Amazon released a new version of CloudSearch with a lot of anticipated features, some of which are very enticing, such as:


  • Geographic Search (Lat-Lon)
  • Search Highlighting
  • Multi-AZ
  • Autocomplete
Unfortunately, some features are implemented very poorly, and so much of it is a step backwards that I have to wonder if this was some sort of early April Fools' joke. This "new" CloudSearch feels more like a pre-beta version, taking several leaps in the opposite direction of progress.

Hey Amazon, this is a joke right?

A long-awaited step was support for multiple languages; however, they also removed your ability to specify the language of a document. Instead, they suggest that what you really wanted was to specify the language of each field. What..


To upload your data to a 2013-01-01 domain, you need to:
Omit the version and lang attributes from your document batches. You can use cs-import-documents to convert 2011-02-01 SDF batches to the 2013-01-01 format.
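
In practice, that means a document batch that used to look like the first form below now has to look like the second (the field names and values are just examples):

[{"type": "add", "id": "doc-1", "version": 2, "lang": "en",
  "fields": {"title": "Building Applications in the Cloud", "author": "Chris Moyer"}}]

[{"type": "add", "id": "doc-1",
  "fields": {"title": "Building Applications in the Cloud", "author": "Chris Moyer"}}]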

Another very important feature was the ability to upload documents with more fields than you need indexed initially. This was incredibly useful because your backend could simply dump all of its objects to an SDF and upload them to the domain; then, if you wanted to add a new field in the future, you didn't need to re-upload all of your documents. This, too, has been removed:
Make sure all of the document fields correspond to index fields configured for your domain. Unrecognized fields are no longer ignored, they will generate an error.


What's worse, they added support for indexing from DynamoDB, but if you don't put every single field directly in your domain, you have to hand-edit the SDFs or put everything into your CloudSearch domain. The DynamoDB upload is also not a pipeline; it only helps with the initial import.

The need to specify the exact document format you're going to upload is really intense, too. Before, any field could be multi-valued; now you have to explicitly tell it whether a field is multi-valued or not, and if you want to use something like a "suggester", it only works with single-valued fields.

You also can't make a single field that maps in multiple source fields unless each of those source fields is included in the index itself. There is no more merging multiple source fields into a single field to save on space.

Not just a new API, a whole new (incompatible) system

Perhaps worst of all, the new version of CloudSearch is entirely incompatible with the old version. This means that if you want to try out any of the new features, you basically have to start all over and re-design all of your systems, as well as re-create and re-upload your existing indexed data. Amazon provides no automated tools to do so either; you're pretty much on your own.

If you are an existing user of CloudSearch, you won't want to switch to this new system. It's not nearly as advanced as your existing implementation. You'll be missing quite a bit of functionality. If you're just starting out, you might not notice, and you'll probably be happy with some of the different (not new) features they're providing, such as pre-scaling and multi-AZ support.

Hopefully this is not indicative of the direction that CloudSearch, and Amazon in general, is moving in. This is the first time they've released a new product that has completely frustrated me, to the point of wondering what they're thinking. This is not the path forward; this is a complete re-work of an existing system for a specific use case, not a general need.

Friday, December 6, 2013

Half-pops: The "real" corn-based-nut

Recently a friend of mine introduced me to Corn Nuts. After scouring the stores looking for them and finally locating them at a CVS about 15 miles away, I tried them. They were OK, but they didn't have that intense crunch I'd grown to love from eating the little half-popped kernels of corn you get when you make a big batch of popcorn.

I love popcorn, and I've even gone so far as to find my perfect type of popcorn, the softest and smallest I've found, Ladyfinger Popcorn. I particularly love the kind from Wabash Farms, on Amazon.

I also discovered HalfPops, which appeared to be almost exactly what I was looking for, but with no local store selling them and a hefty sum required to get them shipped to me, I set out to find a way to make them on my own. I'd already found that making Corn Nuts was as simple as deep-frying hominy, so these Half-Pops (sometimes called "Old Maids") couldn't be that hard to make.

HalfPops.com would like you to think it's an incredibly complex process; it's not.

First, let's take a look at what we're aiming for:

This is what I wanted to make, but not just a few at the cost of a lot of wasted popcorn; I wanted a whole bunch of them, something like this:

That's a whole lot of half-popped popcorn! Because I used ladyfinger popcorn, they also aren't incredibly hard, but they still are quite crunchy. Enough that they could probably break your teeth if you're not too careful, and you might occasionally still get a few that are just too hard to eat, so make these at your own risk... but oh are they incredibly delicious.

So what's the secret? It's much simpler than you might think. The issue with popcorn is that the coating on the outside holds all of the moisture inside until the kernel finally explodes, so the key is to remove some of that coating before you pop the kernels (this also has the added effect of "softening" the hardness of the kernels). There are a lot of ways you could try to do this, but the easiest I've found is soaking them in water.

What you need:

You'll only need a few simple household items to make these, plus the ladyfinger popcorn. It might work with other varieties, but I've tried mushroom popcorn with no success at all, so steer clear of that. Here is what I used:

The process is quite simple, although it does take a while due to the soaking. Take a lot of popcorn, say about a cup at least, and add it to an air-tight container. Add enough water to cover it. Some of the kernels will float; don't worry about that, eventually you'll stir it around and they will mostly all sink. Add about the same amount of salt you would use for brining meat, so if you've got a cup of kernels, you should have about a cup of water and a quarter cup of salt. Seal, then shake the container (like the old shake-and-bake bags) until all the salt, kernels, and water are mixed.

Then comes the long part: let it sit in a cool area (but not the fridge!) for at least 4 days, occasionally shaking it up to make sure the salt doesn't settle too much.

After the 4 days, you're ready to start. First, spin up your popcorn kettle (you can also try this with a traditional pot, but it might be tough to keep the kernels from burning at the bottom). You'll need to add a good quantity of popping oil, about 3-4x what you'd normally add. I usually squeeze in about 4-5 tablespoons, but you can experiment with more depending on how much popcorn you're using. You need to add enough that the tops of the kernels will be covered with oil.

Next, turn the kettle on. While it is warming up a bit, drain the popcorn that's been soaking for at least 4 days. DO NOT dry it with a paper towel. You don't need to get all the water out; you just don't want to dump in a whole ton of it.

You don't need to wait for the kettle to completely come up to temperature like with regular popcorn; in fact, that seems to harm the process. By the time you're done draining the popcorn, it's time to add it to the kettle. Add about 3x what you normally would, as long as all the kernels are covered with oil. It's OK (and even good) if a little extra water gets into the pot.

Wait for the kernels to start popping (slightly) and then listen closely. The kernels won't pop out of the kettle, so you have to be very careful to watch that they don't burn. It will start slowly, then pick up, and then start to slow down again. It takes a little longer than regular popcorn, so just be patient. If you start to see kernels popping out of the top, then you've left it in too long, or didn't soak it long enough.

Once the popping starts to really slow down, about a second between each pop, turn off the kettle and let it sit for at least a minute. This is very important as the oil is still very hot. If you dump the kernels out as they are, you'll end up burning yourself, and probably melt any plastic bowl you might be putting them into (I did this the first time I made it!).

After a minute or so off the heat, dump the kernels out into a bowl and add whatever seasoning you want. If you added enough oil, there will still be quite a bit left with the kernels, so you shouldn't need to add any butter to get the seasonings to stick. Personally, I like mine hot, so I always add this:


You also probably want to add some salt. I typically also add Gourmet Fries Seasonings, Salt and Vinegar. The combination of these two adds quite an amazing little punch to these things.

Make sure you still let them breathe a bit, and you can even dry them off with a paper towel like you would fries.

They aren't quite the same as the half-pops; in fact, they're usually a little less popped. Still, they're the perfect snack, and it doesn't take a whole lot of new equipment to make them!

What do you like adding to your half-pops?

Enjoy!


Tuesday, August 27, 2013

Alfred 2: Workflows Review

Recently, a co-worker of mine discovered Alfred. (They have an app in the App Store as well, but version 2 is only available on their website.) The second iteration of this application has introduced a pro version which includes a powerful new feature, Workflows. While there are many pre-built workflows, the most powerful feature I've discovered is the ability to create your own workflow, which can be incredibly advanced, calling PHP, Python, or any generic script to populate the list of choices in Alfred, and even execute custom actions.

Before starting with Workflows, I suggest you download the very powerful and useful Workflow Searcher. It ties into the API behind AlfredWorkflow.com and lets you very easily download and install custom workflows right from Alfred.


Example: Searching for Spotify workflows

While I've enjoyed many workflows built by others, I also wanted to contribute by making some of my own. So far I've managed to make a custom workflow that searches our internal systems, and lets me access any record within our database very easily.

My next challenge was re-building the list_instances command from boto and combining it into an SSH script. The Python alp library has proven exceptionally useful here. I've chosen to bundle this library with my workflows, since the version installed via pip or easy_install is too old and doesn't contain the latest functionality. This also means that in order to install the workflow, you don't need to have alp, just boto itself installed and configured properly.
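
The script filter behind it boils down to something like this (a trimmed sketch; the alp feedback calls and instance fields are approximate, and the region and SSH user are placeholders):

import alp
import boto.ec2

items = []
conn = boto.ec2.connect_to_region("us-east-1")
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        name = instance.tags.get("Name", instance.id)
        # Each Item becomes one row in Alfred's result list; "arg" is what gets
        # copied to the clipboard when the row is selected.
        items.append(alp.Item(
            title=name,
            subtitle=instance.private_ip_address or "",
            arg="ssh ec2-user@%s" % instance.private_ip_address,
            valid=True,
        ))

# Alfred reads this XML feedback from stdout to build the result list.
alp.feedback(items)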

The result is something like this:

Selecting one of the items automatically copies the appropriate SSH commands to your clipboard.

I plan to improve this workflow to automatically perform the SSH commands in the future; however, I've currently set it up to also allow me to SSH into instances behind a VPC. This is done by first running "ssh vpc" and then SSHing directly into the instance's private IP. This means two commands are copied to your clipboard, which I haven't yet found a good way to execute via a terminal directly.

You can find the full workflow here. Or a direct download link here.

Other boto commands to come… let me know if you'd like to help, or have any ideas, by posting issues to the workflow on GitHub!


Monday, August 12, 2013

Backing up your RDS instance to Dropbox using Sequel Pro

After searching around for a third-party backup solution for RDS, and realizing that many companies just want to charge way too much for something so simple, I decided it would be much easier to simply back up our SQL databases ourselves. While this process is manual, you could easily use a script to automate the backups (calling mysqldump, or just writing a simple Python script to create the dump file and then upload it to Dropbox).
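
If you did want to automate it, a cron-able Python script really only needs two pieces: a mysqldump call and an upload. Here's a rough sketch (the host, credentials, paths, and Dropbox access token are all placeholders, using the old put_file-style Dropbox SDK):

import subprocess
import dropbox

DUMP_FILE = "/tmp/mydb.sql.gz"

# Dump the RDS database and gzip it in one shot.
with open(DUMP_FILE, "wb") as out:
    dump = subprocess.Popen(
        ["mysqldump", "-h", "mydb.abc123.us-east-1.rds.amazonaws.com",
         "-u", "backup_user", "-pSECRET", "mydb"],
        stdout=subprocess.PIPE)
    subprocess.check_call(["gzip", "-c"], stdin=dump.stdout, stdout=out)
    dump.wait()

# Push the compressed dump up to Dropbox.
client = dropbox.client.DropboxClient("YOUR_ACCESS_TOKEN")
with open(DUMP_FILE, "rb") as f:
    client.put_file("/backups/mydb.sql.gz", f, overwrite=True)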

For our purposes, we wanted to take snapshots of the databases after a certain task was completed, so it's easiest to just take manual snapshots. Fortunately, Sequel Pro makes backing up databases so simple anyone can do it.

Here's how.

Step 1: Download Sequel Pro:


It's free, and available here: http://www.sequelpro.com

Yes, it's Mac only.

Step 2: Authorize your IP

Make sure your IP is authorized to access your RDS server. You can do this pretty easily from the AWS Management Console. Click on Security Groups:

Then choose your security group; most likely you just have the default one. Click the icon next to the checkbox to dive deeper into the group:


Lastly, enter your IP followed by /32 if it doesn't already appear in your list, and hit add.



Note that it can take a few minutes for your IP to be authorized.

Step 3: Connect to your server





It's pretty simple to connect to your server: just fill in the host, username, and password. Make sure to select "Connect using SSL". You can save this connection for later by pushing the plus button in the bottom left and naming the connection.


Step 4: Select the database you want to export

Make sure you select the database you want to export from the menu located at the top of the Sequel Pro window:


Step 5: Choose "Export" from the File menu





This will bring up a dialog box that lets you choose what to export. Make sure to choose "SQL", and you'll probably want to select a compression method. You can access this option by clicking on the arrow at the bottom of the dialog box.


Make sure you choose the path you want to save to. Since we've got a large Dropbox account, that's the best place for us. It's easy to export the compressed BZ2 (which takes longer, but is much more compact) right to Dropbox and have it automatically backed up to an external system.

Recovery

Recovering from this backup is just as easy. All you have to do is choose "Import" instead of "Export" in Step 5. This will let you choose the location of the file to import, and your database will be replaced with the contents of that backup.


