Help! My CPU Cycles are being Stolen!

A lot of people have had questions about EC2, specifically about a strange statistic that shows up when you log in to an instance: "% Stolen CPU".

So what does that really mean? Why are my cycles being stolen?

To understand what's going on, you first need to know that Amazon doesn't buy the same CPUs all the time. In fact, they rely on commodity hardware, often buying different types of CPUs every time they expand or replace systems. Datacenters, even within just us-east-1, run all sorts of different underlying hardware. Amazon spreads instances out among these different CPU types, which not only helps prevent a single CPU defect from affecting large numbers of instances, but also makes it very interesting to keep performance constant. In the end, however, each EC2 instance of a given type must produce roughly the same benchmarks, so Amazon uses virtualization to limit your CPU usage no matter what physical hardware your instance is actually running on.

In a VM such as an EC2 instance, your VM is not on a dedicated CPU. Amazon buys larger CPUs and then runs multiple instances on a single CPU. In order to bump your CPU down from these larger CPUs to one EC2 Compute Unit (One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon processor referenced in our original documentation), the virtualization system will "steal" part of that CPU whenever you try to use more than you are provisioned. This "stolen" CPU simply means that even though the CPU you're on has more capacity, you aren't allowed to use it. This can make CPU monitoring very confusing to look at, but in general it just means you're over-utilizing your CPU.
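If you want to check the number yourself rather than trust a monitoring tool, here is a minimal sketch in Python. It assumes a Linux guest, where the first line of /proc/stat holds cumulative CPU counters and the eighth counter is steal time. The function names (parse_cpu_line, steal_percent, read_cpu_line) are made up for this example and are not part of any Amazon tool.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time counted as steal between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    if len(first) < 8 or len(second) < 8:
        # Very old kernels do not report a steal counter at all.
        return 0.0
    # Counter order: user nice system idle iowait irq softirq steal.
    # Guest counters (fields 9 and 10) are already included in user/nice,
    # so only the first eight are summed to avoid double counting.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

To use it, call read_cpu_line() twice a second or so apart and pass both results to steal_percent(). The counters are cumulative since boot, which is why two samples are needed: a single reading only tells you the average since the instance started.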

On most instance types, that Stolen CPU will be relatively constant (given relatively constant load). On Micro instances, however, that Stolen CPU will fluctuate rapidly, as you are granted "Burst Capacity".

So don't freak out when you log into your instance and it says you have 95% Stolen CPU. That's just Amazon balancing out how much CPU you were given. 

Stolen CPU cycles are actually a relatively high-overhead way of handling this issue of multiple different underlying hardware types. I've noted that the Amazon Linux AMI performs much better on all instance types, but especially on the Micro instances. While Ubuntu AMIs become almost impossible to log into on a t1.micro instance when you don't have burst capacity, the Amazon Linux AMI remains consistently accessible. If you're getting a lot of stolen CPU cycles and you can't figure out why, first try switching to the Amazon Linux AMI and see if that helps make things smoother. This AMI appears to be much better at handling stolen CPU cycles.
