One of the most exciting and underrated announcements to come out of AWS re:Invent 2020 is also one of the most foundational: GP3 EBS volumes are here! Why should anyone care? It turns out that they can have a dramatic effect on how you optimize cost in AWS...
EBS is a key foundational service for AWS adoption
One of the most versatile AWS storage types is the Elastic Block Store (EBS) Volume, the AWS equivalent of SAN (Storage Area Network) storage. EBS is the most common (and sometimes only...) method to add storage to an EC2 Compute Instance (the AWS equivalent of a virtual machine). EBS is so foundational and ubiquitous as a service that it is typically the second-largest expense on the monthly cloud bill for new adopters, behind EC2, the compute instance itself.
Within EBS, there are several different storage types that vary in performance, scale, and use case, with General Purpose SSD (GP2) as the typical default for many deployments. GP2 is SSD-based storage with a burst feature…a way to absorb "spiky" reads and writes by drawing down a balance of I/O credits, which this article loosely calls the buffer or burst cache. GP2 is simple to deploy, highly reliable, simple to manage, and works for a wide variety of typical use cases.
GP2 is Good, but Not Perfect
All is not perfect with GP2, as it also has some important limitations:
- GP2 is significantly more expensive per GB than disk-based technologies
- Performance drops dramatically once the buffer cache is filled
- Performance can only be scaled by adding disk space to the volume
- Performance scale is limited by the EC2 Instance type
Three Key Limitations of EBS GP2
- Performance drops dramatically, once the buffer cache is filled
Usage of the burst cache is measured in AWS CloudWatch Metrics through the burst balance metric, which shows how often an instance "leans" into the buffer cache to maintain disk performance. This is one of the first metrics that we measure in Cloud Cost Optimization services, because it is such an important measurement of actual usage and user experience. To highlight how dramatically the burst balance can affect the sustained performance of an application, consider a 20GB GP2 volume: it can burst to 3,000 IOPS, but once the burst balance is exhausted, it drops to 100 IOPS of sustained performance. After about 30 minutes of sustained 3,000 IOPS performance, the burst balance would be exhausted and sustained performance would drop to only about 3% of the "norm". Not 3% less performance, 3% of "typical" performance...
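The arithmetic behind that 20GB example can be sketched in a few lines. This is a rough model, not AWS's own accounting: it assumes a full burst bucket of 5.4 million I/O credits (a figure from the AWS documentation, not from this article) that drains at the difference between demand and baseline. The function name is made up for illustration.

```python
# Rough model of GP2 burst exhaustion.
# Assumption: a full burst bucket holds 5.4 million I/O credits (per AWS docs).
GP2_CREDIT_BUCKET = 5_400_000


def minutes_until_burst_exhausted(baseline_iops, demand_iops):
    """Minutes a full bucket lasts while demand stays above the baseline."""
    drain_per_second = demand_iops - baseline_iops
    if drain_per_second <= 0:
        # Demand at or below baseline never drains the bucket.
        return float("inf")
    return GP2_CREDIT_BUCKET / drain_per_second / 60


# The 20GB volume from the text: 100 IOPS baseline, 3,000 IOPS burst.
minutes = minutes_until_burst_exhausted(baseline_iops=100, demand_iops=3_000)
remaining_fraction = 100 / 3_000

print(f"Burst lasts about {minutes:.0f} minutes")
print(f"Sustained performance afterwards: {remaining_fraction:.1%} of burst")
```

Under these assumptions the bucket lasts roughly 31 minutes, which lines up with the "30 minutes" and "3%" figures above.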
- Performance can only be scaled by allocating additional disk space to the volume
In GP2, the total performance that the volume can provide is a function of the total allocation of the volume provisioned (in GB). The allocation directly affects baseline performance, while using the earlier discussed burst balance (cache) to "smooth" out performance spikes. An example table highlights this:
Table: GP2 Volume Size vs. Sustained IOPS performance
Note that the burst performance doesn't change across the volume allocation spectrum. What do you do if you need 1,500 baseline IOPS on a 100GB volume? Because GP2 baseline performance scales at 3 IOPS per GiB, you sadly provision 500 GiB and let 400 GiB go to waste…
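The sizing rule can be written out as a small calculator. It is a sketch that assumes the GP2 limits published by AWS (3 IOPS per GiB, a 100 IOPS floor, a 16,000 IOPS ceiling); the function names are illustrative only.

```python
import math

# Assumed GP2 limits, taken from AWS documentation rather than this article.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib):
    """Baseline (sustained) IOPS for a GP2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_for_iops(target_iops):
    """Smallest GP2 volume (GiB) whose baseline meets the target IOPS."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("GP2 cannot sustain more than 16,000 IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # every GP2 volume gets the 100 IOPS floor
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


# The example from the text: 100 GiB of data, 1,500 baseline IOPS required.
data_gib = 100
provisioned_gib = max(data_gib, gp2_size_for_iops(1_500))
wasted_gib = provisioned_gib - data_gib

print(f"Provision {provisioned_gib} GiB, of which {wasted_gib} GiB is unused")
```

With those assumed limits, the script arrives at the same 500 GiB volume and 400 GiB of unused space described above.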
- Performance scale is limited by the EC2 Instance type
Not all EC2 instances can communicate with the EBS service at the EBS volume's full potential. In fact, unless you are operating on EBS-Optimized instances, the EBS volume traffic will contend with traditional network traffic over the same network interface card. In the case of a t2.micro, network performance can be constrained to between 50 Mbit/s and 300 Mbit/s…for both storage AND network traffic. In these situations, the end user ends up moving to a higher-cost EC2 instance, compounding monthly waste. Only EBS-optimized instances carve out dedicated bandwidth for EBS volumes, under a separate network path.
All Hail GP3
GP3 was designed to address all three of these problems…at a 20% lower cost than GP2. Let's look at how…
- Performance is now scalable to 16,000 IOPS (instead of a 3,000 IOPS burst)
While the default performance is unchanged at 3,000 IOPS, it is now a sustained baseline rather than a burst, and users can provision additional performance up to 16,000 IOPS for an additional fee.
- Volume Performance is now configurable independent of volume size!
Volume performance can now be configured from 3,000 IOPS (note: this is base, not burstable like GP2) up to 16,000 IOPS! Throughput can be configured from 125 MiB/s to 1,000 MiB/s, independent of volume size.
- GP3 volumes are not dependent on the EC2 instance for network performance!
GP3 volumes now communicate over a separate network path, instead of sharing the allocated network bandwidth.
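To put size-independent performance into numbers, here is a rough monthly cost comparison for the earlier case of 100 GiB of data that needs 1,500 IOPS. Every price below is an assumption (approximate us-east-1 list prices around the GP3 launch); check current pricing for your region before relying on it. The function names are illustrative, and other AWS limits are not modeled.

```python
# ASSUMED prices in USD per month; they vary by region and change over time.
GP2_PER_GIB = 0.10
GP3_PER_GIB = 0.08
GP3_PER_EXTRA_IOPS = 0.005   # per provisioned IOPS above the 3,000 baseline
GP3_PER_EXTRA_MIBPS = 0.04   # per provisioned MiB/s above the 125 baseline


def gp2_monthly_cost(size_gib):
    """GP2 bills on size only; performance comes bundled with capacity."""
    return size_gib * GP2_PER_GIB


def gp3_monthly_cost(size_gib, iops=3_000, throughput_mibps=125):
    """GP3 bills size, extra IOPS, and extra throughput separately."""
    if not 3_000 <= iops <= 16_000:
        raise ValueError("GP3 IOPS must be between 3,000 and 16,000")
    if not 125 <= throughput_mibps <= 1_000:
        raise ValueError("GP3 throughput must be between 125 and 1,000 MiB/s")
    return (
        size_gib * GP3_PER_GIB
        + (iops - 3_000) * GP3_PER_EXTRA_IOPS
        + (throughput_mibps - 125) * GP3_PER_EXTRA_MIBPS
    )


# GP2 needs 500 GiB to sustain 1,500 IOPS; GP3 covers it at 100 GiB.
gp2_cost = gp2_monthly_cost(500)
gp3_cost = gp3_monthly_cost(100)

print(f"GP2: ${gp2_cost:.2f}/month, GP3: ${gp3_cost:.2f}/month")
```

Under these assumed prices, the saving comes mostly from no longer buying capacity just to get performance, with the lower per-GiB rate on top.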
A Great Deal Gets Even Better
Existing volumes can be migrated in place through an API call without downtime or disruption! This has created a no-risk scenario with no apparent downsides. Additionally, all snapshot capabilities are fully supported under existing lifecycle management policies. Did I mention that it is 20% lower in cost than GP2? This is proof that a few great things did come out of 2020. For the vast majority of workloads that were either over-provisioned for performance or migrated to IO2 (the most performant, and most expensive, tier of EBS), GP3 is now a lower-cost, no-compromise solution.
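As an illustration of that API call, the sketch below wraps the EC2 `modify_volume` operation as exposed by boto3. The helper names and the volume ID are hypothetical, and the range checks simply mirror the GP3 limits quoted above; treat it as a starting point rather than a tested migration tool.

```python
def build_gp3_request(volume_id, iops=3_000, throughput_mibps=125):
    """Build the keyword arguments for an EC2 modify_volume call."""
    if not 3_000 <= iops <= 16_000:
        raise ValueError("GP3 IOPS must be between 3,000 and 16,000")
    if not 125 <= throughput_mibps <= 1_000:
        raise ValueError("GP3 throughput must be between 125 and 1,000 MiB/s")
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mibps,
    }


def migrate_to_gp3(ec2_client, volume_id, iops=3_000, throughput_mibps=125):
    """Ask EC2 to convert an existing volume to GP3 in place."""
    request = build_gp3_request(volume_id, iops, throughput_mibps)
    return ec2_client.modify_volume(**request)


# Usage (requires boto3 and AWS credentials; the volume ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   migrate_to_gp3(ec2, "vol-0123456789abcdef0")
```

Passing the client in as an argument keeps the helper easy to dry-run against a stub before pointing it at a real account.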
Of course, a few of the highest-performance use cases will still need more than 16,000 IOPS, and they can stay on IO2…but for everyone else, GP3 is here, and it is an important new tool for Cloud Cost Optimization.