Preface: when to use Aurora and when to use RDS
These can look quite similar. They’re both managed services, where you pay Amazon to manage and administer your database. They both let you spin up databases with a few clicks in the console. They both scale to dizzying heights, with terabytes of storage per database.
These similarities can make it hard to tell them apart.
Which raises the question: When should you use RDS and when should you use Aurora?
Amazon Aurora vs RDS: Head to Head
What is Amazon Aurora?
Aurora is an AWS database service that is compatible with MySQL and PostgreSQL but uses an innovative database engine behind the scenes. Applications that currently use MySQL / PostgreSQL can be migrated to Aurora with minor or no changes.
An Aurora database can store between 10 GB and 128 TB, with data divided into 10 GB blocks and distributed across different disks.
It offers easy scalability—storage is scaled automatically when the database grows, and to support more read requests, you can create up to 15 Read Replicas.
What is Amazon RDS?
Amazon Relational Database Service (RDS) is a managed database service that supports several popular database engines, including PostgreSQL, MySQL, SQL Server, MariaDB, and Oracle.
RDS can help you perform a wide range of database management tasks, including data backup and recovery, patching, and migrations.
It automatically backs up database instances, captures daily snapshots of data, preserves transaction logs, and enables point-in-time recovery.
Additionally, the service automatically patches database engine software.
To improve the reliability and availability of workloads, it enables the replication of databases with automated failover across several availability zones.
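As a rough sketch of what those managed features look like in practice, the snippet below assembles the arguments for boto3's `create_db_instance` call with automated backups, automatic minor-version patching, and a Multi-AZ standby switched on. The identifier, instance class, storage size, username, and retention period are placeholder assumptions rather than recommendations, and nothing touches AWS unless you call `create_instance` yourself with credentials configured.

```python
def rds_instance_params(identifier, engine="postgres", multi_az=True,
                        backup_retention_days=7):
    """Build keyword arguments for boto3's rds.create_db_instance.

    Every concrete value here is an illustrative placeholder.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.t3.medium",       # assumed instance size
        "AllocatedStorage": 100,                 # GiB, provisioned up front
        "MasterUsername": "dbadmin",             # placeholder username
        "ManageMasterUserPassword": True,        # RDS keeps the password in Secrets Manager
        "MultiAZ": multi_az,                     # standby in another AZ with automated failover
        "BackupRetentionPeriod": backup_retention_days,  # > 0 enables automated backups
        "AutoMinorVersionUpgrade": True,         # automatic minor-version patching
    }


def create_instance(params, region="us-east-1"):
    """Send the request to AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    rds = boto3.client("rds", region_name=region)
    return rds.create_db_instance(**params)


params = rds_instance_params("orders-db")
print(params["MultiAZ"], params["BackupRetentionPeriod"])
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test the configuration before anything is provisioned.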
Four approaches to database management
If you run a database in your own data center, you’ve probably hired database administrators.
Your DBAs run databases on servers that you own. They install software, apply maintenance and security patches, make regular backups, and so on. It’s a labor-intensive process—but without the cloud, it’s how you do things.
As a first step toward the cloud, you can run a database on EC2.
This is like running a database in your data center but with EC2 instances instead of servers. This is less and less common, but it’s useful if you need very fine-grained control over your database.
Amazon created RDS to reduce the management overhead of running a database in EC2.
You click a few buttons, and you get a database ready to store data. Amazon handles all the fiddly bits—provisioning instances, replication and backups, maintenance and updates. It runs a variety of database engines—including MySQL, PostgreSQL, MariaDB, and Oracle—so it works with your existing application code.
If you have databases you want to migrate to the cloud, RDS is a great start.
(If you are migrating, talk to your AWS account manager first. Amazon wants to help you move your on-prem databases into RDS, and your account manager may be able to provide credits or professional services to help you along the way. They also have the Database Migration Service, an underrated tool that does exactly what it says.)
But RDS still looks like a database running in a data center. What if you didn’t have to imitate an existing architecture?
Amazon created Aurora to be a cloud-first database.
Externally, it behaves like any other RDS database. It’s API-compatible with MySQL and PostgreSQL, and it’s meant to be a drop-in replacement. Amazon still handles all the fiddly work of managing the database—but under the hood, it’s quite different.
Rather than running the entire database on a fleet of EC2 instances, Aurora splits the compute and storage into different pieces. Storage is handled by a custom data layer, designed to take advantage of Amazon’s cloud infrastructure. This “secret sauce” has a number of benefits, so let’s dive in a bit deeper.
Aurora’s cloud-first architecture
Although Aurora is closed-source, Amazon is pretty open about how it works. You can find a variety of re:Invent sessions, documentation pages, and white papers about the Aurora architecture, and if you’re interested, do read further! Here, I’m just going to present a high-level overview.
In a traditional database cluster, you have one or more nodes (servers or EC2 instances): a read-write primary/writer (W), and read-only replicas (R). If you write to the primary node, that write gets synchronously replicated to the replica nodes. Different database engines use different consensus protocols for replication to ensure all the nodes have a consistent view of the data.
Each node is responsible for both compute and storage, so the entire database is contained in these nodes. If you want more durability or scale, you need to add more replicas, and that comes at a cost. More replicas mean more replication traffic, and at some point, network I/O becomes a bottleneck. It also takes time to modify the cluster, because new nodes have to replicate all the existing data before they can serve queries. To keep replication manageable, RDS limits you to five replicas.
In an Aurora cluster, there are different nodes for compute and storage.
Data is stored in a shared “cluster volume”, which spans six storage nodes and three availability zones. This gives you multi-AZ resilience, and it uses the same “pay for what you use” model as S3. Rather than provisioning storage upfront, the volume grows or shrinks to match your data, up to 128 TB—and your bill grows or shrinks to match.
Queries are handled by compute nodes, which all talk to the same cluster volume. You have the same primary/replica split, but replication is handled entirely within the storage nodes. There’s no synchronous replication between compute nodes, and they don’t hold any permanent state.
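To make the compute/storage split concrete, here is a minimal sketch of how an Aurora cluster is typically assembled with boto3: one `create_db_cluster` call for the cluster and its shared volume, then one `create_db_instance` call per compute node. The identifiers, instance class, and username are assumptions for illustration. Note that the cluster request has no storage size at all, while each instance request carries only compute settings.

```python
def aurora_cluster_plan(cluster_id, engine="aurora-postgresql", readers=2,
                        instance_class="db.r6g.large"):
    """Return (cluster_kwargs, [instance_kwargs, ...]) for boto3's
    rds.create_db_cluster and rds.create_db_instance.

    All concrete values are illustrative placeholders.
    """
    if not 0 <= readers <= 15:
        raise ValueError("Aurora allows at most 15 read replicas")

    cluster = {
        "DBClusterIdentifier": cluster_id,
        "Engine": engine,
        "MasterUsername": "dbadmin",        # placeholder username
        "ManageMasterUserPassword": True,   # password kept in Secrets Manager
        # No AllocatedStorage: the cluster volume grows on its own.
    }
    instances = [
        {
            "DBInstanceIdentifier": f"{cluster_id}-{n}",
            "DBClusterIdentifier": cluster_id,  # attaches the node to the shared volume
            "Engine": engine,
            "DBInstanceClass": instance_class,  # compute only
        }
        for n in range(readers + 1)  # the first instance created typically becomes the writer
    ]
    return cluster, instances


def create_cluster(cluster, instances, region="us-east-1"):
    """Send the requests to AWS. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the plan can be built without it
    rds = boto3.client("rds", region_name=region)
    rds.create_db_cluster(**cluster)
    for instance in instances:
        rds.create_db_instance(**instance)


cluster, instances = aurora_cluster_plan("shop", readers=2)
print(len(instances), "compute nodes sharing one cluster volume")
```

Adding a reader later is just one more `create_db_instance` call against the same `DBClusterIdentifier`; no data has to be copied to the new node first.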
This architecture has several benefits:
Aurora scales faster because it can add new read replicas quickly. Because replicas all use the shared storage volume, a new replica can serve queries almost immediately. It doesn’t have to wait to replicate data from the other nodes.
Aurora scales further because extra compute nodes are cheap. Aurora does some asynchronous cache replication between nodes, but nothing synchronous. This reduces the internode I/O, which means Aurora can have more replicas; where RDS allows just five, Aurora allows 15.
Aurora is more durable. You always have multiple copies of your data. Every Aurora cluster has six storage nodes, spread across three AZs, even if you have just one compute node. Even if an Availability Zone suffered a spontaneous existence failure, your data would be safe. In RDS, you have to max out your read replicas for this level of durability.
Aurora is more resilient. It recovers quickly from failures. If a compute node crashes, Aurora can start new read replicas with minimal lag, and if the writer fails, another replica can be promoted to take over without waiting for the other nodes to reach a consensus. All the shared state is in the storage nodes, so failed nodes can be replaced almost immediately.
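The durability claim can be checked with a little arithmetic. This sketch assumes the six copies are laid out two per Availability Zone, and it uses the quorum sizes from AWS's own description of Aurora storage (four of six copies to accept a write, three of six to serve a read). Those quorum numbers come from Amazon's documentation rather than from anything above, and the function names are my own.

```python
WRITE_QUORUM = 4  # copies that must acknowledge a write (per AWS's Aurora documentation)
READ_QUORUM = 3   # copies needed to serve a read


def surviving_copies(failed_azs=(), extra_failed_nodes=0,
                     azs=("a", "b", "c"), copies_per_az=2):
    """Count storage copies still alive after AZ and single-node failures."""
    alive = sum(copies_per_az for az in azs if az not in failed_azs)
    return max(alive - extra_failed_nodes, 0)


def status(copies):
    """Report whether the volume can still take writes and serve reads."""
    return {"writable": copies >= WRITE_QUORUM, "readable": copies >= READ_QUORUM}


print(status(surviving_copies()))                                      # all six copies healthy
print(status(surviving_copies(failed_azs=("a",))))                     # one whole AZ lost
print(status(surviving_copies(failed_azs=("a",), extra_failed_nodes=1)))  # an AZ plus one more node
```

Under these assumptions, losing an entire AZ leaves four copies, so the volume stays writable. Losing an AZ plus one more node leaves three, which is enough to read but not to write until the missing copies are repaired.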
These are all desirable properties. But in addition to this, Aurora’s architecture enables another approach that’s worth discussing separately.
Aurora’s storage design means you don’t have to provision storage in advance; it scales to meet demand. What if the same were true for compute?
Amazon Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application’s needs. It enables you to run your database in the cloud without managing any database capacity.
Manually managing database capacity can take up valuable time and can lead to inefficient use of database resources. With Aurora Serverless, you simply create a database endpoint, optionally specify the desired database capacity range, and connect your applications. You pay on a per-second basis for the database capacity you use when the database is active, and you can migrate between standard and serverless configurations with a few clicks in the Amazon RDS Management Console.
Aurora Serverless lets you run Aurora without having to guess how many compute nodes you need. It automatically starts and stops nodes to match the needs of your application. It scales up to meet a spike in demand and scales down when things are quiet. The data remains in the shared storage volume, independent of any scaling.
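Here is a sketch of what "specify the desired database capacity range" can look like in code. It assumes Aurora Serverless v2, where the range is expressed in Aurora capacity units (ACUs) through the `ServerlessV2ScalingConfiguration` argument of boto3's `create_db_cluster`, and the compute node uses the special `db.serverless` instance class. The identifiers, username, and ACU range are placeholders.

```python
def serverless_cluster_plan(cluster_id, min_acu=0.5, max_acu=16,
                            engine="aurora-postgresql"):
    """Return (cluster_kwargs, instance_kwargs) for an Aurora Serverless v2 setup.

    The capacity range and all names are illustrative placeholders.
    """
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")

    cluster = {
        "DBClusterIdentifier": cluster_id,
        "Engine": engine,
        "MasterUsername": "dbadmin",        # placeholder username
        "ManageMasterUserPassword": True,
        # Capacity floats between these bounds instead of being fixed.
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
        },
    }
    instance = {
        "DBInstanceIdentifier": f"{cluster_id}-1",
        "DBClusterIdentifier": cluster_id,
        "Engine": engine,
        "DBInstanceClass": "db.serverless",  # no fixed instance size
    }
    return cluster, instance


cluster, instance = serverless_cluster_plan("reports", min_acu=0.5, max_acu=8)
print(cluster["ServerlessV2ScalingConfiguration"])
```

The two dictionaries would be passed to `create_db_cluster` and `create_db_instance` on a boto3 RDS client. The upper bound doubles as a spending cap, which helps a little with the unpredictability of serverless billing.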
This is particularly useful if you have spiky, intermittent, or unpredictable workloads. You only pay for the compute capacity you use, so you could enjoy big savings if you have long idle periods. (But good luck predicting your spending in advance!)
If you have sustained, steady workloads, look at using regular Aurora or RDS instead.
Here are a few ways to compare the features of these competing services.
Amazon RDS improves I/O performance by using SSDs for database storage. You can choose between two SSD storage options:
- Optimized OLTP: storage optimized for I/O-intensive transactional workloads. Each database instance can support up to 30,000 IOPS.
- General-purpose: cost-effective storage offering 3 IOPS per stored GB.
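The general-purpose rule is easy to turn into a quick sizing check. This sketch applies only the 3-IOPS-per-GB figure quoted above; real volumes also have minimum and maximum IOPS limits, which are deliberately left out here, and the function names are my own.

```python
def general_purpose_baseline_iops(stored_gb, iops_per_gb=3):
    """Baseline IOPS under the 3-IOPS-per-GB rule (floors and caps not modelled)."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * iops_per_gb


def storage_needed_for(target_iops, iops_per_gb=3):
    """Smallest whole number of GB whose baseline meets target_iops."""
    return -(-target_iops // iops_per_gb)  # ceiling division on integers


print(general_purpose_baseline_iops(500))  # 1500
print(storage_needed_for(10_000))          # 3334
```

If the storage you would need just to reach your IOPS target is far more than your data requires, that is a hint the OLTP-optimized option may be the better fit.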
Amazon claims that Aurora delivers up to 5x the throughput of standard MySQL and up to 3x the throughput of standard PostgreSQL on the same hardware configuration. It is a new database engine built to make the most of available compute capacity, memory, and network bandwidth.
Amazon RDS can provision up to 64 TiB of storage for most engines (16 TiB for SQL Server), on demand and without downtime.
Amazon Aurora automatically increases storage from a minimum of 10 GB to a maximum of 128 TB, in increments of 10 GB, based on current usage, so you do not need to pre-configure storage. Storage scaling does not affect database performance.
However, Aurora MySQL only supports the InnoDB storage engine. Tables that use another storage engine are automatically converted.
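The automatic storage growth described above comes down to rounding up to the next 10 GB step. Here is a minimal sketch of that arithmetic, with the increment and minimum taken from the figures above; the upper limit is left out, and the function name is hypothetical.

```python
import math


def aurora_volume_size_gb(used_gb, increment_gb=10, minimum_gb=10):
    """Size of the cluster volume for a given amount of data,
    assuming growth in fixed increments from a fixed minimum."""
    if used_gb < 0:
        raise ValueError("used_gb must be non-negative")
    steps = math.ceil(used_gb / increment_gb)
    return max(steps * increment_gb, minimum_gb)


for used in (0, 10, 10.1, 237):
    print(used, "->", aurora_volume_size_gb(used))
# 0 -> 10
# 10 -> 10
# 10.1 -> 20
# 237 -> 240
```

Compare this with a provisioned volume, where you would pick a size up front and resize it yourself as the data grows.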
Amazon RDS allows up to five live database copies, known as read replicas. Promoting a read replica to take over is a manual step, so some data loss may occur.
Amazon Aurora also provides read replicas, which share the same data volume as the original database instance. Up to 15 read replicas are supported. Failover is done fully automatically with no data loss. It also supports creating a highly available database cluster with synchronous replication across multiple availability zones.
Amazon RDS performs a daily snapshot of your database and saves transaction logs to allow point-in-time recovery. The snapshot occurs during a backup window you can specify. While the snapshot is being taken, storage I/O may be interrupted while data is copied, affecting database performance. Backups are saved to S3 with very high durability.
Amazon Aurora backups are continuous, incremental backups that do not affect database performance. Also, there is no need to take frequent snapshots to enable point-in-time recovery.
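Point-in-time recovery is exposed through two different boto3 calls: `restore_db_instance_to_point_in_time` for RDS and `restore_db_cluster_to_point_in_time` for Aurora. The sketch below only builds the arguments for each, and the identifiers are placeholders. In both cases the restore creates a new instance or cluster rather than rewinding the existing one.

```python
from datetime import datetime, timezone


def rds_pitr_params(source_id, target_id, restore_time=None):
    """kwargs for rds.restore_db_instance_to_point_in_time."""
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # name of the new instance
    }
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreTime"] = restore_time
    return params


def aurora_pitr_params(source_cluster_id, target_cluster_id, restore_time=None):
    """kwargs for rds.restore_db_cluster_to_point_in_time."""
    params = {
        "SourceDBClusterIdentifier": source_cluster_id,
        "DBClusterIdentifier": target_cluster_id,  # name of the new cluster
    }
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreToTime"] = restore_time
    return params


when = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)  # arbitrary example timestamp
print(rds_pitr_params("orders-db", "orders-db-restored", when))
print(aurora_pitr_params("shop", "shop-restored"))
```

One practical difference: a restored Aurora cluster starts out as storage only, so you still need to add a compute instance to it before you can connect.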
Factors to Consider Before Choosing
We covered a feature-by-feature comparison of the two database services. Here are some other important considerations before you select the service that is right for you.
Data backup is a critical consideration, especially for production databases. Aurora offers higher availability and better resilience than RDS, thanks to its unique storage model and its ability to perform continuous backups and restore with a very low recovery point objective (RPO).
Database engine support
Aurora only supports MySQL and PostgreSQL. If you need to run other database engines, such as SQL Server, you will need to opt for RDS.
Aurora is generally more expensive than RDS for the same workloads. RDS is priced based on the type and size of the instance and its EBS volume. Aurora pricing is mainly based on instance size, with storage billed according to actual usage. Keep in mind that read replicas represent an extra cost on both platforms.
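To compare the two pricing models side by side, here is a back-of-the-envelope sketch. Every price in it is a made-up placeholder, not a real AWS rate, so plug in current figures for your region before drawing conclusions. It assumes each RDS read replica carries its own provisioned storage, that Aurora replicas share one volume billed on usage, and that Aurora's standard configuration also charges per million I/O requests.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly estimates


def rds_monthly_cost(instance_hourly, provisioned_gb, storage_gb_month, replicas=0):
    """Each node (primary or replica) pays for its instance and its own storage."""
    nodes = 1 + replicas
    per_node = instance_hourly * HOURS_PER_MONTH + provisioned_gb * storage_gb_month
    return nodes * per_node


def aurora_monthly_cost(instance_hourly, used_gb, storage_gb_month,
                        io_millions, io_per_million, replicas=0):
    """Nodes pay for compute only; storage and I/O are billed once per cluster."""
    compute = (1 + replicas) * instance_hourly * HOURS_PER_MONTH
    storage = used_gb * storage_gb_month
    io = io_millions * io_per_million
    return compute + storage + io


# Hypothetical prices and usage, for illustration only.
rds = rds_monthly_cost(instance_hourly=0.20, provisioned_gb=500,
                       storage_gb_month=0.12, replicas=1)
aurora = aurora_monthly_cost(instance_hourly=0.25, used_gb=200,
                             storage_gb_month=0.10, io_millions=300,
                             io_per_million=0.20, replicas=1)
print(f"RDS:    ${rds:,.2f} per month")
print(f"Aurora: ${aurora:,.2f} per month")
```

The `io_millions` term is the one to watch: it is the only input you cannot read off a configuration screen, and it scales with how busy your application is.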
What are the limitations of Aurora?
Aurora sounds great: more scalable, more durable, more resilient—who wouldn’t want those things? But before you jump in, here are a few reasons you might choose RDS over Aurora:
You use a database engine that Aurora doesn’t support. RDS supports more database engines and features than Aurora:
- RDS supports five database engines; Aurora just two. If you need MariaDB, Oracle, or Microsoft SQL Server, RDS is your only choice.
- Although Aurora is API-compatible with MySQL and PostgreSQL, it doesn’t always support the newest version, and it doesn’t always have the same features.
- Aurora only uses the InnoDB storage engine. You can’t use other storage engines.
You want to use the AWS Free Tier. You can run an RDS database for a year in the free tier; there’s no free version of Aurora.
You want predictable database costs. RDS has the same pricing model as EC2; there’s a fixed per-hour cost for every instance you run. Aurora pricing is harder to predict. In addition to the hourly instance cost, you pay per million I/O operations. How many will you use? It’s hard to tell in advance.
Your workloads aren’t a good fit for Aurora. Amazon’s marketing touts statistics like “5X the throughput of standard MySQL” and “3X the throughput of standard PostgreSQL.” But database performance can’t be reduced to a single number. Aurora will be significantly faster for some workloads, sure. But it might be slower for others. If you’ve already fine-tuned your queries for a particular database engine, or Aurora seems slow, you might want to benchmark against regular RDS.
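If you do want to benchmark, a small harness is enough to get started. This is a minimal sketch, not a rigorous methodology: `run_query` is a stand-in for whatever function executes one of your representative queries through your database driver, and here it is replaced by a dummy workload so the code runs anywhere.

```python
import math
import statistics
import time


def benchmark(run_query, repeats=50, warmup=5):
    """Time repeated calls to run_query and summarise latency in milliseconds."""
    for _ in range(warmup):  # warm caches and connections before measuring
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)

    samples.sort()
    p95_index = math.ceil(0.95 * len(samples)) - 1  # nearest-rank percentile
    return {
        "runs": len(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


# Dummy workload standing in for a real query against an RDS or Aurora endpoint.
result = benchmark(lambda: sum(range(10_000)), repeats=20)
print(result)
```

Run the same function against both endpoints, using the same instance size and the same data, and compare the medians and the tail latencies rather than a single average.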
Should I use RDS or Aurora?
For many organizations, RDS is the “safe” choice. It’s a longstanding, reliable service that looks like the databases they already have, with predictable pricing and comparable offerings from other providers.
Aurora’s unique architecture gives you more durability, scale, and resilience. And for many workloads, it’s cheaper and faster than running the equivalent RDS database. It’s what I’d choose if I were building a new application today, but it may not be right for you.
Bottom line? Aurora can be a big win over RDS, but it’s not a slam dunk. There’s no “right” way to run a database, which is why Amazon gives us the choice.