
The promise of a DBaaS such as RDS is to reduce operational overhead (among other things), and upgrades (major and minor) are one of its stellar cases. The suggested procedure involves just a couple of steps. For example, using the AWS Console, you can enable “Auto minor version upgrade” or modify the DB instance and schedule the upgrade to run in the next maintenance window.
However, both options are risky: the upgrade process starts during the maintenance window, but it is NOT guaranteed to complete within the window's duration.
The Problem
RDS performs a few extra steps to ensure data consistency and allow a rollback, which makes a minor version upgrade a time-consuming process:
- It takes a backup (if automated backups are enabled) before starting the upgrade process.
- It performs a slow shutdown after setting innodb_fast_shutdown=0, which can take minutes or even hours if there is a large amount of change buffer to merge. This is the biggest bottleneck!
- ANOTHER backup is taken (if automated backups are enabled) after the upgrade completes.
- It requires all replicas to be upgraded to the target version before the upgrade process starts on the Master.
In a typical Master/Replica setup, the upgrade time roughly doubles, because the replicas must first complete all the steps above and only then does the Master go through them. If your maintenance window is tight, you might not be able to finish the upgrade in a single window, even if you don’t have replicas!
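To make the arithmetic concrete, here is a minimal Python sketch that adds up the steps listed above and checks the total against a maintenance window. It assumes, for illustration only, that all replicas are upgraded in parallel and take as long as the Master; the step durations are hypothetical estimates you would supply for your own instances, not values reported by RDS.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass
class UpgradePhase:
    """Estimated duration of each RDS upgrade step on one instance (your own estimates)."""
    backup_before: timedelta  # backup taken before the upgrade (if automated backups are on)
    slow_shutdown: timedelta  # innodb_fast_shutdown=0, including the change buffer merge
    upgrade: timedelta        # the actual engine upgrade
    backup_after: timedelta   # backup taken after the upgrade

    def total(self) -> timedelta:
        return self.backup_before + self.slow_shutdown + self.upgrade + self.backup_after


def estimate_upgrade(master: UpgradePhase, replica: Optional[UpgradePhase],
                     window: timedelta) -> Tuple[timedelta, bool]:
    """Return (estimated total duration, whether it fits in the maintenance window).

    Assumption: all replicas are upgraded in parallel, so the replica phase costs one
    replica's duration, and the Master's phase starts only after it finishes.
    """
    total = master.total()
    if replica is not None:
        total += replica.total()
    return total, total <= window


if __name__ == "__main__":
    # Hypothetical estimates; the slow shutdown matches the measurement in this post.
    phase = UpgradePhase(
        backup_before=timedelta(minutes=10),
        slow_shutdown=timedelta(hours=3, minutes=15),
        upgrade=timedelta(minutes=15),
        backup_after=timedelta(minutes=10),
    )
    total, fits = estimate_upgrade(phase, phase, window=timedelta(hours=2))
    print(total, fits)  # 7:40:00 False
```

With a single replica, the same 3 hours 50 minutes per instance turns into 7 hours 40 minutes overall, which is why the replica phase matters as much as the Master's.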
Why?
Backups are time-consuming (and tied to the datadir size).
But mostly it is because fast shutdown is disabled. And the bottleneck is not even the dirty pages in the buffer pool but the CHANGE BUFFER. Change buffer merges can take anywhere from minutes to hours, depending on its size and usage (the number of secondary index pages involved).
Monitoring the Change Buffer Merges
Am I a victim of change buffer merges? Here’s how to tell:
Error log:
You will see messages similar to the following in the MySQL error log while the change buffer is being merged during the slow shutdown process.
2020-01-06T18:47:59.563156Z 0 [Note] InnoDB: Waiting for change buffer merge to complete number of bytes of change buffer just merged: 158019
2020-01-06T18:48:01.024203Z 0 [Note] InnoDB: Waiting for purge thread to be suspended
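If you have a copy of the error log (for example, downloaded from the RDS console), a short Python sketch like the following can pull out these merge-progress messages. The regular expression is an assumption built only from the sample message above; other MySQL versions may word it differently.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Pattern built from the sample note above; other MySQL versions may word it differently.
MERGE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6})Z \d+ \[Note\] InnoDB: "
    r"Waiting for change buffer merge to complete "
    r"number of bytes of change buffer just merged: (?P<bytes>\d+)"
)


def change_buffer_merges(log_text: str) -> List[Tuple[datetime, int]]:
    """Return (timestamp, bytes just merged) for every change buffer merge note."""
    merges = []
    for m in MERGE_RE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        merges.append((ts, int(m.group("bytes"))))
    return merges


if __name__ == "__main__":
    sample = (
        "2020-01-06T18:47:59.563156Z 0 [Note] InnoDB: Waiting for change buffer merge to "
        "complete number of bytes of change buffer just merged: 158019\n"
        "2020-01-06T18:48:01.024203Z 0 [Note] InnoDB: Waiting for purge thread to be suspended\n"
    )
    for ts, merged in change_buffer_merges(sample):
        print(ts.isoformat(), merged)  # 2020-01-06T18:47:59.563156 158019
```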
InnoDB status:
Check the “SHOW ENGINE INNODB STATUS\G” output:
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 14591, free list len 19607, seg size 34199, 174000 merges
Sizes are reported in pages (16 KB by default). Converted to bytes, the actual values are:
- Total memory allocated for the change buffer: seg size (34199) * Innodb_page_size (16384) = 534.36 MB
- Total memory used by the change buffer: Ibuf: size (14591) * Innodb_page_size (16384) = 227.98 MB
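The conversion above is easy to script. The following Python sketch parses the Ibuf line from the SHOW ENGINE INNODB STATUS output and converts pages to MB; it assumes the default 16 KB page size unless you pass your actual Innodb_page_size.

```python
import re

IBUF_RE = re.compile(
    r"Ibuf: size (?P<size>\d+), free list len (?P<free>\d+), "
    r"seg size (?P<seg>\d+), (?P<merges>\d+) merges"
)
MB = 1024 * 1024


def change_buffer_usage(status_text: str, page_size: int = 16384) -> dict:
    """Parse the Ibuf line of SHOW ENGINE INNODB STATUS and convert pages to MB.

    page_size defaults to 16 KB; pass the real Innodb_page_size if it differs.
    """
    m = IBUF_RE.search(status_text)
    if m is None:
        raise ValueError("no 'Ibuf:' line found in the InnoDB status output")
    used_pages, seg_pages = int(m.group("size")), int(m.group("seg"))
    return {
        "used_pages": used_pages,
        "seg_pages": seg_pages,
        "merges": int(m.group("merges")),
        "used_mb": used_pages * page_size / MB,       # Ibuf: size * page size
        "allocated_mb": seg_pages * page_size / MB,   # seg size * page size
    }


if __name__ == "__main__":
    line = "Ibuf: size 14591, free list len 19607, seg size 34199, 174000 merges"
    usage = change_buffer_usage(line)
    print(round(usage["used_mb"], 2), round(usage["allocated_mb"], 2))  # 227.98 534.36
```

With 16 KB pages, each page is 1/64 MB, so 14591 pages come to exactly 227.984375 MB and 34199 pages to 534.359375 MB.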
We ran a write-heavy workload on a db.m4.large instance until the change buffer grew to 227.98 MB (Ibuf: size 14591). The slow shutdown then took 3 hours and 15 minutes, which means that a minor version upgrade of this instance would take 3.25 hours plus the time required for the actual upgrade process.
Instance details are as follows:
- Engine version: 5.7.22
- Multi-AZ: No
- Instance class: db.m4.large
- Provisioned IOPS: 1000
- Parameter group: default.mysql5.7 (except innodb_buffer_pool_size: 3G)
Slow shutdown logs:
2020-01-06T18:46:00.625227Z 0 [Note] InnoDB: Starting shutdown...
2020-01-06T18:47:59.563156Z 0 [Note] InnoDB: Waiting for change buffer merge to complete number of bytes of change buffer just merged: 158019
2020-01-06T18:48:01.024203Z 0 [Note] InnoDB: Waiting for worker threads to be suspended
…
2020-01-06T20:27:19.473898Z 0 [Note] InnoDB: Waiting for purge thread to be suspended
2020-01-06T20:28:19.652650Z 0 [Note] InnoDB: Waiting for worker threads to be suspended
…
2020-01-06T21:59:36.864031Z 0 [Note] InnoDB: Waiting for worker threads to be suspended
2020-01-06T22:00:37.055450Z 0 [Note] InnoDB: Waiting for worker threads to be suspended
2020-01-06T22:01:42.256680Z 0 [Note] InnoDB: Shutdown completed; log sequence number 18202546508
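To measure how long the slow shutdown took, you can diff the timestamps of the “Starting shutdown” and “Shutdown completed” notes. A minimal Python sketch, assuming the log lines follow the format of the excerpt above:

```python
import re
from datetime import datetime, timedelta

# Assumes the MySQL 5.7 error log format shown in the excerpt above.
NOTE_RE = re.compile(r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+)Z \d+ \[Note\] InnoDB: (?P<msg>.*)")


def slow_shutdown_duration(log_text: str) -> timedelta:
    """Time from the last 'Starting shutdown' note to the last 'Shutdown completed' note."""
    start = end = None
    for line in log_text.splitlines():
        m = NOTE_RE.match(line)
        if m is None:
            continue  # skip elisions and lines in other formats
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        if m.group("msg").startswith("Starting shutdown"):
            start = ts
        elif m.group("msg").startswith("Shutdown completed"):
            end = ts
    if start is None or end is None:
        raise ValueError("shutdown start/completion notes not found")
    return end - start


if __name__ == "__main__":
    log = "\n".join([
        "2020-01-06T18:46:00.625227Z 0 [Note] InnoDB: Starting shutdown...",
        "…",
        "2020-01-06T22:01:42.256680Z 0 [Note] InnoDB: Shutdown completed; log sequence number 18202546508",
    ])
    print(slow_shutdown_duration(log))  # 3:15:41.631453
```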
Percona Monitoring and Management
As one of the key InnoDB metrics, the change buffer has its own graph in PMM, where you can see its historical behavior.
Does Multi-AZ Save Me?
You might think that Multi-AZ prevents downtime on the Master during a version upgrade. It doesn’t: the version upgrade happens simultaneously on the primary and the standby instance, so it still involves downtime on the Master. There is no failover during a version upgrade of the Master.
Speeding Things Up
Avoiding a long period of downtime is possible. The trick is to use a read replica to Upgrade+Promote: upgrade the replica first, then promote it.
The following steps should be taken to upgrade with minimal downtime:
- Create a read replica (if you don’t already have one).
- Perform the minor version upgrade on the read replica.
- Stop the application and confirm the read replica is up to date.
- Promote the read replica to be the new Master.
- Update the application to send traffic to the new Master’s RDS endpoint.
- Create new read replicas from the new Master.
Make sure the read replica you create (the future Master) has the same instance class, storage, and configuration as the current Master. And don’t forget some housekeeping: delete the old RDS instances once you are done.
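The steps above can also be scripted. The Python sketch below assumes `rds` is a boto3 RDS client (`boto3.client("rds")`) and uses its `create_db_instance_read_replica`, `modify_db_instance`, `describe_db_instances`, `promote_read_replica`, and `get_waiter("db_instance_available")` calls. Stopping the application and confirming the replica has caught up are application-specific, so they are delegated to a hypothetical `stop_app_and_wait_for_catchup` callback; the instance identifiers and target version are placeholders. Repointing the application and creating new replicas are not covered.

```python
import time


def wait_for_version(rds, instance_id, version, sleep=time.sleep, interval=30):
    """Poll until the instance reports the target engine version and is available."""
    while True:
        db = rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
        if db["EngineVersion"] == version and db["DBInstanceStatus"] == "available":
            return
        sleep(interval)


def upgrade_and_promote(rds, master_id, replica_id, target_version,
                        stop_app_and_wait_for_catchup, sleep=time.sleep):
    """Upgrade+Promote sketch; `rds` is assumed to be a boto3 RDS client."""
    available = rds.get_waiter("db_instance_available")

    # Step 1: the replica inherits the Master's instance class unless told otherwise.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=master_id,
    )
    available.wait(DBInstanceIdentifier=replica_id)

    # Step 2: upgrade only the replica; the Master keeps serving traffic meanwhile.
    rds.modify_db_instance(
        DBInstanceIdentifier=replica_id,
        EngineVersion=target_version,
        ApplyImmediately=True,
    )
    # The status may still read "available" right after the call, so poll the version.
    wait_for_version(rds, replica_id, target_version, sleep=sleep)

    # Step 3 (hypothetical callback): stop writes and wait until replica lag is zero.
    stop_app_and_wait_for_catchup()

    # Step 4: promote the upgraded replica to a standalone instance (the new Master).
    # Same caveat as above: a production script should confirm the promotion started.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
    available.wait(DBInstanceIdentifier=replica_id)
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable with a fake object and lets you choose the region and credentials yourself.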
What about disabling the change buffer?
The InnoDB setting innodb_change_buffering controls whether InnoDB performs change buffering, and for which operations; its default value is all:
mysql> show global variables like "%innodb_change_buffering%";
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| innodb_change_buffering | all   |
+-------------------------+-------+
1 row in set (0.23 sec)
To reduce the time taken by the slow shutdown, you can temporarily set innodb_change_buffering to none. However, be aware that this only affects buffering of NEW operations; entries already in the change buffer still have to be merged. Wait until the change buffer has shrunk to a few KB, then start the upgrade process.
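Waiting for the change buffer to drain can be automated. The Python sketch below polls the Ibuf size until it falls to a threshold. `fetch_innodb_status` is a hypothetical callable you would implement with your MySQL driver of choice to return the text of SHOW ENGINE INNODB STATUS; the one-page threshold (roughly 16 KB), poll interval, and timeout are arbitrary example values.

```python
import re
import time
from typing import Callable

IBUF_SIZE_RE = re.compile(r"Ibuf: size (\d+),")


def wait_for_change_buffer_drain(fetch_innodb_status: Callable[[], str],
                                 max_pages: int = 1,
                                 interval_s: float = 60.0,
                                 timeout_s: float = 4 * 3600,
                                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until the change buffer uses at most `max_pages` pages.

    Returns the last observed Ibuf size in pages; raises TimeoutError if the
    buffer has not drained after roughly `timeout_s` seconds of polling.
    """
    waited = 0.0
    while True:
        m = IBUF_SIZE_RE.search(fetch_innodb_status())
        if m is None:
            raise ValueError("no 'Ibuf: size' found in the InnoDB status output")
        pages = int(m.group(1))
        if pages <= max_pages:
            return pages
        if waited >= timeout_s:
            raise TimeoutError(f"change buffer still at {pages} pages")
        sleep(interval_s)
        waited += interval_s
```

Counting the time slept, rather than reading the wall clock, keeps the sketch simple and lets you pass a no-op `sleep` when trying it out.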
Does this affect Amazon Aurora instances?
Aurora doesn’t have an InnoDB change buffer, so it is not affected by this.
Conclusion
Amazon RDS is a great platform for hosting your MySQL databases, and it lets you perform minor version upgrades in a few clicks. However, the upgrade can be time-consuming and may cause extended downtime. This post describes the recommended approach for planning minor version upgrades with minimal downtime.