March 27

Database Craziness. Downtime, too. Not Directly Related.

We’ve had two episodes of lengthy downtime in the past 60 days. Early this morning we were down for almost 3 hours. Back in February it was even longer: over 4 hours on the night of the 6th. Let me first be very clear: Any downtime at all is unacceptable, planned or otherwise. Our job is to prevent this sort of thing to the greatest extent possible. We’re currently failing in that effort.

Both of these episodes were avoidable. Neither was a “perfect storm”: port 443 was hung and we were asleep. In both instances, we were back up within a few minutes of cracking an eyelid.

No, we don’t have our humans monitoring services 24/7. We should. No, we don’t have enough redundancy to avoid this particular issue. We should. Why don’t we? The answer is simple and not one that anyone wants to hear. A choice has been made to keep our hosting infrastructure in its current state, a vulnerable state, until we secure additional capital to improve it. We should have more humans and more redundancy. Both cost money, and lots of it.

You’re probably wondering what it is we *are* doing. I’ll give you an example. Yesterday I posted a long overdue change log of what the development team has been doing over the past few months. In one of those many bullet points, I mentioned that we had flipped our database keys (primary and foreign) from char(36) UUIDs to basic integers. One of our favorite customers emailed me and said:

“Impressive! Any quick tips you can share for low impact DB maintenance?”

It’s quite an unsettling feeling to be boasting about DB maintenance fu when we just had a ridiculous amount of downtime. But this is actually a great example of what we have been doing to improve, and of how we put in extra time and effort to avoid any time offline. It’s interesting, too, and the subject of a blog post that never surfaced. So, I’ll just describe briefly what we did and how we did it.

Before I get started, I should explain that the reason we were forced to make this drastic change to our database schema was another fault of our own. When I first created the schema for CheddarGetter’s primary database, I decided to give UUIDs a whirl. I used them for primary keys and foreign keys. The columns are char(36). They’re hex UUIDs with dashes – the kind you get from MySQL’s UUID() function – so each 128-bit value is stored as a 36-character UTF-8 string. Yeah, UTF-8, entirely unnecessary for these columns. Integers are far more efficient in BTREE indexes than long multi-byte strings, and far smaller, too. The difference in efficiency was causing severe headaches for everyone: our dev and support teams and our customers. We absolutely had to make this change.

The size of our database at the time was around 12G. That’s not huge, but it’s large enough that making the changes we needed to make would cause hours of planned downtime if we used traditional means. Since outages are unacceptable, even those for planned maintenance, we figured out a way to do it without affecting live services at all, not even for a minute. Instead of a few days of work, it took more than the entire month of January. We put in the extra time to avoid what would likely have been more than 10 hours of database locks. Here’s how we did it:

  1. Plan the new schema
    • New id columns
    • New primary keys
    • New foreign keys
  2. Override database access in the app framework so the app is tolerant of changes (see the sketch after this list)
    • Make inserts create both keys
    • Make updates reconcile both keys
    • Make lookups by primary key work for either key
  3. Make the changes
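
To make step 2 concrete: during the transition every table carries both keys, each uniquely indexed, so the data layer can write both on every insert and resolve a record from whichever key it happens to have. Here’s a minimal sketch of that transitional state, using the same bear/cave example as the walkthrough further down – illustrative queries, not our actual code:

CREATE TABLE Caves (
	newId int unsigned not null auto_increment primary key,
	id char(36) not null,
	location varchar(255),
	createdDatetime datetime,
	unique key id(id)
) ENGINE=InnoDB;

-- inserts create both keys: the app still supplies the UUID, MySQL assigns the integer
INSERT INTO Caves (id, location, createdDatetime)
	VALUES (UUID(), 'north ridge', NOW());

-- lookups by primary key work for either key, because both are unique
SELECT * FROM Caves WHERE newId = 42;
SELECT * FROM Caves WHERE id = '6ccd780c-baba-1026-9564-5b8c656024db';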

Normally when you make schema changes, you simply use an ALTER statement. The trouble with ALTERs is that the table is locked while the alter takes place: the entire table must be recreated. For a single 4G table, for example, that can take a long time, maybe an hour or more. Ours was a trickier situation because we were dealing with primary and foreign keys, so we would have to run ALTERs on the same table more than once. In short, these changes cascade into other tables and had to be well orchestrated: each table had to be altered, then each related table, then the original table again, and the related tables again, too. That’s craziness, so we found a way to do it without any downtime.

Enter Percona Toolkit’s pt-online-schema-change, a well-named tool. An ALTER using pt-online-schema-change 1) creates a copy of the target table schema, 2) runs the ALTER on the new empty table, 3) copies all the rows from the old table, 4) ensures the two tables are in sync, 5) replaces the old table with the new. There’s a lot of database trigger witchcraft mixed in there and it works quite well. The sacrifice is time. It takes around 10x longer to alter a table using pt-online-schema-change than a traditional alter.
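
If you’re curious what that witchcraft looks like, it’s roughly this shape – a simplified sketch of the idea, using the Caves table from the example below, not the exact table names, trigger names, or DDL the tool generates:

-- a shadow copy of the table; the requested ALTER runs against the copy while it is still empty
CREATE TABLE _Caves_new LIKE Caves;

-- triggers mirror live writes into the shadow table while existing rows are copied in chunks
CREATE TRIGGER Caves_ins AFTER INSERT ON Caves FOR EACH ROW
	REPLACE INTO _Caves_new (id, location, createdDatetime)
	VALUES (NEW.id, NEW.location, NEW.createdDatetime);

CREATE TRIGGER Caves_upd AFTER UPDATE ON Caves FOR EACH ROW
	REPLACE INTO _Caves_new (id, location, createdDatetime)
	VALUES (NEW.id, NEW.location, NEW.createdDatetime);

CREATE TRIGGER Caves_del AFTER DELETE ON Caves FOR EACH ROW
	DELETE FROM _Caves_new WHERE id = OLD.id;

-- once the copy has caught up, the swap is a single atomic rename
RENAME TABLE Caves TO _Caves_old, _Caves_new TO Caves;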

Here’s a simple example of what we needed to do and how we did it with pt-online-schema-change:

Consider this simple schema:

CREATE TABLE Caves (
	id char(36) not null primary key,
	location varchar(255),
	createdDatetime datetime
) ENGINE=InnoDB;

CREATE TABLE Bears (
	id char(36) not null primary key,
	caveId char(36) not null,
	constraint caveId_fk foreign key (caveId) 
		references Caves(id),
	name varchar(255),
	createdDatetime datetime
) ENGINE=InnoDB;	

First, alter the table: put a unique index on the old primary key, drop the primary key, and add the new auto-increment integer primary key:

alter table Caves 
	add unique index id(id), 
	drop primary key, 
	add column newId int unsigned 
		not null primary key auto_increment FIRST;

… and the pt-online-schema-change equivalent (two steps required). The commands below are shown with --dry-run; swap in --execute to actually make the change:

pt-online-schema-change --alter \
"add column newId int unsigned not null \
auto_increment FIRST, add index newId(newId)" \
D=testdb,t=Caves --lock-wait-timeout=50 \
--alter-foreign-keys-method auto \
--nocheck-replication-filters --dry-run

pt-online-schema-change --alter \
"drop primary key, drop index newId, \
add primary key(newId), add unique index id(id)" \
D=testdb,t=Caves --lock-wait-timeout=50 \
--alter-foreign-keys-method auto \
--nocheck-replication-filters --dry-run

Next, do the same for the related table(s) but also make foreign key change(s):

alter table Bears 
	add unique index id(id), 
	drop primary key, 
	add column newId int unsigned 
		not null primary key auto_increment FIRST,
	add column caveNewId int unsigned not null after id;

… and the pt-online-schema-change equivalent (again two steps):

pt-online-schema-change --alter \
"add column newId int unsigned not null \
auto_increment FIRST, add index newId(newId)" \
D=testdb,t=Bears --lock-wait-timeout=50 \
--alter-foreign-keys-method auto \
--nocheck-replication-filters --dry-run

pt-online-schema-change --alter \
"drop primary key, drop index newId, \
add primary key(newId), add unique index id(id), \
add column caveNewId int unsigned after id" \
D=testdb,t=Bears --lock-wait-timeout=50 \
--alter-foreign-keys-method auto \
--nocheck-replication-filters --dry-run

Populate the new foreign key:

update Bears b, Caves c 
	set b.caveNewId=c.newId 
	where b.caveId=c.id;

We did this a block at a time (to avoid one long, locking UPDATE) using the `watch` command:

watch -n 0.1 'mysql -e "select count(*) from Bears \
where caveNewId is null; update Bears b \
set b.caveNewId=(select c.newId from Caves c \
where c.id=b.caveId) \
where b.caveNewId is null limit 50000;" testdb'
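
Before adding the constraint, it’s worth verifying that the backfill actually finished – something along these lines (illustrative checks, not our exact queries):

-- rows still missing the new key (should be zero)
select count(*) from Bears where caveNewId is null;

-- rows whose new key does not resolve to a Cave (should also be zero)
select count(*) from Bears b
	left join Caves c on c.newId = b.caveNewId
	where c.newId is null;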

Last, set up the new foreign key(s):

alter table Bears 
	add constraint cave_fk foreign key (caveNewId) 
	references Caves(newId);

… and with pt-online-schema-change:

pt-online-schema-change --alter \
"add constraint cave_fk foreign key (caveNewId) \
references Caves(newId)" D=testdb,t=Bears \
--lock-wait-timeout=50 --alter-foreign-keys-method auto \
--nocheck-replication-filters --dry-run

For one of our tables, that process took 6 days.

Alright, long story short, we’re working hard to improve and to avoid as much downtime as possible. Things are going to get better. Honest.