notes on shopify sharding rails in 2014
September 24, 2017
These are notes from Shopify’s 2014 presentation at Big Ruby 2014 on sharding RAILS.
Viewable Here: https://www.youtube.com/watch?v=6njTQdFLz6I
Presenter is Camilo Lopez.
Shopify is a ecomerince company
Shopify in 2014 runs on 100 application Servers - hardware is hetergenous.
~100,000 Shops, 88,000 are profitable.
Running Ruby 2.1, Rails 3.2 and MySQL 5.5 (percona patched)
In 2014 150,000 Requests per min, Peaks at 400,000 and 20,000 Queries Per Sec
In 2014 shopify processed $3,7000/min in orders
Outage = burning money.
Why you should NEVER EVER SHARD a rails app:
- Rails and libs assume one DB
- Hard to do aggregated queries - quering across DB’s is a pain
- Primary keys are now broken
- Be ready to write custom code - in uncharted waters
Alternatives to sharding:
- Can buy a bigger machine to avoid sharding - throw hardware at it
- Tune and Cache your way out of it
Why do you have to shard:
- Traffic has been doubling year over year - Alternatives can’t scale as fast
- Black Friday and Cyber Mondy BOOM load 2x and Writes EXPLODE! which cannot be cached
Shopify/Rails is not threaded but MySQL is.
- 2011 database is: 2xE5640, 192GB RAM, 2x300GB OCZ Z-Drive R4 PCIe SSD
- 2012 datbase is: 4xE54650, 256GB RAM< 2x600GB OCZ Z-Drive
Can’t scale the single system anymore - need to scale out - ie shard
Benifits of sharding:
- Horizontal Scaling
- Better localized buffer pools for keeping data warmer longer
- Smaller INDEX
- Failure isoloation
2013 Start Journey to Sharding
Constraints to Survive Sharding
- no mixing of shards in the same transition
- no Joins between shards
- no nested shard selection - use blocks to define shard contexts.
- Had to transition Live, can’t stop and shard the database in a flag day
How did they shard the datata:
- Shop is top level container which relates all the tables to a shop
- Add ShardID to a shop.
The basic data model around the shop is a master database table exist which contains global - the table of all shops and their shopid and shardid
Shard databases which contain Product, Blog, Order and Comment Etc tables related to a shop id.
So what about primary keys (auto-increment)
Add a service to provide unquie keys - issues with big numbers and service dependancy
MySQL configuration to the rescue:
- auto_increment_increment
- auto_increment_offset
- next_id = auto_increment_offset + N * auto_increment_increment where N is an positive int
- Setting inrement,offset on your shards generates interleaving ID’s
Shard | increment,offset | ids shard_0 | 4,1 | 5,9,13,17 shard_1 | 4,2 | 6,10,14,18 shard_2 | 4,3 | 7,11,15,19
This why you can avoid a depancy
Created module called Sharding::Concern which allows code blocks to operate in shard context
How do you route http request? Use shopid.
This is all mostly transparent in day-to-day tasks. Since most code is scoped to a single shop.
Rebalancing shards:
- Lock-and-Move a shop via a global exclusive lock
- Per shop shared lock
- Implemented the locks with Zookeeper (failed) went back to redis.
- If a shop is locked the top level controller will return 503
Shop Migration: Lock Shop, Wait 100s, Unicorn Dead, Jobs are done, Fork, Copy Verify Data, Change shard ID, Unlock it, done
What about the old data? Data is cleaned up in bulk offline
Online mover not done. I’m assuming this is done LATER (Jared mysql proxy)
Joins?
- Can’t just iterate - might hit a locked shop or a dead-uncleaned zombie shop
How long did this take?
- Dec 2013 - Start
- March - Look for a vendor solution - none found
- May - DB-Charmer, Constraints and verifiers
- May - Oct - worked!!!
- Oct, Nov Move Shops
Sharding Checklist:
- How to slice Data
- Primary Key Generation
- Connection Switching
- Query Across Shards
- Rebalancing
Affect on CyberMonday:
- Flash sales spiking on shards brings 40,000 checkouts in a min or two
- sharding allows seperation of flash sales from normal shops
- 80ms avg response time good!