Checklist for Scaling a Web Application
2010/08/31
After reading Todd Hoff’s list of scaling lessons learned, I decided to put together my own list of scaling tricks. These were all learned the hard way as well, and will scale you up to thousands of concurrent users. It’s worth noting that many of these don’t cost anything beyond programmer-time, and take less time than ordering/waiting for more hardware. You may still need to throw more hardware at the problem eventually, but these will make sure you’re effectively using the hardware you have.
- make sure you have an abstract data access layer that ALL queries go through – you can then reroute/cache them as needed without rewriting your entire app every time. You will have to do this eventually, and it’s easiest to do FIRST.
- make sure your DB servers are separate machines from your webservers. This seems obvious, but often isn’t the case.
- deploy your code to each webserver’s local filesystem – don’t use NFS or shared drives. This prevents NFS from fucking you, and is worth rewriting your deployment system (you do have a deployment system, right?)
- set up file caches (esp. Smarty cache) per-machine, NOT on a network share. This avoids write-conflicts and loading delays. Write a script that will clear the caches on all machines on command.
- set up MySQL in a master-slave setup with at least 2 slaves – this is what allows you to scale at all.
- put any columns with fulltext indexes in separate tables (basic vertical partitioning), and do secondary writes in your code to keep them updated. This speeds up all queries to the original table, reduces the chances of table corruption, and lets you return less data on many queries.
- use InnoDB for pretty much any table that doesn’t have fulltext indexes. Row-level rather than table-level locking means concurrent reads and writes to the same table stop blocking each other (there are exceptions, of course, but those depend on your app)
- set up a hot backup server that can be set as the master on short notice. TEST THIS FAILOVER!
- send all DB writes to the master. This allows you to scale reads and leaves the master time to handle all of the writes.
- send some reads to the master – specifically read-after-writes to confirm queries/get IDs/etc. This should be an option passed to your data access layer.
- set up a shared memcached or a similar technology to cache DB query results and user info – this avoids redundant read queries and can serve 100x as many queries per second as MySQL. Seriously.
- do fewer DB write queries. Change your code to write multiple rows to a table at once instead of one at a time.
- write changes to the memcached cache as well as to the DB – this avoids re-reads.
- serve static files (images, css, etc) from a separate webserver or CDN – this leaves more CPU for actual work.
- move things to batch scripts/cron jobs. This lets you have single processes doing the heavy lifting in the background instead of delaying page loads.
- move batch scripts/cron jobs to a separate server and slave database
- change batch scripts to use a job queue – this keeps your servers from overloading and lets you monitor health based on # of waiting jobs rather than on CPU usage. It also lets you have lots of copies of the script emptying the same queue.
- set up a second memcached server and shard your cached data between the two – a single server failure then costs you only part of the cache, and you can keep adding servers to scale memcached.
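
To make the data access layer items concrete (all writes to the master, reads spread over the slaves, an option to force read-after-writes to the master, and a cache in front of the DB), here is a minimal Python sketch. Everything in it is hypothetical: `FakeDB` stands in for a real MySQL connection, a plain dict stands in for memcached, and the `DataAccess` method names are made up for illustration.

```python
import itertools


class FakeDB:
    """Stand-in for a real DB connection; records the queries it receives."""

    def __init__(self, name):
        self.name = name
        self.queries = []

    def execute(self, sql, params=()):
        self.queries.append((sql, params))
        return [(self.name,)]  # pretend result: which server answered


class DataAccess:
    """Single choke point that every query in the app goes through."""

    def __init__(self, master, slaves, cache):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over read slaves
        self.cache = cache  # dict here; memcached in a real setup

    def write(self, sql, params=(), cache_updates=None):
        rows = self.master.execute(sql, params)  # writes always hit the master
        if cache_updates:
            self.cache.update(cache_updates)  # write the change to the cache too
        return rows

    def read(self, sql, params=(), use_master=False, cache_key=None):
        # A read-after-write (use_master=True) skips the cache lookup on purpose.
        if cache_key is not None and not use_master and cache_key in self.cache:
            return self.cache[cache_key]
        db = self.master if use_master else next(self._slaves)
        rows = db.execute(sql, params)
        if cache_key is not None:
            self.cache[cache_key] = rows
        return rows


master, slave1, slave2 = FakeDB("master"), FakeDB("slave1"), FakeDB("slave2")
dal = DataAccess(master, [slave1, slave2], cache={})

dal.write("INSERT INTO users (name) VALUES (%s)", ("ann",),
          cache_updates={"user:1": [("ann",)]})
new_id = dal.read("SELECT LAST_INSERT_ID()", use_master=True)
cached = dal.read("SELECT name FROM users WHERE id = %s", (1,), cache_key="user:1")
other = dal.read("SELECT name FROM users WHERE id = %s", (2,), cache_key="user:2")
again = dal.read("SELECT name FROM users WHERE id = %s", (2,), cache_key="user:2")
```

Because the routing lives in one class, swapping the round-robin for something smarter, or changing what gets cached, doesn't touch the rest of the app.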
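
The "fewer DB write queries" item can be sketched like this: one multi-row INSERT instead of one statement per row. This is only an illustration, using Python's built-in `sqlite3` as a stand-in for MySQL. The `insert_many` helper and the `events` table are hypothetical, and the table and column names are assumed to come from trusted code, never from user input.

```python
import sqlite3


def insert_many(conn, table, columns, rows):
    """Insert all rows with a single multi-row INSERT statement."""
    if not rows:
        return
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join(row_placeholder for _ in rows),
    )
    flat_params = [value for row in rows for value in row]
    with conn:  # one transaction, one commit
        conn.execute(sql, flat_params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")

pending = [(1, "login"), (2, "login"), (1, "logout")]
insert_many(conn, "events", ["user_id", "action"], pending)

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

For very large batches you would want to split the rows into chunks, since databases cap statement size and the number of bound parameters.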
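
The job queue item is easier to picture with a toy version. The sketch below uses Python's in-process `queue.Queue` and threads purely as a stand-in. In a real deployment the queue would be a separate service and the workers would be separate processes or machines, but the shape is the same: many identical workers emptying one queue, with the backlog size as the health metric. The doubling "work" is a placeholder.

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        with results_lock:
            results.append(job * 2)  # placeholder for the heavy lifting
        jobs.task_done()


for n in range(10):
    jobs.put(n)

backlog = jobs.qsize()  # number of waiting jobs: the thing to monitor

workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for _ in workers:
    jobs.put(None)  # one sentinel per worker
jobs.join()
for t in workers:
    t.join()
```

If the backlog keeps growing, start more workers. If it stays near zero, you have headroom.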
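
Sharding cached data between memcached servers boils down to picking a server from the key. Here is a rough sketch, with plain dicts standing in for the memcached servers and a hypothetical `ShardedCache` class. Real memcached client libraries typically do this distribution for you, so treat it as an illustration of the idea rather than something to deploy.

```python
import hashlib


class ShardedCache:
    """Spreads keys over several cache servers by hashing the key."""

    def __init__(self, servers):
        self.servers = servers  # list of dict-like stores

    def _server_for(self, key):
        # md5 gives a hash that is stable across processes and machines,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def set(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key, default=None):
        return self._server_for(key).get(key, default)


server_a, server_b = {}, {}
cache = ShardedCache([server_a, server_b])

for user_id in range(100):
    cache.set("user:%d" % user_id, {"id": user_id})

user_42 = cache.get("user:42")
missing = cache.get("user:9999")
```

One caveat with plain modulo hashing: adding or removing a server remaps most keys, so expect a burst of cache misses when you do. Consistent hashing is the usual way to soften that.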
The items on this checklist are obviously not the only ways to scale a website, but they cover 90% of the problems that you’re likely to run into while scaling.