Last week we completed the migration of MindTouch Deki Express (aka wik.is) to the Amazon Elastic Compute Cloud (EC2). Our Express offering has been growing quickly lately, and we wanted to make sure we continue to offer the uptime and performance our users have come to expect.
A ton of work went into this migration and I’ll give an overview of some of the problems we had to solve to make the migration a success.
For those of you who don’t already know, EC2 is a cloud computing infrastructure which allows you to dynamically allocate virtual servers based on demand. While EC2 by itself is a fantastic technology, it’s missing a few key components like a comprehensive management platform. For this, we chose to use the RightScale Platform.
RightScale builds on top of EC2 and provides:
- pre-configured server templates
- fully scriptable (and repeatable) server configuration
- ability to clone scripts, templates, and deployments (collections of servers)
- performance graphs, monitoring, alerts
We are also using Amazon’s Elastic Block Store (EBS) for persistent, reliable storage volumes. EBS lets us snapshot our volumes, so we can recover quickly after a crash.
This whole system allows us to start up new instances and scale out at the click of a button! If the system comes under heavy load, more servers are automatically added to our cluster, ensuring great performance for our Express users.
After many iterations, we came up with the following architecture:
I’ll provide a brief overview of the infrastructure behind the new deployment here, but for a more detailed view, check out the MindTouch Developer wiki.
For our load balancer, we are using HAProxy. HAProxy is a screaming fast software load balancer. It fit our requirements perfectly and RightScale had pre-built scripts for configuring and adding/removing backend web servers.
Apache (PHP/Deki API):
To keep things simple, we chose to host the PHP and C# bits on the same server (though they could easily be separated and scaled independently). When an Apache instance boots, it registers with the load balancer and starts accepting requests. We did performance testing using ApacheBench and found the High-CPU medium EC2 instance to offer the best price/performance ratio.
Our MySQL servers are set up in a master/slave configuration. Both master and slave are EC2 large instances with the data stored on an EBS volume. We take daily snapshots of the slave database. If the master fails, the slave can easily be promoted to master. If the slave fails, a new slave can be launched and populated with the data from the latest snapshot, and replication can begin.
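The recovery rules above can be written down as a small decision function. This is only a sketch in Python of the logic as described, not our actual RightScale scripts. The function name and step wording are made up for illustration, and the case where both servers fail at once is not covered in this post, so it is marked as an assumption.

```python
from typing import List


def recovery_plan(master_ok: bool, slave_ok: bool) -> List[str]:
    """Return the recovery steps for a master/slave MySQL pair.

    Hypothetical helper: it only encodes the rules described in the post.
    """
    if master_ok and slave_ok:
        # Nothing to do, both servers are healthy.
        return []
    if not master_ok and slave_ok:
        # Master failure: the surviving slave takes over.
        return ["promote slave to master"]
    if master_ok and not slave_ok:
        # Slave failure: rebuild it from the most recent daily snapshot.
        return [
            "launch new slave",
            "populate new slave from latest snapshot",
            "start replication from master",
        ]
    # Assumption: the post does not describe a double failure.
    return ["restore from latest snapshot (assumption, not covered in the post)"]
```

For example, `recovery_plan(master_ok=True, slave_ok=False)` returns the three slave rebuild steps in order.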
Our Deki config is fairly complex. Obviously, we are using a multi-tenant configuration. In the multi-tenant setup, instead of fetching configuration information for each wiki instance from the mindtouch.deki.startup.xml file, we fetch the data from a web service. This web service uses its own database to manage wiki instances. We also run our extension services and Lucene index on a separate EC2 instance. Finally, we are storing our PHP sessions in memcached (running on our master and slave database instances). Because session data lives outside the web servers, we can launch any number of PHP/API servers and round-robin the requests.
Sending emails directly from EC2 instances is problematic. Some spam filters reject email from EC2 hosts, so we decided to use an email relay. We configured Postfix to send emails through our email provider (01.com) using SMTP auth.
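To see why a shared session store makes round-robin safe, here is a toy model in Python. It is a sketch, not our PHP setup: a plain dictionary stands in for memcached, and the class names, server names, and `dispatch` function are all hypothetical.

```python
import itertools


class SharedSessionStore:
    """Stand-in for memcached: one store shared by every web server."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def set(self, session_id, session):
        self._data[session_id] = session


class ApiServer:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        # Load the session from the shared store, update it, write it back.
        session = self.store.get(session_id)
        session["hits"] = session.get("hits", 0) + 1
        self.store.set(session_id, session)
        return self.name, session["hits"]


store = SharedSessionStore()
servers = [ApiServer(name, store) for name in ("web-1", "web-2", "web-3")]
rotation = itertools.cycle(servers)  # round-robin over the servers


def dispatch(session_id):
    """Send the request to the next server in the rotation."""
    return next(rotation).handle(session_id)
```

Calling `dispatch("abc")` repeatedly lands on a different server each time, yet the hit counter keeps increasing, because every server reads and writes the same store.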
RightScale provides a monitoring, alerting, and auto-scaling system. The scaling process is pretty simple and is based entirely on voting. When the load on a frontend instance stays over the defined alert threshold for some time, that instance votes for “growth”. If at some point the majority of the voting instances are voting for “growth”, then a new frontend is launched to help them out. Scaling down works exactly the same way: the instances vote for “shrink” when the load stays under the alert threshold.
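The voting scheme is easy to sketch in a few lines of Python. This is an illustration of the idea only, not RightScale's implementation or API. The threshold values, the `Instance` class, and the function names are assumptions, and `load` is assumed to already be a sustained average rather than a momentary spike.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical thresholds, the real values are set per alert in RightScale.
GROW_THRESHOLD = 0.75
SHRINK_THRESHOLD = 0.25


@dataclass
class Instance:
    name: str
    load: float  # assumed: sustained load, normalized to the range 0..1


def vote(instance: Instance) -> Optional[str]:
    """Each instance votes based only on its own load."""
    if instance.load > GROW_THRESHOLD:
        return "grow"
    if instance.load < SHRINK_THRESHOLD:
        return "shrink"
    return None  # load is in the normal range, so abstain


def decide(instances: Iterable[Instance]) -> Optional[str]:
    """Return an action only when a strict majority votes for it."""
    votes = [vote(instance) for instance in instances]
    for action in ("grow", "shrink"):
        if votes.count(action) * 2 > len(votes):
            return action
    return None
```

With three frontends at loads 0.9, 0.8, and 0.5, two of the three vote “grow”, so `decide` returns `"grow"` and a new frontend would be launched. A single busy instance among idle ones does not trigger anything.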
As you can see, putting together this deployment took a great deal of work. However, there are many benefits. The new site performs better, can scale automatically, and gives us better disaster recovery. Finally, since everything is fully scripted, we can “clone” our deployment in RightScale, change a few configuration inputs, configure our DNS and launch the entire deployment with a single click! In fact, we’ve already seen our system auto-scale up (added more servers automatically!) under peak usage for Deki Express!