Open Government Data in a Drupal Cloud

Achieve Internet

Drupal Consulting

In mid-March, the Secretary of the Department of Education (DOE) announced the findings of a study on computing in education. The study's data was to be published on the web in concert with the announcement, in order to take advantage of the news cycle.

The department wanted to allow site visitors to comment on the data, so it needed a platform that could manage and incorporate user feedback. Just as important, it needed a web architecture that could scale quickly in response to traffic spikes. Achieve Internet was brought in three weeks before the proposed release date to take on this considerable challenge, and we were able to deliver an effective Drupal solution.

For reasons unrelated to Drupal or Achieve, the DOE ultimately published the data themselves, albeit late and with limited interactive functionality. Nevertheless, the development Achieve was contracted to do on the project is noteworthy, particularly in terms of scaling Drupal in the cloud.

To handle the anticipated traffic fluctuations and meet the performance requirements, Achieve deployed and configured Amazon EC2 servers using RightScale's cloud management platform, employed ai-Cache's high-powered caching mechanism, and wrote several custom Drupal modules.

Here's how we did it, with a lot of help from our friends in the Drupal community.

Drupal Amazon Machine Image (AMI) Configuration

Since the Drupal site would sit in the Amazon cloud, we needed to configure the cloud servers by building a custom Amazon Machine Image (AMI) that could house all our applications, data, and libraries. Luckily, we didn't have to start from scratch. ChapterThree developed an AMI in their Pantheon/Mercury project for Drupal that includes many of the usual Drupal stack suspects: Apache HTTP Server, MySQL, PHP, and Apache Solr. In the Mercury configuration, settings are already optimized for Drupal (e.g. Apache's .htaccess and httpd.conf).

We used Mercury as our starting point and modified it by removing the Varnish install and adding ai-Cache, a heavy-duty caching platform. With a few other tweaks, this new AMI was lean yet flexible enough to handle the various site tasks.

RightScale Cloud Management

Next we turned to the RightScale cloud management platform. One of RightScale's features is the ability to write RightScripts, which execute on new instances launched from an AMI and ensure that they are configured appropriately. For example, if a database server hits its trigger level (which you can set), a new instance is launched from the AMI and the db-server RightScript runs on it. In this case, the script starts the database server but does not start the web server, in order to save system memory for the machine's primary purpose. If a web server trigger is reached, a different script runs on the new instance to start Apache and ai-Cache.

Using a base AMI with RightScripts is analogous, in OOP terms, to instantiating an object and passing it some parameters to initialize its state.
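To make the analogy concrete, here is a minimal Python sketch. It is an illustration only, not RightScale's or Amazon's API: the `Instance` class, the `ami-example` identifier, and the role names are hypothetical, and the service lists simply mirror the db-server and web-server cases described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """A server booted from a shared base image, then specialised at launch."""

    image: str  # the shared base AMI (hypothetical identifier)
    role: str   # the "parameter" passed in when the instance is created
    running: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Plays the part of the boot-time script: set up state from the role.
        if self.role == "db":
            # Database only; the web server stays off to save memory.
            self.running = ["mysql"]
        elif self.role == "web":
            self.running = ["apache", "ai-cache"]
        else:
            raise ValueError(f"unknown role: {self.role}")


# Same base image, different initial state depending on the role.
db = Instance(image="ami-example", role="db")
web = Instance(image="ami-example", role="web")
```

Both objects come from the same "class" (the base image); only the argument supplied at construction time decides which services end up running.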

Some dynamic instance configuration can be done using Amazon's “user-defined instance data,” but RightScale offers much more flexibility in terms of scripting and server monitoring. The scripts also enable you to tune each instance, for example by allocating memory in specific ways and running cron jobs to perform tasks such as purging log files.

Drupal Customization for Heavy-Duty Caching

As with any sizable site, caching was a requirement and it posed significant challenges. As mentioned, the Achieve team used ai-Cache, a high-powered product that can serve over 250,000 requests per second. In order to take advantage of that capability, we needed to design the Drupal site carefully and also create an interface that would tell ai-Cache which content to store and when to refresh it.

On the Drupal side, most of the pages were fairly straightforward: Title, Body, Comments. In most cases, as much as 75% of the page could be cached in a standard fashion. We split the pages in Drupal so that the parts specific to user roles remained independent entities, outside the cached portion. For the most part, the non-cached section was the part of the page that allowed certain users to post comments.
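The page-splitting idea can be sketched in a few lines of Python. This is not the Drupal code from the project: the function names, the `commenter` role, and the in-memory dictionary standing in for the cache are all assumptions made for illustration.

```python
from html import escape
from typing import Dict, List, Set

# Stand-in for the external cache: page id -> rendered shared markup.
shared_cache: Dict[str, str] = {}

# Assumed name of the role that is allowed to post comments.
COMMENTER_ROLE = "commenter"


def render_shared(page_id: str, title: str, body: str, comments: List[str]) -> str:
    """Render the role-independent part of the page, reusing a cached copy."""
    if page_id not in shared_cache:
        items = "".join(f"<li>{escape(c)}</li>" for c in comments)
        shared_cache[page_id] = (
            f"<h1>{escape(title)}</h1><div>{escape(body)}</div><ul>{items}</ul>"
        )
    return shared_cache[page_id]


def render_role_specific(roles: Set[str]) -> str:
    """Render the uncached part: a comment form only for permitted roles."""
    if COMMENTER_ROLE in roles:
        return '<form method="post"><textarea name="comment"></textarea></form>'
    return ""


def render_page(page_id: str, title: str, body: str,
                comments: List[str], roles: Set[str]) -> str:
    """Combine the cached shared fragment with the per-request fragment."""
    return render_shared(page_id, title, body, comments) + render_role_specific(roles)


anonymous = render_page("p1", "Title", "Body", ["First!"], set())
member = render_page("p1", "Title", "Body", ["First!"], {COMMENTER_ROLE})
```

Every visitor receives the same cached fragment, and only the small comment form is computed per request. The cached fragment goes stale as soon as a new comment is posted, which is why the cache also needs to be told when to refresh.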

Then, the Achieve team wrote two custom modules that could communicate dynamically with the caching software. The first module, ai-Cache, sends refresh orders after common user events such as form submission. The other module, ai-Cache-comment, stores expired page keys in an array and can respond to user comment events by ordering ai-Cache to request a refresh.
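As a rough illustration of the second module's behaviour, the Python sketch below keeps expired page keys in a list and issues a refresh order for each one when a comment event arrives. The `RefreshQueue` class, its method names, and the `node/7` style keys are hypothetical; the real modules were written for Drupal, and the actual ai-Cache refresh interface is not shown in this article, so it is represented here by a plain callback.

```python
from typing import Callable, List


class RefreshQueue:
    """Collects keys of expired pages and hands them to the cache for refresh."""

    def __init__(self, send_refresh: Callable[[str], None]) -> None:
        # send_refresh stands in for whatever actually notifies the cache.
        self._send_refresh = send_refresh
        self._expired: List[str] = []

    def mark_expired(self, page_key: str) -> None:
        """Record a page key once, preserving the order events arrived in."""
        if page_key not in self._expired:
            self._expired.append(page_key)

    def on_comment_posted(self, page_key: str) -> None:
        """React to a comment event: the page it belongs to is now stale."""
        self.mark_expired(page_key)
        self.flush()

    def flush(self) -> int:
        """Send a refresh order for every expired key, then clear the list."""
        sent = 0
        while self._expired:
            self._send_refresh(self._expired.pop(0))
            sent += 1
        return sent


# Usage: collect the refresh orders in a list instead of calling a real cache.
orders: List[str] = []
queue = RefreshQueue(send_refresh=orders.append)
queue.mark_expired("node/7")
queue.on_comment_posted("node/12")
```

Passing the notifier in as a callback keeps the bookkeeping separate from the cache's own interface, so the same logic can be exercised without a running cache.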

Luckily, considering the compressed timeframe, minimal theme adjustments were required. We made sure that the DOE's data, not fancy bells and whistles, remained the focal point of the site.

A Successful Drupal Cloud

The Achieve team completed the project in the allotted three weeks. It certainly wouldn't have been possible without our partners ai-Cache and RightScale, and we benefited from standing on the shoulders of giants: ChapterThree for the Mercury AMI, and the wider Drupal community for all their various contributions.

Given the time crunch, our custom modules aren't quite ready for community release, but we plan to get them out so others working on large sites can take advantage of ai-Cache.

Although it was extremely challenging, it was fun to work within such a compressed timeframe. Unfortunately, our site wasn't published, but the work the Achieve team did will certainly be useful in future projects. It's also great to see that Drupal sites can be stood up quickly and can scale and perform in the cloud.
