Superfast Searching on Drupal.org with Solr

An All New Drupal.org

Drupal’s web site is the hub of the Drupal community. Developers, business owners, CTOs, and curiosity seekers visit the site for core software downloads, add-on modules, support forums, documentation, news, and more. Drupal.org bustles with activity, and since it’s the face Drupal presents to the world, speed, capacity, cutting-edge look-and-feel, and impressive functionality are all important.

A massive effort to re-design and upgrade the site is underway, with the aims of better serving the community and promoting Drupal. A particularly important component of the re-design is site searching. Although some visitors just browse, many come for a specific purpose and want to quickly locate what they need. As a result, the ability to support fast and reliable filtered searching is an absolute must for Drupal.org re-design project managers Kieran Lal and Lisa Rex.

Project Description

The re-design draws on the expertise and contributions of the Drupal community, with various teams working on discrete portions of the site. Achieve Internet tackled the theme section of Drupal.org, promising to develop and deliver pages that allow users to browse, search for, and download themes for their own Drupal sites. The component would enable advanced searching so users can filter results based on theme characteristics such as number of columns, screen resolution, and layout type (i.e., fixed or fluid).

The Achieve Internet team, led by CTO Bill O’Connor, quickly discovered that the existing Drupal content types didn’t support the required functionality. Furthermore, Drupal.org offloads its searching onto a server running Solr, so filtering would require Solr configuration. As the project grew in scope, Kieran and Lisa depended on the Achieve Internet team, which had completed several successful enterprise Solr integrations, to lead the development and configuration of the search infrastructure for the new web site.

Customization and Integration Tasks

The first step in the theme search project would be to add fields to the “project” content type in Drupal in order to store additional characteristics. Because there are several project types (themes, modules, profiles, etc.), these modifications must be made so that appropriate fields appear for each type. In the case of themes, the fields would store number of columns, resolution, and layout type. Next, the search engine itself would have to be addressed.

Solr houses a database that is in essence a copy of the documents and data of an application (in this case, Drupal.org). The Solr database, organized to take advantage of the high-performance Lucene Java search library, would have to be configured to index themes on the fields that were added to the content types. This would allow for results filtering (“faceting”, in Solr parlance).
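To make faceting a little more concrete, here is a minimal sketch of how a faceted theme search might be expressed as Solr request parameters. The parameter names (q, fq, facet, facet.field, facet.mincount, wt) are standard Solr query parameters, but the field names columns, resolution, and layout_type and the helper build_theme_query are assumptions made for illustration. They are not taken from the actual Drupal.org schema or from Achieve Internet's code.

```python
from urllib.parse import urlencode


def build_theme_query(keywords, filters, facet_fields):
    """Build the query string for a faceted Solr /select request.

    keywords     -- free-text search terms entered by the user
    filters      -- dict of field -> value the user has already drilled into
    facet_fields -- fields Solr should return value counts for
    """
    params = [
        ("q", keywords),
        ("wt", "json"),            # ask Solr for a JSON response
        ("facet", "true"),         # turn faceting on
        ("facet.mincount", "1"),   # hide facet values with no matches
    ]
    # One facet.field parameter per field we want counts for.
    for field in facet_fields:
        params.append(("facet.field", field))
    # Each selected filter becomes a filter query (fq) that narrows results.
    for field, value in sorted(filters.items()):
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


# Hypothetical field names standing in for the theme characteristics.
FACET_FIELDS = ["columns", "resolution", "layout_type"]

query_string = build_theme_query("blue", {"layout_type": "fluid"}, FACET_FIELDS)
print(query_string)
```

The resulting string would be appended to the select URL of whatever Solr instance is in use. Keeping the user's chosen filters in fq rather than folding them into q is a common choice because Solr can cache filter queries independently of the main query.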

A chief difficulty in this part of the project, due to the number of teams working on the re-design, would be lack of direct access to the Drupal.org Solr instance. The theme search developers would have to find a way to replicate the site in a test environment. After configuring Solr, Drupal search queries would need to be optimized to take advantage of the indexing.

All this needed to be done with portability in mind, so groups working on search components specific to other areas of the site could implement faceted searching with minimal effort. For example, the team focusing on modules would need to be able to use the filtering mechanism built for themes.
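One way to picture this kind of portability is a single lookup table that maps each project type to its filterable fields, with one shared function that turns the table into Solr facet parameters. The sketch below is only an illustration of that idea: the project_type field, the module fields, and the function name facet_params are all hypothetical and do not come from the Drupal.org code base.

```python
# Hypothetical mapping of project type -> fields that can be filtered on.
# Only the theme characteristics are named in the article; the module
# entries are invented placeholders.
FACETS_BY_PROJECT_TYPE = {
    "theme": ["columns", "resolution", "layout_type"],
    "module": ["core_compatibility", "category"],
}


def facet_params(project_type):
    """Return Solr parameters that restrict a search to one project type
    and request facet counts for that type's filterable fields."""
    fields = FACETS_BY_PROJECT_TYPE.get(project_type, [])
    # Restrict results to the requested project type.
    params = [("fq", f"project_type:{project_type}")]
    if fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in fields)
    return params


theme_params = facet_params("theme")
module_params = facet_params("module")
print(theme_params)
print(module_params)
```

With this arrangement, supporting a new area of the site would mean adding one entry to the table rather than writing new query-building code.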

Solution Delivery

Bill and his team were able to make the necessary content-type and querying modifications in Drupal, taking care to abstract enough of the structure to handle content types other than themes. The team also replicated the Drupal.org – Solr configuration on their own servers, an undertaking that required intimate knowledge of both technologies, in order to test code and index settings.

This work provided the plumbing for a rich and intuitive filtering form that allows users to drill down in result sets using theme characteristics. The form works at blazing speed and the plumbing can be reused in other search modules.

On the data maintenance side, the team modified the “add a theme” page to allow administrators to select and store the appropriate filtering characteristics when importing a new theme. The fields and possible values appear automatically, lowering the difficulty level for non-technical admins, and reducing the potential for data corruption.

With the front and back ends complete, the Achieve Internet team delivered an on-time, functional, fast, and flexible theme searching component of the highest quality. In addition to their experience with the specific technologies, their policy of designing, developing, and testing for flexibility and reuse led to less buggy code. This practice paid off in the end: they delivered a theme searching component that worked on the staging site, as intended, right away.

When the new site goes live in 2010, the functionality and performance of the theme component will enhance the appeal of Drupal and solidify Drupal.org’s place as the hub of the Drupal universe.

Looking Ahead

In wrapping up the project, Achieve Internet made sure to carefully document its code and Solr configuration steps so the process could be repeated for the advanced searching that occurs elsewhere on Drupal.org, and there is a lot of searching. The support forums, knowledge base, add-on modules, and other content areas all have searching components.

As the website re-design marches on, Achieve Internet will continue to support Kieran, Lisa, and other Drupal.org community members. Bill and his team will assist with Solr integration and also help other teams take advantage of the benefits that open-source development has to offer, including code reuse, solution sharing, and community support.
