The State of Solr Search Integration on Drupal.org

William O'Connor
CTO

Apache Solr Search

For news and updates on the open-source Drupal CMS project, visit the Drupal Association’s Web site at www.Association.Drupal.org. This week, Achieve completed one of the final phases of the project to integrate the Apache Solr search platform into Drupal.org. The goal was to extend search functionality and let users search not only Drupal.org but also partner sites (e.g., Association.Drupal.org and Groups.Drupal.org). In addition to the Solr work, Achieve developed a project browsing module that lets users browse the projects section. At present, this module provides the Downloads pages for Drupal.org and integrates a special taxonomy classification intended to help users find projects.

The Solr Solution

Solr builds on the Lucene Java search library to index and store data, providing fast, efficient, and highly customizable search for Drupal-based Web sites. Compared to the conventional Drupal search platform, Solr returns more accurate results and offers tailored sorting and filtering. Conventional Drupal searches do not scale well and quickly become a performance burden; results are less relevant than desired, and some of these constraints cannot be worked around within the MySQL-backed Drupal search system. With Solr and its Lucene index, users find what they’re looking for quickly rather than fumbling through large amounts of irrelevant data. Solr also organizes content more systematically, which allows it to handle not only more data but far more intricate data as well.

To allow users to search Drupal.org’s partner sites as well, Achieve designed an infrastructure and path that fuses these additional sites into a single Solr index. Besides expanding the search database, this makes it easier for users to find relevant content. A meta-type filtering concept allows content to be grouped in new, more valuable ways. For instance, Modules and Themes each represent projects tagged with a particular taxonomy term.
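As a rough illustration of the meta-type idea, the sketch below maps a meta-type label to a Solr filter query (`fq`) and restricts results to selected sites within one shared index. The `q` and `fq` parameters are standard Solr query parameters; the field names (`site`, `type`, `project_term`) and the mapping itself are assumptions made for this example, not the schema used on Drupal.org.

```python
from urllib.parse import urlencode

# Hypothetical mapping from a meta-type label to the Solr filter selecting it.
# Field names (type, project_term, site) are illustrative assumptions.
META_TYPES = {
    "Modules": 'type:project AND project_term:"Modules"',
    "Themes": 'type:project AND project_term:"Themes"',
    "Documentation": "type:book",
}


def build_search_params(keywords, meta_type=None, sites=None):
    """Return Solr select parameters as a list of (name, value) pairs."""
    params = [("q", keywords)]
    if meta_type is not None:
        # One filter query narrows results to the chosen meta-type.
        params.append(("fq", META_TYPES[meta_type]))
    if sites:
        # A second filter query limits results to the chosen sites
        # inside the single shared index.
        site_filter = " OR ".join('site:"%s"' % s for s in sites)
        params.append(("fq", site_filter))
    return params


params = build_search_params(
    "views", meta_type="Modules", sites=["drupal.org", "groups.drupal.org"]
)
query_string = urlencode(params)  # ready to append to a Solr /select URL
```

Keeping each restriction in its own `fq` lets Solr cache the filters independently of the keyword query, which is one reason a shared index with filters can be preferable to one index per site.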

Data Integration & Administrative Control

Through Solr integration, administrators gain control over which results are returned and how they’re displayed. Administrators also can “weight” certain data to increase or decrease its significance. Clever “and/or” algorithms, coupled with the weighting principle, allow administrators to easily customize their site’s search mechanism. In other words, Solr puts full control in the hands of people who understand the complexities of their site’s data in order to provide their users with increasingly accurate results. For these reasons and many more, Solr now provides search solutions for massive, high-traffic Web sites such as Netflix.com, WhiteHouse.gov, and StubHub.com.
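To make the weighting principle concrete, here is a minimal sketch of how field weights and and/or matching can be expressed with Solr's DisMax query parser. The `defType`, `qf`, and `mm` parameters are part of Solr; the field names (`title`, `body`) and the weights are assumptions chosen for illustration, not the settings used on Drupal.org.

```python
def build_weighted_params(keywords, field_weights, require_all_terms=True):
    """Build DisMax parameters that boost some fields over others.

    field_weights maps a (hypothetical) field name to its boost, e.g.
    {"title": 5, "body": 1} makes a title match count more than a body match.
    """
    # DisMax expects boosts as "field^weight" pairs separated by spaces.
    qf = " ".join("%s^%s" % (field, weight) for field, weight in field_weights.items())
    return {
        "defType": "dismax",
        "q": keywords,
        "qf": qf,
        # mm ("minimum should match"): "100%" behaves like AND across terms,
        # "1" behaves like OR (any single term may match).
        "mm": "100%" if require_all_terms else "1",
    }


weighted = build_weighted_params("views module", {"title": 5, "body": 1})
```

An administrator adjusting relevance would then change only the weights or the matching mode, leaving the rest of the query untouched.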

Solr on Drupal.org

So what can users expect to see upon full launch of the redesigned Solr aspects on Drupal.org? On the redesigned core pages, the search box in the top right will allow users to perform a Solr search by Modules, Themes, Documentation, or Forums & Issues. As seen in Figure 1 below, users also can customize their searches within the Modules tab by Module Categories, Compatibility (i.e., Drupal version), and Module (i.e., content within the module). In addition, users can sort by elements such as Most Installed, Last Built, Title, Author, and Date. Achieve then took it a step further and introduced sidebar blocks (refer to Figure 1 below) and built out blocks at the bottom of the Downloads page (refer to Figure 2 below). Other features include keyword highlighting, faceted search capabilities, database amalgamation, and body-copy searches that include rich document content. Some of these elements, including the rich searching components, have already been released to Drupal.org in their initial form, so users can take advantage of the improved user experience and deeper search capability right away. Others, such as the blocks for Most Installed, will only be released once the redesign is completed.
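The sorting, faceting, and highlighting features described above correspond to standard Solr request parameters (`sort`, `facet`, `facet.field`, `hl`, `hl.fl`). The sketch below shows one way a browse request could be assembled; the sort labels come from the list above, while the field names (`install_count`, `last_built`, `sort_title`, `body`) are hypothetical and not taken from the Drupal.org schema.

```python
# Hypothetical mapping from a user-facing sort label to a Solr sort clause.
SORT_OPTIONS = {
    "Most Installed": "install_count desc",
    "Last Built": "last_built desc",
    "Title": "sort_title asc",
}


def build_browse_params(keywords, sort_label, facet_fields):
    """Return Solr parameters for a sorted, faceted, highlighted search."""
    params = [
        ("q", keywords),
        ("sort", SORT_OPTIONS[sort_label]),
        ("facet", "true"),   # ask Solr to return facet counts
        ("hl", "true"),      # ask Solr to highlight matched keywords
        ("hl.fl", "body"),   # highlight within the (assumed) body field
    ]
    # One facet.field entry per field that should get a count breakdown.
    params.extend(("facet.field", field) for field in facet_fields)
    return params


browse = build_browse_params("gallery", "Most Installed", ["category", "compatibility"])
```

Each `facet.field` entry would drive one group of filter links, such as Module Categories or Compatibility.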

Solr Search Integration on a Drupal Website
 
Similar search capabilities are included on the Themes tab and Homepage (image below). Achieve knew that users would benefit from sub-queries, such as resolution (e.g., 1024x768, 800x600, etc.), but the Solr module does not provide this functionality. To remedy this, Achieve built a system that expands project modules to allow vocabularies to be associated with top-level project terms (e.g., Modules, Themes, Installation Profiles, etc.). As a result, a vocabulary (e.g., resolution) can be associated with any given project, making it far easier for users to find what they’re looking for. As seen below, the ability to filter by Drupal version was integrated as well.


 
Achieve will help launch the redesigned Drupal.org in early October 2010 and is honored to contribute to the continued advancement of the Drupal community. Upon launch, developers will be able to more easily navigate, discover, and share content vital to the evolution of Drupal.

Achieve is uniquely qualified for this project, having built some of the most dynamic, high-traffic sites for companies such as Disney, NBC Universal, and Viacom. For more than a decade, its team of in-house Drupal experts has been providing architecture, module integration, and other content delivery solutions for even the most challenging environments. For more information on Solr technology and Achieve, visit www.AchieveInternet.com.

To learn more about the redesign project, visit Drupal.org: Redesign Update Sprint 2 and 3