Drupal Data Mining for HolaMun2.com
The Need for Data Mining
Mun2, a part of Telemundo cable, is a channel created for America’s Latino youth. The corresponding Drupal website serves the same demographic (American Latinos ages 13-18) and provides information on the TV lineup as well as news and related interactive content.

In an effort to anticipate and respond to user trends, and in order to draw and retain visitors, HolaMun2 project managers needed a robust business intelligence platform to query visitor data and perform analysis using visualization tools.
Data Mining in Drupal
However useful visitor data may be, significant challenges face any organization wishing to extract it. In some cases, a robust data mining solution requires a proprietary system with its own API, hardware requirements, license fees, and ongoing maintenance.
Drupal alleviates some of this pain because it offers organizations the option of building integrated data mining platforms within an open-source framework. In the case of HolaMun2, Achieve Internet built a web-accessible data-mining package that allowed managers to review visitor data through Drupal. The solution visually renders the data to speed up comprehension and make decision-making easier. The Achieve team satisfied the immediate reporting requirements and also created an extensible system that enables low-maintenance query-building for changing business needs.
Business Intelligence Architecture

Achieve and HolaMun2’s internal web team agreed that performing data mining queries on the site’s Drupal database would be ill-advised. First, the queries would be complicated because the Drupal database is optimized for content management, not business intelligence. Second, performing queries on the live, user-facing site would negatively affect performance.
Accordingly, the Achieve team built a custom database designed in a more mining-friendly format. This custom database pulls data from Drupal and re-deposits it into tables that are optimized for mining and keyed on page views. This information is then accessed via a separate Drupal site and displayed in a user-friendly fashion.
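As a rough illustration of what a mining-friendly layout might look like, the sketch below creates one flat table with a row per page view, so that reports need no joins against the content-management tables. The table name, the column names, and the use of SQLite are all assumptions made for this example; the actual schema is not described here.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an assumption
# chosen for illustration, not the schema used on the real site.
SCHEMA = """
CREATE TABLE IF NOT EXISTS page_view (
    page_view_id INTEGER PRIMARY KEY,
    site         TEXT NOT NULL,
    page_path    TEXT NOT NULL,
    viewed_on    TEXT NOT NULL,   -- ISO date string
    age          INTEGER,
    gender       TEXT,
    location     TEXT
);
-- Reports filter by page and date, so index those columns together.
CREATE INDEX IF NOT EXISTS idx_page_view_page
    ON page_view (page_path, viewed_on);
"""


def create_mining_db(path=":memory:"):
    """Open (or create) the mining database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_mining_db()
# Column names as SQLite reports them, in declaration order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(page_view)")]
```

Keeping the visitor attributes on the same row as the page view trades storage for simpler, faster reporting queries, which is the usual reason for separating a reporting store from a live site database.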
Importing Site Data

Using the existing query requirements, and a list of future enhancements, the Achieve team designed the custom database and mapped the Drupal fields to their new homes. Achieve also created custom Drupal modules to handle the transfer. When import is clicked, the modules connect to the site database, present the user credentials, extract the data, and insert it into the custom database.
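The connect, extract, and insert steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the real transfer is handled by custom Drupal modules, whereas this sketch uses Python and in-memory SQLite, and the source tables (`source_users`, `source_views`), the target table, and the sample rows are simplified, made-up stand-ins rather than Drupal's actual schema.

```python
import sqlite3


def import_page_views(source, target):
    """Copy joined visitor/page-view rows from the source into the flat target table."""
    rows = source.execute(
        """
        SELECT v.path, v.viewed_on, u.age, u.gender, u.location
        FROM source_views AS v
        JOIN source_users AS u ON u.uid = v.uid
        """
    ).fetchall()
    with target:  # commits on success, rolls back if an insert fails
        target.executemany(
            "INSERT INTO page_view (page_path, viewed_on, age, gender, location) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# Stand-in for the live site database (hypothetical tables, made-up rows).
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE source_users (uid INTEGER PRIMARY KEY, age INTEGER, gender TEXT, location TEXT);
    CREATE TABLE source_views (uid INTEGER, path TEXT, viewed_on TEXT);
    INSERT INTO source_users VALUES (1, 14, 'M', 'CA'), (2, 17, 'F', 'TX');
    INSERT INTO source_views VALUES
        (1, '/video/1', '2009-03-01'),
        (2, '/video/1', '2009-03-02'),
        (1, '/news/5',  '2009-03-02');
    """
)

# Stand-in for the custom mining database.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE page_view (page_view_id INTEGER PRIMARY KEY, page_path TEXT, "
    "viewed_on TEXT, age INTEGER, gender TEXT, location TEXT)"
)

imported = import_page_views(source, target)
```

The join happens once, at import time, so later reporting queries read a single table and never touch the live site.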
Although this data import could be scheduled with cron, HolaMun2 site admins are comfortable, for now, with running it manually as required. The UI for this function is straightforward: it requires only the user’s credentials and the location of the database.
The interface pictured is accessible on the Data-Mining Drupal site. Note that the user can specify the database name, host, and port. Although the current setup serves just one site, the Achieve team added this flexibility, and the mining platform can house data from multiple sites in the future, re-using the importing process.
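One way to picture the multi-site flexibility is a small registry of source-database settings, one entry per site, with credentials supplied only when an import runs. The class name, field names, host name, and the MySQL-style connection string below are assumptions for illustration; only the database name, host, and port fields come from the import form described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDatabase:
    """Location of one site's database; credentials are deliberately not stored."""
    site: str
    name: str
    host: str = "localhost"
    port: int = 3306  # default MySQL port, assumed here for illustration

    def dsn(self, user):
        """Build a connection string for the given user (password supplied separately)."""
        return f"mysql://{user}@{self.host}:{self.port}/{self.name}"


# Hypothetical registry: adding a second site means adding a second entry.
sources = {
    "holamun2": SourceDatabase(site="holamun2", name="holamun2_drupal", host="db.example.com"),
}

connection_string = sources["holamun2"].dsn("analyst")
```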
Querying Visitor Trends

Once the data has been imported, it can be queried and transformed into actionable intelligence. For HolaMun2.com, the initial need was for demographic data by page view. Site managers wanted to know which content appealed to which segments of their users, so they could make sure the site appealed to all users in their targeted age ranges and geographic locations.
For example, the site may post a video and get a strong response in terms of number of views. However, looking into the demographic data could reveal that mostly males 13-15 were watching. Site managers can react to this by posting content directed toward other visitors. Alternatively, managers can take advantage of the traffic and appeal to those watching the popular video.
The filters shown here include date and age ranges, and the breakdown of this data can be shown for each page. The set of filters is open-ended: the Drupal modules that handle the queries are coded so that adding and removing filters is as simple as changing the module configuration or, in some cases, adding a small amount of code.
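The idea of filters that can be added or removed through configuration can be sketched as a lookup table of query fragments. This is a minimal Python sketch under stated assumptions, not the Drupal modules' actual code: the `FILTERS` table, the `views_per_page` function, the `page_view` columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Each filter maps a name to a parameterized SQL condition.
# Adding a new filter means adding one entry here (names are hypothetical).
FILTERS = {
    "date_range": "viewed_on BETWEEN ? AND ?",
    "age_range": "age BETWEEN ? AND ?",
}


def views_per_page(conn, **active):
    """Count page views per page, applying whichever filters were requested."""
    clauses, params = [], []
    for name, values in active.items():
        clauses.append(FILTERS[name])  # raises KeyError for an unknown filter
        params.extend(values)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        f"SELECT page_path, COUNT(*) FROM page_view {where} "
        "GROUP BY page_path ORDER BY page_path"
    )
    return conn.execute(sql, params).fetchall()


# Made-up sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_view (page_path TEXT, viewed_on TEXT, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO page_view VALUES (?, ?, ?, ?)",
    [
        ("/video/1", "2009-03-01", 14, "M"),
        ("/video/1", "2009-03-02", 17, "F"),
        ("/video/1", "2009-03-05", 13, "M"),
        ("/news/5", "2009-03-02", 15, "F"),
    ],
)

all_views = views_per_page(conn)
young_views = views_per_page(conn, age_range=(13, 15))
narrowed = views_per_page(conn, age_range=(13, 15), date_range=("2009-03-01", "2009-03-03"))
```

Only the condition text comes from the lookup table; the values are always passed as query parameters, so user-supplied filter values never end up inside the SQL string.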
HolaMun2.com will soon add filters for poll questions. In addition to seeing how visitors voted, website administrators will be able to filter results by age, gender, and location. Another future enhancement relates to the site’s bilingual content: managers will be able to identify the characteristics and tendencies of the site’s Spanish-language visitors and adjust content accordingly.
Rich Visualization Tools
Effective business-intelligence-focused data mining also requires that the data be rendered in a way that speeds up comprehension and makes decision-making easier. Here we see how the reports are rendered (please note that these numbers are only examples and do not represent actual data):

These results break down the users who viewed a page by gender and location. The charts are configurable as pie or bar. Custom Drupal modules handle the extraction from the custom database and the subsequent rendering.
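A breakdown like this boils down to counting viewers per category and converting the counts to shares that a pie or bar chart can plot. The following Python sketch shows that aggregation step only; the `breakdown` function, the field names, and the sample views are hypothetical, and the rendering itself is left to whatever charting tool is in use.

```python
from collections import Counter


def breakdown(views, field):
    """Return each category's share of the views, as percentages rounded to one decimal."""
    counts = Counter(view[field] for view in views)
    total = sum(counts.values())
    # most_common() orders categories from largest to smallest share.
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Made-up viewers of a single page.
views = [
    {"gender": "M", "location": "CA"},
    {"gender": "F", "location": "TX"},
    {"gender": "M", "location": "CA"},
    {"gender": "M", "location": "NY"},
]

by_gender = breakdown(views, "gender")
by_location = breakdown(views, "location")
```

Because the function returns plain label-to-percentage pairs, the same result can feed either chart type without further processing.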
The Complement to CMS

For a visitor-focused site such as HolaMun2.com, data mining is the yin to content management’s yang. The data-mining platform the Achieve team built gives site managers a visual summary of usage, so they can respond to trends and plan web strategy by making better content decisions.
HolaMun2.com can mine and perform usage queries without taking a performance hit and without employing a closed, proprietary system. With a Drupal front-end and a custom database mapped to the user-facing website, HolaMun2.com’s data-mining platform serves its current needs and can readily adjust in the future.


