Internet Queries and Methicillin-Resistant Staphylococcus aureus Surveillance

The Internet is a common source of medical information and has created novel surveillance opportunities. We assessed the potential for Internet-based surveillance of methicillin-resistant Staphylococcus aureus and examined the extent to which it reflects trends in hospitalizations and news coverage. Google queries were a useful predictor of hospitalizations for methicillin-resistant S. aureus infections.

Staphylococcus aureus is the most common bacterial pathogen isolated from human infections (1). Methicillin-resistant Staphylococcus aureus (MRSA) isolates are strains constitutively resistant to β-lactam antimicrobial drugs. MRSA was initially largely confined to patients with health care exposures (2), but in the late 1990s, genetically distinct strains emerged and spread rapidly among healthy persons in the United States. These new strains, known as community-associated MRSA (CA-MRSA), differ epidemiologically and genetically from older strains (2,3). CA-MRSA strains have become the most common cause of skin infections in US emergency departments (4).
There is no systematic surveillance system in the United States for MRSA. The Centers for Disease Control and Prevention (CDC) tracks a limited group of infections, those defined as invasive, through the Active Bacterial Core (ABC) surveillance system, which reports from 9 regions. These include MRSA infections at normally sterile sites. In a 2007 report, CDC used ABC surveillance to estimate that invasive MRSA disease caused 94,000 cases and 18,650 deaths in the United States in 2005 (5). This report received extensive media coverage and increased public awareness of MRSA (6).
Recent efforts to overcome surveillance limitations, in particular delay and limited geographic coverage, have included Internet protocol (IP) surveillance. IP surveillance monitors Internet search terms related to a specific disease, assuming that greater disease activity correlates with more searches. The best known IP surveillance is Google Flu Trends (7), although other researchers have created additional models (8,9). Given the lack of comprehensive surveillance, we examined whether Google search data might productively supplement existing systems to track the changing epidemiology of MRSA infections. Because MRSA, unlike influenza, is unfamiliar to many persons, we hypothesized that Internet search activity might reflect curiosity inspired by news reports and information-seeking related to actual infections or symptoms.

The Study
We used the Google Trends database to obtain the proportion of all Google searches that contained the words "MRSA" or "staph." "Staph" was included because many news stories refer to MRSA as "antibiotic resistant staph." "Methicillin-resistant Staphylococcus aureus" was searched too infrequently to be useful. Google Trends reports search activity relative to the average number of similar queries in February 2004. We included only US searches, as determined from IP addresses.
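The indexing of search activity to a baseline period can be illustrated with a short sketch. It assumes that each period's query share is simply divided by the average share in the baseline period; the function name and all numbers are hypothetical and are not taken from Google Trends, whose exact normalization is not described here.

```python
from statistics import mean

def relative_search_index(shares, baseline_periods):
    """Scale each period's query share by the mean share over the baseline periods."""
    baseline = mean(shares[p] for p in baseline_periods)
    if baseline == 0:
        raise ValueError("baseline share must be nonzero")
    return {period: share / baseline for period, share in shares.items()}

# Hypothetical shares of all US searches containing a term (not real data).
shares = {"2004-02": 2.0e-6, "2004-03": 2.5e-6, "2007-10": 9.0e-6}

# Index every period to February 2004, the baseline month named in the text.
index = relative_search_index(shares, baseline_periods=["2004-02"])
```

Under this assumption, the baseline period has an index of 1, and a value above 1 indicates proportionally more searches than in the baseline period.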
We extracted counts of US newspaper, wire service, and radio and television stories mentioning "MRSA" or "staph" from the LexisNexis Academic database. We spot-checked stories with the word "staph" to confirm they were about MRSA. One event or medical publication could generate multiple news stories. We hypothesized that the volume of news coverage captured the relative effect of the story on search behavior.
We used quarterly hospital discharge data from the University HealthSystems Consortium Clinical Database, which includes >90% of US academic medical centers, to calculate the proportion of hospitalizations including an MRSA diagnosis. These data were a proxy for true MRSA incidence. We used the diagnostic code for MRSA from the International Classification of Diseases, 9th Revision (V09.0). MRSA hospitalizations include CA-MRSA infections that led to hospitalization and infections that developed during a hospitalization. This database includes up to 99 codes per discharge, more than other national hospital discharge databases. The likelihood of recording an MRSA diagnosis increases with longer lists of codes because of the many concurrent conditions in complex hospitalizations. Some medical centers systematically used <99 diagnosis fields. We therefore adjusted hospitalization rates for the maximum number of codes submitted by each medical center each year. Data after the 3rd quarter of 2008 were not included because a nationwide coding change for MRSA was implemented.
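The unadjusted quantity described above, the proportion of hospitalizations with an MRSA diagnosis in each quarter, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (a quarter label and a list of diagnosis codes), the function name, and the sample records are hypothetical. The adjustment for the maximum number of codes per medical center is not shown because its form is not specified in the text.

```python
from collections import defaultdict

MRSA_CODE = "V09.0"  # ICD-9 diagnostic code for MRSA named in the text

def mrsa_proportion_by_quarter(discharges):
    """Return {quarter: share of discharges whose code list contains MRSA_CODE}."""
    totals = defaultdict(int)
    mrsa = defaultdict(int)
    for quarter, codes in discharges:
        totals[quarter] += 1
        if MRSA_CODE in codes:
            mrsa[quarter] += 1
    return {q: mrsa[q] / totals[q] for q in totals}

# Hypothetical discharge records: (quarter, list of diagnosis codes).
discharges = [
    ("2006Q3", ["682.6", "V09.0"]),
    ("2006Q3", ["486"]),
    ("2006Q4", ["038.11", "V09.0"]),
    ("2006Q4", ["682.3"]),
    ("2006Q4", ["428.0"]),
    ("2006Q4", ["250.00"]),
]
proportions = mrsa_proportion_by_quarter(discharges)
```

Because the code is searched for anywhere in the list, a discharge with more recorded codes has more chances to carry V09.0, which is the reason the text gives for adjusting by list length.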
Quarterly variation in Google searches for "MRSA" and "staph" is shown in Figure 2. Search behavior changed markedly after the October 2007 publication. In addition to the spike, the relative frequency of the search term "MRSA" compared with "staph" subsequently changed. Note that the news count peak in 2005 does not appear in the Google searches, and the peak in Google searches in the 3rd quarter of 2006 is not apparent in the news counts.
Google queries were a useful predictor of MRSA hospitalizations and explained 33% of quarterly variation when used alone. Adding news counts to the model increased the percentage of explained variation only modestly, to 41%, and the news counts were not a significant addition to the model (p = 0.18).
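A comparison of this kind, explained variation with and without an added predictor, can be sketched with ordinary least squares. The text does not say which test produced p = 0.18; a partial F-test for nested linear models is assumed here as one common choice. All function names and data values are hypothetical, and the p-value itself (which requires the F distribution) is not computed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_fit(predictors, y):
    """Fit y on an intercept plus the predictor columns; return (coefs, rss, r2)."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    return coefs, rss, 1.0 - rss / tss

def partial_f(rss_small, rss_big, extra_terms, n, k_big):
    """F statistic for adding extra_terms predictors; k_big counts the intercept."""
    return ((rss_small - rss_big) / extra_terms) / (rss_big / (n - k_big))

# Hypothetical quarterly values (not the study data).
searches = [1.0, 1.2, 1.1, 1.5, 1.6, 1.4, 1.9, 2.0]
news = [10.0, 14.0, 9.0, 12.0, 30.0, 11.0, 15.0, 13.0]
hosp = [4.1, 4.4, 4.6, 5.0, 5.1, 5.3, 5.9, 6.0]

_, rss_small, r2_small = ols_fit([searches], hosp)      # searches only
_, rss_big, r2_big = ols_fit([searches, news], hosp)    # searches + news counts
f_stat = partial_f(rss_small, rss_big, extra_terms=1, n=len(hosp), k_big=3)
```

Adding a predictor can never lower the explained variation in a least-squares fit, so a modest rise such as 33% to 41% has to be judged against the degrees of freedom spent, which is what the F statistic does.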
Our final model, which includes search queries and the 2 temporal indicator variables, but not the news counts, is shown in the Table. The correlation between model predictions and observed hospitalization rates was 0.93 (p<0.001). Although data after 2007 are insufficient for definitive comparison, Figure 2 suggests that the model predicted better before the 4th quarter of 2007 than after it.
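The agreement between model predictions and observed rates is summarized above by a correlation coefficient. A minimal sketch of that calculation follows, assuming the Pearson coefficient is meant; the function name and both series are hypothetical and do not reproduce the reported value of 0.93.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical observed and model-predicted quarterly hospitalization rates.
observed = [4.1, 4.4, 4.6, 5.0, 5.1, 5.3, 5.9, 6.0]
predicted = [4.0, 4.5, 4.7, 4.9, 5.2, 5.4, 5.7, 6.1]
r = pearson_r(observed, predicted)
```

A high correlation shows that predictions and observations rise and fall together; it does not by itself show that the predicted levels are unbiased.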

Conclusions
We report an IP surveillance model for MRSA incidence. We hypothesized that news coverage for such an unfamiliar disease would strongly influence search activity. However, news coverage did not affect the relationship between search queries and hospitalization rates before the 2007 CDC report. The congruence of the Internet search activity and the hospital discharge data suggests that their temporal pattern represents the actual trend in MRSA: an increasing incidence during 2004-2007, with a suggestion of seasonal variation, and no increase in 2008. This pattern differs from that documented by the ABC surveillance data for invasive MRSA infections (13).
The unfamiliarity of the public with MRSA poses a challenge to using Google Trends. Searches using the phonetic misspelling "mersa" show a parallel trend to searches using "MRSA," although they are less frequent, and the correctly spelled "methicillin" is too rare to track.
Hospitalized MRSA infections include hospital-associated MRSA infections and the more serious CA-MRSA infections. Because evidence has shown that invasive hospital-associated MRSA infections decreased during the study period (13), the generally upward secular trend in MRSA hospitalizations is more likely to represent the trend in CA-MRSA, especially because we now know that most MRSA infections have onset in the community (3). The inability to distinguish community and health care infections is nonetheless a limitation of both the Google data and the hospitalization data. Although some hospital databases include more hospitals, they include fewer diagnostic codes. Therefore, no additional comprehensive data are available for MRSA incidence. The lack of any true standard for MRSA incidence is why IP surveillance is potentially useful.