Accepted for/Published in: JMIR Infodemiology
Date Submitted: Nov 11, 2021
Date Accepted: Jun 24, 2022
Confounding academic search volume in Google Trends data: Identifying true disease seasonality using a Fourier analysis approach
ABSTRACT
Background:
Internet search volume for medical information, as tracked by Google Trends, has been used to demonstrate unexpected seasonality in the symptom burden of a variety of medical conditions. However, when more technical medical language is used (e.g., diagnoses), we believe this technique to be confounded by the cyclic, school-year-driven internet searches of healthcare students.
Objective:
1) To demonstrate that non-physiologic “academic cycling” of Google Trends search volume is present in many healthcare terms, 2) to demonstrate how signal processing techniques can be employed to filter academic cycling out of Google Trends data, and 3) to apply this filtering technique to some clinically relevant examples.
Methods:
We obtained Google Trends search volume data for a variety of academic terms demonstrating strong academic cycling and used Fourier analysis to 1) identify the frequency-domain fingerprint of this modulating pattern in one particularly strong example, and 2) filter that pattern out of the original data. After this illustrative example, we applied the same filtering technique to Internet searches for information on three medical conditions believed to have true seasonal modulation (myocardial infarction, hypertension, and depression), and to all bacterial genus terms within a common medical microbiology textbook.
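The filtering step described in the Methods can be sketched as a simple frequency-domain (notch-style) filter. This is a minimal illustration under stated assumptions, not the authors' exact procedure: it assumes evenly spaced monthly samples, treats the few largest-magnitude non-DC FFT bins of a strongly academic reference term as the "fingerprint", and zeros those bins in the target series before transforming back. The helper names `academic_fingerprint` and `remove_bins` and the `n_bins` parameter are hypothetical.

```python
import numpy as np


def academic_fingerprint(reference, n_bins=4):
    """Return the indices of the n_bins strongest non-DC rfft bins.

    Hypothetical helper: the largest-magnitude positive-frequency components
    of a strongly academic-cycling reference term are taken as its
    'fingerprint'. Assumes evenly spaced samples (e.g. monthly).
    """
    ref = np.asarray(reference, dtype=float)
    spectrum = np.abs(np.fft.rfft(ref - ref.mean()))
    spectrum[0] = 0.0  # never select the mean (DC) component
    return np.sort(np.argsort(spectrum)[-n_bins:])


def remove_bins(series, bins):
    """Zero the given rfft bins of a series and return it in the time domain."""
    x = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(x)
    coeffs[bins] = 0.0  # notch out the fingerprint frequencies
    return np.fft.irfft(coeffs, n=len(x))
```

Because whole bins are zeroed, any genuine seasonality that shares a frequency with the academic fingerprint would be attenuated as well, so in practice the choice of fingerprint bins involves a trade-off between removing academic cycling and preserving true seasonal signal.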
Results:
Academic cycling explains much of the seasonal variation in Internet search volume for many technically oriented search terms, including the bacterial genus term “Staphylococcus”, for which academic cycling explained 73.8% of the variability in search volume (squared Spearman rank correlation coefficient, ρ², p < .0001). Of the 56 bacterial genus terms examined, six displayed sufficiently strong seasonality to warrant further examination post-filtering. These included 1) “Aeromonas + Plesiomonas” (nosocomial infections, more searched in summertime), 2) “Ehrlichia” (tick-borne disease, more searched in late spring), 3) “Moraxella” and “Haemophilus” (respiratory infections, more searched in late winter), 4) “Legionella” (more searched in midsummer), and 5) “Vibrio” (which spiked for two months in midsummer). The terms “myocardial infarction” and “hypertension” lacked any obvious seasonal cycling after filtering, whereas “depression” maintained an annual cycling pattern.
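The Results report variance explained as the square of Spearman's rank correlation coefficient. The sketch below computes ρ² with NumPy only, as the Pearson correlation of average ranks, which is how Spearman's ρ handles tied values (common in integer-valued Google Trends data). The abstract does not state exactly which two series were correlated; pairing the raw search volume with an estimated academic component would be one assumption. The function names are hypothetical, and the p-value is not computed here.

```python
import numpy as np


def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        # Extend j over the run of values tied with sorted_x[i].
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho_squared(a, b):
    """Squared Spearman correlation: Pearson r of the average ranks, squared."""
    rho = np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]
    return rho ** 2
```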
Conclusions:
Although it is reasonable to search for seasonal modulation of medical conditions using Google Trends internet search volume and lay-appropriate search terms, the variation in more technical search terms may be driven by healthcare students whose search frequency varies with the academic school year. When this is the case, using Fourier analysis to filter out academic cycling is a potential means to establish whether additional seasonality is present.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.