Infodemiological data of West-Nile virus disease in Italy in the study period 2004–2015

Google Trends (GT) was mined from 2004 to 2015, searching for West-Nile virus disease (WNVD) in Italy. GT-generated data were modeled as a time series and analyzed using classical time series analyses. In particular, the correlation between GT-based Relative Search Volumes (RSVs) related to WNVD and “real-world” epidemiological cases in the same study period was r=0.76 (p<0.0001) on a monthly basis and r=0.80 (p<0.0001) on a yearly basis. The partial autocorrelation analysis and the spectral analysis confirmed that a regular 1-year pattern could be detected. The corresponding correlation for GT-based RSVs related to WNVD on a regional basis was r=0.54 (p<0.05). In summary, GT-generated data concerning WNVD correlated well with epidemiology and could be exploited to complement traditional surveillance.



Subject area: Epidemiology
More specific subject area: Digital epidemiology
Type of data
Experimental features: Validation of Google Trends-based data with "real-world" data taken from the Italian National Health Institute (ISS) was performed by means of correlational analysis. Further, autocorrelation and partial autocorrelation analyses and regressions were carried out.
Data source location: Italy
Data accessibility: Data are within this article

Value of the data
To the best of our knowledge, this is the first thorough quantitative analysis of web activities related to West-Nile virus disease.
The analyses presented in this data article show that Google Trends-generated data concerning West-Nile virus disease correlated well with epidemiology in Italy.
This analysis could be extended to other countries, in order to replicate the current findings in other settings and contexts.
These data could be further refined mathematically and statistically to design an approach that complements traditional surveillance of West-Nile virus disease.

Experimental design, materials and methods
Google Trends (GT, a tool freely available at https://www.google.com/trends) was mined from 2004 to 2015, searching for West-Nile virus disease (WNVD).
GT-generated data were modeled as a time series and analyzed using classical time series analyses. In order to detect regular time patterns, spectral analysis was carried out using algorithms written in Matlab, freely available at http://paos.colorado.edu/research/wavelets/ [1]. Further, the correlation between GT-based Relative Search Volumes (RSVs) related to WNVD and "real-world" epidemiological cases in the same study period was computed on both a monthly and a yearly basis.
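As an illustration of how a regular time pattern can be detected in a monthly series, the following minimal Python sketch computes a plain periodogram from a discrete Fourier transform. It is not the Matlab wavelet software cited above [1], only a simpler stand-in for the same idea. The function names `periodogram` and `dominant_period` are hypothetical, and the series is synthetic rather than the RSV data of this article.

```python
import cmath
import math


def periodogram(series):
    """Return (period_in_samples, power) pairs from a plain discrete Fourier transform."""
    n = len(series)
    mean = sum(series) / n
    # Remove the mean so that the zero-frequency term does not dominate.
    centered = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        result.append((n / k, abs(coeff) ** 2 / n))
    return result


def dominant_period(series):
    """Period (in samples) that carries the largest spectral power."""
    return max(periodogram(series), key=lambda pair: pair[1])[0]


# Synthetic stand-in for 12 years of monthly values with one peak per year.
# These numbers are made up for illustration only.
synthetic_rsv = [50 + 40 * math.cos(2 * math.pi * (m - 7) / 12) for m in range(144)]

print(dominant_period(synthetic_rsv))
```

For this synthetic input the dominant period is 12 samples, i.e. one year of monthly observations. A wavelet analysis additionally shows how the spectral power changes over time, which a single periodogram cannot do.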
The correlation for GT-based RSVs related to WNVD was also computed on a regional basis. Autocorrelation and partial autocorrelation functions make it possible to compute the correlation of a time series with its own lagged values, respectively without and with control for the values at all shorter lags. Moreover, a regression model was fitted to the GT-generated data concerning WNVD-related web activities.
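The difference between the two functions can be made concrete with a short Python sketch. The analyses in this article were run in SPSS and MedCalc. The code below is only an assumed, simplified equivalent that uses the sample autocorrelation and the Durbin-Levinson recursion for the partial autocorrelation. All names and the noisy seasonal series are hypothetical.

```python
import math
import random


def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (shorter lags are not controlled for)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def partial_autocorrelation(series, max_lag):
    """Partial autocorrelation at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = autocorrelation(series, max_lag)
    pacf = []
    phi_prev = []  # coefficients of the autoregressive fit of order k - 1
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # correlation at lag k after controlling for lags 1..k-1
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


# Synthetic monthly series with a yearly cycle plus noise (illustration only).
rng = random.Random(1)
series = [
    50 + 40 * math.cos(2 * math.pi * (m - 7) / 12) + rng.gauss(0, 5)
    for m in range(144)
]

acf = autocorrelation(series, 24)
pacf = partial_autocorrelation(series, 24)
print("ACF at lag 12:", round(acf[12], 2))
print("PACF at lag 1:", round(pacf[0], 2))
```

At lag 1 the two functions coincide because there are no shorter lags to control for. A yearly cycle in monthly data shows up as a high autocorrelation at lag 12.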
Autocorrelation and partial autocorrelation analyses, correlational analysis and regressions were performed using the commercial software Statistical Package for the Social Sciences (SPSS, version 23.0, IL, USA) and the commercial software MedCalc Statistical Software version 16.4.3 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org; 2016).
Results with a p-value < 0.05 were considered statistically significant.
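The correlational step and the 0.05 threshold can be sketched in Python as follows. This is a minimal example under stated assumptions: it uses a Pearson coefficient with a permutation-based p-value as a simple stand-in for the tests reported by SPSS and MedCalc, the function names are hypothetical, and both series are synthetic. A permutation test ignores the serial dependence of a time series, so the sketch shows the mechanics only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic monthly case counts and search volumes (illustration only).
rng = random.Random(42)
synthetic_cases = [
    max(0.0, 8 * math.cos(2 * math.pi * (m - 7) / 12) + rng.gauss(0, 1))
    for m in range(144)
]
synthetic_rsv = [5 + 3 * c + rng.gauss(0, 4) for c in synthetic_cases]

r = pearson_r(synthetic_rsv, synthetic_cases)
p = permutation_p_value(synthetic_rsv, synthetic_cases)
print("r =", round(r, 2), "significant at 0.05:", p < 0.05)
```

The same two functions could be applied to yearly or regional aggregates. How the monthly values are aggregated would be a further assumption.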

Conflicts of interest
All authors declare no conflicts of interest.

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.10.022.