Big Data or Big Fail? The Good, the Bad and the Ugly and the missing role of Statistics


Abstract


The so-called “Big Data” are data that we regard as “big” because of their volume, their rate of arrival per unit of time, and their lack of structure. The usual sources of big data are administrative repositories, transaction data, and social media and social network feeds. Some define big data as data that cannot be analyzed on a desktop machine or stored on a single hard disk. These definitions completely miss the point of view of Statistics: they seem tailored more to the advertising campaigns of SaS or storage solutions than to Science. Moreover, recent big failures, such as the famous (or infamous) Google Flu Trends experiment, prompted a series of popular newspaper articles questioning the validity of the information contained in these data and of Statistics itself, even though none of these bad practices was carried out by statisticians. While Information Technology and Computer Science are good at efficiently retrieving and managing these data, they should soon be brought back into the field of Statistics, where data belong, and this Special Issue of EJASA is one important step in this direction.


DOI Code: 10.1285/i2037-3627v5n1p4

Keywords: Big data; social media; unstructured data.



This work is licensed under a Creative Commons Attribution - NonCommercial - NoDerivs 3.0 Italy License.