Analysing spatiotemporal patterns of tourism in Europe at high-resolution with conventional and big data sources

Available statistics on tourism from official European sources are limited in terms of both the spatial and temporal resolutions, curbing potential analyses and applications relevant for tourism management and policy. In this study, we produced a novel, complete and consistent dataset describing tourist density at high spatial resolution with monthly breakdown for the whole of the European Union. This is achieved thanks to the integration of data from conventional statistical sources with big data from emerging sources, namely two major online booking services containing the precise location and capacity of tourism accommodation establishments. The produced dataset allowed us to uncover key spatiotemporal patterns of tourism in Europe at unprecedented detail, showcasing the usefulness of complementing official statistical data with emerging big data sources. © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Tourism is a phenomenon of increasing social and economic importance, yet one that has characterised human behaviour for centuries (Butler, 2015). The recent boom in tourism has made it an important economic sector in the European Union (EU) as well as in other parts of the world. In 2016, the EU had an estimated 40.5% market share of global international tourist arrivals, or around 500 million (UNWTO, 2017). According to available estimates, the total contribution (direct + indirect + induced) of the travel and tourism sector to the EU's GDP in 2016 was 10.2%, but with strong variation between countries, ranging from more than 20% in Malta, Croatia or Cyprus to about 5% in Poland, the Netherlands or Romania (World Travel and Tourism Council, 2017). In addition, recent studies have demonstrated the importance of tourism as a factor of economic growth in many countries (Brida, Cortes-Jimenez, & Pulina, 2016; Ohlan, 2017; Perles-Ribes et al., 2017; Salmani, Hossein, & Somayeh, 2014; Seghir et al., 2015).
Tourism has an important territorial dimension: it is unevenly distributed between and within countries, and its impacts are localized. The importance of the spatial dimension of tourism is also underscored by findings indicating that tourism growth in one region positively influences tourism in neighbouring regions (Romão, Guerreiro, & Rodrigues, 2017), or that public policy can affect the spatial patterns of tourism demand (Kang, Kim, & Nicholls, 2014). Seasonality is another distinctive feature of this economic sector, with significant socioeconomic and environmental implications (Butler, 2001; Chung, 2009). Seasonality itself has a marked geographical structure, varying considerably from region to region, depending on climate and type of destination (e.g. city, sea-side, mountain) (Butler, 2001). Together, these two dimensions of tourism, i.e. the spatial and the temporal, are fundamental to characterising and studying tourism in a given territory. The more countries or regions the study area encompasses, the more diverse it is likely to be, and the greater the need for sufficiently detailed and comparable spatiotemporal data on tourism.
Consistent tourism data for the EU are primarily assembled and published by Eurostat. However, currently available data from Eurostat have limited spatial and temporal resolutions, hindering EU-wide characterization of tourism at fine spatial and temporal scales. Unconventional, big data sources are emerging, with the potential to improve our knowledge of tourism at unprecedented detail for vast world regions. However, to the best of our knowledge, there are still only a few examples of the use of such emerging sources of data to characterise spatiotemporal patterns of tourism, and these typically cover limited study areas.
The main aim of this study and, simultaneously, its main contribution to the international literature is to improve the existing knowledge base on the current spatiotemporal distribution of tourism in the EU-28, to enable new insights and applications relevant to tourism management and policy. This main objective can be broken down into four intermediate objectives or tasks, each leading to a tangible output: (i) increase the geographical detail of existing statistics on the spatial distribution of tourism demand down to regional level; (ii) derive regional temporal profiles (monthly) of tourism demand; (iii) generate tourist density maps at high spatial resolution on a monthly basis; and (iv) exploit the produced information to assess relevant dimensions of tourism regionally, such as tourism intensity, seasonality and vulnerability.
To accomplish these objectives, we combined data from two distinct sources: European official statistical bodies, namely Eurostat and National Statistical Offices (NSOs), and online booking services. From Eurostat, we collected nights spent and accommodation capacity at regional level. From NSOs, we assembled nights spent or arrivals at tourist accommodation establishments per quarter or month and per region. Finally, from online booking services, geographic coordinates and other descriptors of accommodation establishments were mined, totalling ca. 843 thousand individual records. The datasets were then combined using a predefined protocol to produce multi-temporal grid maps of tourist density at high spatial resolution (100 × 100 m).
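As a rough illustration of how regional statistics and establishment locations could be combined into a grid, the sketch below spreads a region's monthly nights spent over 100 m grid cells in proportion to the accommodation capacity found in each cell. It is a minimal sketch under stated assumptions, not the protocol used in this study: the proportional-to-capacity rule, the function names (`cell_index`, `disaggregate_nights`), the use of projected coordinates in metres and all figures are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 100  # grid resolution mentioned in the text (100 m cells)


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def disaggregate_nights(establishments, monthly_nights):
    """Spread regional monthly nights over grid cells, weighted by capacity.

    establishments: list of (x_m, y_m, capacity) tuples for one region
    monthly_nights: dict mapping month -> nights spent in the region
    Returns a dict mapping month -> {cell: estimated nights}.
    Assumption: nights are proportional to capacity within the region.
    """
    # Sum the capacity of all establishments falling in the same cell
    capacity_per_cell = defaultdict(float)
    for x_m, y_m, capacity in establishments:
        capacity_per_cell[cell_index(x_m, y_m)] += capacity

    total_capacity = sum(capacity_per_cell.values())
    if total_capacity == 0:
        raise ValueError("region has no accommodation capacity")

    # Each cell receives a share of the regional total equal to its capacity share
    return {
        month: {
            cell: nights * cap / total_capacity
            for cell, cap in capacity_per_cell.items()
        }
        for month, nights in monthly_nights.items()
    }


# Hypothetical region with three establishments (made-up numbers)
establishments = [(120.0, 40.0, 50), (180.0, 90.0, 150), (950.0, 610.0, 200)]
monthly_nights = {"January": 4000, "August": 12000}
grid = disaggregate_nights(establishments, monthly_nights)
```

Because each cell's share depends only on capacity, the cell values for a given month always sum back to the regional total, so the gridded output stays consistent with the regional statistics it was derived from.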
In the following section, we briefly review the current state of the art concerning existing official tourism statistics and examples of the use of unconventional, big data sources for the study of tourism. In the Data and Methods section, we describe in more detail the various input data and the methodology applied to combine them. In the Results section, we show maps of tourist density for Europe and report findings concerning tourism prevalence, seasonality, and intensity, which we finally combine to assess regional vulnerability to shocks in the tourism sector. The last section of the paper wraps up and discusses the work done, and sets out areas that would benefit from further development.

Statistical and big data for tourism
When looking at tourism for a territory as large as the EU, the primary source of data is Eurostat. Official statistical bodies such as Eurostat assemble and publish an important set of tourism-related statistical data with regional breakdown. Eurostat usually dedicates a chapter to tourism in its regional statistical yearbooks (e.g. Eurostat, 2016). Statistical data from Eurostat with regional breakdown include, on the demand side, arrivals and nights spent at tourist accommodation establishments and, on the supply side, the capacity of tourist accommodation establishments. All the regional data provided by Eurostat are available on a yearly basis (figures per region and per reporting year). Although relevant to characterising tourism demand and supply density in Europe at the regional level, these statistics do not permit uncovering spatiotemporal patterns at fine resolution.
While the spatiotemporal resolution offered by official statistical data sources remains limited, other non-conventional data sources are emerging. These new sources of information, often called 'big data' sources for their variety, volume and velocity (Katal, Wazid, & Goudar, 2013), are enabling new opportunities for research and analysis in a myriad of domains, including tourism (Benjelloun, Lahcen, & Belfkih, 2015; Rodríguez-Mazahua et al., 2016). In fact, the applications of big data for tourism analytics are growing rapidly, and are now numerous and diverse. Social media has been used as a source of user-generated content (e.g. user/customer reviews, posts, photos) to assess international mobility patterns (Hawelka et al., 2014), estimate visitation rates of specific attractions (Wood et al., 2013), identify tourist hot-spots in cities (Garcia-Palomares, Gutierrez, & Minguez, 2015), or fine-tune tourism marketing strategies (Marine-Roig & Anton Clavé, 2015). Other studies have used web search engine queries to forecast tourism demand for specific destinations (Li et al., 2017, pp. 57–66), or scraped online booking services to monitor hotel prices (Goni et al., 2017).
Mobile network operator (MNO) data is another emerging input for tourism analytics and a particularly promising one for mapping and monitoring patterns of presence of tourists at high spatial and temporal resolutions. Data derived from the use of mobile phones and geo-located to antennas have already enabled researchers to assess spatiotemporal visitation patterns of tourist destinations in Estonia (Ahas et al., 2008; Raun, Ahas, & Tiru, 2016). Following these early advances, statistical bodies are conducting pilot studies to test the use of MNO data in the production of official tourism statistics (Dattilo & Sabato, 2017; Demunter & Seynaeve, 2017). However, the systematic use of this data source is still hindered by data access constraints: profit-driven MNOs remain reluctant to release their data because proper business models are not yet well established (Debusschere, Wirthmann, & De Meersman, 2017). In addition, there are several methodological challenges associated with the use of MNO data. These include incomplete penetration rates and lack of data for 'roaming' users (Dattilo & Sabato, 2017), heterogeneous market shares of MNOs across regions and