Spatio-temporal visual analytics: a vision for 2020s

Visual analytics is a research discipline that is based on acknowledging the power and the necessity of the human vision, understanding, and reasoning in data analysis and problem solving. Visual analytics develops methods, analytical workflows, and software tools for analysing data of various types, particularly, spatio-temporal data, which can describe the processes going on in the environment, society, and economy. We briefly overview the achievements of the visual analytics research concerning spatio-temporal data analysis and discuss the major open problems.


Introduction
Whatever algorithms and technologies for computerized data processing appear, human understanding and reasoning remains the principal and irreplaceable instrument of analysis, modelling, and problem solving. Visual representation of information is acknowledged as the most effective way of supplying information to the human's mind and as a promoter of ideation and analytical reasoning. Visual analytics is a research discipline developing methods, analytical workflows, and software systems that can support unique capabilities of humans by providing appropriate visual displays of relevant information and involving as much as possible the capabilities of computers to store, process, analyse, and visualize data [18].
Understanding of the processes going on in the environment, society, and economy is crucial for the survival of the human civilization. All these are spatio-temporal phenomena; hence, there is a high demand for approaches to supporting humans in analysis of spatiotemporal data [2]. Given the importance and complexity of such data, they have always been one of the major focuses in the visual analytics research.

State of the art
In particular, much research in the last decade has addressed the analysis of movement data [1] (i.e., sequences of spatial positions of moving entities), but substantial attention has also been given to spatial events (i.e., entities positioned in space and time) and spatial time series (i.e., temporal variation of spatially distributed attribute values). Visual analytics researchers have proposed generalizable and reproducible analytical workflows that involve: data selection and filtering; visualization of movement trajectories, aggregate flows, and spatio-temporal distributions of events and attribute values; clustering based on the spatial, temporal, and thematic components of the data; transformations between spatio-temporal data types; and derivation of new data objects representing extracted pieces of task-relevant information. For example, spatial event data can be generated as a result of interactive detection of significant changes, anomalies, or critical circumstances in movement [4] or in time series of spatial situations [5]. Conceptual models have been proposed to address the diversity of spatio-temporal data and systematically consider the possible transformations between data types [1].
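The transformation from movement data to spatial events can be illustrated with a small example. The following is a minimal sketch, assuming that a "stop" (a run of consecutive low-speed positions) is the event of interest; the names `Fix`, `StopEvent`, and `extract_stops`, as well as the speed and duration thresholds, are hypothetical and are not taken from the cited works.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fix:
    t: float  # time in seconds
    x: float  # planar coordinates in metres (assumed projected)
    y: float


@dataclass
class StopEvent:
    t_start: float
    t_end: float
    x: float  # mean position of the fixes forming the stop
    y: float


def extract_stops(track, max_speed=0.5, min_duration=60.0):
    """Derive stop events from a time-ordered list of Fix objects."""
    events = []
    run = []  # consecutive fixes connected by low-speed segments

    def flush():
        # Emit the current run as an event if it lasted long enough.
        if len(run) >= 2 and run[-1].t - run[0].t >= min_duration:
            events.append(StopEvent(run[0].t, run[-1].t,
                                    sum(f.x for f in run) / len(run),
                                    sum(f.y for f in run) / len(run)))
        run.clear()

    for prev, cur in zip(track, track[1:]):
        dt = cur.t - prev.t
        dist = hypot(cur.x - prev.x, cur.y - prev.y)
        speed = dist / dt if dt > 0 else float("inf")
        if speed <= max_speed:
            if not run:
                run.append(prev)
            run.append(cur)
        else:
            flush()
    flush()
    return events
```

The resulting events carry their own spatial and temporal references, so they can be analysed with the same tools as any other spatial event data.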
Before performing any analysis, it is necessary to ensure that the data are of appropriate coverage and quality. While quite elaborate tools exist for inspecting properties of tabular data (e.g., Trifacta data wrangling tools), they do not yet address the specifics of spatial and temporal data, such as time cycles, spatial and temporal autocorrelation, smoothness of many phenomena in space and time, and the existence of geographic borders and barriers, to name just a few. Researchers are developing recommendations on how to deal with time series data [14,15] and movement data [3], as well as other types of spatio-temporal data [22]. These recommendations are gradually finding their way into open-source implementations [13].
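Two of the time-specific checks mentioned above, temporal coverage and coverage of a time cycle, can be sketched as follows. This is only an illustration under simple assumptions (timestamps as `datetime` objects, the weekly cycle as the cycle of interest); the function names are hypothetical and do not come from the cited recommendations.

```python
from collections import Counter
from datetime import datetime, timedelta


def temporal_gaps(timestamps, max_gap):
    """Return pairs of consecutive timestamps separated by more than max_gap."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap]


def weekday_coverage(timestamps):
    """Count records per day of the week (index 0 = Monday, 6 = Sunday)."""
    counts = Counter(t.weekday() for t in timestamps)
    return [counts.get(d, 0) for d in range(7)]


records = [datetime(2020, 1, 6, 8), datetime(2020, 1, 6, 9),
           datetime(2020, 1, 7, 8), datetime(2020, 1, 10, 8)]
gaps = temporal_gaps(records, timedelta(days=1))
coverage = weekday_coverage(records)
```

Days of the week with zero counts, or long gaps between consecutive records, indicate that conclusions about those periods would not be supported by the data.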
One of the recent developments in visual analytics that may have potentially wide applicability and appreciable utility for spatio-temporal data analysis is a novel method of time-based data querying and filtering called Time Mask [6]. Based on any kind of time-related data, it selects all time intervals in which particular conditions of interest are fulfilled. Next, from all currently available temporal data, it selects the portions where the temporal references fit in the selected intervals. The Time Mask filter can serve as a tool for integrated analysis of several time-related phenomena: analysts can select time intervals based on the state of one of them and investigate how the other phenomena developed during these intervals.
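The two steps of a Time Mask query described above can be sketched in a few lines. This is a minimal illustration, not the implementation from [6]: it assumes a regularly sampled time series in which each sample describes one step of time, and the names `mask_intervals` and `apply_mask` are hypothetical.

```python
from datetime import datetime, timedelta


def mask_intervals(series, condition, step):
    """Return merged half-open intervals [start, end) where the condition holds.

    `series` is a time-ordered list of (timestamp, value) pairs sampled at a
    regular `step`; each sample is assumed to describe [timestamp, timestamp + step).
    """
    intervals = []
    for t, value in series:
        if not condition(value):
            continue
        if intervals and intervals[-1][1] == t:
            # Adjacent to the previously selected sample: extend that interval.
            intervals[-1] = (intervals[-1][0], t + step)
        else:
            intervals.append((t, t + step))
    return intervals


def apply_mask(records, intervals, time_of=lambda r: r[0]):
    """Keep the records whose time reference falls inside any mask interval."""
    return [r for r in records
            if any(start <= time_of(r) < end for start, end in intervals)]


day = timedelta(days=1)
first = datetime(2016, 4, 1)
daily_counts = [(first + i * day, c) for i, c in enumerate([2, 50, 60, 3, 70])]
mask = mask_intervals(daily_counts, lambda c: c >= 40, day)

other_events = [(datetime(2016, 4, 2, 12), "a"),
                (datetime(2016, 4, 4, 9), "b"),
                (datetime(2016, 4, 5, 0), "c")]
selected = apply_mask(other_events, mask)  # keeps "a" and "c"
```

Because the mask is just a list of intervals, the same mask can be applied to any number of other time-referenced datasets, which is what makes it usable for integrated analysis of several phenomena.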
The idea and operation of the Time Mask query method are illustrated in Fig. 1. The example dataset consists of events of taking photos of cherry blossoms that occurred on the territory of the USA during the 10 years from 2007 to 2016. The data (metadata of the photos) were obtained from the Flickr photo sharing web service using its API. The whole set of events is shown in a space-time cube in the upper part of Fig. 1. By means of density-based clustering, large spatio-temporal clusters of the events were detected. These clusters signify time periods of mass photo taking in particular areas. The central image in Fig. 1 demonstrates an interactive visual tool for making Time Mask queries. The user interface (UI) shows the distribution of chosen datasets or subsets over time. In our example, the upper part of the UI shows the time series of the daily counts of the individual events, and the lower part shows the times and durations of the event clusters. In this visual interface, the user first selected the time intervals when mass photo taking was happening in New York. These intervals are highlighted by background painting in yellow. Then, the user selected the 45-day-long time intervals before the beginnings of the event clusters in New York. These intervals are highlighted by background painting in blue. In response to this query, the data (individual events and clusters) have been filtered. Only the data items whose time references fit in the time intervals marked in blue have been selected. These data are shown in the space-time cube at the bottom of Fig. 1. This example demonstrates how a Time Mask query can select data from multiple disjoint time intervals, thus enabling flexible data selection and sophisticated analysis workflows.
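The second step of the example, deriving fixed-length intervals that precede the beginnings of the selected clusters, can be sketched as follows. The function name `preceding_windows` is hypothetical, and merging overlapping windows is our own assumption; the sketch also does not subtract the cluster intervals themselves from the derived windows.

```python
from datetime import date, timedelta


def preceding_windows(intervals, days=45):
    """For each (start, end) interval, build the window [start - days, start).

    Overlapping or touching windows are merged so that the result is a list
    of disjoint intervals sorted by time.
    """
    windows = sorted((start - timedelta(days=days), start)
                     for start, _ in intervals)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical cluster intervals, not the actual data behind Fig. 1.
clusters = [(date(2015, 4, 10), date(2015, 4, 20)),
            (date(2015, 5, 1), date(2015, 5, 5)),
            (date(2016, 4, 5), date(2016, 4, 15))]
windows = preceding_windows(clusters, days=45)
```

The resulting disjoint windows can then serve as a time mask for filtering the individual events and the clusters.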
A notably large amount of visual analytics research has been done on the analysis of data from social media [10,12,30,33], in particular, geographically referenced data. Luo and MacEachren [24] developed conceptual and methodological foundations for geo-social visual analytics. To support studies of people's reactions to significant events, visualizations and analytical workflows were proposed for integrated analysis of the temporal, spatial, social, and thematic facets of the social media data [20,27,28,29]. There was also research on visualization and analysis of people's movements and long-term mobility behaviors derived from geo-tagged social media data [9,31]. In this respect, attention was given to the development of privacy-preserving approaches to visual analysis [7]. The research on analysing social media data contributes to a better understanding of society and people's lives.
Visual analytics research topics also include support for the development of predictive models [23]. In relation to spatio-temporal phenomena, approaches have been proposed for forecasting hotspots of an epidemic outbreak [25], modelling of road traffic flows under usual and extraordinary conditions [8], and prediction of crime incident distribution and decision support for resource allocation [26], to name a few. On the one hand, these approaches enable human analysts to incorporate their domain knowledge and insights gained from the analysis in the model development; on the other hand, they help analysts to use the models for improving their understanding of the phenomena at hand, verifying their insights and hypotheses, and developing appropriate decision options.
A strong current trend in visual analytics research is the development of machine learning models with understandable and explainable behavior (often called "eXplainable AI", or xAI) [11]; however, modelling of spatio-temporal phenomena has not yet been addressed in this research [16].
It has become usual nowadays to study and predict the behavior of complex real-world phenomena by means of computer simulation models. Since the latter are affected by configuration parameters, they are usually executed multiple times with different parameter settings to generate ensembles of simulated behaviors. Supporting scientists in exploring and comprehending ensemble simulation data is a prominent research topic in visual analytics [32]. It is important not only to portray complex behaviors but also to help the analysts assess the uncertainties arising from divergences between different predictions and investigate the variations of the uncertainty over space and time [17,19].
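One simple way to quantify the divergence between ensemble members, so that it can be mapped over space and time, is to compute the spread of the predicted values per location and time step. The sketch below assumes that each run is stored as a nested list indexed as `run[time][cell]` and uses the population standard deviation as the uncertainty measure; both the data layout and the choice of measure are our assumptions, not methods from the cited works.

```python
from statistics import mean, pstdev


def ensemble_summary(runs):
    """Summarize an ensemble of simulation runs.

    `runs` is a list of ensemble members, each shaped [time][cell].
    Returns (means, spreads), both shaped [time][cell], where the spread is
    the population standard deviation across the members.
    """
    n_steps = len(runs[0])
    n_cells = len(runs[0][0])
    means = [[mean([run[t][c] for run in runs]) for c in range(n_cells)]
             for t in range(n_steps)]
    spreads = [[pstdev([run[t][c] for run in runs]) for c in range(n_cells)]
               for t in range(n_steps)]
    return means, spreads


# Two hypothetical runs over two time steps and two spatial cells.
runs = [[[1.0, 10.0], [2.0, 10.0]],
        [[3.0, 10.0], [2.0, 14.0]]]
means, spreads = ensemble_summary(runs)
```

A map or space-time cube of `spreads` shows where and when the ensemble members disagree, which is the information an analyst needs to judge how far the mean prediction can be trusted.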
To summarize the state of the art in space-time visual analytics, we would like to highlight the following trends:
• the advanced state of conceptualization of spatio-temporal data and emerging systematic approaches to data analysis;
• detailed consideration of data properties and semantic interpretation of data;
• active development of methods and tools for task-oriented data transformation and analysis; and
• bridging the gaps between data exploration, analysis, and model building.
www.josis.org

Open problems and possible approaches
Current research in visual analytics strives to address the challenges of big data. Apart from developing technical solutions for enabling rapid visual representation of and fluid interaction with large amounts of dynamically changing complex data, visual analytics must care about the natural limitations in the human capability to comprehend information regarding its amount, complexity, and rate of change. Moreover, since the human is not a passive recipient of information but an important active force in analysis and problem solving, visual analytics must also care about the limited speed of human cognitive processes. The involvement of the "human factor" makes the challenges of big data especially hard for visual analytics. The key idea for addressing these challenges is effective division of labor between the human and the computer. Thus, computational techniques can be utilized for partitioning very large spatio-temporal datasets into internally homogeneous coherent portions and extracting characteristic patterns from these portions, and analysts can employ visual and interactive techniques to understand the patterns and their distribution over space and time [21]. This approach, however, only deals with the problem of data volume but does not address the problem of data dynamism.
In dealing with the latter problem, visual analytics researchers can build on the experiences of successful application of visual analytics to model building. In the case of dynamically changing data, the basic approach is to create an initial model based on currently available data, continuously monitor how well the model fits the newly arriving data, and modify the model when it fails to accommodate the new data well enough. It may not be feasible to involve a human expert each time the model requires adaptation. However, an expert may anticipate how the data may change, based on the previous history of the changes, and orchestrate suitable model adaptation mechanisms that can be activated automatically when the anticipated change occurs. This can be done for several possible courses of data evolution. A specific task for visual analytics is to support the process of the expert's analytical reasoning based on the previous history of data changes and model adaptations. While it may not be possible to develop a general solution, a multitude of application- and task-oriented approaches to handling the problems of big data can be expected.
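The monitor-and-adapt loop described above can be sketched with a deliberately trivial model. In this illustration, the "model" is a constant level, the fit is monitored through the mean absolute error over a sliding window, and the pre-arranged adaptation is a refit on that window; the class name `MonitoredModel`, the window size, and the tolerance are all hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean


class MonitoredModel:
    """Constant-level model that refits itself when recent data stop fitting."""

    def __init__(self, history, window=5, tolerance=2.0):
        self.level = mean(history)          # initial model from available data
        self.recent = deque(maxlen=window)  # sliding window of new observations
        self.tolerance = tolerance
        self.seen = 0
        self.adaptations = []  # log of (observation index, old level, new level)

    def observe(self, value):
        """Feed one new observation; return True if the model was adapted."""
        self.recent.append(value)
        self.seen += 1
        if len(self.recent) < self.recent.maxlen:
            return False
        error = mean(abs(v - self.level) for v in self.recent)
        if error <= self.tolerance:
            return False
        # Pre-arranged adaptation: refit the level on the recent window.
        old = self.level
        self.level = mean(self.recent)
        self.adaptations.append((self.seen, old, self.level))
        self.recent.clear()
        return True


model = MonitoredModel([10, 11, 9, 10], window=3, tolerance=2.0)
flags = [model.observe(v) for v in [10, 11, 9, 20, 21, 19, 20, 21]]
```

The `adaptations` log is the kind of history of data changes and model adaptations that, as argued above, an expert would need to inspect visually in order to decide whether the automatic mechanism behaves as intended.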
Another major challenge for visual analytics is achieving wide whilst appropriate and effective utilization of visual analytics techniques and workflows by analysis practitioners. A vast amount of research in visualization and visual analytics is concerned with development of user-oriented tools and systems, personalization, automated identification of users' needs and intentions, as well as user guidance. However, it is not very likely that the research prototypes developed along these directions will soon be converted into widely accessible and highly reliable visual analytics software suitable for a large variety of applications and analysis tasks. Some commercially available systems are quite user-friendly, but their analytical capabilities are very limited, especially with respect to spatio-temporal data. A practicable solution is proliferation of open-access libraries of combinable software codes implementing various methods for data processing, analysis, modelling, and visualization. This trend is now actively developing, being stimulated by the emergence of the data science languages R and Python and the interactive web-based computational notebook environments.
A large number and variety of notebooks with examples of application of different data analysis and visualization methods have been published on the web by numerous people. These notebooks can be relatively easily adapted to one's individual needs. As a result, analytical functionality becomes accessible to and usable by nearly everyone. This generally positive trend has a downside, however. The notebooks are often created or adapted by people who have little idea of how to choose appropriate visualization techniques and design correct and effective visualizations of the data they deal with, and who have no good understanding of why, when, and how visualizations need to be used in analysis and what their right place in analysis workflows is. Some visualizations occurring in the publicly accessible example notebooks may look impressive and convincing to non-specialists, but, in fact, they may communicate spurious patterns in inadequate ways. Those who view these visualizations and think of doing the same for their data and tasks often lack knowledge that would enable critical assessment and understanding of the suitability of the techniques. Other notebooks include only basic graphics having little analytical value, whereas better ways exist for representing the relevant information. Furthermore, notebooks often apply computational methods without checking prerequisites (e.g., data properties and quality) and without investigating whether the parameter settings are suitable and the results are meaningful.
Besides insufficient visualization and data analysis literacy, there is a danger of uncritical trust in what is produced by computers and taking the outcome of a single run of an analysis algorithm with default parameter settings, or with settings previously used by someone else, as the final result. Naive analysts may not realise that a slight change in the data or parameters can sometimes significantly change the result; therefore, they may not bother to examine the reaction of the algorithm to such changes and to check results of several runs for consistency. More experienced and critically-minded analysts, who usually take the trouble to evaluate and compare what they get from computers, may tend to rely solely on statistical measures rather than trying to gain better understanding with the help of visualizations.
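The sensitivity of a result to parameter settings is easy to demonstrate. The sketch below uses a very simple one-dimensional density-style grouping (values are split wherever the gap between neighbours exceeds `eps`, and groups smaller than `min_pts` are discarded) and reports how the number of clusters changes with `eps`. The algorithm and the names `gap_clusters` and `sensitivity` are our own illustrative choices, not a method from the literature cited here.

```python
def gap_clusters(values, eps, min_pts=3):
    """Group sorted values into clusters separated by gaps larger than eps."""
    clusters, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > eps:
            if len(current) >= min_pts:
                clusters.append(current)
            current = []
        current.append(v)
    if len(current) >= min_pts:
        clusters.append(current)
    return clusters


def sensitivity(values, eps_values, min_pts=3):
    """Return the number of clusters obtained for each eps setting."""
    return {eps: len(gap_clusters(values, eps, min_pts)) for eps in eps_values}


# Hypothetical event times (e.g., in days).
times = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4, 9.0, 9.1, 9.2]
report = sensitivity(times, [0.15, 0.3, 1.0])
```

For these data, the three settings yield one, three, and two clusters, respectively, so the outcome of any single run cannot be taken as the final result without examining the neighbouring settings.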
Hence, there is a strong need to disseminate the knowledge on how to create meaningful visualizations and how to use them effectively in data analysis together with computer operations. It is also necessary to spread the philosophy of visual analytics, where the main principles are the primacy of human understanding and reasoning and awareness of the weaknesses of computers, which cannot see, understand, and think, and thus need to be led and controlled by humans.
This challenge can be addressed by the creation of easily understandable, practitioner-oriented textbooks and open online courses. Visual analytics researchers should not only strive to advance the research but also take responsibility for transferring the operational knowledge to practitioners and casual analysts. This is especially important for achieving wider and better understanding of the environmental, economic, and societal processes, their interrelations and effects.