GeoAI: Where machine learning and big data converge in GIScience

In this paper, GeoAI is introduced as an emergent spatial analytical framework for data-intensive GIScience. As a new fuel for geospatial research, GeoAI leverages recent breakthroughs in machine learning and advanced computing to achieve scalable processing and intelligent analysis of geospatial big data. The paper highlights the three-pillar view of GeoAI, its two methodological threads (data-driven and knowledge-driven), and their geospatial applications. It concludes with a discussion of the remaining challenges and future research directions of GeoAI.


Introduction
In the 2020s, the world is facing unprecedented challenges, including the degradation of environmental quality, more frequent natural disasters, new and reemerging diseases, and a surge in social unrest. These global issues are largely due to rapid population and economic growth, the excessive consumption of natural resources, and growing levels of social inequality. Global climate change, which brings increasingly extreme heat and drought, has intensified wildfires across the world. In 2019, the bushfires in Australia caused 33 deaths, damaged more than 2,000 houses, burned nearly 15 million acres of land, and killed 480 million animals [6]. The fires also generated large amounts of hazardous smoke, causing severe air pollution and negative health consequences.
The Arctic, one of the Earth's remaining frontiers, has also undergone dramatic changes due to rising global temperatures. The Arctic permafrost is at high risk of thawing, in which case immense amounts of carbon dioxide and methane would be released, further amplifying the greenhouse effect and global warming [15]. In January 2020, COVID-19, a highly infectious disease caused by a novel coronavirus, began to spread quickly across the world. As of this writing, in April 2020, it has sickened more than 2 million people and caused at least 130,000 deaths. Action must be taken to mitigate the negative effects of these devastating events to ensure sustainable development and the well-being of humankind [1].
It is clear that these global problems are geospatial in nature: they all occur at particular locations on or near the Earth's surface and exhibit distinct space-time patterns driven by geographical processes or by interactions between humans and nature. GIScience, which deals with the collection, storage, analysis, and visualization of geospatial data, will no doubt play a central role in tracking and mapping environmental and social phenomena at and across different scales, projecting how these phenomena evolve, developing theories about their driving factors and processes, and supporting the design of policies to prevent and respond to ongoing and emerging issues.
Three recent technological advances have prepared GIScientists to better tackle these problems. First, rapid developments in Earth Observation (EO), wireless sensor networks, and Information and Communication Technology (ICT), together with the prevalence of social media platforms, have fostered the explosive growth of geospatial big data at very fine spatial, temporal, and spectral resolutions [8]. These data allow us to study the Earth, its changing environment, and human behavior in near-real time at an unprecedented scale and level of detail. Second, recent breakthroughs in machine learning (more generally, artificial intelligence (AI); more specifically, deep learning) enable a new research paradigm, data-driven science, in which massive geospatial data that are difficult to handle with traditional spatial analysis methods can be analyzed, mined, and visualized. As a result, complex hidden patterns can be revealed, new questions can be asked and answered, and new knowledge can be discovered. Third, the dramatic increase in computational resources, such as Graphics Processing Units (GPUs), offers backbone support for efficiently training machine learning models on big data. In addition, publicly available cloud computing platforms allow individual researchers to build big data applications. Together, these advances are shaping the future of geospatial research.

GeoAI: a new power-up for geospatial research
GeoAI, or geospatial artificial intelligence, sits at the junction of AI, geospatial big data, and high-performance computing (HPC), offering a promising solution for data- or compute-intensive geospatial problems. Figure 1 illustrates the conceptual, three-pillar view of GeoAI. As an interdisciplinary expansion of AI, GeoAI aims to give machines the intelligence to perform spatial reasoning and analysis as humans do. GeoAI evolves along with AI and has two major classes of methods: knowledge-driven (the top-down approach) and data-driven (the bottom-up approach). The data-driven approach, led by machine learning, has become mainstream in AI today because of its outstanding ability to learn to make predictions from massive amounts of data without explicitly programmed analytical rules. Deep learning, a recent breakthrough in machine learning, has transformed the data analytics paradigm in two ways.
www.josis.org
First, deep learning models, such as convolutional neural networks (CNNs), can automatically extract salient features from the data that help differentiate object classes, so that accurate predictions can be made. This is a great advantage over traditional spatial analytical approaches, which typically depend on hand-crafted features: automatic, learned feature extraction is especially helpful for big data problems, where there is often limited prior knowledge about the underlying patterns and processes within the data. Second, deep learning models introduce a local operation, convolution, into the learning process, which breaks down many of the interdependencies inherent in the global computation of traditional (fully connected) neural networks. This design makes the models much easier to parallelize and to train in a high-performance or distributed computing environment. Even when a model becomes very deep, with thousands to millions of parameters to learn, it can still often converge to a model with strong predictive power.
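The second point, that convolution is a local operation, can be made concrete with a small sketch in plain Python (no deep learning library is assumed). Each output cell reads only a kernel-sized window of the input raster, so no cell depends on any other and all of them can be computed in parallel. The toy raster, the kernel, and the function names below are hypothetical illustrations of the principle, not a CNN implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Grid = List[List[float]]


def output_cell(image: Grid, kernel: Grid, i: int, j: int) -> float:
    """Value of output cell (i, j): it reads only the kernel-sized window at (i, j)."""
    return sum(
        image[i + a][j + b] * kernel[a][b]
        for a in range(len(kernel))
        for b in range(len(kernel[0]))
    )


def output_shape(image: Grid, kernel: Grid) -> Tuple[int, int]:
    """Height and width of a 'valid' (no padding, stride 1) convolution output."""
    return len(image) - len(kernel) + 1, len(image[0]) - len(kernel[0]) + 1


def conv2d_sequential(image: Grid, kernel: Grid) -> Grid:
    """Cross-correlation, the operation deep learning libraries call 'convolution'."""
    out_h, out_w = output_shape(image, kernel)
    return [[output_cell(image, kernel, i, j) for j in range(out_w)] for i in range(out_h)]


def conv2d_parallel(image: Grid, kernel: Grid) -> Grid:
    """Same result, with every output cell submitted as an independent task.

    Python threads give no real speedup here; they only show that no output
    cell depends on another, which is what lets GPUs compute cells in parallel.
    """
    out_h, out_w = output_shape(image, kernel)
    coords = [(i, j) for i in range(out_h) for j in range(out_w)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        flat = list(pool.map(lambda ij: output_cell(image, kernel, ij[0], ij[1]), coords))
    # pool.map returns results in input order, so reshape row by row.
    return [flat[i * out_w:(i + 1) * out_w] for i in range(out_h)]


# Hypothetical 4 x 4 raster with a vertical edge, and a 1 x 2 difference kernel.
edge_raster = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
horizontal_diff = [[-1.0, 1.0]]
```

Applying `horizontal_diff` to `edge_raster` yields a 4 x 3 output whose middle column marks the edge. In a real CNN the kernel weights are learned rather than hand-set, which is what makes the feature extraction automatic.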
Machine learning has also powered up the more traditional, top-down, ontology-based GeoAI approaches. These approaches tackle spatial cognition problems, such as measuring semantic similarity [10], by leveraging ontologies and logical reasoning. Unlike data-driven approaches, an ontological approach relies on a knowledge base that provides semantic definitions of real-world entities and their interrelationships in the form of <subject, predicate, object> triples. Knowledge discovery follows predefined reasoning rules and constraints and uses deductive reasoning, so that each newly derived fact can be formally validated and its reasoning path clearly traced. Although highly interpretable, this approach has two drawbacks: (1) ontology engineering, the process of constructing a knowledge base, relies heavily or even solely on expert knowledge and manual work. Although a very deep structure can be established to describe the complex relationships among entities, this human-centered process scales poorly, making it hard to build a knowledge base comprehensive enough to perform well; and (2) although an ontology tries to capture the complexity of human logic, it must be implemented in a machine-understandable way, so some simplification and abstraction are inevitable. This adds another performance challenge in making accurate predictions and decisions.
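To illustrate how deductive reasoning over triples keeps every derived fact traceable, here is a minimal forward-chaining sketch in Python. It assumes a single hand-written rule (transitivity of a hypothetical `partOf` predicate) and a tiny hypothetical knowledge base; real ontology reasoners support far richer rule languages. Each derived triple records the two premises it came from, so its reasoning path can be expanded back to asserted facts.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Provenance = Dict[Triple, Tuple[Triple, Triple]]


def forward_chain_transitive(facts: Set[Triple], predicate: str) -> Provenance:
    """Apply the rule (x p y) AND (y p z) => (x p z) until no new triple appears.

    Returns each derived triple mapped to the two premises it was derived from,
    so every new fact keeps a traceable reasoning path.
    """
    known = set(facts)
    provenance: Provenance = {}
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        edges = [t for t in known if t[1] == predicate]  # snapshot for this pass
        for left in edges:
            for right in edges:
                if left[2] != right[0]:
                    continue  # premises do not chain through a shared entity
                derived = (left[0], predicate, right[2])
                if derived not in known:
                    known.add(derived)
                    provenance[derived] = (left, right)
                    changed = True
    return provenance


def asserted_support(fact: Triple, provenance: Provenance) -> List[Triple]:
    """Expand a fact's reasoning path back to the asserted (non-derived) triples."""
    if fact not in provenance:
        return [fact]
    left, right = provenance[fact]
    return asserted_support(left, provenance) + asserted_support(right, provenance)


# Hypothetical knowledge base; "partOf" is treated as transitive by assumption.
kb = {
    ("Boulder", "partOf", "Colorado"),
    ("Colorado", "partOf", "USA"),
    ("USA", "partOf", "NorthAmerica"),
}
derived = forward_chain_transitive(kb, "partOf")
```

Here `asserted_support(("Boulder", "partOf", "NorthAmerica"), derived)` returns the three asserted triples in chain order, which is the kind of validated, traceable inference described above. Every fact and rule still has to be written by hand, which is exactly the scalability drawback noted in point (1).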
Recent progress in knowledge graphs, and their combined use with machine learning, has brought the ontological approach back to the forefront of GeoAI research. Like an ontology, a knowledge graph is based on semantics and designed to infer new knowledge and derive new insights. The two differ, however, in that an ontology normally emphasizes depth, whereas a knowledge graph is oriented more toward breadth. In this vein, an ontology can serve as the schema that defines the semantic structure of domain knowledge, and a knowledge graph follows this schema to "instantiate" the knowledge base with millions or even trillions of triples for scalable geospatial applications. At this size, machine learning is essential for constructing a knowledge graph automatically. Knowledge inference will also build on recent advances in machine learning, such as graph neural networks and embedding techniques, to achieve automation and intelligence.
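As one illustration of embedding-based knowledge inference, the sketch below uses the scoring function of TransE, a well-known knowledge graph embedding model swapped in here for illustration (the paper does not name a specific model). TransE represents each entity and relation as a vector and treats a triple as plausible when head + relation lies close to tail. The 2-D embeddings are hypothetical values chosen by hand; in practice they are learned from millions of triples.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility of (head, relation, tail): -||head + relation - tail||.

    Scores closer to zero mean the relation vector translates head closer to tail.
    """
    translated = [h + r for h, r in zip(head, relation)]
    return -math.dist(translated, tail)


def rank_tails(head: str, relation: str, candidates: List[str],
               entity_emb: Dict[str, Vector], relation_emb: Dict[str, Vector]) -> List[str]:
    """Link prediction: order candidate tails from most to least plausible."""
    h, r = entity_emb[head], relation_emb[relation]
    return sorted(candidates, key=lambda c: transe_score(h, r, entity_emb[c]), reverse=True)


# Hypothetical 2-D embeddings chosen by hand; real ones are learned from triples.
entity_emb = {
    "Boulder": [0.0, 0.0],
    "Colorado": [1.0, 0.0],
    "Wyoming": [1.0, 1.0],
    "Texas": [3.0, 2.0],
}
relation_emb = {"partOf": [1.0, 0.0]}
```

Ranking candidate tails for ("Boulder", "partOf", ?) with `rank_tails` puts "Colorado" first. Unlike the rule-based sketch of deductive reasoning, this inference is approximate and scales to huge graphs, but it gives up the explicit reasoning path.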
Both methodological threads of GeoAI have widespread applications in geospatial domains. The remote sensing community has extensively used CNNs for scene classification (natural and urban), change detection, and other image analysis tasks [9,18,19]. Deep learning has been leveraged to support cartographic tasks such as generalization, smart mapping, and map element inspection [16]. Machine learning has been increasingly used for semantic and sentiment analysis of social media data and other natural language text documents [3]. In spatial information retrieval, the knowledge graph has become a key component and backbone technology for intelligent question answering, hidden link prediction, and semantic search, among other things [11]. Multi-dimensional geospatial data, such as lidar data and scientific data from numerical simulation models, can also benefit from this processing power, for example through 3D CNNs for 3D object detection and event classification [13]. Time-series data streamed from Internet of Things (IoT) sensors can take advantage of Recurrent Neural Networks (RNNs) to achieve real-time predictions and analyses [14]. The diversity of geospatial data and the prevalence of location-based services make GIScience a natural home for these applications and for the boom of AI.
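The reason RNNs suit streamed sensor data is that they carry a hidden state forward, so each new reading is interpreted in light of earlier ones. The sketch below is a deliberately tiny Elman-style RNN with a one-dimensional hidden state, written in plain Python. Its weights are hypothetical placeholders (no training is shown), and `ScalarElmanRNN` and `run_stream` are illustrative names, not part of any library.

```python
import math
from typing import List


class ScalarElmanRNN:
    """Minimal Elman RNN with a one-dimensional hidden state.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b);  y_t = w_y * h_t + c
    The weights are hypothetical placeholders; in practice they are learned.
    """

    def __init__(self, w_x: float = 0.5, w_h: float = 0.8, b: float = 0.0,
                 w_y: float = 1.0, c: float = 0.0) -> None:
        self.w_x, self.w_h, self.b, self.w_y, self.c = w_x, w_h, b, w_y, c
        self.h = 0.0  # hidden state carried between readings

    def step(self, x: float) -> float:
        """Consume one sensor reading as it arrives and emit an output."""
        self.h = math.tanh(self.w_x * x + self.w_h * self.h + self.b)
        return self.w_y * self.h + self.c

    def reset(self) -> None:
        """Forget all earlier readings."""
        self.h = 0.0


def run_stream(model: ScalarElmanRNN, readings: List[float]) -> List[float]:
    """Process readings one at a time, as a real-time loop over a sensor feed would."""
    return [model.step(x) for x in readings]
```

Feeding the same reading twice in a row produces two different outputs, because the second call also sees the state left by the first. This constant per-reading cost and memory of the past is what makes recurrent models a natural fit for real-time IoT streams.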

Concluding remarks
Although exciting progress has been made, GeoAI remains at an early stage. Many technical challenges, such as the lack of good-quality training data, uncertainty modeling, the geographical transferability of model results, and cross-scale and cross-resolution learning, have yet to be addressed [5]. To further advance GeoAI and establish it as a cornerstone of GIScience, progress needs to be made in the following areas:

Integrated GeoAI
As discussed, data-driven and knowledge-driven GeoAI each have strengths and weaknesses. Data-driven models often tackle problems through "trial and error," without guidance from theory or prior knowledge. They achieve high predictive power by building complex models, at the cost of interpretability. Knowledge-driven approaches have strong expressive power but less satisfying predictive performance because of the many constraints applied to a model. An integrated GeoAI approach, which uses domain knowledge to guide the design of data-driven models, will simplify model design, reduce training time, increase model expressiveness, and augment decision power.
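One simple way domain knowledge can guide a data-driven model, borrowed from knowledge-guided (or physics-guided) machine learning, is to add a penalty to the training loss whenever predictions violate a known constraint. The sketch below is a hypothetical illustration, not the integrated approach the paper envisions: `knowledge_guided_loss` combines mean squared error with a squared penalty for leaving a physically valid range, such as a predicted rainfall amount that cannot be negative.

```python
from typing import Optional, Sequence


def knowledge_guided_loss(predictions: Sequence[float], targets: Sequence[float],
                          lower: Optional[float] = 0.0, upper: Optional[float] = None,
                          weight: float = 1.0) -> float:
    """Mean squared error plus a penalty for leaving a known physical range.

    The data term rewards fitting the observations; the knowledge term (squared
    distance outside [lower, upper], averaged) penalizes impossible outputs.
    `weight` sets how strongly the domain constraint is enforced.
    """
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal length")
    n = len(predictions)
    data_term = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    penalty = 0.0
    for p in predictions:
        if lower is not None and p < lower:
            penalty += (lower - p) ** 2
        if upper is not None and p > upper:
            penalty += (p - upper) ** 2
    return data_term + weight * penalty / n
```

For example, predictions `[1.0, -1.0]` against targets `[1.0, 0.0]` with `weight=2.0` give a data term of 0.5 plus a knowledge term of 1.0. Because the constraint is encoded in the loss, the model is steered away from physically impossible outputs even where training data are sparse.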

Fasten two-way knowledge transfer in GeoAI research
Although many research projects have applied AI to geospatial problems, the study of GeoAI should not merely involve a simple import of AI into geography. Two-way knowledge transfer, from "AI" to "Geo" and from "Geo" to "AI," must be enabled to establish GeoAI as a research field that creates impact within and beyond the geospatial domain. Some pioneering research along this line (i.e., spatially explicit GeoAI models) has been proposed [7]. In future research, it is critically important to integrate spatial thinking and spatial principles into GeoAI model development, to make models smarter at solving geospatial problems and to develop artificial geospatial intelligence.
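A small example of building a spatial principle into a model is to give it a spatially lagged feature: for each location, an inverse-distance-weighted average of nearby observations, which directly encodes Tobler's First Law that near things are more related than distant ones. The sketch below is a hypothetical illustration of that idea, not one of the spatially explicit models cited in [7]. The `bandwidth` cutoff and the fallback for isolated locations are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def spatial_lag(coords: List[Point], values: List[float], bandwidth: float) -> List[float]:
    """Inverse-distance-weighted mean of each location's neighbors within `bandwidth`.

    Nearer neighbors get larger weights (an encoding of Tobler's First Law);
    a location's own value is excluded. Locations with no neighbor fall back to
    their own value, which is an assumption of this sketch.
    """
    lags = []
    for i, (pi, vi) in enumerate(zip(coords, values)):
        weighted_sum, weight_total = 0.0, 0.0
        for j, (pj, vj) in enumerate(zip(coords, values)):
            d = math.dist(pi, pj)
            if i == j or d > bandwidth or d == 0.0:
                continue  # skip self, distant points, and coincident points
            w = 1.0 / d
            weighted_sum += w * vj
            weight_total += w
        lags.append(weighted_sum / weight_total if weight_total > 0 else vi)
    return lags


def add_spatial_lag_feature(coords: List[Point], values: List[float],
                            bandwidth: float) -> List[Tuple[float, float]]:
    """Pair each observation with its spatial lag, e.g. as two input features."""
    return list(zip(values, spatial_lag(coords, values, bandwidth)))
```

A model fed `(value, lag)` pairs can learn how much each location resembles its surroundings, something a purely aspatial model given only `value` cannot see. Richer designs, such as learned spatial weights or location encodings, follow the same principle.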

GeoAI-enabled convergent GIScience
The scientific landscape is increasingly leaning toward convergence research, which requires the seamless integration of theories, knowledge, tools, and expertise across traditional disciplinary boundaries to tackle complex problems collaboratively. Space and time could serve as the nexus of this web of knowledge. For the challenging environmental and public health problems mentioned at the beginning of this paper, GeoAI will and should play a key role in fusing massive datasets, performing intelligent analysis, enabling interactive question answering for both researchers and the general public, and providing decision-support capabilities for planners and stakeholders. Hence, besides fundamental research, it is essential to deepen and broaden applications of GeoAI to address ever bigger problems for the benefit of society, realizing convergent GIScience.
Without a doubt, advances in GeoAI will help address societal challenges such as disease outbreaks, natural disasters, and climate change. In the case of COVID-19, for example, GeoAI is being applied to support the rapid collection of data from multiple sources on the global environment, the social economy, transportation, human mobility, and confirmed cases. The advanced analytics and machine learning capabilities that GeoAI offers facilitate the timely identification of populations vulnerable in this pandemic [4]. GeoAI is also being leveraged to evaluate the effects of social distancing [17] and to forecast impacts on hospital resources [12]. Together, these efforts will improve our understanding of disease transmission patterns and provide scientific support for governments enacting plans to protect their citizens and save lives. GeoAI can also be used to detect patterns and discover explanatory spatial models from big data [2]. In this way, the mechanics of spatial processes, such as the spread of wildfires and the formation and development of tropical storms, can be better understood. All these disastrous events, whether naturally occurring or caused by humans, are related to long-term changes throughout the world. GeoAI, through its combination with big data and knowledge-driven approaches, will contribute to the discovery and validation of new theories and science, thereby helping to enrich and sustain our global environment and society.