A multivariate hierarchical regionalization method for discovering spatiotemporal patterns

ABSTRACT In GIScience, regionalization methods are widely used for geographical data mining, spatiotemporal pattern discovery, and regional studies. An ideal regionalization method should consider spatial contiguity, temporal contiguity, and attribute similarity. Existing regionalization approaches mostly focus on spatial contiguity and attribute similarity while ignoring the temporal contiguity of geographic phenomena. We propose a multivariate spatiotemporal regionalization (STR) method that considers both spatiotemporal contiguity and attribute similarity. We design a bottom-up unsupervised multivariate hierarchical clustering algorithm with constraints based on spatiotemporal proximity rules, enabling the automatic regionalization of spatiotemporal data. To test the performance of the STR method, we applied it to a synthetic dataset and a real-world dataset (Chinese air pollutant data) and achieved ideal results. Such a method offers a spatiotemporal perspective on regionalization and clustering problems, potentially supporting further applications in spatiotemporal data analysis, remote sensing, urban planning, and social science.


Introduction
Spatial clustering or regionalization has long been one of the most common problems in GIScience, spatial statistics, and spatiotemporal data analysis (Helbich et al. 2013; Wolf 2021). Given a set of geographical objects (e.g. Chinese cities) with univariate or multivariate information, a regionalization method attempts to aggregate the geographical objects into a number of spatiotemporally contiguous regions while optimizing an objective function, which is normally a measure of the attribute similarity within each region (Aydin et al. 2021; Wang et al. 2022). Regionalization has been applied in many fields, such as map generalization, climatic zoning, housing markets, social sensing, and health-related analysis (Ferreira, Campbell, and Matwin 2022; Wang et al. 2022; Zhou and Matyas 2021).
In previous studies, researchers have proposed many regionalization methods. Regionalization has two well-accepted definitions: partitioning, which assigns a unique label to every location in the study area, and non-partitioning, which does not require labels for regions but instead seeks cluster centers (Zhu et al. 2021). This paper focuses on partitioning regionalization, although some discussions also use the term partitioning clustering, as is conventional in the literature. These methods can be divided into three categories by data source: regionalization of rasters, regionalization of polygons (or points), and regionalization of origin-destination flows or networks (Kowe et al. 2020; Shu et al. 2020; Wei, Rey, and Knaap 2020). Raster regionalization algorithms are closely related to remote sensing and image processing (Bogucka and Jahnke 2018; Cowpertwait 2011; Lin et al. 2021). They are often used to identify geographic objects in remote sensing images, thereby providing a basis for object-oriented remote sensing algorithms.
However, a large amount of socio-economic data in the real world is distributed across administrative regions, which have more complex neighborhood relationships than grid cells (Daraganova et al. 2012; Mohammadpour et al. 2021). Therefore, the regionalization of polygons has gradually developed to address issues in social science analysis. For instance, some researchers use data-driven regionalization algorithms to analyze the housing market, whereas others propose a mixed-level regionalization method to study national health data (Govorov et al. 2019; Helbich et al. 2013). Some scholars focus on improving regionalization methods from the algorithmic perspective, so that the regionalization results exhibit spatial autocorrelation and other characteristics consistent with the real world (Liu et al. 2015; Niesterowicz, Stepinski, and Jasiewicz 2016). With the gradual development of sensor networks in recent years, plenty of socio-economic data distributed in geographic space are being generated dynamically by new sensor networks (Doreian and Conti 2012; Pereira, Segurado, and Neves 2011). For example, population data at different geographic boundaries remain valuable and continue to be generated despite the development of sensors that monitor population flows.
Subsequently, a large number of algorithms from complex network analysis have been used to analyze geographic flows and geographic networks, owing to the similarity between network community detection and regionalization (Berline et al. 2014; Ghawi and Pfeffer 2022; Poorthuis 2018; Shen, Liu, and Chen 2017). However, community detection algorithms for complex networks often only consider topological relationships. Thus, some researchers have proposed network community algorithms with spatial constraints, such as the spatially constrained Louvain and Leiden algorithms, the minimum spanning tree (MST)-based regionalization method, the network Max-P model, and spatially encouraged and dynamically constrained clustering methods (He et al. 2017; Mahmood and Gloaguen 2013; She, Duque, and Ye 2016). These are widely used in research on social network structure, urban structure, regionalization uncertainty, and neighborhood unit detection (Yu et al. 2016; Zhong et al. 2014).
With the in-depth study of spatiotemporal data, scholars have found that such data exhibit not only spatial autocorrelation but also temporal autocorrelation (Kristensson et al. 2009; Shaddick, Lee, and Wakefield 2013). This makes it difficult to obtain the expected results when a regionalization method that only considers spatial constraints is applied to spatiotemporal data (Krivoruchko and Gribov 2019; Xi et al. 2019). Scholars have proposed solutions to this issue. For example, Hoffman proposed the multivariate spatiotemporal clustering (MSTC) method for environmental applications. Inspired by multivariate geographic clustering (MGC), this method transforms KNN clustering from data space to geographic feature space and uses a parallel principal component analysis (PCA) tool to improve computational efficiency on large datasets. MSTC does not treat spatiotemporal contiguity, especially temporal contiguity, as a hard constraint, but instead emphasizes cluster similarity in geographic feature space (Hoffman et al. 2008). Assunção proposed an MST-based regionalization method for multivariate spatial clustering of socio-economic geographic units (Assunção et al. 2006). It adopts top-down clustering: it uses an MST to link all geographic units and then breaks specific links according to modularity and link strength to achieve clustering. This method is widely used in geographic pattern discovery and has been integrated into ArcGIS and ArcGIS Pro after optimization with a genetic algorithm (Mahmood and Gloaguen 2013).
ClustR and SaTScan model multivariate units from the perspective of complex networks and then transform spatial clustering into community detection on spatial complex networks (Kriventseva et al. 2001; Coleman et al. 2009; Xie et al. 2022). When faced with spatiotemporal clustering tasks, the MST-based regionalization method, ClustR, and SaTScan often layer the time dimension while clustering the space dimension, and then integrate the two dimensions to complete the task. This improves the efficiency of the algorithm, but the probability of errors between different time layers increases accordingly. Therefore, this study aims to propose a new method that considers both spatial and temporal relationships, addressing the insufficient treatment of temporal autocorrelation in existing regionalization methods. To this end, we realize a regionalization-based spatial clustering method that considers spatiotemporal contiguity and attribute similarity.
The rest of this paper is organized as follows. In Section 2, we introduce the multivariate constrained spatiotemporal regionalization (STR) method proposed in this paper. The section has four parts: spatiotemporal proximity rules (Section 2.1), feature vector distance measurement (Section 2.2), the spatiotemporal constraint regionalization algorithm (Section 2.3), and the evaluation of regionalization results (Section 2.4). Then, we use a synthetic dataset (Section 3.1) and a real-world dataset (Section 3.2) as case studies to verify the STR method. Finally, in Sections 4 and 5, we discuss and conclude.

Methodology
This paper proposes a bottom-up unsupervised hierarchical clustering algorithm that realizes the automatic regionalization of spatiotemporal data while accounting for spatiotemporal contiguity and multivariate similarity. We call it the STR method, and it comprises four steps. First, we define the spatiotemporal proximity rule and the neighborhood merging rule of the spatiotemporal cube (STC). These rules ensure that a spatiotemporal cluster formed by merging STCs remains spatiotemporally contiguous. Second, because geographic entities in spatiotemporal data have multiple attributes, we convert these attributes into feature vectors. We then use the Euclidean distance between multivariate centers of gravity to measure the similarity between an STC and its neighbor cubes. Third, following the idea of unsupervised hierarchical clustering, we search the spatiotemporal neighborhood of each STC, calculate the similarity between the STC and its neighbor cubes, and merge the most similar one. Fourth, we evaluate and analyze the regionalization results and detect three basic types of clusters: cylindrical, pie-shaped, and spherical. The various cluster types represent different spatiotemporal characteristics. This method fully considers spatial and temporal autocorrelation and is an attempt to extend the regionalization algorithm from the spatial dimension to the temporal dimension.

Data structure and spatiotemporal proximity rules
To express spatiotemporal data, this study adopts NetCDF to construct STCs. An STC refers to a spatiotemporal cubic region comprising homogeneous pixels in a certain continuous spatial domain. Mathematically, an STC Cube_i(S_i, T_i) comprises the spatial attribute data S_i and the temporal attribute data T_i, where S_i refers to the spatial domain, which can be a regular or irregular shape, and T_i is the temporal domain of the STC. Figure 1 illustrates the data structure of an STC using sample data. Figures 1a, 1b, and 1c show three scales of the STC data structure. Figure 1a is the data structure at the spatiotemporal scale, which contains the three dimensions X, Y, and T. Figure 1b shows a single layer of the spatiotemporal scale, which can be the XY, XT, or YT domain. Figure 1c is the data structure of a single STC, which stores multiple values.
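As an illustration only, the STC data structure of Figure 1 can be sketched with a plain NumPy array standing in for a NetCDF file; the dimensions, value range, and function name below are our assumptions, not the paper's implementation:

```python
import numpy as np

def build_stc_array(nx, ny, nt, n_vars, seed=0):
    """Hypothetical STC container: axis order (X, Y, T) follows
    Figure 1a, and each cell stores an n_vars-length feature
    vector (Figure 1c)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 200.0, size=(nx, ny, nt, n_vars))

# A 6x6x6 cube of STCs with three attributes each (216 STCs).
cubes = build_stc_array(6, 6, 6, 3)
print(cubes.shape)  # (6, 6, 6, 3)
```

A real pipeline would read these arrays from NetCDF variables rather than simulate them.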
To ensure the spatiotemporal contiguity of regionalization, this study stipulates spatiotemporal proximity rules for STCs. The rules comprise three parts: the spatiotemporal proximity rule of (I) a single cube, (II) multiple cubes, and (III) common neighborhood cubes. For a single cube A_1, this study stipulates that any cube sharing a common part (point, polyline, or polygon) with A_1 is a neighborhood cube of A_1 (the set N_A1; Figure 2a). For multiple cubes (A_1, A_2, A_3), the common neighbor cubes (N_{A1,A2,A3}) are defined as the intersection of N_A1, N_A2, and N_A3. For various combinations of cubes, the spatiotemporal proximity rules are more complicated. When the neighbor intersection is empty (N_A1 ∩ N_B1 = ∅), the spatiotemporal proximity rules follow I or II. When the neighbor intersection is not empty (N_A1 ∩ N_B1 ≠ ∅), the cubes in the intersection are the common neighbor cubes (CN_{A1B1}) of A_1 and B_1. Whether in A_1's neighborhood calculation or B_1's, CN_{A1B1} participates, and its attribution is decided based on attribute similarity.
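For a regular grid of cubes, "sharing a common part (point, polyline, or polygon)" reduces to the 26-neighborhood of a cell, and the common-neighbor rule becomes a set intersection. The following is a hedged illustration of rules I and II; the function names are ours:

```python
from itertools import product

def neighbors(cell, shape):
    """Rule I for a regular grid: the neighbours of a single cube are
    all cells sharing a face, edge, or corner point (the
    26-neighbourhood), clipped at the data boundary."""
    out = set()
    for delta in product((-1, 0, 1), repeat=3):
        if delta == (0, 0, 0):
            continue
        cand = tuple(c + d for c, d in zip(cell, delta))
        if all(0 <= c < s for c, s in zip(cand, shape)):
            out.add(cand)
    return out

def common_neighbors(cells, shape):
    """Rule II: the common neighbour cubes of several cubes are the
    intersection of their individual neighbourhoods."""
    sets = [neighbors(c, shape) for c in cells]
    return set.intersection(*sets) - set(cells)

print(len(neighbors((1, 1, 1), (3, 3, 3))))  # 26: an interior cube
print(len(neighbors((0, 0, 0), (3, 3, 3))))  # 7: a corner cube
```

For irregular polygons, the same interface would be backed by a topological adjacency table instead of index arithmetic.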

Feature vector distance measurement
The spatiotemporal proximity rules (Section 2.1) guarantee the spatiotemporal contiguity between the central cube and its neighbor cubes. Thus, this section calculates the feature vector distance between the central cube and its neighbor cubes.
Each cube constructed in this study stores multiple values that form a feature vector (Cube_{i,j,k} = (V1_{i,j,k}, V2_{i,j,k}, V3_{i,j,k}, ...)). To achieve STR with spatiotemporal constraints and attribute similarity, it is necessary to measure the feature vector distance between the central cube and its neighbor cubes, realizing hierarchical clustering under the premise of spatiotemporal contiguity. As shown in Figure 3, assuming that Cube_{i,j,k} is located at a corner of the STC, its neighbor cubes comprise Cube_{i,j-1,k}, Cube_{i,j-1,k-1}, Cube_{i,j,k-1}, Cube_{i-1,j-1,k}, Cube_{i-1,j,k}, Cube_{i-1,j-1,k-1}, and Cube_{i-1,j,k-1}. The attribute values in a cube form an n-dimensional feature vector. Given that the order of the attribute values in the cube does not affect the similarity among cubes, this study uses the Euclidean distance to measure it (Formula 1).

Dist(Cube_{i,j,k}, Cube_{i',j',k'}) = sqrt( Σ_{l=1..n} (V_l(Cube_{i,j,k}) - V_l(Cube_{i',j',k'}))² )    (1)

where Dist is the feature vector distance between the central cube and its neighbor cubes, Cube_{i,j,k} is the feature vector of the central cube, and Cube_{i',j',k'} is the feature vector of a neighbor cube.
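Formula 1 is a plain Euclidean distance between two feature vectors; a minimal sketch:

```python
import math

def feature_distance(v_a, v_b):
    """Formula 1: Euclidean distance between the feature vector of a
    central cube and that of one of its neighbour cubes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_a, v_b)))

print(feature_distance((0.0, 3.0), (4.0, 0.0)))  # 5.0
```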

Spatiotemporal constraint regionalization algorithm
Figure 4 illustrates the implementation of the spatiotemporal constraint regionalization algorithm, which contains three steps and one iteration rule. First, neighborhood searching: each cube in the study area is a starting point of the algorithm, and each cube looks for its neighbors and determines the most similar cube to merge with. As shown in Figure 4a, we draw two cubes (Cube_A and Cube_B) as starting points of the algorithm (any cube can serve as a starting point). Cube_A and Cube_B belong to the cube sets A = {Cube_A} and B = {Cube_B}, respectively. Second, Cube_A and Cube_B find their neighborhood cubes based on the spatiotemporal proximity rules, forming two neighborhood cube sets NA = {Cube_NA1, Cube_NA2, ..., Cube_NAn} and NB = {Cube_NB1, Cube_NB2, ..., Cube_NBm} (Figure 4b). Third, we calculate the feature vector distances between the cubes in A and the cubes in NA using Formula 1, and likewise for B and its neighbor set NB. Fourth, the cubes with the smallest distance are moved from NA and NB to A and B, respectively. In the next iteration, the cubes in A are regarded as a whole, and the neighborhood of A is the union of all member cubes' neighborhood sets (Formula 2 and Figure 4e). The algorithm then continues to iterate until it reaches the number of clusters required by the user (Figures 4f-4h).
NA = N_Cube1 ∪ N_Cube2 ∪ ... ∪ N_Cubem    (2)

where NA is the neighborhood cube set of A, and N_Cube1 to N_Cubem are the neighborhood cube sets of the cubes in A.
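Putting the proximity rules, Formula 1, and Formula 2 together, the iteration of Figure 4 can be sketched as a greedy agglomerative loop over a regular grid of cubes. This is an illustrative toy implementation (scanning all adjacent region pairs per merge and using centroid-based similarity are our simplifications), not the paper's code:

```python
import numpy as np
from itertools import product

def str_cluster(values, n_clusters):
    """Toy sketch of the STR iteration (Figure 4): every cube starts as
    its own region; repeatedly merge the pair of spatiotemporally
    adjacent regions whose multivariate centroids are closest in
    feature space (Formula 1), until n_clusters regions remain.
    values has shape (nx, ny, nt, n_vars)."""
    shape = values.shape[:3]
    cells = list(product(*(range(s) for s in shape)))
    label = {c: i for i, c in enumerate(cells)}
    members = {i: [c] for i, c in enumerate(cells)}

    def nbrs(cell):
        # 26-neighbourhood, standing in for the proximity rules.
        for delta in product((-1, 0, 1), repeat=3):
            if delta == (0, 0, 0):
                continue
            cand = tuple(c + d for c, d in zip(cell, delta))
            if all(0 <= c < s for c, s in zip(cand, shape)):
                yield cand

    def centroid(region):
        return np.mean([values[c] for c in members[region]], axis=0)

    while len(members) > n_clusters:
        best = None
        for cell in cells:           # scan all adjacent region pairs
            for other in nbrs(cell):
                a, b = label[cell], label[other]
                if a == b:
                    continue
                d = float(np.linalg.norm(centroid(a) - centroid(b)))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        for cell in members.pop(b):  # merge region b into region a
            label[cell] = a
            members[a].append(cell)

    out = np.empty(shape, dtype=int)
    renumber = {r: i for i, r in enumerate(sorted(members))}
    for cell in cells:
        out[cell] = renumber[label[cell]]
    return out
```

On a tiny two-block toy array the loop recovers the two homogeneous halves; a production version would need an edge priority queue to avoid the quadratic rescan per merge.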

Evaluation and analysis of regionalization results
This section evaluates the shape and variable distribution of the regionalization results. The shape of the regionalization results can reflect their spatiotemporal characteristics, while the internal variable distribution can reflect their attribute characteristics. Figure 5 represents three basic types of STC regionalization results: (a) cylindrical clusters, (b) pie-shaped clusters, and (c) spherical clusters. Other cluster shapes can be composed of these three basic shapes. Cylindrical clusters have a smaller range in the X and Y directions but a larger range in the T direction (Figure 5a). They represent a phenomenon that is relatively stable in space but lasts for a long time. Pie-shaped clusters have a larger range in the X and Y directions but a smaller range in the T direction (Figure 5b). They represent a phenomenon with a wide distribution in space but a short duration. Spherical clusters have a relatively uniform distribution in the three directions of XYT (Figure 5c).
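The three basic shapes could be told apart automatically by comparing a cluster's spatial and temporal extents; the ratio threshold below is an assumption for illustration, not a value from the paper:

```python
import numpy as np

def classify_shape(cells, ratio=2.0):
    """Heuristic labelling of the three basic cluster shapes in
    Figure 5 by comparing spatial and temporal extents; the ratio
    threshold of 2 is an assumption."""
    cells = np.asarray(cells)               # rows of (x, y, t) indices
    extent = cells.max(axis=0) - cells.min(axis=0) + 1
    spatial, temporal = max(extent[0], extent[1]), extent[2]
    if temporal >= ratio * spatial:
        return "cylindrical"   # compact in XY, long-lasting in T
    if spatial >= ratio * temporal:
        return "pie-shaped"    # wide in XY, short-lived in T
    return "spherical"         # roughly uniform in XYT

print(classify_shape([(0, 0, t) for t in range(6)]))  # cylindrical
```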
Figure 6 shows the difference in variable distributions across clusters. The left-hand part of Figure 6 is the regionalization results, and the middle part shows three clusters in those results: Clusters I, II, and III. Figures 6a, 6b, and 6c in the right-hand part show the variable distributions of these three clusters. The method comprehensively considers multivariate information in the process of calculation. There is a similar variable distribution within each cluster, which reflects the cluster's attribute characteristics. For example, the cubes in Cluster I have relatively high V2 and V4 as well as low V1, V3, and V5 (Figure 6a). Cubes in other clusters show the same phenomenon. When the variables have practical significance, each cluster can be interpreted for its specific geographical meaning according to its variable distribution.

Experiment of synthetic dataset
In this study, a synthetic dataset is simulated to evaluate the rationality of the STR method; it is designed as a regular cube containing 216 STCs. Each STC includes eight attributes: "Object ID" is the cube's number; "V1," "V2," and "V3" are the three attributes of the cube; "X," "Y," and "T" are the cube's spatiotemporal information; and "Cluster ID" is the cube's regionalization result. Table 1 presents the list of attributes. We preset the attributes of the synthetic dataset to ensure internal spatiotemporal heterogeneity, that is, blocked random numbers are assigned to the attributes of the synthetic dataset. We assume that the attributes of a cube i constitute a three-dimensional vector V_i = (V1, V2, V3). The difference in attribute distributions between cubes can then be measured by the Euclidean distance between the vectors V_i. The Euclidean distance is small within the same cluster, and the distance between different clusters is large. Specifically, for each cube of the synthetic dataset, the range of the attribute values (V1, V2, V3) is 0-200. Inside the same preset cluster, the Euclidean distance between the V_i is within 2.5, whereas the distance between different clusters tends to be over 50, guaranteeing global heterogeneity and local homogeneity within the same cluster. Several outliers are added to the synthetic dataset to better simulate the real-world situation. The synthetic dataset thus has an apparent spatiotemporal regionalization structure, which can be used to evaluate the rationality of the STR method's results.
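A dataset with these properties might be generated as follows; the block layout, cluster centres, and noise level are our assumptions, chosen only to respect the stated distance bounds (within-cluster under 2.5, between-cluster over 50):

```python
import numpy as np

def make_synthetic(seed=0):
    """Illustrative 216-STC synthetic cube: eight 3x3x3 blocks, each
    sharing a preset centre in (V1, V2, V3) space. Noise of +/-0.7 per
    attribute keeps within-cluster Euclidean distances under 2.5, and
    the centres are spaced so between-cluster distances exceed 50."""
    rng = np.random.default_rng(seed)
    data = np.empty((6, 6, 6, 3))
    labels = np.empty((6, 6, 6), dtype=int)
    centres = [(20, 20, 20), (100, 100, 100), (180, 60, 120),
               (60, 180, 40), (120, 40, 180), (40, 120, 80),
               (180, 180, 180), (80, 80, 160)]
    for bx in range(2):
        for by in range(2):
            for bt in range(2):
                cid = bx * 4 + by * 2 + bt
                noise = rng.uniform(-0.7, 0.7, (3, 3, 3, 3))
                sl = (slice(bx * 3, bx * 3 + 3),
                      slice(by * 3, by * 3 + 3),
                      slice(bt * 3, bt * 3 + 3))
                data[sl] = np.asarray(centres[cid]) + noise
                labels[sl] = cid
    return data, labels
```

Outliers, as mentioned above, could then be injected by overwriting a few random cells with values far from their block's centre.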
The synthetic dataset is converted into STCs, and the STR method is used to regionalize its spatiotemporal clusters. Given that the STR method is an unsupervised regionalization method, the number of regions k is determined by research needs. This section shows eight regionalization results, where the same color represents the same spatiotemporal cluster (Figure 7). As shown in the figure, the results have three basic characteristics. First, the spatiotemporal contiguity of each spatiotemporal cluster is guaranteed, making the cubes within the same cluster mutual spatiotemporal neighbors. Second, the spatiotemporal structure preset in the synthetic dataset is reasonably expressed. Third, the three basic structures of cylindrical, pie-shaped, and spherical clusters are also mined. Thus, the method can express the spatiotemporal structure of the synthetic dataset and complete the regionalization.
After completing the initial experiment on the small random synthetic dataset, we test the accuracy of the STR method on a larger random synthetic dataset. The dataset contains 8000 STCs, each consisting of five random variables. Before each clustering test, we pre-divided the entire dataset into a specific number of clusters and set labels. The results of the STR method are compared with the labels to obtain the misclassified STCs and the accuracy. Figure 8 compares the theoretical and actual results on the random dataset. Each subgraph consists of two parts: the theoretical results in the upper part and the actual results in the lower part. From the figure, we find that the actual results of the STR method are consistent with the theoretical results. At the same time, the STR results also maintain spatiotemporal contiguity. When the cluster number is 3 (Figure 8b), the STR results appear to be discontinuous in the spatiotemporal dimension; upon inspection, however, the resulting outliers are connected to the main community through STCs at the bottom of the random dataset. A slight misalignment appears in the time dimension of the STR results when the cluster number is 5 or 7 (Figures 8d and 8f). However, in actual research, the spatial dimension of the study area is much larger than the temporal dimension. Thus, slight errors in the temporal dimension do not affect the analysis and mining of specific geographic patterns.
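Because cluster IDs are arbitrary, comparing clustering output against preset labels requires matching the two label sets before counting misclassified STCs. A hedged sketch for a small number of clusters (it assumes the predicted and preset cluster counts match, and the function name is ours):

```python
import numpy as np
from itertools import permutations

def best_match_accuracy(true_labels, pred_labels):
    """Accuracy of clustering output against preset labels: cluster IDs
    are arbitrary, so try every relabelling of the predicted IDs and
    keep the best agreement. Feasible only for a handful of clusters,
    and it assumes both labelings use the same number of clusters."""
    t = np.ravel(true_labels)
    p = np.ravel(pred_labels)
    pred_ids = np.unique(p)
    best = 0.0
    for perm in permutations(np.unique(t), len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        remapped = np.array([mapping[x] for x in p])
        best = max(best, float(np.mean(remapped == t)))
    return best

true = np.array([0, 0, 0, 1, 1, 1])
pred = np.array([1, 1, 0, 0, 0, 0])   # IDs swapped, one cube misplaced
print(best_match_accuracy(true, pred))  # 5 of 6 cubes match after relabelling
```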
Figure 9 represents the numerical characteristics of each cluster generated by the STR method. The X-axis in this figure represents the attributes of the synthetic dataset, and the Y-axis is the mean value of the attributes of each cluster. Each line represents the numerical distribution of attributes within each cluster. The figure shows that the STR method can also discover differences in the feature space of STCs. For example, Clusters I and IV are not only independent of each other in the spatiotemporal dimension but also exhibit opposite numerical distribution patterns in feature space. If this phenomenon occurs in real-world datasets, it often reveals a specific geographic spatiotemporal pattern. This shows that the STR method is effective in performing regionalization-based clustering and mining spatiotemporal patterns.
The specific misclassified STCs and the algorithm's accuracy also reveal interesting phenomena. With the increase in cluster numbers, the number of misclassified STCs decreased from 381 to 294. At the same time, the accuracy of the algorithm on the random dataset increased from 95.2375% to 96.325%. When there are more than 10 clusters, the misclassification count of the STR method tends to be stable and remains at about 295 (Figure 10). The reason is related to the elimination of the error caused by the increase in the cluster number. In theory, for the same algorithm, accuracy should be stable on any random dataset, and thus the number of misclassified STCs is also stable. As such, when the number of clusters is small, a large number of misclassified STCs is concentrated in the same cluster, resulting in a slightly higher error rate. We also applied the MSTC, MST, and SKATER methods to the synthetic dataset. The results show that the STR method has a relative advantage in regionalization-based clustering tasks for spatiotemporal data. It is also clear why the other methods work poorly with spatiotemporal data: because they do not consider spatiotemporal contiguity but only spatial contiguity, they can only cluster each temporal layer separately and then integrate the layers. Discontinuous clusters will be generated between different temporal layers, raising the probability of errors.
Table 2 presents the performance of the STR method for each cluster at different regionalization thresholds. The table has 16 rows and 17 columns. Column 1 is the number of regions. Columns 2-16 show the misclassification counts for each cluster (Roman numerals indicate cluster IDs). Column 17 is the total misclassification in each experiment, which corresponds to Figure 10. From Table 2, we find that the error of the STR method gradually decreases as the number of regions increases and stabilizes after more than 10 clusters. The reason can be seen from the performance of the STR method for individual clusters. Each cluster contributes errors and outliers to the total, and a portion of these errors and outliers becomes part of new clusters as the number of regions increases. However, too many clusters often lead to small attribute differences between clusters; that is, multiple clusters might reflect the same spatiotemporal pattern. Therefore, we consider it reasonable to regionalize spatiotemporal data using the STR method with the cluster number set to 10 in the real-world dataset experiment.

Study area and data description
Air pollution is one of the most important issues affecting the quality of human life and global environmental change (Cooper et al. 2022; Jbaily et al. 2022). The flourishing development of high-spatiotemporal-resolution, multivariate air pollutant datasets also provides a basis for extending regionalization algorithms from the spatial to the spatiotemporal dimension. STR of air pollutants also helps us to study their distribution and attribute patterns from both natural environment and socioeconomic perspectives. Therefore, in this section, we apply the STR method to delineate the air pollutant structure of China. The data used are derived from the China High Air Pollutants dataset co-produced by the National Earth System Science Data Center and the University of Maryland (Wei et al. 2021). The dataset comprises the distributions of seven major pollutants, PM1, PM2.5, PM10, O3, NO2, SO2, and CO, from 2000 to 2020 (Figure 11). It is generated from big data (e.g. ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence that considers the spatiotemporal heterogeneity of air pollution (Wei et al. 2021). The PM2.5, PM10, O3, and NO2 datasets have a spatial resolution of 1 km and a temporal resolution of 1 day. The spatial and temporal resolutions of the other pollutant datasets are 10 km and 1 month. In our experiment, we select the four pollutant datasets of PM2.5, PM10, O3, and NO2, which share the same spatiotemporal resolutions, for 2018. Figure 11 shows the distribution of the four pollutants in March, June, September, and December 2018. These pollutant datasets cover the whole of China for the whole of 2018.

Spatiotemporal regionalization result of real-world dataset
In this section, we first map the four pollutant datasets onto a common grid according to their spatial and temporal resolutions. Thus, each grid point of the resulting new dataset stores the values of the four pollutants at that location.
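The mapping step just described amounts to stacking four co-registered grids into one multivariate array; a minimal sketch with placeholder variable names standing in for the real pollutant layers:

```python
import numpy as np

def stack_pollutants(pm25, pm10, o3, no2):
    """Combine four co-registered pollutant grids of shape (nx, ny, nt)
    into one array whose last axis is the per-grid-point feature
    vector; the argument names are placeholders for the real layers."""
    grids = [np.asarray(g) for g in (pm25, pm10, o3, no2)]
    assert all(g.shape == grids[0].shape for g in grids), "grids must align"
    return np.stack(grids, axis=-1)

a = np.zeros((4, 4, 12))                   # toy 4x4 grid, 12 time steps
print(stack_pollutants(a, a, a, a).shape)  # (4, 4, 12, 4)
```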
Then, according to the temporal attributes of the new dataset, the temporal association between layers is constructed such that each grid point in the new dataset has neighbors in the spatiotemporal dimension. For readability, the results are shown as an STC and are mainly divided into two parts: the spatiotemporal clusters and their numerical attribute distributions. In Section 3.1, we conducted stability and precision experiments and showed that when the number of clusters exceeds 10, the error of the algorithm tends to be stable (Figure 10 and Table 2). Thus, we regionalized the real-world dataset into 10 clusters using the STR method. Figure 12 represents the spatiotemporal distribution of the regionalization results for Chinese air pollutants calculated with the STR method. Figure 13 shows each cluster's distribution in the vertical (temporal) dimension, which indicates its duration, and in the horizontal (spatial) dimension. Figure 14 shows clear differences in the numerical distributions of pollutants in each cluster, demonstrating that the STR method realizes clustering not only in the spatiotemporal dimension but also in the attribute dimension. Furthermore, in Figure 14, the violin plot of each cluster's attribute distribution shows a flat shape. This indicates that the numerical distribution of attributes within the same cluster is similar, and it also shows the internal consistency of the clusters generated by the STR method.
If we combine the spatiotemporal distribution of the clusters with their attribute value distributions, many interesting phenomena can be found. Figure 12 shows that the distribution of each cluster on the temporal scale is continuous, with no cluster producing a break. This result reveals unique and stable patterns of pollutant emissions across the clusters, cross-verifying the distinct attribute distribution features displayed in Figure 14. At the same time, it demonstrates the necessity of using the STR method to study multivariate distributions in spatiotemporal clusters. The spatial distribution of the clusters corresponds significantly with China's natural geographical and socioeconomic factors. Specifically, the spatial distribution of the clusters correlates strongly with physical geographic constraints in western China and with socioeconomic development patterns in southeastern China.
In detail, the distribution of Cluster 1 in time is persistent, and its distribution in space gradually shrinks with time. At the beginning of 2018, Cluster 1 covered Northeast China, northern Inner Mongolia, Hebei Province, and Shandong Province. It then gradually contracted to Northeast China over time (Figure 13a). Northeast China is the earliest industrialized area in the country, and its economic structure is dominated by heavy industry and agriculture. Thus, the numerical distribution of O3 is concentrated at high values, with few outliers (Figure 14a). The values of NO2, PM2.5, and PM10 are concentrated at lower levels, indicating that vehicle exhaust emissions are lower in Northeast China. This finding may indirectly reflect the low urbanization rate and the relatively sparse distribution of big cities (Wauchope et al. 2022).
Cluster 2's spatiotemporal coverage is stable. There are no breaks in the temporal dimension, and the spatial coverage is concentrated in the North China Plain (Figure 13b). The North China Plain is the most densely populated area in China. The values of the four pollutants are significantly higher than in the other clusters (Figure 14b). We can also observe a significant feature of its numerical distribution: the median value of PM2.5 is high, and the value near the peak is significantly increased. This finding is related to the many large-population cities in the region and their severe domestic waste gas emissions (Sala et al. 2021). In addition, Cluster 6 has a numerical distribution similar to Cluster 2 (Figure 14f), indicating that the Loess Plateau and North China have similar industrial structures and development patterns. However, the O3 of Cluster 6 is more concentrated at high values than that of Cluster 2, and the emissions of NO2 and PM2.5 are slightly lower. This suggests that urban residents on the Loess Plateau have lower emissions than those in North China.
The spatial range of Cluster 3 consists of the Taklimakan Desert, the Gansu Gobi, and the Qaidam Basin. Its temporal distribution is persistent. In the spatial dimension, its coverage is stable throughout the year and expanded to central Gansu Province at the end of the year (Figure 13c). This is the most sparsely populated and most desertified area in the country. Widespread deserts and frequent dust storms result in extremely high PM10 values in this cluster, which peak at around 1200 µg/m³, the highest among the 10 clusters (Figure 14c). Owing to the sparse population and few industrial facilities, the emissions of the other three pollutants are extremely low, and the value of NO2 is even distributed around 0. Also distributed in Northwest China are Clusters 5 and 8, which have attributes and numerical distributions similar to Cluster 3 (Figures 14e and 14h). Cluster 5 is mainly distributed in the grasslands of northern Xinjiang and Inner Mongolia. Its contiguity is strong in the temporal dimension, but its spatial range gradually decreases with time (Figure 13e). Cluster 8 covers almost all the towering mountains and plateaus in western China, including the Qinghai-Tibet Plateau, the Himalayas, the Kunlun Mountains, the Pamirs, the Tianshan Mountains, and the Qinling Mountains extending into central China. Its spatial extent is also stable, with little change over time (Figure 13h). These two clusters are also sparsely populated areas. The PM10 values are lower in both clusters than in Cluster 3 because they are far from the deserts, suggesting that regions with similar natural and social environments have similar pollutant emission patterns.
Clusters 4, 6, 7, 9, and 10 are distributed in the central area of China. Among them, Clusters 6, 7, and 10 do not easily change in time and space (Figures 13f, 13g, and 13j). This illustrates that the air emission patterns underlying the three clusters have obvious spatiotemporal autocorrelation. Clusters 4 and 9 are distributed along China's southeast coast (Figures 13d and 13i). Their spatial positions overlap each other, and their temporal distributions intersect each other, showing that the two have similarities in pollutant emission patterns. Clusters 4, 7, and 9 exhibit similar patterns in their attribute numerical distributions (Figures 14d, 14g, and 14i). The four pollutants show little difference in values, and each pollutant has a relatively uniform distribution. Geographically, Cluster 4 is mainly located in the coastal area of Southern China. Cluster 7 covers China's central urban agglomeration. The range of Cluster 9 corresponds to the lower reaches of the Yangtze River plain, Taiwan Island, and Hainan Island (Figure 12). These regions are among the most economically developed and commercialized in China. Cluster 9's prosperous economic activity has also created a pattern of pollutant emissions that differs from the other clusters. Cluster 10 contains Southern Tibet and Yunnan and exhibits a unique attribute numerical distribution pattern. The median value of O3 is significantly higher than those of the other pollutants, which differs from the other clusters (Figure 14j). The values of PM2.5 and PM10 are clearly lower than those of the other clusters. This emission pattern may be inextricably linked to high forest cover, sufficient precipitation, and lower human activity (Z. Liu et al. 2015).

Comparison with other methods
In the real world, most geographic phenomena exhibit two dimensions of properties: attribute similarity and spatial or spatiotemporal contiguity. Thus, numerous spatial clustering methods have been developed around these two properties, forming three major categories: attribute-based methods, regionalization-based methods, and methods that balance the first two (Kim et al. 2015; Yuvaraj et al. 2021). Attribute-based clustering focuses on the similarity of variables and ignores spatial relationships; this category includes k-means, BIRCH, DBSCAN, OPTICS, CURE, SOM, and ADCN (Assunção et al. 2006; Guo 2008; Nowosad and Stepinski 2018). Regionalization-based clustering emphasizes the importance of spatial contiguity, as in SKATER and AUTOCLUST (Aydin et al. 2018; Kim et al. 2015). In terms of clustering purpose, the STR method belongs to this category because it also considers attribute similarity (R. Wei et al. 2021; Zhang et al. 2021). However, unlike traditional regionalization-based clustering, this study regards temporal contiguity as an important factor. Thus, the STR method's results exhibit spatial contiguity, temporal contiguity, and attribute similarity. Researchers can then comprehensively consider the characteristics of the spatiotemporal distribution and the numerical distribution of attributes to find geographic patterns within spatiotemporal data. On this basis, we analyze the spatiotemporal characteristics and attribute numerical distributions of the STR results and achieve promising results on the synthetic and real-world datasets. The results effectively reveal China's pollutant emission patterns under the dual control of natural and socio-economic factors.
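The contiguity-constrained bottom-up aggregation described above can be approximated with off-the-shelf tools. The sketch below is not the STR implementation; it uses scikit-learn's AgglomerativeClustering with a hand-built connectivity matrix for a hypothetical 4 × 4 × 3 spatiotemporal cube, showing how merges can be restricted to spatiotemporally contiguous units.

```python
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.cluster import AgglomerativeClustering

# Toy spatiotemporal cube: a 4 x 4 grid observed at 3 time steps,
# one attribute per cell (48 units in total).
nx, ny, nt = 4, 4, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(nx * ny * nt, 1))

def idx(i, j, t):
    """Flatten (row, column, time) into a single unit index."""
    return (t * nx + i) * ny + j

# Connectivity matrix linking each unit to its spatial (rook)
# neighbours and its temporal predecessor/successor, so merges
# are only allowed between spatiotemporally contiguous units.
n = nx * ny * nt
conn = lil_matrix((n, n), dtype=int)
for t in range(nt):
    for i in range(nx):
        for j in range(ny):
            u = idx(i, j, t)
            for di, dj, dt in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
                ii, jj, tt = i + di, j + dj, t + dt
                if ii < nx and jj < ny and tt < nt:
                    v = idx(ii, jj, tt)
                    conn[u, v] = conn[v, u] = 1

model = AgglomerativeClustering(n_clusters=3, connectivity=conn,
                                linkage="ward")
labels = model.fit_predict(X)
print(np.unique(labels))  # three contiguous spatiotemporal regions
```

Because every merge must follow an edge of the connectivity matrix, each resulting cluster is a connected region in space-time, which is the key difference from purely attribute-based clustering.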

Shortcoming and future directions
Although our proposed STR method has shown promising results on the synthetic and real-world datasets, we acknowledge three limitations, which we discuss here. (1) The regionalization results of STR are affected by the shape of the dataset. The STR method is a bottom-up spatiotemporal hierarchical regionalization that produces the phenomenon known as "region growing" during the aggregation process. That is, approaching the edge of the dataset, each cluster is inevitably affected by the shape of the dataset. For example, when the temporal dimension is small (the T direction is too short), the regions are forced to extend in the spatial domain, and vice versa. (2) The algorithm is relatively inefficient. For a hierarchical clustering method, the time complexity is O(n³). Thus, the computational cost is high for large-scale datasets, restricting the analysis of ultra-long time series data.
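To make the cubic cost concrete, here is a deliberately naive single-linkage agglomerative loop (an illustration only, not the STR code): roughly n merge steps each rescan all remaining cluster pairs and their member distances, which is why bottom-up hierarchical methods become expensive on long time series.

```python
import numpy as np

def naive_agglomerative(X, n_clusters):
    """Naive single-linkage agglomeration; illustrates the cubic cost."""
    X = np.asarray(X, float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:        # ~n merge steps ...
        best = (None, None, np.inf)
        for a in range(len(clusters)):       # ... each scanning ...
            for b in range(a + 1, len(clusters)):  # ... all pairs
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)       # merge the closest pair
    return clusters

print(naive_agglomerative([[0.0], [0.1], [5.0], [5.2]], 2))
# -> [[0, 1], [2, 3]]
```

Practical implementations cache and update the pairwise-distance structure instead of recomputing it, but the overall cost of the classic algorithm remains O(n³) in the worst case.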
(3) The scale effect and the modifiable areal unit problem (MAUP) exist in the STR method. In previous studies, scholars generally believed that MAUP is a source of geo-statistical bias and can significantly affect the results of statistical hypothesis tests (Cressie 1996; Holt et al. 1996). That is, when the smallest geostatistical unit changes, the distribution of the statistical variables of each unit also changes, thereby affecting the spatial analysis results of geographical variables (Kwan 2012; Menon 2012). Therefore, if we selected geographical units of other scales in this study, the internal multivariate values of the units would change, causing the distances between units in the feature space to change and thereby leading to clustering results different from the original experiment. If the range of the geographic units increases, the original features will be smoothed. Similarly, if the range of the geographic units is too small, many units will contain extreme values, destroying the spatiotemporal contiguity of the clustering results.
The selection of "distance" also impacts the clustering results. The distances in this study can be divided into two aspects: geographical space distance and feature space distance. The selection of geographical distance determines the spatiotemporal proximity rules. When it is commensurable, users can use a traditional distance algorithm as the geographical distance; when it is not commensurable, users can use an adjacency matrix instead. The selection of feature space distance determines the similarity between units. For numerical geographic variables, users can choose the Euclidean distance or a weighted Euclidean distance; for categorical geographic variables, users can use the Jaccard distance as the similarity measure. On this basis, we can summarize three future directions for this study. First, we can use intelligent optimization algorithms, such as genetic algorithms, particle swarm optimization, and artificial neural networks, to improve the method's efficiency. Second, we can consider a top-down approach to avoid the influence of the dataset shape on the clustering results during the bottom-up clustering process. Last, we can develop a multiscale-stable multivariate spatial clustering method to achieve relatively stable patterns and clusters at different scales.
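The feature-space distance choices discussed above can be sketched as follows. These are illustrative helper functions under assumed inputs, not the paper's exact implementation: Euclidean or weighted Euclidean distance for numerical attributes, and Jaccard distance for categorical (set-valued) attributes.

```python
import numpy as np

def euclidean(a, b, w=None):
    """Plain or weighted Euclidean distance between attribute vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    w = np.ones_like(a) if w is None else np.asarray(w, float)
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

def jaccard(a, b):
    """Jaccard distance between two sets of categorical labels."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

# Two hypothetical units with four pollutant attributes
# (PM2.5, PM10, O3, NO2); the values are made up for illustration.
u = [35.0, 70.0, 80.0, 20.0]
v = [35.0, 70.0, 80.0, 24.0]
print(euclidean(u, v))                       # 4.0
print(euclidean(u, v, w=[1, 1, 1, 0.25]))    # 2.0
print(jaccard({"urban", "coastal"}, {"urban", "inland"}))  # ~0.667
```

Down-weighting an attribute (here NO2, via the weight 0.25) shrinks its contribution to the feature-space distance, which is one simple way to encode domain knowledge about which variables should dominate the regionalization.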

Conclusion
This study proposes a multivariate hierarchical regionalization method (the STR method) to find spatiotemporal patterns. This method effectively realizes regionalization-based clustering and pattern discovery for spatiotemporal data and achieves ideal results on the small synthetic dataset, the large synthetic dataset, and the real-world air pollution dataset. Compared with other regionalization methods, STR also shows advantages in the spatiotemporal contiguity and attribute similarity of clusters. Through the experiment on the real-world dataset, we found three characteristics of the spatiotemporal patterns of pollutant emission in China: global heterogeneity, local homogeneity, and temporal intersectionality on the southeast coast of China. This is a powerful statement of Tobler's first law and the second law of geography (Tobler 1970; Goodchild 2004b). The similarity of attributes within the same cluster supports the third law of geography that has emerged in recent years (Zhu et al. 2018). While the STR method achieves ideal results, it also exposes its own limitations: dataset shape effects, inefficiency, and MAUP. These point to the future directions of the STR method: (1) embedding efficiency optimization algorithms, (2) applying top-down ideas, and (3) researching multi-scale stable clustering methods.
The major novel contribution of this study lies in providing a new framework for spatial-temporal comparisons and a spatiotemporal perspective on regionalization. This also enhances the efficiency of spatiotemporal heterogeneity analysis and enriches the means for spatial pattern discovery. The STR method is a universal method: any data conforming to the input structure of the algorithm can be used for spatiotemporal pattern discovery, greatly increasing the scope of its application. Therefore, with the abundance of multisource spatiotemporal data, the STR method can support applications in remote sensing, urban planning, and social science.

Figure 1 .
Figure 1. Three-level data structure of the spatiotemporal cube (STC). (a) and (b) represent the STC structure in the spatiotemporal dimension and the spatial dimension. (c) shows that each STC unit consists of multiple variables.

Figure 2 .
Figure 2. Schematic illustration of the spatiotemporal proximity rules. (a) Spatiotemporal proximity rules of single STC units. (b) Spatiotemporal proximity rules of multiple STC units of the same class. (c) Spatiotemporal proximity rules of multiple STC units of different classes.

Figure 3 .
Figure 3. Cube neighborhood and its attribute distance calculation. Each cube contains dual information: one is its spatiotemporal position information, and the other is its attribute information.

Figure 6 .
Figure 6. Variable distribution of regionalization results. a, b, and c in the right-hand figures correspond to the variable distribution characteristics of Clusters I, II, and III, respectively.

Figure 7 .
Figure 7. STR method results of synthetic dataset: (a)-(h) results under different cluster numbers.

Figure 8 .
Figure 8. Accuracy comparison test of random synthetic dataset: (a)-(i) comparison between theoretical and actual results under different cluster numbers of the STR method.

Figure 9 .
Figure 9. Numerical characteristics of each cluster when the number of regions is 10.

Figure 10 .
Figure 10. Misclassification number of the MSC, MST, SKATER, and STR methods under different cluster numbers.

Figure 11 .
Figure 11. Real-world pollutant dataset: (a)-(d) spatial distributions of PM2.5 in March, June, September, and December 2018 in China; (e)-(h) the spatial distributions of PM10 at these four time points; (i)-(l) distributions of O3; and (m)-(p) distributions of NO2.

Figure 12 .
Figure 12. STR method result for the air pollutant dataset, showing the 10 clusters calculated by the STR method.

Figure 13 .
Figure 13. Spatiotemporal distribution of each cluster calculated by the STR method.

Figure 14 .
Figure 14. Plots of the numerical distribution of pollutants in each cluster: (a)-(j) pollutant distributions for Clusters 1 to 10. A violin plot's width represents the amount of data, and its height represents the distribution range.

Table 2 .
Misclassification of STR method for each cluster under different cluster numbers.