Unsupervised machine learning in urban studies: A systematic review of applications



Introduction
Cities, the most sophisticated creation of people, are believed to have hidden patterns in their physical forms and day-to-day functioning (Batty, 2008; Bettencourt & West, 2010). The growing availability of urban data, together with machine learning techniques that enable automatic pattern recognition from massive datasets, has gained momentum in helping researchers untangle the complexity of cities, thereby informing urban interventions and giving rise to data-driven planning (Athey, 2017). For example, such techniques have been used routinely to monitor urban change (Schneider, 2012), evaluate socio-economic well-being (Jean et al., 2016), and assess our physical surroundings (Doersch et al., 2012). Machine learning has been integral to urban studies for myriad purposes, and the permutations of techniques and datasets that have permeated the field are seemingly endless. As evidenced by the reviews of Grekousis (2019) and Ullah et al. (2020), most applications of machine learning in urban studies rely on supervised techniques. In such approaches, workflows rely on a sample of input (training) data that is labeled with known values or categories. These are used to develop predictive models to estimate unknown values and explain relationships among phenomena. While supervised methods have proven to be useful for a wide range of applications and datasets, they do not purport to answer all research questions, and various challenges remain, e.g. obtaining training data, as real-world urban data is largely unlabeled (Zheng et al., 2014).
Another broad category of machine learning, unsupervised learning (UL), infers patterns from unlabeled data, unleashing further potential for making sense of dynamic and massive datasets in urban studies. In contrast to supervised learning (SL), these techniques pay no attention to structured semantic relationships and are therefore suitable for heterogeneous data such as text, imagery, audio, and video (Jain, 2010). Unlike supervised learning, which manually presets a goal to predict outcomes, unsupervised learning determines what is relevant based on data features (El Bouchefry & de Souza, 2020), providing new perspectives for urban studies beyond humans' a priori knowledge. In this regard, unsupervised machine learning is believed to be the pathway to real artificial intelligence (Bengio et al., 2013), which fundamentally understands the world around us and is the key to AI-generated design and policies. Under the prevailing trend of interdisciplinary GeoAI research (Janowicz et al., 2020; Liu & Biljecki, 2022), UL is instrumental in learning spatial representations and semantic enrichment of spatial data infrastructure (Huang et al., 2022; Jenkins et al., 2019).
While they have been overshadowed by the popularity and uptake of supervised learning techniques, unsupervised counterparts also have a long and successful relationship with studying cities, as we will confirm in this paper. They have been instrumental in uncovering patterns from growing urban datasets and untangling their complexities: for many years, methods such as hierarchical clustering analysis (HCA) and principal component analysis (PCA) have been recognized as crucial in a diverse set of investigations, including evaluating built environment quality (Bonaiuto et al., 2003) and suburban studies (Mikelbank, 2004). Over the past two decades, UL methods have prospered across a wider spectrum of urban studies and related domains, and the volume of publications featuring them has expanded remarkably, as we will show in this review paper. Among others, UL techniques support studies on assessing urbanization processes (Cottineau et al., 2017), investigating travel patterns (Sun & Axhausen, 2016), understanding sustainability and ecology (Richards & Tuncer, 2018), extracting the semantic meaning of urban spaces (Gao et al., 2017), urban perception (Capela & Ramirez-Marquez, 2019), quality assessment of spatial data (Jacobs & Mitchell, 2020), and analyzing energy performance (Oh & Kim, 2019). Such diffusion of UL applications is associated with the growing diversity and volume of urban data, and with the rapid advancement of UL techniques and their ease of use (e.g. accessible implementations). The trend is expected to continue, as AI scientists posit that UL will become even more important in the future (LeCun et al., 2015). However, despite their demonstrated importance and growing uptake in urban studies, no comprehensive review has been conducted to summarize UL applications in the urban context and understand trends, a gap which we seek to bridge in this paper. This void is in contrast with other fields, such as biomedical research and building performance analysis, in which the role of unsupervised learning has been the subject of reviews (Miller et al., 2018; Xu & Wunsch, 2010).
In this paper, we systematically review the use of unsupervised learning in urban studies, highlighting the state of the art of the main techniques and their applicability to a broad range of topics. We hope to provide an informative resource to researchers seeking to leverage UL for research related to cities, and also intend this review to double as a reference for researchers who are yet to become acquainted with such techniques. We define the scope of urban studies as: urbanization and regional studies, built environment, urban sustainability, and urban dynamics (Fig. 1), four themes that have largely been the focus of urban research or initiatives over the past decades.
In Section 2, we provide a high-level overview of unsupervised learning and its capabilities. Section 3 describes the methodology of this systematic review, while Section 4 offers statistical insights to the reviewed papers. The wide spectrum of UL applications is discussed in Section 5, where the contents are adherent to the four thematic urban study groups. Further, in Section 6, we reflect on the common patterns in this study area, limitations of unsupervised methods, implementation, and future opportunities. Finally, Section 7 concludes the paper.

Unsupervised learning background
It is beneficial to give an introduction to unsupervised learning, clarifying its aims and the tasks it is suited for, together with an overview of methods. The major difference between UL and SL is whether the model uses known values as supervisory signals. That is, supervised learning uses labeled data to infer patterns and train a model to label unseen data, while unsupervised learning uses only unlabeled data, and does so for the purpose of discovering patterns, e.g. grouping similar features. It is often employed in applications where labeling is expensive or where it is not relevant.
Here we introduce three general categories: clustering, signal decomposition, and neural networks. To better demonstrate their capabilities from a practical perspective, we use an urban dataset that contains multiple representative urban data types as a case study. The selected dataset covers Airbnb listings in Singapore with their properties (e.g. type of listing, number of bedrooms, and price) and reviews (text and numerical scores). The data is courtesy of Inside Airbnb (see footnote 1), a project that provides open data that quantifies the impact of short-term rentals on housing and residential communities, and it is frequently used in research (Gurran et al., 2018; Li & Biljecki, 2019).

Clustering
Clustering is the most established subcategory of UL. It identifies subgroups within a raw, unlabeled dataset by similarities and differences in features (Jain, 2010). There are multiple clustering techniques, with k-means (Hartigan & Wong, 1979) being the most prominent one. In this method, clustering is done by iteratively moving centroids and assigning each point to the group of its closest centroid. As a result, data points within a cluster share common properties, whereas differences among clusters are clear. The number of clusters is specified by the user. Fig. 2 demonstrates k-means clustering on the dataset: the data points are divided into four groups based on two dimensions of information. The clustering algorithm can take more than two input dimensions; in fact, one of the most common applications of UL in urban studies, as identified by our review, is clustering by multi-dimensional features for discovering typologies (Section 5). Fig. 3 gives an example of this application: Airbnb data points are clustered by four features, deriving four typologies distributed across the city.
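The centroid-moving and point-assignment steps described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation behind Fig. 2 or Fig. 3: it uses plain Lloyd-style iterations rather than the Hartigan and Wong variant cited above, a farthest-point initialisation chosen only to keep the example deterministic, and synthetic two-dimensional points in place of the Airbnb data.

```python
import numpy as np

def nearest(points, centroids):
    """Index of the closest centroid for every point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100):
    """Lloyd-style k-means; returns (centroids, labels)."""
    # Farthest-point initialisation (an assumption made for determinism):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    chosen = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in chosen], axis=0)
        chosen.append(points[d.argmax()])
    centroids = np.array(chosen)
    for _ in range(n_iter):
        labels = nearest(points, centroids)  # assignment step
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, nearest(points, centroids)

# Synthetic stand-in data: four well-separated groups of 50 points each.
rng = np.random.default_rng(42)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])
centroids, labels = kmeans(points, k=4)
```

The number of clusters `k` is the only value the user has to supply, which mirrors the point made in the text; everything else is derived from the data.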

Decomposing signals
This category summarizes techniques that extract feature components from composite signals (e.g. image, text, numeric value), the result of which is high-dimensional data being mapped onto a low-dimensional space while retaining internal structure (reduction of dimensionality) (Blei et al., 2003; Lever et al., 2017). One representative technique is principal component analysis (PCA) (Wold et al., 1987), which compresses datasets by linearly transforming input variables into "principal components", i.e. new data representations composed of input variables. Through PCA, we are able to visualize the four-dimensional features of Airbnb typologies in a two-dimensional space (Fig. 4). The four features are "compressed" and represented by two principal components; the colored axes indicate their directions. In practice, PCA is generally performed before clustering to simplify the interpretation of multidimensional data. It is also effective in identifying the most salient features.
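As a rough illustration of how four features can be compressed into two principal components, the sketch below computes PCA through a singular value decomposition of the centred data matrix. The data are synthetic (four hypothetical features generated from two hidden factors plus noise) and the function name `pca` is introduced here for illustration only; it is not the code behind Fig. 4.

```python
import numpy as np

def pca(X, n_components):
    """Return (scores, components, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # principal directions (rows)
    scores = Xc @ components.T                   # observations in the new axes
    variances = S ** 2 / (len(X) - 1)            # variance along each direction
    return scores, components, variances[:n_components] / variances.sum()

# Synthetic stand-in: 200 observations of 4 correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(200, 4))

scores, components, ratio = pca(X, n_components=2)
print("share of variance kept by two components:", ratio.sum())
```

Each row of `components` lists the weights of the original variables in one principal component, which is what makes it possible to read off the most salient features.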
Another popular technique is latent Dirichlet allocation (LDA) (Blei et al., 2003), which generates explicit representations of collections of discrete data (e.g. text corpus) by topics, i.e. lists of weighted observations. It is often applied for topic modeling tasks, for example, discovering latent topics discussed on social networks. It facilitates understanding people's perception of the environment from volunteered data such as neighborhood reviews (Hu et al., 2019).
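To make the idea of topics as lists of weighted words more concrete, the following sketch fits LDA to a toy corpus with a collapsed Gibbs sampler. Several assumptions apply: the six short "reviews" are invented for illustration and are not taken from the Airbnb dataset, the sampler is a common alternative to the variational inference used by Blei et al. (2003), and the hyperparameters `alpha` and `beta` are arbitrary small values.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))    # words of doc d in topic k
    topic_word = np.zeros((n_topics, vocab_size))  # count of word w in topic k
    topic_total = np.zeros(n_topics)               # words assigned to topic k

    def bump(d, w, k, delta):
        doc_topic[d, k] += delta
        topic_word[k, w] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every word occurrence.
    assign = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            bump(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assign[d][i], -1)       # remove current assignment
                p = ((doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                k = rng.choice(n_topics, p=p / p.sum())
                assign[d][i] = k                   # resample and restore counts
                bump(d, w, k, +1)

    theta = doc_topic + alpha                      # document-topic weights
    phi = topic_word + beta                        # topic-word weights
    return (theta / theta.sum(axis=1, keepdims=True),
            phi / phi.sum(axis=1, keepdims=True))

# Invented toy reviews (hypothetical, for illustration only).
reviews = [
    "near mrt station convenient location",
    "convenient location near mrt",
    "clean room comfortable bed",
    "comfortable bed clean quiet room",
    "mrt station near convenient",
    "quiet clean room comfortable",
]
vocab = sorted({w for r in reviews for w in r.split()})
word_id = {w: i for i, w in enumerate(vocab)}
docs = [[word_id[w] for w in r.split()] for r in reviews]

theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
for k, row in enumerate(phi):
    print(f"topic {k}:", [vocab[i] for i in row.argsort()[::-1][:3]])
```

Each row of `phi` is one topic expressed as weights over the vocabulary, and each row of `theta` gives the mix of topics in one document. Interpreting what a topic means is still left to the analyst.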
We present the topic modeling result on the reviews of rental listings in Fig. 5: ten major topics are discovered, and each of them contains a series of weighted relevant terms. In the selected topic, the most characteristic words include "place", "mrt" (metro system), and "convenient", indicating the importance of particular topics to guests. To make the LDA result sensible to non-experts, human interpretation is generally required. The applications of signal decomposition techniques are more versatile than those of clustering; for example, they are widely adopted for anomaly detection, feature extraction, topic modeling, and change detection tasks. Other related techniques include latent semantic analysis (LSA) (Landauer et al., 1998) and t-distributed stochastic neighbor embedding (t-SNE) (van der Maaten & Hinton, 2008).
1 http://insideairbnb.com/

Unsupervised neural networks
Unsupervised neural networks are an active research field of UL techniques, which has advanced rapidly thanks to progress in deep learning. Artificial neural networks comprise three types of node layers: an input layer, one or more hidden layers, and an output layer. Through the flow and interaction of signals across layers, they are able to model complex non-linear relationships that are common in real-world problems (Schmidhuber, 2015).
Self-organizing map (SOM) (Kohonen, 1990) is a classical shallow neural network, which associates the neurons in the input layer with output neurons that summarize many observations in a two-dimensional grid map (Fig. 6). The training process is similar to clustering, i.e. neurons compete with others to group around the closest centroid neurons. Compared to k-means clustering, SOM does not require a preset number of clusters: the neurons gravitate toward the natural clusters learned from the data structure. Therefore, it is suitable for learning and visualizing patterns in datasets that have large variations.
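The competitive training described above can be sketched as follows. This is a minimal SOM under stated assumptions: a rectangular 5 x 5 grid, a Gaussian neighbourhood, exponentially decaying learning rate and radius, and synthetic three-dimensional data. The function names and parameter values are illustrative choices and are not taken from the reviewed studies or from Fig. 6.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organizing map; returns the weight grid."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or max(rows, cols) / 2
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every output neuron, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        decay = np.exp(-3.0 * t / n_iter)
        lr, sigma = lr0 * decay, sigma0 * decay
        x = data[rng.integers(len(data))]
        # Competition: the best matching unit (BMU) is the closest neuron.
        bmu = np.unravel_index(((weights - x) ** 2).sum(axis=2).argmin(), (rows, cols))
        # Cooperation: neurons near the BMU on the grid are pulled along.
        grid_dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        influence = np.exp(-grid_dist2 / (2 * sigma ** 2))
        weights += lr * influence[:, :, None] * (x - weights)
    return weights

def quantization_error(data, weights):
    """Mean distance between each observation and its closest neuron."""
    flat = weights.reshape(-1, weights.shape[-1])
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Synthetic stand-in data: three groups in a 3-dimensional feature space.
rng = np.random.default_rng(1)
centres = np.array([[0.2, 0.2, 0.2], [0.5, 0.8, 0.5], [0.8, 0.3, 0.9]])
data = np.vstack([c + rng.normal(scale=0.03, size=(100, 3)) for c in centres])

untrained = train_som(data, 5, 5, n_iter=0)   # random initial weights only
trained = train_som(data, 5, 5)
print(quantization_error(data, untrained), quantization_error(data, trained))
```

Note that only the grid size is fixed in advance. How many groups of neurons emerge on the map depends on the data, which is the contrast with k-means drawn in the text.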
Deep neural networks with more hidden layers have been gaining strong momentum recently. Representative models include the autoencoder (AE) (Hinton & Salakhutdinov, 2006) and the generative adversarial network (GAN) (Goodfellow et al., 2014). AE leverages neural networks to learn representations of input data while ignoring insignificant noise (encoding), and then reconstructs the data from the learned features (decoding) (Fig. 7). By comparing the reconstructed data with the input data, the model performance can be evaluated. During the encode-decode process, it can compress the data with minimal reconstruction error, detect the most salient features and anomalies, and generate new data that is similar to the input, i.e. make predictions. GAN uses a discriminator to produce self-supervision signals for fine-tuning results from a randomized generator, and is therefore able to produce new data as realistic as the original input (Fig. 8). It catalyzes the development of applications such as image-to-image translation, video prediction, 3D object generation, and so forth (Wu & Biljecki, 2022). Restricted Boltzmann machine (RBM) is also included in this review.
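The encode-decode cycle of an autoencoder can be illustrated with a deliberately small example. The sketch below is an assumption-laden toy rather than a deep model: a single tanh hidden layer with two units, a linear decoder, full-batch gradient descent on the mean squared reconstruction error, and synthetic standardized data. All names and hyperparameters are illustrative.

```python
import numpy as np

def train_autoencoder(X, n_hidden, n_epochs=2000, lr=0.05, seed=0):
    """One-hidden-layer autoencoder trained by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(n_epochs):
        H = np.tanh(X @ W1 + b1)              # encoding: compressed features
        X_hat = H @ W2 + b2                   # decoding: reconstruction
        err = X_hat - X
        losses.append((err ** 2).mean())
        # Backpropagation of the mean squared reconstruction error.
        g_out = 2 * err / err.size
        g_hid = (g_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

def reconstruction_error(X, params):
    """Per-observation squared error; large values hint at anomalies."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((X_hat - X) ** 2).mean(axis=1)

# Synthetic stand-in data: 4 standardized features driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

params, losses = train_autoencoder(X, n_hidden=2)
errors = reconstruction_error(X, params)
print("first and last training loss:", losses[0], losses[-1])
```

Comparing `losses[0]` with `losses[-1]` corresponds to the evaluation step mentioned in the text, and the per-observation values in `errors` are the quantity typically inspected when an autoencoder is used for anomaly detection.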

Overview
To identify papers employing UL in urban studies, we follow the PRISMA protocol of systematic review (Moher et al., 2009), in line with the practices of recent reviews in the field (Zhao et al., 2021). First, we conducted a systematic search using reproducible syntax in a bibliographic database, fetching an initial literature pool of 668 papers. The relevance of the papers was then examined against several criteria through manual screening, after which 140 papers were included in this review. The details of the two steps are described in the following subsections.

Searching methods
We performed a search for relevant papers in the Web of Science (WOS) database. The search syntax consists of two groups of keywords. The first set zeroes in on papers that pertain to cities and the urban context, while the second one aims to delineate publications that engage unsupervised learning methods. For the first group, we set the expression "urban* OR city". The asterisk expands the search to include variations of the key search terms, covering words such as urbanization and urbanism. Since these terms appear frequently in articles from other fields that are not in the focus of this review (e.g. computer science and medicine), it was imperative to narrow the search scope. A more focused result was obtained by limiting the search to the following subject categories in WOS: Environmental Studies, Geography Physical, Green Sustainable Science Technology, Geography, Engineering Civil, Urban Studies, and Regional Urban Planning.
The second set of keywords aims to identify articles that use unsupervised learning methods. At first, we followed the same approach as a UL review in another field (Li, Shepperd, & Guo, 2020), using 'unsupervised OR unlabeled', yet the results suggested that almost all papers belonged to the field of remote sensing, with the exception of a small number of articles published in the last two years. This is an interesting phenomenon: the term "unsupervised" dominates the field of remote sensing, but it is often not found explicitly in other sub-areas of urban research. To ensure the diversity of the literature pool, we ran an exploratory search in Google Scholar and found that the relevant articles instead specified the techniques used in the abstract or keywords (e.g. PCA, k-means clustering), which belong to the large umbrella of unsupervised methods. Therefore, we added the UL techniques listed in Section 2 to the search syntax in WOS. Note that because too many papers apply PCA for dimensionality reduction without producing insights, this review focuses on its use for feature extraction. For a more thorough review of PCA and spatial data, the reader is referred to the paper of Demšar et al. (2013).
In the same way as the previously published reviews in the field (Ibrahim et al., 2020;Ma et al., 2019), we included only peer-reviewed papers written in English and published in academic journals, finally collecting 668 papers that range from 1996 to 2021 (until 19 October 2021, when the final search was executed). The full search syntax is attached in Appendix A.
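The logic of the two keyword groups, including the effect of the asterisk in "urban*", can be mimicked with a small text filter. The sketch below only approximates the behaviour described above and is not the WOS search engine or the syntax in Appendix A; in particular, the list `METHOD_TERMS` is a hypothetical excerpt of the second keyword group, and the helper names are introduced here for illustration.

```python
import re

def wildcard_to_regex(term):
    """Turn a search term with '*' (any word ending) into a whole-word regex."""
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)

def matches_group(text, terms):
    """True if at least one term of the group occurs in the text (logical OR)."""
    return any(wildcard_to_regex(t).search(text) for t in terms)

# First group, as given in the text.
URBAN_TERMS = ["urban*", "city"]
# Second group: hypothetical excerpt, the full list is in Appendix A.
METHOD_TERMS = ["unsupervised", "unlabeled", "k-means",
                "principal component analysis", "self-organizing map"]

def is_candidate(text):
    """A record is kept only if both keyword groups match (logical AND)."""
    return matches_group(text, URBAN_TERMS) and matches_group(text, METHOD_TERMS)

titles = [
    "Measuring urbanization with k-means clustering of night-time lights",
    "Unsupervised segmentation of cell images",
    "A self-organizing map of city neighbourhood change",
]
for title in titles:
    print(is_candidate(title), "-", title)
```

In this approximation the wildcard only extends a term to the right, so "urban*" covers "urbanization" and "urbanism" but not "suburban".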

Inclusion criteria
After deriving the initial literature pool, we proceeded to manually select those that are relevant for our review: we screened the titles, abstracts, and keywords of papers to assess their relevance to our context. We established the following criteria that a paper should meet to be considered relevant for this review.
1. The study is conducted in an urban or peri-urban area.
2. It is predominantly an urban studies paper, which represents or predicts the patterns in an urban system. To clarify the fuzzy meaning and ameliorate the blurry boundaries of our area of focus, we summarized four fundamental and trending thematic groups in urban analytics and practices. The included papers should belong to one of the following themes:
(a) Urbanization and regional study: the divergence and change of cities that are shaped by economic, social and political powers at the macro level (Brenner, 2013). It can either be a physical process (e.g. land use change) or reflected in socio-economic well-being.
(b) Built environment: the human-made physical space in which people live, work and recreate on a daily basis (Roof & Oleru, 2008). Typical examples are building blocks, streets, and public spaces.
(c) Urban sustainability: a collection of study topics such as biodiversity, ecosystem service, air pollution, and heat island effect (Verma & Raghubanshi, 2018). Such insights help cities to develop in a sustainable and resilient manner.
(d) Urban dynamics: monitoring and predicting the patterns of people's activity, traffic flow and utilities demand (Gao, 2015), leading to smarter urban management or business decisions.
Our review (Section 5) gives specific examples of how UL is employed in each of these themes.
3. The paper uses one or more unsupervised methods. This step was crucial, as we encountered a number of papers that happen to contain the same acronym as a UL technique but with a different meaning.
4. Unsupervised learning is the primary analysis method. This criterion is necessary to keep only papers in which UL techniques predominate. For example, papers that use UL only for minor or peripheral tasks such as data pre-processing or comparative study are not considered to meet this criterion and are thus excluded from consideration.
5. The methods proposed in the study are tested on real-world datasets instead of being purely theoretical, generating substantial insights for researchers or implications for practitioners.
Following the screening of the papers in the initial pool, we took forward 140 papers that satisfy the criteria above.

Results and overview
The papers selected for this review turned out to focus on a variety of urban challenges and feature a diverse set of approaches, datasets and tools, affirming the permeation of unsupervised learning through urban studies (Fig. 9). This section describes the general trends and insights of the reviewed papers, based on a set of information extracted during review (Table 1). Fig. 10 indicates the temporal evolution of unsupervised learning permeating through urban studies, and the change in the share of each research category. Recently, there has been a clear upsurge in the number of publications, which increased fivefold from 2016 to 2021. With relevant papers proliferating rapidly, it is necessary and timely to summarize previous works and follow up with recent advances.
Regarding the share of categories, there are notable differences among them: the number of papers related to Urban Dynamics is twice that of those pertaining to Urban Sustainability. It is also worth noting that although the application in Urban Dynamics started later than in other areas, it has gained considerable attention in the most recent five years. Aside from that, Built Environment has always been a field of interest, e.g. Owen et al. (2006) use a clustering algorithm in an early investigation of automatic land cover classification.
In Fig. 11, we break down the annual publications by their application types. The categorization of the applications is determined by summarizing common types from multiple related studies (Jing & Tian, 2019; Miller et al., 2018; Usama et al., 2019), and fine-tuned according to the contents of the literature pool. Note that due to the nuances in underlying techniques, the eight types are not mutually exclusive (e.g. topic modeling with clustering). However, we believe this level of detail in categorization is necessary for outlining the fundamental objectives of UL applications. The result suggests a steady growth of research relying on clustering, which is associated with the overall growth of paper volume. Topic modeling is a visibly emerging application, with a considerable proportion of papers published in the most recent three years. This growth may be linked to the proliferation of readily available geo-tagged text data such as social media posts and Point of Interest (POI) data (Huang & Li, 2019; Wang & Taylor, 2019). It is also noteworthy that there is a new subset of papers inspecting prediction applications, virtually all of which focus on traffic prediction (Ranjan et al., 2021; Zhang et al., 2021), with just one exception that simulates urban growth and housing dispersal with a generative adversarial network (GAN) (Ibrahim et al., 2021).
Fig. 9. Conceptual structure of the reviewed papers by correspondence analysis of keywords. Incidentally, unsupervised learning methods are useful in bibliometrics, e.g. clustering a large set of papers and establishing categories. The figure is created by an R package developed by Aria and Cuccurullo (2017).
Deriving from the data structure, the analysis result of unsupervised learning has a close connection with the input data. The numerous permutations of data types and methods create a very diverse landscape of research in this topic. In Fig. 12 we extract three core features that characterize a study: category, application, and the data type used, revealing the share and relationships among the feature segments. The first key observation is that clustering is the most common practice for papers in all categories; it is an especially substantial component in studies about urbanization and sustainability, making up more than 80 % of the total. Second, for Urban Dynamics and Built Environment, the shares of other applications such as topic modeling, feature extraction, and prediction are relatively higher. Because they are emerging applications, the corresponding research categories are expected to gain a stronger momentum of innovation. The third observation comes from the data type. Unsupervised learning can be successfully applied to a wide variety of urban data (Table 2), and there is no data type that is predominantly popular. A subset of studies relies on multi-source data. Thanks to this wide applicability, researchers use unsupervised methods as a bridge to connect multiple datasets, which introduces more perspectives to the analysis process while mitigating the bias caused by reliance on a single data source (Cai et al., 2019; Devkota et al., 2019; Vizzari & Sigura, 2015). The technical aspect is also important to cover.
For each study, we have noted the specific techniques and programming languages; about one-third of the authors reveal such information. Fig. 13 illustrates the findings: the three most frequent techniques (k-means, self-organizing maps, and DBSCAN) are essentially all used for clustering data, with only a few exceptions that use SOM for extracting spatio-temporal features (Liu, Zhang, & Long, 2019; Oldoni et al., 2015; Sohn, 2013). The next most frequent technique is latent Dirichlet allocation, which is primarily used for topic modeling. In fact, most of the techniques have only a single type of application, except for the autoencoder, which serves versatile uses.
On the technical side, unsupervised methods are supported by the mainstream programming languages, including R, Python, Java, Matlab, and so forth, which often enable such functionalities through well-documented and popular machine learning packages. They are mostly open-source, and there appears to be no dominant language in the field. Among others, we feature the most frequent one: R. Base R (using it 'out of the box' without packages) supports PCA and k-means clustering, as used in the example in Section 3. Further techniques are implemented through packages, e.g. the kohonen package supports SOM training and visualization. However, the functions of R packages supporting unsupervised learning appear to be limited, as over two thirds of them deal with clustering-related issues on numeric values, while input data types such as image and spatio-temporal data are missing. In contrast, Python is backed by a wide range of machine learning libraries and can be applied in myriad ways. For example, there is an integrated machine learning package for various supervised and unsupervised algorithms, scikit-learn (Pedregosa et al., 2011), and the gensim package, which supports the review topic modeling example shown in Section 3. On top of that, thanks to the integration with a deep learning environment, Python is able to process high-level features of a large amount of data. For example, Singh and Mohan (2019)
Finally, we provide a comprehensive list of the reviewed papers in Appendix B. The table in the appendix contains the extracted information supporting the analysis, and the papers are organized by the taxonomy of data type.

Urban sustainability
Considering that the evaluation of urban sustainability largely relies on quantifying environmental or geographic indicators (Keirstead & Leach, 2008), a common use case of unsupervised learning in this domain is clustering geographical units (i.e. cells and administrative zones) according to features from those indicators. A recurring related area of study is ecosystem services (ESs), i.e. the benefits humans obtain either directly or indirectly from ecosystems (e.g. food production, nutrient retention) (Lyu et al., 2019). The results of studies in this domain are ES bundles, the joint spatial distribution of ecosystem services, based on multi-sourced datasets that consist of land use, climate, census, and geographical data, in which unsupervised learning identifies and clusters geographic units that have common high-dimensional features (Karimi et al., 2021; Lyu et al., 2019; Yang et al., 2019). The ES bundles act as ideal units for visualizing their spatial distribution, studying temporal changes (Yang et al., 2019), analyzing spatial trade-offs and synergies (Karimi et al., 2021), and identifying effective environmental protection strategies (Lyu et al., 2019). Beyond identifying similar patterns in environmental metrics, Richards and Tuncer (2018) experiment with using unsupervised learning for assessing the cultural values of nature based on social media photos. First, a computer vision step (implemented in Google Cloud Vision) generates specific object labels from nature photographs, and second, hierarchical clustering summarizes the unstructured labels into 7 distinct groups. It is estimated that this unsupervised workflow can save 170 h of manual work on human-assigned subject classification.
Fig. 12. The share and relationship among publication segments. In this review, data type "spatial" represents static geographic objects such as building footprints, while "spatial-temporal" indicates data that record human or traffic movements such as GPS trajectory. For detailed implications of each data type, please refer to Table 2.
Moreover, the clusters with distinct environmental features may serve as a foundation for exploratory multivariate analysis that studies the relationship between human activities and environmental outcomes (Ferrara et al., 2017; Schmiedel et al., 2015). For example, after classifying Italian municipalities into homogeneous partitions by forest cover indicators, Ferrara et al. (2017) summarize social indicators by the clusters, and through discriminant analysis, the authors point to agriculture, income, education, and the labor market as key predictors of forest cover. Although the limitation of this method is evident, i.e. the degree of human impact on the environment has not been quantified, the results can simplify indicator selection in future statistical learning tasks.
As urban sustainability is a common objective for cities around the world, many of the studies have focused on assessing and comparing sustainable development across cities. The conventional selection and weighting processes for sustainability evaluation can be subject to human bias (Paulvannan Kanmani et al., 2020); therefore, researchers use unsupervised learning to automate the extraction of key factors that indicate sustainability (Akande et al., 2019; Martins et al., 2021). In addition, unsupervised learning is practical for comparing environmental performance among cities or countries (Amaral et al., 2021; Lu et al., 2015; Paulvannan Kanmani et al., 2020). For example, Paulvannan Kanmani et al. (2020) apply SOM, a technique that maps and visualizes high-dimensional data onto a two-dimensional output space while preserving their relative distance, to 10 environmental indicators of 180 countries, resulting in a map of nodes that conveys the countries' relative positions in sustainable development.
Further, on the temporal aspect, unsupervised learning has proven valuable in capturing environmental dynamics and natural hazards due to its simplicity and capability of enhancing the information on changes. One particular use case is monitoring urban forest and vegetation conditions, which can be carried out by unsupervised classification of vegetation indices derived from multi-temporal satellite data (Krtalic et al., 2021). Another research area is related to disasters such as flooding. Peng et al. (2021) propose a framework for large-scale unsupervised urban flood mapping, in which an autoencoder learns the multidimensional features of both pre- and post-flood patches for comparison, while Xu et al. (2018) generate an urban flood map of 5 risk levels by clustering flood-related features. Because they do not depend on time-consuming human annotation of training data, such methods may be used in real time to inform emergency humanitarian assistance and disaster relief (Peng et al., 2021). For further studies on environmental change, see Tessler et al. (2016) and Kropp (1998).
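As a generic illustration of unsupervised change detection on vegetation indices, the sketch below computes the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), for two dates and splits the absolute change into two groups with a one-dimensional two-means clustering. It is a simplified sketch under stated assumptions, not the workflow of Krtalic et al. (2021) or Peng et al. (2021): the reflectance values are synthetic, and the function names are introduced here for illustration.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index per pixel."""
    return (nir - red) / (nir + red + 1e-9)

def two_means_1d(values, n_iter=50):
    """Split 1-D values into a low and a high group; True marks the high group."""
    lo, hi = values.min(), values.max()
    for _ in range(n_iter):
        high = np.abs(values - hi) < np.abs(values - lo)   # assignment step
        lo, hi = values[~high].mean(), values[high].mean() # update step
    return high

# Synthetic 20 x 20 scene: a 5 x 5 patch loses vegetation between two dates.
rng = np.random.default_rng(0)
shape = (20, 20)
truth = np.zeros(shape, dtype=bool)
truth[5:10, 5:10] = True
red_t1 = 0.10 + rng.normal(scale=0.01, size=shape)
nir_t1 = 0.60 + rng.normal(scale=0.01, size=shape)
red_t2 = np.where(truth, 0.25, 0.10) + rng.normal(scale=0.01, size=shape)
nir_t2 = np.where(truth, 0.30, 0.60) + rng.normal(scale=0.01, size=shape)

# Cluster the magnitude of NDVI change into "stable" and "changed" pixels.
change = np.abs(ndvi(nir_t2, red_t2) - ndvi(nir_t1, red_t1))
changed = two_means_1d(change.ravel()).reshape(shape)
print("pixels flagged as changed:", int(changed.sum()))
```

No labeled training pixels are involved at any point, which is the property that makes such approaches attractive when annotation is too slow for an emergency response.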
Lastly, a number of papers zero in on the urban thermal environment (Kwon et al., 2021; Xu et al., 2020; Zawadzka et al., 2021), the application type of which also falls into clustering multi-sourced data such as temperature, humidity, building density, and ground surface. Among them, we feature the work of Kwon et al. (2021), which discovers the unfavourable and favourable thermal areas of cities based on sensible heat flux data through k-means clustering; the resulting zoning map may inform associated sustainable energy policies.

Urbanization and regional study
The process of urbanization can be directly reflected by land use change (e.g. farmland shifting into built-up area). Since hardly any cities have detailed land use registries over a long temporal range, researchers in this domain mostly develop their own land use and land cover classifications based on satellite imagery (Naikoo et al., 2020; Owen et al., 2006; Xu et al., 2012; Ye & Chen, 2015). Though this issue is widely explored with supervised classifiers, unsupervised learning presents a complementary avenue, which is especially useful in the common situation where suitable reference data (e.g. a current land use map) is not available (Ye & Chen, 2015). Moreover, without the need to manually set digital references, which generally requires prior knowledge, the economic and professional barriers of unsupervised methods are lower (Johnson & Xie, 2011). In performing such tasks, researchers engage unsupervised learning to extract the most representative spectral information from satellite images (Xu et al., 2012) and to cluster pixels or grids by internal homogeneity and external heterogeneity of spectral values (Naikoo et al., 2020; Qi et al., 2019). Because this field of study is closely related to remote sensing, technical explorations on improving model performance are active, including the utilization of luminance and saturation information (Ye & Chen, 2015) and the application of LDA to enhance the semantic correlation of multitemporal image scenes.
Demographic and socioeconomic transitions have also been a subject of the urbanization discourse, in which machine learning plays an important role. Many studies concentrate on neighborhood change, essentially relying on census and survey data collected in past decades and using clustering methods to summarize the vectors of change (e.g. gentrification, depopulation) (Delmelle, 2017; Dias & Silver, 2021; Li & Xie, 2018; Liu, Deng, et al., 2019; Serra et al., 2014; Yuan et al., 2021). For example, in a nation-wide study of 50 American metropolitan areas, Delmelle (2017) introduces SOM to arrange neighborhoods by their similarity on a two-dimensional output space, simplifying the large and high-dimensional census datasets into distinct groups of neighborhood change trajectories. Compared with the threshold-based method that is commonly used in social studies, Liu, Deng, et al. (2019) argue that unsupervised learning avoids arbitrariness, yet its results are less intuitive to interpret as there are no preset rules that follow theoretical guidelines.
Another topic in this domain is regional typology studies, which aim to demonstrate the divergent states of development across cities and suburbs. Thanks to the applicability of unsupervised learning to a variety of data structures, regional typologies have been investigated from multiple perspectives, including socio-economic powers (Baum et al., 2006), development conditions (Cabrera-Barona et al., 2020; Rahman et al., 2019), urban form (Lemoine-Rodriguez et al., 2020), and hybrid features (Mikelbank, 2004; Fiaschetti et al., 2021). We emphasize the work of , which visualizes the relative socio-economic locations and movements of 35 global cities on a hex grid coordinate plane, and suggest using this unsupervised mapping technique as a supporting toolbox for informed decisions.
Considering the fuzzy spatial boundaries between urban centres, urban areas, and natural or rural areas, researchers have also resorted to unsupervised learning to automate the segmentation of different urbanized areas, eventually generating urban spatial maps that show planners and policymakers the potential areas for future development. A common pathway is clustering urban-rural diagnostic features such as nighttime light intensity and fluctuation (Feng et al., 2020), land use (Vizzari & Sigura, 2015), travel patterns (Ozus et al., 2012), and multivariate statistics (Arribas-Bel & Schmidt, 2013). For a comparison among clustering methods, one can refer to the work by Fusco and Perez (2019). There is also a special case that does not rely on clustering: Kit et al. (2012) outline urban slums solely by detecting lacunarity in satellite images with PCA and a line detection algorithm.
Finally, although the number of related studies is small, we would like to highlight that unsupervised learning has brought advances in urban growth prediction (Feng & Liu, 2013; Ibrahim et al., 2021). In an experiment in Qatar, Ibrahim et al. (2021) use GAN to simulate urban growth and housing dispersal, and the result exhibits a high similarity with historical maps. According to Albert et al. (2018), GAN can accurately predict future urban growth with a relatively small training dataset composed of satellite images only. This means that, with the help of unsupervised learning, realistic simulation of urban land use displacement is available to developing countries without the costly compilation of spatial variables required by other simulation methods.

Built environment
Because unsupervised learning can identify and disentangle the discriminative information hidden in large data collections, it enables social sensing of the real usage of urban spaces from a bottom-up perspective (as the counterpart to top-down land use plans) (Papadakis et al., 2019) by learning patterns from user-generated data (Gao et al., 2017). This type of application can be summarized as urban function study. Some studies first find spatial aggregations of activities from travel trajectories and call records (Rios & Munoz, 2017; Tao et al., 2019; Wang et al., 2021), and then infer the urban function according to prior knowledge on behavior patterns, e.g. high and regular activity during weekdays indicates office areas (Rios & Munoz, 2017). However, researchers point out that this method does not capture the true semantics of urban space (Tao et al., 2019).
A more popular method is to utilize POI data, which both reflects the concentration of activities and has embedded semantic information (Gao et al., 2017; Hu et al., 2020; Jing et al., 2021; Miao et al., 2021; Papadakis et al., 2019; Pavlis et al., 2018; Yu et al., 2020; Yuan et al., 2020; Zhang et al., 2018). From this list, we feature the work of Gao et al. (2017), which develops a statistical framework to help discover semantically meaningful topics and functional regions based on the co-occurrence patterns of POI types. The functional regions are grouped together by similarity in semantic meaning, resulting in a convex polygon map with distinct thematic characteristics. The featured technique in this study is LDA, which maps the semantic information onto a vector space so that the numeric distance between words can be calculated and compared, i.e. the semantic similarity of places can be measured. It is one of the most powerful techniques for identifying urban functions and has been replicated and adapted in other geographical spaces; for related studies see Papadakis et al. (2019) and Zhong et al. (2018).
Urban function is entangled with another area of research, urban structure (Cui et al., 2019; Kim, 2020a; Zhong et al., 2018), which can be viewed as a further step of spatial summarization of urban function that reveals concentration patterns. For example, after generating function zones of London from aggregated tweets, Zhong et al. (2018) profile the spatial structure by hierarchical clustering, resulting in multilevel structural maps that may support strategic planning of economic clusters.
Unsupervised learning also helps researchers to understand our physical surroundings by automatically extracting or reconstructing the most salient visual features encoded in images (e.g. street view imagery (SVI)). The purposes of studies under this topic are quite diverse, including investigating visual characteristics that affect the quality of a space (Comber et al., 2020; Wu et al., 2020), constructing urban appearance libraries (Nguyen et al., 2020; Taecharungroj & Mathayomchan, 2020), and generating design interventions based on learnt features (Wijnands et al., 2019). Traditionally, such tasks are arduous, largely relying on field surveys and on manually distinguishing a variety of visual cues. Supervised learning is able to annotate properties of large image collections. Some papers use supervised image labeling services such as Google Cloud Vision (Taecharungroj & Mathayomchan, 2020) and SegNet in combination with unsupervised methods that reduce the dimensionality of the photo labels and find image groups with common characteristics (topics). However, in studies that require information more specific than general labels, supervised learning is found to struggle to balance speed and accuracy (Comber et al., 2020). With the advancement of deep unsupervised models, it is possible to capture visual representations from images directly without laborious manual labeling. Here we emphasize two enabling algorithms: the autoencoder, which learns the pixel-group traits (e.g. signage, design style, and color) that are most useful for reconstructing the input building frontage images (Comber et al., 2020), and GAN, which captures key characteristics (e.g. ground texture, tree density) of SVI from one urban area and translates the style to another (Wijnands et al., 2019).
Apart from urban features, we observe a set of papers focusing on detecting specific urban objects. One core idea of this practice is to find recurring objects, for instance, landmarks that appear frequently in geotagged photos (Samany, 2019) and landscape amenities that are repeatedly mentioned in housing advertisements (Su et al., 2021). The other approach builds upon LiDAR point cloud data: it exploits the intrinsic characteristics of the raw 3D points (e.g. proximity, connectivity, symmetry) and converts the points into sets of clusters with similar characteristics (Aljumaily et al., 2017; Xue et al., 2020). Each cluster represents certain urban objects such as cars, buildings, and ground surfaces, and the classification is reported to be highly accurate. In short, unsupervised learning presents scalable and efficient frameworks for mapping real-world objects that can be used in building digital twins of cities (Aljumaily et al., 2017; Xue et al., 2020).
Another domain worth highlighting is urban morphology study. Generally, urban morphology can be represented by a series of numeric metrics generated from the shapes of buildings, plots, and streets (Biljecki & Chow, 2022). Because morphological datasets are high-dimensional in nature (e.g. the morphological metrics derived from building footprints include density, size, shape and so forth (Jochem et al., 2021)), unsupervised learning is more capable than other methods of discovering underlying common geometric patterns and producing urban form typologies (Abarca-Alvarez et al., 2019; Bobkova et al., 2021; Jochem et al., 2021; Oh & Kim, 2019). Specifically, in a study across five European cities, Bobkova et al. (2021) identify seven plot types by clustering plot configurational attributes, enabling the scaling up of morphological studies and substantial comparison within and across regions. The results of such studies are also effective in simplifying downstream urban analytic tasks. For example, Oh and Kim (2019) develop 13 building block typologies for energy performance simulation, providing a reference set of building geometric features for urban energy planning and design. Note that in representing the types by morphological metrics, researchers often choose the values closest to the cluster centroids (Bobkova et al., 2021; Oh & Kim, 2019).
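The practice of representing each type by the observation closest to its cluster centroid can be sketched as below. This is an illustrative example only: the metric values, the cluster labels, and the helper `nearest_to_centroid` are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def nearest_to_centroid(features, labels):
    """Map each cluster label to the row index of the member closest to the cluster centroid."""
    representatives = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        centroid = features[members].mean(axis=0)
        dists = np.linalg.norm(features[members] - centroid, axis=1)
        representatives[int(label)] = int(members[dists.argmin()])
    return representatives

# Hypothetical standardised morphological metrics per plot: [density, size, shape index].
features = np.array([
    [0.9, 0.2, 0.1],
    [1.0, 0.3, 0.2],
    [1.4, 0.1, 0.0],
    [-1.0, 1.1, 0.9],
    [-1.1, 1.0, 1.0],
    [-0.6, 1.5, 1.4],
])
labels = np.array([0, 0, 0, 1, 1, 1])  # assumed output of a prior clustering step

representatives = nearest_to_centroid(features, labels)
```

Using an observed case rather than the centroid itself keeps the representative physically realisable, since an averaged centroid need not correspond to any real plot or block.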
Besides the aforementioned use cases, the remaining built environment studies relying on unsupervised learning are quite diverse, including extracting indicators for environmental quality or livability assessment (Bo et al., 2019; Bonaiuto et al., 2003; Tu & Lin, 2008), establishing typologies of Transit-Oriented-Development stations by spatio-temporal features (Li, Han, et al., 2019; Liu, Singleton, & Arribas-Bel, 2020; Sohn, 2013), creating acoustic summaries of sound features from the urban environment (Oldoni et al., 2015), and analyzing the dominant source of noise under various types of land (Chew & Wu, 2016).

Urban dynamics
The rise of urban dynamics as a research domain is associated with the increasing volume and accessibility of datasets that register individual activities at a dynamic pace, e.g. smart card data that traces travel patterns (Manley et al., 2018), call records and cellular data that imply movements or social interactions (Sagl et al., 2014), and social media posts and photos that reflect personal sentiments on urban spaces (Olson et al., 2021). Given the complexity and large variety of such datasets, retrieving key information from them has become challenging (Sun & Axhausen, 2016). This problem is compounded by the lack of prior knowledge on the complex individual behaviors in cities, making supervised learning a less favourable option for analysis. It is therefore not surprising that a large number of papers employ unsupervised learning to discover the interactions among space, time, and individuals from high-dimensional spatio-temporal data, i.e. to study the collective behavior patterns within cities (Bi & Ye, 2021; Chen et al., 2019; Kim, 2020b; Li, Zhu, & Guo, 2019; Manley et al., 2018; Ouyang et al., 2018; Pieroni et al., 2021; Sagl et al., 2014; Sun & Axhausen, 2016; Xing et al., 2020; Yu et al., 2021; Yu & He, 2017; Yue et al., 2018). Sun and Axhausen (2016) apply a tensor decomposition method that reproduces the complex dependence and interactions in 14 million transit journeys extracted from smart card data through simple latent structures. The decomposition results depict several principal travel patterns and their corresponding profiles, e.g. peak hours, origin-destination, and age groups, revealing the underlying spatio-temporal structure of Singapore. The insight from the study could serve as a reference for practitioners in fleet management and infrastructure planning. Besides tensor decomposition, clustering techniques are also straightforward to apply to travel data.
For example, DBSCAN captures clusters in smart card data based on the time and location similarities of individual travel behaviors. Compared with k-means clustering, DBSCAN considers point density and is therefore capable of identifying high-density temporal events that are indicative of regular behaviors (Manley et al., 2018). For related studies on using unsupervised learning to discover travel patterns, see the work by Ouyang et al. (2018), Chen et al. (2019), Xing et al. (2020), Pieroni et al. (2021), and Yu et al. (2021).
Human activity represented by mobile phone data is another direction of behavior pattern study. Sagl et al. (2014) characterize variations in the intensity and similarity of collective human activity via SOM, and Liu, Zhang, and Long (2019) further aggregate human activities with similar characteristics spatially and identify several urban vitality areas. In fact, pedestrian traffic measured by cellular activity is a popular indicator of urban vitality, and unsupervised methods have proven useful in uncovering its latent spatio-temporal features; for relevant studies see Kim (2020b) and Guo et al. (2021). The findings from such studies also help to reveal how collective human activities relate to the underlying urban structures, e.g. connections within and between communities (Ghahramani et al., 2019), land use (Manley & Dennett, 2019), and regional functions (Dong, Wang, & Liu, 2021).
In addition to behavior patterns, unsupervised learning on urban dynamics information also helps to uncover the way people perceive and interact with urban spaces. Much of the research aims to answer where people like to visit in cities, that is, to pinpoint the Area of Interest (AOI) (Devkota et al., 2019; Hu et al., 2015; Huang & Li, 2019; Li et al., 2021; Liu, Singleton, et al., 2021; Sun et al., 2021).
In a study on the evolution of AOI in six cities across ten years, Hu et al. (2015) apply DBSCAN to geotagged Flickr photos to extract point clusters based on density. Moreover, the authors develop a spectral clustering workflow that computes image similarity so that photos containing views shared by multiple people are grouped into clusters, and the photo that is most similar to all others is selected as the representation of the AOI. The results help reveal the growth of cities' attractive regions alongside urban development, and may inform the planning of "charismatic" destinations. This work also demonstrates the versatility of unsupervised learning: within the same study, it is applied in parallel to different types of data and the results are combined. Many other studies on AOI follow the same two-step workflow, i.e. first identifying AOI clusters, then interpreting the grounds for attractiveness. The approach to the second step varies: besides the aforementioned identification of preferable photos, it can also be achieved through topic modeling of posts (Huang & Li, 2019) and POI. However, the technique used in the first step is almost exclusively DBSCAN, with only a single study applying k-means to phone signal data. DBSCAN is particularly suitable for AOI extraction for two reasons: (1) the number of AOI clusters is hard to estimate, and DBSCAN, unlike k-means clustering, does not require a pre-determined number of clusters; and (2) it is robust at detecting clusters with arbitrary shapes (Hu et al., 2015). However, as unsupervised methods learn patterns from the data structure, researchers point out that the resulting locations may inherit the location accuracy issues of the input datasets and are not representative enough of all age groups (Devkota et al., 2019; Huang & Li, 2019; Sun et al., 2021).
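To illustrate why DBSCAN needs no preset number of clusters, the following is a minimal pure-Python sketch of the algorithm. It is not the implementation used in the cited studies: the point coordinates stand in for photo locations in arbitrary planar units, and the values of `eps` and `min_pts` are hypothetical.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster id per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels

# Hypothetical photo locations: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
```

The number of clusters emerges from the density parameters rather than being supplied by the analyst, and isolated points are flagged as noise instead of being forced into a cluster.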
Similarly, there is a set of papers that pay specific attention to human perceptions of space (Cai et al., 2019; Capela & Ramirez-Marquez, 2019; Liu, Yin, et al., 2020; Olson et al., 2021; Sparks et al., 2020; Steiger et al., 2016). Although all of them deal with text data, there appears to be no predominant methodology or purpose among these studies; we therefore highlight a few characteristic papers. Steiger et al. (2016) combine SOM with LDA to extract the spatio-temporal aggregation of popular topics from georeferenced tweets. Capela and Ramirez-Marquez (2019) detect topics that compose the "personality" of each city in electronic media outlets by LDA topic modeling. Liu, Yin, et al. (2020) visualize the semantic structure of urban regions by embedding POI types onto a 2-dimensional space using t-SNE, which preserves semantic relationships among words during dimensionality reduction. Olson et al. (2021) exploit an autoencoder to learn compact representations of Yelp reviews from relatively sparse word usage, and use the representations for the ascription of tangible neighborhood change. In addition to these specific applications, the work by Abdul-Rahman et al. (2021) establishes a general framework to simplify the process of extracting public sentiment on urban issues from social media, where LDA categorises sentiments into major themes such as high rental prices, noise, and social segregation. The code supporting the proposed method is released openly.
Another essential component of urban dynamics is traffic flow. Unsupervised learning was introduced to traffic forecasting early on: Sun et al. (2006) predict short-term traffic flow by adopting a Gaussian mixture model (GMM) to compute the joint probability distribution between the input traffic and the output (traffic in the next time interval). The rationale for choosing GMM is the assumption that events in the natural world obey Gaussian distributions. Similarly, Fiez and Ratliff (2020) apply GMM in parking demand modeling. However, GMM only predicts through Gaussian distributions and does not take into account the real complex interconnections in the data. Given the advancements in deep representation learning (GAN, autoencoder), there is a surge of papers published in 2021 that opt for it to extract implicit and complicated traffic features and to compress the large volume of raw data (Ranjan et al., 2021; Zhang et al., 2021). For example, Chen et al. (2021) propose a hybrid forecasting model with an embedded autoencoder, and Zhang et al. (2021) train GAN to learn the probability distribution of the real historical traffic flow and generate future traffic flows from the learned distribution. The models in both studies effectively improve the prediction accuracy compared to previous statistical models and have a strong generalization ability.
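For readers unfamiliar with forecasting from a joint mixture model, one standard way to turn a joint GMM over the input traffic x and the next-interval traffic y into a point forecast is the conditional expectation shown below. The notation (K components, weights π_k, block-partitioned means and covariances) is introduced here for illustration, and the exact estimator used by Sun et al. (2006) may differ.

```latex
p(x, y) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(
  \begin{bmatrix} x \\ y \end{bmatrix};
  \begin{bmatrix} \mu_{k,x} \\ \mu_{k,y} \end{bmatrix},
  \begin{bmatrix} \Sigma_{k,xx} & \Sigma_{k,xy} \\ \Sigma_{k,yx} & \Sigma_{k,yy} \end{bmatrix}
\right)

\hat{y}(x) = \mathbb{E}[y \mid x]
  = \sum_{k=1}^{K} w_k(x) \left[ \mu_{k,y} + \Sigma_{k,yx} \Sigma_{k,xx}^{-1} (x - \mu_{k,x}) \right],
\qquad
w_k(x) = \frac{\pi_k \, \mathcal{N}(x; \mu_{k,x}, \Sigma_{k,xx})}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x; \mu_{j,x}, \Sigma_{j,xx})}
```

Each component contributes a linear regression of y on x, and the weights w_k(x) express how plausible each component is given the observed input, which makes explicit that the forecast is built only from Gaussian components.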
Aside from the regular patterns in cities, there are also outliers, i.e. special events. In this context, crowdsourced data is akin to sensors in cities, and unsupervised learning helps to process the massive volume of data and identify anomalous urban events in real time, primarily by focusing on semantic information, a key information layer for event detection. For example, Wang and Taylor (2019) identify the geolocations of Twitter topics that escalate rapidly within a short time period through LDA, and combine them with sentiment analysis to rank emergency events by intensity of negative sentiment. The proposed framework is built directly upon the API of this social media channel, therefore providing real-time tracking and assessment of disasters for emergency management agencies. For related work see Zuo et al. (2018).
Another information layer is composed of traffic records such as taxi GPS trajectories and traffic video (Singh & Mohan, 2019); identifying events in this layer can reveal the occurrence of adverse traffic behavior or road accidents. The approach to detecting anomalous patterns in movement trajectories is straightforward; for example, hierarchical clustering is successful in distinguishing regular trajectories from anomalous ones. However, the information extracted from traffic videos is more ambiguous and presents larger variations. Therefore, representation learning is again introduced for such a task: Singh and Mohan (2019) train a stacked autoencoder to generate deep representations of video features, and the proposed generalized method is able to detect abnormal incidents, such as unusual speed, trajectories, and position, at high accuracy.
Lastly, a portion of studies make specific improvements to unsupervised algorithms for better inspecting spatiotemporal data. Park et al. (2021), Liu, Huang, et al. (2021), and Choi and Hong (2021) optimize the DBSCAN algorithm for clustering large-scale spatiotemporal datasets or for detecting feature objects (similar to the task elaborated on in Hu et al. (2015)) and collective activities in a more efficient manner, while You (2021) proposes a clustering method that can be directly applied to spatial data without setting prior assumptions or user-defined parameters. Notably, those papers were all published very recently (in 2021), suggesting rising attention to the pertinence of unsupervised methods to urban dynamics analysis.

General observations
Our review (Section 5) shows that unsupervised machine learning has permeated urban studies in many ways: the use cases cover almost the entire landscape of urban data sources, from conventional census data (Paul & Sen, 2018) and satellite imagery to prevailing spatial big data of individual activities such as smart card records (Manley et al., 2018), call records (Rios & Munoz, 2017), and social media (Olson et al., 2021). We observe that the input datasets in the reviewed papers are generally very large, e.g. 14 million transit journeys (Sun & Axhausen, 2016) and 7 million geotagged Flickr photos (Hu et al., 2015), which confirms the increasing importance of unsupervised learning in mining invisible patterns as the volume and diversity of urban data grow.
Among all the application types, clustering is the most frequently exploited (Fig. 12). The popularity of clustering is in line with two paradigmatic tasks in urban studies that are found frequently in the reviewed papers: typology study and spatial aggregation profiling. Typology study is a multi-faceted task across several domains and at various scales (Bobkova et al., 2021; Guo et al., 2021; Mikelbank, 2004; Nguyen et al., 2020; Oh & Kim, 2019; Tessler et al., 2016). This practice can be traced back to the theories of urban study precursors, which simplify complex urban systems into a few types to facilitate further interpretation. Compared to traditional categorizations constrained by limited analytical ability, unsupervised methods mitigate the issue of arbitrariness by quantifying the latent relationships among a larger number of observations (Liu, Deng, et al., 2019).
Spatial aggregation profiling inspects the geographical extent of clusters of specific functions (e.g. retail) or human activities (Liu, Singleton, et al., 2021; Manley et al., 2018; Pavlis et al., 2018). Unsupervised learning demonstrates its ability to deal with spatial data in tasks such as this one by taking into account not only similarity and difference, but also the relative spatial distances between points. In fact, we have also noted several other spatially aware unsupervised techniques for different application types, e.g. geographically weighted PCA, and new enabling frameworks keep being published (Park et al., 2021; You, 2021), implying the wide applicability and growing relevance of unsupervised methods to geographical investigations.
Considering the methodological aspect, some papers exploit the synergy of various unsupervised techniques in analysis and in deriving interpretations (Gao et al., 2017; Hu et al., 2015; Samany, 2019). Yet a more common approach we noticed is combining unsupervised methods with other methods. For example, in some cases UL is introduced to improve the performance of previous models (Honjo et al., 2015; Xue et al., 2020), or is applied in conjunction with supervised methods, e.g. unsupervised topic modeling supplements supervised image segmentation (Taecharungroj & Mathayomchan, 2020; Wu et al., 2020), and a supervised model tests the validity of the representations learned by an unsupervised method (Olson et al., 2021). Such collaborative workflows reveal that unsupervised learning can be applied not just standalone, but also to optimize existing methods and simplify downstream tasks. Moreover, supervised learning can validate UL results, which assures reliability and also makes the UL results more comparable.
Regarding the state of the art of unsupervised learning, which is continuously evolving, we observe that it takes time for advances in computer science to be adopted in urban studies. Approaches that were developed years ago, e.g. autoencoder and GAN (Baldi, 2012; Zhao et al., 2017), have been spotlighted in urban studies only recently (Comber et al., 2020; Wijnands et al., 2019; Zhang et al., 2021). In addition, it appears that the penetration of UL techniques, together with the adoption of the latest ones, varies considerably among the domains we examined, e.g. transportation research and GIS are benefiting from the most advanced techniques whereas others lag.
A potential limitation of this review is that the methodology (Section 3), following the common systematic review approaches, does not include combing through preprints and conference papers, which may describe developments and directions not captured by this review. In addition, due to the breadth of research topics and few exposed technical details in reviewed papers, the comparison of performances of UL in urban-related tasks is not provided in this review.

Issues
In this section, we discuss the common challenges and limitations of unsupervised methods cited frequently by researchers, with a few observations of our own.
Data quality. The effectiveness of unsupervised methods relies on the intrinsic structure and quality of the input data. Problems with the data can result in bias, and data collected from different sources may generate different results. Researchers most frequently cite specific issues in social media data, such as the representativeness of demographic groups (Hu et al., 2015), the positional accuracy of geo-tagged photos and posts (Samany, 2019), and the sparse signal in fringe areas (Steiger et al., 2016). In addition to data from social media, the quality of video and sound data is also a concern, as they are susceptible to environmental conditions such as darkness and noise (Oldoni et al., 2015; Singh & Mohan, 2019).
Besides the fact that unsupervised learning can be affected by poor data quality, it is important to note that data with obscure patterns also brings challenges. For example, Zuo et al. (2018) find it difficult to identify emergent events that are not intensively discussed in tweets, while Singh and Mohan (2019) report difficulty in extracting patterns from vehicle movements with large variations. In addition, without labels that control the expected outcomes, unsupervised learning is generally less accurate than its supervised counterpart in performing the same task. For example, in land use classification and change detection tasks, supervised learning methods achieved an overall accuracy 5-10 % higher than unsupervised methods (Chughtai et al., 2021; Mohammady et al., 2015). However, considering the benefits of unsupervised learning in reducing cost and labor, which are vital to the implementation of urban studies, there is inevitably a trade-off between the two.
Interpretation. Considering that UL learns patterns without labels as semantic reference, it may be difficult to interpret the results. In the case of clustering, the most common way to generate substantive interpretations is to manually summarize the cluster features by identifying the most representative variables in each grouping. For example, Baum et al. (2006), Tessler et al. (2016), and Ferrara et al. (2017) develop and name typologies of localities based on the most significant features among social and environmental indicators. However, these researchers all acknowledge that such a method is not considered statistically sound. The large room for interpretation also means that the conclusions may easily be affected by personal views, which impairs their comparison.
In this regard, although UL is an objective method that preserves the intrinsic patterns within the data structure, human bias may still arise in the interpretation stage. By contrast, the human bias of SL may be introduced during the crafting of labels. Before choosing between the two methods, it is worth considering which is more convenient and whose bias is more acceptable.
Moreover, interpreting UL results is costlier, as it requires professionals to correlate the patterns with domain knowledge. This may explain why UL is relatively less popular in urban studies than in other domains: few experts in urban studies possess the required interdisciplinary knowledge.
Similar to the absence of semantic meaning, the statistical relationships between the results (e.g. typologies, determinants) and the urban phenomena of interest (e.g. energy consumption, forest dynamics) cannot be revealed by the result of unsupervised learning alone (Ferrara et al., 2017; Li, Ying, et al., 2020), because it does not imply causation or degree of influence.
Validation. Validation is an imperative step to assess the confidence of the findings and to facilitate the replication of studies in more geographical locations. Researchers cite two ways of validation: internal and external. Internal validation analyzes the internal structure (e.g. cohesion and separation) of the results. Each UL technique has well-established internal validation methods, which are routinely used in many papers (Arbolino et al., 2019; Feng & Liu, 2013; Schmiedel et al., 2015; Serra et al., 2014; Vizzari & Sigura, 2015).
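As one example of an internal validation measure that combines cohesion and separation, the silhouette coefficient can be computed as sketched below. The choice of this particular index, the helper `mean_silhouette`, and the toy coordinates are assumptions made for illustration; the reviewed papers use a range of indices.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette width over all points; assumes at least two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster.
        a = dists[i, same].sum() / (n_same - 1)
        # Separation: smallest mean distance to the members of any other cluster.
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Toy data: two compact groups far apart.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good_score = mean_silhouette(X, [0, 0, 1, 1])  # partition matching the groups
bad_score = mean_silhouette(X, [0, 1, 0, 1])   # partition cutting across the groups
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate overlapping or misassigned clusters. Such an index says nothing about agreement with ground truth, which is why external validation is treated separately.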
However, studies engaging unsupervised applications in cities do not stop at internal validation. Because urban studies have a close connection with real-world practice, demonstrating that the results are aligned with the ground truth is always an area of focus. For this purpose, researchers resort to various external sources: historical flooding maps, official emergency event records, land use from the master plan, results of preceding studies, ground surveys, and manual labeling (Akande et al., 2019; Aljumaily et al., 2017; Peng et al., 2021; Richards & Tuncer, 2018; Xu et al., 2018; Xue et al., 2020; Ye & Chen, 2015; Zhang et al., 2018). However, these studies are rare, accounting for less than 10 % of publications. Such external information is not always readily available (e.g. survey data is time-consuming and costly to obtain), and results from certain application types (e.g. clustering and topic modeling) generally have no existing benchmark to compare with; consequently, the integrity and reliability of UL may be questioned.

Open science
The open science aspect is worth highlighting. Only 10 papers in our review pool released their code openly, and this lack of sharing inhibits the diffusion of advances to other cities. Given that unsupervised learning can lower the economic and professional barriers of massive-scale urban analytics, it is especially valuable to developing countries, where most of the world's urbanization takes place (Ibrahim et al., 2021; Rahman et al., 2019; Rios & Munoz, 2017).
There seem to be a need to raise the awareness of open science and this is something we hope this discussion will spurthe developers should consider sharing code or model for promoting geographical equity of urban study developments, and contribute to the open developments in the urban data science community (Yap et al., 2022).

Research opportunities
This section discusses potential research opportunities that may add diversity and depth to this rapidly expanding topic. The issues discussed in Section 6.2 suggest scores of possible research directions. For example, the proliferation of social network data (e.g. text, photos, and videos sourced from social media, and POIs) presents a viable opportunity for less subjective and more comparable interpretations. The rare related studies all target a limited set of applications, i.e. revealing urban functions and travel purposes (Bi & Ye, 2021; Gao et al., 2017; Huang & Li, 2019). Research using crowdsourced information therefore appears to have gaps worth investigating, because explaining other discovered urban patterns that relate to human perception (e.g. visual features, vitality, and urban morphology) remains unexplored.
Further, as revealed in the review, some tasks can be carried out by both supervised and unsupervised methods, which have dissimilar traits: SL yields higher accuracy, while UL is more transferable and accessible. We believe that revisiting SL studies using unsupervised methods will be a meaningful research direction: the reliability of the unsupervised results could be validated against SL results as ground truth, while the resulting frameworks would offer cheaper and more efficient urban analysis for filling the gap in urban studies in numerous developing regions.
Continuing the discussion of the geographical aspect, because unsupervised methods can be easily employed elsewhere, it is worthwhile to conduct comparative studies across locations with different socio-economic and cultural backgrounds. To our knowledge, this kind of study is largely under-investigated in the built environment and urban dynamics sectors.
As current deep learning models such as autoencoders and GANs have proven highly capable of reconstructing data, filling in missing pieces, and even making realistic predictions, applying them to urban problems is unquestionably a research frontier. Considering the small proportion of related publications and their limited topics (e.g. transportation prediction) (Ranjan et al., 2021; Singh & Mohan, 2019), there is a clear untapped opportunity to extend their use cases to a wide variety of scenarios, such as complementing the data infrastructure of digital twins (Rasheed et al., 2020), data compression for real-time urban monitoring, simulating dynamics in urban systems, and AI-generated urban planning and design.
Finally, opportunities can also be found in the rapidly evolving technological landscape of unsupervised learning, as newly developed methods may enable novel applications. For example, a recent self-supervised pre-trained image model has outperformed the best supervised model in a diverse set of computer vision tasks (Goyal et al., 2021). Similar efforts are being made intensively to improve both the accuracy and efficiency of unsupervised learning, and the cumulative progress appears promising. We believe that with the strides made in unsupervised techniques, urban studies engaging them will grow exponentially, although the field should reduce its lag in adopting them.

Conclusion
In this paper, we have reviewed the application of unsupervised learning in urban studies. Unsupervised techniques, despite their versatility and number of methods, have been somewhat overshadowed by their supervised counterparts, owing to the difference in the volume of papers and their disparate applications. However, that makes it possible to capture the developments in a single and self-contained review paper, in which we have revealed trends and provided a comprehensive list of applications of more than a dozen unsupervised methods.
We find that the applications of unsupervised learning span the entire landscape of urban data sources. With the growth of human-sensing data and built environment records such as street view imagery, novel UL applications have been emerging continuously, discovering new patterns and representations of cities that inform decision making or catalyze novel downstream analytics. Moreover, in many instances, unsupervised learning enables the convergence of heterogeneous multi-source data that explicitly represents complex real-world urban systems at multiple scales (Zhan et al., 2020, 2021). We identify clustering as the most prominent application type, followed by topic modeling, which provides bottom-up understandings of the urban environment. Although unsupervised deep learning models have gained popularity in other disciplines, their potential in urban studies is underexplored. This is the domain where we expect to see a growing volume of studies in the coming years. Relevant studies are emerging in simulating urban growth (Albert et al., 2018), traffic forecasting, and environment beautification (Wijnands et al., 2019).
The limitations of unsupervised methods are discussed in this review as well, followed by several research opportunities addressing them. We believe that making use of urban semantic information can alleviate the interpretation bias of unsupervised learning results. It is also important to consider the strengths of both supervised and unsupervised learning and to optimize analysis methods through their synergy.
We have also provided a concise introduction to unsupervised learning, as a gentle overview for our peers who are yet to consider using it in their research. Thanks to the growing and easily accessible free and open-source implementations, and the vibrant data science community, the entry barrier has never been lower. We hope that this paper will raise awareness of the potential of unsupervised learning and will catalyze further applications.
We expect that in the future, we will indeed witness a growing volume of uses in urban studies, largely thanks to the increase in the volume of available data and the advancements in techniques.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
No data was used for the research described in the article.