From intangible to tangible: The role of big data and machine learning in walkability studies

Walkability reflects the well-being of a city, and its measurement is evolving rapidly due to advances in big data and machine learning technologies. This study examines the transformative impact of these technological interventions on the evaluation of walkability trends over the period 2015 to 2022. We create a framework consisting of big data sources, machine learning methods, and research purposes, revealing research trajectories and associated challenges. Despite diverse data usage, image data dominates in walkability research. While street view and point of interest data were primarily used to depict the environment, social media and handheld/wearable data were more commonly employed to represent user behaviours or perceptions. Leveraging machine learning in conjunction with big data assists researchers in three aspects of walkability studies. First, researchers utilise classification and clustering to predict street quality and walkability and to identify neighbourhoods with certain characteristics. Second, researchers unveil the relationship between the built environment and pedestrian perceptions or behaviours through regression analysis. Third, researchers employ generative models to create streetscapes or urban structures, although their utilisation is limited. Meanwhile, challenges persist in data access, customisation of machine learning models for urban studies, and establishing standard criteria to guarantee data quality and model accuracy.


Introduction
Machine learning (ML) models trained from a large volume of data have recently transformed many fields of science and technology. The field of urban studies is no exception to this rule. Researchers utilise street view (SV) data to quantify the composition of street space elements, enabling the estimation of detailed spatial quality on a large scale (Biljecki & Ito, 2021). They also, for example, explore social media data to understand collective behaviours, aiming to provide improved services to citizens (Zhang & Pan, 2019).
Walkability serves as a vital urban indicator, reflecting aspects of urban vitality, sustainability, and general well-being in a city. Numerous studies demonstrate that cities with higher walkability experience more active commutes and fewer health issues compared to their nonwalkable counterparts (Hoehner, Handy, Yan, Blair, & Berrigan, 2011; Kim, Kim, & Kim, 2020; Litman, 2003; Yue et al., 2022). While researchers have highlighted various benefits of walkability in previous studies, enhancing it presents challenges due to varied definitions across different domains and contentious evaluations (Lo, 2009). Some studies focus on objective calculations based on easily accessible statistical data (e.g., land use mix, population density) on large scales (Frank & Pivo, 1994), while others concentrate on subjective measurements derived from self-reports or surveys (Saelens, Sallis, Black, & Chen, 2003) at the neighbourhood level.
Prior studies have established that objective and subjective measures represent different aspects of walking (Gebel, Bauman, & Owen, 2009; Hoehner, Ramirez, Elliott, Handy, & Brownson, 2005; Leslie et al., 2005). However, obtaining subjective data on a large scale remains challenging and expensive. With the increasing accessibility of new digital data, such as SV or social media data, ML can process data automatically and continuously at a scale that surpasses human capabilities.
ML trained on large-scale data, such as SV data or data generated via social media, offers fresh insights, revealing factors influencing walkability, and even providing hundreds of solutions for rapid improvement of walkability. Although only a small portion of walkability studies involve ML, their exponential growth over the past decade has captured researchers' attention (Fig. 1). However, previous reviewers often focused only on one type of data, such as SV data (Biljecki & Ito, 2021; Cinnamon &

Method
In this section, we begin by outlining our criteria for selecting the relevant literature. We then filter out studies based on their alignment with the definitions of big data and their integration of ML.

Literature search procedure
According to the research questions, we define the following primary keywords: 'walkability', 'machine learning', 'artificial intelligence', and 'big data'. We perform a literature search using these keywords in Web of Science, Scopus, and Google Scholar. In the initial phase, we employed combined keywords, namely 'walkability AND machine learning OR big data OR artificial intelligence', in the three databases to understand the temporal distribution of the literature, to determine the trend of the study, and to establish the time range of this paper. We observed a similar trend in all databases: before 2012, there was little growth in walkability studies involving big data or ML. Between 2012 and 2015, the literature in this field began to increase slowly. Afterwards, the number of publications in this field exhibited snowballing growth. In 2021, the number of publications based on Google Scholar results was more than a thousand times higher than in 2000. In the same year, Scopus and Web of Science also returned several hundred times more results than twenty years earlier (Fig. 1).
We exclude publications before 2015 due to limited prior literature. We then searched the three databases again using the same keywords, "walkability AND machine learning OR big data OR artificial intelligence", and applied the following criteria: 1. Time: articles published between 2015 and 2022; 2. Type: peer-reviewed journal articles and conference papers; 3. Must keyword: articles including walkability; 4. Selective keyword: articles including at least one of the following keywords, namely 'machine learning', 'big data', 'artificial intelligence'; 5. Language: articles written in English. In Scopus, we searched the literature with keywords that appeared anywhere in articles within our time scope. We initially retrieved 1320 articles, from which we selected only peer-reviewed articles and conference papers in English. The total result from Scopus was 373 articles. The search in Web of Science with the given keywords returned around a hundred thousand publications, many of which were deemed irrelevant to our study upon skimming. To narrow down the focus, we also employed the "research area" filter in Web of Science, limiting the studies to transportation, urban studies, or architecture. After checking the types of articles and language, we obtained 424 relevant articles from Web of Science. Using the same keywords and criteria, we obtained 725 articles from Google Scholar. After excluding duplicate articles, we systematically read the abstracts of all 1522 articles to further filter out those that did not meet our criteria for literature selection. Finally, we identified a total of 103 articles for systematic review processing (Fig. 2).
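The five selection criteria can be expressed as a simple record filter. The following is a minimal sketch rather than the procedure actually used for this review: the `Record` fields, the title-based duplicate check, and the substring keyword matching are assumptions introduced for illustration. Criteria 3 and 4 together imply that the query is read as walkability AND (machine learning OR big data OR artificial intelligence), which is how the filter is written.

```python
from dataclasses import dataclass

# Selective keywords (criterion 4) and accepted publication types (criterion 2).
SELECTIVE = ("machine learning", "big data", "artificial intelligence")
ALLOWED_TYPES = {"journal article", "conference paper"}


@dataclass(frozen=True)
class Record:
    """One search hit. All field names are hypothetical."""
    title: str
    year: int
    doc_type: str
    language: str
    text: str  # title, abstract and keywords joined into one string


def meets_criteria(record: Record) -> bool:
    """Apply criteria 1-5: time, type, must keyword, selective keyword, language."""
    text = record.text.lower()
    return (
        2015 <= record.year <= 2022
        and record.doc_type in ALLOWED_TYPES
        and "walkability" in text
        and any(keyword in text for keyword in SELECTIVE)
        and record.language == "English"
    )


def deduplicate(records):
    """Keep the first record for each normalised title (an assumed matching rule)."""
    seen = set()
    unique = []
    for record in records:
        key = record.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Per-database counts reported in the text; their sum matches the 1522
# abstracts that were read.
per_database = {"Scopus": 373, "Web of Science": 424, "Google Scholar": 725}
total_screened = sum(per_database.values())
```

In practice, keyword matching of this kind only approximates the screening; the final step described above relied on reading abstracts.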

Criteria based on the data definition and ML methods
Many walkability studies have focused on, for example, the selection of variables influencing walkability (Maghelal & Capp, 2011; Saelens & Handy, 2008), the walking ability among different groups (Alfonzo, 2005; Liao, van den Berg, van Wesemael, & Arentze, 2020; Moura, Cambra, & Gonçalves, 2017) or purposes of walking (Alfonzo, 2005). Due to the complexity of walkability and the controversial nature of different disciplinary perspectives (Lo, 2009), our literature selection procedure may not cover all aspects related to walkability. Instead, we focus on understanding how big data and ML methods facilitate walkability studies in terms of analysing the built environment (e.g., sidewalk quality, urban vitality) and users (e.g., pedestrian perception, walking behaviour), providing a snapshot of how this knowledge can be further utilised by urban planners and researchers for walkability studies and other urban studies.

Definition of big data
To answer our first research question, we reviewed the prevailing definition of big data, which guided our choice of articles. According to Hashem et al. (2015), four main aspects of big data are widely accepted. First, big data are generally much larger in volume compared to traditional data, although volume may vary depending on their formats (Gandomi & Haider, 2015). Second, big data possess variety because of the diversity of their data structures and forms (De Mauro, Greco, & Grimaldi, 2015; Gandomi & Haider, 2015). Third, big data also represent velocity based on the speed of generation and the efficiency of data processing (Gandomi & Haider, 2015). Finally, processing big data usually requires specific technology or analytical methods (De Mauro, Greco, & Grimaldi, 2016), allowing the hidden value of big data to be revealed. We selected the literature that contains at least one type of big data according to these characteristics. Since many studies used the data fusion method where multiple data sources were used to conduct research, we also included studies that used both big data and traditional data. Studies that used only conventional data collected using methods such as observation, survey, or GIS tools were excluded.

ML methods
ML generally includes supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning (Jordan & Mitchell, 2015; Mitchell, 2006; Sarker, 2021). To address our second research question, we select studies that used at least one type of ML method. Since many studies used mixed methods, our in-depth discussion is limited to the portions involving ML. Studies using methods other than ML are not considered in this context.

Comparison of the workflow in study purposes
To understand the general process of the studies from data through methods to their purposes, we categorise them into four groups with 11 subgroups (Table 1) and generalise the workflow based on their purposes. Most studies used supervised learning to determine correlations in data from multiple sources or evaluate particular aspects of urban quality. Others used unsupervised learning to cluster neighbourhoods or generate urban forms. It is important to note that the purposes of the studies are not always unique and may overlap.

Correlations
We group the articles into the correlation group when a study determines a relationship between variables based on the definition of correlation (Press, 2023). We reveal four types of correlation.

Correlation between built environment variables and walking behaviours
The studies here examined how built environment variables affect walking behaviours, and they constitute the largest portion of the total literature. Common data used to represent the built environment included street view (SV), point of interest (POI) data, and satellite data. Walking behaviours were actively measured through surveys to find where (Keyvanfar et al., 2018; Koo et al., 2022b) and how often (Deng & Yan, 2019; Nagata et al., 2020; Yang et al., 2021; Yang, Fricker, & Jung, 2022) citizens travelled on foot. Walking behaviours were also passively detected through handheld device data, such as GPS positioning (Wu et al., 2022) or GPS trajectories (Miranda et al., 2021; Santucci et al., 2018) and through camera sensors (Lai & Kontokosta, 2018; Li, Yabuki, & Fukuda, 2022b). In addition to the aforementioned information, studies calculated pedestrian density from SV data (Chen et al., 2022; Doiron et al., 2022) to examine the preferred locations for pedestrians. Studies typically used computer vision techniques or GIS tools to extract built environment variables from visual data or POIs and then employed regression to establish the relationship between those data and walking behaviours. While none of the studies used the same variables to represent their built environments, over half of the studies (n = 15) in this category found that greenness was associated with walking behaviours. However, some studies reported contradictory results. Some showed that the openness of the sky encourages walking (Basu & Sevtsuk, 2022; Chen et al., 2022; Nagata et al., 2020; Santucci et al., 2018), while others found that it does not have an impact on walking (Doiron et al., 2022; Sevtsuk et al., 2021).
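The two-step workflow described above (extract a built environment variable from segmented street images, then regress a walking measure on it) can be illustrated with a toy example. This is a sketch under stated assumptions, not a method taken from any reviewed study: the class id `VEGETATION`, the per-pixel label masks, and the single-predictor ordinary least squares fit are all illustrative choices, and real studies use many variables and richer regression models.

```python
from statistics import mean

VEGETATION = 8  # hypothetical class id assigned to vegetation by a segmentation model


def class_share(mask, class_id):
    """Fraction of pixels in a 2-D label mask that belong to class_id."""
    total = sum(len(row) for row in mask)
    hits = sum(1 for row in mask for label in row if label == class_id)
    return hits / total if total else 0.0


def ols_fit(x, y):
    """Slope and intercept of a one-predictor ordinary least squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Toy label masks for three street images (made-up values, not real data).
masks = [
    [[VEGETATION, 0], [0, 0]],                    # 25% vegetation pixels
    [[VEGETATION, VEGETATION], [0, 0]],           # 50% vegetation pixels
    [[VEGETATION, VEGETATION], [VEGETATION, 0]],  # 75% vegetation pixels
]
greenness = [class_share(mask, VEGETATION) for mask in masks]
walking_trips = [2.0, 3.0, 4.0]  # made-up weekly walking trips near each image

slope, intercept = ols_fit(greenness, walking_trips)
```

A positive `slope` in such a fit would correspond to the association between greenness and walking reported by many studies in this group, although a fitted association alone does not establish causation.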

Correlation between built environment variables and pedestrian perception
This group emphasises how the built environment relates to pedestrian perceptions (e.g., safety, comfort) and accounts for a large portion of the total literature. SV data (n = 22) were the most frequently used data in this group. In addition to representing various built environment variables, SV data were used for perception ratings. Therefore, the studies relied heavily on data labelling. Researchers conducted image-based rankings based on the opinions of experts or the public to obtain scores that represent perceptual aspects as image labels for ML tasks. Meanwhile, according to the definition given by the researchers, the variables representing the built environment were extracted or calculated from the SV data using computer vision techniques. Regression was then performed to examine the correlation between the built environment variables derived from the images and perceptual scores from humans. About one-third of the studies in this group used six perceptual attributes (safe, lively, boring, wealthy, depressing, and beautiful) originally proposed by Dubey et al. (2016) to rank image data. Other perceptual aspects, such as openness (Dai et al., 2021; Ma et al., 2021), transparency (Zhao & Guo, 2022), satisfaction (Lee et al., 2022; Yang, Zhang, et al., 2022), and complexity (Joglekar et al., 2020; Qiu et al., 2021; Zhao & Guo, 2022) were also commonly used for rating images or were calculated directly based on the pixels of images. In addition to SV data, social media data were used to understand perceived opinions about locations (Song, Zhou, et al., 2022; Tang et al., 2022). Although the perceptual attributes between studies are similar, it is rather difficult to summarise the consistency of findings due to different definitions of built environment aspects (e.g., enclosure, openness, complexity) in individual studies. However, when variables were extracted directly from SV data and used as built environment factors, high greenness and presence of sidewalks demonstrated a positive impact on safety, beauty, and liveliness (Dai et al., 2021; Rossetti et al., 2019; Zhang et al., 2018).

Table 1.
Categorising literature based on their study purposes.

Correlation between built environment variables
Six studies fall into this group, all of which used visual data (SV data, n = 5; social media data, n = 1). Some aimed to compare the difference in methods between street-level walkability derived from street images and macroscale walkability based on GIS calculation (Adams et al., 2022) or results of @Walkscore (2023) (Deng et al., 2020). Some focused on a correlation between visual environment variables and auditory data (Xie et al., 2022) or green accessibility (Luo et al., 2022).

Correlation between built environment variables and health
The studies here examined how the built environment variables extracted from SV data or other data sources are related to health issues, including obesity (Althoff et al., 2017; Yue et al., 2022), mental health (Wang, Lu, et al., 2019), and stress (Benita et al., 2020; Kim et al., 2022), which were assessed using wearable sensors or surveys. Although the studies focused on different health problems, the findings collectively pointed to the conclusion that the built environment has a significant impact on health levels.

Evaluation
Evaluation involves assessing qualities (e.g., street space quality) through various variables. In the ML field, evaluation refers to predicting unknown data labels using existing labelled data. There are four groups in this category, all representing various aspects of walkability.

Evaluating street space quality
The quality of street space greatly affects walkability (Jacobs, 1993; Maghelal & Capp, 2011). Most studies (n = 14) have taken advantage of SV data to assess street space quality. They directly evaluated sidewalk quality (Duan et al., 2022; Jiang et al., 2022; Weld et al., 2019), greenness (Ye, Zeng, et al., 2019; Ye, Richards, et al., 2019; Middel et al., 2019), or openness (Wang, He, et al., 2022; Yin & Wang, 2016) from visual data, or calculated aspects such as diversity (Zhang et al., 2019), accessibility (Ye, Richards, et al., 2019) or safety (Wang, Zeng, et al., 2022) based on SV data and other types of data (social media data, POI data, satellite data). The ML used in these studies functions primarily as a predictive tool, relying heavily on dense labelling. Four-fifths of the studies labelled their data for training. To understand how well ML evaluates street space quality, we investigated the accuracy of models. When predicting a single object (e.g., the sky), the model can achieve accuracy up to 98% (Yin & Wang, 2016). However, as the complexity of the prediction task (e.g., safety) increases, the precision can drop to around 60% (Wang, Zeng, et al., 2022).
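Because the figures above are reported as accuracy in one case and precision in the other, it helps to recall that the two metrics are computed differently and are not directly comparable. The sketch below shows both for a binary labelling task; the toy labels are made up, and the reviewed studies may define their metrics differently (for example, per pixel or per class).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that match the reference labels."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Share of positive predictions that are truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy example: 1 = "safe", 0 = "not safe" (made-up labels, not study data).
reference = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
acc = accuracy(reference, predicted)
prec = precision(reference, predicted)
```

On the same predictions the two values differ, which is one reason why model performance should be compared across studies only when the metric and the labelling scheme are the same.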

Evaluating walkability
This group integrated SV data and POI data to directly predict walkability. The walkability here was defined by experts through image labelling (Blečić et al., 2018) or calculated based on different data sources (Alfosool et al., 2022a). Although the number of studies in this group is limited, Koo et al. (2022a), Zhang et al. (2022), and Deng et al. (2020) have shown that walkability at the street level and walkability assessments at the macro level differ spatially. Street-level walkability also appears to better explain pedestrian walking behaviours. The models in this group generally achieved 80% accuracy, which shows the potential for future walkability tasks.

Evaluating pedestrian volume
Pedestrian volume usually reflects walkability (Lo, 2009; Sohn, Moudon, & Lee, 2012). Historically, it was manually counted through field observation (Hess, Vernez Moudon, Catherine Snyder, & Stanilov, 1999) or using a counter device (Aultman-Hall, Lane, & Lambert, 2009). All studies here utilised visual data, including SV data (n = 5) (Chen et al., 2020; Chen et al., 2022; Yin et al., 2015) and webcam data (n = 1) (Petrasova et al., 2019). They calculated pedestrian volume through object detection or semantic segmentation, both of which are computer vision techniques. As computer vision technology develops, it can detect pedestrians more accurately, which may be more effective than traditional counting. However, only Chen et al. (2020) mentioned the comparison of SV data with conventional data, where they argued that ML is better at predicting pedestrians in SV data with a high density of pedestrian flow.
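Once an object detector has been run on each image, estimating pedestrian volume reduces to counting confident "person" detections and aggregating them by location. The sketch below assumes a hypothetical detector output format (a list of dictionaries with `label` and `score` keys) and a hypothetical street segment identifier per image; neither is taken from the reviewed studies, and the 0.5 confidence threshold is an arbitrary illustrative value.

```python
from collections import defaultdict
from statistics import mean


def count_pedestrians(detections, min_score=0.5):
    """Count detections labelled 'person' whose confidence reaches min_score."""
    return sum(
        1 for det in detections
        if det["label"] == "person" and det["score"] >= min_score
    )


def volume_by_segment(images, min_score=0.5):
    """Average pedestrian count per image for each street segment.

    `images` is an iterable of (segment_id, detections) pairs.
    """
    counts = defaultdict(list)
    for segment_id, detections in images:
        counts[segment_id].append(count_pedestrians(detections, min_score))
    return {segment: mean(values) for segment, values in counts.items()}


# Made-up detector output for three images on two street segments.
images = [
    ("s1", [{"label": "person", "score": 0.9},
            {"label": "person", "score": 0.3},   # below threshold, ignored
            {"label": "car", "score": 0.95}]),
    ("s1", [{"label": "person", "score": 0.8},
            {"label": "person", "score": 0.7}]),
    ("s2", []),
]
volumes = volume_by_segment(images)
```

The threshold matters: lowering it counts more uncertain detections, which is one way in which blurred or small pedestrians in SV images can bias the resulting volumes.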

Evaluating vitality
Urban vitality reflects the attractiveness and activeness of cities. Walkable space usually indicates high vitality (Jacobs, 1992; Marquet & Miralles-Guasch, 2015). In total, four studies predicted vitality using different data sources, which we grouped into two categories. One represents the built environment (SV data (Guo et al., 2021), POI data (Huang et al., 2020), satellite data (Scepanovic et al., 2021)) and the other represents human activities (survey (Scepanovic et al., 2021), social media data (Huang et al., 2020), handheld device data (Guo et al., 2021)). Although each study defined vitality differently, all their vitality calculations were based on both types of data.

Sorting
Sorting means categorising data, where ML models learn the similarity of data features and group them through clustering. Unlike other groups where SV data were widely used, many studies in this group benefited from social media data and POI data.

Sorting neighbourhoods
Neighbourhoods were either sorted based on opinions from social media data or on service characteristics from POI data. Over half of the studies here used social media data, which were processed through natural language processing (NLP) to gather opinions on urban quality (Guhathakurta et al., 2019; Marsillo et al., 2022; Sandoval Olascoaga et al., 2016) and then clustered neighbourhoods based on opinions. Others in this group utilised POI data and survey data to cluster service-based neighbourhoods (Bramson & Hori, 2021).
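Clustering neighbourhoods amounts to grouping feature vectors by similarity. As an illustration only, the sketch below implements plain k-means with a deterministic farthest-point initialisation; the reviewed studies do not necessarily use k-means, and the two features per neighbourhood (for example, a share of positive opinions and an amenity density) are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def init_centroids(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) after Lloyd's iterations."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Made-up feature vectors for six neighbourhoods (two hypothetical features each).
neighbourhoods = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                  (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(neighbourhoods, k=2)
```

The number of clusters `k` has to be chosen by the analyst, and the result depends on how the features are scaled, so the clusters should be interpreted together with the underlying opinion or service data.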

Sorting mobility groups
Although our review does not focus on walking in different groups and there are only three articles in this group, it is worth mentioning that the use of clustering models helps to achieve their particular aims. Social media data were used to identify mobility groups based on user content (Orama et al., 2022; Zhang et al., 2020), while SV data were analysed to understand the characteristics of the environments where different pedestrian groups were located (Xue et al., 2021).

Generation
Generation refers to generative AI that learns to generate synthetic data with properties similar to a given training set. All studies in this group used image data and generative adversarial networks, a type of generative AI, to envision new streetscapes for enhancing street space quality (Joglekar et al., 2020; Wijnands et al., 2019). Additionally, they proposed new roads (Fang et al., 2021; Hartmann et al., 2017) or new block buildings (Shen et al., 2020) on vacant lots to match existing urban structures. Although all studies used similar models to generate different objects, half of them acknowledged the difficulty in controlling the output quality, especially when generating complex images.

Data sources
Most of the studies used diverse data sources simultaneously, classified into 13 categories (Fig. 3). We highlight SV data, point-of-interest (POI) data, handheld/wearable device data, and social media data, aligning with the definition of big data. The remaining categories provide supplementary insights. SV data played a prominent role in more than half of the studies, while around one-third of all studies applied POI data. A similar number of studies used road network data obtained from the OpenStreetMap API (Hartmann et al., 2017; Ye, Richards, et al., 2019; Ye, Zeng, et al., 2019; Wu et al., 2020). Approximately one-fifth of the studies incorporated survey data, collecting information on variables such as walking frequency (Song, Ning, et al., 2022; Yang et al., 2021), walkability perception (Ramírez et al., 2021; Tang & Long, 2019), choice of transportation mode (Aschwanden et al., 2021; Kim et al., 2022), walking satisfaction (Jiang et al., 2021; Lee et al., 2022) and neighbourhood classification (Le Falher et al., 2015). About 20% of the studies collected wearable sensor data, including mobile GPS data to understand pedestrian preferences during trips (Basu & Sevtsuk, 2022; Yamagata et al., 2019), and bio-signal data to unveil the impact of the environment on pedestrians (Kim et al., 2022). Similarly, a comparable percentage of studies gathered social media data (e.g., Twitter, Flickr) to gain insights into neighbourhood quality (Guhathakurta et al., 2019; Le Falher et al., 2015; Marsillo et al., 2022) or assess urban vitality (Huang et al., 2020).
Fig. 3. The relationship between data sources and study purposes. While walkability studies utilised various data sources, SV data remained the most important data source. The correlation between environmental factors and walking behaviour was extensively explored using a range of data sources.
A small group of studies used earth observation data (satellite maps) to predict urban vitality (Scepanovic et al., 2021) or streetscape (Verma et al., 2021). Two studies employed LiDAR point cloud data, which were not commonly used, to assess urban streets (Wu et al., 2021) and sidewalks (Jiang et al., 2022). Interestingly, although the auditory field is traditionally considered a significant factor influencing pedestrian preference (Appleyard, 1980; Bosselmann, Macdonald, & Kronemeyer, 1999; Zhang et al., 2022), few studies have explored and used this type of data. Furthermore, other traditional data sources, such as weather and census data, were also used to support walkability studies. In the following part, we focus on the most prevalent big data sources (e.g., SV data, POIs). We delve into the origins of the data and their diverse applications in enhancing walkability. Then we interpret the characteristics of the data in terms of volume, variety, velocity, and value. Additionally, we discuss the weaknesses of different data sources.

Sources of SV data
Google first developed SV data in the early 21st century. Google Street View remains the leading choice for research in this field, followed by Baidu and Tencent (Fig. 4). SV data are generally captured by cameras and GPS placed on the roof of a vehicle (Anguelov et al., 2010).

Purposes of using SV data
SV data served nine different study purposes, which we defined earlier (Table 1 and Fig. 3). Two primary functions emerged in these studies. First, SV data were used to represent built environment variables when research focused on the relationship between pedestrians (e.g., walking behaviour, health) and the environment or when research assessed spatial quality (e.g., street space quality, walkability) (Weld et al., 2019; Ye, Zeng, et al., 2019; Zhang et al., 2019) directly. In such studies, variables were typically extracted from SV data through deep neural networks (Wang, Lu, et al., 2019; Yin & Wang, 2016; Zhou et al., 2019) for regression analysis. Second, the SV data served as a medium for surveys to understand perception when research determined a relationship between perception and built environment variables (Dai et al., 2021; Yao et al., 2019; Zhang et al., 2018; Zhao & Guo, 2022).

Advantages of SV data
The volume of SV data used in the studies varies widely. Developing an ML model from scratch for tasks such as understanding human perception (5 million data points) (Wang, Liu, et al., 2019) or generating street images (4.5 million data points) (Wijnands et al., 2019) requires vast amounts of data. However, the use of pre-trained models mitigates this demand, reducing the required data to a few hundred for evaluating the built environment (Wu et al., 2021). Regarding the value of SV data, Yin and Wang (2016), Adams et al. (2022), Zhao and Guo (2022) and Suminski Jr et al. (2019) agreed that SV data served as an effective auditing tool, as the SV data can be easily extracted automatically through deep learning (Suminski Jr et al., 2019), increasing the scalability of street environment auditing (Adams et al., 2022). SV data have been shown to be more adept at reflecting walking behaviours than macroscale variables (Koo et al., 2022b), and properly trained ML algorithms measure variables in the built environment in street scenes objectively (Chen et al., 2020; Yin et al., 2015). The variety of SV data refers to the data coming from different API services and taken at different times. Tang and Long (2019) compared SV data from different years in their longitudinal study to determine whether the quality of space has improved. The advent of SV data enables remote site inspection. More importantly, ML models such as convolutional neural networks (CNN) (Albawi, Mohammed, & Al-Zawi, 2017) and fully convolutional neural networks (FCN) (Long, Shelhamer, & Darrell, 2015) trained with existing data sources (e.g., Cityscapes, ADE20K) can interpret image data in bulk in a short period of time, demonstrating a velocity that surpasses human-intensive environment evaluation.

Weaknesses of SV data
Vehicle-collected data often lack pedestrian perspectives (Koo et al., 2022a; Xuan & Zhao, 2022) and have limited availability in nonmotorised areas (Luo et al., 2022; Zhang et al., 2022). Although SV data can partly reflect street quality, using them to evaluate street space can yield biased results due to divergent perspectives of pedestrians. Unfortunately, there is limited exploration into how this bias might impact the weighting of environmental variables that shape pedestrian preferences. Although walkability studies are highly dependent on pedestrian space preferences, collecting data from a pedestrian's point of view remains challenging (Luo et al., 2022; Xuan & Zhao, 2022; Zhang et al., 2022). Whether the results of the street evaluation based on SV data vary across different perspectives remains an open question for future research. Furthermore, locations without SV images may result in, for example, lower green accessibility on the macroscale due to data insufficiency in parks or pedestrian paths (Luo et al., 2022). SV data are discontinuous due to slower updates compared to urban construction progress (Yue et al., 2022) and data collection at varied times or seasons (Luo et al., 2022). This leads to a misrepresentation of actual traffic flow and human activity (Porzi et al., 2015; Yin et al., 2015), which can affect the quality of the model training and the accuracy of the prediction. Additionally, the capture of SV data may introduce distortions (Yin et al., 2015) or fuzzy pixels (Chen et al., 2020; Koo et al., 2022a), especially at night (Song, Zhou, et al., 2022). At the same time, fogged and tiny objects (Song, Zhou, et al., 2022; Xue et al., 2021) are difficult for ML models to extract. When the data used to train pre-trained models differ significantly in size or style from new SV data, the pre-trained model's predictions may fall short of expectations (Xue et al., 2021).

Sources of POI data
Point of interest (POI) data represent services (e.g., restaurants, museums, transport hubs) in a city. In most studies, researchers obtained data through map API services. The sources of POI data are among the most diverse of all data types. Popular options include the AutoNavi map API and the OpenStreetMap API. Other POI data sources are evenly distributed across the rest of the studies (Fig. 5).

Purposes of using POI data
Studies generally integrated POI data with other data sources (e.g., SV data, survey data) to explore associations between variables, predict urban quality, or classify neighbourhoods. Given that POI data reflect aspects of the built environment, such as attractiveness of a location (Chen et al., 2022; Deng et al., 2020), the researchers used them to correlate the built environment with walking behaviours (Chen et al., 2022; Deng & Yan, 2019; Wu et al., 2022). Some studies also showed that certain types of POI data (e.g., density of amenities, distance from public transportation) influenced pedestrians more than others (Chen et al., 2022; Deng et al., 2020; Wu et al., 2022). Chen et al. (2022) found that pedestrians favoured locations with high-density public or commercial POIs over enterprise POIs. POI data also played an important role in directly predicting streetscape quality (Zhang et al., 2019), urban vitality (Guo et al., 2021) or walkability (Alfosool et al., 2022b; Deng et al., 2020; Zhang et al., 2022).

Advantages of POI data
The volume of POI data varies significantly depending on the scales or purposes of studies. Studies exploring the correlation between the environment and perceptual variables from humans on a large scale often require hundreds of thousands of data points (Chen et al., 2022; Yao et al., 2019). In contrast, fewer POI data are needed for studies on the neighbourhood scale (Benita et al., 2020; Kim et al., 2022). The value of POI data lies in gauging different aspects of neighbourhood quality when combined with other data sources. Qiu et al. (2021) found that the density of POI data reflects urban complexity. Orama et al. (2022) showed that the POI data reflect historical and religious attraction. The variety of POI data arises from various types of services present in a city. The velocity of POIs is evident in real-time updates of organisations on maps and data collection through APIs, allowing ML to derive valuable insights.

Weaknesses of POI data
Although POIs reflect various dimensions of urban functioning due to their diversity, this characteristic also complicates data categorisation. Most studies used POI data to calculate diversity (n = 18) or density (n = 13) of POI at a given site, but inconsistencies arise in POI types and classification methods. The origin of POI data from various platforms further exacerbates this lack of consistency. In particular, almost all studies relied exclusively on POI data from a single platform, posing a substantial hurdle for conducting meaningful comparisons between different studies. For example, land use mix, as an essential indicator of walkability, was frequently calculated through POIs in studies (Huang et al., 2020; Yang et al., 2021; Zhang et al., 2019). However, the specific POIs chosen for this calculation varied significantly due to disparities in POI data across different platforms. Furthermore, since POIs are generated by humans, the quality of data can vary between different platforms (Yeow, Low, Tan, & Cheah, 2021), which can potentially affect the overall quality of the research. Additionally, Wu et al. (2022) discussed that POI data struggle to represent the multiple functions of a single POI (e.g., a restaurant as a place of consumption or a workplace) and do not reflect street vitality alone.
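The sensitivity of POI-based indicators to the classification scheme can be shown with a small example. The sketch below computes POI density and a normalised Shannon entropy as a diversity measure; entropy is one common choice for land use mix but is an assumption here, not necessarily the formula used in the reviewed studies, and the POI names and the `coarse_scheme` mapping are made up.

```python
import math
from collections import Counter


def poi_density(poi_categories, area_km2):
    """Number of POIs per square kilometre."""
    return len(poi_categories) / area_km2


def poi_diversity(poi_categories):
    """Shannon entropy of category shares, normalised to the range [0, 1]."""
    counts = Counter(poi_categories)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0  # no POIs or a single category: no mix
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# The same four POIs classified under two hypothetical schemes.
pois = ["cafe", "restaurant", "museum", "bus_stop"]
coarse_scheme = {"cafe": "food", "restaurant": "food",
                 "museum": "culture", "bus_stop": "transport"}

fine_diversity = poi_diversity(pois)
coarse_diversity = poi_diversity([coarse_scheme[p] for p in pois])
density = poi_density(pois, area_km2=2.0)
```

Here the density is unaffected by the scheme, while the diversity score changes once cafes and restaurants are merged into one category, which illustrates why results based on differently classified POI data are hard to compare.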

Sources of handheld/wearable device data
Data from handheld/wearable devices encompass information gathered through smartphones or bio-signal sensors. Most studies relied on mobile GPS positioning data, which were passively collected through mobile services or Internet connections. Mobile application data serve a similar function (location sharing) to mobile GPS positioning data (Althoff et al., 2017). Another prevalent data type is GPS trajectory data, which illustrate pedestrian movement on streets. Unlike the previously mentioned data sources, which were mainly used for large-scale studies, video camera and bio-signal device data were primarily employed in street-level studies measuring pedestrian stress levels or behaviours (Fig. 6).

Purposes of using handheld/wearable device data
Handheld/wearable device data served various purposes. First, the spatial and temporal distribution of pedestrians was measured using mobile GPS positioning data (Huang et al., 2020; Wu et al., 2022; Yamagata et al., 2019; Zhang et al., 2019) or mobile applications (Althoff et al., 2017). Second, the correlation between built environment variables and walking behaviours was assessed through GPS trajectory data (Basu & Sevtsuk, 2022; Miranda et al., 2021; Tribby et al., 2017) or video cameras (Li, Yabuki, & Fukuda, 2022b; Suminski Jr et al., 2019). Third, the impact of the built environment on health was studied through bio-signals (Berzi et al., 2017; Kim et al., 2022). Additionally, GPS positioning data were applied to assess urban vitality (Guo et al., 2021) and street quality (Zhang et al., 2019).

Fig. 5. The relationship between proportions of POI data sources and different study purposes.

Advantages of handheld/wearable device data
The volume of handheld device data often ran into hundreds of millions of data points when studies focused on understanding the relationship between built environment variables and walking behaviours using mobile GPS positioning data (Guo et al., 2021; Nagata et al., 2022) or GPS trajectory data (Miranda et al., 2021). In contrast, studies using bio-signals (Kim et al., 2022) or video cameras (Li, Yabuki, & Fukuda, 2022b) required less data on the neighbourhood scale. Handheld/wearable device data hold considerable value compared to small-scale observation or interviews for understanding pedestrian preferences. Large GPS positioning or trajectory datasets reveal attractive places and public preferences on a citywide scale. This helps urban planners identify beneficial environmental variables to improve pedestrian-friendly infrastructure and attract more pedestrians. While only Althoff et al. (2017) used data collected from smartphone applications (e.g., smartwatch) to correlate physical activity with urban walkability, the potential of this study is significant due to its global scale. The variety of handheld/wearable device data is evident in individual activity levels from wearable devices (Iqbal, Mahgoub, Du, Leavitt, & Asghar, 2021) and user location information from handheld devices. Although challenges such as the high cost of data acquisition, privacy concerns (Liang, Zhao, Shetty, Liu, & Li, 2017) and processing difficulties limit data usage, there is considerable potential for growth, offering future quantifiable insights into individual and general walking needs. The velocity of handheld/wearable device data is apparent in the constant updating of data through sensors (Ates, Yetisen, Güder, & Dincer, 2021; Herrera et al., 2010), though data processing is technically demanding. Special techniques such as Hidden Markov map matching (Basu & Sevtsuk, 2022; Santucci et al., 2018) or GraphHopper (Miranda et al., 2021) were used to match GPS trajectory data with road segments to achieve meaningful results.
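The matching step can be illustrated with a deliberately simplified sketch that snaps each GPS point to its geometrically nearest road segment. This is not Hidden Markov map matching and not the GraphHopper API: it ignores the transition probabilities between consecutive points that those methods rely on. The function names, the toy segments, and the assumption of planar coordinates in metres are all hypothetical.

```python
import math


def snap_to_segment(point, segment):
    """Project a point onto a line segment; return (projected_point, distance)."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Relative position of the projection, clamped to the segment ends.
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)


def match_trajectory(points, segments):
    """Assign each GPS point to the index of its nearest segment."""
    return [
        min(range(len(segments)), key=lambda i: snap_to_segment(p, segments[i])[1])
        for p in points
    ]


# Two parallel streets five metres apart (hypothetical planar coordinates).
segments = [((0, 0), (10, 0)), ((0, 5), (10, 5))]
# A walk along the first street with one noisy reading in the middle.
trajectory = [(1, 0.4), (5, 2.6), (9, 0.6)]

print(match_trajectory(trajectory, segments))
```

In this toy trajectory the noisy middle reading is assigned to the wrong street, which hints at why the reviewed studies turned to sequence-aware techniques rather than point-by-point snapping.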

Weaknesses of handheld/wearable device data
Conducting neighbourhood-scale studies using portable equipment to collect bio-signal data and demographic information is feasible for evaluating individual activities.However, scaling up becomes challenging due to the intensive cooperation required from participants.Most GPS positioning data or GPS trajectory data originated from mobile service providers or map services, often incurring substantial fees (Miranda et al., 2021;Yamagata et al., 2019).Additionally, GPS trajectory data often contain high levels of noise (Basu & Sevtsuk, 2022;Santucci et al., 2018), which requires technical expertise to match them with road networks (Sevtsuk et al., 2021;Tribby et al., 2017).
Meanwhile, data collected passively by third parties anonymised personal information on a large scale, revealing only general walking preferences (Basu & Sevtsuk, 2022;Santucci et al., 2018), making it difficult to find improved solutions for different groups of users.

Sources of social media data
Social media data are generated on online platforms where users engage in various activities such as writing mini-blogs (e.g., Twitter, Sina Weibo), sharing feedback on services (e.g., Ctrip, Tripadvisor), or posting images (e.g., Flickr).Our analysis revealed that half of all studies utilising social media data focused on Twitter or Flickr data.Approximately a quarter of the studies used Chinese social media platforms (e.g., Sina Weibo and Dazhong Dianping), while a small fraction derived data from Tripadvisor, Instagram and Ctrip (Fig. 7).

Purposes of using social media data
Compared with other types of data, social media data were predominantly used for sorting tasks. Studies leveraging Twitter data have categorised neighbourhoods based on users' opinions on urban space quality (Guhathakurta et al., 2019; Le Falher et al., 2015; Marsillo et al., 2022; Sandoval Olascoaga et al., 2016) or integrated tweets with other data sources to identify mobility groups (Orama et al., 2022). Flickr, a photo-based platform, served diverse purposes. Zhang et al. (2020) compared Flickr data from locals and tourists to differentiate their preferences in attractiveness. Berzi et al. (2017) and Sottini et al. (2021) counted how frequently photos were taken at various locations, linking these rates with environmental variables across different neighbourhoods. Flickr data were also used to assess street safety and walkability (Quercia et al., 2015). Sina Weibo was utilised to explore popular pedestrian routes in historical towns (Xie et al., 2022). Dazhong Dianping, Ctrip and Tripadvisor are online agencies that provide information on local services (e.g., restaurants, hotels). However, researchers focused on the social media features of those platforms, analysing user-generated content to find the relationship between built environment variables and perception (Song et al., 2021; Tang et al., 2022), or to predict street quality (Wang, He, et al., 2022) and urban vitality (Huang et al., 2020).

Advantages of social media data
The volume of social media data in studies typically ranged from a few thousand to tens of thousands of data points, which is relatively small compared to the previously mentioned data sources (Guhathakurta et al., 2019; Xie et al., 2022). The variety of social media data lies in its formats (e.g., text, images, videos), offering multidimensional urban perspectives. Moreover, analysing data from a single platform such as Flickr can yield various results depending on perspectives, such as discovering tourists' focus on architecture and locals' focus on green spaces (Zhang et al., 2020) or revealing hotspots in cities (Berzi et al., 2017; Sottini et al., 2021). The velocity of social media data surpasses that of other data (e.g., SV data) due to frequent user updates. Meanwhile, NLP and computer vision techniques enable experts to assess opinions on urban space quality more swiftly than traditional surveys or interviews. The value of social media data includes several aspects. Geotagged social media data can unveil the distribution of place popularity on a large scale (Berzi et al., 2017) or identify attractiveness for different demographic groups (Zhang et al., 2020). Unlike traditional walkability assessments, social media introduce a bottom-up approach to understanding the real needs of pedestrians (Berzi et al., 2017). They were also used to assess qualities such as aesthetics (Quercia et al., 2015), street quality (Wang, He, et al., 2022), and safety (Quercia et al., 2015).

Fig. 6. The relationship between handheld/wearable device data sources and study purposes.

Weaknesses of social media data
While social media data offer insights into people's opinions, their processing poses challenges due to their diverse forms and human-generated content. For example, interpreting topics in text-based social media data requires substantial effort (Song et al., 2021; Tang et al., 2022). Moreover, hashtags or text on image-based platforms may not reflect uploaded images (Xie et al., 2022) or accurately represent the environment (Berzi et al., 2017). Furthermore, sampling biases can arise from excluding offline individuals (Song et al., 2021; Tang et al., 2022; Xie et al., 2022) or including unidentifiable fake or advertising accounts (Song, Zhou, et al., 2022). Limited geotagged data (Guhathakurta et al., 2019; Xie et al., 2022) may further compound this bias. Regarding user opinions on neighbourhood quality, Guhathakurta et al. (2019) noted that users tended to express satisfaction rather than dissatisfaction with environmental variables, potentially exaggerating high-quality urban spaces while overlooking problematic spaces.

Other data sources
Here, we focus on the rest of the data that have not been widely used in research but have potential for future development.We divide the remaining data into three categories, with data representing the environment, data representing pedestrians, and data representing both.

Other data sources representing the environment
Only a fraction of the studies used satellite map data, LiDAR point cloud data or weather data, all of which were employed to represent built environment variables. These studies aimed to predict urban quality (such as vitality (Scepanovic et al., 2021) and sidewalk quality (Jiang et al., 2022)) or explore the correlations between the environment, weather, and walking behaviour (Benita et al., 2020; Santucci et al., 2018). High-resolution satellite maps were used to assess spatial quality in cases where SV data were unavailable or difficult to obtain (Verma et al., 2021). LiDAR data offered superior spatial detail compared to SV data (Wu et al., 2021), as the high-density point cloud data, when combined with SV data, addressed the challenge of recognising tiny objects in the environment (Zhang et al., 2017).

Other data sources representing pedestrians
Two data sources represent pedestrians: pedestrian count data and webcam data. Pedestrian counts illustrate pedestrian distribution and reflect the walkability of cities (Jacobs, 1993, pp. 271-272).
Strictly speaking, only one study used pedestrian counts data collected from the public sector through field observation (Lai & Kontokosta, 2018), while the other studies obtained this information through SV data (Chen et al., 2020;Xue et al., 2021) or GPS positioning data (Rossetti et al., 2019;Wu et al., 2020) mentioned earlier.Unlike static pedestrian count data, webcams provide continuous and dynamic data directly connected to computers, allowing researchers to reveal the spatial and temporal distribution of pedestrian patterns (Petrasova et al., 2019).

Other data sources representing environment and pedestrians
We found two distinct uses of auditory data. When defined as traffic noise or ambient sound, it was usually assessed for its correlation with pedestrian behaviours (Appleyard, 1980; Benita et al., 2020; Zhang et al., 2022). In contrast, one study interpreted ambient sound as indicative of pedestrian activities and combined it with image data to identify a high correlation between visual elements (e.g., trees, pedestrians) and ambient sound in historic towns (Xie et al., 2022).

ML for walkability
This section explores the application of ML in walkability studies.Broadly speaking, ML methods are geared towards predicting some quantities of interest based on low-level properties of a data point (Jung, 2022).Technically, these methods learn a predictor map that takes into account the features of a data point and produces a prediction for the quantity of interest.
The predictor map learnt by an ML method is constrained to belong to some hypothesis space or model.ML methods differ in the underlying design choices for data points, their features and labels, the model structure, and the loss function employed to score (and choose between) different predictor maps.
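As a compact summary of these design choices, the learning problem can be written as empirical risk minimisation. The notation below is an assumption made for illustration and does not appear in the reviewed studies: $\mathbf{x}^{(i)}$ and $y^{(i)}$ denote the features and label of the $i$-th of $m$ training data points, $\mathcal{H}$ the hypothesis space (model), and $L$ the loss function.

```latex
\hat{h} \in \operatorname*{arg\,min}_{h \in \mathcal{H}}
  \frac{1}{m} \sum_{i=1}^{m} L\big( (\mathbf{x}^{(i)}, y^{(i)}),\, h \big),
\qquad \text{e.g.} \quad
L\big( (\mathbf{x}, y),\, h \big) = \big( y - h(\mathbf{x}) \big)^{2}
```

The squared error loss on the right is one common choice for numeric labels; other losses lead to different learnt predictor maps $\hat{h}$ from the same data and model.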
We delve into the design considerations regarding data, models, and loss functions used by ML applications in walkability studies. To facilitate the discussion, we divide the data sources defined earlier (Fig. 3) into visual data, textual data and non-image/text data (Fig. 8). We excluded road network data, as it serves as a query medium for other data sources (e.g., SV data) rather than being processed by ML algorithms. In Section 5.1, we discuss ML methods for visual data (e.g., images or videos). In Section 5.2, we illustrate NLP techniques that train ML models from text data, which might also be obtained indirectly by transcription of audio recordings. Section 5.3 provides information on ML methods applied to non-image/text data (e.g., POIs or handheld device data). Besides the choice of data points, ML methods are also distinguished by their choices of models and loss functions. The corresponding design choices can be guided by the regularisation techniques discussed in Section 5.4.
These different data types correspond to specific choices for the features of data points. For example, an image is naturally characterised by pixel colour intensities, whereas text is a sequence of characters out of a pre-specified alphabet. In addition to its features, a data point is also characterised by labels, which represent higher-level information or quantities of interest. Regression methods use data points with numeric label values, while classification methods apply to data points with discrete or categorical labels. Another distinction between ML methods is whether they require training data points with known labels. Methods that require such labels are known as supervised learning, whereas methods that do not are known as unsupervised learning and include clustering or feature learning methods (Fig. 8).

Computer vision for walkability
Visual data includes images, videos, and point-cloud data.We categorised the data processing methods into five sub-classes (Fig. 9).

Image classification
Image classification models predict high-level information of images. Given the complexity of the image features, training such models requires substantial data. They were trained to predict perceptual attributes such as safety (Dubey et al., 2016; Porzi et al., 2015), liveliness (Zhang et al., 2018), boredom (Wei et al., 2022) and perceived walkability (Li, Yabuki, & Fukuda, 2022c).

Object detection
Object detection models identify objects (e.g., cars, trees, sidewalks) in images using bounding boxes (Adams et al., 2022; Song, Zhou, et al., 2022; Verma et al., 2021). The most commonly detected objects were pedestrians, which were counted to calculate pedestrian volumes (Chen et al., 2020; Xue et al., 2021; Yin et al., 2015). Edge detection, another type of detection model, can predict the edges of objects on satellite maps to build vector maps (Verma et al., 2021) or estimate the visual complexity of streets (Lee et al., 2022). Region-based convolutional neural networks (R-CNN) and Canny Edge accounted for one third of studies that used object detection (Fig. B.3).

Image generative models
Image generative models were used mainly to generate SV data (Joglekar et al., 2020;Noyman & Larson, 2020), road network data (Fang et al., 2021;Hartmann et al., 2017) or urban maps (Shen et al., 2020).Many studies here have utilised models based on generative adversarial neural networks (GANs) (Fig. B.4), which generate images by learning the features of the images (Salimans et al., 2016).However, few studies used generative models due to the challenges associated with controlling the quality of the results (Hartmann et al., 2017;Joglekar et al., 2020;Noyman & Larson, 2020).

Image clustering
Unsupervised clustering was rarely used to process images directly in the reviewed literature. Only one study used an ISO clustering model to cluster tree pixels from satellite images to assess streetscape quality (Verma et al., 2021).

Natural language processing for walkability
ML models can be trained using textual data sourced from social media platforms and labels obtained from ambient noise sources (Fig. 8). These natural language processing (NLP) methods include text classification, as well as generative methods.

Text classification
In text classification, social media data was processed through NLP models to discern neighbourhood preferences (Guhathakurta et al., 2019;Sandoval Olascoaga et al., 2016), which were then integrated with other environmental variables (e.g., walkability, connectivity) to classify neighbourhoods based on their environmental and emotional characteristics (Marsillo et al., 2022).Unlike other approaches that analysed text data directly, Xie et al. (2022) analysed emotions from auditory data to correlate them with environmental variables (e.g., vegetation, pedestrians) extracted from images.The common NLP models used here included the K nearest neighbors (KNN) and the Naive Bayes (NB) models (Fig. B.5).
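As a rough illustration of how a Naive Bayes text classifier of the kind mentioned above operates, the sketch below fits a multinomial model with add-one smoothing on a handful of invented posts. The training sentences, the labels, and the function names are hypothetical, and the whitespace tokeniser is far simpler than what a real study would use.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented posts about street environments (not taken from any dataset).
docs = [
    "lovely shaded street with wide sidewalk",
    "quiet park and pleasant walk",
    "dangerous crossing and heavy traffic",
    "noisy road with broken sidewalk",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(docs, labels)

print(predict(model, "pleasant shaded walk"))
print(predict(model, "heavy traffic on a noisy road"))
```

The per-neighbourhood share of each predicted class could then be combined with environmental variables, in the spirit of the studies cited above.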

Generative model
The generative model for text analysis can reveal the hidden structure or topics by learning from the text data.Two studies applied this method using the Latent Dirichlet Allocation (LDA) model to distinguish neighbourhoods based on Twitter users (Sandoval Olascoaga et al., 2016) and analysed Tripadvisor data for the sense of place (Song et al., 2021).

Non-image/text data processing methods
Image and textual data have a well-defined structure, given by the two-dimensional arrangement of image pixels or the sequential structure of text. However, around half of the studies used non-image/text data (e.g., POIs, handheld/wearable device data, survey data), which can be converted into numeric or string variables for training ML models that explore linear or non-linear relationships between variables.

Classification
Unlike image classification, the classification models here are based on numerical or textual data and determine non-linear relationships between variables. As a typical approach, computer vision techniques (e.g., semantic segmentation) were used to extract the values of variables from images, while labels (e.g., safety, openness, aesthetics) or perceptual scores of the images were assigned by experts. The variables were then fed into the classification models to predict perceptual scores or labels. Over a quarter of these studies used random forests (RF), which are capable of predicting labels without data scaling (Lee et al., 2022). Logistic regression (LR) models were also used to predict, for example, pleasant walking locations (Yang, Zhang, et al., 2022) or street-level walkability (Koo et al., 2022b). Support vector machines (SVM) were used to predict perceived safety (Ma et al., 2021; Porzi et al., 2015) or street greenery (Ye, Richards, et al., 2019). Wu et al. (2021) used gradient boosting (GB) to predict the categories of point cloud data, similar to studies using semantic segmentation to extract environmental variables from street images (Fig. B.7).

Clustering
Clustering models group data according to the data distribution, using distances between data points or data density (Madhulatha, 2012). Three main types of clustering models (K-means, hierarchical clustering, density-based spatial clustering) were used (Fig. B.8). Studies used clustering to classify neighbourhoods by analysing built environment variables like POIs (Deng & Yan, 2019; Le Falher et al., 2015; Zhang et al., 2019) or to find streets with similar features (Zhang et al., 2019) or space qualities (Guo et al., 2021; Marsillo et al., 2022).
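The distance-based variant can be sketched with a minimal K-means loop over hypothetical neighbourhood descriptors. The feature values, the function name, and the choice to initialise centroids with the first k points are assumptions made to keep the example short and deterministic; library implementations typically use more careful initialisation.

```python
def kmeans(points, k, iterations=50):
    """Plain K-means on tuples of floats; returns (assignments, centroids)."""
    centroids = list(points[:k])  # simplistic initialisation: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids


# Hypothetical neighbourhoods described by (POI density, POI diversity), both scaled to [0, 1].
neighbourhoods = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
    (0.90, 0.80), (0.85, 0.90), (0.95, 0.85),
]
assignments, centroids = kmeans(neighbourhoods, k=2)
print(assignments)
```

Because the result depends on how features are scaled and how many clusters are requested, these choices would need to be reported for clustering results to be comparable across studies.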

Other ML techniques used in walkability studies
High-dimensional ML models such as complex neural networks are capable of capturing useful information from raw data but also have the risk of being vulnerable to overfitting, where a trained ML model performs exceptionally well on the training set but delivers poor predictions on new data points.
The tendency of ML methods to overfit a training set typically depends on the relationship between the dimension of the ML model and the size of the training set (Jung, 2022, Ch. 6). Unfortunately, in some applications the training set cannot easily be enlarged, as collecting more data points might be too costly. Regularisation techniques address this challenge by either reducing the effective model size or increasing the size of the training set by data augmentation (Jung, 2022, Ch. 7). In the following discussion, we explore two specific regularisation techniques: feature reduction and transfer learning.

Feature reduction
Feature reduction involves decreasing the number of features that describe a data point. In general, the number of features fed into an ML model (e.g., a deep neural network) is proportional to the model size. Therefore, reducing the number of features results in a smaller model size, making overfitting less likely. Only six studies employed this method, all opting for principal component analysis (PCA). PCA can be applied to various data sources, including image data for predicting visual quality (Scepanovic et al., 2021; Wu et al., 2020), POI data for predicting walkability (Deng et al., 2020), or text data for predicting urban diversity (Marsillo et al., 2022). These instances demonstrated that PCA could facilitate data processing in walkability studies, although challenges such as interpreting non-linear relationships or selecting the optimal number of components were rarely addressed (Mwangi, Tian, & Soares, 2014).

Transfer learning
Transfer learning reuses ML models that were trained on distinct yet related learning tasks. It saves time and requires less data compared to building a model from scratch. Eight studies used transfer learning, seven of which used it to process image data to predict sidewalk quality (Adams et al., 2022; Duan et al., 2022; Weld et al., 2019), street space quality (Koo et al., 2022a; Middel et al., 2019; Wang, He, et al., 2022), or perceived walkability (Scepanovic et al., 2021). Only one study reported applying transfer learning directly from its previously trained model to predict walkability in new cities (Alfosool et al., 2022a), which highlights the underutilisation of transfer learning in walkability studies and aligns with our review findings.

Big data representing environment and user
In terms of data representativeness, big data can represent either environment variables or user-related variables. Environmental variables are often represented by certain data sources (e.g., SV data, POI data), which studies typically use directly to assess the built environment. On the other hand, user-related variables can be reflected in data collected from users (e.g., handheld/wearable device data, social media data), which studies use to examine pedestrian behaviours or perceptions.
Big data has demonstrated its effectiveness in quantifying environment variables.In particular, image data processed through computer vision techniques allow for quantitative assessment of objects in images in a short period of time.The intention of using image data is to swiftly and comprehensively identify and analyse environmental variables, as this is precisely the aspect where image recognition may outperform manual efforts.However, the benefit of image recognition is contingent upon its ability to accurately identify diminutive objects that can be potentially crucial as environmental variables (e.g., street furniture, stairs) in an image, which happens to be the current challenge (Song, Zhou, et al., 2022;Xue et al., 2021).This might also be the reason why most studies have focused on choosing variables (e.g., sky, building, road, tree) with a larger proportion of pixels in the images.Since the pedestrian perception of space is not only affected by the variables we mentioned, future research may benefit from integrating high-density point cloud data, which can accurately pinpoint small objects in space, with high-resolution satellite imagery and SV data to enhance environment evaluation.
At the same time, there remains considerable potential for research that focuses on other environmental-related data.Given that few studies have explored similar POI data for their research, coupled with the fact that POI data have multi-functions that differ according to their definitions (Wu et al., 2022), using POI data in walkability studies might skew results and overlook vital variables that affect pedestrians due to the absence of an established standard.It is necessary to quantify the hierarchical impacts of individual POI types on pedestrians by establishing a systematic standard for POIs evaluation in the future.Although data and data processing methods are flourishing, we find that the combined use of data types is underdeveloped.For example, noise and weather data, which are vital for walkability studies (Jacobs, 1993;Jan Gehl, 2013) and can be easily processed using ML, have been underutilised in research.Future studies should explore the integration of diverse data types beyond existing ones to enable more accurate assessments of urban environments.
Big data has also demonstrated its potential to understand user-related variables (e.g., perception, behaviour). Image data can be used not only to quantify environmental variables but also to explore the relationship between environmental factors and perceptual attributes. Although studies have collected perceptual attributes globally (Dubey et al., 2016), little attention has been paid to whether cultural differences influence the environmental variables that affect walking behaviours. Researchers should consider the influence of culture on pedestrian perceptions and look for differences in the effects of environmental variables on the same perceptual variables in different cultural contexts. Image data also represent pedestrian behaviours. While many studies have demonstrated the precision of object detection in identifying pedestrians (Chen et al., 2022; Yin et al., 2015) to assess their spatial distribution, few have asked to what extent image data reflect the actual number of pedestrians. Accurate image recognition becomes meaningless for analysing pedestrian distribution if the images themselves do not represent the true distribution of pedestrians in space.
We recognise the potential of social media data in perception-related studies, although its number is much smaller compared to SV data.Social media data offer insight into people's perceptions of their neighbourhoods, which may be more effective than traditional surveys (Martí et al., 2019).However, the tendency for individuals to express positive views on social media platforms (Guhathakurta et al., 2019) may mask problematic urban spaces.Unfortunately, there is a lack of research that incorporates the psychological behaviours of individuals when using social media data, which raises questions about the reliability of such data in urban studies.Meanwhile, studies that have worked to sort neighbourhoods have either used POI data to cluster service-based neighbourhoods or relied directly on social media data to cluster opinion-based neighbourhoods.There remains a gap in research on whether there is a correlation between these two types of clustering results or those based on other types of data.Future research may encounter challenges in accessing limited social media data due to personal data protection legislation.Given that social media data are increasingly being used as a substitute for traditional surveys, researchers should establish data agreements with social media platforms or individuals directly, just as respondents in surveys have the right to be informed about the processing of their information.More importantly, considering the previously mentioned limitations of social media data, researchers should pivot towards developing human-computer interactions for urban development to understand users' needs directly, rather than relying solely on interpreting random social media content to improve urban environment quality.
The effectiveness of handheld/wearable device data in studies related to walking behaviours aligns with findings of previous studies (Rout et al., 2021).However, the availability of such data remains contentious due to issues surrounding company ownership and individual privacy protection (Yamagata et al., 2019).For example, the use of mobile GPS trajectory data, which is designed to understand pedestrian route preferences at the street level (Miranda et al., 2021), is often hindered by privacy concerns or local regulations.While mobile GPS data can reflect the distribution and movement of people on a large scale, its accuracy poses challenges in computing pedestrian behavioural patterns at the street level (Rout et al., 2021).Making handheld/wearable device data publicly available while safeguarding user privacy could propel research in this area forward.However, achieving this goal will require collaborative efforts in various sectors of society.
Big data has improved the efficiency of walkability research despite its limitations.Regarding data collection related to human perception and walking behaviours, which used to be labour intensive, it is now possible to conduct more comprehensive evaluations through automated data collection methods.Nevertheless, the rapid proliferation of big data in walkability studies makes it even more important to establish standardised practices in data usage to ensure research precision, which unfortunately is lacking in current studies.

ML enhancing data processing but requiring criteria
ML methods have proven valuable in quantifying visual environment variables and uncovering correlations between these variables.Future research should leverage ML to simultaneously process multiple types of data to evaluate various facets of the urban environments.
The application of generative methods is still in its infancy due to difficulties in controlling output quality and the absence of benchmarks (Hartmann et al., 2017;Joglekar et al., 2020;Noyman & Larson, 2020).Nevertheless, we expect a growing role for generative methods in the short term as both data and models continue to evolve and be refined (Mikołajczyk & Grochowski, 2018).
Although many studies have shown that the prediction results of ML models are generally in accordance with expert opinions (Ye, Zeng, et al., 2019;Zhang et al., 2019), ML has primarily served as an aid rather than a replacement in urban studies.The ultimate decision-making authority still rests with humans.Future research should focus on developing customised ML solutions tailored to urban studies to assist urban planners rather than replace them.
Jung (2022) emphasised the importance of having distinct datasets (training, validation, testing), where the performance of a model on the test data reflects its true capabilities. However, this aspect was rarely mentioned in most studies, where researchers mainly discussed the validation results. Furthermore, few studies demonstrated how they implemented loss functions, which are used to measure model accuracy.
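The separation of training, validation, and test data can be sketched in a few lines. The function name, the 70/15/15 proportions, and the fixed seed below are illustrative assumptions rather than a recommendation drawn from the reviewed studies.

```python
import random


def three_way_split(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical identifiers of 100 labelled street view images.
image_ids = list(range(100))
train, val, test = three_way_split(image_ids)
print(len(train), len(val), len(test))
```

The validation set guides model and hyperparameter selection, while the test set is held back and evaluated only once at the end. For spatial data, a split by street or neighbourhood rather than by individual image may be a more cautious choice, since nearby images tend to resemble each other.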
Many models and datasets are not publicly available for reuse.Furthermore, common regularising tools in the ML field, such as transfer learning and feature reduction, are rarely mentioned in the reviewed studies.Given the increasing adoption of ML-based techniques in walkability studies, establishing criteria concerning aspects such as data regularisation, loss function, and data openness for ML in this domain is necessary to ensure model quality and reproducibility.
ML models are generally developed by researchers without an urban planning background, making them unsuitable for directly predicting essential urban factors such as walkability. In the future, urban studies using big data and ML should involve interdisciplinary experts to effectively address field-specific needs.

Conclusion
In our review, we introduce a framework that includes data sources, ML methods, and research purposes. Rooted in a comprehensive understanding of big data, we thoroughly examine the acquisition and specific utilisation of data, highlighting their respective strengths and weaknesses. We illustrate how various data sources represent environment variables and users. This elucidates the pivotal role of big data in walkability studies. This information extends beyond walkability studies and can be harnessed in various aspects of urban studies. For example, image data and POI data can help evaluate the quality of green spaces or public spaces. Handheld/wearable device data may hold potential for predicting different transportation modes. Social media data may offer a bottom-up assessment approach, enabling decision-makers to gain a deeper understanding of the needs of citizens. Furthermore, we systematically categorise ML methods for data processing, shedding light on their role in this context. This framework not only underscores the current challenges in walkability studies, but also guides future research directions. An objective understanding of data and ML for data processing is particularly important in the current context of burgeoning big data and applications of artificial intelligence. Our adaptable framework provides a valuable research approach, applicable to future literature reviews in other studies on urban quality or mobility. It enables exploration into the roles of big data and ML in diverse urban research domains.

Declaration of competing interest
None.

Fig. 1. The publication trend based on searched keywords from Web of Science, Scopus, and Google Scholar between 2000 and 2022.

Fig. 2. The procedure of literature selection and analysis.(1) Searching peer-reviewed literature with keywords between 2015 and 2022 in English from literature sources; (2) Initial results from three literature resources; (3) Removing duplicate articles and irrelevant articles by reading; (4) Obtaining selected literature for reviewing; (5) Analysing literature based on three components.

Fig. 4. The relationship between the proportion of SV data sources and different study purposes.

Fig. 6. The relationship between handheld/wearable device data sources and study purposes.

Fig. 7. The relationship between social media data sources and study purposes.
from images. The prediction results were then correlated with other environmental factors at the same locations. Approximately half of the studies that used image classification models applied the Visual Geometry Group (VGG) series and residual neural networks (ResNet) (Fig. B.2).

Fig. 8. Twelve data sources (excluding road network data) found in the reviewed literature are categorised into three groups (visual data, textual data, and non-image/text data) based on data structure and data processing methods (supervised learning and unsupervised learning).

Fig. 9. ML methods used in different walkability studies based on their purposes. Semantic segmentation is the most popular ML method. *Note: Yellow: ML methods for processing visual data; Blue: ML methods for other data; Green: ML methods for textual data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. B.6. The proportion of publications using regression models.

Fig. B.7. The proportion of publications using classification models for non-image and non-text data.

Fig. B.8. The proportion of publications using clustering models for non-image and non-text data.