Virtual Assessment of Physical Activity-Related Built Environment in Soweto, South Africa: What is the Role of Contextual Familiarity?

Understanding how urban environments shape physical activity is critical in rapidly urbanizing countries such as South Africa. We assessed the reliability of virtual audits for characterizing urban features related to physical activity in Soweto, South Africa. We used the Microscale Audit of Pedestrian Streetscapes Global tool to characterize pedestrian-related features from Google Street View images in four neighborhoods of Soweto. Neighborhoods were selected to represent different levels of deprivation. Inter-rater reliability was analyzed according to the raters' familiarity with the local area. Inter-rater reliability was higher among auditors with greater contextual familiarity. Many measurements, however, generated inconclusive results due to either low variability in the raters' responses or the absence of the features in the streets. Our findings indicate that virtual audits are efficient tools for assessing the built environment. However, to ensure meaningful use of these tools in diverse settings, we recommend that audit teams include people with contextual familiarity.


INTRODUCTION
By 2050, the majority of urban growth is expected to occur in low-income and middle-income countries (LMICs) (UN-Habitat, 2022). South Africa exemplifies this trend, experiencing dramatic urban expansion in recent decades (Ritchie et al., 2024). This pivotal moment of expansion, transformation in urban planning, and informal urban development directly influences the built environment, which is closely linked to residents' physical activity levels (Bauman et al., 2012; Cerin et al., 2022). Understanding these dynamics is crucial for developing cities that promote healthier lifestyles (Lowe et al., 2022).
Globally, one in four adults and four in five adolescents are insufficiently physically active despite the substantial health benefits of physical activity (Guthold et al., 2018, 2020). It was estimated that only 60% of South Africans meet the World Health Organization (WHO) recommended standards for physical activity (Basu et al., 2022). Despite the WHO target to reduce physical inactivity by 15% by 2030 (WHO, 2020) and ambitious policies to create healthy cities that will increase physical activity, several studies have found a gap between what has been implemented and what is needed to achieve these targets (Cerin et al., 2022; Lowe et al., 2022). Moreover, the lack of data regarding the built environment in LMIC cities is a barrier to creating healthy environments that support physical activity (Dixon et al., 2021).
Traditionally, the collection of built environment data has been carried out through field audits whereby assessors walk a predetermined route through a specific area and use an observational form to assess predefined environmental characteristics (Dixon et al., 2021; Phillips et al., 2017). The Microscale Audit of Pedestrian Streetscapes Global version (MAPS-Global) is one of the tools used to measure and characterize features of the built environment such as street characteristics, sidewalks, intersections, street aesthetics, and other design features which may help to explain physical activity variation within a population (Cain et al., 2018). However, where these assessments are most relevant and needed, such as in highly urbanized African cities, very little evidence exists on the nature of the built environment.
Assessing the built environment as a first step is important for measuring the association between features of the built environment and physical activity.
The use of virtual assessment tools has been advocated to reduce the time and resources required for conducting in-person audits (Fox et al., 2021; Mooney et al., 2017; Phillips et al., 2017). In high-income countries, these tools have been found to be reliable assessments of the built environment (Andersen et al., 2021; Fox et al., 2021; Phillips et al., 2017; Zhu et al., 2017). Few researchers have explored the concept of contextual familiarity (living or working in the study area versus outside the study area) when assessing the reliability of the tools (Fox et al., 2021; Vanwolleghem et al., 2016; Zhu et al., 2017). This concept of familiarity becomes paramount when conclusions about the reliability of virtual tools are drawn from studies that do not contemplate African settings (Dixon et al., 2021; Rzotkiewicz et al., 2018).
The African setting is unique because of its informality and factors including rapid urbanization. Thus, the aim of this study was to measure inter-rater reliability among raters with different levels of familiarity with a highly urbanized African city using the MAPS-Global tool.

METHODS
We conducted virtual audits of the built environment in Soweto, South Africa. Soweto is located in Johannesburg and was established in 1931 as a result of spatial segregation laws during the apartheid regime in South Africa. The region is now an urban settlement characterized by varying levels of socioeconomic deprivation with a population of approximately 1.9 million people living in a 200 km² area (SAHO, n.d.).
We collected data from four small areas within Soweto: Chiawelo, Diepmeadow, Orlando East, and Protea Glen (Fig. 1). We purposively selected the areas to provide variation in socioeconomic deprivation (two areas of higher deprivation, two of lower deprivation). We determined deprivation level based on a methodology previously outlined (Prioreschi et al., 2022).
Figure 1: Reference map of the small areas audited and their deprivation level.
To assess features of the built environment we used the global version of the Microscale Audit of Pedestrian Streetscapes (MAPS-Global) (Cain et al., 2018; Fox et al., 2021). The MAPS-Global tool was developed by researchers from the University of California San Diego and validated across countries with varying built environmental characteristics, such as Australia, Belgium, Brazil, China, and Spain (Queralt et al., 2021). The instrument comprises four sections, collected along a predefined route: (1) segment (measures block faces between intersections); (2) crossings (collects information on street intersections); (3) route (evaluates destinations and land use, streetscape characteristics, and aesthetic and social characteristics from a defined origin to a defined destination); (4) cul-de-sac (assesses amenities in dead ends).
For this study, the routes were chosen using a geographically stratified sampling design. Specifically, the selection process involved three key features: firstly, random households (extracted from the OpenBuildings dataset (Sirko et al., 2021)) were utilized as starting points; secondly, local points of interest (POIs) identified by the local team served as endpoints; and thirdly, the street network (from OpenStreetMap (OpenStreetMap contributors, 2023)) functioned as the connecting routes between these starting and ending points. Routes had a length between 400 m and 700 m. For each small area, sampled routes covered 25% of the total street network, which was considered to give an adequate representation of the built environment in that small area (McMillan et al., 2010). Data collection was conducted using the Google Street View (GSV) functionality within Google Earth Pro, where the designated routes were uploaded.
The virtual audits were conducted between April and May 2023, and data collection took place in two phases. The auditing team consisted of 10 researchers collaborating with the Global Diet and Activity Research (GDAR) network from five different countries (South Africa, Nigeria, Cameroon, the United States, and the United Kingdom). There were three categories of auditors: seven with no experience of the Soweto context (none), three who worked in the Soweto area (context), and two auditors from Soweto who had conducted field audits on the same streets nine months prior to the virtual assessment (field). The two field auditors are also counted within the context group.
All auditors participated in an online training session to standardize the data collection methodology using the MAPS-Global material (MAPS, n.d.). Subsequently, each auditor was tasked to assess seven routes in phase one and three routes in phase two. All auditors assessed the same set of routes.
Data entry was completed in REDCap, both via its online platform and mobile application. REDCap's functionality also enabled the upload of precise counts of segments and crossings (which varied by route), which were required for conducting the intraclass correlation coefficient (ICC) analysis (using R version 3.x). This study expands on the preliminary GDAR research assessing the built and food environments in four African cities (unpublished data), for which five items from MAPS-Global were incorporated into a different assessment of the food environment. Therefore, these five items were not scored in the current study. The list of items used in the current study can be found in Supplementary Table 1.
The inter-rater reliability of MAPS-Global was measured on several single-item indicators, sub-scales, valence scores (composite of positive or negative), and overall scores as described in Millstein et al. (2013). Numerical data were assessed with the ICC and categorical data with Cohen's kappa coefficients, using the "psych" package in R version 3.x. For this study, the ICC and Cohen's kappa were classified to indicate inter-rater reliability that was 'excellent' (ICC ≥ 0.75), 'good' (0.60-0.74), 'fair' (0.40-0.59), or 'poor' (< 0.40) (Cicchetti, 2001). If a feature used to create the sub-scales, valence, and overall scores was absent in more than 80% of observations (e.g., places of worship, private recreational facilities), we excluded it from the analysis, as there would not be enough variability for a correct interpretation of the ICC.
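As a rough illustration of the reliability workflow described above, the following Python sketch computes a single-rater ICC, applies the Cicchetti (2001) thresholds, and flags items that are absent in more than 80% of observations. It is not the authors' analysis, which was run in R. The ICC form used here (two-way random effects, absolute agreement, often written ICC(2,1)) is an assumption, as the text does not state which form was computed. The function names and the ratings are illustrative only, and treating the band edges as `>= 0.60` and `>= 0.40` is likewise an assumption about values falling between the published ranges.

```python
from statistics import mean


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per audited unit (e.g. a segment),
    each row holding one numeric score per rater.
    """
    n = len(ratings)        # number of audited units
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_reliability(value):
    """Map an ICC or kappa value to the Cicchetti (2001) labels used in the text."""
    if value >= 0.75:
        return "excellent"
    if value >= 0.60:
        return "good"
    if value >= 0.40:
        return "fair"
    return "poor"


def mostly_absent(values, threshold=0.80):
    """True when more than `threshold` of the observations are zero (feature absent)."""
    zeros = sum(1 for v in values if v == 0)
    return zeros / len(values) > threshold


# Illustrative ratings (not study data): 6 audited units scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc2_1(example)
print(round(icc, 2), classify_reliability(icc))  # about 0.29, "poor"
```

In this example the raters rank the units similarly but differ systematically in level, which an absolute-agreement ICC penalizes; a consistency-type ICC would give a higher value for the same data.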

RESULTS
The feasibility and operational practicality of virtual assessments in Soweto
We encountered two significant challenges in adhering to the MAPS-Global auditing procedures in phase 1 of data collection. Firstly, the coverage of several areas by GSV was incomplete, with GSV coverage for some routes as low as 14.3%, limiting our ability to complete the audits in some streets. Secondly, inconsistencies in the number of segments and crossings recorded by different auditors hindered our ability to make comparisons. These challenges contributed to fluctuations in assessment times and inter-rater reliability.
To address these issues in phase 2 of the data collection, we excluded routes with less than 75% GSV coverage. The challenge of discrepancies in the number of segments and crossings was partly derived from Soweto's street layouts, which differ from those assumed in MAPS-Global procedures. For example, some streets lack sidewalks, complicating the determination of safe pedestrian passages and crossing points. Acknowledging this limitation, the trainers standardized the number of segments and crossings for each route prior to the audit, enabling consistent comparisons between auditors.
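The phase 2 exclusion rule can be sketched as a simple filter. The example below is a minimal Python illustration under stated assumptions: coverage is taken as the share of route length with usable GSV imagery (the text does not say how coverage was measured), and the route records, field names, and values are hypothetical.

```python
GSV_COVERAGE_THRESHOLD = 0.75  # from the text: routes below 75% coverage were excluded


def gsv_coverage(route):
    """Share of the route length with usable GSV imagery (assumed definition)."""
    return route["covered_m"] / route["length_m"]


def eligible_routes(routes, threshold=GSV_COVERAGE_THRESHOLD):
    """Keep routes whose GSV coverage is at least the threshold."""
    return [r for r in routes if gsv_coverage(r) >= threshold]


# Hypothetical routes, with lengths inside the 400-700 m range used in the study.
routes = [
    {"id": "R1", "length_m": 700, "covered_m": 100},  # about 14% coverage
    {"id": "R2", "length_m": 500, "covered_m": 480},  # 96% coverage
    {"id": "R3", "length_m": 600, "covered_m": 450},  # exactly 75% coverage
]

kept = [r["id"] for r in eligible_routes(routes)]
print(kept)  # ['R2', 'R3']
```

A route at exactly 75% is kept here because the text describes excluding routes with less than 75% coverage.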
In addition to the initial challenges, we also encountered significant network issues, as the auditors' limited broadband access delayed the auditing process. During the debriefing following the second phase of data collection, some auditors revealed they used Google searches to confirm or identify elements that were difficult to discern in the GSV images, particularly concerning land use, such as the presence of amenities along the route or the type of business present.
In the second phase, a total of three routes, 19 segments, and 16 crossings were analyzed virtually by all auditors. Most of the images used were approximately one year old, although some were as old as 10 years. The second phase of data collection showed marked improvement from the first phase, with GSV coverage approaching full coverage across routes (96% vs. 76% in the first phase). The data collection process was also more efficient, with a mean assessment time for routes of 7.7 ± 6 minutes (compared with 12.3 ± 11.6 minutes in the first phase). Furthermore, the average time to assess segments and crossings was reduced to 4.1 ± 2.2 and 1.1 ± 0.1 minutes, respectively (compared with 4.9 ± 6.3 and 2 ± 2.6 minutes in the first phase), indicating a more consistent and streamlined auditing procedure. Data entries where the image date coincided with the collection day were excluded from the analysis, as they were considered a methodological error on the part of the auditors. Cul-de-sacs were not included in the analysis as there were none on the routes audited. Detailed descriptions of each route are provided in Supplementary Table 2.
The influence of familiarity on inter-rater reliability (IRR)
Overall, we found that contextual familiarity was associated with greater inter-rater reliability of virtual audits in Soweto. Figure 2 shows that for almost all the sub-scales, valence, and overall scores, inter-rater reliability was higher when the online auditors were familiar with the context. We calculated measurements for only 30 out of 41 items or sub-scales, adhering to the criterion that excluded items with more than 80% absence.
Figure 2: Inter-rater reliability for virtual MAPS-Global assessments in Soweto, delineated by color-coded thresholds.

Routes
Detailed results for route reliability subscales and valence scores are presented in Table 1. Overall, reliability for the route section was markedly lower than for the segment and crossing sections, with no clear pattern by familiarity.

Destinations and land use
We evaluated five out of eight positive sub-scales and single items, because three (place of worship, public recreation, and private recreation) had over 80% zeros. Interestingly, in the destinations and land use section, unfamiliarity with the local context was associated with higher reliability, in contrast with the valence and overall scores. Notably high agreement between auditors who were not familiar with the context was observed in the assessment of residential mix (ICC = 1.00, 95% CI [1.00, 1.00]), restaurants and entertainment (ICC = 0.83, 95% CI [0.39, 1.00]), and institutional services (ICC = 0.73, 95% CI [0.21, 0.99]).
Conversely, agreement on the numbers of schools (ICC = 1.00, 95% CI [1.00, 1.00]) and shops (ICC = 0.97, 95% CI [0.22, 1.00]) was higher among field-experienced auditors than in the other familiarity groups. Among negative subscales, none of the raters identified any age-restricted bar or nightclub. The presence of liquor or alcohol stores yielded near-perfect agreement between all familiarity groups, with the field and context groups achieving a perfect score. Upon measuring positive and negative valences along with the overall scores, the influence of familiarity became evident, as the field group exhibited higher ICC values than the other groups.

Streetscape characteristics
The streetscape's positive subscale revealed no cycling infrastructure across audited routes. Notably, the mean count for all familiarity groups was below three streetscape features per route (out of a maximum of 22), indicating a very low presence of amenities on the surveyed routes. The ICC for all familiarity groups was poor or negative; negative values imply that agreement among raters was lower than would be expected by chance alone.

Aesthetics and Social
Both the positive and negative subscales, as well as the overall aesthetics and social scale, demonstrated a lack of consensus among auditors, unaffected by their familiarity with the area. Notably, no reliability estimate exceeded 0, suggesting either random variation in ratings or agreement lower than expected by chance.

Segments
Detailed results for segment reliability subscales, valence, and overall scores are presented in Table 2. Two categories, cycling infrastructure and informal path or shortcut, had over 80% zeros, indicating a lack of these features in the audited areas. Familiarity with the local context variably influenced agreement levels for the different segment sub-scales. The field group showed higher agreement between the auditors in both the positive (ICC = 0.85, 95% CI [0.64, 0.94]) and negative (ICC = 0.85, 95% CI [0.65, 0.94]) valence scores, as well as the overall score. For all subscales except the buffer, the field or context group achieved the highest reliability.

Crossings
Detailed results for crossing reliability subscales, valence and overall scores are presented in Table 3.
The only positive crossing subscale that did not have more than 80% zeros was the intersection control and signage sub-scale, and even this reported a mean of fewer than one feature per crossing. Notably, crosswalk amenities, which are crucial for safe road crossing, had 90% zeros. In the negative subscale assessing road width, the group unfamiliar with the context showed no agreement, while those with context knowledge scored fair to excellent agreement. In the overall score, only the field auditors reached excellent reliability, while the other groups reached fair reliability.

Grand score reliability
Detailed results for the positive, negative, and final overall score are presented in Table 4. Across all three scales we observe a familiarity gradient, with the field group having higher reliability than the other groups. However, the mean scores among rater familiarity groups are notably similar, indicating that the presence or degree of context familiarity does not markedly distinguish these groups. We acknowledge the limitations of our study posed by a small sample size, due to a combination of challenges, such as image coverage and limited time resources. While certain measurements, such as the pedestrian buffers, showed a high percentage of agreement across all auditors, the statistical measures of reliability, such as ICC or kappa, indicated lower values. This discrepancy stems from limited variability in exposure: when a feature is almost always absent (or almost always present), agreement expected by chance is already high, so ICC or kappa can be low despite high percent agreement (McHugh, 2012; Zhu et al., 2017). However, the adoption of virtual auditing markedly decreased the costs associated with conducting audits, offering a more economical alternative to traditional in-field methods.
The auditors categorized as field familiarity for this study conducted in-field audits in the same area nine months prior, and the virtual audits were completed faster (unpublished data). Additionally, while in-field audits necessitated pairs of auditors for safety reasons, virtual audits allowed individuals to work solo, providing flexibility and the comfort of conducting audits from any location.
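The gap between percent agreement and kappa noted in this section can be reproduced with a small worked example. The Python sketch below uses hypothetical ratings from two raters for a rarely present feature; it is illustrative only and does not use study data. The function names are assumptions introduced for this example.

```python
def percent_agreement(a, b):
    """Proportion of units on which the two raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters rating the same units."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    categories = set(a) | set(b)
    # Chance agreement: product of each rater's marginal proportion per category.
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_expected == 1:
        return float("nan")  # no variability at all: kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical: 20 segments, feature present (1) only once per rater,
# and on different segments.
rater_a = [1, 0] + [0] * 18
rater_b = [0, 1] + [0] * 18

print(percent_agreement(rater_a, rater_b))      # 0.9
print(round(cohen_kappa(rater_a, rater_b), 3))  # -0.053
```

Here the raters agree on 18 of 20 segments, yet chance agreement is 0.905 because both almost always record the feature as absent, so kappa falls slightly below zero.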

CONCLUSION
We advocate for the thoughtful application of virtual audits in highly urbanized African cities. Utilizing raters familiar with the local context will help to ensure that the benefits of virtual audits, including efficiency, resource allocation, and safety, are realized. However, key factors need to be considered including image

Table 1
MAPS-Global - route section item-level and subscale inter-rater reliability and descriptive statistics.

Table 2
MAPS-Global - segment section item-level and subscale inter-rater reliability and descriptive statistics.

Table 3
MAPS-Global - crossing section item-level and subscale inter-rater reliability and descriptive statistics.

Table 4
MAPS-Global - grand score inter-rater reliability and descriptive statistics.

DISCUSSION
To our knowledge, this is the first study assessing the inter-rater reliability of MAPS-Global in an African urban context, and our findings highlight the importance of local knowledge in applying research tools effectively. Our findings suggest that auditors with local familiarity produced more reliable audits than their international peers. Despite the global accessibility of virtual platforms like Google Street View and Google Earth for environmental assessment, our results underscore the value of contextual familiarity in enhancing the meaningful application and rigor of research tools. Incorporating contextual familiarity in global health research is crucial and, at the same time, an ethical responsibility when the tools we use have not been validated in the contexts where we work (Canelas et al., 2024). Applying unvalidated tools without such familiarity risks oversimplifying complex realities and may lead to wrong or misleading conclusions. Rzotkiewicz et al. (2018) highlighted this gap, noting the absence of studies using Google Street View in Africa and limited research in Latin America and Asia, challenging the assumption of universal applicability for virtual audits.
This study contributes, from an African setting, to the limited and inconclusive research on rater familiarity in evaluating the built environment. Two studies, one utilizing MAPS-Global in Belgium and the other applying the S-VAT tool in Norway, demonstrated that auditors with greater contextual familiarity or those conducting audits in person reported higher inter-rater reliability (Andersen et al., 2021; Vanwolleghem et al., 2016). On the other hand, Fox et al. (2021), using MAPS-Global in five high-income countries (HICs), and Zhu et al. (2017), using MAPS-Global in the US, suggested that familiarity does not significantly affect virtual audit outcomes. Most studies that have used virtual tools to characterize the built environment have been carried out in HICs with different environmental characteristics compared to LMICs (Andersen et al., 2021; Curtis et al., 2013; Fox et al., 2021; Kelly et al., 2013; Vanwolleghem et al., 2016; Zhu et al., 2017). Their studies present findings from well-planned cities, making it difficult to draw similar conclusions for the highly urbanized and dynamic environment of the Soweto township.
Not only are LMICs underrepresented in virtual auditing, but there are also acute spatial inequalities in the amount of GSV coverage within cities depending on deprivation levels (Fry et al., 2020). Several authors have stated that virtual audits are a reliable alternative to in-person street audits, with the caveat that high coverage and updated images are needed (Fox et al., 2021; Vanwolleghem et al., 2016). Our study tackled the variability in coverage by selecting routes with at least 75% visibility. However, assessing the recency of GSV images posed a challenge, as image dates can vary widely even within the same location, depending on the viewing angle. Incorporating insights from a local team regarding acceptable image year ranges can significantly enhance the relevance of urban assessments, especially in LMICs where rapid urbanization is prevalent (Ritchie et al., 2024; UN-Habitat, 2022). This dialogue with the local auditors is crucial due to the continuous and fast-paced urban changes, underscoring the necessity for up-to-date imagery in audits of the built environment. Our auditors faced several challenges, including issues with image quality, outdated images, blurriness, and obstructions, similar to studies elsewhere (Andersen et al., 2021; Fox et al., 2021; Rzotkiewicz et al., 2018).
An unexpected challenge emerged from feedback sessions: some auditors resorted to using search engines to identify unclear elements in images. This practice potentially introduced inconsistencies in the auditing process, emphasizing the need for clearer guidelines in the training to ensure uniformity in virtual environmental assessments (Fox et al., 2021; Griew et al., 2013; Gullón et al., 2015).
Although it is not unusual to find a high percentage of absence in some microscale features (Fox et al., 2021; Phillips et al., 2017), many of the MAPS-Global features were absent from our study site. This highlights the lack of many essential amenities in these low-resourced settings but also raises the concern of whether MAPS-Global was indeed the correct tool for our study site. The choice of MAPS-Global was made collectively by the GDAR Network members (GDAR, 2023). To enhance representativity, future studies using global audit tools should consider the differences within and between neighborhoods, regions, and countries. It is important to note that developmental patterns, urbanization levels, land uses, and socioeconomic statuses of residents have different definitions, interpretations, and representations across the world.
Similar to other virtual audits of the built environment, subjective features such as street or building aesthetics had the lowest inter-rater reliability across all the MAPS-Global measurements (Andersen et al., 2021; Fox et al., 2021; Gullón et al., 2015; Zhu et al., 2017). Our findings indicated that inter-rater reliability was highest in the land use section of the routes, similar to results from Zhu et al. (2017). The high frequency of null responses in the crossing section (indicating a lack of infrastructure to facilitate road crossing) is notable in a country where pedestrians constitute almost 40% of road traffic fatalities (International Transport Forum, 2019). Additionally, a study in a low-income community in South Africa showed that half of the children walking to school alone reported experiences with pedestrian collisions (Koekemoer et al., 2017).