Objective assessment of urban built environment related to physical activity — development, reliability and validity of the China Urban Built Environment Scan Tool (CUBEST)

Background Some aspects of the neighborhood built environment may influence residents’ physical activity, which in turn, affects their health. This study aimed to develop an urban built environment evaluation tool and conduct necessary reliability and validity tests. Methods A 41-item urban built environment scan tool was developed to objectively assess the neighborhood built environment features related to physical activity. Six neighborhoods in Hangzhou were selected from three types of administrative planning units. A pair of auditors independently assessed all of the 205 street segments at the same time. Half of the segments (n = 104) were audited twice by the same auditor after a two-week time interval. Inter-rater reliability was assessed by comparing the audits of paired observers, while intra-rater reliability was evaluated by comparing an auditor’s repeated assessments of the same segments. The construct validity was tested using factor analysis. Results The inter-rater reliability for most items was above 0.8. The intra-rater reliability for most items was above 0.4, and was lower than corresponding inter-rater reliability. Six factors were extracted by factor analysis and the factor loading matrix showed good construct validity. Conclusions The CUBEST is a reliable and valid instrument that can be used to assess the physical activity-related built environment in Hangzhou, and potentially other cities in China.


Background
Physical inactivity is an important, modifiable behavioral risk factor for non-communicable chronic diseases [1]. Epidemiological studies have shown that physical inactivity is associated with increased risks of obesity, diabetes, cardiovascular disease, and other chronic diseases [2,3]. A growing number of studies have focused on the ecological context of physical activity [4], i.e. the influence of the residential built environment on physical activity patterns [5].
The built environment, that is, the physical form of communities, is usually described in terms of six dimensions: (1) residential density; (2) street connectivity; (3) accessibility to destinations (land-use mix) and services; (4) walking and cycling environment; (5) aesthetic quality; and (6) safety. Together, these elements shape access to opportunities for physical activity. Three types of data are usually used to measure built environment attributes believed to be related to physical activity: (1) perceived (subjective) measures obtained through questionnaire-based personal interviews; (2) observational (objective) measures obtained using systematic scans or audits; and (3) archival data sets that are often layered and analyzed with GIS [6].
China is experiencing unprecedented rates of urbanization, which have contributed to a decline in physical activity [7] and dramatic changes in the built environment. For example, sprawling development, transportation infrastructure that emphasizes driving, and inconvenient, unsafe, and unpleasant public open spaces have discouraged active transportation behaviors such as walking and bicycling [8]. Thus, built environment characteristics must be assessed accurately before their correlations with physical activity in China can be understood. Several instruments have been developed for assessing the built environment in developed countries, yet few have been developed in China [9]. Furthermore, Chinese and other ultra-dense Asian cities have unique built environment features that could lead to different associations with physical activity. These include, but are not limited to, mixed types of residential housing; crowdedness; a complex public transport network; a great number of overpasses; bike parking facilities; man-made obstacles on sidewalks and bike lanes; and unique types of destinations (e.g., historical buildings, tea houses, open-air food outlets). In addition, even the same built environment feature could correlate with physical activity differently in China than in Western countries. For example, a positive relationship is usually observed between residential density and physical activity in Western countries, but studies have suggested that densely settled Chinese cities could hinder leisure-time physical activity because of decreased availability of physical activity resources and increased concerns about traffic safety [10]. These differences limit the applicability of existing instruments to the Chinese urban built environment. Therefore, an instrument adapted specifically to the urban built environment in China is needed.
In this study, we created an adapted instrument (observational measures obtained using systematic scans or audits) based on a review of existing reliable auditing tools. This instrument, the China Urban Built Environment Scan Tool (CUBEST), is designed specifically for assessing the urban built environment related to adult physical activity in mainland China. This article reports the reliability and validity assessment of the CUBEST conducted in the city of Hangzhou, China.

Study site
The city of Hangzhou, the capital of Zhejiang Province, is situated in the southeast coastal area of China. Hangzhou has jurisdiction over eight districts, three county-level cities, and two counties. It is an economically developed city, and its comprehensive economic strength ranked eighth among all large- and medium-sized cities in China in 2011. By the end of 2011, the long-term resident population of the city was 8.74 million, of which the urban population was 6.45 million, accounting for 73.9% [11]. Two centrally located districts of Hangzhou, Shangcheng District and Xiacheng District, were included to test the reliability and validity of the CUBEST.
All administrative planning units in these two districts are classified into five categories [12] based on the degree of land-use mix and the service capacity of public buildings. Public buildings usually comprise buildings used for government, commercial, educational, transport, and health care purposes. A Type I unit is characterized by fully developed commercial and residential areas with a dense population and mixed land use. A Type II unit has developed but scattered public buildings and lacks comprehensive service capacity. A Type III unit is characterized by partly developed, single-function public buildings. Type IV and Type V units are mainly composed of farmland and industrial storage warehouses and were excluded from this study. A typical neighborhood in most urban areas of China is square or rectangular in shape and 0.2 to 0.5 square kilometers in area. In this study, we extended 400 meters out from each side of the neighborhood boundaries to form a study area of 1.0 to 1.5 square kilometers. Two neighborhoods were selected in each of the three types of units, and all the street segments in these six extended study areas were evaluated using the environmental audit instrument. A street segment was defined as a section of street or road between two intersections, with a maximum length of 400 meters.

Development of the CUBEST
The CUBEST was designed based on a review of existing reliable instruments developed since 2000, including the Analytic Audit Tool [13], Active Neighborhood Checklist [14], SSO [15], PIN3 Neighborhood Audit Tool [16], Irvine-Minnesota Inventory [17], NALP [18], EAST_HK [9], SPACES [19], PEDS [20], WABSA [21], Sidewalk Assessment Tool [22], and PARA [23]. Table 1 summarizes these instruments for assessing the neighborhood built environment related to physical activity. An item pool was generated mainly from the Analytic Audit Tool, Active Neighborhood Checklist, and EAST_HK after considering the date of development, sample size, comprehensiveness of dimensions, survey method, and time cost of the existing instruments. The CUBEST covers six dimensions: residential density, street connectivity, accessibility (land-use mix), sidewalk quality, bike lane quality, and aesthetics. Items belonging to the dimension of safety from traffic were integrated into the dimensions of sidewalk quality and bike lane quality. Necessary adjustments were also made to fit the characteristics of Chinese urban settings. Items with Chinese and local characteristics were added, such as "Cycling or walking against the flow of traffic", "Cultural features (historical buildings) in the neighborhood", "Government sponsored public recreational equipments", "Tea house", and "West Lake or Beijing-Hangzhou Grand Canal". Physical activity facilities were categorized into 9 kinds of destinations based on the PANES questionnaire [24].

On-site evaluation
The validation of the CUBEST was conducted in Hangzhou from October to December 2011. Field audits of street segments were conducted by two raters who were also involved in the development of the CUBEST. A standard operating procedure for the environmental audit, with detailed written instructions and field pictures, was developed to ensure uniform evaluation. A two-day intensive training course was held to make the raters proficient at completing the measures; it covered the principles, operation, and potential problems and solutions of the CUBEST and the GPS positioning device. Before the evaluation of the whole sample began, the two raters independently completed a pilot evaluation of two neighborhoods (about 60-80 street segments) using the CUBEST. Any discrepancies were resolved by another developer of the CUBEST. In the formal survey, the two raters independently evaluated a total of 205 street segments at the same time. Approximately half of the street segments (n = 104) were randomly sampled and independently scanned a second time by the two raters after a two-week interval. All environmental scans were conducted during daylight hours. The average time required for data collection was 7.4 minutes per segment.

Data analysis
Intra-rater reliability (i.e., test-retest reliability) was evaluated as the consistency of judgments made by the same rater over a period of time, and inter-rater reliability was assessed as the level of agreement achieved by independent raters. Intra-class correlation coefficients (ICCs) were calculated to assess reliability for continuous and ordinal variables. Agreement for dichotomous variables was assessed using Cohen's Kappa statistic [25]. The adjectival ratings suggested by Landis and Koch [26] were followed: 1.0 to 0.8 (almost perfect agreement), 0.8 to 0.6 (substantial agreement), 0.6 to [...] [27]. For items with zero or little variance, the percentage of observed agreement between the two raters was used to assess reliability. The criterion for a good level of agreement was set at no less than 75%, as used by Pikora and colleagues [19]. Construct validity refers to the extent to which an instrument measures what it claims to measure. Exploratory factor analysis (EFA) with equamax orthogonal rotation was applied to assess construct validity. The appropriateness of factor analysis was assessed by a Kaiser-Meyer-Olkin (KMO) value greater than 0.7 and a significance level below 0.05 for Bartlett's test of sphericity. Factor loadings with an absolute value greater than 0.40 were considered significant. PASW version 18.0.0 (IBM Corporation, Somers, NY, USA) was used for data analysis.
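As an illustration of the agreement statistics described above, the following Python sketch computes the percentage of observed agreement and Cohen's Kappa for a single dichotomous item coded by two raters. The function name and the example codes are hypothetical and are not taken from the study data; the analysis itself was run in PASW.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length rating lists."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of segments on which both raters gave the same code.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        # Zero variance: kappa is undefined, so only percent agreement is usable.
        return p_o, None
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical codes (1 = feature present, 0 = absent) for ten street segments.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
agreement, kappa = cohen_kappa(rater_1, rater_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

The `None` return for the zero-variance case mirrors the rule stated above: when an item shows no variance, Kappa cannot be computed and only the percentage of observed agreement is reported.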

Inter-rater reliability
A total of 309 street segments (205 for the first scan and 104 for the second) were assessed to test inter-rater reliability. The results of inter-rater reliability are shown in Table 2 and Table 3. Seven items were not suitable for computing the Kappa statistic or ICCs because of zero variance or data distribution characteristics; nevertheless, all of them had a very high percentage of agreement (all > 85%). For items with adequate variance, the Kappa statistic was used to assess inter-rater reliability of 25 dichotomous items and ICCs were used for 17 ordinal items. For Kappa values, eighteen out of 25 (72%) dichotomous items reached an almost perfect level of agreement, and the remaining 7 items (28%) had a substantial level of agreement. Twelve out of 17 (71%) items using ICCs reached a good level of agreement, and 3 items (22%) were at a fair level. Two items, "Evenness and Bike lanes maintenance" and "Cigar butts or discarded cigarette", showed poor reliability. All six dimensions reached a good level of inter-rater reliability (all ICCs > 0.75, p < 0.001). When the PABAK coefficient was applied, the inter-rater reliability showed varying degrees of improvement, especially for the items with a higher prevalence index.

Intra-rater reliability (test-retest reliability)
A total of 208 street segments (104 segments for each rater) were evaluated to test intra-rater reliability. Table 2 and Table 4 show the results of intra-rater reliability for each item. The percentage of agreement was calculated for 9 items with zero variance or specific data distribution, all of which had a very high percentage of agreement (Evenness and sidewalks maintenance = 83%, the remaining 8 items > 90%). For the 24 dichotomous items with adequate variance, the Kappa statistic was used to assess intra-rater reliability, and ICCs were used for the other 16 ordinal items. Nineteen out of 24 (79%) items using the Kappa statistic reached at least a moderate level of intra-rater reliability. Four items showed a fair level and one item fell into the poor category ("Abandoned building or vacant lot"). ICCs for the 6 dimensions were: residential density (0.563), street connectivity (0.721), accessibility (0.758), sidewalk quality (0.360), bike lane quality (0.587), and aesthetics (0.710), all of which were statistically significant (all p < 0.001). After calculation of the PABAK coefficient, fifteen out of 23 items reached higher levels of intra-rater reliability.
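To make the dimension-level ICCs more concrete, the sketch below computes an intra-class correlation from repeated scores. The study does not state which ICC form was used; this example assumes a one-way random-effects, single-measure ICC, often written ICC(1,1), and the function name and scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects, single-measure ICC for n targets rated k times each."""
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 targets, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Between-target mean square: how much segments differ from one another.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: how much repeated scores of one segment differ.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        return None  # no variance at all, so the ICC is undefined
    return (ms_between - ms_within) / denominator


# Hypothetical (first audit, second audit) dimension scores for six segments.
audits = [(3, 3), (2, 2), (4, 3), (1, 1), (3, 4), (2, 2)]
icc = icc_oneway(audits)
print(f"ICC = {icc:.3f}")
```

Other ICC forms (for example two-way models) would give different values for the same data, so the model should be reported alongside the coefficient.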

Construct validity
The KMO value in the present analysis was 0.758, which was within the range of "acceptable" values. Bartlett's test of sphericity yielded a test statistic of 1593 and was significant at p < 0.001, which indicated significant correlation among the survey items. These results supported identifying the factor model using the EFA approach.
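For reference, the two appropriateness checks can be written in their standard textbook form. The symbols below are introduced here for illustration and do not appear in the original analysis: R is the p × p correlation matrix of the items, n is the number of observations, r_ij are the elements of R, and a_ij are the partial correlations obtained from the inverse of R.

```latex
% Bartlett's test of sphericity (approximately chi-square distributed under H0: R = I)
\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert R \rvert,
\qquad \mathrm{df} = \frac{p(p-1)}{2}

% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}},
\qquad a_{ij} = -\frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}},
\quad S = R^{-1}
```

A KMO value close to 1 means that the partial correlations are small relative to the raw correlations, which is the situation in which a factor model is expected to fit well.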
The EFA yielded 6 factors with eigenvalues greater than 1, which together explained 60.0% of the variance. The rotated component matrix, after suppressing coefficients with an absolute value below 0.40, is shown in Table 5. The 6 components matched the items designed for them almost perfectly, which suggests very good construct validity of the CUBEST. Nevertheless, the main factor loadings of a few items (or summary scores of items) deviated from their original design. These were "Presence of bus or other transit stops" designed for street connectivity, "Evenness and sidewalks maintenance" designed for sidewalk quality, "Crossing aids for pedestrians and bicyclists" designed for bike lane quality, and "Cultural features" and "Nature sights" designed for aesthetics.
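The comparison between each item's main loading and its designed dimension can be sketched in a few lines of Python. The loading values, item names, and the `primary_factors` helper are hypothetical and only illustrate the procedure; they are not the values in Table 5.

```python
def primary_factors(loadings, threshold=0.40):
    """Map each item to the index of its largest absolute loading.

    Loadings with an absolute value below `threshold` are suppressed;
    an item with no remaining loading maps to None.
    """
    result = {}
    for item, row in loadings.items():
        kept = {k: abs(v) for k, v in enumerate(row) if abs(v) >= threshold}
        result[item] = max(kept, key=kept.get) if kept else None
    return result


# Hypothetical rotated loadings of three items on two factors.
loadings = {
    "item_a": [0.72, 0.10],
    "item_b": [0.35, 0.61],
    "item_c": [0.12, -0.08],
}
# Hypothetical factor each item was designed to measure.
designed = {"item_a": 0, "item_b": 0, "item_c": 1}

assigned = primary_factors(loadings)
deviating = [
    item for item in loadings
    if assigned[item] is not None and assigned[item] != designed[item]
]
print(assigned)
print(deviating)
```

Items that end up in `deviating` correspond to the kind of item listed above, whose main loading falls on a factor other than the one it was designed for.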

Final Scan Tool
Six dimensions and 41 items (Q1 to Q41) were included in the final CUBEST specific to Hangzhou, as shown in Table 6. Items with zero or little variance were excluded because they discriminated little among street segments in Hangzhou; these were "Abandoned building or vacant lot", "Slope of sidewalks", "Slope of bike lanes", "Whole or broken liquor bottles or cans", and "Unattended pets or lost animals". However, these items should be regarded as back-up items and added when applying the CUBEST to other Chinese cities. For example, slope of sidewalks and bike lanes should be added when assessing the built environment in cities with hilly terrain, such as Qingdao and Chongqing.

Discussion
In this study, we developed an instrument designed specifically for measuring the urban built environment related to physical activity in the Chinese context and conducted the necessary reliability and validity tests. In general, most of the items demonstrated at least a substantial level of inter-rater reliability (Kappa > 0.6, ICCs > 0.75), as well as fair to acceptable levels of agreement for intra-rater reliability. Of the 6 dimensions, the accessibility dimension showed the highest reliability, which was consistent with Brownson's study [13]. A low ICC or Kappa statistic could be attributed to actual low reliability, a change of the environment between the first and the second audit, the subjectivity of judgments, and/or little variation across segments [28]. Items measuring access to destinations changed little over time, which ensured higher intra-rater reliability. By contrast, some items measuring aesthetics, sidewalk quality, and bike lane quality varied over time (e.g., "Obstructions on bike lanes" or "Cigar butts or discarded cigarette"), which resulted in lower intra-rater reliability. To minimize the variation in the street environment over time, a two-week interval between the first and second audit was chosen, which also provided a sufficient washout period to avoid a memory effect. In addition, the subjective judgment required for a few items can lead to lower inter-rater reliability despite proper auditor training [9,22]. These items included obstacles on sidewalks and bike lanes and aesthetic items. Little variance or certain data distributions can also cause problems (e.g., a lower Kappa statistic due to a high prevalence effect). The PABAK coefficient was reported alongside the original Kappa coefficient, which was especially necessary when the prevalence index or the bias index was high. The item "Presence of community park" showed very high observer agreement (96%); however, the original Kappa coefficient was only 0.316 because of a high prevalence index. The PABAK, which adjusts for the prevalence effect, reached 0.923, much higher than the unadjusted value. Adjustments for prevalence and bias effects increased the Kappa coefficient in 4 items in the inter-rater reliability test and 15 items in the intra-rater reliability test. Hoehler [29] is critical of the use of the PABAK because the effects of bias and prevalence on the magnitude of kappa are themselves informative and should not be adjusted for. Alongside the obtained values of Kappa and the PABAK, we also reported the prevalence index and bias index to provide more valuable information, as recommended by other authors [27,30].
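The prevalence effect described for "Presence of community park" can be reproduced with a small worked example. The sketch below uses the usual definitions for a 2 × 2 agreement table (PABAK = 2 × observed agreement − 1, prevalence index = |a − d| / n, bias index = |b − c| / n). The cell counts are hypothetical, chosen only to mimic the reported pattern of very high agreement with a low Kappa; they are not the study data.

```python
def agreement_indices(a, b, c, d):
    """Agreement statistics for a 2x2 table of two raters' yes/no codes.

    a = both yes, b = rater 1 yes and rater 2 no,
    c = rater 1 no and rater 2 yes, d = both no.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # Chance agreement from the two raters' marginal proportions.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = None if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
    return {
        "agreement": p_o,
        "kappa": kappa,
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
        "pabak": 2 * p_o - 1,
    }


# Hypothetical table: the feature is present on almost every segment.
stats = agreement_indices(a=95, b=2, c=2, d=1)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these counts the two raters agree on 96% of segments, yet Kappa is only about 0.31 because chance agreement is already about 94% when nearly all segments are coded "yes". The PABAK of 0.92 removes that prevalence effect, which is why the prevalence and bias indices are worth reporting next to both coefficients.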
Factor analysis was used to test construct validity. The six extracted principal components were broadly consistent with the theoretical framework of the CUBEST. A few items showed larger factor loadings on dimensions to which they did not belong; however, these results can be explained. For example, commercial buildings are usually located where streets are well connected to increase business visibility. Correspondingly, the commercial destination items loaded highly on the factor of street connectivity instead of accessibility. Similarly, "Evenness and sidewalks maintenance" may reflect not only sidewalk quality but also aesthetic quality. Likewise, "Presence of bus or other transit stops" loaded strongly on bike lane quality instead of street connectivity because public bicycle rental stations are usually built near bus stops to facilitate traffic movement.
A 91-item Environment in Asia Scan Tool-Hong Kong (EAST_HK) was developed by Cerin et al. to objectively assess aspects of the neighborhood environment hypothesized to affect walking in Hong Kong and similar ultra-dense Asian metropolises [9]. The CUBEST differs from the EAST_HK in several respects. First, items hypothesized to affect cycling behavior were added to the CUBEST, given that cycling remains one of the most popular forms of transport in China. Second, the EAST_HK included 41 items as a list of destinations, whereas the CUBEST focuses on the two most important types of destinations, namely commercial destinations (8 items) and physical activity destinations (8 items), and simplifies the other types of destinations to keep the questionnaire as short as possible. A simplified instrument would be a more feasible option for large-scale surveys. Third, items measuring safety from crime were not included in the CUBEST because they are hard for raters to judge during a short stop in the street. In addition, further items were added, such as the presence and number of transit stops, to better capture variation in street connectivity.
We acknowledge the limitations of this study. One is the lack of a criterion validity test for the instrument. GIS data could be used as a criterion measure to test the validity of residential density, street connectivity, and land-use mix. Unfortunately, poor accessibility and high cost make the use of GIS data impossible in this and many other studies conducted in China [31]. The other limitation is that this study took place in one city in China, which may limit the application of the CUBEST to other Chinese cities. However, we evaluated the CUBEST in three different types of administrative planning units, which to some extent ensured enough variation in the environmental features. When applying the CUBEST in other Chinese cities, we suggest that some items be added or modified to capture built environment features specific to those cities.