The influence of socioeconomic characteristics on active travel in US metropolitan areas and the contribution to health inequity [version 1; peer review: awaiting peer review]

Background: Approximately six in 10 US adults have a chronic condition, and the prevalence of chronic disease varies across socioeconomic groups. Walking or cycling reduces the risk of many of these diseases and is influenced by the built environment, accessibility, and safety. Methods: We performed multivariate logistic and linear regression on the Health-Oriented Transportation model parameters using the 2009 and 2017 US National Household Travel Surveys, restricted to adults in major metropolitan areas. Model covariates included socioeconomic and environmental characteristics. Results: Using odds ratios (OR) adjusted for model covariates, we observe several significant variables in 2009 and 2017. Residents of households with no cars were more likely to walk or cycle than those with two cars; OR=5.4 (4.8, 6.0). Residents of households in a census block with population density greater than 25,000 persons/square mile were more likely to walk or cycle than those with a population density of 2,000 to 3,999; OR=2.6 (2.3, 2.8). Individuals with a graduate or professional degree were more likely to walk or cycle than those with a high school degree; OR=2.1 (1.9, 2.2). Individuals that self-report as Black or African American, or


Introduction
Currently in the USA approximately six in 10 adults have a chronic condition such as heart disease, stroke, cancer, or diabetes (US Centers for Disease Control and Prevention). In 2020, seven of the top 10 causes of death were chronic diseases 1 . Disparities in the prevalence of chronic disease across race and ethnicity are a persistent problem in the United States, with some studies suggesting that there has been no improvement in the last 20 years [2][3][4] . Risk factors for chronic disease are known to include tobacco use, poor diet, and physical inactivity 5 .
Here we focus on the physical inactivity risk factor for chronic disease and examine how socioeconomic factors affect transportation-related physical activity and the corresponding potential health inequity.
Studies have shown that commuting by walking or cycling provides sufficient physical activity to reduce the risk of some chronic diseases 6,7 . A meta-analysis of commuting by walking or cycling and cardiovascular disease reported an 11% reduction in cardiovascular risk (RR=0.89) in those who commute by walking or cycling compared to those who do not 6 . In 2009 and 2017, the percentage of adult Americans meeting the minimum aerobic physical activity guidelines published by the US Centers for Disease Control and Prevention (>150 minutes of moderate-intensity exercise per week) was only 47.2% (46.2%-48.2%) and 54.1% (52.9%-55.2%), respectively 8 .
Participation in transportation-related walking or cycling (active travel) is known to be influenced by the built environment and access to public transportation 9,10 . Less is known, however, about which sociodemographic characteristics are associated with active travel after controlling for known environmental factors. Using the National Household Travel Survey data from 2009 and 2017, we identified clear disparities in active travel across race, education, and household income in major US metropolitan areas in both years, which have not been previously reported in the literature. Our analysis is the first to restrict the sample to major metropolitan areas, and one of the few to use regression models with social and environmental covariates. We use the Health-Oriented Transportation (HOT) model to estimate how these disparities may affect the health of US subpopulations 11 .

Data Description
We used publicly available US National Household Travel Survey data from 2009 and 2017 distributed by the US Department of Transportation, Federal Highway Administration. The 2009 survey is a one-day, list-assisted, stratified, random-digit-dialing survey of households with landline telephones with a sample size of 150,147 households [12][13][14] . The 2017 survey is also a one-day survey, but is the combination of an address-based national sample of households with 13 add-on statewide samples purchased by various states/Metropolitan Planning Organizations (MPOs) for a total of 129,696 households (Arizona; California; Dallas-Fort Worth, Texas; Des Moines, Iowa; Georgia; Maryland; New York; North Carolina; South Carolina; Texas; Tulsa, Oklahoma; Waterloo, Iowa; and Wisconsin) 14 . We restricted the sample to adults aged 19 to 65 years old. The prevalence of active travel in non-urban areas of the USA is significantly less than in urban areas, so we further restricted our analysis to households in metropolitan areas with a population size greater than one million 12 .
Values for the race variable were recorded as the respondent's choice from among a predefined list defined by the survey designers. In 2009, the choices were "White", "African American, Black", "Asian", "American Indian, Alaskan Native", "Native Hawaiian, or other Pacific Islander", "Multiracial", "Hispanic/Mexican", "other", "refused", and "don't know." Respondents were asked to report which one best describes their race. In 2017, the choices were "White", "Black or African American", "Asian", "American Indian or Alaskan native", "Native Hawaiian or other Pacific islander", "Some other race", "I don't know", and "I prefer not to answer." The respondent was asked to select all that apply. Individuals that selected multiple values in 2017 along with respondents who selected "Multiracial" in 2009 are identified in the data as "Multiple Responses Selected."
In 2009, only the household respondent was asked to report their race. This value was used for all members of the household. Because "Hispanic/Mexican" was not an option in 2017, respondents who chose "Hispanic/Mexican" in 2009 were grouped with "other." See the Limitations section for more discussion on the category labels. The Hispanic variable was coded as true if the respondent identified as being of "Hispanic or Latino Origin" in 2017, or of having "Hispanic status" in 2009. As with race, the 2009 Hispanic variable was recorded only for the household respondent and used for each person in the household.
Table 1 provides summary statistics for selected social and environmental variables in the sample. Social variables include sex, age, race, Hispanic status, race by Hispanic status, household income, and education. The environmental variables included in Table 1 are the discretized population density for the household's census block and the number of cars per household (0, 1, 2, 3+).

Prevalence and participation
As in Younkin et al. (2021), we made a distinction between prevalence and participation in active travel 11 . The prevalence of active travel is a one-day snapshot of the proportion of active travelers, while participation is the proportion of active travelers over one week. An active traveler is defined as a respondent who reported either a walk or cycle trip on the survey day. We estimated 95% confidence intervals for the prevalence of active travel among adults in US metropolitan areas, stratified by social and environmental variables. We used the sample weights provided in the NHTS data set. Two self-reported values for the number of trips taken by walking and cycling over the past week are included in the NHTS data, and we use these to estimate participation as the proportion of respondents with at least one walking or cycling trip over the last week. The true prevalence, p, can never exceed the true participation, π, because p = fπ, where 0 ≤ f ≤ 1 and f is the frequency of active travel. Since we have independent estimates for participation and prevalence, we can estimate the frequency and use it to estimate a weekly rate of physical activity due to travel, which is referred to here as travel activity.
The ratio of prevalence to participation may be used to estimate the frequency of active travel and is 0.286 and 0.264 in 2009 and 2017, respectively, or approximately once every 3-4 days.
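As a quick check of the arithmetic above, the frequency f = p/π can be inverted to give the average number of days per active-travel day. The following is a minimal Python sketch; the function name is illustrative, and the published analysis itself was carried out in R.

```python
def days_between_active_travel(frequency: float) -> float:
    """Average number of days per active-travel day, given f = p / pi."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must lie in (0, 1]")
    return 1.0 / frequency


# Frequencies reported in the text for 2009 and 2017.
for year, f in {2009: 0.286, 2017: 0.264}.items():
    print(year, round(days_between_active_travel(f), 2))
```

Both values fall between 3 and 4 days, in line with the "once every 3-4 days" summary.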

Regression models
We fit three multivariate regression models (prevalence, participation, and intensity) that include social and environmental variables, allowing us to estimate the effect of social variables after controlling for confounding due to environmental variables with strong effects. To represent different environments across the USA, we include variables that serve as surrogates for variations in infrastructure and city design (state), climate (state × season), access to businesses (population density), and access to personal automobiles (number of cars per household). All three models used the variables listed in the data description along with a state variable and a state by season cross-product. The state variable serves as a surrogate for differences in climate, infrastructure, and city design, while the state by season cross-product accounts for seasonal differences. Subjects were classified as having engaged in active travel if they reported any walking or cycling on the survey day, i.e., nonzero travel activity. We used logistic regression for both the prevalence and participation models.
Travel activity is defined as an individual's amount of physical activity due to active travel, measured as a weekly rate in terms of MET-hours/week. An active traveler is defined as an individual with nonzero travel activity, and the travel intensity is a population-level measure defined as the mean of travel activity among active travelers 11 . Travel intensity varies across social and environmental variables, albeit not as drastically as prevalence and participation. Travel activity is modeled using a log-normal distribution; therefore, we use the logarithm of travel activity in our regression models 11 . Use of the state-level location also allows us to account for the over-sampling that occurred in thirteen of the states. Without the inclusion of a state variable, some states would exert greater influence on the overall result than others, making for a poor representation of the whole population.
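To make these definitions concrete, the sketch below computes the share of active travelers, the travel intensity, and the log-transformed travel activity for a small sample. It is an unweighted illustration in Python with made-up values; the survey weights, the regression covariates, and the authors' R code are not reproduced, and the function name is hypothetical.

```python
import math


def summarize_travel_activity(travel_activity):
    """Return (share of active travelers, travel intensity, log travel activity).

    travel_activity: MET-hours/week per person, with 0 for non-active travelers.
    """
    active = [ta for ta in travel_activity if ta > 0]
    if not active:
        raise ValueError("no active travelers in the sample")
    share_active = len(active) / len(travel_activity)
    intensity = sum(active) / len(active)  # mean among active travelers only
    log_activity = [math.log(ta) for ta in active]  # response on the log scale
    return share_active, intensity, log_activity


# Hypothetical sample of eight respondents (values are illustrative only).
sample = [0.0, 0.0, 0.0, 2.0, 4.0, 0.0, 8.0, 0.0]
share, intensity, logs = summarize_travel_activity(sample)
print(share, round(intensity, 2), round(sum(logs) / len(logs), 3))
```

Only the nonzero values enter the intensity and the log-scale response, which is why the zero mass is handled separately by the logistic models.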

Health estimates
We quantified differences in active travel among various US socioeconomic groups and the corresponding potential difference in all-cause mortality rates using the HOT model, a comparative risk assessment of a change in the distribution of travel activity 11 . The HOT model uses the distribution of leisure time physical activity from Arem et al. (2015) and compares it to the distribution found by adding (or subtracting) the difference in travel activity between two populations 17 . These two distributions, along with an exposure-response function for all-cause mortality also estimated from Arem et al., make up a comparative risk assessment which allows the estimation of the population attributable fraction 17 . The HOT model assumes that travel activity is distributed as a mixture of a log-normal distribution and a point-mass at zero and that all new travel activity is additional physical activity with no substitution from other domains 11 . We tabulate these estimates in Table 2. For comparison across population groups, we set the baseline reference group as White, non-Hispanic, males, with a high school education, residing in California, in the fall, in a census tract with 2,000 to 3,999 people per square mile, with a household income of USD 35,000-49,999 and two cars available to the household.
In Table 2, participation is the proportion who engage in active travel, intensity is the strenuousness of the active travel, and frequency is the frequency of active travel among active travelers. Overall Travel Activity (TA) is the product of participation and intensity and represents the travel-related physical activity for the entire subpopulation. The reference group in both models is made up of White, non-Hispanic, male residents of California with a household income of USD 35,000-49,999, no more than a high school education, living in a census tract with population density of 2,000 to 3,999 people per sq. mi., two cars available to the household, and surveyed in the fall. Each of the seven non-reference subpopulations is created by changing one variable from the reference group. ΔTA is the change in TA from the reference subpopulation. PAF (population attributable fraction) is estimated using the HOT model and an exposure-response function for all-cause mortality and represents the proportional change in all-cause mortality.
The prevalence and participation for each subgroup are found as $p_j = e^{\beta_0 + \beta_j}$ and $\pi_j = e^{\beta_0 + \beta_j}$, where $\beta_0$ and $\beta_j$ are the respective regression model coefficients. Daily intensity estimates on a log-scale are found as $\mu_j = \beta_0 + \beta_j$, and $\sigma_0$ is the residual standard deviation of the intensity model (log-scale). The parameters are transformed to a linear parameter space and then scaled to represent a weekly rate. We simulate a vector of travel activity among the reference group, $\mathrm{TA}_0$, with length one million from a mixture of a log-Normal distribution and a point-mass at zero with parameters $(\pi_0, \mu'_0, \sigma'_0)$. We do the same for each subgroup, $\mathrm{TA}_j$, using the parameters $(\pi_j, \mu'_j, \sigma'_j)$. The differences $\delta_j = \mathrm{TA}_j - \mathrm{TA}_0$ are computed and added to a simulated vector of leisure time physical activity values, $\mathrm{PA}_0$, drawn from an empirical distribution estimated using data from Arem et al. 18 A function describing the empirical distribution of baseline leisure time physical activity is available and documented in the HOT R Package 19 .
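The simulation step can be sketched as follows. This is a minimal Python illustration, not the HOT R package: the function name is hypothetical, the log-scale parameters are invented, the vector length is reduced from the one million used in the text, and the participation values are illustrative (0.65 is chosen only because it is close to the reference participation reported later).

```python
import math
import random


def simulate_travel_activity(participation, mu_log, sigma_log, n, rng):
    """Draw n values from a mixture: zero with probability 1 - participation,
    otherwise log-normal with log-scale parameters (mu_log, sigma_log)."""
    return [
        rng.lognormvariate(mu_log, sigma_log) if rng.random() < participation else 0.0
        for _ in range(n)
    ]


rng = random.Random(2017)  # fixed seed so the sketch is repeatable
n = 100_000                # the text uses one million; reduced here for speed

# Hypothetical parameters (MET-hours/week on the log scale); not from the paper.
ta_ref = simulate_travel_activity(0.65, math.log(2.0), 1.0, n, rng)
ta_sub = simulate_travel_activity(0.55, math.log(2.0), 1.0, n, rng)

# Element-wise differences, as described for delta_j = TA_j - TA_0.
delta = [sub - ref for ref, sub in zip(ta_ref, ta_sub)]

share_active_ref = sum(1 for ta in ta_ref if ta > 0) / n
mean_delta = sum(delta) / n
print(round(share_active_ref, 3), round(mean_delta, 3))
```

With lower participation and the same intensity parameters, the subgroup's mean travel activity is lower, so the mean difference is negative.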
If negative, estimates for $\mathrm{PA}_j$ are set to zero. Vectors of 10,000 evenly spaced quantiles representing the distributions of $\mathrm{PA}_0$ and $\mathrm{PA}_j$, denoted $q_0$ and $q_j$, are then used to estimate the population attributable fraction, $\rho_j$, for each subgroup $j$ relative to the reference group.
The function R is an exposure-response function for all-cause mortality and leisure time physical activity given in terms of MET-hours/week. The exposure-response function used in this analysis is a piecewise linear function created from hazard ratios found in Arem et al. 18 This exposure-response function, along with others, is available and documented in the HOT R package 20 .
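A hedged sketch of how a piecewise linear exposure-response function and a population attributable fraction might be combined is given below. Everything specific here is an assumption: the knot values are invented rather than taken from Arem et al., the PAF expression is the common comparative-risk form (reference risk minus subgroup risk, divided by reference risk), and the sign convention may differ from the one used in the HOT R package.

```python
from bisect import bisect_right

# Hypothetical knots: (MET-hours/week, relative risk). Illustrative values only.
KNOTS = [(0.0, 1.00), (7.5, 0.80), (15.0, 0.70), (30.0, 0.60)]


def relative_risk(pa):
    """Piecewise linear interpolation between knots; flat outside the knot range."""
    xs = [x for x, _ in KNOTS]
    if pa <= xs[0]:
        return KNOTS[0][1]
    if pa >= xs[-1]:
        return KNOTS[-1][1]
    i = bisect_right(xs, pa)
    (x0, y0), (x1, y1) = KNOTS[i - 1], KNOTS[i]
    return y0 + (y1 - y0) * (pa - x0) / (x1 - x0)


def population_attributable_fraction(q_ref, q_sub):
    """Assumed form: (sum R(q_ref) - sum R(q_sub)) / sum R(q_ref).
    Positive values mean lower risk in the subgroup than in the reference."""
    risk_ref = sum(relative_risk(q) for q in q_ref)
    risk_sub = sum(relative_risk(q) for q in q_sub)
    return (risk_ref - risk_sub) / risk_ref


# Toy quantile vectors; the subgroup has 1 MET-hour/week less, truncated at zero.
q_ref = [0.0, 5.0, 10.0, 20.0]
q_sub = [max(q - 1.0, 0.0) for q in q_ref]
print(round(population_attributable_fraction(q_ref, q_sub), 4))
```

Because the assumed function decreases with activity, a subgroup with less physical activity yields a negative value under this sign convention, i.e., a relative increase in risk.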
All results were computed using the R programming language and environment for statistical computing version 4.2.1 21 . All multivariate regression models were fit using glm methods in the R core package stats, and nominal significance was set at the α = 0.05 level. The R package HOT contains many of the functions used in the analysis and is available on GitLab 20 . The analysis here is a secondary analysis of a publicly available database with no identifiable data. As such, the requirement for IRB approval and informed consent is waived.

Results
The sample size and proportion of each of the model variables are displayed in Table 1.

Active travel prevalence and participation
The prevalence and participation of active travel were estimated across each of the variables independently (weighted estimates) and are shown in Figure 1 with 95% confidence intervals. Using the R package survey version 4.1-1, we computed Pearson chi-squared statistics (adjusted by design effect) using the survey design weights 22 . Both the prevalence and participation of active travel differed significantly across the levels of nearly every variable (each tested independently); only the sex and race variables were not significant at the α = 0.05 level in all four models (2009, 2017 × prevalence, participation). Table 3 contains log-transformed p-values (-log10 p) for all tests; in it we see that sex is highly significant in the 2017 participation model (-log10 p = 7.15), while race is marginally significant in the 2009 model (-log10 p = 1.38) and not significant in the 2017 prevalence model (-log10 p = 0.87). In both 2009 and 2017, the estimates for prevalence among the Black or African American population were not significantly different from the overall mean (Figure 1A). The prevalence in the Hispanic population changed dramatically from 2009 to 2017, going from greater than that of the non-Hispanic population to less than it (Figure 1A). With household income, we observe similar patterns in the prevalence and participation models in 2009 and 2017. Using the participation measure, we see a U-shaped relationship across household income with the minimum at USD 25,000-34,999. With the prevalence measure, however, the location of the minimum is not clear. Prevalence is greatest at the lowest household income level. For all education levels above "Less than High School" we see an increasing trend in both prevalence and participation, with above-average values for "Bachelor's Degree" and above. There was no significant difference in prevalence across sex.
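For readers less used to log-transformed p-values, the short Python sketch below converts the three values quoted above back to p-values and compares them with α = 0.05. The helper name and the row labels are illustrative; the values are those reported in the text.

```python
import math

ALPHA = 0.05


def p_from_neg_log10(value: float) -> float:
    """Invert the -log10 transformation of a p-value."""
    return 10.0 ** (-value)


# -log10 p values quoted in the text.
reported = [
    ("sex, 2017 participation", 7.15),
    ("race, 2009", 1.38),
    ("race, 2017 prevalence", 0.87),
]
for label, value in reported:
    p = p_from_neg_log10(value)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: p = {p:.3g} ({verdict})")

# Any -log10 p above this threshold is significant at the 0.05 level.
print(round(-math.log10(ALPHA), 3))
```

The threshold on the transformed scale is about 1.30, which is why 1.38 is only marginally significant and 0.87 is not.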

Prevalence and participation logistic regression
We see in Figure 1 that the variables with the largest effect size are environmental, namely population density and number of cars per household. Thus, to truly understand the effect of the social variables we must use a model that accounts for all the variables simultaneously. We constructed three multivariate logistic models using the variables described above (sex, age, population density, income, education, number of cars, race, Hispanic status, race × Hispanic status, state × season). In the prevalence model, the response variable was the indicator $I_{\mathrm{TA} > 0}$, and in the participation model it was $I_{\mathrm{nwalk} + \mathrm{ncycle} > 0}$, where TA is travel activity and nwalk and ncycle are the self-reported number of walking and cycling trips taken over the last week. In 2017, an individual living in a census block with population density greater than 25,000 people per square mile is more than twice as likely to engage in active travel as the same person living in a census block with a population density of 2,000 to 3,999 (reference group). Similarly, in 2017 someone living in a household with no cars is more than four times as likely to engage in active travel on a given day (prevalence) compared to a person with the same characteristics except living in a household with two cars (reference group).
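Because these comparisons are based on odds ratios, it can help to see how an odds ratio translates into a probability for a given baseline. The Python sketch below uses a hypothetical baseline probability of 0.2 and an illustrative odds ratio of 4; neither number is taken from the fitted models, and the function name is an assumption.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline_probability must lie strictly between 0 and 1")
    odds = baseline_probability / (1.0 - baseline_probability) * odds_ratio
    return odds / (1.0 + odds)


baseline = 0.2    # hypothetical probability in the reference group
odds_ratio = 4.0  # illustrative value, not a model estimate
shifted = apply_odds_ratio(baseline, odds_ratio)
print(shifted, shifted / baseline)
```

In this illustration the odds are four times larger while the probability itself is 2.5 times larger, so an odds ratio overstates the ratio of probabilities when the outcome is common.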
Among the social variables, education has the largest effect size, with active travel increasing as education level increases. A person with a graduate or professional degree is approximately twice as likely to engage in active travel as a person with all the same characteristics except having only a high school education. Prevalence and participation across household income, unlike the other ordered variables, do not strictly increase or decrease. A clear pattern appears in the participation model in which the minimum odds ratios occur in the middle of the range at USD 25,000-34,999. The upper and lower

Travel intensity linear regression
A forest plot of regression coefficient estimates for the travel intensity model is presented in Figure 3. The coefficients are presented with 95% confidence intervals. Variables for age, state, and state by season were included to account for seasonal effects and over-sampling of some states and are not displayed here. The reference group in both models is made up of White, non-Hispanic, male residents of California with a household income of USD 35,000-49,999, no more than a high school education, living in a census tract with population density of 2,000 to 3,999 people per square mile, two cars available to the household, and surveyed in the fall. American Indian, Alaskan Native, Native Hawaiian, or other Pacific Islander were not included due to small sample size (less than 1%).
p = 0.0234 and $\beta_{2017}$ = 0.120, p = 0.0149) while all other levels remained near the mean. As with prevalence and participation, intensity increases as the population density or the number of cars per household decreases. The pattern of travel intensity across the household income variable is similar to the ones seen in the participation and prevalence models, where the low and high ends show a significant positive effect and the middle-income levels are not significantly different from the referent group (USD 35,000-49,999). Note that a daily rate for travel activity was used in the regression analyses and later scaled to a weekly rate.

Health estimates
The mean overall travel activity in a population is the product of participation and intensity. Using the multivariate regression models, we see that participation in the reference group is 0.650 and 0.648, and the intensity is 2.

Discussion
Using odds ratios for active travel in the USA that are adjusted for multiple social and environmental variables, we observed that racial and ethnic minority populations (Black or African American, Asian, and Hispanic) are less likely to engage in active travel than the White population with the same socioeconomic and environmental characteristics in both 2009 and 2017. The most influential factors in active travel are environmental, e.g., population density and access to personal automobiles, but sociodemographic variables such as race, income, and education are also significant. This disparity in active travel creates a potential health burden in some populations which we estimate may be responsible for a relative increase in all-cause mortality of approximately one percent.
This analysis is the first to demonstrate that the adjusted odds for active travel among the Black or African American and Asian adult populations in US major metropolitan areas are less than those of their White counterparts. In contrast to this study, previous studies of US transportation surveys considered the entire US and reported that walking and active travel are both more prevalent among minority populations than the White population 12,25,26 . An analysis by Paul et al. of transportation-related walking reported odds ratios adjusted for sex, age, race, education, and BMI and observed that the adjusted odds of walking for transportation were highest among the non-Hispanic Black population, that the prevalence increased with increasing education level, and that the prevalence of walking for transportation was lowest in the South 27 . We chose to restrict the population to major metropolitan areas to remove some of the confounding of the relationship between race and walking due to urbanicity and population density.
Whitfield et al. reported that members of minority populations were more likely to walk for transportation than members of the non-Hispanic White population during 1999-2012 12 .
Another study reported that the prevalence of walking 30 minutes per day was higher in the Hispanic, African-American and Asian populations than in the White population 25 . It was estimated that in 2005 the highest prevalence of transportation walking in the USA was among non-Hispanic Black men (36.0%) and Asian/Native Hawaiian/Pacific Islander women (40.5%) 26 .
The disparity in cycling prevalence across race is much greater than the disparity in walking, with the odds of cycling being significantly lower in minority populations. There are several potential explanations for these observed results. Exclusionary zoning, discrimination, systematic and institutional racism have all contributed to the inequities in minority and low-income communities across the USA resulting in less access to safe street infrastructure and green space [28][29][30][31][32] . Racial profiling, harassment, and discriminatory treatment, along with a lack of access to cycling and educational resources, can discourage low-income communities and communities of color from cycling 32,33 . For racial and ethnic minorities and those living in low-income communities, concerns about personal safety due to traffic collisions and crime are two of the top-cited barriers to engaging in active travel modes such as cycling 28,29 .
Researchers have found that avoidance of walking due to concerns about crime is significantly greater among women than men, and that any attempt to improve walkability must address, in particular, the safety of women 30 . Moreover, concerns about gentrification may limit investments in walking and cycling infrastructure, given that improvements in the built environment have been associated with increasing property and housing values, thereby posing a risk to long-term residents who have to contend with the tensions of revitalization and displacement 31 .
Equity is an important consideration in the development and implementation of policies, plans, initiatives, and programs designed to improve health and well-being by increasing participation in active travel. Planning to improve walking and cycling infrastructure, and the creation of policies, initiatives, plans, and programs to increase access to active travel, must be informed by the key environmental, socioeconomic, and demographic factors that drive existing inequities 32,33 . Equitable approaches are especially warranted as our study finds significant differences in the adjusted odds of active travel between the Black or African American and Asian populations compared to the White population. Future studies should investigate the specific drivers of these disparities in metropolitan areas, including the extent of disparities in walking and cycling infrastructure across metropolitan areas 29 . Improving representation in the decision-making process and targeted outreach to transportation-disadvantaged groups and communities throughout the planning and decision-making process will be essential for increasing equity in active travel and in the resulting health benefits 33 .

Limitations
Assumptions regarding the distribution and replacement of physical activity are necessary, and therefore the estimates for health benefits are broad. The trends across the model variables were remarkably consistent between 2009 and 2017; however, we did observe an increase in participation estimates across all variables in 2017. Changes in survey methodology between the 2009 and 2017 surveys likely affected our results. The increase in participation from 2009 to 2017 was likely due to a questionnaire change. As of 2017, the questionnaire explicitly prompts respondents to recall walk and bike trips 34 . Additionally, walk and bike trips to and from home (loop trips) are included in 2017 but not in 2009 34 . The increase in participation estimates (without a corresponding increase in prevalence estimates) will have the effect of decreasing the estimate for travel activity and intensity, due to the decrease in the frequency of active travel. Of note, the categorization of Hispanic/Mexican also changed between 2009 and 2017; therefore, the insights that can be drawn about this population over time are limited in this study. As racial/ethnic disparities remain a clear concern when it comes to active transportation, physical activity, and chronic illness, we need more precise and inclusive measures of racial and ethnic identity categories to be able to draw accurate conclusions.
To arrive at estimates of the health impacts we must make assumptions regarding the underlying distribution of overall physical activity. If the underlying distribution of physical activity is already high in a population, changes to travel activity will have less of an effect. Since it is unknown how this distribution varies across the social and environmental variables considered here, we assume that the distribution of physical activity is the same in all subpopulations.
The NHTS dataset has geographic resolution at the state level, giving the metropolitan area only if the population size is greater than one million. Thus, very little information can be included in the model to account for local urban design and culture. In future studies, we would like to include local measures of safety and walkability. Furthermore, capturing the lived experience of community members through qualitative methods, such as surveys, may help elucidate barriers to walking and biking, and may shed light on other ways that people who report low physical activity may increase their physical activity as part of their daily routine. Policy measures should focus on addressing structural factors that prevent people from engaging in active transport in order to reduce sociodemographic disparities. Efforts to design and evaluate interventions such as adult cycling lessons and walking groups can encourage more people to enjoy the health benefits of physical movement.

Data availability
Underlying data
Data used in this study are from the 2009 and 2017 U.S. National Household Travel Survey (NHTS) datasets, available from the NHTS website at https://nhts.ornl.gov/downloads.