Crime and physical activity measures from the Safe and Fit Environments Study (SAFE): Psychometric properties across age groups

Highlights
• Newly developed measures of crime, environmental context, and activity are valid.
• Factorial validity of measures supported across four age groups.
• Minimal measurement redundancy found across measures.


Introduction
The lack of consistent support for widely held hypotheses about crime's effect on physical activity (Carver et al., 2008; Foster & Giles-Corti, 2008) calls for an examination of the conceptual models, measures, outcomes, and analyses. The Safe and Fit Environments Study (SAFE; Patch et al., 2019) sought to evaluate the relations among crime, fear of crime, responses to crime, environmental contexts, and physical activity by developing an interdisciplinary (i.e., criminology, psychology, public health) multilevel conceptual framework with valid and reliable measures to test hypotheses derived from the framework. The conceptual framework is multilevel because hypothesized processes reflecting individual, social, and neighborhood/community attributes are involved in pathways linking crime and fear of crime with multiple measures of physical activity. Measures were developed for each construct of the framework, with the goal of generating items and subscales suitable for use across age groups (Adolescent [12-17], Young Adult [18-39], Middle Age Adult [40-65], Older Adult [66+]). The broader conceptual domains referenced (a) the Micro Level, representing individual factors such as cognitive assessment of crime/safety, emotional responses to crime, behavioral responses to crime/victimization/physical activity, and personal experiences/victimization; (b) the Meso Level, representing perceptions of social control and social cohesion in participants' communities; and (c) the Macro Level, referencing crime prevention through reducing opportunities for crime in the neighborhood context. Each conceptual domain was operationalized with multiple indicators. Table 1 briefly describes the indicators.
A systematic process was used to develop and adapt survey items and scales to provide operational definitions of the constructs in the SAFE conceptual framework (Patch et al., 2019). The survey development process was guided by several principles: 1) draw from existing crime and physical activity measures when possible, 2) develop new items as needed based on focus group and expert input, 3) design items to apply across age, gender, and socio-demographic groups, and 4) minimize respondent burden by limiting length of scales and maintaining a common response format. The survey development process began with a non-systematic literature review to identify existing measures in the fields of criminology, sociology, and physical activity that related to constructs in the framework. Over the course of a year, an interdisciplinary team of experts met weekly to review existing measures and adapt them as necessary. When measures did not exist or were deemed inadequate to represent a content domain, items were generated by the team. The experts discussed the language used for items, response formats, and psychometrics (when available). Some items were adopted from existing measures without material change, but most were altered somewhat in content, format, or both.
Test-retest reliability information for these measures was provided on a subsample of participants (N = 176) by Patch et al. (2019). The majority of these measures achieved adequate reliability across age groups. Given the construction of "new" measures for the SAFE study, it is vital to conduct an integrated psychometric evaluation to confirm the construct validity of each measure before these measures can be used in predictive models. Gaining additional information about the construct validity of these indicators was the primary purpose of the present paper. Of specific interest in the current analysis is the invariance of these measures (and by implication, constructs) across age groups. The use of large sample sizes for each age group provides an opportunity to validate the majority of the measures and establish construct validity, internal consistency reliability, and concurrent/discriminant validity across the lifespan. Only after it has been demonstrated that the constructs of interest are being measured similarly and independently in each age group can one attribute observed differences/associations to substantive factors rather than to noncomparability of the measurement instrument.
There are multiple aspects to validity, conceptually unified under construct validation (Messick, 1995). The structural aspect of validity refers to the correspondence between the scoring of the measurement instrument and the hypothesized constructs and involves examination at the level of the items (also referred to as measurement validity). The extent of evidence required for demonstrating validity depends on the types of inferences one wishes to make from the obtained scores, but at a minimum, the structure of the measure should be established. This is referred to as factorial validity, and is traditionally established using factor analysis (exploratory, confirmatory). When using measurement instruments that have been developed for different age groups, an added concern is the extent to which the measurement instrument has the same structure across age groups. At the initial level of analysis, this is referred to as configural or pattern invariance (Roesch et al., 2013; Vandenberg & Lance, 2000). In the present study, factor loadings in each age group were tested to determine if they are descriptively invariant (equivalent). If this initial invariance is observed, one can infer that a measurement instrument yields scores that can be interpreted in a similar fashion across different populations or groups, in this case different age groups.
In sum, in the current study, a comprehensive psychometric evaluation of the SAFE measures was undertaken. In addition to assessing the basic psychometric properties of each measure (e.g., variable distributions, internal consistency reliability), this study formally examined the factorial validity and invariance of measures using confirmatory factor analysis (CFA).
Not all measures from the SAFE study were amenable to CFA. Of the 25 measures identified in Table 1, 17 were subjected to psychometric evaluation that specifically included CFA; these measures are referred to as (sub)scales throughout the manuscript. Eight measures could not be evaluated using CFA for a number of reasons, including operationalization by only two items for a given measure, use of a binary response scale, or the lack of an underlying continuum represented by the response scale. These measures are referred to as (summative) indexes throughout the manuscript. Physical activity measures used in the study were generally previously evaluated measures, and they were not further examined in the present analyses.
Table 1 (excerpt recovered from the page layout; measures 7-14 only)
Measures 7-10. Response scale: 0 = never, 1 = 1 time, 3.5 = 2-5 times, 6 = 6 or more times
7) Recent (≤12 months). 4 items; "In the past 12 months, how many times have YOU been the victim of a shooting?"
8) Past (>12 months). 4 items; "PRIOR to the past 12 months, how many times have YOU ever been the victim of property crimes?"
9) Witness Crime. 4 items; "In the past 12 months, how many times have you WITNESSED bullying in your neighborhood?"
10) Hearing about Crime. 4 items; "In the past 12 months, how many times have you HEARD about someone in your neighborhood being attacked?"
11) Crime Information Sources. Where the respondent obtains crime information (e.g., media sources). Response scale: 0 = No, 1 = Yes. 10 items; "Please CIRCLE whether you get information about CRIME IN YOUR NEIGHBORHOOD from the radio."
Individual factors, Cognitive Assessment of Crime
12) Evaluation of risk. A cognitive assessment of the likelihood of crime. Response scale: 1 = very unlikely, 2 = somewhat unlikely, 3 = somewhat likely, 4 = very likely. 12 items; "How likely is it that in the next year you will be a victim of crime when you are in a local park."
13) Values/Incivilities. Concerns relating to the personal tolerance of the respondent to crime and incivilities. Response scale: 1 = not present in my neighborhood, 2 = present, but not a problem, 3 = present, somewhat a problem, 4 = present, big problem. 17 items; "Extent to which the issue is a problem in your neighborhood…gang activity."
14) Street Efficacy. Confidence in the ability to avoid crime or to find ways to be safe.

Participants
The survey was administered to participants in the Safe and Fit Environments Study (SAFE Study) recruited from four metropolitan US regions: Baltimore counties, Maryland/Washington DC; Seattle/King County, Washington; San Diego County, California; and Phoenix/Maricopa County, Arizona. Most participants in the SAFE Study were re-recruited from one of four previous studies conducted by the same research team: the Neighborhood Quality of Life Study (adults; Sallis et al., 2009), the Senior Neighborhood Quality of Life Study (older adults; King et al., 2011), the Teen Environment and Neighborhood Study (adolescents; Carlson et al., 2015), and the Neighborhood Impact on Kids Study (children; Frank et al., 2012; Saelens et al., 2012). Recruitment of new participants was conducted in the same regions, with oversampling from high-crime and low-socioeconomic status areas, as well as from a new region (WalkIT Arizona; Adams et al., 2019).
In the present SAFE Study, the research team sampled to achieve a balance of participants from high- and low-crime neighborhoods and across four current age groups. A total of 2,173 participants comprised the sample, with 336 participants in the Adolescent group (12-17), 532 participants in the Young Adult group (18-39), 838 participants in the Middle Age Adult group (40-65), and 467 participants in the Older Adult group (66+). Additional demographic information is provided in Table 2.

Measures
A brief description of each (sub)scale and index is provided in Table 1. The complete survey and more information about item adaptation from original sources are available online at https://drjimsallis.org/measures.html (see also 3).

Statistical approach
The data for each measure in Table 1 were subjected to basic psychometric analysis. For the 17 measures for which CFA was used, descriptive statistics at the item level were evaluated to identify items with low variability and non-normal distributions. Subsequent to this, CFA models were tested for each measure to establish the best-fitting, unidimensional measurement model in the overall sample and stratified by age group. Traditionally, the likelihood ratio chi-square test has been reported but sparingly used to determine whether a model fits well. This test statistic has been identified as unsatisfactory for numerous reasons, including its heavy reliance on sample size (Hoyle, 2000). While the use of alternative descriptive fit indices and the values used to determine overall model fit are contentious (Bentler, 2007; Marsh et al., 2004), three descriptive fit indexes have been generally recommended (Hu and Bentler, 1999): (a) the Comparative Fit Index (CFI; Bentler, 1990), a relative index of model fit with values >0.90 indicating acceptable model fit; (b) the root mean square error of approximation (RMSEA; Steiger, 1990), an absolute index of overall model fit with values less than 0.08 indicative of acceptable model fit; and (c) the standardized root mean-square residual (SRMR; Bentler, 1990), an absolute index of overall model fit with values less than 0.08 indicative of acceptable model fit. Given the exploratory nature of this psychometric analysis, these relatively liberal threshold values were used to indicate reasonably acceptable fit. All CFA models were estimated with the Mplus software version 8.1 (Muthén and Muthén, 2019) and used both Maximum Likelihood-Robust and Weighted Least Squares estimation procedures.
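The decision rule described above can be sketched as a simple screen. This is an illustration only, not the authors' Mplus workflow: the names `FitIndices` and `screen_fit` are hypothetical, and the example uses the least favorable ends of the ranges reported in the Results for the Values/Incivilities measure.

```python
from dataclasses import dataclass

# Cutoffs stated in the text: CFI > 0.90, RMSEA < 0.08, SRMR < 0.08.
CFI_MIN = 0.90
RMSEA_MAX = 0.08
SRMR_MAX = 0.08


@dataclass(frozen=True)
class FitIndices:
    """Descriptive fit indices for one CFA model (hypothetical container)."""
    cfi: float
    rmsea: float
    srmr: float


def screen_fit(fit: FitIndices) -> dict:
    """Return, for each index, whether the value meets the stated cutoff."""
    return {
        "CFI": fit.cfi > CFI_MIN,
        "RMSEA": fit.rmsea < RMSEA_MAX,
        "SRMR": fit.srmr < SRMR_MAX,
    }


# Least favorable values in the ranges reported for Values/Incivilities.
worst_case = FitIndices(cfi=0.962, rmsea=0.060, srmr=0.064)
result = screen_fit(worst_case)
print(result)
```

Reporting each index separately, rather than collapsing them into a single pass/fail flag, keeps visible the cases in which the indices disagree for a given model.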
Finally, descriptive statistics at the (sub)scale/index level were examined for all 25 measures to identify (sub)scales/indexes with low variability and non-normal distributions. Correlations among all measures were examined to determine if redundancy (i.e., correlations >0.70) existed. These correlations were evaluated for the overall sample and for each age group. Given the relatively large size of the overall sample and the sample size of each age group, statistical significance of these correlations should be interpreted with caution.
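A minimal sketch of the redundancy screen follows. It assumes scale scores are arranged with respondents in rows and measures in columns; the function name, the measure names, and the simulated scores are all hypothetical and are not SAFE data. Applying the 0.70 cutoff to the absolute value of the correlation is also an assumption made here.

```python
import numpy as np

REDUNDANCY_CUTOFF = 0.70  # threshold stated in the text


def redundant_pairs(scores, names, cutoff=REDUNDANCY_CUTOFF):
    """List measure pairs whose absolute Pearson correlation exceeds the cutoff.

    scores: 2-D array, rows = respondents, columns = measures.
    names:  one label per column.
    """
    r = np.corrcoef(scores, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > cutoff:
                flagged.append((names[i], names[j], round(float(r[i, j]), 2)))
    return flagged


# Simulated (not SAFE) scores: scale_a and scale_b share a common component,
# while scale_c is generated independently of both.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
scores = np.column_stack([
    common + 0.3 * rng.normal(size=500),
    common + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
])
names = ["scale_a", "scale_b", "scale_c"]
pairs = redundant_pairs(scores, names)
print(pairs)
```

Running the same function on each age group's rows separately would mirror the stratified evaluation described above.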

Results
Descriptive statistics, confirmatory factor analyses, and correlations among measures are presented in Tables 3-7 for the overall sample and by age group. Specifically, the tables present item-level descriptive statistics (Table 3), standardized factor loadings from the CFAs and internal consistency coefficients (Table 4), (sub)scale- and index-level descriptive statistics (Table 5), and correlations among all (sub)scales in the overall sample (Table 6) and by age group (Table 7). The results are presented below by conceptual level of assessment (Macro, Meso, Micro).

Macro-level: neighborhood context
Means and standard deviations for the Crime Prevention through Environmental Design (CPTED) measures all indicated reasonable normality and sufficient variation at the item level and across age groups (Table 3). The configural invariance model fit reasonably well for CPTED-Surveillance (RMSEA = 0.114 to 0.156, SRMR = 0.057 to 0.078, CFI = 0.900 to 0.970). However, the standardized factor loadings for two of the items were not practically significant (loadings <0.25) in the overall sample and for each age group ("There are many places in my neighborhood where criminals could wait for victims without being seen" and "The police patrol my neighborhood frequently").
Moreover, internal consistency values were relatively low (see Table 4). Conversely, all items for CPTED-Maintenance had large loadings, and internal consistency values were reasonable. At the subscale level, the four CPTED subscales exhibited similar means and standard deviations across age groups (see Table 5), and no excessive non-normality was evident. Correlations with other study measures did not indicate redundancy, and correlation patterns were similar across the four age groups (see Tables 6 and 7).
Table 3 note. All values reported are ranges of mean values (standard deviations) across the individual items. *Statistical non-normality indicated when skewness, kurtosis values > |2| and SD > Mean.
Table 4 note. FL = standardized factor loadings; α = Cronbach's alpha.
Table 5. Descriptive statistics for all (sub)scale scores and summative indexes: Overall and by age group.

Meso-level: social dynamics
Means and standard deviations for the Collective Efficacy and Neighborhood Integration measures also indicated reasonable normality and sufficient variation at the item level. The configural invariance model fit reasonably well for both Collective Efficacy (RMSEA = 0.119 to 0.184, SRMR = 0.067 to 0.081, CFI = 0.906 to 0.925) and Neighborhood Integration (RMSEA = 0.156 to 0.214, SRMR = 0.042 to 0.067, CFI = 0.935 to 0.981). The standardized factor loadings for both measures were all large, significant, and similar across age groups. The internal consistency values were all large. At the scale level, the means and standard deviations were similar across age groups and non-normality was not evident. While the two measures within this domain did exhibit a strong correlation with each other (rs ranged from 0.54 to 0.57), correlations with other study measures did not indicate redundancy, and correlation patterns were similar across the four age groups.

Micro-level: individual factors (personal experiences)
Means and standard deviations at the item level for the four Victimization subscales all indicated non-normality with significant positive skew. There was infrequent endorsement of victimization. The configural invariance model fit well for these subscales (RMSEA = <0.001 to 0.001, SRMR = 0.001 to 0.027, CFI = 0.999 to 1). The overwhelming majority of factor loadings were large, significant, and similar across age groups. The exception was a single item in the 12-17 age group for the Recent subscale ("In the past 12 months, how many times have you been the victim of property crimes [including theft, motor vehicle theft, burglary, vandalism]"). This item did not load significantly (standardized value = 0.08). Internal consistency values were relatively low for the Recent and Past subscales but were consistently reasonable for the Witnessing Crime and Hearing about Crime subscales. At the subscale level, these measures continued to exhibit severe non-normality with significant positive skew. The Crime Information Resources summated index exhibited an approximately normal distribution with significant variation. All measures within this domain exhibited similar means and standard deviations across age groups. Some correlations among the four Victimization subscales were strong, but none exceeded 0.65. Correlations with Crime Information Resources and other measures did not indicate redundancy.

Micro-level: individual factors (Cognitive Assessment of Crime)
Item-level means and standard deviations for Evaluation of Risks and Values/Incivilities indicated relatively low endorsement of some items; this did not, however, result in significant non-normality. The Street Efficacy measure approximated normality well. There was significant variation for all three measures. The configural invariance model fit reasonably well for Evaluation of Risks (RMSEA = 0.083 to 0.094, SRMR = 0.045 to 0.056, CFI = 0.968 to 0.986), Values/Incivilities (RMSEA = 0.040 to 0.060, SRMR = 0.044 to 0.064, CFI = 0.962 to 0.977), and Street Efficacy (RMSEA = 0.041 to 0.141, SRMR = 0.009 to 0.025, CFI = 0.988 to 0.999). The standardized factor loadings for all three measures were large, significant, and similar across age groups. The internal consistency values were also large. Similar to what was observed at the item level, at the scale level the means and standard deviations were similar across age groups, and non-normality was not evident. Significant and strong correlations were found between Evaluation of Risks and Values/Incivilities (rs ranged from 0.47 to 0.65). Strong, positive relationships were also found between these two measures and No Behavioral Response (rs ranged from 0.48 to 0.64), but they were not strong enough to indicate redundancy. Correlations with other study measures did not indicate redundancy, and correlation patterns were similar across the four age groups. No strong correlations were found for the Street Efficacy measure with other study measures, and the pattern of correlations was similar across age groups.

Micro-level: individual factors (Emotional responses to Crime)
Means and standard deviations for the Fear of Crime measure indicated reasonable normality and sufficient variation at the item level. The configural invariance model fit reasonably well (RMSEA = 0.115 to 0.124, SRMR = 0.056 to 0.070, CFI = 0.948 to 0.966). The standardized factor loadings for this measure were all large, significant, and similar across age groups. The internal consistency values were all large. At the scale level, the means and standard deviations were similar across age groups, and only mild non-normality was evident. This measure did exhibit a strong correlation with the Values/Incivilities measure (r = 0.65), but correlations with other study measures did not indicate redundancy.

Micro-level: individual factors (Behavioral responses to Crime)
Means and standard deviations for the Protective Behaviors measure indicated reasonable normality and sufficient variation at the item level. The configural invariance model fit reasonably well (RMSEA = 0.068 to 0.078, SRMR = 0.050 to 0.080, CFI = 0.962 to 0.980). The standardized factor loadings for this measure were generally large, all were significant, and they were similar across age groups. The internal consistency values were all large. At the scale level, the means and standard deviations were similar across age groups, but mild non-normality was evident in each age group. This measure exhibited strong correlations with the Avoidant Behaviors (Dark, Alone) subscale and the Positive Avoidant Behaviors measure (rs ranged from 0.61 to 0.68), but correlations with other study measures did not indicate redundancy.
Means and standard deviations at the item level for the four Avoidant Behaviors subscales all indicated some non-normality with positive skew. There was generally low endorsement of the items from these subscales. In general, mild non-normality was evident for the majority of items. There was, however, significant non-normality (positive skew) for both the Daylight, Alone and Daylight, Others subscales in both the 40-65 and 66+ age groups, and for the Daylight, Others subscale in the 18-39 age group. The configural invariance model fit reasonably well for all four subscales: Daylight, Alone (RMSEA = 0.028 to 0.070, SRMR = 0.012 to 0.018, CFI = 0.996 to 1); Daylight, Others (RMSEA = 0.001 to 0.060, SRMR = 0.009 to 0.018, CFI = 0.995 to 1); Dark, Alone (RMSEA = 0.059 to 0.106, SRMR = 0.006 to 0.021, CFI = 0.995 to 1); and Dark, Others (RMSEA = 0.024 to 0.082, SRMR = 0.010 to 0.021, CFI = 0.996 to 1). All standardized factor loadings were large, significant, and similar across age groups. Internal consistency values were also strong across subscales and age groups. At the subscale level, these subscales exhibited mild but not severe non-normality. Correlations among the four subscales were highly redundant across age groups (rs ranged from 0.72 to 0.91). These subscales were not, however, redundant with other study measures.
For the summated indexes in this variable domain, both the News-Related Avoidant Behaviors index and the Obligatory Behaviors index were severely non-normal, with positive skew, in the two older age groups. Severe non-normality was also evident in all four age groups for the Community Participation index. The Positive Avoidant Behaviors summated index and the No Behavioral Response summated index indicated reasonable normality and variation at the scale level. While correlations between these summated indexes were similar across age groups, some strong correlations were found between Positive Avoidant Behaviors and No Behavioral Response (rs ranged from 0.51 to 0.64) and between the Effect of Safety on Physical Activity measure and No Behavioral Response (rs = 0.63).

Discussion
The psychometric evaluation of the SAFE (sub)scales showed consistent factorial validity and internal consistency reliability across the majority of the measures and, importantly, across the four age groups. Therefore, the newly developed and adapted single set of questions accurately measured target constructs for all groups, and the common measures can be used to facilitate analysis and interpretation of age-related patterns. This approach to simultaneous measurement development for the lifespan is the best way to determine whether common measures are feasible, and it is more time- and cost-efficient than a sequential development approach. Specifically, 14 of the 17 measures subjected to a test of factorial validity displayed statistically significant and strong factor loadings and internal consistency in the overall sample and across the age groups. The pattern of correlations for each (sub)scale with other (sub)scales and the 8 indexes did not exhibit any redundancy, with the exception of the Avoidant Behaviors subscales. That is, each measure identified a degree of unique measurement variance in operationalizing a given construct.
Establishing the construct validity of empirical indicators has long been recognized (Cronbach, 1970) as an ongoing, iterative process involving constant interplay between conceptualization and empirical assessment. Following Messick's (1995: 745) approach to unified construct validation, the current work systematically assessed both the "structural aspect … the fidelity of the scoring structure to the structure of the construct domain at issue," and the "generalizability aspect … [which] examines the extent to which score properties and interpretations generalize to and across population groups, settings, and tasks." The former was assessed by confirming at least roughly acceptable levels of inter-item consistency for almost all of the indices. The latter was addressed by replicating roughly acceptable levels of inter-item consistency across four different age groups of respondents. We look forward to work by scholars, preferably with multiple methods, advancing the other relevant aspects of construct validation, particularly "the external aspects" gauging convergent and discriminant validities using multitrait-multimethod matrices (Campbell & Fiske, 1959), as well as tests of criterion validity with behavioral indicators. Such future studies with individual-level, behavior-based criterion variables will further advance our understanding of potential differences in the construct validities of survey items that ask in different ways about walking, and about walking and safety concerns. Putting aside disparities in other scientific quality benchmarks, until such evidence becomes available, disputes about how to word survey questions about walking and safety must necessarily remain unresolved.
A further strength of the current work is that the avoidance and behavioral constraint items adopted here represent elaborations of items widely used as indicators for avoidance and behavioral constraint in the reactions to crime literature (Ferraro, 1994; Taylor, 2017). This further supports the "content aspect of construct validation" which "includes evidence of content relevance."
While this psychometric evaluation was largely supportive of the validity and reliability of the SAFE measures, some items, and by implication measures, had weaknesses. The primary Macro-level measure of neighborhood context, CPTED, could be improved through additional item development. It should be emphasized that no scale or measure operationally defining the CPTED construct previously existed. The measures (four subscales) were developed to fully define physical design features (Taylor, 2017). The CPTED-Surveillance subscale in particular had lower factor loadings for two of the five items, and this was found across the four age groups. While the inclusion of these two items attenuated reliability to a degree, the overall reliability value is still fair given the number of items comprising the scale. Moreover, these two items represent core concepts of the CPTED construct. Given the conceptual strength of these items, and the minimal impact on reliability, these items can still be used but with great care. The Access Control and Territorial Reinforcement subscales had limited psychometric evaluation. Given that these subscales were each operationalized by only two items, factorial validity and internal consistency reliability could not be meaningfully assessed. The CPTED-Maintenance subscale, however, showed strong psychometric characteristics across the age groups.
Similar to the CPTED measure, the four Victimization subscales showed varying degrees of psychometric viability. All four subscales exhibited strong factorial validity. However, two of the four subscales, Recent and Past, exhibited a low degree of internal consistency. This was particularly pronounced for the Recent subscale. Not surprisingly, these subscales also exhibited the most severe non-normality of all the measures. The low internal consistency likely reflects the low rates observed for some items from these subscales, and by implication the constructs being measured. For example, in the Adolescent age group, the item "In the past 12 months, how many times have you been the victim of property crimes [including theft, motor vehicle theft, burglary, vandalism]" from the Recent subscale did not significantly load on the factor in the CFA. This item is likely not as relevant for this younger age group as it is for the 3 older age groups. The low internal consistency values could also be a function of how Cronbach's alpha is calculated. In addition to the inter-item correlations, Cronbach's alpha is also a function of the number of items that compose a scale. In this case, there are 4 items per subscale. The magnitudes of the standardized factor loadings, except in the Adolescent age group, all reflect fairly strong inter-item correlations. However, when the small number of items is factored into the calculation of alpha, the overall internal consistency value is necessarily attenuated. Given this, applied researchers should use these subscales with care.
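The dependence of alpha on scale length noted in the paragraph above can be illustrated with the standardized form of the coefficient, alpha = k·r / (1 + (k − 1)·r), where k is the number of items and r is the mean inter-item correlation. The sketch below assumes equal item variances, and the value r = 0.30 is a hypothetical illustration rather than a SAFE estimate.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from the number of items and the
    mean inter-item correlation (assumes equal item variances)."""
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Same (hypothetical) mean inter-item correlation, different scale lengths.
for k in (4, 8, 12):
    print(k, round(standardized_alpha(k, 0.30), 3))
# 4 0.632
# 8 0.774
# 12 0.837
```

Holding the inter-item correlation fixed, a 4-item subscale yields a noticeably lower alpha than a longer scale, which is consistent with the attenuation described for the Recent and Past subscales.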
The four Avoidant Behaviors subscales (Daylight, Alone; Daylight, Others; Dark, Alone; Dark, Others) all exhibited factorial validity and reliability across the age groups. However, the correlations among the four subscales indicated strong redundancy; the majority of these correlations exceeded 0.70. This indicates a high degree of collinearity that could affect predictive models in which all subscales are entered as simultaneous predictor variables. For these subscales, participants' responses did not distinguish between whether the target avoidant behavior occurred alone or with others, or in daylight or after dark. Interestingly, these subscales were not redundant with other study measures, including related Avoidant measures from the same variable domain (e.g., Positive Avoidant Behaviors). At this point, it would not be prudent to use these four subscales simultaneously in the same analysis. However, it does appear that the items from these four subscales could be aggregated for use as a General Avoidant scale.
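To give a rough sense of the collinearity concern, the sketch below computes the variance inflation factor (VIF) for the simplified case of exactly two correlated predictors, where VIF = 1 / (1 − r²). The two-predictor simplification and the function name are assumptions made for illustration; the correlations used are the endpoints of the range reported for the Avoidant Behaviors subscales (0.72 to 0.91).

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when a model contains exactly two
    predictors whose Pearson correlation is r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Endpoints of the reported range of correlations among the subscales.
for r in (0.72, 0.91):
    print(r, round(vif_two_predictors(r), 2))
# 0.72 2.08
# 0.91 5.82
```

With all four subscales entered together the inflation would generally be larger than in this two-predictor case, which is one reason a single aggregated scale may be preferable.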
This study is not without other general weaknesses. Not all of the SAFE measures could be validated using a factor-analytic approach. First, many of the indexes were measured using a dichotomous rating scale. Factor analysis is based on the assumption that items have an underlying continuum, even if measured in a binary format, and this assumption could not be met for some of these indexes. Second, some of the indexes (e.g., Crime Resources Index) are more appropriately conceptualized and operationally defined using a checklist format, and thus are not represented meaningfully as a factor (or latent variable). Third, some measures were operationalized using as few as two items. These measures are similarly not amenable to factor-analytic evaluation to establish construct validity or reliability. However, these measures all showed significant unique measurement variance, as reflected by the relatively small correlations found with the (sub)scales and other summated indexes. While these indexes were not amenable to formal factor-analytic procedures, their measurement form and properties suggest that they are reasonable measures to use in future studies. Finally, several (sub)scales and summated indexes exhibited non-normality in the form of positive skewness at both the item and scale levels. This should not preclude the use of these measures in future studies. Measures of victimization, for example, are inherently skewed given the infrequent nature of victimization, but are vitally important in conceptual models of crime and physical activity (Patch et al., 2019).

Conclusion
These analyses supported the psychometric characteristics of the measures from SAFE, with the exceptions identified above. Importantly, the measurement properties were shown to be invariant (equal) across the four age groups of interest, thus facilitating cross-group comparisons in predictive models. This conclusion is consistent with the preliminary test-retest reliability evaluation reported in Patch et al. (2019). Although not reported due to space limitations, the findings reported herein were confirmed using additional statistical procedures specific to categorical data analysis (including item response theory) and accounting for clustering at the block group level. While we recommend these measures for use (with the caveat that the Avoidant Behaviors subscales not be used simultaneously), continued psychometric evaluation, and testing of other forms of measurement invariance (Ployhart and Oswald, 2004), should be pursued to further refine these instruments and enhance their validity and reliability. The detailed evaluation of the measures, especially identification of weaknesses, will inform interpretation of results in subsequent papers.

Data availability statement
Data are housed at the University of California, San Diego. Data can be made available on request to Dr. James F. Sallis (jsallis@health.ucsd.edu).

Funding
This study was funded by NHLBI grant #5R01HL117884.