Weighting the United States All of Us Research Program data to known population estimates using raking

Background: The All of Us Research Program aims to collect longitudinal health-related data from a million individuals in the United States. An inherent challenge of a non-probability sampling strategy through voluntary participation in All of Us is that findings may not be nationally representative for addressing health and health care at the population level. We generated survey weights for the All of Us data that can be used to address the challenge.
Research design: We developed raked weights using demographic, health, and socioeconomic variables available in both the 2020 National Health Interview Survey (NHIS) and All of Us. We then compared the unweighted and weighted prevalence of a set of health-related variables (health behaviors, health conditions, and health insurance coverage) estimated from All of Us data with the weighted prevalence estimates obtained from NHIS data.
Subjects: The sample included 100,391 All of Us participants 18 years of age and older with complete data collected between May 2017 and January 2022 across the United States.
Results: Final variables in the raking procedure included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. The mean percentage difference between known proportions obtained from the NHIS and All of Us was reduced by 18.89% for health-related variables after applying the raked weights.
Conclusions: Raking improved the comparability of prevalence estimates obtained from All of Us to known national prevalence estimates. Refining the process of variable selection for raking may further improve the comparability between All of Us and nationally representative data.


Introduction
The All of Us Research Program is a key component of the Precision Medicine Initiative that was launched under the direction of the White House in the United States (US) in 2015 (The Precision Medicine Initiative, 2023). The initiative leverages advances in science and technology to develop new health care models that take into account variability between individuals in genetics, the environment, socioeconomic conditions, and lifestyles (Denny et al., 2019). The goal of All of Us is to develop a longitudinal database that integrates information from self-reported surveys, electronic health records (EHRs), genomic data, physical measurements, and other health-related data from at least one million individuals (Denny et al., 2019).
A core principle of All of Us is that it prioritizes reaching populations that are historically underrepresented in biomedical research (UBR) (Mapes et al., 2020). UBR populations include members of racial/ethnic minority groups as well as groups defined by factors such as age, sex, gender, socioeconomic status, access to health care services, rurality, and/or disability (Mapes et al., 2020). The All of Us Research Program has developed strategies for recruitment that include engagement efforts with community partners to conduct outreach with UBR groups (Mapes et al., 2020; Lyles et al., 2018; Mercer et al., 2018). Despite the intentional efforts to recruit UBR populations, the COVID-19 pandemic and other recruitment challenges have made it difficult for the All of Us Research Program to reach its projected enrollment, especially among UBR populations. Recruitment of and engagement with participants during COVID-19 were limited by the lack of access to in-person activities (Hedden et al., 2023). All of Us modified its strategies for recruitment and for the collection of biospecimen data, physical measurements, and surveys, so that some activities could take place remotely while also following safety guidelines from the Centers for Disease Control and Prevention (Hedden et al., 2023). Comparing the periods before and after March 2020, when the COVID-19 pandemic hit the US, consent to participate in All of Us fell by 53 % among Black/African American adults and increased by 29 % among White adults (Hedden et al., 2022). Socioeconomically disadvantaged populations were also less likely to participate in the study than the rest of the population (Hedden et al., 2023).
The objective of this study is to generate survey weights that can facilitate the use of All of Us data for population-based research. Developing strategies to calibrate All of Us data is crucial to ensure that estimates obtained from the database are consistent with known population estimates. Understanding the patterns of health and disease at the population level can help inform the policies and programs needed to expand the utility of precision medicine at scale. Raking is particularly useful for estimating survey weights because access to All of Us data is highly protected through the use of a cloud-based platform. This data management approach protects data security and participant privacy but does not allow the use of data management and analytical/statistical software outside the All of Us cloud-based platform. Raking provides a convenient way to integrate population-level data (i.e., known population estimates/proportions for different categorical variables) with protected individual-level data to estimate survey weights.

Data source
All of Us began its data collection in May 2017 and the program had recruited over 630,000 individuals as of April 30, 2023. Persons 18 years of age and older living in the US or a territory of the US with the "legal authority and decisional capacity to consent" in either English or Spanish were eligible to participate in the program (All of Us Research Program Operational Protocol, 2021). Participants included in the database had provided consent to participate in the study, authorized the sharing of EHR data, completed baseline surveys, and provided physical measurements and at least one biospecimen for biobank storage. Key characteristics of the All of Us data are summarized in Table 1.

Benchmark data used in the raking procedure
We used the 2020 NHIS to identify benchmark variables as the population parameters to be included in the raking procedure. NHIS is an annual, in-person survey of a sample representative of US households. Persons institutionalized in long-term care or correctional facilities or on active military duty were excluded from participation. NHIS uses a multistage probability sampling design for sample selection and collects information from about 100,000 individuals every year.

Analytic sample
Participation in the different survey modules is voluntary and takes place at the pace of the respondents. As such, we selected respondents with complete data in all variables of interest, which resulted in an analytic sample of 100,391 All of Us adult participants residing in the US. Data included in this study were released on June 23, 2022 (this data release includes participants who provided data up to January 1, 2022) (Master et al., 2022).

Measures
Variables for raking. Variables available in both All of Us and the NHIS were considered for the raking procedure. The variables included were age group (18-24, 25-34, 35-44, 45-54, 55-64, 65 years of age and older), biological sex at birth (female, male), race/ethnicity (non-Hispanic White, non-Hispanic Black, non-Hispanic Asian, Hispanic, Others), Census region of residence (Northeast, Midwest, South, West), annual household income (less than $25,000, $25,000 to less than $35,000, $35,000 to less than $50,000, $50,000 or more), educational attainment (did not graduate high school, graduated high school, some college, college graduate), home ownership status (own, rent, other arrangement), sexual orientation (gay/lesbian/bisexual/something else, straight), self-reported health status (excellent, very good, good, fair, poor), usual place of care (doctor's office, clinic or health center, urgent care or minute clinic, hospital emergency room, some other place or don't go to one place most often, don't have a usual place for care), and interval since last doctor's visit (less than 1 year, 1 year to less than 2 years, more than 2 years).
Outcome variables. We selected key lifestyle risk factors, associated chronic health conditions, and health insurance coverage to compare the All of Us estimates obtained with raked weights against the national prevalence values estimated from the NHIS. Lifestyle risk factors included alcohol use (yes/no) and ever having smoked (yes/no). Associated chronic health conditions were identified through separate survey questions asking respondents whether they had hypertension, coronary artery disease, and diabetes.

Statistical analyses
We applied iterative proportional fitting (IPF, known also as raking) to estimate survey weights for the cohort of US adults included in our analytic sample. We followed the general approach used to compute sampling weights for the survey datasets of the American National Election Studies (ANES) (DeBell and Krosnick, 2009; Pasek et al., 2014). Participation in the All of Us Research Program is voluntary and, as such, weighting the data following the generally recommended approach for the ANES survey datasets may prove useful to adjust the data to known population values.
We first identified categorical variables whose study proportions in All of Us could be compared with known population proportions in NHIS (Deville et al., 1993). These variables were described in the Measures section above. In raking, the study proportions were compared to the known proportions one variable at a time, and they were adjusted by multiplying existing sampling weights by the ratio of the known proportion to the study proportion. The process was applied sequentially to each of the selected variables, adjusting the existing weight calculated in the previous step with a new weight adjusted for the new variable being considered in the raking process. The process was repeated until the weighted proportions for all the variables considered were as close as possible to the known proportions (this was accomplished using a prespecified variable discrepancy tolerance level) (Deville and Sarndal, 1992). The final selection of variables to be included in constructing weights via raking was based on eliminating bias (i.e., the difference between the All of Us study prevalence estimates and the benchmark values obtained from the NHIS). Following the guidance developed by the ANES committee, we excluded variables with less than a five percentage point discrepancy (Pasek et al., 2014). We used the anesrake R package (version 0.80; available at the Comprehensive R Archive Network) to implement the raking procedure.
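The adjustment loop described above can be illustrated with a minimal sketch. It is written in plain Python rather than R, it is not the anesrake implementation or the code used in this study, and it leaves out refinements such as weight trimming. The function names (`weighted_shares`, `rake`, `weighted_prevalence`), the tolerance and iteration defaults, and the toy sample and benchmark margins are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_shares(records, weights, var):
    """Weighted proportion of respondents in each category of `var`."""
    total = sum(weights)
    shares = defaultdict(float)
    for rec, w in zip(records, weights):
        shares[rec[var]] += w / total
    return shares


def rake(records, targets, tol=1e-6, max_iter=100):
    """Rake unit starting weights to the margins in `targets`.

    records: one dict per respondent, e.g. {"sex": "female", "age": "18-44"}
    targets: {variable: {category: known population proportion}}
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        # One pass: adjust to each variable's margin in turn, multiplying
        # each weight by (known proportion / current study proportion).
        for var, known in targets.items():
            study = weighted_shares(records, weights, var)
            weights = [w * known[rec[var]] / study[rec[var]]
                       for rec, w in zip(records, weights)]
        # Stop once every margin is within the discrepancy tolerance.
        gap = max(abs(weighted_shares(records, weights, var)[cat] - p)
                  for var, known in targets.items()
                  for cat, p in known.items())
        if gap < tol:
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]  # rescale so the mean weight is 1


def weighted_prevalence(outcomes, weights):
    """Weighted share of respondents with a yes (1) outcome."""
    return sum(y * w for y, w in zip(outcomes, weights)) / sum(weights)


# Hypothetical toy sample and benchmark margins (not study data).
sample = ([{"sex": "female", "age": "18-44"}] * 2
          + [{"sex": "female", "age": "45+"}] * 5
          + [{"sex": "male", "age": "18-44"}] * 1
          + [{"sex": "male", "age": "45+"}] * 2)
benchmarks = {"sex": {"female": 0.5, "male": 0.5},
              "age": {"18-44": 0.4, "45+": 0.6}}
raked = rake(sample, benchmarks)
```

In this toy example, `weighted_shares(sample, raked, "sex")` and `weighted_shares(sample, raked, "age")` return proportions that match the benchmark margins to within the tolerance, and `weighted_prevalence` can then be applied to any yes/no outcome coded as 1/0.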
All of Us data were accessed through the Researcher Workbench, a cloud-based platform that supports data analysis and collaboration in a Jupyter Notebook environment. All data management and statistical analyses were conducted using Structured Query Language (SQL) and the R (version 4.2.1) software environment for statistical computing. All of Us participants provided consent to participate in the study, the protocol of which was approved and is actively monitored by the All of Us Institutional Review Board (IRB). All data in All of Us were deidentified; therefore, the requirement for institutional IRB review was waived for this study.

Table 2a presents a comparison of population proportions of NHIS and study proportions of All of Us, both unweighted and weighted with raked weights. The variables used in the raking procedure for this study included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. The mean percentage difference between the NHIS and unweighted All of Us estimates was 6.69 %. When compared to NHIS estimates, the unweighted age distribution of All of Us had a lower proportion of younger adults and a higher proportion of older adults. The unweighted All of Us sample also had higher proportions of female adults; adults identifying as gay, lesbian, bisexual, or something else; non-Hispanic White adults; college graduates; participants with other living arrangements; residents of the Northeast, Midwest, and West regions; participants reporting very good, good, or fair health; individuals whose usual source of care was a doctor's office, clinic, or health center; and individuals who had visited a doctor within the past year.
Table 2b reports the prevalence of health and health behavior outcomes for All of Us adults estimated using unweighted data and data with raked weights. Among the demographic and socioeconomic variables, the mean percentage difference between NHIS and All of Us after applying raked weights was 3.21 %. Applying the raked weights reduced the mean percentage difference between known proportions obtained from the NHIS and All of Us by 51.94 % (i.e., from 6.69 % to 3.21 %) among demographic and socioeconomic variables and by 18.89 % (i.e., from 6.25 % to 5.07 %) among health and health behavior outcome variables (hypertension, coronary artery disease, diabetes mellitus, alcohol use, ever smoker, and health insurance coverage).
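The relative reductions reported above follow from the before and after means. The short Python sketch below recomputes them from the rounded values given in the text; the function names are hypothetical, and the definition of the mean percentage difference as a mean absolute gap across categories is an assumption made for illustration, since the text does not spell out the formula.

```python
def relative_change(before, after):
    """Relative reduction, in percent, when moving from `before` to `after`."""
    return (before - after) / before * 100


def mean_abs_difference(benchmark, study):
    """Mean absolute gap in percentage points across shared categories.

    Assumed definition of the "mean percentage difference"; both arguments
    map category names to prevalence estimates in percent.
    """
    return sum(abs(benchmark[k] - study[k]) for k in benchmark) / len(benchmark)


# Rounded means reported in the text.
outcomes_reduction = relative_change(6.25, 5.07)      # about 18.88
demographic_reduction = relative_change(6.69, 3.21)   # about 52.02
```

Recomputing from the rounded means gives about 18.88 % and 52.02 %, close to the reported 18.89 % and 51.94 %; the small gaps are presumably due to rounding of the published means.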

Discussion
Our findings suggest that applying raked weights can better align prevalence estimates obtained from All of Us data with prevalence estimates obtained from the NHIS for health-related behaviors, conditions, and health insurance coverage. In this study, raking reduced the mean percentage difference between known proportions obtained from the NHIS and All of Us estimates of health-related variables by about one fifth (18.89 %). The reduction in bias was obtained by calculating raked weights based on age, sex, race/ethnicity, region of residence, annual household income, and home ownership variables.
The Pew Research Center conducted a comprehensive weighting study with 30,000 online opt-in panel interviews with three commercial panel vendors (the vendors were not named in the study) (Mercer et al., 2018). The purpose of the study was to compare weighting using raking, propensity score weighting, and matching. Variables considered in the weighting procedure included demographic variables (age, sex, race/ethnicity, education, and region of residence) and other variables related to political attitudes (voter registration, political party, and religion). The study evaluated different sample size scenarios (up to 8,000 interviews) and found an 8.4 percentage point difference between 24 benchmark variables from the American Community Survey (and other gold standard surveys such as the Current Population Survey and the General Social Survey (GSS)) and the unweighted variables from the opt-in panel surveys. Bias was reduced by about 30 % with the three weighting methods, and the differences between them were very small. The study concluded that accuracy can be improved by carefully choosing the right variables for weighting and adding political variables (i.e., variables related to the main topic of interest in the survey work), and that raking performs as well as propensity score weighting and matching (Mercer et al., 2018).
Other studies have come to similar conclusions. An online opt-in survey (n = 1,017) consisting of 49 multiple-choice attitudinal questions from the GSS and random-digit dialing surveys from the Pew Research Center found that the median absolute difference between the opt-in (non-probability) and probability surveys was 9.1 percentage points; statistical adjustment using raking reduced this to 7 percentage points (Goel et al., 2017). A study comparing raking and poststratification adjustments for the 2013 South Australian Monitoring and Surveillance System (n = 7,193) found that raked weights (to adjust the data to the 2011 Census) substantially improved the accuracy of prevalence estimates for health conditions and behavioral risk factors (Dal Grande et al., 2015). The study concluded that raked weights calculated using demographic variables as well as dwelling status (home ownership) and other variables reduce nonresponse bias and incorporate "lower socioeconomic groups and those who are routinely not participating in population surveys into the weighting formula" (Dal Grande et al., 2015).
Applying raked weights can improve the accuracy of estimates obtained from All of Us but it can also lead to more variation around mean prevalence estimates. Other important considerations include minimizing the difference between the data collection date of All of Us variables and the reference date of the benchmark value variable, addressing missing data, making sure that there are enough All of Us participants in each category of each variable to obtain meaningful estimates, and considering whether there is a need to develop raked weights that vary over time given that each year more participants join the All of Us Research Program and the study has been collecting data on participants since 2017. Still, the approach used here is useful given the non-probability sampling approach used in All of Us, the difficulties faced by the All of Us Research Program in collecting data during the COVID-19 pandemic, and the data protection and access limitations in place that do not allow the use of data management and analytical/statistical software outside the All of Us Researcher Workbench.
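The loss of precision mentioned above can be gauged with Kish's approximate design effect, a standard summary that was not reported in this study. The Python sketch below is an illustration only; the function names and the example weights are hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def design_effect(weights):
    """Approximate variance inflation due to unequal weights: n / n_eff."""
    return len(weights) / kish_effective_n(weights)


# Hypothetical weights: equal weights lose nothing, unequal weights do.
equal = [1.0, 1.0, 1.0, 1.0]
unequal = [0.5, 1.5]
```

With equal weights the effective sample size equals the actual sample size and the design effect is 1. The more variable the raked weights, the smaller the effective sample size and the wider the intervals around weighted prevalence estimates.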

Table 1
Features of the All of Us Research Program.
Remote (online) or in-person enrollment through health care provider organizations and community partners
Participant diversity: Emphasis on populations underrepresented in biomedical research (UBR): non-White, 65 years of age and older, intersex, identified with neither man nor woman, low income, without a high school diploma, no health insurance coverage, living in rural area, with a physical or cognitive disability, or pregnant
Benefits

Table 2a (continued)
Variables used in raking: age, sex, race/ethnicity, region of residence, annual household income from all sources, and home ownership.

Table 2b
Prevalence of health and health behavior outcomes for adults 18+ from the May 2017-January 2022 All of Us Research Program estimated using unweighted data and data with raked weights (N = 100,391)*.
Mean percentage difference between NHIS and unweighted All of Us (%): 6.25
Mean percentage difference between NHIS and raked All of Us (%): 5.07
Absolute change in mean percentage difference after raking (%): 1.18
Relative change in mean percentage difference after raking (%): 18.89
*Variables used in raking: age, sex, race/ethnicity, region of residence, annual household income from all sources, and home ownership.