Recruitment and Retention in Remote Research: Learnings From a Large, Decentralized Real-world Study

Background: Smartphones are increasingly used in health research. They provide a continuous connection between participants and researchers to monitor long-term health trajectories of large populations at a fraction of the cost of traditional research studies. However, despite the potential of smartphones in remote research, there is an urgent need to develop effective strategies to reach, recruit, and retain target populations in a representative and equitable manner.
Objective: We aimed to investigate the impact of combining different recruitment and incentive distribution approaches used in remote research on cohort characteristics and long-term retention. We also evaluated the real-world factors that significantly affect active and passive data collection.
Methods: We conducted a secondary data analysis of participant recruitment and retention using data from a large remote observational study aimed at understanding real-world factors linked to cold, influenza, and the impact of traumatic brain injury on daily functioning. We conducted recruitment in 2 phases between March 15, 2020, and January 4, 2022. Over 10,000 smartphone owners in the United States were recruited to provide 12 weeks of daily surveys and smartphone-based passive-sensing data. Using multivariate statistics, we investigated the potential impact of different recruitment and incentive distribution approaches on cohort characteristics. Survival analysis was used to assess the effects of sociodemographic characteristics on participant retention across the 2 recruitment phases. Associations between passive data-sharing patterns and demographic characteristics of the cohort were evaluated using logistic regression.
Results: We analyzed over 330,000 days of engagement data collected from more than 10,000 participants. Our key findings are as follows. First, the overall characteristics of participants recruited using digital advertisements on social media and news media differed significantly from those of participants recruited using crowdsourcing platforms (Prolific and Amazon Mechanical Turk; P<.001). Second, participant retention varied significantly across study phases, recruitment sources, and socioeconomic and demographic factors (P<.001). Third, notable differences in passive data collection were associated with device type (Android vs iOS) and participants' sociodemographic characteristics. Black or African American participants were significantly less likely to share passive sensor data streams than non-Hispanic White participants (odds ratio 0.44-0.49, 95% CI 0.35-0.61; P<.001). Fourth, participants were more likely to complete baseline surveys if the surveys were administered immediately after enrollment. Fifth, technical glitches could substantially disrupt real-world data collection in remote settings, undermining the generation of reliable evidence.
Conclusions: Our findings highlight several factors that could affect remote long-term data collection, such as recruitment platforms, incentive distribution frequency, the timing of baseline surveys, device heterogeneity, and technical glitches in the data collection infrastructure. Together, these empirical findings could help inform best practices for monitoring anomalies during real-world data collection and for recruiting and retaining target populations in a representative and equitable manner.


Introduction
Background
Smartphones offer an unprecedented anytime-anywhere medium for researchers to engage with and assess health-related behaviors in large populations in real-world settings [1,2]. As of 2020, smartphone ownership in the United States had reached over 80% [3]. The large-scale, high-frequency daily use of such devices, coupled with increasingly multimodal onboard sensing capabilities, offers an effective approach for conducting large-scale health research [4,5]. The adoption of digital health tools to develop and deploy digitally augmented trials has risen steadily since the first fully remote decentralized trial in 2011 [6][7][8]. Recent studies have shown the benefits of remote monitoring using smartphones for assessing real-world behavior [9,10]; for chronic pain management [11], cancer care [12], diabetes management [13], assessment of Parkinson symptom severity [14], and cardiovascular health [15]; and for the delivery of remote interventions [16]. The COVID-19 pandemic further accelerated this growth, enabling over 220 digitally augmented trials in 2021 alone [17,18].
Using smartphones for health research can also help achieve operational efficiency by relying less on traditional research facilities or intermediaries for data collection, which require in-person contact between the study participants and the research team [6,19,20]. Researchers can communicate asynchronously and synchronously with participants and assess their health by actively and passively collecting individualized real-world data [4,21,22]. Active data are defined as data generated through effortful participation (eg, completing a survey). In contrast, passive data are collected without direct input from participants (eg, the number of daily steps estimated through onboard sensors) [23]. Such scalable remote observational models [6,20] could help investigators to understand people's day-to-day experiences of living with a health condition [4] and the relationship between individualized real-world behavior and health outcomes [22].

Challenges in Remote Participant Recruitment and Retention
However, despite the promise of decentralized health research, several challenges related to the representativeness and inclusiveness of recruitment and the retention of target populations have surfaced [21,24,25], resulting in sparse, unbalanced, and nonrepresentative real-world data collection [21]. Typically, decentralized studies recruit from various web-based sources such as social media (Facebook [26] and Reddit [27]), crowdsourcing platforms (Prolific [28]; Amazon Mechanical Turk, MTurk [29]; Centiment [30]; and CloudResearch [31]), and partnerships with patient registries or advocacy groups [32,33]. Although these recruitment channels have shown the potential to reach and recruit large populations remotely [34][35][36], the long-term and uniform retention of remote participants has been challenging. Recent findings show that retention rates vary from 1% to 50% [24] and that monetary incentives can significantly improve long-term retention [10]. However, large-scale open recruitment, particularly when combined with financial incentives, increases the risk of enrolling gamers or malicious actors [37].
With large studies using multiple web-based sources to reach and recruit participants remotely, there is a need to assess the impact that such strategies have on the characteristics of the enrolled cohorts and their retention in the studies. In addition, further research is needed to understand how variations in study participation incentives (eg, time and frequency of payments) and differences between Android and iOS operating systems [38] affect long-term data collection in decentralized studies.

Objectives
To investigate some of these challenges in collecting health data through smartphones in real-world settings, we examined the recruitment, retention, and passive data-sharing patterns of more than 10,000 participants in a large, decentralized research study. Specifically, we evaluated the following three key questions: (1) Does combining different recruitment and incentive distribution approaches lead to a heterogeneous cohort with varying characteristics? (2) Can the participant retention and uniformity of data collection in remote studies be affected by cohort heterogeneity? (3) What are the factors that can affect passive data collection in real-world settings?

Ethics Approval
This study was approved and monitored by the Institutional Review Board of the University of Washington (STUDY00004997) and the Department of Defense Human

Study Overview
The participants in the Warfighter Analytics Using Smartphones for Health (WASH) study were volunteers who lived in the United States and agreed to engage in a 12-week smartphone-based study. The primary goal of the study was to understand the real-world factors that could help with the early prediction of cold, influenza, and the impact of traumatic brain injury on daily functioning. Eligible participants were individuals aged ≥19 years who spoke English, resided in the United States, and owned and were the primary users of an iPhone or Android smartphone with internet access. Potential participants were required to complete an eligibility screener before consenting, and those who did not meet the inclusion criteria could not complete subsequent procedures.

Recruitment
Participant recruitment started on March 15, 2020, with rolling enrollment until January 4, 2022. Participants were recruited in 2 phases using different recruitment and incentive distribution approaches (Figure 1). Participants could receive up to US $90 for completing the baseline survey and 12 weeks of follow-up surveys; the final incentive was determined by the number of completed surveys. Participants were not informed of the financial breakdown during the consent process; however, details on when and how much they would be paid were provided upon request. Participants recruited during phase 1 were paid weekly (12 times) starting from their first day in the study. Participants who were still taking part up to October 3, 2020 (ie, those recruited close to the recruitment pause date) continued to receive weekly payments. Participants recruited during phase 2 were paid every 3 weeks (a total of 4 times) starting from their first day in the study.
Figure 1. (B) Details of smartphone-based active and passive data collected through the study app during the study observation period (84 days). *Sharing of accelerometer and gyroscope data was made mandatory on August 28, 2020.

Participation Incentives
Phase 1 (March 15, 2020, to July 11, 2020)
Participants were primarily recruited through advertisements on social media platforms that directed potential participants to a study recruitment website. Press releases in local news outlets also served as a recruitment source [39,40]. Participants recruited during this phase were paid weekly on the basis of the number of days on which they completed all daily surveys. The amount received per day increased throughout the 12 weeks (eg, approximately US $4 in weeks 2 to 4, approximately US $6 in weeks 5 to 8, and US $7 in weeks 9 to 11), with larger payments in weeks 1 and 12 because of the higher incentives for completing the baseline survey in week 1 and the exit survey in week 12. However, a sharp increase in study enrollment in June 2020, which was inconsistent with planned recruitment, led the study team to pause enrollment on July 11, 2020. Analysis of participant activity during this period indicated that some malicious actors had enrolled in the study. Further details on the assessment of malicious actors can be found in the study by Bracken et al [37].

Phase 2 (August 30, 2020, to the Data Freeze Date, February 9, 2022)
In phase 2, several measures were introduced to reduce enrollment by malicious actors, such as disallowing the autofilling of surveys in the study app, changing recruitment sources, and changing the incentive payment frequency [37]. Participants were recruited from 2 web-based recruitment platforms, Prolific and MTurk, from January 4 to December 28, 2021, and from May 15 to December 21, 2021, respectively. Prolific is a web-based research platform that includes several safeguards for preserving data quality [41][42][43][44] and minimizing the participation of gamers or malicious actors; it has been shown to be reliable, efficient, and affordable for remote data collection in behavioral research [45]. Similar to Prolific, MTurk is a web-based crowdsourcing platform regularly used in health research to recruit study participants to complete tasks such as data processing, problem-solving, and surveys [46]. In phase 2, participants were paid every 3 weeks, and the first incentive payment could take up to 5 weeks. The payment schedule was changed to allow sufficient time to execute procedures intended to identify malicious actors.

Active Surveys
Assessments consisted of 1 longer baseline survey and brief daily assessments. In phase 1, the baseline survey, which assessed participants' health history, mood, physical activity, and phone use, was administered 24 hours after consent was obtained; in phase 2, it was administered immediately after consent. In both phases, participants were administered the same scheduled health-related surveys twice daily for 12 weeks. These daily surveys asked participants about their mood, physical activity, and phone use.

Sensor-Based Data Collection
Sensor-based data were collected actively and passively through the study app. Participants completed performance outcome measures [47] such as standing and walking tests and provided voice recordings. Participants were also asked to allow the study app to collect passive data from their smartphones. Passive data included, but were not limited to, device movement and orientation; actual and relative location; the device's status (eg, active use or connected to a data network); and local environmental information such as ambient light, temperature, and humidity. Participants could decline to share passive data and still remain in the study. However, all participants enrolled on or after August 28, 2020 (before the start of phase 2), were required to allow the study app to passively collect accelerometer and gyroscope data from their smartphones.

Overview
All the data collected from the participants were deidentified. The data collected through the app were encrypted on the phone and stored on secure servers, separate from any identifiable information. Raw data, such as image, proximity, voice, and actual location data of participants, were stored separately from all other sensor data and were not shared with the research team. For this study, data from the enrolled participants between the study launch date (March 15, 2020) and the data freeze date (February 9, 2022) were used for analysis.

Data Cleaning
Before analysis, data from 6788 suspected malicious actors were removed based on the rules for flagging such actors that were defined in the study by Bracken et al [37]. Test data collected before the study launch date on March 15, 2020, were removed. If a survey was submitted more than once, we used the most recent submission to assess the participant's compliance in the study. If participant responses had values outside the expected range of valid values, they were marked as invalid data.
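The cleaning rules above can be sketched in Python. This is a minimal illustration, not the study's pipeline: the `SurveyResponse` record, the `valid_ranges` mapping, and the set of flagged IDs are hypothetical stand-ins, and the flagging rules of Bracken et al [37] are assumed to have been applied upstream to produce `flagged_ids`.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical record layout; the study's actual schema is not described here.
@dataclass
class SurveyResponse:
    participant_id: str
    survey_id: str
    submitted_at: datetime
    answers: dict = field(default_factory=dict)  # item name -> numeric value

STUDY_LAUNCH = datetime(2020, 3, 15)

def clean_responses(responses, flagged_ids, valid_ranges):
    """Return (latest, invalid).

    latest maps (participant, survey) to the most recent submission;
    invalid lists (participant, survey, item) triples outside valid_ranges.
    """
    latest = {}
    for r in responses:
        if r.participant_id in flagged_ids:
            continue  # drop suspected malicious actors (flagged upstream)
        if r.submitted_at < STUDY_LAUNCH:
            continue  # drop test data collected before the launch date
        key = (r.participant_id, r.survey_id)
        if key not in latest or r.submitted_at > latest[key].submitted_at:
            latest[key] = r  # keep only the most recent resubmission

    invalid = []
    for (pid, sid), r in latest.items():
        for item, value in r.answers.items():
            lo, hi = valid_ranges.get(item, (float("-inf"), float("inf")))
            if not lo <= value <= hi:
                invalid.append((pid, sid, item))  # mark as invalid, keep the record
    return latest, invalid
```

Out-of-range answers are marked rather than deleted, mirroring the text, so that compliance (whether a survey was submitted) can still be counted separately from data validity.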

Data Harmonization
To investigate participant retention in the study, we classified the data collected by the study app into two broad categories: (1) survey data, representing any active survey data shared by participants through the study app, and (2) sensor data, representing passive continuous sensor data gathered by the study app without active input from participants as well as active sensor data collected during a performance outcome assessment (eg, walking test data collected from accelerometers and gyroscopes).
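A minimal Python sketch of this harmonization, together with the "last day of data sharing" that the retention analysis relies on, is shown below. It assumes a hypothetical flat record format of (participant ID, stream name, study day) and hypothetical stream labels. Any stream that is not a survey is treated as sensor data, which matches the definition above because active sensor tasks such as the walking test also count as sensor data.

```python
from collections import defaultdict

# Hypothetical survey stream labels; the study app's real stream names are not listed.
SURVEY_STREAMS = {"baseline_survey", "daily_survey", "exit_survey"}

def category(stream):
    """Map a raw stream name to one of the two harmonized categories."""
    return "survey" if stream in SURVEY_STREAMS else "sensor"

def last_day_by_category(records):
    """records: iterable of (participant_id, stream, study_day) tuples,
    where study_day counts from 1 at enrollment.

    Returns {participant_id: {"survey": day, "sensor": day, "any": day}},
    omitting a category for which the participant shared no data.
    """
    last = defaultdict(dict)
    for pid, stream, day in records:
        for cat in (category(stream), "any"):
            last[pid][cat] = max(day, last[pid].get(cat, 0))
    return dict(last)
```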

Overview
Statistical analyses were performed using data from 10,768 participants after excluding 6788 malicious actors from the data set (6788/17,556, 38.66%). Descriptive analyses of recruitment and cohort characteristics for categorical variables were based on frequencies and percentages. Levels of categorical baseline variables that contained <5% of the cohort were omitted or combined with other such levels to reduce data sparsity in the analysis. Continuous variables that were not normally distributed were summarized as medians with 25th and 75th percentiles (IQRs). Differences in cohort characteristics were compared using bivariate analysis methods: the chi-square test for categorical variables (or the Fisher exact test when table cell counts were <5) and the Mann-Whitney U test for continuous variables. We used logistic regression models to assess statistically significant associations between patterns of passive data sharing and participants' sociodemographic characteristics and technical variables, namely race, ethnicity, age, sex, education level, income level, device type, and recruitment phase. Specifically, we compared 3 data-sharing patterns in which participants shared at least 25% (2/8), 50% (4/8), or 75% (6/8) of the 8 passive data streams common to Android and iOS devices. The 95% CIs and P values were computed using a Wald Z distribution approximation.
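The outcome definitions and Wald intervals described above can be sketched as follows. This is a minimal Python illustration, not the study's R code: `sharing_patterns`, `wald_odds_ratio`, and `wald_p_value` are hypothetical helpers, the stream names passed in are placeholders, and the coefficient and standard error are assumed to come from a separately fitted logistic regression. For a log-odds coefficient β with standard error SE, the odds ratio is exp(β), the 95% CI is exp(β ± 1.96·SE), and the two-sided P value uses Z = β/SE.

```python
import math

def sharing_patterns(shared_common_streams, n_common=8, thresholds=(2, 4, 6)):
    """Binary outcomes for the 3 data-sharing patterns: did the participant
    share at least k of the n_common passive streams common to Android and iOS?"""
    n_shared = len(set(shared_common_streams))
    if n_shared > n_common:
        raise ValueError("more shared streams than common streams")
    return {k: n_shared >= k for k in thresholds}

def wald_odds_ratio(beta, se, z=1.959964):
    """Odds ratio and Wald 95% CI from a logistic regression coefficient
    (on the log-odds scale) and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p_value(beta, se):
    """Two-sided P value from the Wald Z statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

Each threshold yields its own binary outcome, so 3 separate models are fitted, one per pattern; an odds ratio below 1 for a group means lower odds of meeting that sharing threshold than the reference group.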
We adjusted P values by using false discovery rate correction to correct for multiple comparisons across different sensor types. The analyses were conducted using R (version 4.1.1). Statistical significance was assumed when the false discovery rate-corrected P value was <.05.
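The text does not name the false discovery rate procedure. Assuming the common Benjamini-Hochberg method (what R's `p.adjust(p, method = "fdr")` computes, since `"fdr"` is an alias of `"BH"`), the adjustment across sensor types can be sketched in plain Python:

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P values
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An adjusted value below .05 corresponds to the significance criterion stated above.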

Retention Analysis
To examine overall retention in the study, we used the univariate Kaplan-Meier survival curves [48], which were tested for statistically significant differences using the nonparametric log-rank test [49]. A participant's last day in the study was determined by the last day of their data sharing. To assess the difference in retention between active and passive data sharing, we also computed study retention for active and passive data streams separately. We used right-censored data for the Kaplan-Meier estimator, given that participants could have continued to use the study app beyond the end of the study period (84 days).
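The retention definition and the right-censored Kaplan-Meier estimator can be sketched in Python. This is an illustrative implementation rather than the study's R code, and it assumes one concrete censoring rule: a participant whose last day of data sharing is day 84 or later is censored at day 84, and anyone whose last day falls before day 84 counts as a dropout on that day.

```python
STUDY_DAYS = 84  # study observation period per participant

def to_survival(last_day):
    """Convert a participant's last day of data sharing into (time, event).
    Participants still sharing data on day 84 or later are right-censored."""
    return min(last_day, STUDY_DAYS), last_day < STUDY_DAYS

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each day with at least one dropout."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        day = data[i][0]
        dropouts = removed = 0
        while i < len(data) and data[i][0] == day:
            dropouts += data[i][1]  # True counts as 1 (event), False as 0 (censored)
            removed += 1
            i += 1
        if dropouts:
            survival *= 1 - dropouts / at_risk
            curve.append((day, survival))
        at_risk -= removed  # both dropouts and censored leave the risk set
    return curve

def median_retention(curve):
    """First day on which estimated retention falls to 0.5 or below."""
    for day, s in curve:
        if s <= 0.5:
            return day
    return None  # median not reached within the observation period
```

Censored participants contribute to the risk set up to their last day without counting as dropouts, which is why completers at day 84 do not pull the curve down.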
To assess the joint effect of multiple variables of interest, including sociodemographics, on participants' retention in the study, we initially used a multivariate Cox proportional hazards (CoxPH) model [50]. However, one of the key assumptions of CoxPH models, that the effect of covariates does not change over time, was not met when tested using Schoenfeld residuals for individual covariates [51]. Multimedia Appendix 1 presents the test statistics showing that this assumption was not met. Because the underlying retention data did not support the CoxPH model assumption, we used the nonparametric log-rank test [52] to assess the statistically significant impact of individual variables on retention within each phase. We then cross-compared the median retention for each level of a variable of interest across the 2 study phases.
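For reference, a two-group log-rank test can be sketched in plain Python. This is a minimal illustration, not the study's implementation: it compares 2 groups only, whereas variables with more than 2 levels need the k-group generalization with k-1 degrees of freedom (as computed, for example, by R's `survival::survdiff`). Times and event flags follow the same right-censoring convention as the Kaplan-Meier sketch.

```python
import math

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test for right-censored data.
    Returns (chi-square statistic with 1 df, two-sided P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_days = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for day in event_days:
        n = sum(1 for t, _, _ in pooled if t >= day)               # at risk, both groups
        n_a = sum(1 for t, _, g in pooled if t >= day and g == 0)  # at risk, group A
        d = sum(1 for t, e, _ in pooled if t == day and e)         # dropouts, both groups
        d_a = sum(1 for t, e, g in pooled if t == day and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of group A dropouts on this day
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

Unlike the CoxPH model, the log-rank test makes no proportional hazards assumption for estimation, which is why it remains usable when the Schoenfeld-based check fails, at the cost of testing one variable at a time.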

Recruitment
As of the data freeze date (February 9, 2022), the study had recruited 10,768 participants. Most participants (6494/10,768, 60.31%) were recruited during phase 1, and the remaining participants (4274/10,768, 39.69%) were recruited during phase 2 (see the Methods section). A substantial proportion of participants, most notably in phase 1, did not complete the baseline survey (phase 1: 3135/6494, 48.27%, vs phase 2: 918/4274, 21.47%). Figure 2 compares the study recruitment rate with the baseline survey submission rate over time. The number of baseline surveys completed was generally in line with the number of participants recruited during the study period. Recruitment peaked in mid-April and mid-May 2020 in phase 1 and in mid-January and early March 2021 in phase 2. However, in phase 1, between May and July 2020, the number of baseline surveys completed was substantially lower than the number of recruited participants, which explains the large proportion of missing baseline data in phase 1. We further assessed the effect of missing baseline surveys on participant engagement in the study (see the Retention Analysis section). Additional statistics on missingness and invalid data entries in the baseline surveys are summarized in Multimedia Appendix 2.
Figure 2. Comparison of the 7-day moving average of the number of participants recruited (black) and the number of baseline surveys completed (red) during the study period. The gray shaded area shows the recruitment pause from July 11, 2020, to August 29, 2020.
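The smoothing in Figure 2 can be reproduced with a simple moving average over daily counts. The text does not state whether the 7-day window is trailing or centered; the Python sketch below assumes a trailing window and averages over the days available so far at the start of the series.

```python
def moving_average(daily_counts, window=7):
    """Trailing moving average of a daily count series; the first
    window-1 days average over the days available so far."""
    out = []
    running = 0
    for i, count in enumerate(daily_counts):
        running += count
        if i >= window:
            running -= daily_counts[i - window]  # drop the day leaving the window
        out.append(running / min(i + 1, window))
    return out
```

Applying the same function to daily enrollments and daily baseline completions yields 2 series that can be compared directly, as in Figure 2.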

Cohort Characteristics
Most of the participants who completed the baseline sociodemographic survey were female (3817/6574, 58.06%

Passive Data Sharing
The number of data modalities passively collected by the study app differed between the Android (31 data modalities) and iOS (14 data modalities) operating systems. This variation reflects differences in the available onboard sensors and in data collection restrictions across the two operating systems [38]. Of the 31 Android passive data streams, 18 (58%) were shared by at least 50% of the Android users across the 2 study phases (Table 2). In contrast, 86% (12/14) of the distinct passive data streams were shared by at least 50% of the participants using iOS devices. Multimedia Appendix 4 summarizes data-sharing proportions per sensor, stratified by Android and iOS devices. None of the participants with iOS devices shared passive data from the camera or barometer. Similarly, participants with Android devices did not share any data from some passive data streams, including temperature, camera, and humidity (Multimedia Appendix 4). This variation in passive data sharing could also be linked to the heterogeneity and unavailability of specific sensors on some devices. It is worth noting that phase 2 of the study required participants to share accelerometer and gyroscope data passively. However, a small yet notable proportion of the cohort recruited in phase 2 did not share accelerometer (503/4089, 12.31%) and gyroscope (856/4089, 20.89%) data.
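The per-device summary above (streams shared by at least 50% of a device group's users) reduces to a simple count. The Python sketch below uses a hypothetical input format, a dictionary from stream name to the number of users who shared it; it is not taken from the study's code.

```python
def majority_shared(shared_counts, n_users, threshold=0.5):
    """Count passive streams shared by at least `threshold` of a device group's users.

    shared_counts: dict mapping stream name -> number of users who shared it.
    Returns (streams meeting the threshold, total streams, rounded percentage).
    """
    total = len(shared_counts)
    meeting = sum(1 for k in shared_counts.values() if k / n_users >= threshold)
    return meeting, total, round(100 * meeting / total)
```

With made-up counts in which 18 of 31 streams clear the threshold, the function returns (18, 31, 58), the same arithmetic as the Android figure reported above; 12 of 14 streams gives (12, 14, 86).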
In addition, across the 8 passive data streams that were common between Android and iOS devices, the participants' passive data sharing was linked to sociodemographic characteristics and device type. In total, 3 data-sharing patterns of participants sharing at least 2 (25%), 4 (50%), or 6 (75%) of the total 8 passive data streams were tested. Across all 3 data-sharing patterns, Black or African American participants were found to be statistically significantly less likely to share passive sensor data than non-

Participant Retention
The median retention time of the overall cohort was 38 days, within the 84-day study observation period. No meaningful difference was observed in cohort retention across the active (median 37 days) and passive (median 38 days) data streams (Multimedia Appendix 6). The sensitivity analysis of participant retention also showed no significant difference in median survival across the active and passive data streams (Multimedia Appendix 7). Consequently, all subsequent retention analyses were conducted by combining the active and passive data streams.
Notable differences in retention were observed between the populations recruited in phases 1 and 2. Participants recruited in phase 2 had a significantly higher median retention (+14 days) than those recruited in phase 1 (phase 1: median 37 days; phase 2: median 51 days; P<.001; Figures 3A and 3B). It is worth noting that certain characteristics, including socioeconomic factors, affected participant retention differently in the cohorts recruited in phases 1 and 2 (Table 2). In phase 2, participants who completed the baseline survey, which was administered immediately after enrollment, were retained for a significantly longer period than those who did not (median 59 vs 19 days; Figure 3D). However, the same trend was not observed for participants recruited from social media platforms in phase 1. In addition, in phase 1, the non-Hispanic White population was retained in the study for significantly longer (median 37 days) than the Hispanic or Latino population (median 20 days; Figure 3G). No meaningful differences were observed between the non-Hispanic White and Hispanic, Latino, or Spanish populations in phase 2 (Figure 3H). Education level mainly affected retention in phase 1, where participants reporting a high school education or lower had a much shorter retention (median 5 days) than other participants (median ≥36 days). Such a large difference in retention by educational level was not seen in the population recruited from crowdsourcing platforms in phase 2 (Figures 3I-3K). Participants' self-reported income was also significantly associated with retention in phase 1 only: participants with incomes of <US $49,999 were retained longer than participants earning >US $100,000 (phase 1: median retention 34 vs 24 days, respectively; P<.001; Figures 3I and 3J).
We also noticed a dramatic difference in median participant retention between Android and iOS users enrolled in phase 1 (iOS 37 days and Android 22 days; P<.001). Table 2 and Multimedia Appendix 8 provide additional results and details on the survival analysis.

Principal Findings
Our results from the analysis of over 330,000 days of engagement data collected from over 10,000 participants in real-world settings showed that combining different recruitment and incentive distribution approaches can yield heterogeneous cohorts. To the best of our knowledge, this is one of the first studies to empirically assess real-world differences in participants' sharing of multimodal passive data collected from iOS and Android devices using a bring your own device (BYOD) approach.
Overall, there were 5 key learnings. First, recruiting participants through different media, for example, digital advertisements on social media and web-based newspapers versus crowdsourcing platforms, could result in heterogeneous subcohorts with varying characteristics. Second, participant engagement could vary significantly with the recruitment source (eg, social media vs crowdsourcing platforms) and the incentive distribution approach. Third, passive data collection could be substantially affected by technical differences between Android and iOS devices and by the sociodemographic characteristics of the cohort. Fourth, participants are more likely to complete baseline health surveys if the surveys are administered immediately after consent or enrollment. Fifth, monitoring patterns in real-world data collection at the study level could reveal technical glitches and help guide contextual data filtering and cohort selection, leading to more reliable evidence generation. We now contextualize our principal findings to inform strategies to recruit, retain, and monitor trends in remote data collection, to help collect real-world health data in a representative and equitable manner.

Combining Recruitment Platforms Could Yield Heterogeneous Real-world Cohorts
Notable differences were observed between the demographic and socioeconomic characteristics of participants recruited through web-based advertisements on social media and in newspapers (phase 1) and those recruited through crowdsourcing platforms (phase 2). This indicates that combining multiple web-based recruitment sources could yield heterogeneous cohorts, resulting in nonuniform data collection. Future remote studies should assess the potential impact of combining real-world data obtained from participants enrolled through different recruitment media. Furthermore, the open enrollment approach based on web-based advertisements in phase 1, while successful in recruiting a large cohort quickly, also resulted in a substantial proportion of bad actors joining the study to receive monetary incentives [37]. In contrast, crowdsourcing platforms (MTurk and Prolific) were slower in recruiting participants, but the participants they recruited were retained notably longer than those recruited through social media advertisements in phase 1. Indeed, as we noted in an earlier paper, recruitment through news outlets and social media is more likely to attract malicious actors [37] and, as we demonstrate here, less-committed research participants. However, despite the benefits of paid crowdsourcing platforms in effectively reaching and recruiting participants, researchers should carefully consider other factors that could influence the findings of a study [53][54][55][56][57] when recruiting participants from such platforms. These include (1) the primary motivation to remain engaged in remote studies, which may be tied to monetary incentives linked to task completion; (2) the possibility that the recruited population is not representative of the general population [58] or of target health conditions; (3) nonnaivety, as recruited people could be routine research participants, which could affect the assessment of the actual underlying effect; and (4) the fitness for purpose of crowdsourced workers for a particular task or study [59][60][61]. The characteristics of recruited participants may also vary substantially across recruitment platforms.

Changes in the Incentive Distribution Can Have an Impact on Recruitment and Retention
After the interval between incentive payments was increased, we observed a substantial reduction in the number of malicious actors joining the study in phase 2. Furthermore, with the total incentive kept the same, participants who received smaller weekly payments (phase 1) remained engaged in the study for a significantly shorter duration than those who received larger payments every 3 weeks (phase 2). Although the higher retention in phase 2 cannot be attributed solely to the change in incentive distribution (because of the lack of randomization), it is consistent with a potentially interesting behavioral economics model [62] in which episodic but larger rewards offset participants' perceived burden. The value of incentives relative to the study burden also varied by socioeconomic characteristics. In both phases, participants in lower-income groups engaged for longer than those in higher-income groups, likely driven by the incentives, a finding also evident in other studies [63]. Past research has shown that incentives can be an effective way to retain such participants, as small incentives could help offset monetary barriers [64]. However, researchers should use such incentive-based engagement strategies in a noncoercive manner [65,66] so that potential study participants are not unduly influenced to join and share their data in a research study.

Assessing Patterns in Real-world Data Collection Can Reveal Underlying Technical Issues
The evaluation of day-to-day, study-level data revealed several patterns indicating transient technical glitches in data collection that, if unaddressed, could bias downstream evidence generation. First, in phase 1, the rate of baseline survey completion dropped substantially relative to the rate of enrollment (Figure 2). This could indicate a technical glitch in the data collection system or an attempt by a large number of bad actors to join the study to gain monetary incentives (if applicable). Second, active and passive data collection patterns varied notably across the study recruitment phases. For example, we identified 2 periods during phase 2 when the study app collected no passive data even though participants completed active tasks (Multimedia Appendix 9). This is likely a technical glitch in passive data collection that could severely reduce the passive data density for participants who were active during these periods. Understanding the context and period in which data are missing can guide cohort and data selection for reliable and unbiased downstream analysis. Third, a small but notable subset of participants recruited in phase 2 did not share the 2 mandatory passive data streams, accelerometer (503/4089, 12.3%) and gyroscope (856/4089, 20.9%), yet remained active in the study (Table 2). Near real-time comparison of the data modalities shared by participants can help study teams triage participants who do not meet the inclusion criteria of the approved study protocol. Fourth, retention analysis stratified by technical variables (eg, device type) revealed latent idiosyncratic patterns. We observed a notable trend in retention for participants recruited in phase 1 (Multimedia Appendix 8; Figure 3G): participants with iOS devices showed a dramatic drop in retention around day 37, compared with a gradual decline for those with Android devices.
There may be several plausible reasons for this significant yet idiosyncratic retention pattern, which was seen only in phase 1 of the study. The sociodemographic characteristics of the iOS and Android cohorts differed significantly in phase 1, more so than in phase 2 (Multimedia Appendix 3). In addition, a bug in the iOS app around week 6 (days 35-42) could have affected participant experience and data sharing in phase 1. Taken together, these findings show an urgent need to prioritize real-time monitoring of data collection in real-world settings while the study is in progress. Such monitoring also offers a just-in-time opportunity to understand, document, and fix the root cause of a problem before it degrades data quality.
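One way to surface an abrupt, platform-specific drop such as the one described above is to track the discrete-time daily dropout hazard within each stratum (eg, iOS vs Android, phase 1 vs phase 2) and flag days on which it spikes. The following is a minimal sketch, not the study's method; the function names and both thresholds (`multiplier`, `min_hazard`) are hypothetical choices for illustration.

```python
from statistics import median


def daily_dropout_hazard(last_active_day, censored, horizon):
    """Discrete-time dropout hazard for study days 1..horizon.

    last_active_day[i] is the last study day participant i was active;
    censored[i] is True if the participant was still active when follow-up ended.
    Hazard on day d = dropouts on day d / participants still at risk on day d.
    """
    hazards = {}
    for d in range(1, horizon + 1):
        at_risk = sum(1 for last in last_active_day if last >= d)
        if at_risk == 0:
            break  # nobody left to observe
        dropouts = sum(
            1 for last, cens in zip(last_active_day, censored) if last == d and not cens
        )
        hazards[d] = dropouts / at_risk
    return hazards


def flag_dropout_spikes(hazards, multiplier=3.0, min_hazard=0.02):
    """Flag days whose hazard is at least `min_hazard` and exceeds
    `multiplier` times the median daily hazard (hypothetical thresholds)."""
    if not hazards:
        return []
    baseline = median(hazards.values())
    return [d for d, h in hazards.items() if h >= min_hazard and h > multiplier * baseline]
```

Running both functions separately for each device type and phase would make a spike visible in only 1 stratum, which is the signature of a platform-specific fault rather than general attrition. In a live study, the same computation could be repeated each day on the data collected so far.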

Passive Data Collection Can Vary Substantially in BYOD Studies
Passive data collected from participants' own devices showed that the onboard sensors available on Android and iOS devices can vary substantially. Even for passive data streams common to both platforms, the sharing of multiple streams can differ substantially by participants' sociodemographic characteristics and device types. For example, in this study, Black or African American participants were significantly less likely to share multimodal passive sensor data (Multimedia Appendix 5). Researchers should expect a high degree of heterogeneity in passive sensor data streams in large BYOD studies and should consider the impact of device heterogeneity on data collection, analysis, and evidence generation [38,67-70].
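Differences in stream sharing between groups are usually summarized as odds ratios. As a minimal sketch, the function below computes a crude (unadjusted) odds ratio with a Woolf, or log-scale Wald, 95% CI from a 2x2 table; for a single binary predictor this matches the odds ratio and Wald interval from a logistic regression, but it does not reproduce any covariate adjustment the study's models may include. The counts in the usage example are hypothetical, not study data.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf (log-scale Wald) CI from a 2x2 table.

    a: comparison group shared the stream,  b: comparison group did not share,
    c: reference group shared the stream,   d: reference group did not share.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; consider a continuity correction")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_, lo, hi = odds_ratio_wald(a=60, b=40, c=300, d=100)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An odds ratio below 1 here means the comparison group had lower odds of sharing the stream than the reference group; adjusted estimates require fitting a multivariable logistic regression instead.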

Impact of Participants' Sociodemographic Characteristics on Retention
Older participants (aged ≥60 years) were retained in the study the longest. This finding is consistent with a previous large cross-study comparison of retention [63]. However, the impact of sociodemographic characteristics on participant retention differed considerably between the cohort recruited using social media advertisements (phase 1) and the cohort recruited using crowdsourcing platforms (phase 2). The relative difference in median retention between categories of a given characteristic (eg, non-Hispanic White vs Hispanic or Latino) was considerably larger in the phase 1 cohort than in the phase 2 cohort, and the phase 1 pattern aligned with prior research [63] (Table 2). This indicates a significant discrepancy in how sociodemographic characteristics may affect participant retention depending on the recruitment source. Our findings suggest that populations recruited from web-based crowdsourcing platforms engage more homogeneously in research studies than the general population, a behavior that is likely driven by their underlying motivation and monetary incentives.
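The within-category comparison above relies on median retention under right censoring, because participants still active at the end of follow-up have not dropped out. A minimal sketch, assuming the Kaplan-Meier estimator (a standard choice for such data; the study's exact survival model is not restated here) and hypothetical `(group, days_in_study, dropped_out)` records, is:

```python
from collections import Counter, defaultdict


def km_median(durations, events):
    """Kaplan-Meier median retention in days.

    durations: days in study per participant; events: True if the participant
    dropped out, False if censored at the end of follow-up. Returns the smallest
    time at which estimated retention falls to 0.5 or below, or None if never.
    """
    dropouts = Counter(t for t, e in zip(durations, events) if e)
    survival = 1.0
    for t in sorted(dropouts):
        at_risk = sum(1 for x in durations if x >= t)  # censored at t still at risk
        survival *= 1 - dropouts[t] / at_risk
        if survival <= 0.5:
            return t
    return None


def median_retention_by_group(records):
    """records: iterable of (group, days_in_study, dropped_out) tuples."""
    groups = defaultdict(lambda: ([], []))
    for group, days, dropped in records:
        groups[group][0].append(days)
        groups[group][1].append(dropped)
    return {g: km_median(d, e) for g, (d, e) in groups.items()}


def relative_difference(median_a, median_b):
    """Relative difference of group A's median retention versus reference B."""
    return (median_a - median_b) / median_b
```

Computing `relative_difference` separately within each recruitment phase, for the same pair of categories, gives the kind of contrast described above; the group labels and records passed in are up to the analyst.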
In addition, the demographic composition of the United States is becoming increasingly multiethnic and pluralistic, and it is projected that no racial or ethnic group will constitute a majority by 2060 [71]. The sociodemographic characteristics of the enrolled cohort, together with nonuniform participant attrition, show that large observational studies may not uniformly enroll and collect health outcomes from a diverse and representative population. Future studies should emphasize enrolling diverse populations, as exemplified by the All of Us cohort [72], and retaining a diverse sample throughout the study period so that their findings apply to diverse populations. Some of the challenges in recruiting a diverse cohort have been linked to participants' perceptions, trust, and willingness to enroll and share their data with researchers, governments, and academic institutions [46,71].

Timing of Administration of Baseline Surveys May Impact Completion Rates
The engagement data showed that the timing of baseline survey administration could be linked to survey completion rates. Baseline survey missingness differed notably between the 2 phases (phase 1: 3135/6494, 48.27%; phase 2: 918/4274, 22.47%; Figures 3C and 3D). This suggests that participants were more likely to complete baseline assessments when these were administered immediately after consent or enrollment, as in phase 2. This is likely because engagement is higher at enrollment than at subsequent time points, when attention may be captured by other activities. This finding aligns with prior research showing that participants were more likely to engage with a mobile health app within 24 hours if prompts were delivered when they were most receptive [73]. Moreover, Bidargaddi et al [73] found that the degree of engagement is also influenced by other contextual factors, such as the time of day and the day of the week. These results underscore the importance of when an assessment is administered and its impact on data quality in research studies.
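A between-phase difference in missingness like this one can be checked with a pooled two-proportion z-test. The sketch below applies it to the phase 1 and phase 2 counts as reported above; because phases were not randomized, the test only describes whether the proportions differ and says nothing about which factor caused the difference. The function name is illustrative, and the p-value uses the normal approximation via `math.erfc`.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sided z-test for a difference between two proportions.

    Returns (p1, p2, z, p_value) using the normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return p1, p2, z, p_value


# Baseline survey missingness counts as reported for phase 1 and phase 2.
p1, p2, z, p = two_proportion_ztest(3135, 6494, 918, 4274)
print(f"z = {z:.1f}, two-sided P = {p:.1e}")
```

With samples this large, even small differences reach significance, so the size of the gap between the 2 proportions is the more informative quantity.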

Limitations and Future Directions
The analysis of participant recruitment and retention data from the WASH study should be interpreted in light of certain limitations. First, large-scale, fully remote data collection started close to the declaration of the COVID-19 pandemic in the United States, which is known to have changed people's behavior and interaction with technology and devices [74]. Indeed, Inverso et al [75] reported higher engagement rates during the COVID-19 pandemic because of increased reliance on technology during the lockdown. The WASH study began recruitment on March 15, 2020, shortly after the World Health Organization declared COVID-19 a pandemic on March 11, 2020. Therefore, we did not have prepandemic data to account for the potential impact of the pandemic on participant engagement with technology and devices. Second, the original purpose of the WASH study was to use the study app to detect cold and influenza symptoms. Thus, participants were not randomized to the recruitment platforms, incentive distribution frequencies, and baseline survey timings that differed between phases 1 and 2 of the study. Consequently, our findings are not causal and cannot attribute differences in recruitment and retention between phases to any single factor. For example, this analysis compares, as a whole, the population characteristics of participants recruited from web-based crowdsourcing platforms (phase 2) with those of participants who enrolled through social media and local advertisements (open enrollment, phase 1). We were not able to explore within-phase recruitment differences, that is, between participants recruited from social media and those recruited from local news media advertisements, mainly because the study data contain too little information to investigate such differences.
Further studies using a randomized design are needed to investigate the impact of individual changes in recruitment and retention strategies and their effectiveness in decentralized research. Third, we could not control for participants' previous experience with crowdsourcing platforms and research tasks, which can be a confounder [41] depending on the nature of the assessment. Future studies should ask about participants' prior participation in similar or other research studies to assess any differential impact on primary outcomes. Fourth, in phase 1, participant recruitment via press releases was centered in the Greater Seattle area, which may not be representative of the population of the United States. In addition, because of the high proportion of missing baseline geolocation data, we could not determine the geospatial representativeness of the cohort. Future studies should prioritize collecting high-level geolocation data, such as state, city, or zip code, to help assess the geospatial representativeness of the study cohort. Fifth, we could not account for all underlying within-study differences in the outcomes, for instance, the probable technical glitches associated with the steep drop in participant engagement on day 36 in phase 1 and the fluctuations in sensor data collection or management in phase 2 (Multimedia Appendix 9). These technical issues could have reduced participants' willingness to remain engaged and increased the perceived burden on participants who were active in the study at the time of the glitches. Sixth, despite our efforts to filter out bad actors, some may still have enrolled successfully by creating multiple accounts or using multiple devices.
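The source does not describe how bad actors were filtered, so the following is only a hypothetical heuristic for the duplicate-enrollment problem noted above: group enrollment records by identifiers that should be unique to one person and flag any value shared by more than 1 participant ID. The field names `device_id` and `payout_account` are assumptions; in practice such identifiers would be stored hashed to protect privacy.

```python
from collections import defaultdict


def flag_shared_identifiers(enrollments, keys=("device_id", "payout_account")):
    """Return {(key, value): sorted participant IDs} for identifier values
    that appear under more than one participant ID.

    enrollments: iterable of dicts with a "participant_id" field plus the
    hypothetical identifier fields named in `keys`.
    """
    seen = defaultdict(set)
    for rec in enrollments:
        for key in keys:
            value = rec.get(key)
            if value:  # skip missing or empty identifiers
                seen[(key, value)].add(rec["participant_id"])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    records = [
        {"participant_id": "p1", "device_id": "d1", "payout_account": "a1"},
        {"participant_id": "p2", "device_id": "d1", "payout_account": "a2"},
        {"participant_id": "p3", "device_id": "d3", "payout_account": "a2"},
    ]
    print(flag_shared_identifiers(records))
```

This catches only reuse of a device or payout account; an actor who uses a separate device and account for each enrollment would pass unflagged, which is consistent with the residual risk described above.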
We suggest that future digital health research studies specifically report and compare the impact of different temporal recruitment and incentive strategies on enrolled cohorts' characteristics and engagement metrics as well as fraudulent enrollments to allow for future replication and the establishment of a set of guidelines for successful methods of participant recruitment and retention.