Lessons Learned From a Sequential Mixed-Mode Survey Design to Recruit and Collect Data From Case-Control Study Participants: Formative Evaluation

Background Sequential mixed-mode surveys using both web-based surveys and telephone interviews are increasingly being used in observational studies and have been shown to have many benefits; however, the application of this survey design has not been evaluated in the context of epidemiological case-control studies. Objective In this paper, we discuss the challenges, benefits, and limitations of using a sequential mixed-mode survey design for a case-control study assessing risk factors during the COVID-19 pandemic. Methods Colorado adults testing positive for SARS-CoV-2 were randomly selected and matched to those with a negative SARS-CoV-2 test result from March to April 2021. Participants were first contacted by SMS text message to complete a self-administered web-based survey asking about community exposures and behaviors. Those who did not respond were contacted for a telephone interview. We evaluated the representativeness of survey participants to sample populations and compared sociodemographic characteristics, participant responses, and time and resource requirements by survey mode using descriptive statistics and logistic regression models. Results Of enrolled case and control participants, most were interviewed by telephone (308/537, 57.4% and 342/648, 52.8%, respectively), with overall enrollment more than doubling after interviewers called nonresponders. Participants identifying as female or White non-Hispanic, residing in urban areas, and not working outside the home were more likely to complete the web-based survey. Telephone participants were more likely than web-based participants to be aged 18-39 years or 60 years and older and reside in areas with lower levels of education, more linguistic isolation, lower income, and more people of color. 
While there were statistically significant sociodemographic differences noted between web-based and telephone case and control participants and their respective sample pools, participants were more similar to sample pools when web-based and telephone responses were combined. Web-based participants were less likely to report close contact with an individual with COVID-19 (odds ratio [OR] 0.70, 95% CI 0.53-0.94) but more likely to report community exposures, including visiting a grocery store or retail shop (OR 1.55, 95% CI 1.13-2.12), restaurant or cafe or coffee shop (OR 1.52, 95% CI 1.20-1.92), attending a gathering (OR 1.69, 95% CI 1.34-2.15), or sport or sporting event (OR 1.05, 95% CI 1.05-1.88). The web-based survey required an average of 0.03 (SD 0) person-hours per enrolled participant and US $920 in resources, whereas the telephone interview required an average of 5.11 person-hours per enrolled participant and US $70,000 in interviewer wages. Conclusions While we still encountered control recruitment challenges noted in other observational studies, the sequential mixed-mode design was an efficient method for recruiting a more representative group of participants for a case-control study with limited impact on data quality and should be considered during public health emergencies when timely and accurate exposure information is needed to inform control measures.


Introduction
Often used during disease outbreak investigations, case-control studies that retrospectively compare people who have a disease (case participants) with people who do not have the disease (control participants) are an efficient and relatively inexpensive method of identifying potential disease risk factors to guide control measures and interventions. Perhaps the most critical and challenging component of conducting a case-control study is the recruitment of appropriate control participants who are from the same source population as case participants [1]. Because control participants are not ill and may not be connected to the outbreak, they may be less motivated to complete a lengthy questionnaire that collects personal information and detailed exposure histories [2-4]. Moreover, with the increased use of mobile telephones and the routine use of caller ID, study participants contacted by traditional telephone-based survey methodologies may be less likely to answer the telephone [5,6], further reducing the opportunity for participant screening and recruitment.
Recruitment challenges are not unique to case-control studies, and other types of observational studies have shifted from traditional telephone interviews to web-based surveys with the goal of reaching larger groups of people more efficiently and at a lower cost [7-12]. While offering some advantages over traditional telephone interviews, web-based surveys often experience lower response rates and lower data quality [13], and some studies have found demographic differences between telephone and web-based survey participants, likely driven in part by disparities in internet connectivity and access [14]. For this reason, researchers have increasingly used both telephone interviews and web-based surveys in a sequential mixed-mode design, first contacting participants using a self-administered web-based survey, and then following up with nonresponders with an interviewer-administered telephone survey [15]. In other types of observational studies, this mixed-mode design has been shown to reduce selection bias, reduce costs, improve data quality, and result in higher response rates and faster participant recruitment [16,17], making it an appealing design choice for case-control studies.
In March 2020, the World Health Organization declared COVID-19 a global pandemic, and throughout many countries, public health or other governmental authorities implemented stay-at-home orders, travel restrictions, and other public health interventions to reduce disease transmission. In the absence of adequate data-driven evidence about community risk factors for COVID-19 transmission, we implemented a sequential mixed-mode case-control study design in Colorado to evaluate community exposures and behaviors associated with SARS-CoV-2 infection and inform public health control measures. While the benefits and limitations of sequential mixed-mode designs have been well-documented in other contexts [14,16,18-20], they have not been examined in the context of rapidly implemented epidemiological case-control studies. In this paper, we discuss the challenges, benefits, and limitations of using a sequential mixed-mode survey design using web-based surveys disseminated via SMS text message and telephone interviews for a case-control study assessing exposures during a public health emergency. Specific aims are (1) to compare the sociodemographic characteristics of web-based and telephone survey participants, (2) to evaluate the representativeness of survey participants to the sample population, (3) to assess the completeness of participant responses by survey mode, and (4) to estimate the time and resources required to recruit web-based and telephone survey participants.

Case-Control Study Design and Implementation
The case-control study was conducted among Colorado adults aged 18 years and older who had a positive (case) or negative (control) SARS-CoV-2 reverse transcription-polymerase chain reaction test result in Colorado's electronic laboratory reporting (ELR) system with a specimen collection date from March 16 to April 29, 2021 [21]. Eligible individuals testing positive with a completed routine public health interview in Colorado's COVID-19 surveillance system were randomly selected and individually matched on age (±10 years), zip code (urban areas) or region (rural and frontier areas), and specimen collection date (±3 days) with up to 20 individuals with a negative test, with the goal of enrolling 2 matched controls per enrolled case.
Self-administered (web-based) and interviewer-administered (telephone) case and control surveys were developed in Research Electronic Data Capture (REDCap; Vanderbilt University). REDCap is a secure, web-based platform designed to support data capture for research studies [22]. The surveys asked about contact with a person with confirmed or suspected COVID-19, travel history, employment, mask use, and community exposure settings (bar or club; church, religious, or spiritual gathering; gathering; grocery or retail shopping; gym or fitness center; health care setting; restaurant, cafe, or coffee shop; salon, spa, or barber; social event; or sports or sporting events) during the 14 days before illness onset or specimen submission. The full survey questionnaire is available in Multimedia Appendix 1.
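The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` record, its field names, and the `match_controls` function are hypothetical, and the text does not say how surplus candidates were narrowed to 20, so the random draw below is an assumption rather than the study's documented procedure.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    person_id: str
    age: int
    area: str        # zip code (urban) or region (rural and frontier); hypothetical field
    collected: date  # specimen collection date


def match_controls(case, negatives, max_controls=20, seed=0):
    """Return up to max_controls test-negative people who match the case."""
    candidates = [
        p for p in negatives
        if abs(p.age - case.age) <= 10                       # age within 10 years
        and p.area == case.area                              # same zip code or region
        and abs((p.collected - case.collected).days) <= 3    # collection date within 3 days
    ]
    if len(candidates) <= max_controls:
        return candidates
    # Assumption: surplus candidates are reduced by a seeded random draw.
    return random.Random(seed).sample(candidates, max_controls)


case = Person("case-1", 35, "80202", date(2021, 4, 1))
negatives = [
    Person("neg-1", 44, "80202", date(2021, 4, 3)),  # matches on all three criteria
    Person("neg-2", 46, "80202", date(2021, 4, 1)),  # age differs by 11 years
    Person("neg-3", 30, "80301", date(2021, 4, 1)),  # different zip code
    Person("neg-4", 30, "80202", date(2021, 4, 6)),  # collected 5 days later
]
matched_ids = [p.person_id for p in match_controls(case, negatives)]
print(matched_ids)
```

Only the first test-negative record satisfies all three criteria, so it is the only one returned for this case.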

Demographic data were obtained from Colorado's COVID-19 case surveillance system and the control survey. Web-based surveys were offered in English and Spanish and included clarifying language, prompts, skip logic, text piping, and progress bars. Interviewers used computer-assisted telephone interviewing in REDCap with scripting and language line services when needed. Questions and response options were identically worded in the web-based and telephone surveys, with the exception of a "refused" option for questions in the telephone survey.
Using the Twilio integration in REDCap, selected individuals were sent an SMS text message to the telephone number provided at the time of testing (which may include both landlines and mobile phones) 3 to 7 days after their specimen collection date, inviting them to complete the web-based survey. A team of trained interviewers began contacting nonresponders for telephone interviews approximately 3 hours after the initial SMS text message was sent, making 1 contact attempt for individuals testing positive for SARS-CoV-2 and up to 2 contact attempts for those testing negative. Interviewers only contacted as many controls by telephone as needed to enroll 2 matched controls per enrolled case. The web-based survey link was resent via SMS text message or sent via email when requested. When possible, voicemail messages were left encouraging SMS text message recipients to complete the web-based survey. As the goal of the case-control study was to assess the risk of SARS-CoV-2 infection from community exposures, we only included surveys that had responses to all 15 community exposure questions. Partial surveys that did not have complete community exposure data were excluded from analyses. Individuals were also excluded if they reported living in an institution, close contact with a household member with confirmed or suspected COVID-19, receiving ≥1 dose of a COVID-19 vaccine (which was not universally available in Colorado at the time of the study), symptom onset date >7 days from specimen collection (case participants), a prior positive COVID-19 result (control participants), or providing personal identifying information in the web-based survey that was inconsistent with information from the ELR system (control participants).
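As a rough illustration of the inclusion rules above, the following Python sketch filters survey records before analysis. All field names are hypothetical (the REDCap variable names are not given in the text), and only a subset of the exclusion criteria is shown.

```python
# Hypothetical field names; the actual REDCap variable names are not given in the text.
EXPOSURE_FIELDS = [f"exposure_{i}" for i in range(1, 16)]  # the 15 community exposure questions


def is_analyzable(record):
    """Return True if a survey record meets the inclusion rules sketched here."""
    # Require an answer to every community exposure question.
    if any(record.get(field) is None for field in EXPOSURE_FIELDS):
        return False
    # A subset of the exclusion criteria listed in the text.
    if record.get("lives_in_institution"):
        return False
    if record.get("household_close_contact"):
        return False
    if record.get("vaccine_doses", 0) >= 1:
        return False
    return True


complete = {field: "no" for field in EXPOSURE_FIELDS}
partial = dict(complete, exposure_7=None)       # one exposure question unanswered
vaccinated = dict(complete, vaccine_doses=1)    # received at least 1 vaccine dose
results = [is_analyzable(r) for r in (complete, partial, vaccinated)]
print(results)
```

The partial and vaccinated records are rejected, while the fully answered record with no exclusion flags is kept.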

Evaluation of a Sequential Mixed-Mode Survey Design
We evaluated the impact of conducting the COVID-19 case-control study using a sequential mixed-mode design by (1) comparing the sociodemographic characteristics of web-based and telephone survey participants, (2) evaluating the representativeness of study participants to the sample population, (3) assessing the completeness of participant responses by survey mode, and (4) estimating the time and resources required to recruit web-based and telephone survey participants. All analyses were performed using SAS (version 9.4; SAS Institute).

Comparison of Web-Based and Telephone Survey Participants
Case and control participants were eligible individuals who completed the web-based or telephone survey. We compared the demographic characteristics (age, gender, race and ethnicity, geographic location, working outside the home, and socioeconomic factors) of case and control participants

Representativeness of Study Participants
We compared the demographic characteristics (as described earlier) of case and control participants completing the web-based and telephone surveys (separately and combined) to the sample pool of all randomly selected individuals testing positive (case sample pool) or negative (control sample pool) for SARS-CoV-2 using 2-tailed t tests, Pearson χ², or Fisher exact tests.

Participant Responses
We evaluated data completeness and differential responses between web-based and telephone survey modes by comparing responses to exposure and behavior questions we deemed prone to social desirability bias (close contact with individuals with confirmed or suspected COVID-19, community exposures, travel, and mask use). Two bivariate logistic regression models, the first adjusting for case-control status and the second adjusting for case-control status and sociodemographic variables shown to be associated with mode effects (age, gender, race and ethnicity, and geographic location), examined the association between survey mode and participant response. Question nonresponse, where data were missing or refused, was evaluated for these questions as well as for other questions with free-text or multiple-choice response options (industry, occupation, reasons for COVID-19 testing, and mask type).
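To make the quantity being modeled concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table of survey mode by response. It is a simplification, not the analysis described above: the study fitted logistic regression models in SAS with adjustment covariates, whereas this example has no adjustment, and the counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI for a 2x2 table.

    a: web-based, reported exposure      b: web-based, did not report
    c: telephone, reported exposure      d: telephone, did not report
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_wald(a=40, b=60, c=50, d=50)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, the point estimate equals the geometric mean of the two bounds, which offers a quick consistency check on any reported odds ratio and Wald CI.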

Time and Resource Needs
The time spent by study personnel contacting potential participants by SMS text message and telephone was obtained from self-recorded data in timesheets and used to calculate the person-hours required per enrolled participant. Total expenditures for the web-based and telephone surveys were calculated using staff wages and Twilio texting costs (an average of US $0.008 for a 160-character SMS text message).

Ethical Considerations
The case-control study was deemed by the Colorado Multiple Institutional Review Board to be public health surveillance and not human participant research and was therefore exempt from full approval and requirements for informed consent (protocol 21-2973).

Comparison of Web-Based and Telephone Survey Participants
Case participants completing the web-based and telephone surveys were similar in age (mean 37, SD 13.21 and 14.69 years, respectively), whereas web-based control participants were slightly older than those completing the telephone survey (mean 38, SD 12.44 vs mean 36, SD 12.62 years, respectively; Table 1). For both case and control participants, those aged 40-59 years were more likely to complete the web-based survey, whereas participants aged 18-39 years and 60 years and older were more likely to complete the telephone survey. Web-based case and control participants were more likely to identify as female or White, non-Hispanic; to reside in urban areas; and to not work outside the home. Compared to web-based case and control participants, telephone participants had higher EnviroScreen scores for all socioeconomic indicators, indicating they resided in counties with larger populations of individuals with less than high school education, linguistic isolation, low income, and people of color.
g Information on sex and working outside the home was not available from Colorado's electronic laboratory reporting system for control participants.
h Not available.
i Colorado EnviroScreen is an environmental justice mapping tool. Scores are assigned at the county level, with a higher score indicating that an area is more likely to be affected by the indicated health injustice.

Representativeness of Study Participants
There were statistically significant sociodemographic differences noted between web-based and telephone case and control participants and their respective sample pools (Table 1). More web-based case participants identified as female (134/228, 58.8%) than those in the case sample pool (642/1318, 48.7%). More web-based control participants identified as White, non-Hispanic (205/267, 76.8%) than those in the control sample pool (4467/7812, 57.2%) and more often resided in urban areas (247/306, 80.7%) than those in the control sample pool (7841/10,898, 71.9%). Case and control participants were more similar to their respective sample pools when evaluated as a single group (total enrolled).
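The comparison of female representation among web-based case participants and the case sample pool can be illustrated with a Pearson χ² test on the reported counts. The sketch below is a plain-Python approximation under stated assumptions: it applies no continuity correction and treats the two groups as independent even though participants were drawn from the sample pool, so its result need not equal the value produced by the SAS analysis.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value (1 df) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported in the text: female / not female.
web_female, web_total = 134, 228
pool_female, pool_total = 642, 1318

chi2, p_value = pearson_chi2_2x2(
    web_female, web_total - web_female,
    pool_female, pool_total - pool_female,
)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Under these simplifications, the difference between 58.8% and 48.7% is significant at the .01 level, in line with the direction of the finding reported above.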

Time and Resource Needs
Over the course of the study, staff spent a cumulative 15 hours randomly selecting and texting potential participants for the web-based survey, averaging 0.03 person-hours per enrolled participant (15 person-hours per 535 web-based participants) and US $500 in staff wages. Twilio texting costs were US $420, amounting to US $920 in total expenditures for the web-based survey. Comparatively, 3319 hours were spent by interviewers attempting to contact nonresponders by telephone, for an average of 5.11 person-hours per enrolled participant (3319 person-hours per 650 telephone participants) and US $70,000 in interviewer wages.
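The per-participant figures above follow directly from the reported totals, as the short Python check below shows. The cost per enrolled participant is a derived illustration that is not reported in the text; note also that the telephone figure covers interviewer wages only, so the two modes are not costed on an identical basis.

```python
# Totals reported in the text.
web_hours, web_participants = 15, 535
phone_hours, phone_participants = 3319, 650
web_cost = 500 + 420   # staff wages plus Twilio texting costs (US $)
phone_cost = 70_000    # interviewer wages only (US $)

web_hours_each = web_hours / web_participants
phone_hours_each = phone_hours / phone_participants

# Derived for illustration; these per-participant costs are not reported in the text.
web_cost_each = web_cost / web_participants
phone_cost_each = phone_cost / phone_participants

print(f"Web-based: {web_hours_each:.2f} person-hours, US ${web_cost_each:.2f} per participant")
print(f"Telephone: {phone_hours_each:.2f} person-hours, US ${phone_cost_each:.2f} per participant")
```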

Principal Findings
While the web-based survey was more time- and cost-efficient than the telephone interview, participant enrollment was low, and there were statistically significant sociodemographic differences between the web-based case and control participants and their respective sample pools. Adding the follow-up telephone interview increased participant enrollment and the representativeness of both the case and control participants to sample pools. Participant responses to exposure and behavior questions and data completeness were similar between the 2 survey modalities.
Enrollment more than doubled for case and control participants after interviewers called individuals who did not respond to the web-based survey to complete the survey by telephone. Case participant enrollment for our mixed-mode study was higher than that for other COVID-19 case-control studies using telephone only (40.6% vs 3%-25% case participant enrollment in other studies), but control participant enrollment was lower (5.9% vs 9%-13% control participant enrollment in other studies) [23-25]. However, control participant enrollment in our sequential mixed-mode study may not be comparable to telephone-only COVID-19 case-control studies for 2 reasons. First, we texted up to 20 potential controls for every enrolled case participant in anticipation of lower response rates for the web-based survey, inflating the number of contacted controls in our response rate calculations. Second, we did not follow up with all potential controls by telephone once our quota of 2 controls per case was reached. In contrast, telephone-only studies only call as many controls as needed to enroll the desired number of matched control participants, which is typically less than 20.
We found sociodemographic differences between participants completing the survey on the web and by telephone. Web-based respondents were more likely to be female, identify as White, non-Hispanic, have higher levels of education, and reside in urban areas, which was consistent with other studies evaluating survey mode effects [12,26,27]. Contrary to other studies that found higher web-based response rates among those younger than 35 years of age [14], participants aged 18-39 years in our case-control study were more likely to respond to the telephone survey, as were participants aged 60 years and older, participants working outside the home, and participants residing in areas with a higher burden of health injustices. Some of these differences may be attributable to the timing of when potential participants were contacted. While potential participants were texted a link to complete the web-based survey only in the morning, telephone interviews were administered throughout the day, including in the late afternoon and evening when more people may be at home and not working. In addition, older participants and participants in lower socioeconomic settings may experience more barriers to completing a web-based survey, such as limited internet access or less comfort using mobile platforms [15], making them more likely to complete a telephone interview.
While there were sociodemographic differences between web-based and telephone participants and between web-based and telephone case and control participants and their respective sample pools, the sociodemographic characteristics of combined web-based and telephone survey participants were broadly representative of the sample pools. This indicates that the sequential mixed-mode design allowed for the recruitment of more representative case and control participant groups than if we had used a telephone or web-based survey alone, and the use of this survey design can help reduce selection bias in case-control studies.
Telephone surveys conducted by trained interviewers have several advantages over other modes of administration. Most importantly, trained interviewers can answer participants' questions, add clarifying questions, and probe interviewees for more complete responses, leading to better data completeness and quality. While increasing data quality, telephone surveys can lead to social desirability bias as participants may alter answers to questions to seem more favorable or socially acceptable to an interviewer [19,20]. An advantage of using a web-based survey is that the absence of an interviewer may provide participants with the opportunity to answer questions more candidly, potentially reducing social desirability bias [19,20]. While we found that web-based participants were more likely to report certain community exposures, most of the differential responses between web-based and telephone participants were no longer statistically significant after adjusting for variables shown to be associated with mode effects (age, gender, race, ethnicity, and geographic location). This suggests that demographic differences between web-based and telephone participants may be confounding variables and should be considered when analyzing and interpreting data for case-control studies.

Limitations
This project was subject to several limitations. First, cases were randomly selected from persons reported in Colorado's COVID-19 surveillance system who had already completed an interview with public health, which may impact study findings. For example, this method of case-participant selection may account for the high enrollment rates we had for our case-control study, and these individuals may systematically differ from those testing positive for SARS-CoV-2 who did not complete an initial interview with public health. Second, sample pool data were obtained from the ELR system for control participants, which had incomplete demographic data. The sample pool characteristics presented in this paper may not be accurate because of these missing data and, in turn, affect our evaluations of sample representativeness. Third, the socioeconomic characteristics of participants may be subject to ecological fallacy as we used county-level Colorado EnviroScreen scores as a proxy for individual socioeconomic status. Fourth, it is unclear whether the systematic differences noted between web-based and telephone participants were due to the survey mode itself or due to the additional contact attempts made to enroll telephone participants. Finally, this sequential mixed-mode case-control study was implemented during the COVID-19 pandemic, a period marked by various political and social factors that could have influenced who responded to our survey and their responses. As such, findings from this paper may not be generalizable to case-control studies evaluating other diseases or outbreaks.

Conclusions
Telephone interviews conducted as part of an outbreak investigation are time-consuming and costly [8]. Given the limited resources and staff at many public health agencies, it is critical to find methods to increase efficiency and reduce the costs of outbreak investigations. Web-based surveys are more time- and cost-efficient than telephone interviews, greatly reducing the workload for health departments. However, web-based surveys may appeal to specific demographics, have lower enrollment rates, and may require a larger sample pool or a longer time to enroll participants, which may not be feasible for small outbreaks or ideal for public health emergencies when timely data collection is crucial.
By using a sequential mixed-mode design, we were able to efficiently recruit participants for a case-control study with limited impact on data quality. Moreover, using the sequential mixed-mode approach allowed for maximal sample representativeness compared to a web-based or telephone interview alone. This is critical during public health emergencies, when timely and accurate exposure information is needed to inform control measures and policy. While the sequential mixed-mode design allowed us to reach more potential control participants with fewer resources, we still encountered the same challenges recruiting control participants noted in other studies.

Figure 1. Web-based and telephone survey enrollment among case and control participants, Colorado, March to April 2021.
completing the web-based and telephone survey to each other using 2-tailed t tests, Pearson χ², or Fisher exact tests. Socioeconomic factors, which are not routinely asked in surveillance and therefore not included in the survey, were evaluated by aggregating mean scores for 4 Colorado EnviroScreen indicators (less than high school education, linguistic isolation, low income, and people of color) based on the participant's county of residence. Colorado EnviroScreen (version 1.0; Colorado State University and the Colorado Department of Public Health and Environment) is a publicly available environmental justice mapping tool developed by the Colorado Department of Public Health and Environment and Colorado State University that evaluates 35 distinct environmental, health, economic, and demographic indicators. Colorado EnviroScreen scores range from 0 to 100, with the highest score representing the highest burden of health injustice.

Table 1. Comparison of sociodemographic characteristics of the case and control sample pools with case and control participants overall and by survey completion mode, Colorado, March to April 2021.
P<.01; survey mode (web-based, telephone, and web-based and telephone combined) versus sample pool.
P<.05; case participant web-based versus telephone.
P<.05; survey mode (web-based, telephone, and web-based and telephone combined) versus sample pool.

Table 2. Association between question response and survey completion mode (web-based vs telephone), Colorado, March-April 2021.
a Adjusted for case or control status, age, gender, race and ethnicity, and geographic location.
b Adjusted for case or control status.
c OR: odds ratio.
d e N/A: not applicable.