A nationwide mobile phone survey for tobacco use in Tanzania: Sample quality and representativeness compared to a household survey

We investigated the feasibility of an interactive voice response (IVR) survey in Tanzania and compared its prevalence estimates for tobacco use to the estimates of the 'Global Adult Tobacco Survey (GATS) 2018'. IVR participants were enrolled by random digit dialing. Quota sampling was employed to achieve the required sample sizes of age-sex strata: sex (male/female) and age (18–29-, 30–44-, 45–59-, and ≥60-year-olds). GATS was a nationally representative survey and used a multistage stratified cluster sampling design. The IVR sample's weights were generated using the inverse proportional weighting (IPW) method with a logit model and the standard age-sex distribution of Tanzania. The IVR and GATS had 2362 and 4555 participants, respectively. Compared to GATS, the unweighted IVR sample had a higher proportion of males (58.7 % vs. 43.2 %), educated people (secondary/above education: 43.3 % vs. 21.1 %), and urban residents (56.5 % vs. 40 %). The weighted prevalence (95 % confidence interval (CI)) of current smoking was 4.99 % (4.11–6.04), 5.22 % (4.36–6.24), and 7.36 % (6.51–8.31) among IVR (IPW), IVR (age-sex standard), and GATS samples, respectively; the weighted prevalence (95 % CI) of smokeless tobacco use was similar: 3.54 % (2.73–4.57), 3.58 % (2.80–4.56), and 2.43 % (1.98–2.98), respectively. Most differences in point estimates for tobacco indicators were small (<2%). Overall, the odds of tobacco smoking indicators were lower in IVR than in GATS; however, the direction was reversed for smokeless tobacco use. Although IVR under- or over-estimated the prevalence of tobacco use relative to GATS, the estimates were close. Further research is required to increase the representativeness of IVR.


Introduction
Globally, noncommunicable diseases (NCDs), such as cancer, heart disease, and diabetes, are the leading causes of death (Dicker et al., 2018). Over the past three decades, the number of deaths, years of life lost, years lived with disability, and disability-adjusted life years attributable to these conditions have increased in low- and middle-income countries (LMICs) (Dicker et al., 2018; Forouzanfar et al., 2016; Wang et al., 2016). Many LMICs are now dealing with a double disease burden, a simultaneous high burden of communicable and noncommunicable diseases (Dicker et al., 2018). Tobacco consumption is a major modifiable behavioral risk factor for these diseases; its use increases the risks for cancer, cardiovascular, and respiratory disease (U. S. Department of Health and Human Services, 2014). More than 8 million deaths are caused by tobacco use globally each year (Reitsma et al., 2017).
Continuously monitoring the burden of diseases and risk factors of public health importance, such as tobacco use, helps develop effective programs and policies to reduce their future burden (World Health Organization, 2021). Currently, the World Health Organization's STEPwise approach to NCD Risk Factor Surveillance (WHO STEPS), Multiple Indicator Cluster Survey (MICS), Demographic and Health Surveys (DHS), and Global Adult Tobacco Surveys (GATS) are used to obtain nationally representative health data from several countries (United States Agency for International Development, 2022; World Health Organization, 2022). However, conducting face-to-face interviews to collect such data is expensive, time-consuming, and labor-intensive. The challenge of cost and effort may affect the ability to collect survey data and, therefore, impact efforts to reduce the burden of NCDs (Blinson et al., 1996; DeFranzo, 2021).
In high-income countries (HICs), telephone interviews are often conducted to collect data on behavioral risk factors. For example, the Behavioral Risk Factor Surveillance System (BRFSS) collects U.S. residents' data on health risk behaviors, chronic conditions, and preventive service use (Centers for Disease Control and Prevention, 2017). Lack of telephone access was previously an obstacle to implementing such surveillance in LMICs; however, the increasing use of mobile phones globally may allow for implementing mobile phone surveys (MPS) in LMICs. More than 95 % of people globally use mobile phones (International Telecommunication Union, 2018). In most LMICs, a majority of people live in rural regions, and the distance from one geographic location to another is large; therefore, MPS could be more useful for collecting data from these hard-to-reach population groups (L'Engle et al., 2018; Leo et al., 2012).
Multiple MPS data collection methods have been developed (Ballivian et al., 2015; Gibson et al., 2017). Interactive voice response (IVR) is an MPS method where eligible participants use their mobile phone keypad to answer a prerecorded questionnaire through an automated system (e.g., "If you are a male, press 1. If you are a female, press 2"). IVR has been used to collect nationally representative estimates of demographic and health indicators (Song et al., 2020). However, the representativeness and reliability of IVR, and how its prevalence estimates differ from the nationally representative data collected by household face-to-face surveys, the gold standard, have not been well studied in many countries. The validity of indicators reported by MPS, and how closely they match those reported by household face-to-face surveys, is unknown. The United Republic of Tanzania is an example of an LMIC that is currently dealing with the double disease burden. This is a sub-Saharan African country with a population of about 60 million. In 2020, the mobile phone subscription rate was 86 per 100 people (The World Bank, 2023). An IVR survey was conducted in this country to understand the feasibility and cost of conducting a nationally representative survey. In this study, we assess the representativeness and validity of IVR data by comparing its tobacco use estimates with GATS Tanzania 2018 data.

Study design and participants
We conducted a cross-sectional study. The IVR participants were recruited by random digit dialing (RDD). RDD sampling is a probability sampling technique where a software system generates a list of phone numbers at random to be used as the sampling frame. The sample was drawn from the RDD sampling frame by calling the participants (Research, 2023; Waksberg, 1978). Quota sampling was used to recruit participants of the following age-sex strata: 18-29-, 30-44-, 45-59-, and ≥60-year-old males and females. Due to the RDD, we did not have a specific sampling frame in the IVR. The method of obtaining the sample is described in the Procedures section.
The GATS Tanzania 2018 was a part of the Global Tobacco Surveillance System. This nationally representative household survey aimed to estimate tobacco use indicators among non-institutionalized people aged ≥15 years. A standard survey protocol with standardized questionnaires, sample design, data management, and analysis procedures was followed. Data were collected using a multistage (i.e., three-stage) cluster sampling design to report estimates for the country as a whole and for males and females of rural and urban regions. To make the survey sample nationally representative and to provide reliable national estimates for tobacco use indicators, GATS 2018 covered all regions of Tanzania (31 in total: 26 from the Mainland and 5 from Zanzibar). The Population and Housing Census of 2012 was used as the sampling frame. The sampling frame had the lists of regions, districts, wards, and enumeration areas. In the first stage, 84 urban and 120 rural clusters were selected. In the second stage, 26 households were randomly chosen from the household list in each cluster. Finally, one person aged at least 15 years was randomly selected from each household and interviewed (Ministry of Health, Community Development, Gender, Elderly and Children Dodoma et al., 2020).
The questionnaire primarily asks about sociodemographic characteristics, tobacco smoking, and smokeless tobacco use, among others (Ministry of Health, Community Development, Gender, Elderly and Children Dodoma et al., 2020). The GATS sampling frame included all households in a cluster. From each sampled household (5297), one participant aged at least 15 years was selected. This resulted in a total of 4797 respondents, with a 96.4 % individual response rate. Data were collected from February to April 2018 (Ministry of Health, Community Development, Gender, Elderly and Children Dodoma et al., 2020).
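The quota logic for the eight age-sex strata can be illustrated with a minimal sketch. It is not the survey platform's actual code: the class name `QuotaScreener`, the function `stratum`, and the tiny quota of 2 are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Age bands from the text; the last band is open-ended (60 and above).
AGE_BANDS = [(18, 29), (30, 44), (45, 59), (60, float("inf"))]

def stratum(age, sex):
    """Return the (sex, age-band) stratum, or None if the person is under 18."""
    for lo, hi in AGE_BANDS:
        if lo <= age <= hi:
            label = f"{lo}+" if lo == 60 else f"{lo}-{hi}"
            return (sex, label)
    return None

class QuotaScreener:
    """Tracks completed interviews per stratum and screens new respondents."""

    def __init__(self, quota_per_stratum):
        self.quota = quota_per_stratum
        self.filled = Counter()

    def is_eligible(self, age, sex):
        # Eligible only if aged 18+ and the stratum quota is not yet met.
        s = stratum(age, sex)
        return s is not None and self.filled[s] < self.quota

    def record_complete(self, age, sex):
        self.filled[stratum(age, sex)] += 1

screener = QuotaScreener(quota_per_stratum=2)  # hypothetical quota
screener.record_complete(25, "male")
screener.record_complete(28, "male")
print(screener.is_eligible(20, "male"))    # stratum already full
print(screener.is_eligible(20, "female"))  # stratum still open
```

Once a stratum fills, later respondents from that stratum are screened out, which is why hard-to-reach strata need more calls per completed interview.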

Procedures
Data collection for IVR took place from October 2020 to March 2021. The phone calls were made between 8:00 AM and 8:00 PM local time and conducted in Kiswahili. Only randomly generated numbers were called. For RDD, the first three digits of the phone numbers were the country code, the next three digits were the mobile network operator's base digits, and the remaining seven digits were randomly generated to create a mobile phone number. These included all the existing Tanzanian mobile operators (i.e., 8). Upon answering the phone, participants were told about the purpose of the study, its expected duration and sponsoring agency, and the requirements for receiving an airtime incentive. Participants were eligible if they were at least 18 years old and the sample size for the age-sex stratum to which they belonged had not been met. Eligible participants were read a brief consent statement and asked to press '1' if they consented to participate.
The IVR survey had five major components: 1) survey introduction, 2) age-sex screening questions, 3) consent, 4) demographic questions, and 5) five NCD modules. The NCD modules included questions related to 1) tobacco use, 2) alcohol consumption, 3) dietary habits, 4) physical activity, and 5) blood pressure and diabetes. The order of NCD modules was randomly assigned for each participant to reduce attrition bias. However, the questions were not randomized within each module to preserve skip patterns. Participants who completed the survey were entered into a lottery where 1 in 20 complete surveys would receive 50,000 Tanzanian Shillings worth of mobile phone airtime (USD 21.68 as of October 20, 2020). Participants were also informed that they would not need to pay any money for the survey.
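The number-generation step can be sketched as follows. This is an illustration under stated assumptions, not the dialing software used in the study: the operator prefixes in `OPERATOR_PREFIXES` are hypothetical placeholders rather than real allocations, and the number of random digits is exposed as a parameter whose default simply follows the description above.

```python
import random

COUNTRY_CODE = "255"                        # Tanzania's country calling code
OPERATOR_PREFIXES = ["700", "710", "720"]   # hypothetical placeholders

def generate_number(rng, n_random_digits=7):
    """Build one number: country code + operator prefix + random digits."""
    prefix = rng.choice(OPERATOR_PREFIXES)
    tail = "".join(str(rng.randrange(10)) for _ in range(n_random_digits))
    return COUNTRY_CODE + prefix + tail

def generate_frame(n, seed=0, n_random_digits=7):
    """Generate n distinct random numbers to serve as the dialing list."""
    rng = random.Random(seed)   # seeded so the list is reproducible
    frame = set()
    while len(frame) < n:
        frame.add(generate_number(rng, n_random_digits))
    return sorted(frame)

frame = generate_frame(1000)
print(frame[:3])
```

Many generated numbers will not correspond to active subscribers, which is one reason the contact rate of an RDD survey is low.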
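The randomization described above (module order shuffled per participant, question order fixed within a module) and the 1-in-20 lottery could look like the sketch below. The question identifiers such as `T1` and the function names are hypothetical; the real questionnaire is in the S2 Table.

```python
import random

# Five NCD modules; question IDs are hypothetical placeholders.
MODULES = {
    "tobacco": ["T1", "T2", "T3"],
    "alcohol": ["A1", "A2"],
    "diet": ["D1", "D2"],
    "physical_activity": ["P1", "P2"],
    "bp_diabetes": ["B1", "B2"],
}

def questionnaire_for(rng):
    """Shuffle module order only; keep questions in order inside each module."""
    order = list(MODULES)
    rng.shuffle(order)
    return [(name, q) for name in order for q in MODULES[name]]

def wins_lottery(rng, odds=20):
    """Return True with probability 1/odds (1 in 20 by default)."""
    return rng.randrange(odds) == 0

rng = random.Random(42)
sequence = questionnaire_for(rng)
print([q for _, q in sequence])
```

Keeping each module's questions contiguous and in their original order is what allows skip patterns to keep working after the shuffle.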

Outcomes
We limited our analysis to participants with completed IVR interviews. An interview was considered complete (I) when the participant answered at least four of the five NCD modules. Interviews with one to three modules completed were considered partial interviews (P). Refusals (R) were recorded when age-eligible participants did not indicate consent or terminated the survey before consenting. Age-eligible participants who consented but did not complete any NCD module were considered break-offs (R). Those who did not answer the age question after initiating the survey were classified as unknown (U). The estimated proportion of age-eligible respondents (e) was derived from people who were screened for age eligibility and was applied to cases whose status remained unknown. Individuals who indicated they were not at least 18 years old were considered age-ineligible. The response and cooperation rates were calculated using the American Association for Public Opinion Research equations (The American Association for Public Opinion Research, 2016) (S1 Table).
We compared indicators that were available in both surveys. The primary outcomes were responses related to tobacco use: current smoking, current daily smoking, past smoking, past daily smoking, current smokeless tobacco use, current daily smokeless tobacco use, former smokeless tobacco use, and former daily smokeless tobacco use. The proportion of current smokers or smokeless tobacco users was obtained by dividing the number of participants who responded 'yes' to that question by the number of participants who responded to that question. The skip patterns of the questionnaires were taken into account in this calculation. At the end of the survey, participants were asked how satisfied they were with the survey.
The questions are presented in the S2 Table. In GATS, the current smokers were those who smoked tobacco currently (i.e., daily or less than daily); the former and former daily smokers were those who smoked tobacco in the past. Similarly, current and past smokeless tobacco users were those who used smokeless tobacco currently and in the past (i.e., daily or less than daily), respectively. IVR asked the questions directly (S2 Table).
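As an illustration of how such outcome rates are computed from disposition counts, the sketch below uses the standard AAPOR forms Response Rate 3, Cooperation Rate 1, Refusal Rate 2, and Contact Rate 2. Which AAPOR variants the study actually applied is given in the S1 Table, so the choice here is an assumption, and all counts are hypothetical.

```python
def aapor_rates(I, P, R, NC, O, U, e):
    """Outcome rates from disposition counts.

    I: complete, P: partial, R: refusal/break-off, NC: non-contact,
    O: other, U: unknown eligibility, e: estimated share of U that is eligible.
    """
    contacted = (I + P) + R + O
    denom = contacted + NC + e * U          # estimated number of eligible cases
    return {
        "contact_rate_2": contacted / denom,
        "response_rate_3": I / denom,
        "cooperation_rate_1": I / contacted,
        "refusal_rate_2": R / denom,
    }

# Hypothetical counts, not the study's dispositions.
rates = aapor_rates(I=200, P=100, R=150, NC=0, O=50, U=10000, e=0.5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Because unknown-eligibility cases enter the denominator through `e`, a smaller `e` (more of the unanswered numbers assumed inactive) yields higher contact, response, and refusal rates, while the cooperation rate is unaffected.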
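The proportion described above, with skipped or unanswered items excluded from the denominator, can be sketched as follows. The Wilson score interval is used here only as a simple stand-in for unweighted data; the study's confidence intervals come from weighted analyses in Stata, so this is not the authors' method. The function name `prevalence` is hypothetical.

```python
import math

def prevalence(responses, z=1.96):
    """Share answering 'yes' among those who answered, with a Wilson interval.

    Entries other than 'yes'/'no' (e.g. None for a skipped item) are excluded.
    """
    answered = [r for r in responses if r in ("yes", "no")]
    n = len(answered)
    if n == 0:
        raise ValueError("no valid responses")
    p = answered.count("yes") / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, n, centre - half, centre + half

p, n, lo, hi = prevalence(["yes", "no", None, "no", "yes"])
print(f"p={p:.2f} (n={n}), 95% CI {lo:.2f}-{hi:.2f}")
```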

Ethics approval
The study received ethical approval from the Institutional Review Boards of Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA, and Ifakara Health Institute, Dar es Salaam, Tanzania.

Statistical analysis
There were methodological differences between IVR and GATS, including the age requirement and quota sampling. To minimize these differences, we limited our analyses to people at least 18 years old and applied 'weighting' to the IVR sample.
Weighting was also required to reduce the sociodemographic differences between GATS and IVR participants. We used two weighting methods to generate the IVR sample's weights. First, we employed a logistic regression model to generate inverse proportional weights (IPW), considering the GATS as the reference. This creates a sample weight for each IVR participant. We adjusted for age, sex, education, and location of residence to generate these weights. Second, we used the United Nations Department of Population's standard age-sex distribution for Tanzania to get the sample weight for each age-sex stratum (United Nations, 2019).
To understand how these samples differed, first, we described the unweighted and weighted sociodemographic characteristics of GATS and IVR participants. Then, we reported unweighted and weighted prevalence (with 95 % confidence intervals [CI]). Lastly, using the weighted GATS sample as the reference, we conducted unadjusted and adjusted logistic regression analyses to test the association of survey mode with tobacco use indicators. We adjusted indicators for age, sex, education, and location, and reported crude and adjusted odds ratios (OR) with 95 % CI. We calculated the direct delivery cost per complete survey for each age-sex stratum, which included the cost of airtime used to complete the survey and the incentive. The time spent answering the survey was multiplied by the per-minute airtime cost. Data analyses were conducted using Stata 14.1 (Stata Corporation, College Station, Texas USA, 2017).
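One common way to build weights of this kind is sketched below, as an assumption rather than the authors' exact procedure, since the weight formula is not given in the text. The two samples are pooled, a logistic model predicts membership in the reference (GATS) sample from covariates, and each IVR participant receives the odds p/(1-p) as a weight. The toy data use a single covariate (urban residence), and the fitting routine is a plain gradient ascent written for self-containment.

```python
import math

def predict(beta, row):
    """Logistic probability for one covariate row (row[0] is the intercept term)."""
    return 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))

def fit_logistic(X, z, lr=1.0, steps=500):
    """Fit P(z=1 | x) by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, zi in zip(X, z):
            resid = zi - predict(beta, row)
            for j, x in enumerate(row):
                grad[j] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data: rows are [intercept, urban]. IVR is 60 % urban, GATS 40 % urban.
ivr = [[1, 1]] * 60 + [[1, 0]] * 40
gats = [[1, 1]] * 40 + [[1, 0]] * 60
X = gats + ivr
z = [1] * len(gats) + [0] * len(ivr)      # 1 = reference (GATS) sample

beta = fit_logistic(X, z)
weights = [predict(beta, row) / (1 - predict(beta, row)) for row in ivr]
weighted_urban = sum(w * row[1] for w, row in zip(weights, ivr)) / sum(weights)
print(f"weighted urban share in IVR: {weighted_urban:.3f}")
```

In this toy case the weighted IVR urban share moves from 0.60 to roughly the reference value of 0.40, which mirrors the pattern reported for the IPW-weighted IVR sample.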
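For a single binary predictor such as survey mode, the unadjusted logistic regression odds ratio equals the cross-product ratio of the 2x2 table, which the sketch below computes with a Woolf (log-scale) confidence interval. It ignores sampling weights and the GATS cluster design, so it is a simplification of the weighted analysis described above, and the counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Crude OR and Woolf CI.

    a: IVR users, b: IVR non-users, c: GATS users, d: GATS non-users.
    GATS is the reference group.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical counts, not the study's data.
or_, lower, upper = odds_ratio(a=50, b=950, c=75, d=925)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An OR below 1 would indicate lower odds of reporting the indicator in IVR than in GATS; the adjusted ORs additionally condition on age, sex, education, and location.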

Results
A total of 534,678 IVR calls were made; 20,985 respondents indicated they were at least 18 years old, and 5605 consented to participate. The number of completed interviews was 2362 (Fig. 1). Sample sizes for five of the eight age-sex strata were reached: 18-29- and 30-44-year-old males and females, and 45-59-year-old males. The sample sizes for unfilled age-sex strata for females ages 45-59, males ages 60+, and females ages 60+ were 126, 217, and 73, respectively. The cost per complete survey increased with age (S3 Table).
The disposition codes and participation rates are reported in Table 1. The contact, response, cooperation, and refusal rates were 1.3 %, 0.8 %, 46.4 %, and 0.5 %, respectively. Most respondents were satisfied with the survey (98.4 %).
Table 2 shows the sociodemographic characteristics by survey mode. The proportions of 18-29-, 30-44-, 45-59-, and ≥60-year-olds were 33.2 %, 32.9 %, 22.0 %, and 12.0 %, respectively, in IVR; these proportions were 33.0 %, 35.5 %, 16.7 %, and 14.8 %, respectively, in GATS. The proportions of males in IVR and GATS were 58.7 % and 43.2 %, respectively. Overall, the proportion of people with secondary or higher education was higher in IVR than in GATS, at 43.3 % and 21.1 %, respectively. The proportion of urban residents was 56.5 % in IVR, while this was 39.9 % in GATS. The age-sex distribution of IVR and GATS samples became similar after weighting. The age-sex weighted IVR sample (58.7 %) had a higher proportion of urban residents than the IPW IVR (38.9 %) and GATS (33.2 %) samples. The proportion of people with any formal education was also higher among the age-sex weighted IVR sample (93.6 %) compared to the IPW IVR (84.6 %) and GATS (83.9 %) weighted samples.
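The effect of age-sex standard weighting can be illustrated with a small post-stratification sketch: each participant's weight is the population share of their stratum divided by that stratum's share of the sample. The population shares below are hypothetical, not the United Nations figures, and only two strata are used to keep the example short.

```python
from collections import Counter

def poststratification_weights(sample_strata, population_share):
    """Weight = population share of the stratum / sample share of the stratum."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_share[s] / (counts[s] / n) for s in sample_strata]

# Toy sample: 60 % male, 40 % female within one age band.
sample = [("male", "18-29")] * 60 + [("female", "18-29")] * 40
# Hypothetical population shares (must sum to 1 across strata).
population = {("male", "18-29"): 0.5, ("female", "18-29"): 0.5}

weights = poststratification_weights(sample, population)
male_share = sum(w for w, s in zip(weights, sample) if s[0] == "male") / sum(weights)
print(f"weighted male share: {male_share:.2f}")
```

This kind of weighting aligns only the age-sex margins. Characteristics not used in the weights, such as urban residence or education, can stay imbalanced, consistent with the age-sex weighted IVR sample remaining more urban and more educated than GATS.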

Discussion
In this study, we employed RDD and quota sampling to determine the feasibility of using an IVR survey to collect nationally representative data in Tanzania and compared the prevalence estimates of tobacco use indicators using two different survey modes (i.e., IVR and face-to-face).We found a significant difference in the likelihood of reporting these indicators; overall, the odds of reporting tobacco smoking indicators were lower, and smokeless tobacco use indicators were higher among IVR respondents compared to those of GATS; however, most of the prevalence estimates were close to one another (i.e., a < 2 % difference).
This study adds to a growing body of literature investigating MPS's usefulness and validity in LMICs.
Different survey modes can generate different prevalence estimates (Clagett et al., 2013; Worges et al., 2022). In many HICs, multiple surveys are conducted each year to obtain prevalence estimates. Several studies examined the differences in prevalence estimates according to survey mode (Carlson et al., 2009; Hsia et al., 2020; Keadle et al., 2016). For instance, Keadle and colleagues examined the prevalence of physical activity (PA) among older individuals using the National Health and Nutrition Examination Survey (NHANES), National Health Interview Survey (NHIS), and BRFSS data; NHANES and NHIS were in-person surveys while BRFSS was a telephone-based survey; they found the estimates of meeting PA guidelines to be 27 %, 36 %, and 44 %, respectively (Keadle et al., 2016). Despite these differences in estimates, all three data collection methods are in use. Additionally, as the IVR and GATS used different sets of questions for the same indicators, differences in the wording of questions can yield different estimates (Carlson et al., 2009; Hsia et al., 2020; Keadle et al., 2016).
Although we found differences in the odds of reporting different tobacco use indicators, the point estimates were close to one another. We observed similar differences after stratifying the unweighted sample by age and sex (S4 and S5 Tables). The small differences in point estimates (<2%) of most tobacco use indicators between these two survey modes indicate that IVR can obtain estimates similar to those of other surveys. The observed differences can also arise from the differences in the study sample (i.e., age, sex, education, and location) and variability in the timing of survey administration. The IVR sample had a relatively higher proportion of people with higher education and urban residence. These population groups are also more likely to use and own cellular devices in LMICs (Greenleaf et al., 2019; Poirier et al., 2021), and previous studies have shown differences in tobacco use by these characteristics (Reitsma et al., 2017; Tingum et al., 2017). The under- or over-representation of some population groups is common in MPS studies. For instance, a previous study by L'Engle and colleagues compared the findings of an RDD IVR survey with the Ghana DHS.
Although the results of that study were promising, and they enrolled a large sample in a short period, they also had an underrepresentation of women and rural and older people (L'Engle et al., 2018). Another study by Greenleaf et al. in Burkina Faso found about two times higher odds of reporting modern contraceptive use among RDD women compared to data from women obtained using face-to-face interviews (Greenleaf et al., 2020). We could not achieve sufficient sample sizes for 45-59- and 60+-year-old females (S5 Table). In addition to lower ownership of mobile phones, females in LMICs tend to spend more time doing household chores and caring for children, which may prevent them from answering phone calls (L'Engle et al., 2018). Future MPS should explore ways to reach women and other underrepresented population groups (i.e., elderly, less educated, and rural residents), such as varying the timeframe in which calls are made or having the respondent suggest a better time for outreach. The average cost and number of calls for a survey completed by a woman were also higher than those for men because of our quota sampling approach. The average cost for recruiting younger people was lower than that of household surveys; however, this increased substantially with increasing age (S3 Table). More research is required to understand the methods to minimize IVR costs. After stratification by age and sex, we could not obtain some estimates due to the smaller sample size of some age-sex groups (e.g., 60+-year-olds, S5 Table). Previous studies have shown that airtime incentives increase participation rates (Gibson et al., 2019). Although we used incentives in this survey, a further possible solution is to oversample individuals from underrepresented or hard-to-reach population groups. Other nationally representative surveys (e.g., NHANES) also follow that approach and generate the sample weight after data collection to reflect population-level estimates. Though we used quota sampling to obtain age-sex strata, additional strata like age-sex-education-place of residence could be used. Other strategies could be to use motivational introductory messages, send pre-survey text messages, and make multiple phone calls; however, the usefulness of all of these methods should be tested (Dal Grande et al., 2016; Gibson et al., 2019; L'Engle et al., 2018).
Overall, the IVR had a lower response rate than the GATS. Without a working sampling frame, assigning appropriate disposition codes, particularly distinguishing phone numbers that are active versus inactive, and calculating a response or cooperation rate is challenging. Our response rate is conservative in that calls that were not answered (75.4 %) were labeled with the unknown disposition code. A certain percentage of these phone numbers are likely inactive and would therefore be classified as ineligible, which would reduce the denominator of the response rate and increase the rate (Phadnis et al., 2021; Song et al., 2020). However, similar MPS in HICs yielded a higher response rate (Gundersen et al., 2014a,b; Margo, 2012). For example, the 2012 Australian New South Wales Population Health Survey, an MPS, obtained about a 32 % response rate (Margo, 2012). Population-based surveys in LMICs (e.g., DHS) usually have a high response rate (United States Agency for International Development, 2021). Furthermore, BRFSS has a sampling frame of local working phone numbers (Centers for Disease Control and Prevention, 2017). Obtaining such a sampling frame of working mobile phone numbers is expensive and labor-intensive. Before collecting DHS data, a complete list of households is made in an enumeration area (United States Agency for International Development, 2021). A similar listing of working phone numbers may be made by randomly selecting enumeration areas; the same sampling frame could be used repeatedly for other surveys and may be updated regularly. In addition, differences in questionnaire design, survey length, and the calculation methods of disposition codes (i.e., break-offs, refusals, and partial interviews) may account for the differences in participation rates.
Our study has several notable strengths. We tested the reliability of our results by comparing them with a nationally representative survey, increasing the credibility of our results. Our sample also included all mobile phone operators in Tanzania, removing any potential selection bias due to differences in subscribers' characteristics between mobile network operators. As the data were collected anonymously, the risk of social desirability bias was additionally minimized.
However, limitations of the present study also warrant discussion. Though IVR had a large overall sample, the sample size in some age-sex strata was low, and some population groups were underrepresented. The information was collected based on self-reports and may be subject to recall bias. As the IVR sample only included participants with mobile phones, prevalence estimates for tobacco use among those without mobile phones are not known.

Conclusion
This study suggests that although there may be some differences in prevalence estimates obtained by IVR and household surveys, the point estimates could be close. There may be an underrepresentation of some population groups. Future studies should aim to increase the participation of people belonging to these groups. Additionally, the reliability of IVR findings should be tested in other LMICs.

Fig. 1. Flow Diagram of the Retention of the Interactive Voice Response Survey's Study Sample, Tanzania.

Table 1
Description of the Participants' Disposition Codes and Participation Rates of the Interactive Voice Response Survey (N = 534,768), Tanzania 2020-21.

Table 2
Comparison of the Sociodemographic Characteristics of IVR and GATS Study Participants, Tanzania.

Table 3
Comparison of Unweighted and Weighted Prevalence (95 % CI) of Tobacco Use Indicators among GATS and IVR Participants, Tanzania.

Table 4
Unadjusted and Adjusted Odds Ratios (95 % CI) for the Association of Tobacco Use Indicators with Survey Modes, Tanzania.