High-Frequency Phone Surveys on COVID-19: Good Practices, Open Questions


Following the onset of the COVID-19 pandemic, face-to-face survey data collection came to a halt due to lockdowns, limitations on mobility, and social distancing requirements. What followed was a surge in phone surveys to fulfill rapidly evolving needs for timely and policy-relevant microdata on the socioeconomic impacts of, and responses to, the pandemic. Even as face-to-face survey data collection resumes in different parts of the world under COVID-19 safety protocols, the rapidly acquired experience with phone surveys on the part of national statistical offices and survey practitioners in low- and middle-income countries appears to have laid the foundation for phone surveys to be implemented more commonly in the post-pandemic era, in response to other shocks and as complements to face-to-face surveys. Informed by the practical experience with the high-frequency phone surveys implemented with support from the World Bank Living Standards Measurement Study (LSMS) to monitor the socioeconomic impacts of the COVID-19 pandemic, this paper provides an overview of options for the design and implementation of phone surveys to collect representative data from households and individuals. Further, the discussion identifies the requirements for phone surveys to become a mainstay in the toolkits of national statistical offices, as well as directions for future research on phone survey design and implementation.



High frequency phone surveys on COVID-19: How they shaped up in developing countries
It is not unusual for crises to trigger demand for data to understand crisis impacts and shape short- and longer-term policy responses. The 2007-2008 global food price crisis is an important example in the agriculture and food domain, which fueled a new interest in household survey datasets that would allow understanding the distributional impacts of that crisis (Ivanic and Martin, 2008; Ruel et al., 2010). The 2014 Ebola outbreak in West Africa and the 2017 drought- and conflict-related food insecurity crisis in Nigeria, Somalia, South Sudan, and Yemen led to a need for timely data on how people's livelihoods were affected. In both cases, phone surveys were deployed to meet emerging data demands (Etang and Himelein, 2020; Pape, 2020). The COVID-19 crisis is no different in the way policymakers, academics and the general public rushed to look for data to understand the evolution and impact of the pandemic. If anything, this crisis has been characterized by an exceptional presence of data in the public debate as statistics on number of cases, time-varying reproduction numbers (i.e. Rt rates), mortality, hospitalizations, and others came to dominate the headlines and governments and the general public started paying an unprecedented level of attention to data for policy decisions, such as the design and triggering of lockdowns, mobility restrictions, school closures, and more.
In this context, national statistical offices (NSOs) found themselves trapped between an increasing demand for more timely and policy-relevant data and the difficulties of collecting data under the circumstances created by the pandemic in terms of (i) their own staff having to transition to remote work and (ii) the need to halt face-to-face interviews to protect the health of their workers as well as that of survey respondents. In these conditions remote data collection quickly emerged as the tool of choice for meeting the new data demands. While NSOs in high-income countries moved to capitalize on existing capabilities for web- or phone-based surveys to fill data gaps, NSOs in low-income countries for the most part found themselves in relatively unexplored territory. The use of phone surveys in these settings has been on the rise for nearly two decades now, facilitated by the increased penetration of mobile phone services, which filled the gap on phone connectivity left by limited landline development (Tomlinson et al., 2009; Dillon, 2012; Demombynes et al., 2013; Ballivian et al., 2015; Larmarange, 2016; Lau et al., 2019). Several low-income countries had had experience with occasional phone-based surveys for specific topics or populations, but national level socio-economic data collection via phones had rarely, if ever, been attempted, and web-based surveys were not really an option for population-based studies given individuals' limited access to internet and personal computing.
The growth curve for the increase in phone surveys did not quite match, however, the exponential growth in mobile phone subscriptions, at least not until the COVID-19 pandemic started. When face-to-face data collection was suspended or halted indefinitely due to mobility restrictions and social distancing measures, the phone survey revolution that had timidly been in the making received an exceptional acceleration that made it the preferred mode of data collection across low- and middle-income countries. It has in fact been remarkable to witness how low-resourced NSOs managed to swiftly launch remote data collection efforts using innovative methods. Already in May 2020, 58 percent of NSOs that responded to a global survey reported that they were using phone surveys to collect data on the impact of COVID-19, with this percentage being higher in low-income countries (web surveys being comparatively more common in high-income countries; UN and World Bank, 2020).
With this accelerated adoption, survey practitioners and NSOs realized they had not been adequately capitalizing on the potential offered by phone surveys, both in terms of cost savings and in the temporal resolution and frequency of data collection. Several calls had been made over the years for more frequent collection of data on development issues in general (Ballivian et al., 2015), and specifically on issues that would benefit from an increase in the frequency of the available data, such as poverty (Blumenstock et al., 2015; Jean et al., 2016), food security, resilience and exposure to shocks (Barrett, 2010; Headey and Barrett; Knippenberg et al., 2019), crop production, or agricultural labor (Arthi et al., 2018). The pandemic finally opened the floodgates, pushing many countries to make the leap to new forms of data collection and toward what one might expect to be a new equilibrium, following what in retrospect the statistical community will likely see as a structural break.
At the time of writing the dust has not quite settled, but there is an opportunity to start reflecting on what has been learned since the onset of the pandemic on the potential, the limits, and the challenges of implementing phone surveys in low-income settings. This paper offers some perspectives on this process of learning from the specific vantage point of an international survey program, namely the World Bank Living Standards Measurement Study (LSMS), which has been assisting seven African countries in the implementation of High-Frequency Phone Surveys on COVID-19 (HFPS) starting in April 2020. 2 Between April 2020 and June 2021, the LSMS supported 61 monthly phone survey rounds across 7 countries, amounting to over 119,000 completed interviews. The data from each phone survey round were disseminated publicly on the World Bank Microdata Library and were coupled with a round-specific survey report; 3 were used to create harmonized indicators disseminated through the World Bank COVID-19 High-Frequency Monitoring Dashboard; 4 and informed a wide range of knowledge products produced by the World Bank, client countries and development partners 5 .
2 Though the LSMS has been assisting seven countries, namely Burkina Faso, Ethiopia, Malawi, Mali, Nigeria, Tanzania, and Uganda, this paper focuses on the experience in five of these countries, excluding Tanzania and Mali. The HFPS in Tanzania did not commence until Spring 2021, and access to the data needed for cross-country analysis on some of the implementation aspects discussed in this paper was not available for Mali. 3 As of June 30, 2021, the total number of Microdata Library download counts across the LSMS-supported phone surveys stood at 3,691, and the total download count across the forty-four survey reports that have been disseminated publicly was 8,439. 4 The dashboard can be accessed here: http://bit.ly/wbcovid19dashboard. 5 Examples include Abay et al. (2021a), Gourlay (2021a, 2021b), Amare et al. (2021), Ambel et al. (2021), Brubaker et al. (2021), Contreras Gonzalez et al. (2020), Dang et al. (2021), Furbush et al. (2021), Josephson et al. (2020), Josephson et al. (2021), Kanyanda et al. (2021), Khamis et al. (2021). At the country level, there were numerous examples of phone survey data use to highlight, besides the domestic newspaper columns and TV news coverage at the time of data dissemination. For instance, the Malawi HFPS findings have fed into (i) the Malawi Poverty Assessment, and (ii) the policy briefs on employment, education, food security, and social assistance that were prepared for the World Bank 2020 Annual Meetings.
Likewise, the Uganda HFPS data and findings were used for (i) the second update of the Uganda Systematic Country Diagnostic, the key input into the Country Partnership Framework, which is the central tool for guiding the World Bank Group's development operations and gauging their effectiveness, and (ii) the Uganda Economic Updates produced by the World Bank during July 2020-June 2021. In Burkina Faso, the HFPS data and findings were used for (i) the Burkina Faso Economic Updates produced by the World Bank during July 2020-June 2021, (ii) the Burkina Faso Poverty Assessment, (iii) the targeting of the World Bank-financed Social Safety Net Project, and (iv) studies carried out by the Ministry of Economy, Finance and Development, the Institut Supérieur des Sciences de la Population, and the Institut de Recherche en Science de la Santé, with a focus on the socioeconomic, health, and education impacts of the pandemic.
The objective of this viewpoint is two-fold: in the short term, to inform the continued implementation of such surveys during the evolving phases of a pandemic that threatens to inflict further damage on public health and economies; and in the medium to long term, to start reflecting on the positioning of phone surveys within a post-pandemic survey system, which is likely to be characterized by a combination of 'traditional' face-to-face surveys with other modes that allow for higher-frequency data collection (via phone, as well as other non-traditional data sources (see e.g. Bottan et al., 2020 and 2021), including many types of new 'private intent data' (World Bank, 2021)). This model would allow cost savings, higher temporal resolution, and reduced recall burden for events that are less salient or more variable over time.
Returning to a pre-COVID-19 survey world that ignored the lessons learnt during the pandemic would be a loss.
In what follows, though the experiences and lessons are mainly informed by the LSMS-supported phone surveys during the COVID-19 pandemic, we also draw on evidence from other national and international survey efforts, including those supported by Innovations for Poverty Action (IPA) (Glazerman et al., 2020) and the pre-COVID-19 phone survey programs supported by the World Bank (Dabalen et al., 2016; Ballivian et al., 2015).
One key feature of the LSMS-supported HFPS effort was the use of pre-COVID-19 face-to-face surveys as sampling frames and as a source of valuable information. The availability of an existing list of households with phone contacts enabled the HFPS deployment to be rapid and extremely timely. While selection biases were initially of concern, the rich pre-COVID-19 information available turned out to be extremely useful for understanding and, to a large extent, correcting the biases associated with the sample selection process, albeit with limits (Ambel et al., 2021; Brubaker et al., 2021). The same information was invaluable in adding depth to the analysis of the HFPS data. The data collected on households and respondents could be related to their pre-COVID-19 socio-economic status (education, employment, consumption expenditures) to describe the impacts of the crisis along familiar household profiles, without having to spend precious interview time trying to reconstruct the history of the participating units.
The LSMS-supported HFPS operations were also highly flexible. The frequency of the survey rounds allowed for a rotation of survey modules that could be responsive to the policy questions becoming more relevant with the evolution of the crisis, or to ever-present seasonality in specific outcomes. For instance, information on agricultural activities, such as planting and harvesting, could be fielded at different times to optimally align with the timing of these activities on the ground. Education questions could be adapted to take into account the specific timing of the school year and the specific decisions being taken in each country, for instance on school reopening. Questions that were less likely to vary over time, such as access to water and sanitation facilities, were only asked in the initial rounds, whereas questions on access to medicine and treatment were generally kept throughout, alongside those on impacts on food security, employment, and income. Credit and debt questions were included in the later rounds, as were questions related to attitudes towards vaccinations. These advantages of relying on an existing information base for conducting the phone surveys are, we argue, an important element to consider when thinking about the design of the post-COVID survey infrastructure.
Having a strong base of face-to-face surveys will constitute the backbone of a system integrating traditional and new forms of data collection, including but not limited to phone surveys. Thinking of the two sets as substitutes, rather than complements, would do a disservice to data users and to society at large.
The paper is organized as follows: Section 2 outlines issues with sampling and respondent selection, including response rates, attrition, possible biases, and coverage issues, and how those can be mitigated or corrected; Section 3 discusses questionnaire design, how it needs to be approached differently in a phone survey, and the limits and opportunities this poses for the analyst; Section 4 zooms in on cost and implementation issues. A set of concluding thoughts is offered in Section 5, highlighting lessons learnt that can help inform future design decisions, as well as areas where more research is needed to pin down the appropriate approach to data collection.
2. Sampling design and respondent selection

2.1. Modes of implementation: CATI, SMS, IVR

In phone surveys, there are three main survey modes: Computer Assisted Telephone Interviews (CATI), Interactive Voice Response (IVR), and text message-based surveys (SMS). In CATI surveys, interviewers call respondents to collect data, often from a call center, entering responses into the interactive survey questionnaire on a computer. IVR surveys also contact respondents through voice calls but rely on automated, prerecorded questions instead of live interviewers; respondents answer the prerecorded questions by voice or by pressing numbers on the keypad. SMS surveys consist of questions sent to respondents by text message, which they answer by text message, prompting follow-up questions in an automated fashion (Lau et al., 2019). CATI surveys tend to be more expensive than IVR or SMS but are considered to yield richer information and better data quality (Glazerman et al., 2020). SMS surveys may not be appropriate when large sections of the target population are illiterate. SMS and IVR surveys generally allow shorter, simpler questionnaires relative to CATI surveys. The mode of the LSMS-supported HFPS is CATI, as was the case in many other COVID-19 monitoring surveys.

2.2. Sampling for phone surveys
Whatever the mode, all phone surveys need a list of phone numbers to contact. The choice of what kind of list to use as the sampling frame is a critical design feature with implications for the representativeness of the phone survey estimates. There are three common types of sampling frames for phone surveys in lower-income settings: first, phone numbers obtained in a previous household survey or program; second, lists of phone numbers, for example from a mobile network operator; and third, phone numbers generated through random digit dialing (RDD; Himelein et al., 2020).
The LSMS-supported HFPS used phone numbers obtained in recent representative household surveys supported by the program. 6 This approach has the advantage that there is extensive household and individual information associated with each phone number which can be used to assess and improve the representativeness of the phone survey estimates (Himelein et al., 2020). The approach is possible only if a recent representative survey exists, which also collected respondent phone numbers, and the sample size of phone surveys using this approach is limited by the sample size of the existing representative survey. In comparison, list-based and RDD sampling frames usually lack sociodemographic information associated with each phone number, making it harder to assess and improve their representativeness. In RDD surveys, in particular, response rates also tend to be much lower than in the household survey approach because many of the randomly created phone numbers do not work, which makes it necessary to contact a large number of potential respondents (Himelein et al., 2020). An RDD survey in Ghana, for example, successfully reached less than 15 percent of phone numbers (L'Engle et al., 2018; Himelein et al., 2020). In contrast, the LSMS-supported HFPS successfully contacted between 62 and 94 percent of the phone numbers attempted in each country (Table 1). Similarly, Henderson and Rosenbaum (2020), in a review of recent remote surveys, find an average of 19 percent of phone numbers based on RDD and lists to be connected, compared to 63 percent of phone numbers from pre-existing surveys.

2.3. Phone ownership, response rates, and representativeness
Access to mobile phones is a key determinant of whether a population of interest can be interviewed in a phone survey. While mobile phone penetration continues to grow globally, significant segments of the population remain uncovered and therefore unrepresented in phone surveys, especially in less developed countries (GSMA, 2020). 7 The concern is that the population with access to mobile phones differs meaningfully from that without access, which may lead to "coverage bias" and undermine the representativeness of the phone survey sample (Ambel et al., 2021; Himelein et al., 2020). In the baseline face-to-face surveys used for the HFPS, the share of households with a contact phone number (the coverage rate) was comparatively high, ranging between 73 percent in Malawi and 99 percent in Nigeria (Table 1). There are nonetheless meaningful differences between the full baseline face-to-face sample of households and the sample of households with access to a mobile phone. These differences hold consistently across countries, except in the case of Nigeria, where the coverage rate is 99 percent.
Households with access to a mobile phone are significantly wealthier and more educated; they are less likely to be rural, to work in agriculture, or to own land, and more likely to have wage employment or to own a nonfarm enterprise. In contrast, there are no significant differences in the share of households headed by women nor in household size (see Ambel et al., 2021 for a more thorough discussion).
A further concern is unit nonresponse, whereby contacted respondents cannot be interviewed, for instance due to refusal. This is a problem in all surveys, whether face-to-face or over the phone, but unit nonresponse tends to be higher in phone surveys because refusal rates are higher on the phone, phone numbers are disconnected, or respondents cannot be reached on their phones for other reasons. In round 1 of the LSMS-supported HFPS, the response rates ranged from 60 percent in Ethiopia to 93 percent in Uganda (Table 1). These response rates are consistently higher than those cited in a recent review of remote surveys, which also found that response rates differ by survey mode, with CATI performing best, followed by IVR and SMS (Henderson and Rosenbaum, 2020).

Figure 1. Response rates over time
Response rates matter for each round of a survey. Attrition, whereby survey respondents drop out from one round to the next, is a common issue in panel surveys. In the case of the HFPS, the short intervals of approximately one month between rounds reduce the risk of phone numbers being disconnected, and incentives for continued participation are offered, though respondents may become increasingly fatigued. Overall, response rates remained quite stable across rounds, dropping off somewhat more in the latter rounds in Ethiopia. This was due in part to an ongoing armed conflict in the country's Tigray region, where at times as many as 75 percent of phone survey respondents could not be reached. Mobile network issues, including shutdowns following periods of unrest, further contributed to the drop-off in response rates in Ethiopia (Anna, 2020). In most HFPS, households successfully interviewed in round 1 were re-contacted in subsequent rounds. Only households that explicitly refused to be interviewed were dropped round on round, while those that could not be reached were retained, which led to the number of households attempted decreasing slightly in each round (Figure 1). 8 Round 10 in Malawi and round 12 in Nigeria were explicitly targeted at youth, leading to a reduction in the attempted interviews.
Nonresponse leads to bias if the nonresponding households or individuals differ systematically from those interviewed, and that is the case across the LSMS-supported HFPS. The differences between responding and nonresponding households are similar in magnitude and direction to the differences between households with and without access to a phone, if a little less pronounced (Ambel et al., 2021). Overall, response bias adds to coverage bias, exacerbating the differences between the full, nationally representative baseline samples and the phone survey samples. Tables A.1 and A.2 in the Appendix, adapted from Ambel et al. (2021), show a detailed comparison of the characteristics of households which participated in the HFPS with households in the respective baseline surveys. HFPS households tend to be wealthier, as measured through consumption and asset ownership, and less likely to be rural and agricultural. Heads of HFPS households are better educated, while the share of households headed by women is not statistically significantly different across the board. In the context of a phone survey on the impacts of Ebola in Liberia, Himelein and Kastelic (2015) present a similar comparison of the characteristics of their sample of phone survey households with the full sample of face-to-face households from which the phone survey sample was drawn, albeit for a limited set of household characteristics. They find phone survey households are less likely to be in the agricultural sector or self-employed. Henderson and Rosenbaum (2020) also find that phone survey samples are skewed towards urban households.
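As a concrete illustration of the kind of comparison a rich baseline frame makes possible, the sketch below contrasts the weighted rural share in a full baseline sample against the subsample reachable by phone. It is a minimal, hypothetical example (all households, weights, and the `weighted_mean` helper are invented for illustration), not the HFPS procedure itself.

```python
# Illustrative bias check: compare a characteristic's weighted mean in the
# full baseline sample vs. the subsample reachable by phone.
# All data below are hypothetical.

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical baseline households: (is_rural, baseline_weight, reached_by_phone)
baseline = [
    (1, 1.0, False), (1, 1.2, True), (0, 0.8, True), (0, 1.1, True),
    (1, 0.9, False), (0, 1.0, True), (1, 1.3, False), (0, 0.7, True),
]

full_rural = weighted_mean([h[0] for h in baseline], [h[1] for h in baseline])
phone = [h for h in baseline if h[2]]
phone_rural = weighted_mean([h[0] for h in phone], [h[1] for h in phone])

# A positive gap means rural households are underrepresented in the phone sample,
# mirroring the urban skew reported in the text.
bias_gap = full_rural - phone_rural
print(f"rural share: baseline={full_rural:.2f}, phone={phone_rural:.2f}, gap={bias_gap:+.2f}")
```

With these made-up numbers the baseline rural share is 0.55 against 0.25 in the phone subsample, the kind of gap the weight adjustments discussed below aim to close.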

2.4. Methods to counteract bias
The advantage of phone surveys based on previous household surveys is the wealth of information on the selected households, which can be used to compare the selected sample to the nationally representative sample, and to assess and reduce the biases resulting from sample selection. This is most commonly done by adjusting the survey weights based on socio-demographic household characteristics from the baseline survey, increasing the relative weight of underrepresented groups so that the weighted sample aligns more closely with the representative sample (Himelein et al., 2020). There are various techniques to implement reweighting. In weighting class adjustments, the sample is divided into cells according to certain characteristics (such as location, gender, education), and respondents in each cell are weighted according to the general population total of their cell. Propensity score adjustments predict the probability of selection for each household given its characteristics (such as gender, education, wealth) and use the inverse probability of selection to adjust the weights (Little, 1986; Himelein et al., 2020). Other techniques include rim or rake weighting and calibration for nonresponse (see discussions in Himelein et al., 2020 and Ambel et al., 2021).
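To make the reweighting logic concrete, here is a minimal sketch of a cell-based adjustment with invented households and cells: within each weighting class, respondents' weights are divided by the cell's weighted response rate, which is equivalent to an inverse-propensity adjustment when the propensity is estimated at the cell level. This is an illustration of the general technique, not the exact HFPS implementation.

```python
# Sketch of a weighting class adjustment on hypothetical data.
# new_weight = base_weight / response_rate(cell), so respondents carry
# the weight of the nonrespondents in their cell.

from collections import defaultdict

# Hypothetical baseline: (cell, base_weight, responded)
households = [
    ("rural", 1.0, False), ("rural", 1.0, True), ("rural", 1.0, True),
    ("urban", 1.0, True),  ("urban", 1.0, True), ("urban", 1.0, False),
    ("urban", 1.0, True),
]

# Weighted response rate per cell (a cell-level response propensity).
tot = defaultdict(float)
resp = defaultdict(float)
for cell, w, responded in households:
    tot[cell] += w
    if responded:
        resp[cell] += w
rate = {cell: resp[cell] / tot[cell] for cell in tot}

# Inverse-propensity adjustment applied to respondents only.
adjusted = [(cell, w / rate[cell]) for cell, w, responded in households if responded]

for cell in sorted(rate):
    print(cell, round(rate[cell], 3))
```

After adjustment, the respondents' weights in each cell sum back to the cell's full baseline total (3.0 for rural, 4.0 for urban here), which is the defining property of this class of corrections.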
The LSMS-supported HFPS implemented propensity score adjustments to the sampling weights, except in Nigeria where weighting class adjustments were used. Ambel et al. (2021) assess the efficacy of the weight adjustment techniques implemented in the HFPS, finding that they substantially reduce the bias, though they do not fully eradicate it for all variables.

2.5. Respondent selection and profiles
The choice of the respondent is another central design decision in phone surveys. One consideration is that, in order to collect reliable household-level data, the respondent needs to be knowledgeable of the household's situation. In that sense, similar considerations apply as with respondent selection in face-to-face surveys, where the most knowledgeable adult tends to be the household head or their spouse. When it comes to collecting reliable individual-level data, direct individual response, rather than response through proxy of the main respondent, is preferable. Interviewing household members other than the main respondent is considerably easier in face-to-face surveys, where interviewers are present with the household in person, than in phone surveys. Moreover, much of the monitoring of the impacts and perceptions of the COVID-19 pandemic in the HFPS and similar surveys is effectively at the level of the individual respondent, for instance when asking about knowledge of measures to prevent the spread of the virus or attitudes towards vaccination. Individual-level data are also critical to gauging the differential impacts of the pandemic on women and men, or by age or disability status, among others. How respondents are selected, and how respondents consequently compare to the general population, is therefore of interest. Respondent contacting and selection in the LSMS-supported HFPS followed slightly different protocols. In Burkina Faso, Ethiopia, Nigeria, and Uganda, the protocols explicitly or implicitly targeted the head of the household as the first contact person and respondent in the initial round. The protocol was different in the first round of the Malawi phone survey, where the available phone numbers from the baseline survey were randomly ordered and then contacted in that order. 9
Across all countries, an overwhelming majority of the first-round respondents (between 74 and 85 percent) were household heads, including in Malawi, despite the randomized contact protocol (Brubaker et al., 2021; Table 2). This suggests that household heads are more likely to have access to a phone. Further, while phone numbers were called in random order in Malawi, the person first contacted does not necessarily end up as the respondent.
Beyond household headship, many phone survey respondents are men, even though women are the majority in the general adult population (Table 2). Brubaker et al. (2021) assess the determinants of respondent selection across various LSMS-supported HFPS, finding that the most important factor is household headship, with household heads between 31 and 47 percent more likely to be selected than other household members. Selected respondents are also more educated and more likely to own a nonfarm enterprise. An individual's sex does not have a significant impact, which suggests that the gender imbalance among respondents is because most respondents are household heads who are overwhelmingly men (Brubaker et al., 2021). In contrast, Himelein and Kastelic (2015) find that respondents in a phone survey on the impacts of Ebola in Liberia are significantly less likely to be female. However, the authors do not report on or control for the share of household heads among their respondents. Similarly, Henderson and Rosenbaum (2020) find that phone survey respondents are more often male, though the results are not reported in a multivariate regression setting and household headship is not assessed. The authors also find phone survey respondents to be younger than the general population, while Brubaker et al. (2021) note that HFPS respondents tend to be older. Henderson and Rosenbaum (2020) and Brubaker et al. (2020) both find phone survey respondents to be better educated.
Overall, the profiles of the HFPS respondents are not representative of the general adult population. Brubaker et al. (2021) implement individual-level weighting adjustments drawing on the same techniques discussed in the context of household-level bias reduction. These generally improve the representativeness of individual-level estimates from the HFPS but fall short of overcoming selection biases. A detailed set of results adapted from Brubaker et al. (2021) is shown in Table A.3. At the same time, the individual-level weight adjustments increase the variance of the estimates, in some cases significantly so. The most effective way to improve the representativeness of individual-level estimates may be through survey design, for instance by randomly selecting one or more respondents. 10
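The variance cost of such weight adjustments can be gauged with Kish's approximation of the effective sample size, n_eff = (Σw)² / Σw²: the more the adjusted weights vary, the fewer "effective" observations remain. A short sketch with hypothetical weights:

```python
# Kish's effective sample size: how spread in adjusted weights erodes precision.
# Weights below are hypothetical.

def effective_sample_size(weights):
    # Kish approximation: n_eff = (sum of weights)^2 / sum of squared weights
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

base = [1.0] * 100                  # equal weights: no precision loss
adjusted = [0.5] * 50 + [1.5] * 50  # post-adjustment spread in weights

print(effective_sample_size(base))      # equal weights keep all 100 observations
print(effective_sample_size(adjusted))  # spread weights reduce the effective n
```

With these numbers, 100 interviews under the more variable adjusted weights behave like only 80 equally weighted ones, illustrating why aggressive reweighting can inflate the variance of the estimates.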

3. Questionnaire design and survey duration
Questionnaire design is a crucial step in the successful development of any survey. A properly designed questionnaire is important in ensuring that the right information is elicited from respondents, that the data collected are of good quality, and that all the subsequent steps of the survey can run as smoothly as possible. Long interviews with questions that are difficult to answer can create confusion among participants and generally cause respondent fatigue, particularly in panels with frequent survey rounds, increasing the incidence of measurement error, item nonresponse, and attrition. These in turn reduce data quality and make implementation and analysis more problematic. 11 While this is true for any survey, it is particularly important for remote surveys, such as phone surveys. A recent experimental study by Abay et al. (2021b), conducted during a phone survey in Ethiopia, for example, shows that delaying a module on dietary diversity by 15 minutes led to underestimation of the dietary diversity score due to respondent fatigue.
In terms of the content of the interview, phone surveys have limits for certain types of information.
Complex questions, questions that need visual supports, questions that need specific tools (e.g. scales), and questions that need on-site measurement, among others, require adaptation for phone surveys (Brancato et al., 2006). For example, modules on expenditures and agricultural production are long and complex, use many answer categories that need to be read and explained to respondents, and often require visual images or scales to elicit proper measurement. An effort was therefore made within the HFPS to shorten and simplify questions, avoiding complex measurement questions in favor of categorical questions that are easier and less time-consuming to administer. During the HFPS questionnaire design, the limitations imposed by the survey mode thus heavily shaped the choice of questions. For example, collecting information on food consumption in a standard fashion would have substantially enriched the informational content, but would also have extended the length of the interview and increased its complexity beyond the limits identified by common experience and advised by the existing literature. The use of shorter food insecurity assessments and dietary diversity modules is a compromise between reduced length/complexity and the need to obtain important information on nutritional and food security outcomes. Furthermore, questions that measure household income from different sources were replaced by categorical questions of the type "Since the last interview, has income from [SOURCE] increased, decreased, or stayed the same?". These examples point to the limitations of phone surveys in contexts where the precise measurement of complex monetary aggregates represents the key objective of the survey. 11 There is a literature on the effects of questionnaire design and survey length in face-to-face surveys in high-income countries.
Petytchev and Petytcheva (2017) show that measurement error increases for questions posed late in the survey highlighting the importance of managing survey length properly. Holbrook et al. (2003) demonstrate that telephone respondents were less cooperative and more likely to complain about the length of the interview than face-to-face respondents. Herzog and Bachman (1981) also find that longer modules are linked to repeated identical answers given by respondents.
Modifications to the questionnaire structure imply a trade-off between comparability and survey quality. The desire to obtain data and indicators that are as harmonized as possible with baseline face-to-face surveys, within and across countries, needs to be balanced against the need to adapt the questionnaire to the phone modality. Keeping the same wording and nature of the questions improves comparability but may involve using questions that do not adapt well to the phone-based nature of the interview. The need to keep the interview short constrains questionnaire content, favoring the administration of shorter modules of lower complexity. The monthly LSMS-supported HFPS interviews were designed so that each interview would not exceed twenty minutes, in line with recommendations provided elsewhere (Glazerman et al., 2020). Only in the baseline interview were a few additional minutes allowed, in order to briefly introduce the program and update the household roster from the most recent panel survey. The household roster was also updated in all subsequent rounds of the HFPS in the five countries included in Figure 2, though this updating was expected to take less time in later rounds, as relatively little change in household composition was expected from month to month. Data on actual interview durations show durations not far from that target. During the first round of the surveys, the average interview lasted 21 minutes in Uganda, 33 minutes in Malawi, 18 minutes in Ethiopia, 19 minutes in Burkina Faso, and 29 minutes in Nigeria. In all countries, interview duration decreased substantially after the first round (Figure 2).

Figure 2. HFPS average interview duration by round
Note: IQR is the abbreviation for interquartile range.
The need to keep interview duration within these limits led the HFPS team to devise a rotating module format, in which households are administered a core set of questions, primarily capturing the economic impacts of COVID-19, complemented by rotating modules on selected topics introduced each month. Similar strategies have been followed by other phone surveys used to measure the impacts of COVID-19 in low- and middle-income countries, such as the IPA RECOVR surveys. 12 The core set of questions was administered in each country so that the resulting data are cross-country comparable. The first round of the LSMS-supported HFPS covered topics including household composition; knowledge of the existence and channels of transmission of COVID-19; behaviors and social distancing; concerns and perceptions about COVID-19; access to food and non-food necessities and to basic services; employment; food security; and safety nets. The rotating modules are introduced at various stages in the HFPS system and focus on specific topics of interest that do not require month-by-month follow-up. They include topics such as housing, credit, education, livestock, farm practices, locust infestation, social fragility, washing practices, and attitudes towards vaccination. This method makes it possible to focus on specific issues of interest while keeping the core questionnaire manageable for a phone interview.
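The core-plus-rotating structure described above can be sketched as a simple lookup: a fixed core block administered every round, plus round-specific modules. The module names and rotation plan below are hypothetical illustrations, not the actual HFPS instrument.

```python
# Illustrative sketch of a core-plus-rotating questionnaire assembly.
# Module names and the rotation plan are hypothetical, not the HFPS design.
CORE_MODULES = [
    "household_roster", "covid_knowledge", "behavior_social_distancing",
    "access_to_necessities", "employment", "food_security", "safety_nets",
]

# Hypothetical rotation plan; in practice, topics and timing varied by country.
ROTATION_PLAN = {
    2: ["credit", "education"],
    3: ["livestock", "farm_practices"],
    4: ["housing", "vaccine_attitudes"],
}

def modules_for_round(round_number):
    """Return the ordered list of modules to administer in a given round."""
    return CORE_MODULES + ROTATION_PLAN.get(round_number, [])
```

Keeping the core list fixed preserves cross-round and cross-country comparability, while the rotation table is the only piece that needs to change when country-specific needs arise.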
The core-plus-rotating-module structure of the HFPS also allows full exploitation of the advantages of phone surveys in terms of flexibility, adaptability, and cost-effectiveness. The rotating structure has allowed the HFPS content to adapt to the continuously changing needs of each country in the program. In Uganda and Nigeria, where a specific need arose to assess the impact of the pandemic on educational outcomes, the education module was expanded and administered at the child level. A similar strategy was followed in Nigeria and Malawi, where the employment module was expanded to collect data at the individual level, though through the main phone survey respondent (as opposed to individual-specific interviews), to deepen the analysis of the COVID-19 impacts on labor market outcomes.

Cost and implementation issues
Preparation, implementation, and troubleshooting for "fieldwork" in phone surveys will differ from that in face-to-face surveys, as will procurement and personnel requirements and, therefore, overall budget envelopes.

Recruitment and fieldwork organization
While the personnel roles of traditional surveys - interviewer, supervisor, and survey manager - are still utilized in phone surveys, the necessary skills and experience may differ. Phone survey staff should preferably have experience with CATI (or whichever data collection mode is selected) as well as with the application being used to record data. Language skills should be considered carefully, as the number of interviewers hired with specific language skills needs to be consistent with the share of interviews to be conducted in those languages. In the case of the HFPS surveys, which build on pre-existing panel surveys, familiarity with the content, definitions, and other workings of the ongoing panel surveys was also valuable. For this reason, most HFPS surveys selected interviewers from the pool of interviewers who had carried out the local face-to-face panel survey in previous years.
A key fieldwork organization decision, whether or not to use a call center, dictates the equipment and training needs of a given phone survey program. In a phone survey that utilizes a call center, interviewers work from one or more centralized locations. This can facilitate more streamlined supervision but comes with additional costs for establishing the center, and with other challenges, particularly in the COVID-19 environment, where strict protocols are necessary to ensure the safety of those working in the centers. The alternative is remote interviewing and supervision, whereby survey staff work from their own homes rather than a centralized call center. Given restrictions related to COVID-19, the HFPS operations in Burkina Faso, Ethiopia, Uganda, and Nigeria employed a remote interviewing approach. The Malawi HFPS did utilize a call center, leveraging the existing facility established for previous survey operations.
If survey staff are to work remotely, it is necessary that they have the required infrastructure at home: reliable electricity, phone coverage, and internet connection (Amankwah et al., 2020). All interviewers, regardless of whether they are conducting interviews remotely or from a call center, need to be equipped with a phone (with airtime for remote interviewing), a headset to facilitate data entry during the interview, a tablet or computer, a power source, and internet connection (Amankwah et al., 2020). The survey manager and supervisors should be equipped with all the above, plus a laptop to efficiently review interviews and survey implementation reports.
In the case of the LSMS-supported HFPS, in-person training was not an option in most countries given COVID-19 restrictions. Traditional conference hall trainings with mock interviews were replaced with remote training alternatives, primarily via video conference. Additional options for remote training, especially when preparatory time is more abundant than was the case in the response to the COVID-19 pandemic, include web-based tutorials, videos, and quizzes as recommended by Amankwah et al. (2020) and Glazerman et al. (2020). Manuals should be developed and provided to all survey staff to ensure consistent messaging and allow for real-time reference.
In order to facilitate effective fieldwork management and data quality, survey managers and high-level supervisors should also be trained in the generation and interpretation of survey management reports. The World Bank has developed an automated survey management system specifically for phone surveys, using the Survey Solutions CAPI platform, allowing for rapid and consistent report generation, which is particularly important in settings like the HFPS where fieldwork sprints are brief relative to face-to-face fieldwork efforts. 13 Finally, surveys administered via CATI offer an additional means of monitoring data quality, namely audio audits, in which interviews are fully or partially recorded and subsequently reviewed. If opted into, audio audits may require additional training and personnel, but they yield a greater understanding of interviewer skills and data quality. Both the Ethiopia and Nigeria HFPS operations employed audio audits as a mechanism for ensuring data quality. In Ethiopia, the consent discussion was recorded for all interviews, and those recordings were reviewed for a randomly selected 2 percent of completed interviews, plus all interviews which interviewers flagged as having potential data quality issues. In Nigeria, audio audits of the entire interview were activated for 15 percent of the sample, and three trained personnel were tasked with reviewing the recordings and providing feedback to interviewers. Audio audits proved especially useful in monitoring interviewer behavior as well as interpretation, rephrasing, and translation of questions, where relevant, particularly given the absence of the in-person training and supervision that interviewers typically receive. The HFPS teams were able to utilize the audio audits in these countries to channel the attention of supervisors to weakly performing interviewers and to drop interviewers when necessary.
In cases where both the interviewer and respondent were audible, the recordings served the function of traditional call-backs or spot checks, as the interview conversation could be referenced against the data entered. Given the limited duration of the HFPS interviews and the relatively well-connected, urban environment in which calls were conducted, the audio audits did not pose notable challenges in the upload and syncing of data files, though this should be considered in longer surveys, especially when data upload occurs in more rural areas with more limited internet connectivity.
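A selection rule like the Ethiopia one described above - a random share of completed interviews plus everything the interviewers flagged - is straightforward to operationalize. The sketch below is a hypothetical illustration under those assumptions, not the actual HFPS implementation; a fixed random seed keeps the selection reproducible for supervisors.

```python
import random

def select_for_audit(interviews, share=0.02, seed=12345):
    """Choose interview ids whose recordings should be reviewed.

    interviews: list of (interview_id, flagged) pairs.
    Returns every flagged interview plus a random `share` of the rest.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    flagged = {iid for iid, is_flagged in interviews if is_flagged}
    unflagged = [iid for iid, is_flagged in interviews if not is_flagged]
    n_random = max(1, round(share * len(interviews)))
    sampled = rng.sample(unflagged, min(n_random, len(unflagged)))
    return flagged | set(sampled)
```

Because flagged interviews are audited with certainty and the random draw covers only the remainder, the audit workload stays predictable while still providing an unbiased spot check of routine interviews.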

Respondent engagement and incentives
As in face-to-face surveys, where predetermined protocols guide engagement with respondents, protocols should also be devised for phone surveys. These protocols differ in that they instruct interviewers on the number of call attempts, when to make them, how to schedule phone interviews, and so on. Guidelines provided by IPA include a wide-ranging list of contact protocols, such as the minimum time between calls, the maximum number of call attempts, whether to notify the respondent of an upcoming call by SMS, and how long to let the phone ring before considering it a failed attempt (Glazerman et al., 2020). The HFPS operations in Uganda and Malawi employed a protocol requiring up to nine call attempts to reach a sampled household: three call attempts per day, every other day, with at least two hours between attempts. The HFPS in Ethiopia also required up to three calls per day, for three days (Ambel et al., 2020). In Burkina Faso, a similar protocol was employed, with up to three attempts per day on the primary phone number available; if those three attempts were unsuccessful, a secondary phone number was tried up to three times the following day, and if that too failed, the first and second numbers were tried again. The number of call attempts required for all completed interviews in round 1 of the HFPS is illustrated in Figure 3. The average number of call attempts was 2.6 for Burkina Faso, 2.0 for Ethiopia, 3.2 for Malawi, 3.7 for Nigeria, and 1.3 for Uganda.
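A Burkina Faso-style protocol - several attempts per day on one number, then alternating to the next number on the following day - amounts to a small scheduling rule. The function names and parameters below are hypothetical illustrations of that rule, not code used in the HFPS.

```python
from itertools import cycle

def plan_call_attempts(numbers, attempts_per_day=3, max_days=4):
    """Yield (day, number, attempt) tuples for a call-attempt schedule:
    one phone number per day, alternating between the available numbers,
    with up to attempts_per_day calls on that number each day."""
    number_for_day = cycle(numbers)
    for day in range(1, max_days + 1):
        number = next(number_for_day)  # alternate primary/secondary by day
        for attempt in range(1, attempts_per_day + 1):
            yield day, number, attempt

schedule = list(plan_call_attempts(["primary", "secondary"]))
```

Encoding the protocol this way makes it easy to audit compliance afterwards: the attempts logged by interviewers can be compared against the planned schedule.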
Incentives for participation are especially prevalent in phone surveys, both to encourage engagement in future interviews and to compensate respondents for costs they may have incurred for the interview itself. Evidence from randomized control trials suggests that the provision of incentives in phone surveys, in the form of airtime credit, improves cooperation and response (Gibson et al., 2019). The evidence on the importance of the incentive value itself is mixed, however: Ballivian et al. (2015), for example, find similar attrition rates across groups with differing incentive values in Peru, but find that the value of the incentive did affect participation in a similar study in Honduras. Additionally, the value of the incentive should be inversely related to the frequency with which the respondent is interviewed, such that households interviewed frequently receive a smaller per-interview incentive relative to households interviewed less often (Dabalen et al., 2016). The LSMS-supported HFPS surveys provided airtime to respondents for each completed interview in the amounts of $0.70 (Ethiopia), $0.91 (Burkina Faso), $1.29 (Nigeria), $1.33 (Malawi), and $1.34 (Uganda).

Costs
The costing structure for phone surveys is radically different from that of traditional face-to-face surveys. Implementation via phone eliminates the need for vehicles, fuel, and per diem allowances, and reduces staff time given the necessarily shorter interview duration. Costs related to travel for technical assistance are also eliminated or significantly reduced. Assuming the baseline face-to-face survey upon which a phone survey is built is not considered part of the phone survey cost, phone surveys are generally much less expensive operations.
The fixed costs of phone surveys, rather than revolving around vehicles and GPS devices, for example, center on the establishment of a call center (where applicable), personnel, and the acquisition of tablets and phones. The variable costs of phone surveys, such as airtime credit, are marginal, increasing the cost-effectiveness of repeated or panel phone surveys such as the HFPS. The cost per completed interview in the LSMS-supported HFPS ranges from $7.84 in Burkina Faso to $13.11 in Nigeria (Figure 4), inclusive of initial investment and recurring costs. On a per-minute basis, costs range from $1.14/minute in Ethiopia to $1.98/minute in Uganda. 14 These costs are very low relative to the face-to-face LSMS-ISA surveys that underpin the HFPS series, which range from $199.48 per household (Malawi 2010/11) to $406.03 per household (Nigeria 2010/11), though the extent of data collected through the face-to-face and phone surveys ought to be considered when comparing per-interview costs (Kilic et al., 2017).
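The magnitude of the cost gap can be made concrete with back-of-the-envelope arithmetic on the figures quoted above. The ratio below is purely illustrative and, as noted, ignores the very different amounts of data each mode collects.

```python
# Illustrative arithmetic only: per-interview cost figures are those quoted
# in the text (HFPS phone surveys vs. face-to-face LSMS-ISA surveys).
def interviews_per_f2f_budget(f2f_cost, phone_cost):
    """Number of phone interviews the budget of one face-to-face interview buys."""
    return f2f_cost / phone_cost

# Nigeria 2010/11 face-to-face cost per household vs. Nigeria HFPS per interview
ratio = interviews_per_f2f_budget(406.03, 13.11)
# roughly 31 phone interviews for the cost of one face-to-face interview
```

Even at the low end of the face-to-face range (Malawi, $199.48 against a $13.11 upper-bound phone cost), the same calculation yields a ratio above fifteen, which is what makes repeated phone rounds viable on modest budgets.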

Looking forward
The COVID-19 pandemic and the related restrictions have dramatically slowed down face-to-face surveys and accelerated the adoption of phone surveys to understand the impacts of the pandemic on individuals, households, and firms. In the post-pandemic era, enhancing country capacity to design and implement phone surveys, as well as investing in the development of phone survey methods and tools, should continue to be points of emphasis in national and international efforts to build stronger and more responsive survey systems that improve the availability, quality, timeliness, and policy-relevance of household survey data production. Phone surveys should be called upon not only to meet rapid assessment needs, but also as part of recurrent household surveys that combine face-to-face interviewing with telephone interviewing to collect longitudinal data. The latter model would reduce the duration of, and the respondent fatigue associated with, face-to-face interviews and minimize the recall errors in survey data on outcomes that can benefit from higher-frequency monitoring. 15 To realize this vision, future survey work should advance on several fronts.
First, relying on pre-COVID-19 face-to-face household surveys as sampling frames for phone surveys implemented during the pandemic revealed the scope for this type of approach to minimize household-level coverage and non-response biases, albeit with limits. Going forward, contact information for household members and reference individuals outside the household should be elicited more systematically in face-to-face surveys, which can in turn provide foundational data for sampling and bias adjustments in future phone surveys. While longitudinal face-to-face surveys routinely collect this type of contact information to facilitate household and individual tracking efforts, cross-sectional surveys should heed this call as well.
Second, investments in information and communication technology (ICT) infrastructure will constitute a key component of the enabling environment for mixed-mode surveys 16 to flourish in low-income countries. These investments should provide NSOs with access to reliable internet, computer hardware and software, and cloud computing facilities for data collection, storage, and processing. Among the NSOs that implemented the phone surveys featured in our study, one reason they were able to launch phone surveys rapidly, without advance preparation, was the decade-long pre-COVID-19 push by the World Bank Living Standards Measurement Study (LSMS) program to modernize the ICT infrastructure for the implementation of longitudinal household surveys and to help the partner NSOs successfully transition to computer-assisted personal interviewing (CAPI).
Third, efforts to build stronger sampling frames and ICT infrastructure for phone surveys should be augmented with further development and strengthening of existing (i) phone survey tools, including core and rotational questionnaire templates, tabulation plans, and tools for remote survey management and para- and meta-data analysis for data quality assurance; and (ii) phone survey protocols, including for the management of centralized call centers versus decentralized interviewers, respondent selection, and establishing and sustaining contact with each respondent. While the public dissemination of these tools and protocols has been standard practice, there remains significant scope for providing consolidated guidance based on the practical experience of a broad coalition of international survey programs and survey implementing agencies that were involved in phone survey design and implementation during the COVID-19 pandemic.
Finally, critical methodological questions on phone survey design and implementation remain under-researched, including as part of longitudinal household survey systems and particularly in low-income settings. For instance, although the literature on survey mode effects (i.e. the fact that identical survey questions asked in face-to-face versus phone interviews at the same time, in otherwise comparable households, may lead to different answers) is vast, research has disproportionately focused on high-income settings, precisely because of the longer history of mixed-mode surveys in these countries vis-à-vis their lower-income counterparts. As phone surveys become more common in low- and middle-income countries, there will be considerable value in conducting similar research on survey mode effects. Further, in the post-pandemic era, longitudinal household surveys that seek to combine face-to-face interviews with phone interviews in between follow-up household visits will have to decide whether to provide a phone to each household that does not have one. The provision of a mobile phone may lead to changes in household and individual behavior, and changes registered in data collected in baseline versus follow-up face-to-face interviews may be due to the behavioral impacts brought about by the mobile phone provision. Similarly, being subject to phone interviews may in and of itself induce behavioral changes that in turn translate into changes in development outcomes as measured in baseline versus follow-up face-to-face interviews. Ultimately, randomized survey experiments should be implemented to gauge whether these potential behavioral impacts and survey mode effects materialize and, if so, how long they might last.
Ambel et al. (2021). Refer to that paper for a more thorough discussion. The results of an adjusted Wald test comparing weighted means are presented (*** p<0.01, ** p<0.05, * p<0.10). Brubaker et al. (2021). Base row reports the nationally representative mean among all adults in the face-to-face survey. Rows other than the base row report the difference from the base and the p-value from a test of significance for that difference. Sample: all adults in F2F surveys, of which phone survey respondents are a sub-sample.