Blackout or Blanked Out? Monitoring the Quality of Electricity Service in Developing Countries

Access to reliable electricity is a Sustainable Development Goal (SDG), and a key determinant of both economic growth and individual wellbeing. However, traditional methods of collecting electricity reliability data are often prohibitively expensive. In the absence of sophisticated monitoring systems, policy-makers in developing countries commonly rely instead on surveys to gauge needs and prioritize investments. However, the accuracy of survey-based methods is unclear. To shed light on this issue, we built a low-cost national electricity outage monitoring network of Smart Survey Boxes (SSBs) for this study, using off-the-shelf components in Tajikistan – a country with serious seasonal electricity supply constraints. The system was introduced alongside a high-frequency phone-based household survey called Listening-to-Tajikistan. The integration of the two allows benchmarking the survey responses against the unbiased “gold-standard” automated system. The results show that although the two measures are well correlated, survey data nonetheless suffer from significant and systematic bias. On average, survey respondents i) systematically under-report the incidence and severity of electricity outages, but ii) systematically over-report the incidence and severity of outages during periods of abnormally widespread outages of long duration. These findings indicate that, where feasible, automated monitoring can provide more accurate measurement of the quality of electricity service provision. For survey settings, the results also suggest that estimates are more accurate when short (daily) reference periods are used, but that care should be taken to account for time trends.
1 The World Bank, Poverty and Equity Global Practice. Email: wseitz@worldbank.org
2 Institute of Developing Economies (IDE-JETRO). Email: Yuya_Kudo@ide.go.jp
3 The World Bank. Email: jazevedo@worldbank.org


I -Introduction

I.I -Monitoring Electricity Reliability
Sustainable Development Goal (SDG) 7 aims to ensure universal access to affordable, reliable, and modern energy services by 2030. Electricity outages have serious negative consequences for wellbeing, productivity, and livelihoods. Best-practice monitoring systems in upper-income countries rely on sophisticated monitoring equipment that relays detailed information on the provision of electricity service to users. However, such systems are absent in most developing countries, and even where the infrastructure is present, the data are seldom reliable or publicly available.
To address these data gaps, outages and the constraints they impose on users are often monitored using surveys. Data collection efforts as diverse as the World Bank Enterprise (WBE) surveys, the U.S. Consumer Expenditure Survey (ES), Living Standards Measurement Surveys (LSMS), and a wide range of theme-specific research surveys all collect data on electricity outages for the purposes of understanding the effects of electricity outages on economic and social outcomes. In most cases, these surveys collect information by means of traditional recall methods.
Survey data that identify electricity outages are, however, subject to imprecision arising from several sources, and establishing the effects of outages is often complicated by shortcomings in measurement. Some of the better-known challenges relate to recency, recall, desirability, and rounding biases. The imprecision that arises from these errors can have practical consequences for policy in several ways. For instance, outage monitoring is often used as a measure of accountability, and imprecision or bias in reporting could lead to inefficient management of resources. Likewise, systematic reporting bias between consumer subgroups could generate inefficiencies in the allocation of scarce resources. Biases in consumer perceptions could also lead to invalid estimates of the impact or potential outcomes of policy reform.
There is empirical evidence in the consumer behavior literature that systematic reporting differences may be a concern when using surveys for accountability purposes. Levanon et al. (2015), for instance, find that across 12 countries, consumers with high social status tended to register more service failures and to complain more frequently than customers of lower social status. In the literature on power sector reform, many analyses recommend taking political economy considerations into account (Sulistiyanto and Xun 2004; Victor and Heller 2007; Erdogdu 2013). This study also provides a means to objectively compare the accuracy of outage reporting between higher- and lower-income groups (though with the important limitation that survey-based responses may not directly translate into reporting outages to authorities due to self-selection effects).
This study further contributes to a large and growing literature investigating the sensitivity of survey results to design features and other potential sources of bias. For instance, Beegle et al. (2010) find that reference periods and mode (i.e., recall vs. diary-based data collection) can materially impact the results of consumption and welfare surveys, and Bardasi et al. (2010) find similar relationships for labor force surveys. In both cases, these studies establish qualitatively large and economically significant differences due entirely to survey design.
The results of this paper suggest that bias in reporting electricity outages in Tajikistan primarily takes the form of under-reporting. Respondents systematically report fewer instances of electricity outages, and when they do report an outage as having occurred, they report a shorter duration than was truly the case. However, the results also suggest that the direction of bias changes during periods of widespread outages of significant duration. During such episodes, respondents are more likely to overstate the extent and severity of electricity outages, in comparison to the unbiased SSB-based measure.

I.II -Electricity Provision in Tajikistan
The Soviet Central Asian Power System (CAPS) was built in the 1970s and included what is now the independent country of Tajikistan. The system provided nearly universal electricity in the region but was optimized without regard to country borders. With independence, Tajikistan and other countries entered complex bilateral agreements for fuels, electricity, and water. Disagreements between the member countries led to a partial disintegration of the CAPS, and ultimately, Tajikistan's exit from most cross-border arrangements, including for electricity and gas. Official policy aimed for energy self-sufficiency with heavy reliance on hydroelectric power. However, low prices and a lack of alternative sources led to widespread use of electricity for heating in Tajikistan, and serious shortages in the winter.
Traditional measures indicate that Tajikistan is among the worst performers in the ECA region in terms of the reliability of electricity supply to households. Although almost all households are connected to the electricity grid, about 70 percent of the population suffers from extensive shortages of electricity during the winter. As of 2013, these shortages, estimated at about 14 percent of annual electricity demand, imposed economic losses estimated at US$200 million per year, or 3 percent of GDP. Satisfaction with electricity provision in rural areas is the worst in the region of Europe and Central Asia. Tajikistan also suffers from a uniquely large divergence in satisfaction between urban and rural areas (83 percent vs. 34 percent). About 13 percent of households use voltage stabilizers due to unpredictable power surges, and about 10 percent of households in the Listening-to-Tajikistan (L2T) survey report recent damage to electronic devices due to power surges.

Figure 1: Satisfaction with Electricity Provision in Europe and Central Asia
The L2T survey asks respondents about the electricity outages they have experienced over several reference periods. To summarize these results, we find that during the summer and fall, when hydroelectricity production is at its peak and electricity-based heating less common, between 80 and 85 percent of households receive a continuous supply of electricity on any given day. In April 2016, about 21 percent of households reported outages on any given day, falling to about 12 percent in September. In May, about 7 percent of households lacked electricity during prime hours (between 6PM and 10PM), falling to 3 percent by September.
Both the survey and SSB data recorded pervasive and lengthy outages during the winter of 2016, when electricity rationing was in effect. This trend did not continue into the winter of 2017, when water levels and relatively warm temperatures permitted sustained electricity generation throughout the season. However, the SSB-based measures indicate a structurally higher rate of electricity outages during non-rationing periods than the survey-based indicators.
The next section introduces the survey and station data collection methods employed in this study. Section III describes the analytical approaches used, focusing on comparisons between the survey-based reporting and the "gold-standard" automated monitoring system. Section IV describes the results and provides interpretation of their significance for monitoring initiatives and policy-making. Section V concludes.

II -Data
Data from two sources are used in this study: data gathered from a monthly phone-based panel survey of households, and from an automated system of Smart Survey Boxes.

II.I -Listening-to-Tajikistan
The analyses in the following section use novel data collected by the authors in the L2T survey.
Fieldwork proceeded in two stages. The first "baseline" stage included an extensive face-to-face interview with household members. Households were selected using a stratified two-stage clustered sample design based on the 2010 national census of Tajikistan. A total of 150 clusters were selected, with a probability of selection proportional to size. From these clusters, 3,000 households were selected to participate in the baseline survey. During the baseline interview, detailed information was gathered on household composition, asset ownership, and migration. A full household roster was collected (including age, gender, etc.), as well as a comprehensive consumption module that allows analysis in terms of monetary welfare groups in the country. The baseline survey was completed in March 2015. Following completion of the face-to-face fieldwork, interviewers began regularly calling a randomly selected panel of 800 households over the phone to conduct short interviews, following a set monthly schedule agreed to by the participating household. The questionnaire for these phone interviews was designed to monitor trends in subjective wellbeing, alongside measures of income, employment, service disruptions, and related indicators.
Phone-based interviews began in May 2015, and the first 38 rounds of the survey are used in the analysis that follows, covering the entire period to January 2018. The frequency of interviews changed twice during the first year of data collection. To ensure comparability across observations, reference periods were maintained for all questions. After excluding missing values and inconsistent entries, a total of 30,350 unique observations are available for analysis. The analysis that follows assesses the importance of several time-varying measures of deprivation with respect to contemporaneous measures of wellbeing. These indicators are listed in Table 3. Each question is asked every month; however, the reference period is fixed at 10 days (except for employment, which is asked over a one-week reference period). Reference periods remained constant over the survey period used in the analysis below.

Table 3: Survey-based Questions
Did you have any electricity outage at home over the past 10 days?
On how many days were there electricity outages at home over the past 10 days?
On the days with electricity cuts, for how many hours was electricity out at home, on average?
For how many hours was electricity available for your household yesterday?
For how many hours was electricity available between 6PM and 10PM yesterday?
For how many hours was electricity available for your household the day before yesterday?
For how many hours was electricity available between 6PM and 10PM the day before yesterday?
Over the past 10 days, have any of your electrical appliances been damaged by a power surge?
What is source of electricity supply used most in your household?
Have you changed the electricity source used most often in the past 6 months?
If so, what was source of electricity supply previously used most in your household?
Who do you currently pay for electricity?
In the last 12 months, did any household members die or have permanent limb (bodily injury) damage because of electricity system?
Does your household have a voltage stabilizer?
Is your household advised not to power certain appliances at the same time?

II.II -Smart Survey Boxes
The SSBs were constructed in Tajikistan using off-the-shelf materials. The components included a plastic outdoor electricity box, a mobile phone, fasteners, a charger extension cord, a SIM card, and a charging cord splitter (to ensure that the device did not occupy an outlet). The phones were locked to allow only text message functionality, and the SIM cards were programmed so that they could only send text messages, and not receive them. Open source software was downloaded to each phone and programmed to send automated text messages to a central phone number. The unit cost per SSB was about $50.
Two devices were installed in households that volunteered to participate in each of the 150 selected clusters. This approach allows cross validation for local estimates by comparing messages from separate devices for the same locations. To ensure that the presence of an SSB does not induce changes in survey-based reporting behavior, the households selected to report in L2T were different from those that were offered to host an SSB in their home. The boxes send four message types: two related to electricity events (defined as changing status to "the electricity is now on" and "the electricity is now off"), and two scheduled messages. The devices send a message each time the device detects an electricity outage, and another when the electricity turns back on. The devices are also programmed to send messages twice daily at pre-specified times, confirming the presence/absence of electricity at that time. The messages are received by a tablet device located in country. This device automatically saves all messages and uploads them to the cloud for processing. The resulting data are then processed to describe the state of each monitoring box minute-by-minute.
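As an illustration of the last processing step, the following is a minimal sketch of how timestamped status messages could be expanded into a minute-by-minute state series. It is not the processing code used for this study: the function name `minute_states`, the tuple layout of the messages, and the rule that a status persists until the next message arrives are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def minute_states(messages, start, end):
    """Expand (timestamp, status) messages into one state per minute.

    Assumption: a reported status ("on" or "off") holds until the next
    message is received. Minutes before the first message are None,
    because the state is unknown at that point.
    """
    ordered = sorted(messages)
    states, current, i = [], None, 0
    t = start
    while t < end:
        # Apply every message received up to and including this minute.
        while i < len(ordered) and ordered[i][0] <= t:
            current = ordered[i][1]
            i += 1
        states.append((t, current))
        t += timedelta(minutes=1)
    return states


# Hypothetical messages from one box: power drops at 08:02, returns at 08:05.
example_messages = [
    (datetime(2016, 1, 5, 8, 0), "on"),
    (datetime(2016, 1, 5, 8, 2), "off"),
    (datetime(2016, 1, 5, 8, 5), "on"),
]
example_states = minute_states(
    example_messages,
    start=datetime(2016, 1, 5, 8, 0),
    end=datetime(2016, 1, 5, 8, 7),
)
example_off_minutes = sum(1 for _, s in example_states if s == "off")
```

In this toy series the box is without power for the three minutes starting at 08:02. Scheduled confirmation messages can be passed in alongside event messages, since both carry the same on/off status.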

II.III -Comparing Summary Statistics
Direct comparison of the survey and box data reveals significant inaccuracies in respondent reporting, due both to over-reporting outages that did not occur and to under-reporting those that did. Figure (x) reports the differences in the average hours of electricity outages and the number of days (over the past 10) on which outages occurred. Respondents reported outages of slightly shorter duration than was accurate, but more noticeably mis-reported the number of days on which outages occurred.
Simple comparisons of average outage and outage duration summary statistics also suggest significant reporting errors, even before drawing direct comparisons over identical reference periods. According to SSB data, taking an average over a one-month period (and an average between the two monitoring devices in each PSU when they do not correspond exactly), the daily average time without electricity in Tajikistan is a bit more than two hours, or about 131 minutes (Table 2: Panel A1). However, the mean is influenced by a relatively small number of days with outages of long duration, and the median outage duration is a little more than 30 minutes per day. It is often supposed that focusing on the prime hours of household electricity usage between 18:00 and 22:00 yields more accurate results regarding electricity outages, as respondents are both more likely to be home during that window and more likely to notice outages due to their inconvenience. The SSB data show that, averaged over a month, outages during prime hours last about 18 minutes, and are likewise sensitive to outlier days (the median during an average month is about 2.4 minutes).
Looking at the SSB data by day (Table 2: Panel A2), a similar picture emerges. On average, outages last about 137 minutes. However, a median day has no outages at all, highlighting the importance of the method of data aggregation used to report summary statistics. During prime hours of household use, mean outage duration was statistically indistinguishable from the monthly average estimates; however, there was likewise a significant difference in the median, where more than half of all days have no outages at all. More remarkable differences emerge when comparing across the mode of data collection. Survey-based data collection (Table 2: Panel A3) results in estimates between 1 hour 51 minutes and nearly two hours, substantially lower than SSB estimates, which are more than two hours in all cases. Similarly, the average duration of outages between 18:00 and 22:00 is much lower in survey-based approaches over the relatively short reference periods of "yesterday" and "the day before yesterday." These comparisons are drawn out more directly in table (), which reports simple t-tests comparing the two data collection methods from each source for the same location and period. Although the data are structured differently (SSB data provide continuous coverage), SSB data can be harmonized to cover the given reference period (for instance "yesterday" or "the day before yesterday," calculated using the interview date) and compared directly.
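The harmonization described above can be sketched as follows, assuming the box data are already available as a list of (minute, state) pairs. The function names, the toy outage pattern, and the output keys are hypothetical and chosen only for illustration; the prime-hours window of 18:00 to 22:00 follows the survey questions.

```python
from datetime import date, datetime, time, timedelta


def outage_minutes(states, day, start=time(0, 0), end=None):
    """Count minutes flagged "off" on `day` within [start, end)."""
    lo = datetime.combine(day, start)
    if end is not None:
        hi = datetime.combine(day, end)
    else:
        hi = datetime.combine(day + timedelta(days=1), time(0, 0))
    return sum(1 for t, s in states if lo <= t < hi and s == "off")


def reference_period_measures(states, interview_date):
    """Box-based counterparts to the survey's "yesterday" questions."""
    yesterday = interview_date - timedelta(days=1)
    total = outage_minutes(states, yesterday)
    prime = outage_minutes(states, yesterday, time(18, 0), time(22, 0))
    return {
        "any_outage_yesterday": total > 0,
        "outage_minutes_yesterday": total,
        "prime_outage_minutes_yesterday": prime,
    }


def toy_states(day):
    """Hypothetical day with outages at 03:00-03:10 and 19:00-19:30."""
    midnight = datetime.combine(day, time(0, 0))
    out = []
    for m in range(24 * 60):
        cur = midnight + timedelta(minutes=m)
        off = (time(3, 0) <= cur.time() < time(3, 10)
               or time(19, 0) <= cur.time() < time(19, 30))
        out.append((cur, "off" if off else "on"))
    return out


# A household interviewed on 6 January is asked about 5 January.
box_measures = reference_period_measures(
    toy_states(date(2016, 1, 5)), interview_date=date(2016, 1, 6)
)
```

For the toy day, the box records 40 outage minutes in total, of which 30 fall within prime hours. The same function can be called with `interview_date - timedelta(days=2)` logic to cover "the day before yesterday", or looped over ten days for the longer reference period.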
Survey-based methods report about 10 percent fewer outages "yesterday" and a similar rate with respect to the "day before yesterday." When asked about the preceding ten-day period, respondents are even less accurate, reporting outages only at about half the rate that boxes recorded them.
Similarly, reported outage duration was also substantially lower than the actual duration. For the previous day, estimates derived from survey responses were about 18 minutes less than actual outage duration. Over prime hours during the previous day and the day before, respondents were off by a smaller number of minutes, but given the relatively short periods, were remarkably inaccurate. When asked over the previous ten days, the number of outage days estimated from survey data was lower by slightly more than two days (out of 10), and average duration was under-reported by an average of about 16 minutes.
These differences are statistically significant at customary levels and suggest that data gathered using surveys in this context result in systematic under-reporting errors. Shorter and more concrete reference periods (past day/between 6PM and 10PM) reduce some bias, but do not come close to eliminating it.
Another means of drawing these comparisons is provided in table (), which focuses on measurements of "any outage yesterday" and "any outage the day before yesterday". The possible outcomes are i) both the survey and the boxes report an outage, ii) both the survey and the boxes report no outage, iii) the survey reports an outage and the boxes report no outage, and iv) the boxes report an outage and the survey reports no outage. The estimates of "survey under-reporting" bias should be comparable in this approach to the estimates provided in table (), and indeed, summary statistics reported in table () align as expected.
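The four-way classification can be illustrated with a short sketch. The paired observations below are invented for the example and are not study data; the category labels and the "net under-reporting" summary are likewise assumptions about how one might tabulate the comparison.

```python
from collections import Counter


def classify(survey_outage, box_outage):
    """Assign a paired (survey, box) observation to one of four outcomes."""
    if survey_outage and box_outage:
        return "both report outage"
    if not survey_outage and not box_outage:
        return "both report no outage"
    if survey_outage:
        return "survey only (over-report)"
    return "box only (under-report)"


# Hypothetical (survey, box) answers to "any outage yesterday".
pairs = [
    (True, True), (False, False), (False, True),
    (False, True), (True, False), (False, False),
]
counts = Counter(classify(s, b) for s, b in pairs)
shares = {k: v / len(pairs) for k, v in counts.items()}

# Net under-reporting: share missed by the survey minus share it over-reports.
net_underreporting = (shares.get("box only (under-report)", 0.0)
                      - shares.get("survey only (over-report)", 0.0))
```

Because the net figure equals the box-based outage rate minus the survey-based outage rate, it should line up with a simple difference in means computed on the same paired observations.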

III -Accounting for fixed effects and time trends
A complementary approach to documenting systematic differences between survey-based data and SSBs uses a regression framework to establish average cluster-level differences between survey-based reported outages and actual outages, while accounting for temporal fixed effects:
$$Y_{gt} = \beta S_{gt} + \delta_t + \varepsilon_{gt}$$
where $Y_{gt}$ is the outcome variable of interest (such as average outage duration, or number of outages) aggregated in PSU $g$ for time $t$. The variable $S_{gt}$ is a dummy variable indicating whether the observation was obtained using data collected from the survey (=1) or from the automated system (=0), so that $\beta$ measures the average difference between the two sources. The term $\delta_t$ is a vector of calendar date dummies and $\varepsilon_{gt}$ is a random error term.
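For intuition, the coefficient on the survey dummy in a specification with date fixed effects can be computed by demeaning both the dummy and the outcome within each date and then running a simple regression on the demeaned values. The sketch below implements that within transformation in plain Python; it is not the estimation code used in the paper, it produces no standard errors, and the function name and toy data are assumptions for illustration only.

```python
from collections import defaultdict


def fe_dummy_coefficient(rows):
    """Coefficient on a single regressor after absorbing date fixed effects.

    rows: iterable of (date_key, survey_dummy, outcome), where
    survey_dummy is 1 for survey-based and 0 for box-based observations.
    """
    by_date = defaultdict(list)
    for d, s, y in rows:
        by_date[d].append((s, y))

    num = den = 0.0
    for obs in by_date.values():
        s_bar = sum(s for s, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for s, y in obs:
            # Deviations from the date mean remove the date effect.
            num += (s - s_bar) * (y - y_bar)
            den += (s - s_bar) ** 2
    if den == 0:
        raise ValueError("the dummy does not vary within any date")
    return num / den


# Hypothetical outage minutes for two PSUs on two dates, from both sources.
toy_rows = [
    ("2016-01-05", 0, 120), ("2016-01-05", 1, 100),
    ("2016-01-05", 0, 140), ("2016-01-05", 1, 110),
    ("2016-06-05", 0, 30), ("2016-06-05", 1, 20),
    ("2016-06-05", 0, 50), ("2016-06-05", 1, 40),
]
toy_beta = fe_dummy_coefficient(toy_rows)
```

In the toy data the survey mean is 25 minutes below the box mean on the first date and 10 minutes below on the second, so the estimate is the average of the two gaps, -17.5 minutes. A negative value indicates under-reporting by the survey relative to the boxes.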
The first set of results provided in Table () demonstrates statistically significant differences between data collection methods. The significantly negative differences between survey responses and SSB results conform to the expectations given the summary statistics in Table 5. Survey-based estimates are substantially lower on every measure and over every reference period. In terms of magnitudes, the largest errors are present in comparison 8, where, over a 10-day reference period, survey estimates are off by about 4 days out of ten, on average.

IV -What determines accuracy?
To understand the household and/or individual characteristics that correlate with reporting error, a new variable is constructed as the squared difference between reported and observed outages over specific periods (coinciding with the reference periods in the survey), such that:
$$\bar{Y}_{it} = \gamma' h_{it} + \delta_t + \varepsilon_{it}$$
where $\bar{Y}_{it}$ is the squared difference in the outcome variable of interest between the survey and the observed outcome (for instance, whether the household had any outage, or the length of the outage) for individual $i$ in week $t$. The variable $h_{it}$ is a set of household characteristics of interest, such as per capita income and demographic indicators. The term $\delta_t$ is a vector of calendar week dummies and $\varepsilon_{it}$ is a random error term.
• The results do not find that richer people are systematically more (or less) likely to mis-report outages.
• Sick people are more likely to report outages.
• Reporting is more accurate when short (daily) reference periods are used. Interestingly, however, responses for the "day before yesterday" are slightly more accurate, on average, than for yesterday.
• Rural households are more accurate than urban households. Urban households tend to under-report outages by a statistically significant amount.

V -Conclusion
Access to reliable electricity is a key determinant of both economic growth and individual wellbeing. However, traditional methods of collecting service reliability data in surveys are of questionable validity due to several potential sources of bias. A low-cost national electricity outage monitoring network of Smart Survey Boxes (SSBs) was built for this study, alongside the rollout of a high-frequency phone-based household survey called Listening-to-Tajikistan. The integration of the two allows benchmarking the survey responses against the unbiased "gold-standard" automated system.
The results show that although the two measures are well correlated, survey data nonetheless suffer from significant and systematic bias. On average, survey respondents i) systematically under-report the incidence and duration of electricity outages, but ii) systematically over-report the incidence and severity of outages during periods of abnormally widespread outages of long duration. These findings indicate that, where feasible, automated monitoring can provide more accurate measurement of the quality of electricity service provision. For survey settings, the results also suggest that estimates are much more accurate when short (daily) reference periods are used, but that care should be taken to account for i) time trends and ii) the salience of electricity outages for respondent subgroups.
VII -Robustness Checks

VII.I -Validating SSB data
In some cases, poor weather, network service disruptions, and other problems resulted in short periods of missing data in the automated system. For the results included in this paper, no imputations are used. However, for public reporting of the results, an imputation procedure was used to fill in missing observations. All the out-of-the-box model building, testing, and evaluation was done using the Python scikit-learn framework (sklearn). In addition to these out-of-the-box models, we also tested a custom spatio-temporal nearest neighbor model. Based on earlier experimentation, we picked a decision tree-based model (the extremely randomized trees classifier) from sklearn and compared it with the custom model and two baseline dummy classifiers (majority and random).
To evaluate the models, we chose to use accuracy as the main metric because it is simple to interpret. In our case, for a given box, accuracy is simply the share of test cases that are predicted correctly. Next, summary measures (minimum, median, mean, and maximum) are calculated across all the boxes, and it is the summary measure (median accuracy) that is used to select an overall model for all the boxes. In addition to accuracy, other metrics (precision and F1-score) are also calculated to corroborate the results from accuracy.
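The model-selection rule can be sketched in a few lines. The example treats accuracy as the share of held-out cases predicted correctly, which is an assumption about the normalization, and uses invented box identifiers and predictions rather than study data.

```python
from statistics import mean, median


def accuracy(y_true, y_pred):
    """Share of held-out observations whose state is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(per_box_accuracy):
    """Summary measures across boxes; the median drives model selection."""
    vals = list(per_box_accuracy.values())
    return {
        "min": min(vals),
        "median": median(vals),
        "mean": mean(vals),
        "max": max(vals),
    }


# Hypothetical held-out (true, predicted) states for three boxes.
held_out = {
    "box_a": (["on", "on", "off", "on"], ["on", "on", "off", "off"]),
    "box_b": (["off", "off", "on", "on"], ["off", "off", "on", "on"]),
    "box_c": (["on", "off", "on", "on", "on"], ["on", "on", "on", "on", "on"]),
}
per_box = {box: accuracy(t, p) for box, (t, p) in held_out.items()}
summary = summarize(per_box)
```

Running the same summary for each candidate model and comparing the `"median"` entries reproduces the selection logic described above. Using the median rather than the mean keeps a single poorly performing box from dominating the choice.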
Based on the evaluation results, the extremely randomized trees model performed best and was picked as the imputation model. The model is a variant of randomized-tree ensembles, which are perturb-and-combine techniques specifically designed for decision trees. This means a diverse set of classifiers is created by introducing randomness in the classifier construction. The prediction of the ensemble is given as the averaged prediction of the individual classifiers (trees).
Validation of the imputation approach was conducted by randomly dropping observations from the data and replacing them with imputed values. The appropriate benchmark for this performance is the "majority" allocation. This assigns either "on" or "off" depending on which state was the most common for that box. In the validation exercises, the accuracy of the majority allocation was about 70 percent, while imputed values using the extremely randomized trees model had an average accuracy of 97 percent. The worst performing box still had 80 percent accuracy, while the boxes for which the estimations were most reliable were nearly 100 percent accurate using the imputation model.
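The drop-and-impute validation can be illustrated for the majority benchmark alone. The sketch below is not the validation code used for the study: the share of observations dropped, the fixed seed, and the choice to compute the majority state from the retained observations only are assumptions made for the example.

```python
import random
from collections import Counter


def majority_baseline_accuracy(states, drop_share=0.2, seed=0):
    """Drop a random share of one box's observations, impute them with the
    most common retained state, and return the share imputed correctly."""
    rng = random.Random(seed)
    n_drop = max(1, int(len(states) * drop_share))
    dropped = set(rng.sample(range(len(states)), n_drop))

    # Majority state among the observations that were not dropped.
    kept = [s for i, s in enumerate(states) if i not in dropped]
    majority = Counter(kept).most_common(1)[0][0]

    hits = sum(states[i] == majority for i in dropped)
    return hits / n_drop


# Hypothetical box that is "on" for 70 percent of observed minutes.
toy_series = ["on"] * 70 + ["off"] * 30
random.Random(1).shuffle(toy_series)
toy_baseline = majority_baseline_accuracy(toy_series)
```

For a box whose dominant state covers roughly 70 percent of observations, the majority benchmark will tend to score near that level, which is why a candidate imputation model needs to be judged against it rather than against zero. Substituting any other imputation rule for the `majority` line allows the same held-out comparison.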

Figure 2: Survey Reported Hours of Electricity Outages vs. SSB Reported Outages

Figure 3: Map of Monitoring Locations

Figure 4: Average Hours of Electricity Outage using SSB System, by Day

Figure 5: Difference between reported outage frequency (survey) and true outage frequency (SSB)

Table 7: Regression-based Estimates Accounting for Temporal and PSU Fixed Effects
no control of frequencies of messages sent in the reference day & time