Data Sharing in Southeast Asia During the First Wave of the COVID-19 Pandemic

Background: When a new pathogen emerges, consistent case reporting is critical for public health surveillance. Tracking cases geographically and over time is key for understanding the spread of an infectious disease and effectively designing interventions to contain and mitigate an epidemic. In this paper we describe the reporting systems on COVID-19 in Southeast Asia during the first wave in 2020, and highlight the impact of specific reporting methods. Methods: We reviewed key epidemiological variables from various sources including a regionally comprehensive dataset, national trackers, dashboards, and case bulletins for 11 countries during the first wave of the epidemic in Southeast Asia. We recorded timelines of shifts in epidemiological reporting systems and described the differences in how epidemiological data are reported across countries and timepoints. Results: Our findings suggest that countries in Southeast Asia generally reported precise and detailed epidemiological data during the first wave of the pandemic. Changes in reporting rarely occurred for demographic data, while reporting shifts for geographic and temporal data were frequent. Most countries provided COVID-19 individual-level data daily using HTML and PDF, necessitating scraping and extraction before data could be used in analyses. Conclusion: Our study highlights the importance of more nuanced analyses of COVID-19 epidemiological data within and across countries because of the frequent shifts in reporting. As governments continue to respond to impacts on health and the economy, data sharing also needs to be prioritised given its foundational role in policymaking, and in the implementation and evaluation of interventions.


INTRODUCTION
In December 2019, an outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was reported in Wuhan, China and was determined to cause the novel coronavirus disease 2019 . The World Health Organization (WHO) declared the outbreak to be a Public Health Emergency of International Concern on 30 January 2020, and subsequently a pandemic on 11 March 2020.
The impact of the pandemic required robust research to understand the novel virus and develop effective mitigation and containment strategies (1)(2)(3). In February, the WHO in collaboration with the Global Research Collaboration for Infectious Disease Preparedness and Response (GLOPID-R) developed the Global Research Roadmap in response to the pandemic and identified priority research areas (4). These included: (a) product development for improvement of clinical processes; (b) shedding, natural history of disease; (c) monitoring of phenotypic change and adaptation; (d) immunity; and (e) disease models (4). Since then, a vast number of research has been produced on the clinical aspects of the disease, nonpharmaceutical interventions (NPIs), and public health (3). There has also been interest in the role of the environment (5-10), use of machine learning techniques and digital technologies (11)(12)(13)(14)(15)(16), and government and policy responses (17)(18)(19)(20)(21)(22)(23).
To effectively respond to public health emergencies, there is a need for timely and accurate reporting of statistics and data sharing as highlighted in the recent Ebola and Zika epidemics (24)(25)(26)(27). To this end, the Principles for Data Sharing in Public Health Emergencies consisting of timeliness, ethics, equitability, accessibility, transparency, fairness, and quality have been developed and introduced (26,28,29). The Global Research Roadmap also identified data sharing as a cross-cutting research priority that spans all other key topics (4). As of writing, however, the current evidence into the quality and availability of data is severely limited, with studies focusing primarily around descriptions of data sources or comments on the importance of data and data sharing (24,(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42). One research group has examined the data availability for 507 COVID-19 patients reported in January, finding that the majority of information was provided by social media and news outlets (43). Other than this example, there is no other original work that investigates the issues surrounding data availability and data sharing practices during the pandemic. In Southeast Asia, only one study on data sharing during disease outbreaks has been carried out (44). The study evaluated data quality and timeliness of outbreak reporting in Cambodia, Lao PDR, Myanmar, and Vietnam for dengue, food poisoning and diarrhea, severe diarrhea, diphtheria, measles, H5N1 influenza, H1N1 influenza, rabies, and pertussis. Further, it highlighted the broad differences observed in the data quality and timeliness between participating countries, concluding that any international data-curating attempts must be versatile enough to accommodate these.
Ongoing research into the epidemiology of SARS-CoV-2 depends entirely on access to regularly updated and factor-rich data. The benefits and importance of data sharing practices have been well-documented during previous outbreaks. In the ongoing COVID-19 crisis, government organisations, public health agencies, and research groups are responding to the call for rapid data sharing by providing data and curating detailed real-time databases that are readily and publicly accessible (30)(31)(32). Data from various groups have informed more than 100,000 papers on COVID-19 (45). Despite progress in reporting and sharing data, the scale of the global pandemic presents its own unique challenges. First, there are ethical and privacy considerations that need to be balanced carefully against the potential impact of open data sharing. Second, there is a clear lack of capacity and often appropriate computational infrastructure that may make data sharing in real time unfeasible and burdensome (26,27). Such challenges may result in changes in the quality and detail of data reporting between and within countries over time as their respective health systems become increasingly overwhelmed (33). The majority of countries are now routinely reporting the number of confirmed cases and deaths attributed to COVID-19, with the country-wide cumulative totals readily accessible from databases such as the one curated by Johns Hopkins University (30). However, the breadth of further information reported by each country is less understood. Access to demographic and geographic information of cases in particular is critically important in the context of informing policy response, as these provide greater insights into how subgroups of the population in different areas are affected by the disease. Understanding how and when these data are provided is critical to ensuring that modelling efforts and government response are well-informed. Further, understanding global responses to the COVID-19 pandemic will be of increasing relevance as countries begin to develop updated post-pandemic disease response frameworks. Being able to compare and contrast how different countries responded and provided information in the early stages of the pandemic will be crucial in designing better response and reporting pipelines for future global health crises.
Our work thus aimed to explore the scale of data reporting across the broader pandemic timeline by describing the ways in which various countries in a geographic region report COVID-19 data and how the detail of data reporting changed over time. We reviewed detailed epidemiological data from Southeast Asian countries and tracked how countries' reporting of COVID-19 data has shifted. We further evaluated differences in reporting between countries and described the accessibility of epidemiological data during the first wave in 2020. By providing these types of information, researchers may be able to conduct better and more nuanced analyses of epidemiological data of COVID-19. Further, our research provides wider insight into the data pipeline from government to researchers, and how it has adapted over time. This timeline provides greater context to the specific findings of subsequent datadriven research, highlighting areas and time periods where particular data feeds are likely to be particularly biased or data-sparse. We are also able to recommend, based upon our findings, prioritising the use of the early-case histories of specific countries for the calculation of demographic-specific disease parameters. By highlighting particular regions where specific data are available, such as travel history, hospitalisation times and symptom-tracking, we are also able to identify ideal further topics of research in the ongoing attempts to fight the spread of COVID-19.

Study Design
We conducted an observational study to describe and track changes in reporting of epidemiological data during the COVID-19 pandemic in 11 countries in Southeast Asia, namely Brunei, Cambodia, Indonesia, Lao PDR, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste, and Vietnam. Such a design allows us to compare the data reporting practices between different countries through time as the pandemic progresses (46).

Data Sources and Compilation
We focused on reporting mechanisms of individual level COVID-19 data from the aforementioned 11 countries in Southeast Asia. The region is characterised by archipelagos and comprises more than 8.0% of the world's population. During the first wave of the pandemic, these 11 countries contributed about 1.3% of the cases to the global count of more than 2.3 million cases on April 20.
We initially reviewed the data of the Open COVID-19 Data Curation Group's centralised repository containing individuallevel information on patients with laboratory-confirmed COVID-19 (47). These included data on the following variables deemed essential in monitoring pandemics: (a) Key dates, which include the date of travel, date of onset of symptoms, date of confirmation of infection, date of admission to hospital, and date of outcome; (b) Demographic information including the age and sex of cases; (c) Geographic information on domicile and travel history at the highest resolution available down to the district level; (d) Any additional information such as symptoms and 'contact tracing data' (i.e., a record of exposure to infected individuals) (47). The collection of data on these variables mirrors the minimum data to be collected for a line list of pandemic influenza cases obtained from surveillance systems, as suggested by the WHO (48). Other sources, such as the interactive dashboard by Johns Hopkins University (30), do not provide detailed individual-level information and hence were not used in this study. At the time of the conduct of this study, the said centralised repository was manually maintained by a number of individuals, and therefore would have potentially missed some information about the COVID-19 positive individuals, particularly occupation that was not recorded in the repository. To validate and augment the data from the centralised repository, we reviewed other relevant and official data sources of each country in different formats including: government trackers and dashboards that report close to real-time data, downloadable PDF reports, downloadable CSV files, and official social media accounts of governmental or public health institutions (Supplementary Table 1). In addition, we reviewed data from news agencies, pre-prints, and peer-reviewed research articles that contained information on COVID-19 cases in the country. We reviewed all possible publicly available data sources from the date when the first confirmed case was reported in the country, and up to April 20. We only collected data at one timepoint, on April 20, and therefore could only use information available then. No updates on the reporting of key epidemiological variables were made for this study.

Data Interpretation and Analysis
We documented trends and changes in how key epidemiological variables were reported by 11 Southeast Asian countries throughout the study period from January 23 to April 20. The reporting methodologies of each country could broadly be separated into three distinct time periods, defined by specific milestones in each country's data reporting. The first time period or "first reporting of cases" (T0) for all countries was the date at which the country reported its first COVID-19 case. Following this, the "first change in reporting" (T1) was the time when the information format was changed from the first report based on available data during the study period. This was primarily characterised by countries establishing a formal channel by which to declare subsequent confirmed cases of COVID-19, as opposed to (T0), where cases were primarily reported via news reports and/or government briefings. Any further changes in the level of detail, also referred in this paper as granularity for geographic data and precision for both demographic and temporal data, in the reporting of any of the epidemiological variables were considered as a "change in reporting" and were noted as a subsequent time period (Supplementary Table 2). This was characterised by countries further updating and altering their previously established formal case declaration channel as their respective data pipelines changed. The "last observed change in reporting" (T2) was the last documented change up to April 20. We also noted the number of cases in each timepoint. In this paper, we only present results on the "first reporting of cases" (T0), "first observed change in reporting" (T1), and "last observed change in reporting" (T2).
We then explored the differences in reporting of demographic, geographic, and temporal data across countries at three key timepoints: at the time they first reported cases (T0), at the time when the reporting first changed (T1), and at the last observed change in reporting (T2). Any change in the level of granularity or precision in reporting is noted. We present these differences for each epidemiological variable classified into: (a) demographic data; (b) geographic data; and (c) temporal data. Data for other epidemiological variables are presented in Supplementary Figures 1-5. We present in Supplementary Figure 6 a summary of what information each country had for each timepoint (T0, T1, T2).

Shifts in Reporting of Epidemiological Data During the First Wave
The first Southeast Asian country to report a COVID-19 case was Thailand on January 23. Singapore, Malaysia, Cambodia, Vietnam and the Philippines subsequently reported cases on or before the WHO declared COVID-19 a Public Health Emergency of International Concern (PHEIC) on January 30. Indonesia,  Table 2). Each key reporting shift is denoted by a colored circle.
Brunei, Timor-Leste, Myanmar and Lao PDR reported their first cases of COVID-19 in March (Figure 1).
Malaysia had the shortest time between reporting of the first case and first change in reporting of epidemiological data. Only a day after their first reported case, more detailed reports on the occurrence of symptoms, and dates of symptom onset and hospitalisation were provided. Similar improvements in terms of the level of granularity and precision in reporting data were also noted for the following countries: Philippines eventually reported comorbidities for some patients, Singapore and Vietnam eventually reported data on occupation, and Timor-Leste eventually reported travel history data. As case numbers increased, several countries provided less detailed information. By March 15, when 96 cases had been identified, Indonesia ceased reporting individual-level data and switched to aggregate data (i.e., number of cases per day). Timor-Leste followed by April 15, when it had 8 recorded cases. The first and the last changes in reporting were the same for Indonesia and Brunei, while Myanmar was the only country that consistently reported individual-level COVID-19 epidemiologic data since reporting its first two cases on March 23 until April 20.

Differences in the Granularity and Precision of Reporting Across Countries
There were minimal changes in the reporting of demographic data among countries. The majority of countries reported age and sex except for Timor-Leste, and only Indonesia shifted from a more precise reporting of age and sex to less detailed reporting (Figures 2A,B). We observed more changes in the reporting of occupation ( Figure 2C); Indonesia only provided occupation data at the time of reporting of first cases, while Singapore and Vietnam included data on occupation of COVID-19 patients at later timepoints.
Location information on domicile and travel history differed across countries and timepoints. While all 11 countries provided domicile information (Figure 3A), only Singapore provided precise-level addresses. Both Indonesia and Malaysia initially provided city-level information and shifted to less granular reporting. For Indonesia, province-level data was being reported by March 15 when it reached 96 cases. Meanwhile, Malaysia started reporting province-level data on March 21 when it reached 1,183 confirmed cases. On the other hand, the information coming from some countries initially presented less granularity or lower geographic resolution: Lao PDR initially reported country-level information, Thailand initially reported province-level addresses and Vietnam initially reported city-level addresses; eventually all three countries reported precise address data. There were less differences observed for travel location data reporting across countries, but also more shifts observed over time ( Figure 3B). Most (8 of 11) provided city-level information of the travel history; only Myanmar provided country-level information, while both Indonesia and Timor-Leste provided no information at the time of reporting their first cases. Only Timor-Leste shifted to a more granular level of reporting over time, while Brunei, Cambodia, Malaysia, Philippines, Singapore and Thailand reported less granular data. Lao PDR shifted reporting travel histories from city-level information when it reported its first two cases to no information being shared when it had six confirmed cases, and then to country-level travel history data when it had reported 11 cases.
For all temporal variables, countries reported either precise dates or no dates at all. At the start of each country's first case, FIGURE 2 | Differences in the level of precision in reporting demographic data: (A) age, (B) sex, and (C) occupation over three timepoints. Only those countries with changes in the level of detail and precision of reporting are highlighted. Each country may shift reporting at any timepoint: at the first reporting of cases' (T0), "first observed change in reporting" (T1), and "last observed change in reporting" (T2). Each country may report less precise data indicated by a decreasing slope (red) or more precise data indicated by an increasing slope (blue) consistently over time. Reporting may not be consistent across timepoints with shifts between different levels of precision (yellow) or reporting may not have changed at all during the study period (gray). The levels of precision are indicated for each epidemiological variable. Age has three levels while both sex and occupation are binary variables.
the majority of countries provided travel history dates except for Brunei, Indonesia, and Timor-Leste (Figure 4A). Only Brunei shifted to reporting dates for the succeeding timepoints while Malaysia, Philippines, and Singapore stopped reporting dates as cases increased. Lao PDR repeatedly shifted between reporting travel dates and excluding this information. The precision of reporting symptom onset dates also varied across countries and timepoints ( Figure 4B). Cambodia, Indonesia and Timor-Leste never reported such information, while Brunei, Myanmar, and Vietnam consistently reported specific dates when symptoms FIGURE 3 | Differences in the level of granularity in reporting geographic data: (A) domicile, and (B) travel history location over three timepoints. Only those countries with changes in the level of granularity or geographic resolution of reporting are highlighted. Each country may shift reporting at any timepoint: at the first reporting of cases' (T0), "first observed change in reporting" (T1), and "last observed change in reporting" (T2). Each country may report less granular data indicated by a decreasing slope (red) or more granular data indicated by an increasing slope (blue) consistently over time. Reporting may not be consistent across timepoints with shifts between different levels of granularity (yellow) or reporting may not have changed at all during the study period (gray). The levels of granularity are indicated for each epidemiological variable. All geographic data have five levels of granularity/geographic resolution: none, country, province, city, and precise.
presented. Malaysia provided day information in the succeeding timepoints while Philippines, Singapore, and Thailand eventually stopped reporting the date of symptom onset. Lao PDR repeatedly shifted between reporting of dates to no reporting. Date of confirmation showed consistent reporting in all countries except Thailand, which stopped its reporting when it had 42 cases ( Figure 4C). Several countries initially reported the date of admission except for Brunei, Cambodia, Indonesia, Malaysia, and Timor-Leste ( Figure 4D). Only Thailand had a shift in reporting dates of discharge, recovery, or death -reporting this information only in late February when it had 42 cases (Figure 4E).

DISCUSSION
Responding to calls for data sharing and transparency, most governments in Southeast Asia established publicly available sources of COVID-19 individual-level information. This commitment to data sharing and reporting allowed the comparison of the different data reporting practices of the countries in the region. We found that countries in Southeast Asia have different reporting practices since the start of the pandemic and during the first months of its progression. Overall, reporting of epidemiological data in Southeast Asia is precise and detailed. Many variables were consistently maintained throughout the initial outbreak period, but those with changes in reporting started early with case counts as low as four to as high as 136. There was little to no change in reporting of demographic data while changes in reporting of geographic and temporal variables were frequent and unpredictable as the pandemic progressed. Further, we find that changes in the level of precision in reporting does not only depend on case numbers, but also on the policies and interventions implemented. Comparisons across countries for different epidemiologic variables showed that national governments may shift to a less or more precise reporting of data as dictated by the burden of COVID-19 in the communities and/or their national response. As an example, Indonesia started reporting aggregate data less than two weeks after their first case was reported. Their government did not implement a nationwide lockdown, but rather focused on scaling up capacity, treating patients and supporting economic recovery. Conversely, Lao PDR, Thailand and Vietnam reported more precise demographic and geographic data at the end of the study period compared to how they reported their first cases. The national governments of these countries established mechanisms to quickly identify and isolate cases and their contacts requiring detailed contact tracing data. Our findings also show that most countries reported more precise information towards the end of the study period, but some variables such as travel history location were reported with less detail compared to the increased granularity for domicile data. These trends in travel history data highlight the shift in priorities of the governments in the (Continued) FIGURE 4 | Each country may shift reporting at any timepoint: at the first reporting of cases' (T0), "first observed change in reporting" (T1), and "last observed change in reporting" (T2). Each country may report less precise data indicated by a decreasing slope (red) or more precise data indicated by an increasing slope (blue) consistently over time. Reporting may not be consistent across timepoints with shifts between different levels of precision (yellow) or reporting may not have changed at all during the study period (gray). The levels of precision are indicated for each epidemiological variable. All date variables have three levels of precision: none, month, and day.
region towards managing local transmission. Southeast Asian countries implemented travel restrictions early, therefore having fewer imported cases and less need for precise travel history data (49). Data on dates of symptom onset, confirmation, admission, and outcome (discharge, recovery, or death) are important in estimating disease burden and forecasting health service needs. Dates of confirmation and outcome (discharge, recovery, or death) were reported consistently by most countries. This reflects the effective system of governments to register all confirmed patients in their database upon entry and exit in the healthcare system. However, we found that dates of symptom onset and hospital admission were no longer reported at the end of the observation period for some countries. The reporting of less precise dates could be attributed to the increasing incidence of COVID-19, which could have overwhelmed data reporting mechanisms of the countries, particularly because individual patient follow-up requires symptom onset dates to be accurately logged. Governments thus need to establish systems that allow accurate and fast reporting of detailed temporal data. Lack of precision could adversely affect the quality of mathematical models and other analyses, which are used to forecast demand for health services and make decisions. This consequently impacts the responses to COVID-19 at a national and subnational level, which is of greater concern among lowand middle-countries (LMICs) that already have fragile health systems. Our findings provide insights on how different health systems respond to the pandemic. Consequently, these could be used to guide how publicly available data are analysed, used, and interpreted.
Most countries reported COVID-19 data daily, with unclear reporting frequencies only being observed for Brunei, Lao PDR, and Timor-Leste. These countries do not report new cases every day because of the low number of new daily cases leading to days where no additional cases are confirmed. As they only provide updates on days when new COVID-19 cases are confirmed, their frequency of providing data updates on COVID-19 is thus irregular. Countries primarily reported individual-level data in either HTML and PDF formats, which necessitates scraping and extraction before such data could be used in analyses. During the study period, only Thailand provided a downloadable CSV format of their data. Ready-to-use data formats are important as these allow the public and scientific community to rapidly view and analyse country-specific information.
Shifts in reporting, especially from a detailed level of reporting to aggregated data, provide a challenge for accurately comparing epidemiological situations between countries, more so for understanding disease dynamics and guiding government actions. In China, it has been shown that changes in reporting have impacted modelling results of the transmission parameters of COVID-19 (45). Further, as the pandemic progresses and epidemiological information becomes increasingly less available, analyses of detailed case counts that cover the entire duration of the epidemic may not be feasible (32). In Nigeria, a forecasting algorithm has been proposed for use in policy responses given the limited data and constrained data infrastructures in the country (50). In Spain where data have been aggregated as early as May, there have been challenges in conducting age-specific time series, understanding disease transmission, and recommending interventions and policies (42). These three examples are evidence that detailed COVID-19 data are necessary, not only for research purposes, but to ultimately guide policies that avert cases and deaths in the country.
An important limitation of this study is the collection of data at only one timepoint in April. This may not accurately reflect the daily reporting situation of the 11 Southeast Asian countries when the pandemic started. Another limitation is the absence of any assessment on data quality. This evaluation was not carried out because of the fast progression of the pandemic with corresponding rapid changes in data reporting. The lack of an up-to-date and complete line list also prevents a thorough assessment of data quality. Lastly and most importantly, an evaluation of data quality also requires the consideration of other indicators such as flexibility, representativeness, data security and system stability to provide a more accurate picture of health systems and disease surveillance systems (44). These, information are not readily available and require more resources to be collected. Despite such caveats, however, this study is the first to systematically describe and compare reporting of important epidemiological data for COVID-19 across countries during the first wave. Our findings will allow researchers to conduct more nuanced analyses using epidemiological data of COVID-19.

CONCLUSION
Reporting systems in the region have been quickly established and countries provided detailed individual-level data during the first wave. This pandemic highlights the critical role of timely, accurate, and precise data sharing during outbreaks of global scale. Some concerns regarding data sharing remain, such as data privacy and public criticisms (26,27). Given that sharing of data is needed for evidence-informed policies and interventions, maintaining and strengthening data reporting systems should still be a priority of countries (51)(52)(53). For the purposes of surveillance on emerging infectious diseases, we recommend that governments coordinate data collection and reporting so that data are as comparable as possible between countries. Countries may also benefit from reporting data in a fully open access format that is readily available and in machine-readable formats to accommodate new epidemics and context-specific information. Hopefully, more governments will come to share precise data to allow more nuanced analyses. This will provide an opportunity to better understand the disease and how best to respond to the pandemic.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analysed in this study. Data can be found here: https://github.com/beoutbreakprepared/ nCoV2019.

AUTHOR CONTRIBUTIONS
AA and VP wrote the initial draft of the manuscript with inputs from BG and TR. All authors contributed to the study concept and design, data compilation, analysis, interpretation, and critically revised the report and approved the final version for submission.

FUNDING
The work was supported through an Engineering and Physical Sciences Research Council (EPSRC) (https://epsrc.ukri.org/) Systems Biology studentship award (EP/G03706X/1) to TR. This project was also supported in part by the Oxford Martin School. The funders had no role in study design, data collection and analysis, and decision to publish or preparation of the manuscript.