Optimising Clinical Epidemiology in Disease Outbreaks: Analysis of ISARIC-WHO COVID-19 Case Report Form Utilisation

Standardised forms for capturing clinical data promote consistency in data collection and analysis across research sites, enabling faster, higher-quality evidence generation. ISARIC and the World Health Organization have developed case report forms (CRFs) for the clinical characterisation of several infectious disease outbreaks. To improve the design and quality of future forms, we analysed the inclusion and completion rates of the 243 fields on the ISARIC-WHO COVID-19 CRF. Data from 42 diverse collaborations, covering 1886 hospitals and 950,064 patients, were analysed. A mean of 129.6 fields (53%) were included in the adapted CRFs implemented across the sites. Consistent patterns of field inclusion and completion aligned with globally recognised research priorities in outbreaks of novel infectious diseases. Outcome status was the most highly included (95.2%) and completed (89.8%) field, followed by admission demographics (79.1% and 91.6%), comorbidities (77.9% and 79.0%), signs and symptoms (68.9% and 78.4%), and vitals (70.3% and 69.1%). Mean field completion was higher in severe patients (70.2%) than in all patients (61.6%). The results reveal how clinical characterisation CRFs can be streamlined to reduce data collection time, including the modularisation of CRFs, to offer a choice of data volume collection and the separation of critical care interventions. This data-driven approach to designing CRFs enhances the efficiency of data collection to inform patient care and public health response.


Introduction
Clinical data are the foundation of an evidence-based response to outbreaks-at both the clinical and policy levels.The case definition, time to symptom progression, rates and covariates of clinical outcomes, and risk factors for severe disease or fatality must be rapidly defined to manage novel infections or unexpected changes in the epidemiology of known infections [1].However, the collection of clinical data relies on the availability and time of front-line health care workers and research staff.This is especially true in low-resource settings where electronic health records are less common and disease outbreaks are more common [2,3].As health emergencies often cause an increase in clinical caseload and other disruptions to health systems [4,5], it is important to minimise the burden of data collection to avoid detracting from patient care responsibilities.Evidence and innovations are required to design more efficient data tools that continue to address key knowledge gaps while optimising the use of limited staff resources.
For more than a decade, ISARIC (the International Severe Acute Respiratory and emerging Infections Consortium) and the World Health Organization (WHO) have developed case report forms (CRFs) that collect data to characterise the key clinical features of emerging infectious diseases [6].These forms have been used in response to outbreaks of MERS-CoV, Ebola virus disease, Zika, severe acute respiratory illness, acute non-A-E hepatitis, and Mpox [7][8][9][10][11][12].In January 2020, upon recognition of the need for urgent research on the novel SARS-CoV-2, ISARIC and WHO adapted their existing data forms and created a paper and electronic CRF to characterise COVID-19 [13].The form was used by thousands of sites around the world to collect data that informed local, regional, and international pandemic responses [14].While some sites used the form in its original design [15][16][17][18], others used it as a template to create a locally tailored version [19][20][21].Requests for support to amend the forms for specific contexts, including low-resource settings and critical care units, resulted in the creation of an ISARIC-WHO 'Rapid' COVID-19 CRF and add-on modules with bespoke fields (or questions) for special populations, e.g., pregnant women.All forms and modules were openly accessible, available in six languages.Data management support was provided by ISARIC where requested.ISARIC also hosted a COVID-19 data platform where investigators and public health agencies from 82 countries shared and collaboratively analysed data from more than 950,000 individual patient records collected using the forms.These accumulated data were used to plan health resources, design interventional research studies, support medicine licensing applications, improve clinical care, and inform public health policy by identifying risk factors for severe illness [22][23][24][25][26][27][28].The scale of this global effort represents a significant milestone in global pandemic collaboration, meriting in-depth analysis to learn from this experience and improve approaches for future outbreaks.
ISARIC and WHO continue to collaborate on the development of CRFs to characterise emerging infectious diseases.To date, the most commonly method used to design the ISARIC-WHO forms and similar CRFs relies on a review of the available evidence and an expert consensus process [20,[29][30][31][32][33].While this approach is valuable, the resulting forms are not commonly evaluated post hoc to determine and improve their performance.For the purpose of informing a more efficient design of forms for future outbreaks, we analysed the use and completion of the ISARIC-WHO COVID-19 CRF.The objective of the analysis was to assess the utilisation of each data field by identifying which fields were included or omitted across the locally adapted and/or implemented CRFs and to determine which fields were most reliably completed with patient data.Using COVID-19 patient data submitted to the ISARIC platform, the field utilisation in data from patients with severe disease was compared to the field utilisation in all patients.Understanding the patterns of field selection and entry provides a data-driven approach to refining the ISARIC-WHO form to optimise data collection strategies for other emerging health threats.The results of this analysis will inform how ISARIC, WHO, and outbreak research responders can structure future CRFs to streamline data capture for priority research questions.

Data Collection
The ISARIC-WHO COVID-19 CRF and ISARIC-WHO Clinical Characterisation Protocol were designed, adapted, and implemented for prospective data collection on hospitalised patients suspected or confirmed to have COVID-19.Recruitment strategies were defined at a site or national level, with suggested approaches to limiting recruitment bias offered by ISARIC (see Appendices A and B).The protocol, informed consent forms, and CRFs are available on the ISARIC website [34].Decisions regarding which forms to use and how to adapt them were made at an institutional or national level.
Pseudonymised individual patient-level data collected between 24 January 2020 and 20 March 2023 were shared with the ISARIC COVID-19 Data Platform for the purpose of collaborative analysis and evidence generation.The list of contributing sites and individuals is available on the ISARIC website (www.isaric.orgaccessed on 1 July 2024).Detailed methods of data collection and curation have been described previously [22].Data that were not mapped to the central database were not included in this analysis.Data collection sites implemented data quality checks according to locally available resources.ISARIC verified data submitted to the COVID-19 Data Platform for missing and expected values and queried discrepancies with the sites.Data have been analysed extensively in collaborative studies by the ISARIC partners, driving iterative review and the validation of all datasets by those who generated the data.
Severe patients were defined as those with any of the following: admission to an intensive or high-dependency care unit; and treatment with inotropes, vasopressors, high-flow nasal cannulas, invasive mechanical ventilation, or non-invasive mechanical ventilation.The total, median, and interquartile ranges were determined for the number of sites, total patients, severe patients, and number of collection months of each institutional dataset, and across all datasets.

Description of CRF Sections and Fields
The ISARIC-WHO COVID-19 CRF has 243 fields divided across those collected at hospital admission (103 fields), 'daily' fields collected routinely each day during hospitalisation (63 fields), 'summary' fields collected on key events throughout hospitalisation (71 fields), and 'outcome' fields collected at death or discharge (6 fields) (Figure 1).The CRF fields are divided into groups based on the types of information they contain.Admission field groups include demographics, vital signs, signs and symptoms, comorbidities, pre-admission medications, and laboratory tests.Daily field groups include vitals and assessments, interventions, and laboratory tests.Diagnostics, interventions, and complications comprise the summary field groups.Outcome fields consist of outcome status (e.g., death, discharge, hospital transfer, and date) and outcome health (e.g., delayed discharge and ongoing care).The CRF, colour-coded with the results of this study, is available in the Supplementary Materials.
Epidemiologia 2024, 5, FOR PEER REVIEW 3 Pseudonymised individual patient-level data collected between 24 January 2020 and 20 March 2023 were shared with the ISARIC COVID-19 Data Platform for the purpose of collaborative analysis and evidence generation.The list of contributing sites and individuals is available on the ISARIC website (www.isaric.orgaccessed on 1 July 2024).Detailed methods of data collection and curation have been described previously [22].Data that were not mapped to the central database were not included in this analysis.Data collection sites implemented data quality checks according to locally available resources.ISA-RIC verified data submitted to the COVID-19 Data Platform for missing and expected values and queried discrepancies with the sites.Data have been analysed extensively in collaborative studies by the ISARIC partners, driving iterative review and the validation of all datasets by those who generated the data.
Severe patients were defined as those with any of the following: admission to an intensive or high-dependency care unit; and treatment with inotropes, vasopressors, highflow nasal cannulas, invasive mechanical ventilation, or non-invasive mechanical ventilation.The total, median, and interquartile ranges were determined for the number of sites, total patients, severe patients, and number of collection months of each institutional dataset, and across all datasets.

Description of CRF Sections and Fields
The ISARIC-WHO COVID-19 CRF has 243 fields divided across those collected at hospital admission (103 fields), 'daily' fields collected routinely each day during hospitalisation (63 fields), 'summary' fields collected on key events throughout hospitalisation (71 fields), and 'outcome' fields collected at death or discharge (6 fields) (Figure 1).The CRF fields are divided into groups based on the types of information they contain.Admission field groups include demographics, vital signs, signs and symptoms, comorbidities, preadmission medications, and laboratory tests.Daily field groups include vitals and assessments, interventions, and laboratory tests.Diagnostics, interventions, and complications comprise the summary field groups.Outcome fields consist of outcome status (e.g., death, discharge, hospital transfer, and date) and outcome health (e.g., delayed discharge and ongoing care).The CRF, colour-coded with the results of this study, is available in the Supplementary Materials.

Evaluation of Field Inclusion and Completion
All fields on the ISARIC-WHO COVID-19 CRF (version 8DEC2021) were evaluated in this analysis, except for those added more than 12 months after the launch of the CRF (e.g., vaccination and virus variants), fields relevant to a particular subgroup of the population (e.g., infants), and fields that were not mapped to the central database (e.g.,

Evaluation of Field Inclusion and Completion
All fields on the ISARIC-WHO COVID-19 CRF (version 8DEC2021) were evaluated in this analysis, except for those added more than 12 months after the launch of the CRF (e.g., vaccination and virus variants), fields relevant to a particular subgroup of the population (e.g., infants), and fields that were not mapped to the central database (e.g., imaging).These fields were omitted from this analysis due to the absence of data in most patient records.No imputations were made for missing data.
The inclusion of each field from the original version into the versions used by individual institutions was at the discretion of each institution.Fields were considered included in an institutional dataset if the field featured within the CRF or if any data on that field were available in the dataset.The number and percentage of fields included in each dataset were calculated for each section of the CRF, as per Equation (1).The mean (see Equation ( 2)) and standard deviation of the number of fields included in each group of fields and across all datasets were determined.The results were visualised in a heatmap.
For each field included by each site, the number of unique patients with data recorded in the field was divided by the total number of patients (Equation ( 3)).This was averaged across all datasets that included the field to establish each mean field completion across all data (Equation ( 4)).Where fields were dependent on other information (e.g., if pregnant: gestational age in weeks was recorded), the denominator of the completion proportion calculation was the total number of patients for whom the dependent information was true.Fields on the 'daily' section of the CRF were considered completed if they contained data for at least one day during admission.The mean field group completion was calculated across the datasets as per Equation (5).
mean f ield group completion = (mean f ield 1 completion + mean f ield 2 completion + . . .mean f ield n completion) number o f f ields in the group (5)

Data Utilisation
Utilisation was quantified by calculating the Euclidean distance from an ideal point of (100, 100), representing the best case scenario of 100% completion proportion and 100% inclusion proportion.A greater distance from the ideal point indicates a reduced utilisation of a field, reflecting either less inclusion across sites, lower completeness, or both.Distances, hereafter called utilisation distances, for each field were calculated with all patient data using the mean proportion completed as the x coordinate and the mean proportion included as the y coordinate for each field (Equation ( 6)).Utilisation distances were normalised to a maximum of 100 by division with the highest distance, as per Equation ( 7), enabling comparison across fields.
Calculations were repeated using data from severe patients only.The utilisation difference between all patients and the severe cohort was determined.Results were represented in a scatter plot.

Results
A total of 1886 hospitals belonging to 42 institutions, networks, or public health agencies contributed data to ISARIC's COVID-19 data platform.Clinical characterisation data on 950,064 suspected or confirmed COVID-19 patients from 82 countries were included (Figure 2).Datasets varied in size and scope.This range spanned 19 single-site collection efforts, a multinational network of 258 sites across 44 countries, and a national clinical surveillance program across 627 sites.Data collection began before the peak case numbers in any participating country, with a median start date of 17 February 2020, and lasted for more than a year in most sites (median 16.8 months, IQR 7.6-28.1 months of data collection) (Table 1 and Table S1 in the Supplementary Materials).Across participating sites, 2155 staff were recorded as contributors to this data collection in the ISARIC database.
as datasets were collected in disparate settings, such as a district health centre in Maferenya, Guinea, and the intensive care unit of an academic medical centre in Karachi, Pakistan.

Field Group Inclusion
Out of 243 fields, a mean of 129.6 fields (SD 64.4) were included in the CRFs implemented across the institutions (Table 2).Fields collected at outcome and hospital admission were most often included.Outcome status fields had the highest mean field group inclusion rates (1.9/2, 95.2%), followed by the admission group fields of demographics (10.3/13, 79.1%), comorbidities (16.4/21, 77.9%), vital signs (7.7/11, 70.3%), and signs and symptoms (20.0/29, 68.9%).Other fields were less often included on the institutional CRFs.On average, laboratory test fields were more often included on admission forms than daily forms (51.1% vs. 44.3%)and intervention fields were more often included on daily forms  The proportion of severe patients included in each dataset ranged from 0% to 100% as datasets were collected in disparate settings, such as a district health centre in Maferenya, Guinea, and the intensive care unit of an academic medical centre in Karachi, Pakistan.

Field Group Inclusion
Out of 243 fields, a mean of 129.6 fields (SD 64.4) were included in the CRFs implemented across the institutions (Table 2).Fields collected at outcome and hospital admission were most often included.Outcome status fields had the highest mean field group inclusion rates (1.9/2, 95.2%), followed by the admission group fields of demographics (10.3/13, 79.1%), comorbidities (16.4/21, 77.9%), vital signs (7.7/11, 70.3%), and signs and symptoms (20.0/29, 68.9%).Other fields were less often included on the institutional CRFs.On average, laboratory test fields were more often included on admission forms than daily forms (51.1% vs. 44.3%)and intervention fields were more often included on daily forms than summary forms (39.0% vs. 35.5%).A total of 13/42 datasets (31.0%) that collected summary intervention fields collected only 1 or 0 daily intervention fields.Outcome health fields had the lowest mean inclusion of 0.9/4 (22.6%) (Figure 3).The mean proportion of datasets that included the field and completed the field is expressed in %.The mean utilisation distance expresses the mean distance from 100% inclusion and 100% completion of all fields in a given group.Delta is the mean field group utilisation distance in all patients minus the mean field group utilisation distance in severe patients.
than summary forms (39.0% vs. 35.5%).A total of 13/42 datasets (31.0%) that collected summary intervention fields collected only 1 or 0 daily intervention fields.Outcome health fields had the lowest mean inclusion of 0.9/4 (22.6%) (Figure 3).2) The mean proportion of datasets that included the field and completed the field is expressed in %.The mean utilisation distance expresses the mean distance from 100% inclusion and 100% completion of all fields in a given group.Delta is the mean field group utilisation distance in all patients minus the mean field group utilisation distance in severe patients.Some field groups were omitted completely from several datasets.Pre-admission medications fields did not appear in 24/42 (57.1%) of databases.A total of 11/42 (26.2%) datasets had no daily data collection fields.Data for field group inclusion of individual datasets are available in Table S2.

Field Group Completion
The ranking of completion proportion by data field group was similar to that of the inclusion proportion.Demographics and outcome status had the highest mean completion values of 91.6% and 89.8%, respectively.The remaining admission field groups were the next highest, ranging from 79.0% completion of comorbidities fields to 69.1% of vital signs fields.Outcome health fields had the lowest mean completion (19.9%), followed by laboratory test fields at admission (37.0%) and daily (39.3%) (Table 2).
There was no appreciable difference in field inclusion between all patients and severe patients (mean decrease of 0.9% inclusion), nor in the completion of the most highly included field groups of demographic and outcome status data.However, the mean completion was higher in severe patients for all other field groups.Daily data were more often entered for severe patients, resulting in the largest increase in completion; intervention, laboratory tests, and vitals and assessment mean field group completion increased by 23.4%, 13.3%, and 11.8%, respectively.Complication fields had a mean increase of 12.6% more completion in severe patients (Table 2).

Field Group Utilisation
The utilisation distance, assigned to express proximity to a point of 100% inclusion and completion, followed the field group trends of its components, with lower scores demonstrating a higher utilisation.The mean utilisation distance across all 243 data fields was 43.9 (SD 21.0) in all patients and 41.0 (SD 19.9) in severe patients.Field groups with the lowest mean utilisation distance in all patients were outcome status (8.0, SD 3.3), demographics (16.3, SD 0.14), comorbidities (22.9, SD 13.8), signs and symptoms (27.2, SD 14.4), and vital signs (30.6, SD 10.8).Summary comorbidities had a mean utilisation distance of 41.8 (SD 12.5).Field groups collected daily were all higher than the mean utilisation distance, as were admission laboratory tests, summary diagnostics and interventions, and outcome health fields in all patients (Table 2).This ranking of field group utilisation distance was consistent in severe patients.

Individual Field Utilisation
The utilisation distances of individual fields highlight the variability of results within a group and identify specific fields that impact group utilisation.Subject IDs were assigned to all patients and were therefore included and fully completed in all CRFs (utilisation distance 0).Country was defined in the metadata of each dataset and was therefore included and complete for 100% of patients (utilisation distance 0).The next lowest utilisation distance in both cohorts included variables for sex (1.7), age (3.2), admission date (3.7), and outcome (5.7 in all patients and 2.0 in severe patients).Pathogen test results (8.7 in all and 9.3 in severe for pathogen type, and 9.7 in all and 10.6 in severe for test result) were additionally in the lowest utilisation distances.Other fields in the low (green) range of utilisation distances included comorbidities, signs and symptoms at admission, vital signs at admission, and additional outcome information.The central (yellow) range of utilisation distances were dominated by complications, interventions, vitals and assessments, and laboratory tests.The highest (red) of utilisation distances included fields for less common laboratory or diagnostic tests, outcome health, and details (type or duration) of interventions.Fields that capture features of severe illness feature more in the higher range of utilisation distances within each group (Figure 4). of interventions.Fields that capture features of severe illness feature more in the higher range of utilisation distances within each group (Figure 4).The utilisation distance was lower in the severe patient cohort than in all patients, with a mean decrease in utilisation distance of 3.0 (SD 1.2) across all fields.Decreases were driven by higher completion proportions in the severe population.Daily interventions, daily laboratory tests, summary complications, and outcome status field groups had the largest decreases in utilisation distance (7.8, 6.3, 4.3, and 4.3, respectively).Individual fields associated with critical care were more likely to be included and/or completed in the severe patient cohort than in all patients.The largest decreases in utilisation distance were in fields for treatments targeting severe disease: extracorporeal membrane oxygenation type (30.0), dopamine treatment (19.4), non-invasive ventilation type (20.3) and duration (10.6), and high-flow nasal cannula duration (9.5).Examples of individual utilisation distances are presented in Table 3, with the utilisation colouring matching that in Figure 4. Utilisation scores for all fields are available in Table S3, in the Supplementary Materials.

Discussion
The implementation of the ISARIC-WHO COVID-19 CRF, or its derivatives, has been evaluated in 42 distinct data collection initiatives across 1886 hospitals.Despite their differences in geography, context, data collection periods, and volume of data collection, consistent patterns of data field inclusion and completion are observed across these institutions.

Field Inclusion
Of the 243 fields included on the ISARIC-WHO COVID-19 form, the mean number of fields included in CRFs adapted and implemented by ISARIC partners is 129.6 (53.3%).The low mean field inclusion resulted in high proportions of missing data across the aggregate data, particularly in the outcome health, laboratory, and intervention field groups.This restricted the populations that could be included in cross-institutional analyses, limiting the power and generalisability of the results to inform clinical and public health decision making.
Data collection, and the adaptation and/or implementation of the CRF used, began early in the outbreak.The majority of sites had already initiated data collection before the WHO declared COVID-19 a pandemic on 11 March 2020 [37].Therefore, decisions about which variables to include in the institutional forms were made before the magnitude of COVID-19's impact was known.Motivations to reduce the number of fields included in the institutional forms may have included limitations of infrastructure, patient management, data flow, human resources, and staff motivation.
The low inclusion of intervention fields (39.0% and 35.5% on the daily and summary forms, respectively) may be due to a lack of equipment for higher-level care, such as different types of ventilation, dialysis, or extracorporeal membrane oxygenation [38].Several participating sites in low-resource settings did not have access to these interventions.The omission of laboratory fields to align with patient management guidelines is a practical decision when tests, such as that for interleukin-6 (38.1% and 35.7% included in the admission and daily forms, respectively), are not part of standard care [39].
Data flow patterns limited the inclusion of fields in institutional data collection when the source of data used to complete the CRF did not cover all CRF fields.For example, when electronic health record data extraction is used to populate the CRF, the information required for some data fields may not be available in the records [40].In other cases, it may not be possible to adapt pre-existing data systems such as patient registries to accommodate all data in the CRF.Some sites reported reducing fields that required clinical judgement as data collection was undertaken by data staff without clinical training.Other sites omitted fields that required data that were not available in medical records [41].These factors are likely contributors to the low inclusion of fields on outcome health (3/4 fields had only 2.4% inclusion).
Completing and entering data for the entire ISARIC-WHO COVID-19 CRF was reported to take approximately 30 min per patient when inpatient stays are limited, and a local ethics committee waived the requirement for written informed consent [42].Bandwidth limitations may extend this time for electronic data capture.During a developing public health emergency, human resource constraints due to caseloads and/or staff illness may lead to conservative research data collection approaches despite the need for evidence.Two nation-wide research programs that contributed to the ISARIC COVID-19 Data Platform illustrate this.In the United Kingdom, the ISARIC4C study included 75.7% of the CRF fields to collect data on >300,000 patients in 323 hospitals under the direction of national health authorities who mobilised dedicated research staff to collect data in all hospitals [17,43].In South Africa, the National Institute of Communicable Diseases conducted a nationwide hospital surveillance and research study across 627 hospitals.Without dedicated research staff available, the decision was made to reduce the ISARIC-WHO CRF down to a feasible 20.6% of the original fields, which were collected for >490,000 patients [19,44].These findings highlight the value of dedicated research staff who can be seconded for outbreak research response when needed.
Though differing operational limitations define institutional field inclusions and exclusions, the overall inclusion trend indicates alignment in research priorities across the institutions.Outcome status fields and key admission fields (demographics, comorbidities, vital signs, and signs and symptoms) had the highest inclusion rates across the institutions (68.9-95.2%).The focus on these time points is supported by scientific justification from several sources, including Harris et al.'s 2020 review of key early outbreak research questions for SARS-CoV and MERS-CoV [45], and Rojek et al.'s list of minimal observational data needed to design clinical trials for high-priority pathogens [1].These highlight clinical presentation, risk factors for death and severe illness, optimal diagnostics, case defining features, risk factors for infection, and outcomes as research priorities.All of these can be addressed by information collected at admission and/or outcome.The WHO's 2020 Strategic Preparedness and Response Plan for the Novel Coronavirus echoes this inventory of priorities with a research focus on disease severity and identifying at-risk groups [46].

Field Completion
Unlike the institutional decisions that drove inclusion decisions, the completion of data fields relied on individual clinical and research staff.The completion of fields is subject to operational barriers when institutional decisions cannot account for all circumstances.In addition, independent decisions can be made by the individuals responsible for data collection.Though pre-admission medication fields were included on 61.9% of forms, they were completed just 40.5% of the time.This indicates a likely absence of information on the data source, and/or limitations of staff time to complete all fields.
Fields related to outcome health had consistently low completion rates across institutions (19.9% mean completion for all patients).This information regarding the health needs of the patient post-discharge may not have been assessed or available at the point of outcome, or it may have been perceived as low value or outside of the responsibilities of those collecting these data.Though many studies were later executed to understand long COVID, the absence of these data in ISARIC's pioneering characterisation work prevented the opportunity for earlier detection of sequela.Laboratory tests had the next lowest completion rates (37.0%and 39.3% for admission and daily data in all patients, respectively), likely due to many parameters not being measured as a part of clinical care.This is supported by the increase in completion for severe patients (44.9% and 52.6% at admission and daily, respectively), whose health complications would warrant additional laboratory monitoring.

Adjusting for Severity
Increased data completion for severe patients accounted for a higher field utilisation in most field groups.Additional data may enable an understanding of the mechanisms of severe disease and inform effective treatment.However, without a systematic severitybased collection strategy, individual staff may apply ad hoc approaches based on perceived relevance of data in higher-level care and/or increased availability of information due to more frequent monitoring.Prospective planning of research questions and the data required to address them is needed to promote efficiency in evidence generation.
The additional data required to understand severe disease should be collected only in the relevant patients.The consistency of field inclusion between all patients and severe patients reflects institutional-level decisions to use the same CRF across all levels of severity.The practicality of this decision highlights the importance of CRFs that can adapt to different data requirements where relevant to the severity of illness.

Limitations
The data used in this analysis were collected from diverse institutions with varying levels of resources and capacities.This diversity, while providing a broad perspective, does not represent all settings in which research on emerging infectious diseases is important.The ISARIC collaboration relied on voluntary participation and data sharing, which may introduce selection bias for institutions with interest, resources, and an enabling regulatory framework for these activities.This bias may predispose the inclusion and completion results of participating institutions to be higher than a true average of all contexts.When planning data collection that requires robust representation, it is important to make provisions to include health care facilities that lack research resources.The interpretation of clinical evidence generated only from sites with research resources should also consider potential biases in patient outcomes.
Differences over time or between different settings were not explored in this analysis.Both may have been drivers of change in completion rates.Additionally, as the data are all from patients with COVID-19, differences in research priorities for different emerging syndromes may require the collection of data on different fields and/or events.These should be considered based on the context and state of knowledge and can consider how the results of this study may be applied.
While we identified fields and field groups with lower utilisation based on inclusion and completion rates, these metrics do not fully capture the clinical significance or research value of individual data fields.Some fields may be underreported yet hold critical importance for specific populations, research questions, clinical decisions, or health policy.

Towards Optimising Clinical Epidemiology in Disease Outbreaks
Ensuring that research resources are used efficiently is important to maximise the improvement in patient outcomes in all diseases.In the context of emerging infectious diseases with limited evidence on effective interventions, this efficiency is particularly important to inform a rapid and reliable response that reduces the impact of the outbreak.Additionally, by minimising the time required for data collection, health care workers can dedicate more attention to providing front-line patient care.
The results of this study inform several key efficiency-focused improvements to CRFs for the clinical characterisation of diseases in future outbreaks.Guided by the utilisation of each field, planned changes for future ISARIC-WHO CRFs include the modularisation of contents, operational streamlining in field selection, and iterative review based on accruing evidence.These approaches will enhance the clinical and public health values of the data collected and improve data capture efficiency.
CRFs should focus on locally prioritised research and public health questions, designed with incremental modules that can be completed or omitted as resources allow.Sites with limited data collection resources can complete essential fields capturing key case-defining features, the spectrum of clinical presentation, risk factors, laboratory diagnosis, and outcomes.Sites with more resources can include additional fields for a more comprehensive understanding of the disease spectrum.A third module of daily data collection can be available for capable sites, generating detailed evidence on the disease course, patient journey, and specific populations.Fields relevant only to research questions on severe disease or patients in critical condition should be in a standalone module, used only for patients who achieve a threshold of severity.
This tiered approach ensures all sites focus their data collection capacity on the knowledge gaps most relevant to their patients.Sites may adjust the number of modules based on caseload, resource availability, or other factors as they change during an outbreak.It is important to understand and account for potential bias in any approach that focuses on well-resourced centres.
Fields that cannot be completed at a hospital should be excluded and documented to support the understanding of missing values.This allows researchers to make informed decisions about appropriate methods for handling the missing information, address potential biases, and interpret the results.
To ensure the relevance and comprehensiveness of the CRF, dynamic updates should be made based on the continuous review of research priorities and data utilisation monitoring employing the methods described in this paper.By maintaining flexibility and responsiveness in data collection processes, CRFs can support efficient clinical characterisation and enhance the quality and usability of collected data during outbreaks.

Conclusions
A data-driven approach to designing CRFs for the clinical characterisation of emerging infectious diseases can enhance data collection efficiency to inform patient care and public health response.The selection of data collection fields should always be tailored to prioritised research objectives, the outbreak dynamics, and the context of local health systems.The availability of tiered, standardised, template CRFs that optimise the inclusion of relevant data fields can accelerate the generation of critical evidence.The adoption of standard data variables and modular options additionally promotes the alignment of global data collection efforts, offering further opportunity to reduce time to evidence through aggregate analysis that impacts policy.These principles can increase the efficiency of evidence generation in many types of exploratory data collection.ISARIC and WHO will continue to develop CRFs for emerging infectious diseases, based on locally and globally defined research priorities and the practical insights gained from their actual use.

Supplementary Materials:
The following supporting information can be downloaded at https:// www.mdpi.com/article/10.3390/epidemiologia5030039/s1:ISARIC-WHO COVID-19 CRF 8DEC21; Table S1.Site and patient distribution across 42 institutional datasets, Table S2.Field group inclusion of individual datasets, and Table S3.Utilisation distance of individual fields: analysis code.Informed Consent Statement: Requirements for written or oral consent varied across committees responsible for the 1886 participating sites.The majority of patients were recruited under a waiver of consent, approved by the responsible committee in the context of the pandemic-including those in the UK and South African studies listed above.

Data Availability Statement:
The data that underpin this analysis are highly detailed clinical data on individuals hospitalised with COVID-19.Due to the sensitive nature of these data and the associated privacy concerns, they are available via a governed data access mechanism following review of a data access committee.Data can be requested via the IDDO COVID-19 Data Sharing Platform (http://www.iddo.org/covid-19accessed on 1 July 2024).The Data Access Application, Terms of Access, and details of the Data Access Committee are available on the website.Briefly, the requirements for access are a request from a qualified researcher working with a legal entity who have a health and/or research remit and having a scientifically valid reason for data access that adheres to appropriate ethical principles.A small subset of sites who contributed data to this analysis have not agreed to pooled data sharing as above.In the case of requiring access to these data, please contact the corresponding author in the first instance who will look into facilitating access.The analysis code is available in the Supplementary Materials.

Figure 2 .
Figure 2. Map of data sources and volumes.

Figure 2 .
Figure 2. Map of data sources and volumes.

Figure 3 .
Figure 3. Heatmap of the ISARIC-WHO COVID-19 Case Report Form field group inclusion in institutional datasets.The inclusion proportion of each field group is visualised for 42 institutional datasets.Darker colours indicate a lower inclusion proportion.

Figure 4 .
Figure 4. Utilisation distance of data fields in the ISARIC-WHO COVID-19 Case Report From for all (N = 950,064) and severe (N = 256,529) patient data.Utilisation distance comprises the mean completion rate (x-axis) and inclusion rate (y-axis) among institutions.Colours are assigned to dots and

Figure 4 .
Figure 4. Utilisation distance of data fields in the ISARIC-WHO COVID-19 Case Report From for all (N = 950,064) and severe (N = 256,529) patient data.Utilisation distance comprises the mean completion rate (x-axis) and inclusion rate (y-axis) among institutions.Colours are assigned to dots and backgrounds as visual cues, marking utilisation distances lower than 30 as dark green, those from 30 to 70 as yellow, and those greater than 70 as red.Colours in the severe patient plots are varied to indicate a decrease across this threshold from all patients to severe patients: light green for fields with utilisation distances that decreased to below 30 in severe patients and orange for fields with utilisation distances that decreased to below 70 in severe patients.Field labels are listed from the lowest to the highest utilisation distance within each collection time: (A) admission (left), (D) daily (centre), and (S) summary and (O) outcome (right).Next to each field name, the left circle applies to 'All patients', and the right circle applies to 'Severe patients'.

Author Contributions:
Conceptualisation: L.M.; Data curation: L.M., S.D. and T.O.Y.; Formal analysis: S.D., T.O.Y. and L.M.; Funding acquisition: L.M.; Investigation: L.M., S.D., E.G.-G., T.O.Y., J.R., J.D. and ISARIC Clinical Characterisation Group; Methodology: L.M., S.D., E.G.-G.and T.O.Y.; Project administration: L.M.; Supervision: A.F., J.R. and J.D.; Visualisation: S.D.; Writing-original draft preparation: L.M.; Writing-review and editing: all authors; L.M. is responsible for the overall content.All authors have read and agreed to the published version of the manuscript.Funding: This research was funded by Wellcome [225288/Z/22/Z and 303666/Z/23/Z]; UK Foreign, Commonwealth & Development Office [301542-403]; and the Bill & Melinda Gates Foundation [INV-063472].Institutional Review Board Statement: The WHO Ethics Review Committee approved the original ISARIC-WHO Clinical Characterisation Protocol (RPC571 and RPC572, 25 April 2013).Approval for COVID-19 implementation was obtained from local or national ethics committees by each participating institution according to local regulations.Two projects represent the majority of the data.Approval for the ISARIC4C study in the UK was granted by the Oxford C Research Ethics Committee (Ref.13/SC/0149) and the Scotland A REC (Ref.20/SS/0028).The Human Research Ethics Committee (Medical) at the University of the Witwatersrand approved the study managed by the National Institute of Communicable Diseases, South Africa, as part of a national surveillance programme (M160667).

Acknowledgments:
This work was made possible with the support of the UK Foreign, Commonwealth and Development Office and Wellcome [215091/Z/18/Z, 222410/Z/21/Z, 225288/Z/22/Z and 220757/Z/20/Z]; the Bill & Melinda Gates Foundation [OPP1209135]; the philanthropic support of the donors to the University of Oxford's COVID-19 Research Response Fund (0009109); grants from the National Institute for Health Research (NIHR; award CO-CIN-01/DH_/Department of Health/United Kingdom), the Medical Research Council (MRC; grant MC_PC_19059), and by the NIHR Health Protection Research Unit (HPRU) in Emerging and Zoonotic Infections at University of Liverpool in partnership with Public Health England (PHE), (award 200907), NIHR HPRU in Respiratory Infections at Imperial College London with PHE (award 200927), Liverpool Experimental Cancer Medicine Centre (grant C18616/A25153), NIHR Biomedical Research Centre at Imperial College London (award ISBRC-1215-20013), and NIHR Clinical Research Network providing in-

Table 1 .
Site and patient distribution across 42 institutional datasets included in the ISARIC COVID-19 Data Platform.

Countries All Patients Severe Patients * Data Collection Start Date Months of Data Collection
* Severe patients are a subset of 'All patients'.

Table 1 .
Site and patient distribution across 42 institutional datasets included in the ISARIC COVID-19 Data Platform.
* Severe patients are a subset of 'All patients'.

Table 2 .
Utilisation distance of data field groups in all and severe patient cohorts.

Table 2 .
Utilisation distance of data field groups in all and severe patient cohorts.

Table 3 .
Examples of utilisation distance of individual fields in all and severe patient cohorts.Fields listed are a random sample.Delta is the utilisation distance in all patients minus the utilisation distance in severe patients.Utilisation distance is a function of the mean field completion (x) and field inclusion (y).Utilisation colour groups in severe patients: light green indicates fields with utilisation distances that decreased to below 30 in severe patients; orange indicates fields with utilisation distances that decreased to below 70 in severe patients; dark green indicates fields with utilisation distances lower than 30 in all patients and severe patients; yellow indicates those from 30 to 70; and red indicates those greater than 70.