Data gaps in electronic health record (EHR) systems: An audit of problem list completeness during the COVID-19 pandemic

Objective: To evaluate the completeness of diagnosis recording in problem lists in a hospital electronic health record (EHR) system during the COVID-19 pandemic.
Design: Retrospective chart review with manual review of free text electronic case notes.
Setting: Major teaching hospital trust in London, one year after the launch of a comprehensive EHR system (Epic), during the first peak of the COVID-19 pandemic in the UK.
Participants: 516 patients with suspected or confirmed COVID-19.
Main outcome measures: Percentage of diagnoses already included in the structured problem list.
Results: Prior to review, these patients had a combined total of 2841 diagnoses recorded in their EHR problem lists. 1722 additional diagnoses were identified, increasing the mean number of recorded problems per patient from 5.51 to 8.84. The overall percentage of diagnoses originally included in the problem list was 62.3% (2841 / 4563, 95% confidence interval 60.8%, 63.7%).
Conclusions: Diagnoses and other clinical information stored in a structured way in electronic health records are extremely useful for supporting clinical decisions, improving patient care and enabling better research. However, recording of medical diagnoses on the structured problem list for inpatients is incomplete, with almost 40% of important diagnoses mentioned only in the free text notes.


Introduction
The problem list is a feature of electronic health records (EHR) which provides a persistent summary of diagnoses and other health issues, in order to facilitate handovers and continuity of care [1,2]. Dr Lawrence Weed originally envisioned the problem list as an index, containing a "complete list of all the patient's problems, including both clearly established diagnoses and all other unexplained findings that are not yet clear manifestations of a specific diagnosis, such as abnormal physical findings or symptoms" [3].
Recording information about diagnoses in a structured way can potentially enable decision support such as medication alerts, treatment suggestions and differential diagnoses [4]. Problem list terms can be coded using a terminology system, such as SNOMED CT concepts [5], and used as a valuable resource to support health research and informatics [6,7].
Previous studies of problem list completeness have based estimates on selections of high prevalence conditions, using other data in the EHR as a gold standard (Table 1). Improving the completeness of problem lists, and ensuring their accuracy, is critical to patient safety, medical education and clinical communication in the era of health digitalisation [8]. Methods that have been shown to increase problem list completeness include problem-orientated charting [9], problem list integration throughout the EHR [10], clinician alerts [11], self-reporting of conditions from patients [12][13][14][15] and automatic population of the problem list from other areas of the EHR [16] or via natural language processing (NLP) [17][18][19][20] (further information in Appendix).
This study sought to assess the completeness of recording of problem list entries during the COVID-19 pandemic, one year after the installation of a comprehensive EHR system (Epic, May 2019 edition) at UCLH (University College London Hospitals) Trust. Epic is a widely used EHR system internationally, with an estimated 29% market share in the US [21], and is currently live in 3 other NHS Trusts and being prepared for installation in at least 4 additional NHS Trusts. The EHR included a structured data field for COVID-19 which was used consistently, but other information (such as diagnoses that belong in the problem list) was commonly recorded only as free-text electronic notes. Comprehensive retrospective chart review of this specific cohort of patients, and recovery of problem list data, were required for EHR-derived COVID-19 datasets (see Appendix) and to support research at the trust to evaluate prognostic models for COVID-19 [22].
The aim of our audit was to assess whether information on key diagnoses was included in the problem list or stored only as unstructured free text notes.

Study type
This is a retrospective EHR system-based chart review during the COVID-19 pandemic.

Participants
All inpatients with confirmed or clinically suspected COVID-19 infection were identified by an EHR data warehouse search, with 516 patients included based on the following criteria:

Inclusion criteria
Patients with a 'suspected' or 'confirmed' COVID-19 infection flag in the EHR prior to 2nd June 2020 were included in this case note review. The COVID-19 flag was set by the infectious diseases team, according to an overall clinical assessment including virology testing and the clinical picture.

Exclusion criteria
Patients were excluded if their hospital admission during the same period was unrelated to a suspected or confirmed COVID-19 infection.

EHR system
The EHR System deployed at UCLH, Epic (May 2019 version), includes electronic documentation, order communications, clinical workflow with decision support and knowledge management to develop an evidence-based care pathway.

Data collection
Medical students recruited to the EHR department, under close supervision of the Trust Clinical Data Standards Lead/Advisor (LZ), undertook an assessment of pre-specified data fields in each patient's EHR in accordance with a clinically signed off Standard Operating Procedure (see Appendix). Primary data collection was undertaken in May 2020.
Problems were considered 'missing' if there was evidence in the text notes that the patient had a medical condition (either a new diagnosis or past medical history) that was not included on the problem list, and it was important enough that it would be considered good practice to include it. Good practice was defined to include all on-going chronic medical conditions and major new diagnoses, particularly those that require ongoing treatment or monitoring, as per recent guidance on the use of problem lists [1]. Recommended practice at UCLH is for all patients to have an up-to-date and complete problem list, and this is incorporated into EHR system training for clinicians [23]. The judgement on whether a medical condition should be included on the problem list was supervised by a consultant clinician with experience in problem list management (ADS) [1].
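The definition of a 'missing' problem above amounts to a per-patient set difference between the diagnoses a reviewer finds in the notes and those already on the problem list. The following Python sketch shows only that bookkeeping. The patient identifiers, diagnosis labels and function names are invented for illustration; the audit itself relied on manual review and clinical judgement rather than code of this kind.

```python
def missing_problems(problem_list, note_diagnoses):
    """Diagnoses found in the free-text notes that are absent from the problem list."""
    return set(note_diagnoses) - set(problem_list)


def audit_counts(patients):
    """Sum recorded and missing diagnoses over a cohort.

    `patients` maps a patient id to a (problem_list, note_diagnoses) pair.
    """
    recorded = sum(len(set(pl)) for pl, _ in patients.values())
    missing = sum(len(missing_problems(pl, notes)) for pl, notes in patients.values())
    return recorded, missing


# Hypothetical example data, for illustration only.
patients = {
    "patient-1": (["type 2 diabetes"], ["type 2 diabetes", "hypertension"]),
    "patient-2": (["COPD", "hypertension"],
                  ["COPD", "hypertension", "atrial fibrillation"]),
}
recorded, missing = audit_counts(patients)
# Completeness = recorded / (recorded + missing), assuming every entry
# already on the problem list is itself valid.
completeness = recorded / (recorded + missing)
print(recorded, missing, round(100 * completeness, 1))
```

The denominator here is the sum of recorded and newly identified diagnoses, which mirrors how the overall completeness figure in this audit is expressed.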

Statistical analysis
An estimate of the problem list 'data gap' between all the clinically relevant problems which should be recorded in the problem list, and those that were actually recorded during the admission, was assessed using the EHR audit trail.
Problems in the UCLH EHR are entered using a proprietary terminology, which is mapped in the background to SNOMED CT and ICD-10. For ease of reporting categories of problems in this audit, problems were aggregated using ICD-10. Confidence intervals for proportions were calculated using the binomial distribution. Statistical analysis was carried out using R, version 3.4.4 [24].
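The text does not state which binomial method was used for the confidence intervals. As one possibility, the sketch below computes an exact (Clopper-Pearson) interval by bisection on the binomial distribution function. It is written in Python using only the standard library, not the R code used in the study, and the function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root(f, lo, hi, iterations=60):
    """Bisection for an increasing function f that changes sign on (lo, hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, level=0.95):
    """Clopper-Pearson interval for k successes out of n trials."""
    alpha = 1 - level
    p_hat = k / n
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, 0.0, p_hat)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p), p_hat, 1.0)
    return lower, upper


low, high = exact_binomial_ci(2841, 4563)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

For 2841 of 4563 diagnoses this should give an interval close to the reported 60.8% to 63.7%. Other methods (for example the Wilson score interval) would differ slightly, most noticeably for the smaller per-condition denominators.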

Patient and public involvement
Patients were not directly involved in the design of this study. However, patients are involved within a broader programme of work led by the senior author to improve recording of diagnoses and problems [23].

Results
We reviewed the problem lists of 516 inpatients with suspected or confirmed COVID-19. The patients included 336 men and 180 women, with median age 65 years (interquartile range 53, 78). The majority (290) were of White ethnicity, 72 were Black, 73 were South Asian, 58 were of mixed or other ethnicity and 23 had no ethnicity recorded. Prior to review, these patients had a combined total of 2841 diagnoses recorded in their EHR problem lists. 1722 additional diagnoses were identified as free text in electronic patient notes and transcribed into the problem list, increasing the mean number of recorded problems per patient from 5.51 to 8.84. The overall percentage of diagnoses originally included in the problem list was 62.3% (2841 / 4563, 95% confidence interval (CI) 60.8%, 63.7%), with variation by disease area. Chronic obstructive pulmonary disease was included on the problem list in 75.4% of patients with the condition (49 / 65, 95% CI 62.9%, 84.9%), and type 2 diabetes in 70.4% of cases (88 / 125, 95% CI 61.5%, 78.1%), but hypertension was recorded in only 53.8% of cases (127 / 236, 95% CI 47.2%, 60.3%), as shown in Fig. 1.
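The headline figures follow directly from the counts reported above. The short Python check below uses only numbers quoted in the text (the variable names are ours) and reproduces the totals, the means per patient and the per-condition percentages. The mean after review agrees with the 8.84 given in the abstract.

```python
patients = 516
recorded_before = 2841   # diagnoses on problem lists prior to review
added_by_review = 1722   # diagnoses found only in free-text notes

total_after = recorded_before + added_by_review
mean_before = recorded_before / patients
mean_after = total_after / patients
completeness = recorded_before / total_after

# (recorded on problem list, patients with the condition), as reported in the text
conditions = {
    "COPD": (49, 65),
    "type 2 diabetes": (88, 125),
    "hypertension": (127, 236),
}
sensitivity = {
    name: round(100 * hit / total, 1) for name, (hit, total) in conditions.items()
}

print(total_after, round(mean_before, 2), round(mean_after, 2),
      round(100 * completeness, 1))
# prints: 4563 5.51 8.84 62.3
print(sensitivity)
```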
By ICD-10 chapter, diagnoses in chapters XIX and XX (injuries, poisoning and external causes of morbidity) were most likely to be included on the problem list, followed by neoplasms in chapter II, as shown in Table 2.
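Grouping by ICD-10 chapter can be sketched as a lookup on the three-character category followed by a tally per chapter. The Python below is an illustration only. The chapter boundaries are the WHO ICD-10 ranges as we understand them (local editions may differ), the function names and example entries are hypothetical, and the `min_entries` threshold mirrors the omission of chapters with fewer than 10 entries noted for Table 2.

```python
from collections import defaultdict

# WHO ICD-10 chapter boundaries as (first category, last category, chapter).
ICD10_CHAPTERS = [
    ("A00", "B99", "I"), ("C00", "D48", "II"), ("D50", "D89", "III"),
    ("E00", "E90", "IV"), ("F00", "F99", "V"), ("G00", "G99", "VI"),
    ("H00", "H59", "VII"), ("H60", "H95", "VIII"), ("I00", "I99", "IX"),
    ("J00", "J99", "X"), ("K00", "K93", "XI"), ("L00", "L99", "XII"),
    ("M00", "M99", "XIII"), ("N00", "N99", "XIV"), ("O00", "O99", "XV"),
    ("P00", "P96", "XVI"), ("Q00", "Q99", "XVII"), ("R00", "R99", "XVIII"),
    ("S00", "T98", "XIX"), ("V01", "Y98", "XX"), ("Z00", "Z99", "XXI"),
    ("U00", "U99", "XXII"),
]


def icd10_chapter(code):
    """Return the chapter numeral for an ICD-10 code such as 'E11.9', or None."""
    category = code.strip().upper()[:3]
    for first, last, chapter in ICD10_CHAPTERS:
        if first <= category <= last:
            return chapter
    return None


def completeness_by_chapter(entries, min_entries=10):
    """Tally (icd10_code, already_on_problem_list) pairs by chapter.

    Returns chapter -> (recorded, total, percentage), omitting chapters
    with fewer than `min_entries` diagnoses.
    """
    counts = defaultdict(lambda: [0, 0])  # chapter -> [recorded, total]
    for code, recorded in entries:
        chapter = icd10_chapter(code)
        if chapter is None:
            continue  # unmapped codes are skipped in this sketch
        counts[chapter][0] += int(recorded)
        counts[chapter][1] += 1
    return {
        chapter: (hit, total, round(100 * hit / total, 1))
        for chapter, (hit, total) in counts.items()
        if total >= min_entries
    }


# Hypothetical entries: 12 hypertension codes (6 recorded), 3 fracture codes.
example = [("I10", i < 6) for i in range(12)] + [("S72.0", True)] * 3
print(completeness_by_chapter(example))
```

In this made-up example the three chapter XIX entries fall below the threshold, so only chapter IX is reported.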
Note that the COVID-19 infection flag used to identify the cohort of patients for this project is separate to the problem list; thus there is more than one location where a formal COVID-19 diagnosis can be recorded in the EHR. Only 250 of the 516 patients (48.4%, 95% CI 44.1%, 52.9%) had a problem list entry for suspected or confirmed COVID-19 prior to review (some patients had more than one entry, hence there were 360 COVID-19 related problem list entries in total).

Discussion
Overall, only 62.3% of diagnoses were recorded on the problem list for patients in this study, with considerable variation by condition. These estimates of problem list completion should be considered in the context in which the study was undertaken, such as the time since the EHR system launch (1 year), the EHR system employed (Epic) and associated EHR training, and the effects of the pandemic. The level of data incompleteness of problem lists in our study is similar to that from previous studies using electronic health records, despite differences in methodologies [15,25,26,27,28] (Table 1).
Further study is warranted to understand the causes of variation in problem list completion by ICD-10 chapter. The desire amongst clinicians to avoid cluttering the problem list [29], and the uncertainty surrounding which specialities are responsible for updating and maintaining the problem list [30,31], may explain why certain acute presentations are less likely to be recorded on the problem list. Inter-rater agreement between clinicians as to what should, and what should not, be added to the problem list is especially poor for secondary diagnoses and complications that have arisen from primary problem list terms [32][33][34]. In common with other studies [25,27], we found that chronic conditions such as type 2 diabetes mellitus and asthma were more likely to be recorded on the problem list.
Injury, poisoning and external causes of morbidity tend to directly cause an inpatient presentation and admission to hospital. Related problem list terms therefore may be more likely to be identified as the primary diagnosis and placed on the problem list.

Bridging the gap
Quality improvement initiatives aiming to improve the completeness of problem lists should consider a) the organisational and non-organisational factors at the individual healthcare trust b) inter-clinician agreement as to the role and scope of the problem list in

Organisational and non-organisational factors
Educational approaches are necessary for staff to become aware of the aspects of the EHR that lie outside their day-to-day use of the system, as well as the value of structured data in health informatics [10]. At UCLH trust, there was no culture of structured data use prior to implementation of an EHR. We found that free text may become the established way of working with a new EHR, and that this is difficult to shift later. We suggest that a parallel effort to support adoption of structured data approaches such as problem listing should exist alongside technical EHR training, and ideally commence before EHR system implementation.
One year after implementation, many of the secondary data returns from the Trust still rely on ICD-10 codes entered retrospectively by coding staff rather than diagnoses recorded in problem lists by clinicians at the point of care, as use of the problem list is not yet systemic and consistent. The EHR department has sought to close this gap from multiple angles, including the publication of problem list leader boards by department and the dissemination of tip sheets to standardise EHR practice. A competition within the Acute Medicine department, in which the problem list statistics for each individual clinician were published, resulted in a threefold increase in recording of problem list items over this period.
Improvements to the user interface may also help to encourage clinicians to record information in a structured way. UCLH is commencing a project funded by the National Institute for Health Research to develop natural language processing technology to convert diagnoses entered in text into coded terms in real time, enabling clinicians to validate the entries before they are committed to the record.

The role and scope of the problem list
Poor problem list practice can increase fragmentation of problems [8] and propagate inaccuracies [35][36][37] at the expense of disrupting the patient narrative [38]. Variability in problem list practice can diminish trust in the problem list as an objective source of clinical truth [34].
The key requirement of a problem list is that it should be useful for clinical care [30]. For example, the problem list term 'childhood asthma' is of relevance for a young healthy patient with no other past medical history, but of less importance for an elderly patient with several co-morbidities. Clinicians often find it useful to add free text comments to provide additional detail or a description of the problem, in order to supplement the coded term [39]. Pigeonholing problems as either 'active' or 'resolved' also does not reflect the trajectory of health care problems [40].
These practices suggest a more successful problem list interface, which supports varied definitions of the role of the problem list and allows for individual clinician preferences, would be far less restrictive and more akin to Weed's original conception of the problem list as an index. Recommendations from the literature include: the creation of a past medical history as a separate section within the problem list [39], clear guidelines as to when to undertake problem list review [30] and better means of avoiding data duplication in problem lists [41].
Structured data approaches, such as problem listing, are only more efficient if there is a conscious and concerted effort to use the EHR system to its full capacity beyond electronic storage of notes. Structured data fields need to be used appropriately, as they are not necessarily the best option for all health data; the patient's story and qualitative observations can be more faithfully recorded in free text, and excessive requirements for structured data may make the system onerous to use and lead to poor quality or incomplete data [42].
Previous studies have identified a number of factors associated with improved problem list charting, including financial incentives, gap reporting, shared responsibility, usability, training, supportive policies and organisational culture [26,43].

Moving forward
The trust now has a foothold using the Epic EHR effectively, but several key questions remain moving forward: Who has the responsibility for recording and maintaining information that persists between healthcare encounters? Who should review problem list data for accuracy? How can we identify and plug gaps in the problem list?
Clinical coders and clinicians should seek to work more closely in tandem during an inpatient hospital admission to create more accurate real-time problem lists. When faced with the problem list of a discharged patient, a clinician-coder team should ask themselves both a) is this problem list accurate and complete? and b) what information is relevant now the patient has been discharged? The former is a more clear-cut question than the latter, which requires some level of standardisation of an inherently subjective issue.

Table 2 Percentage of patients with diagnosis already recorded in problem list, by ICD-10 chapters. ICD-10 chapters with fewer than 10 entries were omitted.

Future study
This study focuses on one data field only (problem lists), and further study is needed to assess engagement and completeness for other structured data practices in EHR systems. Other areas of the EHR, such as social history, medication lists and family history can similarly suffer from incompleteness or inaccuracy. Though we have been able to accurately characterise problem list usage, and identified variation in problem list completion by ICD-10 chapter, we are only able to speculate as to the causes of incompleteness and variation. Thematic analysis, clinical surveys, and observations of problem list practice may help form a picture as to why these shortcomings occur. Further study is also warranted to assess if recommendations made by this study, such as the use of clinician-coder teams to regularly review problem lists at discharge, can be successful in practice.

Limitations
There is uncertainty amongst clinicians over exactly which set of conditions should ideally be included on a problem list, so the size of the discrepancy between free-text electronic notes and the problem list assessed in this study may partly depend on the judgement of the clinicians involved. Our method for estimating the completion rates for problem list terms also rests on the assumption that all the patients in the cohort were clerked comprehensively, without omissions of relevant past medical history from their electronic health record. Organisational factors at UCLH and the NHS, as well as the EHR system used (Epic), will impact and limit the generalisability of the results.

Conclusion
Diagnoses and other clinical information stored in a structured way in electronic health records are extremely useful for supporting clinical decisions, improving patient care and enabling research. However, one year after implementation of a comprehensive electronic health record in a major teaching hospital, recording of medical history on the structured problem list for inpatients is incomplete, with almost 40% of important diagnoses mentioned only in the free text notes.

Contributions
LZ and ADS conceived the idea of the study. LZ and JP conducted the data collection, with clinical supervision from ADS. Analysis was performed by JP, ADS and LZ. All authors contributed to the interpretation and appraisal of the results, with JP writing the manuscript. ADS is the guarantor.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. ADS is supported by a postdoctoral fellowship (PD-2018-01-004) funded by the Health Foundation's grant to the University of Cambridge for The Healthcare Improvement Studies Institute.

Ethical approval
This is a quality improvement project carried out to improve data quality for operational reporting. The project was approved by the UCLH Data Access Committee for COVID-19 studies, who classified it as an audit, and therefore it did not require review by an ethics committee.

Data sharing
This work was carried out as an internal audit with the University College London NHS Trust. As the individual data used in this study have the potential to be patient identifiable, they are not available for sharing.

Transparency
The manuscript's guarantor (ADS) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

Acknowledgements
We would like to acknowledge Alfio Missaglia for undertaking the manual review of patient notes and problem lists for this study, Qifang Deng for mapping the problem list terms to ICD-10 nomenclature and Gary Philippo for Epic analytical support. This work uses data provided by patients and collected by the NHS as part of their care and support.

What is already known on this topic
• Diagnoses and other clinical information stored in a structured way in electronic health records are extremely useful for supporting clinical decisions, improving patient care and enabling better research.
• There is evidence in the literature that the recording of diagnoses on the structured problem list for patients is often incomplete, with many diagnoses mentioned only in free text notes.
• Previous studies had assessed problem list completeness only for specific conditions in comparison with other structured data, and there is an absence of such studies in UK hospitals.

What this study adds
• This study is the first to undertake a manual electronic notes review of a cohort of acute hospital inpatients in the UK to ascertain problem list sensitivity for all medical conditions relevant to ongoing care.