Data quality and autism: Issues and potential impacts



Introduction
Routinely collected, administrative healthcare data are increasingly being used to inform decisions taken by clinicians and healthcare service managers. [1] Whilst the use of such data can be valuable, it is important to have a clear understanding of data quality and the strengths and limitations of any dataset prior to analysis.
In England, the Getting It Right First Time (GIRFT) programme is an NHS England and NHS Improvement initiative with a remit to reduce unwarranted variation in clinical practice that negatively impacts on patient outcomes. The GIRFT programme is one of the largest users of administrative healthcare data in England, with the Hospital Episodes Statistics (HES) dataset being a key resource. Although there is usually no 'gold standard' reference dataset against which to validate administrative datasets, such as HES, missing and internally inconsistent data can be identified. The issue of data inconsistency is particularly pertinent when considering people with life-long conditions, such as autism, who may have regular hospital attendances linked to their condition or where their condition may need to be considered in the delivery of care. Autism can impact on a person's ability to communicate health problems and to engage with support and advice offered. [2] The recording of autism is mandatory in the HES dataset; all episodes of admitted hospital care for autistic people should include a diagnostic code for autism. [3]
The UK government's proposed national strategy for autistic children, young people and adults [4] has recognised the gap in autism data and identifies that a key enabler for the strategy is the need to improve the collection and quality of data on autism across public services to drive system improvement. NHS Digital's Assuring Transformation initiative has improved record keeping of inpatient admissions for autism, but this does not include autistic patients in beds to address their physical health care. [5]
This was an exploratory analysis that aimed to identify the extent and pattern of inconsistent data for autistic patients within the HES administrative dataset and explore whether any inconsistencies were linked to outcomes. Our a-priori hypothesis was that poor recording of autism in medical notes would suggest a lack of focus on autism as a complicating condition that may impact on patient care and outcomes. In light of the LeDeR programme, a specific interest was in hospital spells where a death had occurred.

Ethics
Consent from individuals involved in this study was not required for this analysis of the HES administrative dataset. The analysis and presentation of data follows current NHS Digital guidance for the use of HES data for research purposes. Reported data are anonymised to the level required by ISB1523 Anonymisation Standard for Publishing Health and Social Care Data. [6]

Study design and data collection
This was a retrospective exploratory analysis of HES data. HES data are collected by NHS Digital for all NHS-funded patients admitted to hospitals in England. Hospital trusts run all NHS hospitals in England. A hospital trust is an administrative unit typically operating between one and four secondary or tertiary care hospitals for a geographically defined catchment population. Data collection and reporting is mandatory, and data are entered by clinical coders at each trust.

Timing, case ascertainment, inclusion and exclusion criteria
Data were taken from HES for all patient discharges during the period 1st April 2013 to 31st March 2021. This period was chosen to reflect a period where clinical coding practice in the NHS in England was relatively stable. The time period defined the sample size.
Once a diagnosis of autism has been made, autism should be recorded in HES for all episodes of hospital care for that patient. [7] Autistic patients were identified in HES where any of the International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes F84.0 (childhood autism), F84.1 (atypical autism) or F84.5 (Asperger's syndrome) was listed as a diagnosis in any position in the diagnostic record.
The first use of the specified code in any position of the diagnostic record for a hospital spell during the study period was identified and data for all subsequent spells for the same person extracted.
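As an illustration, case ascertainment of this kind can be sketched in pandas. The column names and toy data below are assumptions for illustration only, not actual HES field names:

```python
import pandas as pd

# Hypothetical episode-level extract; columns are illustrative, not HES fields.
episodes = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "spell_id":   ["A", "B", "C", "D", "E"],
    "admission_date": pd.to_datetime(
        ["2014-01-05", "2015-03-10", "2016-07-01", "2013-06-20", "2013-01-01"]),
    "diagnoses": [["F840", "J18"], ["J45"], ["F845"], ["F841"], ["I10"]],
})

AUTISM_CODES = {"F840", "F841", "F845"}  # ICD-10 F84.0, F84.1, F84.5

# Flag any episode whose diagnostic record (any position) contains an autism code
episodes["has_autism_code"] = episodes["diagnoses"].apply(
    lambda codes: bool(AUTISM_CODES.intersection(codes)))

# First admission per patient in which an autism code appears
first_autism = (episodes[episodes["has_autism_code"]]
                .groupby("patient_id")["admission_date"].min()
                .rename("first_autism_admission")
                .reset_index())

# Keep that spell and all subsequent spells for the same patient
cohort = episodes.merge(first_autism, on="patient_id")
cohort = cohort[cohort["admission_date"] >= cohort["first_autism_admission"]]
```

Spell "E" here is excluded because it pre-dates the patient's first spell carrying an autism code.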

Identification of clinical coding inconsistencies
Clinical coding inconsistencies are reported at the spell level; a spell in hospital being made up of individual episodes of care. In HES a hospital spell is defined as a continuous period in hospital from admission to discharge. A spell can include multiple smaller episodes of care in various hospital settings and under different consultants. As an example, following an emergency department attendance, a patient may initially be under the care of acute medicine (episode one), then transferred to a critical care setting (episode two) and then to a care of the elderly ward (episode three) prior to discharge.
The data were extracted at an episode level and aggregated to a spell level for analysis. A subsequent spell was considered to contain an inconsistency if none of the three autism ICD-10 codes appeared in the diagnostic record of any of its constituent episodes. We recognised that the precise diagnostic code used may change for a patient, but the broad diagnosis of autism should be present during each spell. This approach was felt to be fairer and more pragmatic than the stricter definition requiring every episode in a spell to contain one of the three autism codes.
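The episode-to-spell aggregation rule described above can be sketched as follows (column names are again illustrative, not HES fields):

```python
import pandas as pd

AUTISM_CODES = {"F840", "F841", "F845"}

# Hypothetical episode-level data for subsequent spells
episodes = pd.DataFrame({
    "spell_id":  ["S1", "S1", "S2", "S2", "S3"],
    "diagnoses": [["F840", "J18"], ["J18"],   # S1: autism code in one episode
                  ["I10"], ["E11"],           # S2: no autism code in any episode
                  ["F845"]],                  # S3: autism code present
})

episodes["episode_has_code"] = episodes["diagnoses"].apply(
    lambda codes: bool(AUTISM_CODES.intersection(codes)))

# A spell is inconsistent only if NO constituent episode carries an autism code
spell_has_code = episodes.groupby("spell_id")["episode_has_code"].any()
inconsistent = ~spell_has_code
```

Under this rule, spell S1 is consistent even though only one of its two episodes carries an autism code, reflecting the pragmatic definition above.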

Data features (variables)
Patient characteristics: sex, age in years, ethnicity (White, Black or Black British, Asian or Asian British, Mixed, other), frailty (Hospital Frailty Risk Score, HFRS) [8] and deprivation (Index of Multiple Deprivation, IMD, score) [9]. All ICD-10 codes present in the diagnostic record for each subsequent spell were investigated for their potential to improve the model performance. In modelling, the feature age was the age at the subsequent admission where the inconsistency was recorded.
Features of hospital stay: Emergency readmission within 30 days, spell length of stay, hospital trust, admission method (emergency or elective), number of days since the first spell with the diagnosis was recorded (difference between the discharge date of the first spell and the admission date of the subsequent spell), clinical specialty of admission episode. All categorical features were one-hot encoded.
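A minimal sketch of the one-hot encoding step, including the separate category used for missing ethnicity values (see Data analysis), could look like this; the field names and values are hypothetical:

```python
import pandas as pd

# Illustrative spell-level categorical features; names are assumptions
spells = pd.DataFrame({
    "admission_method": ["emergency", "elective", "emergency"],
    "ethnicity": ["White", None, "Asian or Asian British"],
})

# Treat missing ethnicity as its own category rather than dropping rows
spells["ethnicity"] = spells["ethnicity"].fillna("Missing")

# One-hot encode all categorical features
encoded = pd.get_dummies(spells, columns=["admission_method", "ethnicity"])
```

Each category level becomes a separate binary column (e.g. `admission_method_emergency`), which is the representation the random forest consumes.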

Outcome (target) variable
The target was described by a binary flag indicating whether a clinical coding inconsistency was recorded in the subsequent spell.

Data analysis
Data were extracted onto a secure encrypted server controlled by NHS England and NHS Improvement. Analysis within this secure environment took place using Alteryx 2019.3 (Alteryx Inc., Irvine, CA, USA), Python 3.9.6 and the scikit-learn machine learning library 0.24.2 (Python Software Foundation, Beaverton, OR, USA). [10] All machine learning models were developed using the scikit-learn library. Random forest classifiers were used to identify key co-variates associated with data inconsistencies. Random forest classifiers are ensemble classifiers that fit decision trees to portions of the data and average over all decision trees. The advantage of using these classifiers as opposed to deep neural networks is the interpretability and transparency of the results, particularly feature importance. This is of particular importance if a machine learning model is to provide useful information about the relationship between the features and outcome variable to clinicians. Machine learning has shown significant benefit in analysing healthcare data and providing insight. [11][12][13][14] The task considered in this work is a binary classification task.
The dataset was randomly split in the ratio 70:15:15 to create training, validation and test sets, respectively. Machine learning algorithms require the data to be split into a training set, from which the algorithm learns the relationships between features, and a testing subset where it applies that learning to an unseen part of the dataset. The performance on this unseen test set is used to assess the generalisability of the model, to check for overfitting and to evaluate how well the features have been learned. The machine learning algorithm was trained on the training set and its performance evaluated based on how well it could predict inconsistencies in the test set. The validation set was used to tune the hyperparameters of the random forest. To ensure that the model did not simply classify according to the majority outcome (no inconsistency), the training set was reduced by random under-sampling from the majority class to ensure that there were an equal number of patients with and without inconsistencies in the training set. This eliminated the effect of the class imbalance on the model performance and ensured that the model had sufficient exposure to patients in the minority class. Random (unstratified) under-sampling was felt appropriate given the exploratory nature of the analysis. However, the test set on which the trained model was evaluated was not balanced, increasing the model's external validity.
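The 70:15:15 split and the under-sampling of the majority class in the training set only can be sketched as follows (synthetic data stands in for the HES feature matrix):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the spell-level features and inconsistency flag
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.3).astype(int)   # imbalanced: roughly 30% positive

# 70:15:15 train / validation / test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=0)

# Random under-sampling of the majority class, applied to the training set only;
# the validation and test sets keep the original class balance
pos = np.flatnonzero(y_train == 1)
neg = np.flatnonzero(y_train == 0)
n = min(len(pos), len(neg))
keep = np.concatenate([rng.choice(pos, n, replace=False),
                       rng.choice(neg, n, replace=False)])
X_bal, y_bal = X_train[keep], y_train[keep]
```

Leaving the test set unbalanced, as in the study, means the reported performance reflects the class distribution the model would face in practice.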
There exist hyperparameters specific to the random forest classifier that can be tuned. The hyperparameters were determined by performing a nested k-fold cross-validation based grid search. The combination of hyperparameters with the highest area under the receiver operating characteristic curve (AUROC) for the validation set was selected. The hyperparameters that were tuned, and the values tested, were: the number of trees (100, 200, 300), the minimum samples per split (2, 4, 8) and the maximum depth (10, 100, 1000). The final hyperparameter values used in the model were 300 trees, a minimum of 4 samples per split and a maximum depth of 100. Data splitting was performed before any pre-processing steps.
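A grid search over the stated hyperparameter values, scored by AUROC on a held-out validation set, can be sketched as below. This is a simplified sketch using synthetic data and a single validation split rather than the nested cross-validation used in the study:

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training/validation data
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The grid of values reported in the text
grid = {"n_estimators": [100, 200, 300],
        "min_samples_split": [2, 4, 8],
        "max_depth": [10, 100, 1000]}

best_auc, best_params = -1.0, None
for n_est, min_split, depth in product(*grid.values()):
    clf = RandomForestClassifier(n_estimators=n_est,
                                 min_samples_split=min_split,
                                 max_depth=depth,
                                 random_state=0, n_jobs=-1)
    clf.fit(X_train, y_train)
    auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_params = auc, (n_est, min_split, depth)
```

The combination held in `best_params` after the loop is the one with the highest validation AUROC, mirroring the selection criterion described above.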
Model performance was reported in terms of accuracy, balanced accuracy, specificity, sensitivity, and the area under the receiver operating characteristic (AUROC) curve. The 95 % confidence intervals for these were calculated using 10-fold cross-validation. The 10-fold cross-validation data splitting procedure occurred before any pre-processing to avoid data leakage. Given the size of the dataset, the sample size for model training and validation was considered adequate.
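Computing these metrics over 10 folds, with normal-approximation 95 % confidence intervals from the fold-to-fold spread, can be sketched as follows. Synthetic data stands in for the HES features; specificity is omitted because it requires a custom scorer in scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# "recall" is sensitivity; specificity would need a custom scorer
scores = cross_validate(clf, X, y, cv=cv,
                        scoring=["accuracy", "balanced_accuracy",
                                 "recall", "roc_auc"])

# Approximate 95% CI for each metric from the spread of the 10 fold scores
for name in ["test_accuracy", "test_balanced_accuracy",
             "test_recall", "test_roc_auc"]:
    vals = scores[name]
    mean = vals.mean()
    half = 1.96 * vals.std(ddof=1) / np.sqrt(len(vals))
    print(f"{name}: {mean:.3f} (95% CI {mean - half:.3f} to {mean + half:.3f})")
```

`cross_validate` fits the model afresh in each fold, so any fold-dependent pre-processing must also happen inside the fold to avoid the data leakage mentioned above.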
Shapley values were used to assess feature importance. [15] The Shapley value of a feature is its average marginal contribution across all possible feature combinations, and so quantifies the contribution of that feature to the overall prediction.
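The "average marginal contribution across all possible feature combinations" definition can be made concrete with a toy example. The subset scores below are invented for illustration (in practice Shapley values for tree models are computed by a dedicated library rather than by enumeration):

```python
from itertools import combinations
from math import factorial

# Toy value function: v(S) is the "score" achieved using feature subset S.
# The numbers are hand-picked for illustration only.
v = {frozenset(): 0.0,
     frozenset("A"): 0.6, frozenset("B"): 0.55, frozenset("C"): 0.5,
     frozenset("AB"): 0.75, frozenset("AC"): 0.7, frozenset("BC"): 0.6,
     frozenset("ABC"): 0.8}

features = ["A", "B", "C"]
n = len(features)

def shapley(i):
    """Weighted average of feature i's marginal contribution over all subsets."""
    total = 0.0
    others = [f for f in features if f != i]
    for r in range(n):
        for S in combinations(others, r):
            S = frozenset(S)
            weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                      / factorial(n))
            total += weight * (v[S | {i}] - v[S])
    return total

phi = {f: shapley(f) for f in features}
# Efficiency property: the contributions sum to v(all features) - v(empty set)
```

In this toy game feature A carries the largest Shapley value, and the three values sum exactly to the full-model score of 0.8, illustrating the efficiency property that makes Shapley values interpretable as an additive decomposition of a prediction.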
A sensitivity analysis was conducted to assess the impact of removing regular day attendances for dialysis (ICD-10 codes N185 (chronic kidney disease, stage 5) and Z491 (extracorporeal dialysis)). On visual inspection of the data, spells containing these codes tended to have a very high proportion of data inconsistencies and they could add significant bias to any model. Their impact on the model was assessed by removing them and re-running the analysis. Codes for regular attendances for other reasons (e.g., cancer treatment) were not identified as significant outliers on visual inspection.
A sub-analysis was performed to identify features associated with data inconsistencies within spells where a death was recorded. To do this, a mixed-effects multilevel logistic regression was used. A regression model was preferred over a random forest classifier for this analysis due to the much smaller size of the dataset.
Where data were missing, the numbers are given. Missing data were rare for all features except ethnicity, where 15-20 % of values were missing. When modelling ethnicity, a separate category for missing data was created. In all other cases missing data were excluded from modelling.

Results
Data were available for 172,324 unique patients with an autism diagnosis on first admission. The characteristics of these patients in their first spell during the study period are summarised in Table 1, together with the number of subsequent spells and the number of data inconsistencies. The median age of patients was 15 years (lower quartile 9, upper quartile 26). The majority of patients were aged younger than 18 years and over two-thirds were male. The Asperger's syndrome ICD-10 code was used in 36,625 (21.3 %) of first spells. There were 390,220 subsequent spells, of which 170,447 (43.7 %) had inconsistencies. Data inconsistencies were more common in older age groups, females, patients from more deprived areas and patients of White ethnicity. Table 2 summarises data for the subsequent spells according to aspects of hospital care. More than half of the spells did not involve an overnight stay, and these spells had a noticeably higher rate of inconsistencies than those where patients stayed overnight. Elective patients had a slightly higher inconsistency rate than emergency admissions. Spells under the care of a paediatrician had a comparatively low rate of inconsistencies.

Table 1. Number of first admissions, number of subsequent spells and number of data inconsistencies, categorised by patient demographic characteristics on first admission.

Fig. 1 is a plot of the 20 most important features identified by the random forest model, together with the corresponding Shapley value for that feature in each sample. The features most strongly associated with inconsistencies included greater age, greater deprivation, longer time since the first spell, change in provider, shorter length of stay, being female and a change in the main specialty description. A small number of ICD-10 codes were identified as adding predictive value to the random forest classifier.
Inconsistencies were less common where there was a diagnosis of at least one other learning difficulty/disability ('developmental disorder of scholastic skills' (ICD-10 code: F819) and 'attention-deficit hyperactivity disorder, predominantly inattentive type' (ICD-10 code: F900)) or anxiety ('anxiety disorder, unspecified' (ICD-10 code: F419)). Inconsistencies were more common where 'chronic kidney disease, stage 5' (ICD-10 code: N185) and 'care involving dialysis' (ICD-10 code: Z491) were coded together. The latter code is used when a patient attends solely for renal dialysis preparation or treatment. [16] Tables 4 and 5 provide descriptions of the most important continuous and nominal variables.
Supplementary material Figures 1 and 2 show the random forest classifier's prediction of the probability of an inconsistency as a function of patient age and of the time in days since the first coded diagnosis, respectively. The plot for patient age shows that inconsistencies are relatively common in very young children (aged < 1 year) but are much lower in older children, before rising sharply in the teenage years, with steadier increases from 20 years and above. The plot for the time since the first coded diagnosis indicates that the probability of an inconsistency is at its lowest five days after the first spell in which the diagnosis is recorded, but steadily increases after this before remaining relatively constant. For the main specialties of the final spell, the reference category was general medicine.

Table 4. Descriptions of the most predictive continuous features used in the model.

In the sensitivity analysis, spells that were regular day attendances for kidney dialysis were removed. The performance of the model was similar to that of the main analysis, with an accuracy of 76.1 %. Feature importances are shown in Supplementary Figure S3 and were similar to those of the main analysis.
In total, 2649 patients died in hospital during the study period. Their characteristics are summarised in Table 3. The patients who did not have an autism code in the spell in which they died were more likely to be older and more deprived than those who did have an autism code. The patients with inconsistencies were also more likely to have a palliative care code in their final spell. Table 3 also shows the results of the multilevel multivariable logistic regression; inconsistencies in the final spell in those who died were significantly associated with being 80 years and over, being female, greater deprivation and use of a palliative care code in the death spell.

Discussion
Our study reveals the use of mandatory codes in the Hospital Episodes Statistics (HES) dataset to be inconsistent across spells in a large number of cases. Since our dataset reflects information recorded in the medical record during a hospital stay, it suggests that in some cases care may be sub-optimal and may not take into account that a patient has a diagnosis of autism. Key factors associated with data inconsistencies included greater age, being female, greater deprivation, change of speciality or provider from the first spell, length of time from the first spell and day-case admission. The higher inconsistency rate in females, older age groups and people living in more deprived areas is perhaps the most striking finding and suggests a degree of under-recording of autism in these groups. This suggests that as people get older and may have multiple co-morbidities, clinicians may be less focussed on an underlying autism diagnosis. The higher rate of data inconsistencies in females correlates with emerging evidence of late and misdiagnosis of autism in females due in part to the greater prevalence of camouflaging or masking behaviours. [17] Uncertainty of an autism diagnosis or a prolonged diagnostic process may also be possible factors impacting the inconsistency rate, particularly in younger patients. Such inconsistencies have the potential to distort our understanding of service use in key demographic groups.
Higher rates of inconsistencies were noted where there was a change of speciality or provider. The lack of joined up data systems has been acknowledged with the planned implementation of the NHS Digital and NHS England reasonable adjustment flag on the NHS Spine to improve sharing across systems. [18] Improved intra-trust (between departments) and inter-trust (between trusts) communication could substantially reduce the number of data inconsistencies.
Inconsistencies were more common where the admission was associated with a routine day attendance for kidney dialysis. Although the autism codes are mandatory, this association with dialysis is not unexpected and is consistent with coding practice in many trusts, where limited data are provided to clinical coders for routine day attendances and large numbers of regular day attendances may preclude extensive coding depth. Inconsistencies were less common where there was another learning disability code used in the same spell. This may be because clinicians recognise a learning disability more easily than autism and because autism is a common comorbidity with learning disability.
Through initiatives such as the GIRFT programme, the Model Health System and the National Consultant Information Programme (NCIP), the use of administrative data to inform clinical and management decision making in healthcare in England is increasing, and this change is to be encouraged. [19][20] The COVID-19 pandemic has also shed light on the potential of such data sources. [21] However, if attention is not paid to the quality of the underlying data, then there is potential to distort our understanding. A greater ability to link datasets across settings may help to identify and address such inconsistencies. The recommendations of the recently published Goldacre review on the use of health data may help facilitate moves to improved data quality by allowing better linkage across datasets within trusted research environments. [1] SNOMED CT is a clinical terminology that is set to play an important role in improving data quality and interoperability. [22] Its design and structure enable it to support scalable and meaningful health data capture, storage, retrieval and communication. [23] Since most interactions of autistic people with health services will be through primary care, comparison of primary and secondary care data at a patient level is likely to provide much greater insight than secondary care data alone. Improved intra-dataset linkage may help avoid issues around data consistency when patients change healthcare specialty or healthcare provider. In England, initiatives such as the Population Health Management Programme, developed across integrated care systems, have the potential to improve data quality. [24]
Autism can impact on a person's ability to communicate health problems and to engage with support and advice offered. [2] In England, it is a legal duty under the Equality Act (2010) to ensure that reasonable adjustments, such as staff being trained in autism awareness, longer appointment times and reduced noise and distractions, are made so that services are accessible. [18] The adjustments needed will vary for each autistic person, and not all will need them. However, clinicians need to be aware that a person has a diagnosis of autism in order to make reasonable adjustments in the provision of care. Since the end of 2021, the Learning from lives and deaths (LeDeR) programme has extended from reviewing the deaths of people with learning disabilities to include the deaths of all autistic adults within its scope. [25] This programme aims to improve services for autistic people and to reduce premature mortality. The programme relies on these deaths being identified and reported.
The need for reliable data on the health inequalities faced by autistic people has been highlighted by the NHS England and NHS Improvement Autism Programme and has resulted in the inclusion of autistic people who have died in LeDeR from 2022. [25] Our study highlights the issue of identifying deaths in people with autism. We also note a higher rate of inconsistencies in patients with a palliative care code, which is perhaps unsurprising given the care setting. However, this could result in a lack of referrals to the ongoing LeDeR programme and the potential for subsequent learning and improvement recommendations to be biased. Initiatives to allow easier identification of co-morbidities within HES will help improve recording of autism diagnoses and minimise the risks of diagnostic overshadowing, particularly later in life when age-related comorbidity may come to dominate the health record. [26] Beyond autism, the potential impacts of data inaccuracy in people with learning disabilities have been previously identified. The 2013 confidential inquiry into premature deaths of people with learning disabilities found that the lack of reasonable adjustments was a contributory factor in a number of deaths and that hospital systems to identify people with learning disabilities who needed reasonable adjustments were limited. [27] The inquiry identified a need for clear identification of people with learning disabilities on NHS databases and for this information to be made available to care professionals in healthcare record systems, including a record of the reasonable adjustments required. The issue of under-recording of learning disabilities on death certificates was identified in a recent systematic review. [28] The reasons for under-recording of autism in HES are likely to be complex and specific to each trust entering data.
Health professionals may not always feel able or confident enough, or feel it is necessary, to add an autism diagnosis to the clinical record if it is not directly relevant to the current spell of care. Autistic people themselves may not want to discuss their diagnosis, and clinicians are advised by NHS England that terminology for autism is not universally accepted and that, "it is best to use the words they use to describe themselves and their loved ones". [29] Even when autism is recorded in the medical notes for a spell, outdated terminology or abbreviations may be used. [29] The diagnostic term 'Asperger's syndrome' was used in over one-fifth of first spells, but is now considered outdated. Clinical coders can only code a diagnosis if it appears in a certain format and may be unable to code autism even if they strongly suspect it to be present from the information given. Within the specific area of learning disability, coding is also complex, with slight differences in how information is recorded in medical notes leading to different ICD-10 codes being used. Therefore, to fulfil the aims of the National Autism Plan in the UK, particularly to reduce health inequalities and to ensure improved access to relevant supports and treatments, the findings of our analysis should be shared with relevant clinicians to ensure more consistent data entry, and to develop systems to optimise data quality. Our study has several strengths, including the size of the dataset and the ability to track patient data across multiple hospital trusts within England and over time. The usefulness of the retrospective approach taken in this work is in highlighting data inconsistencies and identifying particular patient or spell characteristics that relate to poor data quality. These could go on to help inform policy change.
Our study also has some limitations, the most important of which have already been acknowledged in relation to coding practice. No gold standard exists against which to externally validate the HES dataset. As such, we were limited to assessing internal consistency. Inconsistencies were assessed in relation to the first spell in which the autism code was recorded during the study period. We recognise that in some cases this first recorded use of an autism code may be an error and that later absence of an autism code may represent an accurate clinical record. In our analysis we do not attempt to identify the source of the error, only that the record is inconsistent with a previous record. A potential limitation is our use of random under-sampling, which has the potential to introduce bias to the analysis. However, given the relatively large sample size, substantial bias is unlikely.
In summary, our study identifies a large number of data inconsistencies in the recording of autism in the HES dataset. Improving data quality for people with autism admitted to hospital is important in ensuring they receive appropriate care. Identifying factors associated with data inconsistencies supports discussions regarding possible reasons for these inconsistencies; a first step in improving data quality. As the UK government's national strategy for autistic children, young people and adults (2021 to 2026) [4] acknowledges, good quality data is a fundamental requirement if care for people with autism is to be improved.

Summary Table

What was already known on the topic
- Once diagnosed, it is a mandatory requirement to record autism in the Hospital Episodes Statistics (HES) database for England.
- The extent to which autism is consistently recorded in HES is not known.
- Not recording autism as present during a hospital stay has the potential to result in sub-optimal care for patients and to bias data for people with autism.

What this study added to our knowledge
- Specific patient features were identified as relating strongly to inconsistencies in autism coding. These included greater age and being female.
- Features relating to the hospital stay were also related to inconsistencies in autism coding. These included a change of speciality or provider from the first spell in which autism was recorded, the length of time from the first spell and day-case admission.
- Nephrology spells and spells that involved dialysis had very high rates of inconsistencies.
- Inconsistencies in the final spell for patients with autism who died in hospital were associated with being 80 years and over, being female, greater deprivation and the use of a palliative care code in the death spell.

Data availability statement
This report does not contain patient identifiable data. Consent from individuals involved in this study was not required. Requests for any underlying data cannot be granted by the authors because the data were acquired from data under licence/data sharing agreement from NHS Digital, for which conditions of use (and further use) apply. Individuals and organisations wishing to access HES data can make a request directly to NHS Digital.

Informed consent
Informed consent was not sought for the present study because it was an analysis of routine administrative data.

Ethical approval
Ethical approval was not sought for the present study because it did not directly involve human participants. This study was completed in accordance with the Helsinki Declaration as revised in 2013.