Leveraging real-world data to improve cochlear implant outcomes: Is the data available?

Objectives: A small but persistent proportion of individuals do not gain the expected benefit from cochlear implants (CI). A step-change in the understanding of factors affecting outcomes could come through data science. This study evaluates clinical data capture to assess the quality and utility of CI users' health records for data science, by assessing the recording of otitis media. Otitis media was selected as it is associated with the development of sensorineural hearing loss and may affect cochlear implant outcomes. Methods: A retrospective service improvement project evaluating the medical records of 594 people with a CI under the care of the University of Southampton Auditory Implant Service between 2014 and 2020. Results: The clinical records are suitable for data science research. Of the cohort studied, 20% of adults and more than 40% of the paediatric cases have a history of middle ear inflammation. Discussion: Data science has potential to improve cochlear implant outcomes and improve understanding of the mechanisms underlying poor performance, through retrospective secondary analysis of real-world data. Conclusion: Implant centres and the British Cochlear Implant Group National Hearing Implant Registry are urged to consider the importance of consistent and accurate recording of patient data over time for each CI user. Data where links to hearing loss have been identified, such as middle ear inflammation, may be particularly valuable in future analyses and to inform clinical trials.


Introduction
Hearing loss is the fourth leading cause of disability globally (Vos et al., 2016) and occurs throughout the life-course with peak incidence in both early and older age (Russ et al., 2017). One in three adults will experience significant hearing loss, with the majority of those affected developing sensorineural hearing loss due to age-related hearing loss (presbycusis). Sensorineural hearing loss describes increased hearing thresholds due to damage within the inner ear, cochlea, or vestibulocochlear nerve (Lee and Bance, 2019). The recommended therapy for bilateral severe to profound sensorineural hearing loss is cochlear implantation (NICE, 2009). Cochlear implants are safe and effective surgically inserted prostheses that give the majority of users functional hearing (Wilson B, 2008). Unfortunately, a minority of users have less favourable outcomes, and the percentage of people affected by poor outcomes has persisted despite improvements in the field. The reasons for poor outcomes are not well understood and are likely multi-factorial, with implant performance, linguistic ability, and cognition contributing to CI user experience (Battmer et al., 2007;Moberly et al., 2016).
Over the last 20 years, the digitalisation of healthcare has changed the way healthcare systems function, how patient care is delivered and how clinical research is conducted (Bates et al., 2014;Mallappallil et al., 2020;Ottenbacher et al., 2019). The expansion of data within healthcare has been powered by the development and adoption of electronic patient records. Simultaneous advances in computing technology, clinical informatics, digital storage capacity, the internet, and cloud computing have made it possible to store, curate, and analyse large sets of data at unprecedented scale (Ottenbacher et al., 2019).
These advances have led to the development of data science, which promises to accurately process large amounts of complex real-world data to extract new information and deliver precision medicine (Booth et al., 2019). The use of data science within medicine can be split into three main tasks: association and prediction, intervention, and counterfactual causal inference (Raita et al., 2021). As the power of secondary data analyses is realised by the wider medical community, health data will become an increasingly valuable resource in the context of hearing research (Booth et al., 2019;Cook and Collins, 2015). Whilst valuable, data by itself is not useful (Pearl, 2019). It is the analysis of data, combined with domain knowledge and causal reasoning, that holds potential. Within cochlear implantation, supervised machine learning algorithms and neural networks could find new associations between implants and recipient characteristics to model patient outcomes such as electrode extrusion, device failure, and long-term hearing outcomes. These associations, when combined with domain expertise, would inform targeted clinical trials to establish causation and effective interventions to improve cochlear implant outcomes. Where clinical trials are not feasible, existing domain knowledge could be encoded within analytical algorithms to establish counterfactual causal inference and answer questions such as 'What effect would this intervention have on a group of cochlear implant users with these characteristics?' (Raita et al., 2021). This could be done exclusively using existing observational data.
The production of safe and accurate predictive models in healthcare is dependent on the data used in training supervised machine learning algorithms. Data must be available in large volumes: they must be broad, i.e. contain many patient records, and deep, with many data variables for each individual (Sanchez-Pinto et al., 2018). Data must be trustworthy, that is, both accurate and consistently reported. Data, both structured and unstructured, must be effectively curated and made available for use in data science applications (Badawi et al, 2014). Electronic patient record systems are frequently designed to support financial systems as well as clinical care. Structured reporting tools are commonly limited to supporting financial data, with unstructured formats used for clinical data. Unstructured data are not easily analysed and require significant pre-processing using natural language processing, labelling, and linkage to structured data. Without these fundamental qualities, data science applications in healthcare will be hindered or, worse, produce inaccurate and potentially dangerous systems (Han et al, 2005).
Many successful applications of data science across medicine have already been developed; these include complex tumour analysis, diagnostic tools in inflammatory bowel disease, adverse event prediction tools in oncology, and novel public health studies looking at suicide prevention (Beck et al, 2011;Livanainen et al., 2021;Mossotto et al, 2017;Song et al, 2016). These studies have been made possible through data science approaches and demonstrate significant research value in fields typically constrained to conventional research study design with small study cohorts. These methods also create significant opportunities to increase efficiency in healthcare and reduce associated costs (Bates et al, 2014).
Most recently, big data in medicine have been best typified by the use of electronic health records in research into COVID-19 related hospital deaths (Wood et al., 2021). Whilst these large datasets are demonstrating huge potential at whole population scale, this should not undermine the potential value of data science in smaller population groups such as people with cochlear implants. The skillsets and infrastructure developed for large datasets demonstrate the potential and should encourage their application to hearing and cochlear implantation (Lesica et al., 2021;Saeed et al., 2019).
While cochlear implant centres are promising platforms for this modality of research, the relevance, availability, and quality of the data they hold have not previously been evaluated. For each CI patient, the centre providing their care will typically hold demographic data, past medical history, hearing loss aetiology, investigations, surgical records, and detailed follow-up measures of implant performance. These data could be leveraged to model and advance CI care and hearing outcomes with a cochlear implant, if the data are of appropriate quality and stored in an accessible system. Successful applications of data science to cochlear implantation are starting to emerge and could add real value to understanding factors affecting implant outcome and directing future trials (Saeed et al., 2021).
This retrospective study aims to evaluate the quality of real-world patient data held at a single UK cochlear implant centre to determine its suitability for future research using data science methodologies. Secondary analysis of documented otitis media in a cohort of people who have undergone cochlear implantation is used as an exemplar data-field of interest.
Otitis media, an umbrella term for middle ear inflammation (MEI), is a spectrum of diseases with closely related pathological phenotypes and clinical definitions (Schilder et al., 2016). Amongst children, otitis media is a leading cause of antibiotic prescription and surgery. Approximately 60-83% of children will have ≥1 episode of acute otitis media (AOM) by the age of 3, and around 25% of three year olds will have had ≥3 episodes (Kaur et al., 2017;Teele et al., 1989). In addition to the transient conductive hearing loss due to the increased stiffness and mass of the tympanum caused by middle ear effusion (Cai et al., 2017), middle ear inflammation is associated with the development and worsening of sensorineural hearing loss (Costa and Rosito LPS, 2009). The evidence is strongest for chronic otitis media; however, there is evidence that even a single episode of acute otitis media can produce very high frequency hearing deficits. Chronic otitis media, which involves persistent middle ear inflammation, has been shown to cause damage to the cochlea (Bhutta et al., 2017;Costa and Rosito LPS, 2009;Cureoglu et al., 2004;Kaur et al., 2017;Paparella et al., 1984) that is maximal at the basal turn, reducing the thickness of the stria vascularis and the number of inner and outer hair cells (Cureoglu et al., 2004;Paparella and Goycoolea, 1980). A history of otitis media is a common related pathology in cochlear implant patients. Not only is middle ear inflammation implicated in the development of sensorineural hearing loss, there is evidence it may lead to surgical challenges and short-term complications after implantation (Aftab et al., 2010;Alzoubi et al., 2015;Rak et al., 2018). The recording of otitis media (or middle ear inflammation) is therefore of significant research value in the context of understanding cochlear implant patient outcomes.
The direct and indirect recording of middle ear inflammation in cochlear implant patients is a good example by which to judge the quality of data recorded and stored within an auditory implant centre.
Despite the evidence, historic otitis media is frequently overlooked when evaluating CI candidacy; however, the prevalence of otitis media means a large number of individuals undergoing cochlear implantation will have a history of MEI (Luntz et al., 2004). Health data recording is typically high quality when it has immediate relevance to the clinical care of the patient. However, analysis of data recording for consistency and completeness shows that data can be of lesser quality for measures that are not considered immediately important (Mathur et al., 2014). With otitis media being outside of the typical focus of CI candidate selection, but a factor which may be prognostic of some poorer hearing outcomes, it is an exemplar for evaluating the quality of data held by cochlear implant centres in electronic health records and its suitability for research. The present study aims to identify to what extent middle ear inflammation is documented in the medical records of individuals who have undergone cochlear implantation at a single UK implant centre. The results of this study will give insight into the quality of the data held by cochlear implant centres and the potential use in prognostic models. The University of Southampton Auditory Implant Service, USAIS, is one of 19 cochlear implant centres in the UK and provides care for around 8% (2000) of the current UK population with CIs. USAIS provides a mixed adult and paediatric CI service for a variety of hearing pathologies, making it an ideal setting for this work.

Materials and methods
This study is a retrospective service evaluation conducted at the University of Southampton Auditory Implant Service (USAIS) for the purposes of service improvement. Ethical approval for the work was granted by the local Medicine Ethics and Research Governance Board, University of Southampton (ERGO ID:62161). Data were stored in a password-protected Excel file in pseudonymised form using implant recipient identification codes.

Study population
All cochlear implant users under the care of USAIS were compared against the inclusion and exclusion criteria (see Fig. 1A).

Exclusion criteria
- Implanted under the care of another cochlear implant centre.
- Electronic patient records inaccessible due to patient care having transferred to another centre or patient deceased.

Data collection
Data were extracted from the electronic patient records of eligible individuals, all of whom had undergone cochlear implantation, directly into a pre-prepared Microsoft Excel spreadsheet. CI users' paper records were not reviewed. Data were extracted for left and right ears separately.

Demographic data
Demographic data were included for each CI user included in the study. This included aetiology of hearing loss, age of severe/profound hearing loss, duration of hearing loss prior to cochlear implantation, and age at implantation.

Direct documentation of middle ear inflammation
Each CI user's clinical records were reviewed for direct recording of active or historic middle ear inflammation. Records reviewed included medical notes and preliminary assessment questionnaires. Terms used to directly identify middle ear inflammation (MEI) included: 'Otitis Media,' 'Ear Infection,' 'Glue Ear,' and 'Inflammation.' CI users were then divided into those who had a history of MEI recorded, those who did not have a history of MEI and those whose MEI data was missing or undocumented. Where MEI was present this was recorded as single episode, recurrent episodes of inflammation (e.g. recurrent otitis media), or chronic inflammation (e.g. glue ear).
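The term search described above can be illustrated with a minimal Python sketch. It is not the method used in this study, where records were reviewed manually; the function names and triage labels are hypothetical, and a plain keyword match cannot distinguish 'history of otitis media' from 'no history of otitis media', so any flagged note would still need clinical review.

```python
import re
from typing import List, Optional

# Search terms quoted in the text above, matched case-insensitively.
MEI_TERMS = ["otitis media", "ear infection", "glue ear", "inflammation"]
MEI_PATTERN = re.compile("|".join(re.escape(t) for t in MEI_TERMS), re.IGNORECASE)


def find_mei_terms(note: Optional[str]) -> List[str]:
    """Return the distinct MEI search terms found in a free-text note."""
    if not note:
        return []
    return sorted({m.group(0).lower() for m in MEI_PATTERN.finditer(note)})


def screen_record(note: Optional[str]) -> str:
    """Hypothetical triage label; NOT the final present/absent/missing grouping."""
    if note is None or not note.strip():
        return "no text to review"
    return "term found" if find_mei_terms(note) else "no term found"


# Illustrative (invented) note text, not taken from any patient record.
print(find_mei_terms("History of Glue Ear and recurrent otitis media as a child"))
print(screen_record("No history of ear infection"))  # negation is not handled
```

A screen of this kind would only narrow down which notes a reviewer reads first; assigning a record to the present, absent or missing group remains a judgement on the full note.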

Indirect documentation of middle ear inflammation
In addition to the direct recording of middle ear inflammation, indirect indicators of previous MEI were also extracted. Whilst indirect documentation of historic MEI suggests previous disease, this does not necessarily mean the information has been considered during the assessment phase. Explicit data recording is vital in the construction of large datasets and therefore direct recording of MEI and indirect indicators of MEI have been separated. Separation of these two types of MEI documentation gives important insight into the accuracy of direct MEI documentation and its future research utility. The indirect indicators of MEI included otology history, the results of otoscopy and tympanometry, reports from cross-sectional imaging including CT and MRI of temporal bones, and previous otology surgery. Indirect otology history suggestive of previous MEI included pathologies which commonly involve or are caused by middle ear inflammation. The conditions included are not all inflammatory in their pathophysiology but predispose individuals to further infections. These included ossicular chain damage, previous tympanic membrane perforations, otosclerosis, and cholesteatoma.
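As an illustration of how the four indirect indicator categories could be set against the direct MEI status, the following Python sketch collapses the indicators into a single 'any indirect evidence' flag and counts records by direct status. The `EarRecord` structure, its field names and the example records are assumptions made for this sketch; the study itself used a spreadsheet.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EarRecord:
    """Hypothetical per-record structure; field names are illustrative."""
    direct_status: str      # "present", "absent" or "missing"
    otology_history: bool   # e.g. perforation, cholesteatoma, otosclerosis
    examination: bool       # otoscopy or tympanometry findings
    imaging: bool           # CT or MRI report of the temporal bones
    surgery: bool           # previous otology surgery

    def any_indirect(self) -> bool:
        return any((self.otology_history, self.examination, self.imaging, self.surgery))


def cross_tabulate(records: Iterable[EarRecord]) -> Counter:
    """Count records by (direct status, any indirect evidence)."""
    return Counter((r.direct_status, r.any_indirect()) for r in records)


def percent_with_indirect(records: Iterable[EarRecord], status: str) -> Optional[float]:
    """Percentage of a direct-status group with at least one indirect indicator."""
    group = [r for r in records if r.direct_status == status]
    if not group:
        return None
    return 100.0 * sum(r.any_indirect() for r in group) / len(group)


# Invented example records, for illustration only.
example = [
    EarRecord("absent", False, True, False, False),
    EarRecord("absent", False, False, False, False),
    EarRecord("missing", True, False, False, False),
    EarRecord("present", False, False, False, False),
]
print(cross_tabulate(example))
print(percent_with_indirect(example, "absent"))
```

Keeping the direct status and the indirect flag as separate fields, rather than merging them into one 'MEI' variable, preserves the distinction the text draws between explicit recording and inferred evidence.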

Data analysis
The data in this study were analysed in Microsoft Excel 2016. Results from left and right ears have been combined where appropriate, and the adult and paediatric cohorts of CI users have been analysed and reported separately. The primary analysis for this study was the prevalence of directly recorded historic or active MEI, of no history of MEI, and of cases where MEI data were missing, i.e. not documented, from the records (as shown in Fig. 1B). Secondary analysis compared the incidence of indirect evidence of MEI in those with a history of MEI, those with no history of MEI and those with undocumented or missing MEI data (Fig. 1C and D). The schematic shown in Fig. 1 was produced in Adobe Illustrator 2022. Data visualisations in Figs. 2 and 3 were produced using Tableau 2021.2. The proportions of each cohort with direct and indirect evidence of MEI, no history of MEI and undocumented MEI have been presented in waffle plots. Waffle plots facilitate rapid visualisation of the study results. Each waffle plot consists of a 10 × 10 grid with each square representing 1% of the respective group. An initial view of a waffle plot allows the reader to interpret proportions of the whole study group based on the size of each coloured area; the size of each subpopulation (denoted by colour) can be found by counting the squares of that colour.
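Because each square of a 10 × 10 waffle plot represents 1%, percentages with decimal places have to be rounded so that the squares still total 100. The sketch below shows one way this could be done in Python, using largest-remainder rounding and a plain-text rendering. It is an assumption for illustration: the figures in this study were produced in Tableau, and the rounding rule applied there is not stated. The function names are hypothetical; the three percentages are those reported for direct MEI recording in the adult cohort.

```python
import math


def waffle_cells(percentages):
    """Allocate the 100 squares of a 10 x 10 waffle plot to groups.

    Largest-remainder rounding: floor every percentage, then give the
    leftover squares to the groups with the largest fractional parts.
    """
    cells = {name: math.floor(pct) for name, pct in percentages.items()}
    leftover = 100 - sum(cells.values())
    if not 0 <= leftover <= len(cells):
        raise ValueError("percentages should sum to approximately 100")
    by_fraction = sorted(
        percentages, key=lambda name: percentages[name] - cells[name], reverse=True
    )
    for name in by_fraction[:leftover]:
        cells[name] += 1
    return cells


def render_waffle(cells, symbols, width=10):
    """Render the allocation as rows of single-character squares."""
    flat = "".join(symbols[name] * count for name, count in cells.items())
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))


# Direct MEI recording in the adult cohort, as reported in the Results.
adult = {"present": 24.3, "absent": 60.4, "missing": 15.3}
cells = waffle_cells(adult)
print(cells)
print(render_waffle(cells, {"present": "P", "absent": "A", "missing": "M"}))
```

With these inputs the floors sum to 99, so one extra square goes to the group with the largest fractional part. Counting squares in a waffle plot therefore recovers each proportion only to the nearest percentage point.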

Results
This study reviewed the records of 664 CI users who were implanted between 2014 and 2020, of which 594 individuals met the inclusion criteria. This study population represents around a third (35%) of people under the care of the centre. Of the CI user records reviewed, 457 were implanted as adults and 137 were children. Seventy CI patients were reviewed but excluded. Thirty-two CI users were implanted at another centre and 38 CI user records were inaccessible. In the adult cohort, the median age of profound/severe hearing loss was 28 years (IQR: 48 years) with a median duration between diagnosis and implantation of 30 years (IQR: 26 years). The median age of implantation in the adult cochlear implant cohort was 61 years old (IQR: 27 years). The median age of severe/profound hearing loss diagnosis in the paediatric cohort was 0 years (IQR: 0 years) with the median duration of deafness being two years (IQR: three years). The median age of implantation of the paediatric cohort was two years old (IQR: four years).
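The summary statistics above are medians with interquartile ranges (IQR) reported as a single width. A minimal Python sketch of that calculation follows; the ages are invented for illustration, and the 'inclusive' quartile convention is an assumption, since the convention used in the study's spreadsheet is not stated and different conventions can give slightly different IQRs.

```python
from statistics import median, quantiles


def median_and_iqr(values):
    """Return (median, IQR width) using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


# Invented ages at implantation, for illustration only.
ages = [1, 2, 3, 4, 5]
print(median_and_iqr(ages))
```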

Adult cochlear implant users
Middle ear inflammation (MEI) status was directly recorded in 84.7% of adult CI users' records (Fig. 2A). Within this, 24.3% of CI users had a positive history of MEI and 60.4% were reported as having no medical history of MEI. MEI history was missing in a notable proportion of CI users (15.3%).
The prevalence of MEI within the adult cohort of CI users remains relatively unchanged between 2014 (26.2%) and 2020 (20.0%) (Fig. 2B). The proportion of individuals whose MEI histories are missing has decreased (42.6% in 2014 and 8% in 2020).
Of the 457 adult CI users included in this study, 71.9% had post-lingual hearing loss compared with 28.1% with pre-lingual hearing loss. The presence of MEI and recording practices were comparable in both groups. MEI was present in 27.6% of CI users with pre-lingual hearing loss compared to 22.8% with post-lingual hearing loss. MEI history was missing in 15.8% and 15.4% of CI users, respectively.
Each case with a positive history of MEI was identified as having either a single episode of inflammation, recurrent episodes of inflammation, or chronic inflammation (Fig. 2C). The largest group was that with recurrent inflammation (22.5%), with a minority documented as having single episodes or chronic inflammation (0.9% in both groups). It was not possible to identify the frequency of inflammation in 75.7% of CI users with a history of MEI. Indirect indicators of middle ear inflammation were also reviewed. These indicators included past otology history, examination findings (otoscopy and tympanometry), cross-sectional imaging, and previous otology surgery. Indirect indicators of MEI were also evaluated in the groups with no MEI history and those individuals whose MEI history was missing. In those with no history of MEI, there were indirect indicators suggestive of MEI: otology history in 11.2%, examination history in 18.8%, imaging results in 7.6% and surgical history in 8.33%. Collectively these indicators suggested a history of MEI in 32.2% of CI users with no documented history of MEI. In the group whose MEI history was missing, indirect indicators were suggestive of MEI in 22.9%, 17.1%, 12.9%, and 10.0%, respectively. Overall, indirect indicators suggested a history of MEI in 37.1% of these CI users. Finally, the influence of previously recorded MEI on the choice of which ear to implant was considered. In the adult cohort undergoing unilateral cochlear implantation, 80% of CI users (four individuals) had their implant in the ear with historic MEI compared to 20% (one individual) implanted in the contralateral unaffected ear.
Figure 1 Cochlear implant user inclusion criteria and data collection and analysis method. (A) Flowchart illustrating the inclusion and exclusion criteria for cochlear implant (CI) patients in the present study. Of 1690 CI patients, 664 users were implanted between 2014 and 2020. Seventy CI patients were excluded: 32 patients were implanted elsewhere and 38 patients were excluded because their records were inaccessible. (B-D) The flowchart goes on to illustrate the data collection and primary and secondary analytical process. (B) The patient demographic and deafness data were analysed for key terms to indicate the direct recording/indicators of middle ear inflammation (MEI). Patients were grouped into those with a history of MEI present, those with a history of MEI absent and those with MEI history missing. (C) The data were analysed for indirect recording/indicators of MEI such as otology history, otoscopy and tympanometry, cross-sectional imaging, and history of otology surgery. (D) For secondary analysis, the incidence of indirect recording/indicators of MEI was compared in those with direct recording of MEI, those with a history of MEI absent and those with MEI history missing. Comparing the direct and indirect recording of MEI allowed us to comment on the under-reporting of MEI in the CI patient cohort.
Figure 2 (caption fragment) The analysis separates those who have direct recording of previous MEI, no history of MEI and whose MEI history is missing. Each waffle plot shows the proportion of CI users whose records were suggestive of MEI, were not suggestive of MEI and who had no documented information.
Figure 3 (caption fragment) The analysis separates paediatric CI users who have direct recording of previous MEI, no history of MEI and whose MEI history is missing. Each waffle plot shows the proportion of CI users whose records were suggestive of MEI, were not suggestive of MEI and who had no documented information.

Paediatric cochlear implant users
Within the paediatric cohort of CI users, a history of MEI was directly recorded in 40.9% of CI users, nearly double the prevalence of the adult population (see Fig. 3A). The absence of MEI history was reported in 36.0% of users. The number of CI users whose MEI history was missing was smaller in the paediatric cohort at 13.1%.
Paediatric reporting of MEI has changed over time (see Fig. 3B). There has been a gradual reduction in the prevalence of MEI and simultaneously the number of children with documented evidence of no MEI has significantly increased. As a result, the proportion of paediatric CI users with missing MEI histories has fallen from around 47% to 0%. This change in practice may be due to the introduction of a new paediatric questionnaire completed by, or for, CI candidates at their initial assessment.
Most of the paediatric cases, 96.4%, had pre-lingual hearing loss compared to 3.6% with post-lingual hearing loss. A history of MEI was documented in 40.2% of children with pre-lingual hearing loss compared to 60% of those with post-lingual hearing loss. MEI history was missing for 12.9% and 20% of CI users in each group, respectively.
No paediatric CI users were documented as having a single episode of MEI. Recurrent MEI was recorded in 26.8% of children and chronic MEI in 7.1%, higher than in their adult counterparts. In 66.1% of children, MEI history was not documented in sufficient detail to categorise; this is similar to findings in the adult cohort (see Fig. 3C).
The same indirect indicators of MEI were reviewed for the paediatric cohort and suggested the incidence of MEI is higher than directly recorded (see Fig. 3D). Active MEI in children is likely to result in postponement of implantation. Evaluating the effect of a history of MEI on the choice of ear for cochlear implantation was more difficult in the paediatric cohort as children typically undergo simultaneous bilateral implantation. One paediatric patient with unilateral MEI was implanted bilaterally. The single child with unilateral MEI and a single implant was implanted in the ear without MEI.

Discussion
This study presents findings from 594 cochlear implant users. It demonstrates that the data held by cochlear implant centres are relatively complete, even when looking at factors which do not weigh heavily on implant candidacy. Direct reporting of historic middle ear inflammation is missing in a small minority of patients and the frequency of documenting MEI has improved over time. In contrast, indirect factors suggestive of MEI highlight potential instances of inaccurate data recording, and when attempting to extract more detail, such as the chronicity of MEI, the data were frequently not available.
Missing MEI histories make it difficult to report the overall prevalence of MEI in adults undergoing cochlear implantation. It is likely much higher than the 24.3% of adult CI users with direct recording of MEI presented in this study. This is supported by indirect indicators of MEI with one-third of those with a documented absence of MEI and those with an undocumented MEI history having some evidence of previous middle ear inflammation.
Direct recording in the present study identifies 40.9% of children undergoing cochlear implantation have historic or active MEI. Epidemiological studies estimate that over 60% of children have had at least one episode of acute otitis media by the age of 3 years old and a study looking at the use of grommets in the peri-implantation period found 42% of children undergoing implantation had an active middle ear effusion (Kaur et al., 2017;Papsin et al., 1996). This suggests active or historic MEI is also under-reported in the paediatric cohort. Indirect measures of middle ear inflammation support this. Grommet insertion is only recommended by the UK's National Institute for Health and Care Excellence (NICE) for otitis media with effusion (NICE, 2020). Consequently, the 12 cases (19%) of grommet insertion amongst those documented as having no history of middle ear inflammation and the 4 (22%) with missing MEI history, almost certainly represent missed cases of otitis media. The results of this study demonstrate under-reporting of middle ear inflammation in both adult and paediatric cohorts.
In adults, we found that 64.8% of patients with a recorded history of MEI also had indirect factors suggestive of MEI in their records. However, the remainder of this group (35.2%) did not have indirect factors suggestive of MEI in their record. This highlights possible false positives and inaccuracies in the data collection and recording. More extensive analyses, such as follow-up of each case to determine the ground truth of each patient's history of MEI, were outside the scope of this study.
Analysis of electronic patient records has revealed some missing data and data that have been recorded inaccurately, and has emphasised difficulties in accessing patient records, particularly older clinical records. The reasons for this are likely multi-factorial, with some incorrect information reported by CI candidates as well as omissions and inaccuracies in documentation during consultations and the implementation of EPRs and improved data structures. This highlights challenges around CI users' medical records and the future application of data science methodologies.
Although this study has not investigated the complications or impact of MEI on candidate selection, many individuals undergoing cochlear implantation have either concurrent, or a history of, otitis media. While otitis media is not currently considered clinically important with regards to implant outcomes, there is reason to think it may have value as a prognostic factor. The majority of the existing literature focuses on surgical technique and short-term complications associated with otitis media (Lee and Bance, 2019). Otitis media may also have an impact on long-term implant performance. Otitis media begins with bacterial/viral infection, which subsequently causes acute middle ear inflammation. This acute reaction can be followed by more prolonged activation of macrophages in the middle and inner ear (cochlea). The cochlea contains a population of resident, long-lived, tissue macrophages (Liu et al., 2018;Nadol et al., 2014). Activated resident tissue macrophages have the capacity to mount exaggerated inflammatory responses to subsequent immune insults (Cunningham et al., 2005;Moreno et al., 2011;Neher J, 2019). Insertion of a CI electrode array causes a tissue response and inflammation within the cochlea (Eshraghi and Van De Water, 2006;Nadol et al., 2014;Seyyedi and Nadol, 2014;Simoni et al., 2020). This inflammatory response to cochlear implantation has been shown to affect residual hearing and cochlear implant performance, and in some cases is associated with electrode extrusion (O'Leary et al., 2013;Wilk et al., 2016;Nadol et al., 2008;Ho et al., 2007). The typical inflammatory response to implantation is characterised by the development of a fibrous sheath around the electrode. This raises the likelihood that long-lived macrophages in the cochlea may contribute to outcomes after cochlear implantation (Hough et al., 2021;Wilk et al., 2016).
How macrophage responses to middle ear inflammation interact with or alter the response to cochlear implantation, and hence hearing outcomes, remains unknown; however, approaches that enable the study of outcomes where MEI is accurately recorded may lead to new insight.
Comprehensive health records may lead to the identification of novel potential prognostic factors for long-term implant performance from data not currently considered important. These data must be accurate, complete, and appropriately structured in electronic patient records to facilitate research to improve our understanding of hearing outcomes after implantation (Ghafur et al., 2020). This study evaluated the electronic medical records held by the centre; paper records were not examined but may contain further information. Medical records which have not been digitised contain data that have not been curated in a suitable format for data science. Healthcare providers must address these issues to facilitate research and the development of data science-powered clinical tools. This digitisation must consider future applications and use structured data formats. Where data are incomplete there is potential to fill gaps through increasing medical record availability and sharing. At this time, there continues to be a lack of health record sharing between care providers, e.g. primary and secondary care, following the collapse of a number of national initiatives and damage to public confidence (Perrin, 2016;Van Staa et al., 2016). Stakeholders need to come together to address public concerns around data privacy and the use of health data in research. Patient and public involvement and engagement (PPIE) groups will be vital in this process. PPIE groups consisting of CI users, their carers, families, and those in the D/deaf community who do not use CIs should be invited to engage in these issues. Moreover, to produce data sets large enough for data science, implant centres will need to collaborate to share data whilst individually ensuring their own documentation is fit for purpose and managed in line with governance requirements. Failing to keep good records beyond the scope needed for routine clinical practice (Porter et al, 2019).
The British Cochlear Implant Group is in the process of developing a hearing implant registry (British Cochlear Implant Group, 2019). Whilst in its infancy, if well designed this has the potential to generate a collaborative dataset for research and for well-stratified clinical trials. The findings on the recording of middle ear inflammation in this study have demonstrated the completeness, and therefore the opportunity for future analysis, of data not considered of significant clinical importance at the time of documentation. Clinical registries are uniquely placed to collect and curate data which may be of future interest. The national hearing implant registry is therefore urged to ensure a broad range of data is collected to maximise the registry's potential as a research database. This can be facilitated by low-burden data upload for clinical centres and data formats that are interoperable and that can be readily linked with other electronic health records. If the issues discussed in the present study are addressed, there is potential to use routinely collected data to perform novel and high-powered studies in the cochlear implant field. Despite the challenges highlighted, we predict that new prognostic factors for hearing outcomes with cochlear implants will be identified that enable the development of validated predictive models (Velde et al., 2021) for implant outcomes, inform candidate and device selection and direct future clinical trials for improved patient benefit.

Conclusion
This study highlights that the data held within CI users' clinical records are largely complete, making them suitable for data science research into outcomes with a cochlear implant. The scale of these studies will be small compared to recent work looking at health outcomes through analysis of health records, but this offers a step change in a field where studies still frequently include fewer than 200 subjects (Bhaskaran et al., 2021). The records from this centre indicate that it may be possible to apply data science methodologies to data that were not considered important to outcome at the time of documentation, for example to examine the consequences of otitis media for implant performance. As new research methodologies using data science and machine learning algorithms are applied to cochlear implantation, accurate and accessible clinical records will become a valuable resource for developing patient care. As the UK cochlear implant registry moves from concept to reality, there is an opportunity for the British cochlear implant community to lead globally on data collection and curation in cochlear implantation and on our understanding of CI prognostic factors (British Cochlear Implant Group, 2019).

Disclaimer statement
Funding
Callum Findlay, Academic Clinical Fellow, is funded by a National Institute for Health Research (NIHR) Academic Clinical Fellowship with the University of Southampton alongside Health Education England. The views expressed in this publication are those of the authors and not necessarily those of the NIHR, University of Southampton, NHS or the UK Department of Health and Social Care. Kate Hough acknowledges the EPSRC for PhD studentship funding and additional support from Oticon Medical. This work was also supported by funding to Tracey Newman from the Web Stimulus Fund, University of Southampton.
Data availability statement
Data for this study include protected health information. Currently, no data are available for sharing. With appropriate data use agreements, data can potentially be made available. Please contact the corresponding author for further details.

Notes on contributors
Callum Findlay is an NIHR Academic Clinical Fellow in Otolaryngology at the University Hospital Southampton and the University of Southampton. Callum is developing a programme of research using real-world data to identify and analyse prognostic factors associated with outcomes after cochlear implantation. Matthew Edwards is a medical student at the University of Southampton who has evaluated the completeness of clinical data recording of measures that are not associated with direct clinical care as part of his MMedSc. Kate Hough is a doctoral research fellow at the University of Southampton studying how the immune system interacts with cochlear implants and how this affects outcomes. Mary Grasmeder is a Senior Clinical Scientist with an interest in research relating to cochlear implants, in particular diagnosis of issues affecting device performance. Tracey Newman is an associate professor in neuroimmunology. Her research is developing mechanistic understanding of prognostic factors of poorer hearing outcomes after cochlear implantation.