OpenAI is a non-profit artificial intelligence research company founded in 2015.(4) One of OpenAI’s most notable projects, ChatGPT, is a web application that is powered by a state-of-the-art LLM called Generative Pretrained Transformer (GPT).(5) ChatGPT functions as an intelligent chatting bot built upon its various language understanding mechanisms, such as multilingual machine translation, code debugging, story writing, mistake correction, and identification and rejection of inappropriate requests. These mechanisms allow users to input specific prompts and receive detailed responses.(6) GPT’s impressive performance in various tasks is possible due to its pre-training process, where the model is trained with large amounts of structured and unstructured data from books, articles, reviews, and online conversations.
This extensive pre-training process ultimately separates GPT from previous LLMs that failed to interpret the context of a given input and produce relevant output.(5) GPT’s ability to derive context from data without needing domain knowledge from medical experts can be utilized to extract relevant information from clinical notes. Electronic health records (EHR) are clinical patient records that contain medical information such as vitals, lab results, medical history, and clinical notes from providers. The efficient transmission and analysis of EHR data between various providers improve clinical care quality considerably.(7) The two primary methods to automate ICD code assignment to text-free clinical notes are rule-based systems and learning-based systems. The former depends on the manual intervention of medical professionals, thus limiting the scale by which the process can be optimized. The latter does not require manual manipulation and relies on learning algorithms to extract meaningful underlying distributions in datasets.(8) OpenAI’s GPT model is an example of a learning-based system that can be trained to assign ICD codes to clinical notes using datasets prelabelled with ICD codes. Fine-tuning is the process of providing the pre-trained model with a smaller, specialized dataset for further training. Considering the model possess contextual knowledge from the larger corpus it was originally trained on, it is able to derive important insights from the smaller but more specific dataset and potentially, significantly improve the performance of the model on a specialized task.(9) By using clinical training data and fine-tuning the GPT model using OpenAI’s platform, one can assess the effectiveness of using GPT for ICD code assignment based on clinical notes.
Electronic health records (EHR) are clinical patient records that contain medical information such as vitals, lab results, medical history, and clinical notes from providers. The efficient transmission and analysis of EHR data between various providers improve clinical care quality considerably.(7) GPT’s ability to derive context from data without needing domain knowledge from medical experts can be utilized to extract relevant information from clinical notes. The two primary methods to automate ICD code assignment to text-free clinical notes are rule-based systems and learning-based systems. The former depends on the manual intervention of medical professionals, thus limiting the scale by which the process can be optimized. The latter does not require manual manipulation and relies on learning algorithms to extract meaningful underlying distributions in datasets.(8)
This extensive pre-training process ultimately separates GPT from previous LLMs that failed to interpret the context of a given input and produce relevant output.(5) OpenAI’s GPT model is an example of a learning-based system that can be trained to assign ICD codes to clinical notes using datasets prelabelled with ICD codes. Fine-tuning is the process of providing the pre-trained model with a smaller, specialized dataset for further training. Considering that the model possesses contextual knowledge from the larger corpus on which it was originally trained, it is able to derive important insights from the smaller but more specific dataset and potentially to improve the performance of the model on a specialized task (9) By using clinical training data and fine-tuning the GPT model using OpenAI’s platform, one can assess the effectiveness of using GPT for ICD code assignment based on clinical notes.
Literature Review
AI in Healthcare
Artificial intelligence in healthcare can potentially lower healthcare costs and improve outcomes. An estimate is savings of $150 billion in the United States healthcare industry by 2026.(10) AI has found its place in healthcare as robotic-assisted surgical systems, virtual nurse assistants, medications management, medical diagnostics, and so on.(10) This paper will discuss the potential of a specific AI model, GPT-3.5 Turbo for ICD code assignment.
Challenges of ICD Codes Implementation
There are several costs associated with the use of ICD codes, especially during a time of transition (from ICD-9 to ICD-10 in 2015 or currently from ICD-10 to ICD-11). One survey of 6000 medical centers found the average time spent on staff education was 61.2 hours for small, and 139 hours for medium-sized practices, and for physician education was 35.6 hours and 75.1 hours, respectively.(11) The average cost of the ICD-10-CM implementation in the United States was between $6,748 to $9,564 for a small medical practice and between $14,577 to $23,062 for a medium-sized medical practice. These costs included software updates, staff education, and EHR quality assurance projects.(11) Furthermore, the transition from ICD-10-CM code to the new ICD-11 system will take time. One 2021 study found that approximately 23.5% of ICD-10-CM codes only could be fully represented by a single ICD-11 stem code without the need for combining multiple codes and, if necessary, introducing new stem codes.(12) Most studies focus on the financial and time burden, but less research is available on the emotional stress within the healthcare system surrounding ICD coding. In clinical practice, many clinicians are frustrated by the emphasis on medical coding. One 2015 survey found that over 85% of surveyed clinicians said ICD-10 diverts focus from patient-centered care and more towards insurance and billing.(13)
Coding Errors
Insurance companies and Medicare use these codes in the diagnosis-related group’s payment system (DRG) to determine payments to hospitals.(14) Correct coding of patient encounters is exceedingly important, and failure to correctly code can have several financial and even legal repercussions for a medical practice. Some of the most common errors include upcoding (reporting that a provider spent more time with a patient than in reality), selecting the wrong procedure code, and using dated coding term instead of updated ones.(15) In the U.S., the quality of the coding process has been questioned by many studies showing there is significant room for improvement. One study by the National Academy of Medicine on the reliability of hospital discharge coding showed that only 65% agreed with independent re-coding.(16) Hsia, et al revealed a coding error rate of 20%.(17) Other similar studies showed a typical error rate of 25–30%, with low agreement between coders.(18) A report analyzing the previous ICD-9-CM codes estimated that the cost of correcting wrong codes in the U.S. was upwards of $25 billion per year. Manual coding for diverse disease etiologies, pathologies, clinical manifestations, and treatment plans is not only prone to errors but is also time-consuming and inefficient.(19)
To overcome the challenges associated with using ICD code assignments and implementation, AI seems to offer a viable solution.
AI for Assigning ICD Codes
A review of 1611 publications with automated coding from 1974–2020 found a significant increase in AI-based coding publications after 2009, with Natural Language Processing (NLP) and Machine Learning (ML) as the most used methodologies for automated coding.(20) An example is the successful collaboration between a Clinical Documentation Integrity Specialist and an embedded Computer Assisted Coding (CAC) system.(21) The ICD provides a taxonomy of classes, representing various conditions addressed at an episode of care for a patient, as presented in clinical documentation. Considering clinical documentation consists of unstructured textual data, and a single note can have multiple ICD codes assigned to it, it can thus be treated as a multi-label classification problem.(22) Deep learning-based methods have outperformed other conventional models in ICD codes assignment.(23) A systemic review of studies from 2010 to 2021 provided an overview of automatic ICD coding assignment systems that utilized NLP, machine learning, and deep learning techniques, and concluded that deep learning models were found to be better than other traditional machine learning models when automating clinical coding systems.(24)
Utilizing NLP techniques such as Word Embedding (a representation of words and phrases by vectors in a low-dimensional space such that it retains semantic and syntactic information) and a Convolutional Neural Network model (a deep learning algorithm that captures hierarchical patterns in textual data utilizing convolutional layers), another study processed 21,953 clinical records from five departments, significantly enhancing the accuracy of automated ICD-10 code predictions and potentially easing the manual coding process for physicians.(25) A similar study analyzed the use of a natural language processing-bidirectional recurrent neural network (NLP-BIRNN) algorithm to optimize the medical records and identified areas of error by medical coders. NLP-BIRNN is a deep learning algorithm that processes sequences of text in both forward and reverse directions, thus retaining contextual information from both past and future states. NLP-BIRNN reduced errors in the assignment of principal diagnosis and ICD coding.(26) The introduction of transformers (deep learning models that rely on self-attention mechanisms, processing entire sentences simultaneously, rather than word-by-word, being a lot more efficient at retaining context and thus exceptional at linguistic tasks) and Large Language Models (AI systems based on transformer architecture, trained on diverse language dataset to understand, generate and interact with human language at large) opened new doors. Publicly available systems like ChatGPT make these models available to the general public and it was only a matter of time before professionals started experimenting with these systems for healthcare applications. One study found that ChatGPT was able to generate at least one correct ICD code for an encounter 70% of the time.(27) Another compared off-the-shelf LLM models GPT-4, Llama-2 and a model specifically trained for ICD code assignments known as PLM-ICD and showed that PLM-ICD had a consistent accuracy of 22%, while GPT-4 accuracy was 22.5% as represented by F1-score.(28) The objective of the current project is to evaluate the precision of GPT to assign ICD-10 codes, and whether fine tuning can improve its performance.