Identification of kidney-related medications using AI from self-captured pill images

Abstract

Introduction: ChatGPT, a state-of-the-art large language model, has shown potential in analyzing images and providing accurate information. This study aimed to explore ChatGPT-4 as a tool for identifying commonly prescribed nephrology medications across different versions and testing dates.

Methods: Twenty-five nephrology medications were obtained from an institutional pharmacy. High-quality images of each medication were captured using an iPhone 13 Pro Max and uploaded to ChatGPT-4 with the query, 'What is this medication?' The accuracy of ChatGPT-4's responses was assessed for medication name, dosage, and imprint. The process was repeated after 2 weeks to evaluate consistency across different versions, including GPT-4, GPT-4 Legacy, and GPT-4o.

Results: ChatGPT-4 correctly identified 22 of 25 (88%) medications across all versions. However, it misidentified Hydrochlorothiazide, Nifedipine, and Spironolactone due to misread imprints. For instance, Nifedipine ER 90 mg was mistaken for Metformin Hydrochloride ER 500 mg because 'NF 06' was misread as 'NF 05'. Hydrochlorothiazide 50 mg was confused with the 25 mg version due to imprint errors, and Spironolactone 25 mg was misidentified as Naproxen Sodium or Diclofenac Sodium. Despite these errors, ChatGPT-4 showed 100% consistency when retested, correcting misidentifications after receiving feedback on the correct imprints.

Conclusion: ChatGPT-4 shows strong potential in identifying nephrology medications from self-captured images, though challenges with difficult-to-read imprints remain. Providing feedback improved accuracy, suggesting ChatGPT-4 could be a valuable tool in digital health for medication identification. Future research should enhance the model's ability to distinguish similar imprints and explore broader integration into digital health platforms.


Introduction
Patients in the modern age are more prone to medication errors because of the growing number of drugs produced by advances in drug research, the coexistence of multiple comorbidities, and the resulting complex drug regimens. One of the major contributing factors to medication error is polypharmacy, defined as receiving more than five drugs [1][2][3][4]. The prevalence of polypharmacy increased significantly from 8.2% in 1990 to 17.1% in 2018 [1][2][3][4]. Polypharmacy has been negatively associated with patient safety, including medication errors, nonadherence, adverse drug events, drug-drug interactions, hospitalizations, and mortality [4][5][6][7][8][9][10]. Patients with chronic kidney disease (CKD) represent a cohort at the highest risk of polypharmacy due to CKD-associated multiple comorbidities, increased daily pill burden, and the complexity of medication management [10][11][12][13][14]. Prior studies have substantiated polypharmacy as a significant risk factor for the high reported prevalence of medical errors, at 68%, in nephrology [10].

An effective first strategy to manage polypharmacy and reduce medication errors for patient safety is medication reconciliation [15][16][17]. Medication reconciliation can be challenging, however, as patients often have several providers prescribing medications that may not be captured in a single medical record system, such as multi-department outpatient clinics, after-hospital discharge, over-the-counter medications, and complementary medicine or other supplements. Medication reconciliation can also be challenging for doctors and pharmacists when patients only provide pills without medical labels [18]. Patients recalling their home medications are also prone to errors due to look-alike or sound-alike (LASA) names [19]. The use of artificial intelligence (AI) may help in this challenging but important aspect of medical care, as AI has already been implemented to assist in drug counseling [20] and identifying drug-drug interactions [21,22].
ChatGPT, a large language model developed by OpenAI [23], has been increasingly integrated into healthcare. ChatGPT-4 displays more human-like versatility in conversing coherently on open-ended topics [24][25][26][27]. Since its release in 2022, ChatGPT has shown aptitude in assimilating both textual and visual inputs from complex prompts to produce responsive narratives. Potential uses for ChatGPT as a healthcare tool include refining healthcare providers' decision-making, providing patient information and education, clinical documentation, and research. In the context of nephrology, previous studies have demonstrated that ChatGPT could assist in providing personalized renal nutritional dietary advice and tailored CKD patient education [28][29][30][31]. Assessments across medical disciplines have also confirmed its ability to analyze images and radiographic studies to generate accurate diagnostic and management recommendations [32]. However, ChatGPT-4's reliability in identifying pharmacotherapy pertinent to specialized fields like nephrology remains unexplored.

This study aims to bridge this knowledge gap by exploring ChatGPT-4's accuracy in identifying medications commonly used in nephrology practice via self-captured images. Our primary objective was to evaluate the model's performance in drug identification. The secondary aims included assessing the consistency of ChatGPT-4's responses and its ability to integrate feedback to improve performance. Demonstrating ChatGPT-4's capabilities in this focused task could pave the way for larger research efforts on integrating AI to improve nephrology practice.
Methods

Image acquisition

To prepare the images, two identical sample pills of each selected medication were obtained from the central pharmacy. Each pair of pills was placed side by side on a neutral grey background, with one pill showing the front and the other showing the back, so that all possible identifying marks were captured. The images were taken with an iPhone 13 Pro Max in a well-lit environment, with the camera positioned approximately 15.24 cm (about 6 inches) above the pills. This distance ensured that the entire pill surface was within the frame while still capturing fine detail. The camera's maximum resolution of 4032 × 3024 pixels was used, which provided a pixel density of at least 300 pixels per inch (ppi), ensuring that the imprints and other distinguishing features of each medication were clearly visible.
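As a back-of-the-envelope check on the quoted pixel density, the effective resolution at the pill plane can be estimated from the capture geometry. This is a minimal sketch; the roughly 69° horizontal field of view assumed for the phone's main camera is an illustrative approximation, not a figure reported in the study:

```python
import math

def effective_ppi(pixel_width: int, distance_in: float, hfov_deg: float) -> float:
    """Estimate pixels per inch across the framed scene at the subject plane.

    hfov_deg is an *assumed* horizontal field of view for the camera; the
    framed width follows from simple trigonometry of the viewing cone."""
    framed_width_in = 2 * distance_in * math.tan(math.radians(hfov_deg / 2))
    return pixel_width / framed_width_in

# 4032 px across the frame, ~6 inches above the pills, assumed ~69 degree FOV.
ppi = effective_ppi(4032, 6.0, 69.0)
```

Under these assumptions the estimate lands well above the 300 ppi threshold stated above, consistent with imprints remaining legible at this capture distance.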

ChatGPT-4 queries
The procedure commenced with a specific instruction directing ChatGPT-4 to employ its browser capability to ascertain the identity of the medication, using the prompt: 'use browser to identify the answer of the following next prompt'. Following this directive, de-identified pill images were uploaded to ChatGPT-4. Each image was then queried with the text prompt: 'What is this medication?' Importantly, no additional information concerning the medication's group, clinical indication, or imprint code was provided alongside the queries.
ChatGPT-4's ability to identify the medication name, dosage, and pill imprint from the image was reviewed and assessed for accuracy by study investigators. If ChatGPT-4 did not accurately identify the medication because it misread the pill imprint, it was given feedback on the correct imprint and asked to generate another response using the prompt: 'The correct imprint for this medication is XXX; what medication is this?'. The entire image query process was repeated across multiple versions of ChatGPT-4, including GPT-4 (12/12/23, 12/30/23), GPT-4 Legacy (08/22/24, 08/29/24), and GPT-4o (08/22/24, 08/29/24), to assess the consistency and reliability of the AI responses over time.
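The queries above were issued through the ChatGPT web interface. For readers who want to reproduce the workflow programmatically, the image-plus-text prompt and the imprint-feedback follow-up can be sketched as chat message payloads. This is a hypothetical illustration: the function names and the use of an API-style payload (rather than the web interface the study actually used) are assumptions.

```python
import base64

STUDY_PROMPT = "What is this medication?"

def build_pill_query(image_bytes: bytes, prompt: str = STUDY_PROMPT) -> list:
    """Pair one pill image with the study's text prompt in a single user
    message. The data-URL/base64 image encoding is an illustrative choice."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
        ],
    }]

def build_feedback_query(correct_imprint: str) -> list:
    """Follow-up prompt used in the study when the model misread an imprint."""
    return [{
        "role": "user",
        "content": (f"The correct imprint for this medication is "
                    f"{correct_imprint}; what medication is this?"),
    }]
```

Note that, as in the study, the initial query deliberately carries no drug class, indication, or imprint hint, so the model must rely on the image alone.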

Statistical analysis
The primary outcome measure of this study was the accuracy of ChatGPT-4 in identifying the medications based on the provided images. Accuracy was calculated as the percentage of correctly identified medications out of the total number of medication images tested. A correctly identified medication was defined as an instance where ChatGPT-4 accurately identified the medication name, dosage, and imprint. The total number of medication images tested was 25, with each medication being tested once in the initial round and once in the repeat round after a 2-week interval.

Secondary outcomes included the consistency of ChatGPT-4's responses between the initial and repeat rounds, as well as the model's ability to improve its accuracy when provided with corrective feedback on misidentified medications. Consistency was assessed by comparing the responses between the two rounds of testing, while the effect of corrective feedback was evaluated by re-testing the misidentified medications after providing the correct imprint information.

The focus was on providing descriptive statistics to summarize ChatGPT-4's performance in identifying medications from images.
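The descriptive measures above reduce to simple proportions over per-medication verdicts. A minimal sketch follows; the function names and the dictionary encoding of the grading are illustrative assumptions:

```python
def accuracy_pct(results: dict) -> float:
    """Percent of medications for which name, dosage, and imprint were all
    judged correct (values are True/False per medication)."""
    return 100.0 * sum(results.values()) / len(results)

def consistency_pct(round1: dict, round2: dict) -> float:
    """Percent of medications receiving the same verdict in both rounds."""
    shared = round1.keys() & round2.keys()
    return 100.0 * sum(round1[m] == round2[m] for m in shared) / len(shared)

# Example mirroring the study: 25 medications, 22 correct, identical retest.
round1 = {f"med_{i}": i < 22 for i in range(25)}
round2 = dict(round1)  # the repeat round returned the same responses
```

With these inputs, `accuracy_pct(round1)` yields 88.0 and `consistency_pct(round1, round2)` yields 100.0, matching the figures reported below.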
Results

In the initial round of testing, all versions of ChatGPT-4 accurately identified 22 of 25 medications (88%) based on the provided images. The model correctly identified the medication name, dosage, and imprint for these 22 instances, but consistently misidentified Hydrochlorothiazide, Nifedipine, and Spironolactone due to errors in interpreting the imprints.

When re-tested after 2 weeks, both GPT-4 Legacy and GPT-4o provided identical responses to the initial round, demonstrating 100% consistency. The models again accurately identified 22 of 25 medications (88%) and made the same errors in misidentifying Hydrochlorothiazide, Nifedipine, and Spironolactone.
For extended-release Nifedipine 90 mg (Figure 1), ChatGPT-4 consistently misread the imprint 'NF 06' as 'NF 05' and incorrectly identified the medication as Metformin Hydrochloride ER 500 mg across all versions and dates. For Hydrochlorothiazide 50 mg (Figure 2), ChatGPT-4 misread the imprint 'TEVA 2089' as 'TEVA 2003', identifying the medication as Hydrochlorothiazide 25 mg in each instance. For Spironolactone 25 mg (Figure 3), the imprint '660' was consistently misread as '550', leading to identification of the medication as Naproxen Sodium 550 mg or Diclofenac Sodium 550 mg (Table 2).
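These one-character confusions ('NF 06' vs. 'NF 05', '660' vs. '550') can be made concrete with a small lookup sketch. The mini imprint table below contains only pairs drawn from this study's test set, and the fuzzy-match threshold and flagging logic are illustrative assumptions, not the mechanism ChatGPT-4 itself uses:

```python
import difflib

# Imprint-to-drug pairs taken from this study's results (illustrative subset).
IMPRINTS = {
    "NF 06": "Nifedipine ER 90 mg",
    "TEVA 2089": "Hydrochlorothiazide 50 mg",
    "TEVA 2003": "Hydrochlorothiazide 25 mg",
    "660": "Spironolactone 25 mg",
}

def identify(imprint: str):
    """Return (exact match or None, near-miss imprints). Near misses are
    flagged for human review, since a one-character OCR slip can map to an
    entirely different drug."""
    exact = IMPRINTS.get(imprint)
    near = [m for m in difflib.get_close_matches(imprint, list(IMPRINTS), cutoff=0.6)
            if m != imprint]
    return exact, near
```

For instance, looking up the misread 'NF 05' finds no exact entry but flags 'NF 06' as a near miss, which is precisely the kind of ambiguity a human check should resolve before the reading is trusted.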
After receiving corrective feedback on the misidentified medications, ChatGPT-4 accurately identified all three when re-tested with the correct imprint information. Despite the initial errors, the models' consistency in responses (88% accuracy) across multiple testing dates and versions highlights their reliability, though the difficulty in distinguishing similar imprints suggests the need for further refinement.

Discussion
ChatGPT-4 demonstrated consistent accuracy and reliability across multiple versions and testing dates, correctly identifying 88% (22 of 25) of the pills based solely on medication images and simple queries. The model provided comprehensive information, including each drug's name, dosage, and pill imprint. Despite its overall strong performance, the consistent misidentification of three medications (Hydrochlorothiazide, Nifedipine, and Spironolactone) across all versions highlights a potential limitation in the AI's ability to differentiate between similar imprints. These findings suggest that while ChatGPT-4 is a valuable tool for medication identification, further refinement is necessary to enhance its accuracy, particularly in distinguishing between medications with similar visual characteristics. This contrasts with a previous investigation by Benedict et al. [33], which reported that only 26% of responses from an earlier iteration of ChatGPT contained correct drug information. In our study, ChatGPT-4 demonstrated an 88% accuracy rate in identifying medications from images, a significant improvement over the 26% reported by Benedict et al. This notable difference may be attributed to advancements in AI technology and the specific versions of ChatGPT employed in each study. Benedict et al.'s research [33] utilized ChatGPT-3, which was limited to processing text-only inputs. Conversely, our study engaged ChatGPT-4, a more refined version that has demonstrated increased accuracy and a reduced propensity for generating incorrect information, or 'AI hallucinations' [34]. The model's enhanced capacity to understand context and provide more accurate and coherent responses contributes to its superior performance in tasks like medication identification. Furthermore, our methodology encompassed the use of both pill images and textual queries to prompt ChatGPT-4 for medication information. This dual-input approach represents a significant methodological advancement, potentially influencing the accuracy and applicability of AI in drug information retrieval. The incorporation of visual data alongside textual queries may account for the improved performance and reliability observed in our study, underscoring the rapid evolution and enhanced capabilities of large language models in healthcare applications.
In our study, we evaluated different dosages for specific medications, such as Hydrochlorothiazide, Nifedipine, and Spironolactone. The results showed that ChatGPT-4 had difficulty differentiating between these dosages. For example, Hydrochlorothiazide 50 mg (TEVA 2089) was confused with Hydrochlorothiazide 25 mg (TEVA 2003), and Nifedipine ER 90 mg (NF 06) was incorrectly identified as Metformin Hydrochloride ER 500 mg after its imprint was misread as 'NF 05'. Similarly, Spironolactone 25 mg (660) was misidentified as other medications such as Naproxen Sodium 550 mg (550), highlighting a challenge related to imprint recognition rather than dosage differentiation. These findings indicate that while ChatGPT-4 generally performs well, it struggles to distinguish between different dosages of the same medication, particularly when the imprints or visual features are similar. This underscores the need for further research to improve the AI's ability to accurately identify and differentiate between dosages in real-world clinical settings, ensuring that such tools can reliably support healthcare professionals in managing patient medications.
In this study, all incorrect responses from ChatGPT-4 were attributed to misinterpretation of pill imprints. This issue is commonly associated with look-alike drugs, a significant contributor to medication errors. To enhance patient safety and quality of care, the World Health Organization (WHO) and The Joint Commission (TJC) recommend the implementation of medication reconciliation at all transitional care stages [16,17]. Medication reconciliation involves identifying all medications that a patient is taking, which allows the detection of unintentional discrepancies and reduces medication errors [35]. While effective medication reconciliation is beneficial, it is also a time-consuming process that can be made more challenging in patients with polypharmacy or who have only their pills without accompanying drug labels. Our findings suggest that ChatGPT could assist pharmacists, healthcare professional teams, and patients in identifying unlabeled medications using only self-captured images. This could significantly decrease time and workload, allowing healthcare professionals to dedicate that time to other important patient tasks. However, while ChatGPT can serve as a tool for quickly identifying pills and assisting in medication reconciliation, human professional supervision remains necessary. The implementation of ChatGPT, along with human oversight, could enhance clinical care, reduce healthcare worker fatigue, and improve patient safety.
The accuracy of AI tools like ChatGPT-4 in identifying medications from images depends significantly on image quality. Clear pill imprints are crucial for correct identification; poor lighting, low resolution, or other quality issues can hinder both AI and human performance. As shown in Figure 2, the difficult-to-read imprint on Hydrochlorothiazide 50 mg led to misidentification by ChatGPT-4, highlighting a broader limitation when image quality is suboptimal. Future AI models should incorporate image enhancement techniques and feedback mechanisms to improve accuracy, making them more reliable in varied real-world conditions. Continued research is needed to train AI to handle poor-quality images, ensuring broader applicability and reliable support in medication management even when ideal image conditions are not met.

In addition, while this study shows ChatGPT-4's potential in identifying nephrology medications with high accuracy, it is important to note its limitations. The focus on commonly used nephrology medications and the reliance on high-resolution images under optimal conditions limit its generalizability. Future studies should include a wider range of medications and focus on improving AI performance in less-than-ideal image capture scenarios. This will help expand the AI's applicability and reliability across different clinical settings.
There are some limitations to this study. First, our study utilized only high-quality picture inputs for ChatGPT-4 queries and did not examine the model's ability to identify pills from images of varying quality. Real-world challenges such as poor lighting or obscured imprints may confound performance; thus, ChatGPT-4's accuracy with low-quality pill images may differ. Second, we used a limited subset of medications (Table 1) that are more commonly utilized in the field of nephrology. We did not assess ChatGPT-4's ability to identify medications that are over the counter, vitamins, or commonly utilized in other medical or surgical specialties. Moreover, newer medications recently approved by the FDA, such as budesonide, finerenone, and tenapanor, were not included. Testing ChatGPT-4 on a larger and more diverse medication set could further establish generalizability. Additionally, in the real world, generic medications often vary in color, shape, size, and imprint depending on the manufacturer, which can affect the accuracy of AI-based identification tools like ChatGPT-4. Our study focused on a specific set of medications with single-brand representations, providing a controlled evaluation of the AI's capabilities. However, this approach does not capture the full variability seen in clinical practice. Different brands of the same generic medication could pose challenges for AI models, potentially leading to misidentifications. To address this, future research should include multiple brands and a broader dataset to assess how such variations affect AI accuracy.

Nevertheless, in this study we demonstrated ChatGPT-4's strong aptitude for identifying nephrology-related medications when presented with pill images. To our knowledge, this is the first assessment of ChatGPT-4's diagnostic capabilities within the context of nephrology practice. Across 25 medications commonly used to manage conditions such as hypertension, hyperlipidemia, anemia, bone mineral disease, and fluid overload, ChatGPT-4 demonstrated a high initial accuracy rate of 88% during the first round of testing. Despite this initial success, however, the model showed limitations in its reliability upon re-exposure, consistently misidentifying the same three drugs when tested 2 weeks later. This outcome refines our understanding of the model's ability to develop a 'visual-semantic' understanding through contextual learning. While ChatGPT-4 exhibited potential in recognizing and learning imprint patterns, its consistent accuracy over time for all tested medications was not as robust as initially observed. Our findings contribute to the evolving discussion on the capabilities and limitations of transformer-based language models in acquiring and applying a 'visual-semantic' understanding of images, underscoring the complexity of achieving consistent long-term identification accuracy. Expanding the scope of testing to include multiple brands and a broader dataset will enhance the generalizability and reliability of AI tools like ChatGPT-4 in real-world applications, ultimately improving patient safety and care. Future studies should continue to incorporate repeated testing under varied conditions to further validate the stability and accuracy of AI models in clinical applications. By doing so, we can better understand the strengths and limitations of these tools, ensuring that they are reliable and safe for use in healthcare settings.

Conclusion
ChatGPT-4 exhibited accurate and reliable responses when identifying medications from self-captured pill images. Difficult-to-read pill imprints compromise its accuracy, but performance can be further improved with corrective feedback. Our novel approach underscores ChatGPT-4 as a potentially valuable tool for pill identification, medication reconciliation, and patient safety in nephrology practice when supervised by a healthcare professional. Future work should focus on addressing the limitations identified in this study, such as improving the model's ability to distinguish between medications with similar imprints and enhancing its performance across a broader range of drug classes and dosages. Additionally, expanding the dataset used to train the model, particularly with images from various sources and under different conditions, could improve its robustness and generalizability. Moreover, integrating real-time feedback mechanisms could allow for continuous learning and adaptation, further improving accuracy over time.

Beyond medication identification, future research could explore the application of ChatGPT-4 and similar models in other areas of clinical decision support, such as drug-drug interaction checking, personalized dosing recommendations, and patient education. These advancements could significantly contribute to improving patient safety and the efficiency of healthcare delivery.

Figure 1. ChatGPT's interpretation of the image depicting Nifedipine extended-release 90 mg, incorrectly identified as Metformin Hydrochloride ER 500 mg.

Table 1. ChatGPT-4's responses to queries for medication identification based on images.