Abstract
Introduction
As part of routine safety surveillance, thousands of articles of potential interest are manually triaged for review by safety surveillance teams. This manual triage task is a strong candidate for automation: abundant process data is available for training, natural language processing algorithms perform well on this type of cognitive task, and few safety signals originate from literature review, giving the task a lower risk profile. However, deep learning algorithms introduce unique risks, and the validation of such models for use under Good Pharmacovigilance Practice remains an open question.
Objective
To qualify an automated, deep learning approach to literature surveillance for use at AstraZeneca.
Methods
This study is a prospective validation of a literature surveillance triage model, comparing its real-world performance with that of human surveillance teams working in parallel. The principal risk in modifying this triage process is missing a safety signal (a model false negative), so model recall is the main evaluation metric.
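As a minimal sketch (not the authors' code), recall for a literature-triage classifier can be computed as the fraction of truly relevant articles that the model flags for human review; a false negative is a relevant article the model would have screened out. The labels below are hypothetical.

```python
def recall(y_true, y_pred):
    """Fraction of truly relevant articles (label 1) the model flags for review."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed articles
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Hypothetical labels: 1 = article relevant to safety surveillance.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(recall(y_true, y_pred))  # 0.75 (one relevant article missed)
```

Note that false positives lower precision but only add review workload; false negatives are the safety-critical failure mode, which is why recall is the gating metric.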
Results
The model demonstrates consistent global performance from training through testing, with recall comparable to that of the existing surveillance teams. The model is accepted for use only for those products where non-inferiority to the manual process is rigorously demonstrated.
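A per-product non-inferiority claim of this kind can be illustrated with a one-sided test that model recall falls no more than a margin delta below human recall. This is an assumed, simplified analysis using a normal approximation for the difference of two proportions; the counts, margin, and alpha below are hypothetical, not the study's.

```python
import math

def non_inferior(k_model, n_model, k_human, n_human, delta=0.05, alpha=0.025):
    """One-sided test of H0: p_model - p_human <= -delta (model inferior).

    Returns True if H0 is rejected, i.e. model recall is non-inferior
    to human recall within margin delta at level alpha (here 0.025,
    z critical ~1.96; alpha is fixed in this sketch).
    """
    p_m = k_model / n_model
    p_h = k_human / n_human
    se = math.sqrt(p_m * (1 - p_m) / n_model + p_h * (1 - p_h) / n_human)
    z = (p_m - p_h + delta) / se
    return z > 1.959963984540054

# Hypothetical counts: model recalls 470/500 relevant articles, humans 480/500.
print(non_inferior(470, 500, 480, 500))  # True: within the 5-point margin
print(non_inferior(440, 500, 480, 500))  # False: deficit exceeds the margin
```

In practice a pre-specified margin, multiplicity correction across products, and exact or Wilson-type intervals would be expected; this sketch only shows the shape of the per-product decision rule.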
Conclusion
Characterizing model performance prospectively, under real-world conditions, allows us to thoroughly examine model consistency and failure modes, qualifying it for use in our surveillance processes. We also identify potential future improvements and recognize the opportunity for the community to collaborate on this shared task.
Acknowledgments
This work was supported by AstraZeneca colleagues including Alex Kiazand, David Greatrex, and Maria Lägnert Hammar and the authors thank Mark Cherry, Denise Baker, Arundhati Ghosh, Charles Lee, Mel Mistretta, and Ryan McGowan for their GVP and regulatory guidance. The authors also thank the US Food and Drug Administration Artificial Intelligence/Machine Learning working group for the opportunity to present this work and the helpful feedback received during its development. Finally, the authors thank the reviewers for their support and very helpful critique.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Funding
This study was funded by AstraZeneca.
Conflict of Interest
All authors are employees of AstraZeneca and may hold stock or stock options or restricted shares.
Availability of Data
The datasets generated or analyzed during the current study are available from the corresponding author on reasonable request; proprietary and/or sensitive safety data are not available for disclosure.
Ethical Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Code Availability
The authors regret that they are unable to share the model prediction software, as it depends on commercial services and proprietary source code.
Author Contributions
Conception and design: DC, RH, VP, AI, DD, and NS. Collection and assembly of data: JP and MD. Data analysis and interpretation: JP, MD, DC, AI, DD, and NS. Manuscript writing: JP, DC, RH, AI, DD, and NS. Accountable for all aspects of the work: All authors.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Park, J., Djelassi, M., Chima, D. et al. Validation of a Natural Language Machine Learning Model for Safety Literature Surveillance. Drug Saf 47, 71–80 (2024). https://doi.org/10.1007/s40264-023-01367-4