Abstract
Background: Pharmacovigilance data-mining algorithms (DMAs) are known to generate significant numbers of false-positive signals of disproportionate reporting (SDRs), using various standards to define the terms ‘true positive’ and ‘false positive’.
Objective: To construct a highly inclusive reference event database of reported adverse events for a limited set of drugs, and to utilize that database to evaluate three DMAs for their overall yield of scientifically supported adverse drug effects, with an emphasis on ascertaining false-positive rates as defined by matching to the database, and to assess the overlap among SDRs detected by various DMAs.
Methods: A sample of 35 drugs approved by the US FDA between 2000 and 2004 was selected, including three drugs added to cover therapeutic categories not represented in the original sample. We compiled a reference event database of adverse event information for these drugs from historical and current US prescribing information, from peer-reviewed literature covering 1999 through March 2006, from regulatory actions announced by the FDA and from adverse event listings in the British National Formulary. Every adverse event mentioned in these sources was entered into the database, even those with minimal evidence for causality. To provide some selectivity regarding causality, each entry was assigned a level of evidence based on the source of the information, using rules developed by the authors. Using the FDA adverse event reporting system data for 2002 through 2005, SDRs were identified for each drug using three DMAs: an urn-model-based algorithm, the Gamma Poisson Shrinker (GPS) and the proportional reporting ratio (PRR), using previously published signalling thresholds. The absolute number and fraction of SDRs matching the reference event database at each level of evidence were determined for each report source and each data-mining method. Overlap of the SDR lists among the various methods and report sources was tabulated as well.
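Of the three DMAs, the PRR is the simplest to illustrate. The sketch below computes it from a 2×2 contingency table of spontaneous reports, flagging an SDR using the commonly cited published thresholds (PRR ≥ 2, χ² ≥ 4, and at least three reports of the drug-event pair); the plain chi-squared statistic without continuity correction is a simplifying assumption here, and the counts are illustrative, not the study's data.

```python
def prr_signal(a, b, c, d, min_reports=3, prr_threshold=2.0, chi2_threshold=4.0):
    """Proportional reporting ratio from a 2x2 contingency table:
    a = reports of the drug with the event, b = reports of the drug without it,
    c = reports of the event with other drugs, d = all remaining reports.
    Returns (prr, chi2, is_sdr) using commonly published thresholds."""
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    # Pearson chi-squared for the 2x2 table (no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    is_sdr = a >= min_reports and prr >= prr_threshold and chi2 >= chi2_threshold
    return prr, chi2, is_sdr

# Illustrative counts (hypothetical, not from the study):
prr, chi2, flag = prr_signal(a=20, b=980, c=100, d=98900)
# prr = 19.8 here, well above the signalling threshold
```

GPS and the urn model are more elaborate (GPS shrinks observed/expected ratios toward a prior), but all three start from the same drug-event report counts.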
Results: The GPS algorithm had the lowest overall yield of SDRs (763), with the highest fraction of events matching the reference event database (89 SDRs, 11.7%), excluding events described in the prescribing information at the time of drug approval. The urn model yielded more SDRs (1562), with a non-significantly lower fraction matching (175 SDRs, 11.2%). PRR detected still more SDRs (3616), but with a lower fraction matching (296 SDRs, 8.2%). In terms of overlap of SDRs among algorithms, PRR uniquely detected the highest number of SDRs (2231, with 144, or 6.5%, matching), followed by the urn model (212, with 26, or 12.3%, matching) and then GPS (0 SDRs uniquely detected).
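The overlap tabulation above (SDRs uniquely detected by each algorithm) amounts to set differences over the per-algorithm SDR lists. A minimal sketch, with made-up drug-event pairs standing in for real SDRs:

```python
def unique_detections(sdr_sets):
    """For each algorithm, return the SDRs it detected that no other algorithm did."""
    result = {}
    for name, sdrs in sdr_sets.items():
        others = set().union(*(s for n, s in sdr_sets.items() if n != name))
        result[name] = sdrs - others
    return result

# Hypothetical SDR lists keyed by algorithm (illustrative only):
sdrs = {
    "PRR": {("drugA", "nausea"), ("drugA", "rash"), ("drugB", "dizziness")},
    "GPS": {("drugA", "rash")},
    "urn": {("drugA", "rash"), ("drugB", "dizziness"), ("drugB", "headache")},
}
uniq = unique_detections(sdrs)
# Here GPS detects nothing uniquely, mirroring the pattern reported in the Results.
```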
Conclusions: The three DMAs studied offer significantly different tradeoffs between the number of SDRs detected and the degree to which those SDRs are supported by external evidence. Those differences may reflect choices of detection thresholds as well as features of the algorithms themselves. For all three algorithms, there is a substantial fraction of SDRs for which no external supporting evidence can be found, even when a highly inclusive search for such evidence is conducted.
Acknowledgements
This work was funded by a grant from the Pharmaceutical Research and Manufacturers of America (PhRMA) to ProSanos Corporation. Dr Manfred Hauben, as a representative of the funding committee of PhRMA, participated in the design of the study, the interpretation of data and the editing of the manuscript. Alan Hochberg, Ronald Pearson, Donald O’Hara and Stephanie Reisinger are employees of ProSanos and their work was funded in part by the PhRMA grant. They were responsible for the design and conduct of the study, data collection management and analysis, and interpretation of the data. Dr Manfred Hauben is a full-time employee of Pfizer Inc. and owns stock in this and other pharmaceutical companies that may market/manufacture drugs mentioned in this article or competing drugs. David Goldsmith, Lawrence Gould and David Madigan participated as members of a project steering committee in the design and interpretation of the study and in the editing of the manuscript. They did not receive funding from PhRMA except for nominal reimbursement of incidental expenses related to the project. All authors participated in the preparation, review and approval of the manuscript. The authors thank Dr Lester Reich for his participation in the signal-adjudication process, and Dr Ivan Zorych of Rutgers University for the software implementation of the GPS algorithm. Patents are pending on technology discussed in this paper (rights assigned to ProSanos Corporation). The authors also thank the anonymous reviewers for many helpful comments received during the review of this manuscript.
Appendix A
Results of Inter-Rater Adjudication Study
An experiment was performed to assess the effect of inter-rater variability in the adjudication process on the number of SDRs detected for the various algorithms and report sources. Data-mining results for a subset of five drugs from the main study were selected and presented to three individuals for adjudication, as described in the Methods section. All three individuals had ≥2 years' experience in drug safety data mining. Results of the adjudication and scoring process were tabulated, and generalized linear models (GLMs) of the Poisson family were constructed with 'rater' in addition to 'algorithm', 'evidence level' and 'report source' as explanatory factors, where 'report source' is derived from the 'report source' field in the AERS database. In a baseline model, 'rater' was included as a non-interacting factor, which simply accounted for the overall difference in the number of SDRs available for scoring from the three adjudicators; in other words, a scale factor. In the full model, interactions between rater and the other variables ('reference-match', 'category', 'algorithm' and 'report source') were included.

The results of adjudication of reference event database entries by the three individuals are shown in Appendix table 1. Note that rater 3 chose not to assign any SDRs to the categories of 'confounding with demographic/clinical factors' or 'confounding with indication'. Results for the Poisson models are shown in Appendix figure 1. The interaction of 'rater' and 'report source' was non-significant. The interaction of 'rater' and 'algorithm' was statistically significant; the origin of this interaction is unknown, since the raters were blinded to algorithm. While statistically significant, this interaction accounts for a deviance of only 18.24, which is <0.1% of the total model deviance, and is thus of negligible magnitude. The conclusion is that inter-rater variability should have a negligible effect on conclusions regarding the various algorithms.
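The comparison above rests on deviance differences between nested Poisson models: the deviance attributable to an interaction term is the drop in residual deviance when it is added. A minimal sketch of the Poisson deviance statistic itself, with illustrative counts and fitted means rather than the study's data (a full analysis would use a GLM library such as statsmodels or R's glm):

```python
import math

def poisson_deviance(y, mu):
    """Poisson residual deviance: D = 2 * sum(y*log(y/mu) - (y - mu)),
    with the convention that y*log(y/mu) = 0 when y == 0."""
    d = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            d += yi * math.log(yi / mi)
        d -= yi - mi
    return 2.0 * d

# Hypothetical SDR counts per cell and fitted means from a model
# without a rater x algorithm interaction (numbers are made up):
counts = [5, 0, 3]
fitted = [4.0, 1.0, 3.0]
residual_deviance = poisson_deviance(counts, fitted)
# Adding the interaction would lower this; the difference in deviance
# (here, 18.24 in the study) measures the interaction's contribution.
```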
Cite this article
Hochberg, A.M., Hauben, M., Pearson, R.K. et al. An Evaluation of Three Signal-Detection Algorithms Using a Highly Inclusive Reference Event Database. Drug Safety 32, 509–525 (2009). https://doi.org/10.2165/00002018-200932060-00007