An Evaluation of Three Signal-Detection Algorithms Using a Highly Inclusive Reference Event Database

  • Original Research Article
  • Published in: Drug Safety

Abstract

Background: Pharmacovigilance data-mining algorithms (DMAs) are known to generate significant numbers of false-positive signals of disproportionate reporting (SDRs), using various standards to define the terms ‘true positive’ and ‘false positive’.

Objective: To construct a highly inclusive reference event database of reported adverse events for a limited set of drugs, and to utilize that database to evaluate three DMAs for their overall yield of scientifically supported adverse drug effects, with an emphasis on ascertaining false-positive rates as defined by matching to the database, and to assess the overlap among SDRs detected by various DMAs.

Methods: A sample of 35 drugs approved by the US FDA between 2000 and 2004 was selected, including three drugs added to cover therapeutic categories not represented in the original sample. We compiled a reference event database of adverse event information for these drugs from historical and current US prescribing information, from peer-reviewed literature covering 1999 through March 2006, from regulatory actions announced by the FDA and from adverse event listings in the British National Formulary. Every adverse event mentioned in these sources was entered into the database, even those with minimal evidence for causality. To provide some selectivity regarding causality, each entry was assigned a level of evidence based on the source of the information, using rules developed by the authors. Using FDA adverse event reporting system (AERS) data for 2002 through 2005, SDRs were identified for each drug using three DMAs: an urn-model-based algorithm, the Gamma Poisson Shrinker (GPS) and the proportional reporting ratio (PRR), using previously published signalling thresholds. The absolute number and fraction of SDRs matching the reference event database at each level of evidence were determined for each report source and data-mining method. Overlap of the SDR lists among the various methods and report sources was also tabulated.
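
For readers unfamiliar with the PRR, the disproportionality computation and the commonly cited Evans et al.[18] thresholds (PRR ≥2, chi-square ≥4, at least three reports) can be sketched as follows; the 2×2 counts here are invented for illustration and are not taken from the study.

```python
# PRR for one drug-event pair, from the 2x2 disproportionality table:
#                     event of interest   all other events
#   drug of interest        a                   b
#   all other drugs         c                   d
def prr(a, b, c, d):
    """Proportional reporting ratio."""
    return (a / (a + b)) / (c / (c + d))

def chi2_yates(a, b, c, d):
    """Yates-corrected chi-square for the same 2x2 table."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Invented counts for a hypothetical drug-event pair
a, b, c, d = 20, 480, 1000, 200000
is_sdr = prr(a, b, c, d) >= 2 and chi2_yates(a, b, c, d) >= 4 and a >= 3
print(f"PRR = {prr(a, b, c, d):.2f}, SDR flagged: {is_sdr}")
# PRR = 8.04, SDR flagged: True
```

The GPS and urn-model algorithms are more involved (Bayesian shrinkage and combinatorial null models, respectively); this sketch covers only the frequentist PRR criterion.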

Results: The GPS algorithm had the lowest overall yield of SDRs (763), with the highest fraction of events matching the reference event database (89 SDRs, 11.7%), excluding events described in the prescribing information at the time of drug approval. The urn model yielded more SDRs (1562), with a non-significantly lower fraction matching (175 SDRs, 11.2%). PRR detected still more SDRs (3616), but with a lower fraction matching (296 SDRs, 8.2%). In terms of overlap of SDRs among algorithms, PRR uniquely detected the highest number of SDRs (2231, with 144, or 6.5%, matching), followed by the urn model (212, with 26, or 12.3%, matching) and then GPS (0 SDRs uniquely detected).
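
The matching fractions quoted above follow directly from the reported counts, as a quick arithmetic check confirms:

```python
# (matched SDRs, total SDRs) as reported in the Results
results = {
    "GPS":        (89, 763),
    "urn model":  (175, 1562),
    "PRR":        (296, 3616),
    "PRR unique": (144, 2231),
    "urn unique": (26, 212),
}
for name, (matched, total) in results.items():
    print(f"{name}: {matched}/{total} = {100 * matched / total:.1f}%")
```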

Conclusions: The three DMAs studied offer significantly different tradeoffs between the number of SDRs detected and the degree to which those SDRs are supported by external evidence. Those differences may reflect choices of detection thresholds as well as features of the algorithms themselves. For all three algorithms, there is a substantial fraction of SDRs for which no external supporting evidence can be found, even when a highly inclusive search for such evidence is conducted.

Tables I–IX and Fig. 1 appear in the full text.

References

  1. Syed RA, Marks NS, Goetsch RA. Spontaneous reporting in the United States. In: Strom BL, Kimmel SE, editors. Textbook of pharmacoepidemiology. West Sussex: John Wiley & Sons, Ltd, 2006: 91–116

  2. Gould AL. Practical pharmacovigilance analysis strategies. Pharmacoepidemiol Drug Saf 2003; 12: 559–74

  3. Meyboom RHB, Lindquist M, Egberts ACG, et al. Signal selection and follow-up in pharmacovigilance. Drug Saf 2002; 25(6): 459–65

  4. Hauben M, Reich L. Communication of findings in pharmacovigilance: use of the term “signal” and the need for precision in its use. Eur J Clin Pharmacol 2005; 61(5–6): 479–80

  5. Almenoff J, Tonning JM, Gould AL, et al. Perspectives on the use of data mining in pharmacovigilance. Drug Saf 2005; 28(11): 981–1007

  6. Lindquist M, Stahl M, Bate A, et al. A retrospective evaluation of a data mining approach to aid finding new adverse drug reaction signals in the WHO International Database. Drug Saf 2000 Dec; 23(6): 533–42

  7. Martindale W, Reynolds JEF, editors. Martindale: the extra pharmacopoeia. 36th ed. London: The Pharmaceutical Press, 2009

  8. Physician’s desk reference. 54th ed. Montvale (NJ): Medical Economics Company, 1999

  9. Hauben M, Reich L. Safety related drug-labelling changes: findings from two data mining algorithms. Drug Saf 2004; 27(10): 735–44

  10. Brown EG, Wood L, Wood S. The medical dictionary for regulatory activities (MedDRA). Drug Saf 1999 Feb; 20(2): 109–17

  11. Joint Formulary Committee. British national formulary. 52nd ed. London: British Medical Association and Royal Pharmaceutical Society of Great Britain, 2006

  12. Hauben M, Aronson JK. Gold standards in pharmacovigilance: the use of definitive anecdotal reports of adverse drug reactions as pure gold and high-grade ore. Drug Saf 2007; 30(8): 645–55

  13. Naranjo CA, Busto U, Sellers EM. A method for estimating the probability of adverse drug reactions. Clin Pharmacol Ther 1981 Aug; 30(2): 239–45

  14. Venulet J, Ciucci A, Berneker GC. Standardized assessment of drug-adverse reaction associations: rationale and experience. Int J Clin Pharmacol Ther Toxicol 1980 Sep; 18(9): 381–8

  15. Karch FE, Lasagna L. Toward the operational identification of adverse drug reactions. Clin Pharmacol Ther 1977 Mar; 21(3): 247–54

  16. US Food and Drug Administration. Guidance for industry. E2C clinical safety data management: periodic safety update reports for marketed drugs [online]. Available from URL: http://www.fda.gov/cder/guidance/1351fnl.pdf [Accessed 2007 Mar 22]

  17. Ashman CJ, Yu JS, Wolfman D. Satisfaction of search in osteoradiology. AJR Am J Roentgenol 2000; 175: 541–4

  18. Evans SJ, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf 2001 Oct–Nov; 10(6): 483–6

  19. Hochberg AM, Reisinger SJ, Pearson RK, et al. Using data mining to predict safety actions from FDA adverse event reporting system data. Drug Inf J 2007; 41(5): 633–44

  20. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system (with discussion). Am Stat 1999; 53(3): 177–90

  21. Woo EJ, Ball R, Burwen DR, et al. Effects of stratification on data mining in the US Vaccine Adverse Event Reporting System (VAERS). Drug Saf 2008; 31(8): 667–74

  22. Hauben M, Patadia VK, Goldsmith D. What counts in data mining? Drug Saf 2006; 29: 827–32

  23. Hauben M, Vogel U, Maignen F. Number needed to detect: nuances in the use of a simple and intuitive signal detection metric. Pharm Med 2008; 13: 1178–2595

  24. Hauben M, Madigan D, Gerrits CM, et al. The role of data mining in pharmacovigilance. Expert Opin Drug Saf 2005; 4(5): 929–48

  25. Aronson JK, Hauben M. Anecdotes as evidence. BMJ 2003; 326: 1346

  26. Hochberg AM, Hauben M. Time-to-signal comparison for drug safety data mining algorithms versus traditional signaling criteria. Clin Pharmacol Ther. Epub 2009 Mar 25

  27. Chan KA, Hauben M. Signal detection in pharmacovigilance: empirical evaluation of data mining tools. Pharmacoepidemiol Drug Saf 2005 Sep; 14(9): 597–9

  28. Hauben M. Trimethoprim-induced hyperkalaemia: lessons in data mining. Br J Clin Pharmacol 2004 Sep; 58(3): 338–9

Acknowledgements

This work was funded by a grant from the Pharmaceutical Research and Manufacturers of America (PhRMA) to ProSanos Corporation. Dr Manfred Hauben, as a representative of the funding committee of PhRMA, participated in the design of the study, the interpretation of data and the editing of the manuscript. Alan Hochberg, Ronald Pearson, Donald O’Hara and Stephanie Reisinger are employees of ProSanos and their work was funded in part by the PhRMA grant. They were responsible for the design and conduct of the study, data collection management and analysis, and interpretation of the data. Dr Manfred Hauben is a full-time employee of Pfizer Inc. and owns stock in this and other pharmaceutical companies that may market/manufacture drugs mentioned in this article or competing drugs. David Goldsmith, Lawrence Gould and David Madigan participated as members of a project steering committee in the design and interpretation of the study and in the editing of the manuscript. They did not receive funding from PhRMA except for nominal reimbursement of incidental expenses related to the project. All authors participated in the preparation, review and approval of the manuscript. The authors thank Dr Lester Reich for his participation in the signal-adjudication process, and Dr Ivan Zorych of Rutgers University for the software implementation of the GPS algorithm. Patents are pending on technology discussed in this paper (rights assigned to ProSanos Corporation). The authors also thank the anonymous reviewers for many helpful comments received during the review of this manuscript.

Corresponding author

Correspondence to Alan M. Hochberg.

Appendix A

Results of Inter-Rater Adjudication Study

An experiment was performed to assess the effect of inter-rater variability in the adjudication process on the number of SDRs detected for the various algorithms and report sources. Data-mining results for a subset of five drugs from the main study were selected and presented to three individuals for adjudication, as described in the Methods section. All three individuals had ≥2 years’ experience in drug safety data mining.

Results of the adjudication and scoring process were tabulated, and generalized linear models (GLMs) of the Poisson family were constructed with ‘rater’, in addition to ‘algorithm’, ‘evidence level’ and ‘report source’, as stimulus factors, where ‘report source’ is derived from the ‘report source’ field in the AERS database. In a baseline model, ‘rater’ was included as a non-interacting factor, which simply accounted for the overall difference in the number of SDRs available for scoring from the three adjudicators; in other words, a scale factor. In the full model, interactions between ‘rater’ and the other variables (‘reference-match’, ‘category’, ‘algorithm’ and ‘report source’) were included.

The results of adjudication of reference event database entries by the three individuals are shown in Appendix table 1. Note that rater 3 chose not to assign any SDRs to the categories of ‘confounding with demographic/clinical factors’ or ‘confounding with indication’. Results for the Poisson models are shown in Appendix figure 1. The interaction of ‘rater’ and ‘report source’ was non-significant. The interaction of ‘rater’ and ‘algorithm’ was statistically significant; the origin of this interaction is not known, since the raters were blind to algorithm. Although statistically significant, this interaction accounts for a deviance of only 18.24, which is <0.1% of the total model deviance, and is thus of negligible magnitude. The conclusion is that inter-rater variability should have a negligible effect on conclusions regarding the various algorithms.
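
The deviance-partitioning logic described above can be illustrated with a toy Poisson example; the counts below are invented for illustration and are not the study’s data. For a two-factor count table, a Poisson GLM with non-interacting factors has the classical independence fit (row total × column total / grand total) as its maximum-likelihood fitted values, while a model including the full ‘rater’ × ‘algorithm’ interaction is saturated, so the baseline model’s deviance is exactly the deviance attributable to the interaction:

```python
import math

# Invented SDR counts per (rater, algorithm) cell -- illustrative only
counts = {
    ("rater1", "urn"): 52, ("rater1", "GPS"): 30, ("rater1", "PRR"): 118,
    ("rater2", "urn"): 47, ("rater2", "GPS"): 28, ("rater2", "PRR"): 125,
    ("rater3", "urn"): 55, ("rater3", "GPS"): 33, ("rater3", "PRR"): 110,
}
raters = sorted({r for r, _ in counts})
algos = sorted({a for _, a in counts})
total = sum(counts.values())
rater_tot = {r: sum(counts[r, a] for a in algos) for r in raters}
algo_tot = {a: sum(counts[r, a] for r in raters) for a in algos}

def poisson_deviance(obs, fit):
    # 2 * sum(y*log(y/mu) - (y - mu)); the y*log(y/mu) term is 0 when y == 0
    d = 0.0
    for cell, y in obs.items():
        mu = fit[cell]
        if y > 0:
            d += y * math.log(y / mu)
        d -= y - mu
    return 2.0 * d

# Baseline: 'rater' as a non-interacting (scale) factor -> independence fit
baseline_fit = {(r, a): rater_tot[r] * algo_tot[a] / total
                for r in raters for a in algos}
d_baseline = poisson_deviance(counts, baseline_fit)

# The full model with the rater x algorithm interaction is saturated here
# (deviance 0), so d_baseline is the deviance attributable to the interaction
print(f"deviance attributable to rater x algorithm: {d_baseline:.2f}")
```

In the study itself the models include additional factors (‘evidence level’, ‘report source’), so the interaction deviance is a component of a larger model rather than the whole baseline deviance as in this two-factor sketch.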

Appendix Table 1

Adjudication of reference event database terms by three raters. ‘Matching’ refers to those terms that were matched to a signal of disproportionate reporting for at least one of the three algorithms (urn model[19], Gamma Poisson Shrinker[20] and proportional reporting ratio[18]) in a pilot study of five drugs

Appendix Figure 1

Poisson models for the inter-rater variability experiment. df = degrees of freedom; Max = maximum; Min = minimum; ns = not significant; χ² = chi-square.

Cite this article

Hochberg, A.M., Hauben, M., Pearson, R.K. et al. An Evaluation of Three Signal-Detection Algorithms Using a Highly Inclusive Reference Event Database. Drug Safety 32, 509–525 (2009). https://doi.org/10.2165/00002018-200932060-00007
