1. Introduction
It is critical to detect signals of adverse drug reactions from real-world data early enough to protect public health. From the real-world data, we could identify new effects of drugs that had not been identified during premarketing clinical trials. Adverse event (AE) information after drug marketing is often collected via a spontaneous reporting system to identify any long-term adverse drug reactions. In Korea, for example, the Korea Institute of Drug Safety and Risk Management (
www.drugsafe.or.kr) collects the information through a spontaneous reporting system.
Through this system, anyone, for example, a patient who has taken the drug, a doctor, or the manufacturer, can report an AE. They report information such as the symptoms of the AE, the date of onset, the name of the drug, the frequency and duration of the dose, patient information, and causality assessment information. As the causality can only be reported by medical experts, the information reported by the patients does not confirm that an AE has been caused by a particular drug. In addition, there could be issues related to data quality and under-reporting. The total number of people who received the drug and the number of AEs are not precisely known. It is difficult to determine a causal relationship between drugs and adverse effects from a spontaneous reporting system database. We can only identify signals of adverse drug reactions, so additional in-depth studies are needed [
1].
Statistical analysis is performed to detect whether any particular AEs have occurred more frequently or whether there are any unexpected AEs. Among the various data mining tools, disproportionality methods are widely used for AE signal detection. Different disproportionality methods exist based on different measures such as the reporting odds ratio (ROR) [
2], proportional reporting ratio (PRR) [
3], information component (IC), likelihood ratio test-based method (LRT) [
4], gamma Poisson shrinker (GPS) [
5], Bayesian confidence propagating neural network (BCPNN) [
6], new IC method [
7,
8], and the simplified Bayesian method (sB) [
9]. ROR, PRR, IC, and LRT are frequentist methods, while GPS, BCPNN, the new IC method, and sB are Bayesian methods [
10]. Some studies suggest that Bayesian methods, such as multi-item gamma Poisson shrinker (MGPS) and BCPNN, outperform frequentist methods such as PRR [
2,
3,
6,
7,
11]. Other studies showed that the sB method performed better than BCPNN and PRR [
9,
11]. Unlike other methods, the GPS method needs to estimate hyperparameters of the prior distribution using the whole data. Because of this process, the GPS method requires more computation time. Thus, most pharmaceutical companies and national and international pharmacovigilance organizations use other methods more often than the GPS method [
12].
Another type of data mining method for signal detection is the tree-based scan statistic (TreeScan) proposed by Kulldorff et al. [
13]. This method simultaneously searches for signals at any level (or layer) of AE in a hierarchical structure, adjusting for the multiple testing problem. It has been applied to drug safety surveillance as well as occupational disease surveillance [
13,
14].
Both LRT and TreeScan methods were developed based on a likelihood ratio test with the test statistic as the maximum likelihood ratio. Moreover, both methods use the Monte Carlo method to obtain the empirical distribution for statistical inference. The LRT method can handle AEs such as system-organ classes (SOC), preferred terms (PT), or included terms (IT) (only one layer, not all the layers together). If the AE is coded as PT, the LRT method detects the signal of single PT. The TreeScan method for detecting AE signals for a fixed drug in data with multiple layers may consider all the layers, and search for signals of PT and SOC (or other layers) together. The LRT method is more general and covers different aspects of safety signal detection. The TreeScan method is for detecting AE or certain prespecified AE groups as signals for a fixed drug, which is a special case of the LRT method.
Clinical information related to adverse drug reactions is generally coded using medical dictionaries such as the World Health Organization Adverse Reaction Terminology (WHO-ART) and the Medical Dictionary for Regulatory Activities (MedDRA), which have a hierarchical structure. There are several studies comparing the performance of disproportionality methods in a usual database format with AEs and drug combination data [
2,
3,
6,
7,
9,
11,
12,
15]. However, it is not clear which method performs better than the others when AEs are classified into a hierarchical structure. There is a study comparing the application of the GPS and TreeScan methods to real cohort data by Brown et al. [
15]. They showed that the signaled regions were similar. However, in some cases, the TreeScan method detected signals that were not detected by the GPS. We do not know which results are more reliable.
The purpose of this study was to compare the performance of several different data mining methods for signal detection in adverse drug event data grouped into hierarchical structures. Through an extensive simulation study, we evaluated the performance of ROR, PRR, IC, LRT, GPS, BCPNN, sB, and TreeScan on datasets generated based on the WHO-ART’s hierarchical structure. Originally, the methods except TreeScan were not developed for hierarchically structured data. We evaluated all the methods by considering all layers, instead of limiting to a single layer, to better reflect the hierarchical data structure and make fair comparisons. We used the type I error rate, power, sensitivity, and positive predictive value as performance measures. We also compared the application results of the methods to a real dataset from the Korea adverse event reporting system (KAERS).
2. Signal Detection Method
Several different data mining methods have been developed to detect unusually high disproportionate reporting rates from the large drug safety databases. In this paper, we considered ROR, PRR, IC, LRT, GPS, BCPNN, new IC, sB, and TreeScan, which have been relatively widely used. We mainly referred to Huang et al. [
10] for a review of the methods apart from TreeScan.
From a large drug safety database, the number of AEs by drugs can be presented in matrix form with
rows of AEs and
columns of drugs. For a particular AE (
ith AE) at a particular drug (
jth drug), the data can be summarized in a 2 × 2 table, as shown in
Table 1.
2.1. Frequentist Methods
2.1.1. Reporting Odds Ratio (ROR)
The ROR is the odds ratio that a particular AE is reported in patients who take a specific drug compared to patients who take other drugs [
2]. The ROR for the
ith AE and the
jth drug (
) is estimated as
and if either
or
is equal to 0, then
is not defined. The log-transformed
can be approximated to a normal distribution as follows:
We can obtain an approximate 100(
confidence interval (CI) for
as:
where
and
is the standard normal distribution’s cumulative distribution function.
The null and the alternative hypotheses to test whether the
ith AE for the
jth drug is a signal or not are expressed as:
As Evans et al. [
3] suggested, the lower bound of
, so we reject the null hypothesis and conclude that the
ith AE can be interpreted as a signal of the disproportionate rate (SDR) for the
jth drug.
2.1.2. Proportional Reporting Ratio (PRR)
The PRR is the ratio of the proportion of patients who reported a particular AE after taking a specific drug to the proportion of patients who have taken other drugs that reported the same AE [
3]. We estimate the PRR for the
ith AE and the
jth drug (
) as:
If
and
, then
. We use the normal approximation for the distribution of
for inference as follows:
Therefore, an approximate 100(
CI for
is expressed as follows:
The null hypothesis of is rejected if the lower bound of , as for PRR.
2.1.3. Information Component (IC)
The IC is based on the relative reporting rate
, which indicates how many particular events were reported in excess for a specific drug over the expected number of reported counts under the null hypothesis that a drug and AE are independent. The relative reporting rate is estimated by
, where
is the expected number of reports for the
ith AE and the
jth drug under the null hypothesis. The IC for the
ith AE and the
jth drug is defined as follows:
The
is estimated as
for
and
. The estimated variance of
is given by
. An approximate 100(
CI for
is expressed as follows:
If the lower bound of > 1, the ith AE can be interpreted as a signal.
2.1.4. Likelihood Ratio Test-Based Method (LRT)
Huang et al. [
4] proposed the likelihood ratio test statistic, which controls the type I error and false discovery rates by using Monte Carlo hypothesis testing. The null and alternative hypotheses to test whether the
ith AE for a specific drug (
is a signal or not are expressed as follows:
where
and
are defined as the reporting rates of
ith AE and other AEs for a specific drug, respectively. The maximum likelihood ratio (MLR) is expressed as follows:
where
I() is the indicator function,
, and
. As the distribution of MLR under the null hypothesis is unknown, the Monte Carlo hypothesis testing is used to calculate
p-values. For details, see
Section 2.3.
2.2. Bayesian Method
2.2.1. Gamma Poisson Shrinker (GPS)
DuMouchel [
5] suggested the GPS method, which is an empirical Bayes signal detection method. The GPS method uses the relative report rate, defined as follows:
This indicates the actual frequency compared to the expected frequency.
is calculated under the null hypothesis that there is no association between the drug‒AE pairs. The null and alternative hypotheses are expressed as follows:
The GPS method assumes that the model and prior distributions are as follows:
where the observed report count
follows the Poisson distribution with unknown mean
. The relative report rate follows the mixture gamma distribution where
is a gamma distribution with mean
and variance
and
is the prior probability that
came from the first gamma distribution of mixture. The hyperparameters
are estimated by the empirical Bayes method, which is also known as the maximum marginal likelihood.
As gamma distribution is a conjugate prior for Poisson distribution, the posterior distribution of
can be obtained in a closed form as follows:
where
is the posterior probability that
came from the first gamma distribution of the mixture. This is expressed as follows:
where
is the marginal distribution. This marginal distribution follows the negative binomial distribution as follows:
where
.
The 5th percentile of the posterior distribution of
(EB05) is used for decision making. EB05 can be obtained by solving the equation as follows:
This integral can be solved easily using iterative techniques such as Newton’s method. If EB05() is greater than 2, this drug‒adverse effect pair is considered a signal of disproportionate rates (SDR).
2.2.2. Bayesian Confidence Propagation Neural Network (BCPNN)
Bate et al. [
6] proposed the BCPNN method based on the IC measure. In the BCPNN method, the IC measure was defined as follows:
The observed reporting counts and marginal counts are assumed to follow a binomial distribution with a beta distribution for priors as follows:
where
and
.
Using the delta method, the posterior mean and variance of
can be obtained as follows:
where
.
The lower limit of the 95% credible interval for
is calculated by:
and if
is greater than 2, this drug‒AE pair is a possible signal with a higher reporting rate.
2.2.3. New IC Method
The new IC is an improved method for posterior inference in IC analysis, including an accurate estimate for the mode and significantly improved credibility interval estimates. This method also assumes the number of reports where denotes the relative reporting rate. The prior of parameters is given by and the posterior distribution of is given by . Then, the New is the posterior mean of , which is .
The 95% credible interval limits (
are obtained by:
for α = 0.025 and α = 0.975. If the lower limit
, the
ith AE can be interpreted as a signal.
2.2.4. Simplified Bayesian
For small datasets, the GPS method is usually not recommended because of instability in the estimation of the hyperparameters. Thus, Huang et al. [
9] suggested the simplified Bayesian (sB) method, which assumes a weaker assumption on prior distribution than the GPS method. The sB method uses a single gamma distribution as a prior as follows:
with mean 1 and variance
. Huang et al. [
9] proposed using three values (0.5, 0.01, and 0.0001) for
. They also called the prior distribution with
a less noninformative prior. The other prior distributions were called noninformative priors. The posterior distribution is also a single gamma distribution as follows:
The lower bound of the 95% credible interval for
(
) is used for detecting signals of SDR.
is expressed as follows:
If
is greater than 2, this drug‒AE pair is a possible signal with a higher reporting rate. With
, the sB method is identical to the new IC method [
10]. Hence, we only included the sB method in the simulation.
2.3. Tree-Based Scan Statistic
In a medical dictionary, all AEs are categorized into a hierarchical tree structure. Kulldorff et al. [
13,
14] proposed the tree-based scan statistic, which simultaneously searches for signals at any level (or layer) of AEs in a hierarchical structure. We call the last cell of the tree a leaf and the rest a node. That is, the higher level of leaves is the node. A higher-level node is defined as the parent node; the lower level node is defined as the child node.
is the observed number of AEs for each leaf I and
is the total observed number of AEs reported in patients who have taken a specific drug j and
is the total number of AEs reported in patients who have taken any drugs.
When the branches of a tree are cut, the sum of the observed and total number of AEs in the leaves of each cut, G,
and
, respectively, are obtained. G includes both the child nodes and parent nodes as a unit of AE. For each cut G, we can calculate the log likelihood ratio and test statistic:
where
I() is the indicator function. The cut G that maximizes LR(G) is the most likely cut of related AEs. The null hypothesis implies that the group defined by cut G has the same ratio of observed to expected AEs as the rest of the tree. In inference, Monte Carlo hypothesis testing is used, calculating the most likely cut in each random dataset. Firstly, the likelihood of the most likely cut in a real dataset is calculated. Secondly, 9999 random datasets are generated under the null hypothesis and the test statistic for each random dataset calculated. Then, the
p-value is calculated as R/(9999 + 1), where R is the rank of the test statistic of real dataset compared with random datasets.
The LRT and TreeScan methods basically use the same test statistic. Because the TreeScan considers the hierarchical structure in nature, the distribution of the test statistic is also obtained by comparing all possible cuts in the hierarchical structure. Even if the two methods detected the same signal, p-values could be different.
5. Discussion
A number of disproportionality methods for data mining and the TreeScan method were compared for signal detection during drug surveillance for AEs data grouped into hierarchical structures. We included various frequentist methods such as ROR, PRR, IC, LRT, and TreeScan as well as Bayesian methods such as GPS, BCPNN, and sB. The LRT, GPS, BCPNN, sB, and TreeScan methods detected fewer signals than the ROR, PRR, and IC methods. The power and sensitivity of the GPS, sB, LRT, and TreeScan methods tended to be lower than those of others, which implies that these methods are more conservative. The higher power and sensitivity of the ROR, PRR, and IC methods seemed to be due to the higher type I error rates. The three methods had lower PPV. The TreeScan method controls the type I error rate at the desired level, while other methods cannot control this or find appropriate cutoffs for the desired type I error rate. However, no method was superior to the others in relation to all performance measures.
We observed similar patterns in the analysis results of the KAERS data. The GPS and sB methods detected much fewer signals than the others overall. For the two specific drugs, some common AEs were detected by all methods. The ROR, PRR, and IC methods detected additional signals that were not detected by the GPS, sB, LRT, or TreeScan methods. The ROR and PRR methods detected rather too many signals, even if the number reported was small. Thus, the restriction of three or more cases for the reported count to be a signal for the ROR and PRR methods, which is usually imposed in practice [
3], might be sensible.
In terms of computation time, the GPS, LRT, and TreeScan methods are more intensive relative to the other methods. Other methods have a closed form for the confidence interval of each statistic, so only the cell count () and marginal count ( or ) of the matrix are required to calculate the confidence interval. On the other hand, the GPS method requires all cell counts in the matrix to estimate the parameters of prior distribution. For the LRT and TreeScan method, a Monte Carlo simulation is required to obtain p-values.
The methods considered in this paper are approaches that can be applied to an existing database. In some cases, one may want to continuously or sequentially monitor to detect a signal as early as possible. The sequential probability ratio test (SPRT) [
21,
22] can be used. The method has also been applied to a spontaneous adverse event reporting system [
23,
24]. However, the result of the SPRT method is highly dependent on the relative risk used to specify the alternative hypothesis [
25]. Although we did not include the SPRT in this study for these reasons, it would be interesting to compare the method in appropriate situations in future research.
The drug safety databases such as KAERS are constructed by a spontaneous reporting system and very few AEs that occur were reported, so it has a large number of zero-count cells. In this situation, a zero-inflated Poisson model could be considered. Hu et al. [
11] proposed ZIP-sB and ZIP-DP (Dirichlet process). Huang et al. [
26] proposed a zero-inflated Poisson (ZIP) model based on the likelihood ratio test. According to these research findings, ZIP models detected fewer signals in data containing a large number of zero-counts. This means that they are more conservative by considering zero-counts. In a further study, we will evaluate the performance of ZIP models and apply them to real data to compare.
Huang et al. proposed extending the likelihood ratio test-based (LRT) methods [
9] that can detect signals for including a single AE or several AEs within one AE group. The extended LRT method could be used for hierarchical structures of AEs for a fixed drug. The threshold for a signal for multiple-layer analysis should be higher than that for single-layer analysis. It will be very interesting to see the simulation results by comparing the Extended LRT vs. TreeScan with multiple layers (PT, SOC, or others). This is a future research topic.
Currently, some drug companies have different AE detection criteria. For example, AstraZeneca detects an AE when the EB05 is greater than 1.8, whereas GlaxoSmithKline detects AE when it is greater than 2 [
12]. In our study, it was confirmed that the performance of each method could vary depending on the cutoff, which is the criteria for signal detection in simulation. Therefore, how to set the cutoff for signal detection is very important and worth noting.