Likelihood-Ratio-Test Methods for Drug Safety Signal Detection from Multiple Clinical Datasets

Pre- and postmarket drug safety evaluations usually include an integrated summary of results obtained using data from multiple studies related to a drug of interest. This paper proposes three approaches based on the likelihood ratio test (LRT), called the LRT methods, for drug safety signal detection from large observational databases with multiple studies. The focus is on identifying signals of adverse events (AEs) among many AEs associated with a particular drug or, inversely, signals of drugs associated with a particular AE. The methods discussed include a simple pooled LRT method and its variations, such as a weighted LRT that incorporates the total drug exposure information by study. The power and type-I error of the LRT methods are evaluated in a simulation study with varying heterogeneity across studies. For illustration, these methods are applied to Proton Pump Inhibitor (PPI) data with 6 studies on the effect of concomitant use of PPIs in treating patients with osteoporosis, and to Lipiodol (a contrast agent) data with 13 studies for evaluating that drug's safety profile.


Introduction
Meta-analysis approaches for multiple independent studies have become very popular in medical research. In many observational and/or clinical trial studies, meta-analysis can be performed using study-level summary measures or patient-level information; for example, the studies can be integrated using a common statistical measure, such as the study-level mean or effect size, by computing a weighted average of this measure under a fixed-effect model or a random-effects model [1]. The weights are usually related to the study-level sample sizes or within-study variation but may depend on other factors.
This type of approach is referred to as traditional meta-analysis and is used extensively (as supportive evidence) in the pre- and postapproval evaluation of the efficacy and safety of drug products. Traditional meta-analysis of large and small clinical trials, published studies, registries, and large clinical and/or observational databases has become common practice in modern-day pre- and postmarket clinical/observational studies [1,2]. It is used for thorough evaluation of clinical efficacy endpoints, such as the mean change in weight or blood pressure and the hazard ratio in survival comparisons, and of clinical safety endpoints, such as the odds ratio, risk ratio, and absolute risk difference. For example, a number of meta-analyses of rosiglitazone trials in patients with type-2 diabetes have been conducted to evaluate the risk of myocardial infarction (MI) and cardiovascular mortality [3], whereas Borges et al. [4] reviewed, in a meta-analysis, 15 randomized withdrawal maintenance trials for major depressive disorder submitted to FDA during 1987-2012.
Using traditional meta-analysis for safety evaluation, researchers can estimate the odds ratio or risk ratio of the drug-AE pair of interest, with its 95% confidence interval, from each study; combine the estimates through a fixed-effect or random-effects model to produce an overall estimate of the parameter of interest and its 95% confidence interval; and display the results in a forest plot. Here, we intend to extend traditional meta-analysis to safety signal detection, where relative risks (RRs), usually called risk ratios, are commonly used when drug exposure information is available. Relative event rates or proportional reporting rates are used when drug exposure information is lacking, which is usually the case in passive surveillance of medical products. It is important to explore safety signals in each study; however, when studying safety signals, researchers usually collect information from many trials (or studies), since a single clinical study focused on efficacy cannot provide enough information on safety events. The clinical studies included in a large safety dataset or database are usually independent studies with different protocols. A signal detected in one study may therefore not be detected in other studies because of variation across studies (in sample sizes, study sites, personnel, enrolled patients, study period, and other factors).
Several methods have been developed for data mining or safety signal detection across multiple drugs and AEs (for example, proportional reporting ratios [5], reporting odds ratios [6], likelihood ratio tests [7][8][9], and Bayesian methods [10][11][12][13]). However, these signal detection methods usually work on large pooled passive surveillance data and are not designed to incorporate the heterogeneity across multiple studies. Here, we propose new methods for drug safety signal detection (intended to control the type-I error and false discovery rate) for data with multiple studies, obtained from large observational databases such as the FDA Adverse Event Reporting System (FAERS; https://open.fda.gov/data/faers/) or from clinical trial databases. The new methods build on the regular likelihood ratio test (LRT) for signal detection [7] and consist of a two-step approach for exploring safety signals from multiple studies/sources. In the first step, the regular LRT is applied to the safety data study by study; in the second step, the regular LRT statistics from the different studies are combined into an overall test statistic for a global test at a prespecified level of significance. If the global null hypothesis is rejected in favor of the global alternative, the data provide evidence of an overall signal. The paper is organized as follows. In Section 2, we briefly review the basic LRT method for signal detection (regular LRT) and introduce several methods, based on the regular LRT, for signal detection from multiple studies. In Section 3, the proposed LRT methods are applied to two datasets for illustration: first, a dataset on concomitant use of PPIs in patients taking drugs to treat osteoporosis, comparing two drug groups (PPI + placebo vs. placebo only) from 6 selected studies; and second, a selected set of 13 published studies on Lipiodol (a contrast agent) with a maximum dose of 15 ml. A simulation study evaluating the performance of the LRT analysis methods for multiple studies is presented in Section 4. We conclude with a discussion in Section 5.

A Summary of Regular LRT.
The likelihood-ratio-test-based method for signal detection, originally developed for passive surveillance of large safety databases and publicly available in openFDA (https://open.fda.gov/tools/), is called here the regular LRT method; it is a frequentist method based on multiple 2 × 2 tables [7]. For a particular AE j of interest, there are I 2 × 2 tables if there are a total of I drugs in the study. Here, the drugs are the rows and the jth AE is a column (see Section 3.1). If, for a particular drug, one wants to compare many AEs, the drug should be treated as the column variable and the AEs as the rows (see Section 3.2).
Define $n_{ij}$ as the cell count for the $i$th row (e.g., drug) and $j$th column (e.g., AE), and assume that $n_{ij} \sim_{\text{ind}} \mathrm{Poisson}(n_{i.}\,p_{ij})$, $i = 1, \dots, I$, where $p_{ij}$ is the reporting rate of the $i$th drug for the $j$th AE, and that $(n_{.j} - n_{ij}) \sim_{\text{ind}} \mathrm{Poisson}((n_{..} - n_{i.})\,q_{ij})$, $i = 1, \dots, I$, where $q_{ij}$ is the reporting rate of all other drugs (excluding the $i$th drug) for the $j$th AE. Here, $n_{i.} = \sum_{j=1}^{J} n_{ij}$, $n_{.j} = \sum_{i=1}^{I} n_{ij}$, and $n_{..} = \sum_j n_{.j}$. Assuming that AE $j$ is fixed and dropping the suffix $j$ in $p_{ij}$ and $q_{ij}$, the interest is to test the null hypothesis $H_0: p_i = q_i = p_0$ for all $i$ against the alternative hypothesis $H_a: p_i > q_i$ (i.e., $RR_i = p_i/q_i > 1$) for at least one $i$, $i = 1, \dots, I$. The likelihood ratio statistic for drug $i$ and AE $j$, as derived in [7], is

$$LR_{ij} = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}},$$

where $n_{..} = \sum_{i=1}^{I} n_{i.}$ and $E_{ij} = n_{i.}\,n_{.j}/n_{..}$. The maximum likelihood ratio (MLR) test statistic for the one-sided alternative is

$$MLR = \max_i \, LR_{ij}\, I(p_i > q_i),$$

where the maximum is taken over $i = 1, \dots, I$. Since $\log(LR_{ij})$ is a monotonic (increasing) function of $LR_{ij}$, it is convenient to work with $MLLR = \max_i (\log(LR_{ij})\, I(p_i > q_i))$. The above formulation assumes that there is no drug exposure information in the large postmarket safety database from a passive surveillance system. In this case, "no drug exposure" usually means that we may know how many adverse events are reported for a certain drug in a passive surveillance system, but not the number of patients who actually took the drug or the drug exposure of each person. Therefore, $n_{i.}$ was used as an approximation of total drug use, and relative reporting rates were compared in such analyses of data from the FDA Adverse Event Reporting System (FAERS; https://open.fda.gov/data/faers/).
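To make the computation concrete, here is a minimal Python sketch of the reporting-rate version of the statistic, assuming the closed form $\log LR_{ij} = n_{ij}\log(n_{ij}/E_{ij}) + (n_{.j}-n_{ij})\log\{(n_{.j}-n_{ij})/(n_{.j}-E_{ij})\}$ with $E_{ij} = n_{i.}n_{.j}/n_{..}$. The names `log_lr` and `mllr` are hypothetical, and the one-sided indicator $I(p_i > q_i)$ is evaluated with the observed rates $n_{ij}/n_{i.}$ and $(n_{.j}-n_{ij})/(n_{..}-n_{i.})$, which is our assumption about how it is applied in practice.

```python
import math


def log_lr(n_ij, n_i, n_j, n_tot):
    """One-sided log LR_ij for a drug-AE cell (reporting-rate version).

    n_ij: reports for drug i and AE j;  n_i: row total n_i.;
    n_j: column total n_.j;  n_tot: grand total n_..
    Returns 0 when drug i's observed rate does not exceed that of the other drugs.
    Assumes at least two rows, so n_tot > n_i.
    """
    rest = n_j - n_ij  # reports of AE j for all other drugs
    if n_ij == 0 or n_ij / n_i <= rest / (n_tot - n_i):
        return 0.0
    e_ij = n_i * n_j / n_tot  # expected count E_ij = n_i. n_.j / n_..
    value = n_ij * math.log(n_ij / e_ij)
    if rest > 0:
        value += rest * math.log(rest / (n_j - e_ij))
    return value


def mllr(table, j):
    """Return (MLLR, index of the drug attaining it) for AE column j of a drug x AE table."""
    row_totals = [sum(row) for row in table]
    n_j = sum(row[j] for row in table)
    n_tot = sum(row_totals)
    stats = [log_lr(row[j], row_totals[i], n_j, n_tot) for i, row in enumerate(table)]
    best = max(range(len(stats)), key=stats.__getitem__)
    return stats[best], best


# Two drugs (rows) x two AEs (columns); AE 0 is reported relatively more with drug 0.
print(mllr([[20, 80], [10, 190]], j=0))
```

The row with the largest one-sided log LR is the candidate signal; whether it is declared a signal still depends on the Monte Carlo null distribution.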
When the drug exposure for the ith drug ($P_i$) is available, every $n_{i.}$ can be replaced by $P_i$, and relative risks can then be compared using the available drug exposure information (see the definitions in Huang et al. [8]). Drug exposure information may be available in a legacy database that includes data from completed clinical trials, or in data from ongoing clinical trials (for safety monitoring). In clinical trial data, the drug exposure of a patient is usually well defined and prespecified, either as the total dose taken by the patient during the study or as the exposure time for a certain amount of drug. In some cases, completed clinical trials may not provide well-defined drug exposure information. For example, the precise exposure for concomitant PPI use was not collected in the studies included in Section 3.1, so the exposure has to be imputed under some reasonable assumptions.
Note that in order to detect signals using information from multiple studies, the drug exposure definition should be consistent and comparable across the studies included in a single meta-analysis. More details are discussed in the applications. The log likelihood ratio statistic is then written as

$$\log LR_{ij} = \log \left[ \frac{\left(\dfrac{n_{ij}}{P_i}\right)^{n_{ij}} \left(\dfrac{n_{.j} - n_{ij}}{P_{.} - P_i}\right)^{n_{.j} - n_{ij}}}{\left(\dfrac{n_{.j}}{P_{.}}\right)^{n_{.j}}} \right],$$
where $P_{.} = \sum_{i=1}^{I} P_i$, $\sum_{i=1}^{I} P_i/P_{.} = 1$, and $E_{ij} = P_i\,n_{.j}/P_{.}$. Since the null distribution of the MLLR test statistic is not tractable, a Monte Carlo (MC) procedure is used to obtain its empirical distribution: a large number of Monte Carlo samples of the cell counts $(n_{1j}, \dots, n_{Ij})$ for the $j$th AE are generated from the multinomial distribution $(n_{1j}, \dots, n_{Ij}) \mid n_{.j} \sim \mathrm{Mult}(n_{.j};\, n_{1.}/n_{..}, \dots, n_{I.}/n_{..})$, with the observed $n_{.j}$ as the total number of events. If drug exposure is available, the distribution is instead $(n_{1j}, \dots, n_{Ij}) \mid n_{.j} \sim \mathrm{Mult}(n_{.j};\, P_1/P_{.}, \dots, P_I/P_{.})$. If the MLLR based on the observed data, $MLLR_{\text{data}}$, is greater than the threshold value $MLLR_{0.05}$ (the upper 5th percentile point of the empirical distribution), the null hypothesis is rejected at $\alpha = 0.05$. The p value of MLLR can be calculated as

$$p = 1 - \frac{\text{rank of } MLLR_{\text{data}} \text{ among } MLLR_{\text{data}} \text{ and the MLLRs from the empirical distribution}}{1 + \text{total number of simulations used to generate the empirical distribution}},$$

where the rank is taken in ascending order.
The drug associated with $MLLR_{\text{data}}$ is then the most significant signal detected.
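The Monte Carlo step can be sketched with the Python standard library alone. This is a minimal illustration, assuming the exposure form of log LR, i.e., $n_{ij}\log(n_{ij}/P_i) + (n_{.j}-n_{ij})\log\{(n_{.j}-n_{ij})/(P_{.}-P_i)\} - n_{.j}\log(n_{.j}/P_{.})$, multinomial sampling implemented with `random.choices`, and ties between observed and simulated statistics resolved against rejection (the text does not say how ties are handled). The names `log_lr_exposure`, `mllr_exposure`, `mc_pvalue`, `n_sim`, and `seed` are hypothetical.

```python
import math
import random


def log_lr_exposure(n_i, p_i, n_j, p_tot):
    """n_i log(n_i/P_i) + (n_.j - n_i) log((n_.j - n_i)/(P_. - P_i)) - n_.j log(n_.j/P_.),
    set to 0 unless row i's observed rate exceeds that of the other rows."""
    rest, p_rest = n_j - n_i, p_tot - p_i
    if n_i == 0 or n_i / p_i <= rest / p_rest:
        return 0.0
    value = n_i * math.log(n_i / p_i) - n_j * math.log(n_j / p_tot)
    if rest > 0:
        value += rest * math.log(rest / p_rest)
    return value


def mllr_exposure(counts, exposure):
    """MLLR for one AE column: max over rows of the one-sided log LR."""
    n_j, p_tot = sum(counts), sum(exposure)
    return max(log_lr_exposure(n, p, n_j, p_tot) for n, p in zip(counts, exposure))


def mc_pvalue(counts, exposure, n_sim=9999, seed=1):
    """Observed MLLR and p = 1 - rank / (1 + n_sim), rank taken in ascending order."""
    rng = random.Random(seed)
    rows = range(len(counts))
    observed = mllr_exposure(counts, exposure)
    null_stats = []
    for _ in range(n_sim):
        # (n_1j, ..., n_Ij) | n_.j ~ Mult(n_.j; P_1/P_., ..., P_I/P_.)
        draw = [0] * len(counts)
        for r in rng.choices(rows, weights=exposure, k=sum(counts)):
            draw[r] += 1
        null_stats.append(mllr_exposure(draw, exposure))
    # Ascending rank; simulated values tied with the observed one rank above it (conservative).
    rank = 1 + sum(s < observed for s in null_stats)
    return observed, 1 - rank / (1 + n_sim)


obs, p = mc_pvalue([30, 10], [100.0, 100.0], n_sim=2000)
print(f"MLLR = {obs:.3f}, p = {p:.4f}")
```

With this formula, an observed statistic larger than every simulated value gives p = 0, which matches how the rank-based definition behaves.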

LRT Analysis Approaches for Signal Detection from Multiple Studies.
Here, we propose several approaches based on the regular likelihood ratio test (LRT) for safety signal detection with multiple studies. In what follows, $\log LR_{ijs}$ (or $\log LR_{is}$ when $j$ is fixed) is computed study by study using the formula described in Section 2.1.

Analysis of Pooled Data from Several Studies Using Regular LRT.
Suppose there are a total of $S$ studies or datasets. Let $n_{ij} = \sum_s n_{ijs}$ denote the total event/report count for the $i$th drug and $j$th AE, summed over all $S$ studies (the subscript $i$ is used for drugs here; the rows can be defined as drugs or AEs depending on the interest). Using this definition of the "pooled" $n_{ij}$, we can apply the regular LRT to detect drug signals. However, the regular LRT applied to pooled data may not control the type-I error, because the Monte Carlo simulation for obtaining the empirical distribution of the test statistic is based on the pooled data rather than on the study-level data. We observed this issue in the simulation study.
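Pooling amounts to summing the study-level counts and exposures before applying the regular LRT. The snippet below is a hypothetical illustration (the names `pool` and `pooled_mllr` and the counts are ours) using the exposure form of log LR. Any null distribution for this statistic would be simulated from the pooled totals, which is the step the text identifies as the reason the type-I error is not controlled.

```python
import math


def log_lr_exposure(n_i, p_i, n_j, p_tot):
    """One-sided log LR for one row (0 when the row's rate is not elevated)."""
    rest, p_rest = n_j - n_i, p_tot - p_i
    if n_i == 0 or n_i / p_i <= rest / p_rest:
        return 0.0
    value = n_i * math.log(n_i / p_i) - n_j * math.log(n_j / p_tot)
    if rest > 0:
        value += rest * math.log(rest / p_rest)
    return value


def pool(per_study):
    """Sum per-study row vectors over studies, e.g. n_i = sum_s n_is."""
    return [sum(column) for column in zip(*per_study)]


def pooled_mllr(study_counts, study_exposure):
    """Regular LRT on pooled data; returns (MLLR, index of the signal row)."""
    counts, exposure = pool(study_counts), pool(study_exposure)
    n_j, p_tot = sum(counts), sum(exposure)
    stats = [log_lr_exposure(n, p, n_j, p_tot) for n, p in zip(counts, exposure)]
    best = max(range(len(stats)), key=stats.__getitem__)
    return stats[best], best


# Two studies, two rows (e.g., PLandPPI vs. PL); hypothetical counts and exposures.
print(pooled_mllr([[15, 5], [15, 5]], [[50.0, 50.0], [50.0, 50.0]]))
```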
Another issue with this analysis of pooled studies is that it does not address study-to-study variation, that is, heterogeneity across studies. Study heterogeneity may come from different sources, including study design (prospective versus retrospective), endpoints, distributions of effect modifiers, and data sources. Therefore, analyzing pooled studies without considering heterogeneity may lead to biased results. This method is also vulnerable to Simpson's paradox [14,15] and should be used with caution. For example, in a medical study evaluating kidney stone treatments [16,17], treatment A is more effective both in patients with small stones and in patients with large stones, yet treatment B appears more effective when all patients are combined.
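The reversal is easy to reproduce numerically. The counts below are the widely reproduced kidney stone success figures (successes/patients by stone size), used here only to illustrate the paradox the text describes; the helper name `rate` is ours.

```python
from fractions import Fraction

# (successes, patients) by treatment and stone size.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Exact success proportion."""
    return Fraction(successes, patients)


for size in ("small", "large"):
    a, b = rate(*data["A"][size]), rate(*data["B"][size])
    print(f"{size:>5}: A {float(a):.0%}  B {float(b):.0%}")  # A is better in each stratum

# Pooling the strata reverses the comparison.
pooled = {
    t: rate(sum(s for s, _ in d.values()), sum(n for _, n in d.values()))
    for t, d in data.items()
}
print(f"total: A {float(pooled['A']):.0%}  B {float(pooled['B']):.0%}")  # B is better overall
```

The reversal happens because treatment A was given mostly to the harder (large-stone) cases, so the pooled comparison is confounded by stone size; study membership can play the same role when signal detection pools heterogeneous studies.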
In the following subsections, two LRT approaches that incorporate study-level heterogeneity are presented.

Maximum of MLLR Statistics from Multiple Studies (MMLR).
Assume there are a total of $S$ studies (with similar patients and objectives, and relevant to the purpose of the current active/passive surveillance safety study). For a fixed AE $j$ of interest and the $s$th study, we define the MLLR statistic as $MLLR_s = \max_i \log(LR_{ijs}) = \max_i \log(LR_{is})$, dropping the suffix $j$. Then, the test statistic for testing the global null hypothesis against the global alternative hypothesis is the maximum of $MLLR_s$ over all studies, $MMLLR = \max_s MLLR_s$. The empirical distribution of MMLLR can be obtained by Monte Carlo simulation: null data are generated using the observed $n_{.s}$ and $P_{is}$ with the same relative risk for all rows within each study, $s = 1, \dots, S$, and $MMLLR = \max_s(\max_i \log(LR_{is}))$ is calculated for each simulated dataset. Like the regular LRT, MMLR controls the type-I error.
A drug with the MMLLR from the observed data (for a particular study) is a signal if the related p value (the rank, in descending order, of the observed MMLLR among the MMLLR values from the empirical distribution, divided by the total number of simulated values) is less than a prespecified significance level (such as 0.05). Furthermore, if of interest, secondary drug-study combinations can be identified as signals using the logLR values ($\log(LR_{is})$, $i = 1, \dots, I$, $s = 1, \dots, S$) that are the second, third, and fourth largest among all drug-study combinations.
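A minimal sketch of MMLR under the assumptions stated in the text: the null data keep each study's observed $n_{.s}$ and $P_{is}$ and use equal relative risks within each study. The p value here is computed as the share of simulated MMLLR values at least as large as the observed one, which is our reading of the rank-based definition; all function names are hypothetical.

```python
import math
import random


def log_lr_exposure(n_i, p_i, n_j, p_tot):
    """One-sided log LR for one row (0 when the row's rate is not elevated)."""
    rest, p_rest = n_j - n_i, p_tot - p_i
    if n_i == 0 or n_i / p_i <= rest / p_rest:
        return 0.0
    value = n_i * math.log(n_i / p_i) - n_j * math.log(n_j / p_tot)
    if rest > 0:
        value += rest * math.log(rest / p_rest)
    return value


def mllr_study(counts, exposure):
    """MLLR_s = max_i log LR_is within one study."""
    n_j, p_tot = sum(counts), sum(exposure)
    return max(log_lr_exposure(n, p, n_j, p_tot) for n, p in zip(counts, exposure))


def mmllr(study_counts, study_exposure):
    """MMLLR = max_s MLLR_s over all studies."""
    return max(mllr_study(c, p) for c, p in zip(study_counts, study_exposure))


def draw_null(counts, exposure, rng):
    """Equal relative risks: allocate the study's n_.s events with prob P_is / P_.s."""
    draw = [0] * len(counts)
    for r in rng.choices(range(len(counts)), weights=exposure, k=sum(counts)):
        draw[r] += 1
    return draw


def mmlr_test(study_counts, study_exposure, n_sim=9999, seed=1):
    """Observed MMLLR and the share of null MMLLR values >= the observed value."""
    rng = random.Random(seed)
    observed = mmllr(study_counts, study_exposure)
    exceed = 0
    for _ in range(n_sim):
        sims = [draw_null(c, p, rng) for c, p in zip(study_counts, study_exposure)]
        exceed += mmllr(sims, study_exposure) >= observed
    return observed, exceed / n_sim


print(mmlr_test([[30, 10], [10, 10]], [[100.0, 100.0], [100.0, 100.0]], n_sim=2000))
```

Because each simulated study keeps its own totals, the null distribution reflects the study-level structure, which is why this approach holds the type-I error where the pooled analysis does not.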

Weighted LRT Using Total Drug Exposure as Weight (wLRn).
In this subsection, we assume a fixed $j$th column and drop the suffix $j$ in the following derivations.
Let $P_{is}$ be the total drug exposure for the $i$th drug in the $s$th study. Then, the weighted LRT statistic based on total drug exposure is defined as

$$wLR_i = \frac{\sum_{s=1}^{S_i} P_{is} \log(LR_{is})}{\sum_{s=1}^{S_i} P_{is}},$$

where $S_i$ denotes the number of studies for the $i$th drug; note that $S_i$, $i = 1, \dots, I$, may differ across rows. $wLR_i$ can be interpreted as the weighted average of logLR across studies for the $i$th row, with weights $P_{is}$. The test statistic for testing the global null hypothesis against the global alternative hypothesis is then $MwLR = \max_i wLR_i$, where the maximum is taken over all drugs, $i = 1, \dots, I$.
For statistical inference with the wLRn method, null datasets are simulated from a multinomial distribution using the observed $n_{.s}$ and $P_{is}$, with the same relative risk for all rows within each study. The empirical distribution of wLR is formed by the 10,000 values $wLR_{\text{sim}}$ obtained from 10,000 simulated null datasets. The p value of the observed statistic $wLR_{\text{obs}}$ is obtained by comparing it with the 10,000 $wLR_{\text{sim}}$ values from the Monte Carlo process:

$$p = \frac{\#\{wLR_{\text{sim}} > wLR_{\text{obs}}\}}{10{,}000}.$$

If $wLR_{\text{obs}}$ for the $i$th drug (row) has a p value < 0.05, the $i$th drug is a signal. After detecting the global signal, we can move on to the 2nd largest, 3rd largest logLR or weighted logLR values, and so on, for secondary signals.
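The weighted statistic and its Monte Carlo p value can be sketched as follows, assuming the same one-sided log LR as in the regular LRT and the strict-inequality p value $\#\{wLR_{\text{sim}} > wLR_{\text{obs}}\}/R$ from the text. Encoding a row that is absent from a study as $P_{is} = 0$ and skipping it is one hypothetical way to represent the row-specific $S_i$; the function names are also ours.

```python
import math
import random


def log_lr_exposure(n_i, p_i, n_j, p_tot):
    """One-sided log LR for one row (0 when the row's rate is not elevated)."""
    rest, p_rest = n_j - n_i, p_tot - p_i
    if n_i == 0 or n_i / p_i <= rest / p_rest:
        return 0.0
    value = n_i * math.log(n_i / p_i) - n_j * math.log(n_j / p_tot)
    if rest > 0:
        value += rest * math.log(rest / p_rest)
    return value


def wlr(study_counts, study_exposure, i):
    """wLR_i: exposure-weighted average of log LR_is over the S_i studies containing row i."""
    num = den = 0.0
    for counts, exposure in zip(study_counts, study_exposure):
        if exposure[i] == 0:  # row i is not part of this study
            continue
        n_j, p_tot = sum(counts), sum(exposure)
        num += exposure[i] * log_lr_exposure(counts[i], exposure[i], n_j, p_tot)
        den += exposure[i]
    return num / den


def mwlr(study_counts, study_exposure):
    """MwLR = max_i wLR_i."""
    return max(wlr(study_counts, study_exposure, i) for i in range(len(study_exposure[0])))


def wlrn_test(study_counts, study_exposure, n_sim=10_000, seed=1):
    """Observed MwLR and p = #(simulated > observed) / n_sim."""
    rng = random.Random(seed)
    observed = mwlr(study_counts, study_exposure)
    exceed = 0
    for _ in range(n_sim):
        sims = []
        for counts, exposure in zip(study_counts, study_exposure):
            draw = [0] * len(counts)  # multinomial draw with prob P_is / P_.s
            for r in rng.choices(range(len(counts)), weights=exposure, k=sum(counts)):
                draw[r] += 1
            sims.append(draw)
        exceed += mwlr(sims, study_exposure) > observed
    return observed, exceed / n_sim


print(wlrn_test([[30, 10], [10, 10]], [[100.0, 100.0], [100.0, 100.0]], n_sim=2000))
```

A study with large exposure dominates $wLR_i$, so a strong signal in a small study is diluted, whereas MMLR would report it directly.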
In summary, the statistics discussed in Section 2.2 are presented in Table 1.

Applications
We illustrate the LRT methods by applying them to two datasets with multiple studies for safety signal exploration. The first dataset is hypothetical but based on the real situation in the PPI data from the FDA legacy database. The second dataset includes 13 published clinical studies on Lipiodol (a contrast agent) identified by a literature search.
In both examples, we tried to include studies with similar features (such as similar patients, similar drugs, and similar objectives) for a fair comparison.

Analysis of PPI Data with Two Drugs and a Composite AE.
Proton Pump Inhibitors (PPIs) are a class of drugs that decrease gastric acid secretion through inhibition of the proton pump. PPIs have been found to be associated with an increased risk of hip fractures (an adverse event) [18,19]. Huang et al. [8] evaluated whether concomitant use of PPIs reduced the efficacy of test drugs intended to treat osteoporosis among targeted patients, using clinical trial data from the FDA/OTS/OCS legacy database. That database contained data from 10 trials (including single-arm, two-arm, and three-arm studies). One medication (test drug for treating osteoporosis, active control, or placebo) was given to the patients in each arm, and PPIs were given concomitantly to patients in different arms. The sample sizes of the trials range from hundreds to thousands. The main focus was the composite AE (AEOST, as defined in Huang et al. [8], Appendix A1), which includes many AE terms related to osteoporosis symptoms. On further examination of these data, we noticed that one trial has no placebo arm, one trial has a placebo arm but no subjects with concomitant PPIs, and two trials have no AEOST events reported in the placebo + PPIs (PLandPPI) or placebo-only (PL) groups. For illustration, we selected the 6 trials with AEOST events reported and with some subjects in the placebo arm taking concomitant PPIs. Note that patients were randomized into test drug, active control, and placebo arms in those trials. The effect of PPIs cannot be separated from that of the other drugs (test drug and active control) when they were used together in the test drug and active control arms. In the following, we illustrate the analysis of safety signals using hypothetical data with 6 studies that reflect the data pattern of the PPI clinical data for comparing PLandPPI and PL.
The two AEs considered here are the 1st occurrence of AEOST (denoted 1occ) and repeated occurrences of AEOST (denoted allocc). We evaluate the relative risk of the 1st occurrence of AEOST (or of repeated occurrences of AEOST) for patients in the PLandPPI group, exposed to placebo and concomitant PPIs, vs. patients in the PL group, exposed to placebo only. For the 1occ analysis, $n_{is}$ is the number of events for the $i$th row (drug: placebo and PPIs together, or placebo only) and $s$th study when each subject contributes only one event (the 1st occurrence of the repeated AEOST), and $P_{is}$ is the exposure (the sum over all subjects of the exposure time to the 1st occurrence of AEOST, in person-days) for the $i$th drug (row) and $s$th study. For the allocc analysis, $n_{is}$ is the number of events for the $i$th drug (row) and $s$th study when a subject may have several repeated events of one AE such as AEOST, and $P_{is}$ is the exposure (the sum of the drug exposure times over all subjects) for the $i$th drug (row) and $s$th study. Note that a subject's exposure time here is defined as the subject's placebo period in the PL group (time from taking placebo to the end of the study or dropout). The exposure to concomitant PPIs in the PLandPPI group is not well recorded and is always shorter than the placebo period; therefore, we assume that the exposure to placebo and concomitant PPIs for subjects in the PLandPPI group is simply the period of placebo exposure. The actual dose and exposure time of concomitant PPIs may vary by patient, and the pattern may not be consistent with the total placebo exposure of the PLandPPI patients, which may introduce bias in evaluating the relative risk of PPIs together with placebo vs. placebo only.
Using traditional meta-analysis based on the relative risks of safety events, one may obtain an overall relative risk and its 95% CI using fixed-effect or random-effects models (Borenstein et al. [1], chapters 11 and 12). The estimated between-study variance $\tau^2$ is 0 for the 1occ analysis and 0.07 for the allocc analysis; therefore, the integrated results from the fixed-effect and random-effects models are almost the same. The overall relative risk (95% CI) is 1.87 (1.43, 2.45) for the 1occ analysis and 2.44 (2.02, 2.94) for the allocc analysis. The results are shown as forest plots in Figure 1.
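For comparison, the traditional meta-analysis step can be sketched as inverse-variance pooling of study-level log relative risks. This is a minimal sketch under assumptions the text does not state: each study's rate ratio is $(a/P_1)/(b/P_2)$ with $\mathrm{Var}(\log \widehat{RR}) \approx 1/a + 1/b$ for Poisson counts, and $\tau^2$ is estimated with the DerSimonian-Laird moment estimator, one common choice for random-effects models. It needs at least two studies and nonzero event counts (no continuity correction), and `meta_rr` is a hypothetical name.

```python
import math

Z975 = 1.959963984540054  # standard normal 97.5% quantile


def meta_rr(a, p1, b, p2):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log rate ratios.

    a, b: event counts in the two groups per study; p1, p2: exposures.
    Returns (tau2, (rr, lo, hi) fixed-effect, (rr, lo, hi) random-effects).
    """
    y = [math.log((ai / e1) / (bi / e2)) for ai, e1, bi, e2 in zip(a, p1, b, p2)]
    v = [1 / ai + 1 / bi for ai, bi in zip(a, b)]

    def pooled(weights):
        est = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        se = math.sqrt(1 / sum(weights))
        return tuple(math.exp(est + k * Z975 * se) for k in (0, -1, 1))

    w = [1 / vi for vi in v]
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = [1 / (vi + tau2) for vi in v]
    return tau2, pooled(w), pooled(w_re)


# Hypothetical two-study example (PLandPPI events/exposure vs. PL events/exposure).
tau2, fixed, rand_eff = meta_rr([20, 30], [1000.0, 1500.0], [10, 12], [1000.0, 1400.0])
print(f"tau^2 = {tau2:.3f}; fixed RR {fixed[0]:.2f} ({fixed[1]:.2f}, {fixed[2]:.2f})")
```

When $\tau^2 = 0$ the random-effects weights equal the fixed-effect weights, which is why the two models give almost identical summaries for the PPI data.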
We also analyzed these data using the LRT methods, namely, the simple pooled analysis with regular LRT, MMLR, and wLRn. Note that the application with two drugs can easily be extended to multiple drugs using the step-down procedure of the LRT analysis methods (but not of traditional meta-analysis methods). For example, when there are more than two drugs and one drug (Drug A) vs. the other drugs is a signal, another drug (Drug B) could be a secondary signal if its test statistic has a p value smaller than 0.05.
As a simple summary, the events and the relative risks (rr) of PLandPPI vs. PL, with 95% confidence intervals, are shown by study in Table 2. The results from the regular LRT on individual studies and from the LRT analysis methods for multiple studies together are shown in Table 3. The 95% threshold in Table 3 is the 95th percentile of the empirical distribution of the related STAT.
The individual study analysis shows that signal findings may vary across studies, with various levels of signal strength.
The simple pooled analysis, which ignores study variation, and the MMLR and wLRn methods, which each account for study-level variability, give consistent results: AEOST is a signal for the PLandPPI group compared with the PL-only group. MMLR provides the strongest global signal of AEOST (along with the related study) as the integrated result. Stronger signal patterns were observed in the repeated-occurrence analyses owing to the larger sample sizes. AEOST tends to be a signal for subjects with concomitant use of PPIs (the PLandPPI group).
From the MMLR method, the most significant global signal of AEOST (1st occurrence or repeated occurrences) in subjects taking concomitant PPIs (PLandPPI group) comes from the 2nd study ($s = 2$). This signal for repeated occurrences is also seen in the 4th study ($s = 4$, p value 0.006), the 5th study ($s = 5$, p value 0.006), and the 6th study ($s = 6$, p value 0.009). The observed logLR values for studies 2, 4, 5, and 6 all exceed the threshold of 2.47 for the analysis of repeated occurrences (allocc).

Analysis of Lipiodol Data with One Drug and Multiple AEs.
Lipiodol (labeled Ethiodol in the USA), also known as ethiodized oil, is a poppyseed oil used by injection as a radiopaque contrast agent to outline structures in radiological investigations [20,21]. It is used in chemoembolization applications as a contrast agent in follow-up imaging [22].
To detect possible safety signals and document the safety of Lipiodol for selective intra-arterial use in imaging liver lesions in adults with known hepatocellular carcinoma (HCC), thirteen studies (articles) with a maximum Lipiodol dose of 15 ml, as recommended in the drug label, were identified from more than 100 articles included in the NDA 09190/S-024 submission (https://www.accessdata.fda.gov/scripts/cder/drugsatfda/). The actual doses varied across subjects; however, the maximum dose was reported to be 15 ml for all subjects in the selected studies. The subjects in the 13 studies are all adults (average age ranging from 45 to 69 years). The 13 articles were published between 1993 and 2009. The number of subjects per study ranges from 11 to 257. A total of 27 AEs are reported across the 13 studies for the drug considered (Lipiodol). The number of subjects with a particular AE ($n_{is}$) is reported by study; note that one subject may have multiple AEs reported. Since the exact drug exposure time of each subject cannot be obtained from the articles, we assumed that the drug exposure is the same for each subject and equal to one unit. The total drug exposure by study ($P_{is}$) is then the total number of subjects in each study, which is the same for all rows in this case ($P_{is} = P_s$).
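Under the one-unit-per-subject assumption, every AE row in a study gets the same exposure (the study's number of subjects), so the null multinomial probabilities $P_{is}/\sum_i P_{is}$ are all $1/I$. A small hypothetical helper makes this explicit; the two study sizes used are just the endpoints of the reported range, not the actual Lipiodol layout.

```python
from fractions import Fraction


def exposure_matrix(n_subjects, n_aes):
    """P_is = N_s for every AE row i in study s (one exposure unit per subject)."""
    return [[n] * n_aes for n in n_subjects]


def null_probs(exposure_row):
    """Multinomial cell probabilities P_is / sum_i P_is for one study."""
    total = sum(exposure_row)
    return [Fraction(p, total) for p in exposure_row]


P = exposure_matrix([11, 257], n_aes=27)  # hypothetical: smallest and largest study sizes
print(null_probs(P[0])[:3])  # each AE row gets probability 1/27
```

With equal exposures, the expected count for each AE in study $s$ is simply $n_{.s}/27$, so the LRT compares each AE's count with the average over all AEs in that study.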
Applying the regular LRT to each individual study and the LRT methods to all 13 studies (with a total of 27 AE terms), we obtain the detected signals shown in Tables 4 and 5. When the observed STAT (obsstat) is greater than the 95% threshold obtained from the empirical distribution under the null hypothesis of no signals, the related AE is detected as a signal. When interpreting the detected signals, one can consider lumping together similar AEs (with different AECODE codes) to form a group. For example, postembolization syndrome (PES) (as defined at http://radiopaedia.org/articles/post-embolisation-syndrome-1), which includes the AECODE codes FEVER, VOMITTING, NAUSEA, and ABDOMINAL PAIN, is detected by all LRT analysis methods (Table 5). The detected signals varied in the individual study analyses (Table 4). By the simple pooled method and the wLRn method, three AEs (all in the PES group) are detected as signals with p value less than 0.05 (Table 5). All these signals are integrated signals that consider the information from all 13 studies.

Table 1: Statistics in different methods ($j$ is fixed and the suffix $j$ is dropped in the formulation). Either logLR or LR can be used. In addition to the most significant signal, secondary signals can also be identified. Columns: Method | logLR or weighted logLR | Test statistic (STAT) | Most significant signal detected.

Computational and Mathematical Methods in Medicine
By the MMLR method, 21 AE-study combinations are detected as signals. PES is the most significant global signal among all the signals. Ignoring the studies, there are 11 AEs among the signals (4 in the PES group).

Figure 1: Forest plot of relative risk and 95% CI by study, with the summary relative risk (integrated using a fixed-effect model) from traditional meta-analysis methods (1occ analysis (a) and allocc analysis (b)).

Both examples consider data from clinical trials with some exposure information, so the AE signals detected can be called signals with higher relative risk. The same kind of application can be conducted for data from a passive surveillance system by evaluating reporting rates (see reference [7]); because such data contain no exposure information, one cannot evaluate relative risk. Without exposure information, the formula used to compute the likelihood ratios (Section 2.1) uses different denominators ($n_{i.}$ instead of $P_i$). The PPI example includes patients who were treated with multiple drugs and exposed over time (patients may receive different doses at different visits). The AE studied in this example (AEOST) is a composite AE with many terms associated with osteoporosis, and many repeated reports of AEOST may be observed during the exposure period.
There are only two drug groups (rows, $i = 1, 2$) and one selected composite AE (AEOST) for comparison. This is a simple case of signal detection, very similar to the setup of traditional clinical trial data analysis. In this example, we compared the two drug groups for a fixed AE (AEOST). In contrast, the Lipiodol example is different. Contrast agents are used in discrete bursts, and many patients have only a single exposure to the drug. Therefore, if the dose of the one-time injection is similar for each patient, we can assume that the drug exposure is the same for each subject and equal to one unit. Then, $P_i$ can be imputed by the number of patients when no further exposure information is available. In this example, we have 27 AEs (AEs as rows, $i = 1, \dots, 27$) and one drug of interest. The purpose of signal detection is to identify the AEs with high relative risks by comparing each AE vs. the other AEs for Lipiodol.
There are a total of 27 comparisons (rather than the 2 comparisons in the first example). The proposed LRT methods can handle these multiple comparisons with the false discovery rate (FDR) controlled [7]. Traditional meta-analysis of risk ratios can be applied to the PPI data with two drug groups (two rows), but may not be appropriate for the Lipiodol data with more than two rows, because of the inflated type-I error and FDR in the presence of multiple comparisons.

Simulation
A simulation study is conducted to evaluate the performance of the LRT analysis methods discussed in Section 2 for data with multiple studies and available drug exposure information. The performance of traditional meta-analysis of risk ratios is also explored in simulated data with two rows.

Simulation Assumptions and Parameters.
We simulate data using the total number of studies, the total number of rows, $n_{.s}$, and $P_{is}$ from the datasets used in the illustrations (see the cases in Table 6). Under the global null hypothesis (no safety signals in any study), the data are generated with equal relative risks for all rows; under the global alternative hypothesis, the data are generated with different relative risks for different rows (for example, assigning a higher relative risk to the 1st row). Note that each row corresponds to a drug or an AE; for example, a row corresponds to a drug in Section 3.1 and to an AE in Section 3.2.
If the relative risks are the same for all rows, then for each study the simulated null data are generated from the following multinomial distribution (dropping the suffix $j$):

$$(n_{1s}, \dots, n_{Is}) \mid n_{.s} \sim \mathrm{Mult}\left(n_{.s};\, \frac{P_{1s}}{\sum_{i=1}^{I} P_{is}}, \dots, \frac{P_{Is}}{\sum_{i=1}^{I} P_{is}}\right). \tag{6}$$
If the first row is a signal with a higher relative risk, then for each study the simulated data (under the global alternative) are generated from the following multinomial distribution:

$$(n_{1s}, \dots, n_{Is}) \mid n_{.s} \sim \mathrm{Mult}\left(n_{.s};\, \frac{\eta_{1s} P_{1s}}{\sum_{i=1}^{I} \eta_{is} P_{is}}, \dots, \frac{\eta_{Is} P_{Is}}{\sum_{i=1}^{I} \eta_{is} P_{is}}\right),$$

where the cell probabilities sum to 1, that is, $\sum_{k=1}^{I} \eta_{ks} P_{ks} / \sum_{i=1}^{I} \eta_{is} P_{is} = 1$. With $\eta_{is} = 1$ for $i = 2, \dots, I$, the relative risk of the first row vs. all other rows in the $s$th study is simply $\eta_{1s}$. In this simulation, the values of $\eta_{1s}$ (the same for all studies) are 1, 1.2, 1.5, 2, and 3 (results with $\eta_{1s} = 1.2$ are not shown in Table 6; the powers are low for those scenarios). The $\eta$ values may also vary by study (for example, $\eta_{1s} = 1.5, 1.2, 3, 1, 1.3, 2$ for studies 1 to 6, respectively, in scenario rr21 in Table 6).
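The null distribution (6) and its alternative counterpart both amount to one multinomial draw per study with weights $\eta_{is}P_{is}$ (all $\eta_{is} = 1$ under the null). A stdlib sketch follows; `simulate_study` and `simulate_dataset` are hypothetical names, and the six-study layout with 40 events and equal exposures is illustrative, not the paper's data. `random.choices` normalizes the weights, so they need not sum to 1.

```python
import random


def simulate_study(n_total, exposure, eta, rng):
    """(n_1s, ..., n_Is) | n_.s ~ Mult(n_.s; eta_i P_is / sum_k eta_k P_ks)."""
    weights = [e * p for e, p in zip(eta, exposure)]
    counts = [0] * len(exposure)
    for r in rng.choices(range(len(exposure)), weights=weights, k=n_total):
        counts[r] += 1
    return counts


def simulate_dataset(n_by_study, exposure_by_study, eta_first, rng):
    """One replicate: the first row gets relative risk eta_first[s] in study s."""
    data = []
    for n, exposure, eta1 in zip(n_by_study, exposure_by_study, eta_first):
        eta = [eta1] + [1.0] * (len(exposure) - 1)
        data.append(simulate_study(n, exposure, eta, rng))
    return data


rng = random.Random(2024)
# Hypothetical six-study layout; eta values as in scenario rr21.
null_data = simulate_dataset([40] * 6, [[100.0, 100.0]] * 6, [1.0] * 6, rng)
alt_data = simulate_dataset([40] * 6, [[100.0, 100.0]] * 6, [1.5, 1.2, 3, 1, 1.3, 2], rng)
print(null_data, alt_data, sep="\n")
```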
The results of the type-I error and power calculations, for scenarios with equal relative risks (under the global null) and with different relative risks (under the global alternative), are presented in Table 6. The drug exposure information $P_i$ by row is obtained from the real data discussed in Section 4 (scenarios such as sim01occ, sim0allocc, and sim0lip). The drug exposure and case information are the same for scenarios sim01occ (null data) and sima1occ (alternative data); the only difference is the relative risks. The same rule was applied to all other pairs of null and alternative data. The total number of replications is 10,000 for the type-I error scenarios and 1,000 for the power scenarios. The power is defined as the number of times the null hypothesis (that there is no signal in each study) is rejected, divided by the total number of replications. When the data are generated under the assumption of no signals, the power becomes the type-I error.

Note. Twenty-one AE-study combination signals were detected by the MMLR method; the p value is 0 for the most significant one (AE FEVER in the 3rd study, s = 3). Ten AE terms were reported in those signals, ignoring the study information, and are shown in the column for the AE term with the maximum observed logLR over the studies.

Simulation Results

As shown in Table 6, the type-I error (or FDR) for data without any signals in each study stays low for wLRn and MMLR. The type-I error for the pooled method is slightly higher than for the other methods, with values up to 0.07 (not controlled). This is because the null data (counts) are generated from a multinomial distribution by study and then simply added over all studies. The pooled method is then applied to the pooled (observed) data, and the empirical distribution of the statistic used for decision making is obtained by a Monte Carlo procedure based on the pooled data rather than on the study-level data.

The other two LRT methods control the type-I error because both their statistics and their empirical distributions are based on study-level data, which are then combined using different weighting approaches. The power in Table 6 is highest for the pooled method and moderate for the wLRn and MMLR methods; the MMLR method is more conservative than wLRn. The power increases with the relative risk assigned to the 1st row; for all methods, it usually reaches 0.7 to 0.8 when the relative risk is 2 or 3. The sample size in scenario simaallocc is larger than in scenario sima1occ; therefore, the power is higher for the simaallocc scenarios at each relative risk. In the scenario rr21 (with different $\eta$ values by study), the pooled analysis no longer has the largest power.
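The pooled analysis described in the text can be sketched as follows. This sketch assumes the usual one-sided per-row logLR, $n_i \log(n_i/E_i) + (n - n_i)\log\big((n - n_i)/(n - E_i)\big)$ with $E_i = n P_i / P$, set to 0 when $n_i \le E_i$, and the $(k+1)/(R+1)$ Monte Carlo p-value convention; both are assumptions here, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def log_lr(n_i, n_total, p_i, p_total):
    """Assumed one-sided logLR for row i; E_i = n_total * p_i / p_total."""
    expected = n_total * p_i / p_total
    if n_i <= expected:
        return 0.0  # only rows with excess counts can be signals
    value = n_i * math.log(n_i / expected)
    if n_total > n_i:
        value += (n_total - n_i) * math.log((n_total - n_i) / (n_total - expected))
    return value


def max_log_lr(counts, exposures):
    n_total, p_total = sum(counts), sum(exposures)
    return max(log_lr(n, n_total, p, p_total) for n, p in zip(counts, exposures))


def pooled_lrt(study_counts, study_exposures, n_sim=999, seed=0):
    """Add counts and exposures over studies, then test the max logLR by Monte Carlo."""
    counts = [sum(col) for col in zip(*study_counts)]
    exposures = [sum(col) for col in zip(*study_exposures)]
    observed = max_log_lr(counts, exposures)
    rng = random.Random(seed)
    rows, n_total = range(len(counts)), sum(counts)
    exceed = 0
    for _ in range(n_sim):
        # Null data come from the pooled totals only: Mult(n, P_i / P).
        tally = Counter(rng.choices(rows, weights=exposures, k=n_total))
        if max_log_lr([tally[i] for i in rows], exposures) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

Because the null datasets here are drawn from the pooled totals, the study-by-study data-generating process is not reflected in the reference distribution, which matches the text's explanation of the pooled method's type-I error.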
Traditional meta-analysis (Borenstein et al. [1]), with a test based on the normal assumption (Z statistic) and the null hypothesis that the mean effect is 1 (relative risk) or 0 (log relative risk), is applied to several simulated data scenarios with two rows for power and type-I error evaluation. If the p value from the standard normal Z test is less than 0.05, we reject the null hypothesis that the relative risk (PLandPPI vs. PL) equals 1.
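One common version of this Z test is a fixed-effect, inverse-variance meta-analysis of the study-level log relative risks. The sketch below assumes a Poisson approximation $1/n_{1s} + 1/n_{2s}$ for the variance of each log rate ratio and a 0.5 correction for zero counts; these choices and the function name are assumptions, not necessarily the exact variant used for the results reported here.

```python
import math


def fixed_effect_rr_test(rows1, rows2):
    """Inverse-variance fixed-effect test of log relative risk = 0.

    rows1, rows2: per-study (count, exposure) pairs for the two compared rows,
    e.g. PLandPPI and PL.
    """
    weighted_sum = total_weight = 0.0
    for (n1, p1), (n2, p2) in zip(rows1, rows2):
        if n1 == 0 or n2 == 0:
            n1, n2 = n1 + 0.5, n2 + 0.5  # assumed continuity correction for zero counts
        log_rr = math.log((n1 / p1) / (n2 / p2))
        weight = 1.0 / (1.0 / n1 + 1.0 / n2)  # 1 / Poisson-approximate variance
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled = weighted_sum / total_weight
    z = pooled * math.sqrt(total_weight)  # pooled / SE, with SE = 1 / sqrt(sum of weights)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal p value
    return math.exp(pooled), z, p_value


# Reject "relative risk = 1" at the 0.05 level when p_value < 0.05.
```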
The type-I error is 0.067 and 0.070 for scenarios sim0allocc and sim01occ, respectively. These results reflect the inflated type-I error of the traditional meta-analysis for data with two rows (one comparison). With more than two rows (multiple comparisons) in the data, we expect an even larger type-I error from the traditional meta-analysis.
The power values of the traditional meta-analysis are 73%, 98%, and 100% for scenarios simaallocc with true relative risks for the 1st row of 1.5, 2, and 3, respectively. The powers for the sima1occ scenarios are smaller than for the simaallocc cases owing to the smaller sample size, but reasonably large for cases with relative risk 2 or above. When the data were generated with varying relative risks for different studies (scenarios simaallocc and sima1occ with rr21), the powers from the traditional meta-analysis are very low (about 5%).

Table 6: Type-I error (or FDR) for cases with data generated under the null hypothesis (all relative risks equal to 1) and power (%) for cases with data generated under the alternative hypothesis (varying relative risks).

Case | Description | Pooled | wLRn | MMLR
--- | --- | --- | --- | ---
Type-I error | | | |
sim0allocc | With $P_{is}$ and $n_{.s}$ from real data; Allocc case in Table 2, rr1 | 0.055 | 0.047 | 0.050
sim01occ | With $P_{is}$ and $n_{.s}$ from real data; 1st occurrence case in Table 2 | 0.073 | 0.047 | 0.049

Note. Pooled, wLRn, and MMLR are the LRT methods. … group from studies 1 to 6, respectively. The $n_{is}$ and $P_{is}$, $i = 1, 2$, for case sim0allocc can also be found in Table 2 (for repeated occurrences of AEOST). For case sim0LIP, $n_{is}$ and $P_{is}$ ($i = 1, \ldots, 27$; $s = 1, \ldots, 13$) cannot be listed here due to space limitations. The values of $n_{is}$ range from 0 to 128, and the values of $P_{is} = P_s$ range from 11 to 257.
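As a rough check on how far the Table 6 type-I error values are from the nominal 0.05, one can compute the binomial Monte Carlo standard error of a rejection rate estimated from 10,000 replications (the number stated for the type-I error scenarios). The helper names below are hypothetical; the rejection rate follows the text's definition of power (or type-I error) as the proportion of replications in which the null hypothesis is rejected.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Empirical type-I error (null data) or power (alternative data)."""
    return sum(p < alpha for p in p_values) / len(p_values)


def mc_standard_error(rate, n_rep):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_rep)


# Type-I error values for sim01occ from Table 6 (10,000 replications per null scenario).
se = mc_standard_error(0.05, 10_000)
for method, value in [("Pooled", 0.073), ("wLRn", 0.047), ("MMLR", 0.049)]:
    print(method, round((value - 0.05) / se, 1))  # distance from 0.05 in standard errors
```

With a standard error of about 0.0022, the pooled value of 0.073 lies roughly 10 standard errors above 0.05, whereas the wLRn (0.047) and MMLR (0.049) values are within about 1.5 standard errors of it.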

Discussion
In summary, the regular LRT on pooled data, the MMLR method, and the weighted LRT method (wLRn) identify signals for the $i$th row (a drug or an AE) by incorporating the information from different studies. In addition, with the MMLR method, one can identify the global signal(s) for the $i$th row along with the studies containing those signal(s). Multiple signals can be detected with the step-down process embedded in the LRT framework for the wLRn and MMLR methods. The traditional meta-analysis methods obtain a summary statistic from study-level statistics such as the relative risk (also called the risk ratio) using fixed-effect or random-effects models or other weighting methods. There are two steps in the traditional methods: first, obtaining the study-level statistic (odds ratio or risk ratio), and second, obtaining the summary statistic for overall evaluation from the study-level statistics. One may then use a normal approximation to construct confidence intervals and to test for statistical significance using the summary statistic. The proposed LRT methods also use a two-step approach, but in a different way of exploring safety issues: in the MMLR and wLRn methods, the study-level statistics are logLR values, and Monte Carlo (MC) simulation is used to test the significance of the summary statistic. Together, the use of logLR, a step-down process for identifying secondary signals with smaller logLR values, and nonparametric MC simulation of the empirical distribution of the logLR (or of the logLR summary) under null datasets control the type-I error and FDR. In practice, one may consider conducting the traditional meta-analysis and the proposed signal detection methods together when evaluating safety from multiple studies.
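To make the two-step structure concrete, here is a sketch in which the study-level statistic is a per-row logLR and the summary is an exposure-weighted sum over studies, maximized over rows, with a Monte Carlo null generated study by study. The one-sided logLR form, the weight $P_{.s}/\sum_s P_{.s}$, and all function names are illustrative assumptions; this is not presented as the paper's exact wLRn definition, and the step-down process for secondary signals is omitted.

```python
import math
import random
from collections import Counter


def log_lr(n_i, n_total, p_i, p_total):
    """Assumed one-sided logLR for row i with E_i = n_total * p_i / p_total."""
    expected = n_total * p_i / p_total
    if n_i <= expected:
        return 0.0
    value = n_i * math.log(n_i / expected)
    if n_total > n_i:
        value += (n_total - n_i) * math.log((n_total - n_i) / (n_total - expected))
    return value


def weighted_max_log_lr(study_counts, study_exposures):
    """Step 1: logLR per row within each study. Step 2: exposure-weighted sum over studies."""
    study_totals = [sum(p) for p in study_exposures]  # P_.s
    grand_total = sum(study_totals)
    n_rows = len(study_counts[0])
    summary = [0.0] * n_rows
    for counts, exposures, p_s in zip(study_counts, study_exposures, study_totals):
        n_s = sum(counts)
        for i in range(n_rows):
            summary[i] += (p_s / grand_total) * log_lr(counts[i], n_s, exposures[i], p_s)
    return max(summary)  # the largest summary statistic over rows


def weighted_lrt(study_counts, study_exposures, n_sim=499, seed=0):
    """Monte Carlo p value with null data generated study by study."""
    observed = weighted_max_log_lr(study_counts, study_exposures)
    rng = random.Random(seed)
    rows = range(len(study_counts[0]))
    exceed = 0
    for _ in range(n_sim):
        null_counts = []
        for counts, exposures in zip(study_counts, study_exposures):
            # Keep each study's total n_.s; allocate events with probabilities P_is / P_.s.
            tally = Counter(rng.choices(rows, weights=exposures, k=sum(counts)))
            null_counts.append([tally[i] for i in rows])
        if weighted_max_log_lr(null_counts, study_exposures) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)
```

Unlike a pooled analysis, both the statistic and its null distribution here are built from study-level data before combining, which is the property the text credits with controlling the type-I error.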
A normal distribution of the parameter estimates from different studies is commonly assumed in the fixed-effect and random-effects models for traditional meta-analysis. Simulations have shown that these methods are relatively robust, even under extreme violations of distributional assumptions, in estimating heterogeneity [23] and in calculating an overall effect size [24]. However, many meta-analyses include only a few studies (such as 5), and so few studies are inadequate for accurately estimating heterogeneity. In cases with limited studies, one can still use the weighted LRT method with drug exposure as the weight for safety signal exploration. Note that the weight could be the study sample size or drug exposure, or it could be defined by the researchers to reflect the importance of the different studies or other study features. The proposed LRT methods are mainly intended for postmarket safety evaluation using adverse event data collected from different studies (such as completed clinical trials or observational studies) and for safety signal monitoring using data from ongoing clinical trials. When analyzing observational data from a passive surveillance system such as FAERS, we do not have exposure information, including the total number of subjects taking the drugs, drug exposure time, and dose. Therefore, we can only evaluate the reporting rate, and the denominator for the rate calculation is $n_{i.}$. When analyzing data from clinical trials, we usually have some exposure information, including the number of patients, the dose for each patient, and the exposure time from taking the drug to the event. Therefore, we can evaluate the risk with denominator $P_i$. When using the proposed method to combine information from multiple studies, one cannot simply combine information from observational data and clinical trial data.
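A small numeric illustration of why the two kinds of denominator should not be mixed: the same counts can give very different rate ratios depending on whether the denominator is the report total $n_{i.}$ or the exposure $P_i$. The comparison of one row against all other rows mirrors the relative risk used in the simulation section; the function name and all numbers are hypothetical.

```python
def relative_rate(counts, denominators, i):
    """Rate of row i divided by the pooled rate of all other rows."""
    rate_i = counts[i] / denominators[i]
    other_count = sum(counts) - counts[i]
    other_den = sum(denominators) - denominators[i]
    return rate_i / (other_count / other_den)


# Hypothetical numbers: the same AE counts under two kinds of denominator.
counts = [12, 30]
total_reports = [100, 600]   # n_i. , as for spontaneous-report data
exposure = [40.0, 80.0]      # P_i, as for clinical trial data
print(round(relative_rate(counts, total_reports, 0), 2))  # → 2.4
print(round(relative_rate(counts, exposure, 0), 2))       # → 0.8
```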
In a meta-analysis, we apply the proposed method only to studies with similar features, such as similar denominators, patient populations, and study objectives. The proposed LRT methods output p values, which incorporate information on the relative risk (rr) and exposure from different studies. Within a study, an AE signal with a higher rr and a larger exposure value may lead to a small p value, whereas an AE with a higher rr but a small exposure value may not have a small p value. The integrated AE signals with small p values across all studies are affected by the combined rr estimates and exposure information from all studies. Both the relative risk and the exposure by study are important information that can be reported in the output in addition to the p values from the proposed LRT methods.
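A worked example of the point about rr and exposure, assuming the usual one-sided per-row logLR form (an assumption here, with a hypothetical function name): two hypothetical studies have the same rr = 2 for row 1, but the second has ten times the exposure and counts.

```python
import math


def log_lr(n_i, n_total, p_i, p_total):
    """Assumed one-sided logLR for row i with E_i = n_total * p_i / p_total."""
    expected = n_total * p_i / p_total
    if n_i <= expected:
        return 0.0
    value = n_i * math.log(n_i / expected)
    if n_total > n_i:
        value += (n_total - n_i) * math.log((n_total - n_i) / (n_total - expected))
    return value


# Hypothetical studies with the same rr = 2 for row 1 (equal exposure in both rows).
small = log_lr(4, 6, 10.0, 20.0)       # counts (4, 2), exposures (10, 10)
large = log_lr(40, 60, 100.0, 200.0)   # counts (40, 20), exposures (100, 100)
print(round(small, 3), round(large, 3))
```

Because every count and expected count is scaled by 10, the larger study's logLR is ten times that of the smaller one, so at the same rr it is much more likely to yield a small p value.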
Note that there are missing data issues in data collected from surveillance systems (such as delayed reporting, missing reports, and repeated reporting). There are also missing data issues in clinical trials. For example, patients may drop out before completing a study, so fewer adverse events are reported for them, and those patients may have worse disease status than patients who complete the study. In these situations, we observe only the adverse events that occur before drop-out and miss many adverse events after drop-out. We may also miss some reports for patients' missed visits in clinical trials. Some studies may have fewer missing values and some may have more. In data mining for safety signals, safety investigators usually ignore those missing events when analyzing for signals. Signals detected without considering missing data in a single study or in multiple studies may be biased. This may be a topic for future research.
Data Availability
The hypothetical data for PPI analysis and the Lipiodol data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.