A systematic strategy for estimating hERG block potency and its implications in a new cardiac safety paradigm.

INTRODUCTION
hERG block potency is widely used to calculate a drug's safety margin against its torsadogenic potential. Previous studies are confounded by use of different patch clamp electrophysiology protocols and a lack of statistical quantification of experimental variability. Since the new cardiac safety paradigm being discussed by the International Council for Harmonisation promotes a tighter integration of nonclinical and clinical data for torsadogenic risk assessment, a more systematic approach to estimate the hERG block potency and safety margin is needed.


METHODS
A cross-industry study was performed to collect hERG data on 28 drugs with known torsadogenic risk using a standardized experimental protocol. A Bayesian Hierarchical Modeling (BHM) approach was used to assess the hERG block potency of these drugs by quantifying both the inter-site and intra-site variability. A modeling and simulation study was also done to evaluate protocol-dependent changes in hERG potency estimates.


RESULTS
A systematic approach to estimate hERG block potency is established. The impact of choosing a safety margin threshold on torsadogenic risk evaluation is explored based on the posterior distributions of hERG potency estimated by this method. The modeling and simulation results suggest any potency estimate is specific to the protocol used.


DISCUSSION
This methodology can estimate hERG block potency specific to a given voltage protocol. The relationship between safety margin thresholds and torsadogenic risk predictivity suggests the threshold should be tailored to each specific context of use, and safety margin evaluation may need to be integrated with other information to form a more comprehensive risk assessment.


Introduction
The cardiac action potential is regulated by the electrical current flows of ions across cardiomyocyte membranes. Many drugs can bind to ion channels, block ionic flow and disrupt the regulation of the action potential, leading to a drug-induced arrhythmia, or "proarrhythmia" (Friedman and Stevenson, 1998). A particularly dangerous type of proarrhythmia is known as "Torsade de Pointes," or TdP, which is a rare ventricular tachycardia with a potential to cause sudden cardiac death (Roden, 2008). The ion channel of greatest interest to the identification of TdP risk is the Kv11.1 potassium channel, which is encoded by hERG (human ether-a-go-go related gene) and carries the rapidly activating delayed rectifier potassium current (I Kr ) (Vandenberg et al., 2012). Block of the hERG channel results in a reduction of I Kr and repolarization reserve (Roden, 1998), and in turn may lead to QT prolongation and TdP (Nachimuthu et al., 2012). Although I Kr is one of the most prominent repolarizing currents, other cardiac currents also contribute to repolarization (Roden, 1998). Based on this more comprehensive understanding of cardiac electrophysiology and cellular mechanisms of TdP, the Comprehensive in vitro Proarrhythmia Assay (CiPA) was proposed to integrate multi-ion channel pharmacology measured in vitro into experimentally-parameterized in silico models to assess TdP risk (Sager et al., 2014). The progress made by the CiPA Initiative and other similar projects worldwide have led to the formation of an International Council for Harmonisation (ICH) Implementation Working Group to develop Questions & Answers (Q&As) for ICH S7B (nonclinical) and E14 (clinical) guidelines (Questions and Answers, 2018).
This upcoming new international cardiac safety paradigm may facilitate the use of nonclinical data as part of an integrated risk assessment strategy to inform clinical decision making. Two types of nonclinical approaches have been used in cardiac safety assessment. One type focuses solely on quantifying block of the hERG channel, as this is the most common ionic basis for TdP (Redfern et al., 2003). The other uses a more comprehensive platform (such as in silico models with multi-ion channel in vitro data (Kramer et al., 2013;Abbasi et al., 2017), induced pluripotent stem cell (iPS)-derived cardiomyocytes (Ando et al., 2017), or in vivo/ex vivo systems (Champeroux et al., 2005)) to quantify the pharmacological effects on the cardiac system. While the latter can be regarded as proarrhythmia risk prediction models and typically produce a metric (such as a numerical score or qualitative classification) to predict the risk, the former usually try to identify a "safety margin" threshold, where safety margin is defined as the ratio of the half inhibitory concentration,IC 50 , for hERG inhibition to the maximum free therapeutic concentration, C max free , of the drug.
The assumption is that above the safety margin threshold, the compound is not likely to induce TdP (Redfern et al., 2003). There have been previous attempts to relate hERG safety margin to TdP risk (Wallis, 2010). However, several confounding factors make it difficult to interpret these past findings. First, these investigations (Redfern et al., 2003;De Bruin et al., 2005;Gintant, 2011;Webster et al., 2002) pooled together drug potency data from vastly different experimental conditions (voltage protocol, temperature, native vs. heterologous systems, etc.). As the results of in vitro ion channel patch clamp assays are sensitive to these conditions (Kirsch et al., 2004;Lee et al., 2019), the inconsistent data used in these studies renders their proposed safety margin thresholds of dubious validity. Second, some studies used clinical QTc prolongation as the endpoint (Gintant, 2011;Webster et al., 2002). Since the real concern is TdP, the use of a surrogate marker limits the use of the proposed safety margin threshold. Third, no uncertainty quantification was done in these studies, and all results are point estimates. The use of a single point estimate may be acceptable for early stage drug screening studies. For late-stage regulatory risk assessment, however, the uncertainty in the data must be accounted for. Lastly, all these studies defined safety margin using hERG IC 50 as the drug potency parameter. This prevented the use of such a strategy on drugs for which an accurate measurement of IC 50 is not possible, for instance due to solubility issues.
Under the CiPA Initiative, a systematic strategy was developed to address the above issues. A standardized voltage protocol was used in measurements of 28 CiPA drugs with known clinical TdP risk at several globally-distributed facilities ("sites"). All sites used high throughput (HTS) automated patch-clamp systems, though not necessarily the same type of devices. The data were generated in a blinded fashion by each site, and then collected for centralized de-blinding and data analysis. A rigorous statistical method was applied to account for inter-and intrasite variability. A method was also developed to calculate the block potency using not only IC 50 , but also lower inhibitory concentrations (such as IC 10 , IC 20 ) to accommodate for those drugs with solubility issue. Based on the estimated posterior distributions of hERG potency of the 28 drugs, the relationship between choosing a specific safety margin threshold and making an error in TdP risk classification (false positive and negative rates) is explored. Finally, a modeling and simulation study was used to highlight the fact that any hERG block potency estimation depends on the experimental temperature and the particular voltage protocol used.

Pharmacology and electrophysiology
The CiPA Initiative organized a panel of facilities with HTS systems to participate in a multi-site study coordinated by the Health and Environmental Sciences Institute (HESI). The study was conducted in two phases (phase 1 for 12 CiPA training drugs and phase 2 for 16 CiPA validation drugs) and not all sites participated in both phases. The 28 drugs were categorized into High, Intermediate, or Low TdP risk classes by a dedicated CiPA team (Colatsky et al., 2016). While training and validation data sets were needed for the development of CiPA in silico models Dutta et al., 2017;Li et al., 2017;Li et al., 2018), such a division is not needed here. Therefore, the two data sets were unified in this study. Dose-response data for several other ion channels and physiological temperature was also collected as part of the study, but only the ambient temperature hERG data was analyzed here. The identities of the participating sites were masked by numerical indicators. A centralized procedure was taken where all 28 drugs were purchased and prepared into stock solutions by a single laboratory outside of the participating sites. The identities and concentrations of the stock solutions were masked. Each site received a blinded aliquot of the stock solutions. Instructions were provided for making serial dilutions from each stock solution. The sites performed their hERG assays without knowing the identities or concentrations of the compounds. The unblinded drug names and concentrations, along with the measured block percentage for each cell, are available along with this publication at https://github.com/FDA/CiPA/tree/System-atic_Strategy_hERG_Block_2020.
The sources of the 28 CiPA drugs are as follows. For Phase 1, all drugs were purchased from Millipore Sigma (St. Louis, MO), formulated in 100% DMSO, blinded and shipped frozen to collaborator testing sites. All stocks were prepared at 1000× concentration to be tested, to limit final DMSO exposure upon preparation of serial dilutions to 0.1% in each assay. For Phase 2, blinded drug powder with instruction for formulation of stock solutions was sent to the sites by the Chemotherapeutic Agents Repository of the National Cancer Institute and stored at −20°C until the day of testing.
The CiPA step-ramp protocol was used as the standard protocol across sites. It involves a pulse pattern, repeated every 5 s, consisting of a depolarization to 40 mV amplitude for a 500 ms duration, followed by a ramp (1.2 V/s) to −80 mV for 100 ms. The holding potential is −80 mV. Peak tail current is measured during the ramp. Site-specific parameters, such as cell lines, buffers, HTS platforms, can be found in Supplementary Materials. The details of the action potential waveform protocol can be found in a previous publication .

Bayesian hierarchical model to analyze multi-site data
We used a Bayesian hierarchical model (BHM) similar to that used by Johnstone et al. for analysis of ion channel dose-response data (Johnstone et al., 2016). The distribution of a logarithmic transformation of the IC 50 for the j th site (denoted by pIC 50 j ) across all sites was assumed to follow a logistic distribution. Similarly, the site-specific Hill coefficients (h j ) across all sites were assumed to follow a log-logistic distribution. The hyperparameters controlling these distributions (intersite variability), along with site-specific probability distributions of IC 50 (and Hill coefficients) within each site (intra-site variability), are estimated through a Markov Chain Monte Carlo (MCMC) process using the Metropolis-Hastings algorithm. In-house developed R scripts using the FME package (https://cran.r-project.org/web/packages/FME/index. html) were run on the FDA's High Performance Computing cluster to execute the algorithms. A discussion of the prior information, bounds on the parameters, and implementation details can be found in the Supplementary Methods.

Modeling and simulation method for predicting the effect different voltage protocols have on hERG block potency estimates
The hERG dynamic model previously developed for CiPA  was used to simulate drug effects on the hERG channel using various input voltage protocols. The drug-specific kinetic parameters were estimated by fitting to time-dependent fractional block data from a modified Milnes protocol (Milnes et al., 2010). A quality criterium was applied to select only those cells with less than 20% background current, where the background current was measured by applying 0.5 μM E-4031 to the cell at the end of each experiment. A bootstrapping procedure was used to quantify the uncertainty in the drug-specific kinetic binding parameters. This generated 2000 sets of kinetic binding parameters for each drug, which provides a numerical approximation of the true joint distribution . For each drug, the 2000 sets of parameters were used to perform 2000 simulations for a particular voltage protocol across 10 concentrations. The concentrations were chosen to span a wide range of block for each drug. For each drug and voltage protocol, this gave a simulated dose-response dataset of 2000 predicted block values for each of the ten different concentrations. The joint probability distribution of IC 50 s and Hill coefficients for each drug was found by applying Markov-chain Monte Carlo to the simulated dose-response data. The likelihood function assumed the mean block is normally distributed with the mean equal to the block predicted by the Hill equation, and error about the mean described by a drug-specific variance term. Hill coefficient was bounded between 0.5 and 2.0, which was the range obtained by previous investigators after examination a large amount of HTS dose-response data (see (Elkins et al., 2013) and Supplementary Methods for details). The raw patch clamp data obtained from the Milnes protocol, code, fitted hERG kinetic drug binding parameters, and simulated dose-response curves can be all found at https://github.com/FDA/CiPA/tree/ Systematic_Strategy_hERG_-Block_2020. We note to the reader that the hERG model uses an E max model to assume a saturating maximum drug effect . As a consequence, the Hill equation used to fit the simulated doseresponse data was slightly altered to account for a maximum block effect (B max ). This is similar to a modified Hill equation used to introduce an I max effect. Details can be found in the Supplementary Methods and in reference (Mistry, 2018). Of note the B max term was only used to estimate hERG potency from simulated dose-response dataset. It was not used when estimating hERG potency from real experimental data through BHM.

System-specific Hill for
Site 1 with intra-site variability System-specific IC50 for Site N with intra-site variability System-specific IC50 for Site 1 with intra-site variability Hyperparameters to control the distribution of system-specific Hills across sites (Inter-site variability)

Noise (observation error)
Hyperparameters to control the distribution of system-specific IC50s across sites (Inter-site variability) Hill coefficient prior information Noise prior information IC50 prior information Fig. 1. The Bayesian Hierarchical Modeling (BHM) structure to quantitate both inter-and intra-site variability in multi-site hERG assay data. A diagram depicting the structure of the BHM model to infer distributions of statistical parameters that give rise to the observed experimental data. The blue box at the top is the "prior information," which is prior knowledge or assumptions we have about the experimental systems. The green box corresponds to the distribution of hyperparameters that control the system-specific IC 50 s and Hill coefficients across sites (inter-site variability). Similar to Johnstone et al. (Johnstone et al., 2016), we assume IC 50 s and Hill coefficients for the same drug follow two distinct distributions across sites and hence are governed by two independent sets of hyperparameters (see Supplementary Methods), although it is possible that there is some correlation between IC 50 s and Hill coefficients for the same drug across sites. The red box corresponds to the distribution of site/system-specific parameters (IC 50 s and Hill coefficients) within each site (intra-site variability). Note that each site has its own distribution of IC 50 s and Hill coefficients. Site 1 and Site N (the last site) were shown with other sites being represented by ellipsis. The purple box at the bottom is the set of all experimental observations provided by all sides. Of note, the prior information for IC 50 and Hill coefficient was deduced by following the approach of Johnstone et al. (Johnstone et al., 2016) using HTS screening data with a large number of repeats (Elkins et al., 2013) (see Supplementary Methods for details). In addition, for Hill coefficients we set a boundary between 0.5 and 2.0, after examining HTS screening data with large numbers of repeats (see Supplementary Methods for details). For the prior information of measurement error or system noise, we used a uniform distribution for all sites, although in theory prior information about system noise can be obtained for each site and used to further constrain the parameters. One of the hyperparameters in the green box (the location parameter μ, see Supplementary Methods) corresponds to the mean of the IC 50 distribution across sites. The probability distribution of μ reflects our uncertainty in estimating the mean hERG block potency across sites, and will be used as each drug's IC50 distribution to calculate the safety margin distribution across sites. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Multi-site hERG data from HTS using the CiPA step-ramp protocol
An initial examination of the multi-site hERG data using the same standard voltage protocol suggests there are significant intra-site and inter-site variabilities (Fig. S1 in Supplementary Materials). Theoretically, intra-site variability may be mainly attributed to the inherent randomness of the measurement (random measurement error from the system and random variation of electrophysiological properties from cell to cell) while inter-site variability may largely stem from systematic differences between sites (different platforms, cell lines, internal quality control, etc.). To account for both types of variability, we adopted a Bayesian hierarchical modeling (BHM) approach (Johnstone et al., 2016). The general idea behind BHM is to assume that, for any drug, each site has its own specific hERG IC 50 and Hill coefficients, which are determined by the specific configuration of the experimental system at that particular site. The variation of these system-specific (or site-specific) IC 50 s and Hill coefficients across sites can be modeled as a "higher level" distribution to quantify inter-site variability. Within each individual site, the system-specific IC 50 and Hill coefficient distributions for each drug determine a "lower level" uncertainty in dose-response relationships that characterize intra-site variability. The hyperparameters that describe inter-site variability and the site-specific parameters that describe intra-site variability can be estimated from the multi-site data, as shown as a diagram in Fig. 1. By applying this BHM modeling strategy to the multi-site data, we derived joint distributions of the site-specific parameters and hyperparameters for each of the 28 drugs. One of the hyperparameters μ (see Supplementary Methods), which corresponds to the mean of the estimated IC 50 distribution across sites for each drug, is shown in Table 1.
Drugs with poor solubility or weakly interact with the hERG channel often cannot attain close to 50% block in experimental practice. This makes IC 50 for such drugs difficult to estimate. To remedy this problem, IC 20 has been proposed as a substitute for IC 50 to quantify hERG block potency (Redfern et al., 2003;Wallis, 2010). Our proposed method allows the calculation of low inhibitory concentrations (IC 10 , IC 20 , etc.) after taking into consideration the experimental variability (Table 1 and Supplementary Methods).
The high sensitivity of the hERG assay can be utilized to define a safety margin threshold to minimize the likelihood of hERG block mediated TdP risk, such as the study by Redfern et al. (Redfern et al., 2003)) to define a threshold of 30 by finding an upper bound of IC 50 / C max free among drugs of considerable TdP liability. A caveat of this approach is that any defined threshold suffers from false positive and false negative rates, and previous studies focused on achieving high sensitivity (low false negative rate) without considering a desired high specificity (low false positive rate). We reasoned that different context of use might place different weights on the tolerability of false positives and false negatives, which would motivate using different thresholds. Accordingly, the relationship between any chosen safety margin threshold and the rates of false positive (probability of low TdP risk drugs having the safety margin below the threshold) and false negative (probability of high or intermediate TdP risk drugs having the safety margin above the threshold) is explored based on the posterior probability distributions of the 28 drugs' hERG potency divided by their C max free Li et al., 2018) values ( Fig. 2

and Supplementary
Methods). We observe the expected trend of a decreasing false negative rate and an increasing false positive rate with increases in threshold.
For example, at a threshold of 300, the false negative rate is 2%, but the false positive rate is 67%. When the threshold is moved to 30, the two error rates are closer to each other (false negative and positive rates 27% and 33% respectively). A similar plot is given in Fig. 3 using IC 20 / C max free as safety margin.

Protocol-dependent hERG block potency estimations illustrated by modeling and simulation studies
One major confounding factor in the hERG potency estimation, which is relatively well controlled in this but not in previous studies, is protocol-dependent changes of IC 50 s. It has been well established that the voltage protocol and temperature have significant impact on the hERG assay results (Kirsch et al., 2004;Lee et al., 2019). But no systematic study has been performed to investigate the effect of different voltage protocols on hERG potency estimation. As experimental profiling of such changes across many drugs and different protocols is time consuming, we took advantage of an in silico hERG model that has been parameterized by a dynamic protocol to estimate drug binding kinetic parameters . By using this model, we were able to simulate three distinct voltage protocols to predict what potency values we might get from these protocols across all 28 CiPA drugs (Fig. 4). As the hERG model was parameterized by dynamic data collected at physiological temperature by a single laboratory, the predicted IC 50 s could be regarded as theoretical system-specific IC 50 s at 37°C with intra-site variability from a single site. The three protocols were picked because they represent the standard CiPA hERG step-ramp protocol (0.2 Hz) (Li et al., 2018), a slower CiPA step-ramp protocol (0.03 Hz), and an action potential wave form protocol to mimic bradycardia (0.5 Hz) . Before using the model for all drugs, a preliminary experiment suggests our model can predict frequency-dependent drug block reasonably well ( Supplementary Fig. S2).
A comparison of the predicted IC 50 s across the three protocols for all CiPA drugs is shown in Table 2. As expected, different drugs have different change of potency in response to protocol or frequency changes. With the same CiPA protocol, some drugs that are trapped within the closed hERG channel during protocol intervals (e.g. bepridil (Li et al., The hERG assay data for 28 CiPA drugs across multiple sites using high throughput automated patch clamp systems were collected and subjected to a BHM as depicted in Fig. 1 2017)) have essentially the same IC 50 s when varying the frequency from 0.2 Hz to 0.03 Hz. Some highly trapped drugs (e.g. dofetilide ) even have higher block potency (lower IC 50 ) at slower (0.03 Hz) compared to higher (0.2 Hz) frequency for the same CiPA protocol, illustrating the so called reverse frequency dependent blocking as previously reported (Thomas et al., 2003). By contrast, some less trapped drugs, such as cisapride , have frequency dependent blocking as their hERG block potency is higher (the IC 50 is lower) at higher frequency. Other than the tendency of trapping, drug binding rate also plays a role. For instance, terfenadine and ondansetron are two drugs with similar tendency of trapping, and yet quite different binding rates . The slower-binder terfenadine has an over two-fold decrease in block potency (increase in IC 50 s) when the CiPA step-ramp protocol frequency is decreased from 0.2 Hz to 0.03 Hz. In contrast the fast-binder ondansetron has almost the same level of block potency when frequencies change. These results show that hERG potency estimates depend strongly on the applied voltage protocol. Therefore, any proposed safety margin only has scientific validity when it is in reference to a specific voltage protocol.

Discussion
In this study, we developed a systematic approach to estimate hERG block potency for TdP liability assessment. This strategy aims at addressing four issues associated with previous studies: the use of heterogenous experimental conditions (voltage protocols and temperatures), the ambiguity in the endpoint (TdP vs QTc prolongation), the lack of uncertainty quantification, and the inability to cover those drugs that are difficult to estimate IC 50 s due to various reasons such as solubility.
The standardization of experimental protocols was performed as part of a cross-industry HTS automated patch clamp study coordinated by HESI under the CiPA Initiative. The step-ramp protocol was chosen as it is a simple approximation of the shape of a cardiac action potential. Some physiological temperature data were also collected by the HESI study. As most of the data were at ambient temperature, we decided to focus on room temperature data in this project to maximize the coverage of participating sites for a more comprehensive understanding of inter-site variability. To make all sites' experimental conditions as close as possible, not only the voltage protocol but also the sources and concentrations of each compound are "standardized" (stock solutions centrally prepared and then distributed) across sites. A blinding procedure was also implemented to ensure an objective application of the hERG assays at each site. The collected multi-site data are an important resource to investigate "why" experimental variabilities exist and identify the most important underlying factors, with the goal of reducing lab-to-lab variability for future hERG assays. On the other hand, the collected data also provide a resource to study "how" to use current hERG assay data across industry, for example to define a safety margin threshold after considering intra-and inter-site variability. While a manuscript for the former is being prepared, this document represents our effort towards the latter application of the data.
The Bayesian hierarchical modeling approach allows us to quantify intra-site and inter-site variability in a sound statistical framework. A supplementary approach was also developed to define hERG block potency using low inhibitory concentrations (IC 20 etc.) to accommodate those drugs with poor solubility or not enough block at highest concentrations. It should be noted that it may be difficult to calculate a reasonable estimate of IC x if the drug in experimental practice cannot achieve a block of at least x. For example, to calculate IC 20 accurately Fig. 2. Relationship between the choice of a safety margin (IC 50 /C max free ) threshold and the false positive and false negative rates for TdP risk classification. X axis: Any chosen safety margin threshold. Y axis: the false negative (red) and false positive (blue) rates associated with each safety margin threshold. The false positive rate is defined as the probability that a low TdP risk drug will have a safety margin below the threshold. The false negative rate is the probability that an intermediate-risk or high-risk drug will have a safety margin above the threshold. Please see Supplementary Methods for details. All probabilities are based on the posterior probability distributions of hERG potency (IC 50 ) of the 28 drugs divided by corresponding C max free . Three exemplar thresholds and their associated false positive/negative rates are labeled: A threshold of 300 with very high sensitivity (very low false negative rate) and low specificity (high false positive rate), previously proposed threshold of 45 by Gintant et al. (Gintant, 2011), and the threshold of 30 by Redfern et al. (Redfern et al., 2003). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) B.J. Ridder, et al. Toxicology and Applied Pharmacology 394 (2020) 114961 the drug will need to achieve at least 20% block in the hERG assay at highest tested concentrations. Based on the posterior probability distributions of hERG potency (IC 50 or IC 20 ) values as for the 28 drugs derived from BHM, we explored the relationship between safety margin threshold, the false positive rate, and the false negative rate for predicting TdP risk. Similar to previous studies (Redfern et al., 2003;Gintant, 2011) we found that no threshold could achieve both a low false positive rate (high specificity) and low false negative rate (high sensitivity). With a higher threshold (e.g. 300), the high sensitivity (0.98) and low specificity (0.33) give a positive likelihood ratio of 1.5 (a torsadogenic drug is 1.5 times more likely to be classified as torsadogenic compared to a non-torsadogenic drug), and a negative likelihood ratio of 1/16.5 (a non-torsadogenic drug is 16.5 times more likely to be classified as non-torsadogenic compared to a torsadogenic drug), respectively. The lower threshold (e.g. 30) gives a positive likelihood ratio of 2.2, and a negative likelihood ratio of 1/2.5, respectively. The classification performance from either safety margin threshold is lower than the comprehensive in silico model integrating multi-channel pharmacology (Li et al., 2018). This is because the safety margin strategy only considers the hERG channel, even though the interplay between multiple ion channels is needed to regulate the action potential. This necessitates the need for comprehensive proarrhythmia risk prediction models that can properly account for the coupled effect of multiple cardiac ion channels. However, under specific context of use, the safety margin strategy could still be utilized. For example, during relatively early drug screening, a lower threshold with a balanced tradeoff between the false positive and false negative rates could quickly screen out many high-risk drugs to produce a "short list" of promising candidates. At later development stage, a higher threshold (lower false negative rate/higher sensitivity) could be applied to determine which drugs have low probability of hERG blockmediated TdP liability, and which ones may have enough concern and warrant the use of more comprehensive models for accurate TdP risk assessment.
There are notable limitations to our study. First, although we considered experimental variability of estimating hERG block potency in our multi-site study, no uncertainty was associated with C max free when calculating the safety margin. This is due to the fact that there are very few studies to establish methods to quantify the uncertainty in the estimation of plasma protein binding, a main factor behind the variability of C max free (Wang et al., 2014). Nevertheless, some factors of pharmacokinetic uncertainty did go into our calculation of C max free . For example, for terfenadine we used a median level of plasma concentration after cytochrome P450 inhibition (Redfern et al., 2003) to account for drug-drug interactions that may be responsible for terfenadine-mediated TdP events. However, not all drugs in this study have enough data to consider drug-drug interactions. In addition, it is known that the metabolite of risperidone, paliperidone, prolongs the QTc interval in a concentration-dependent manner (Suzuki et al., 2012) and that concentrations of this metabolite need to be taken into account for both efficacy and cardiac repolarization. This is analogous to taking in to account the increased exposure of terfenadine occurring in the presence of a metabolic inhibitor, as done in this study. This emphasizes the need to have a good understanding of the distribution, metabolism and  B.J. Ridder, et al. Toxicology and Applied Pharmacology 394 (2020) 114961 elimination of the compounds and any other contributing ion channel effects from metabolites. A second limitation is that we used model-predicted, rather than experimentally measured, hERG block potency across protocols to highlight protocol-dependent changes in IC 50 s. We did not attempt to compare the predicted IC 50 s from our hERG model to experimentally   to quantify the uncertainty in the dose response curves and generate a credible interval for IC 50 and Hill coefficients (panel E). The CiPA hERG model parameterized by Milnes protocol data collected at physiological temperature was used to simulate three protocols and predict dose-response curves for 28 CiPA drugs across multiple concentrations. An uncertainty quantification procedure similar to BHM was used to estimate IC 50 s, but with only intra-site variability considered. The 2.5% quantile, 50% quantile, and 97.5% quantiles forming the 95% credible intervals (CI) of IC 50 s for all drugs are shown across the three protocols. The unit for all IC 50 s is nM. Note that as 2000 simulated cells were used per concentration, the estimated IC 50 s generally have lower uncertainty than the variation with protocol dependency. Ramp protocol: CiPA step-ramp protocol. AP protocol: Action potential wave-form protocol. The rationale of selecting these protocols can be found in the Main Text.
measured IC 50 s using the same protocols in the literature, as it will be difficult to dissect the observed differences into intra-site variability and inaccuracy of the model. However, comparing the predicted and observed frequency-dependent hERG block data from the same lab suggested that our prediction may have captured the hERG dose-response reasonably well. On the other hand, some drugs were predicted by our hERG model to have reverse frequency dependency in hERG block potency (IC 50 s are lower at 0.03 Hz compared to 0.2 Hz with the CiPA step-ramp protocol). Although reverse use dependency on AP prolongation is often observed at the cellular level for many hERG blockers (Barandi et al., 2010;Jurkiewicz and Sanguinetti, 1993;Virag et al., 2009), at the channel level drugs are usually observed to have either (forward) frequency dependent or frequency independent block (Stork et al., 2007). Reverse frequency dependency at the channel level is theoretically possible when drugs can bind to the closed hERG channel or the tendency to be trapped within the closed hERG channel is excessive, but this phenomenon is only occasionally observed for few drugs (Thomas et al., 2003). It is unknown whether our hERG model's prediction of more widespread channel-level reverse frequency dependency is due to the protocol selection for the simulation, or theoretical IC 50 s with less uncertainty and higher resolution to identify subtle difference, or inaccuracy of the model by overestimating the trapping tendency of some hERG blockers. However, this potential discrepancy does not interfere with the model's ability to highlight the pattern of protocol-dependent IC 50 changes. The last and the most important limitation is that, block potency estimations are dependent on data quality and other experimental conditions as well. The HESI multisite study had a standardized experimental protocol, but no unified quality control criteria. This may affect IC 50 estimates, and subsequently the estimation of the threshold. In addition, a number of experimental conditions were not standardized, and their impact on hERG potency estimation is unknown. For instance, different plate types or reservoir materials were used in this study (polypropylene vs glasscoated plates, Teflon reservoirs, glass vials etc.). And while all sites used heterologous cell lines to express human hERG channels, different cell types (CHO or HEK293) and biological properties (e.g. channel density) might also contribute to cross-site variability. Our BHM method quantifies the overall inter-lab variabilities from all sources and does not require any specific experimental conditions to be used by different labs (such as heterologous vs iPS or native cardiomyocytes), as long as the underlying experimental protocol is appropriately designed for block potency estimation and standardized across labs. However, understanding the impact of these experimental conditions on potency estimation is essential for future hERG assays.
With the proposed establishment of best practice and quality standards for in vitro ion channel studies by the ICH S7B/E14 Q&A process (Questions and Answers, 2018), it is hopeful that in the future similar multi-site studies with not only standard protocol but also unified quality control criteria and other important experimental conditions will be conducted, and then our systematic strategy can be re-applied to the new data to update the hERG potency estimation and re-define the safety margin.

Disclaimer
This report is not an official US Food and Drug Administration guidance or policy statement. No official support or endorsement by the US Food and Drug Administration is intended or should be inferred.

Declaration of Competing Interest
The authors declared no conflict of interests.