Research Article
Reanalysis

Revisiting inconsistency in large pharmacogenomic studies

[version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]
PUBLISHED 16 Sep 2016

This article is included in the Preclinical Reproducibility and Robustness gateway.

Abstract

In 2013, we published a comparative analysis of mutation and gene expression profiles and drug sensitivity measurements for 15 drugs characterized in the 471 cancer cell lines screened in both the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). While we found good concordance in gene expression profiles, there was substantial inconsistency in the drug responses reported by the GDSC and CCLE projects. We received extensive feedback on the comparisons that we performed. This feedback, along with the release of new data, prompted us to revisit our initial analysis. Here we present a new analysis using these expanded data in which we address the most significant suggestions for improvements on our published analysis: that targeted therapies and broad cytotoxic drugs should have been treated differently in assessing consistency, that consistency of both molecular profiles and drug sensitivity measurements should be compared across cell lines, and that the software analysis tools we provided should have been easier to run, particularly as the GDSC and CCLE released additional data.

Our re-analysis supports our previous finding that gene expression data are significantly more consistent than drug sensitivity measurements. The use of new statistics to assess data consistency allowed us to identify two broad effect drugs and three targeted drugs with moderate to good consistency in drug sensitivity data between GDSC and CCLE. For three other targeted drugs, there were not enough sensitive cell lines to assess the consistency of the pharmacological profiles. We found evidence of inconsistencies in pharmacological phenotypes for the remaining eight drugs.

Overall, our findings suggest that the drug sensitivity data in GDSC and CCLE continue to present challenges for robust biomarker discovery. This re-analysis provides additional support for the argument that experimental standardization and validation of pharmacogenomic response will be necessary to advance the broad use of large pharmacogenomic screens.

Keywords

drug sensitivity, cancer, pharmacogenomics, consistency, pharmacogenomic agreement

Box 1. Summary box

In 2013 we reported inconsistency in the drug sensitivity phenotypes measured by the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE) studies. Here we revisit that analysis and address a number of potential concerns raised about our initial methodology:

  • Different drugs should be compared based on the observed pattern of response. To address this concern, we considered drugs falling into three classes: (1) drugs with no observed activity in any of the cell lines; (2) drugs with sensitivity observed for only a small subset of cell lines; and (3) drugs producing a response in a large number of cell lines. For each class, we assessed the correlation in drug response between studies using a variety of metrics, selecting the metric that performed best in each individual comparison. No metric identified any substantial consistency for the first class (sorafenib, erlotinib, and PHA-665752) because these drugs showed no activity. A judicious choice of metric found high consistency for three of the eight highly targeted therapies in the second class (nilotinib, crizotinib, and PLX4720), but no metric found better than moderate correlation for two of the four broad effect drugs in the third class (PD-0325901 and 17-AAG).

  • Measure of consistency for targeted drugs. Beyond considering drug response profiles, targeted drugs should be treated differently when assessing consistency. We used six different statistics to test consistency, using both continuous and discretized drug sensitivity data. We confirmed that Spearman rank correlation, used in our 2013 study, does not detect consistency for the three highly targeted therapies profiled by GDSC and CCLE. Other statistics, such as Somers' Dxy or Matthews correlation coefficient, yielded moderate to high consistency for specific drugs, but there was no single metric that found good consistency for each of the targeted drugs.

  • Consistency of molecular profiles across cell lines. In our initial published analysis, we reported correlations based on comparing drug response “across cell lines” while gene expression levels were compared “between cell lines.” It has been suggested it would be more appropriate to compute correlations “across cell lines” for both molecular and pharmacological data. Here we report a number of statistical measures of consistency for both gene expression and drug response compared across cell lines and confirm our initial finding that gene expression is significantly more consistent than the reported drug phenotypes.

  • Some published biomarkers are reproducible between studies. In our initial comparative study we found that the majority of known biomarkers predictive of drug response are reproducible across studies. We extended the list of known biomarkers and found that seven out of 11 are significant in both GDSC and CCLE. While one can find such anecdotal examples, they do not lead to a general process for discovering a new biomarker in one study that can be applied to another study.

  • Research reproducibility. The code we provided with our original paper was incompatible with updated releases of the GDSC and CCLE datasets. We developed PharmacoGx, which is a flexible, open-source software package based on the statistical language R, and used it to derive the results reported here.

Introduction

The goal of precision medicine is the identification of the best therapy for each patient and their own unique manifestation of a disease. This is particularly important in oncology, where multiple cytotoxic and targeted drugs are available but their therapeutic benefits are often insufficient or limited to a subset of cancer patients. Large-scale pharmacogenomic studies, in which experimental and approved drugs are screened against panels of molecularly characterized cancer cell lines, have been proposed as a means for identifying drugs effective against specific cancers and for developing genomic biomarkers predictive of drug response. The Genomics of Drug Sensitivity in Cancer project (GDSC, referred to as the Cancer Genome Project [CGP] in our initial study)1 and the Cancer Cell Line Encyclopedia (CCLE)2 have each reported results of such screens, providing data on drug sensitivities and molecular profiles for collections of representative cancer cell lines.

Presented with these two large studies, our hope was that we could use the data to identify new molecular biomarkers of drug response in one study that would predict response in the second. We3 and others4–6 reported difficulties in building and validating biomarkers of response using the GDSC and CCLE datasets, even when the analysis was limited to the drugs and cell lines screened in both studies. To understand the cause of this failure, we compared the gene expression profiles and the drug response data reported by the GDSC and CCLE7,8. We found that, although the gene expression data showed reasonable consistency between the two studies, the drug sensitivity measurements were surprisingly inconsistent. This inconsistency can be clearly seen by plotting drug response reported for each of the 15 drugs provided in both GDSC and CCLE for the 471 cell lines assayed by both studies7–10. Since the publication of our comparative analysis, we have received a great deal of constructive feedback from the scientific community regarding multiple aspects of the analysis we reported, including suggestions for analytical methods that might uncover greater consistency between the studies. Moreover, both GDSC and CCLE have released new drug sensitivity and molecular profiling data, allowing us not only to revisit our initial analysis, but also to extend it using these new data.

To begin, we investigated alternative statistics to assess the inter-study consistency for drugs exhibiting different patterns of response across the collection of cell lines common to both studies. We then considered statistical methods for highly targeted drugs expected to be sensitive only in a subset of cell lines. We compared consistency estimates between continuous and discretized molecular features (gene expression, copy number variations and mutations) and drug sensitivity data, and importantly, assessed how potential discordance may affect the discovery of molecular features (biomarkers) predictive of drug response. We also revisited our analysis of consistency of molecular data between studies and evaluated “known biomarkers” of response expected to be predictive in these studies.

This extensive reanalysis found that by selecting specific statistical measures on a case-by-case basis, one can identify moderate to good consistency for two broad effect and three highly targeted therapies. However, overall, our results support our initial observations that drug sensitivity data in GDSC and CCLE are inconsistent for the majority of the drugs, even when considering metrics yielding the highest consistency for individual drugs. Our present analysis adds further evidence supporting the need for robust and standardized experimental pipelines to assure generation of comparable, biologically relevant measures of drug response as well as unbiased statistical and machine learning methods to better predict response. Failure to do so will continue to limit the potential for use of large-scale pharmacogenomic screens in reliable drug development and precision medicine applications.

Results

The overall analysis design of our study is represented in Figure 1.

Figure 1. Analysis design.

GDSC: Genomics of Drug Sensitivity in Cancer; AE: ArrayExpress; Cosmic: Catalogue of Somatic Mutations in Cancer; CGHub: Cancer Genomics Hub; CCLE: Cancer Cell Line Encyclopedia.

Intersection between GDSC and CCLE

To identify the largest set of cell lines and drugs profiled by both GDSC and CCLE, we used the PharmacoGx computational platform11, which can store, analyze, and compare curated pharmacogenomic datasets. We created curated datasets for the new releases of the GDSC (July 2015) and CCLE (February 2015) projects. The improved curation of new data using PharmacoGx11 identified 15 drugs in common between GDSC and CCLE as well as 698 cell lines, originating from 23 tissue types (Figure 2). This is the same number of shared drugs, but the updated datasets contain a larger number of common cell lines than the 471 reported in our previous analysis7.

Figure 2. Intersection between GDSC and CCLE.

Overlap of (A) drugs, (B) cell lines and (C) tissue types.

Comparing single nucleotide polymorphism (SNP) fingerprints

To check the accuracy of cell line name matching, we compared single nucleotide polymorphism (SNP) fingerprints using data released in both studies. We first controlled for the quality of the SNP arrays and excluded 11 of 1,396 profiles due to low quality (see Methods). We then compared SNP fingerprints of cell lines with identical names using > 80% as the threshold for concordance12,13. Consistent with the results reported by the CCLE2, the vast majority of cell lines had highly concordant fingerprints (462 out of 470 cell lines with SNP profiles available in both GDSC and CCLE; Dataset 1). We found eight cell lines with the same identifier but different SNP identity (Figure 3); these were removed from our subsequent analyses to avoid discrepancies due to the use of possibly mislabeled or contaminated cell lines.

Figure 3. SNP fingerprinting between cancer cell lines screened in GDSC and CCLE.

Estimation and filtering of drug dose-response curves

We used the viability measures for each drug concentration in GDSC and CCLE to fit dose-response curves and assess their quality. An important factor influencing the fitting of drug dose-response curves is the range of concentrations used for each cell line/drug combination. In CCLE, all dose-response curves were measured at eight concentrations: 2.5×10⁻³, 8×10⁻³, 2.5×10⁻², 8×10⁻², 2.5×10⁻¹, 8×10⁻¹, 2.5, and 8 μM. However, in GDSC response was measured at a different set of concentrations for each drug. The minimum concentrations for different drugs range from 3.125×10⁻⁵ to 15.625 μM. In each case, the concentrations tested by GDSC form a geometric sequence of nine terms with a common ratio of two between successive concentrations. Thus, the maximum concentration tested for each drug is 256 times the minimum concentration for that drug and ranges from 8×10⁻³ to 4000 μM.
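
The arithmetic behind these concentration ranges can be checked with a short Python sketch. It rebuilds a GDSC-style nine-point, two-fold dilution series from its minimum concentration and intersects it with the fixed CCLE series to obtain a common concentration range. The function names are illustrative only and are not taken from PharmacoGx.

```python
def gdsc_concentrations(min_conc_um, n_points=9, ratio=2.0):
    """Geometric dilution series (in µM) starting at min_conc_um."""
    return [min_conc_um * ratio ** k for k in range(n_points)]


# The eight CCLE concentrations quoted in the text (µM).
CCLE_CONCENTRATIONS_UM = [2.5e-3, 8e-3, 2.5e-2, 8e-2, 2.5e-1, 8e-1, 2.5, 8.0]


def common_range(series_a, series_b):
    """Overlap of two concentration ranges, or None if they do not overlap."""
    low = max(min(series_a), min(series_b))
    high = min(max(series_a), max(series_b))
    return (low, high) if low <= high else None


# Series for the smallest and largest minimum concentrations in the text.
lowest = gdsc_concentrations(3.125e-5)   # tops out near 8e-3 µM
highest = gdsc_concentrations(15.625)    # tops out at 4000 µM
```

Nine two-fold steps span a factor of 2⁸ = 256, which is why the maximum is 256 times the minimum for every drug.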

To properly fit drug dose-response curves, one must make multiple assumptions regarding the cell viability measurements generated by the pharmacological platform used in a given study. For instance, one assumes that viability ranges between 0% and 100% after data normalization and that consecutive viability measurements remain stable or decrease monotonically, reflecting response to the drug being tested. Quality controls were implemented to flag dose-response curves that strongly violate these assumptions (Supplementary Methods). We identified 2315 (2.9%) and 123 (1%) dose-response curves that failed to pass in GDSC and CCLE, respectively, as exemplified in Figure 4 (all noisy curves are provided in Supplementary File 1). We excluded these cases to avoid erroneous curve fitting.

Figure 4. Examples of noisy drug dose-response curves identified during the filtering process in GDSC and CCLE.

The grey area represents the common concentration range between studies. (A) JNS-62 cell line treated with 17-AAG; (B) LS-513 cell line treated with nutlin-3; (C) HCC70 cell line treated with PD-0332991; and (D) EFM-19 cell line treated with PD-0325901.

We used least squares optimization to fit a three-parameter sigmoid model (Methods) for the drug dose-response curves in GDSC and CCLE (Supplementary File 2). For each fitted curve, we computed the two most widely used drug activity metrics, namely the area under the curve (AUC) and the drug concentration required to inhibit 50% of cell viability (IC50).
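
To make the two summary metrics concrete, here is a minimal Python sketch of how AUC and IC50 could be derived once a curve has been fitted. The exact model is given in the Methods and is not reproduced in this section, so the Hill-type form below (parameters `e_inf`, `ec50`, `hill_slope`) is an assumption, as are the function names. The sketch does not perform the least squares fit itself. AUC is computed as the normalized area above the viability curve on a log10 concentration axis, so that larger values mean greater sensitivity, matching the AUC thresholds used later in the text.

```python
import math


def hill_viability(conc, e_inf, ec50, hill_slope):
    """Viability fraction at `conc` under the ASSUMED three-parameter model."""
    return e_inf + (1.0 - e_inf) / (1.0 + (conc / ec50) ** hill_slope)


def ic50(e_inf, ec50, hill_slope):
    """Concentration where viability equals 0.5 (infinite if never reached)."""
    if e_inf >= 0.5:
        return math.inf
    return ec50 * (0.5 / (0.5 - e_inf)) ** (1.0 / hill_slope)


def auc(e_inf, ec50, hill_slope, conc_min, conc_max, n_steps=1000):
    """Normalized area above the viability curve (trapezoid rule, log10 axis)."""
    lo, hi = math.log10(conc_min), math.log10(conc_max)
    step = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        left = 1.0 - hill_viability(10 ** (lo + i * step), e_inf, ec50, hill_slope)
        right = 1.0 - hill_viability(10 ** (lo + (i + 1) * step), e_inf, ec50, hill_slope)
        total += 0.5 * (left + right) * step
    return total / (hi - lo)
```

Under this assumed form, a curve whose lower plateau `e_inf` stays at or above 50% viability has no finite IC50, which is one reason IC50 values are often truncated to the maximum tested concentration.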

Consistency of drug sensitivity data

We began by computing the area between the two drug dose-response curves (ABC) to assess consistency of cell viability data for each drug-cell line combination screened in both GDSC and CCLE using the common concentration range. ABC measures the difference between two drug dose-response curves by estimating the absolute area between these curves, which ranges from 0% (perfect consistency) to 100% (perfect inconsistency). The ABC statistic identified highly consistent (Figure 5A, B) and highly inconsistent (Figure 5C, D) dose-response curves between GDSC and CCLE. The mean of the ABC estimates for all drug-cell line combinations was 10% (Supplementary Figure 1A), with paclitaxel yielding the highest discrepancies (Supplementary Figure 1B).
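
The ABC idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used for the reported results: the curve form (a Hill-type model with parameters `e_inf`, `ec50`, `hill_slope`), the integration on a log10 concentration axis, and all function names are hypothetical choices made for this example.

```python
import math


def viability(conc, e_inf, ec50, hill_slope):
    """Viability fraction under an ASSUMED three-parameter Hill-type model."""
    return e_inf + (1.0 - e_inf) / (1.0 + (conc / ec50) ** hill_slope)


def area_between_curves(params_a, params_b, range_a, range_b, n_steps=1000):
    """Mean absolute viability difference (in %) over the common range.

    params_* are (e_inf, ec50, hill_slope); range_* are (min, max) in µM.
    """
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    if low >= high:
        raise ValueError("the two concentration ranges do not overlap")
    log_lo, log_hi = math.log10(low), math.log10(high)
    step = (log_hi - log_lo) / n_steps

    def gap(log_conc):
        conc = 10 ** log_conc
        return abs(viability(conc, *params_a) - viability(conc, *params_b))

    total = 0.0
    for i in range(n_steps):
        # Trapezoid rule on the absolute difference between the curves.
        total += 0.5 * (gap(log_lo + i * step) + gap(log_lo + (i + 1) * step)) * step
    return 100.0 * total / (log_hi - log_lo)
```

Dividing by the width of the common range keeps the statistic between 0% and 100% regardless of how wide the overlap between the two studies is.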

Figure 5.

Examples of (A,B) consistent and (C,D) inconsistent drug dose-response curves in GDSC and CCLE. The grey area represents the common concentration range between studies. (A) COLO-320-HSR cell line treated with AZD6244; (B) HT-29 cell line treated with PLX4720; (C) CAL-85-1 cell line treated with 17-AAG; and (D) HT-1080 cell line treated with PD-0332991.

We compared biological replicates in GDSC, which were performed independently at the Massachusetts General Hospital (MGH) and the Wellcome Trust Sanger Institute (WTSI). These experiments comprise 577 cell lines treated with AZD6482, a PI3Kβ inhibitor screened in GDSC (Supplementary File 3). We computed the ABC of these biological replicates and observed both highly consistent and inconsistent cases (Supplementary Figure 2). We then computed the median ABC values for each pair of drugs in GDSC and used these as a distance metric for complete linkage hierarchical clustering. We found that the MGH- and WTSI-administered AZD6482 experiments clustered together, suggesting that the differences between dose-response curves of biological replicates were smaller than the differences observed between different drugs (Supplementary Figure 3A). We performed the same clustering analysis by computing the ABC-based distance between all the drugs in GDSC and CCLE and observed that only three out of the 15 common drugs clustered tightly (17-AAG, lapatinib, and PHA-665752; Supplementary Figure 3B). Despite the small number of cell lines exhibiting sensitivity to PHA-665752 and lapatinib, these drugs closely clustered between GDSC and CCLE; however, this was not the case for other highly targeted therapies, such as AZD0530, nilotinib, crizotinib and TAE684 (Supplementary Figure 3B).

Although the ABC values provide a measure of the degree of consistency between studies, it is the AUC and IC50 estimates, and their correlation with molecular features (such as mutational status and gene expression) that are commonly used to assess drug response. Therefore we revisited our comparative analysis of the drug sensitivity data using the expanded data and the standardized methods implemented in our PharmacoGx platform. Using the same three-parameter sigmoid model to fit drug dose-response curves in GDSC and CCLE (see Methods), we recomputed AUC and IC50 values and observed very high correlation between published and recomputed drug sensitivity values for each study individually (Spearman > 0.93; Figure 6; Dataset 2).

Figure 6. Comparison between published and recomputed drug sensitivity values in GDSC and CCLE.

(A) AUC in GDSC; (B) AUC in CCLE; (C) IC50 in GDSC; and (D) IC50 in CCLE. SCC stands for Spearman correlation coefficient.

It has been suggested that some of the observed inconsistencies between the GDSC and CCLE may be due to the nature of targeted therapies, which are expected to have selective activity against some cell lines10,14,15. This is a reasonable assumption as the measured response in insensitive cell lines may represent random technical noise that one should not expect to be correlated between experiments. We therefore decided to clearly discriminate between highly targeted drugs with narrow growth inhibition effects and drugs with broader effects. We used the full GDSC and CCLE datasets to compare the variation of the drug sensitivity data of known targeted and cytotoxic therapies as classified in the original studies (Supplementary Figure 4). We observed that drugs can be classified in these two categories based on median absolute deviation (MAD) of the estimated AUC values (Youden’s optimal cutoff16 of AUC MAD > 0.13 for cytotoxic drugs). We then used this cutoff on the common drug-cell line combinations in GDSC and CCLE to define three classes of drugs (Supplementary Figure 5):

  • No effect: Drugs with minimal observed activity (typically fewer than five sensitive cell lines with AUC > 0.2 or IC50 < 1 µM in either study). This class includes sorafenib, erlotinib and PHA-665752.

  • Narrow effect: Highly targeted drugs with activity observed for only a small subset of cell lines (AUC MAD ≤ 0.13). This group includes nilotinib, lapatinib, nutlin-3, PLX4720, crizotinib, PD-0332991, AZD0530, and TAE684.

  • Broad effect: Drugs producing a response in a large number of cell lines (AUC MAD > 0.13). This includes AZD6244, PD-0325901, 17-AAG and paclitaxel.
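
The three-way classification above can be sketched in Python as follows. The cutoffs (AUC MAD of 0.13, AUC > 0.2, five sensitive cell lines) come from the text; everything else is an assumption made for illustration, including the function names, the pooling of AUC values from both studies, the omission of the IC50 criterion, and the use of an unscaled MAD (R's `mad()` multiplies by 1.4826 by default, and the text does not say which convention was used).

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (an assumption; see lead-in)."""
    values = list(values)
    center = median(values)
    return median([abs(v - center) for v in values])


def classify_drug(auc_a, auc_b, mad_cutoff=0.13, sensitive_auc=0.2, min_sensitive=5):
    """Assign a drug to 'no effect', 'narrow effect' or 'broad effect'.

    auc_a and auc_b hold the AUC of the same cell lines in the two studies.
    """
    # A cell line counts as sensitive if either study calls it sensitive.
    n_sensitive = sum(
        1 for a, b in zip(auc_a, auc_b) if a > sensitive_auc or b > sensitive_auc
    )
    if n_sensitive < min_sensitive:
        return "no effect"
    pooled = list(auc_a) + list(auc_b)
    return "broad effect" if mad(pooled) > mad_cutoff else "narrow effect"
```

For a targeted drug most AUC values sit near zero, so the MAD stays small even when a handful of cell lines respond strongly, which is what separates the narrow and broad classes.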

We then compared the AUC (Figure 7, Supplementary Figure 6 and Supplementary Figure 7 for published AUC, recomputed AUC and AUC computed based on the common concentration range, respectively) and IC50 (Supplementary Figure 8 and Supplementary Figure 9) values and calculated the consistency of drug sensitivity data between studies using all common cases and only those that the data suggested were sensitive in at least one study (Figure 8 and Supplementary Figure 10 for AUC and IC50, respectively, and Dataset 3). Given that no single metric can capture all forms of consistency, we extended our previous study by using the Pearson correlation17, Spearman18, and Somers' Dxy19 rank correlation coefficients to quantify the consistency of continuous drug sensitivity measurements across studies (see Methods).

Figure 7. Comparison of AUC values as published in GDSC and CCLE.

For cytotoxic drugs (paclitaxel), cell lines with AUC < 0.4 were considered as insensitive, while for targeted therapies cell lines with AUC < 0.2 were considered insensitive (grey dashed lines). In case of perfect consistency, all points would lie on the grey diagonal.

As expected, no consistency was observed for drugs with “no effect” (Figure 8A). For AUC of drugs with narrow and broad effects, Somers' Dxy was the most stringent, with consistency estimated to be < 0.4 except for two drugs (PD-0325901 and 17-AAG), which were also the two drugs identified as the most consistent using Spearman correlation (ρ ~ 0.6; Figure 8A). However, these statistics did not capture potential consistency for the most highly targeted therapies, nilotinib, crizotinib, and PLX4720, for which the Pearson correlation coefficient gave the best evidence of concordance, as this statistic is strongly influenced by a small number of highly sensitive cell lines (Figure 7). Our results concur with the recent comparative study published by the GDSC and CCLE investigators15.
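
The contrast between Pearson and rank-based correlation for highly targeted drugs can be illustrated with a small Python sketch. The data are deliberately extreme toy values invented for this example (ten insensitive cell lines whose small AUC values are ranked in opposite order in the two studies, plus one cell line that is clearly sensitive in both); they are not taken from GDSC or CCLE.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy AUC values: ten insensitive lines in reversed order, one sensitive line.
auc_study_a = [i / 100 for i in range(1, 11)] + [0.9]
auc_study_b = [i / 100 for i in range(10, 0, -1)] + [0.9]
pcc = pearson(auc_study_a, auc_study_b)
scc = spearman(auc_study_a, auc_study_b)
```

In this toy case the single sensitive cell line drives the Pearson coefficient close to 1, while the Spearman coefficient is negative because it weighs the ordering of the ten insensitive cell lines just as heavily.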

Figure 8. Consistency of AUC values as published and recomputed within PharmacoGx, with AUC* being computed using the common concentration range between GDSC and CCLE.

(A) Consistency assessed using the full set of cancer cell lines screened in both studies. (B) Consistency assessed using only sensitive cell lines (AUC ≥ 0.4 for broad effect drugs, and AUC ≥ 0.2 for drugs with narrow effects). (C) Consistency assessed by discretizing the drug sensitivity data using the aforementioned cutoffs for AUC. PCC: Pearson correlation coefficient; SCC: Spearman rank-based correlation coefficient; DXY: Somers’ Dxy rank correlation; MCC: Matthews correlation coefficient; CRAMERV: Cramer’s V statistic; INFORM: Informedness. The symbol '*' indicates that the consistency is statistically significant (p < 0.05).

We then restricted our analysis to the cell lines identified as sensitive in at least one study and computed the same consistency measures (Figure 8B). To our surprise, eliminating the insensitive cell lines resulted in decreased consistency for most drugs, which suggests a high level of inconsistency across sensitive cell lines; the only exceptions were the highly targeted drugs nilotinib and crizotinib.

To test whether discretization of drug sensitivity data into binary calls (“insensitive” vs. “sensitive”; see Methods) improves consistency across studies, we used three association statistics: the Matthews correlation coefficient20, Cramer’s V21, and informedness22 (Figure 8C). These statistics are designed for use with imbalanced classes, which is particularly relevant in large pharmacogenomic datasets where, for targeted therapies, there are often many more insensitive cell lines than sensitive ones. As expected, the highly targeted therapies nilotinib and PLX4720 (and nutlin-3 using informedness) yielded a high level of consistency, but this was not the case for the other targeted therapies. We also found that the drug sensitivity calls for drugs with broader inhibitory effects were poorly correlated between studies (Figure 8C).
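
For binary sensitivity calls, the three statistics can be computed from a 2×2 table. The Python sketch below uses the standard textbook definitions; the function names, the choice of the first study as the reference for informedness, and the toy calls are assumptions for illustration. For a 2×2 table Cramer's V equals the absolute value of the Matthews correlation coefficient, so it is derived from it here.

```python
import math


def confusion(calls_a, calls_b):
    """Counts (tp, fp, fn, tn), treating study A as the reference."""
    tp = sum(1 for a, b in zip(calls_a, calls_b) if a and b)
    fp = sum(1 for a, b in zip(calls_a, calls_b) if not a and b)
    fn = sum(1 for a, b in zip(calls_a, calls_b) if a and not b)
    tn = sum(1 for a, b in zip(calls_a, calls_b) if not a and not b)
    return tp, fp, fn, tn


def matthews_cc(calls_a, calls_b):
    """Matthews correlation coefficient (0.0 when it is undefined)."""
    tp, fp, fn, tn = confusion(calls_a, calls_b)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cramers_v_2x2(calls_a, calls_b):
    """Cramer's V for a 2x2 table, which equals |MCC|."""
    return abs(matthews_cc(calls_a, calls_b))


def informedness(calls_a, calls_b):
    """Sensitivity + specificity - 1; needs both classes present in study A."""
    tp, fp, fn, tn = confusion(calls_a, calls_b)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


# Toy calls for ten cell lines (1 = sensitive), e.g. AUC >= 0.2 for a narrow drug.
study_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
study_b = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

In this toy example the two studies agree on eight of ten calls, yet they agree on only one of the three cell lines called sensitive by either study, so both MCC and informedness come out at 0.375.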

We performed the same analysis using IC50 values truncated to the maximum concentration used for each drug in each study separately. We observed similar patterns, with nilotinib and crizotinib yielding moderate to high consistency across studies (Supplementary Figure 10). Note that Somers' Dxy rank correlation is biased in the presence of many repeated values in the datasets being analyzed, which is the case for truncated IC50: pairs of cell lines with identical IC50 values in one dataset but not in the other are not taken into account as evidence of inconsistency. This explains the artifactual perfect consistency that the statistic suggests for both nilotinib and crizotinib.
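
The tie artifact can be reproduced with a few lines of Python. The sketch below uses the textbook asymmetric form of Somers' D, in which pairs tied on the conditioning variable are dropped from both numerator and denominator; the software used for the reported results may handle ties differently, and the IC50 values are invented toy numbers truncated at 8 µM.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D of y given x: (concordant - discordant) / pairs untied on x."""
    concordant = discordant = untied_x = 0
    for (xa, ya), (xb, yb) in combinations(zip(x, y), 2):
        if xa == xb:
            continue  # pairs tied on x are ignored entirely
        untied_x += 1
        sign = (xa - xb) * (ya - yb)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if untied_x == 0:
        return float("nan")
    return (concordant - discordant) / untied_x


# Toy truncated IC50 values (µM) for five cell lines; 8.0 means "not reached".
ic50_study_a = [8.0, 8.0, 8.0, 8.0, 0.1]
ic50_study_b = [8.0, 2.0, 8.0, 0.5, 0.05]

d_b_given_a = somers_d(ic50_study_a, ic50_study_b)
d_a_given_b = somers_d(ic50_study_b, ic50_study_a)
```

Study A calls only the last cell line sensitive, whereas study B finds three responders. Conditioning on study A, every pair that is not tied involves the last cell line and is concordant, so the statistic equals 1 even though the studies disagree on two cell lines. Conditioning on study B instead gives 4/9.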

Consistency of molecular profiles across cell lines

Discovering new biomarkers predictive of drug response requires both robust pharmacological data and molecular profiles. In our original study, we showed that the gene expression profiles for each cell line profiled by both GDSC and CCLE were highly consistent. However, we found that mutation profiles were only moderately consistent, a result that was later confirmed by Hudson et al.23.

There have been questions as to whether the measures of consistency we reported for drug response should be compared to those we reported for gene expression. Specifically, we reported correlations based on comparing drug response “across cell lines,” meaning that we examined the correlation of response of each cell line to a particular drug reported by the GDSC with the response of the same cell line to the same drug reported by the CCLE. In contrast we reported correlation of gene expression levels “between cell lines,” meaning that we compared the expression of all genes within each cell line in the GDSC to the expression of all genes in the same cell line in the CCLE (see Supplementary Methods). It has been suggested that a more valid comparison would be to compare both drug response and gene expression across cell lines. We report the results of such an “across cell lines” analysis of gene expression here, computed using techniques analogous to those we used to compare drug response.

We began by comparing the distribution of gene expression measurements generated using the microarray Affymetrix HG-U219 platform in GDSC, and the microarray Affymetrix HG-U133PLUS2 platform and the new Illumina RNA-seq data in CCLE (Supplementary Figure 11). We observed similar bimodal distributions, suggesting the presence of a natural cutoff to discriminate between lowly and highly expressed genes. We therefore fit a mixture of two Gaussians and identified an expression cutoff for each platform separately (Supplementary Figure 11). We then compared the consistency of continuous and discretized gene expression values between (i) the microarray Affymetrix HG-U133PLUS2 and Illumina RNA-seq platforms within CCLE (intra-lab consistency); (ii) the microarray Affymetrix HG-U219 and HG-U133PLUS2 platforms used in GDSC and CCLE, respectively (microarray, inter-lab consistency); and (iii) the microarray Affymetrix HG-U219 and Illumina RNA-seq platforms used in GDSC and CCLE, respectively (inter-lab consistency). We performed a similar analysis for CNV log-ratios and observed high consistency across cell lines (Figure 9A). Supporting our previous observations, we found that CNV and gene expression measurements are significantly more consistent than drug sensitivity values when using all cell lines (Wilcoxon rank sum test p-value < 0.05; Figure 9A; Supplementary Figure 12A).

Figure 9. Consistency of molecular profiles (gene expression, copy number variation and mutation) and drug sensitivity data between GDSC and CCLE using multiple consistency measures.

(A) Consistency assessed using the full set of cancer cell lines screened in both studies. (B) Consistency assessed using only sensitive cell lines (AUC ≥ 0.4 for broad effect drugs, and AUC ≥ 0.2 for drugs with narrow effects). (C) Consistency assessed by discretizing the molecular and drug sensitivity data. PCC: Pearson correlation coefficient; SCC: Spearman rank-based correlation coefficient; DXY: Somers’ Dxy rank correlation; MCC: Matthews correlation coefficient; CRAMERV: Cramer’s V statistic; INFORM: Informedness.

Similarly to the filtering we performed for drug sensitivity data, we subsequently restricted our analysis to the cell lines showing high expression of a given gene/cell line combination in at least one study. Again, CNV and gene expression measurements were significantly more consistent than drug sensitivity values in this case (Wilcoxon rank sum test p-value < 0.05; Figure 9B; Supplementary Figure 12B). When dichotomizing data into lowly/highly expressing, amplifications/deletions, and wild type/mutated cell lines and insensitive/sensitive cell lines, the CNV and gene expression data were still more consistent (Figure 9C) although the difference was not always significant (Supplementary Figure 12C). Concurring with the report of Hudson et al.23, we observed low consistency for mutation calls across cell lines (Figure 9C).

Consistency of gene-drug associations

The primary goal of the GDSC and CCLE studies was to identify new genomic predictors of drug response for both targeted and cytotoxic therapies. We therefore evaluated whether the good consistency in drug sensitivity data observed for nilotinib, PLX4720 and crizotinib, and the moderate consistency observed for 17-AAG and PD-0325901, would translate into reproducible biomarkers. We estimated gene-drug associations by fitting, for each gene and drug, a linear regression model including gene expression, CNV and mutations as predictors of drug sensitivity, adjusted for tissue source (see Methods). As illustrated in Figure 1, we used the molecular and pharmacological data generated independently in GDSC and CCLE to identify and compare gene-drug associations. This approach prevents any information leak between the two datasets, which may lead to overoptimistic consistency between the studies, as in the recent comparative study published by the GDSC and CCLE investigators9. Given the high correlation between the published and recomputed AUC values in each study (Figure 6) and their similar consistency (Figure 9), all gene-drug associations were computed using published AUC for clarity.

We first computed the strength and significance of each gene-drug association in both datasets separately. Similarly to our initial study7, the strength of a given gene-drug association is provided by the standardized coefficient associated with the corresponding gene expression in the linear model and its significance is provided by the p-value of this coefficient (see Methods). We then identified gene-drug associations that were reproducible in both datasets (same sign and False Discovery Rate [FDR] < 5%) or that were dataset-specific (different sign or significant in only one dataset) using continuous (Supplementary Figure 13 and Supplementary Figure 14 for common and all cell lines, respectively) and discretized (Supplementary Figure 15 and Supplementary Figure 16 for common and all cell lines, respectively) published AUC values as drug sensitivity data. We assessed the overlap of gene-drug associations discovered in both datasets using the Jaccard index24. All Jaccard indices were low, with nilotinib yielding the largest overlap of gene-drug associations (32%), followed by PD-0325901 and erlotinib (almost 20%), while the other drugs yielded less than 15% overlap (Supplementary Figure 17). Our results further indicate that larger overlap exists for gene-drug associations identified using the continuous drug sensitivity data compared with associations using discretized drug sensitivity calls (Wilcoxon signed rank test p-value of 4×10⁻² and 2×10⁻³ for the common set and the full set of cell lines, respectively). We therefore focused our analyses on the gene-drug associations identified using continuous published AUC values. The number (and identity) of gene-drug associations computed using continuous published AUC values are provided in Supplementary Table 1 and Supplementary Table 2 (Dataset 5 and Dataset 6) for common and all cell lines, respectively.
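
The overlap measure can be sketched in Python as follows. Each significant association is represented as a (gene, sign) pair so that an association counts as shared only when both the gene and the direction of effect match, mirroring the "same sign" criterion above. The function name, the gene labels and the convention of returning 0 for two empty sets are assumptions made for this example.

```python
def jaccard_index(associations_a, associations_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(associations_a), set(associations_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


# Hypothetical significant associations for one drug: (gene, sign of effect).
significant_in_a = {("GENE_1", +1), ("GENE_2", -1), ("GENE_3", +1)}
significant_in_b = {("GENE_1", +1), ("GENE_2", +1), ("GENE_4", -1)}

overlap = jaccard_index(significant_in_a, significant_in_b)
```

Here only GENE_1 is significant with the same sign in both sets, out of five distinct associations overall, giving an overlap of 20%.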

Given that simply intersecting significant gene-drug associations identified in each dataset separately yielded poor reproducibility for all drugs, we sought to more closely mimic the biomarker discovery and validation process. We therefore used one dataset to discover significant gene-drug associations and tested whether this subset of markers validated in an independent dataset. Using the discovery dataset, gene-drug associations are first ranked by nominal p-values and their FDR is computed. An association is selected if it is part of the top 100 markers and its FDR is less than 5%. This procedure controls for both the significance and the number of selected biomarkers, which can vary with respect to the cell line panel used for the analysis (larger panels enable the identification of more significant biomarkers due to increased statistical power). A gene-drug association is validated in an independent dataset if its nominal p-value is less than 0.05 and its “direction”, that is whether the marker is associated with sensitivity or resistance, is identical to the one estimated during the discovery process.
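The discovery and validation procedure described above can be sketched as follows. This is a minimal Python illustration, not the code used in this study: the function names, the toy effect sizes and p-values, and the use of the Benjamini-Hochberg adjustment as the FDR method are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(discovery, top_n=100, fdr_cutoff=0.05):
    """discovery maps gene -> (effect, p-value). Keep genes that are both in
    the top_n by nominal p-value and below the FDR cutoff."""
    genes = list(discovery)
    fdr = dict(zip(genes, benjamini_hochberg([discovery[g][1] for g in genes])))
    ranked = sorted(genes, key=lambda g: discovery[g][1])[:top_n]
    return [g for g in ranked if fdr[g] < fdr_cutoff]


def validation_rate(selected, discovery, validation, alpha=0.05):
    """Fraction of selected genes with nominal p < alpha and the same sign of
    effect in the validation dataset."""
    if not selected:
        return float("nan")
    validated = 0
    for g in selected:
        if g not in validation:
            continue
        effect_v, p_v = validation[g]
        if p_v < alpha and discovery[g][0] * effect_v > 0:
            validated += 1
    return validated / len(selected)


# Hypothetical per-gene (standardized effect, nominal p-value) results.
discovery = {"G1": (0.6, 1e-6), "G2": (-0.4, 1e-4), "G3": (0.1, 0.2), "G4": (0.3, 0.01)}
validation = {"G1": (0.5, 1e-3), "G2": (0.3, 0.01), "G3": (0.1, 0.5), "G4": (0.2, 0.2)}

selected = select_markers(discovery)
rate = validation_rate(selected, discovery, validation)
print(selected, round(rate, 2))
```

In the toy data G2 fails validation because its direction flips and G4 fails because its validation p-value exceeds 0.05, so one of the three selected markers validates.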

We computed the proportions of validated gene-drug associations for each drug using gene expression data in GDSC as discovery set and CCLE as validation set, and vice versa (Figure 10). Overall, we found that biomarkers for PD-0325901 and nilotinib yielded a high validation rate (> 80%) with either dataset as discovery set using the common cell lines screened in GDSC and CCLE (Figure 10A). When using the entire cell line panels used in each study, two more drugs, lapatinib and erlotinib, yielded a high validation rate (Figure 10B). 17-AAG and PLX4720 yielded a validation rate between 60% and 80%, while the other drugs yielded a validation rate around 50% or lower. For eight out of the fifteen drugs, using the entire panel of cell lines screened in each study (Figure 10B) improved the validation rate compared to limiting the analysis to common cell lines (Figure 10A). However, the validation rate decreased for five other drugs, suggesting that using large but different panels of cell lines may increase statistical power but could also introduce biases in the biomarker discovery process.


Figure 10. Proportion of gene-drug associations identified in a discovery set (top 100 gene-drug associations as ranked by p-values and FDR < 5%) and validated in an independent validation dataset.

In blue and red are the gene-drug associations identified in GDSC and CCLE, respectively. Associations are identified using gene expression data as input and (A) continuous published AUC values as output in a linear model using only common cell lines or (B) all cell lines. The number of selected gene-drug associations in each dataset is provided in parentheses. The symbol '*' represents the significance of the proportion of validated gene-drug associations, computed as the frequency of 1000 random subsets of markers of the same size having equal or greater validation rate compared to the observed rate.

We then investigated whether higher validation rates would be obtained by using a more stringent significance threshold and relaxing the constraint on the number of significant associations in the discovery set (Supplementary Figure 18 and Supplementary Figure 19). Using common cell lines, we found that the proportion of validated gene-drug associations monotonically increases with FDR stringency for six drugs, with a very high validation rate for the most stringent FDR cutoff (validation rate > 80% for FDR < 0.1%) for 17-AAG, PD-0325901, PLX4720 and nilotinib using either dataset as discovery set (Supplementary Figure 18). Using the entire panel of cell lines in each study actually improved the validation rate for six drugs: AZD6244, TAE684, AZD0530 and lapatinib, as well as erlotinib and sorafenib, for which an insufficient number of sensitive cell lines were screened in both GDSC and CCLE (Supplementary Figure 19). However, the validation rate decreased for 17-AAG, crizotinib and PLX4720, which suggests again that large but different panels of cell lines might introduce selection bias for some drugs.

Known biomarkers

As reported in the original GDSC (1) and CCLE (2) publications and in recent reports10,14,15, several known biomarkers for targeted therapies have been shown to be predictive in both GDSC and CCLE. In our initial comparative study we also found the following known gene-drug associations:

  • BRAF mutations were significantly associated with sensitivity to MEK inhibitors (AZD6244 and PD-0325901) and BRAFV600E inhibitor (PLX4720) with nominal p-values < 0.01; see Supplementary File 10–Supplementary File 13 of our initial study.

  • ERBB2 expression was significantly associated with sensitivity to lapatinib with nominal p-value = 0.04 and 8.4×10⁻¹⁵ for GDSC and CCLE, respectively; see Supplementary File 4 and Supplementary File 5 of our initial study.

  • NQO1 expression was significantly associated with sensitivity to 17-AAG with nominal p-value = 2.4×10⁻¹³ and 6.2×10⁻¹⁴ for GDSC and CCLE, respectively; see Supplementary File 4 and Supplementary File 5 of our initial study.

  • MDM2 expression was significantly associated with sensitivity to Nutlin-3 with nominal p-value = 7.7×10⁻¹⁸ and 7×10⁻⁸ for GDSC and CCLE, respectively; see Supplementary File 4 and Supplementary File 5 of our initial study.

  • ALK expression was significantly associated with sensitivity to TAE684 with nominal p-value = 1.6×10⁻⁹ and 1.7×10⁻⁹ for GDSC and CCLE, respectively; see Supplementary File 4 and Supplementary File 5 of our initial study.

We revisited our biomarker analysis using the new data released by GDSC and CCLE to test whether additional known biomarkers can be identified. In addition to the expression-based gene-drug associations reported in Dataset 6, we recomputed all gene-drug associations based on mutations (Dataset 7) and gene fusions using the entire panel of cell lines in each study. We confirmed the reproducibility of the known associations reported in our initial study, but we were not able to find reproducible associations for EGFR mutations with response to AZD0530 and erlotinib, or for HGF expression with response to crizotinib (Table 1). The reproducibility of the majority of these previously known associations attests to the relevance of the GDSC and CCLE datasets, although our results demonstrate that the noise and inconsistency in drug sensitivity data render the discovery of new biomarkers difficult for the majority of the drugs.

Table 1. List of known gene-drug associations with their effect size and significance in GDSC and CCLE.

Gene-drug associations were estimated using the full panel of cell lines and AUC as measure of drug sensitivity.

Drug        Gene     Type           GDSC effect size  GDSC p-value  CCLE effect size  CCLE p-value  Reproducible
17-AAG      NQO1     expression     0.55              5.3E-39       0.60              4.7E-29       YES
PD-0325901  BRAF     mutation       0.83              6.4E-09       0.82              8.1E-10       YES
AZD6244     BRAF     mutation       0.93              6.1E-10       0.86              3.7E-10       YES
TAE684      ALK      expression     0.28              2.2E-07       0.26              1.1E-08       YES
AZD0530     EGFR     mutation       0.03              9.5E-01       0.51              8.2E-03       NO
AZD0530     BCR_ABL  fusion         3.87              2.2E-18
Crizotinib  HGF      expression     -0.03             6.5E-01       0.28              1.3E-09       NO
Crizotinib  MET      amplification  0.10              8.1E-02       0.29              3.8E-09       NO
PLX4720     BRAF     mutation       1.75              8.6E-46       1.38              2.2E-27       YES
Nutlin-3    MDM2     expression     0.39              2.0E-25       0.31              8.4E-12       YES
lapatinib   ERBB2    expression     0.42              1.1E-12       0.53              3.4E-33       YES
lapatinib   ERBB2    amplification  0.24              8.4E-06       0.39              4.2E-19       YES
Nilotinib   BCR_ABL  fusion         6.15              5.7E-52
PHA-665752  HGF      expression     0.04              4.9E-01       0.06              2.0E-01       NO
PHA-665752  MET      amplification  0.22              2.2E-04       0.02              6.8E-01       NO
Erlotinib   EGFR     mutation       0.71              1.9E-01       1.27              2.4E-12       NO
Sorafenib   FLT3     mutation       1.20              1.9E-02       0.96              3.5E-05       YES

Discussion

Our original motivation in analyzing the GDSC and CCLE data was to develop predictive gene expression biomarkers of drug response. When we applied a number of methods using one study to select gene expression features and to train a classifier, and then applied it to predict reported drug response in the second study, our predictive models failed to validate for half of the drugs tested3. Indeed, out of nine predictors yielding concordance index25 ≥0.65 in cross-validation in the training set (GDSC), only four were validated in identical cell lines treated with the same drugs in the validation set (CCLE)3.

As we explored the reasons for this failure, we first checked whether cell lines could have drifted and consequently exhibited different transcriptional profiles between GDSC and CCLE. We found that any genome-wide expression profile in one study would almost always identify “itself” (its purported biological replica) as being most similar among the cell lines in the other study. In a way this is not surprising. When gene expression studies were in their infancy, there were many reports that compared the results from studies and found that they were inconsistent and unreproducible in new studies — as demonstrated by the countless microarray signatures that fail to reproduce beyond their initial publication. As a result, scientists involved in gene expression studies “circled the wagons” and developed both much more standardized laboratory protocols and “best practices” for reproducible analysis, including data normalization and batch corrections, that now mean that independent measurements from different laboratories are far more often consistent and so can be used for signature development and validation26,27.

Unexpectedly, when we compared phenotypic measures of drug response that were released by the GDSC and CCLE projects, we found discrepancies in growth inhibition effects of multiple anticancer agents. What that means in practice is that, for some drugs, a molecular biomarker of drug response learned from one study would not likely be predictive of the reported response in the other. And consequently neither of the studies might be useful in predicting response in patients as many had hoped when these large pharmacogenomic screens were published.

The feedback from the scientific community on our analysis, the availability of new data from the GDSC and CCLE, as well as improvements in the PharmacoGx software platform we developed to support this type of analysis11, prompted us to revisit the question of consistency in these studies to see if we could find a principled way to identify correlated drug response phenotypes. By testing a variety of methods of classifying the data, and choosing the metric which gave the best consistency for each drug, we were able to find moderate to good consistency of sensitivity data for two broad effect and three highly targeted drugs. We also confirmed the overall lack of consistency between the studies for eight drugs, while there were not enough sensitive cell lines that had been screened by both GDSC and CCLE to properly assess consistency for the remaining three drugs. The summary box included with this paper briefly describes the most significant issues that people have raised in discussing our previous findings with us and summarizes what we have found in our reanalysis.

Some have suggested that one way to improve correlation would have been to compare the studies and throw out the most discordant data as noise and then compare the remaining concordant data. While this would certainly find concordance in the remaining data, the approach is equivalent to fitting data to a desired result, which is bad practice and certainly could not be extended to other data sets or to the classification of patient tumors as responsive or nonresponsive to a particular therapy.

There is, however, merit in the suggestion that one would not expect to see correlation in noise. And noise is precisely what one would expect to see in drug response data from cell lines that are resistant to a particular drug or nonresponsive across the range of doses tested. As reported here, filtering the data in each study independently to classify cell lines in a binary fashion, and then comparing the binary classification between studies using a variety of metrics developed to handle the intricacies of this sort of response data, also failed to find simple correlations in the data, except for three of the highly targeted therapies, nilotinib, PLX4720 and crizotinib. What this ultimately means is that the most and the least sensitive cell lines would not appear to be the same when comparing the two studies.

There are many reasons for potential differences in measured phenotypes reported by the GDSC and CCLE, including substantial differences in doses used for each drug and in the methods used both to assay cell viability and to estimate drug response parameters. By comparing GDSC and CCLE with an independent pharmacogenomic dataset published by GlaxoSmithKline (GSK), we showed that higher consistency is achieved when the same pharmacological assay is used (GSK and CCLE used the CellTiter-Glo assay, while GDSC used Syto60)7,8. Genentech also used the CellTiter-Glo assay and observed higher consistency of drug sensitivity data with CCLE compared to GDSC10. The authors elegantly evaluated the impact of cell viability readout, growth medium, and seeding density. They observed only a weak impact of the choice of pharmacological assay, as their follow-up screen with the Syto60 assay clustered closer to their own CellTiter-Glo screen than to GDSC, suggesting that other parameters might have driven the inconsistency observed with GDSC10. They further showed that increased fetal bovine serum and seeding cell density had a systematic effect on mean cell viability. Pozdeyev et al. showed that restricting the computation of AUC to the concentration range shared between GDSC and CCLE, the equivalent of our AUC* drug sensitivity measure, yielded a small but statistically significant improvement in the consistency of pharmacological profiles28. Ultimately, what our analysis and these recent reports suggest is not only that drug sensitivity measurements must be carefully and appropriately compared, but also that there is a pressing need for standardization of both laboratory and computational methods for assaying drug response.

The primary goal of the GDSC and CCLE studies was to link molecular features of a large panel of cancer cell lines to their sensitivity to cytotoxic and targeted drugs. The reproducibility of most of the known gene-drug associations provides evidence that these large pharmacogenomic datasets are biologically relevant. When we investigated whether we could find significant gene-drug associations discovered in one dataset that validate in the other independent dataset, we observed a validation rate of over 75% for the most significant molecular biomarkers for eight of 15 drugs, which is a major improvement over our initial comparative study. However, this does not suggest that one can use these studies to find new, reproducible gene-drug associations for the rest of the drugs (excluding paclitaxel and PHA-665752, for which no significant biomarkers could be identified), as the majority of associations can be found in only one dataset but not in both.

This study has several potential limitations. First, while the raw drug sensitivity data are publicly available for GDSC, these data have not been released within the CCLE study. We could not fit the drug dose-response curves using the technical triplicates but rather relied on the published median sensitivity values. Second, we discretized drug sensitivity values by selecting a common threshold to discriminate between insensitive (AUC ≤ 0.2 and IC50 ≥ 1 µM) and the rest of the cell lines for all the targeted agents. However, it is clear that such a threshold could be optimized for each drug, which might have an impact on the consistency of drug phenotypes and gene-drug associations based on binary sensitivity calls (note that the same applies for molecular data as well). Unfortunately, the size of the current drug sensitivity datasets is not sufficient to develop drug-specific thresholds for sensitivity values, but the release of larger pharmacogenomic studies may allow us to address this issue in the near future. Lastly, the current set of mutations assessed in both studies is small (64 mutations), which drastically limits the search for mutation-based and other genomic aberrations associated with drug response. The exome-sequencing data available within the new GDSC1000 dataset will enable better exploration of the genomic space of biomarkers in cancer cell lines, and of their reproducibility across studies.

Conclusion

As is true of many scientists working in genomics and oncology, we were excited when the GDSC and CCLE released their initial data sets and were hopeful that these projects would help to accelerate drug discovery and further the development of precision medicine in oncology. However, what we found initially, and what the reanalysis presented here further indicates, is that there are inconsistencies between the measured phenotypic response to drugs in these studies. Even in our reanalysis, where we used methods specific to individual drugs and the response characteristics of the cell lines tested, we were only able to find new biomarkers predictive of response for around half of the drugs screened in both studies. Consequently, it is challenging to use the data from these studies to develop general purpose classification rules for all drugs.

Our finding that molecular profiles are significantly more consistent than drug sensitivity data indicates that the main barrier to biomarker development using these data is the unreliability in the reported response phenotypes for many drugs. For studies such as these to realize their full potential, additional work must be done to develop robust and reproducible experimental and analytical protocols so that the same compound, tested on the same set of cell lines by different groups, yields consistent and comparable results. Barring this, a predictive biomarker of response developed from one study is unlikely to be reliably validated on another, and consequently, is unlikely to be useful in predicting patient response.

Having worked in large-scale genomic analyses, we recognize the challenges involved in planning and executing such studies and commend the GDSC and CCLE for their work and for making all the data available. However, we strongly encourage the GDSC, the CCLE, and the pharmacogenomics and bioinformatics communities as a whole to invest the necessary time and effort to standardize drug response assays in order to achieve greater consistency and to assure that measurements in cell lines are relevant for predicting response in patients. The recent report from Genentech is a significant step in this direction. Ultimately, that effort will help to assure that mammoth undertakings in drug characterization can deliver on their promise to identify better therapies and biomarkers predictive of response.

Methods

The PharmacoGx platform

The lack of standardization of cell line and drug identifiers hinders comparison of molecular and pharmacological data between large-scale pharmacogenomic studies, such as the GDSC and CCLE. To address this issue we developed PharmacoGx, a computational platform enabling users to download and interrogate large pharmacogenomic datasets that were extensively curated to ensure maximum overlap and consistency11. PharmacoGx provides (i) a new object class, called PharmacoSet, that acts as a container for the high-throughput pharmacological and molecular data generated in large pharmacogenomics studies (detailed structure provided in Supplementary Methods); and (ii) a set of parallelized functions to assess the reproducibility of pharmacological and molecular data and to identify molecular features associated with drug effects. The PharmacoGx package is open-source and publicly available on Bioconductor.

The GDSC (formerly CGP) dataset

Drug sensitivity data. We used the data release 5 (June 2014) with 6,734 new IC50 values for a total of 79,903 drug dose-response curves for 139 different drugs tested on a panel of up to 672 unique cell lines. The data are accessible from ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release-5.0/.

Molecular profiles. Gene expression data were downloaded from ArrayExpress, accession number E-MTAB-3610. These new data were generated using Affymetrix HG-U219 microarray platform. We processed and normalized the CEL files using RMA29 with BrainArray30 chip description file based on Ensembl gene identifiers (version 19). This resulted in a matrix of normalized expression for 17,616 unique Ensembl gene ids. SNP array data for the Genome-Wide Human SNP Array 6.0 platform were downloaded from GEO with the accession number GSE36139. We processed the raw CEL data using Affymetrix Power Tools (APT) v1.16.1. Copy number segments were generated using HAPSEG v1.1.131 based on RMA-normalized signal intensities and Birdseed v2-called genotypes. These segments were further refined using ABSOLUTE v1.0.632 to identify allele-specificity within each segment. Mutation and gene fusion calls were downloaded from the GDSC website and processed as in our initial study7.

The CCLE dataset

Drug sensitivity data. We used the drug sensitivity data available from the CCLE website (https://portals.broadinstitute.org/ccle/data/browseData), updated in February 2015, with a total of 11,670 dose-response curves for 24 drugs tested in a panel of up to 504 cell lines.

Molecular profiles. Gene expression data were downloaded from the CCLE website and CGHub33 for the Affymetrix HG-U133PLUS2 and Illumina HiSeq 2500 platforms, respectively. SNP array data were downloaded from EMBL-EBI with the accession number EGAD00010000644. Normalization of microarray data (1036 cell lines) and SNP array data (1190 cell lines) was performed in the same way as for GDSC. RNA-seq data (935 cell lines) were downloaded as BAM files previously aligned using TopHat34, and the quantification of gene expression was performed using Cufflinks34 based on the Ensembl GRCh37 human reference genome. Mutation data were retrieved from the CCLE website and processed as in our initial study7.

Curation of drug and cell line identifiers

The lack of standardization for cell line names and drug identifiers represents a major barrier for performing comparative analyses of large pharmacogenomics studies, such as GDSC and CCLE. We therefore curated these datasets to maximize the overlap in cell lines and drugs by assigning a unique identifier to each cell line and drug. Entities with the same unique identifier were matched. Manual search was then applied to match any remaining cell lines or drugs which were not matched based on string similarity; annotations were consistently extracted from Cellosaurus35. The cell line curation was validated by ensuring that the cell lines with matched name had a similar SNP fingerprint (see below). The drug curation was validated by examining the extended fingerprint of each of their SMILES strings36 and ensuring that the Tanimoto similarity37 between two drugs called as the same, as determined by this fingerprint, was above 0.95.
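To illustrate the drug matching check described above, here is a minimal Python sketch of a Tanimoto similarity between two binary fingerprints, each represented by the set of its "on" bit positions. The bit positions are made up for illustration; generating extended fingerprints from SMILES strings requires a cheminformatics toolkit and is not shown.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by on-bits set in either."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


SAME_DRUG_CUTOFF = 0.95  # threshold quoted in the text above

# Hypothetical on-bit positions for three fingerprints.
fp_gdsc = set(range(0, 40))
fp_ccle = set(range(0, 40)) | {97}   # one extra bit, e.g. a salt form
fp_other = set(range(20, 60))        # a structurally different compound

for name, fp in [("ccle", fp_ccle), ("other", fp_other)]:
    sim = tanimoto(fp_gdsc, fp)
    print(name, round(sim, 3), "match" if sim >= SAME_DRUG_CUTOFF else "no match")
```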

Cell line identity using SNP fingerprinting

To assess the identity of cell lines from GDSC and CCLE, data of low quality were first excluded from our analysis panel (detailed procedure described in Supplementary Methods). Of the 973 CEL files from GDSC, only 66 fell below the 0.4 threshold (6.88%) for contrast QC scores, indicating issues in resolving base calls. Additionally, five of the 1,190 CEL files from CCLE had an absolute difference between contrast QC scores for Nsp and Sty fragments greater than 2, thus indicating some issues with the efficacy of one enzyme set during sample preparation. CEL files with contrast QC scores indicative of some sort of issue with the assay that would affect the genotype call rate or birdseed accuracy were removed and genotype calling was conducted on the remaining CEL files using Birdseed version 2. The resulting files were then filtered to keep only the 1006 SNP fingerprints that originated from CEL files that had a common cell line annotation between GDSC and CCLE (503 CEL files from each). Finally, pairwise concordances of all SNP fingerprints were generated according to the method outlined by Hong et al.12.

Drug dose-response curves

To identify artefactual drug dose-response curves due to experimental or normalization issues, we developed simple quality controls (QC; details in Supplementary Methods). Briefly, we checked whether normalized viability measurements range between 0% and 100% and that consecutive measurements remain stable or decrease monotonically reflecting response to the drug being tested. The drug dose-response curves which did not pass these simple QC were flagged and removed from subsequent analyses as the curve fitting would have yielded erroneous results.
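The two checks described above could look roughly like the following Python sketch. The exact rules are given in the Supplementary Methods, so the tolerance value, the function name and the representation of viability as fractions of control are all assumptions made for illustration.

```python
def passes_qc(viability, tolerance=0.05, low=0.0, high=1.0):
    """viability: fractions of control, ordered by increasing concentration.

    Check 1: every value lies within [low, high], up to the tolerance.
    Check 2: each value is no larger than the previous one, up to the tolerance.
    The tolerance is a hypothetical allowance for measurement noise.
    """
    in_range = all(low - tolerance <= v <= high + tolerance for v in viability)
    non_increasing = all(
        later <= earlier + tolerance
        for earlier, later in zip(viability, viability[1:])
    )
    return in_range and non_increasing


print(passes_qc([1.0, 0.95, 0.97, 0.6, 0.2]))  # small bump within tolerance
print(passes_qc([1.0, 0.5, 0.9, 0.3]))         # viability rebounds strongly
print(passes_qc([1.4, 1.0, 0.8]))              # viability far above 100%
```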

All dose-response curves were fitted to the equation

$$y = \frac{1}{1 + (x/EC_{50})^{HS}},$$

where x is the drug concentration, y = 0 denotes death of all treated cells, y = y(0) = 1 denotes no effect of the drug dose, EC50 is the concentration at which viability is reduced to half of the viability observed in the absence of drug (y = 0.5), and HS is a parameter describing the cooperativity of binding. HS < 1 denotes negative binding cooperativity, HS = 1 denotes noncooperative binding, and HS > 1 denotes positive binding cooperativity. The parameters of the curves were fitted using the least squares optimization framework. Comparison of our dose-response curve model with those used in the GDSC and CCLE publications is provided in Supplementary Methods.
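As a minimal Python sketch of fitting this two-parameter curve by least squares, the example below uses an exhaustive search over a coarse parameter grid. The grid search is a simple stand-in chosen to keep the example dependency-free, not the optimizer used in this study; the grid bounds and the simulated doses are assumptions.

```python
def hill(x, ec50, hs):
    """Viability y = 1 / (1 + (x / EC50)^HS) at concentration x > 0."""
    return 1.0 / (1.0 + (x / ec50) ** hs)


def fit_hill(concentrations, viabilities):
    """Return the (EC50, HS) grid point with the smallest sum of squared errors."""
    best = None
    for k in range(-30, 31):      # log10(EC50) from -3 to 3 in steps of 0.1
        ec50 = 10 ** (k / 10)
        for j in range(1, 41):    # HS from 0.1 to 4.0 in steps of 0.1
            hs = j / 10
            sse = sum((hill(x, ec50, hs) - y) ** 2
                      for x, y in zip(concentrations, viabilities))
            if best is None or sse < best[0]:
                best = (sse, ec50, hs)
    return best[1], best[2]


# Simulated noise-free curve with known parameters (concentrations in µM).
true_ec50, true_hs = 1.0, 1.5
doses = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
observed = [hill(x, true_ec50, true_hs) for x in doses]

ec50_hat, hs_hat = fit_hill(doses, observed)
print(ec50_hat, hs_hat)
```

With real, noisy measurements a continuous optimizer would normally replace the grid, and the fitted values would only approximate the underlying parameters.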

Discretization of pharmacogenomic data

Drug sensitivity data. To discretize the drug sensitivity data, we used AUC ≤ 0.2 (IC50 ≥ 1 µM) and AUC ≤ 0.4 (IC50 ≥ 10 µM) to identify the “insensitive” cell lines for targeted and cytotoxic drugs, respectively, while the rest of the cell lines are classified as “sensitive”. These reasonable, although somewhat arbitrary, cutoffs enabled us to explore the potential of such binary drug sensitivity calls as new drug phenotypic measures to find consistency in drug sensitivity data and gene-drug associations.
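The AUC-based rule above amounts to a one-line threshold per drug class. A minimal Python sketch follows; the dictionary and function names are hypothetical, and only the AUC cutoffs quoted above are used.

```python
# AUC cutoffs quoted in the text: at or below the cutoff means "insensitive".
AUC_CUTOFF = {"targeted": 0.2, "cytotoxic": 0.4}


def sensitivity_call(auc, drug_class):
    """Binary sensitivity call from an AUC value and a drug class label."""
    return "insensitive" if auc <= AUC_CUTOFF[drug_class] else "sensitive"


print(sensitivity_call(0.15, "targeted"))   # below the targeted cutoff
print(sensitivity_call(0.30, "targeted"))   # above the targeted cutoff
print(sensitivity_call(0.30, "cytotoxic"))  # below the cytotoxic cutoff
```

The same AUC of 0.30 therefore yields different calls depending on whether the drug is treated as targeted or cytotoxic.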

Gene expression data. To discretize the gene expression data into lowly vs. highly expressed genes, we fit a mixture of two Gaussians of unequal variance using the full distribution of expression values of the 17,401 genes in common between the GDSC and CCLE datasets. We defined the expression threshold as the expression value for which the posterior probability of belonging to the left tail of the distribution of highly expressed genes is 10%.

Mutation data. Similarly to the GDSC and CCLE publications, we transformed the original mutation data into binary values that represent the absence (0) or presence (1) of any missense mutations in a given gene in a given cell line.

Gene-drug associations

We assessed the association, across cell lines, between a molecular feature and response to a given drug, referred to as gene-drug association, using a linear regression model adjusted for tissue source:

$$Y = \beta_0 + \beta_i G_i + \beta_t T$$

where Y denotes the drug sensitivity variable, Gi and T denote the expression of gene i and the tissue source respectively, and the βs are the regression coefficients. The strength of the gene-drug association is quantified by βi, above and beyond the relationship between drug sensitivity and tissue source. The variables Y and G are scaled (standard deviation equal to 1) to estimate standardized coefficients from the linear model. Significance of the gene-drug association is estimated by the statistical significance of βi (two-sided t test). When applicable, p-values were corrected for multiple testing using the FDR approach38.
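A minimal Python sketch of this tissue-adjusted model is given below, treating tissue source as a categorical variable. It relies on the standard result that, with a categorical covariate, the adjusted coefficient can be obtained by subtracting tissue means from both Y and G before a simple regression. The data are invented, the function names are hypothetical, and the p-value step (which needs a Student t distribution) is left out.

```python
import math
from collections import defaultdict


def scale(values):
    """Center and scale to unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def within_tissue_residuals(values, tissues):
    """Subtract the tissue mean from each value; this removes the tissue term."""
    groups = defaultdict(list)
    for v, t in zip(values, tissues):
        groups[t].append(v)
    means = {t: sum(vs) / len(vs) for t, vs in groups.items()}
    return [v - means[t] for v, t in zip(values, tissues)]


def gene_drug_association(y, g, tissues):
    """Return (standardized beta_i, t statistic) for Y = b0 + b_i*G_i + b_t*T."""
    y_res = within_tissue_residuals(scale(y), tissues)
    g_res = within_tissue_residuals(scale(g), tissues)
    sxx = sum(v * v for v in g_res)
    beta = sum(a * b for a, b in zip(g_res, y_res)) / sxx
    rss = sum((a - beta * b) ** 2 for a, b in zip(y_res, g_res))
    # Parameters: intercept + (tissues - 1) dummy variables + gene coefficient.
    dof = len(y) - len(set(tissues)) - 1
    se = math.sqrt(rss / dof / sxx)
    return beta, beta / se


# Hypothetical expression (g) and AUC (y) values for eight cell lines.
tissues = ["lung"] * 4 + ["skin"] * 4
g = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
y = [0.10, 0.22, 0.28, 0.41, 0.50, 0.63, 0.69, 0.80]

beta, t_stat = gene_drug_association(y, g, tissues)
print(round(beta, 3), round(t_stat, 2))
```

In this toy data the skin lines have higher AUC overall, and the within-tissue adjustment prevents that offset from inflating the gene coefficient.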

As we recognized that continuous drug sensitivity is not normally distributed, which violates one of the assumptions of the linear regression model described above, we also assessed the consistency of gene-drug associations using discretized (binary) drug sensitivity calls as the response variable in a logistic regression model adjusted for tissue source, similarly to the linear regression model.

Measure of consistency

Area between curves (ABC). To quantify the difference between two dose-response curves, we computed the area between curves (ABC). ABC is calculated by taking the unsigned area between the two curves over the intersection of the concentration range tested in the two experiments of interest, and normalizing that area by the length of the intersection interval. In the present study, we compared the curves fitted for the same drug-cell line combinations tested both in GDSC and CCLE. Further details are provided in Supplementary Methods.
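A minimal Python sketch of the ABC computation follows, using curves of the form given in the dose-response section. The integration over log10 concentration, the midpoint rule and the example parameters are assumptions made for illustration; the exact definition is in the Supplementary Methods.

```python
import math


def hill(x, ec50, hs):
    """Viability y = 1 / (1 + (x / EC50)^HS) at concentration x > 0."""
    return 1.0 / (1.0 + (x / ec50) ** hs)


def area_between_curves(curve_a, curve_b, range_a, range_b, steps=1000):
    """curve_*: (EC50, HS) pairs; range_*: (min, max) tested concentrations.

    Integrates |curve_a - curve_b| over the shared concentration range on a
    log10 scale (an assumption), then divides by the length of that range.
    """
    lo = math.log10(max(range_a[0], range_b[0]))
    hi = math.log10(min(range_a[1], range_b[1]))
    if hi <= lo:
        raise ValueError("the tested concentration ranges do not overlap")
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = 10 ** (lo + (k + 0.5) * width)  # midpoint of each sub-interval
        total += abs(hill(x, *curve_a) - hill(x, *curve_b)) * width
    return total / (hi - lo)


# Hypothetical fits for the same drug-cell line pair in two studies (µM).
fit_one, range_one = (1.0, 1.0), (0.01, 10.0)
fit_two, range_two = (10.0, 1.0), (0.0025, 8.0)

abc = area_between_curves(fit_one, fit_two, range_one, range_two)
print(round(abc, 3))
```

Because viability is bounded between 0 and 1 and the area is normalized by the interval length, the resulting value also lies between 0 and 1.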

Pearson correlation coefficient (PCC). PCC is a measure of the linear correlation between two variables, giving a value between +1 and −1 inclusive, where 1 represents total positive correlation, 0 represents no correlation, and −1 represents total negative correlation17. PCC is sensitive to the presence of outliers, like a few sensitive cell lines in the case of drug sensitivity data measured for highly targeted therapies or genes rarely expressed.

Spearman rank correlation coefficient (SCC). SCC is a nonparametric measure of statistical dependence between two variables and is defined as the Pearson correlation coefficient between the ranked variables18. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other. Contrary to PCC, SCC can capture nonlinear relationships between variables but is insensitive to outliers, which are frequent in drug sensitivity data measured for highly targeted therapies or for genes that are rarely expressed.
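The contrast between PCC and SCC in the presence of a single sensitive cell line can be seen in a small Python sketch. The AUC values below are invented: five insensitive cell lines with uncorrelated noise and one cell line that is clearly sensitive in both studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical AUC values; the last cell line is the only sensitive one.
auc_gdsc = [0.02, 0.05, 0.01, 0.04, 0.03, 0.60]
auc_ccle = [0.04, 0.01, 0.05, 0.02, 0.03, 0.55]

pcc = pearson(auc_gdsc, auc_ccle)
scc = spearman(auc_gdsc, auc_ccle)
print(round(pcc, 3), round(scc, 3))
```

Here the single sensitive cell line drives PCC close to 1, while SCC is slightly negative because the ranking of the five insensitive cell lines disagrees between the two vectors.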

Somers’ Dxy rank correlation (DXY). DXY is a non-parametric measure of association equivalent to (C - 0.5) * 2 where C represents the concordance index25 that is the probability that two variables will rank a random pair of samples the same way19.
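The relationship between the concordance index and DXY can be written out directly, as in this minimal Python sketch. Pairs tied in either variable are ignored here, which is only one of several tie conventions and is an assumption; the input values are invented.

```python
from itertools import combinations


def concordance_index(x, y):
    """Fraction of sample pairs that x and y rank in the same order.

    Pairs tied in either variable are skipped (an assumed convention).
    """
    concordant = usable = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 or dy == 0:
            continue
        usable += 1
        if dx * dy > 0:
            concordant += 1
    if usable == 0:
        raise ValueError("no untied pairs to compare")
    return concordant / usable


def somers_dxy(x, y):
    """DXY = (C - 0.5) * 2, mapping C in [0, 1] onto [-1, 1]."""
    return (concordance_index(x, y) - 0.5) * 2


# Hypothetical sensitivity values for four cell lines in two studies.
x = [0.1, 0.4, 0.3, 0.8]
y = [0.2, 0.3, 0.5, 0.9]

c_index = concordance_index(x, y)
dxy = somers_dxy(x, y)
print(round(c_index, 3), round(dxy, 3))
```

Five of the six pairs are ordered the same way by both vectors, so C is 5/6 and DXY is 2/3.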

Matthews correlation coefficient (MCC). MCC20 is used in machine learning as a measure of the quality of classification predictions. It takes into account true and false positives and negatives, acting as a balanced measure which can be used when the classes are of different sizes. MCC is in essence a correlation coefficient between two binary classifications; it returns a value between −1 (perfect opposite classification) and +1 (identical classifications), with 0 representing association no better than random chance.

Cramer’s V (CRAMERV). CRAMERV is a measure of association between two nominal variables, based on Pearson's chi-squared statistic, giving a value between 0 (no association) and +1 (perfect association)21. In the case of 2×2 contingency table, such as binary drug sensitivity or gene expression measurements, CRAMERV is equivalent to the Phi coefficient.

Informedness (INFORM). For a 2×2 contingency table comparing two binary classifications, INFORM can be defined as Specificity + Sensitivity - 1, which is equivalent to true positive rate - false positive rate22. The magnitude of INFORM gives the probability of an informed decision between the two classes, where INFORM > 0 represents appropriate use of information, INFORM = 0 represents a chance-level decision, and INFORM < 0 represents perverse use of information.
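The three statistics defined above for binary calls (MCC, CRAMERV and INFORM) can all be computed from the same 2×2 table, as in this minimal Python sketch. The calls are invented, the first vector is arbitrarily treated as the reference, and the chi-squared statistic is computed without continuity correction (an assumption).

```python
import math


def confusion_counts(reference, other):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 calls."""
    tp = sum(1 for r, o in zip(reference, other) if r and o)
    fp = sum(1 for r, o in zip(reference, other) if not r and o)
    fn = sum(1 for r, o in zip(reference, other) if r and not o)
    tn = sum(1 for r, o in zip(reference, other) if not r and not o)
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, between -1 and +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


def cramers_v(tp, fp, fn, tn):
    """Cramer's V for a 2x2 table: sqrt(chi2 / n), between 0 and +1."""
    n = tp + fp + fn + tn
    chi2 = n * (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return math.sqrt(chi2 / n)


def informedness(tp, fp, fn, tn):
    """Sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Hypothetical binary sensitivity calls (1 = sensitive) for ten cell lines.
calls_gdsc = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
calls_ccle = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(calls_gdsc, calls_ccle)
print(counts, round(mcc(*counts), 3), round(cramers_v(*counts), 3),
      round(informedness(*counts), 3))
```

For a 2×2 table CRAMERV equals the absolute value of MCC, so it loses the sign that distinguishes agreement from systematic disagreement. INFORM matches MCC in this example only because both vectors contain the same number of sensitive calls.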

Data and software availability

Open Science Framework: Dataset: Revisiting inconsistency in large pharmacogenomics studies, doi 10.17605/OSF.IO/CD8Z2 (39)

Data: The list of all the pharmacogenomic datasets available through the PharmacoGx platform can be obtained from R using the availablePSets() function from the R/Bioconductor library PharmacoGx.

The GDSC and CCLE PharmacoSets used in this study are available from pmgenomics.ca/bhklab/sites/default/files/downloads/ using the downloadPSet() function.

Code: The R code necessary to replicate all the results presented in this article is available from the cdrug2 GitHub repository.

How to cite this article
Safikhani Z, Smirnov P, Freeman M et al. Revisiting inconsistency in large pharmacogenomic studies [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved] F1000Research 2016, 5:2333 (https://doi.org/10.12688/f1000research.9611.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses
Approved: the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 10 May 2017
David G. Covell, Screening Technologies Branch, Developmental Therapeutics Program, National Cancer Institute, Frederick, MD, USA 
Approved
The paper under review, 'Revisiting inconsistency in large pharmacogenomics studies' by Zhaleh Safikhani, Petr Smirnov, Mark Freeman, Nehme El-Hachem, Adrian She, Quevedo Rene, Anna Goldenberg, Nicolai J. Birkbak, Christos Hatzis, Leming Shi, Andrew H. Beck, Hugo J.W.L. Aerts, John Quackenbush, Benjamin Haibe-Kains, reports an updated analysis of results from two previously published systematic ...
How to cite this report:
Covell DG. Reviewer Report For: Revisiting inconsistency in large pharmacogenomic studies [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2016, 5:2333 (https://doi.org/10.5256/f1000research.10354.r22599)
Author Response 10 Jul 2017
Benjamin Haibe-Kains
We thank Dr Covell for his constructive comments regarding our study. We are glad to hear that our PharmacoGx package is useful to reproduce our analysis results. The hope is that our ...
Reviewer Report 21 Dec 2016
Paul T. Spellman, Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
Approved with Reservations
Safikhani et al. have updated their previous analysis of two of the largest systematic drug screening projects linked to genomics data. The previous findings indicated that there is a lack of concordance between the two datasets that makes finding biomarkers ...
How to cite this report:
Spellman PT. Reviewer Report For: Revisiting inconsistency in large pharmacogenomic studies [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2016, 5:2333 (https://doi.org/10.5256/f1000research.10354.r17399)
Author Response 25 Jul 2017
Benjamin Haibe-Kains
We thank the reviewer for his constructive comments. We agree with the reviewer that we need to update the discussion to reflect this important point. In the absence of “gold ...
Reviewer Report 03 Nov 2016
Michael T. Hallett, Centre for Structural and Functional Genomics, Department of Biology, Concordia University, Montréal, QC, Canada 
Not Approved
This manuscript seeks to compare two large pharmacogenomics datasets (several hundred cancer cell lines screened against 15 common drugs) and evaluate their level of agreement via (1) the drug sensitivity values and (2) gene expression profiles of the cell lines. ...
How to cite this report:
Hallett MT. Reviewer Report For: Revisiting inconsistency in large pharmacogenomic studies [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2016, 5:2333 (https://doi.org/10.5256/f1000research.10354.r16370)
Author Response 25 Jul 2017
Benjamin Haibe-Kains
We thank the reviewer for his constructive comments. We too believe it is important for the community to be aware of the challenges for biomarker discovery stemming from the lack ...
