Drug mechanism‐of‐action discovery through the integration of pharmacological and CRISPR screens

Abstract Low success rates during drug development are due, in part, to the difficulty of defining drug mechanism‐of‐action and molecular markers of therapeutic activity. Here, we integrated 199,219 drug sensitivity measurements for 397 unique anti‐cancer drugs with genome‐wide CRISPR loss‐of‐function screens in 484 cell lines to systematically investigate cellular drug mechanism‐of‐action. We observed an enrichment for positive associations between the profile of drug sensitivity and knockout of a drug's nominal target, and by leveraging protein–protein networks, we identified pathways underpinning drug sensitivity. This revealed an unappreciated positive association between mitochondrial E3 ubiquitin–protein ligase MARCH5 dependency and sensitivity to MCL1 inhibitors in breast cancer cell lines. We also estimated drug on‐target and off‐target activity, informing on specificity, potency and toxicity. Linking drug and gene dependency together with genomic data sets uncovered contexts in which molecular networks when perturbed mediate cancer cell loss‐of‐fitness and thereby provide independent and orthogonal evidence of biomarkers for drug development. This study illustrates how integrating cell line drug sensitivity with CRISPR loss‐of‐function screens can elucidate mechanism‐of‐action to advance drug development.


th Feb 2020 1st Editorial Decision
Thank you for submitting your work to Molecular Systems Biology. We have now heard back from two of the three reviewers who agreed to evaluate your manuscript. Since the recommendations of these two reviewers are quite similar, I prefer to make a decision now rather than further delaying the process. If we receive comments from reviewer #1 we will forward them to you so that you can address any further issues raised.
As you will see below, the reviewers acknowledge that the presented method and findings seem interesting. They raise however a series of concerns, which we would ask you to address in a major revision. The recommendations of the reviewers are rather clear and therefore there is no need to repeat the points listed below. Please feel free to contact me in case you would like to discuss in further detail any of the issues raised by the reviewers. On a more editorial level, we would ask you to address the following issues:
-Please provide a .docx formatted version of the manuscript text (including legends for main figures, EV figures and tables). Please make sure that the changes are highlighted to be clearly visible.
-Please provide a .docx formatted letter INCLUDING the reviewers' reports and your detailed point-by-point responses to their comments. As part of the EMBO Press transparent editorial process, the point-by-point response is part of the Review Process File (RPF), which will be published alongside your paper.
-Please note that all corresponding authors are required to supply an ORCID ID for their name upon submission of a revised manuscript.
-Before submitting your revision, primary datasets (and computer code, where appropriate) produced in this study need to be deposited in an appropriate public database (see https://www.embopress.org/page/journal/17444292/authorguide#dataavailability).
-Dataset #1
-Dataset #2
Please remember to provide a reviewer password if the datasets are not yet public.
The accession numbers and database should be listed in a formal "Data Availability " section (placed after Materials & Method) that follows the model below (see also https://www.embopress.org/page/journal/17444292/authorguide#dataavailability). Please note that the Data Availability Section is restricted to new primary data that are part of this study.

# Data availability
The datasets (and computer code) produced in this study are available in the following databases:
*** Note -All links should resolve to a page where the data can be accessed. ***
-We would encourage you to include the source data for figure panels that show essential quantitative information. Additional information on source data and instruction on how to label the files are available at <https://www.embopress.org/page/journal/17444292/authorguide#sourcedata>.
-All Materials and Methods need to be described in the main text. We would encourage you to use 'Structured Methods', our new Materials and Methods format. According to this format, the Material and Methods section should include a Reagents and Tools Table (listing key reagents, experimental models, software and relevant equipment and including their sources and relevant identifiers) followed by a Methods and Protocols section in which we encourage the authors to describe their methods using a step-by-step protocol format with bullet points, to facilitate the adoption of the methodologies across labs. More information on how to adhere to this format as well as downloadable templates (.doc or .xls) for the Reagents and Tools Table can be found in our author guidelines: < https://www.embopress.org/page/journal/17444292/authorguide#researcharticleguide>. An example of a Method paper with Structured Methods can be found here: .
-Please provide a "standfirst text" summarizing the study in one or two sentences (approximately 250 characters, including space), three to four "bullet points" highlighting the main findings. I noticed that you have already provided a "synopsis image" in the pdf format. Please provide it in a jpeg format (550px width and max 400px height).
-When you resubmit your manuscript, please download our CHECKLIST (http://embopress.org/sites/default/files/Resources/EP_Author_Checklist.xls) and include the completed form in your submission. *Please note* that the Author Checklist will be published alongside the paper as part of the transparent process http://msb.embopress.org/authorguide#transparentprocess.
If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable.
IMPORTANT: When you send your revision, we will require the following items:
1. the manuscript text in LaTeX, RTF or MS Word format
2. a letter with a detailed description of the changes made in response to the referees. Please specify clearly the exact places in the text (pages and paragraphs) where each change has been made in response to each specific comment given
3. three to four 'bullet points' highlighting the main findings of your study
4. a short 'blurb' text summarizing in two sentences the study (max. 250 characters)
5. a 'thumbnail image' (550px width and max 400px height, Illustrator, PowerPoint or jpeg format), which can be used as 'visual title' for the synopsis section of your paper.
6. Please include an author contributions statement after the Acknowledgements section (see https://www.embopress.org/page/journal/17444292/authorguide)
7. Please complete the CHECKLIST available at (http://bit.ly/EMBOPressAuthorChecklist). Please note that the Author Checklist will be published alongside the paper as part of the transparent process (https://www.embopress.org/page/journal/17444292/authorguide#transparentprocess).
8. Please note that corresponding authors are required to supply an ORCID ID for their name upon submission of a revised manuscript (EMBO Press signed a joint statement to encourage ORCID adoption). (https://www.embopress.org/page/journal/17444292/authorguide#editorialprocess) Currently, our records indicate that there is no ORCID associated with your account.
The system will prompt you to fill in your funding and payment information. This will allow Wiley to send you a quote for the article processing charge (APC) in case of acceptance. This quote takes into account any reduction or fee waivers that you may be eligible for. Authors do not need to pay any fees before their manuscript is accepted and transferred to the publisher.
As a matter of course, please make sure that you have correctly followed the instructions for authors as given on the submission website.

REFEREE REPORTS
Reviewer #2: Review: Drug mechanism-of-action discovery through the integration of pharmacological and CRISPR screens
This manuscript by Goncalves et al. describes the integration of nearly 200k drug sensitivity measurements for 397 unique cancer drugs and genome-wide CRISPR loss-of-function screens in 484 cell lines to systematically investigate drug mechanism-of-action (MOA) in cells. The proposed analytical methods in this study add value to the concept of training computational models for predicting drug MOA and drug responses from existing data, with the novelty of using the CRISPR data (i.e. genetic fitness) to train the models.
This approach leverages pharmacological and CRISPR screening data. I wonder whether the authors looked into if and how the drug-gene associations from this study relate to the drug-gene interactions detected from chemogenetic screens (CRISPR screens with the added treatment arm, i.e. combined chemical and genetic perturbations)?
Pg. 3. Re: 'Parallel integration of gene loss-of-function screens with drug response can be used to investigate drug mechanism-of-action ...' There are other studies which integrated gene loss-of-function screens with drug response to investigate drug MOA besides the ones authors referenced here. Authors might consider referencing those studies as well to stay on track with the most recent relevant data content (e.g.
Pg. 3. Re: 'We show that CRISPR-Cas9 datasets recapitulate drug targets, can provide insights into drug potency and selectivity, and define cellular networks underpinning drug sensitivity.' Yes, authors show this in their manuscript, however, this is not a novel finding, similar statements have been previously reported.
Pg. 6. Re: 'For 76 drugs no significant association with their target was identified ... Thus, 47.5% of the annotated compounds (n=170) has an association with either the target or a functionally-related protein.' It is not clear to me which 76 drugs the authors are talking about here; 76 drugs from 26% of the 358 drugs with target annotation and for which they identified significant drug-gene pairs with their putative targets (first bar in the Figure 1c), or 76 drugs from the remaining 74% of drugs which don't have a significant association with their target gene's knockout? In addition to that, 47.5% of the annotated compounds (n=170), doesn't match the percentage portrayed in the panel Figure 1c.
Pg. 9. Re: 'Similarly, we observed that selective EGFR inhibitors cetuximab, erlotinib and gefitinib (Figure 3) were associated with EGFR but not ERBB2, whereas ...' Has the selectivity of these inhibitors been reported previously? How impactful are these observations on existing therapies for cancers stemming from alterations in EGFR and ERBB2?
Reviewer #3: Refinement of the mechanism of action of anti-cancer therapeutics emerging from phenotypic or target-based screens is critical both for guiding a mechanistic understanding of drug efficacy and toxicity. Previous attempts to solve this challenge have largely relied on low-throughput biophysical-based measurements or the use of high-dimensional readouts to match gene and drug perturbations.
Here, Goncalves et al. attempt to tackle this challenge systematically in a new way by leveraging the recently published genome-wide CRISPR viability screening datasets that they and others have produced. Their premise in this proof-of-concept manuscript, using established cancer drugs with largely known mechanisms of action, is that the correlation in viability between a genetic knockout and an established drug across nearly 500 cell lines should rediscover the MOA and/or shed new light on it.
Through an extensive series of supervised linear regression analyses they demonstrate the merits of this approach. They find that in 26% of cases, the killing pattern of drugs is directly phenocopied by the CRISPR killing pattern of the known drug target. They explore protein-protein interactions and find new relationships, and also look for "robust biomarkers" that independently explain both the CRISPR killing and drug killing. They discover an exciting relationship between the MARCH5 E3 ligase and MCL1 inhibition, a finding, that with further study should be very interesting for exploiting the MCL1 addiction in many human cancers.
Overall, the paper and approach is solid and interesting. The analysis approach is largely straightforward and while not completely novel, is applied here to a new dataset in a new way. The MARCH5 finding could ultimately be quite important. I believe that many readers will appreciate this approach to the MOA challenge in cancer and this paper will be highly read and cited.
Some suggestions: (1) The paper focuses on established drugs, with potent efficacy and (mostly) highly refined mechanisms of action. While this is useful proof-of-concept, it is unclear ultimately how the approach will work where the real MOA challenge is, which is for compounds in development. This might become a key part of the discussion. Are the same relationships observed when the drug killing effect is weaker (ie. IC10 or 20)? This might be instructive for this future application.
(2) It is possible that straightforward linear mixed regression model analyses used here may miss signal that lies in the tail of the distribution (where the killing is), and overweight the bulk of the distribution. It may be helpful to compare and contrast several analytical approaches.
(3) From my read, it seems the paper is largely the product of supervised analysis (drug target or PPIs of target). Were unsupervised results explored, recognizing the need to correct for multiple hypotheses? Perhaps there is more room for discovery here?
(4) I think the manuscript could be strengthened a bit if the overly simplistic concept of a singular drug target of each drug was softened. This is alluded to in some places, but not others. When a CRISPR and drug profile correlate, this may be due to picking up signal of an additional "off-annotated target" effect, or a "pathway" effect of the intended target. How do we know which correlations are due to which effects?
Minor points: (a) I think the Figures, especially Figure 3, could benefit from some additional design focus. (b) For the 26% of drugs that match CRISPR, do we see enrichment for particular categories of drugs, or for those with broader spectra of killing, or strength of killing, etc? Have we learned any general lessons from these or is the result purely stochastic? (c) I would be careful about this statement "hence, CRISPR measurements are more powered than gene expression to identify drug functional interaction networks". The strength of this statement may overinterpret the given analysis.

Response to reviewers
We thank the reviewers for their constructive feedback on our manuscript and believe we have been able to address their comments. Below follows our point-by-point reply to the reviewers' comments (provided in italics), and our responses as well as significant changes to the manuscript text are highlighted in blue font.
""" Yes, authors show this in their manuscript, however, this is not a novel finding, similar statements have been previously reported.
""" We agree with the reviewer's comment. We want to emphasise that our comprehensive analysis integrating CRISPR-Cas9 screens revealed many functional aspects of cancer drugs that were not explored before. We rephrased this sentence accordingly. We have also now referenced previous studies that used loss-of-function screens in a similar approach.
""" In addition to that, 47.5% of the annotated compounds (n=170), doesn't match the percentage portrayed in the panel Figure 1c.
""" We apologize for the confusion; the text did not clearly define the drugs we were referring to. The reviewer is correct to say that the 76 drugs came from those 74% (n=264) of drugs which do not have a significant association with their target's knockout. To clarify, for 358 drugs we have information about their nominal targets which have also been knocked out in the CRISPR-Cas9 screens. From these, 26.3% have significant associations with the target and another 21.2% have associations with genes closely related to the target (PPI distance 1, 2 and 3), making a total of 47.5% (n=170). We substantially rephrased the paragraph to make this clearer.
""" Through an extensive series of supervised linear regression analyses they demonstrate the merits of this approach. They find that in 26% of cases, the killing pattern of drugs is directly phenocopied by the CRISPR killing pattern of the known drug target. They explore protein-protein interactions and find new relationships, and also look for "robust biomarkers" that independently explain both the CRISPR killing and drug killing. They discover an exciting relationship between the MARCH5 E3 ligase and MCL1 inhibition, a finding, that with further study should be very interesting for exploiting the MCL1 addiction in many human cancers.
Overall, the paper and approach is solid and interesting. The analysis approach is largely straightforward and while not completely novel, is applied here to a new dataset in a new way. The MARCH5 finding could ultimately be quite important. I believe that many readers will appreciate this approach to the MOA challenge in cancer and this paper will be highly read and cited.
""" We thank the reviewer for their positive comments.
""" (1) The paper focuses on established drugs, with potent efficacy and (mostly) highly refined mechanisms of action. While this is useful proof-of-concept, it is unclear ultimately how the approach will work where the real MOA challenge is, which is for compounds in development. This might become a key part of the discussion. Are the same relationships observed when the drug killing effect is weaker (ie. IC10 or 20)? This might be instructive for this future application.
""" The reviewer raises an important application of this analysis. Firstly, we have not observed any substantial bias towards drugs with stronger drug responses among the significant drug-gene associations (Rebuttal Figure 1a). This could be in part explained by the fact that for this manuscript we only considered drugs that showed IC50s lower than 50% of the maximum screened concentration in at least 3 cancer cell lines. Additionally, we found that significant drug-gene associations are enriched for drugs with cytostatic/cytotoxic responses in a subset of cancer cell lines (Rebuttal Figure 1b). This suggests that to be able to identify drug-gene interactions, particularly drug-target, it is likely more important to have consistent responses in subsets of cell lines rather than the strength of the drug response. Nonetheless, we cannot completely exclude that for drugs with weaker cytostatic/cytotoxic effects the effect sizes will be smaller, making their mode-of-action more difficult to capture. We expanded the discussion section to address these points, as suggested.
Briefly, we believe that for compounds with unknown mode-of-action this type of analysis can provide evidence of potential direct targets if a single CRISPR KO correlates strongly with the compound response across the same set of cancer cell lines. The true drug target could be among the top associations and therefore we expect that our approach can be used to guide complementary experimental (e.g. kinobead) and computational (e.g. drug pocket binding) methods for further validation. The absence of significant associations is less informative, but it can still support that the compound (if showing cellular activity) is likely mediating its response through engaging multiple targets. We also believe our approach could be useful for drugs in advanced development (e.g. hit or lead optimisation) to identify potential undesirable off-target activities, particularly for non-kinase off-target activities.
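The drug inclusion criterion mentioned in the response to point (1) can be illustrated with a minimal sketch. It assumes IC50 values and the maximum screened concentration are expressed in the same linear units; the function name, parameter names and the IC50 values are hypothetical and are not taken from the manuscript or its code.

```python
def passes_activity_filter(ic50s, max_concentration, min_cell_lines=3, fraction=0.5):
    """Return True if a drug is active in enough cell lines.

    A cell line counts as responsive when its IC50 is below
    `fraction` of the maximum screened concentration.
    """
    threshold = fraction * max_concentration
    n_responsive = sum(1 for ic50 in ic50s if ic50 < threshold)
    return n_responsive >= min_cell_lines


# Hypothetical IC50 values, in the same units as the maximum concentration.
drug_a = [0.8, 1.2, 4.0, 9.5, 10.0]  # three cell lines below the 5.0 threshold
drug_b = [6.0, 7.5, 4.9, 9.0, 10.0]  # only one cell line below the 5.0 threshold

keep_a = passes_activity_filter(drug_a, max_concentration=10.0)
keep_b = passes_activity_filter(drug_b, max_concentration=10.0)
```

Under these assumptions the first drug would be kept and the second excluded, which shows why drugs with no clear response in any subset of cell lines do not enter the association analysis.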

Rebuttal
""" (2) It is possible that straightforward linear mixed regression model analyses used here may miss signal that lies in the tail of the distribution (where the killing is), and overweight the bulk of the distribution. It may be helpful to compare and contrast several analytical approaches.
""" The reviewer is right to point out that fundamental limitations of simple linear regressions and specific characteristics of drug response distributions might cause some drug-gene associations to be missed. While we agree with this, some aspects of linear models also make them very well suited for this analysis:
i. scalability: implementations of linear mixed models have been extensively optimised to handle hundreds to millions of tests, for example for eQTL analyses, which is important for this study as we performed a total of ~8 million tests;
ii. availability of well-established and computationally efficient statistical tests, such as likelihood-ratio tests, that support comparisons with covariates and random effects and thereby are instrumental to statistically assess the added value of each gene CRISPR fitness profile over potential confounding effects;
iii. despite relying on the identification of simple linear associations, these approaches approximate reasonably well drug-gene associations that deviate from linearity;
iv. prior to fitting the linear models, we standardise the drug response measurements by removing the mean and scaling to unit variance, a procedure common to many machine learning approaches that makes the data ranges more comparable.
Taken together, despite the intrinsic limitations of linear regression models, we believe they provide a very flexible and scalable approach to identify relevant associations between drug response and CRISPR gene essentiality profiles.
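To make the standardisation and likelihood-ratio steps above concrete, the following is a simplified sketch. It is not the linear mixed model used in the study: covariates and random effects are omitted, a plain Gaussian linear model is assumed, and all function names and data values are hypothetical.

```python
import math


def standardise(values):
    """Remove the mean and scale to unit variance (population s.d.)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def likelihood_ratio_test(drug_response, gene_fitness):
    """Compare an intercept-only model with intercept + gene fitness.

    Returns (beta, lr_statistic, p_value) for a Gaussian linear model.
    """
    y = standardise(drug_response)
    x = standardise(gene_fitness)
    n = len(y)
    # Both vectors now have mean 0 and variance 1, so the fitted
    # intercept is 0 and the slope equals the Pearson correlation.
    beta = sum(xi * yi for xi, yi in zip(x, y)) / n
    rss_null = sum(yi ** 2 for yi in y)
    rss_alt = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    # Twice the log-likelihood difference of the two Gaussian models.
    lr_stat = n * math.log(rss_null / rss_alt)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    return beta, lr_stat, p_value


# Hypothetical profiles across eight cell lines.
gene_fitness = [-1.2, -0.9, -0.1, 0.0, 0.1, 0.2, -1.0, 0.05]
drug_response = [-2.0, -1.5, 0.3, 0.1, 0.4, 0.2, -1.8, 0.0]

beta, lr_stat, p_value = likelihood_ratio_test(drug_response, gene_fitness)
```

In this toy example the association is driven by three sensitive cell lines, the kind of subset-driven signal the reviewer was concerned about. The sketch does not handle constant or perfectly correlated inputs, which would cause a division by zero.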

Rebuttal Figure 2. Representative examples of drug-target associations of drugs with cell cytotoxic/cytostatic responses only present in a small subset of outlier cancer cell lines. MET inhibitor (left) and FGFR2 inhibitor (right) with response profiles in a small subset of the cell lines that show significant associations with their targets.
""" (3) From my read, it seems the paper is largely the product of supervised analysis (drug target or PPIs of target). Were unsupervised results explored, recognizing the need to correct for multiple hypotheses? Perhaps there is more room for discovery here?
""" We have taken an unsupervised approach to the identification of drug-gene associations. Considering that drug measurements come from different technological approaches, drug-gene association p-values were adjusted on a per drug basis, not overall. We observed that this helps finding the most relevant associations of each drug without the problem of enriching for drugs with an overall higher number of associations, such as Nutlin-3a and FGFR1 inhibitors. We have rephrased initial parts of the manuscript to make this more explicit.
""" (4) I think the manuscript could be strengthened a bit if the overly simplistic concept of a singular drug target of each drug was softened. This is alluded to in some places, but not others. When a CRISPR and drug profile correlate, this may be due to picking up signal of an additional "off-annotated target" effect, or a "pathway" effect of the intended target. How do we know which correlations are due to which effects?
""" We believe that multiplexed CRISPR screens (e.g. dual and triple knockouts) would be necessary to precisely identify targets of drugs with polypharmacology effects. This would likely inform on the portion of drugs (46.6%) for which we have not identified any significant association with single gene knockout. Unsupervised search of drug targets using ChEMBL bioactivity profiles showed that, despite low mean differences, drugs with significant drug-target association had lower number of putative targets (two-sided Welch's t-test p-value = 0.003) (new panel in Fig EV3d).
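The per-drug p-value adjustment described in the response to point (3) could look like the sketch below. The response does not state which correction procedure was applied, so the Benjamini-Hochberg method is an assumption here, and the drug names, gene names and p-values are hypothetical.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_per_drug(associations):
    """Adjust p-values within each drug separately.

    `associations` is a list of (drug, gene, pvalue) tuples; the result
    has the same order with the adjusted p-value appended to each tuple.
    """
    by_drug = defaultdict(list)
    for idx, (drug, _gene, _pvalue) in enumerate(associations):
        by_drug[drug].append(idx)
    result = [None] * len(associations)
    for idxs in by_drug.values():
        adjusted = benjamini_hochberg([associations[i][2] for i in idxs])
        for i, q in zip(idxs, adjusted):
            result[i] = associations[i] + (q,)
    return result


# Hypothetical associations for two drugs.
associations = [
    ("drugA", "GENE1", 0.001),
    ("drugA", "GENE2", 0.04),
    ("drugA", "GENE3", 0.5),
    ("drugB", "GENE1", 0.03),
    ("drugB", "GENE2", 0.03),
]
adjusted = adjust_per_drug(associations)
```

Because each drug is corrected only against its own set of tests, a drug with very many strong associations does not change the adjusted values of any other drug, which is the behaviour the response describes.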
We agree with the reviewer that it is important to understand when a drug-gene association is due to direct physical inhibition or indirect association of the drug response pathway. It is challenging to know this from our analysis alone and more evidence is generally required. Nonetheless, we believe our analysis can provide important insights to guide this interpretation. On the one hand, if the effects are related to "pathway" effects then these will be closely connected and functionally related to the known nominal targets of the drugs in the PPI network (PPI shortest path <= 3). We confirmed this is the case for the majority of the drugs we screened. On the other hand, if it is a potential off-target we expect no immediate link of the associated gene with any of the canonical targets of the drug. For example, ibrutinib, a BTK inhibitor, strongly correlates with EGFR and ERBB2, and this is supported by kinobead measurements. The relative strength of the association can also provide important insights. For example, MCL1 inhibitors strongly correlate with MCL1, suggesting very selective associations; MARCH5 is also associated, but more weakly. Thus, rather than a putative off-target, MARCH5 is likely functionally related, even though the STRING PPI network does not have any relation between the two. This was confirmed independently by dual knockout screens, showing a synthetic-lethal interaction of BCL2L1 and MARCH5 (PMID:32029722). We thank the reviewer for this comment as it pointed out important aspects of the interpretation of our analysis that we did not consider in the discussion, and we expanded the discussion accordingly.
""" Minor points: (a) I think the Figures, especially Figure 3, could benefit from some additional design focus.
""" We changed Figure 3 to group drug-gene associations by drug target classes; this way it corresponds better with the references in the text and makes it easier to compare the activity of drugs of the same target class.
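The distinction drawn in the response to point (4) between "pathway" effects (PPI shortest path <= 3 from a nominal target) and potential off-targets can be sketched with a breadth-first search. The network, node names, labels and function names below are hypothetical placeholders, not data or code from the study.

```python
from collections import deque


def shortest_path_length(network, source, target):
    """Breadth-first search on an undirected graph; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def classify_association(network, drug_targets, gene, max_distance=3):
    """Label a drug-gene association relative to the drug's nominal targets."""
    if gene in drug_targets:
        return "target"
    distances = [shortest_path_length(network, t, gene) for t in drug_targets]
    distances = [d for d in distances if d is not None]
    if distances and min(distances) <= max_distance:
        return "pathway"
    return "possible off-target"


# Hypothetical undirected PPI network: a chain of five proteins.
edges = [("TARGET", "P1"), ("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

label_target = classify_association(network, {"TARGET"}, "TARGET")
label_near = classify_association(network, {"TARGET"}, "P3")
label_far = classify_association(network, {"TARGET"}, "P4")
label_unlinked = classify_association(network, {"TARGET"}, "UNLINKED")
```

As the MARCH5 example in the response shows, a gene with no network link to the nominal target may still be functionally related, so a "possible off-target" label from such a sketch is only a prompt for further evidence.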
""" (b) For the 26% of drugs that match CRISPR, do we see enrichment for particular categories of drugs, or for those with broader spectra of killing, or strength of killing, etc? Have we learned any general lessons from these or is the result purely stochastic?
""" Some drug target classes seem to be more predominantly represented in the group of 26% of drugs with significant correlation with their nominal targets (Rebuttal Figure 4).
Nonetheless, due to the low number of drugs per target class we cannot exclude that these are biased by the specific set of drugs that we considered in this analysis. Drugs with a broad spectrum of killing across all cell lines are less likely to have significant associations with their targets (Rebuttal Figure 1b), and this seems to be less driven by cell killing strength (Rebuttal Figure 1a). In general, we observed that significant drug-target associations are more likely to be selective based on independent kinobead assay data (Figure 1e).

Data
The data shown in figures should satisfy the following conditions:
-the data were obtained and processed according to the field's best practice and are presented to reflect the results of the experiments in an accurate and unbiased manner.
-figure panels include only data points, measurements or observations that can be compared to each other in a scientifically meaningful way.
-graphs include clearly labeled error bars for independent experiments and sample sizes. Unless justified, error bars should not be shown for technical replicates.
-if n< 5, the individual data points from each experiment should be plotted and any statistical test employed should be justified
-Source Data should be included to report the data underlying graphs. Please follow the guidelines set out in the authorship guidelines on Data Presentation.

Each figure caption should contain the following information, for each panel where they are relevant:
-a specification of the experimental system investigated (eg cell line, species name).
-the assay(s) and method(s) used to carry out the reported observations and measurements
-an explicit mention of the biological and chemical entity(ies) that are being measured.
-an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner.
-the exact sample size (n) for each experimental group/condition, given as a number, not a range;
-a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).
-a statement of how many times the experiment shown was independently replicated in the laboratory.
-definitions of statistical methods and measures: common tests, such as t-test (please specify whether paired vs. unpaired), simple χ2 tests, Wilcoxon and Mann-Whitney tests, can be unambiguously identified by name only, but more complex techniques should be described in the methods section; are tests one-sided or two-sided? are there adjustments for multiple comparisons? exact statistical test results, e.g., P values = x but not P values < x; definition of 'center values' as median or average; definition of error bars as s.d. or s.e.m.
Any descriptions too long for the figure legend should be included in the methods section and/or with the source data.
In the pink boxes below, please ensure that the answers to the following questions are reported in the manuscript itself. Every question should be answered. If the question is not relevant to your research, please write NA (non applicable). We encourage you to include a specific subsection in the methods section for statistics, reagents, animal models and human subjects.

B-Statistics and general methods
Please fill out these boxes (Do not worry if you cannot see all your text once you press return)
1.a. How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?
Number of cell lines, genes and drugs were defined as the most comprehensive data-sets possible.
1.b. For animal studies, include a statement about sample size estimate even if no statistical methods were used.
2. Describe inclusion/exclusion criteria if samples or animals were excluded from the analysis. Were the criteria preestablished?
3. Were any steps taken to minimize the effects of subjective bias when allocating animals/samples to treatment (e.g. randomization procedure)? If yes, please describe.
For animal studies, include a statement about randomization even if no randomization was used.
4.a. Were any steps taken to minimize the effects of subjective bias during group allocation or/and when assessing results (e.g. blinding of the investigator)? If yes please describe.