SynergyFinder Plus: towards a better interpretation and annotation of drug combination screening datasets

Combinatorial therapies have recently been proposed for improving anticancer treatment efficacy. The SynergyFinder R package is a software tool to analyse pre-clinical drug combination datasets. We report the major updates to the R package to improve the interpretation and annotation of drug combination screening results. Compared to the existing implementations, the novelty of the updated SynergyFinder R package consists of 1) extending to higher-order drug combination data analysis and the implementation of dimension reduction techniques for visualizing the synergy landscape for an unlimited number of drugs in a combination; 2) statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; 3) incorporating a synergy barometer to harmonize multiple synergy scoring methods to provide a consensus metric of synergy; and 4) incorporating the evaluation of drug combination synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential. Furthermore, we enabled fast annotation for drugs and cell lines that are tested in an experiment, including their chemical information, targets and signalling network information. These annotations shall improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package within the drug discovery community, we also provide a web server at www.synergyfinderplus.org that provides a user-friendly interface to enable a more flexible and versatile analysis of drug combination data.


Introduction
Many complex diseases, including cancers and infectious diseases, usually develop drug resistance easily 1,2 . To achieve more sustainable clinical efficacy, multi-targeted drug combinations are proposed to tackle the disease signalling pathways more systematically 3 . With the advances of high-throughput drug screening technologies, an increasing number of drug combinations is being tested in multiple disease models, such as cancer cell lines and patient-derived primary cultures.
The main aim of these preclinical drug combination experiments is to identify the most synergistic and effective drug combinations that result in improved cellular responses, such as cell viability and toxicity. With the potential hits prioritized from the screening, more functional studies can be warranted to elucidate the mechanisms of action of the drug interactions that ultimately lead to the identification of drug combination response biomarkers that are critical for stratifying patients for more targeted therapies.
To evaluate the potential of a drug combination, mathematical and statistical models are needed to characterize the expectation if the drugs are not interactive, after which the boosting effects of the drug combination can be formally quantified. There are currently four major synergy models: the highest single agent (HSA) 4 , LOEWE 5 , BLISS 6 and ZIP 7 . However, the overall effects of a drug combination may not lead to sufficient efficacy despite a strong degree of interaction. Therefore, it is recommended to evaluate both synergy (i.e. the degree of interactions) and sensitivity (i.e. the overall treatment efficacy) simultaneously 8 . Previously developed tools that can analyse drug combination synergy include COMPUSYSN 9 , BRAID 10 , Combenfit 11 , SynergyFinder 12 and Synergy 13 . However, these tools are designed mainly for analysing two-drug combinations only.
More recently, the SynergyFinder2 tool 14 has been developed, which extends to the three-drug combinations. However, only the HSA model is compatible with its lower-order scenarios in its implementation. Furthermore, the visualization of the high-order drug combinations becomes nontrivial, as none of the existing visualization methods can deal with combinations of three or more drugs 15 . More importantly, there is a lack of implementation tools that can harmonize the multiple synergy models to derive a consensus about the degree of interactions 16 .
To address these limitations, we provide a major update to the SynergyFinder R package that enables novel functions: 1) formal mathematical models to assess the synergy of high-order drug combinations; 2) visualization of the degree of synergy using a dimension reduction technique; 3) formal statistical methods to evaluate the significance of synergy; 4) an implementation of the synergy barometer to systematically compare the results from different synergy models; and 5) an implementation of a synergy-sensitivity (SS) plot to evaluate the potential of a drug combination unbiasedly. Finally, we developed data annotation tools to retrieve the pharmacological and molecular profiles of the drugs and cells, which shall facilitate the discovery of mechanisms of action of the most synergistic and effective drug combinations. We provide a new website at www.synergyfinderplus.org to allow a user to upload their experimental results and run all the analyses with the R package in the backend. All the functions and source codes are freely accessible for academic users (Figure 1). Figure 1. Summary of the SynergyFinder Plus workflow. We highlight the new features: 1) the capability of analysing drug combinations of more than two drugs; 2) tailored statistical testing to account for uncertainty for data with replicates; 3) harmonized visualization of multiple synergy score metrics; 4) visualization of the relationship between sensitivity and synergy scores; and 5) a data annotation web server to query multiple cell line and drug databases.

Synergy and sensitivity models for high-order drug combinations
Consider the response of a drug to be measured as a % inhibition that ranges from 0 to 1, with a higher value indicating better efficacy. For a combination that involves drugs, the observed combination response is denoted as , while the observed monotherapy response of its constituent le drugs is . .
. The expected combination response is determined by the assumption of noninteraction. Currently there are four major synergy reference models 16,17  (1)

Visualization of higher-order synergy and sensitivity scores
For the visualization of higher-order drug combinations on a map of dose configurations, the commonly utilized synergy or sensitivity landscape models are not directly applicable. Here we propose a dimension reduction technique that is based on multi-dimensional scaling, similar to the recent application in transforming numerical data into images 18 . For a drug combination in a highdimensional dose space, with its coordinates of , we utilize their rankings to determine the pairwise similarity between the instances of . A common choice of quantifying the similarity is the Euclidean distance. We utilize multi-dimensional scaling to minimize the error of the pairwise distance in a two-dimensional space in which the synergy and sensitivity scores can be visualized as a synergy landscape (Figure 2). Note that using the dose rankings as the input for multi-dimensional scaling can assure that the resulting two-dimensional ts s coordinates are equally distanced among the neighbouring dose conditions, thus making the visualization easier to interpret. Furthermore, for the case of two-drug combinations, the algorithm can converge to the actual dose rankings, thus preserving the consistency across all the orders of combinations. We have also developed functions to visualize the synergy and sensitivity scores for a specific dose condition in a grid of barplots, which can help identify the most synergistic and sensitive areas in the synergy landscape.

Statistical analysis
The statistical significance of synergy can be defined at a single dose or at the whole dose matrix level. Take a two-drug combination experiment as an example. We assume that the replicates of drug responses are measured independently within each dose in the dose matrix (Figure 3). At each dose level, we propose using a bootstrapping approach to determine the confidence intervals of the synergy scores. Namely, we sample with replacement from the replicates to determine a bootstrap dose-response matrix. The synergy scores for the HSA and BLISS models can be derived simply by comparing the bootstrapped responses at the dose combination and at their corresponding monotherapy doses (i.e. using eq (1) and eq (2)). However, for the LOEWE and ZIP models, bootstrapped synergy scores will be derived using the curve fitting over the whole dose matrix (i.e. eq (3) and eq (4)).
For a scenario where no replicates are available, the confidence interval cannot be determined at the dose level. However, the p-value of the average synergy score of the whole dose-response matrix can still be derived by pooling the synergy scores together and then comparing them to zero. For both scenarios, we report the overall p-values of synergy and the confidence intervals at the dosespecific level when replicates are available.

Analysis of three-drug combinations
To test the algorithms designed for higher-order combinations, we utilized one of the recent antimalaria studies in which 16 triple artemisinin-based combination therapies have been tested against 15 parasite lines 23 . The aim of this study was to identify synergistic partner drugs that can be combined with artemisinin to overcome the emergence of malaria resistance. For each triple drug combination, a 10x10x12 multi-dimensional dose matrix was constructed, resulting in n = 1200 viability values. The datasets were obtained from the NCATS data portal at https://tripod.nih.gov/matrix-client/.
We analysed the triple drug combinations in terms of their sensitivity and synergy and visualized them in surface plots using the dimension reduction techniques.

Analysis of two-drug combinations with replicates
We obtained the ONEIL dataset from the DrugComb data portal 24 . The ONEIL data is a pan-cancer drug combination study in which 583 drug combinations have been tested across 39 cell lines 25 .
For each drug combination, four replicates have been produced, making it a representative example dataset for the analysis of the significance of the synergy scores. We show selective drug combinations that were tested in the A2058 cell line (melanoma).
As shown in Figure 5A, the confidence intervals for the responses as well as synergy scores vary at different doses, suggesting that not all the scores are equally significant. Furthermore, we observed a general trend that a higher response or synergy score tends to have a smaller confidence interval, consistent with the expectation of a typical drug screening experiment. In order to evaluate the synergy score more systematically, we provided the synergy barometer for specific dose  combinations, it would therefore be beneficial to provide data integration tools for effective annotation and accurate matching to public databases such that more comprehensive features used for synergy predictions can be obtained. Efforts regarding annotating and harmonizing existing drug screening data, such as DrugComb 24,29 and DrugCombDB 30 , have significantly contributed to the development of data-driven pharmacological modelling [31][32][33] . For newly generated drug screening datasets, we have developed the SynergyFinder Plus portal further as a companion tool for retrieving publicly available information in a more automated fashion. With one click of the button, users can obtain not only cross-database identifiers of the drugs and cells, but also additional information, such as the mechanism of action and disease indication.
Here we show how SynergyFinder Plus annotates a particular drug combination in the ONEIL 25 dataset (5-Fluorouracil and Vorinostat tested in the A2058 cell line, Figure 6A). After users upload the dataset, SynergyFinder Plus returns two annotation tables describing the cell and drug identities.
For example, the cell annotation Additionally, a third table listing detailed drug-target binding information is provided, allowing the biological processes of drug perturbation to be further retrieved using the target identifiers in ChEMBL. These annotations enable a more systematic exploration of drugs and cell lines that are tested in a high-throughput experiment ( Figure 6B). With such a comprehensive collection of identifiers, we believe that the drug combination data can be better integrated with other related data sources for further downstream analysis. We provide a major update to the commonly used SynergyFinder R package that allows the modelling of drug interactions for any higher-order combinations 15 . Higher-order drug combinations have been recently explored in, for example, lung cancer 35 , colorectal cancer 36 and ovarian cancer 37 . It is therefore expected that promising multi-drug combinations will enter formal clinical trials for multiple diseases 38 . Furthermore, we have developed statistical analysis of the drug combinations, which is also generally applicable for higher-order combinations. Importantly, the mathematical models for higher-order combinations have been made consistent with those for two-drug combinations, and therefore their scores are readily comparable across different orders.
To visualize higher-order combinations, we have developed a dimension reduction technique to display the synergy as well as sensitivity landscape on a two-dimensional plane.
In addition to providing multiple synergy models, we have also addressed the problem of harmonization. With the increasing number of models that have been proposed to characterize drug synergy, there is a more pressing need to help users understand the differences between these models such that a user may make an informed interpretation about the drug combination results.
We propose the use of a synergy barometer on which all the synergy models can be systematically compared together with the observed drug combination response. Despite only including the four major synergy models, HSA, BLISS, LOEWE and ZIP, we welcome the incorporation of other synergy models that can readily have their synergy scores put into the barometer to be compared with existing models. With the synergy barometer, we hope that the drug discovery community may derive a consensus on the best reporting guideline for the drug combination data analysis, which has been long sought since the time of the Saariselkä agreement 39 and thereafter 40 .
We also provide the formal statistical analysis for replicates of drug combinations. The confidence intervals and p-values for all four synergy scores are readily comparable. Even for the cases in which no replicates have been performed, the significance of synergy over all the dose conditions can be provided, thus enabling a statistical evaluation of drug combinations in a primary screen that usually does not provide replicates due to the high number of combinations. Once a drug combination hit passes the initial screening based on the synergy barometer and the statistical significance, we recommend a confirmation screen that involves more doses as well as more replicates so that its synergy can be more formally evaluated. If a user would like to know more about the mechanism of action of the drug combination, we also provide integrative tools to retrieve the chemical structure information as well as other functional annotation of the drugs, which shall help the downstream analysis of the drug combinations.
Finally, we provide new features to harmonize the assessment of synergy and sensitivity. The synergy models that have been developed in the SynergyFinder R package have the same scale as the drug combination sensitivity method that was developed earlier, thus allowing a direct comparison in an SS plot. We recommend the users consider the SS plot when reporting their drug combination results, as the sensitivity is the clinical endpoint for approving any drug or drug combinations. Importantly, the drug combination sensitivity values also have the same scale as monotherapy drug responses in the unit of percentage inhibition, and therefore we can readily compare a drug combination with a single drug to evaluate their efficacy. With the harmonization of drug combination sensitivity and monotherapy sensitivity, we may provide an integrated data source for developing advanced machine learning approaches for predicting drug sensitivity, such as those drug sensitivity datasets that are deposited in DrugComb 24 . To facilitate the FAIRification (findable, accessible, interoperable and reusable characteristics) of drug combination analysis, we provide a web server at www.synergyfinderplus.org to let users upload the data and obtain the analysis results in the form of reports. Meanwhile, we encourage users to deposit their experimental data in DrugComb at https://drugcomb.org/, currently one of the most comprehensive drug screening databases. With the algorithms and the source code made available through the SynergyFinder R package, we welcome the machine learning community to leverage the harmonized drug combination and monotherapy datasets for more robust and transferable predictions to further facilitate the drug combination discovery 27,[41][42][43] .