New methods for drug synergy prediction: a mini-review

In this mini-review, we explore the new prediction methods for drug combination synergy that rely on high-throughput combinatorial screens. The fast progress of the field is evidenced by the more than thirty original machine learning methods published since 2021, a clear majority of which are based on deep learning techniques. We aim to put these papers under a unifying lens by highlighting the core technologies, the data sources, the input data types and synergy scores used in the methods, as well as the prediction scenarios and evaluation protocols that the papers deal with. Our finding is that the best methods accurately solve the synergy prediction scenarios involving known drugs or cell lines, while the scenarios involving new drugs or cell lines still fall short of an accurate prediction level.


Introduction
Combination therapies involving two or more drugs are nowadays frequently used to treat complex diseases. Combination therapies can enhance treatment efficacy while mitigating side effects and drug resistance, due to lower required drug doses and synergistic drug effects. The vast number of potential drug combinations presents a major bottleneck for developing new combination therapies, which calls for new computational approaches to facilitate the exploration of drug combination spaces.
In recent years, several large high-throughput screening datasets of drug combinations have been published [1,2,3,4,5], which has fueled the development of a new generation of predictive models for drug synergy.
In this mini-review, we focus on the new prediction methods based on deep learning, pioneered by the DeepSynergy method [6], and on other machine learning methods developed in recent years. We put these papers under a unifying lens by first highlighting the prediction setups, including the prediction scenarios, the data sources, the input data types and the synergy scores used as model output, as well as the evaluation protocols. Following that, we discuss the core technologies commonly used in the models and present an overview of their predictive performance.
We limit our focus to new methods that have been tested against large benchmark datasets of pre-clinical synergy and dose-response data. In particular, we leave out drug-drug interaction (DDI) prediction, which has its own focused literature (see e.g. [7]), as well as papers that use small-scale data or lack a comparison to alternative prediction methods. In addition, we do not cover web-based tools, software and libraries that implement prediction methods (see e.g. [8] for a review).

Prediction Setups
Synergy scores. The output of a synergy prediction model is either a real-valued synergy score (a regression task) or a binary prediction (synergistic/non-synergistic), which is generally obtained from the synergy scores by thresholding. The most common synergy scores are Bliss independence, Loewe additivity, zero interaction potential (ZIP), highest single agent (HSA), ComboScore and S-score [9]. Synergy prediction models can be trained separately against different synergy scores or, by using multi-task learning [10], against multiple scores at the same time. Regression methods generally use the synergy scores as the model output, while classification methods use some form of thresholding to classify the combinations as synergistic, non-synergistic or antagonistic. Alternatively, the methods can be trained to predict the dose-response of the combination, and the synergy scores can then be computed in a post-processing step [11].
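To make the reference models concrete, the sketch below computes Bliss and HSA excess scores for a single dose pair and thresholds them into classes. It is a minimal illustration that assumes inhibitions are expressed as fractions in [0, 1]. The function names and the 0.1 threshold are hypothetical. Real pipelines compute the scores over the full dose-response matrix and summarize them, and Loewe and ZIP additionally require fitted monotherapy dose-response curves, which are omitted here.

```python
def bliss_expected(e_a: float, e_b: float) -> float:
    """Expected combined inhibition if the drugs act independently (Bliss)."""
    return e_a + e_b - e_a * e_b


def hsa_expected(e_a: float, e_b: float) -> float:
    """Expected combined inhibition under the highest single agent (HSA) model."""
    return max(e_a, e_b)


def excess(observed: float, expected: float) -> float:
    """Synergy score as the excess over the reference model (positive = synergy)."""
    return observed - expected


def to_class(score: float, threshold: float = 0.1) -> str:
    """Threshold a score into classes; the 0.1 cut-off is a hypothetical choice."""
    if score > threshold:
        return "synergistic"
    if score < -threshold:
        return "antagonistic"
    return "non-synergistic"


# Toy single-dose example: drug A alone inhibits 30 %, drug B alone 40 %,
# and the combination inhibits 75 % of cell growth.
e_a, e_b, e_ab = 0.30, 0.40, 0.75
bliss = excess(e_ab, bliss_expected(e_a, e_b))  # 0.75 - 0.58
hsa = excess(e_ab, hsa_expected(e_a, e_b))      # 0.75 - 0.40
print(round(bliss, 2), round(hsa, 2), to_class(bliss))  # 0.17 0.35 synergistic
```

The same observation scores lower under Bliss than under HSA because the Bliss reference already expects some benefit from combining two active drugs.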
Input data. Drug combination prediction models use diverse data types to capture the complexity of drug interactions and their effects. Table 1 lists the most important data sources for drug synergy prediction. These include synergy or dose-response data of drug combinations (top part) and data sources containing descriptors and profiling data of drugs and cell lines (bottom part). Below we briefly explain the most common data types used by the methods included in this mini-review, to illustrate the commonalities and differences of the predictive methods listed in Table 2. We note that there is a great deal of further detail and variation within these general data types, and the details should be checked in the original papers.
• Drug features: These encompass various molecular descriptors and fingerprints (FP), chemical structures (CS), pharmacological properties and drug (monotherapy) dose-response. Such features provide information about the characteristics and potential interactions of the drugs. In addition, some methods include as an input feature the drug combination response in cell lines other than the one being predicted [11,33,34].
• Genomic and transcriptomic data: Cell line expression profiles (CLE), miRNA expression, genomic mutations (MUT), copy number variations (CNV) and other genomic data can be leveraged to identify molecular signatures associated with drug response. These multi-omics data sources contribute to a deeper understanding of the mechanisms underlying drug combination effects.
• Biological pathways and networks: Protein-protein interaction networks (PPI) and drug-target associations (DTA) offer valuable insights into the underlying mechanisms of drug combinations. They enable the incorporation of biological knowledge and context into the prediction models.
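The sketch below illustrates a 'narrow' input representation. It builds the input vector of a triplet (D1, D2, C) by concatenating two drug fingerprints with a cell line expression profile. It also adds the swapped drug order as an extra training row, so that the model does not come to depend on an arbitrary drug ordering. The feature values, dictionary names and the swapping choice are assumptions for illustration, not the recipe of any specific method in Table 2.

```python
from typing import Dict, List, Tuple

# Hypothetical toy features: 4-bit fingerprints for drugs and a
# 3-gene expression profile for the cell line.
DRUG_FP: Dict[str, List[float]] = {
    "drugA": [1.0, 0.0, 1.0, 0.0],
    "drugB": [0.0, 1.0, 1.0, 0.0],
}
CELL_EXPR: Dict[str, List[float]] = {
    "cell1": [2.5, 0.1, 7.3],
}


def triplet_features(d1: str, d2: str, cell: str) -> List[float]:
    """Concatenate drug 1, drug 2 and cell line features into one input vector."""
    return DRUG_FP[d1] + DRUG_FP[d2] + CELL_EXPR[cell]


def training_rows(d1: str, d2: str, cell: str, y: float) -> List[Tuple[List[float], float]]:
    """Emit both drug orders with the same target, so drug order carries no signal."""
    return [
        (triplet_features(d1, d2, cell), y),
        (triplet_features(d2, d1, cell), y),
    ]


rows = training_rows("drugA", "drugB", "cell1", 12.0)
print(len(rows), len(rows[0][0]))  # 2 11  (4 + 4 + 3 features per row)
```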
The methods in Table 2 generally divide into two types: (1) narrow input data, relying on one type of drug feature and one type of cell line feature (usually CLE) combined with very large training sets, and (2) broad input data, using multiple drug, genomic and biomedical pathway data, but with smaller training sets. The first category generally scales more easily to large data, but its ability to generalize to the more challenging prediction scenarios (LDO, LCO) may be limited by its narrow biological context. The second category, on the other hand, has broader biological context but may be restricted in training data size.
The choice of the scenario has a major effect on the accuracy of the prediction. However, in the papers describing the methods (Table 2), the assumed scenario is often given only implicitly, in the description of the training-validation-test splitting or the cross-validation procedure. Table 3 lists the scenarios that we identified in the papers. By far the most common scenario is LTO, which follows from splitting unique triplets randomly into training, validation and test sets or cross-validation folds. LPO is the next most common [12,13,14,15,10,16,17,18,19]. LDO [20,16,18,19,21] and LCO [20,16,22,19,21] have received less attention, perhaps partly because each scenario requires its own dedicated data splitting strategy and partly because of their difficulty.
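The dedicated splitting strategies can be sketched as follows, assuming the data are a list of (drug 1, drug 2, cell line) triplets. Each scenario holds out a different unit: whole triplets (LTO), unordered drug pairs (LPO), individual drugs (LDO) or cell lines (LCO). Every triplet that touches a held-out unit goes to the test set. The function and dictionary names are hypothetical, and a real pipeline would also carve out a validation set.

```python
import random
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (drug 1, drug 2, cell line)


def canon(t: Triplet) -> Triplet:
    """Put the drug pair in a fixed order, since (D1, D2) and (D2, D1) are the same pair."""
    d1, d2 = sorted(t[:2])
    return (d1, d2, t[2])


# For each scenario: the held-out unit(s) that a triplet belongs to.
UNITS: Dict[str, Callable[[Triplet], list]] = {
    "LTO": lambda t: [canon(t)],      # the whole triplet
    "LPO": lambda t: [canon(t)[:2]],  # the unordered drug pair
    "LDO": lambda t: [t[0], t[1]],    # each individual drug
    "LCO": lambda t: [t[2]],          # the cell line
}


def scenario_split(triplets: List[Triplet], scenario: str, test_frac: float = 0.2, seed: int = 0):
    """Hold out a random fraction of the scenario's units; return (train, test)."""
    units_of = UNITS[scenario]
    units = sorted({u for t in triplets for u in units_of(t)})
    held = set(random.Random(seed).sample(units, max(1, round(test_frac * len(units)))))
    test = [t for t in triplets if any(u in held for u in units_of(t))]
    train = [t for t in triplets if not any(u in held for u in units_of(t))]
    return train, test


# Toy data: all pairs of four drugs, screened in three cell lines.
drugs, cells = ["a", "b", "c", "d"], ["x", "y", "z"]
data = [(a, b, c) for i, a in enumerate(drugs) for b in drugs[i + 1:] for c in cells]
for scenario in UNITS:
    train, test = scenario_split(data, scenario, test_frac=0.25)
    print(scenario, len(train), len(test))
```

In LDO, holding out one drug removes all of its combinations from training, so the training set shrinks faster than in the other scenarios.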
Evaluation protocols and hyperparameter tuning. The most popular cross-validation procedure among the references is 5-fold cross-validation, where the splits are chosen to honor the scenario under investigation. Another common strategy is a random train-test split, repeated a few times. A few papers additionally use an independent test set that is not used in model development [38,39,44,47,49,17].
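For cross-validation that honors a scenario, the held-out unit becomes a group, and each group is assigned to exactly one fold. Here a group is a drug pair for LPO or a cell line for LCO, so no group is shared between training and test data. The sketch below is minimal and its names are hypothetical. LDO is deliberately left out because a triplet involves two drugs that may fall into different folds. That case requires extra handling, such as discarding the affected triplets, which we do not show.

```python
import random
from typing import Callable, Hashable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def group_kfold(
    items: Sequence[T], group_of: Callable[[T], Hashable], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) per fold; each group lands in exactly one test fold."""
    groups = sorted({group_of(x) for x in items}, key=repr)  # deterministic order
    random.Random(seed).shuffle(groups)
    fold_of = {g: i % k for i, g in enumerate(groups)}
    for fold in range(k):
        train = [x for x in items if fold_of[group_of(x)] != fold]
        test = [x for x in items if fold_of[group_of(x)] == fold]
        yield train, test


def drug_pair(t: Tuple[str, str, str]) -> Tuple[str, ...]:
    """LPO grouping: the unordered drug pair of a (drug 1, drug 2, cell line) triplet."""
    return tuple(sorted(t[:2]))


# Toy data: 6 drug pairs x 2 cell lines = 12 triplets, split into 3 LPO folds.
drugs, cells = ["a", "b", "c", "d"], ["x", "y"]
data = [(a, b, c) for i, a in enumerate(drugs) for b in drugs[i + 1:] for c in cells]
for train, test in group_kfold(data, drug_pair, k=3):
    print(len(train), len(test))  # 8 4 in every fold
```

For LCO, the same function can be called with `group_of=lambda t: t[2]`.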
Most synergy prediction models have hyperparameters that are given as input to the learning algorithm and typically affect the capacity of the model to fit the data. However, over half of the papers in this review did not clearly explain the data splitting strategy used for hyperparameter tuning (Table 3). A validation set kept separate from the training and test data is a sound approach and is used in several of the papers. A few papers [40,49,17,21] employ a rigorous nested cross-validation approach, where the inner loop is used to tune the hyperparameters and the outer loop is used to obtain the performance estimates.
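Nested cross-validation can be sketched as below. The inner loop selects a hyperparameter using only the outer training data. The outer loop then scores the model on a fold that played no part in the selection. The k-nearest-neighbour regressor, the hyperparameter grid and the synthetic data are stand-ins chosen for brevity. For synergy data, the simple strided folds should be replaced by grouped folds that honor the prediction scenario.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, target)


def knn_predict(train: List[Row], x: Sequence[float], k: int) -> float:
    """Mean target of the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    return mean(y for _, y in nearest)


def folds(rows: List[Row], n: int) -> Iterator[Tuple[List[Row], List[Row]]]:
    """Plain strided folds; replace with scenario-aware grouped folds for synergy data."""
    for i in range(n):
        yield [r for j, r in enumerate(rows) if j % n != i], rows[i::n]


def mse(train: List[Row], test: List[Row], k: int) -> float:
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in test)


def nested_cv(rows: List[Row], grid: List[int], outer: int = 5, inner: int = 3):
    """Return the outer-loop error estimate and the hyperparameter chosen in each fold."""
    scores, chosen = [], []
    for train, test in folds(rows, outer):
        # Inner loop: tune k using the outer training data only.
        best_k = min(grid, key=lambda k: mean(mse(tr, va, k) for tr, va in folds(train, inner)))
        chosen.append(best_k)
        # Outer loop: use all outer training rows with best_k, score on the held-out fold.
        scores.append(mse(train, test, best_k))
    return mean(scores), chosen


# Synthetic data: one feature, noisy linear target.
rng = random.Random(0)
rows = [((x,), 2.0 * x + rng.gauss(0.0, 0.1)) for x in (rng.random() for _ in range(40))]
estimate, chosen = nested_cv(rows, grid=[1, 3, 5, 9])
print(f"nested-CV MSE: {estimate:.4f}, k per outer fold: {chosen}")
```

The hyperparameter may differ between outer folds. The reported estimate describes the whole tuning procedure, not a single fixed model.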

Predictive models for drug synergy
Here we provide a short overview of the modeling approaches used in the reviewed papers. The interested reader may check the original papers for details.

Neural Network Models
The majority of the recent drug synergy prediction models (Table 2) are based on neural networks of various types, which have become popular in recent years. Their main benefit is the ability to learn new representations from large training data, while their main deficiency is the computational cost of training.
Deep Neural Networks (DNNs) are a widely used neural architecture in drug synergy prediction models, either as a component or as a standalone model [12,13,38,14,45,47,48,10,49,16,53,54,19]. DNNs are composed of multiple layers of interconnected computation units, each of which computes a linear transform of its inputs followed by a non-linear activation function. A myriad of DNN architectures can be created by using different connection patterns between the units.
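The layer computation described above can be written in a few lines. This sketch shows only the forward pass of a small fully connected network with ReLU activations and a single regression output, such as a synergy score. The layer sizes, the He-style random initialization and the input vector are hypothetical, and training by backpropagation is omitted.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine transform w @ x + b; one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """He-style random initialization with zero biases."""
    std = (2.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def mlp_forward(x: Vector, layers: List[Layer]) -> Vector:
    """ReLU on hidden layers, linear output layer (a regression output)."""
    for w, b in layers[:-1]:
        x = relu(linear(x, w, b))
    w, b = layers[-1]
    return linear(x, w, b)


rng = random.Random(0)
sizes = [11, 8, 4, 1]  # hypothetical: 11 input features -> 8 -> 4 -> 1 synergy score
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(sizes[0])]
print(mlp_forward(x, layers))  # an untrained prediction: a list with one number
```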
Encoder-decoder networks such as auto-encoders (AE) and transformers are used to learn latent representations of structured data such as SMILES sequences or molecular graphs. Transformers learn mappings between general inputs and outputs. By relying on attention units, transformers can adaptively focus on different parts of a structured object. In drug synergy prediction, the BERT transformer has been used to learn latent representations from SMILES strings [37,14]. More generally, transformers have been used to map embeddings of input data sources into intermediate representations [21,39,15]. In addition, the Word2vec encoder-decoder network has been used to extract latent representations from text documents describing drug combinations [53].
Graph Neural Networks (GNNs) specialize in analyzing relational data represented as graphs or networks. GNNs are capable of learning embeddings for individual nodes and edges as well as for complete graphs. The main benefit of GNNs over text (e.g. SMILES) or vectorial representations (e.g. molecular fingerprints) is their capability to learn fine-grained representations that are still explainable in graphical form. In drug synergy prediction, GNNs are used to model molecular graphs as well as biological networks of drugs, targets and cell lines [38,44,56]. The Graph Convolutional Network (GCN) is one of the most widely used types of GNN [52,57,17,42,14,58]. Graph Attention Networks (GAT) combine graph convolution with attention units for added flexibility [38].
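A single graph convolution layer can be sketched as below. It uses the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Here A is the adjacency matrix, D the degree matrix of A + I, H the node features and W a weight matrix. The node embeddings are then averaged into a molecule-level embedding. The toy molecule, its one-hot atom features and the weights are hypothetical. Real models stack several layers and learn W.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """Aggregate normalized neighbour features, transform by w, apply ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


def mean_readout(h: Matrix) -> List[float]:
    """Average node embeddings into one molecule-level embedding."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy 3-atom chain C-C-O with one-hot atom features [is_C, is_O].
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [[0.5, -0.2], [0.1, 0.4]]  # hypothetical weights; learned in a real model
embedding = mean_readout(gcn_layer(adj, h0, w))
print(embedding)
```

Adding the identity to A lets each atom keep its own features during aggregation. The symmetric normalization keeps high-degree atoms from dominating.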
Siamese networks share parameters between subnetworks that process different data items arising from paired objects, such as pairs of drugs [18]. They are particularly valuable for assessing the similarity or complementarity of drug properties, an essential factor in predicting drug synergy.
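Parameter sharing in a Siamese network can be illustrated as follows. A single encoder, here one hypothetical tanh layer, embeds both drugs. The two embeddings are then combined with symmetric operations (element-wise sum and absolute difference), so the pair representation does not depend on drug order. A cosine similarity of the two embeddings is included as one possible similarity measure. The weights, feature vectors and combination operators are assumptions. In a full model, the pair representation would be combined with cell line features and passed to a predictor.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical shared weights: the SAME encoder processes both drugs.
W_SHARED: List[Vector] = [[0.3, -0.1, 0.8], [0.5, 0.2, -0.4]]


def encode(drug_features: Vector) -> Vector:
    """Shared drug encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, drug_features))) for row in W_SHARED]


def pair_representation(d1: Vector, d2: Vector) -> Vector:
    """Order-invariant combination: element-wise sum and absolute difference."""
    e1, e2 = encode(d1), encode(d2)
    return [a + b for a, b in zip(e1, e2)] + [abs(a - b) for a, b in zip(e1, e2)]


def cosine_similarity(d1: Vector, d2: Vector) -> float:
    """Cosine similarity of the two drug embeddings."""
    e1, e2 = encode(d1), encode(d2)
    dot = sum(a * b for a, b in zip(e1, e2))
    norm = math.sqrt(sum(a * a for a in e1)) * math.sqrt(sum(b * b for b in e2))
    return dot / norm if norm else 0.0


drug_a, drug_b = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
print(pair_representation(drug_a, drug_b) == pair_representation(drug_b, drug_a))  # True
print(cosine_similarity(drug_a, drug_b))
```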

Forest-based models
Models based on ensembles of trees, such as random forests (based on bootstrap aggregation) and XGBoost [51] (based on gradient boosting), are strong predictors and are frequently used in drug combination prediction. These methods have recently been extended to deep forests, where several layers of forests are used for synergy classification [41,37,20]. However, like neural networks, they can be computationally intensive to build and challenging to interpret.
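The bootstrap aggregation idea behind random forests can be sketched with depth-one regression trees (decision stumps). Each stump is fitted to a bootstrap sample using a random subset of features, and the ensemble prediction is their average. This is a deliberately simplified stand-in, not random forest or XGBoost as implemented in practice. Real trees are deeper, and gradient boosting instead fits trees sequentially to residuals. Deep forests additionally feed each layer's predictions to the next layer as extra features, which is not shown. The data are synthetic.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, synergy score)
Model = Callable[[Sequence[float]], float]


def fit_stump(rows: List[Row], features: Sequence[int]) -> Model:
    """Best single split (feature, threshold) by squared error; a depth-one tree."""
    best = None
    for f in features:
        for thr in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:  # no valid split (e.g. a constant feature): predict the mean
        m = mean(y for _, y in rows)
        return lambda x: m
    _, f, thr, ml, mr = best
    return lambda x: ml if x[f] <= thr else mr


def fit_bagged_stumps(rows: List[Row], n_trees: int = 25, max_features: int = 1,
                      seed: int = 0) -> Model:
    """Bootstrap aggregation with a random feature subset per stump."""
    rng = random.Random(seed)
    n_feat = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]          # sample with replacement
        feats = rng.sample(range(n_feat), max_features)  # random feature subset
        trees.append(fit_stump(boot, feats))
    return lambda x: mean(t(x) for t in trees)


# Synthetic data: the score jumps when feature 0 exceeds 0.5; feature 1 is noise.
rng = random.Random(1)
rows = [((i / 20, rng.random()), 1.0 if i / 20 > 0.5 else 0.0) for i in range(20)]
model = fit_bagged_stumps(rows)
print(model([0.95, 0.5]), model([0.0, 0.5]))
```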

Factorization models
Factorization approaches in drug combination prediction decompose multi-dimensional tensors into latent factors to extract latent features and relationships between drugs [22,50]. These models are powerful in predicting missing values in incomplete data tensors (e.g. combination response or synergy data) by learning from the co-occurrences of subsets of variables. In particular, Higher-Order Factorization Machines (HOFM) [11] and latent tensor reconstruction [33] are accurate in the LTO scenario as well as in the completion of individual dose-response matrices. On the other hand, they are not expected to have an advantage in the LDO and LCO scenarios, which require extrapolation outside the known drug or cell line space.
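A minimal tensor completion model in the spirit of these factorization approaches is a rank-R CP (CANDECOMP/PARAFAC) model. In this model, the response of a triplet is the sum over r of a[D1][r] · a[D2][r] · c[C][r], with one latent vector per drug and one per cell line. It is a swapped-in illustration, not HOFM [11] or latent tensor reconstruction [33], and the rank, learning rate and toy data are assumptions. The sketch fits the factors by stochastic gradient descent. It then shows the LTO case, an unseen triplet of known drugs and cell lines, and the LDO limitation: a new drug has no learned factor.

```python
import random
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (drug 1, drug 2, cell line)
Factors = Dict[str, List[float]]


def predict(t: Triplet, drug_f: Factors, cell_f: Factors) -> float:
    """CP model: sum over r of a[D1][r] * a[D2][r] * c[C][r] (drug factors are shared)."""
    d1, d2, c = t
    return sum(a * b * g for a, b, g in zip(drug_f[d1], drug_f[d2], cell_f[c]))


def fit_cp(data: Dict[Triplet, float], rank: int = 3, epochs: int = 300,
           lr: float = 0.05, seed: int = 0) -> Tuple[Factors, Factors]:
    """Fit the factors by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    drugs = sorted({d for t in data for d in t[:2]})
    cells = sorted({t[2] for t in data})
    drug_f = {d: [rng.gauss(0.0, 0.5) for _ in range(rank)] for d in drugs}
    cell_f = {c: [rng.gauss(0.0, 0.5) for _ in range(rank)] for c in cells}
    items = sorted(data.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for (d1, d2, c), y in items:
            err = predict((d1, d2, c), drug_f, cell_f) - y
            u, v, w = drug_f[d1][:], drug_f[d2][:], cell_f[c][:]  # pre-update copies
            for r in range(rank):
                drug_f[d1][r] -= lr * err * v[r] * w[r]
                drug_f[d2][r] -= lr * err * u[r] * w[r]
                cell_f[c][r] -= lr * err * u[r] * v[r]
    return drug_f, cell_f


# Toy responses generated from hidden per-drug and per-cell-line scalars.
hidden_drug = {"a": 1.0, "b": -0.5, "c": 0.8, "d": 0.3}
hidden_cell = {"x": 1.0, "y": -0.7}
full = {(d1, d2, c): hidden_drug[d1] * hidden_drug[d2] * hidden_cell[c]
        for d1 in hidden_drug for d2 in hidden_drug if d1 < d2 for c in hidden_cell}
held_out = ("a", "d", "y")  # an LTO test triplet: its drugs and cell line are known
train = {t: y for t, y in full.items() if t != held_out}
drug_f, cell_f = fit_cp(train)
print("LTO prediction:", predict(held_out, drug_f, cell_f), "true:", full[held_out])
print("factor for new drug 'e'?", "e" in drug_f)  # False: LDO triplets cannot be scored
```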

Bayesian models
Bayesian models provide a consistent, fully probabilistic inference approach for modeling drug combination experiments and predicting dose-response relationships. In particular, Gaussian processes have been used to model drug synergy [36,34]. These models excel in allowing the prediction uncertainty to be rigorously quantified. However, Bayesian inference can be computationally demanding and may require specialized knowledge in statistics, which limits its accessibility and use in broader applications.
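The basic machinery of Gaussian process regression can be sketched for a single monotherapy dose-response curve. With a squared-exponential (RBF) kernel and a zero prior mean, the posterior mean and variance at a new log-dose follow from the standard Cholesky-based computation. The kernel settings, the noise level and the viability values are hypothetical. The cited models [36,34] address combination responses with considerably more elaborate formulations. The returned variance is the uncertainty estimate that distinguishes the Bayesian approach. It is small near measured doses and reverts to the prior variance far from them.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rbf(x1: float, x2: float, length: float = 1.0, var: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two log-doses."""
    return var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low: Matrix, b: List[float]) -> List[float]:
    """Forward substitution: solve L z = b."""
    z: List[float] = []
    for i, bi in enumerate(b):
        z.append((bi - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def solve_upper_t(low: Matrix, z: List[float]) -> List[float]:
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs: List[float], ys: List[float], x_new: float,
                 noise: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and variance of the latent function at x_new (zero prior mean)."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, ys))  # alpha = K^-1 y
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    return mean, rbf(x_new, x_new) - sum(vi * vi for vi in v)


# Hypothetical monotherapy data: log10 dose vs. fraction of viable cells.
log_dose = [-3.0, -2.0, -1.0, 0.0, 1.0]
viability = [1.0, 0.97, 0.8, 0.4, 0.1]
for x in (-1.0, -0.5, 10.0):
    m, var = gp_posterior(log_dose, viability, x)
    print(f"log-dose {x:5.1f}: mean {m:.3f}, variance {var:.4f}")
```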

Prediction performance of the models
Figure 1 summarizes the predictive performance of the models reviewed here. Figure 1a depicts the reported AUROC values for each classification method. The highest values are achieved by GAECDS (0.98), MatchMaker (0.97) and Kim et al. (0.96), while nine further methods reach an AUROC of 0.9 or more. For methods that report performance over multiple scenarios, the range of values is shown as a box-and-whisker plot. Among regression models predicting Loewe synergy (Figure 1b), MARSY obtains the highest Pearson correlation coefficient (PCC), 0.89 on DrugComb. The second highest value is 0.83, obtained by MGAE-DC on Merck-2016, and both values were obtained in the LTO scenario. Figures 1c and 1d depict the distributions of AUROC and PCC values in different scenarios. The LCO and LDO scenarios emerge as significantly harder prediction tasks than LPO and LTO. Based on the Wilcoxon rank-sum test with Bonferroni correction, LDO has significantly lower AUROC than LPO (p=0.026) and LTO (p=0.002). Similarly, LCO has lower mean AUROC than LTO (p=0.015). In regression tasks, the difference in PCC between LCO and LPO is significant (p=0.018). Due to the small sample sizes, none of the other differences are statistically significant. Figure 1e shows the reported AUROC values per dataset. The distributions are relatively similar, and no statistically significant differences between the datasets can be shown.
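The statistics used above can be illustrated with a short sketch. AUROC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, and the same U statistic underlies the Wilcoxon rank-sum test. The sketch uses the two-sided normal approximation without tie correction and multiplies each pairwise p-value by the number of comparisons (Bonferroni). We do not know which test variant produced the reported p-values, so this is an illustration only. The per-scenario AUROC lists are made-up numbers, not the values in Figure 1.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            r[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return r


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Number of (x, y) pairs with x > y, counting ties as 1/2."""
    r = ranks(list(x) + list(y))
    return sum(r[: len(x)]) - len(x) * (len(x) + 1) / 2


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    return mann_whitney_u(pos_scores, neg_scores) / (len(pos_scores) * len(neg_scores))


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value, normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    z = (mann_whitney_u(x, y) - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_pairwise(groups: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """All pairwise tests, each p-value multiplied by the number of comparisons (capped at 1)."""
    pairs = list(combinations(sorted(groups), 2))
    return {(a, b): min(1.0, rank_sum_p(groups[a], groups[b]) * len(pairs)) for a, b in pairs}


print(auroc([0.9, 0.8, 0.7], [0.1, 0.2, 0.75]))  # 8 of 9 pairs ordered correctly
# Made-up per-scenario AUROC values for illustration only (not the Figure 1 data).
toy = {"LTO": [0.95, 0.93, 0.91, 0.90, 0.97],
       "LPO": [0.88, 0.92, 0.85, 0.90],
       "LDO": [0.70, 0.75, 0.68, 0.80]}
for (a, b), p in bonferroni_pairwise(toy).items():
    print(f"{a} vs {b}: adjusted p = {p:.3f}")
```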

Discussion
The majority of the synergy prediction methods focus on the LTO and LPO tasks, where the performance of the best methods is already very high and probably hard to improve upon. Notably, the top performers rely on a large number of training examples combined with a 'narrow' input representation, that is, a single input data type for drugs and for cell lines. The best results in the LCO and LDO tasks, on the other hand, are clearly lower, which suggests that method developers need to shift their focus. In these scenarios, developing better representations for broad input data types could be a way forward. We note that the evaluation setups are not always clearly reported or easy to compare across papers. Although some common protocols are in use, e.g. 5-fold cross-validation, many papers report hyperparameter tuning poorly, even among top-performing models, which diminishes confidence in the reported results. We are nevertheless encouraged to see the rigorous nested cross-validation approach in several papers.
Going forward, it seems clear that more unified approaches are needed to make the methods comparable. The standardization of benchmark datasets, prediction scenarios and evaluation protocols should help the community make clearer assessments of the state of the art and of potential points of improvement.
[34] L. Rønneberg, P. D. Kirk, M. Zucknick, Dose-response prediction for in vitro drug combination datasets: a probabilistic approach, BMC Bioinformatics 24 (

Prediction scenarios. The difficulty of synergy prediction depends significantly on which data we assume to be available at prediction time. Consider a test triplet (D1, D2, C), consisting of a drug pair (D1, D2) and a cell line C. The following scenarios are frequently studied:
• Leave-Triplet-Out (LTO): The triplet (D1, D2, C) does not occur in the training data. However, the drug pair (D1, D2) may occur in the training set in connection with another cell line C′.
• Leave-Pair-Out (LPO): The drug pair (D1, D2) does not occur in the training set in connection with any cell line. However, the individual drugs may occur in the training set in connection with any cell line.
• Leave-Drug-Out (LDO): At least one of the drugs in the pair (D1, D2) does not occur in the training set at all.
• Leave-Cell line-Out (LCO): The cell line C does not occur in the training set, but drugs D1 and D2 may occur in the training set in conjunction with other cell lines.
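Because the scenario is often only implicit in the papers, a simple audit of a given train/test split can help. The sketch below labels a test triplet with the hardest scenario it falls into, given the training triplets and following the definitions above. Drug pairs are treated as unordered. The function name, the combined label 'LDO+LCO' and the 'seen' label for a leaked triplet are our own hypothetical conventions.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]  # (drug 1, drug 2, cell line)


def scenario_of(test: Triplet, train: Iterable[Triplet]) -> str:
    """Label a test triplet with the hardest scenario it falls into, given the training data."""
    train = list(train)
    drugs = {d for t in train for d in t[:2]}
    cells = {t[2] for t in train}
    pairs = {frozenset(t[:2]) for t in train}  # drug pairs are unordered
    triplets = {(frozenset(t[:2]), t[2]) for t in train}
    d1, d2, c = test
    new_drug = d1 not in drugs or d2 not in drugs
    new_cell = c not in cells
    if new_drug and new_cell:
        return "LDO+LCO"
    if new_drug:
        return "LDO"
    if new_cell:
        return "LCO"
    if frozenset((d1, d2)) not in pairs:
        return "LPO"
    if (frozenset((d1, d2)), c) not in triplets:
        return "LTO"
    return "seen"  # the triplet itself is in the training data: test leakage


train = [("a", "b", "x"), ("a", "c", "y")]
for t in [("b", "a", "y"), ("b", "c", "x"), ("a", "d", "x"), ("a", "b", "z"), ("b", "a", "x")]:
    print(t, scenario_of(t, train))  # LTO, LPO, LDO, LCO, seen
```

Counting these labels over a whole test set shows which scenario a reported result actually measures.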

Figure 1: Summary of predictive performance of synergy prediction models: (a) Classification performance by method, (b) Regression performance by method, (c) Classification performance by scenario, (d) Regression performance by scenario, (e) Classification performance by dataset.

Table 1: Data sources containing combination response data for model training and evaluation (top part) and sources containing additional input data for drugs and cell lines (bottom part). The data sources marked with '*' are databases integrating multiple studies.

Table 2: Summary of Recent Methods in Drug Synergy Prediction

Table 3: Summary of evaluation methods in the reviewed papers, including the prediction scenarios, validation protocols and hyperparameter tuning.