1 Introduction

One of the most remarkable milestones in the history of medicine is attributed to William T.G. Morton (1819–1868), who pioneered the use of inhaled ether as a surgical anesthetic [1]. Anesthesia is a temporary and reversible state induced by medications or other interventions that leads to a controlled loss of sensation or consciousness [2]. Depending on the nature of the procedure and individual patient requirements, various types and levels of anesthesia are available, including general anesthesia, regional anesthesia, and local anesthesia. The primary objective of anesthesia is to ensure patient safety and comfort, providing a foundation for further examinations or surgical procedures.

The effects of anesthesia are intricately intertwined with a wide array of physiological processes, encompassing analgesia, sedation, hypnosis, immobility, unconsciousness, amnesia, suppression of autonomic reflexes, and muscle relaxation [3, 4]. In clinical practice, anesthesia involves the administration of a combination of anesthetic drugs rather than relying solely on a single agent. Broadly speaking, anesthetic drugs can be categorized into various types, including general anesthetics, local anesthetics, analgesics, sedatives, and muscle relaxants [5].

The pursuit of an ideal anesthetic encompasses a wide range of characteristics, including safety, effectiveness, controllability, rapid onset and offset, minimal side effects, stability, compatibility, reversibility, and cost-effectiveness, yet the realization of such an agent remains elusive [4, 6, 7]. Although commonly used anesthetic drugs possess certain desirable properties, they are also associated with unwanted effects (Table 1). Improper utilization may lead to adverse reactions including respiratory depression, hypotension, adrenal suppression, inflammation, nausea, vomiting, and convulsions in patients [8,9,10,11,12]. Consequently, anesthesia practitioners widely anticipate the development of novel, safer, and more efficacious anesthetic drugs.

The development of novel drugs is a complex, protracted, high-risk, and costly endeavor, typically spanning more than a decade and involving investments amounting to billions of dollars [13]. Despite the extensive research and financial commitments, only a limited number of drugs successfully navigate the approval process and reach the market [14, 15]. Notably, drugs targeting the central nervous system, including anesthetics, experience the highest rate of failure in development [16]. Numerous factors contribute to the failure of drug development, including but not limited to inadequate potency, unforeseen adverse effects, off-target effects, and significant challenges in synthesis [17].

The advancement of computer-aided methods and artificial intelligence in the medical field has propelled modern anesthesiology into the era of information technology. In this regard, the integration of computer-aided drug design (CADD) and machine learning techniques has yielded diverse applications in the field of drug discovery [18]. These methodologies have proven instrumental in reducing costs, saving time, preemptively eliminating unqualified molecules, and minimizing failures in the final stage of drug development. This paper provides a comprehensive review of commonly utilized computer-aided and machine-learning methods, highlighting their applications in the development of anesthetic drugs.

Table 1 Commonly used anesthetic drugs. FDA, Food and Drug Administration

2 Methods of computer-aided drug design

The concept of CADD can be traced back to the early 1960s, when researchers started using computers to simulate and model chemical interactions. However, it was not until the 1980s that the term “computer-aided drug design” was coined, and the field gained real momentum in the 1990s [33]. Generally, there are two types of techniques: structure-based drug design and ligand-based drug design (Fig. 1).

Fig. 1 Overview of computer-aided drug design. QSAR, quantitative structure-activity relationship; HTS, high-throughput screening; VS, virtual screening

2.1 Structure-based drug design

Structure-based drug design (SBDD) has emerged as a widely adopted strategy for rapid and cost-effective lead identification and optimization [34]. It encompasses a range of techniques, such as homology modeling, molecular dynamics, molecular docking, and de novo drug design [35]. The core of SBDD is to explore binding modes in protein-ligand complexes and predict the binding affinity [36, 37]. The initial step in SBDD is to identify drug targets and obtain their three-dimensional structure, which can be experimentally measured or computationally predicted using methods like homology modeling [13].

2.1.1 Molecular docking

Once the 3D structure of proteins is obtained, the subsequent step involves exploring atomic-level interactions between macromolecular structures and small molecules [38]. The most well-known SBDD technique in this process is molecular docking. Docking methods typically consist of two components: the search algorithm and the scoring scheme [39, 40]. The search algorithm is responsible for sampling suitable conformations from high-dimensional spaces, while the scoring scheme is employed to evaluate interaction energies and rank different candidate dockings [41, 42]. Classic scoring schemes can be force-field-based, empirical, or knowledge-based [43], while the introduction of machine learning (ML) approaches has greatly improved the accuracy of scoring functions [44]. Recent developments in molecular dynamics simulations have made conformational exploration easier, making molecular docking an increasingly powerful tool for the virtual screening of lead compounds.
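
To make the two components concrete, the sketch below scores candidate ligand poses with a toy force-field-style function that sums Lennard-Jones and Coulomb terms over protein-ligand atom pairs. The coordinates, charges, and parameters are hypothetical placeholders; real docking engines pair calibrated scoring functions with far more sophisticated conformational search.

```python
import numpy as np

def toy_score(lig_xyz, lig_q, prot_xyz, prot_q,
              epsilon=0.2, sigma=3.5, k_coulomb=332.06):
    """Force-field-style score: Lennard-Jones plus Coulomb terms summed
    over all ligand-protein atom pairs (lower = more favorable)."""
    d = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)
    lj = 4 * epsilon * ((sigma / d) ** 12 - (sigma / d) ** 6)
    coulomb = k_coulomb * np.outer(lig_q, prot_q) / d
    return float(lj.sum() + coulomb.sum())

# Hypothetical random coordinates and partial charges, for illustration only;
# ranking the poses mimics what a docking search algorithm would do
rng = np.random.default_rng(0)
prot_xyz, prot_q = rng.uniform(-8, 8, (50, 3)), rng.normal(0, 0.2, 50)
lig_q = rng.normal(0, 0.2, 10)
poses = [rng.uniform(-8, 8, (10, 3)) for _ in range(20)]
ranked = sorted(range(len(poses)),
                key=lambda i: toy_score(poses[i], lig_q, prot_xyz, prot_q))
print("best pose index:", ranked[0])
```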

2.1.2 Homology modeling

Owing to the high cost and time-intensive nature of experimental measurements, homology modeling has become one of the most precise computational methods for obtaining the 3D coordinates of proteins from their amino acid sequences [45, 46]. The fundamental tenet of homology modeling is that the target sequence of interest is likely to share a similar structure and function with a homologous template [47, 48]. The quality of the predicted structure is determined by the level of similarity between the target and the template sequence [49]. Sequence-based similarity search algorithms, such as FASTA and BLAST, are commonly employed to identify homologous templates from the Protein Data Bank (PDB) dataset [50].
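
As a minimal illustration of the template-selection principle, the snippet below ranks candidate templates by the percentage of identical residues in a pre-computed alignment. The sequences and PDB identifiers are hypothetical; practical pipelines rely on BLAST or FASTA searches followed by full alignment, model building, and refinement.

```python
def percent_identity(target, template):
    """Share of identical residues over mutually aligned (non-gap) positions."""
    pairs = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0

target = "MKTAYIAKQR-QISFVKSHFSRQ"                 # hypothetical query, pre-aligned
templates = {"1abc_A": "MKTAYLAKQRGQISFVKAHFSRQ",  # hypothetical PDB chains
             "2xyz_B": "MRTGYIGKQN-QLSFVKSHYSRQ"}
best = max(templates, key=lambda pdb: percent_identity(target, templates[pdb]))
print(best, round(percent_identity(target, templates[best]), 2))
```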

2.1.3 Molecular dynamics

Proteins were once believed to be static structures, but this view has been challenged and overturned [51]. Molecular dynamics (MD) is a computer simulation technique employed to predict protein movements and conformational changes over time [52]. Initially introduced by Alder and Wainwright [53], MD simulations have evolved from methods based on Newtonian mechanics to more advanced quantum mechanical approaches [54, 55]. MD simulations are widely used to study drug-target interactions, capture multiple conformational changes in proteins, calculate ligand-target binding energies, and identify cryptic or allosteric sites [56, 57]. With the rapid development and iteration of graphics processing units, the precision of MD simulations has improved significantly, enabling the exploration of more sophisticated biochemical processes [58].
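
The core of a classical MD engine is surprisingly compact. Below is a minimal sketch, in reduced units, of a Lennard-Jones particle system integrated with the velocity Verlet scheme; production codes add full force fields, thermostats, periodic boundaries, and heavy GPU optimization.

```python
import numpy as np

def lj_forces(x, epsilon=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces: F = 24*eps*(2*(s/r)^12 - (s/r)^6)/r^2 * dx."""
    diff = x[:, None, :] - x[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                         # exclude self-interaction
    mag = 24 * epsilon * (2 * (sigma / r) ** 12 - (sigma / r) ** 6) / r ** 2
    return (mag[..., None] * diff).sum(axis=1)

def velocity_verlet(x, v, dt=1e-3, steps=1000, mass=1.0):
    """Propagate Newton's equations of motion forward in time."""
    f = lj_forces(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / mass) * dt ** 2     # update positions
        f_new = lj_forces(x)
        v = v + 0.5 * (f + f_new) / mass * dt           # update velocities
        f = f_new
    return x, v

g = np.arange(3) * 1.5                                  # 27 atoms on a cubic lattice
x0 = np.array([[i, j, k] for i in g for j in g for k in g], dtype=float)
x, v = velocity_verlet(x0, np.zeros_like(x0))
```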

2.1.4 High-throughput screening and virtual screening

Both high-throughput screening (HTS) and virtual screening (VS) share the goal of identifying potential hit molecules from a large, diverse library of compounds [59]. VS primarily relies on computational simulations to predict the interactions between small molecules and a target [60]. HTS, on the other hand, involves the experimental testing of a large number of compounds in the wet lab. Though the two approaches possess certain distinctions, they are often used complementarily, and researchers even use the terms interchangeably. Depending on the availability of structural data, VS/HTS can be conducted using either receptor-based or ligand-based methods [61]. However, in the practical application of anesthetic drug development, the utilization of structure-based methods is more prevalent.

2.2 Ligand-based drug design

In cases where structural information about drug targets is not available, ligand-based drug design (LBDD) can be a preferred approach [62]. The principle behind LBDD is that molecules with similar structures tend to share similar biological properties. Common LBDD methods include quantitative structure-activity relationship (QSAR), similarity search, and pharmacophore modeling [63]. The main objective of LBDD techniques is to identify key structural or physicochemical features that account for the observed biological activity of a set of compounds [64].

2.2.1 Similarity search

Similarity searching is a straightforward and effective approach for selecting structures similar to an input compound based on chemical or physicochemical characteristics [65, 66]. Choosing the right molecular descriptor is an important part of similarity measurement. Common molecular descriptors include 2D and 3D descriptors as well as various physicochemical properties [67]. The most widely used 2D descriptor is the molecular fingerprint, which compares similarity through bit strings [68]. Similarity searching is used in a wide range of scenarios, such as molecular novelty measurement, clustering analysis, docking searches, and reaction similarity evaluation [69]. It can be a particularly advantageous alternative when too few ligands with known bioactivity are available [70].
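
A minimal fingerprint-based similarity search might look as follows, assuming the open-source RDKit toolkit is available; the small library of SMILES strings here is purely illustrative.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity between Morgan fingerprint bit strings."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s),
                                                 radius, nBits=n_bits)
           for s in (smiles_a, smiles_b)]
    return DataStructs.TanimotoSimilarity(*fps)

query = "CC(C)c1cccc(C(C)C)c1O"                     # propofol
library = ["CCc1cccc(CC)c1O",                       # 2,6-diethylphenol
           "Oc1ccccc1",                             # phenol
           "CC(=O)Oc1ccccc1C(=O)O"]                 # aspirin
for smi in sorted(library, key=lambda s: tanimoto(query, s), reverse=True):
    print(f"{tanimoto(query, smi):.2f}  {smi}")
```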

2.2.2 Pharmacophore modeling

Pharmacophores represent abstract characteristics essential for biological activity, rather than actual chemical or functional groups [71]. Pharmacophore features commonly include cations, anions, aromatics, and hydrogen bond acceptors or donors [72, 73]. Ligand-based pharmacophore modeling is commonly utilized in virtual screening to reveal features crucial for receptor binding. It generally includes the following steps: analyzing the conformational space of ligands in the training set, aligning molecular conformations, and identifying the best overlay of pharmacophore features [74].
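
As an illustration, RDKit ships with generic feature definitions that can perceive such pharmacophore features directly from a molecule; a full ligand-based workflow would additionally enumerate conformers and align the features across the training set, as outlined above.

```python
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

# Build a feature factory from RDKit's built-in feature definitions
factory = ChemicalFeatures.BuildFeatureFactory(
    os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef"))

mol = Chem.MolFromSmiles("CC(C)c1cccc(C(C)C)c1O")   # propofol
for feat in factory.GetFeaturesForMol(mol):
    # Feature families include Donor, Acceptor, Aromatic, Hydrophobe, ...
    print(feat.GetFamily(), feat.GetAtomIds())
```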

2.2.3 QSAR

QSAR modeling is a process of predicting the physical and biological properties of untested molecules based on their structure [75, 76]. Pharmacophore modeling identifies essential 3D features for biological activity, while QSAR quantifies the relationship between structural features and activity, offering complementary information for drug development. QSAR modeling has undergone constant evolution and reinforcement, becoming one of the most frequently utilized approaches for the statistical analysis of biological data [77]. However, the phenomenon of “activity cliffs”, where slight changes in chemical structures can result in significant changes in target activity, can impede QSAR model performance [78]. This is an inherent challenge, but it may be mitigated by combining QSAR with other advanced computational techniques.

The selection of chemical descriptors plays a pivotal role in improving the performance of QSAR models. 2D-QSAR studies are based on various 2D properties such as constitutional, topological, and electronic descriptors, but their limited ability to account for the 3D structure of molecules undermines their predictive capabilities [79]. To overcome this limitation, 3D-QSAR leverages the ligands’ three-dimensional characteristics, such as molecular shape, electrostatic potentials, hydrophobic regions, and steric interactions, while 4D-QSAR adds conformational and alignment freedom, improving the model’s accuracy and comprehensibility [80].
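
In practice, a 2D-QSAR model is often simply a regressor from a descriptor matrix to measured activities. The sketch below fits a random forest on synthetic placeholder data, assuming scikit-learn; a real study would compute descriptors with a cheminformatics toolkit and validate far more rigorously.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# X: one row per compound, columns are 2D descriptors (e.g. molecular
# weight, logP, H-bond donor/acceptor counts); y: measured activity such
# as pIC50. Values here are synthetic placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] * 1.5 - X[:, 3] + rng.normal(scale=0.3, size=200)

model = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f}")
```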

2.2.4 Clinical demand-oriented reverse drug design

Anesthesia, especially general anesthesia, is an intricate physiological process that involves multiple targets, and a comprehensive understanding of each target’s mechanism remains elusive. These hurdles make it more complicated to develop novel anesthetics through the target-based drug design paradigm. In addition, clinicians consider the clinical features of a molecule the most crucial information and the primary indicator of its potential as a new drug. Although the descriptors commonly used in QSAR pertain to a molecule’s druggability, they do not directly reflect its biological activity when used in patients.

Therefore, our team, led by Jin Liu (professor and chairman of the Department of Anesthesiology and director of the Translational Neuroscience Center at West China Hospital, Sichuan University, Chengdu, China), presents a novel strategy termed “clinical demand-oriented reverse drug design”. Our strategy prioritizes the direct optimization of a compound’s clinical properties irrespective of target information, distinguishing itself from the conventional target-based drug discovery paradigm. Moreover, our model is able to understand clinical requirements described in natural language. It can take a descriptive paragraph outlining clinical demands as input and generate potential candidates, setting it apart from phenotype-based drug discovery, which primarily revolves around screening as its central methodology.

Our team has undertaken efforts to develop novel general anesthetics through clinical demand-oriented reverse drug design. We have selected several clinical characteristics as optimization goals, encompassing anesthesia efficacy, anesthetic onset and duration time, blood pressure, abnormal nervous system excitation, and respiratory depression. Taking inspiration from Jin’s work [81], our initial step involved training a molecular variational autoencoder (VAE) capable of encoding and decoding molecules. The VAE model enabled us to encode input molecule graphs into latent vectors and decode modified vectors into compounds with optimized properties. In addition, the latent space provided by the VAE model can be utilized to train predictors for diverse clinical properties.

To generate and optimize potential compounds, we randomly sampled vectors from the latent space (i.e., random initialization). Subsequently, we calculated the necessary gradients for each property and applied the gradient descent algorithm to optimize each property individually. The optimized compounds exhibiting high predictive scores for all the clinical goals underwent further experimentation in the wet lab. Our research paradigm has already yielded preliminary results and may shed light on the development of new anesthetic drugs.
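
Conceptually, the optimization loop resembles the PyTorch sketch below. The property names, network shapes, and combined objective are illustrative stand-ins for our trained predictors (for brevity, the properties are optimized jointly here rather than one at a time); the resulting vector would be decoded back into a molecule by the VAE.

```python
import torch

latent_dim = 64
# Untrained stand-ins for the clinical-property predictors described above
predictors = {name: torch.nn.Sequential(torch.nn.Linear(latent_dim, 32),
                                        torch.nn.ReLU(),
                                        torch.nn.Linear(32, 1))
              for name in ("efficacy", "onset_time", "resp_depression")}
direction = {"efficacy": 1.0, "onset_time": -1.0, "resp_depression": -1.0}

z = torch.randn(latent_dim, requires_grad=True)       # random initialization
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    # Maximize desirable properties, minimize undesirable ones
    loss = sum(-direction[p] * net(z).squeeze() for p, net in predictors.items())
    loss.backward()                                   # gradients w.r.t. z
    opt.step()
# z now scores well on all predictors; the decoder maps it back to a compound
```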

3 Principles of machine learning

The ultimate goal of artificial intelligence (AI) is to create computer systems or machines that have human-like intelligence and independence [82]. Learning and problem-resolving are typical tasks that represent human intelligence, thereby making "machine learning" a core area of artificial intelligence [83]. The major focus of machine learning is to develop computer systems that can acquire knowledge, make predictions or decisions, and enhance their performance using provided data instead of explicit programming [84, 85].

ML techniques are typically classified into three main categories (Fig. 2): supervised learning, unsupervised learning, and reinforcement learning. In addition, deep learning, a specialized approach within ML, and transfer learning, an extension or application of ML, have also demonstrated their utility in the field of drug discovery.

Fig. 2 Machine learning algorithms used for anesthetic drug discovery

3.1 Supervised learning

Supervised learning encompasses the process of training a model using annotated data, where input samples are paired with corresponding target labels [86]. Through this approach, the model acquires the ability to make predictions or classify unseen data by leveraging the patterns and relationships learned from the labeled examples.

3.1.1 Classification

Classification refers to assigning input data instances to specific predefined categories or classes. The evaluation of a classification model is typically based on metrics such as precision, recall, accuracy, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC) [87].

Naive Bayes (NB)

Naive Bayes is a classification method based on Bayes’ theorem that assumes the independence of features given the class labels. This method has advantages including simplicity, computational efficiency, and the ability to handle high-dimensional or unbalanced data [88]. Nonetheless, in real-world scenarios, it is uncommon for features to be truly independent, which restricts the effectiveness of this method when substantial interactions exist between features.

k-Nearest Neighbor (k-NN)

The k-NN algorithm classifies or predicts new data points based on their proximity to labeled training examples within a feature space. k-NN exhibits versatility by accommodating both classification and regression tasks. To generate predictions, the algorithm identifies the k nearest neighbors in proximity to the new data point and assigns the most prevalent class label among them. The algorithm’s performance hinges upon the careful selection of the distance metric and the optimal value for k [89]. Noteworthy advantages of k-NN encompass its simplicity, ease of implementation, capacity to handle multi-class classification, and adaptability to novel data. Nevertheless, k-NN’s main disadvantage is computational complexity, especially with large datasets, owing to the necessity of comparing the query point against all training examples [90].

Support Vector Machine (SVM)

SVM aims to identify an optimal hyperplane within a high-dimensional space that separates data points into different classes while maximizing the margin between the classes. SVM is widely regarded as one of the most robust predictive models and finds extensive utilization in the domain of drug development [91]. SVMs are capable of handling high-dimensional data, but they are inherently binary classifiers, so multi-class tasks must be decomposed into multiple binary problems (e.g., one-vs-rest).

Decision Tree (DT)

The decision tree algorithm constructs a hierarchical structure of nodes to facilitate decision-making based on simple decision rules derived from data features [92]. Conceptually, a tree can be perceived as a piecewise constant approximation. This approach possesses a white-box property, offering interpretability through visually comprehensible tree structures, and it can effectively handle both numerical and categorical data. These advantages have established decision trees as one of the most prevalent and potent machine learning methods [93]. The limitations of decision trees include their proneness to overfitting when deep trees are utilized, as well as their sensitivity to minor data variations, which can lead to high variance [94]. To address these limitations, ensemble techniques like random forests or gradient boosting can be utilized.

Gradient Boosting Machine (GBM)

Boosting is a technique that transforms weak learners into strong learners, and one notable implementation of this principle is gradient boosting. GBM sequentially combines weak predictive models to create a robust ensemble model and can be used in regression and classification tasks. When decision trees serve as the weak learners, the resulting algorithm is referred to as a gradient-boosted tree. This iterative process allows GBM to effectively capture complex, non-linear relationships in the data, providing higher accuracy compared to single machine learning models [95].

Random Forest (RF)

Random forest is a versatile machine-learning algorithm that harnesses the collective strength of an ensemble of decision trees. Each tree in the forest is constructed independently by randomly selecting subsets of features and data samples. The final prediction is then determined by aggregating the outputs through majority voting or averaging. RF exhibits several advantages, including its proficiency in handling high-dimensional data, capturing complex relationships, and effectively managing missing values and outliers. Although Random Forest frequently achieves superior accuracy compared to a single decision tree [96], it sacrifices some of the inherent interpretability in decision trees.
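
For orientation, the sketch below runs several of the classifiers discussed above on a synthetic activity dataset using scikit-learn and compares their cross-validated AUC-ROC; the data and hyperparameters are placeholders, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an activity dataset: rows are compounds,
# columns are descriptors, labels are active/inactive
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

classifiers = {
    "Naive Bayes":   GaussianNB(),
    "k-NN":          KNeighborsClassifier(n_neighbors=5),
    "SVM":           SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "GBM":           GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, clf in classifiers.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name:14s} AUC-ROC = {auc.mean():.3f}")
```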

Synthetic Minority Oversampling Technique (SMOTE)

Label imbalance is a common challenge encountered in datasets utilized for drug discovery [97], and SMOTE is employed to tackle this issue [98]. It addresses the scarcity of data in the minority class by generating synthetic samples that closely resemble the existing minority class instances. This approach effectively balances the class distribution and enhances the classifier’s ability to accurately predict the minority class. However, SMOTE may introduce synthetic samples that are noisy or less representative of the actual data.
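
A minimal usage sketch, assuming the imbalanced-learn package, shows how SMOTE rebalances a screening-style dataset with few actives:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced toy dataset: roughly 5% actives, as is common in screening data
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
print("before:", Counter(y))                 # minority class heavily outnumbered
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))             # classes balanced 1:1
```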

3.1.2 Regression

Regression aims to predict a continuous numerical value as the output. Regression models can be linear or non-linear and are evaluated using metrics such as mean squared error, mean absolute error, or R-squared.

Linear regression

The linear regression model is renowned for its simplicity and effectiveness. It establishes a linear relationship between a dependent variable and one or more independent variables, seeking to identify the best-fit line that minimizes the overall difference between the predicted and actual values [99]. Notably, linear regression offers interpretability through coefficients that convey the direction and magnitude of the variable relationships. However, this method is not suitable for nonlinear scenarios.

Ordinary least square

One of the most commonly used methods in linear regression is ordinary least squares, which estimates the parameters of a linear model by minimizing the sum of the squared errors.
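
In NumPy, the ordinary least squares estimate is a one-liner; the closed-form solution of the normal equations is \(\hat{\beta } = (X^{T}X)^{-1}X^{T}y\). The data below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes sum of squared errors
print(beta_hat)                                    # close to [1.0, 2.0, -0.5]
```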

Logistic regression

Logistic regression gets its name from the logistic function, also known as the sigmoid function, which lies at the core of this method. The sigmoid function maps real-valued numbers to a range between 0 and 1. Despite the resemblance to linear regression in terms of representation, logistic regression differs by focusing on binary values rather than continuous ones [100]. The simplicity and interpretability of logistic regression make it advantageous, as it offers insights into the importance and direction of input variables. Furthermore, logistic regression demonstrates computational efficiency and versatility in handling both numerical and categorical features, making it a valuable tool for various types of data analysis.

3.2 Unsupervised learning

Unsupervised learning deals with unlabeled data, allowing the model to discover hidden patterns or data groupings without explicit target labels or human intervention. Clustering and dimensionality reduction are two of the most common tasks in unsupervised learning.

3.2.1 Clustering

Clustering is used to automatically group similar objects together based on their inherent characteristics or patterns, without any prior knowledge of class labels or output values. In the process of drug discovery, clustering can be used to cluster compounds based on structural similarities, or group molecular targets based on functional characteristics.

K-Means

K-Means is an iterative algorithm that partitions data into k clusters by assigning data points to the cluster with the closest centroid and updating the centroids based on the mean of the data points within each cluster. The value of k, representing the number of clusters, is a user-defined parameter. K-Means boasts simplicity and efficiency, making it well-suited for large datasets. It operates without the need for labeled data and can handle both numerical and categorical features. However, it is crucial to consider that the K-Means algorithm is sensitive to the initial placement of centroids, which can lead to convergence on different solutions [101].

Hierarchical clustering

Hierarchical clustering is a method that groups similar data points into nested clusters based on pairwise distances [102]. It differs from K-Means in that it constructs a hierarchical structure of clusters, while K-Means directly partitions the data. Agglomerative clustering starts with each data point as its own cluster and iteratively merges the closest pairs, whereas divisive clustering begins with all data points in a single cluster and recursively splits them into smaller clusters until the desired number of clusters is reached. This approach captures the inherent hierarchical structure of the data without requiring the prior specification of the number of clusters. However, the choice of distance metric and linkage method can affect the resulting clusters.
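
Both clustering styles take only a few lines with standard libraries. The sketch below groups a toy descriptor matrix with K-Means (scikit-learn) and with agglomerative hierarchical clustering (SciPy), the bottom-up counterpart of the divisive approach described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

# Toy "compound descriptor" matrix with three obvious groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in (0, 3, 6)])

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agglomerative hierarchical clustering, then cut the tree into 3 clusters
Z = linkage(X, method="average")          # merge clusters by average distance
hc_labels = fcluster(Z, t=3, criterion="maxclust")
```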

3.2.2 Dimensionality reduction

Dimensionality reduction techniques are used to transform high-dimensional data into a lower-dimensional subspace while preserving the most important features. By reducing the dimensionality, researchers can simplify data analysis, remove noise, and identify key patterns.

Principal Component Analysis (PCA)

PCA addresses the complexity of high-dimensional data by extracting a small set of informative features from a large set of variables [103]. It is widely acknowledged as one of the most prominent methods within this category. PCA identifies the directions, known as principal components, along which the data exhibit the most significant variability. PCA finds applications in various domains, including molecular descriptor analysis, feature space reduction, and high-dimensional data visualization. However, reducing the number of variables naturally comes at some expense of accuracy.

Linear Discriminant Analysis (LDA)

LDA focuses on projecting data onto a lower-dimensional space while preserving class-specific information. It assumes that the data follow a Gaussian distribution and that the covariance matrices of different classes are equal [104]. LDA calculates class-specific means and covariance matrices to determine a discriminant function that optimally separates the classes. This method offers benefits such as simplicity, interpretability, and the ability to handle multi-class problems. However, it cannot capture complex non-linear relationships in the data.
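
The contrast between the two projections is easy to see on a labeled toy dataset with scikit-learn: PCA ignores the labels, whereas LDA uses them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, keeps the directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, keeps the directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
print(X_pca.shape, X_lda.shape)   # (150, 2) (150, 2)
```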

Time-structure Independent Components Analysis (tICA)

tICA is a dimensionality reduction method for data that evolve over time, such as molecular dynamics trajectories or other time series. Rather than seeking the directions of maximal variance, as PCA does, tICA identifies the slowest-varying collective degrees of freedom. This makes it a useful tool for understanding slow motions and significant conformational changes in complex systems, thereby accelerating the process of drug development.

3.3 Deep learning

Deep learning (DL) algorithms are based on artificial neural networks, which aim to emulate the functionality of the human brain and learn from examples [105]. In the human brain, millions of interconnected neurons collaborate to acquire and process information [106]. Likewise, deep learning neural networks comprise multiple layers of artificial neurons, operating collectively within a computer system.

3.3.1 Multilayer Perceptron (MLP)

MLP is an artificial neural network that comprises multiple interconnected layers of nodes called neurons. It consists of three types of layers: the input layer, the output layer, and the hidden layer. Each neuron in the network receives input from the previous layer, applies a non-linear activation function to generate an output, and passes it to the next layer. The hidden layers, positioned between the input and output layers, enable MLPs to approximate any arbitrary function and learn complex features [107]. MLPs are trained using the backpropagation technique, where the network adjusts the weights and biases of its connections to minimize the discrepancy between predicted and actual values [108].
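
A minimal MLP and its backpropagation training loop, sketched with PyTorch on synthetic data (the layer sizes and learning rate are arbitrary choices):

```python
import torch
from torch import nn

# A small MLP: input layer -> two hidden layers -> output layer
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),     # hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),     # hidden layer 2
    nn.Linear(64, 1),                 # output layer (regression)
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

X = torch.randn(256, 16)              # toy data
y = X[:, :1] * 2.0 + 0.1 * torch.randn(256, 1)
for epoch in range(100):              # training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                   # gradients via backpropagation
    opt.step()                        # adjust weights and biases
```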

3.3.2 Convolutional Neural Network (CNN)

CNN is a neural network architecture that draws inspiration from the structure and functioning of the human visual system [109]. It consists of multiple layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layers play a central role in CNNs as they extract important features from the input [110]. Pooling layers then downsample the output, reducing its dimensionality. Fully connected layers connect all the neurons from the previous layer to the next, enabling classification or regression tasks. The hierarchical structure of CNNs allows them to automatically learn pertinent features from raw data, eliminating the necessity for manual feature engineering.

Though originally designed for visual data analysis, CNNs have expanded their applications to various domains, including drug discovery. CNNs are now utilized in analyzing intricate molecular and biological data, facilitating de novo drug design, aiding decision-making in pharmaceutical research, and processing imaging data like high-content screening images or histopathology slides [111].

3.3.3 Graph Neural Network (GNN)

A graph is a data structure consisting of two fundamental components: nodes (or vertices) and edges [112]. GNN is a type of deep learning model specifically designed to operate on graph-structured data. The primary goal of GNNs is to learn representations of nodes or entire graphs through the propagation of information along the graph structure. An exemplary technique in this realm is the Graph Convolutional Network, which extends conventional convolutional operations to graphs.

Graphs provide an effective representation of the structure and properties of chemical molecules, and they have garnered significant attention in the field of drug discovery [113, 114]. GNNs offer the capability to model molecular structures and exploit both local and global molecular interactions, making them valuable tools in the exploration of novel drug candidates.
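
The propagation rule of a Graph Convolutional Network layer can be written in a few lines of PyTorch; the 4-atom adjacency matrix and feature sizes below are toy placeholders, and practical work would typically use a dedicated graph learning library.

```python
import torch

def gcn_layer(A, H, W):
    """One graph-convolution step: aggregate neighbor features, then transform.
    A: adjacency matrix, H: node features, W: learnable weights."""
    A_hat = A + torch.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))       # symmetric normalization
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy 4-atom molecular graph (a simple chain), 8 features per node
A = torch.tensor([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=torch.float)
H = torch.randn(4, 8)
W = torch.randn(8, 16)
node_embeddings = gcn_layer(A, H, W)
graph_embedding = node_embeddings.mean(dim=0)    # pooled whole-graph vector
```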

3.3.4 Recurrent Neural Network (RNN)

RNN is a general framework for processing sequential data. Unlike feedforward neural networks, RNNs have feedback connections, allowing them to maintain an internal memory or state. This memory allows the network to process information not only based on the current input but also on previously seen inputs in the sequence.

RNNs suffer from vanishing or exploding gradients, which can make long-term dependencies challenging to learn. To mitigate this issue, specialized variations of RNNs, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have been introduced. LSTMs achieve this through separate memory cells and gating mechanisms [115], while GRUs simplify the architecture by using update and reset gates [116].

3.4 Transfer learning

Transfer learning is a machine learning method that applies knowledge obtained from training one task to a distinct but related task [117]. Instead of commencing from scratch, a pre-existing model, typically trained on a substantial dataset, is used as a foundation for a new task [118]. The pre-trained model has already learned general features and patterns from the initial task, which can be leveraged to improve learning and performance on the new task even with limited labeled datasets.

The integration of transfer learning and deep learning architectures frequently gives rise to a powerful framework known as deep transfer learning, offering a convenient means of fine-tuning parameters. Transfer learning has demonstrated promising applications, including biological sequence analysis, molecule bioactivity prediction, molecular generation, virtual screening, and protein-protein interaction prediction [119]. These advancements highlight the potential of transfer learning to enhance various aspects of drug discovery research.

This promising method can save computational resources and minimize the requirement for labeled data. However, effective transfer learning depends on the similarity between the pre-training task and the target task. Failing to select an appropriate pre-trained model may result in “negative transfer”, wherein the learner’s performance deteriorates compared to not employing transfer learning at all [120].
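
A typical fine-tuning recipe freezes the pre-trained layers and trains only a new task-specific head, as in the PyTorch sketch below; the "pre-trained" encoder here is an untrained stand-in for a model learned on a large source dataset.

```python
import torch
from torch import nn

# Stand-in for a pre-trained encoder (e.g. trained on a large bioactivity corpus)
pretrained_encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                                   nn.Linear(256, 256), nn.ReLU())
for p in pretrained_encoder.parameters():
    p.requires_grad = False                 # freeze the learned general features

head = nn.Linear(256, 1)                    # new task-specific output layer
model = nn.Sequential(pretrained_encoder, head)

# Only the head's parameters are updated on the small labeled target dataset
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
X, y = torch.randn(64, 128), torch.randn(64, 1)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
```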

3.5 Reinforcement learning

Reinforcement learning (RL) stands as a distinct paradigm within the field of machine learning, wherein an agent interacts with an environment, receives feedback or rewards, and adjusts its behavior through trial and error [121]. Although both supervised learning and reinforcement learning aim to optimize their performance over time, they differ in three main aspects: Firstly, supervised learning relies on labeled training data, while reinforcement learning doesn’t require pre-labeled datasets. Secondly, supervised learning aims to predict output accurately for new inputs, whereas reinforcement learning seeks an optimal strategy to maximize cumulative rewards over time. Thirdly, feedback in supervised learning comes from labeled training data, while reinforcement learning receives feedback as rewards or penalties based on agent actions in the environment, which can be delayed and sparse. Similar to how children explore their surroundings and learn to achieve a goal, RL models operate autonomously and self-teach. RL algorithms hold considerable power and promise as they can learn optimal actions for achieving success in an unfamiliar environment without the need for intervention or guidance from a supervisor.
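
Tabular Q-learning distills these ideas into a short loop: the agent acts epsilon-greedily (balancing exploration against exploitation), receives a sparse and delayed reward, and updates its value estimates by trial and error. The 5-state chain environment below is a toy illustration.

```python
import numpy as np

n_states, n_actions = 5, 2                  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(300):
    s = 0
    for _ in range(100):                    # cap episode length
        # epsilon-greedy action selection (ties broken randomly)
        explore = rng.random() < eps or Q[s, 0] == Q[s, 1]
        a = int(rng.integers(n_actions)) if explore else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # temporal-difference update toward reward + discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(Q.argmax(axis=1))   # learned policy: move right in every non-terminal state
```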

Reinforcement learning has found practical applications in the realm of drug discovery, encompassing a range of areas such as enhancing precision medicine, improving clinical trials, optimizing biochemical properties, and advancing pharmacological research [122, 123]. Traditional generative models may generate novel molecules with limited synthetic accessibility. This challenge can be effectively tackled by integrating reinforcement learning algorithms to explore the synthesizable chemical space [124].

RL boasts numerous advantages and diverse applications, but it is not exempt from limitations. For example, RL algorithms grapple with the delicate task of striking a balance between exploring new actions and exploiting existing knowledge. Additionally, designing appropriate reward functions poses a significant challenge [125].

4 Applications in anesthetics discovery

Drug discovery typically involves identifying and validating targets, discovering potential hits and leads, optimizing lead compounds, conducting preclinical testing, progressing through clinical trials, and obtaining regulatory approval (Fig. 3). CADD and ML methods have been widely applied at various steps of the process (Table 2).

Fig. 3 Steps for anesthetic drug discovery

4.1 Target elucidation

Target elucidation is the fundamental basis for the design and development of effective drugs that modulate target activity and deliver disease treatment. It encompasses two primary parts: target identification and target validation. Target identification involves the identification of biological targets (proteins, enzymes, receptors, etc.) and the exploration of their structural characteristics [169]. Target validation, on the other hand, aims to understand target-ligand interactions, uncover the underlying molecular mechanisms, and demonstrate the druggable potential of the target [170]. Obtaining comprehensive information about targets is vital for guiding the subsequent drug discovery process.

4.1.1 Prediction of protein structures

Proteins in living cells continuously undergo conformational changes, altering their atomistic-level structures and binding affinities with chemicals [154]. Researchers have found that certain protein conformations exhibit significantly stronger chemical binding abilities than others [171]. Furthermore, a protein’s shape determines its function, and designing drugs against erroneous protein structures may lead to toxicity and adverse reactions. Therefore, correctly predicting the structure of proteins and understanding protein conformational dynamics are vital to uncovering the underlying mechanisms of diseases and thereby advancing the drug discovery process.

Traditional structural biology experiments are time-consuming and expensive; computational methods have therefore flourished in recent years, offering cheaper and faster simulations that help researchers predict the structure of targets related to anesthesia [172]. For example, gamma-aminobutyric acid (GABA) has been widely recognized as the primary inhibitory neurotransmitter in the central nervous system [173]. The characterization of \(\textrm{GABA}_{\textrm{A}}\) and \(\textrm{GABA}_{\textrm{B}}\) receptors has been crucial for understanding some physiological activities in the nervous system [174]. Sripriya Akondi et al. applied SVM to classify \(\textrm{GABA}_{\textrm{A}}\textrm{Rs}\) based on features extracted from Chou’s pseudo-amino acid composition [158, 175]. Liao et al. trained four ML classifiers to distinguish \(\textrm{GABA}_{\textrm{A}}\textrm{Rs}\) from non-\(\textrm{GABA}_{\textrm{A}}\textrm{Rs}\), using only protein sequence information. Among the four classifiers, the gradient boosting decision tree and a library for support vector machines (libSVM) marginally outperformed RF and k-NN [157].

Opioids are widely used for pain relief and there are four primary subtypes of opioid receptors. Opioid receptor Kappa 1 (OPRK1) agonists have been demonstrated to activate pain-inhibitory pathways within the central nervous system [176]. Recently, Sripriya Akondi et al. implemented a decision fusion strategy that integrated multiple machine-learning algorithms to identify potential drug-binding molecular conformations of the OPRK1 protein. To address class imbalance issues, a synthetic minority oversampling technique (SMOTE) was incorporated. The decision fusion strategy utilizing LR (Logistic regression) + SMOTE + GB (Gaussian Naive Bayes) and LR + SMOTE + k-NN techniques outperformed those using a single technique [154]. Feinberg et al. employed a combination of molecular dynamics simulations and unsupervised machine learning techniques, including the tICA algorithm and K-Means clustering, to identify new conformational states of \(\mu\)OR. The authors then utilized docking scores to develop random forest classifier models that were capable of virtual screening agonists and binders for \(\mu\)OR [134].

4.1.2 Prediction of targets and binding sites

Effective target discovery is an indispensable prerequisite in the modern target-based drug discovery paradigm, as the selection of appropriate targets directly correlates with the success of drug development [177,178,179]. Therefore, prioritizing the identification and validation of targets is of paramount importance in the drug discovery process.

For example, Kang et al. created a novel deep-learning model and discovered a distinct druggable binding site for the P2X3 receptor (P2X3R). Their finding of 16 unique hit compounds may expedite the search for new P2X3R antagonists and lead to more effective neuropathic pain treatments [166].

Bertaccini et al. conducted a series of studies utilizing homology modeling to construct atomic-level models of the GABA receptor and the glycine alpha one receptor (GlyRa1). These models uncovered potential anesthetic binding sites and suggested a link between ligand docking scores and drug potency. Their findings have made a significant contribution to the current understanding of anesthetic action and have facilitated the development of novel anesthetics [128,129,130].

4.1.3 Prediction of protein-ligand interactions

Protein-ligand interactions play an irreplaceable part in various biological processes, including signal transduction, gene regulation, and immunologic reactions [180]. Therefore, exploring the mechanisms of protein-ligand recognition and binding is crucial for providing a theoretical foundation for drug development [181, 182].

In a recent study, Cheng and Ding developed SVR-based QSAR models to estimate compounds’ binding affinities after docking with the \(\textrm{GABA}_{\textrm{A}}\) receptor and identified six essential characteristics that influence ligand-receptor binding [147]. Wijeyesakere et al. successfully built in silico tools, two binary classifiers based on random forest, to assess the GABAergic potential of uncharacterized compounds [164]. Liu et al. showed the effectiveness of molecular docking methods in predicting both the binding sites and affinities of anesthetics for both water-soluble and membrane proteins [137]. By employing molecular docking and absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction techniques, Meraj et al. assessed a group of disopyramide analogs and discovered five potential lead compounds capable of binding to human voltage-gated sodium channel (NaV) proteins [135], which is beneficial for the development of anesthetics, anticonvulsants, and antiarrhythmic drugs [183]. Similarly, Lv et al. discovered the potential of a traditional Chinese medicine named gastrodin as a new anesthetic drug [136].

A major goal of computational biology has been to explore the mechanisms of protein-ligand binding. Based on homology modeling, Elgarf et al. discovered that different benzodiazepine ligands employ discrepant modes for binding to \(\textrm{GABA}_{\textrm{A}}\) receptors [126]. Pan et al. integrated several computational methods, including homology modeling, molecular docking, and molecular dynamics simulations, to investigate the interaction between benzhydrylpiperazine agonists and human \(\delta\) opioid receptors [127]. Yuan et al. combined molecular dynamics simulations and molecular docking to explore the potential binding mechanism of propofol to \(\textrm{GABA}_{\textrm{A}}\textrm{Rs}\) [131]. Lima Neto et al. were the pioneers in introducing quantum biochemical computational approaches to investigate the binding mechanism of existing agonists and antagonists to the \(\textrm{GABA}_{\textrm{B}}\) receptor [139]. A recent study by Manzur-Villalobos et al. evaluated the association between the Nav1.7 channel and local anesthetics through virtual screening, molecular docking and dynamics [133].

4.2 Hit and lead discovery

In drug discovery, the term "hit and lead discovery" encompasses the initial stages involved in identifying and optimizing potential drug candidates. This umbrella term includes two phases: hit discovery and hit-to-lead. The hit discovery phase focuses on the identification of compounds or molecules that exhibit promising activity against a particular target or disease [184]. Once hits are identified, the hit-to-lead phase begins, where the selected hits undergo further optimization to enhance their potency, selectivity, and other desirable properties [185]. These two stages typically involve high-throughput screening or virtual screening of large compound libraries, de novo drug design, and structure-activity relationship studies.

4.2.1 High-throughput screening and virtual screening

Since the mid-1990s, HTS has been widely considered the primary catalyst for hit discovery [186, 187]. This method enables the identification of potential "hits" from large-scale chemical libraries [188]. In recent years, there has been a notable advancement in VS techniques, which can complement high-throughput discovery methods. Both academia and industry have replaced traditional drug screening methods with HTS and VS due to their simplicity, cost-effectiveness, and efficiency [189].

Through HTS and VS, Feinberg et al. identified a promising molecule named FMP4, which is an agonist for \(\mu\)OR and possesses a unique structure [134]. Cayla et al. identified a group of lead compounds based on a distinctive chemical core. These compounds exhibited comparable anesthetic efficacy while minimizing detrimental cardiovascular side effects, making them superior to traditional intravenous anesthetics [140]. Chandra et al. introduced a tiered approach for the identification of Nav1.7 inhibitors. A novel compound named DA-0218 exhibited analgesic and anti-pruritic properties [132]. Ebalunode et al. created a structure-based pharmacophore modeling approach to identify novel anesthetic compounds [146]. Peng et al. discovered prospective anesthetics by virtual screening 50,000 compounds based on drug-likeness and ADMET properties, with five lead compounds demonstrating the potential for binding to \(\textrm{GABA}_{\textrm{A}}\) protein [143]. Yang et al. discovered two innovative compounds with reversible sedative-hypnotic activity [141]. Peng et al. identified two FDA-approved drugs that had not been recognized as sigma 1 receptor (S1R) antagonists previously [142].

Specifically, fluorescence-based techniques are becoming the most commonly used detection method due to their great sensitivity and automation-friendliness [189, 190]. For example, Lea et al. utilized a surrogate binding target for anesthetics and an environment-sensitive fluorescent probe to identify novel anesthetics [144]. McKinstry-Wu et al. developed a high-throughput assay by choosing a fluorescent anesthetic agent and apoferritin as a surrogate for the anesthetic target. In their study, two potent agents with low toxicity were selected from a large-scale library [145].

4.2.2 De novo drug design

Over the past 15 years, de novo drug design, the process of generating innovative compounds with desirable properties without a starting template, has emerged as a promising area of study [191]. De novo drug design can produce entirely novel and distinctive molecules to enhance existing chemical libraries, whereas traditional virtual screening typically selects candidate compounds from existing synthetically accessible, druglike chemical space [192, 193]. Nevertheless, lead compounds generated through this approach often exhibit low synthetic accessibility [194].

With the emergence and flourishing of deep learning and reinforcement learning techniques, de novo drug design has offered promising opportunities for developing new and effective drugs. For instance, CNNs [195], RNNs [192], generative adversarial networks [196] and VAEs [197] have already been applied in this field.

Given the limited brain retention capacity of the commonly used opioid antagonist naloxone, the development of improved opioid antagonists is crucial in tackling the opioid epidemic. Deng et al. created a deep reinforcement learning (DRL) framework to generate simplified molecular-input line-entry system (SMILES) strings, predict chemical properties, and optimize the SMILES toward desired properties. In this study, they sampled 10k SMILES and selected 6 potential lead compounds, demonstrating their DRL framework’s ability to discover better opioid antagonists [165].

4.2.3 QSAR

Using computational methods like QSAR to calculate the ligand-binding affinity is of special interest because it can accelerate the drug discovery process at an early stage [198, 199]. In a study by Lu and Zhou, a novel method was proposed that bridges 3D-QSAR and receptor modeling to predict the binding affinity of imidazobenzodiazepines to \(\textrm{GABA}_{\textrm{A}}\) receptor subtypes [148]. By constructing 3D-QSAR models, Peng et al. accurately predicted the binding affinity between ligands and the S1R [142], and Pan et al. evaluated the binding affinity of a set of benzhydrylpiperazine \(\delta\) opioid receptor agonists [127]. Floresta et al. built three distinct QSAR models based on a class of fentanyl-like structures to predict the binding affinity of \(\mu\)OR ligands [153]. Jia et al. presented a new workflow to discover novel analgesic opioids with higher binding affinities. For each bioassay endpoint taken from the PubChem dataset, they created 12 individual QSAR models using a mix of four different chemical descriptors and three machine learning techniques (k-NN, RF, and SVM) [149]. Mehdipour et al. estimated the anesthetic potency of polyhalogenated ethers by developing a QSAR model that incorporates four parameters: logP, molecular polarizability, most positive charge, and electrostatic potential parameters [150].

4.3 Lead optimization

After identifying lead compounds with the desired bioactivity and selectivity, the subsequent step is lead optimization, which aims to improve their drug-like properties and enhance the chance of developing drug candidates. This involves iterative cycles of compound design, synthesis, and testing to optimize various aspects, including potency, selectivity, pharmacokinetics (PK), pharmacodynamics (PD), metabolic stability, and safety [200].

4.3.1 Evaluation of ADMET properties

PD and PK parameters, including ADMET, have long been critical considerations for researchers in the drug discovery process [201]. Lack of efficacy and safety has been a key factor contributing to the late-stage attrition of drug candidates [202]. The emergence of in silico prediction offers a new dimension for assessing multiple ADMET properties at every stage of drug discovery, enabling the optimization of virtual screening and experimental testing process by filtering out compounds with adverse ADMET profiles [203, 204].

Manzur-Villalobos et al. predicted the ADMET properties of amide-type local anesthetic analogs and identified a potential brand-new local anesthetic that showed high binding capacity [133]. Jagannathan endeavored to discover novel psychoactive metabolites from Cannabis sativa, the smoke of C. sativa, and other phytocannabinoid matrices, based on physicochemical descriptors and ADMET properties predicted by four ML techniques (NB, SVM, MLP, and hierarchical clustering) [156]. Coli Louvisse de Abreu et al. evaluated multiple types of toxicity of the epinephrine and norepinephrine degradation products using an ADMET predictor [151].

In contrast to de novo drug design, there are two typical approaches to designing new drugs based on existing drug molecules. One is to screen new compounds from derivatives of FDA-approved drugs [205]. In a recent study, Azamatov et al. introduced computational ADMET prediction methods to develop novel local anesthetics with lower toxicity. Through the study of a group of 1-aryl tetrahydroisoquinoline alkaloid derivatives, they also proposed explainable correlations between chemical structures and acute toxicity [152]. The other approach is to refine currently available drugs by optimizing PK/PD parameters. Jiang et al. utilized molecular docking and PK prediction techniques to design etomidate analogs that exhibit comparable anesthetic potency while avoiding adrenocortical suppression [138].

A variety of anesthetic and psychotropic drugs exert their effects on the central nervous system (CNS). Predicting the penetration of the blood-brain barrier (BBB) is a significant part of assessing ADMET properties and designing CNS drugs [206, 207]. A study by Yu et al. proposed an innovative hybrid method for identifying potential CNS drug candidates by combining three traditional ML algorithms (DT, RF, SVM) with the DL algorithm (Graph Convolutional Network). The resulting model generated six explainable sub-structural features that enable the rapid assessment of a molecule’s potential to penetrate BBB and become a CNS drug [162].

PK/PD modeling is a widely utilized method for assessing dose-concentration-response correlations, providing valuable insights for the understanding of drug delivery systems and the process of drug discovery [208, 209]. Sevoflurane is one of the most commonly used inhalation anesthetics but may lead to neurotoxicity and cognitive dysfunction [210]. Dhandore et al. combined PK/PD modeling and six machine-learning regression techniques (Linear Regression, Support Vector Regression, Bayesian Ridge, Decision Tree Regressor, Gaussian Process Regressor, and XGBoost) to investigate the effects of sevoflurane concentration changes on ten body parameters. The XGBoost model exhibited the least errors and identified \(\textrm{SpO2}\) as the most crucial body parameter to study drug effects during surgical anesthesia [160].

4.3.2 Prediction of side effects

Unexpected side effects are a major contributor to late-stage drug development failures and can even lead to the withdrawal of FDA-approved drugs [211,212,213]. Traditional approaches for evaluating adverse effects, such as animal models and clinical research, are time-consuming, costly, and not easily scalable [214,215,216]. Consequently, there is growing interest in using computational methods to detect and predict drug side effects.

Accurately predicting postinduction hypotension (PIH) can greatly assist clinicians in recognizing the potential risk of medication use and selecting the appropriate induction agent. In a study by Kendale et al., eight supervised ML techniques (logistic regression, SVM, NB, k-NN, linear discriminant analysis, random forest, neural nets, and GBM) were employed to assess the risk of hypotension [155]. Lundberg et al. created an ensemble machine learning model that significantly augments anesthesiologists’ predictive capabilities concerning the risk of intraoperative hypoxemia, both in preoperative assessment and real-time monitoring. Their ML system leverages time-series data extracted from the patient’s monitors and anesthesia machine, alongside static information, including age, gender, smoking status, height, and weight, which serve as inputs for the model. Meanwhile, the system enhances interpretability by furnishing insights into the contribution and trends of various risk factors [163].

Abnormal movements like convulsions, seizures, or severe involuntary muscular contractions occasionally occur during anesthesia induction. Nagata et al. developed a convulsion prediction model based on heart rate variability to predict drug-induced convulsions. Their model is built on an anomaly detection algorithm called multivariate statistical process control [168]. Zhang et al. presented an enhanced SVM model to predict the seizure liability of chemicals, characterized by its cost-effectiveness, expeditiousness, and superior accuracy [161]. Gao et al. developed an ML model that combined PCA, SVM, and CNNs to effectively differentiate seizure-inducing drugs from safe drugs [159].

To improve clinical decision-making, monitoring and predicting muscle relaxation in patients is important, as muscle relaxants are a critical component of balanced anesthesia. Wang et al. utilized DL techniques (RNN, GRU, LSTM) to predict the real-time Train-of-four ratio of cisatracurium, a commonly used muscle relaxant. Additionally, they implemented transfer learning by calculating patient similarity based on body mass index and age and constructing pre-trained models [167].

Table 2 Applications of computational methods. GABA, gamma-aminobutyric acid

5 Challenges

Despite the widespread use of CADD and ML methods in the field of anesthetic drug discovery in recent years, the inherent disadvantages and limitations of these methods, along with the complexity of the anesthetic process, are still inescapable hurdles to the development of new anesthetic drugs.

5.1 Limitations of CADD methods

SBDD heavily relies on the availability of high-resolution structural data, such as protein crystal structures. However, acquiring experimental structures for all target proteins of interest poses significant challenges and overheads. Furthermore, SBDD often overlooks the impact of solvent molecules like water in protein-ligand interactions [217]. Another limitation of SBDD is its tendency to focus on a single target, disregarding the fact that drugs can interact with multiple targets within the human body. Complex diseases and physiological activities can involve multiple targets, thereby restricting the applicability of SBDD in such scenarios.

LBDD requires an ample supply of high-quality ligand data for the target protein. The efficacy of LBDD is greatly influenced by the accessibility and quality of the training set. In addition, LBDD predominantly concentrates on designing drugs based on known ligands that have been previously identified and characterized, but it often neglects the significance of protein structure and flexibility [218]. Moreover, computational methods used in LBDD, such as search algorithms and scoring functions, still have limited accuracy, necessitating further validation in the wet lab.

The limitations specific to each of the commonly used CADD methods are described below. To start with, the accuracy of homology modeling relies greatly on the quality and availability of the target-template sequence alignment [219]. Secondly, the sampling methods utilized in molecular dynamics simulations still require further improvement to enhance their accuracy and efficiency [56]. Thirdly, targets and receptors exhibit considerable flexibility, making molecular docking a particularly challenging task [39]. In the case of virtual screening, a major limitation lies in the absence of robust scoring metrics [71]. Moreover, the selection of training compounds significantly influences the quality of the generated pharmacophore model [74]. Finally, it bears mentioning that many CADD methods require substantial computational power and storage resources, which can constrain their widespread application.

5.2 Disadvantages of ML/DL methods

The quality of the dataset is one of the most significant factors affecting the performance of machine learning models [220]. However, currently available data are often not only limited in quantity but also far from high-quality, well-annotated, balanced, and comprehensive [221, 222]. In particular, there are few specialist datasets available for anesthetic drug development. Secondly, drug discovery datasets incorporate various forms of biological, structural, and chemical information, such as adverse medication effects in patients, multi-omics data in pharmacology, and bioactivity assays in the wet lab [223]. Researchers must carefully select appropriate data handling methods, since these problems often cannot be treated as simple classification or regression tasks. Lastly, the creation of dedicated databases often requires significant time and effort from experienced professionals for manual data annotation. Due to competition among pharmaceutical companies and high dataset construction costs, high-quality datasets are frequently not publicly available [224].

Lack of transparency and explicability is one of the intrinsic flaws of machine learning that has been widely criticized [223, 225]. "Black box" machine learning models make predictions but without providing explanations of how and why the model reaches a particular conclusion [226, 227]. Nowadays, only a few studies have investigated the explanatory power of their models [222]. This problem greatly hinders the promotion of ML methods in the drug discovery process since researchers are more interested in the biological mechanism behind the prediction.

There are some other important factors closely related to the performance of machine learning models. Firstly, many algorithms do not work out of the box and require re-initialization and fine-tuning of parameters, resulting in poor reproducibility of results [224]. For example, the starting values, such as the weights and biases in neural networks, are usually chosen randomly, which means different initializations can lead to different trained models with varying performance. Additionally, overfitting and underfitting often occur during model training, which significantly impairs predictive accuracy. Finally, training ML/DL models often necessitates extensive storage and computing resources [228]. Large-scale models cannot be trained in a short period of time without a sufficient number of graphics processing units or tensor processing units.

5.3 Unclear mechanism of anesthesia

Although the notion that general anesthetics act as "drugs without receptors" has been disproved, the precise mechanism of anesthesia remains elusive, making it more challenging to develop novel anesthetic drugs [229, 230].

Recently, there has been growing evidence to suggest that the primary molecular targets for general anesthetics are neurotransmitter-gated and voltage-gated receptors, such as \(\textrm{GABA}_{\textrm{A}}\) receptors, glycine receptors, nicotinic acetylcholine receptors, and glutamate receptors [231, 232]. However, a drug’s effectiveness is not solely determined by its activity on the intended target, but also by its potential effects on other targets. This is especially relevant in complex therapeutic procedures such as general anesthesia, which often require a combination of drugs to achieve the desired effect on numerous targets [233]. Therefore, the development of novel anesthetics necessitates the consideration of multiple targets, further amplifying the complexity of this task.

Furthermore, the development of effective and safe drugs requires a meticulous equilibrium between biological activity, appropriate pharmacodynamic/pharmacokinetic parameters, and manageable toxicity. These diverse properties are interrelated yet distinct. To optimize multiple properties without compromising one another, researchers must either construct an ensemble model capable of predicting all intended properties or train multiple individual models to optimize each property sequentially [220, 222].

6 Conclusion

Undoubtedly, CADD and ML methods have been adopted by more and more pharmaceutical companies and research teams for designing novel anesthetic drugs. It is worth noting that Large Language Models (LLMs), such as ChatGPT, have made remarkable strides and garnered unprecedented attention recently. LLMs have also exhibited significant potential in bioinformatic analysis and drug discovery tasks, aiding researchers in generating functional protein sequences, identifying new potential targets, predicting toxic effects of drugs, interpreting drug-drug interactions, and extracting accurate pharmacological data from the scientific literature.

Despite the successful identification of several promising anesthetic compounds through computational methods, the development of new commercially available anesthetics remains a formidable challenge. Multiple obstacles continue to hinder progress, such as the scarcity of high-quality datasets, inherent limitations of computational methods, and the incomplete understanding of the underlying mechanisms of anesthesia. In a nutshell, further explorations of the physiological basis of anesthesia and continuous advancements in computational techniques are imperative. Only through these efforts can researchers develop novel anesthetics with reduced expense and shorter development timelines.