Introduction

Recently, numerous endeavors have been made to find efficacious and promising techniques to solve, or at least mitigate, the major challenges of the pharmaceutical industry, such as the low solubility of drugs, unacceptable productivity, and growing research and development (R&D) costs1,2,3. Oxaprozin (marketed under the brand name Daypro) is a well-known nonsteroidal anti-inflammatory drug (NSAID) indicated for managing pain and swelling in adult patients suffering from rheumatoid arthritis, ankylosing spondylitis and soft tissue disorders4. Microsomal oxidation and glucuronic acid conjugation are the major routes of Oxaprozin's primary metabolism in the liver, producing ester and ether glucuronides as the prominent conjugated metabolites. A manageable safety profile, high efficacy, low liver toxicity and reasonable cost have made Oxaprozin a first-choice NSAID for pain alleviation in patients with chronic musculoskeletal diseases5,6,7.

Applying novel approaches to improve the poor solubility of drugs is an attractive route to addressing one of the pharmaceutical industry's key challenges. Recently, the use of supercritical fluids (SCFs) for processing therapeutic agents has offered suitable opportunities for pharmaceutical manufacturing scientists8. This type of fluid holds great potential across disparate scientific fields, including drug delivery, chromatography, and extraction9. Among the various SCFs, supercritical carbon dioxide (SCCO2) offers several attractive technological advantages, such as low toxicity, negligible flammability and environmental friendliness, which may lead to a significant reduction in the use of commonly employed organic solvents. Apart from various industrial applications, particle micronization using SCCO2 is a novel and promising approach for fabricating micro-/nanoparticles with controlled size and purity10.

Prediction of drug solubility using artificial intelligence (AI) methods has recently attracted attention as a noteworthy option for validating data obtained from experimental research. Developing predictive models and simulations via this technique for different applications (e.g., separation, purification, extraction and drug delivery) can considerably reduce computation time and help verify the accuracy of experimental results11,12.

Computers can learn from data without being explicitly programmed, using a class of AI techniques known as machine learning (ML). Machine learning seeks to develop programs that process experimentally gathered data and use it to train models that predict outputs for unseen future inputs13,14. Ensemble methods are a class of ML methods that combine several basic models to achieve higher accuracy and generality in prediction15,16.

Combining multiple weak estimators to produce a robust estimator is known as "boosting." Because of the sequential logic employed by boosting, each weak estimator has a direct impact on its successor. In particular, AdaBoost17 is a typical boosting algorithm that gradually obtains weak learners by reweighting the training data. In this study, AdaBoost was used to improve the performance of two base estimators: Decision Tree and Gaussian process regression.

A Decision Tree asks a series of questions about the feature set, such as 'is equal to' or 'is greater than,' and, based on the answers provided, poses further questions. The same procedure is repeated until no further questions remain, at which point the result is obtained. The data is repeatedly divided into binary partitions, allowing the Decision Tree to grow. To evaluate candidate splits across all attributes, an impurity metric such as entropy is used18,19.

Gaussian process regression, a non-parametric Bayesian modeling technique, is suitable for both exploration and exploitation. The primary benefit of the method is its ability to produce a reliable response over the input variables. It can describe a broad range of relationships between features and targets by using a potentially infinite number of input features and allowing the data to determine the level of complexity through Bayesian inference20,21.

Experimental

In this paper, the predictive models' results are validated by comparison with experimental data from Khoshmaram et al.22. They developed a pressure-volume-temperature (PVT) cell to experimentally measure the solubility of Oxaprozin in SCCO2 solvent22. In their setup, the SCCO2 solvent is first prepared by increasing the pressure of gaseous CO2 through a liquefaction unit. In the second step, impurities in the condensed SCCO2 are removed by an inline filter. The purified SCCO2 then flows through a surge tank before entering the PVT cell. Temperature, an important parameter that directly affects the solubility of the drug, is controlled using heating elements wrapped around the chamber and insulated with a PTFE layer.

Data Set

The dataset used in this study comes from Khoshmaram et al.22 and contains just 32 data points. Temperature and pressure are the two input parameters, and each vector has one output (solubility). Table 1 shows the dataset.
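With only two inputs and one target, the dataset can be arranged as a feature matrix and a target vector. The sketch below shows this layout; the numeric values are illustrative placeholders, not the measurements published in Table 1.

```python
import numpy as np

# Hypothetical layout of the 32-point dataset: each row of X is
# (temperature [K], pressure [bar]); y holds the corresponding
# Oxaprozin mole-fraction solubility. Values below are placeholders.
X = np.array([
    [308.0, 120.0],
    [308.0, 200.0],
    [318.0, 120.0],
    [318.0, 200.0],
])
y = np.array([1.2e-5, 4.8e-5, 0.9e-5, 6.1e-5])
```

In the actual study, X would have 32 rows, one per experimental (T, P) condition.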

Table 1 The Whole Dataset.

Methodology

GPR

Gaussian process regression is one of the base models used. GPR, unlike other regression models, does not require the specification of an exact fitting function. Instead, the observed data are modeled as samples from a multidimensional Gaussian distribution evaluated at the input points23,24.

The target y is modeled as \(f\left( {\varvec{x}} \right)\) for a collection of n instances \(D = \left\{ {\left( {{\varvec{x}}_{i} ,y_{i} } \right){|}i = 1, \ldots ,n} \right\}\), where \({\varvec{x}}_{i} \in R^{d}\) is a d-dimensional input point and \(y_{i} \in R\) is the corresponding scalar output.

$$y = f\left( {\varvec{x}} \right)$$
(1)

The GP is declared using f(x), which is an implicit function illustrated as a collection of random variables:

$$f\left( {\varvec{x}} \right) \sim GP\left( {m\left( {\varvec{x}} \right),{\mathbf{K}}} \right)$$
(2)

In the above equation, K denotes the covariance matrix defined by the kernel evaluated at the input values, and m(x) is the mean function.

Decision Tree

Trees are a fundamental data structure in a variety of AI contexts. Decision trees (DTs) are an ML technique commonly used to model data, and a decision tree can be applied to a variety of estimation problems. A basic decision tree consists of internal nodes (which make decisions by querying input features), edges (which pass results down to child nodes), and terminal or leaf nodes (which determine the final output)25,26.

The root node is a special, unique node in the DT, and each dataset feature can serve as a node in the tree. To make a prediction, the tree model starts at the root node and works its way down the tree. This traversal continues until a terminal node is reached; the terminal node's value is the DT's forecast18,27,28. The most widely used algorithms for decision tree induction are CART28, CHAID25, C4.5, and C5.029.
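The root-to-leaf traversal can be seen in a toy regression tree. This sketch uses scikit-learn's `DecisionTreeRegressor` (a CART implementation) on made-up data, not the study's dataset.

```python
from sklearn.tree import DecisionTreeRegressor

# Two well-separated clusters: a shallow tree finds one binary split
# (roughly "x <= 6.5?") and predicts each leaf's mean target value.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 9.0, 9.0, 9.0]

tree = DecisionTreeRegressor(max_depth=2)  # a short tree, stump-like
tree.fit(X, y)
pred = tree.predict([[2.5], [10.5]])  # traverses root -> leaf
```

A query of 2.5 falls in the left leaf (mean 1.0) and 10.5 in the right leaf (mean 9.0).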

ADABOOST

Multiple base predictors can be combined to create an ensemble learning-based model, which outperforms a single predictor. By altering the weight distribution of samples, Freund and Schapire17 proposed the AdaBoost algorithm for enhancing the accuracy of weak learners. Because of its advantages, this method has become increasingly popular30,31.

As the name "AdaBoost" implies, this technique adaptively enhances base predictors, enabling them to address complicated problems. One appealing property of basic models is their good generalization, owing to their simple structure. However, despite being easy to use in real-world situations, their architecture is highly biased, so on their own they cannot handle complex tasks.

The Adaboost algorithm from Hastie et al.32,33 is mostly demonstrated in the following steps.

1. Set weights for the data points:

$$\omega_{i} = \frac{1}{N} , i \in \left\{ {1, \ldots , N} \right\}$$
(3)
2. Set the number of base estimators to M.

3. For b from 1 to M:

(a) Develop a learner Gb(x) using the weights \(\omega_{i}\).

$$({\text{b}})\,\,\,err_{b} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \omega_{i} I\left( {y_{i} \ne G_{b} \left( {x_{i} } \right)} \right)}}{{\mathop \sum \nolimits_{i = 1}^{N} \omega_{i} }}$$
(4)
$$({\text{c}})\,\,\,\,\,\alpha_{b} = \log \left( {\frac{{1 - err_{b} }}{{err_{b} }}} \right)$$
(5)
$$({\text{d}})\,\,\,\,\,\,\omega_{i} \leftarrow \omega_{i} .\exp \left( {\alpha_{b} .I\left( {y_{i} \ne G_{b} \left( {x_{i} } \right)} \right)} \right), i = 1, \ldots ,N$$
(6)
4. Final output:

$${\text{G}}\left( {\text{x}} \right) = sign \left( {\mathop \sum \limits_{b = 1}^{M} \alpha_{b} G_{b} \left( x \right)} \right)$$
(7)

In the above procedure, N is the number of data vectors and M is the number of iterations. Gb(x) is the estimator fitted on pass b over the data. The base model can be built in a variety of ways, but the most frequent choice is stumps or very short trees. The indicator operator I returns 0 if the logical condition is false and 1 if it is true34,35,36.
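Steps 1-4 (Eqs. 3-7) can be written out directly in NumPy. The sketch below implements the discrete (classification) form of the algorithm exactly as listed, using one-feature threshold stumps as the weak learners; it is an illustrative implementation, not the code used in the study.

```python
import numpy as np

def stump(X, y, w):
    """Fit the best single-feature threshold split under weights w.
    Labels y are in {-1, +1}."""
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, -1, 1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_params = err, (j, t, s)
    j, t, s = best_params
    return lambda X: s * np.where(X[:, j] <= t, -1, 1)

def adaboost(X, y, M, weak_learner):
    """Discrete AdaBoost (Hastie et al.) following Eqs. (3)-(7)."""
    N = len(y)
    w = np.full(N, 1.0 / N)                     # step 1: w_i = 1/N
    alphas, learners = [], []
    for _ in range(M):                          # steps 2-3
        G = weak_learner(X, y, w)               # (a) fit on weights w
        miss = (G(X) != y).astype(float)        # indicator I(y_i != G_b)
        err = np.sum(w * miss) / np.sum(w)      # (b) weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)         # (c) learner weight
        w = w * np.exp(alpha * miss)            # (d) upweight mistakes
        alphas.append(alpha)
        learners.append(G)
    # step 4: sign of the weighted vote
    return lambda X: np.sign(sum(a * G(X) for a, G in zip(alphas, learners)))

# Tiny separable demo: three negatives, then three positives.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_demo = np.array([-1, -1, -1, 1, 1, 1])
clf = adaboost(X_demo, y_demo, 5, stump)
```

For regression targets such as solubility, scikit-learn's `AdaBoostRegressor` applies the analogous reweighting scheme (AdaBoost.R2) instead of the sign-vote in Eq. (7).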

Results

The important hyper-parameters of the selected models were first tuned using a grid search to assess the efficacy of the approaches described in this study. The resulting models were then examined using three distinct criteria, specified below: MAE, MAPE, and R2-score37,38:

$$\begin{array}{*{20}c} {{\text{MAE ERROR}} = \frac{1}{{\text{n}}} \times \mathop \sum \limits_{i = 1}^{{\text{n}}} \left| {{\hat{\text{y}}}_{{\text{i}}} - {\text{y}}_{{\text{i}}} } \right| } \\ \end{array}$$
(8)
$${\text{MAPE ERROR}} = \frac{1}{{\text{n}}} \times \mathop \sum \limits_{i = 1}^{{\text{n}}} \left| {\frac{{{\hat{\text{y}}}_{{\text{i}}} - {\text{y}}_{{\text{i}}} }}{{{\text{y}}_{{\text{i}}} }}} \right|$$
(9)

The third regression performance metric in our research is the R2-score, which measures how close the estimated values lie to the true (expected) values along the regression line.

$$\begin{array}{*{20}c} {{\text{R}}^{2} - score = 1 - \frac{{\mathop \sum \nolimits_{{\text{i = 1}}}^{{\text{n}}} \left( {{\text{y}}_{{\text{i}}} - {\hat{\text{y}}}_{{\text{i}}} } \right)^{2} }}{{\mathop \sum \nolimits_{{\text{i = 1}}}^{{\text{n}}} \left( {{\text{y}}_{{\text{i}}} - {\upmu }} \right)^{2} }}} \\ \end{array}$$
(10)

μ indicates the mean of the expected data39.
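Equations (8)-(10) translate directly into a few lines of NumPy. The predicted and true values below are illustrative stand-ins, not the study's results.

```python
import numpy as np

# Illustrative solubility values (not the study's data).
y_true = np.array([1.0e-5, 3.0e-5, 5.0e-5, 8.0e-5])
y_pred = np.array([1.1e-5, 2.8e-5, 5.2e-5, 7.9e-5])

# Eq. (8): mean absolute error
mae = np.mean(np.abs(y_pred - y_true))
# Eq. (9): mean absolute percentage error
mape = np.mean(np.abs((y_pred - y_true) / y_true))
# Eq. (10): R2-score, with mu = mean of the expected data
r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

Note that MAPE is scale-free, which makes it convenient for solubility values spanning several orders of magnitude.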

In Figs. 1 and 2, the ADA + DT and ADA + GPR models are analyzed in terms of expected versus estimated values, respectively. The blue dots are estimates on the training samples and the red dots on the test data. The distance from the expected-data line is what matters. The numerical results for the three criteria above are given in Table 2. Based on these results, the ADA + GPR model passes through almost all the training points. Despite this, the model shows no overfitting problem, because the red dots, which are test data not included in the training phase, are also close to the expected values.
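The grid-search tuning and evaluation pipeline described above can be sketched with scikit-learn for the ADA + DT case. The parameter grid, base-tree depth, and synthetic data below are assumptions for illustration; the study's exact grid is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the (T, P) -> solubility data.
rng = np.random.default_rng(1)
X = rng.uniform(size=(40, 2))
y = X[:, 0] + 2.0 * X[:, 1]

# Hypothetical hyper-parameter grid; the study's grid is not stated.
grid = {
    "n_estimators": [25, 50],
    "learning_rate": [0.5, 1.0],
}
search = GridSearchCV(
    AdaBoostRegressor(DecisionTreeRegressor(max_depth=3), random_state=0),
    param_grid=grid,
    cv=3,
    scoring="r2",  # R2-score, as in Eq. (10)
)
search.fit(X, y)
best = search.best_params_
```

The same pipeline applies to ADA + GPR by swapping in a `GaussianProcessRegressor` as the base estimator.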

Figure 1
figure 1

Expected and estimated values (ADA + DT).

Figure 2
figure 2

Expected and estimated values (ADA + GPR).

Table 2 Final Model Results.

Figure 3 shows the simultaneous impact of the pressure and temperature inputs on the only output (Oxaprozin solubility). This diagram shows that increasing both inputs generally increases the output value. By keeping one of the two input parameters constant and varying the other, we obtained the two-dimensional Figs. 4 and 5, which confirm this behavior. Figure 4 illustrates the influence of pressure, and Fig. 5 demonstrates the impact of temperature on the solubility of Oxaprozin. To analyze the diagrams, the effects of pressure and temperature on the solubility of the drug must be considered. It is evident from the graphs that as the pressure increases, the molecular compaction in the SCCO2 system increases, which consequently enhances the solvating power of the solvent and thus increases the solubility of Oxaprozin40. Figure 4 shows a nearly 8-fold enhancement in the solubility of Oxaprozin as the pressure increases from 110 to 410 bar.

Figure 3
figure 3

Prediction surface of the final ADA + GPR model.

Figure 4
figure 4

Trends for Pressure.

Figure 5
figure 5

Trends for Temperature.

Regarding temperature, the presence of two competing effects makes the analysis difficult. Increasing the temperature raises the sublimation pressure of the drug, which promotes Oxaprozin solubility. On the other hand, an increase in temperature reduces the density of the solvent, which lowers the solubility of the drug. To evaluate the combined effect of these parameters, the cross-over pressure (CP) must be considered. At pressures below the CP, the density reduction has a stronger effect than the sublimation-pressure increase, so the solubility of Oxaprozin in SCCO2 decreases as the temperature rises. At pressures above the CP, the sublimation-pressure increase dominates the density reduction, so the solubility of Oxaprozin in SCCO2 improves considerably with temperature. This analysis agrees with similar papers10. The optimal values, which accordingly lie near the upper limits of both inputs, are shown in Table 3 and coincide with the maximum values.

Table 3 Optimal Values.

Conclusion

In recent years, increasing the solubility of commonly used drugs with green solvents has become an attractive field of study in pharmaceutics. SCCO2 has recently been introduced as a promising alternative to organic solvents because of valuable features such as high efficacy, non-flammability, and low toxicity. In this study, two base models (weak estimators), decision tree (DT) and Gaussian process regression (GPR), were boosted with AdaBoost with the aim of accurately predicting Oxaprozin solubility in the SCCO2 system. We optimized these models' hyperparameters and evaluated them using standard metrics. The MAE, R2-score, and MAPE of the boosted DT are 6.806E-05, 0.980, and 4.511E-01, respectively. Furthermore, the boosted GPR has an R2-score of 0.998, a MAPE of 3.929E-02, and an MAE of 5.024E-06. As a result, ADA + GPR was chosen as the best model, with the following optimal values: (T = 3.38E+02, P = 4.0E+02, Solubility = 0.001241).