Tuning Ferulic Acid Solubility in Choline-Chloride- and Betaine-Based Deep Eutectic Solvents: Experimental Determination and Machine Learning Modeling

Deep eutectic solvents (DES) represent a promising class of green solvents, offering particular utility in the extraction and development of new formulations of natural compounds such as ferulic acid (FA). The experimental phase of the study undertook a systematic investigation of the solubility of FA in DES comprising choline chloride or betaine as hydrogen bond acceptors and six different polyols as hydrogen bond donors. The results demonstrated that solvents based on choline chloride were more effective than those based on betaine. The optimal molar ratio of hydrogen bond acceptor to donor was found to be 1:2. The addition of water to the DES resulted in a notable enhancement in the solubility of FA. Among the polyols tested, triethylene glycol was the most effective. Hence, a DES composed of choline chloride and triethylene glycol (TEG) (1:2) with water added at a mole fraction of 0.3 is suggested as an efficient alternative to traditional organic solvents like DMSO. In the second part of this report, the affinities of FA in saturated solutions were computed for solute-solute and all solute-solvent pairs. It was found that self-association of FA leads to a cyclic structure of the $C_2^8$ type, common among carboxylic acids, which is the strongest type of FA affinity. On the other hand, among all hetero-molecular bi-complexes, the most stable is the FA-TEG pair, which is consistent with the high solubility of FA in TEG-containing liquids. Finally, this work combined COSMO-RS modeling with machine learning for the development of a model predicting ferulic acid solubility in a wide range of solvents, including not only DES but also classical neat solvents and binary mixtures. The machine learning protocol yielded a highly accurate model for predicting FA solubility, significantly outperforming the COSMO-RS approach. Based on the obtained results, it is recommended to use the support vector regressor (SVR) for screening new dissolution media, as it is not only accurate but also generalizes soundly to new systems.


Introduction
Ferulic acid, a phenolic compound abundant in various plant tissues [1], has emerged as a captivating subject of scientific inquiry [2,3] and industrial interest [4,5], while as a component of traditional Chinese medicinal herbs, it has been used for centuries [6]. Chemically, it is 4-hydroxy-3-methoxycinnamic acid, belonging to the wide group of phenolic acids. Its molecular structure is characterized by a phenolic ring with a hydroxyl group and a methoxy group, and it can exist in both cis and trans forms [2]. Ferulic acid exhibits a plethora of biological activities that are the result of this unique structure. First of all, it is well known as an antioxidant agent [7,8]. Specifically, the hydroxyl and methoxy groups on the benzene ring

Solubility of Ferulic Acid in DESs and DES-Water Mixtures
In the experimental part of the study, deep eutectic solvents were applied to solubilize ferulic acid. While DESs are known for being very effective solubilizers of many compounds, the addition of water to these systems can improve solubility even further [82,87], which was the reason for also including the aqueous mixtures of DESs in the study. The solubility of various compounds in water is usually markedly lower than in the eutectic systems, and the addition of excess amounts of water to the DES systems results in a decrease in the solubility of the considered solute. In this concentration range, water can be considered an anti-solvent for DESs. However, small amounts of added water tend to promote the solubility of such a solute. The origins of the observed phenomenon are complex and involve the formation of nanostructures of DES components and water molecules [88,89]. In particular, the stabilization of nanostructures around choline chloride or other hydrogen bond acceptor (HBA) components of DESs plays an important role [88]. These clusters are remarkably stable, even at modest dilutions. This complex behavior indicates that aqueous-DES mixtures cannot be regarded as typical antisolvent-cosolvent systems, although the favorable effect of the addition of water is still practically valuable and worth in-depth exploration.
Here, an extensive search for the optimal water-DES solvent was performed while tuning ferulic acid solubility. In the first phase of experiments, the FA solubility was determined in pure DES systems. These DESs comprised either choline chloride (ChCl) or betaine (BI), both acting as hydrogen bond acceptors (HBA), and one of six polyols, namely ethylene glycol (ETG), diethylene glycol (DEG), triethylene glycol (TEG), 1,3-butanediol (B3D), 1,2-propanediol (P2D), and glycerol (GLY), acting as hydrogen bond donors (HBD). Three molar ratios of HBA to HBD were tested, i.e., 1:1, 1:2, and 1:4. All of the measurements at this screening phase were conducted at 25 °C.
The obtained results show that, in general, DESs based on choline chloride are more efficient solubilizers of FA than those comprising betaine. Nonetheless, the results present a similar overview, regardless of whether ChCl or BI was used as the HBA. When analyzing the molar compositions of the tested systems, it turns out that the 1:2 HBA-HBD molar ratio performs best, followed by the 1:4 ratio, with the 1:1 ratio being the least effective. This general observation is only altered in a few cases in the systems with betaine, where the 1:4 molar ratio outperforms some of the 1:2 systems. The comparison of the effectiveness of individual HBDs reveals differences between systems with choline chloride and betaine. When ChCl is used, the following decreasing order of FA solubility is obtained, regardless of the molar ratio of DES constituents: TEG > DEG > GLY > ETG > B3D > P2D. Slightly more complex behavior is observed for systems utilizing betaine. For the optimal 1:2 molar composition, the following trend is obtained: TEG > GLY > DEG > ETG > B3D > P2D. Rather surprisingly, in the 1:4 molar composition, the DES prepared using B3D is the second most effective system, with the one using ETG being the least effective. Overall, it seems that three HBDs stand out as being the most promising, i.e., triethylene glycol, diethylene glycol, and glycerol.
The ChCl-TEG eutectic system in a 1:2 molar proportion was the most effective DES among the studied formulations. At 25 °C, the solubility of ferulic acid was found to be x_FA = 0.0532. Using the ChCl-DEG system resulted in an FA solubility of x_FA = 0.0494, closely followed by the ChCl-GLY eutectic, characterized by x_FA = 0.0485. DES formulations with betaine were less efficient in terms of FA solubilization, and the top three systems, namely BI-TEG, BI-DEG, and BI-GLY, gave ferulic acid solubilities of x_FA = 0.0432, x_FA = 0.0401, and x_FA = 0.0392 at 25 °C. Detailed results of FA solubility determination in the studied systems are presented in the Supplementary Materials (please refer to Table S1).
The obtained solubility values of FA in pure DESs served as a starting point for selecting the most promising eutectic systems, which would be used to form aqueous mixtures. Two systems utilizing choline chloride were selected along with two containing betaine, namely ChCl-TEG, ChCl-DEG, BI-TEG, and BI-GLY. The created ternary mixtures were characterized by different amounts of the DES, expressed as its mole fraction in a solute-free solution. The solubility of FA in such systems was measured at four temperatures in the range of 25 °C to 40 °C.
As expected from previous experience, small amounts of water added to the eutectic solvent increase the solubility of ferulic acid. As stated, the nature of DES-water systems is complex. However, it is useful to describe their behavior at least as an apparent cosolvency/antisolvency effect. The solubility curves, describing the mole fraction solubility of FA as a function of the composition of ternary aqueous DES mixtures, are presented in Figure 1. An increasing amount of DES in its aqueous mixture leads to an increase in the solubility of ferulic acid; however, after a certain point, which represents the most effective composition, solubility starts to decrease. The mixture yielding the highest FA solubility corresponds to the molar composition of x*_DES = 0.7 in the aqueous mixture. This observation holds for systems using both choline chloride and betaine at all studied temperatures. In the case of the mixture based on the ChCl-TEG eutectic, the solubility of ferulic acid at the optimal composition equaled x_FA = 0.0588 at 25 °C, which is 112% of the FA solubility in the pure eutectic solvent. For the second-most effective DES, namely ChCl-DEG, the FA solubility was found to be x_FA = 0.0532 at 25 °C, which corresponds to 108% of the FA solubility in the DES itself. For the aqueous mixtures of eutectics comprising betaine, the BI-TEG and BI-GLY systems were characterized by ferulic acid solubilities of x_FA = 0.0494 and x_FA = 0.0459, respectively, at 25 °C and at the optimal composition, which corresponds to about 114% of the water-free DES solubility. Additionally, to no surprise, the elevated temperature of the measurements resulted in increased solubility of ferulic acid. This increase was rather stable, and the difference between the solubility at 25 °C and at 40 °C amounted to around 20% regardless of the eutectic type and aqueous mixture composition. Detailed solubility values are again provided in the Supplementary Materials (please refer to Tables S2 and S3).
The obtained results are worth comparing to solubility data available in the literature. The solubility of ferulic acid was studied in several systems, including water [90,91]; ethyl lactate and its mixtures with water [92]; isopropanol and its aqueous mixtures [93]; and a number of organic solvents, such as DMSO, transcutol, methanol, and ethyl acetate [91,94]. When considering these data, the following decreasing order of FA solubility (measured at 25 °C) in various neat solvents can be obtained: DMSO (x_FA = 0.0526) > transcutol (x_FA = 0.0430) > methanol (x_FA = 0.0295) > propylene glycol (x_FA = 0.0263) > ethanol (x_FA = 0.0254) > ethylene glycol (x_FA = 0.0207) > isopropanol (x_FA = 0.0194) > 2-propanol (x_FA = 0.0188) > 2-butanol (x_FA = 0.0168) > 1-butanol (x_FA = 0.0161) > ethyl acetate (x_FA = 0.0130) > water (x_FA = 0.000049). In the above context, it turns out that among the neat DESs, only the ChCl-TEG 1:2 system offers a slightly better FA solubility than the most efficient classical solvent, namely DMSO, with the ChCl-DEG 1:2, ChCl-GLY 1:2, ChCl-ETG 1:2, and ChCl-TEG 1:4 systems being slightly less effective, although outperforming the second-best classical solvent, which is transcutol. Among the systems with betaine, only the BI-TEG 1:2 system performed better than transcutol. The addition of water to the selected eutectics increased the FA solubility. Thus, the ChCl-TEG 1:2 (x*_DES = 0.7) and ChCl-DEG 1:2 (x*_DES = 0.7) systems gave higher solubility than DMSO, with the BI-TEG 1:2 (x*_DES = 0.7) system being close to that.
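The comparison with classical solvents can be reproduced with a few lines of bookkeeping. The sketch below is only an illustration: it uses the 25 °C mole fraction solubilities quoted in this section (a subset of the classical solvents and the 1:2 DES systems for which values are given in the text), and the helper name `systems_better_than` is hypothetical.

```python
# Mole fraction solubilities of FA at 25 °C, copied from the values quoted in the text.
classical = {
    "DMSO": 0.0526,
    "transcutol": 0.0430,
    "methanol": 0.0295,
    "ethanol": 0.0254,
    "water": 0.000049,
}

des = {
    "ChCl-TEG 1:2": 0.0532,
    "ChCl-DEG 1:2": 0.0494,
    "ChCl-GLY 1:2": 0.0485,
    "BI-TEG 1:2": 0.0432,
    "BI-DEG 1:2": 0.0401,
    "BI-GLY 1:2": 0.0392,
    "ChCl-TEG 1:2 (x*_DES = 0.7)": 0.0588,
    "ChCl-DEG 1:2 (x*_DES = 0.7)": 0.0532,
    "BI-TEG 1:2 (x*_DES = 0.7)": 0.0494,
    "BI-GLY 1:2 (x*_DES = 0.7)": 0.0459,
}


def systems_better_than(reference_solvent):
    """Return (name, x_FA, ratio to reference) for DES systems that beat the reference.

    Hypothetical helper: results are sorted by decreasing solubility, ties by name.
    """
    x_ref = classical[reference_solvent]
    hits = [(name, x, x / x_ref) for name, x in des.items() if x > x_ref]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for solvent in ("DMSO", "transcutol"):
        print(f"Systems with higher x_FA than {solvent}:")
        for name, x, ratio in systems_better_than(solvent):
            print(f"  {name:30s} x_FA = {x:.4f}  ({ratio:.2f} x {solvent})")
```

With these inputs, only three of the listed systems exceed DMSO, which matches the statement above that the neat ChCl-TEG 1:2 eutectic and the aqueous ChCl-TEG and ChCl-DEG mixtures are the ones that outperform it.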

Ferulic Acid Intermolecular Interactions in DES
The determination of the structure of ferulic acid in its monomeric form was the starting point of the computational procedure. Within a 5 kcal/mol energy window, three of the most stable and distinct conformers were identified, as schematically presented in Figure 2, where the relative energy values corresponding to the RI-BP97/def2-SVPD//RI-BP/def2-TZVPD-FINE level of theory were included. It is noteworthy that the most stable conformers differ in the orientation of the hydroxyl group connected to the aromatic ring or the hydroxyl group that is part of the carboxylic substituent. The most stable conformer forms an intramolecular hydrogen bond between the hydroxyl and methoxy groups. Through comparison with the second conformer, it can be inferred that this type of interaction contributes approximately 1.9 kcal/mol to intramolecular stabilization. The third conformer is similar to the first one but has an anti-rotated hydroxyl group in the carboxylic moiety, resulting in an energy increase of about 2.0 kcal/mol. The order of conformers remains consistent in both the bulk phase and vacuum.
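To give a feeling for what energy gaps of about 2 kcal/mol mean at ambient temperature, the relative conformer energies can be converted into Boltzmann populations. The following is a minimal sketch under stated assumptions: the relative energies (0.0, 1.9, and 2.0 kcal/mol) are read from the text and treated as free energy differences, each conformer is counted once (no degeneracy), and the function name `boltzmann_populations` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_energies_kcal, temperature_k=298.15):
    """Return normalized Boltzmann weights for relative energies given in kcal/mol."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-energy / rt) for energy in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed relative energies of the three conformers (kcal/mol), as quoted in the text.
conformer_energies = [0.0, 1.9, 2.0]
populations = boltzmann_populations(conformer_energies)

if __name__ == "__main__":
    for index, (energy, pop) in enumerate(zip(conformer_energies, populations), start=1):
        print(f"conformer {index}: dE = {energy:.1f} kcal/mol, population = {100 * pop:.1f}%")
```

Under these assumptions the most stable conformer accounts for roughly 93% of the monomer population at 298.15 K, with the remaining two conformers contributing a few percent each.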

The study of intermolecular interactions with DES constituents involved pairing each conformer of ferulic acid and minimizing the supermolecule energy. The most representative results are presented in Figures 3 and 4.
The affinities of ferulic acid in the studied DES were expressed using the concentration-independent standard Gibbs free energy values ($\Delta G_a^{\circ} = RT\ln(a^{\circ})$) for the corresponding pair formation reactions at ambient conditions (T = 298.15 K). The subscript "a" indicates the use of mole fraction values corrected with activity coefficients. This expression method is convenient, as it characterizes the thermodynamic propensity of interacting components regardless of the solvent environment. It is expected that the self-association behavior of ferulic acid is consistent across all systems, independent of the content and mole fractions of the components. Indeed, ferulic acid shows a strong propensity for self-aggregation, which can be inferred from the dimer structure and standard Gibbs free energy values, indicating the highest affinity among all studied pairs, as shown in Figures 3 and 4. The FA dimer is stabilized by strong bidirectional hydrogen bonds forming a cyclic structure of the $C_2^8$ type, common for all carboxylic acids. Additionally, the carboxylic group in FA acts as a strong proton donor, directly contributing to the stabilization of both FA-ChCl and FA-BI pairs. Interestingly, two types of structural motifs were identified. The first type involves the third stable conformer of monomeric FA, where the energetic penalty of this conformer is compensated by non-polar interactions of ChCl or BI with the delocalized electrons of the aromatic ring, making these pairs the most stable. In the case of choline chloride, an alternative structure is stabilized by interactions with the hydrogen of the carboxylic group. Betaine, having a non-neutralized acetate group, cannot form stable pairs of this type and instead interacts with the hydroxyl group in the para position of FA. Notably, the affinity of FA to BI is significantly stronger compared to ChCl. Interactions between FA and water are much weaker and primarily involve hydrogen bonding with the carboxylic or hydroxyl groups. These observations suggest that the variety of potential contacts between FA and DES constituents might account for the high solvation abilities and consequently high solubility promoted by strong solute-solvent interactions.

Ferulic acid can also form stable pairs with all the hydrogen bond donor (HBD) counterparts of the DESs studied here. It is noteworthy that all polyols form two distinct motifs involving direct contact with the carboxylic group. Similarly to FA-ChCl interactions, the rotation of the hydroxyl group within the carboxylic moiety into the anti position allows for additional interactions with the delocalized aromatic electron clouds. This behavior is observed for all polyols except ethylene glycol (ETG), which, due to its shortest chain, is unable to adopt a similar position to other proton-donating constituents. This prevents ETG from positioning above the aromatic ring of FA, making a structure stabilized by two hydrogen bonds the most probable configuration.
In all other cases, the non-polar interactions involving the aromatic electron cloud play a crucial role. These interactions not only offset the energy increase caused by the distortion of the carboxylic group but also contribute significantly to the overall stability of the structure. Notably, the highest affinity of FA was found for triethylene glycol (TEG), which correlates well with the highest solubility of FA in DESs containing this polyol regardless of the type of HBA. However, a linear trend between solubility and affinity values is not generally observed, suggesting that other factors contribute to the stabilization of the saturated FA-DES systems.

Machine Learning Model
Solubility measurements, though simple, are time-consuming experiments with several tricky steps, particularly when using DES as dissolution media. This complexity hinders the exhaustive search for new solvents, given the vast number of potential HBA-HBD combinations, along with concentration and temperature dependencies. Consequently, theoretical exploration of the solvent hyperspace to support experimental screening of the most suitable deep eutectic solvents, including therapeutic variants (THEDES), is of significant practical value. In this study, a machine learning approach was employed, conducting an exhaustive search for models with the highest accuracy and predictive potential. A diverse set of non-linear regressors was tested for this purpose.
The training dataset comprised ferulic acid solubility values, including newly measured results for this study and previously published data. This dataset, the largest available at present, offers substantial structural diversity of solvents, which is a promising indicator for the generalization of the obtained model. The dataset (N = 344) includes mole fraction solubility values of FA in neat solvents (11 systems, N = 103), binary mixtures (1 system, N = 45), and DES (all presented in this paper, N = 196). The models were trained on a training subset (two-thirds of the entire set) by tuning the adjustable parameters of each regressor.
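The training protocol used in this section, a two-thirds training subset combined with a learning curve analysis based on 10-fold cross-validation, can be sketched with scikit-learn. This is only an illustration under stated assumptions and not the authors' actual pipeline: the descriptor matrix and target are synthetic stand-ins for the real data, the number of descriptors and the SVR hyperparameters are arbitrary, and feature scaling is added as a common preprocessing choice.

```python
import numpy as np
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: 344 rows (the dataset size quoted in the text),
# 5 hypothetical descriptors, and a smooth non-linear target with small noise.
X = rng.normal(size=(344, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=344)

# Two-thirds of the rows form the training subset, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=2 / 3, random_state=0
)

# Hypothetical hyperparameters; a real study would tune them.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X_train, y_train)
test_mae = float(np.mean(np.abs(model.predict(X_test) - y_test)))

# Learning curve: MAE on training and cross-validation folds for growing data fractions.
sizes, train_scores, cv_scores = learning_curve(
    model,
    X_train,
    y_train,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=10,
    scoring="neg_mean_absolute_error",
)
train_mae = -train_scores.mean(axis=1)
cv_mae = -cv_scores.mean(axis=1)

if __name__ == "__main__":
    print(f"hold-out MAE: {test_mae:.3f}")
    for n, tr, cv in zip(sizes, train_mae, cv_mae):
        print(f"n = {n:3d}  train MAE = {tr:.3f}  CV MAE = {cv:.3f}")
```

A large and erratic gap between the training and cross-validation curves is the kind of behavior that a generalization penalty in the scoring function is meant to discourage.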
It is crucial to note that metrics such as mean absolute error (MAE) or the coefficient of determination (R²) were not the sole criteria for model accuracy. As mentioned in the methodology section, potential generalization was incorporated into the scoring function as a penalty derived from learning curve analysis (LCA) using scikit-learn. This approach evaluates model performance by increasing the percentage of included data and performing 10-fold cross-validation. This procedure, conducted separately for training and cross-validation subsets, provides comprehensive diagnostics by assessing the risk of overfitting and quantifying the models' sensitivity to the data used. Although computationally expensive, this method ensures a thorough evaluation of the model. Models characterized by low MAE and high R² that nevertheless fail to meet the generalization criteria revealed by LCA should be approached with caution. Indeed, this was observed for the best-performing model found in this study.
As anticipated, utilizing artificial neural networks can lead to models capable of accurately back-computing experimental data. Multi-layer perceptrons (MLPs) were trained, allowing for flexible adjustment of network architecture to excel in capturing non-linear relationships. MLPRegressor, particularly adept at learning complex patterns through interconnected layers of neurons, demonstrated impressive accuracy. Figure 5 presents the accuracy of this regressor model, alongside other computations, including the second-best model, SVR, and solubility predictions based on the COSMO-RS approach. Two important conclusions can be drawn from this figure. Firstly, the native COSMO-RS results only qualitatively agree with experimental data, predicting general trends but not actual mole fraction solubility. The RMSD (root mean square deviation) and MAPE (mean absolute percentage error) for logarithmic mole fraction values are as high as 0.6 and 54.6%, respectively, which are significantly worse than those of MLP and SVR, which had values of 0.025 (1.6%) and 0.061 (4.97%), respectively. Secondly, the predictions of both presented models are acceptable, though MLP outperforms SVR. However, the optimal architecture of the MLP network, optimized during the learning process, adopts a highly complex structure consisting of 13 hidden layers. The details of the optimized hyperparameters are provided in Figure 6.
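The two error measures quoted for the models, RMSD and MAPE evaluated on logarithmic mole fraction values, can be written down explicitly. The sketch below is illustrative only: the base of the logarithm is assumed to be 10 (the text does not specify it), the experimental values are four solubilities quoted earlier in this paper, and the "predicted" values are hypothetical numbers invented for the example.

```python
import math


def rmsd(y_true, y_pred):
    """Root mean square deviation between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error (in %), relative to the true values."""
    relative = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(relative) / len(relative)


# Experimental x_FA values at 25 °C quoted in the text (ChCl-TEG, ChCl-DEG, ChCl-GLY, BI-TEG).
x_exp = [0.0532, 0.0494, 0.0485, 0.0432]
# Hypothetical model predictions, used only to exercise the functions.
x_pred = [0.0510, 0.0500, 0.0470, 0.0450]

# Assumption: metrics are evaluated on log10 of the mole fraction solubility.
log_exp = [math.log10(x) for x in x_exp]
log_pred = [math.log10(x) for x in x_pred]

if __name__ == "__main__":
    print(f"RMSD on log10(x_FA): {rmsd(log_exp, log_pred):.4f}")
    print(f"MAPE on log10(x_FA): {mape(log_exp, log_pred):.2f}%")
```

Because log mole fractions of sparingly soluble solutes are numbers of order one, percentage errors computed on them are much smaller than percentage errors on the mole fractions themselves, which should be kept in mind when reading MAPE values of a few percent.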

Preparation of the Calibration Curve
The spectrophotometric determination of the solubility of ferulic acid in deep eutectic solvents and their aqueous mixtures was preceded by the preparation of a calibration curve.For this purpose, the stock solution of FA was prepared in a 100 mL volumetric flask using methanol as a solvent.This solution was then diluted, which was achieved by transferring fixed amounts of the stock solution into 10 mL volumetric flasks and adding methanol accordingly.Eleven solutions were obtained in this way, with varying concentrations in the range of 0.00618 mg/mL to 0.0206 mg/mL.The absorption spectra of these solutions were then recorded with the help of an A360 spectrophotometer from AOE Instruments (Shanghai, China) in the wavelength range from 200 nm to 500 nm.The absorbance maximum was found to correspond to the 321 nm wavelength and did not change over the course of the measurements.Three separate curves were prepared in this manner, and the final curve was the result of their averaging.The obtained linear regression was It is noteworthy that the MLPRegressor and SVR were not the only non-linear models demonstrating acceptable accuracy in predicting ferulic acid solubility.Other less complex models, such as NuSVR, HistGradientBoosting, and CatBoost regressors, also achieved acceptable accuracy with significantly lower computational costs compared to MLP regressor.The best models can beranked as follows based on their MRSD and MAPE values given in the parenthesis, respectively: MLP (0.026, 1.57%) > SVR (0.062, 4.97%) ≈ NuSVR (0.063, 5.15%) > HGB (0.050, 2.16%) > CatBoost (0.051, 2.03%).
The optimized parameters for these models are summarized in Figure 6, which also includes the results of learning curve analysis (LCA) for each regressor. It is noteworthy that considering multiple regressor models for experimental data modeling and prediction offers several advantages. Firstly, it can enhance performance by leveraging the strengths of different models, such as their ability to capture linear or non-linear relationships. Secondly, comparing and validating multiple models on the same dataset provides insights into their relative strengths, weaknesses, and overall reliability. Lastly, regressors can be combined into ensemble methods with optimized weights, further improving overall performance and generalization by mitigating bias and variance. This approach also promotes better generalization to new data and helps identify uncertainties or inconsistencies in predictions. However, this step was not implemented here, as the accuracy was acceptable even with a single-regressor approach and the available data pool was rather limited. For further examination of model properties, the LCA results are presented in Figure 6. Since NuSVR closely mirrors SVR in both solubility back-computation and LCA results, it was excluded from further analysis. As previously noted, the MLP model has a complex structure that allows for the most accurate solubility back-computations. However, LCA reveals significant limitations when applying this model to new data, as the learning curve shows considerable sensitivity to the data pool used for MLP application. Ideally, the curves resulting from LCA should exhibit a smooth decrease in MAE with increasing data sample percentage for both the training and cross-validation subsets. In the case of the MLP model, this requirement is met for the training dataset, suggesting excellent fitting to known experimental data, but values not seen in the learning phase are predicted with much less consistency. This suggests that while the complex architecture effectively captures the diversity of known data, it fails with new data, indicating poor generalizability for theoretical screening of FA solubility in new dissolution media. The second model analyzed in Figure 6 suffers much less from this drawback. Indeed, after the inclusion of at least 75% of the solubility data, a systematic decrease in MAE values is observed. Similar conclusions can be drawn for the other two models included in Figure 6. From a practical point of view, the suitability of SVR, NuSVR, or HGB for further application is fortunate, since these models are relatively fast to optimize and to apply for screening purposes. Because of this, they are recommended for further development and wrapping in a user-friendly application. In general, both support vector regression and its variant, NuSVR, are very effective when dealing with high-dimensional data and can handle non-linear relationships through the use of kernel functions. Support vector regression aims to find a hyperplane that best fits the data while maximizing the margin. It uses a subset of training samples called support vectors to define the regression function. Typical kernel functions include the radial basis function (RBF), polynomial, and sigmoid kernels. NuSVR utilizes an additional parameter to control the number of support vectors. The HistGradientBoosting regressor belongs to the boosting family of algorithms and is characterized by an ensemble of decision trees. It utilizes a gradient boosting framework, where subsequent trees are built to correct the errors made by previous trees, and it relies on histogram-based binning of features, which improves training speed and memory efficiency. The CatBoost regressor, on the other hand, is a gradient boosting algorithm that is particularly effective with categorical features. It requires minimal data preprocessing and is known for its fast training speed and robust handling of various data types.
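The two error measures used in this discussion, MAE and MAPE, and the LCA criterion of a steadily decreasing MAE can be illustrated with a minimal sketch. The function names and the `tolerance` argument are illustrative assumptions and are not taken from the in-house code; MRSD is not reproduced because its definition is not given in this section.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def is_steadily_decreasing(mae_curve, tolerance=0.0):
    """Check that MAE never rises by more than `tolerance` (a hypothetical
    threshold) as the training fraction grows along the learning curve."""
    return all(later <= earlier + tolerance
               for earlier, later in zip(mae_curve, mae_curve[1:]))
```

Applied to the cross-validation branch of a learning curve, a `False` result from `is_steadily_decreasing` would flag the kind of inconsistency described above for the MLP model.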

Preparation of the Calibration Curve
The spectrophotometric determination of the solubility of ferulic acid in deep eutectic solvents and their aqueous mixtures was preceded by the preparation of a calibration curve. For this purpose, the stock solution of FA was prepared in a 100 mL volumetric flask using methanol as a solvent. This solution was then diluted by transferring fixed amounts of the stock solution into 10 mL volumetric flasks and adding methanol accordingly. Eleven solutions were obtained in this way, with concentrations in the range of 0.00618 mg/mL to 0.0206 mg/mL. The absorption spectra of these solutions were then recorded with an A360 spectrophotometer from AOE Instruments (Shanghai, China) in the wavelength range from 200 nm to 500 nm. The absorbance maximum was found at 321 nm and did not change over the course of the measurements. Three separate curves were prepared in this manner, and the final curve was the result of their averaging. The obtained linear regression was found to be A = 98.226 × C + 0.002 (A: absorbance; C: concentration in mg/mL). The validation parameters of the curve included the determination coefficient R², the limit of detection (LOD), and the limit of quantification (LOQ). The R² coefficient was equal to 0.9989, which ensures satisfactory linearity of the curve. LOD was found to be 0.000494 mg/mL, while LOQ was 0.001483 mg/mL, which are values far below the concentrations achieved in the studied samples. Overall, the calibration curve can be considered adequate for the determination of the solubility of ferulic acid.
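As an illustration of how the calibration line is used, the sketch below inverts A = 98.226 × C + 0.002 to recover a concentration from a measured absorbance. The function names and the `dilution_factor` argument are assumptions introduced here; the slope, intercept, and LOQ are the values reported above.

```python
SLOPE = 98.226      # absorbance per (mg/mL), reported calibration slope
INTERCEPT = 0.002   # reported calibration intercept
LOQ = 0.001483      # mg/mL, reported limit of quantification


def absorbance_to_concentration(absorbance, dilution_factor=1.0):
    """Invert A = SLOPE * C + INTERCEPT and scale back to the undiluted sample.

    `dilution_factor` is a hypothetical parameter: the ratio of the final
    volume to the sample volume used when diluting with methanol.
    """
    measured = (absorbance - INTERCEPT) / SLOPE   # mg/mL in the cuvette
    return measured * dilution_factor             # mg/mL in the original sample


def is_quantifiable(absorbance):
    """True if the concentration in the cuvette is at or above the LOQ."""
    return (absorbance - INTERCEPT) / SLOPE >= LOQ
```

For example, an absorbance read on a sample diluted 100-fold would be passed as `absorbance_to_concentration(a, dilution_factor=100)`.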

Preparation of the Samples and Solubility Measurements
The solubility of ferulic acid in the considered eutectic systems was determined using the well-established and reliable shake-flask method [95][96][97][98] combined with spectrophotometric measurements.
Deep eutectic solvents were formed by combining a hydrogen bond acceptor (HBA) with a hydrogen bond donor (HBD) in three molar ratios: equimolar, a 2-fold excess of the HBD, and a 4-fold excess of the HBD. One of two substances, choline chloride or betaine, was used as the HBA, while one of six polyols (TEG, DED, ETG, GLY, P2D, or B3D) was used as the HBD. This resulted in a total of 36 eutectic systems. To prepare a eutectic formulation, the two constituents were mixed in glass vessels in a specific molar ratio, placed on a heating plate, and further mixed until a homogeneous solution was formed. DESs prepared in this manner were used in their pure form, and the most promising ones were also selected to form mixtures with water in pre-determined molar compositions.
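The count of 36 systems follows directly from the design (2 HBAs × 6 HBDs × 3 molar ratios). A short sketch enumerating the combinations, together with the component mole fractions implied by each ratio, is given below; the dictionary layout is an illustrative choice, not the authors' data format.

```python
from itertools import product

HBAS = ("choline chloride", "betaine")
HBDS = ("TEG", "DED", "ETG", "GLY", "P2D", "B3D")
HBD_PER_HBA = (1, 2, 4)   # moles of HBD per mole of HBA (1:1, 1:2, 1:4)

systems = [
    {
        "hba": hba,
        "hbd": hbd,
        "ratio": f"1:{n}",
        "x_hba": 1 / (1 + n),   # mole fraction of HBA in the neat DES
        "x_hbd": n / (1 + n),   # mole fraction of HBD in the neat DES
    }
    for hba, hbd, n in product(HBAS, HBDS, HBD_PER_HBA)
]
```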
As the initial step in the solubility determination procedure, saturated solutions of ferulic acid in the considered DESs were prepared. This was done by placing an excess amount of FA in a test tube and adding the selected DES or aqueous DES mixture. The Orbital Shaker Incubator ES-20/60 from Biosan (Riga, Latvia) was used to ensure a stable temperature of 25 °C, 30 °C, 35 °C, or 40 °C, depending on the measurement conditions, for 24 h of incubation with simultaneous mixing at 60 rev/min. The samples were then filtered through a 0.22 µm pore-size PTFE syringe filter. All of the test tubes, syringes, pipette tips, and filters were preheated to the same temperature as the measured sample in order to prevent precipitation. The samples were diluted with methanol before the measurements. As was the case for the calibration curve, the A360 spectrophotometer was used to record the spectra of the samples in the 200-500 nm wavelength range with a 1 nm resolution, using methanol for calibration. The concentration of ferulic acid was calculated based on the linear equation of the calibration curve and the absorbance values recorded at the characteristic wavelength, i.e., 321 nm. Additionally, the density of the samples was measured, which was necessary for the computation of mole fraction solubility. For this purpose, 1 mL of each solution was weighed in a 10 mL volumetric flask using a RADWAG (Radom, Poland) AS 110 R2.PLUS analytical balance with 0.1 mg precision. For each studied system, three samples were prepared and measured, and the obtained values were averaged.
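The conversion from a mass concentration and a density to mole fraction solubility is not spelled out in this section, so the following is only a plausible sketch. It assumes that each solvent component (HBA, HBD, water) is counted as a separate species, so the solute-free solvent is described by its mole-fraction-weighted mean molar mass; the function names are hypothetical. The molar mass of ferulic acid (C10H10O4) is 194.18 g/mol.

```python
M_FA = 194.18  # g/mol, ferulic acid (C10H10O4)


def mean_molar_mass(mole_fractions, molar_masses):
    """Mole-fraction-weighted mean molar mass of the solute-free solvent."""
    return sum(x * m for x, m in zip(mole_fractions, molar_masses))


def mole_fraction_solubility(conc_mg_per_ml, density_g_per_ml, solvent_molar_mass):
    """Mole fraction of FA in a saturated solution, per 1 mL of solution.

    conc_mg_per_ml      FA concentration from the calibration curve
    density_g_per_ml    measured density of the saturated solution
    solvent_molar_mass  mean molar mass of the solute-free solvent (g/mol)
    """
    mass_fa = conc_mg_per_ml / 1000.0           # g of FA in 1 mL
    mass_solvent = density_g_per_ml - mass_fa   # g of solvent in 1 mL
    n_fa = mass_fa / M_FA
    n_solvent = mass_solvent / solvent_molar_mass
    return n_fa / (n_fa + n_solvent)
```

For a choline chloride and TEG eutectic in a 1:2 ratio, `mean_molar_mass` would be called with mole fractions 1/3 and 2/3 and the molar masses of the two components.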

Conformational Analysis
The most representative structures of either monomeric forms of every DES constituent or their homo- and hetero-molecular pairs were found by employing an extensive conformational analysis using the COSMOconf [99] and COSMOtherm [100] packages. The procedure was already described in our previous papers [101][102][103], hence only a brief synopsis is provided here. Each molecule, or molecular complex, was represented by a maximum of ten low-energy conformations identified by independent conformational searches for both the gas and condensed phases. The latter is crucial to account for the influence of the surrounding environment within the conductor-like screening model. The outcome of this protocol is a set of "cosmo" and "energy" files compatible with the latest parameter set, which is BP_TZVPD_FINE_24.ctd. This file comprises all the necessary parameters utilized for the computation of thermodynamic properties in COSMOtherm [100]. It is worth mentioning that variants of "ctd" files are available with parameter values adjusted for different levels of computation. Here, the RI-BP/def2-TZVPD-FINE level was used as the final step of the conformational analysis and affinity computations. The manual recommends a two-step procedure starting with geometry optimization on a slightly less demanding level, namely RI-BP/def-TZVP, followed by the final single-point energy computations and generation of the necessary "cosmo" and "energy" files. This procedure can be followed for many molecules but suffers from serious drawbacks if applied to intermolecular complexes or flexible molecules stabilized by non-bonding interactions. The drawbacks stem from the inadequate representation of dispersive forces by the RI-BP/def-TZVP method, and adding a correction to the final energy cannot repair the improper geometries that are occasionally obtained. This is especially important for complexes with delocalized electrons, the interactions of which are poorly represented using the RI-BP/def-TZVP approach. For example, many stacking complexes are not predicted to be stable. This problem was already reported for edaravone [101] and methylxanthines [97,104]. Hence, the actual geometry optimization was performed for both monomers and pairs using the RI-BP97/def2-SVPD approach. Apart from the more realistic pair geometries, there is an additional benefit of using this level of computations, namely the straightforward accounting for the basis set superposition error (BSSE). This is regarded as an important component of the total interaction energy. However, rigorous computations of the BSSE via counterpoise correction can be time-consuming. Therefore, much less demanding alternatives using the DFT-C approach were proposed [105]. This geometry-based method of BSSE estimation relies on BP97/def2-SVPD geometries and accounts for atom-atom many-body corrections to the total molecular complex energy. Hence, this method was used for the geometry optimization of all structures used in this study. The molecular geometries obtained in such a manner were then used for single-point energy computations at a level compatible with the parametrization of COSMOtherm mentioned above. Hence, the full acronym describing this approach is as follows: RI-BP97/def2-SVPD//RI-BP/def2-TZVPD-FINE, where the section before the double slash represents the optimization level and the section after it the single-point energy computations.
The conformational search for the most probable structures of monomers is straightforward and done automatically using COSMOconf. However, identification of the most representative pair conformations posed a more significant computational challenge. Here, the same protocol as previously used for other APIs [101][102][103] was used. This involved generating conformations for various combinations of ferulic acid dimers and ferulic acid-solvent pairs. To achieve this, the COSMOtherm software employed the "CONTACT = {1 2} ssc_probability ssc_weak ssc_ang = 15.0" command. This command prioritizes the generation of the most probable pairs based on contact probabilities, considering both hydrogen bonding and weak interactions. Consequently, this step typically results in a substantial number of initial structures requiring further optimization according to the above procedure, with data reduction applied to eliminate redundant and high-energy geometries. Two criteria were used for this selection: root-mean-square deviation (RMSD) and relative energy compared to the most stable conformer. Ultimately, only unique contacts within a 5.0 kcal/mol energy window relative to the most stable structure were retained for each complex, ensuring a representative pool of conformers with diverse structures and energies. The final "cosmo" and "energy" files were generated on the same level of theory and used for affinity computations using COSMOtherm parametrization. The core of all quantum chemistry computations was the Biovia TURBOMOLE [106] interfaced with TmolX. In the final step, the affinity computations were performed to characterize the interaction properties of ferulic acid with every constituent of the studied DESs using the default settings of COSMOtherm.
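The data-reduction step based on RMSD and relative energy can be sketched as a simple greedy filter. This is an illustrative reconstruction rather than the procedure implemented in COSMOconf or COSMOtherm: it assumes that geometries are already aligned with identical atom ordering, and the `rmsd_cutoff` value is a hypothetical placeholder, since only the 5.0 kcal/mol window is stated above.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-aligned geometries with identical atom ordering."""
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def select_conformers(energies, geometries, energy_window=5.0, rmsd_cutoff=0.5):
    """Keep unique structures within `energy_window` (kcal/mol) of the minimum.

    Candidates are visited from lowest to highest energy; a candidate is
    dropped as redundant if it lies within `rmsd_cutoff` (assumed, in Å)
    of any structure already kept.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    e_min = energies[order[0]]
    kept = []
    for i in order:
        if energies[i] - e_min > energy_window:
            break  # all remaining candidates are even higher in energy
        if all(rmsd(geometries[i], geometries[j]) > rmsd_cutoff for j in kept):
            kept.append(i)
    return kept
```

The function returns indices into the input lists, so the retained "cosmo" and "energy" files can be looked up by position.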

Machine Learning Protocol
The machine learning protocol utilized in this study adheres to the methodology previously applied in our earlier projects [79,102,108]. As comprehensive details have been previously reported, only brief remarks are provided here. The solubility prediction model was developed using in-house Python code (version 3.10, https://www.python.org/) designed for hyperparameter tuning across 36 regression models. These models encompass a wide range of algorithms, including linear models, boosting methods, ensembles, nearest neighbors, neural networks, and other regressor types. Hyperparameter optimization was conducted using Optuna (version 3.2, https://optuna.org/), an open-source Python package. The optimization process involved 5000 minimization trials, employing the tree-structured Parzen estimator (TPE) as the search algorithm sampler. To evaluate the performance of each regression model, a custom scoring function was defined, integrating multiple metrics to assess both accuracy and generalizability, as detailed in a previous work [108]. This scoring function includes penalties derived from learning curve analysis (LCA), performed using the scikit-learn library (version 1.2.2) during the parameter tuning process. Due to the computational demands of LCA, initial computations were limited to two points, encompassing 50% and 100% of the total dataset. Subsequent LCA evaluations of the final model involved 20-point calculations within the 50-100% data range. The custom loss function incorporates the mean MAE values obtained from the largest training set size, thereby integrating both accuracy and generalizability aspects and providing insights into the model's performance on unseen data.
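A two-point learning curve of the kind described above can be obtained with scikit-learn's `learning_curve`. The sketch below is not the in-house code: the synthetic data, the SVR settings, and the way the two MAE values are combined in `lca_loss` are assumptions made only for illustration, and the Optuna/TPE search that would wrap this loss is omitted.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.svm import SVR


def lca_mae(estimator, X, y, n_points=2, cv=5):
    """Mean training and cross-validation MAE at 50-100% of the training data."""
    sizes, train_scores, val_scores = learning_curve(
        estimator, X, y,
        train_sizes=np.linspace(0.5, 1.0, n_points),
        cv=cv,
        scoring="neg_mean_absolute_error",
        shuffle=True,
        random_state=0,
    )
    # scikit-learn reports negated MAE, so flip the sign back
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)


def lca_loss(train_mae, val_mae):
    """Hypothetical loss: mean of training and validation MAE at the largest
    training set size (the actual scoring function is described in [108])."""
    return 0.5 * (train_mae[-1] + val_mae[-1])


# Synthetic stand-in for the descriptor matrix and solubility values
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

sizes, train_mae, val_mae = lca_mae(SVR(kernel="rbf", C=10.0, epsilon=0.01), X, y)
loss = lca_loss(train_mae, val_mae)
```

Setting `n_points=20` reproduces the denser grid used for the final models, at a correspondingly higher cost.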

Molecular Descriptors
The set of molecular descriptors was formulated as described in our previous publication [103], based on the σ-potential values. The temperature-dependent σ-potentials were calculated for each compound in its pure, single-component state. The molecular descriptors for complex systems, utilized in machine learning, were represented as the difference between the σ-potential of the pure solute and that of the solvent at a given temperature. For multicomponent solvents, the σ-potential was taken as the sum of the pure-state values of the components weighted by their solute-free mole fractions. Typically, COSMOtherm generates σ-potential profiles consisting of 61 points for σ values between −0.03 and +0.03 e/Å² with a 0.001-step increment.
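The construction of the relative σ-potential descriptors can be sketched as follows, assuming that each σ-potential has already been exported from COSMOtherm as a list of 61 values on the grid described above. The function names are hypothetical and file parsing is not shown.

```python
N_POINTS = 61
# Grid from -0.030 to +0.030 e/Å² in 0.001 steps (rounded to avoid float drift)
SIGMA_GRID = [round(-0.030 + 0.001 * i, 3) for i in range(N_POINTS)]


def mixture_sigma_potential(component_potentials, mole_fractions):
    """Mole-fraction-weighted σ-potential of a solute-free multicomponent solvent."""
    if abs(sum(mole_fractions) - 1.0) > 1e-6:
        raise ValueError("solute-free mole fractions must sum to 1")
    return [
        sum(x * mu[k] for x, mu in zip(mole_fractions, component_potentials))
        for k in range(N_POINTS)
    ]


def relative_sigma_potential(solute_potential, solvent_potential):
    """Descriptor vector: solute σ-potential minus solvent σ-potential."""
    return [s - v for s, v in zip(solute_potential, solvent_potential)]
```

For a neat solvent, `mixture_sigma_potential` is called with a single component and a mole fraction of 1.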
To reduce the number of descriptors, the most promising subset was identified by inspecting the significance of the relationship between experimental solubility and relative σ-potential values. This selection criterion was quantified by restriction to R² > 0.4 for a given σ value. This is illustrated in Figure 7, which shows two subsets: non-DES solvents (including neat solvents and binary mixtures) and DES systems. Notably, the overall correlation between experimental solubility and σ-potential is modest when considering the entire dataset. However, restricting the analysis to non-DES systems reveals a very high correlation (R² > 0.8) in the non-polar region, typically attributed to hydrophobicity (HYD). Additionally, the sub-range of hydrogen bond donicity (HBD) shows a high correlation with FA solubility. The points marked with bold black dots in Figure 7 were used for machine learning.
Additionally, solubility was computed by fully solving the solid-liquid equilibrium (SLE) problem using COSMOtherm [100] for two reasons. Firstly, it is of interest how accurate the predictions based on the COSMO-RS theory [109] are. Secondly, the computed solubility can be used as a molecular descriptor for machine learning purposes. It is crucial to note that the COSMO-RS approach is designed for predicting the thermodynamic characteristics of bulk liquid systems, excluding the solid state. Since solubility involves the transition of the crystalline phase into a liquid saturated solution, the contribution of fusion data to the overall thermodynamic characteristics must be added to the input files. The required data include the melting temperature (Tm = 445.83 K), heat of fusion (ΔHfus = 32.49 kJ/mol), and heat capacity change upon melting (ΔCp,fus ≈ ΔSfus ≈ ΔHfus/Tm). The values in parentheses correspond to the average data provided in the compilation by Acree et al. [110]. The obtained solubility values were utilized as molecular descriptors in addition to the relative σ-potential values previously defined.
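To show how the fusion data enter the solubility computation, the sketch below evaluates only the ideal-solution term of the textbook SLE equation, ln x = −(ΔHfus/R)(1/T − 1/Tm) + (ΔCp,fus/R)[Tm/T − 1 − ln(Tm/T)]. It is not the COSMOtherm implementation, which additionally accounts for solute activity coefficients in each solvent; the function name is hypothetical. With the approximation ΔCp,fus ≈ ΔHfus/Tm used above, the expression collapses to ln x = −(ΔSfus/R) ln(Tm/T).

```python
import math

R = 8.314462618          # J/(mol K), gas constant
T_M = 445.83             # K, melting temperature of ferulic acid
DH_FUS = 32490.0         # J/mol, heat of fusion
DS_FUS = DH_FUS / T_M    # J/(mol K), entropy of fusion at T_M


def ln_ideal_solubility(temperature, delta_cp=DS_FUS):
    """Natural log of the ideal mole fraction solubility at `temperature` (K).

    `delta_cp` defaults to the approximation ΔCp,fus ≈ ΔSfus stated in the text.
    """
    enthalpy_term = -(DH_FUS / R) * (1.0 / temperature - 1.0 / T_M)
    cp_term = (delta_cp / R) * (
        T_M / temperature - 1.0 - math.log(T_M / temperature)
    )
    return enthalpy_term + cp_term


x_ideal_25c = math.exp(ln_ideal_solubility(298.15))
```

Passing `delta_cp=0.0` recovers the simpler van 't Hoff form, which gives a lower ideal solubility below the melting point.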

Conclusions
Ferulic acid is an important representative of phenolic acids with many practical applications. Since it can be obtained from natural sources, the selection of the most effective and green media for extraction is of great importance. This is emphasized by efforts to measure the solubility of FA in neat solvents and binary mixtures, as documented by a study of the literature. This paper further extends the knowledge of the dissolution of ferulic acid through a systematic study of two types of deep eutectic solvents involving choline chloride or betaine, acting as hydrogen bond acceptors, with one of six polyols playing the role of hydrogen bond donors. The performed optimization encompassed both the DES composition and its relative concentration in aqueous mixtures. It was found that the eutectics utilizing choline chloride were slightly more effective than those with betaine and that a 1:2 molar ratio of the HBA and HBD counterparts of the eutectic was the optimal one. Furthermore, the addition of small amounts of water to the DES further promotes the solubility of FA compared to the neat eutectic. Among the considered polyols, triethylene glycol proved to be the most effective. The presented results suggest that designed solvents comprising choline chloride and TEG, with an addition of water, can be treated as efficient alternatives to traditional organic solvents, including the first-choice solvent, namely DMSO. In order to gain insight into the saturated systems of FA in liquid media on a molecular level, the intermolecular interactions in the considered systems were studied, which led to the identification of the most stable homo- and hetero-molecular pairs formed between the interacting compounds. For the purpose of solvent screening, aimed at assisting the time-consuming experimental phase, a machine learning protocol was employed for the formulation of a non-linear model able to predict FA solubility. This phase resulted in very accurate models that were able to precisely back-compute solubility as a function of solvent type and temperature. It is recommended to utilize the SVR regressor with the provided values of optimized parameters for screening new dissolution media. It is also worth mentioning that the very popular way of predicting solubility using the COSMO-RS approach is at most qualitatively accurate, and the developed models are much more accurate.
and the top three systems, namely BI-TEG, BI-DEG, and BI-GLY, were responsible for ferulic acid solubilities of xFA = 0.0432, xFA = 0.0401, and xFA = 0.0392 at 25 °C. Detailed results of FA solubility determination in the studied systems are presented in the Supplementary Materials (please refer to Table

Figure 1. The solubility curves of ferulic acid (FA) in aqueous DES mixtures involving choline chloride (left panel) and betaine (right panel) and selected polyols at various temperatures, expressed as solvent-composition-related mole fractions. X*DES stands for mole fractions of solute-free DES in aqueous mixtures. For comparison, the room-temperature solubility of ferulic acid in DMSO is provided.

Figure 2. Schematic representation of ferulic acid conformers with electron density distributions and their relative energies in the bulk state, represented by an infinite conductor. Additionally, in parentheses are the relative values of total energies obtained for the gas phase.

Figure 3. The most representative structures of FA dimers and hetero-molecular pairs formed with choline chloride, betaine, and water.

Figure 4. The most representative structures of FA with HBD counterparts of studied DESs.

Molecules 2024, 29, 3841
Figure 5. The correlation between experimental and computed solubility of ferulic acid in neat, binary, and deep eutectic solvents. The gray color denotes the results of COSMO-RS computations, blue indicates the SVR model, and black marks the results obtained from the MLP model.

Figure 6. Characteristics of the best models found by hyperparameter tuning for prediction of the ferulic acid solubility. The plots provide the results of the learning curve analysis, which tests the consistency of the models' performance using both sub-sampling and cross-validation. The optimal parameter values of each model are provided for reproducibility purposes.

Figure 7. The correlation between relative solute-solvent σ-potentials and experimental solubility expressed as a function of σ values. Two series represent correlations computed for subsets including only non-DES solvents (neat solvents and binary mixtures) or only DES systems. Bold symbols define regions used as a set of molecular descriptors, where R² > 0.4 for either subset. The split into three distinct subranges is marked with colored rectangles.