Introduction

Epoxy (EP) resin is widely used in corrosion protection because of its strong adhesion, high corrosion resistance, excellent mechanical properties and low cost. However, cracks may arise inside or at the surface of the EP matrix during long-term service and progressively reduce its corrosion protection performance, thus increasing potential safety hazards during its service life1. Self-healing coatings are a promising and cost-effective way to improve the corrosion protection, and thus the durability, of metallic structures. A wide range of engineering structures, from vehicles to aircraft and from factories to household equipment, can be effectively protected by self-healing coating systems. Recent efforts have focused on improving the durability of EP coatings in the presence of damage by granting them self-healing functions, which can be realized through intrinsic repair of the material matrix by reversible covalent bonds2 and noncovalent bonds3, or via extrinsic strategies that rely on the release of healing agents4 and corrosion inhibitors5 into coating defects. In contrast to the extrinsic mechanisms, intrinsic mechanisms mimic natural systems and allow repeated repair. Such mechanisms are typically based on reversible covalent bonds, such as disulfide bonds6, Diels–Alder reactions7 and hydrazone bonds8, or on non-covalent interactions, such as metal–ligand coordination9 and hydrogen bonding10,11,12. Among these mechanisms, the most promising is based on dynamic hydrogen bonds because of their high reversibility and mild repair conditions, in combination with their directional and tunable self-association properties13.
As an indication of the self-healing ability of a coating, the low-frequency impedance modulus from electrochemical impedance spectroscopy (EIS), typically measured at 0.01 Hz (|Z|0.01Hz), has been extensively used to estimate the overall corrosion resistance of the test area14,15. A higher |Z|0.01Hz value represents a higher barrier ability of the coating. Based on previous studies16, in our view an ideal self-healing corrosion protective coating should meet the following main criteria: (1) the |Z|0.01Hz value of the self-healed coating is close to that of the intact coating; (2) excellent barrier ability, with a |Z|0.01Hz value of more than 10^10 Ω·cm2; (3) long-term stability in corrosive environments both before and after repair. For example, in a previous work by our group11, an intrinsic self-healing EP coating was developed by grafting 2-ureido-4[1H]-pyrimidinone (UPy) as a quadruple hydrogen bonding unit onto the backbone of an EP matrix. The UPy/EP coating demonstrated highly efficient self-healing within 5 min in 3.5 wt.% NaCl solution. The self-healed coating still had a high |Z|0.01Hz value of 4.8 × 10^10 Ω·cm2 even after 60 days of immersion in NaCl solution.

Achieving the target self-healing performance often requires synergy between multiple components of the EP coating formulation, including different resins, curing agents and liquid/solid additives. The conventional trial-and-error strategy for coating formulation design is time-consuming and labor-intensive. Recently, machine learning methods have been shown to be a promising option for materials design and optimization, especially for systems with complex properties or compositions17,18,19,20,21. For example, Haik et al.22 developed a machine learning model to predict the stress relaxation properties of EP matrix composites, based on a three-layer neural network using initial stress, test temperature and operating time as input variables and stress relaxation behavior as the output. The final model was obtained by training on 9000 experimental data samples and can efficiently predict the time-dependent mechanical behavior of a viscoelastic or viscoplastic material. Kan et al.23 constructed a molecular recognition model that predicts 2000 molecular descriptors from chemical structures using a gated graph neural network, and used it to extract 32-dimensional vectors representing the 2000 descriptors, thereby achieving dimension reduction. This 32-dimensional vector was used as the input for a subsequent Gaussian regression, and the machine learning model for predicting electrical conductivity was finally built by training on a large amount of data. Typically, establishing an accurate machine learning model requires vast training data, which are difficult to obtain for polymer resin formulations considering the heavy experimental workload of synthesis and characterization24,25. Therefore, machine learning methods that work with small sample datasets have major implications for polymer design.

The problem of machine learning under small sample conditions (<1000 samples) has received much attention in recent years26,27. The most common approaches for small sample data include neural-network-based methods28, hierarchical machine learning29 and active-learning-based methods30. For instance, Li et al.31 proposed a model combining nearest neighbor interpolation (NNI), the synthetic minority oversampling technique (SMOTE) and extreme gradient boosting (XGBoost) to predict the abrasion of rubber composites from small samples. NNI and SMOTE are two classical methods that aim at increasing the sample size and solving the problem of sample imbalance. By combining them, the original dataset was expanded from 23 to 710 samples. Finally, the abrasion was predicted by the XGBoost model with good prediction accuracy (MSE = 0.001). Similarly, active learning has been applied to EP adhesive strength30, polymer molecular dynamics32 and high-Tg polymers33,34, among others, starting from small initial datasets.

Herein, we employed a machine learning framework to develop self-healing composite coatings for corrosion protection applications. A flowchart of the machine learning workflow is shown in Fig. 1. In this framework, active learning and Bayesian optimization were used to model and maximize the common logarithm of the low-frequency impedance modulus (lg|Z|0.01Hz) obtained from EIS measurements of various scratched self-healing EP composite coatings, and thereby to improve their self-healing performance. The coating formulation consists of an EP resin, polyetheramines, amino-terminated urea-pyrimidinone monomers (UPy-D400) and ZIF-8@Ca microfillers. The EP resin reacts with the polyetheramine to form an EP-based polymer, and the UPy-D400 acts as a quadruple hydrogen bonding unit that can be grafted into the EP network to provide a self-healing function via self-association. The ZIF-8@Ca microfiller, an empty CaCO3 microcontainer with ZIF-8 nanoparticles assembled on its surface, is incorporated as a model filler that can not only enhance the barrier property of the EP coating but also release loaded substances (e.g., inhibitors) in a pH-sensitive manner. For the machine learning process, four variables (the molecular weight of the polyetheramine, the molar ratio of polyetheramine to EP, the UPy-D400 content and the ZIF-8@Ca content) were used as inputs, and the lg|Z|0.01Hz value of the scratched coatings was used as the output; an initial dataset of 32 samples was obtained from preliminary experiments. Among five common models, the one with the best accuracy was selected and further trained by active learning. Subsequently, Bayesian optimization was used to search for the scratched self-healing EP composite coating with an extremely high lg|Z|0.01Hz value.
Finally, the self-healing and corrosion protective properties of the optimal coating were verified by EIS and salt spray testing.

Fig. 1: A machine learning workflow for performance optimization in self-healing EP composite coating.
figure 1

The machine learning workflow involves four steps: a data acquisition, b active learning, c Bayesian optimization, and d experimental verification.

Results and Discussion

Experimental results from the initial dataset

As seen in Table 1, four parameters with four initial levels each were set (total experimental conditions = 4^4 = 256 sets). The four parameters were the molecular weight of polyetheramine, the molar ratio of polyetheramine to EP, the molar content of UPy-D400 and the mass content of the ZIF-8@Ca microfillers. An initial 32 sets of experimental conditions were extracted from the 256 sets by the orthogonal Latin square design method35. This method, based on mathematical statistics and the orthogonality principle, aims to obtain results comparable to those of a comprehensive test with a minimum number of tests. It selects a subset of points that represents the whole experiment according to orthogonality, and the selected points are uniformly distributed over the whole space36,37. The coatings were then prepared according to these 32 conditions for EIS measurements, and the corresponding low-frequency impedance modulus (lg|Z|0.01Hz value) of each scratched coating was obtained. The lg|Z|0.01Hz value, rather than the |Z|0.01Hz value, was selected as the output to eliminate the undesirable effects caused by a dataset with high variability.
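The size of the design space and the effect of the logarithmic transform can be checked with a short Python sketch. It is an illustration only, not the code used in this study: the four molecular weights are those listed in the Materials section, whereas the other three level lists and all variable names are hypothetical placeholders, since the Table 1 values are not reproduced here.

```python
import math
from itertools import product

# Molecular weights come from the Materials section; the remaining three
# level lists are hypothetical placeholders for the Table 1 values.
levels = {
    "polyetheramine_mw": [230, 400, 2000, 4000],     # g/mol
    "amine_to_ep_ratio": [0.25, 0.50, 0.75, 1.00],   # hypothetical
    "upy_d400_mol_pct": [0, 10, 20, 30],             # hypothetical
    "zif8_ca_wt_pct": [0.0, 2.5, 5.0, 10.0],         # hypothetical
}

# Full factorial design: every combination of the four factors.
full_grid = list(product(*levels.values()))
n_total = len(full_grid)        # 4**4 combinations
n_initial = 32                  # size of the initial orthogonal subset
n_pool = n_total - n_initial    # conditions left for later prediction


def lg(z_ohm_cm2):
    """Common logarithm of an impedance modulus given in ohm*cm^2."""
    return math.log10(z_ohm_cm2)


# The measured range 4.75-10.87 in lg units spans about six orders of
# magnitude in raw |Z|, which is why the lg value is used as the target.
raw_span = 10 ** (10.87 - 4.75)

print(n_total, n_pool, f"{raw_span:.2e}")
```

Working in lg units keeps samples with very large impedance from dominating squared-error metrics during model fitting.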

Table 1 Summary of variable parameters for coating formulation used at the active learning stage.

The measured lg|Z|0.01Hz values of the scratched coatings that comprise our initial dataset are reported in Table 2, and Fig. 2 shows their distribution. As shown in Fig. 2, the average lg|Z|0.01Hz values were widely distributed in the range of 4.75–10.87 (lg(Ω·cm2)). According to a previous experimental study11, this distribution covers scratched coatings with different self-healing abilities, indicating that the selection of the initial preparation conditions using the orthogonal Latin square method is reasonable.

Table 2 Experimental lg|Z|0.01Hz values of scratched coatings prepared under various conditions (initial dataset of 32 samples). The lg|Z|0.01Hz values represent the average ± standard deviation.
Fig. 2: Distribution of lg|Z|0.01Hz experimental values from the initial dataset of 32 samples.
figure 2

This task aims to confirm the distribution of target property values under initial experimental conditions.

Assessment and selection of an lg|Z|0.01Hz values prediction model

In the next step, the experimental conditions and the corresponding lg|Z|0.01Hz values of the scratched coatings were used as the input and output of the machine learning process, respectively, and five common machine learning models were trained on the initial dataset of 32 samples. A comparison of the predicted and measured lg|Z|0.01Hz values for each model is shown in Fig. 3a–e, where the black dashed line indicates equal measured and predicted values. A comparison of the accuracy of each model is shown in Fig. 3f. Compared with the other models, the RF model yielded the best accuracy, with a higher coefficient of determination (R2) and lower mean absolute percentage error (MAPE) and root mean square error (RMSE). This may be because the RF model, as an ensemble of decision trees, handles data with high variability well38,39. Hence, the RF model was chosen to predict the lg|Z|0.01Hz values in subsequent steps.

Fig. 3: The selection of the best machine learning model.
figure 3

Distribution of predicted versus measured lg|Z|0.01Hz values from successive test sets used in the 10-fold cross-validation using different machine learning models, a–e correspond to artificial neural network (ANN), linear regression (LR), support vector regression (SVR), decision tree (DT) and random forest (RF) model, respectively. f A comparison of the accuracy for each model, including R2, MAPE, and RMSE values.
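A model comparison of this kind can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the data are synthetic stand-ins for the 32 measured samples, all hyperparameters are arbitrary choices, and the metrics are computed on pooled out-of-fold predictions rather than averaged per fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 32-sample dataset (NOT the measured data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(32, 4))
y = (6.0 + 4.0 * X[:, 1] + 2.0 * np.sin(3.0 * X[:, 2])
     + rng.normal(0.0, 0.3, size=32))

# Hyperparameters below are illustrative assumptions.
models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                        random_state=0),
    "LR": LinearRegression(),
    "SVR": SVR(),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # The scaler sits inside the pipeline so it is re-fitted on each
    # training fold and never sees the held-out samples.
    pipe = make_pipeline(StandardScaler(), model)
    y_hat = cross_val_predict(pipe, X, y, cv=cv)
    scores[name] = {
        "R2": r2_score(y, y_hat),
        "MAPE": mean_absolute_percentage_error(y, y_hat),
        "RMSE": float(np.sqrt(mean_squared_error(y, y_hat))),
    }

for name, s in scores.items():
    print(f"{name}: R2={s['R2']:.3f} MAPE={s['MAPE']:.3f} "
          f"RMSE={s['RMSE']:.3f}")
```

Which model ranks first on synthetic data says nothing about the ranking on the measured dataset; the sketch only shows the evaluation procedure.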

Active learning and machine learning model performance

For the active learning process, the RF model trained on the initial dataset of 32 samples first predicted the lg|Z|0.01Hz values of all remaining (256 – 32 = 224) experimental conditions. The predicted lg|Z|0.01Hz values were ranked in descending order, and the five top-ranked conditions were selected as proposals for subsequent laboratory measurements. These five measurements were added to the initial 32 data points, and the model was trained again on this enlarged (32 + 5) dataset. Feeding the new measurements back into the RF model enhances the prediction accuracy for high-performance samples in a targeted manner and improves the active learning efficiency. This process, from the prediction phase to the reuse phase, represents one cycle of active learning (see Table 3). The cycle is repeated until the preliminary goal of the best model accuracy is achieved. In this study, active learning was stopped when all the evaluation indices (MAPE, RMSE and R2) stopped improving.
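The cycle described above can be sketched as a greedy loop around a random forest. This is an illustrative sketch, not the code used in this study: `measure` is a hypothetical stand-in for a laboratory EIS measurement, the factor levels are coded as 0–3, the initial 32 conditions are drawn at random rather than from an orthogonal design, and the number of cycles is fixed in advance instead of using the accuracy-based stopping rule.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# 4 factors x 4 coded levels (0..3) = 256 candidate conditions.
grid = np.array(list(product(range(4), repeat=4)), dtype=float)


def measure(x):
    """Hypothetical stand-in for measuring lg|Z| at 0.01 Hz in the lab."""
    signal = 6.0 + 1.2 * x[1] + 0.6 * x[2] - 0.5 * (x[3] - 1.5) ** 2
    return float(signal + rng.normal(0.0, 0.2))


# Initial dataset: 32 conditions (random here; orthogonal in the study).
labeled = [int(i) for i in rng.choice(len(grid), size=32, replace=False)]
y = {i: measure(grid[i]) for i in labeled}

n_cycles, batch_size = 5, 5   # illustrative settings
for cycle in range(n_cycles):
    # 1) Train on everything measured so far.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(grid[labeled], [y[i] for i in labeled])

    # 2) Predict all conditions that have not been measured yet.
    pool = [i for i in range(len(grid)) if i not in y]
    pred = model.predict(grid[pool])

    # 3) Propose the five conditions with the highest predicted value.
    top = [pool[j] for j in np.argsort(pred)[::-1][:batch_size]]

    # 4) "Measure" them and add the results to the training set.
    for i in top:
        y[i] = measure(grid[i])
    labeled.extend(top)
    print(f"cycle {cycle + 1}: dataset size = {len(labeled)}")
```

Because the proposals are chosen purely by predicted value, the loop concentrates new measurements in the high-performance region, which is the targeted improvement the text refers to.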

Table 3 Experimental results of predicted and measured lg|Z|0.01Hz values of scratched coatings prepared under various proposed conditions.

Figure 4a–g presents scatter plots of the predicted versus measured lg|Z|0.01Hz values from the initial dataset to the last cycle. The blue and red dots indicate existing and new measurements, respectively. The evolution of the corresponding R2, MAPE and RMSE values for each cycle is summarized in Fig. 4h, i. As shown in Fig. 4a–g, the points gradually approached the black dashed line from the initial dataset to the last cycle, indicating that an increase in the dataset size brought the predicted lg|Z|0.01Hz values closer to the measured ones. As the dataset size increased, R2 clearly increased, and the MAPE and RMSE decreased gradually. After five active learning cycles, the R2, MAPE and RMSE values reached a plateau, and the active learning process was terminated. For the dataset of 62 samples, the RF model achieved R2, MAPE and RMSE values of 0.709, 0.081 and 0.685 (lg(Ω·cm2)), respectively. Compared with the accuracy on the initial dataset, improvements of 246%, 51% and 47% were achieved for R2, MAPE and RMSE, respectively. In this case, R2 was greater than 0.7 and both MAPE and RMSE had stabilized at a low level, indicating that the RF model had reached acceptable accuracy. Therefore, the RF model was fixed based on the existing dataset.

Fig. 4: Active learning process.
figure 4

a–g Correlation scatter plots of predicted and measured lg|Z|0.01Hz values using different datasets, including initial dataset and cycle 1-6 datasets. h, i Comparison of the accuracy (R2, RMSE and MAPE value) of the RF model for different datasets.

In addition, Table 3 lists the top-five proposed experiments for the five cycles of active learning with the corresponding predicted and measured lg|Z|0.01Hz values. Several measured lg|Z|0.01Hz values in Table 3 were greater than 11.00 (lg(Ω·cm2)), exceeding the highest value in the initial dataset, which shows that the RF model allowed us to identify experimental conditions for coatings with a potentially high self-healing ability. These additional data on high-performance self-healing coatings are beneficial for further maximization using Bayesian optimization. Moreover, the proposed experiments involved polyetheramines with molecular weights of 400 and 2000 g·mol–1, an r value (the molar ratio of polyetheramine to EP) greater than 0.85, 10–20 mol% of UPy-D400 and ZIF-8@Ca microfiller contents across the full range. This provided the main guidance for refining the test conditions in the subsequent step.

Bayesian optimization for screening optimal candidate

In this step, three experimental conditions were refined: the r value, the molar content of UPy-D400 and the microfiller content were varied from 0.85 to 1.00, 10 to 20 mol% and 5.5 to 10.0 wt.%, in increments of 0.1, 1 mol% and 0.1 wt.%, respectively. The molecular weights of the polyetheramine curing agents were fixed at 400 and 2000 g·mol–1. This search space for the coating formulation is vast, and a machine learning model has limited utility if it does not incorporate uncertainty and an expected improvement criterion. Because a machine learning model is built from a limited amount of training data, selecting candidates with that model alone may be limited to a local search. We therefore expected Bayesian optimization to give better results, because this technique considers the uncertainty of the prediction and balances local and global search40.

Bayesian optimization works on a surrogate model and evaluates a utility function41. The utility function uses the mean and standard deviation of the candidates estimated by the surrogate model. The utility function encodes a trade-off between the exploitation (candidate searching at points with high mean) and exploration (candidate searching at points with high uncertainty). Herein, we have used RF as the surrogate model and expected improvement (EI) as a utility function. The EI is defined as the following Eqs. (1)-(2)42:

$$\mathrm{EI}(x)=\sigma(x)\,[z\,\Phi(z)+\phi(z)] \tag{1}$$
$$z=\frac{\mu(x)-f(x^{+})-\varepsilon}{\sigma(x)} \tag{2}$$

where EI(x) represents the expected improvement value for each coating formulation candidate x; μ(x) and σ(x) are the predicted mean and standard deviation for the candidate obtained from the surrogate model; and f(x+) is the maximum value of the target property observed in the training dataset. Φ is the cumulative distribution function and ϕ is the probability density function of the standard normal distribution, under the assumption that the target property values are normally distributed. The term ε regulates the amount of exploration: the higher the value of ε, the greater the exploration. In this method, the largest EI value represents the most promising coating formulation candidate. Here, we used 1000 iterations for the Bayesian optimization run, as this was sufficient to predict the optimal experimental conditions with high accuracy (see the Data Availability section for where to access this code), and a series of experiments were conducted starting from rank 1 (Table 4). A new highest lg|Z|0.01Hz value of 11.58 ± 0.28 (lg(Ω·cm2)) was observed, corresponding to (4.40 ± 2.04) × 10^11 Ω·cm2. This impedance modulus is considerably higher than those reported in previous studies on EP-based self-healing coatings11,43,44,45,46, which reported typical lg|Z|0.01Hz values in the range of 7.48–10.68 (lg(Ω·cm2)). The conditions suggested by Bayesian optimization showed that a relatively low molecular weight of polyetheramine and a high molar ratio of polyetheramine to EP were promising for achieving a high lg|Z|0.01Hz value, whereas the molar content of UPy-D400 and the microfiller content should be in the middle of their defined ranges. According to previous studies47,48, excess amine improves the shape recovery rate of EP materials. The intrinsic self-repair process in this study is realized by the self-association of the self-healing units (hydrogen bonds), on the premise that the damage can be physically closed.
A high shape recovery rate is beneficial for the physical closure of scratched material surfaces11. However, excess amine (an excessive r value) leads to higher flexibility but lower mechanical strength of EP materials47; an optimum combination of high strength and good flexibility can therefore be achieved by precisely adjusting the r value through Bayesian optimization. The introduction of self-healing units and microfillers may also affect the various performance indicators of the coatings, so their amounts must be balanced simultaneously to achieve a reasonable design for the target property.
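The expected improvement of Eqs. (1)–(2) can be evaluated with the Python standard library alone. The sketch below is an illustration under stated assumptions, not the code used in this study: the candidate means and standard deviations are made-up numbers that would in practice come from the surrogate model, the best observed value of 11.0 is a placeholder, and the treatment of σ = 0 as the limiting value max(μ − f(x+) − ε, 0) is a common convention rather than something stated in the text.

```python
import math


def normal_pdf(z):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best, eps=0.0):
    """EI(x) = sigma * [z * Phi(z) + phi(z)], z = (mu - f_best - eps) / sigma."""
    if sigma <= 0.0:
        # Limit of the formula as sigma -> 0 (assumed convention).
        return max(mu - f_best - eps, 0.0)
    z = (mu - f_best - eps) / sigma
    return sigma * (z * normal_cdf(z) + normal_pdf(z))


f_best = 11.0  # placeholder for the best lg|Z| observed so far

# Hypothetical candidates: one with a high mean and low uncertainty,
# one with a lower mean but high uncertainty.
ei_exploit = expected_improvement(mu=10.9, sigma=0.1, f_best=f_best)
ei_explore = expected_improvement(mu=10.7, sigma=0.6, f_best=f_best)

print(f"EI (high mean, low sigma):  {ei_exploit:.4f}")
print(f"EI (lower mean, high sigma): {ei_explore:.4f}")
```

With these numbers the uncertain candidate obtains the larger EI even though its predicted mean is lower, which illustrates the trade-off between exploitation and exploration encoded by the utility function.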

Table 4 Proposed preparations of a composite coating at Bayesian optimization stage with the related experimental lg|Z|0.01Hz values of scratched coatings.

Figure 5 shows the distribution of lg|Z|0.01Hz values of scratched coatings from the initial dataset, after the five active learning cycles, and after a Bayesian optimization process. The lg|Z|0.01Hz values from the initial dataset were spread randomly from 4.75 to 10.87 (lg(Ω·cm2)). By comparison, all samples that followed an active learning cycle exhibited a high lg|Z|0.01Hz value (>8.23 (lg(Ω·cm2))), and one sample from the Bayesian optimization dataset showed an exceptionally high lg|Z|0.01Hz value. These results demonstrate the potential of our machine learning framework for the design and optimization of high-performance functional materials based on small sample conditions.

Fig. 5: Comparison of the measured target performance for each machine learning stage.
figure 5

Distribution of measured lg|Z|0.01Hz values from the initial dataset (blue), after active learning process (dark blue) and after Bayesian optimization (red).

Interpretation of machine learning model for coating design

EIS measurements were conducted on the scratched pure commercial EP and ZIF-8@Ca/EP coatings and on their corresponding intact coatings to study the self-healing and corrosion resistance properties. The ZIF-8@Ca/EP coating was prepared according to the best formulation selected by Bayesian optimization. The pure EP coating was prepared by mixing E51 with the D400 polyetheramine curing agent at a molar ratio of 5:3. Nyquist and Bode plots of the intact coatings were obtained by EIS after 30 min of immersion in 3.5 wt.% NaCl solution (Fig. 6a–c). Figure 6d–i shows the Nyquist and Bode plots of the steels with scratched coatings after immersion for 1, 15, 30 and 60 d. For the pure EP sample, the intact coating initially showed a high barrier property, with a large capacitive arc in the Nyquist plot (Fig. 6a) and a high |Z|0.01Hz value (3.98 × 10^10 Ω·cm2) in the Bode plot (Fig. 6b). The phase angles at high frequencies (10^5 Hz) were close to –90°, which indicates the capacitive character of the coatings. Compared with the intact pure EP coating, the intact ZIF-8@Ca/EP coating exhibited a slightly larger capacitive arc in the Nyquist plot, and its |Z|0.01Hz value rose to 3.82 × 10^11 Ω·cm2, indicating a substantial improvement in the barrier property after the machine learning adjustment. The average and standard deviation of the |Z|0.01Hz value for the intact coating, calculated from six parallel samples, were (4.63 ± 2.08) × 10^11 Ω·cm2.

Fig. 6: EIS characterizations of the different intact/scratched coatings.
figure 6

a Nyquist plots and b, c Bode plots of the intact pure EP and intact ZIF-8@Ca/EP coatings after 30 min of immersion in 3.5 wt.% NaCl solution. Nyquist plots and Bode plots of different d–f scratched pure EP and g–i scratched ZIF-8@Ca/EP coating during immersion in 3.5 wt.% NaCl solution for 60 d.

For the scratched coatings, the capacitive arcs of the pure EP coating shrank and the |Z|0.01Hz values declined gradually over the entire immersion time, demonstrating the continuous deterioration of the barrier property (Fig. 6d, e). In the phase diagrams in Fig. 6f, the scratched pure EP showed two time constants: one related to the charge transfer process at the coating/substrate interface (10^−2–10^0 Hz), and the other related to the resistance increase caused by corrosion product formation in the artificial defect (10^1–10^5 Hz)49. In contrast to the pure EP coating, the Bode plots of the scratched ZIF-8@Ca/EP coating showed approximately –45° straight lines with |Z|0.01Hz values in excess of 3.80 × 10^11 Ω·cm2 at the beginning of immersion. The corresponding phase angles were –90° over the frequency range of 10^−1–10^5 Hz. This implies that no conductive pathway formed through the coating during immersion, so the coating largely exhibited a capacitive behavior similar to that of an intact coating50. During the 60 d of immersion, the |Z|0.01Hz value of the ZIF-8@Ca/EP coating decreased only slightly, from 3.80 × 10^11 Ω·cm2 to 1.23 × 10^11 Ω·cm2, confirming that the scratched ZIF-8@Ca/EP coating had been well repaired and possessed satisfactory corrosion resistance.

After scratching, the pure EP and ZIF-8@Ca/EP coatings were subjected to salt spray tests following the ASTM B117/D1654 standard. Figure 7a, b shows optical images of the coatings after exposure in the salt spray chamber for different periods. According to the visual assessment in Fig. 7a, green corrosion products were observed at the scratches of the pure EP coating within 1 d of the salt spray test. After 60 d, large-scale coating delamination and corrosion products appeared in the scratched region, indicating that the scratched location of the pure EP coating was highly vulnerable to attack by corrosive species. In contrast, only slight scratch traces were observed at the scratched positions of the ZIF-8@Ca/EP coating, which did not show any signs of degradation (delamination, corrosion or blistering) after 30 d (Fig. 7b). Furthermore, when the salt spray exposure time increased to 60 d, only one slight corrosion spot was observed at the scratched site, indicating that corrosion of the scratched ZIF-8@Ca/EP coating could be controlled in a salt spray environment for a long time.

Fig. 7: Salt spray analysis of the different intact/scratched coatings.
figure 7

a, b Optical images of the pure EP and ZIF-8@Ca/EP coating. c Optical images of the pure EP and ZIF-8@Ca/EP coating after pull-off test at the end of salt spray test. d The adhesion strength values of the pure EP and ZIF-8@Ca/EP coating before and after 60 d of salt spray exposure, the adhesion strength values represent the average ± standard deviations.

The adhesion strength, an important indicator of coating properties, was measured using a pull-off test. Figure 7d shows the adhesion strength and adhesion loss of the intact pure EP and ZIF-8@Ca/EP coatings before and after the 60 d salt spray test. Optical images of the remaining coatings following the pull-off test are presented in Fig. 7c; none of the samples exhibited cohesive failure. As shown in Fig. 7d, the dry adhesion strength of the ZIF-8@Ca/EP coating (9.82 MPa) is higher than that of pure EP (4.70 MPa). This is because the introduction of branched-chain amines and UPy units enhanced the hydrogen bonding between the coating and the metal surface51. After the salt spray test, the pure EP coating exhibited a considerable adhesion loss of 79.4% (to 0.97 MPa). In contrast, the ZIF-8@Ca/EP coating retained the highest wet adhesion strength (9.50 MPa) with minimal adhesion loss (3.3%) after 60 d of salt spray testing.
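The reported adhesion losses follow directly from the dry and wet adhesion strengths. The short check below assumes that adhesion loss is defined as the relative decrease with respect to the dry value; the function name is hypothetical.

```python
def adhesion_loss_pct(dry_mpa, wet_mpa):
    """Relative adhesion loss in percent, referred to the dry strength."""
    return 100.0 * (dry_mpa - wet_mpa) / dry_mpa


# Strengths in MPa before and after the 60 d salt spray test.
loss_pure_ep = adhesion_loss_pct(4.70, 0.97)
loss_composite = adhesion_loss_pct(9.82, 9.50)

print(f"pure EP:      {loss_pure_ep:.1f} %")
print(f"ZIF-8@Ca/EP:  {loss_composite:.1f} %")
```

Under this definition the two values round to the 79.4% and 3.3% quoted in the text.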

In summary, design-of-experiments techniques combined with active learning and Bayesian optimization were proposed to predict and optimize the lg|Z|0.01Hz values of scratched EP self-healing coatings composed of polyetheramine curing agents of different molecular weights, different molar ratios of polyetheramine to E51 EP resin, and different molar contents of UPy-D400 and mass contents of ZIF-8@Ca microfillers. The active learning process yielded the preferred experimental conditions to build a predictive RF model of lg|Z|0.01Hz values with satisfactory accuracy (R2 = 0.709, MAPE = 0.081, RMSE = 0.685 (lg(Ω·cm2))) after five cycles of active learning. Then, an extremely high lg|Z|0.01Hz value of 11.58 (|Z|0.01Hz = 3.80 × 10^11 Ω·cm2) was achieved using the experimental conditions refined by Bayesian optimization. As confirmed by EIS, the ZIF-8@Ca/EP coating exhibited an excellent recovery of the barrier property (intact sample: 3.82 × 10^11 Ω·cm2; repaired sample: 3.80 × 10^11 Ω·cm2). In addition, in terms of the corrosion resistance after repair, the ZIF-8@Ca/EP coating exhibited only slight corrosion after 60 d of the salt spray test, and the adhesion loss of the composite coating after the salt spray test was 3.3%, which was considerably lower than that of the pure EP coating (79.4%).

Methods

Materials

Polyetheramine curing agents with four different molecular weights (230, 400, 2000 and 4000 g·mol–1) were sourced from the Aladdin Industrial Corporation. The E51 EP resin was sourced from Jiangsu Heli Resin Co., Ltd. The ZIF-8@Ca microfillers and the UPy-D400 monomers were obtained using previously published methods11,51. Q235 mild steel was used as the substrate.

Preparation of coatings and EIS test

Based on the selected 32 experimental conditions, the preparation process of the self-healing EP coating containing ZIF-8@Ca microfillers (ZIF-8@Ca/EP) is shown in Fig. 8. In each case, the ZIF-8@Ca microfillers were first mixed with the E51 EP resin under magnetic stirring. The polyetheramine curing agent and UPy-D400 were then added to the mixture using a mechanical agitator at 500 rpm for 10 min. Prior to the coating preparation, the steel specimens were wet-polished sequentially with 150-, 240- and 400-grit sandpapers, washed with ethanol and blow-dried in an N2 atmosphere. The resulting mixture was applied to a steel piece using a bar coater. The coated samples were obtained by drying at room temperature for 48 h. The final thickness of each of the dry films was approximately 85 μm.

Fig. 8: Schematic illustration of the preparation process for self-healing EP composite coating.
figure 8

The coating formulation consists of the EP resin, polyetheramines, hydrogen bond unit (UPy-D400) and ZIF-8@Ca microfillers.

EIS tests were performed to measure the low-frequency impedance modulus (|Z|0.01Hz) of the coated steel with and without an artificial scratch. The EIS results were obtained in a 3.5 wt.% NaCl solution using a CHI-660E electrochemical workstation with a three-electrode cell comprising the coated steel substrate as the working electrode, a platinum plate as the counter electrode and a saturated calomel electrode (SCE) as the reference electrode. The frequency range was 10^−2–10^5 Hz with a root mean square amplitude of 0.02 V. Prior to the EIS measurements, reproducible artificial through-coating scratches (approximately 3 mm in length and approximately 60 µm in width) were made on the coated steels using a scalpel. The measurements were repeated at least five times to ensure the reproducibility of the EIS results. The |Z|0.01Hz value in the Bode plot is commonly used as the main performance index for the corrosion resistance of a coating; that is, a higher |Z|0.01Hz value reflects a higher barrier property52. Therefore, this index was used to characterize the recovery of the barrier properties of the coating after scratching.

To further verify the self-healing and long-term anticorrosion ability of the scratched composite coating obtained from the machine learning process, a salt spray test was performed by exposing the samples to salt spray for 60 d in accordance with ASTM D1654.

Data pre-processing, data splitting and machine learning models

Data pre-processing and data splitting were performed, and the different machine learning models were implemented, using the Python package scikit-learn (version 1.1.1). The four variable parameters (Table 1) were standardized53 to a mean of 0 and a variance of 1. The purpose of this scaling is to bring the input variables onto a common scale, thus eliminating the undesirable effects caused by a dataset with high variability. The validity and accuracy of all employed machine learning models were evaluated using k-fold cross-validation. In this step, the data were randomly shuffled and divided into 10 groups. Nine groups were used for training, and the remaining group was used to validate the model. This process was repeated 10 times, so that each group served once as the validation set, and the results were averaged. To quantify the performance of the model, the MAPE, RMSE and R2 were computed in the k-fold cross-validation using Eqs. (3)–(5)54,55,56:

$$\mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\frac{|y_{i}-\hat{y}_{i}|}{|y_{i}|} \tag{3}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}} \tag{4}$$
$$R^{2}=1-\frac{\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}}{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}} \tag{5}$$

where n is the number of samples, \({y}_{i}\) and \({\hat{y}}_{i}\) are the experimental and predicted values of the ith sample, respectively, and \(\bar{y}\) is the mean of the experimental values.
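For reference, Eqs. (3)–(5) translate directly into a few lines of plain Python. The sketch below is illustrative and not the code used in this study; the three sample values are made up solely to exercise the functions.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (3), returned as a fraction."""
    n = len(y_true)
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (4)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (5)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up lg|Z| values used only to demonstrate the three metrics.
measured = [8.0, 10.0, 12.0]
predicted = [8.4, 9.5, 12.0]

print(f"MAPE = {mape(measured, predicted):.4f}")
print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"R2   = {r2(measured, predicted):.4f}")
```

Note that R2 becomes negative whenever the model predicts worse than the mean of the measured values, so it is not bounded below by 0.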

The accuracy of the machine learning model was assessed using its MAPE (typically between 0 and 1; a value closer to 0 indicates greater accuracy57), RMSE (a lower value indicates greater accuracy30) and R2 (a value closer to 1 indicates greater accuracy; an R2 greater than 0.7 is taken to represent acceptable accuracy58).

Five machine learning models were applied as regression tools to the dataset: LR, ANN, SVR, DT and RF models. The machine learning methods are described in detail in the related reference59. The interested reader should refer to the Data Availability section for where to access our code used to run these algorithms.

Bayesian optimization

Bayesian optimization40 was used to determine the highest lg|Z|0.01Hz values by refining the variable conditions from Table 1. Bayesian optimization was performed using the Python package GPyOpt.