Accelerated discovery of high-performance Al-Si-Mg-Sc casting alloys by integrating active learning with high-throughput CALPHAD calculations

ABSTRACT Scandium is the best alloying element to improve the mechanical properties of industrial Al-Si-Mg casting alloys. Most reports in the literature are devoted to exploring or designing optimal Sc additions in different commercial Al-Si-Mg casting alloys with well-defined compositions. However, no attempt has been made to optimize the contents of Si, Mg, and Sc together, owing to the great challenge of simultaneous screening in a high-dimensional composition space with limited experimental data. In this paper, a novel alloy design strategy is proposed and successfully applied to accelerate the discovery of hypoeutectic Al-Si-Mg-Sc casting alloys over a high-dimensional composition space. Firstly, high-throughput CALculation of PHAse Diagrams (CALPHAD) solidification simulations of a large number of hypoeutectic Al-Si-Mg-Sc casting alloys over a wide composition range were performed to establish the quantitative relation 'composition-process-microstructure'. Secondly, the relation 'microstructure-mechanical properties' of hypoeutectic Al-Si-Mg-Sc casting alloys was acquired using the active learning technique supported by key experiments designed by CALPHAD and Bayesian optimization samplings. After a benchmark in A356-xSc alloys, the strategy was utilized to design high-performance hypoeutectic Al-xSi-yMg alloys with optimal Sc additions, which were later experimentally validated. Finally, the strategy was successfully extended to screen the optimal contents of Si, Mg, and Sc over the high-dimensional hypoeutectic Al-xSi-yMg-zSc composition space. It is anticipated that the proposed strategy, which integrates active learning with high-throughput CALPHAD simulations and key experiments, should be generally applicable to the efficient design of high-performance multi-component materials over high-dimensional composition spaces.


Introduction
With their high strength-to-weight ratio, excellent castability, and excellent thermal/electrical conductivities, Al-Si-Mg casting alloys are widely used in the automotive, aerospace, construction, and electrical industries [1]. Continuous improvement in the properties of Al-Si-Mg casting alloys is nevertheless required, because their as-cast microstructures, which typically consist of coarse anisotropic dendritic (Al) phase, needle-like eutectic (Si), and brittle intermetallics (e.g. β-Al5FeSi), greatly limit the mechanical properties of the castings [2]. The addition of alloying/microalloying elements to regulate the cast microstructure is one effective method for improving the comprehensive properties of casting alloys. In recent decades, rare earth elements (Sc [3–16], La [17], Ce [18], Er [19], Y [20], etc.) have been widely introduced to modify the eutectic (Si) from coarse needle-like to fine fibrous and to refine α-(Al), which significantly improves the mechanical properties and castability of Al-Si-Mg-based casting alloys. In particular, the rare earth element Sc might be the most effective modifier for eutectic (Si) [21], which could be due to the decreased surface tension of molten Al [22,23] or to the competitive nucleation and growth among the reactive components in ternary or higher-order eutectics [11,15,24]. Moreover, when the ratio w(Si)/w(Sc) is lower than 2.5 (e.g. the Al-2Si-1Sc alloy [25]), one can observe primary Al3Sc, which is coherent with (Al) at low lattice misfit [26], has a high resistance to coarsening [27], and thus refines the (Al) grains, as well as the AlSc2Si2 phase formed via the peritectic reaction (L + Al3Sc → AlSc2Si2) [28]. Primary AlSc2Si2 may form at 2.5 < w(Si)/w(Sc) < 4.5 [28] (see Supplementary Tables S1, S2, and Figure S1). When the ratio w(Si)/w(Sc) is larger than 4.5, one can see the formation of eutectic AlSc2Si2, which is partially coherent with primary/eutectic (Al) [15] and can enhance the mechanical properties owing to its large bulk modulus (B = 99.9 GPa) and shear modulus (G = 74.1 GPa) [29].
Accordingly, numerous studies have been devoted to optimizing the Sc addition in different casting aluminum alloys with well-defined compositions by means of the experimental 'trial-and-error' method. For example, Muhammad et al. [30] and Pramod et al. [31] studied the effect of different Sc additions (0.0, 0.2, 0.4 wt.% Sc) on the microstructure and mechanical properties of A357 and A356 casting alloys, respectively, and their experimental results showed that the mechanical properties improved with increasing Sc addition up to the maximum addition tested (0.4 wt.%). Pandee et al. [13] also studied the effect of different Sc additions (0, 0.24, 0.40, 0.65 wt.% Sc) on the microstructure and mechanical properties of the A356 (Al-7Si-0.3Mg) casting alloy, and the results showed that 0.4 wt.% Sc was the optimal addition for this alloy. By contrast, the results of Xu et al. [32] showed that the optimal Sc addition was 0.8 wt.% for the F357 (Al-7Si-0.65Mg) casting alloy. Kim et al. [22] also found that a 0.8 wt.% Sc addition was optimal for the Al-8.5Si alloy. In general, the optimal Sc addition required for different Al-Si-Mg casting alloys varies significantly with the Si and Mg contents. However, determining the optimal Sc addition for different Al-Si-Mg casting alloys by the experimental trial-and-error method is not only time- and labor-consuming but also imprecise. Moreover, it is almost impossible to accurately explore the optimal contents of Si, Mg, and Sc in hypoeutectic Al-xSi-yMg-zSc casting alloys over the high-dimensional composition space by trial-and-error experiments. Therefore, a more efficient design approach is urgently needed.
With the development of computer technology, various computation-assisted alloy design methods, such as first-principles (FP) calculations [33], molecular dynamics (MD) simulations [34], computational thermodynamics (CT) [11,15], computational kinetics [35,36], phase-field (PF) simulations [37], and machine learning (ML) approaches [38,39], have been widely used to accelerate the development of high-performance alloys. The CT method, which can construct the quantitative relationship between the composition and microstructure of alloys, has recently been used to efficiently design the optimal Sc contents in, e.g., A356 (0.54 wt.% Sc), A357 (0.50 wt.% Sc), A360 (0.76 wt.% Sc), and A380 (0.75 wt.% Sc) alloys, and these designs were also experimentally validated [11,15]. Furthermore, the combination of CT and ML techniques, which can establish the quantitative relation 'composition-process-microstructure-properties' of target alloys and thus accelerate the design of alloy composition, has been employed to efficiently design the optimal addition of Sr in A356 alloy [40]. Thus, such a combination of CT and ML techniques may stimulate efficient composition design in multicomponent alloys [41,42].
However, there are still big challenges when one directly applies such a combination of CT and ML techniques to optimize the contents of Si, Mg, and Sc in the present Al-xSi-yMg-zSc alloys. The first challenge is that a large dataset is needed to establish the quantitative relation 'composition-process-microstructure-properties' of Al-xSi-yMg-zSc alloys over the high-dimensional composition space. For casting Al-Si-Mg alloys, it is well known [41,43] that one may obtain a vast amount of data for a reliable 'composition-process-microstructure' relation by combining high-throughput CALculation of PHAse Diagrams (CALPHAD) simulations with key experiments. However, the experimental data in the literature are too limited to develop the quantitative relation 'microstructure-properties'. This is because most of the current data are devoted to the Sc-modified A356 alloys, while for other commercial alloys, e.g. A357 and A360, the experimental data are very scarce, not to mention alloy compositions that do not belong to the commercial alloys. The second challenge is how to achieve the multi-objective optimization, covering ultimate tensile strength (UTS), yield strength (YS), and elongation (EL), of the Al-Si-Mg-Sc casting alloys.
For the first challenge, the active learning technique can be applied on the basis of the abundant CALPHAD data on 'composition-process-microstructure' and the limited experimental data on mechanical properties. In each iteration, the next experimental points are designed by CALPHAD and Bayesian optimization samplings, validated experimentally, and fed back into the dataset, which reduces the uncertainty of the models and improves the prediction accuracy with the fewest iterations. Such a strategy should help to construct the required dataset efficiently and at minimum size. As for the second challenge, multi-objective optimization strategies, such as the sequential filter strategy [44], the transformation of multi-objective into single-objective optimization [45], and the Pareto front optimization method [46], can be utilized.
Consequently, in this paper, a novel alloy design strategy integrating active learning with high-throughput CALPHAD calculations and key experiments is proposed to accurately explore the optimal contents of Si, Mg, and Sc in hypoeutectic Al-xSi-yMg-zSc casting alloys over the high-dimensional composition space with multi-objective optimization. The quantitative relation 'composition-process-microstructure' of hypoeutectic Al-Si-Mg-Sc casting alloys is first established by high-throughput CALPHAD solidification simulations. Then, the reliable relation 'microstructure-properties' of hypoeutectic Al-Si-Mg-Sc casting alloys is acquired using the active learning technique supported by key experiments designed by CALPHAD and Bayesian optimization samplings. After that, the A356-xSc alloy system is chosen as the benchmark, and the present strategy is utilized to design high-performance hypoeutectic Al-xSi-yMg alloys with optimal Sc additions, which are then validated by experiments. Finally, the present strategy is extended to screen the optimal contents of Si, Mg, and Sc over the high-dimensional hypoeutectic Al-xSi-yMg-zSc composition space.

Design strategy
The schematic diagram of our discovery strategy for multi-component alloys is shown in Figure 1. Firstly, literature data were collected to construct the training dataset, which includes the compositions and mechanical properties. The corresponding solidified microstructures (type and fraction of solidified structures) were calculated by high-throughput Scheil-Gulliver solidification simulations based on the reliable thermodynamic database of the Al-Si-Mg-Sc system [11], from which the relation 'composition-process-microstructure' was established and used as the input features. Then, the features were sorted and analyzed to determine how the input features affect the output features (properties), i.e. UTS, YS, and EL. An artificial neural network (ANN) was used to construct the relation 'microstructure-properties' from low- to high-dimensional composition space. The uncertainties from the data noise, data source, and trained state of the model were considered when evaluating the predicted results. Moreover, the input microstructural feature space over a wide composition range of hypoeutectic Al-Si-Mg-Sc casting alloys was constructed by the high-throughput CALPHAD solidification simulations and used to predict the properties over the wide composition space. The initial experimental dataset was constructed by CALPHAD-assisted sampling. After considering the uncertainties of the machine learning predictions, Bayesian optimization was employed to perform a globally efficient design of the next experimental points with the best properties over the search space. After that, the designed Sc-modified Al-Si-Mg casting alloys were prepared and their mechanical properties were measured to validate the design results, and the measurements were fed back into the training dataset for the next iteration. In addition, stopping criteria were applied to avoid pointless over-exploration. For instance, the iteration should stop when the result meets the expectations or when the reduction in uncertainty is no longer noticeable [47]. Finally, the mechanical properties of hypoeutectic Al-Si-Mg-Sc casting alloys over the wide composition space were obtained efficiently, and the high-performance hypoeutectic Al-Si-Mg-Sc casting alloys were recommended.

High-throughput CALPHAD calculations
Up to now, the CALPHAD approach has been widely used in the design and development of different high-performance materials, but a CALPHAD computational tool on its own is generally inefficient in exploring the entire composition and temperature space of a multi-component system [48]. To meet the increasing demand for massive calculations in the field of high-performance aluminum alloys, a machine learning accelerated distributed task management system (Malac-Distmas) has been developed in our research group to realize high-throughput calculations (HTCs) and the storage of various data [43].
With the Malac-Distmas platform coupled to the Thermo-Calc software [49], we can perform HTCs of Scheil-Gulliver simulations for Sc-containing Al-Si-Mg casting alloys over the commercial hypoeutectic composition space, to obtain their solidified microstructures and construct the solidification diagram. During the calculations, the Si content was varied between 4.5 and 13 wt.% with ∆w(Si) = 0.5 wt.%, the Mg content between 0 and 0.7 wt.% with ∆w(Mg) = 0.05 wt.%, and the Sc content between 0 and 1 wt.% (when w(Si) ≤ 10 wt.%) or between 0 and 1.3 wt.% (when w(Si) > 10 wt.%) with ∆w(Sc) = 0.01 wt.%. The calculation range of the Sc content depends mainly on the Si content, since the optimal Sc content designed from the solidification diagram increases with increasing Si content. This means that Scheil-Gulliver simulations need to be calculated at 29,519 composition points, and it is almost impossible to perform so many simulations by manually submitting the simulation tasks. With one client program of Malac-Distmas, 101 Scheil-Gulliver simulation tasks can be completed within 20 min using the Thermo-Calc software. It took 4 days with Malac-Distmas and two Thermo-Calc clients to finish the present simulations. The simulation results were extracted and saved in an SQL-format database with a total size of 20 MB.
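As a rough illustration of how such a composition grid might be enumerated before the points are handed to a task manager, the sketch below builds the plain Cartesian grid from the ranges and step sizes stated above. The function name `composition_grid` and the use of integer hundredths of a wt.% are illustrative assumptions and are not part of Malac-Distmas or Thermo-Calc. The grid actually simulated in this work may exclude some points, so the size of this naive grid need not equal the number of simulations reported above.

```python
from itertools import product


def composition_grid():
    """Yield (Si, Mg, Sc) in wt.% on the naive Cartesian grid described in the text.

    Contents are generated as integer hundredths of a wt.% so that
    floating-point accumulation cannot drop or duplicate grid points.
    """
    si_values = range(450, 1300 + 1, 50)  # 4.5 ... 13 wt.% Si, step 0.5
    mg_values = range(0, 70 + 1, 5)       # 0 ... 0.7 wt.% Mg, step 0.05
    for si, mg in product(si_values, mg_values):
        # Sc runs up to 1.0 wt.% for Si <= 10 wt.%, otherwise up to 1.3 wt.%
        sc_max = 100 if si <= 1000 else 130
        for sc in range(0, sc_max + 1):   # step 0.01 wt.% Sc
            yield si / 100, mg / 100, sc / 100


grid = list(composition_grid())
```

Each tuple in `grid` could then be submitted as one Scheil-Gulliver task; how tasks are batched across clients is left open here.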

Machine learning technique
Machine learning modeling aims to establish a function between input and output and to make it as close as possible to the real functional relationship by optimizing the model parameters [50,51]. Owing to its low computational cost, short development cycle, powerful data processing, and high prediction performance, machine learning is being widely used in materials science, including the establishment of phase diagrams [52], property prediction [53], the discovery and design of high-performance materials [38,54], and the exploration of strengthening and toughening mechanisms [38,40]. The widely used machine learning algorithms include linear algorithms, decision tree-based (DT) algorithms, artificial neural networks (ANN), support vector machines (SVM), random forests (RF), and some Bayesian-based algorithms [55]. ANN is the most common approach in machine learning and is used in this work. The open-source machine learning platforms Scikit-learn [56] and PyTorch [57] were used in this work.

Active learning
When sample data are scarce because of experimental challenges or high costs, it is difficult to establish a machine learning model with high prediction accuracy and good generalization ability from an existing small dataset. Hence, the active learning method, which uses iterative feedback from designed experiments to improve the machine learning model predictions and reduce the number of required experiments, has attracted attention [50,58]. The results of Lookman et al. indicate that active learning is forgiving of poor model quality [47]. The active learning method includes the following steps: data collection, feature engineering, machine learning model construction, design of the next experimental points, experimental testing, and data feedback. The key point of active learning is to design the next experimental points according to the predicted value and prediction uncertainty of the machine learning model, and then to iteratively improve the model with the feedback of experimental results until the model prediction meets the requirements [59].
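The iteration described above can be sketched as a generic loop. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function `active_learning_loop` and its arguments `fit`, `acquire`, and `measure` are hypothetical placeholders for model training, the utility function, and the real experiment, and only the "result meets the expectations" stopping criterion is shown.

```python
def active_learning_loop(dataset, candidates, fit, acquire, measure,
                         target=None, max_iterations=10):
    """Iterate model fitting, point selection, measurement, and data feedback.

    dataset    : non-empty list of (x, y) pairs that have already been measured
    candidates : list of untested x values
    fit        : callable(dataset) -> model, where model(x) -> (mean, std)
    acquire    : callable(mean, std, best_y) -> score, higher is better
    measure    : callable(x) -> y, standing in for a real experiment
    """
    dataset, candidates = list(dataset), list(candidates)
    for _ in range(max_iterations):
        if not candidates:
            break
        best_y = max(y for _, y in dataset)
        if target is not None and best_y >= target:
            break  # stopping criterion: the result meets the expectations
        model = fit(dataset)  # retrain on all data collected so far
        # design the next experimental point from predicted mean and uncertainty
        x_next = max(candidates, key=lambda x: acquire(*model(x), best_y))
        candidates.remove(x_next)
        dataset.append((x_next, measure(x_next)))  # data feedback
    return dataset
```

A second stopping criterion, such as a negligible reduction in prediction uncertainty between iterations, could be added in the same place as the `target` check.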
In materials design, the common sampling strategies include manual empirical sampling, CALPHAD-assisted sampling, and Bayesian optimization sampling. Bayesian optimization (BO), which is an efficient global optimization method based on adaptive sampling, is one of the most common methods for designing experimental points in materials exploration [47,60,61]. The Bayesian optimization method evaluates a utility (or acquisition) function based on the model's predicted mean value and prediction uncertainty to design the next experimental points. The utility function allows a balance between exploitation (sampling where the objective mean is high) and exploration (sampling where the uncertainty is high). Among the available utility functions (e.g. Probability of Improvement (PI), Expected Improvement (EI), and Upper Confidence Bound (UCB)), the EI, which considers not only the probability of improvement but also the expected magnitude of improvement, is widely used as the utility function [62]. The EI [63] is defined as

$$\mathrm{EI}(x) = \begin{cases} \left(\mu(x) - f(x^{+}) - \xi\right)\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^{+}) - \xi}{\sigma(x)} \tag{1}$$

where μ(x) and σ(x) are the mean and the standard deviation of the objective function f(x), respectively, f(x+) is the value of the best sample so far, and x+ is the location of that sample. Φ(·) and ϕ(·) are the cumulative distribution function (CDF) and the probability density function (PDF) of the standard normal distribution. The parameter ξ controls the trade-off between global search and local optimization: it determines the amount of exploration during optimization, and higher ξ values lead to more exploration. A recommended default value for ξ is 0.01 [63]. Thus, a new sample point, x*, is chosen among the candidate points as the one with the largest EI, i.e. x* = argmax_x EI(x).
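For illustration, the Expected Improvement in its commonly used closed form for a maximization problem, EI = (μ − f(x+) − ξ)Φ(Z) + σϕ(Z) with Z = (μ − f(x+) − ξ)/σ, can be evaluated with the standard library alone. The sketch below assumes this standard form; the function names `expected_improvement` and `next_point` and the candidate dictionary are hypothetical and are not taken from the code used in this work.

```python
import math


def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Expected Improvement for maximization from a predicted mean and std."""
    if sigma <= 0.0:
        return 0.0  # no predictive uncertainty, hence no expected improvement
    improvement = mu - f_best - xi
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf


def next_point(candidates, f_best, xi=0.01):
    """Return the candidate with the largest EI.

    `candidates` maps a label (e.g. a Sc content in wt.%) to (mu, sigma).
    """
    return max(candidates,
               key=lambda x: expected_improvement(*candidates[x], f_best, xi))
```

A candidate whose predicted mean lies below the current best can still be selected if its predicted standard deviation is large, which is the exploration behaviour described above.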
For quantifying uncertainties, three types of uncertainty should be considered: i) the experimental uncertainty, which corresponds to the observation noise within a single batch of experiments; ii) the data uncertainty or data noise, which arises from sparse data and from data of various sources, especially experimental data; and iii) the model uncertainty, which includes the model selection and the random state of a given model [64]. By identifying and quantifying the sources of uncertainty, a confidence interval for the model prediction can be drawn. This not only allows users to understand the reliability of a prediction but also facilitates the implementation of active learning [65,66].
Therefore, the experimental noise (error) of Sc-modified Al-Si-Mg casting alloys was described with a Gaussian distribution function in this work. Moreover, the mean and standard deviation of the tensile mechanical properties of Sc-modified Al-Si-Mg casting alloys with the same composition but from different sources were re-evaluated with weights that depend on the ratio between the respective number of experiments and the total number, as shown in Table 1. An ANN model with one hidden layer of 15 neurons was used in this work, and all hyperparameters except the initial training state were fixed to avoid the effect of other hyperparameters on the uncertainties. For each feature group, the ANN model was trained 1000 times with the initial training state running from 0 to 999, with and without the experimental data noise (error), resulting in 1000 ANN models for evaluating the uncertainty. The 'leave-one-out' method was used to validate the models. The ensemble of these 1000 models then gave predictions with mean μ and standard deviation σ. Materials with the highest EI were selected for preparation.
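The weighting of data from different sources can be illustrated with a short sketch. The weights W_i = n_i / N follow the description above; treating the combined spread as the standard deviation of the weighted mixture is an assumption made here for illustration, since the exact formula is not given in the text. The function name `pooled_mean_std` and the numerical values are hypothetical placeholders, not measured data.

```python
import math


def pooled_mean_std(sources):
    """Combine per-source (mean, std, n_experiments) into one mean and std.

    Weights are W_i = n_i / N. The pooled standard deviation is taken as that
    of the weighted mixture, which is an assumption of this sketch.
    """
    total = sum(n for _, _, n in sources)
    weights = [n / total for _, _, n in sources]
    mean = sum(w * m for w, (m, _, _) in zip(weights, sources))
    var = sum(w * (s ** 2 + (m - mean) ** 2)
              for w, (m, s, _) in zip(weights, sources))
    return mean, math.sqrt(var), weights


# Hypothetical UTS values (MPa) for one composition reported by three sources
# with 6, 1, and 1 experiments; the numbers are placeholders only.
mean, std, weights = pooled_mean_std(
    [(250.0, 4.0, 6), (242.0, 0.0, 1), (258.0, 0.0, 1)]
)
```

With 6, 1, and 1 experiments out of 8, the weights are 0.75, 0.125, and 0.125, so the source with the most experiments dominates the re-evaluated mean.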

Experimental procedure
To avoid the effect of minor element contamination, the Al-Si-Mg master alloys were prepared from high-purity elements (purity up to 99.99 wt.%) purchased from Alfa Aesar (China) Chemicals Co., Ltd. Since magnesium evaporates easily during melting, an extra 5 wt.% of magnesium was added to each sample. Each alloy sample was melted in a graphite crucible using an induction furnace under an argon atmosphere. After homogenizing for 5 min at 720°C, the Al-2Sc was added at 750°C to obtain the required chemical composition, as listed in Table 1. Each sample was then cast into a cylindrical graphite mold, 20 mm in diameter and 150 mm in height, that had been preheated to 100°C. A universal testing machine (Instron 3369, USA) with a loading speed of 1 mm/min was used to measure the room-temperature tensile properties of the alloys. For each alloy, four specimens were tested, and the mean values were taken.

High-throughput Scheil solidification simulations
Based on the solidification diagram constructed from a huge number of Scheil solidification simulations, Lu et al. [11] from our research group designed the optimal Sc content (i.e. 0.54 wt.%) in A356 with the criterion that the binary eutectic structure was completely replaced by the ternary eutectic structure, and their experimental results also showed that the A356-0.54Sc alloy had the best mechanical properties over the composition range of A356-xSc. In addition, Lu et al. [15] designed the optimal Sc contents in A357 (0.50 wt.% Sc), A360 (0.76 wt.% Sc), and A380 (0.75 wt.% Sc) alloys based on the solidification diagram, and their experiments nicely validated their predictions. Therefore, the CALPHAD-assisted alloy design method can be used to establish the relation 'composition-process-microstructure' for efficiently exploring high-performance Sc-modified hypoeutectic Al-Si-Mg casting alloys. Figure 2 displays partial results of the high-throughput Scheil-Gulliver solidification simulations of Sc-modified commercial hypoeutectic Al-Si-Mg casting alloys, for which the concentration ranges of Si and Mg are [4.5, 13] wt.% and [0, 0.7] wt.%, respectively. The composition ranges of commercial hypoeutectic Al-Si-Mg casting alloys [67], i.e. A356, A357, 359, A360, 365, A380, A413, 443, A444, and so on, are plotted on the horizontal projection plane. Each vertical line represents a Sc-modified Al-Si-Mg alloy series, such as the A360-xSc alloys shown on the left side of Figure 2, and the corresponding solidification diagram and phase fraction diagram can be constructed to design the optimal Sc content (0.79 wt.%) in A360 alloy with the criterion that the binary eutectic structure is completely replaced by the ternary eutectic structure.

Notes to Table 1:
* The Sc content is optimal for the hypoeutectic Al-xSi-yMg casting alloy based on the results of high-throughput CALPHAD solidification simulations.
** The weight Wi is used to re-calculate the mean and standard deviation of the mechanical properties for alloys with the same composition. The weights depend on the ratio between the respective number of experiments and the total number. For example, for the A356-0.54Sc alloy, the weight of the data from Ref. [11] is 6/8 (0.75), while it is 1/8 (0.125) each for the data from Ref. [16] and this work.
Besides the optimal Sc content plane of the commercial hypoeutectic Al-Si-Mg casting alloys, the corresponding solidification diagram and phase fraction diagram of the alloys in this plane are shown on the right side of Figure 2. As can be seen from the solidification diagram of the Al-xSi-yMg-optimal Sc alloys, when the Mg content is above 0 wt.%, the alloys with optimal Sc contents have the solidification sequence required by the pre-set criterion, i.e. liquid→(Al), liquid→(Al)+(Si)+AlSc2Si2, and liquid→(Al)+(Si)+AlSc2Si2+Mg2Si. When the Mg content equals 0 wt.%, the alloys with optimal Sc content have the solidification sequence liquid→(Al), liquid→(Al)+(Si)+AlSc2Si2. With increasing Si content, the melting point of the alloys decreases, as shown in the solidification diagram of the Al-Si-Mg-optimal Sc alloys. The temperature of the quaternary eutectic reaction (liquid→(Al)+(Si)+AlSc2Si2+Mg2Si) remains constant (558.22°C) in the Sc-containing Al-Si-Mg casting alloys, and the corresponding temperature surface is thus a plane. Moreover, with increasing Si content, the fraction of primary (Al) decreases and the fraction of ternary eutectic structure increases.

Benchmark in Sc-modified A356 alloys
The relationship between the alloy features and experimental properties can be constructed by machine learning and can be used to design alloys with better properties. Moreover, in machine learning the input determines the output, and good features might be more important than the optimization of model hyper-parameters [68–70]. Selecting appropriate features can not only reduce overfitting, improve the signal-to-noise ratio, avoid the curse of dimensionality, and improve the generalization ability of predictions, but also make the model more interpretable.

Figure 2. [...] High-throughput Scheil-Gulliver solidification simulations were performed. Each vertical line represents a Sc-modified Al-Si-Mg alloy series, e.g. A360-xSc alloys, and the corresponding solidification diagram and phase fraction diagram can be constructed to design the optimal Sc content (i.e. 0.79 wt.% for A360 alloy) based on the criterion that the binary eutectic structure is completely replaced by the ternary eutectic structure. Moreover, the optimal Sc content for each hypoeutectic Al-Si-Mg casting alloy can be designed with a similar criterion and plotted as a plane. The corresponding solidification diagram and phase fraction diagram of this plane are shown on the right side.
In most studies on machine learning-assisted alloy design, the common input features are the contents of the different components [53,71]. However, with the high-throughput CALPHAD solidification simulations, the solidified microstructure of Sc-modified commercial hypoeutectic Al-Si-Mg casting alloys can be obtained and used as features. Accordingly, the effect of the different features on the machine learning results for the Sc-modified A356 casting alloys is investigated here. Furthermore, the uncertainties from the data noise and the trained state of the machine learning model are quantified.
Based on the simulated microstructure (as shown in Figure 3(a)) and the experimental results of A356-xSc casting alloys listed in Table 1, excluding the composition point with the best properties (0.54 wt.% Sc) [11], two feature groups of A356-xSc alloys were set: i) the traditional feature group A, in which the input feature is the Sc content and the output features are the properties (UTS, YS, EL); and ii) the microstructural feature group B, in which the input features are the calculated fractions of the primary (Al) phase, total eutectic (Al)+(Si), AlSc2Si2 phase, and Mg2Si phase, as validated by experiments [11], and the output features are the same as in feature group A. For the microstructural features (as shown in Figure 3(a)), the fraction of the primary (Al) phase first increases with increasing Sc content and then decreases when w(Sc) exceeds 0.54 wt.%. The eutectic (Al)+(Si) fraction changes in the opposite way to that of the primary (Al). The fraction of the AlSc2Si2 phase increases as the Sc content increases, whereas the fraction of Mg2Si remains the same. In addition, the Pearson correlation coefficient was used to analyze the linear relations among these features, as shown in Figure 3(b). The results show that the fraction of the eutectic (Al)+(Si) structure has a strongly negative linear correlation with the fraction of primary (Al) in the microstructural feature group. In principle, one of two such strongly linearly correlated features can be removed to reduce the feature dimension for better model performance. However, we retain all the microstructural features for two reasons: i) the comparison between the simulated phase fractions and the experimental data is facilitated; and ii) the effect of each feature on the mechanical properties of the alloys can be analyzed, and the strengthening/toughening mechanisms can then be discussed.
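The Pearson correlation coefficient used for this feature analysis can be computed as in the following minimal sketch. The function name `pearson_r` and the phase-fraction values are hypothetical and chosen only to illustrate a strongly negative correlation; they are not the simulated fractions of Figure 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if min(x) == max(x) or min(y) == max(y):
        return float("nan")  # a constant feature has no defined correlation
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical phase fractions for four alloys (placeholders only)
primary_al = [0.50, 0.52, 0.55, 0.53]
eutectic = [0.49, 0.47, 0.44, 0.46]
r = pearson_r(primary_al, eutectic)
```

Note that a feature that does not vary across the dataset, as described above for the Mg2Si fraction in A356-xSc, has zero variance, so its Pearson coefficient with any other feature is undefined; the sketch returns NaN in that case.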
Figure 4 shows, in green, the results of the 1000 ANN models for the two feature groups for A356-xSc, excluding the best point (0.54 wt.% Sc). The error ranges of the predicted results are plotted as two standard deviations around the mean of the 1000 ANN model results, corresponding to a confidence interval of 95%. When the noise of the experimental mechanical properties was not considered, the uncertainty of the results came only from the initial training state of the machine learning model, as shown in Figure 4(a,e). The results show that the trained models describe the existing training data well, and that the uncertainty from the initial training state of the machine learning model is small. However, the results from feature group A suggest that it is impossible to predict the optimal experimental elongation, while the results from feature group B show that there is a high probability of obtaining the optimum properties at 0.54 wt.% Sc. Moreover, in order to perform an efficient global design of the next experimental points based on the trained results, the multiple objectives (UTS, YS, EL) are transformed into a single objective representing the comprehensive mechanical property by using the quality index Q = UTS + YS·log10(EL). This index is slightly modified from Q_DJR = UTS + c·log10(EL) [72] by replacing the constant c (approximately the YS value of A356 alloy) with the composition-dependent YS values over the wide composition range in this work. The EI can then be calculated using Equation (1) with the mean and standard deviation of Q, and used to determine the next point with maximum EI. The results for the comprehensive mechanical property of the two feature groups without data noise are presented in Figure 4(c1,g1). As indicated in the figures, there is a high probability of obtaining the optimal experimental comprehensive mechanical property when feature group B is used. The corresponding EI values are shown in Figure 4(c2,g2), and the next points designed with maximum EI values from the models with feature group A and feature group B are 0.41 and 0.54 wt.% Sc, respectively. The results show that the models with feature group B (the microstructural features) design the optimal experimental point more efficiently and accurately than those with feature group A (the composition features).
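The quality index and the two-standard-deviation band can be illustrated with a small sketch. It assumes that Q is evaluated for each model of the ensemble and then averaged, and that the sample standard deviation is used; neither detail is specified in the text. The function names `quality_index` and `ensemble_q` are hypothetical.

```python
import math
from statistics import mean, stdev


def quality_index(uts, ys, el):
    """Q = UTS + YS * log10(EL), with UTS and YS in MPa and EL in %."""
    if el <= 0:
        raise ValueError("elongation must be positive")
    return uts + ys * math.log10(el)


def ensemble_q(predictions):
    """Mean, standard deviation, and 2-sigma band of Q over an ensemble.

    `predictions` is a list of (UTS, YS, EL) tuples, one per trained model.
    """
    q = [quality_index(*p) for p in predictions]
    mu, sigma = mean(q), stdev(q)  # sample standard deviation (assumption)
    return mu, sigma, (mu - 2 * sigma, mu + 2 * sigma)
```

The resulting mean and standard deviation of Q are the quantities that enter the Expected Improvement when the next experimental point is selected.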
Furthermore, the data noises with Gaussian distribution function for experimental mechanical properties were considered in 1000 ANN models with different initial training states. The training results of two feature groups with data noise are presented in Figure 4(b,f). It can be seen that the errors of the experimental data are within the confidence interval of prediction. For one feature group, the results without and with data noise suggest that the uncertainty from data noise is larger than that from the initial training state models. However, in terms of predicting the optimal experimental points, the results are the same as those without considering data noise. The corresponding quality index Q and EI were calculated based on the predicted results, as shown in Figure 4 (d1-d2,h1-h2). As can be seen from Figure 4(d1,d2), although the Q results of the models with feature group A and considering data noise can cover a part of experimental optimal Q, the designed next point with maximum EI is 0.99 wt.% Sc, which is already far from the experimentally optimal point. The models with feature group B and considering data noise not only describe the Q results of training experimental data well but also predict and design the experimental optimal point, which can be validated by the results of EI, as shown in Figure 4(h1,h2). Compared with feature group A, feature group B is more suitable in ANN models for describing the experimental data, and the corresponding training results can accurately design the next experimental point, especially when the experimental error is taken into account. The main reason for the difference might be that there is only one monotonic variable (Sc content) as an input feature in feature group A, while the properties of the material are closely related to the microstructure used in feature group B. 
Therefore, feature group B (the microstructural features) will be selected in this work and the uncertainties are quantified from models and data noise.
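The way data noise widens the prediction band can be illustrated with a minimal Monte Carlo sketch. It is not the authors' ANN workflow: a closed-form straight-line fit stands in for each network, so only the data-noise contribution is reproduced (a line fit has no random initial state), and the four data points and their error bars are synthetic numbers chosen for illustration, not values from this work.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for one trained model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def ensemble_predict(xs, ys, y_errs, x_new, n_models=1000, seed=0):
    """Refit on Gaussian-perturbed targets; return mean and 2-sigma half-width."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # Each replica sees the measurements shifted by their own Gaussian noise.
        noisy = [rng.gauss(y, e) for y, e in zip(ys, y_errs)]
        a, b = fit_line(xs, noisy)
        preds.append(a + b * x_new)
    return statistics.fmean(preds), 2.0 * statistics.stdev(preds)

# SYNTHETIC example data (hypothetical): feature value, property, 1-sigma error.
xs = [0.0, 0.2, 0.4, 0.6]
ys = [300.0, 320.0, 340.0, 360.0]
y_errs = [5.0, 5.0, 5.0, 5.0]

mean, half_width = ensemble_predict(xs, ys, y_errs, x_new=0.5)
print(f"prediction at x = 0.5: {mean:.1f} +/- {half_width:.1f} (2 standard deviations)")
```

The half-width of two standard deviations over the 1000 replicas corresponds to the 95% confidence interval convention used for the figures in this work.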

Discovery of hypoeutectic Al-xSi-yMg-optimal Sc alloys
Combining CT and ML techniques, the quantitative relationship of 'composition-process-microstructure-property' of Sc-modified hypoeutectic Al-Si-Mg casting alloys can be established to efficiently design the optimal Sc content, as in the A356-xSc alloys. However, it is foreseeable that there is considerable uncertainty in predicting the mechanical properties of hypoeutectic Sc-modified Al-Si-Mg casting alloys using models based only on the A356-xSc data. Figure 6(a) shows the trained result of the comprehensive mechanical property index Q with the experimental data of A356-xSc. The result of the 1st iteration shows that there is a large uncertainty when predicting the properties of Al-xSi-yMg-optimal Sc. In order to accurately explore the optimal mechanical properties of commercial hypoeutectic Al-Si-Mg-Sc casting alloys, more data are needed to improve the ML models. The active learning method, which uses iterative feedback from designed experiments to improve the machine learning model predictions and reduce the number of required experiments, was utilized in this work.
Figure 4 (caption fragment). (a,c,e,g) for data without noise, (b,d,f,h) for data considering noise. The comprehensive mechanical property was calculated using the quality index Q = UTS + YS·log10(EL) in (c1,d1,g1,h1), and the EI value was evaluated to design the next point in (c2,d2,g2,h2). The error ranges of predicted results are plotted with 2 standard deviations of the mean value of 1000 ANN model results, corresponding to the confidence interval of 95%.
The key to the improvement of ML models is the sampling strategy. The common sampling strategies include manual empirical sampling, CALPHAD-assisted sampling, and Bayesian optimization sampling. For the construction of the initial training dataset, manual empirical sampling is usually considered. Fortunately, CALPHAD-assisted sampling (the CT method) is an efficient method to accurately design alloys. Based on the alloy design results for Sc-modified A356 casting alloys, the CT method can be used to efficiently design the optimal Sc addition for hypoeutectic Al-Si-Mg casting alloys with optimal mechanical properties. The optimal Sc addition over the composition space of hypoeutectic Al-Si-Mg casting alloys was determined from the high-throughput Scheil solidification simulations (as shown in Figure 2) and can be described by Eq. (S1) in the supplementary materials.
Hence, the search space for high-performance hypoeutectic Al-Si-Mg-Sc casting alloys was focused on the optimal Sc content surface, and the corresponding microstructural features used in the ANN models are shown in Figure 5. As can be seen from Figure 5, the fractions of the total eutectic (Al)+(Si) structure and the AlSc2Si2 phase increase markedly with increasing Si content, while the primary (Al) phase shows the opposite trend to the total eutectic (Al)+(Si) structure. The amount of Mg2Si is relatively small and its variation is mainly related to the Mg content. The previous 1000 ANN models considering the data noise were iteratively optimized with the feedback of new experimental points. The sampling process was conducted in the following two steps:
Step 1: CALPHAD-assisted sampling. In order to accurately explore the optimal mechanical properties of commercial hypoeutectic Al-Si-Mg-Sc casting alloys, more data were chosen for the commercial hypoeutectic Al-Si-Mg alloys with the optimal Sc content designed by the CT method. As shown in Table 1, the new experimental data for the 2nd iteration were taken from Lu's 2019 work (A357 and A357-0.53Sc, A360 and A360-0.79Sc, A380 and A380-0.77Sc), which lie on the Al-xSi-yMg-optimal Sc surface. For alloys with the same composition, the mean and standard deviation of the mechanical properties were calculated using weights based on the proportion of the number of samples in this system. In addition, new experimental data from this work, namely 355-0.35Sc, A413-1.14Sc, and 359-0.73Sc, were added in the 3rd iteration. Figure 6(a-c) presents the quality index Q for the experimental comprehensive mechanical property and the predicted results after 3 iterations for the optimal Sc-modified hypoeutectic Al-Si-Mg alloys (the solidified features and the training results of A357-xSc, A360-xSc, and A380-xSc can be seen in Supplementary Figures S2 and S3).
It should be noted that the gray surfaces represent the 95% confidence interval of the predictions. As shown in Figure 6(a-c), the predictions of ANN models describe the experimental results well. As new experimental data were fed back into the models for iterative optimization, the prediction of ANN models was improved and the uncertainty around the experimental data was reduced. The ML models describing the relationship between the microstructure and properties for the hypoeutectic Al-Si-Mg casting alloys with optimal Sc were preliminarily established. Furthermore, in this composition space of hypoeutectic Al-Si-Mg casting alloys with optimal Sc content, the predicted results show that the composition with the best comprehensive mechanical property index Q is located in the corner of the space with maximum Si and Mg content, accompanied by a large uncertainty.
However, there is no commercial hypoeutectic Al-Si-Mg series corresponding to the composition with maximum Q. It is difficult to further design the next points only based on the CALPHAD-assisted sampling strategy. Bayesian optimization sampling, which is an efficient global optimization method based on adaptive sampling, was then applied to design the new experimental points and further reduce the uncertainty of the models.
Step 2: Bayesian optimization sampling. Based on the iterative results of the previous step, the EI over the composition space of optimal Sc-modified hypoeutectic Al-Si-Mg alloys, which balances exploitation (sampling where the objective mean is high) and exploration (sampling where the uncertainty is high), was calculated using Equation (1) and is shown in Figure 6(f). As can be seen from Figure 6(f), after the 3rd iteration, the maximum EI is located in the corner of the space with maximum Si and Mg content (13 wt.% Si and 0.7 wt.% Mg with 1.11 wt.% optimal Sc), and the EI value is 206. Thus, since the next point is designed based on max(EI), the corner point (Al-13Si-0.7Mg-1.11Sc) with predicted Q = 552 ± 240 MPa was chosen as the next point, marked as A1. Then, the A1 alloy was prepared and its mechanical properties were measured, as listed in Table 1. The experimental results show that the UTS, YS, and EL of the A1 alloy were 253 ± 4 MPa, 152 ± 3 MPa, and 2.3 ± 0.2%, respectively, and the Q is 308 ± 10 MPa, which is at the lower boundary of the predicted range from the 3rd iteration. Afterward, the experimental properties of the A1 alloy were fed back to the dataset as the 4th iteration for optimizing the models, and the iterative results are shown in Figure 6(d). Compared with the results of the 3rd iteration in Figure 6(c), the prediction accuracy and uncertainty of the models after the 4th iteration were significantly improved over the entire composition space of interest. The EI after the 4th iteration was calculated and is shown in Figure 6(f). The results show that the EI of the models after the 4th iteration decreased significantly, and the maximum EI was 24.25 at Al-9.5Si-0.7Mg (optimal Sc 0.75 wt.%), which lies on the maximum-Mg boundary and on the boundary of the composition window of commercial 359 alloys (Si: 8.5-9.5 wt.%, Mg: 0.5-0.7 wt.%).
Therefore, the Al-9.5Si-0.7Mg-0.75Sc alloy, marked as A2, with predicted Q = 395 ± 40 MPa, was also prepared as the designed alloy of the 4th iteration and compared with the 359-0.73Sc alloy from the 3rd iteration to validate the models and optimize them through feedback.
The mechanical properties of the A2 alloy were measured, as listed in Table 1. The experimental results showed that the UTS, YS, and EL of the A2 alloy were 241 MPa, 142 MPa, and 5.0%, respectively, and the Q was 340 ± 7 MPa. The experimental Q of the A2 alloy falls below the lower boundary of the predicted Q from the 4th iteration. However, the measured mechanical properties of A2 are quite similar to those of the existing 359-0.73Sc alloy (UTS: 245 ± 4 MPa, YS: 140 ± 3 MPa, EL: 5.8 ± 0.3%, Q: 352 ± 9 MPa).
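As a quick arithmetic check, the quality index Q = UTS + YS·log10(EL) given in the Figure 4 caption can be evaluated directly from the mean tensile properties quoted above for A1, A2, and 359-0.73Sc. The short sketch below does only that; the function and variable names are illustrative, and the quoted error bars are not propagated.

```python
import math

def quality_index(uts_mpa, ys_mpa, el_percent):
    """Q = UTS + YS * log10(EL); UTS and YS in MPa, EL in percent."""
    return uts_mpa + ys_mpa * math.log10(el_percent)

# Mean values quoted in the text: (UTS in MPa, YS in MPa, EL in %).
alloys = {
    "A1 (Al-13Si-0.7Mg-1.11Sc)": (253, 152, 2.3),
    "A2 (Al-9.5Si-0.7Mg-0.75Sc)": (241, 142, 5.0),
    "359-0.73Sc": (245, 140, 5.8),
}

q_values = {name: quality_index(*props) for name, props in alloys.items()}
for name, q in q_values.items():
    print(f"{name}: Q = {q:.0f} MPa")
```

Rounded to the nearest MPa, this gives 308, 340, and 352 MPa, in agreement with the Q values reported for the three alloys.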
The experimental properties of A2 were fed back to the dataset as the 5th iteration for optimizing the models, and the iterative results were updated and given in Figure 6(d). After the 5th iteration, the prediction accuracy of the models was improved slightly, and the alloy composition space with the highest Q was around the composition window of the A360 alloys (Si: 9-10 wt.%, Mg: 0.4-0.6 wt.%). The corresponding properties (UTS, YS, and EL) during the iteration processes are displayed in Supplementary Figure S4. The EI after the 5th iteration was calculated and is shown in Figure 6(f). The results show that the EI of the models after the 5th iteration decreased less than in the previous iteration, indicating that the uncertainty of the models is approaching a relative minimum. The maximum EI decreases from 24 in the 4th iteration to 10 at Al-10Si-0.5Mg (optimal Sc 0.83 wt.%), which is located in the composition range of the A360 alloy. Moreover, the predicted Q (374 ± 30 MPa) of the newly designed alloy (Al-10Si-0.5Mg-0.83Sc) is very close to the Q of the existing A360-0.79Sc alloy (Al-9.5Si-0.5Mg-0.79Sc, Q = 380 ± 30 MPa). This means that the prediction accuracy of the models is high and that the benefit of a further iteration would be small. Hence, the iterative optimization process can be stopped.
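The acquisition step in Step 2 can be sketched with the textbook closed form of expected improvement for maximization under a Gaussian prediction, EI = (μ − Q_best)·Φ(z) + σ·φ(z) with z = (μ − Q_best)/σ. Equation (1) is not reproduced in this section, so whether it has exactly this form is an assumption. Two further assumptions are made purely for illustration: the quoted ± half-widths of the predicted Q are used directly as σ, and the incumbent best Q_best is taken as 380 MPa (the A360-0.79Sc value quoted above).

```python
import math

def expected_improvement(mu, sigma, best):
    """Textbook EI for maximization with a Gaussian predictive distribution."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mu - best) * cdf + sigma * pdf

# ASSUMPTION: incumbent best Q (MPa), taken here as the A360-0.79Sc value.
BEST_Q = 380.0

# Predicted Q (mean, quoted +/- half-width used as sigma: ASSUMPTION) per design.
candidates = {
    "A1, after 3rd iteration": (552.0, 240.0),
    "A2, after 4th iteration": (395.0, 40.0),
    "Al-10Si-0.5Mg-0.83Sc, after 5th iteration": (374.0, 30.0),
}

ei_values = {k: expected_improvement(mu, s, BEST_Q) for k, (mu, s) in candidates.items()}
for name, ei in ei_values.items():
    print(f"{name}: EI = {ei:.1f}")
```

Under these assumptions the sketch returns roughly 205, 24.6, and 9.2, which is of the same magnitude as the reported maxima of 206, 24.25, and 10. The agreement should not be read as confirmation of the exact settings used in this work, but it shows how a shrinking predictive uncertainty drives EI down and motivates stopping the iterations.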

Discovery of hypoeutectic Al-xSi-yMg-zSc alloys with best mechanical properties
Moreover, considering that the experimental points are distributed over the commercial hypoeutectic Al-Si-Mg-Sc space (see Supplementary Figures S5 and S6), the trained ANN models should be applicable to predicting the mechanical properties over the entire hypoeutectic Al-xSi-yMg-zSc composition space. In order to test the predictive ability of the current models at unknown points in the range of hypoeutectic Al-Si-Mg alloys, the latest experimental points for Al-6Si-xSc (x = 0.3, 0.6, 1 wt.%) alloys from the work of Wang et al. [25] were chosen as the test system. Figure 7 shows the predicted results based on the current models and the experimental results for Al-6Si-xSc alloys. The error ranges of the predicted results are plotted with two standard deviations of the mean value of 1000 ANN model results, corresponding to the confidence interval of 95%. As shown in Figure 7, all of the experimental mechanical properties of Al-6Si-xSc alloys are located within the 95% confidence interval of the predictions. The results indicate that the current models also have a good predictive ability in the global composition space.
Therefore, the trained ANN models can be used to screen the optimal contents of Si, Mg, and Sc in the entire hypoeutectic Al-Si-Mg-Sc composition space. Figure 8 comprehensively presents the relation 'composition/process-microstructure-properties' for   the strength but a weak effect on the elongation, while the primary (Al) phase has the opposite effect. The predicted comprehensive mechanical property index Q over the entire hypoeutectic Al-Si-Mg-Sc composition space is shown in Figure 9(a). The results demonstrate that the composition space of the alloys with the highest Q values lies around the range of 4.5-10 wt.% Si, 0.4-0.7 wt.% Mg and 0.2-1.0 wt.% Sc. Moreover, the alloys with the best comprehensive mechanical property can be screened over the entire hypoeutectic Al-Si-Mg-Sc composition space. Figure 9(b) presents the screened optimal contents of Si, Mg, and Sc in the entire hypoeutectic Al-Si-Mg-Sc composition space when the predicted Q is over 372 MPa, which is larger than that of 99.7% of the alloys. As can be seen from Figure 9(b), the composition space with maximum Q is mainly around the range of 6-10 wt.% Si, 0.5-0.6 wt.% Mg and 0.3-1.0 wt.% Sc, located in the composition space of the Sc-modified A357, 359, and A360 alloys. In addition, the predicted Q for hypoeutectic Al-Si-Mg alloys with additional Sc is shown in Figure 9(c). In the figure, the color bar represents the content of Sc. The comparison of predicted Q for hypoeutectic Al-Si-Mg alloys without and with optimal Sc is shown in Figure 9(d). With the addition of optimal Sc based on the CT method, the comprehensive mechanical property of hypoeutectic Al-Si-Mg alloys is improved significantly, with Q values reaching maximum values in most regions. As can be seen in Figures 7 and 9(d), although there is relatively large uncertainty in the corner with minimum Mg and Si due to the lack of data, the current models are adequate to design alloys with the desired high Q.
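The screening step, keeping only compositions whose predicted Q exceeds that of 99.7% of all candidates, can be sketched as a simple top-fraction filter over a composition grid. Everything numerical below is a placeholder: `toy_q` is an arbitrary smooth function standing in for the trained ANN ensemble, and the grid bounds and step sizes are assumptions made only so that the example runs.

```python
import itertools

def frange(start, stop, step):
    """Inclusive list of evenly spaced values, rounded to avoid float drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 3) for i in range(n + 1)]

def toy_q(si, mg, sc):
    """PLACEHOLDER response surface (hypothetical), NOT the trained ANN model."""
    return 380.0 - 2.0 * (si - 8.0) ** 2 - 400.0 * (mg - 0.5) ** 2 - 60.0 * (sc - 0.6) ** 2

def screen_top_fraction(points, predict, top_fraction=0.003):
    """Return (Q, composition) pairs in the top `top_fraction` of predicted Q."""
    scored = sorted(((predict(*p), p) for p in points), reverse=True)
    n_keep = max(1, int(len(scored) * top_fraction))
    return scored[:n_keep]

# ASSUMED grid in wt.%: Si 4.5-13, Mg 0.1-0.7, Sc 0-1.2.
grid = list(itertools.product(frange(4.5, 13.0, 0.5),
                              frange(0.1, 0.7, 0.1),
                              frange(0.0, 1.2, 0.1)))

best = screen_top_fraction(grid, toy_q)
for q, (si, mg, sc) in best:
    print(f"Al-{si}Si-{mg}Mg-{sc}Sc: predicted Q = {q:.1f} MPa")
```

In practice the predictor would be replaced by the mean of the ANN ensemble, and the Q threshold at the chosen fraction would play the role of the 372 MPa cut-off used for Figure 9(b).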

Conclusions
In this paper, we proposed a novel alloy design strategy integrating active learning with high-throughput CALPHAD calculations and key experiments to accurately explore the optimal contents of Si, Mg, and Sc in hypoeutectic Al-xSi-yMg-zSc casting alloys over a high-dimensional composition space with multi-objective optimization.
We first performed high-throughput CALPHAD solidification simulations of a large number of hypoeutectic Al-Si-Mg-Sc casting alloys over a wide composition range to establish the quantitative relation 'composition-process-microstructure'. Then, the relation 'microstructure-mechanical properties' of Al-Si-Mg-Sc hypoeutectic casting alloys was acquired using the active learning technique supported by key experiments designed by CALPHAD and Bayesian optimization samplings. Moreover, the uncertainties from data noise and models were quantitatively evaluated during modeling.
After a benchmark in A356-xSc alloys, the results showed that the microstructural features perform better than the composition features during modeling and Bayesian optimization samplings. Then, such a strategy was utilized to design the high-performance hypoeutectic Al-xSi-yMg alloys with optimal Sc additions that were later experimentally validated. Finally, the present strategy was successfully extended to screen the optimal contents of Si, Mg, and Sc over the high-dimensional hypoeutectic Al-xSi-yMg-zSc composition space. The composition space for the maximum comprehensive mechanical property lies mainly in the range of 6-10 wt.% Si, 0.5-0.6 wt.% Mg and 0.3-1.0 wt.% Sc, conforming well with that of the Sc-modified A357, 359, and A360 alloys.
It is anticipated that the proposed strategy integrating active learning with high-throughput CALPHAD simulations and key experiments should be generally applicable to the efficient design of high-performance multi-component materials over high-dimensional composition space.

Author contributions
J.G. designed the project, performed the high-throughput CALPHAD simulations, designed the experiments, constructed the active learning process, and wrote the manuscript.