Processing Technique Selection for Steels Based on Mechanical Properties Using Machine Learning Framework

Predicting the process route for a material with a desired set of properties is one of the fundamental problems in materials design. The parameter space is often too large to bound, since there are too many possibilities. However, in areas with limited theoretical access, machine learning techniques can be attempted using the available data on candidate materials. In this study, a computational method is proposed to predict different process routes of steel, taking composition and desired mechanical properties as inputs. First, historical data from the actual rolling process were collected, cleaned, and integrated. The dataset was then divided into four classes based on the rolling process. Next, correlations among the features were calculated to identify the essential variables. State-of-the-art machine learning methods, namely logistic regression, K-nearest neighbor, support vector machine, and random forest, were studied to implement the prediction model. To avoid overfitting, k-fold cross-validation was applied, yielding a realistic prediction accuracy of 97%. The F1-score of the classification model is 0.86, and the kappa score is 0.95, which confirms that the model has excellent learning and generalization ability and can accurately forecast steel process routes from the given input parameters.


Introduction
Steel is a versatile material with respect to its applications due to its superior properties, available variations, and the amenability of its shaping and processing to obtain desired application-specific products [1][2]. Across different grades of steel, many properties depend on composition and microstructure. These properties can be divided into two major categories, namely structure-sensitive properties (e.g., yield strength) and structure-insensitive properties (e.g., electrical conductivity) [3]. Material processing depends on various conditions, such as the rolling technique (hot rolled or cold drawn), which affect the overall performance and application of steel. The cold-rolling process increases the hardness and decreases the grain size, whereas the hot-rolling process reduces the average grain size of the metal while maintaining an equiaxed microstructure [4][5]. Control of the process variables of thermomechanical processing can be used to realize the desired grain size through solid-solid reactions such as recrystallization and phase transformation [6][7][8][9][10][11][12][13]. To reduce production cost and human intervention, the steel industry needs automation in a more enhanced form than at present. In this perspective, prediction algorithms are used to speed up the production process, and such efforts are being taken up by the materials science and technology community to attain solutions after compiling extensive datasets of material composition, process methods, and related properties [14][15][16]. In 2009, Brahme et al. designed an artificial-neural-network-based model for predicting cold-rolling textures of steel, which predicts fiber texture from texture intensities, carbon content, carbide content, and the amount of rolling reduction [17]. Simecek et al. developed the MECHP tool to predict the mechanical properties of hot-rolled steel products; the tool uses process data such as the water cooling and subsequent air cooling of hot-rolled narrow plate and wire [18]. Jassim et al. developed an empirical model to predict the hardness, yield strength, and tensile strength of rapidly solidified ribbons produced by single-roll melt spinning [19]. This forecasting technique helps examine the connection between rapid-solidification parameters (opening breadth, nozzle-roll wheel gap, and melting temperature) and the thickness, hardness, and elasticity of the rapidly solidified ribbons [19].
Stiffness, strength, ductility, hardness, and toughness are some of the mechanical properties that govern the selection of a material for a given application. Looking for the optimum combination of material properties along with economic factors is a necessity when selecting process routes. The determination of the material processing method is of considerable significance in determining the performance of steel, because variation in the process route for a given grade of steel can produce different structures, which in turn can influence properties such as fracture and corrosion behavior [20][21]. It is, therefore, imperative to distinguish the fundamental and desirable input parameters (here composition, strength, and rolling process). During manufacturing, the material and process parameters are controlled, and hence these are ideally desired as the input variables for determining the process routes [22]. The extent of control that the material experiences during processing defines its final properties. At present, a large number of research initiatives have been taken up to predict the steel processing mechanism, along with predicting the mechanical properties of steel, using various computational methodologies based on artificial neural networks and pattern recognition [23][24].
However, such complex network structures require more time and training data to ensure that the model predicts with a high degree of accuracy. In recent trends, many industries are inclined towards machine-learning-oriented solutions to speed up the processing mechanism [25][26][27][28][29][30][31]. In this study, a robust machine learning model is described for predicting the different manufacturing processes of steel of various grades, viz. hot rolled (HR), cold drawn (CR), annealed cold drawn (ACR), and spheroidized annealed cold drawn (SCR). The proposed computational model can be applied without consuming materials or energy for production. This makes the process less time consuming and thereby economically more viable, eventually catering to the needs of the clients of the steel industry by lowering the overall production cost.

Computational Model
In this paper, the effect of various classification models on a steel dataset is discussed in order to automate the steel processing mechanism [33]. To implement the prediction model, the open-source software Python 3.7 has been used, with the Scikit-learn, pandas, and seaborn libraries for various computational and visualization purposes [32]. Figure 1 shows the steps of the predictive steel processing model.

Figure 1. Schematic representation of the steel processing predictive model.
The data used in the training and testing process include 11 columns indicating different features and 129 rows indicating the type of steel, as given in the supplementary document. Various constituents of steel, namely iron (Fe), carbon (C), manganese (Mn), sulphur (S), and phosphorus (P), and various mechanical properties, namely tensile strength, reduction in area, hardness (H), elongation, and yield strength, are used as features. One more feature, the carbon equivalent (CE), is also included, which can be evaluated using equation 1. Various classification techniques, namely logistic regression, support vector machine, K-nearest neighbor, and random forest, are used to compare the results [34][35][36][37][38]. In machine learning, logistic regression is useful for estimating the probability that an observation belongs to a particular class. In this paper, a multi-class classification is discussed to predict the thermomechanical processing routes, namely hot rolled (HR), cold drawn (CR), annealed cold drawn (ACR), and spheroidized annealed cold drawn (SCR), as described in Table 1.
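Equation 1 is not reproduced in this extract; as an illustration only, the sketch below assumes the common simplification CE = C + Mn/6 (the IIW carbon-equivalent formula restricted to the elements available in this dataset), which may differ from the formula actually used in the study.

```python
def carbon_equivalent(c_wt_pct, mn_wt_pct):
    """Hypothetical carbon equivalent: CE = C + Mn/6 (assumed form,
    since equation 1 is not reproduced in this extract)."""
    return c_wt_pct + mn_wt_pct / 6.0

# A 0.20 %C, 0.60 %Mn steel gives CE = 0.30 under this assumption.
ce = carbon_equivalent(0.20, 0.60)
print(round(ce, 2))
```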

Table 1
According to the statistical distribution of the attributes in Table 2, each attribute requires some adjustment.
The collected data are on different ranges, and if the scales of different features are wildly different, this may cause abnormalities in model performance. To ensure all features are weighted equally in their representation, data normalization should be used. Data normalization is a procedure frequently applied as part of data preparation for machine learning. The objective of normalization is to change the values of the numeric columns in the dataset to a common scale without distorting the differences in the ranges of values. This is a better choice for a more realistic data representation. Figure 2 shows the overall flow of the proposed work; besides, it shows how a tree can be generated from the used dataset. We tested the data with various machine learning algorithms and found that bagging-based random forest gives the best results after parameter tuning. Data are split into left and right nodes so that optimal results can be derived using a voting scheme.
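The normalization step described above can be sketched with min-max scaling (the scheme named later in the Results section); this is a minimal illustration of the formula, not the paper's full preprocessing pipeline.

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D feature column to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Yield strengths (MPa) are on a very different scale from, e.g., %C;
# after scaling, both contribute comparably to the model.
print(min_max_scale([250.0, 400.0, 550.0]))  # [0.  0.5 1. ]
```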

2.1. Feature Correlations
In predictive statistics and machine learning, an attribute with a high correlation coefficient has more influence on the prediction variable [36]. This can be understood and visualized with the help of a correlation map. The correlation coefficient measures the relationship between variables; its value always lies between 1 (positive relationship) and -1 (negative relationship), whereas 0 implies no correlation at all. Equation 2 is used to calculate the correlation coefficient. The correlation map displayed in Figure 3 shows the dependence of the features on each other.
The correlation coefficients are presented within the cells. The diagonal cells have a correlation coefficient of 1, showing that each feature perfectly correlates with itself. Yield strength has strong negative correlations with elongation and reduction in area. In contrast, elongation has a strong positive correlation with reduction in area, and strong negative correlations with yield strength, hardness (H), and tensile strength.
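Equation 2 is not reproduced in this extract; the sketch below assumes it is the standard Pearson correlation coefficient, which is consistent with the [-1, 1] range described above.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two feature columns."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

# A perfectly inverse pair (e.g., yield strength rising while elongation
# falls) gives r = -1.
print(pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))  # -1.0
```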

2.2. Random Forest Implementation
Classifiers such as decision trees often encounter limitations in complexity at a fundamental level [39]. Grown without constraint, a decision tree can reach arbitrary complexity while achieving only limited generalization accuracy. On the other hand, an oblique decision tree is beneficial for optimizing training-set accuracy. Hence, the proposed method increases the number of trees in the feature subspace using random criteria, forming a random forest. By increasing the number of trees, the generalization of the combined classification can be monotonically improved.
Using the best-split approach, the selected features are used for finding the root node of the tree. With the establishment of the root, the child nodes are computed and determined using the same approach. The process halts only when the root node and the target leaf nodes are set. In the same way, all possible trees with 'k' selected features form the 'n' random trees of the forest. To validate any feature set against the expected outcome, a very stable test set should be built. In a classification method such as random forest, a large set of combinations is generated, which increases the complexity of the outcome as well. Hence, a voting methodology is adopted that determines the maximum occurrences of the target output among the available 'n' random trees. So, of the different criteria available for the selection of the target node, majority voting is adopted. This method casts the votes of all possible target feature selections available from the random forest. The feature set that obtains the maximum votes over the maximum number of test cases is considered the winner feature set for the given classification, as shown in the following equation,
where N = total number of features, k = number of features selected, and f_n = the classification trees generated from the provided input complements.
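The tree-growing and majority-voting scheme above corresponds to a bagged random forest; a minimal sketch with scikit-learn (the library named earlier), using synthetic stand-in data rather than the paper's steel dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 129 samples, 11 features, 4 classes, mirroring the
# shape of the steel dataset described in the paper.
X, y = make_classification(n_samples=129, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# Each of the 100 trees sees a bootstrap sample and a random feature
# subset at each split; prediction is by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:1]))
```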

2.3. Support Vector Machine
In machine learning, the support vector machine (SVM) is a statistical learning method based on the strategy of the maximum margin. This method is appropriate when the sample set is smaller than the number of dimensions. The main working principle of an SVM is to create a hyperplane that uniquely classifies data points in an n-dimensional space, where n is the number of features. An SVM can be a soft-margin classifier, meaning that there is a hyperplane and a margin around it, but not all observations need to lie on the correct side of the hyperplane. Hyperplanes are decision boundaries that help classify the data points; data points falling on either side of the hyperplane can be attributed to different classes [40].
In the linearly separable case, the decision surface equation of the separating hyperplane can be written [41] as w · x + b = 0, where x is the input vector, w is an adjustable weight vector, and b is a bias.
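A minimal linear-SVM sketch with scikit-learn, using toy two-dimensional data (not the steel dataset) to expose the fitted hyperplane parameters w and b:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data, linearly separable on the first feature.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

# A new point is classified by the sign of the decision surface w.x + b.
print(int(np.sign(w @ np.array([2.0, 0.5]) + b)))  # 1 -> class-1 side
```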

2.4. Decision Tree
Decision trees are a popular supervised learning method that, like many other learning methods, can be used for both regression and classification [42][43]. The working principle of decision trees is to split the data into subsets, where each subset belongs to only one class. This is accomplished by dividing the input space into pure regions. In practice, extremely pure subsets are not possible; therefore, the data are divided into many subsets in such a manner that each subset belongs, as far as possible, to the same class.
The boundaries uniquely identified in each region are called decision boundaries, based on which the decision tree model makes classification decisions. A decision tree is a hierarchical structure with nodes and directed edges. The top node is known as the root node, and the bottom nodes are known as leaf nodes. Nodes that are neither the root node nor leaf nodes are called internal nodes. The root and internal nodes hold test conditions; each leaf node has a class label associated with it. Figure 4 gives a clear idea of the different nodes. A classification decision is made by traversing the decision tree starting from the root node.
The depth of a decision tree is the number of edges in the longest path from the root node to a leaf node. The decision algorithm can operate on the original training data pretty much as is, so decision trees tend to work well with datasets that have a mixture of feature types (binary, categorical, or continuous) and with features on very different scales. One drawback of decision trees is that, despite the use of pruning, they can still overfit all or parts of the data and may not achieve the best generalization performance compared to other methods. With the training dataset (x_i, y_i), the approximation function of the decision tree h_k can be expressed as in [42].
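A short scikit-learn sketch of a depth-limited decision tree on synthetic stand-in data; the depth cap plays the role of the pruning discussed above:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=129, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# Capping max_depth limits the longest root-to-leaf path, guarding
# against the overfitting noted above.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print(tree.get_depth())  # number of edges on the longest path, at most 4
```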

2.5. K-Nearest Neighbor
kNN is one of the state-of-the-art supervised machine learning classification algorithms. The idea behind kNN is to classify a data point from its neighbors; it relies on the notion of the so-called duck test. In a classification problem, data points from the same class are identified; hence, similar input samples are labeled with the same target label. This means that the classification of a sample depends on the target labels of its neighboring points.
The kNN value is measured by calculating the distance between neighbors, and this distance can be calculated using the Euclidean, Manhattan, or Minkowski equations. If k = 1, the case is simply assigned to the class of its closest neighbor. The Euclidean distance between points p and q is calculated as d(p, q) = sqrt(Σ_i (p_i − q_i)²).
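A from-scratch sketch of nearest-neighbor classification with the Euclidean metric; the toy points and the HR/CR/ACR labels here are illustrative only:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(point, data, labels, k=1):
    """Assign the majority label among the k nearest training points."""
    order = sorted(range(len(data)), key=lambda i: euclidean(point, data[i]))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# The query point is closest to [1, 1], so with k = 1 it takes that label.
print(knn_predict([0.9, 0.9], [[0, 0], [1, 1], [0, 1]],
                  ["HR", "CR", "ACR"], k=1))  # CR
```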

Results and Discussion
Finding the correlations among the various compositions and mechanical properties is central to comprehending the connections between the chosen features and the others. The input dataset is on different scales, and it is not good practice to use such a dataset without preprocessing; therefore, all data are normalized to a specific range using min-max feature scaling.
While computing the performance of the model, the relationships among the various compositions and mechanical properties need to be understood; these correlations are represented graphically in Figure 5 using the Python seaborn module. The scatter plot shows that when yield strength increases, elongation decreases, and as elongation increases, the surface hardness decreases, which agrees with the ground-truth theory. This correlation scatter plot provides a quantitative description of all input parameters, which is very useful in this study. It is also identified that not all input parameters play a vital role in the correlation process, so some can be removed from the feature set to reduce the number of input parameters in a manner that does not affect the results. In this paper, a multi-class classification is discussed; hot rolled, cold drawn, annealed cold drawn, and spheroidized annealed cold drawn are denoted as class 1, class 2, class 3, and class 4, respectively.
The dataset is split into two subsets: 80% of the data is used for training and 20% for testing. In this study, a computational comparison is made among various existing machine learning algorithms, namely logistic regression, decision tree, k-nearest neighbor, SVM, and random forest, as shown in Table 3. A graphical representation of the accuracy table is shown in Figure 6.
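The 80/20 split and model comparison can be sketched as follows, on synthetic stand-in data; the resulting accuracies will not match the paper's Table 3:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=129, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# 80% training, 20% testing, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```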

3.1. Feature (predictive properties) selection
Classification accuracy alone is not enough to measure the effectiveness of a model. Based on precision and recall, the F1 score can be calculated, which is a statistical measure of testing accuracy; an F1 score near 1 indicates perfect precision and recall. The F1 score is calculated as in equation 7, F1 = 2 × (precision × recall) / (precision + recall). On the other hand, a kappa score (k) is also used to measure the quality of the test accuracy. The significance of rater-reliability quality lies in the extent to which the information gathered in the testing process correctly represents the variables measured.
A kappa score of 0 means the model's agreement is no better than chance, and 1 represents flawless agreement; this value is calculated as k = (p_o − p_e) / (1 − p_e), where p_o is the relative observed agreement among raters and p_e is the hypothetical probability of chance agreement. Such data-driven models are regarded as a driver for Industry 4.0, and it is reasonable to expect that the approach addressed in the present study will be extended in the future, particularly in the domain of materials informatics.
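Both metrics can be checked numerically; a sketch on a toy label vector (not the paper's results), with the hand computation of equation 7 matching scikit-learn's built-in:

```python
from sklearn.metrics import cohen_kappa_score, f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]

p = precision_score(y_true, y_pred)  # TP / (TP + FP)
r = recall_score(y_true, y_pred)     # TP / (TP + FN)
f1 = 2 * p * r / (p + r)             # harmonic mean, equation 7

# The hand-computed F1 agrees with scikit-learn's f1_score.
print(round(f1, 3), round(f1_score(y_true, y_pred), 3))

# Cohen's kappa corrects observed agreement for chance agreement.
print(round(cohen_kappa_score(y_true, y_pred), 3))
```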

Ethical Approval:
The author warrants that the paper fulfills the ethical standards of the journal.
Rolling process comparison: actual vs. predicted results.

Figure 2. Process steps involved in the model development. The tree on the right side shows how data are split into left and right nodes so that optimal results can be derived using a voting scheme.

Figure 4. Node block diagram of a decision tree. Nodes on the left-hand side are known as left child nodes, and nodes on the right side are known as right child nodes.

Figure 5. Correlation among various features using a scatter plot. The diagonal of the matrix indicates the highest correlation among the features.

Figure 6. Comparison of the accuracy of different machine learning models.

Figure 8. k-fold cross-validation graph, where k = 10.

Figures

Table 1: Various steel processing methods used for the prediction model.

Table 2 displays the statistical summary of each column of the dataset used, including the mean, standard deviation, and minimum and maximum values of each column. All columns have a count of 129, which shows that the dataset does not have any NULL values. This description also shows that the features of the dataset have varying ranges. Machine learning models such as logistic regression perform better when they learn from data in the same range, and to accomplish that, normalization is a necessity for this dataset.
Table 2: Descriptive statistics of input variables.

Table 3: Comparison of model prediction accuracy with various classification techniques.

Table 4 shows the model accuracy, precision, recall, and kappa score on the used dataset.

Table 4: Confusion matrix of the proposed model. The matrix shows the number of samples of each class predicted by the model.
The F1 score is used to understand the correctness of the classification: a score of 1 indicates that the model is stable, and 0 means it is unstable. In this study, we achieve ~97% accuracy, which shows the goodness of the model. Moreover, Figure 7 gives a clear picture of the performance on the testing data. From Figure 6, it can be inferred that, except in one case, all predicted results are correctly classified; hence, such a model can be useful in automating process selection in the steel industry. The number of input parameters also plays a vital role in model performance, and a model can be more effective if fewer input parameters are used to achieve the desired results. Initially, we used 11 different features to implement the model. It was also noticed that not all features are required for the steel processing selection, and from the feature importance statistics, carbon, carbon equivalent, sulfur, phosphorus, yield strength, and elongation are enough to predict the rolling process.
The model accuracy metric is often used for determining the performance of various data processing and modeling techniques. In learning models, this metric is often challenged due to overfitting of the training data. However, machine learning, unlike purely inductive learning, tries to generalize the solution beyond the training data. Hence, to validate any machine learning model, especially for greater generalization accuracy, the cross-validation method is often adopted. This methodology is very effective when training data are limited. Cross-validation splits the given dataset into multiple folds (k number of folds), each containing both training and testing sets. With a fixed k, the model is repeatedly trained on the training set but tested on the test sets of all folds except the trained one. The testing accuracy is taken as the average of the accuracy results over all k folds. The dataset is reset to its original state or shuffled to avoid any biasing and then divided into validation sets such that a proper balance of data is maintained. The average accuracy is thus 96.98% (~97%), which proves the acceptable prediction level of the rolling process selection model.
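The k-fold procedure (k = 10, as in Figure 8) can be sketched as follows, on synthetic stand-in data; the ~97% figure above is specific to the steel dataset and will not be reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=129, n_features=11, n_informative=6,
                           n_classes=4, random_state=0)

# Each of the 10 folds serves once as the test set; the reported score
# is the average accuracy over the folds.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)
print(len(scores), round(scores.mean(), 3))
```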