Background & Summary

Corrosion of metallic structures has a substantial impact on the economy. A 1998 study estimated the direct cost of corrosion at 276 billion USD per year in the U.S. alone, or about three percent of the gross domestic product (GDP)1,2,3. The accurate assessment, control, and prediction of corrosion are therefore of paramount importance. For metals that undergo uniform corrosion, the corrosion rate can often be obtained by mass loss measurements as a function of exposure time to a corrosive environment or by electrochemical measurements. However, for many corrosion resistant alloys (CRAs), either homogeneous solid solutions or multi-phase alloys, this assessment is more complicated because of the existence of a thin oxide on the alloy surface known as a passive film4. This film decreases the rate of uniform corrosion but can be susceptible to accelerated localized corrosion, including pitting corrosion, crevice corrosion, and stress corrosion cracking, all of which are associated with the local breakdown of passive films. These corrosion modes can result in unexpected, catastrophic failure.

Electrochemical approaches that are simple and fast have been widely used to assess the corrosion rate of CRAs, including linear polarization, potentiodynamic polarization, and electrochemical impedance spectroscopy (EIS)5. A number of electrochemical metrics derived from these techniques have been adopted to describe the corrosion resistance of alloys, such as corrosion current density (icorr), passive current density (ipass), corrosion potential (Ecorr), pitting potential (Epit), repassivation potential (Erp), crevice corrosion potential (Ecrev), pitting temperature (Tpit, more commonly abbreviated CPT), and crevice corrosion temperature (Tcrev, more commonly abbreviated CCT). These metrics are defined in Table 1 and depend on the alloy composition, structure and defects, environmental factors including the temperature and chemical composition of the electrolyte, as well as physical factors such as the crevice former. A typical curve resulting from the cyclic potentiodynamic polarization of a passive metal is schematically illustrated in Fig. 1. Metals are electrochemically polarized from the active region toward the noble region by incrementally stepping the potential while the corresponding current density is recorded. Ecorr is the potential at which no external current flows. At this potential, the metal corrodes at a rate defined by the corrosion current density, icorr, which is generally obtained by extrapolating the linear portions of the anodic and cathodic branches of the polarization curve to Ecorr. When the potential is scanned in the noble direction, a passive region exists where the current density is independent of the applied potential; the current density in this region is ipass. Although metastable pitting can occur in this region, stable pits do not form. When the applied potential is more noble than a specific range of values, the passive film breaks down and stable pits form, a process accompanied by a rapid increase of the current density. This characteristic potential is defined as Epit, which has been broadly used to describe the tendency of the passive film on a given metal or alloy to break down. During cyclic polarization, the scan direction of the potential is reversed when the current density exceeds a predetermined value. Erp can then often, but not always, be identified where the current density drops substantially, indicating repassivation of the pit. This value is generally lower than Epit. Since stable pits can only form when the potential is more noble than Epit, and they can only propagate at a potential more noble than Erp, higher values of Epit and Erp suggest that the material is more resistant to pitting corrosion. Compared to weight loss measurements, these electrochemical metrics possess a number of advantages: (1) They can be obtained much faster, particularly for CRAs that are intentionally designed with high corrosion resistance. (2) For parameters such as Epit, Erp, and Tpit, the results can be reproducible under well-controlled experimental conditions. (3) They help to gain an understanding of the corrosion mechanism. Therefore, these electrochemical metrics have been extensively used in the metal corrosion community, and a vast resource of data is available in the literature.
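The extrapolation used to estimate icorr can be illustrated with a minimal Python sketch. It is not the procedure used by any of the source publications: it assumes that points from the linear (Tafel) regions of both branches have already been selected, fits a straight line to log10|i| versus E for each branch, and takes the intersection of the two lines as an estimate of Ecorr and icorr. The function names and the synthetic data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tafel_intersection(anodic, cathodic):
    """Estimate (Ecorr, icorr) from two lists of (E in V, |i| in A/cm^2).

    Each list is assumed to contain only points from the linear Tafel region.
    """
    m_a, c_a = fit_line([e for e, _ in anodic], [math.log10(i) for _, i in anodic])
    m_c, c_c = fit_line([e for e, _ in cathodic], [math.log10(i) for _, i in cathodic])
    e_corr = (c_c - c_a) / (m_a - m_c)   # potential where the two fitted lines cross
    i_corr = 10 ** (m_a * e_corr + c_a)  # current density at the crossing point
    return e_corr, i_corr


# Synthetic, noise-free data (assumed values, not taken from the database)
E_CORR, I_CORR, BA, BC = -0.30, 1e-6, 0.06, 0.12  # V, A/cm^2, V/decade, V/decade
overpotentials = (0.08, 0.10, 0.12)
anodic = [(E_CORR + eta, I_CORR * 10 ** (eta / BA)) for eta in overpotentials]
cathodic = [(E_CORR - eta, I_CORR * 10 ** (eta / BC)) for eta in overpotentials]

e_fit, i_fit = tafel_intersection(anodic, cathodic)
```

With real polarization data the choice of the linear region strongly affects the result, so the selection step that is assumed here would need to be made explicitly.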

Table 1 Definitions of various electrochemical metrics for metal corrosion.
Fig. 1

A typical cyclic potentiodynamic polarization curve, after Frankel, Journal of The Electrochemical Society, 1998, 145, 697. Reproduced with permission.

Given the hidden damage that localized corrosion introduces, with its frequent difficulty of detection and consequent potential for unforeseen catastrophic failure of high-value assets, designing materials with resistance to localized corrosion is of high priority. Various factors should be taken into account for such designs, including material composition, microstructure, tendency for passivity, and electrochemical activity of the surface. However, current approaches for the development of CRAs are primarily based on intuition, history of past successes, and trial and error. Although empirical models capable of predicting the corrosion resistance of a specific group of materials exist in the literature, they are strictly constrained to a very limited and particular range of compositional space. For instance, the pitting resistance equivalence number (PREN) was developed as a figure of merit by correlating the pitting corrosion resistance to alloy compositions for Fe-Cr-Ni alloys6. One relation characterizing the resistance of austenitic stainless steels to pitting corrosion is7:

$$PREN= \% Cr+3.3\times \% Mo+19.4\times \% N$$
(1)

where the coefficients simply describe the relative effects of Mo and N to that of Cr and the concentrations are in weight percent. Based on this equation, the pitting corrosion resistance of austenitic stainless steel can be primarily controlled by the amount of beneficial components, i.e. Cr, Mo, and N. While the PREN has been broadly used in the CRA industry, this equation cannot be extrapolated to compositions outside of those used to fit the equation, including high entropy alloys (HEAs) and aluminum alloys.
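As a worked illustration of Eq. (1), the following Python sketch evaluates the PREN for a composition given in weight percent. The function name and the example composition are assumptions chosen for illustration and are not taken from the database.

```python
def pren(cr_wt, mo_wt=0.0, n_wt=0.0):
    """Pitting resistance equivalence number from Eq. (1); inputs in wt.%."""
    return cr_wt + 3.3 * mo_wt + 19.4 * n_wt


# Illustrative composition only: 17 wt.% Cr, 2 wt.% Mo, 0.1 wt.% N
example_pren = pren(cr_wt=17.0, mo_wt=2.0, n_wt=0.1)
```

Because the coefficients were fitted for austenitic stainless steels, the function should only be applied within that compositional range.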

In this study, we create a CRA database with electrochemical metrics that could be fed into machine learning (ML) based models to allow for exploration of the compositional space beyond what was used to create the current empirical models. To our knowledge, this is the first published database of this type. Existing large-scale databases only contain the uniform corrosion rate of certain types of alloys8,9. However, for CRAs, the uniform corrosion rate is not very meaningful because the dominant corrosion mechanism is localized corrosion, as introduced above. The rate of localized corrosion can be many orders of magnitude higher than that of uniform corrosion. Therefore, more complex electrochemical parameters, such as the pitting potential and repassivation potential or other metrics reported in this database, are required to capture these corrosion phenomena. A high-level overview of the dataset is shown in Fig. 2. This database not only allows us to link the corrosion resistance of CRAs to various experimental parameters, including material composition and environmental attributes (temperature, pH, and chloride ion concentration), but also enables the development of calculable metrics that could shed light on the fundamental physical processes that govern the corrosion performance. For instance, the corrosion resistance of metals could be correlated to the bonding strengths of metal-metal and metal-oxygen bonds6,7, chloride ion adsorption susceptibility10,11 and oxide enrichment and depletion factors12, which could be calculated through density functional theory or molecular dynamics approaches. ML models can also be integrated into a generic multi-physics modeling framework to bridge gaps where there is not yet mechanistic theory that enables the prediction of complex corrosion phenomena. Therefore, development of corrosion databases such as this may be crucial for the future of alloy design and optimization.

Fig. 2

A schematic overview of the dataset. The data were collected from 85 publications, with materials which fall into 4 material classes. There are 6 datasets reporting a total of 8 different corrosion metrics, with 1274 total records.

Methods

The data were collected from experimental results reported across 85 literature sources7,10,11,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94. Each composition was assigned a “Material class” based on the elements present and the source of the data. The Fe- and Al-based alloys are compositions that contain >35 wt. % Fe or Al, respectively. High entropy alloys are compositions that were reported as such in the source literature. The Ni-Cr-Mo ternary system contains alloys with all three elements present as well as some compositions containing just one or two of the three base elements. Alloy compositions which fall outside these bounds are classified as “Other”. The data for Fe-based alloys were collected from the references of a book95, while the data for the Ni-Cr-Mo ternary system of alloys were extracted from a doctoral dissertation89. The data for HEAs and Al alloys were collected from existing literature by searching the Corrosion journal, The Journal of Science and Technology, Science Direct, and Google Scholar. Some values were reported directly in the source, while others were extracted from figures using the WebPlotDigitizer software96. The database consists of six datasets containing different electrochemical metrics for CRAs, including corrosion current density (icorr), passive current density (ipass), corrosion potential (Ecorr), pitting potential (Epit), repassivation potential (Erp), crevice corrosion potential (Ecrev), pitting temperature (Tpit), and crevice corrosion temperature (Tcrev) (see Table 1 for definitions). Some sources provided average values of these descriptors, while others reported the maximum and minimum values. All of these values were recorded.

In addition to the electrochemical metrics, metadata related to details of the experiments were recorded. For every measurement reporting a corrosion potential, the experimental parameters include the alloy composition and environmental attributes of the corrosion experiment (temperature, test solution, pH, and chloride ion concentration). When provided, the alloy microstructure, alloy heat treatment, electrochemical testing method, and the electrochemical scanning rate were also recorded. For every measurement reporting a critical temperature as the corrosion metric, the experimental parameters reported include the alloy composition, test solution, and test method, with some measurements also including the scan rate of the temperature, test potential, alloy heat treatment, and alloy microstructure.

Data Records

The data were collected from sources studying Fe-based alloys, HEAs, a Ni-Cr-Mo ternary system of alloys, and Al-based alloys. A description of the number of data records based on the electrochemical metric and material class is provided in Table 2. Beyond the reported electrochemical metrics and elemental composition, other fields that may be populated are:

  • [Cl-]: Chloride ion concentration. It has been well established that the presence of chloride ions can break down the passive film of various CRAs and the extent of corrosion strongly depends on the concentration of these ions97. Therefore, it is crucial to include this parameter as an input for any model designed for the prediction of metal corrosion.

  • Microstructures: Description of reported microstructures. In general, homogeneous solid solutions are more resistant to localized corrosion compared to multi-phase alloys. Localized corrosion can be initiated preferentially at heterogeneity on the surface, including dislocations and defects, secondary phase particles, inclusions, and grain boundaries, all of which are closely related to the microstructure of a given alloy. For example, the S-phase (Al2CuMg) particles existing on the surface of many Al-based alloys are more active than the matrix, so they are susceptible to localized breakdown. Similarly, pitting corrosion is usually initiated on MnS precipitates that exist in most stainless steels97.

  • Oxide: Oxide layer composition, if present and reported. As described above, localized corrosion is a phenomenon associated with the localized breakdown of the surface oxide, which strongly depends on the chemical composition of the oxide. The most widely used CRAs to date are primarily based on Fe-Cr and Ni-Cr, which rely on the formation of a Cr-rich surface layer that is highly resistant to aqueous corrosion. Therefore, the oxide composition, if known, will be beneficial for the understanding and prediction of localized corrosion.

  • pH: pH of bulk test solution. Metal corrosion is influenced not only by the type of material and the potential, but also by the solution pH. The thermodynamic stability of a given metal or alloy in different pH environments can be found in the potential-pH diagram, also known as the Pourbaix diagram98. For example, at open circuit potential and room temperature, Fe dissolves readily at acidic pH, whereas passivation occurs under basic conditions. Al corrodes slowly in a neutral environment but is soluble in both acidic and alkaline environments.

  • Scan rate of temperature: For temperature-dependent scans, this reports the scan rate of temperature in °C/min. For example, the determination of Tpit relies on progressively increasing or decreasing the temperature in steps of a specific size. This step size is known to influence the measured Tpit values and the width of their distribution99.

  • Test Solution: Chemical composition of test solution. Corrosion is not an intrinsic property that only correlates to the material itself; it is also significantly influenced by the environment. The chemistry of the test solution is a critical environmental factor for metal corrosion and it directly determines whether localized corrosion will occur. In solutions without aggressive anions, predominantly chloride ions, localized corrosion does not occur, so additional care should be taken when interpreting the results obtained by electrochemical approaches. For instance, the rapid increase of current beyond the passive regime may be transpassive dissolution rather than pitting. Additionally, when the solution contains oxidizing agents, such as Fe3+, the likelihood of localized corrosion will increase. Similarly, if corrosion inhibitors are present in the test solution, the results should not be directly combined with those inputs collected in an inhibitor-free environment.

  • Test temp.: Test temperature (°C). Corrosion is fundamentally governed by both thermodynamics and kinetics, so any change in temperature will influence the rate of corrosion. An elevated temperature not only helps to overcome the energy barrier required for a given corrosion process to occur, but also accelerates the corrosion rate by enhancing mass transport. In addition, many CRAs do not undergo localized corrosion until a critical temperature range is exceeded, which is usually expressed as the critical crevice temperature or critical pitting temperature. Recent studies showed that these parameters are statistically distributed rather than single-valued99. Due to the critical role temperature plays during corrosion, this input must be taken into account when predicting the corrosion rate of any metal.

  • Test method: Type of electrochemical/chemical test performed, e.g. potentiodynamic, potentiostatic polarization, or immersion test. It is commonly observed that results acquired by different techniques can be slightly different, which should be considered if these inputs are combined.

  • Scan rate (mV/s): The rate of the potentiodynamic polarization scan in mV/s. During potentiodynamic polarization, the metal is polarized by incrementally increasing or decreasing the potential with a fixed step size, and the corresponding current density (i.e. corrosion rate) is measured. This electrochemical approach relies on the determination of surface reaction kinetics in a steady-state regime; otherwise the measured corrosion rate will deviate from the actual value. The scan rate must therefore lie within a reasonable range for the results to be trusted, as has been reported elsewhere100.

  • Heat treatment: Description of reported heat processing steps. Heat treatment predominantly influences the structure of the alloys, so this input should be considered if provided.

  • Reference: Literature reference number or DOI.

  • Comment: If given, this may include information such as the name of a commercial alloy used in the test or other experimental details.
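One possible in-memory representation of a record with the fields listed above is sketched below in Python. All attribute names are hypothetical and do not reproduce the spreadsheet column headers; fields that a source did not report are left as None.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CorrosionRecord:
    """Hypothetical container for one database record (names are assumptions)."""
    material_class: str                      # e.g. "Fe Alloy", "HEA"
    composition: Dict[str, Optional[float]]  # element -> content, None if "NA"
    metric: str                              # e.g. "Epit", "Erp", "Tpit"
    value: Optional[float]
    chloride_conc: Optional[float] = None    # [Cl-]
    ph: Optional[float] = None
    test_temp_c: Optional[float] = None      # Test temp. (°C)
    test_solution: Optional[str] = None
    test_method: Optional[str] = None
    scan_rate_mv_s: Optional[float] = None   # Scan rate (mV/s)
    temp_scan_rate_c_min: Optional[float] = None  # Scan rate of temperature (°C/min)
    microstructure: Optional[str] = None
    oxide: Optional[str] = None
    heat_treatment: Optional[str] = None
    reference: Optional[str] = None
    comment: Optional[str] = None


def has_environment(rec: CorrosionRecord) -> bool:
    """True if temperature, pH, and chloride concentration are all present."""
    return None not in (rec.test_temp_c, rec.ph, rec.chloride_conc)


# Illustrative values only; not taken from the database.
rec = CorrosionRecord(
    "Fe Alloy", {"Fe": 70.0, "Cr": 18.0, "S": None}, "Epit", 0.30,
    chloride_conc=0.6, ph=7.0, test_temp_c=25.0,
)
```

Keeping unreported fields as None, rather than substituting a default, preserves the distinction between a missing value and a measured one.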

Table 2 The number of data records by electrochemical metric and material class. Here, “Fe” = Fe-based alloy, “Al” = Al-based alloy, “NiCrMo” = Ni-Cr-Mo ternary system alloy, “Other” = Other alloy.

The database is available online on both the Citrination website101 and figshare102. On the Citrination website, the database can be downloaded as a collection of physical information files (PIFs)103, downloaded as an Excel spreadsheet, or used with the Citrination platform to build ML models. On the figshare website, the database can be downloaded as an Excel spreadsheet, in which the six datasets are present as individual tabs. The Fe-based alloys are presented in the spreadsheet without color highlighting and with the “Material class” column as “Fe Alloy”. The rest of the data include HEAs (orange color highlighting, “Material class”: “HEA”), Ni-Cr-Mo ternary system of alloys (green color, “Material class”: “NiCrMo Alloy”), Al-based alloys (blue color, “Material class”: “Al Alloy”), and a category containing materials which did not fall into any of the four material classes (purple color, “Material class”: “Other”). An example section of the Erp dataset (without color highlighting) is shown in Table 3, and the other datasets follow the same or a very similar format. The reference number corresponds to the references in the final tab of the dataset. These data can be straightforwardly incorporated for use in an ML framework.

Table 3 A section of the Erp dataset. The full elemental composition includes the following elements: Fe, Cr, Ni, Mo, W, N, Nb, C, Si, Mn, Cu, P, S, Al, V, Ta, Re, Ce, Ti, Co, B, Mg, Y, and Gd.

Technical Validation

The data were collected and verified as accurate by a team of scientists familiar with all of the electrochemical metrics of corrosion reported here. Furthermore, additional screening was performed by examining outliers in various statistical plots of the datasets. Outliers were investigated and corrected if a transcription error or misinterpretation of the literature value was identified.

This database contains compositions ranging from pure elements to multi-component high entropy alloys and complex designed steels, with a total of 24 elements represented. To visualize the spread of the compositions in the data, Fig. 3 shows the configurational entropy, ΔS, for each of the six datasets. The ΔS of an alloy is given by104:

$$\Delta S=-R\sum_{i=1}^{n}x_{i}\ln x_{i}$$
(2)

where n is the number of elements, xi is the concentration of the i-th element, and R is the universal gas constant. Note that this quantity represents the maximum possible configurational entropy for a given alloy composition, where this maximum value would be reached in the case of an ideal solid solution105. Different regimes of configurational entropy include low entropy alloys (LEA) with ΔS < 1R, medium entropy alloys (MEA) in the range 1R ≤ ΔS ≤ 1.5R, and HEAs with ΔS > 1.5R104. We find that all of the classes of materials fall in a reasonable range, with the data points classified as HEAs in our database mostly above ΔS = 1.5R, the Al-based alloys falling in the LEA range (with some measurements on pure Al present at ΔS = 0), the Fe-based alloys spanning the LEA to MEA ranges, and the Ni-Cr-Mo ternary system of alloys falling in a cluster within the LEA range. These expected clusterings support the transcription accuracy of the elemental compositions collected in this database.
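Eq. (2) and the entropy regimes can be evaluated with a short Python sketch. It assumes that the input concentrations are atomic (mole) fractions, which are renormalized to sum to one, and it hard-codes the conventional boundaries of 1R and 1.5R between the LEA, MEA, and HEA regimes. The function names are hypothetical.

```python
import math


def config_entropy_over_r(fractions):
    """Return ΔS/R = -sum(x_i * ln x_i) for a list of atomic fractions.

    Fractions are renormalized; zero entries contribute nothing to the sum.
    """
    total = sum(fractions)
    if total <= 0:
        raise ValueError("composition must contain at least one element")
    xs = [x / total for x in fractions if x > 0]
    return sum(-x * math.log(x) for x in xs)


def entropy_class(ds_over_r):
    """Classify by ΔS/R using the assumed 1R and 1.5R boundaries."""
    if ds_over_r < 1.0:
        return "LEA"
    if ds_over_r <= 1.5:
        return "MEA"
    return "HEA"


# Illustrative compositions only
pure_metal = config_entropy_over_r([1.0])   # single element
equiatomic_3 = config_entropy_over_r([1, 1, 1])
equiatomic_5 = config_entropy_over_r([1, 1, 1, 1, 1])
```

For an equiatomic alloy of n elements the sum reduces to ln n, which is why five or more principal elements are needed to exceed 1.5R.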

Fig. 3

Configurational entropy of datasets. Here, we show the configurational entropy, ΔS, of the datasets reporting (a) Epit in volts vs. saturated calomel electrode (VSCE), (b) Erp in VSCE, (c) Ecrev in VSCE, (d) Ecorr in VSCE, (e) Tpit in degrees Celsius, and (f) Tcrev,max (maximum of measured Tcrev value) in degrees Celsius. The dashed lines in (b) show the guidelines for classifying a low entropy alloy (LEA), medium entropy alloy (MEA) or high entropy alloy (HEA) based on the ΔS value. The histogram to the right of each panel shows the distribution in measured values for each corrosion metric.

In Fig. 4, we show the temperature dependence of the database, linking data from sources that included measurements at multiple temperatures. The majority of the data were measured at room temperature; indeed, all measurements for Al-based alloys and HEAs were performed at room temperature. Still, among the measurements with multiple temperatures for a given composition, we see a general decrease in Epit, Erp, and Ecrev with increasing temperature, which follows the expected trend97, further supporting the validity of the dataset.

Fig. 4

Temperature distribution of datasets. Here, we show the temperature distribution for (a) Epit in VSCE, (b) Erp in VSCE, (c) Ecrev in VSCE. Dashed lines indicate measurements on the same material performed at varying temperatures.

Usage Notes

There are some cases in which the corrosion metric is given as a lower bound, rather than a real value. In particular, many of the entries in the Ni-Cr-Mo ternary dataset are reported as a lower bound for Erp because the corrosion resistance extended beyond the maximum potential of the electrochemical measurement. In order to use these data entries in a regression algorithm, they must be converted to a real number, with the method for conversion depending upon the intended application. Additionally, the content of some trace elements such as C, S, and P was often not provided by the original source. In those cases, the missing entries are reported as “NA”. Since the localized corrosion of several alloys depends on these trace elements, such as the role of sulfide in stainless steels, these fields may need to be replaced with estimated values. In some rare cases, the concentrations of some major elements such as Cr, Ni, and Mo were not provided by the original source and were therefore also reported as “NA”. These entries should be treated differently from the trace elements and should not be replaced with zeros. Detailed information on the specimens, when provided by the original source, was included in the “Comment” field. This information includes material grade, sample preparation details, surface finish, cleanliness, metallurgy methods, test area, and test location. It should be noted that the data are not uniformly distributed over material classes, elemental composition, or other features such as chloride concentration. Due to the nature of experimental corrosion data found in the literature, it is impossible to achieve a perfectly balanced set of data with respect to all features. Therefore, the imbalance in the database should be taken into account when incorporating the data into statistical models.
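The handling of lower bounds and “NA” entries described above could be sketched in Python as follows. The encoding of a lower bound as a string with a leading “>” is an assumption about the spreadsheet format, and the function names and the trace-element estimates are hypothetical; how a lower bound is finally converted to a number is left to the user.

```python
def parse_metric(raw):
    """Return (value, is_lower_bound); value is None for missing entries.

    Assumes lower bounds are written with a leading '>' (e.g. '>0.8').
    """
    text = str(raw).strip()
    if text == "" or text.upper() == "NA":
        return None, False
    is_lower_bound = text.startswith(">")
    number = float(text.lstrip(">= "))  # drop the bound marker, keep the sign
    return number, is_lower_bound


def fill_trace_elements(composition, trace_estimates):
    """Replace 'NA' by an estimate only where one is supplied.

    Elements without an estimate (e.g. major elements such as Cr) stay None
    so that they are never silently treated as zero.
    """
    filled = {}
    for element, content in composition.items():
        if content == "NA":
            filled[element] = trace_estimates.get(element)
        else:
            filled[element] = float(content)
    return filled


# Illustrative values only; not taken from the database.
bound = parse_metric(">0.8")
filled = fill_trace_elements({"Fe": 70.0, "Cr": "NA", "S": "NA"}, {"S": 0.01})
```

Records whose major elements remain None after this step would typically be excluded from a regression rather than imputed.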

Additional composition-dependent features could be generated for use in ML algorithms. For example, the Magpie featurization library could be used to generate features based on elemental and ionic properties, stoichiometry, and electronic structure106,107. Additionally, physics-based simulations could be performed to generate additional composition-dependent features. For example, the properties of alloys are influenced by the thermodynamic activity of each component, which can be calculated and used as inputs to the model. Furthermore, the severity of metal corrosion strongly depends on the bonding strengths of metal-metal and metal-oxygen bonds108. Specifically, alloying elements with strong metal-metal bonds, such as Mo, Nb, Ta, and W, do not corrode easily, so they can act as dissolution moderators or blockers. Elements with weak metal-metal but strong metal-oxygen bonds, such as Al, Ti, and Cr, can facilitate the formation of the surface passivation film. Therefore, the metal-metal and metal-oxygen bond strengths can be calculated6,7 as additional input features. Similarly, chloride ion adsorption susceptibility10,11 can be calculated because different alloying elements have different affinities toward chloride ions, which are the dominant aggressive species that can break down the alloy passivity.

Care should be taken when combining input variables acquired from different environments. For example, the pitting potential could vary with the solution aeration condition, particularly for alloys with a lower pitting potential than the corrosion potential. In naturally aerated solutions, the pitting potential of aluminum alloy 2024, for example, is close to its corrosion potential, Ecorr−1. However, in deaerated conditions, the corrosion potential strongly decreases (Ecorr−2) due to the absence of the oxygen reduction reaction, so the true pitting potential of this alloy can be revealed to be in the range of Ecorr−2 to Ecorr−1. Therefore, the solution aeration condition may be considered as a separate input variable. Additionally, some corrosion data were acquired in synthetic sea water, which has a much more complicated chemistry compared to other solution environments. For simplicity, a predictive model could use only pH and chloride ion concentration to approximate the solution chemistry. However, other species existing in this environment could also play a role during the corrosion measurement. Furthermore, some corrosion studies were performed in solutions saturated with oxygen or hydrogen, which may influence the redox environment and thus affect the measured electrochemical metrics.