Background & Summary

Traditional engineering alloys consist of a single principal element (e.g., Fe in steels and Ni in superalloys) and one or more solute elements present in much lower concentrations than the principal element. In contrast, multi-principal element alloys (MPEAs), also called complex concentrated alloys (CCAs), are a class of alloys in which no single element dominates the composition and three or more principal elements are present in significant amounts. The term high entropy alloy (HEA) is often used to describe MPEAs with five or more principal elements, and medium entropy alloy typically describes MPEAs with three or four principal elements. These alloys exhibit unique and extensively tunable properties compared to traditional single principal element alloys [1–11].

A primary driver of interest in MPEAs is the much larger compositional design space available for new alloy development compared to traditional alloys [12]. Assuming a palette of 30 elements to choose from, there are approximately 143,000 potential 5-component systems and 594,000 potential 6-component systems to explore, with countless compositions within each system to synthesize and characterize, often with unknown processing routes. This large design space presents a challenge, since examining each system experimentally is prohibitively expensive. As such, there has been recent interest in employing computational and data-driven methods to accelerate exploration of MPEA systems and identify promising candidates for experimental study [13,14].
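The system counts quoted above follow from counting unordered selections of elements from the palette. A minimal sketch of that arithmetic is given below; the function name `count_systems` is introduced here for illustration only.

```python
from math import comb

PALETTE_SIZE = 30  # number of candidate elements assumed in the text


def count_systems(palette_size, n_components):
    """Number of distinct alloy systems: unordered choices of
    n_components elements from a palette of palette_size elements."""
    return comb(palette_size, n_components)


for n in (5, 6):
    print(f"{n}-component systems: {count_systems(PALETTE_SIZE, n):,}")
# 5-component systems: 142,506
# 6-component systems: 593,775
```

These counts cover element combinations only; the number of distinct compositions within each system is far larger.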

Since the approach for MPEA design was defined in 2004 [1,2], there has been a growing body of work in the literature exploring these systems experimentally, with a focus on mechanical properties. An accurate accounting of high-quality data from these studies is necessary to aid further MPEA development, for example by identifying gaps in the design space, training machine learning models, and flagging outliers. Given the large interest in this class of alloys, data on new systems are rapidly being published, necessitating frequent database updates to maintain relevance. The updated MPEA mechanical properties database presented here combines data from previous reviews, makes corrections to data, and adds new data from articles published in 2019. The complete database will be hosted online in conjunction with a template to ensure routine updating and public availability of the database.

Methods

Extraction from literature

Two previous reviews of MPEA mechanical properties from 2018 [15,16] were used to populate the initial database. When combined, these reviews contained data on 296 unique MPEA compositions (614 composition-property combinations). The additional data extracted for this study included 334 unique MPEA compositions (931 composition-property combinations), more than doubling the existing data. During extraction and digitization of the initial database, various typos and extraction errors were identified and corrected. Once digitized, the initial database was combined with the newly extracted data and put into a single spreadsheet, as demonstrated in Table 1.

Table 1 The 25 MPEAs in the database with the highest yield strength (at room temperature), illustrating how a subset of the data are stored in each field.

To identify new sources of MPEA data, a keyword search for “high entropy alloy” was conducted on Web of Science (query performed October 2019) and the results were filtered for articles published in 2019. From this query, 136 articles were identified as potentially viable sources of experimental MPEA mechanical property data (i.e., articles reporting single-phase and multiphase materials with a minimum of three elements). Relevant mechanical property data, defined in detail in the Data Records section, were extracted from plots, tables, and text and entered into a tabular format. Data were extracted from plots using WebPlotDigitizer [17]. The newly extracted data were combined with the previously digitized data to complete the database. A high-level overview of the extraction workflow is provided in Fig. 1.

Fig. 1

The database generation workflow. Records are first extracted from various publications and input into a defined template format. Post-processing tools are used to identify outliers or erroneous data points. A detailed review of the number of records and properties contained in the resultant database is presented in Table 2.

Data from future publications

Researchers are encouraged to contribute any relevant data that are not present in the current review. Using the template provided on GitHub [18], data extraction and digitization can be performed by many groups asynchronously. This template is formatted such that data can be easily uploaded to Citrination, an online platform for materials data [19]. Upon notification, any data added to the database on Citrination will be verified for integrity by the authors. Researchers are also encouraged to upload their data to other open data resources and contact the authors directly for integration with the MPEA database.

Data Records

The database contains 1545 records from 265 articles. An individual record is defined as a unique combination of composition, property, temperature, and reference. For example, if two articles each measured the yield strength of HfNbTaTiZr at five temperatures, the number of records extracted is ten. On a per-record basis, this database more than doubles the amount of available data (an improvement of > 100%) compared to the data presented in the 2018 reviews.
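The record definition above can be illustrated with a short sketch that counts unique (composition, property, temperature, reference) combinations. The field names, reference labels, and temperatures below are placeholders chosen for illustration and are not taken from the database.

```python
from itertools import product


def count_records(rows):
    """Count unique (composition, property, temperature, reference) tuples."""
    return len({
        (r["composition"], r["property"], r["temperature_C"], r["reference"])
        for r in rows
    })


# Placeholder inputs: two hypothetical articles, five hypothetical temperatures.
references = ["article_A", "article_B"]
temperatures_C = [25, 400, 600, 800, 1000]

rows = [
    {"composition": "HfNbTaTiZr", "property": "YS",
     "temperature_C": t, "reference": ref}
    for ref, t in product(references, temperatures_C)
]
rows.append(dict(rows[0]))  # an exact duplicate does not create a new record

print(count_records(rows))  # 10
```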

The data in the database were extracted to best represent the data made available by the original authors. Often, not all properties in the database are reported for every record. For example, despite the importance of grain size and interstitial content for properties, particularly for refractory MPEAs, these features are missing from many articles. The data are made available on Figshare [20] and in various tabular formats on the project GitHub [18]. The data have also been digitized into Physical Information File (PIF) records, an open-source JSON-based schema for materials data [21]. PIF records are hosted on Citrination (https://citrination.com/datasets/190954) to provide easy access for data visualizations and machine learning. Each data source will be updated continuously as more data are extracted.

The database records consist of the following fields, as available:

  • Alloy composition: Normalized and alphabetized nominal alloy composition, in atomic percent. Validation and alphabetization were performed using the Pymatgen Composition module [22].

  • Microstructure: The experimentally observed phases (e.g., FCC, BCC, B2). Any phases that were not BCC, FCC, HCP, L12, B2, or Laves were labeled as “Sec.” (secondary) or “Other”.

  • Processing method: The conditions under which the alloy was synthesized. CAST = as-cast or directional casting. POWDER = gas atomization, mechanical alloying, sintering, spark plasma sintering, or vacuum hot pressing. WROUGHT = cold-rolled, hot-rolled, or hot-forged. ANNEAL = annealed, homogenized, or aged. OTHER = additive manufacturing, hot isostatic pressing, or severe plastic deformation.

  • Grain size (μm): The average grain size of the alloy.

  • Exp. Density (g/cm³): Experimentally reported density.

  • Calculated Density (g/cm³): Density estimated using the rule of mixtures (ROM): \(\rho = \sum_i x_i M_i / \sum_i x_i V_i\), where \(x_i\), \(M_i\), and \(V_i\) are the atomic fraction, molar mass, and molar volume of element \(i\). Elemental density values were obtained via Pymatgen [22].

  • Type of test: Indicator for whether mechanical testing was performed under tension (T) or compression (C).

  • Test temperature (°C): Temperature at which mechanical testing was performed.

  • YS (MPa): Measured yield strength.

  • UTS (MPa): Measured ultimate tensile strength (for tensile tests) or maximum compression strength (for compression tests).

  • HV: Experimentally reported Vickers hardness.

  • Elongation (%): Measured elongation at failure or maximum reported compression strain.

  • Elongation plastic (%): Measured plastic elongation or plastic compression strain.

  • Exp. Young’s modulus (GPa): The experimental Young’s modulus, when reported.

  • Calculated Young’s modulus (GPa): Young’s modulus calculated using the rule of mixtures (ROM) for single-phase solid solutions only: \(E = \sum_i x_i V_i E_i / \sum_i x_i V_i\), where \(x_i\), \(V_i\), and \(E_i\) are the atomic fraction, molar volume, and Young’s modulus of element \(i\). Elemental Young’s modulus values were obtained via Pymatgen [22].

  • O content (wppm): Measured oxygen content.

  • N content (wppm): Measured nitrogen content.

  • C content (wppm): Measured carbon content.
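The composition normalization and the two rule-of-mixtures (ROM) fields described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the database used the Pymatgen Composition module and Pymatgen elemental data, whereas this sketch substitutes a small regular-expression parser and a hand-entered table of approximate handbook values so that it has no dependencies. The function names and the `ELEMENT_DATA` table are hypothetical.

```python
import re

# Approximate handbook values, for illustration only (assumption):
# molar mass M (g/mol), molar volume V (cm^3/mol), Young's modulus E (GPa).
ELEMENT_DATA = {
    "Co": {"M": 58.93, "V": 6.67, "E": 209.0},
    "Cr": {"M": 52.00, "V": 7.23, "E": 279.0},
    "Fe": {"M": 55.85, "V": 7.09, "E": 211.0},
    "Ni": {"M": 58.69, "V": 6.59, "E": 200.0},
}


def parse_composition(formula):
    """Parse e.g. 'Fe2NiCoCr' into atomic fractions that sum to 1,
    with elements in alphabetical order. A missing amount means 1."""
    amounts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(amounts.values())
    return {el: amounts[el] / total for el in sorted(amounts)}


def rom_density(fractions, molar_mass, molar_volume):
    """rho = sum(x_i * M_i) / sum(x_i * V_i), in g/cm^3."""
    mass = sum(x * molar_mass[el] for el, x in fractions.items())
    volume = sum(x * molar_volume[el] for el, x in fractions.items())
    return mass / volume


def rom_youngs_modulus(fractions, molar_volume, modulus):
    """E = sum(x_i * V_i * E_i) / sum(x_i * V_i), in GPa.
    Intended for single-phase solid solutions only, as in the text."""
    weighted = sum(x * molar_volume[el] * modulus[el] for el, x in fractions.items())
    volume = sum(x * molar_volume[el] for el, x in fractions.items())
    return weighted / volume


x = parse_composition("NiFeCrCo")
M = {el: d["M"] for el, d in ELEMENT_DATA.items()}
V = {el: d["V"] for el, d in ELEMENT_DATA.items()}
E = {el: d["E"] for el, d in ELEMENT_DATA.items()}

print(list(x))  # ['Co', 'Cr', 'Fe', 'Ni']
print(f"ROM density: {rom_density(x, M, V):.2f} g/cm^3")
print(f"ROM Young's modulus: {rom_youngs_modulus(x, V, E):.0f} GPa")
```

Both estimates weight by molar volume, so an element with a larger molar volume contributes more to the modulus than its atomic fraction alone would suggest.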

A portion of the database (the 25 alloys with the highest yield strength at room temperature) is highlighted in Table 1 to provide an example for how data are stored in each field. This is only a subset of the properties collected in the database. Statistics on all properties extracted for the database are presented in Table 2.

Table 2 Statistics of the properties captured in the database.

Figure 2 illustrates the relationship between yield strength and elongation for compressive and tensile tests. Figure 3 illustrates the temperature dependence of yield strength across three microstructure classifications (single-phase BCC, single-phase FCC, and multiphase/other).

Fig. 2

Room-temperature yield strength values plotted against elongation. For visualization purposes, elongation results in compression have been assigned negative values in the plot. Points are colored by structural class: single-phase BCC (turquoise), single-phase FCC (gold), and other (magenta). “Other” is defined as any report of an MPEA that is either multiphase, or single-phase but not FCC or BCC.

Fig. 3

Yield strength as a function of temperature for three classes of HEAs: (a) single-phase BCC (turquoise), (b) single-phase FCC (gold), and (c) other (magenta). “Other” is defined as any report of an MPEA that is either multiphase, or single-phase but not FCC or BCC. The trend may suggest that BCC MPEAs have better high-temperature strength, but it also highlights the lack of data available for FCC MPEAs.

Technical Validation

Review by domain experts

The data were collected, processed, and verified for accuracy by a team familiar with MPEAs and their properties. This domain knowledge was useful during compilation and formatting of the dataset.

Extreme value identification

During the processing of the database, various statistical plots were generated to assist in the identification of outliers and the subsequent removal or correction of inaccurate data. Figure 4 is provided as an example of the outlier identification process. Box plots were generated for properties of interest and extreme values in the tails of the distribution were investigated. Extreme values that could not be verified were either removed or corrected.
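A screening step of this kind can be sketched with the conventional box-plot rule, in which points lying more than 1.5 interquartile ranges beyond the quartiles are flagged for manual review. The 1.5 factor, the function names, and the sample hardness values below are assumptions made for illustration; the text does not state the threshold used, and the values are not taken from the database. The helper `gpa_to_hv` uses the standard relation 1 HV = 1 kgf/mm² ≈ 9.80665 MPa, which is relevant to units errors such as the one shown in Fig. 4.

```python
from statistics import quantiles


def flag_extremes(values, whisker=1.5):
    """Return the values outside the box-plot whiskers
    (quartile -/+ whisker * IQR), in their original order."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


def gpa_to_hv(hardness_gpa):
    """Convert hardness from GPa to HV (kgf/mm^2)."""
    return hardness_gpa * 1000.0 / 9.80665


# Made-up hardness column (HV); the last entry mimics a value
# that was mistakenly recorded in GPa.
hardness_hv = [320, 410, 450, 480, 500, 520, 560, 610, 5.2]

suspects = flag_extremes(hardness_hv)
print(suspects)  # [5.2]
print(f"{gpa_to_hv(suspects[0]):.0f} HV if the entry was in GPa")
```

Flagging only nominates values for inspection; whether a value is corrected or removed still depends on checking the original source.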

Fig. 4

Workflow associated with extreme property value verification and (if necessary) correction. Step 1: Box plots were generated for properties of interest (e.g., alloy hardness) and the sources of any extreme values were investigated. Step 2: In this case, the inaccuracy was units-related: the value was recorded in units of GPa, whereas the database expected units of HV. Step 3: The originally recorded value was replaced with the value in the correct units. Original source reproduced with data from Jumaev et al. [26].

Usage Notes

This expanded dataset on MPEAs is intended to guide experiments for future alloy development. As shown in Figs. 2 and 3, the dataset can produce informative visualizations to guide research efforts. Each record can be accessed programmatically via the Citrination API [23]. In conjunction with traditional Python data processing packages (e.g., pandas), the dataset can serve as training data for machine learning applications. To ensure data quality, each record is associated with a digital object identifier (DOI) link to the original source. To improve the predictive capabilities of subsequent machine learning models, researchers are encouraged to contribute to this database by adding new data as they are generated.
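As a starting point for working with a tabular export of the database, the sketch below selects the records with the highest room-temperature yield strength, the same kind of subset shown in Table 1. It uses only the Python standard library so that it runs without extra dependencies; the same filtering carries over directly to pandas. The column names, the 20–30 °C definition of room temperature, and the sample rows are all assumptions made for illustration and do not reproduce the actual file layout or data.

```python
import csv
import io

# Made-up rows with hypothetical column names (assumption); blank cells
# stand for properties that were not reported in the source article.
SAMPLE_CSV = """composition,test_type,test_temperature_C,ys_MPa
Alloy-A,C,25,1200
Alloy-B,T,25,
Alloy-C,C,800,650
Alloy-D,T,23,400
"""


def top_room_temperature_ys(csv_file, n=25, room_temp_range=(20.0, 30.0)):
    """Return up to n (composition, YS) pairs measured at room
    temperature, sorted from highest to lowest yield strength."""
    low, high = room_temp_range
    selected = []
    for row in csv.DictReader(csv_file):
        if not row["ys_MPa"] or not row["test_temperature_C"]:
            continue  # property or temperature not reported for this record
        if low <= float(row["test_temperature_C"]) <= high:
            selected.append((row["composition"], float(row["ys_MPa"])))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)[:n]


top = top_room_temperature_ys(io.StringIO(SAMPLE_CSV), n=2)
print(top)  # [('Alloy-A', 1200.0), ('Alloy-D', 400.0)]
```

To use a real export, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match the downloaded file.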