A RESTful API for exchanging Materials Data in the AFLOWLIB.org consortium

The continued advancement of science depends on shared and reproducible data. In the field of computational materials science and rational materials design this entails the construction of large open databases of materials properties. To this end, an Application Program Interface (API) following REST principles is introduced for the AFLOWLIB.org materials data repositories consortium. AUIDs (Aflowlib Unique IDentifier) and AURLs (Aflowlib Uniform Resource Locator) are assigned to the database resources according to a well-defined protocol described herein, which enables the client to access, through appropriate queries, the desired data for post-processing. This introduces a new level of openness into the AFLOWLIB repository, allowing the community to construct high-level work-flows and tools exploiting its rich data set of calculated structural, thermodynamic, and electronic properties. Furthermore, federating these tools would open the door to collaborative investigation of the data by an unprecedented extended community of users to accelerate the advancement of computational materials design and development.


I. INTRODUCTION
Data-driven materials science has gained considerable traction over the last decade or so. This is due to the confluence of three key factors: 1) Improved computational methods and tools; 2) Greater computational power; and 3) Heightened awareness of the power of extensive databases in science [1]. The recent Materials Genome Initiative (MGI) [1,2] reflects the recognition that many important social and economic challenges of the 21st century could be solved or mitigated by advanced materials. Computational materials science currently presents the most promising path to the resolution of these challenges.
The first and second factors above are epitomized by high-throughput computation of materials properties by ab initio methods, which is the foundation of an effective approach to materials design and discovery [3][4][5][6][7][8][9][10][11][12]. Recently, the software used both to manage the calculation work-flow and to perform the analyses has trended toward more public and user-friendly frameworks. The emphasis is increasingly on portability and sharing of tools and data [13][14][15]. Similar to the effort presented here, the Materials-Project [16] has been providing open access to its database of computed materials properties through a RESTful API and a python library enabling ad-hoc applications [17]. Other examples of online databases for materials properties include that being implemented by the Engineering Virtual Organization for Cyber Design (EVOCD) [18], which contains a repository of experimental data, materials constants and computational tools for use in Integrated Computational Material Engineering (ICME). The future advance of computational materials science will rely on interoperable and federatable tools and databases as much as on the quantities and types of data being produced.
A principle of high-throughput materials science is that one does not know a priori where the value of the data lies for any specific application. Trends and insights are deduced a posteriori. This requires efficient interfaces to interrogate available data on various levels. We have developed a simple WEB-based API to greatly improve the accessibility and utility of the AFLOWLIB database [14] to the scientific community. Through it, the client can access calculated physical properties (thermodynamic, crystallographic, or mechanical properties), as well as simulation provenance and runtime properties of the included systems. The data may be used directly (e.g., to browse a class of materials with a desired property) or integrated into higher level work-flows. The interface also allows for the sharing of updates of data used in previous published works, e.g., previously calculated alloy phase diagrams [19][20][21][22][23][24][25][26][27][28][29][30][31], thus the database can be expanded systematically.
The rest of this paper is organized as follows. The AFLOWLIB libraries are presented in Section II. A new materials identifier constructed to navigate the libraries is introduced in Section III. The data provenance and access format schemes are explained in Sections IV and V. The access syntax and its various options are described in Section VI. A few examples for the use of the API and two computer scripts are given in Section VII. The strategy for updates is mentioned in Section VIII. A brief conclusion is included in Section IX.

II. THE AFLOWLIB LIBRARIES MULTI-LAYERED STRUCTURE
At its core, AFLOWLIB consists of a coordinated set of libraries of first-principles data describing thermodynamic, structural, and other materials properties of alloy systems (Figure 1). From the top, it is administered through a large SQL database [32] organized in layers (reminiscent of other more developed computer interfaces, e.g., IEEE-POSIX [33]). Each layer consists of searchable entries called aflowlib.out. The SQL interface is called the zero-layer (Figure 2) because of its structural immanence.
In this layered organization, each aflowlib.out can be the child of an aflowlib.out-parent or the parent of an aflowlib.out-child. Each aflowlib.out is identified by a name and an address. The first is the AUID (Aflowlib Unique IDentifier, $auid), while the second is the AURL (Aflowlib Uniform Resource Locator, $aurl). The structure is summarized in Figure 1.
The current implementation of AFLOWLIB includes three layers that can be navigated using control keywords and absolute paths.
• Project-layer. This layer contains information about the project to which the data belongs. For example, for searches in alloys, the project's AURL could be of the type server:AFLOWDATA/ followed by LIB1_RAW, LIB1_LIB, LIB2_RAW, LIB2_LIB, LIB3_RAW, or LIB3_LIB corresponding to the single element, binary and ternary libraries of post- or pre-processed data respectively. For HTTP access [34], the AURL would be translated from $aurl=server:/directory/ into $web=http://server/directory/. Other translations are considered for future implementations. An example of aflowlib.out for the project-layer is depicted in Figure 3(a) (with server=aflowlib.duke.edu).
• Set-layer. This layer contains information about one or more systems calculated in one or more different configurations (e.g., various structural prototypes, different unit cells required for phonon calculations using the finite difference method, etc.). An example of aflowlib.out for the set-layer is depicted in Figure 3(b).
To facilitate reproducibility, species making up the $aurl might include a subscript or postfix indicating the pseudopotential type. For example, in searches in this layer where the user is interested in the Ag-Ti system, the AURL could be of the type $aurl=server:AFLOWDATA/LIB2_RAW/AgTi_sv/. Here the "_sv" in Ti_sv indicates the "sv" type of pseudopotential in the quantum code used for the calculation, in this case VASP [35]. Other identifiers may be used to indicate pseudopotentials used in, for example, Quantum Espresso (QE) [36] or for potentials coming from other ultrasoft pseudopotential libraries [37].
• Calculation-layer. This layer contains information about one system calculated in one particular configuration (e.g. AgTi in configuration $prototype=66 (C11_b [38,39])). For entries in this layer, the AURL could be of the type $aurl=server:AFLOWDATA/LIB2_RAW/AgTi_sv/66/. This $aurl points to an aflowlib.out describing the properties of this specific structure. An example of aflowlib.out for the calculation-layer is depicted in Figure 3(c). The calculated geometry (here "66") is one of the strings comprising the AFLOW prototypes' database.
The prototypes' database acts as a look-up table to common Strukturbericht designations [38,39] and may also refer to geometries included in the Inorganic Crystal Structure Database (ICSD) [40][41][42]. In the latter case, a legitimate entry would be something like .../AgTi_sv/ICSD_58369.AB/, where the structure composed of Ag and Ti is calculated in the configuration defined by $prototype=ICSD_58369, and the postfix .AB indicates the species ordering: e.g. .AB or .BA for Ag and Ti in AB or BA positions, respectively. For ternaries the postfix is a combination of A,B,C, where indices can be repeated to create binaries (e.g. AAB). Furthermore, the nonelement X can be included to indicate vacancies in sublattices (e.g. from Heusler to half-Heusler systems [43]). Valid $prototype choices also include strings of the type "f123" following the enumeration scheme of Hart and Forcade [44,45].
Here the characters f, b, h, s indicate fcc, bcc, hcp, and simple cubic structures respectively. The number that follows designates the enumerated prototype. The complete list of structure designations can be accessed with the command "aflow --protos" or by consulting the online links. The options are illustrated in the AFLOW manual [13].
It is important to note that while the operations of concatenating strings to $aurl are reminiscent of a UNIX computer file-system, the final $aurl is not necessarily served as a directory or a file. In fact, a computer daemon (a process running in the background) dynamically serves multiple file-system directories for the same HTTP-translated $aurl. The implementation was so chosen in order to better use the calculation-layer data belonging to different project-layers and to allow an $aurl to contain multiple servers: e.g. $aurl=server1,server2:somewhere1/AgTi_sv/,somewhere2/AgTi_sv/. In future updates of the AFLOWLIB.org API, the latter collection will allow users to download AgTi's data from server1 and server2 concurrently and transparently.
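The translation from $aurl to $web and the layer-by-layer concatenation described above can be sketched in a few lines of Python. This is only an illustration of the stated rule ($aurl=server:/directory/ becomes $web=http://server/directory/); the function names aurl_to_web and child_aurl are hypothetical and not part of AFLOW, and multi-server AURLs are not handled.

```python
def aurl_to_web(aurl: str) -> str:
    """Translate 'server:/directory/' (or 'server:directory/') into 'http://server/directory/'."""
    server, sep, directory = aurl.partition(":")
    if not sep or not server:
        raise ValueError(f"not an AURL: {aurl!r}")
    # The text shows AURLs both with and without a leading slash after the colon.
    return f"http://{server}/{directory.lstrip('/')}"


def child_aurl(parent: str, name: str) -> str:
    """Append one component (set or calculation name) to an AURL, keeping the trailing slash."""
    return f"{parent.rstrip('/')}/{name.strip('/')}/"


# Walk down the three layers for the Ag-Ti example used in the text.
project = "aflowlib.duke.edu:AFLOWDATA/LIB2_RAW/"   # project-layer
system = child_aurl(project, "AgTi_sv")             # set-layer
calculation = child_aurl(system, "66")              # calculation-layer
print(aurl_to_web(calculation))
```

Because the served $aurl is not necessarily a real directory, such helpers should be treated as pure string manipulation; the server decides what is returned.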

III. MATERIALS OBJECT IDENTIFIER: AFLOWLIB UNIQUE IDENTIFIER
Following the spirit of the Digital Object Identifier (DOI) [46], every entry in our database is identified by an alpha-numeric string called an AUID (Aflowlib Unique IDentifier, $auid). In future publications, we will use the AUID as a permanent string pointing to the correct AURL, which might get relocated across servers. By giving a short AUID, the user can identify 1) WEB locations for the deliverables, 2) appropriate points of contact (name, emails or other means of identifying corresponding authors), and 3) copyright and publication date. We have chosen this linked approach so that future databases and expansions of the current ones (e.g., locations, servers, personnel, etc.) will not affect the retrieval of the objects. A simple WEB-FORM will be offered to the community to insert AUIDs and to retrieve the information. The AUID is constructed from a 64-bit CRC check-sum (cyclic redundancy check [47]) of concatenated input and output files. For example, the bcc Ag-Ti structure 66 described above would be accessed using the AUID "$auid=aflow:448e178e19e9e973". The negligible probability of duplicates with this checksum length, and the option of adding a "void character" to the input files (a grain of salt), guarantee the uniqueness of the AUID and its expandability for many years to come. We welcome other groups to set up versions of AUIDs and to communicate with us the necessary identification data so that our consortium can offer WEB links to entries beyond the ones prepared by its current members.
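To make the construction concrete, the following sketch builds an AUID-like string from a 64-bit CRC of concatenated input and output bytes. The text does not state which CRC-64 variant, byte order, or file ordering AFLOW uses, so the ECMA-182 polynomial, the zero initial value, and the function names crc64 and make_auid are all assumptions made for illustration; the output will not reproduce real AUIDs.

```python
POLY = 0x42F0E1EBA9EA3693   # CRC-64/ECMA-182 polynomial (assumed variant)
MASK = 0xFFFFFFFFFFFFFFFF   # keep the register at 64 bits
TOP_BIT = 0x8000000000000000


def crc64(data: bytes) -> int:
    """Bitwise, most-significant-bit-first CRC-64 with zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & TOP_BIT:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def make_auid(input_files: bytes, output_files: bytes, salt: bytes = b"") -> str:
    """Return 'aflow:' plus 16 hex digits; the salt is appended to the input (hypothetical layout)."""
    return f"aflow:{crc64(input_files + salt + output_files):016x}"


# Placeholder contents, not real AFLOW files.
print(make_auid(b"aflow.in contents", b"output contents"))
print(make_auid(b"aflow.in contents", b"output contents", salt=b"\x00"))
```

If two distinct entries ever produced the same identifier, changing the salt byte would yield a different checksum, which is the role the "grain of salt" plays in the description above.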

IV. DATA PROVENANCE
Reproducibility is a key tenet of the scientific method, but replication occurs rarely. As experiments become more complex and costly this problem will only worsen. A particular advantage of simulated experiments is that they are uniquely suited for reproducibility. Imagine a scenario in which the original experimenter publishes his/her executable workflow with the necessary input data. Then, anyone wishing to validate or build upon those results does so with a single command on a system with access to the appropriate software. This could be done so that all the software tools, input and output data are maintained remotely, lowering cost, improving ecological sustainability (saving electricity) and increasing collaboration.
In reality, such a framework is not yet available and its implementation is not so straightforward. Despite the potential advantages of computational science in the realm of reproducibility, the crucial data provenance is often completely neglected. An overall culturally-driven disincentive to the actual reproduction of experiments is pervasive. Fortunately there are indications that this problem is being recognized and addressed [2]. AFLOWLIB data is reproducibly structured: the workflow and input parameters are defined by the AFLOW software and a single master input file aflow.in, leading to easily reproducible input parameters. In the case of Density Functional Theory-based simulations, this includes essential calculation parameters such as the k-mesh density, the energy cut-off, the exchange correlation potential, the ab initio software used, and the geometry of the structures. Combining the parameters contained in this file with other provenance related data (calculation time, memory, code, etc.), the AFLOW-API provides curated and reproducible data.
Data access can be obtained at any level through the API with the appropriate AURL strings, currently translated into WEB inquiries. WEB forms such as the one shown in Figure 2 allow the user to search within a project for data fitting specified criteria. Alternatively, access through the API is supported by several data formats: "HTML" [34], "JSON" [48], "DUMP", "PHP" [49], "TEXT", and "NONE". The format option is intended for use on the level of the entry, returning the whole aflowlib.out. For any given property in the set of keywords contained in the entry, the mode is currently to return a simple byte sequence with no formatting. Attempting to format a single property returns the full property set in the specified format. For example, the AURL $aurl/?density returns the density of the specified structure.
More specifically, for $aurl=aflowlib.duke.edu:AFLOWDATA/LIB2_RAW/AgTi_sv/66/?density it returns "6.54346".
"HTML" is primarily for interactive use where keywords and files can be promptly explored with web browsers; "JSON" and "PHP" are valid language syntaxes to facilitate the data access by programmed codes; "DUMP" allows the user to access the data by his/her own method; "TEXT" returns the entry in a single line with the keywords separated by "|" so that other databases can be built on top of AFLOWLIB.org; and "NONE" may be useful as a method to test the existence of an AURL or for debug purposes.
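A single-property query and a "TEXT"-style entry can be handled with a few lines of Python. The sketch below assumes that each field of the "|"-separated line has the form keyword=value (the text only states the "|" separator), and the helper names property_url, fetch_property and parse_text_entry are hypothetical. The network call is shown but not executed here.

```python
from urllib.request import urlopen


def property_url(web: str, keyword: str) -> str:
    """Build '$web/?keyword' for a single-property query."""
    return f"{web.rstrip('/')}/?{keyword}"


def fetch_property(web: str, keyword: str, timeout: float = 30.0) -> str:
    """Download the unformatted byte sequence returned for one keyword."""
    with urlopen(property_url(web, keyword), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def parse_text_entry(line: str) -> dict:
    """Split a '|'-separated entry into a dict, assuming 'keyword=value' fields."""
    entry = {}
    for field in line.split("|"):
        key, sep, value = field.partition("=")
        if sep:  # skip fields without '='
            entry[key.strip()] = value.strip()
    return entry


web = "http://aflowlib.duke.edu/AFLOWDATA/LIB2_RAW/AgTi_sv/66/"
print(property_url(web, "density"))

# Hypothetical two-field line, only to exercise the parser.
sample = "aurl=aflowlib.duke.edu:AFLOWDATA/LIB2_RAW/AgTi_sv/66 | density=6.54346"
print(float(parse_text_entry(sample)["density"]))
```

Values are kept as strings in the dictionary because the type of each keyword differs; conversion is left to the caller.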
Clear attribution of contributed data is essential for the development of distributed databases comprising inputs from a wide network of contributors. AFLOW facilitates attribution with the AUID, a unique and persistent identifier, that includes the author, laboratory, group, and affiliation as data entry fields. The shared content in the database is simple to reduce or augment according to a contributor's preference and the attribution is ensured by the unique identifier and contributor labels that are accessible with the AFLOWLIB-API.
The structure of the AFLOWLIB database is federated: autonomous members of the consortium (with distinct geographical locations and affiliations) are able to transparently contribute to a composite database, preserving ownership and claim over the substance of their data. The underlying meta-data schema of the contributed data are consistent by production, to ensure the clarity and searchability of the composite database. A contributor to the consortium begins by downloading the latest version of the AFLOW binary (as of writing this paper this is version 30825) and interfacing with a quantum code. AFLOW is currently configured to run VASP automatically. Pre- and post-processing is functional for both VASP and Quantum Espresso so that agnostic standardization of inputs and outputs between the two codes can be obtained.
The layered and reentrant structure of the AFLOWLIB API allows the manipulation of data coming from different sources and databases, e.g. the Materials Project [16,17]. In order to facilitate this future extension, special keywords are introduced here to identify the source ($aurl/?data_source) and the translated syntax of the information ($aurl/?data_language). We foresee a global common interface where users can approach heterogeneous data and applications to leverage the efforts of different consortia. Note that in this scenario, due diligence is required to recognize the authorship of the original work, and not merely the serving database.

VI. TABLE OF PROPERTIES AND API KEYWORDS
This section includes the keywords currently present in the database: description, type, inclusion policy and the AFLOWLIB syntax for retrieval. The list is divided into mandatory, optional control and optional materials keywords. The mandatory keywords must be present in every entry at all layers of the database. Some of the optional control keywords appear at the projects and systems levels while others appear at the calculations level. The optional materials keywords usually appear just at the calculations layer, and not all of them are present in all of the entries. Each entry begins with the AURL and AUID keywords, denoted by the syntax words $aurl and $auid, respectively.
• auid -Description. "AFLOWLIB Unique Identifier" for the entry, AUID, which can be used as a publishable object identifier, following the spirit of the DOI foundation [46] (see Section III).
-Type. string.
• aurl -Description. "AFLOWLIB Uniform Resource Locator" returns the AURL of the entry. The web server is separated from the web directory with ":". This tautological keyword, aurl returning itself, is useful for debug and hyperlinking purposes.
• aflowlib_entries (aflowlib_entries_number) -Description. For project- and set-layer entries (see Figure 1), aflowlib_entries lists the available subentries which are associated with the $aurl of the subdirectories. By parsing $aurl/?aflowlib_entries (containing $aurl/aflowlib_entries_number entries) the user finds the further locations to interrogate.
• aflowlib_date (aflowlib_version) -Description. Returns the date (version) of the AFLOW post-processor which generated the entry for the library. This entry is useful for debugging and regression purposes.
• aflow_version -Description. Returns the version number of AFLOW used to perform the calculation. This entry is useful for debugging and regression purposes.
• author -Description. Returns the name (not necessarily an individual) and affiliation associated with authorship of the data. Multiple entries are separated by commas. Spaces are substituted with "_" to aid parsing.
• corresponding -Description. Returns the name (not necessarily an individual) and affiliation associated with the data origin concerning correspondence about data. Multiple entries are separated by commas. Spaces are substituted with "_" to aid parsing.
• data_source, data_language -Description. As mentioned in the text, the layered structure of AFLOWLIB is well suited to serving and translating data presented in other open databases. If this is the case, the source and language (API) of the data are given with these two keywords. When using non-AFLOWLIB data, due diligence is required to recognize the authorship of the original work, and not merely the serving database.
• loop -Description. Informs the user of the type of post-processing that was performed.
• node_CPU_Cores, node_CPU_MHz, node_CPU_Model, node_RAM_GB -Description. Information about the node/cluster where the calculation was performed. Number of cores, speed, model, and total memory accessible to the calculation.
-Units. MHz for speed, gigabytes for RAM.
• sponsor -Description. Returns information about funding agencies and other sponsors for the data. Multiple entries are separated by commas. Spaces are substituted with "_" to aid parsing.
-Tolerance. Calculations of lattices (Brillouin zones), prototypes, and symmetries (point/factor/space groups) are based on different algorithms and require different sets of tolerances. To guarantee self-consistency of the results, initial tolerances are set to very stringent values (e.g., 10^-4 % for distances, 10^-2 % for angles, 10^-4 % for spectral radii of mapping matrices, etc.) and slowly increased alternatingly (by a factor of 2) until self-consistency is found amongst geometrical descriptors. The final tolerances are usually of the order of ∼0.5% for distances and ∼1% for angles.
• code -Description. Returns the software name and version used to perform the simulation.
• composition -Description. Returns a comma delimited composition description of the structure entry in the calculated cell.
• compound -Description. Similar to composition. Returns the composition description of the compound in the calculated cell.
• density -Description. Returns the mass density.
• dft_type -Description. Returns information about the pseudopotential type, the exchange correlation functional used (normal or hybrid) and use of GW.
• eentropy_cell (eentropy_atom) -Description. Returns the electronic entropy of the unit cell used to converge the ab initio calculation (smearing).
-Units. Natural units of the $code, e.g., eV or Ry (eV/atom or Ry/atom) if the calculations were performed with VASP [35] or QE [36], respectively.
• Egap -Description. Band gap calculated with the approximations and pseudopotentials described by other keywords.
• Egap_type -Description. Given a band gap, this keyword describes if the system is a metal, a semi-metal, an insulator with direct or indirect band gap.
• energy_cutoff -Description. Set of energy cut-offs used during the various steps of the calculations.
• enthalpy_cell (enthalpy_atom) -Description. Returns the enthalpy of the system of the unit cell, H = E + PV (enthalpy per atom: the value of enthalpy_cell/N).
-Units. Natural units of the $code, e.g., eV or Ry (eV/atom or Ry/atom) if the calculations were performed with VASP [35] or QE [36], respectively.
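• enthalpy_formation_cell (enthalpy_formation_atom) -Description. Returns the formation enthalpy ΔH_F per unit cell (ΔH_F^atomic per atom). For compounds A_{N_A}B_{N_B}··· with N_A + N_B + ··· = N atoms per cell, this is defined as the enthalpy of the cell minus the enthalpies of its pure constituents taken in the same amounts (N_A atoms of A, N_B atoms of B, ···); the per-atom value is ΔH_F/N.
-Type. number.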
-Units. Natural units of the $code, e.g., eV or Ry (eV/atom or Ry/atom) if the calculations were performed with VASP [35] or QE [36], respectively.
• entropic_temperature -Description. Returns the entropic temperature as defined in Ref. [3,7] for the structure. The analysis of formation enthalpy is, by itself, insufficient to compare alloy stability at different concentrations and their resilience toward high-temperature disorder. The formation enthalpy represents the ordering-strength of a mixture A_{x_A}B_{x_B}C_{x_C}··· against decomposition into its pure constituents at the appropriate concentrations x_A, x_B, x_C, ··· (ΔH_F is negative for compound forming systems). However, it does not contain information about its resilience against disorder, which is captured by the entropy of the system. To quantify this resilience we define the entropic temperature for each compound as T_s(A_{x_A}B_{x_B}C_{x_C}···) = ΔH_F(A_{x_A}B_{x_B}C_{x_C}···) / (k_B Σ_i x_i log x_i), where the sign is chosen so that a positive temperature is needed for competing against compound stability. This definition assumes an ideal scenario [3] where the entropy is S({x_i}) = −k_B Σ_i x_i log x_i. T_s is a concentration-maximized formation enthalpy weighted by the inverse of its entropic contribution. Its maximum T_s = max_phases[T_s(phases)] represents the deviation of a system convex-hull from the purely entropic free-energy hull, −TS(x), and hence the ability of its ordered phases to resist the temperature-driven deterioration into a disordered mixture exclusively promoted by configurational-entropy.
• files -Description. Provides access to the input and output files used in the simulation (provenance data).
-Description. Once the "files" list has been parsed, each file can be accessed with $aurl/file (note no "?" for accessing individual files).
• forces -Description. Final quantum mechanical forces (F_i, F_j, F_k) in the notation of the code.
-Type. Triplets (number,number,number) separated by ";" for each atom in the unit cell.
-Tolerance. See entry Bravais_lattice_orig or discussion about tolerances.
• kpoints -Description. Set of k-point meshes uniquely identifying the various steps of the calculations, e.g. relaxation, static and electronic band structure (specifying the k-space symmetry points of the structure).
-Type. Set of numbers and strings separated by "," and ";".
-Request syntax. $aurl/?kpoints. The fourth field specifies the effective on-site exchange interaction parameters ({J}, one number for each species separated by ","). Although more compact, the convention is similar to the VASP notation [35].
• natoms -Description. Returns the number of atoms in the unit cell of the structure entry. The number can be non-integer if partial occupation is considered within appropriate approximations.
-Tolerance. See entry Bravais_lattice_orig or discussion about tolerances.
• positions_cartesian -Description. Final Cartesian positions (x_i, x_j, x_k) in the notation of the code.
-Type. Triplets (number,number,number) separated by ";" for each atom in the unit cell.
• pressure -Description. Returns the external pressure selected for the simulation.
• prototype -Description. Returns the AFLOW unrelaxed prototype which was used for the calculation. The list can be accessed with the command "aflow --protos" or by consulting the online links. The options are illustrated in the AFLOW manual [13]. Note that during the calculation, unstable structures can deform and lead to different relaxed configurations. It is thus imperative for the user to make an elaborate analysis of the final structure to pinpoint the right prototype to report. Differences in Bravais lattices, Pearson symbol, space groups, for the _orig and _relax versions are extremely useful for this task.
-Tolerance. See entry Bravais_lattice_orig or discussion about tolerances.
• PV_cell (PV_atom) -Description. Pressure multiplied by volume of the unit cell (of the atom).
-Units. Natural units of the $code, e.g., eV or Ry (eV/atom or Ry/atom) if the calculations were performed with VASP [35] or QE [36], respectively.
• sg (sg2) -Description. Evolution of the space group of the compound [50,51]. The first, second and third string represent space group name/number before the first, after the first, and after the last relaxation of the calculation.
• spinD -Description. For spin polarized calculations, the spin decomposition over the atoms of the cell.
• spinD_magmom_orig -Description. For spin polarized calculations, string containing the values used to initialize the magnetic state for the ab initio calculation.
-Type. String containing the instruction passed to the ab initio code with spaces substituted by "_".
-Units. Natural units of the $code.
• spinF -Description. For spin polarized calculations, the magnetization of the cell at the Fermi level.
• stoichiometry -Description. Similar to composition, returns a comma delimited stoichiometry description of the structure entry in the calculated cell.
• volume_cell (volume_atom) -Description. Returns the volume of the unit cell (per atom in the unit cell).
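As a worked illustration of the thermodynamic keywords above, the sketch below recomputes a formation enthalpy per atom and an entropic temperature from raw numbers. It assumes the ideal-mixing form T_s = ΔH_F / (k_B Σ_i x_i log x_i) with ΔH_F per atom in eV, which is consistent with the sign convention described for entropic_temperature; the function names, the reference enthalpies and the numerical inputs are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def formation_enthalpy_atom(enthalpy_cell, counts, reference_enthalpies):
    """Cell enthalpy minus pure-constituent enthalpies, divided by the number of atoms.

    counts maps species to the number of atoms in the cell; reference_enthalpies
    maps species to the enthalpy per atom of the pure element (assumed inputs, in eV).
    """
    n_atoms = sum(counts.values())
    reference = sum(n * reference_enthalpies[s] for s, n in counts.items())
    return (enthalpy_cell - reference) / n_atoms


def entropic_temperature(dh_atom, concentrations):
    """T_s in kelvin from a per-atom formation enthalpy (eV) and the concentrations x_i."""
    mixing = sum(x * math.log(x) for x in concentrations if x > 0.0)
    if mixing == 0.0:
        raise ValueError("a pure element has no configurational entropy")
    # Both numerator and denominator are negative for a compound-forming system.
    return dh_atom / (K_B * mixing)


# Made-up numbers for an equiatomic AB compound with two atoms per cell.
dh = formation_enthalpy_atom(-10.0, {"A": 1, "B": 1}, {"A": -4.0, "B": -5.0})
print(dh)
print(entropic_temperature(dh, [0.5, 0.5]))
```

With a negative formation enthalpy the returned temperature is positive, matching the statement that a positive temperature is needed to compete against compound stability.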

VII. EXAMPLES
A. Generating a free-energy zero temperature convex hull: OsTc
In this example we introduce the steps to generate a binary free-energy convex hull at zero temperature. As an example, we choose the system OsTc [7,23,62], and we illustrate the logical steps for obtaining it. The user should prepare his/her own computer code to download and analyze the data as suggested.
2. The user downloads and parses the query $web/?keywords. Being in a project-layer, a better and faster alternative is to download the entries' number and type with the queries $web/?aflowlib_entries and $web/?aflowlib_entries_number.
The user then parses $web/?aflowlib_entries and finds the string Os_pvTc_pv associated with the requested OsTc free-energy zero temperature convex hull.
5. Finally, the user collects the free-energies and plots the convex hull as depicted in Figure 4.
6. The whole process can be performed with the AFLOW code. The command "aflow --alloy OsTc --update --server=aflowlib.org" connects to the appropriate server, downloads the information, calculates the free-energy curve and prepares a PDF document with the appropriate information and hyperlinks to the individual entries. See the AFLOW literature for more options [13]. The user still has to double check the final relaxed structure prototypes. This is performed with a combination of: $web/Os_pvTc_pv/$entry_i/?compound, $web/Os_pvTc_pv/$entry_i/?geometry, $web/Os_pvTc_pv/$entry_i/?positions_cartesian, $web/Os_pvTc_pv/$entry_i/?prototype, including files such as: $web/Os_pvTc_pv/$entry_i/edata.orig.out, and $web/Os_pvTc_pv/$entry_i/edata.relax.out, and verifying the results by consulting appropriate prototype databases (e.g., the Naval Research Laboratory Crystal Structure database, Ref. [39]).
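Once the concentrations and formation enthalpies of all entries have been collected, the hull construction itself is short. The following is a minimal sketch, not the AFLOW implementation: it uses a monotone-chain lower hull on (x, ΔH) points with the pure elements fixed at zero, and the sample data are invented. Entries are assumed to have 0 < x < 1.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (concentration, formation enthalpy) points, left to right."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 0.0)})  # pure elements define zero
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(point, hull):
    """Vertical distance of a point from the hull (zero for hull vertices)."""
    x, h = point
    for (x0, h0), (x1, h1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return h - (h0 + (h1 - h0) * (x - x0) / (x1 - x0))
    raise ValueError("concentration outside [0, 1]")


# Invented (x, eV/atom) data standing in for the downloaded entries.
entries = [(0.25, -0.10), (0.50, -0.30), (0.75, -0.05)]
hull = lower_hull(entries)
print(hull)
for entry in entries:
    print(entry, round(energy_above_hull(entry, hull), 6))
```

Entries with zero distance are the ground states; the others sit above a tie-line between two stable neighbors.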
In this example we introduce the steps to generate a ternary zero temperature phase diagram. As an example, we choose the system CoNbSi [43], and we illustrate the logical steps for obtaining it. The user should prepare his/her own computer code to download and analyze the data as suggested. 1.
4. The user calculates the convexity of the formation enthalpy landscape (we use QHULL [63]) and plots the phase diagrams (we use GNUPLOT [64]). The diagram is depicted in Figure 5.

C. Obtaining band structures
In this example, we introduce the steps to obtain the band structure and density of states plots for a calculated material. As an example, we choose the compound Al2CuMn. The user should prepare his/her own computer code to download and analyze the data as suggested. 1.
2. Within the project-layer, the user parses the query $web/?aflowlib_entries, which shows that the string AlCu_pvMn_pv is associated with the requested AlCuMn ternary system.
3. The user parses $web/AlCu_pvMn_pv/. The query is part of a set-layer. In this case, the set contains 10 entries, namely the calculation for the AlCu_pvMn_pv system in the prototypes ICSD_57695.ABC, T0001.
4. Using the ?enthalpy_formation and ?loop queries for these 10 entries the user finds which are stable and include a band structure calculation (indicated by a negative formation enthalpy and the string bands in the ?loop query output). For this example, the user selects the entry $web/AlCu_pvMn_pv/T0001.A2BC/ for the compound Al2CuMn, which satisfies both queries.
5. At the calculation-layer, the user finds the full aflowlib.out entry.
By interrogating $web/AlCu_pvMn_pv/T0001.A2BC/?files, the user obtains a list of all of the files available for download for this calculation, including:
• the input file for the calculation $web/AlCu_pvMn_pv/T0001.A2BC/aflow.in,
A collage of these files is shown in Figure 6. Also available in the calculation-layer for this entry are the input and output files from the VASP and AFLOW runs, that the user may extract from the output of the $web/AlCu_pvMn_pv/T0001.A2BC/?files command.
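The selection in step 4 and the file access in step 5 can be sketched as follows. The property values in the sample dictionary are invented, the keyword enthalpy_formation_atom is used for the formation enthalpy, the assumption that the ?loop output is a plain string containing "bands" follows the description above, and the LIB3_RAW location of the set is assumed. The helper names are hypothetical.

```python
def select_band_structure_entries(properties):
    """Names of entries with negative formation enthalpy and a 'bands' step in loop."""
    return [
        name
        for name, props in properties.items()
        if float(props["enthalpy_formation_atom"]) < 0.0 and "bands" in props["loop"]
    ]


def file_url(web, entry, filename):
    """Individual files are addressed as $web/entry/file, without the '?' of keyword queries."""
    return f"{web.rstrip('/')}/{entry.strip('/')}/{filename}"


# Invented query results for two of the entries in the set.
properties = {
    "ICSD_57695.ABC": {"enthalpy_formation_atom": "0.05", "loop": "thermodynamics"},
    "T0001.A2BC": {"enthalpy_formation_atom": "-0.12", "loop": "thermodynamics,bands"},
}
web = "http://aflowlib.duke.edu/AFLOWDATA/LIB3_RAW/AlCu_pvMn_pv"  # assumed location
for name in select_band_structure_entries(properties):
    print(file_url(web, name, "aflow.in"))
```

In practice the dictionary would be filled by one ?enthalpy_formation_atom and one ?loop query per entry listed in ?aflowlib_entries.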

D. Synergy of experimental and calculation data on a rare prototype
The experimental data on binary alloys contains many gaps. It also presents a huge panoply of structural prototypes, ranging from very common ones, appearing in hundreds of compounds, to very rare ones appearing in just a few systems. HT calculations can be used to bridge those gaps and provide a more complete picture about the existence of yet unobserved compounds and their structures. They can also considerably extend the predicted range of those rare prototypes, indicating their existence in a larger set of binary systems. One such example studied the prevalence of the Pt 8 Ti prototype. This structure has been experimentally observed in 11 systems, but a high-throughput search over all of the binary transition intermetallics revealed it should be stable at low temperatures in 59 systems [28]. The study verified all the experimental occurrences while offering additional predictions, including a few surprising ones in supposedly well-characterized systems (e.g., Cu-Zn). This example serves as a striking demonstration of the power of the high-throughput approach. In this section we present a new example, discussing recent reports observing the rare prototype Pd 4 Pu 3 in a few transition metal binaries and computationally predicting a considerable extension of its stability or metastability in such systems.
Figure 7. [...] [65]. Colors denote reported compounds with indicated year of discovery (green and light blue) and prediction of unreported compounds (red and orange) found to be stable (green and red) or metastable (light blue and orange) in the calculations. Systems where the structure is unstable (formation enthalpy of more than 30 meV/atom above the convex hull) are denoted in blue. The square parentheses denote the formation enthalpy of metastable and unstable structures above the convex hull of the respective system.

The Pd4Pu3 prototype (hR14, space group #148) was first observed in its eponymous system in 1967 [66,67]. It has since been reported in 37 additional binary systems, mostly of a lanthanide or an actinide with the elements Pt or Pd [68]. Only 6 compounds of this prototype have been reported in transition metal binary systems: Ni4Ti3 [69,70], Pd4Y3 [71], Pd4Zr3 [72], Pt4Zr3 [73], Rh4Zr3 [74], and most recently Hf3Pt4 [75]. In these compounds, one component is a 3B or 4B element and the other is from the ninth or tenth column of the periodic table. In this example we wish to examine the possible appearance of this prototype in all transition metal binary systems of these columns (30 systems). This can be done in a few steps, as follows:
1. Consulting the complete list of structure designations of AFLOW with the command "aflow --protos" or by the online links, the user finds the label of the prototype, in this case 655.AB or 655.BA (depending on the order of the species).

2. Using $aurl=aflowlib.duke.edu:AFLOWDATA/LIB2_RAW/?aflowlib_entries the user finds the entry name for each of those 30 systems in the set-layer. Then, using $aurl=aflowlib.duke.edu:AFLOWDATA/LIB2_RAW/XXX/?aflowlib_entries for each of those names, the user finds the calculations of the desired prototype in the calculation-layer (indicated by the string 655.AB or 655.BA in the query output).
3. Following the steps of example VII A the user constructs the convex hull for each of these systems and finds the position of the desired structure in it, as either stable, metastable or unstable.
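The filtering and classification in steps 2 and 3 can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, the ?aflowlib_entries output is assumed to be a comma-separated list of entry names, formation enthalpies are assumed to be given in meV/atom with the pure elements as the zero reference, and the 30 meV/atom threshold is the metastability criterion used in this example.

```python
def entries_with_prototype(entries_text, labels=("655.AB", "655.BA")):
    """Keep entries whose last path component equals a prototype label.

    Assumes a comma-separated ?aflowlib_entries listing.
    """
    names = [name.strip() for name in entries_text.split(",") if name.strip()]
    return [name for name in names if name.rsplit("/", 1)[-1] in labels]


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (concentration, enthalpy) points of a binary system.

    The pure elements are added at (0, 0) and (1, 0); for each concentration
    only the lowest enthalpy is kept.
    """
    lowest = {0.0: 0.0, 1.0: 0.0}
    for x, h in points:
        lowest[x] = min(h, lowest.get(x, h))
    hull = []
    for p in sorted(lowest.items()):
        # Drop the previous vertex while it lies on or above the new edge.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull enthalpy at concentration x."""
    for (x0, h0), (x1, h1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return h0 + (h1 - h0) * (x - x0) / (x1 - x0)
    raise ValueError("concentration must lie in [0, 1]")


def classify(points, x, h, threshold=30.0, tol=1e-6):
    """Classify a structure at (x, h) against the hull of all points."""
    hull = lower_hull(list(points) + [(x, h)])
    distance = h - hull_energy(hull, x)
    if distance <= tol:
        return "stable"
    return "metastable" if distance < threshold else "unstable"
```

For a 4:3 compound such as the Pd4Pu3 prototype, x would be 3/7 or 4/7 depending on which species defines the concentration axis. The hull here is built with the monotone-chain construction, which suffices for binary systems; systems with more species would need a general convex hull routine.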
Following these steps for the Pd4Pu3 prototype, we find that it appears as a low-temperature stable compound in six systems, two reported in experiments and four newly predicted. The structure is also found to be metastable (less than 30 meV/atom above the convex hull of the respective system) in ten systems, of which it was reported in four by experiments and is predicted in six additional ones. Among the predicted phases, three compounds of the same stoichiometry, Pt4Y3, Hf3Pd4 and Hf3Rh4, are reported with an unknown structure in the experimental literature but identified with the Pd4Pu3 structure in the calculations. Overall, the calculation extends the prevalence of this prototype (stable or metastable) among transition metal binaries from five systems to sixteen. Figure 7 summarizes these results.

E. Bash api.sh example
This "bash" script example api.sh downloads an aflowlib.out entry for the project-, set-, or calculation-layers of the binary alloys.
This "python3" script example api.py downloads an aflowlib.out entry for the project-, set-, or calculation-layers of the Heusler alloys database.
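As a rough illustration of what such a download could look like in Python 3, the sketch below fetches an entry and splits it into fields. It is not the api.py listing itself: the function names are hypothetical, the URL of the entry must be supplied by the caller in the form defined by the API, and the entry is assumed to be plain text made of "keyword=value" fields separated by "|".

```python
from urllib.request import urlopen


def parse_aflowlib_out(text):
    """Split 'keyword=value | keyword=value' text into a dictionary.

    The ' | ' field separator is an assumption about the entry format.
    """
    entry = {}
    for field in text.split("|"):
        key, sep, value = field.strip().partition("=")
        if sep:  # skip fragments that carry no '=' sign
            entry[key.strip()] = value.strip()
    return entry


def download_entry(url, timeout=30):
    """Fetch an entry from the given URL and parse it (network request)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_aflowlib_out(response.read().decode("utf-8"))
```

The same two functions would apply at the project-, set-, or calculation-layer, since only the URL changes. Keeping the parser separate from the download lets it be checked on a saved entry without network access.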

IX. CONCLUSION
The AFLOWLIB API provides a simple and powerful tool for accessing a large set of simulated materials properties data. This will allow the community to make use of AFLOWLIB to the fullest extent possible, through search formats allowing complete accessibility of the database contents at all levels and integration of search results into externally formulated workflows. Such workflows may execute any type of investigation on the obtained data, ranging from a simple study of the properties of a specific material to extensive statistical analyses of whole structure classes for materials prediction. The full provenance of the data produced is provided, following a standard of reproducible and transparent scientific data sharing, to facilitate its straightforward reproduction and extension.
The AFLOWLIB database grows continually as existing alloy libraries are updated and new ones are added (e.g., recent attention is focused on ternary systems and electronic properties). The new API described in this paper is built on top of the AFLOW framework, which was developed to create and interrogate the database, but it can be easily extended to other materials design environments. It is constructed as a federatable tool to maximize the utility of the database to the scientific community and to expedite scientific collaboration, with particular emphasis on reproducibility, accessibility and attribution.