QDB: a new database of plasma chemistries and reactions

One of the most challenging and recurring problems when modelling plasmas is the lack of data on key atomic and molecular reactions that drive plasma processes. Even when there are data for some reactions, complete and validated datasets of chemistries are rarely available. This hinders research on plasma processes and curbs development of industrial applications. The QDB project aims to address this problem by providing a platform for provision, exchange, and validation of chemistry datasets. A new data model developed for QDB is presented. QDB collates published data on both electron scattering and heavy-particle reactions. These data are formed into reaction sets, which are then validated against experimental data where possible. This process both produces complete chemistry sets and identifies key reactions that are currently unreported in the literature. Gaps in the datasets can be filled using established theoretical methods. Initial validated chemistry sets for SF$_6$/CF$_4$/O$_2$ and SF$_6$/CF$_4$/N$_2$/H$_2$ are presented as examples.


Introduction
Realistic plasma models of many processes rest on the availability of reliable atomic and molecular data, so that the models are able to replicate the processes that drive the plasma at the submicroscopic level. Particularly for low-temperature plasmas, which are substantially molecular in composition, the set of possible processes, which we refer to as reactions below, can be very large. For low-temperature plasmas, accurate and comprehensive reaction datasets enable complex modeling of plasma-using technologies that empower our technology-based society [1]. Assembling appropriate datasets is therefore of critical importance.
For a given plasma composition, there are sets of species that are present in the plasma and a set of processes, generally called reactions, that will link the species or different states of the species. This reaction set is described as the 'chemistry' for that plasma. For anything but the simplest molecular plasma, the number of possible reactions that could make up a chemistry can be very large [2]. The important reactions in a given plasma will be a subset of all these possible reactions, although it is not always possible to say in advance precisely which these reactions will be. In this context it is appropriate to characterize a useful chemistry as one which has three attributes: (1) The chemistry should be complete, that is contain all the important reactions for the given plasma. (2) It should be consistent, that is the reactions should not be unbalanced, thus resulting in the plasma composition being driven away from the true composition. (3) Finally, the plasma chemistry should be correct; this criterion cannot be demonstrated on theoretical grounds alone and requires validation against experimental measurements made in plasmas.
Assembling plasma chemistries is far from straightforward. While there may be several chemistries available for relatively simple systems such as molecular nitrogen plasmas [3][4][5][6][7], they generally do not exist for more complex problems such as the chemical mixtures typically used in etching and other technological plasmas. Indeed, given that reactions involving molecular radicals frequently remain completely uncharacterized [8], it is often a challenge to assemble a complete reaction set for these chemistries.
Here we present the QDB. There are a growing number of databases aimed at supplying the needs of plasma modellers. For example, the recent LXCat project of Pitchford et al [9] aims to provide a web-based platform for data needed to model low-temperature plasmas. In practice LXCat considers electron collision processes but not heavy-particle (chemical) reactions. While both QDB and LXCat are set up to accept and provide multiple datasets for a single process if they are available, QDB aims to recommend a dataset for a particular application while LXCat leaves this choice to its users. The Phys4Entry database provides (ro)vibrationally resolved collisional data, including heterogeneous processes, for modeling re-entry plasmas [10]. For low-temperature, astronomical plasmas KIDA [11,12] and BASECOL [13] provide data on chemical reactions and collisional excitation, respectively.
QDB aims to provide a repository for cross sections and/or rates for key reactions needed for models of low-temperature, i.e. molecular, plasmas. QDB collects data on both electron scattering and heavy-particle reactions and aims to facilitate and encourage peer-to-peer data sharing by its users. At present the data provided are largely for two-body reactions and hence are appropriate for low-pressure plasmas, but this will change in the future. Given sets of reactions, QDB then assembles these sets into chemistries for important plasma mixtures. If there are suitable experimental data available, these chemistries can be validated.
The following section gives an overview of QDB, with a technical specification of the data model given in the appendix. Section 3 categorizes the process types included in the database while section 4 summarizes the data sources used. A list of the reactions with a complete set of references is given as supplementary data to this article. Section 5 explains our chemistry construction and validation procedure; this is illustrated for two chemistries, those comprising SF$_6$/CF$_4$/O$_2$ and SF$_6$/CF$_4$/N$_2$/H$_2$, respectively. These chemistries were selected due to their importance in silicon etching. Section 6 discusses future developments planned for the database, and the last section provides a summary and conclusions.

Overview
QDB provides reaction rates, cross sections and chemistries. The basic data item is the species, which can be state-specified, e.g. N($^4$S), or not, e.g. N$_2$. At present QDB considers three generic species: the electron, the photon, and M, the third body in three-body reactions, plus 405 other atomic and molecular species. This total rises to 904 species when state-specified species are counted separately.
Species can undergo a series of processes called reactions. These include, for example, elastic scattering or momentum transfer, which are not generally regarded as chemical reactions. These processes are considered in more detail below. At present the database contains data on 4099 distinct reactions, comprising 2888 energy-dependent cross sections and 2259 temperature-dependent rate coefficients in Arrhenius form. Note that QDB allows multiple datasets for the same reaction, and for some reactions we have distinct data which are available as both cross sections and rate coefficients.
Many of these reactions are compiled into chemistries. Currently QDB has 29 chemistries, which are tabulated and discussed below. These chemistries can be validated, provided appropriate experimental data are available. Currently QDB contains 8 chemistries with some degree of validation; two of these chemistries are considered in detail below. Notes on the validation procedure are provided in the form of a datasheet for each validated chemistry. Chemistries are awarded a star rating that reflects how far they have been shown to satisfy the criteria of being complete, consistent, and correct. QDB is structured as a MySQL relational database; the data model used is discussed in the appendix to this article.
Users can upload new data using the interface on the QDB website (www.quantemoldb.com) and download data using a choice of file formats, which is being expanded. Currently supported formats are comma-separated text for each reaction, provided as a zip file, or qdat format, which facilitates input for Kushner's Hybrid Plasma Equipment Model (HPEM) [14,15] code. The zip format contains all the cross sections, as individual comma-separated text files, and the rate coefficients that are needed for the specific chemistry. It also includes a manifest file listing all the files provided in the zip archive, a readme file, and a set of citations in BibTeX format. The online view of each dataset contains data sheets and a form allowing users to provide feedback. Figure 1 shows a sample screenshot from QDB giving a comparison between various cross sections for electron-impact ionization of methane. Note that for some reactions, cross sections for the process are obtained from two different sources; in these cases the agreement is excellent.
[Figure 1: QDB screenshot comparing cross sections for electron-impact ionization of methane; the surviving caption text cites [16] and Janev and Reiter [17].]
Development of QDB is performed with input from the Advisory Board whose members are co-authors of this paper.

Process types
Each reaction dataset in QDB is classified as containing cross sections or rate coefficients. The latter can be generated from the cross sections. The rate coefficients are stored in Arrhenius form as three parameters (A, n, and E), which can be used to compute the rate coefficients at the desired temperatures. For electron-impact reactions the Arrhenius formula is
$$k(T_e) = A\,T_e^{n}\,\exp(-E/T_e), \qquad (1)$$
where $T_e$ is the electron temperature in eV and E is the activation energy in eV. For heavy-particle reactions the Arrhenius formula employed is
$$k(T_g) = A\,(T_g/300)^{n}\,\exp(-E/T_g), \qquad (2)$$
where $T_g$ is the gas temperature in K and E is the activation energy in K. In both cases, A is the Arrhenius coefficient, whose units depend on the order of the reaction. First-order reactions such as photodissociation and photoexcitation are expressed in s$^{-1}$; second-order reactions, such as electron-impact reactions or two-body heavy-particle reactions, are expressed in cm$^3$ s$^{-1}$; and three-body reactions use cm$^6$ s$^{-1}$. Cross sections, e.g. for electron-neutral-molecule scattering, are given in units of cm$^2$ as a function of electron energy in eV.
Each reaction is classified according to the process considered. These processes are listed in tables 1, 2, and 3, which consider electron collision processes, heavy-particle reactions and processes involving photons, respectively. Note that some heavy-particle processes, in particular HAS and HIR, also involve a third body, generically denoted M in the database. The process label does not depend on the presence of M, which is therefore not included in the process description.
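As a minimal sketch of how the three stored parameters might be used, the Python below evaluates both Arrhenius forms. It assumes the common modified-Arrhenius expressions k = A T_e^n exp(-E/T_e) for electron-impact reactions and k = A (T_g/300)^n exp(-E/T_g) for heavy-particle reactions; the 300 K reference temperature, the function names, and the sample parameter values are illustrative assumptions and are not taken from QDB.

```python
import math


def rate_electron(A, n, E, Te_eV):
    """Electron-impact rate coefficient, assuming k = A * Te**n * exp(-E/Te).

    Te_eV and E are both in eV; the result carries the units of A.
    """
    if Te_eV <= 0:
        raise ValueError("electron temperature must be positive")
    return A * Te_eV**n * math.exp(-E / Te_eV)


def rate_heavy(A, n, E, Tg_K, T_ref=300.0):
    """Heavy-particle rate coefficient, assuming k = A * (Tg/T_ref)**n * exp(-E/Tg).

    Tg_K and E are both in K; T_ref = 300 K is an assumed normalisation.
    """
    if Tg_K <= 0:
        raise ValueError("gas temperature must be positive")
    return A * (Tg_K / T_ref)**n * math.exp(-E / Tg_K)


# Hypothetical parameters, chosen only to exercise the functions.
k_e = rate_electron(A=1.0e-8, n=0.5, E=10.0, Te_eV=3.0)    # cm^3 s^-1
k_h = rate_heavy(A=1.0e-10, n=0.0, E=0.0, Tg_K=300.0)      # cm^3 s^-1
print(f"k_e = {k_e:.3e}, k_h = {k_h:.3e}")
```

With n = 0 and E = 0 the heavy-particle form reduces to the constant A, which gives a quick sanity check on any implementation.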

Reactions in QDB
The scientific literature contains many measurements and calculations of reaction data that provide potentially useful input to plasma models. However, the task of extracting these data is far from straightforward. So far our strategy has been to focus on major data compilations and data sources. A list of those included so far is given in table 4. In addition a variety of data was taken from models performed by Kushner and coworkers [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34]. These sources were augmented with individual reactions taken directly from the original scientific literature. Where no suitable data could be found, internal (Quantemol) electron collision cross sections were generated using the Quantemol-N [35] implementation of the UK Molecular R-matrix code (UKRMol) [36]. As implied by table 4, only a few of these cross sections have been published, although some have already been made available via LXCat [9] and the Virtual Atomic and Molecular Data Centre [37]. The process of adding new data to QDB is a continuous one. The present results represent a snapshot of the situation as of November 2016.
A complete list of all reactions currently given in QDB with appropriate bibliographic references is provided in the supplementary data. Tables 5 and 6 summarize the sources of the data currently available in QDB by process type. At present there are relatively few radiative processes in the database; the only ones involve radiative decay (PRD) in atoms [21,92,93].

Chemistry construction and validation
The chemistry sets are assembled starting from reactions already present in QDB; missing reactions are then extracted from the literature and added to QDB where possible. In cases where important reactions have not been previously studied, the missing reaction data are calculated using appropriate methods, such as Quantemol-N [35] for electron-molecule scattering reactions, or by scaling laws, or estimated to provide the necessary data. These data are added to QDB. This allows us to provide complete and self-consistent chemistry sets that form the starting point for validation. Figure 2 illustrates the network of 196 reactions assembled to characterize the chemistry of CF$_4$ and O$_2$. Table 7 lists the chemistries available in QDB as of November 2016.
The self-consistency of each chemistry set is checked using a range of models including Kushner's zero-dimensional GlobalKin model [29,32,291,292] as implemented in Quantemol-P. GlobalKin couples molecular data (i.e. reaction probabilities) with plasma models to determine plasma properties, such as equilibrium concentrations of species. A variety of plasmas can be simulated using this software, such as etching and atmospheric pressure plasma reactors. HPEM, as implemented in Quantemol-VT, is also used. Initial validation is achieved by using a chemistry set as the basis for the modeling of different industrial reactors. Comparison of the model output with measurement is the principal means by which validation is achieved. For higher-dimensional simulation, the behavior of the species and the surface parameters across the wafer, such as etching or deposition rates, can also be used for comparison. Chemistry sets are given a reliability rating on the basis of these comparisons; see table 8. Of course, we recognize that the partial validations presented below only give agreement in the general trends for a given parameter under one set of specific conditions. Therefore, we cannot guarantee (i) that the given chemistry set produces correct results under different operating conditions, nor (ii) that every plasma parameter is reproduced correctly at this stage of validation. Continually improving the offered chemistry sets by extending the validation process to other operating conditions and plasma parameters is one of the goals of QDB. Indeed, it is hoped that the community-driven nature of the QDB website will inspire new, relevant validation tests which can be carried out to further improve the database.
We now illustrate this process using two chemistry sets: those for SF$_6$/CF$_4$/O$_2$ and SF$_6$/CF$_4$/N$_2$/H$_2$ gas mixtures etching Si. As discussed below, these sets comprise subsets of mixtures, which provide important chemistries as well. These were also validated as part of the validation process. The chemistries were validated using GlobalKin and compared with experimental data provided by Infineon. Unfortunately the data available for complex reaction sets such as the ones considered are often very limited and so only partial validation can be expected for these cases: we consider these examples to meet only the lowest level of validation, that is, agreement in the general trends between simulation and experiment for one discharge, as discussed above.
The Infineon tool consists of two parts. The first part is a coaxial microwave plasma discharge. In this part the electromagnetic wave propagates along the interface between the plasma column and the surrounding dielectric tube, and the plasma column is sustained by electromagnetic energy. Free radicals are formed in this chamber with high efficiency. This chamber is connected to a larger vessel where the flux of particles propagates and where the remote wafer to be etched is located. Our GlobalKin models only attempted to model the second, larger chamber. GlobalKin performs a spatially homogeneous plasma chemistry simulation, which is coupled with surface reaction modules. The model uses a Boltzmann solver to obtain electron-impact reaction rate coefficients. These models assumed a plasma volume of 90000 cm$^3$ and an area around the plasma of 10700 cm$^2$. The models, which used an assumed diffusion length of 8.3 cm, were initiated using the feedstock gases. They were run for 500 iterations, corresponding to a total of 1 s, which proved sufficient to reach steady state.
[Fragment of a table of data sources by process type (see tables 5 and 6); two process labels were not recovered:
(label not recovered)  [19, 28, 29, 32, 121, 149, 180, 184, 207, 248, 284, 285]
Heavy-particle dissociative neutralization (HDN)  [18, 20, 23, 121, 149, 150, 177, 272, 274, 286, 287]
Heavy-particle dissociation and charge transfer (HDC)  [18-21, 92, 178, 183, 186, 188, 189, 192, 288, 289]
(label not recovered)  [24, 32, 98, 109, 121, 157, 197, 233]
Heavy-particle dissociation and ionization (HDI)  [290]]

SF$_6$/CF$_4$/O$_2$
Initially, distinct sets of chemistries for SF$_6$/O$_2$ and CF$_4$/O$_2$ were constructed and validated separately. These chemistries were then merged and missing reactions, such as SF$_6$ + CF$_3^+$ $\rightarrow$ SF$_5^+$ + CF$_4$ [293] or SF$_x^-$ + CF$_y^+$ $\rightarrow$ SF$_x$ + CF$_y$, were identified and added. We then set up separate surface chemistries for SF$_6$ and CF$_4$, with a focus in both cases on silicon etching by F radicals. Surface chemistry parameters were taken from Kokkoris et al [294]. Since CF$_4$ formed a smaller percentage of the mixture, see table 9, we did not include the polymer deposition by CF$_x$ radicals. In addition to the F atom reactions, we added reactions of SF$_x$ radicals using the same reaction scheme as Kokkoris et al [294]; see also comments on this work by Nelson et al [295].
Only just over 1% of oxygen is added to the mixture, as this increases the dissociation of SF$_6$ and CF$_4$. Due to its low density, the oxygen-related surface reactions were not included in the model. We also assumed that no significant concentration of CS molecules is formed during plasma processing with a mixture of SF$_6$ and CF$_4$, due to the large concentration of the F radicals in the mixture. This results in a much higher probability of formation of C$_x$F$_y$ and SF$_x$ species. The conditions of the experiments are presented in table 9.
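When reactions are added by hand during a merge, a simple element and charge balance check can catch transcription errors. The sketch below is an illustrative sanity check, not part of the QDB procedure described here; the function names are hypothetical and the parser handles only simple formulas such as SF6, CF3+ or e- (no parentheses, isotopes, or state labels). It is applied to the reaction SF6 + CF3+ -> SF5+ + CF4 named above.

```python
import re
from collections import Counter


def parse_species(formula):
    """Return (element counts, net charge) for a formula such as 'CF3+' or 'e-'."""
    if formula == "e-":
        return Counter(), -1
    body = formula.rstrip("+-")          # chemical part without charge signs
    signs = formula[len(body):]          # trailing '+' / '-' characters
    charge = signs.count("+") - signs.count("-")
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", body):
        atoms[element] += int(count) if count else 1
    return atoms, charge


def is_balanced(reactants, products):
    """True if both sides carry the same atoms and the same total charge."""
    def totals(side):
        atoms, charge = Counter(), 0
        for formula in side:
            a, q = parse_species(formula)
            atoms += a
            charge += q
        return atoms, charge
    return totals(reactants) == totals(products)


print(is_balanced(["SF6", "CF3+"], ["SF5+", "CF4"]))
```

A check of this kind only tests bookkeeping; it says nothing about whether a reaction is physically important or whether its rate is correct.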
In the validation tests of the chemistry set, the power was varied from 1000 to 2000 W at a fixed pressure of 500 mTorr.
As a second test we varied the pressure with power fixed at 2000 W. We used 40% of the experimental power in our simulation, in order to simulate the energy dissipation. This is almost certainly an overestimate of the power reaching the actual plasma in the Infineon device, which explains why the simulations give faster etch rates than the measurements, see below.
Our GlobalKin simulations only provide global average values and therefore cannot provide an absolute quantitative comparison. For validation of our global model simulation we therefore compare trends. In particular, we compared the trends in the etching rate with measurements provided by Infineon Technologies, who studied the effect of both power variation and pressure variation. According to these measurements, the etch rate increases with increasing power but decreases with increasing pressure. Figure 3 illustrates the effect of varying power and pressure on the silicon etch rate for a mixture of SF$_6$/CF$_4$/O$_2$. Good agreement in the trends is observed between the results of our simulation and the experimental data of Infineon: in both measurement and simulation, the Si etching rate increases with power but drops as the pressure is raised.
Given the limited nature of the validation tests we have been able to perform for this chemistry, we can rate it only at level 3 (the lowest rating for a validated chemistry) in table 8.
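The trend comparison used for this level of validation can be expressed as a simple test: do the simulated and measured series rise and fall together as one variable is scanned? The sketch below illustrates that idea only; the function names are hypothetical, and the numbers are made-up, unitless placeholders that are not Infineon measurements or GlobalKin output.

```python
def trend_signs(values):
    """Sign (+1, 0 or -1) of each successive change in a series."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def trends_agree(simulated, measured):
    """True if both series change in the same direction at every step."""
    if len(simulated) != len(measured):
        raise ValueError("series must be sampled at the same conditions")
    return trend_signs(simulated) == trend_signs(measured)


# Placeholder values only: a scan in which both series increase, with the
# simulation sitting above the measurement throughout.
simulated_scan = [1.0, 1.4, 1.9]
measured_scan = [0.6, 0.8, 1.1]
print(trends_agree(simulated_scan, measured_scan))
```

Agreement in this sense is deliberately weak: it ignores the size of the offset between simulation and measurement, which is why it corresponds only to the lowest validated rating.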

SF$_6$/CF$_4$/N$_2$/H$_2$
Initially, two sets of chemistries, for SF$_6$/CF$_4$, from the previous validation task, and for N$_2$/H$_2$, were constructed and validated separately. These chemistries were then merged. Since there is only a small proportion of N$_2$/H$_2$ in the mixture, we excluded species like NF$_x$ and CH$_x$F$_y$, as well as [...] were estimated from Moseley et al [271] and added to the chemistry list. The SF$_6$/CF$_4$ chemistry was generated using the same assumptions used for the SF$_6$/CF$_4$/O$_2$ mixture. We also dealt with the surface reactions in a similar fashion to SF$_6$/CF$_4$/O$_2$. Table 10 summarizes the experimental conditions assumed in the model. Figure 4 illustrates the effect of varying power on the silicon etch rate for the SF$_6$/CF$_4$/N$_2$/H$_2$ mixture. The experimental data from Infineon we tested against compared etch rates, under similar conditions, for the SF$_6$/CF$_4$/N$_2$/H$_2$ and SF$_6$/CF$_4$/O$_2$ mixtures. According to the measurements, a higher etch rate is found for SF$_6$/CF$_4$/O$_2$ than for SF$_6$/CF$_4$/N$_2$/H$_2$.

Table 8.
Ranking  Description of comparison conditions
1  Self-consistent but behavior differs from available measurements.
2  Not yet compared; no suitable measurements found.
3  Comparison with measurements for the same process conditions versus one variable: power, gas flow or pressure. Behavioral trends reproduced but quantitative agreement may be lacking.
4  Comparison with some measurements for different process conditions (more than one comparison), e.g. validation for different pressure regimes. Quantitative agreement reached for most process conditions and behavioral trends reproduced consistently.
5  Chemistry tested using more than one program, e.g. Quantemol-P and Quantemol-VT or another plasma simulation model. Quantitative and trend agreement across all of a range of process conditions.

As shown in figure 5, both simulation and experiment show a higher etch rate for the SF$_6$/CF$_4$/O$_2$ mixture than for SF$_6$/CF$_4$/N$_2$/H$_2$: dissociation is lower in the presence of N$_2$/H$_2$ than of O$_2$, so less F is produced.
Again, given the limited nature of the validation tests that are possible for this chemistry at this time, we give this chemistry set a 3-star rating (see table 8), indicating agreement only in the general trends relating to a single parameter.

Future developments
The process of adding both more reactions and chemistries to QDB is continuous and ongoing. We will also progressively improve the validation status of current chemistries and validate more chemistries, although these activities require appropriate experimental data to be available for us to validate against. We have developed an initial rating system for these chemistries, and we have developed this further by allowing users to also submit ratings. At the same time, we plan to implement more formal uncertainty quantification (UQ) procedures [296].
The processes currently covered by QDB are listed in tables 1, 2, and 3. These lists do not cover all possible low-pressure gas-phase processes. For example, inclusion of vibrationally resolved reactions for molecules, such as in electron collisions with CO$_2$ [79], is important for a number of plasma studies. QDB has the capability to hold such data but more work on data input and processes considered will be required to make it fully functional. At present QDB does not include processes that occur on surfaces and has only limited data for processes involving a third body. Both of these will be included in the database in the future. The inclusion of three-body reactions will extend the coverage to atmospheric pressure plasmas.
At present the data can be downloaded in two formats: a generic one and one that is appropriate for HPEM. With increasingly large datasets and sophisticated modeling programs, it is desirable for data to be transferred directly from the database to the model using an application program interface (API). We plan to develop APIs for commonly used plasma modeling programs in order to facilitate the use of QDB. We are also currently implementing facilities for users to self-assemble chemistries in their own basket; in the longer term we plan to facilitate this with an automated chemistry generation tool.

Conclusions
One of the challenging problems when modeling plasmas is the lack of reliable chemistry data. For this purpose, we have developed the Quantemol Database (QDB), which aims to provide a platform for the exchange and validation of reactions that are important in plasmas and plasma chemistry datasets. The database provides data on both electron scattering and heavy-particle reactions, and it aims to facilitate and encourage peer-to-peer data sharing by its users. QDB currently includes almost 5000 reactions and 29 complete sets of chemistries; so far 8 of these sets have undergone some sort of validation. The set of reactions includes more than 2800 cross sections and more than [...]

Appendix: The QDB data model
The QDB is implemented using the MySQL relational database management system. An overview of the principal tables and their relations is given in figure 6. Each collision or reaction is considered to take place between one or more reactants to give one or more products. Each of the reactants and products may be an atom, ion, molecule, molecular ion, or particle (such as a photon or an electron), perhaps in a specified quantum state. These species are represented by their chemical formula according to a standard notation (for example, Ar, H2O, NH2+ for atoms and molecules, e- for electrons, hν for photons). State information is attached as a number of text strings matching a defined pattern, which can be parsed according to the type of state being considered. A list of some of the state types with examples is given in table 11.
Each reactive or collisional process may be described by more than one DataSet: an experimental measurement or theoretical prediction of rate data relating to the process. There are two principal types of DataSet. Cross sections are represented as a table of (electron energy, cross section value) pairs, stored in an external resource referenced by filename or URL. Rate data expressed according to an Arrhenius-like expression, see equations (1) and (2), are represented by storing separate parameters A, n, and E. The parameters and the columns of any tabular data have associated metadata (name, units, and description) in a linked relational database table.
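To illustrate how the two DataSet types relate, the sketch below converts a tabulated cross section into a rate coefficient by averaging over a Maxwellian electron energy distribution. This is one standard route and is an assumption here: the text does not state which distribution QDB uses, and the validation models described in this paper obtain electron-impact rates from a Boltzmann solver instead. The function name and the constant test cross section are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19      # J per eV
M_ELECTRON = 9.1093837015e-31   # kg


def maxwellian_rate(energies_eV, sigma_cm2, Te_eV):
    """Rate coefficient in cm^3 s^-1 from (energy, cross section) pairs.

    Evaluates k = sqrt(8 e / (pi m_e)) * Te**-1.5 * integral(sigma * E * exp(-E/Te) dE)
    with energies and Te in eV, using the trapezoidal rule on the given grid.
    """
    if len(energies_eV) != len(sigma_cm2) or len(energies_eV) < 2:
        raise ValueError("need matching energy and cross-section tables")
    if Te_eV <= 0:
        raise ValueError("electron temperature must be positive")
    integrand = [s * e * math.exp(-e / Te_eV)
                 for e, s in zip(energies_eV, sigma_cm2)]
    integral = 0.0
    for i in range(1, len(energies_eV)):
        step = energies_eV[i] - energies_eV[i - 1]
        integral += 0.5 * (integrand[i] + integrand[i - 1]) * step
    # Prefactor in m/s per sqrt(eV); the factor 100 converts m/s to cm/s.
    prefactor = math.sqrt(8.0 * E_CHARGE / (math.pi * M_ELECTRON))
    return 100.0 * prefactor * Te_eV**-1.5 * integral


# Hypothetical constant cross section on a uniform 0-100 eV grid.
energies = [0.02 * i for i in range(5001)]
sigma = [1.0e-16] * len(energies)
k = maxwellian_rate(energies, sigma, Te_eV=2.0)
```

For a constant cross section the result should equal the cross section multiplied by the mean thermal electron speed, which provides an analytic check on the numerical integration.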
Different reactive and collisional processes are identified by a three-letter code (process type) (for example: EDR = dissociative recombination). The codes employed are an extended version of those defined in the IAEA document of Humbert et al [297]. A list of codes is given in tables 1, 2 and 3; more extensive descriptions and examples are given on the QDB website at http://quantemoldb.com/reactions/processes/.
The structure of the data model allows the user interface to perform searches of the collisions by species (reactant or product), process type, and citation. Furthermore, additional fields within the DataSet table allow for evaluation comments, quality assessment rating, and validity and usage notes to be stored.
QDB Chemistries are self-contained collections of collisional and reactive processes describing the properties of a plasma under some set of conditions. A table in the relational database holds metadata relating to each Chemistry, evaluation notes and ratings, and the associations with the relevant DataSets; see figure 7.
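The relationships described in this appendix can be sketched with a few Python dataclasses. This is a simplified illustration, not the actual MySQL schema: all class and field names are hypothetical, and the Arrhenius parameters, resource name, and rating in the example are placeholders rather than QDB values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Species:
    formula: str                      # e.g. "Ar", "H2O", "NH2+", "e-"
    states: Tuple[str, ...] = ()      # optional state strings


@dataclass(frozen=True)
class Reaction:
    reactants: Tuple[Species, ...]
    products: Tuple[Species, ...]
    process_type: str                 # three-letter code, e.g. "EDR"


@dataclass
class DataSet:
    reaction: Reaction
    kind: str                         # "cross_section" or "arrhenius"
    resource: Optional[str] = None    # filename or URL of tabulated data
    arrhenius: Optional[Tuple[float, float, float]] = None  # (A, n, E)
    citation: str = ""
    comments: str = ""

    def __post_init__(self):
        # Each kind of DataSet must carry the payload that defines it.
        if self.kind == "cross_section" and self.resource is None:
            raise ValueError("cross-section DataSet needs a resource")
        if self.kind == "arrhenius" and self.arrhenius is None:
            raise ValueError("Arrhenius DataSet needs (A, n, E)")
        if self.kind not in ("cross_section", "arrhenius"):
            raise ValueError(f"unknown DataSet kind: {self.kind}")


@dataclass
class Chemistry:
    name: str
    datasets: List[DataSet] = field(default_factory=list)
    rating: Optional[int] = None

    def find_by_species(self, formula):
        """DataSets whose reaction has the species as reactant or product."""
        return [d for d in self.datasets
                if any(s.formula == formula
                       for s in d.reaction.reactants + d.reaction.products)]


# Dissociative recombination (EDR) of N2+, with placeholder parameters.
edr = Reaction((Species("e-"), Species("N2+")),
               (Species("N"), Species("N")), "EDR")
chem = Chemistry("example", [
    DataSet(edr, "arrhenius", arrhenius=(1.0e-7, -0.5, 0.0)),
    DataSet(edr, "cross_section", resource="edr_n2plus.csv"),
], rating=3)
```

Holding several DataSets for one Reaction mirrors the statement above that a process may be described by more than one DataSet, while searches by species, process type, or citation reduce to filters over these fields.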