Fast species ranking for iterative species-oriented skeletal reduction of chemistry sets

A fast algorithm is developed for ranking the species in a chemistry set according to their importance to the modeled densities of user-specified species of interest. The species ranking can be constructed for any set of user-specified plasma conditions, but here we focus predominantly on low-temperature plasmas, with gas temperatures between 300 and 1500 K covering the typical range of ICP and CCP plasma sources. This ranking scheme can be used to acquire insight into complex chemistry sets for modeling plasma phenomena or for a species-oriented reduction of the given chemistry set. The species-ranking method presented is based on a graph-theoretical representation of the detailed chemistry set and establishing indirect asymmetric coupling coefficients between pairs of species by the means of widely used graph search algorithms. Several alternative species-ranking schemes are proposed, all building on the theory behind different flavors of the directed relation graph method. The best-performing ranking method is identified statistically, by performing and evaluating a species-oriented iterative skeletal reduction on six, previously available, test chemistry sets (including O2–He and N2–H2) with varying plasma conditions. The species-ranking method presented leads to reductions of between 10 and 75% in the number of species compared to the original detailed chemistry set, depending on the specific test chemistry set and plasma conditions.


Introduction
Utilizing the unique properties of the low-temperature plasma has become an integral part of almost every industry sector, spanning over a wide range of applications such as medicine, biotechnology, surface modification, microfabrication, har- * Author to whom any correspondence should be addressed.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. vesting energy, thrusters, ozone generation or abatement systems, to name just a few. As an example of the importance of the low-temperature plasma technologies for our every day lives, it has been estimated that as much as one-third of steps involved in the manufacturing of microelectronic technologies are plasma-based [1]. While providing desirable properties, the very complex nature low-temperature plasma systems also poses challenges for describing and understanding plasma phenomena. Understanding the plasma properties is crucial for the optimization of plasma-based processes and technologies and the only way to acquire an insight of any significant depth is through numerical modeling techniques.
There are many methods available for modeling lowtemperature plasma properties and behavior which vary in both accuracy and complexity. No matter what kind of plasma model of whatever spatial dimensionality is considered, each is built around a chemistry set which describes the volumetric interactions between all the species tracked in the model, and additionally, the interactions between the species and surfaces. A volumetric and surface chemistry set is a very important base for every plasma model, accounting for the majority of sources and sinks of species. Many pre-compiled detailed chemistry sets for various feed gases and applications can be found in the literature, see for example references [2][3][4][5][6][7][8]. As a consequence of advances in gas kinetics, published chemistry sets are becoming increasingly larger. For plasma physics modeling applications, chemistry sets may routinely include up to a hundred species and many thousands of reactions. For example, Koelman et al [9] provide a chemistry set for the splitting of CO 2 in non-equilibrium plasmas which contains 73 unique species involved in 5724 reactions. In the combustion modeling community, where very large chemistry sets have been used for longer than in the plasma modeling community, some applications require sets that may contain several hundred or even thousands of species and ten thousands of reactions [10].
Any chemistry set might contain redundant species or reactions, which could be removed without compromising any plasma model outputs for the space of model parameters linked to the desired application. In fact, almost all published chemistry sets contain redundant species and reactions [11]. There are two main reasons for this. First, chemistry sets tend to cautiously include species and reactions with uncertain importance for the plasma model built around the given set. Second, even if the importance of each species and reaction is somehow tested, it might be done so for a much wider range of input parameters and conditions, than the range of conditions, the chemistry set is finally utilized over. While large chemistry sets do not significantly impact the performance of more approximative plasma models with low computational cost, such as volume-averaged 0D models, this is not true for plasma models resolving several spatial dimensions, where the larger chemistry sets (and particularly larger numbers of species in the set) means that models can rapidly become prohibitively expensive to run.
While one might come across published chemistry sets which have been reduced to smaller sizes to accurately model only a particular application with a particular set of plasma conditions, such as the chemistry set for plasma in air reduced by Bak and Cappelli [12] from the detailed chemistry set by Kossyi et al [13], these published reduced sets are, by design, only valid for a narrow domain of plasma conditions and parameters described in the publications and therefore cannot be readily adopted for a more general case. It follows that methods for identifying and eliminating redundant species (and reactions) from very large detailed chemistry sets, or more generally, methods reducing the dimensionality of the underlying systems of partial differential equations are of great importance for more computationally costly plasma models, capable of generating more insight than simple 0D models.
The computational cost of solving spatially resolved plasma models increases greatly with the number of species in the chemistry set up to the point, that solving 3D plasma models with more than a few species might become too timeconsuming, costly, and impractical. The number of reactions in the chemistry set N R on the other hand does not increase the dimensionality of the system of governing equations but might still increase its stiffness. Then there is also a question of the interpretability of a solution, which might become unclear with very complex chemistry sets.
Several review papers have dealt with the problem of chemistry set reduction, mostly in combustion modeling [14][15][16][17][18]. Reduction methods might be classified into three main categories: lumping, time scale analysis, and skeletal reduction [18,19]. The lumping method [20] may be useful, whenever the species of a chemistry set can be clustered together into groups with very similar chemical properties, and each group can be then treated as one pseudo-species. This method is used very frequently in plasma modeling, for example for lumping of vibrational levels of species A into few groups [2] or a single lumped pseudo-state A(ν) [21], or for lumping several electronic metastable states into a single compound state A * [21]. Various reduction methods falling under the time scale analysis category aim to define fast reactions or highly reactive species in the chemistry set and approximate those as partial equilibrium reactions and quasi-steady state species. Their corresponding PDEs can then be solved explicitly as algebraic equations or their solutions tabulated, decreasing the dimensionality of the whole system of PDEs. Various methods based on time scale analysis exist, such as quasi-steady state analysis [22], partial equilibrium analysis and more systematic approaches including the intrinsic low dimensional manifold method [23,24] and computational singular perturbation (CSP) method by Lam and Goussis [22,[25][26][27] have been widely adopted in the past in plasma and combustion modeling. Finally, skeletal reduction methods seek to identify and eliminate species and reactions unimportant for a given set of particular simulation conditions and applications. The present work concentrates on skeletal reduction.
The skeletal reduction methods can be further categorized as reactions-oriented and species-oriented, based on whether they aim to reduce the number of reactions or species in the kinetic system. Reactions-oriented methods for elimination of reactions from a chemistry set include among others sensitivity analysis [28,29] and detailed reduction method [30,31]. Elimination of reactions from a chemistry set does not reduce the dimensionality of the system of governing PDEs, and therefore generally the decrease in plasma model computation time due to eliminating a number of reactions from a set is only marginal and due to faster computation of the species sources terms (as well as a possible decrease in stiffness of the system). For that reason, species-oriented reduction methods are arguably more important. The simplest method which is frequently used by the plasma modeling community is the elimination of species with very low density. This method was used for reduction by, among others, Van Gaens and Bogaerts [5], Liu et al [32], and Turner [21], but this method is rather crude and cannot be generalized. More systematic methods for species elimination include, for example, sensitivity analysis [11,33], Jacobian analysis, CSP [34], principal component analysis (PCA) [35], the simulation error minimization connectivity method of Nagy and Turanyi [10] and the directed relation graph method (DRG) of Lu and Law [36][37][38] as well as its variations, such the DRG with error propagation (DRGEP) method of Pepiot-Desjardins and Pitsch [39] and DRG-aided sensitivity analysis [22,40]. Again, most of the above-mentioned methods were developed and used specifically for combustion modeling, but some were recently also used in plasma-assisted combustion applications (PCA and DRGEP, both by Bellemans et al [35,41]) or even in plasma physics and astrochemistry modeling, such as DRG by Sun et al [42], DRGEP by Venot et al [43] and sensitivity analysis by Obrusník et al [44]. The majority of the methods mentioned above are even implemented in a single software package by Lebedev et al [45].
In this work, we present a fast, automated method of ranking all species in a plasma system according to their importance for modeling a certain subset of selected species of interest, together with a simple iterative species-oriented method of skeletal reduction utilizing such a ranking. What we are aiming for is the application of an automated chemistry set optimizer which can be attached to a web-based database of plasma kinetic data. Input for such an application would take the form of a detailed chemistry set (which is valid for a wide range of conditions and simulation parameters), set of simulation parameters and set of species of interest, while output would be an optimized chemistry set of a minimal reduced size, reproducing the densities of all the species of interest with the given plasma model with errors lower than pre-defined value compared to the input detailed chemistry set. The detailed chemistry set could be either a pre-compiled feature of the kinetic database, or dynamically assembled from the processes in the database and their kinetic data. While the methods of lumping and time scale analysis each have their domain of application, the skeletal reduction is an obvious choice for the outlined chemistry set optimizer application, where the detailed set of species and reactions is already expected to be present in the database of kinetic processes, and the attached reduction method aims to identify the essential species (and their reactions) and ignore the rest.
This work is divided into several sections. The next section provides the introduction of the ranking-based speciesoriented iterative skeletal reduction method, which was used to assess different species ranking schemes. In section 3, we present a number of competing ranking schemes based on graph representation of detailed chemistry sets. Different methods of chemistry graph edge weights distribution are considered, all based on a single evaluation of a plasma model with the detailed chemistry set and different flavors of the DRG theory. On this weighted chemistry graph, several methods to evaluate indirect coupling between pairs of species are presented, based on different graph theory search algorithms. Every pair of edge-weighting method and indirect coupling method defines a species ranking scheme, which can be used for a species-oriented skeletal reduction of the detailed chemistry set. Section 4 gives details for a number of test reduction cases which are used to identify the statistically bestperforming ranking scheme in section 5; finally, the last section provides conclusions and outlook.

Ranking-based iterative reduction method
The existing species-oriented skeletal reduction methods mentioned in the introduction all use a thresholding parameter in one form or another (such as the direct interaction coefficient threshold from the DRG theory), with a somewhat arbitrary value. This might pose a problem for some applications, such as the chemistry set optimization tool outlined in the introduction. The arbitrary thresholding coefficients employed in the existing species-oriented skeletal reduction methods generally have a very uncertain mapping to the errors of the model outputs induced by the reduction with given . The output errors also are not monotonic with , forbidding the use of binary search to converge toward the optimal value. Together with the higher computational cost of some of the existing methods, this fact renders employing these methods problematic for the automated chemistry set optimizer tool we aim for. The development of both the reduction method described in the rest of this section and the species ranking scheme addressed in section 3 was motivated by the outlined web-based kinetic processes database application, and specifically as an add-on module for the plasma chemistry Quantemol database (QDB) [8]. However, the method itself is not exclusive to QDB, but applicable to be used with any existing database of plasma kinetic processes, or indeed in any application requiring a fast species ranking, or fast species-oriented skeletal reduction method. Apart from the QDB, many different online databases of plasma kinetic data exist, such as LXCat database for modeling low-temperature plasmas by Pitchford et al [46], Phys4Entry by Celiberto et al for modeling re-entry plasmas [47], databases for astrochemical modeling KIDA by Wakelam et al [48,49] and UDfA by McElroy et al [50], fusion oriented databases by Murakami et al of NIFS, Japan [51] and by Park et al of Data Center for Plasma Properties, NFRI, Korea [52], or BASECOL by Dubernet et al [53], to name a few.

Introduction of the method
In this work, we present a simple chemistry set reduction method overcoming some of the difficulties mentioned in the previous paragraph. The method is based on ranking all the species in the model based on their importance for modeling densities of a defined set of species of interest and iteratively eliminating the lowest-ranked species, while periodically checking after each iteration, if the reduction error does not exceed a defined maximum threshold value. This way, the validity of the reduced chemistry set is always ensured with respect to the pre-selected species of interest and the set of reduction conditions. The rest of this section outlines the general framework behind the method and some key concepts, while the species ranking method development is described in section 3.
The following key concepts and definitions are important for the present work: • Detailed chemistry set: the set containing all the species and reactions. • Reduced chemistry set: a set with at least one species eliminated from the detailed set. • Species elimination: removal of all the reactions involving the given species (on either side). • Reduction conditions: a set of application-specific conditions (such as pressure, power, etc), for which the reduced chemistry set should yield a low reduction error. • Reduction error δ: the relative difference of species of interest densities between the detailed and reduced models. • Allowed error Δ: maximal δ allowed.
• Species of interest: set of species, densities of which need to be modeled within Δ with the reduced chemistry set. • Species ranking: species hierarchy reflecting how important each species is for modeling all of the species of interest. Rank of each species in the chemistry set is determined by its ranking score.
The general structure of our reduction method is summarized by the flow diagram given as figure 1. The method uses a plasma model to generate results used as inputs for a species ranking method and also to check the reduction error after each species elimination. Although the reduction framework described here is defined very generally, throughout this work we used a 0D global model as a plasma model and the reduction error took the form of a mixed error function adopted from the work of Nagy and Turanyi [10] where index i runs over all of the species of interest with number densities A i and the index j runs over the set of pre-selected times of interest {t j }. The concept of the times of interest is further explained in section 3. The global model used throughout this work was the PyGMol (Python global model), which was developed for QDB and specifically for the reduction method presented. Although there are several global models available within the plasma modeling community, such as GlobalKin by Kushner [54], Plasma-R by Kokkoris [55] or ZDPlasKin by Pancheshnyi et al [56], the choice of developing a new code was driven by the need for Python implementation which allowed seamless integration with the rest of the reduction framework.
The PyGMol global model calculates the number densities n i (where the index i runs over all the heavy species in the associated chemistry set), as well as the electron temperature T e , all as a function of time, by integrating the set of particle density balance and electron energy density balance PDEs. The electron density n e is not solved for explicitly but rather implicitly by enforcing the charge neutrality. The heavy species temperature T g is treated as a constant input parameter, rather than being solved for self-consistently. The particle density balance equation includes contributions from volumetric reactions, flow, and from diffusion (surface sinks) losses as well as surface sources. The volumetric kinetics are defined by the Arrhenius formula for both electron processes (2), and heavy species processes (3), where each reaction rate coefficient k is parametrized by factors A, n, and E a . The surface kinetics are described by a set of species sticking coefficients s i for each species, together with an arbitrary number of return species per any stuck species, with return coefficients r ik . The surface coefficients are coupled to a simplistic particle diffusion model, where surface particle loss rate S i and source rate R i are expressed as and respectively, as used (among others) by Kushner in GlobalKin [54]. Here, D i is the diffusivity of the i-the species and Λ is the diffusion length, a function of the plasma geometry. The electron energy density balance includes contributions of power from any external sources absorbed by the plasma, elastic and inelastic collisions between electrons and heavy species, generation and loss of electrons in volumetric reactions and power lost by electrons and ions surface losses.
A detailed description of the PyGMol global model can be found on the QDB website. While obviously being approximative, the PyGMol model should however suffice here, since the quantitative results of the model are not the aim of the present work; an equally simplistic model was used in a similar study by Turner [21,57].
The condition for terminating the reduction method from figure 1 was set to be N δ subsequent iterations resulting in exceeding the maximal reduction error allowed δ > Δ. An appropriate value of N δ must be chosen to balance the tradeoff between the number of eliminated species and the number of plasma model calls.
Finally, it needs to be noted, that validity of the reduced chemistry set for all the species of interest is strictly ensured within the density error Δ only in the scope of the same plasma model that was used in the reduction method. This plasma model, of course, needs to be sufficiently simple to handle the full detailed chemistry set with very fast run-time, so no reduction is strictly necessary, when using the simple model only. Reduced chemistry sets need to be created for more sophisticated (usually spatially resolved) plasma models where the number of species in the set is of much greater importance. In a case of any end-application plasma model (different from the one used in the reduction process), the validity of the reduced chemistry set cannot be readily extrapolated and completely ensured. The use of any reduced chemistry set should therefore always be preceded by careful consideration of the similarity of both the plasma models. As an example, if a reduced set is expected to be used within a plasma model capturing the same physics as the plasma model employed in the reduction method, only in multiple dimensions, it might be appropriate to run the reduction method for more than one point of the reduction conditions parameter space and/or use lower maximal reduction error Δ within the reduction method, than what is an acceptable accuracy of the end-application model.
By the same token, that validity of the reduced chemistry set is by design only ensured for the single set of plasma conditions that are the reduction conditions. When performing a chemistry set reduction targeting a specific modeling application, it might be appropriate to run the reduction with multiple sets of reduction conditions sampled from the parameter space, and only eliminate species that are flagged redundant in all of those runs. Such a parameter sweep, however, can be performed on top of the reduction method presented in this work and therefore will not form part of this paper.

Influence of species ranking
It is clear, that the method for ranking the species is the most important part of the reduction method presented. Good species ranking scores should correlate strongly with the error in the species of interest densities induced by removing the species from a chemistry set. Completed reduction runs may be analyzed by plotting the reduction error as a function of the number of species retained in the chemistry set. As an example, figure 2 shows such a reduction plot for an N 2 -H 2 chemistry set (described more closely in section 4.2) and plasma model with arbitrary reduction conditions. The solid line starts with the detailed chemistry set containing in total 42 species, and follows all the species eliminated from the set while keeping the reduction error δ below the maximal value allowed (in this case, Δ = 10%). The dotted lines show species, which could only be eliminated with an unacceptable reduction error, and were therefore reinstalled back into the chemistry set. The species ranking order coming into the reduction method in this particular case was [N + 3 , N( 2 D), N( 2 P), N + , , N 2 (C 3 Π u ), N 2 (ν 6 ), N 2 (ν 5 ), N 2 (ν 4 ), . . .], where subsequent unsuccessful elimination of the three species N 2 (ν 6 ), N 2 (ν 5 ) and N 2 (ν 4 ) triggered the termination condition for N δ = 2. Figure 2 also demonstrates that the reduction error is generally not monotonic with the number of species eliminated.
Finally, some caveats need to be noted regarding the process of species elimination from a chemistry set. The integrity of the chemistry sets needs to be protected at all times. For example, each retained species must have at least one production and consumption channel and at least one positive ion must be retained in the set. This adds extra constraints on species elimination. Similarly, in some cases, elimination of an a species upsets the balance of the chemistry set to the point where the plasma model solution fails. Cases like this are treated as if the iteration resulted in a higher than allowed reduction error. Safeguards against inconsistent chemistry sets after species elimination are coded into the elimination routine of the framework.

Species ranking
Several methods for fast species ranking are proposed, each based on a single evaluation of a plasma model with the Depiction of a chemistry graph with nodes representing species and directional edges representing asymmetric direct interactions between species, weighted by the direct interaction coefficients. A direct interaction coefficient w(u, v) between uth and vth species is depicted in red, while the asymmetric coupling coefficient W AB between species A and B is hinted in blue, depending on all the paths p(A → B) in the chemistry graph leading from A to B. detailed chemistry set, and a graph-theoretical representation of the chemistry. A chemistry graph is created, with species as nodes and directed weighted edges between species. The species ranking hierarchy is then built and it reflects the indirect coupling between each species and the species of interest. Figure 3 shows a schematic depiction of such a chemistry graph, with 11 nodes (species) and 17 directed edges. Each directed edge (u, v), leading from uth species to the vth species, is weighted by a direct interaction coefficient w(u, v), which is generally a function of reaction rates R j of all the reactions in the detailed chemistry set. By analogy with graph theory nomenclature, when regarding a single edge, or a direct interaction coefficient, uth and vth species will be referred to as tail species and head species respectively, where the vth head species is directly affected by the presence of all the reactions involving uth tail species. Any direct interaction coefficient w(u, v) is a measure of asymmetric coupling between two species that are directly related through some of the elementary reactions in the detailed chemistry set.

Chemistry graph
Coupling between two species, however, exists even if they do not share any elementary reactions. The indirect asymmetric coupling coefficients W AB between species A and B are therefore defined, reflecting the global (often indirect) effect of the presence of species A (and all its reactions) on the modeled density of the species B. For coupling coefficient W AB , the species A and B will, by convention, be referred to as the source species and target species, respectively. A coupling coefficient W AB is generally a function of direct interaction coefficients along all the paths leading from the source species A to the target species B in the chemistry graph. This way, in the resulting species ranking, the rank of each species A will reflect its importance (or rather the collective importance of all the reactions involving the species A) for modeling species B belonging to the set of species of interest; even a species with a very low density will rank relatively high, if it acts as an important intermediate species for production or consumption of any of the species of interest.
The following subsection defines several methods for calculating w(u, v) and W AB and how the species ranking is extracted from the set of coupling coefficients.

Direct interaction coefficients
Several methods for edge weights distribution in chemistry graphs (and for calculating the direct interaction coefficients w(u, v)) have been proposed in the literature.
In the original DRG theory [36], Lu and Law propose that w(u, v) takes the form of the sum of absolute values of production and consumption rates for head species by all the reactions involving the tail species, with a normalizing factor of the sum of absolute values of production and consumption rates for head species by all the reactions in the chemistry set: The index j runs over all N R reactions in the chemistry and R j is the reaction rate of the jth reaction k j is a reaction rate coefficient of the jth reaction (2), (3), and n L l j is the density of the lth reactant of the jth reaction. The term δ j u selects only reactions involving the tail species and a v j is the net stoichiometric coefficient of head species in jth reaction In their other work [38], Lu and Law propose an alternative definition for the direct interaction coefficient to (6), replacing the denomination factor by the absolute value of the total net production rate of the head species This means, that unlike the direct interaction coefficient defined by (6) which is bound between 0 and 1, (10) becomes singular for the head species in equilibrium. Another definition of the direct interaction coefficient was proposed by Pepiot-Desjardins and Pitch [39] in their DRGEP method. In their definition, the direct interaction coefficients are equal to the absolute value of net production of the head species by all the reactions involving the tail species, normalized by the maximum of total production or consumption of the head species, as The range of the direct interaction coefficient (11) also can be shown to be between 0 and 1 [39].
Finally, here we propose another definition of w(u, v) which naturally emerges from (6), (10) and (11) as their combination Similar to (10) and sharing the same denominator, (12) is not bound between 0 and 1 but rather is singular for the head species in equilibrium. The direct interaction coefficients defined by (6), (10)-(12) only reflect the importance of volumetric reactions involving tail species toward modeling the head species density. While this is appropriate for combustion modeling, where the DRG method and its variants all originated, it is, in fact, blind to the effect of surface reactions and conversions, which are often of great importance in plasma modeling. Therefore, in the present work, modified definitions are proposed, addressing also the production and consumption rates of head species by diffusion losses and surface conversions, such as Equations (13a)-(13d) correspond to (6), (10)-(12) respectively, with added species consumption and production rates on surfaces. The index i runs over all N S species in the chemistry set and S i is the surface sticking rate of ith species, defined by (4). The total production and consumption rates of the head species P v and C v are with These modified direct interaction coefficients also preserve the ranges of the original ones, that means (13a) and (13c) are bound between 0 and 1, while (13b) and (13d) are not.

Species coupling coefficients
With chemistry graph edges weighted by the direct interaction coefficients according to one of (13a)-(13d), the species coupling coefficients W AB can be defined between any two species A, B, using some well-established graph search algorithms: shortest path, maximum bottleneck path, maximal product path, and maximal flow searches.
In the shortest path approach, the coupling coefficient between source species A and target species B is calculated as a reciprocal of the length of the shortest path leading from A to B, with the individual edges being attributed their own length equal to reciprocals of weights w(u, v) Since all the edge weights (and therefore lengths) are by definition non-negative, the shortest paths might be calculated very efficiently, using, for example, Dijkstra's algorithm [58] with a very convenient computational complexity O(N 2 S ). Alternatively, one might define the coupling coefficient W AB as the direct interaction coefficient of a rate-limiting step across all the paths A → B. For each path p(A → B), the path rate-limiting step can be defined as with index i running through all the species in any one path p, and with source and target species A, B corresponding to i = 0 and i = n respectively. The coupling coefficient W lim.rate AB is then defined as the global rate-limiting step This is equivalent to the maximum bottleneck path problem, widest path problem or a maximum capacity route problem from graph theory. Similarly, as in the case of the shortest path approach, the maximal limiting rate search between any two species can be performed very efficiently, for example by a modified Dijkstra's algorithm with O(N 2 S ) complexity [59].
Another option is to define W AB as a product of all the direct interaction coefficients in a path p(A → B) with maximal such a product: This definition of the species coupling coefficients only makes sense for direct interaction coefficients bound between 0 and 1, otherwise, the graph search would favor long, convoluted paths involving a maximal number of edges with w(u, v) > 1. Therefore, this approach only applies to the w DRG (13a) and w DRGEP (13c) definitions of direct interaction coefficients, as w DRG (13b) and w DRG (13d) are not bound between 0 and 1, but rather become singular for head species in total equilibrium. The maximal propagated error search is inspired by the notion of error propagation proposed by Pepiot-Desjardins and Pitch [39] in their DRGEP method. If the direct interaction coefficients as edges are approximating errors induced to the density of a head species by the elimination of the tail species, the errors propagate along the paths from source to target species introducing geometric damping. The idea is that if some error is introduced to the prediction of the source species A, the longer the error needs to propagate along the path p(A → B) to reach the target species B, the smaller the effect will typically be. For edge weights bound between 0 and 1, the maximal product path problem is equivalent to the shortest path problem with modified edge lengths, as the coupling coefficient can be redefined as which can be solved by any shortest path searching algorithm, which can handle zero-weighted edges, corresponding to w(u, v) = 1.0. It needs to be noted that Dijkstra's shortest path algorithm treats edges with zero weight as non-existing edges, which would generally result in an incorrect W err.prop. AB , therefore a different algorithm needs to be used to compute (20), e.g. Bellman-Ford algorithm [60], which would in our case have O(N 3 S ) asymptotic complexity. The last option considered in this work for expressing the coupling coefficients W AB from a chemistry graph is using a maximum flow search. In analogy to the maximum bottleneck path problem (18), where the weights of the edges can be imagined as flow capacities and the flow capacity of the maximum-bottleneck path p widest (A → B) is searched for, in the maximum flow problem, the maximum flow from A to B is sought, without the restriction of only one single path. The species coupling coefficient W max.flow AB is then defined as a maximal total flow from A to B, while a partial flow through any single edge f(u, v) does not exceed the edge capacity (or value of the direct interaction coefficient) and flow is conserved in around any node

Species ranking
An instantaneous ranking score C t i for ith species X i can be expressed as a maximum coupling coefficient between X i and any of the pre-defined species of interest at a given simulation time t, such as where index k runs over all the species of interest {X k }. The instantaneous coupling coefficients C t i depend on the reaction rates R t j of the detailed chemistry set at any given time t. To ensure the reduction validity over the whole simulation time span, C t i are sampled over a sufficiently dense set of times of interest {t}, and the ranking scores are defined as This way, the ranking score of any particular species covers the coupling between its presence in the chemistry set and the modeled number densities belonging to the species of interest sampled at the times of interest-which are exactly the outputs the reduced chemistry set is designed to preserve.  Figure 4 demonstrates the importance of choosing an appropriate set of times of interest {t}. It shows the time evolution of oxygen radical in an O 2 -He atmospheric pressure plasma, comparing the PyGMol outputs with detailed and reduced chemistry for two reduced chemistry sets, each reduced with reduction and species ranking evaluated for a different set of times of interest. In both cases, the reduction was performed with Δ = 10% and with only the O atom as a single species of interest. Figure 4(a) shows data for a chemistry set reduced using a single time sample, coinciding with the end of the power pulse. While the atomic oxygen density agrees with the detailed set sufficiently well for the steady-state phase, the reduction error during the afterglow phase might be unacceptable. Figure 4(b) shows that sampling the reaction rates from a larger times of interest set, spanning the pulsing period, results in a (albeit larger) reduced chemistry set which preserves atomic oxygen density even during the afterglow.
The choice of the sampling times of interest will always depend on the specific application and on the computational time budget. A logarithmic distribution of time samples might be more useful for calculations with a constant external power term, while a linear distribution might be preferred for a timedependent power source, as shown in figure 4. The times of interest might also be sampled based on the detailed chemistry set solution, with the instantaneous sampling frequency proportional to the time derivatives of the densities of species of interest. The computational cost of the ranking algorithm will, however, increase linearly with the size of the samples set, because each time sample will effectively instantiate a separate chemistry graph which needs to be searched.
In section 3.2, we proposed 4 alternative definitions of direct interaction coefficients w(u, v) (13a)-(13d), and in section 3.3, we proposed 4 alternative ways to extract the species coupling coefficients W AB out of a chemistry graph, with directional edges weighted by w (u, v). This defines in total 14 alternative methods of ranking species according to (22), with ranking scores being a function of species of interest, times of interest and time-dependent reaction rates simulated by a plasma model with the detailed chemistry set (the two definitions of the direct interaction coefficient (13b), (13d) are not compatible with the W err.prop. AB coupling coefficient (19).) All the proposed alternative species ranking methods were tested for a chemistry set reduction on an array of test sets, reaction conditions and species of interest (section 4) by the method described in section 2, with the results shown in section 5.

Test reduction cases
Six test chemistry sets are introduced, to provide the basis for a number of test reduction cases generating the data for a statistical assessment of all the species ranking schemes introduced in section 3. Table 1 lists all the chosen test chemistry sets with their sizes (number of species and reactions), typical applications and source publications.

O 2 -He test chemistry set (S1)
The first test chemistry set is one for O 2 -He atmospheric pressure plasma taken from the work of Turner [21]. The detailed chemistry set contains 25 species and 373 reactions. Table 2 lists all the species included in the detailed chemistry set, while all the reactions and their kinetics can be found in the source publication. It is evident that the vibrational kinetics is not treated in great detail in the present model, with a single lumped vibrational state (denoted by ν) present for each neutral molecule. This simplification was justified in the source compilation by very dilute mixtures of oxygen in helium and by much lower densities of those vibrational states compared to their ground states. Reduction conditions for this detailed chemistry set should therefore not increase the oxygen ratio beyond the limit set by the source publication. Perhaps the most topical use of the O 2 -He plasma is for a biomedical application at atmospheric pressure. The source publication chose to approximate the so-called micro atmospheric pressure plasma jet (μAPPJ) [62], which is in many ways a typical example. The chemistry set source publication uses a device configuration of 1 × 1 mm 2 channel with a length of 30 mm for its global model and with a constant power of 1 W supplied to the system for 3 ms (which is a residence time of the gas in the channel at a typical flow rate), followed by a further 3 ms of zero power, modeling the afterglow phase. The source publication does not mention considering any surface losses or species conversions, so a set of default surface interaction coefficients was considered for the present work,  The source publication [21] used a plasma model with kinetic data in the same functional form as used by PyGMol, providing an opportunity to validate our PyGMol global model. Figures 5-7 show very good agreement between the data calculated by Turner [21] and our outputs from PyGMol model, for the same conditions and the same O 2 -He chemistry set.

N 2 -H 2 test chemistry sets (S2-S3)
The next two test chemistry sets for N 2 -H 2 plasma are based on the source publication of Hong et al [4]. The source publication compiles a chemistry set containing 42 distinct species and 408 reactions, building on previous work of Gordiets et al [63,64] and adapting it for atmospheric pressure. In contrast to the O 2 -He test chemistry set, this set explicitly tracks the first few vibrationally excited states of the most abundant molecular species N 2 and H 2 . All the species included are shown in table 4, while the reactions can be found in the source publication, together with their original sources.
Perhaps the most topical application for both low-pressure and atmospheric pressure is the ammonia synthesis. However, modeling of N 2 -H 2 plasmas is also important for a wide range of other applications, such as plasma cutting [65,66], arcjet thrusters for satellite propulsion [67,68], AlN, TiN or SiN film deposition [69][70][71], etching of organic low-permittivity layers [72], nitriding [73], and the interaction of puffed N 2 with the H 2 plasma in fusion reactors [74]. In this work, we base two test chemistry sets on the data from Hong et al; one for atmospheric pressure conditions (S2) and one for low-pressure conditions (S3), see table 1.
Kinetic data for the volumetric reactions compiled by Hong et al are not all compatible with the PyGMol global model. Most of the electron processes in the source publication cite the cross-sectional data from the ZDPlasKin [56] internal database and many of the heavy species processes follow reaction coefficients described by varying functions of gas temperature, differing from the Arrhenius representation required by PyGMol. The volumetric reactional kinetics were adapted for PyGMol according to the following procedure: • Electron processes described by cross-sectional data were all fitted to Arrhenius form, assuming the Maxwellian distribution on a grid of electron temperatures. For reproducibility reasons, the cross-sectional data for these   processes were adopted from the QDB database. The threshold electron energy for each of the processes was taken to be the electron energy of the first non-zero data-point of the corresponding cross-section. • For electron processes described in the Arrhenius form, the threshold energy was adopted from the corresponding processes cross-sections found in QDB, as described in the previous point. • All the heavy species processes described in the Arrhenius form were left unchanged. • The heavy species processes following any other functional dependence were all explicitly expressed for a constant gas temperature T g .
The surface coefficients (sticking and return coefficients s and r with return species R) are listed in table 5 for both atmospheric-pressure and low-pressure test chemistry sets. The surface coefficients for quenching of the excited states and for V -V surface transitions are taken from Hong et al (with the original sources listed in table 5). Additionally, all the ions are neutralized on surfaces, while ions with their neutral counterparts not present in the system are returned as their most abundant neutral fragments. Finally, a separate category is formed by the species H and H 2 . The NH 3 production is reportedly dominated by surface processes both for atmospheric-pressure conditions [4] and for low-pressure conditions [75]. The main channels for NH 3 surface production are reported to be the  Table 7. Species included in the test chemistry set for CH 4 -N 2 plasma, taken from the QDB database (chemistry C28). The * symbol denotes the lowest-energy electronically excited states. In our work, for the sake of brevity, these NH 3 surface production channels were represented by sticking and return coefficients for H and H 2 only, as shown in table 5, with values calibrated to match the modeling results reported by Hong et al [4] for atmospheric pressure case and by Carrasco et al [75] for the low-pressure conditions, for the plasma conditions and parameters stated in both sources. Surface sticking of the H species was discounted in the case of atmospheric pressure test chemistry set, due to a much lower degree of hydrogen dissociation. Finally, it needs to be noted that the surface kinetics described by table 5 are highly approximative and should not replace the full set of heterogeneous reactions (e.g. [4]) in any work aiming for quantitative modeling results. However, in our work, it will suffice to provide important test chemistry sets, with some of the species (mainly NH x ) densities being dominantly controlled by surface diffusion and conversion, rather than volumetric processes.

Other test chemistry sets (S4-S6)
Three additional test chemistry sets are used in present work to test the reduction technique: sets for plasma in CF 4 -CHF 3 -H 2 -Cl 2 -O 2 -HBr, CH 4 -N 2 , and finally Ar-NF 3 -O 2 . These chemistry sets were chosen with no regard to any specific target application and they were adopted directly from the pre-compiled plasma chemistries given by the QDB database [8]; their consistency and validity was not verified in the present work. They can be regarded as mere generators of non-linear outputs needed for analysis of the species ranking methods defined in section 3. Tables 6-8 summarise all the species included in these chemistry sets. The full lists of reactions and their kinetic data are not reproduced here, but can be found in the QDB database. In the instances of reactions described by cross-sectional kinetics, the data from QDB were fitted to the Arrhenius form on a grid on Maxwellian temperatures. Default surface coefficients were used for all the QDB test chemistry sets, as defined in table 3. All the reduction cases considered in the present work are listed in table 9. For each reduction case, the species of interest consist of a set of selected negative species (always including an electron), two sets of selected positive species (with higher and lower densities) and three sets of selected neutrals (including one consisting purely of selected excited states.) In some cases, the choice of the species interest is inspired by the chemistry set source publication (such as in the reduction cases C1.4 and C1.5, which feature species identified as important radicals and transient species respectively by Turner [21]), but in most cases, the choice was made purely arbitrarily. The reduction conditions for each reduction case are summarised in table 10. The reduction conditions were kept the same for each reduction case belonging to any single chemistry set. For the first three chemistry sets, the reduction conditions are taken from Turner et al [21], Hong et al [4], and Carrasco et al [75] respectively. For the chemistry sets sourced from QDB (C4.1-C6.6), the reduction conditions were set arbitrarily, with the exception of the Ar-NF 3 -O 2 chemistry set, which adopts the conditions presented in the validation notes in QDB. Table 11 lists the times of interest used to build the species rankings from the test chemistry sets and to evaluate the reduction error within the iterative reduction algorithm. The same times of interest were used for all reduction cases belonging to any single chemistry set. In the case of the pulsed-power reduction cases with the O 2 -He chemistry set (C1.1-C1.6), the times of interest are distributed linearly, covering the whole power cycle. In the remaining cases of constant-power reduction conditions, the times of interest are distributed logarithmically, to capture not only the steady-state conditions but also the evolution from the initial conditions. The simulation times t end (also shown in table 11) were chosen to be sufficiently long for each reduction case to reach a steady state.

Results
Altogether, 14 different species ranking methods were introduced in section 3.4, based on path searches in graphs representing chemistry. These ranking methods are built around 4 separate ways to calculate direct interaction coefficients w (13a)-(13d), together with 4 separate methods of devising coupling coefficients W between any two species, based on different graph path search algorithms, introduced in section 3.3; (13b) and (13d) yield direct interaction coefficients which are not bound between 0 and 1, excluding the chemistry graphs from the use of the maximal propagated error search method.
All the species ranking schemes presented were tested with the species-oriented iterative skeletal reduction method introduced in section 2 on an array of 36 test reduction cases listed in table 9 and with further plasma conditions and model parameters listed in tables 10 and 11. The same maximal reduction error Δ = 10% was considered in the reduction method with all the test reduction cases. The species ranking scheme yielding the best results with the ranking-based iterative reduction method was identified statistically.

Species ranking performance metrics
For a direct comparison of the competing species ranking schemes with regards to the ranking-based iterative reduction method, the number of species eliminated from any given test chemistry set provides a good metric for how well the ranking Table 10. Summary of the reduction conditions for all the chemistry sets considered: pressure p, gas temperature T g , absorbed power P, the reactor dimensions r and z, the total feedstock gas flow Q and the ratio of the feedstock gases.  The ranking schemes are labeled with the different methods of calculating the direct interaction coefficients w and the coupling coefficients W; e.g. the method labeled DRG sh.path uses w DRG (13a) and W sh.path AB (16) respectively. The density species ranking scheme simply ranks the species according to their density as modeled by the plasma model employed with the detailed chemistry set. scheme employed works with the reduction method. We define the number of eliminated species by the jth ranking scheme in the kth reduction case as N j ,k . This is followed with the number of eliminated species by the best-performing ranking scheme Figure 9. Distribution of Δ max , i.e. the difference in the number of eliminated species between each ranking scheme and the best-performing ranking scheme (24), over all the reduction cases. In the first row, histograms for the DRG sh.path ranking scheme are shown, in comparison to the trivial benchmark of ranking species by their densities, shown in the second row. Three columns of plots correspond to three different choices of N δ = 2, 1, 0, with the same color-coding as in figure 8. in the kth reduction case, as For any jth ranking scheme, the value of will determine how well it performs against the best ranking scheme in the kth reduction case. Any well-performing ranking scheme should have consistently low values of Δ max, j k for any reduction case. To assess the performance of the jth ranking scheme on the whole array of test reduction cases (table 9), the deviation from the best parameter σ j can be evaluated as where N = 36 is the number of reduction cases considered.
The σ value is a good measure of a performance of a single method in a pool of other species ranking methods, since it quantifies how many more species were eliminated on average by the reduction using the best ranking method for every reduction case. A lower value of σ translates to a better species ranking scheme for species reduction, while the lower bound for σ is 0, for a hypothetical jth ranking scheme, which would result in an elimination of the highest number of species (compared to all the other competing ranking schemes), for every single reduction case, or In reality, there is a distribution of Δ max, j across the different reduction cases. Of course both Δ max, j k and σ j will be influenced not only by how well the jth ranking scheme performs in the reduction method, but also by the chosen value of N δ , or the maximal allowed streak of unsuccessful species eliminations, as the algorithm termination condition. The results were studied for different values of N δ , however, it was found that N δ = 2 offers a good balance between the number of eliminated species and the number of plasma model calls, for the majority of the ranking methods.

Comparison of ranking schemes
A comparison of the σ value for all of the species ranking schemes considered in this work is shown in figure 8 for three different choices of N δ . As a benchmark, an additional ranking scheme labeled density is included in the results, where the species ranking scores trivially equal to their maximal densities (over all the time samples) solved with the detailed chemistry set, and as such are blind to the choice of the species of interest. The species ranking scheme using the DRG definition of direct interaction coefficients w DRG (13a) and the shortest path approach for the calculation of the coupling coefficients W sh.path AB (16) can be identified as providing species rankings which perform consistently well with the ranking-based iterative reduction method for N δ = 2. Regarding constructing the chemistry graph, the DRG (13a) and DRGEP (13c) methods for direct interaction coefficients give significantly better results than DRG (13b) and DRG (13d) methods, while the shortest path method for species coupling coefficients W sh.path AB appears to yield marginally better results than the rest.
Finally, the histograms in figure 9 show the distribution of Δ max (24), over all the reduction cases, and for 3 different choices of N δ . Histograms for the optimal DRG sh.path species ranking scheme are plotted, together with the density ranking scheme benchmark. The distribution for the DRG sh.path species ranking scheme is centered reasonably tightly around the optimal value of Δ max = 0, however, there are some instances of reduction cases, where this ranking scheme performs significantly worse than the best-performing ranking scheme. By contrast, the ranking scheme based simply on species density values shows distribution with a significantly wider spread.

Reduced test chemistry sets
To give an idea of typical sizes of reduced chemistry sets as compared with the detailed ones, table 12 shows the number of species and reactions in sets reduced using the DRG sh.path ranking scheme for each reduction case from table 9. The table also shows for each reduction case all the best ranking schemes resulting in the greatest reduction (in a number of species) and the number of species corresponding to those, if the DRG sh.path ranking scheme is not among them. The reduction parameters were Δ = 10% and N δ = 2 for each of the reduction cases, and the reduction conditions were as described in tables 10 and 11.
In some cases, such as reduction cases C2.5 or C6.5, the reduced chemistry set was significantly smaller in size than the detailed set. In other cases, such as C4.3, the reduction was far less significant. The difference in size between the detailed chemistry sets and any reduced sets depends on many factors, such as reduction conditions, species of interest, or how well refined the detailed chemistry set is in the first place.

The C1.4 O 2 -He reduction case
Since it is impractical to analyze every reduction case in this work, a single reduction case, C1.4 O 2 -He chemistry set with O, O 2 (a 1 Δ g ) and O 3 as species of interest, was chosen as an example and is considered here in greater detail. Table 13 lists all the species retained in the reduced O 2 -He chemistry set, as well as all the species eliminated, after running the reduction algorithm with the DRG sh.path species ranking scheme for the reduction case C1.4. Figure 10 shows the PyGMol global model solutions for species of interest densities belonging to the same reduced chemistry set. The density evolution is compared to the model with the detailed chemistry  set. It is evident, that the reduced chemistry set indeed preserves the densities of species of interest from the detailed set very well inside the allowed error Δ. Some other facts can be noted about the eliminated species listed in table 13. The He + ion has been eliminated by the reduction algorithm despite being the most abundant ion of one of the feed gases. This is due to a relatively large ionization potential of He atoms and consequently the very low He ionization degree (≈10 −13 at the end of a power pulse, as modeled by PyGMol with the detailed chemistry set). The He + 2 species, on the other hand, has not been eliminated by the reduction method, despite being even less abundant than the He + species. This is simply a shortcoming of the species ranking scheme employed (DRG sh.path ), where He + 2 is ranked above He + , and more importantly, above all of O − , He * and O + 2 , none of which can be eliminated from the chemistry set with an acceptable reduction error δ. He + 2 can indeed be removed from the reduced set without changing the reduction error significantly and would have been, if N δ 3, or if one of the DRGEP species ranking schemes was employed (see table 12). However, for the reduction parameters presented with N δ = 2 and the DRG sh.path ranking scheme, the reduction algorithm was terminated just before the He + 2 iteration. Also to be noted, all of the vibrationally excited species were eliminated except the O 3 (ν) species. This is because O 3 (ν) is a part of the dominant channel for the consumption of O via the reaction O + He + O 2 → He + O 3 (ν), which could not be removed within an acceptable reduction error. Figure 11 shows the production rates of each species of interest via the 5 most prominent processes, for both detailed and reduced chemistry sets described in table 13, sampled 0.1 ms before the end of power pulse. It can be seen, that while some of the channels are lost due to the eliminated species, in this particular case O 2 (ν) and O + 4 , these are generally much less significant channels, with rates about 2 orders of magnitude or more lower than the most prominent channel. This is also true for consumption channels and for other sample times, including ones sampled for the afterglow phase in the second half of the power cycle.
Finally, figure 12 shows the relative differences of some of the global model outputs between the reduced chemistry set (C1.4) and the detailed set. The outputs plotted are the densities of the species other than pre-selected species of interest, as well as the electron temperature. It can be seen that the density error induced by the reduction for some species, such as O( 1 D) and all the charged species e, O − , O + 2 , He + 2 , is fairly significant. This is unsurprising, since both the species ranking coming into the reduction method, and the reduction error (1) were completely blind to all the global model outputs except the densities of the species of interest. Other outputs, such as densities of He * , O 2 (b 1 Σ + g ), O 3 (ν), and the electron temperature T e are well replicated with the reduced chemistry set. This is, however, not guaranteed by the reduction method presented, but rather can be considered a side effect of preserving the dominant production and consumption channels of the species of interest (figure 11).

Conclusions and outlook
A method of ranking the species in a chemistry set according to their importance for modeling densities of a pre-defined set of species of interest is a crucial component of the presented species-oriented method for skeletal reduction of chemistry sets. In this work, we present several competing fast algorithms for species ranking, based on a graph-theoretical representation of chemistry sets. In all of the species-ranking methods presented, weights of the edges between species in the chemistry graphs (or the direct interaction coefficients) were distributed according to the different flavors of the DRG theory, with modifications to include effects of surface interactions, and several species rankings were proposed, built around different well-established search algorithms from the graph theory. We identify the DRG sh.path species ranking method, which uses the definition of direct interaction coefficients from the original DRG theory [36], with Dijkstra's search for the shortest path [58] in the chemistry graph, as the most suited for species-oriented skeletal reduction of chemistry sets, statistically on an array of diverse test chemistry sets and plasma conditions. The DRG sh.path species ranking method led to reductions of between 10 and 75% in the number of species compared to the original detailed chemistry sets, depending on the specific test set and plasma conditions. Our primary motivation for the development of a fast species-ranking algorithm is its use in the species-oriented iterative skeletal reduction method. While we demonstrate its successful utilization for reducing the number of species in several test chemistry sets, the resulting reduced sets likely still retain some redundant reactions. It could be desirable in the future to devise a reaction-ranking scheme which would assess the importance of the reactions retained for modeling the set of species of interest, and thus to expand our chemistry set reduction framework with a reaction-oriented skeletal reduction method.
Additionally, regarding the species-oriented iterative skeletal reduction method presented in this work, the validity of each chemistry set reduced using this method is strictly ensured only for the set of plasma model parameters that are the user-specified reduction conditions. It is often the case in a modeling application, that the chemistry set needs to be valid over a range of input parameters. Although this might be achieved by running the reduction method on a set of samples from the parameter space, as proposed in section 2.1, this kind of parameter sweep might be unnecessarily inefficient. Therefore, modifying the species-ranking method along with the reduction error function δ, so they reflect a range of reduction conditions rather than a single set, could be an important area of the future research.
Finally, a significant shortcoming of the species-ranking method presented is that the ranking only takes into account the species' importance for modeling densities of a set of pre-selected species of interest. The species ranking is blind to any coupling between the presence of the species in the chemistry set and any other possible plasma model outputs, such as electron density and electron temperature. More sophisticated plasma models can also resolve some additional outputs, such as the gas temperature or particle fluxes at surface, and certain applications might dependent on the reduced chemistry set preserving those outputs as well. As an aim for the future research, nodes representing other possible plasma model outputs should be inserted into the chemistry graph with appropriate weights distributed among edges connecting them to the other graph nodes. This will allow for the species ranking to reflect species coupling to the said additional model outputs and the notion of species of interest will be expanded to a more general notion of outputs of interest.