Data-driven Classification and Modeling of Combustion Regimes in Detonation Waves

Flow, Turbulence and Combustion

Abstract

A data-driven approach to classify combustion regimes in detonation waves is implemented, and a procedure for domain-localized source term modeling based on these classifications is demonstrated. The models were generated from numerical datasets of canonical detonation simulations. In the first phase, delineations of combustion regimes within the detonation wave structure were analyzed through a clustering procedure. The clustering output usefully illuminated distinctions between detonation, deflagration, and intermediary regimes within the wave structure. In the second phase, the resulting delineated fields from the clustering step were used to guide localized source term modeling via artificial neural networks (ANNs), enabling a type of classification-based regression approach for source term estimation. A comparison of the estimations obtained from the local ANNs (trained for a subset of the domain given by a particular cluster) with the global ANN counterparts (trained agnostic to the clustering) showed general improvement of estimations provided by the domain-localized modeling in most cases. Ultimately, this work illuminates the useful role of data-driven classification and regression techniques for both physical analysis of the complex wave structure and for the development of new models which may serve as suitable pathways for long-time simulations of complex combustion systems (such as rotating detonation combustors).

References

  • Akintayo, A., Lore, K.G., Sarkar, S., Sarkar, S.: Prognostics of combustion instabilities from hi-speed flame video using a deep convolutional selective autoencoder. Int. J. Prognost. Health Manage. 7(023), 1 (2016)

  • Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, (2007)

  • Barwey, S., Hassanaly, M., An, Q., Raman, V., Steinberg, A.: Experimental data-based reduced-order model for analysis and prediction of flame transition in gas turbine combustors. Combust. Theory Modell. 23(6), 994 (2019)

  • Barwey, S., Hassanaly, M., Raman, V., Steinberg, A.: Using machine learning to construct velocity fields from OH-PLIF images. Combust. Sci. Technol. 23(6), 1 (2019)

  • Barwey, S., Ganesh, H., Hassanaly, M., Raman, V., Ceccio, S.: Data-based analysis of multimodal partial cavity shedding dynamics. Exp. Fluids 61(4), 1 (2020)

  • Bell, J.B., Brown, N.J., Day, M.S., Frenklach, M., Grcar, J.F., Propp, R.M., Tonse, S.R.: Scaling and efficiency of PRISM in adaptive simulations of turbulent premixed flames. Proc. Combust. Inst. 28(1), 107 (2000)

  • Bykovskii, F.A., Zhdan, S.A., Vedernikov, E.F.: Continuous spin detonations. J. Propuls. Power 22(6), 1204 (2006)

  • Cailler, M., Darabiha, N., Veynante, D., Fiorina, B.: Building-up virtual optimized mechanism for flame modeling. Proc. Combust. Inst. 36(1), 1251 (2017)

  • Chacon, F., Gamba, M.: Study of parasitic combustion in an optically accessible continuous wave rotating detonation engine. AIAA Paper 2019–0473, (2019)

  • Chen, J.Y., Blasco, J.A., Fueyo, N., Dopazo, C.: An economical strategy for storage of chemical kinetics: fitting in situ adaptive tabulation with artificial neural networks. Proc. Combust. Inst. 28(1), 115 (2000)

  • Christo, F., Masri, A., Nebot, E.: Artificial neural network implementation of chemistry with PDF simulation of H2/CO2 flames. Combust. Flame 106(4), 406 (1996)

  • Fiévet, R.: Effect of vibrational nonequilibrium on isolator shock structure. J. Propuls. Power 34(5), 1334–1344 (2018)

  • Fiévet, R., Koo, H., Raman, V.: Numerical simulation of a scramjet isolator with thermodynamic nonequilibrium. AIAA Paper 2015–3418, (2015)

  • Fiévet, R., Voelkel, S.J., Raman, V., Varghese, P.L.: Numerical investigation of vibrational relaxation coupling with turbulent mixing. AIAA Paper 2017–0663, (2017)

  • Fiorina, B.: Accounting for complex chemistry in the simulations of future turbulent combustion systems. AIAA Paper 2019–0995, (2019)

  • Fiorina, B., Gicquel, O., Vervisch, L., Carpentier, S., Darabiha, N.: Approximating the chemical structure of partially premixed and diffusion counterflow flames using FPI flamelet tabulation. Combust. Flame 140, 147 (2005)

  • Franke, L.L., Chatzopoulos, A.K., Rigopoulos, S.: Tabulation of combustion chemistry via artificial neural networks (ANNs): methodology and application to LES-PDF simulation of Sydney flame L. Combust. Flame 185, 245 (2017)

  • Frolov, S.M., Dubrovskii, A.V., Ivanov, V.S.: Three-dimensional numerical simulation of the operation of a rotating-detonation chamber with separate supply of fuel and oxidizer. Russian J. Phys. Chem. B 7(1), 35 (2013)

  • Giorgetti, S., Coppitters, D., Contino, F., Paepe, W.D., Bricteux, L., Aversano, G., Parente, A.: Surrogate-assisted modeling and robust optimization of a micro gas turbine plant with carbon capture. J. Eng. Gas Turbines Power 142(1), 1 (2019)

  • Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  • Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Wechsler, H. (ed.) Neural Networks for Perception, pp. 65–93. Elsevier, Amsterdam (1992)

  • Herrmann, M., Blanquart, G., Raman, V.: A bounded QUICK scheme for preserving scalar bounds in large-eddy simulations. AIAA J. 44(12), 2879 (2006)

  • Jha, P., Groth, C.: Evaluation of flame-prolongation of ILDM and flamelet tabulated chemistry approaches for laminar flames. Combust. Theory Modell. 16, 31 (2012)

  • Jiang, G., Peng, D.: Weighted ENO schemes for Hamilton–Jacobi equations. SIAM J. Sci. Comput. 21(6), 2126 (2000)

  • Kaiser, E., Noack, B.R., Cordier, L., Spohn, A., Segond, M., Abel, M., Daviller, G., Östh, J., Krajnović, S., Niven, R.K.: Cluster-based reduced-order modelling of a mixing layer. J. Fluid Mech. 754, 365 (2014)

  • Kapoor, R., Lentati, A., Menon, S.: Simulations of methane-air flames using ISAT and ANN. AIAA Paper 2001–3847, (2001)

  • Kempf, A., Flemming, F., Janicka, J.: Investigation of lengthscales, scalar dissipation, and flame orientation in a piloted diffusion flame by LES. Proc. Combust. Inst. 30(1), 557 (2005)

  • Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization. (2017)

  • Landge, A. G., Pascucci, V., Gyulassy, A., Bennett, J. C., Kolla, H., Chen, J., Bremer, P.-T.: In-situ feature extraction of large scale combustion simulations using segmented merge trees. Supercomputing Conference Paper SC.2014.88, (2014)

  • Lu, F.K., Braun, E.M.: Rotating detonation wave propulsion: experimental challenges, modeling, and engine concepts. J. Propuls. Power 30(5), 1125 (2014)

  • Lu, T., Law, C.K.: A directed relation graph method for mechanism reduction. Proc. Combust. Inst. 30(1), 1333 (2005)

  • Malik, M.R., Isaac, B.J., Coussement, A., Smith, P.J., Parente, A.: Principal component analysis coupled with nonlinear regression for chemistry reduction. Combust. Flame 187, 30 (2018)

  • Mueller, M.A., Kim, T.J., Yetter, R.A., Dryer, F.L.: Flow reactor studies and kinetic modeling of the H2/O2 reaction. Int. J. Chem. Kinet. 31(2), 113 (1999)

  • Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)

  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. 31st Conference on Neural Information Processing Systems (NIPS), (2017)

  • Peters, N.: Turbulent Combustion. Cambridge University Press, Cambridge (2000)

  • Pitsch, H.: Large-eddy simulation of turbulent combustion. Ann. Rev. Fluid Mech. 38, 453 (2006)

  • Pope, S.B.: Computationally efficient implementation of combustion chemistry using in-situ adaptive tabulation. Combust. Theory Modell. 1, 41 (1997)

  • Prakash, S., Raman, V.: Detonation propagation through inhomogeneous fuel-air mixtures. In: Proceedings of the 27th International Colloquium on the Dynamics of Explosions and Reactive Systems (ICDERS), (2019)

  • Prakash, S., Fiévet, R., Raman, V.: The effect of fuel stratification on the detonation wave structure. AIAA Paper 2019–1511, (2019)

  • Prakash, S., Fiévet, R., Raman, V., Burr, J., Yu, K. H.: Analysis of the detonation wave structure in a linearized rotating detonation engine. AIAA J. pp. 1–15 (2019)

  • Raman, V., Hassanaly, M.: Emerging trends in numerical simulations of combustion systems. Proc. Combust. Inst. 37(2), 2073 (2019)

  • Ranade, R., Alqahtani, S., Farooq, A., Echekki, T.: An ANN based hybrid chemistry framework for complex fuels. Fuel 241, 625 (2019)

  • Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99 (2000)

  • Sato, T., Raman, V.: Hydrocarbon Fuel Effects on Non-premixed Rotating Detonation Engine Performance (American Institute of Aeronautics and Astronautics, 2019). AIAA SciTech Forum. https://doi.org/10.2514/6.2019-2023 (2019)

  • Sato, T., Voelkel, S., Raman, V.: Analysis of detonation structures with hydrocarbon fuels for application towards rotating detonation engines. AIAA Paper 2018–4965, (2018)

  • Sato, T., Fabian, C., Duvall, J., Gamba, M., Raman, V.: Dynamics of rotating detonation engines with a pintle-type injector. 24th International Society for Air Breathing Engines (ISABE) Conference Paper, (2019)

  • Sen, B.A., Menon, S.: Linear eddy mixing based tabulation and artificial neural networks for large eddy simulations of turbulent flames. Combust. Flame 157(1), 62 (2010)

  • Steinley, D.: K-means clustering: a half-century synthesis. Br. J. Math. Stat. Psychol. 59(1), 1 (2006)

  • Tammisola, O., Juniper, M.P.: Coherent structures in a swirl injector at Re = 4800 by nonlinear simulations and linear global modes. J. Fluid Mech. 792, 620 (2016)

  • Van Oijen, J.A., de Goey, L.P.H.: A numerical study of confined triple flames using a flamelet-generated manifold. Combust. Theory Modell. 8(1), 141 (2004)

  • Warnatz, J., Maas, U., Dibble, R.W.: Combustion. Springer, New York (1996)

  • Zhou, R., Wu, D., Wang, J.: Progress of continuously rotating detonation engines. Chin. J. Aeronaut. 29(1), 15 (2016)

Acknowledgements

This research is supported by the NASA Aeronautics Research Mission Directorate (ARMD) Fellowship under grant No. 80NSSC18K1735 with Dr. Tomasz Drozda of NASA Langley Research Center as technical adviser. The authors would like to thank NASA High-End Computing Capability (HECC) for the generous allocation of computational resources on the NASA Pleiades and Electra supercomputers which were used to generate the datasets.

Author information

Corresponding author

Correspondence to Shivam Barwey.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

A.1 K-Means Algorithm

Recall that centroid \(C_i\) represents a region of phase space that contains some proportion of the total number of snapshots. This region of phase space, denoted \(\mathbb {C}_i\), is a cluster. The centroid that represents each grid point is the one that is the closest to the grid point based on a distance measure. Here, the L\(_2\)-norm in the \(\mathbb {R}^{N_f}\) space is used to compute this distance, which is given by

$$\begin{aligned} d_{i,k} = \sqrt{\sum _{j = 1}^{N_f} (S_i^j - C_k^j)^2}, \end{aligned}$$
(1)

where \(d_{i,k}\) represents the L\(_2\)-norm between the i-th grid point \(S_i\) and the k-th centroid \(C_k\). The superscript j in Eq. 1 indexes the \(N_f\) dimensions, or features, of the centroid and grid point vectors. This grid point-centroid assignment is represented in an association matrix,

$$\begin{aligned} T_{i,k} = {\left\{ \begin{array}{ll} 1 &{} \text {if } S_i \in \mathbb {C}_k, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(2)

With the above descriptions, the full K-means algorithm as detailed in Arthur and Vassilvitskii (2007) is summarized as follows:

  1. Determine the initial distribution of \(N_k\) centroids \(\mathcal{C}\) with k-means++.

  2. Assign all N grid points to the nearest centroid as per Eq. 1, accumulating the association matrix \(T_{i,k}\) as per Eq. 2.

  3. Update the k-th centroid by computing the center of mass of all the grid points in the k-th cluster,

    $$\begin{aligned} C_k = \frac{\sum \limits _{i=1}^{N}S_i T_{i,k}}{\sum \limits _{i=1}^{N}T_{i,k}},\quad k = 1,\ \ldots , \ N_k, \quad i = 1, \ \ldots ,\ N. \end{aligned}$$
    (3)

  4. Repeat (2) and (3) until convergence, where convergence is defined as the point at which an additional iteration would cause all centroids to change by a negligible distance.
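The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used to produce the results: the function names (`sq_dist`, `kmeanspp_init`, `kmeans`), the tolerance `tol`, the iteration cap `max_iter`, and the toy data are all assumptions introduced here. Labels are 0-based, and the association matrix \(T_{i,k}\) is stored implicitly as one label per grid point.

```python
import math
import random


def sq_dist(p, c):
    """Squared L2 distance between a grid point and a centroid (Eq. 1, squared)."""
    return sum((pj - cj) ** 2 for pj, cj in zip(p, c))


def kmeanspp_init(points, n_clusters, rng):
    """Step 1: k-means++ seeding. Each new centroid is drawn with probability
    proportional to the squared distance to the nearest centroid chosen so far.
    Assumes the data contain at least n_clusters distinct points."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < n_clusters:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def kmeans(points, n_clusters, tol=1e-8, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = kmeanspp_init(points, n_clusters, rng)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (implicit T_{i,k}).
        labels = [min(range(n_clusters), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Step 3: move each centroid to the center of mass of its members (Eq. 3).
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep old centroid
        # Step 4: stop once no centroid moves by more than tol.
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels


# Toy data: two well-separated groups in a two-feature space.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, n_clusters=2)
```

In practice the features would be scaled before clustering so that no single feature dominates the distance in Eq. 1.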

A.2 Time-Evolution of Cluster Size

The discussion in Sect. 3 assigned physical significance to each cluster in the context of the training data, and then extended the findings into the testing set. To assess the effect of time progression on the segmented flowfield, Fig. 10 shows the time evolution of the cluster size for both training (solid lines) and testing (dashed lines) datasets. The cluster size is defined as the proportion of the total number of grid points per snapshot occupied by a particular cluster; as such, for each time instance, the cluster sizes sum to unity. To see which clusters dominate at a particular time instant, the absolute cluster sizes are shown in the left plot of Fig. 10. To better represent how cluster sizes evolve, the right plot shows the absolute sizes normalized by the initial sizes at \(t=0\). Since the clusters themselves have physical meaning, the evolution of cluster size is a useful metric when combined with the physical space representation in Fig. 5, in that it both (1) provides a coarse-grained picture of the time evolution of the detonation structure and (2) allows for the comparison of this structure evolution across different datasets.
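As a rough sketch of the two quantities just defined, the snippet below computes absolute cluster sizes per snapshot and their values normalized by the first snapshot. The function names, the 0-based cluster labels, and the toy label lists are assumptions made for illustration only.

```python
from collections import Counter


def cluster_sizes(labels_per_snapshot, n_clusters):
    """Fraction of grid points occupied by each cluster, one row per snapshot."""
    sizes = []
    for labels in labels_per_snapshot:
        counts = Counter(labels)
        sizes.append([counts.get(k, 0) / len(labels) for k in range(n_clusters)])
    return sizes


def normalize_by_initial(sizes):
    """Divide each cluster size by its value in the first snapshot (t = 0).
    A cluster that is absent at t = 0 is reported as NaN."""
    initial = sizes[0]
    return [[s / s0 if s0 > 0 else float("nan") for s, s0 in zip(row, initial)]
            for row in sizes]


# Two toy snapshots of four grid points each, labeled with two clusters.
labels_per_snapshot = [[0, 0, 0, 1], [0, 0, 1, 1]]
sizes = cluster_sizes(labels_per_snapshot, n_clusters=2)
relative = normalize_by_initial(sizes)
```

Each row of `sizes` sums to one by construction, which mirrors the statement that the cluster sizes sum to unity at every time instance.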

From Fig. 5, the spatial structure of the delineated clusters is maintained with time progression, with the introduction of clusters 2 and 4 on the left-hand side, far from the wave front, as the detonation wave passes to the right. Regions identified by clusters 2 and 4 are coherent between snapshots as they convect behind the wave front. However, there is a transition between cluster 3 and clusters 2/4 as the mixing behavior in the post-detonation region changes with time. The triple point propagation is identified by the movement of cluster 6 with time in a direction along the wave front surface. Thus, the clustering process identifies and tracks the movement of the triple points with reasonable accuracy. Most importantly, the cluster classification is consistent between snapshots, allowing each cluster to be interpreted with physical relevance to flow structures with time.

Figure 10 displays trends that are shared across both datasets as well as trends that differ between them. An expected result is the decrease in size of cluster 1 (the ambient region) in both datasets: as the wave progresses through the domain, the ambient domain proportion is reduced with time. Further, the size of cluster 6 (the triple point regions) is nearly constant with time. This is an important result as this region occurs primarily near the shock front. As a fixed portion of the detonation front exists within each snapshot, the total number of points corresponding to this cluster remains stable. For both the training and testing datasets, cluster 6 is the least populated cluster, as the triple points are concentrated regions due to wave interactions. In the training dataset, the sizes of the post-detonation cluster 3 and of deflagration clusters 2 and 4 increase in time. However, as cluster 3 was observed to be confined to a finite region behind the wave front, its size increases only slightly as the detonation wave fully enters the frame and then reaches a stable value; this is evident in the normalized cluster size evolution (right plot of Fig. 10) of the training dataset. On the other hand, the sizes of clusters 2 and 4 increase at a greater rate, with the increase in cluster 2 most dominant for both the training and testing datasets. This is expected, as the post-detonation regions represented by cluster 2 increase in size to match the reduction in cluster 1. Cluster 5 increases rapidly for the training dataset whereas this region is largely constant for the testing set; the shock-separated region increases as the wave progresses to the right in the LMDE, but in the stratified mixture case, the amount of entrained high-pressure gas is consistent prior to homogenization due to post-detonation mixing.

The rate of change in cluster size in the training dataset is more drastic than that of the testing set, as the training set features distinct regions, such as the shock-separated region, that are enforced by the fuel-air distribution due to injection. In the testing set, the stratification ahead of the wave is distributed more randomly, and does not ensure that one cluster will dominate within different locations perpendicular to the shock front. Ultimately, the interpretation of the time-evolution of cluster size provides a useful way to (a) assess a coarse-grained surrogate of the evolution of the detonation wave structure, and (b) compare the evolution across different detonation configurations.

Fig. 10 Absolute cluster sizes as percentage of total grid points (left) and cluster sizes normalized by the sizes at \(t=0\) (right). Colors represent different cluster numbers. Solid lines correspond to the training set and dashed lines to the testing set. Note that lines corresponding to the testing set end at a lower maximum time (fewer testing snapshots)

A.3 Feature Importance in Classification

The discussion below explores the finer details related to conditional distributions of grid points within each cluster. In particular, the information provided by the various features as it relates to the output delineations is analyzed, and a pathway for quantifying feature importance is provided.

Figure 11 shows typical pressure versus specific volume relations for the training set. The points corresponding to each cluster are identified by their respective colors, and the centroids are indicated by the larger white-enclosed markers. An expected trend is observed for cluster 1, which is an outlier in this space since this cluster represents ambient chemical conditions. Some separation is seen along the density axis for cluster 5, whose centroid occupies the highest density value. This is consistent with its spatial distribution in Fig. 5, which revealed that this cluster strongly represents a high-compression minimum-reaction region. Interestingly, the peak pressures near 50 atm are realized by points belonging to cluster 6 (the region near the wavefront encapsulating triple point structures). However, the presence of overlapping densities in the pressure versus specific volume space (especially for clusters 2, 3, 4, and 6) is telling with regard to detonation wave structure classification: the primary role of the pressure and density features lies in (1) creating the delineation between reacting and ambient portions of the domain, and (2) identifying cluster 5 as a particularly high-density region. Ultimately, the remaining 7 features must contain the additional contribution to the variation in cluster-based probability densities of the grid points for the production of the segmented field structures. In other words, many more axes must be added to Fig. 11 to fully explain the regime delineations produced in Fig. 5.

Fig. 11 Representation of pressure versus specific volume for all clusters. The larger points with white outline indicate centroids. Insets indicate cluster-specific scatter plots. Colors indicate cluster number

The above discussion regarding Fig. 11 necessitates a more complete analysis of feature contribution to the overall delineation output. This relies on the interpretation and manipulation of probability distribution functions (PDFs) conditioned on both cluster number and feature. To illustrate this, Fig. 12 shows cluster-conditional PDFs for three of the nine total features: temperature (left), pressure (middle), and \(Y_{HO_2}\) (right). Each color corresponds to the grid point distribution in a single cluster. By analyzing such PDFs for each feature, one can assess how effective each feature is in determining the differences between grid points belonging to different clusters. If cluster PDFs for a feature are similar to one another (in both mean and variance), then that particular feature contributes less to the distance measure used in the clustering. Therefore, the degree of separation, or dissimilarity, between cluster distributions in Fig. 12 is related directly to the “importance” of those respective features in the segmented field representation. As expected, in both temperature and pressure distributions, cluster 1 is the outlier. Of the non-ambient clusters, there are much greater differences in distribution for temperature than for pressure (though the pressure distributions still display differences in variation about the mean). Thus, for these clusters, it can be surmised that temperature contributes more than pressure in determining the delineations shown in Fig. 5, in particular for regions within the deflagration regime. On the other hand, for \(Y_{HO_2}\), the PDFs for all the clusters overlap much more, implying a lower overall delineation contribution.

Fig. 12 Cluster PDFs conditioned on temperature (left), pressure (middle), and \(Y_{HO_2}\) (right) for the testing dataset (trends for training set were identical)

The analysis of the distributions in Fig. 12 allows for a qualitative assessment of feature importance in the classifications shown in Fig. 5 based on cluster distributions. For a more quantitative analysis, a PDF-based measure known as the Earth Mover’s Distance (EMD; Rubner et al. 2000) can be used to create an importance metric for the features. The EMD is defined for two PDFs (which can be empirical) over a specified domain support, and is given by \(\mathcal {D}(p_1, p_2)\), where \(p_1\) and \(p_2\) are PDFs, and \(\mathcal D\) is the EMD function. The following distance properties hold: \(\mathcal {D}(p_1, p_2) \ge 0\), \(\mathcal {D}(p_1, p_1) = \mathcal {D}(p_2, p_2) = 0\), and \(\mathcal {D}(p_1, p_2) = \mathcal {D}(p_2, p_1)\). A summary of the mathematical formulation and definition is provided in Appendix A.4. Informally, if \(p_1\) and \(p_2\) are visualized as piles of dirt, \(\mathcal {D}(p_1, p_2)\) represents the “cost” of morphing one pile of dirt into the other. The factors which contribute to this “cost” are (1) the distance over which the dirt must be moved, and (2) the amount of dirt moved. The EMD is an appropriate measure here over other PDF-based distances such as the Kullback-Leibler (KL) divergence, as it remains well-defined when the PDFs being compared do not overlap in the feature space, i.e. where one PDF has zero density and the other does not. The EMD usefully takes into account differences in distribution means as well as variance when determining the final dissimilarity score (the EMD of two identical distributions centered around different values will result in a positive distance, as will the EMD of two distributions with identical centers but different standard deviations). Thus, using the EMD, a quantifiable metric for feature importance, denoted \(\mathcal {I}(F_i)\), where \(F_i\) is the i-th feature in \(\mathcal F\), is defined as

$$\begin{aligned} \mathcal {I}(F_i) = \sum _{j=1}^{N_k} \Big (\sum _{k=1}^{N_k} \mathcal {D}(p_j^{F_i}, p_k^{F_i})\Big ), \quad i=1,\ldots ,N_f. \end{aligned}$$
(4)

In Eq. 4, \(p_j^{F_i}\) represents the PDF of the j-th cluster corresponding to feature \(F_i\). The importance metric is essentially a sum of the EMDs over all pairs of cluster distributions for a particular feature. The inner summation in Eq. 4 represents the contribution of the j-th cluster to the overall importance. By this metric, if feature 1 has lower importance than feature 2, feature 1 is less significant overall in the clustering output, i.e. it is less crucial in illuminating differences between grid points belonging to different clusters. This metric can be used to effectively downsample the feature selection used to generate the labels in Fig. 5.
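A minimal sketch of Eq. 4 is given below, under two stated assumptions: each cluster-conditional PDF is represented by a list of scaled sample values, and the EMD between two normalized one-dimensional empirical distributions is evaluated as the area between their cumulative distribution functions (a standard identity for equal-mass 1D distributions, not necessarily how the original results were computed). The function names and sample values are hypothetical.

```python
from bisect import bisect_right


def emd_1d(a, b):
    """EMD between two 1D empirical distributions with equal total mass,
    computed as the integral of |CDF_a - CDF_b| over the merged sample grid."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def feature_importance(samples_by_cluster):
    """Eq. 4 for one feature. Returns the importance and the per-cluster
    contributions (the inner sums over k for each cluster j)."""
    contributions = [sum(emd_1d(pj, pk) for pk in samples_by_cluster)
                     for pj in samples_by_cluster]
    return sum(contributions), contributions


# Hypothetical scaled samples for three clusters and two features.
well_separated = [[0.0, 0.1], [0.5, 0.6], [0.9, 1.0]]
overlapping = [[0.40, 0.50], [0.45, 0.50], [0.40, 0.55]]

imp_sep, contrib_sep = feature_importance(well_separated)
imp_ovl, contrib_ovl = feature_importance(overlapping)
```

A feature whose cluster distributions are well separated receives a larger importance than one whose distributions nearly coincide, and the per-cluster contributions correspond to the colored bar segments described for Fig. 13.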

Figure 13 shows importance metrics for each feature in the training and testing datasets. Note that scaled quantities were used in the computation to allow for comparison of importance across different features. The colors in each bar correspond to the cluster contribution to the importance of that particular feature, which itself can be useful in the identification of feature-regime relationships. For example, in the case of \(Y_{H_2 O}\), cluster 3 (active predominantly in the deflagration regime) contributes a large amount to the importance. On the other hand, for \(Y_{H}\) and \(Y_{O}\), cluster 6 (active near the wavefront and triple point regions) dominates the importance value. Thus, the metric makes it possible to quantitatively assign a most “relevant” cluster to a particular feature in the context of delineation power.

Interestingly, Fig. 13 shows that the distribution of importance is similar between training and testing sets. In both cases, the metric implies that \(Y_{HO_2}\) and \(Y_{H_{2}O_{2}}\) are noticeably the least significant in the clustering. To illustrate the utility of the EMD-based metric, the clustering is performed again using a reduced feature set without \(Y_{HO_2}\) and \(Y_{H_{2}O_{2}}\). These results are shown in Fig. 14a, c for a single training and testing snapshot, respectively. Clustering using the downsampled set of features results in a segmented field that is nearly identical to that generated from the original, larger feature set. The same characteristics of the detonation wave and the regions of interest are captured by the reduced feature set. Thus, the importance metric provides useful insight into the thermodynamic properties and species necessary to demarcate the different modes of combustion.

The comparison between the original and reduced feature sets is also shown in Fig. 14b, d. These are distance plots, which serve as a proxy measure of uncertainty in the labels used to generate the segmented fields. They depict the distance of each grid point to its assigned centroid; high values in the distance field correlate directly with high classification uncertainty. In both training and testing sets, points of highest distance occur near the triple points. This means that although the presence of the triple points is captured within clusters 5 and 6 to an acceptable level, the uncertainty in classification near the triple points is relatively high with the chosen parameters. This is an indicator of the chemical complexity in this portion of the detonation wave. However, it is important to note that this distance field is practically unchanged when clustering with the reduced feature set (i.e. the removal of \(Y_{HO_2}\) and \(Y_{H_{2}O_{2}}\) did not cause a noticeable increase in label uncertainty), meaning that the clustering output as a whole has been preserved in the process of reducing the feature set. For this reason, it is important that the initial classification is performed with a full (or very large) feature set available from the data for general applications. Using the original clustering followed by the application of the importance metric, a reduced feature set can then be obtained to identify both redundant and important regime-dependent features and, additionally, to reduce the computational cost of labeling in potential online applications.
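The distance field described above can be sketched as follows. The helper names, the flagged fraction, and the toy points and centroids are assumptions for illustration; only the definition (distance of each grid point to its assigned centroid) comes from the text.

```python
import math


def distance_field(points, centroids, labels):
    """Distance from each grid point to the centroid of its assigned cluster."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]


def flag_uncertain(distances, fraction=0.1):
    """Indices of the `fraction` of points farthest from their centroid.
    The threshold is an illustrative choice, not one taken from the source."""
    n_flag = max(1, int(len(distances) * fraction))
    order = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    return sorted(order[:n_flag])


# Toy example: four points in a two-feature space, two centroids.
points = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.6, 0.5]]
centroids = [[0.1, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]

dist = distance_field(points, centroids, labels)
uncertain = flag_uncertain(dist, fraction=0.25)
```

The last point lies almost midway between the two centroids, so it has the largest distance and is the one flagged, which is the kind of behavior reported near the triple points.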

Fig. 13 Feature importance measures with cluster contributions for the training (left) and testing (right) datasets. Cluster contributions to the importance value of each feature are indicated by the respective colors

Fig. 14 Comparison of the clustering output (only one snapshot shown) between original full feature set and the reduced set. a Segmented fields for training dataset. b Distance fields for training dataset. c Segmented fields for testing dataset. d Distance fields for testing dataset

A.4 Earth Mover’s Distance

The formulation of the EMD is presented here in the context of Sect. 3. For a given feature f, consider two discrete densities p(A) and p(B)—these can be thought of as distributions of data corresponding to the feature f in two clusters. These discretized densities (not necessarily normalized) can be represented as histograms in the set notation

$$\begin{aligned} \begin{aligned} \mathcal {H}_A = \{(\alpha _1, p_1(A)),\text { ...}, (\alpha _M, p_M(A))\}, \\ \mathcal {H}_B = \{(\beta _1, p_1(B)),\text { ...}, (\beta _N, p_N(B))\}, \end{aligned} \end{aligned}$$
(5)

where \(\alpha _i\) (resp. \(\beta _j\)) represents the center of bin i (resp. j) and \(p_i(A)\) (resp. \(p_j(B)\)) represents the number of data points in bin i (resp. j). Note that the number of bins or the number of discrete points in each histogram does not have to be equal; here, \(\mathcal {H}_A\) and \(\mathcal {H}_B\) have a total of M and N bins respectively.

As stated in Sect. A.3, the EMD is a non-negative measure that quantifies the degree of separation between two PDFs. This measure is obtained by minimizing a weighted sum of distances known as the work W, where

$$\begin{aligned} W(p(A), p(B), Q) = \sum _{i=1}^M \sum _{j=1}^{N} q_{i,j} d_{i,j}. \end{aligned}$$
(6)

In Eq. 6, \(Q = \{q_{i,j}\}\) is the set of weights (sometimes referred to as the flow), and \(d_{i,j}\) represents a distance measure between the bin centers \(\alpha _i\) and \(\beta _j\) of the two discrete PDFs (often taken as the Euclidean distance; not to be confused with the similarly named term in Eq. 1). The weights are subject to the following constraints:

$$\begin{aligned}&q_{i,j} \ge 0, \end{aligned}$$
(7)
$$\begin{aligned}&\sum _{j=1}^{N} q_{i,j} \le p_i(A), \end{aligned}$$
(8)
$$\begin{aligned}&\sum _{i=1}^{M} q_{i,j} \le p_j(B), \end{aligned}$$
(9)
$$\begin{aligned}&\sum _{i=1}^M \sum _{j=1}^N q_{i,j} = \min {\{\sum _{i=1}^M p_i(A), \sum _{j=1}^N p_j(B)\}}. \end{aligned}$$
(10)

In other words, each weight \(q_{i,j}\) is bounded above by the number of data points in bin i of \(\mathcal {H}_A\) and in bin j of \(\mathcal {H}_B\), and Eq. 10 requires the total flow to equal the total count of the smaller histogram. Finally, the EMD is defined as

$$\begin{aligned} \mathcal {D}(p(A), p(B)) = \frac{\sum _{i=1}^M \sum _{j=1}^N q^{*}_{i,j} d_{i,j}}{\sum _{i=1}^M \sum _{j=1}^N q^{*}_{i,j}}, \end{aligned}$$
(11)

where \(Q^{*} = \{q^{*}_{i,j}\}\) is the optimal set of weights obtained by minimizing Eq. 6 (Rubner et al. 2000).
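For one-dimensional histograms with equal total counts and \(d_{i,j} = |\alpha _i - \beta _j|\), the minimization of Eq. 6 does not require a general linear-programming solver: filling the flow greedily in order of increasing bin center is known to be optimal in that special case. The sketch below relies on those assumptions (it is not a general solver for Eqs. 6 to 10, and it does not handle the unequal-mass case); the function name and the example histograms are hypothetical.

```python
def emd_histograms_1d(hist_a, hist_b, eps=1e-12):
    """EMD (Eq. 11) between two 1D histograms given as (bin_center, count) pairs.
    Assumes non-empty histograms with equal, positive total counts."""
    a, b = sorted(hist_a), sorted(hist_b)
    if abs(sum(m for _, m in a) - sum(m for _, m in b)) > eps:
        raise ValueError("this sketch assumes equal total counts")
    flows = []                      # entries (alpha_i, beta_j, q_ij) of the flow Q*
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]
    while i < len(a) and j < len(b):
        q = min(rem_a, rem_b)       # move as much as both bins allow (Eqs. 8, 9)
        if q > 0:
            flows.append((a[i][0], b[j][0], q))
        rem_a -= q
        rem_b -= q
        if rem_a <= eps:            # bin i of H_A is exhausted
            i += 1
            rem_a = a[i][1] if i < len(a) else 0.0
        if rem_b <= eps:            # bin j of H_B is filled
            j += 1
            rem_b = b[j][1] if j < len(b) else 0.0
    work = sum(q * abs(x - y) for x, y, q in flows)   # Eq. 6 at Q = Q*
    total_flow = sum(q for _, _, q in flows)          # left-hand side of Eq. 10
    return work / total_flow, flows


# Hypothetical histograms with three data points each.
hist_a = [(0.0, 2), (1.0, 1)]
hist_b = [(1.0, 1), (2.0, 2)]
emd, flows = emd_histograms_1d(hist_a, hist_b)
```

Here the flow moves one count from 0 to 1, one from 0 to 2, and one from 1 to 2, giving a work of 4 over a total flow of 3, i.e. an EMD of 4/3.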

Cite this article

Barwey, S., Prakash, S., Hassanaly, M. et al. Data-driven Classification and Modeling of Combustion Regimes in Detonation Waves. Flow Turbulence Combust 106, 1065–1089 (2021). https://doi.org/10.1007/s10494-020-00176-4
