Data-driven approaches to study the spectral properties of chemical structures

The molecular energy, defined as the sum of the absolute values of all eigenvalues of the molecular graph, is crucial in determining the total π-electron energy of conjugated hydrocarbon molecules. We used machine learning techniques to calculate the energy, inertia, nullity, signature, and Estrada index of molecular graphs for bismuth tri-iodide and benzene rings embedded in P-type surfaces within 2D networks. We used MATLAB to extract the actual eigenvalues from the data and developed general equations for these molecular properties. We then used these equations to estimate the values and compared them with the actual values through graphical analysis. Our results demonstrate the potential of data-driven techniques in predicting molecular properties and enhancing our understanding of spectral theory.


Introduction
In the continuously developing discipline of chemistry, a comprehensive understanding of molecular energy and spectral characteristics is imperative for predicting the behavior and interactions of various molecules, such as conjugated hydrocarbon molecules. These molecules exhibit unique chemical properties, like increased stability and reactivity, due to the delocalization of π electrons throughout their structure [1][2][3]. The concept of molecular energy, which is the sum of the absolute values of all eigenvalues, is central to studying these molecules. This concept is closely linked to spectral theory, a branch of mathematics that examines the relationship between eigenvalues and eigenvectors of linear operators. Spectral theory provides valuable insights into molecular properties and interactions by analyzing and classifying molecular structures based on their energy and spectral characteristics [4][5][6].
Researchers increasingly adopt interdisciplinary approaches to tackle complex scientific questions as chemistry advances [7]. Data science and machine learning have emerged as powerful tools for the study of molecular properties [8]. These techniques provide a robust framework for analyzing extensive and complex datasets, identifying patterns, and building predictive models that significantly improve our understanding of molecular behavior [9,10]. Machine learning, a subset of artificial intelligence, trains algorithms to recognize patterns and relationships within data, leading to predictions or conclusions regarding molecular behavior. These algorithms can be applied to various tasks, such as predicting molecular energy, determining chemical reactivity, and identifying potential drug candidates [11][12][13].
In spectral theory and the study of molecular energy, data science and machine learning can be used to develop efficient computational methods for estimating molecular properties. By harnessing these tools, researchers can overcome the limitations of traditional approaches, which often depend on time-consuming and labor-intensive experiments. Additionally, these techniques enable the discovery of new relationships and patterns within data, potentially leading to new findings and advancements in the field of chemistry [14][15][16].
The structure of a chemical compound can be represented by a molecular graph, which can be transformed into various matrices by using several graph properties. The atoms of a structure are connected by bonds, and this connectivity is encoded in the adjacency and distance matrices. The characteristic polynomial obtained from the adjacency or distance matrix can be regarded as a signature of the structure. The eigenvalues obtained from this polynomial serve as molecular descriptors and are used in quantitative structure-property/activity relationships.
The graph of a molecule is a mathematical object defined as $G = (V, E)$, where $V$ is the set of vertices, which represent atoms, and $E$ is the set of edges, which represent bonds. Hydrogen atoms are usually omitted from the molecular graph. The adjacency matrix $A = [a_{ij}]$ of a molecule is a square matrix of order $n$, and its eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ are called the eigenvalues of the molecular structure. The sum of the absolute values of all the eigenvalues is known as the energy of the molecular structure $G$. Mathematically, it is denoted as $E(G) = \sum_{i=1}^{n} |\lambda_i|$. The set of eigenvalues is also called the spectrum of the graph $G$. The spectral characteristics of graphs have been extensively investigated, and graph theory has many applications in chemistry [17][18][19]. One application of spectral theory in chemistry is based on the close correspondence between the eigenvalues of the structure and the molecular orbital energy levels of electrons in conjugated hydrocarbons [20,21].
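The definition of graph energy can be illustrated with a minimal Python sketch for the hydrogen-suppressed benzene graph, which is the 6-cycle. This is not the MATLAB workflow used in this study: the function names `cycle_adjacency`, `jacobi_eigenvalues`, and `graph_energy` are hypothetical, and the cyclic Jacobi rotation method is an assumed stand-in for a library eigenvalue routine.

```python
import math

def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph C_n (benzene skeleton for n = 6)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        a[i][j] = a[j][i] = 1.0
    return a

def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def graph_energy(matrix):
    """Graph energy: sum of the absolute values of the adjacency eigenvalues."""
    return sum(abs(lam) for lam in jacobi_eigenvalues(matrix))

# The spectrum of the 6-cycle is 2, 1, 1, -1, -1, -2, so its energy is 8.
benzene_energy = graph_energy(cycle_adjacency(6))
print(round(benzene_energy, 6))
```

For the large adjacency matrices considered later, a dedicated numerical library would normally replace the hand-written eigenvalue routine; the sketch only makes the definition concrete.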
Rank and nullity play a vital role in graph theory and are rooted in linear algebra. The rank is the sum of the positive and negative inertia indices of a molecular structure. The nullity η(G) of a structure is the number of zero roots of the characteristic polynomial of A(G), that is, the multiplicity of the eigenvalue zero, and it is an indicator of the stability of the molecular structure. A stable, closed-shell molecule has nullity zero, whereas an unstable, highly reactive, open-shell molecule has nullity greater than zero. Every molecular structure can be expressed as a square matrix whose entries are only 0 and 1. The number of positive eigenvalues, p(G), is the positive inertia index, while the number of negative eigenvalues, n(G), is the negative inertia index.
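These quantities follow directly from the signs of the eigenvalues, as the following minimal Python sketch shows. The function name `inertia_summary` and the zero tolerance `tol` are assumptions made for illustration; the two spectra are the well-known adjacency spectra of the 6-cycle (benzene skeleton) and the 4-cycle (cyclobutadiene skeleton), not data from this study.

```python
def inertia_summary(eigenvalues, tol=1e-9):
    """Count positive, negative, and zero eigenvalues of a spectrum.

    Eigenvalues with absolute value below `tol` are treated as zero,
    because numerically computed spectra are never exactly zero.
    """
    p = sum(1 for lam in eigenvalues if lam > tol)    # positive inertia index
    n = sum(1 for lam in eigenvalues if lam < -tol)   # negative inertia index
    eta = len(eigenvalues) - p - n                    # nullity
    return {"p": p, "n": n, "nullity": eta, "rank": p + n, "signature": p - n}

benzene = [2.0, 1.0, 1.0, -1.0, -1.0, -2.0]  # spectrum of the 6-cycle
cyclobutadiene = [2.0, 0.0, 0.0, -2.0]       # spectrum of the 4-cycle

print(inertia_summary(benzene))         # nullity 0: closed shell
print(inertia_summary(cyclobutadiene))  # nullity 2: open shell
```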
In recent decades, another important concept, known as the Estrada index, was introduced by Ernesto Estrada and defined as $EE(G) = \sum_{i=1}^{n} e^{\lambda_i}$. Initially, it was applied to quantify the degree of folding of long-chain molecular structures, particularly proteins. Many subsequent studies have extended the applications of the Estrada index [22][23][24][25][26][27].
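As a small worked example of this definition, the sketch below evaluates the Estrada index from a list of eigenvalues. The function name `estrada_index` is hypothetical, and the input is the known adjacency spectrum of the 6-cycle rather than data from this study.

```python
import math

def estrada_index(eigenvalues):
    """Estrada index: sum of exp(lambda_i) over all adjacency eigenvalues."""
    return sum(math.exp(lam) for lam in eigenvalues)

benzene_spectrum = [2.0, 1.0, 1.0, -1.0, -1.0, -2.0]  # spectrum of the 6-cycle
ee_benzene = estrada_index(benzene_spectrum)
print(f"{ee_benzene:.4f}")  # 2*cosh(2) + 4*cosh(1), approximately 13.6967
```

Because the exponential weights large positive eigenvalues most heavily, the index grows quickly with the size of the graph, which is consistent with fitting it as a function of the number of unit cells.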
Motivated by the mathematical concepts above, the current study uses them to characterize molecular structures and their properties. Two well-known structures, bismuth tri-iodide (BiI3) and a benzene ring embedded in a P-type surface, are considered because of their wide applications in chemistry, chemical engineering, and other fields of science. We compute the energy and Estrada index of these structures, and we also calculate their inertia, nullity, and signature. The study aims to minimize the error between the exact and estimated values for these molecular graphs through polynomial curve fitting. By comparing the exact and estimated energy and Estrada index for each structure, we aim to demonstrate the effectiveness of our methodology. To achieve this, we employ a multi-step computational process using several software tools.

Brief description of Bismuth Tri-iodide (BiI3)
Bismuth tri-iodide (BiI3) is an inorganic compound produced by the chemical reaction of iodine and bismuth, which has motivated interest in qualitative studies [28]. BiI3 is extremely useful in qualitative inorganic analysis. It has been shown experimentally that Bi-doped glass optical fibers are among the most promising active laser media. Various types of Bi-doped fibers have been fabricated and used to make Bi-doped fiber lasers and optical amplifiers [29]. BiI3 is a structure consisting of three layers, in which a bismuth atom is packed between iodide atoms to form a repeated I−Bi−I plane [30]. The monolayer units of BiI3 are stacked on one another via van der Waals forces [31]. This structure provides an ideal 2D material for photovoltaic cells, optoelectronics, and room-temperature X-ray/gamma-ray detectors [32]. The stacking pattern and interlayer distance affect the electronic structure and stability [33]. The electronic properties can be modified by intercalation, chemical doping, and mechanical strain [34,35]. In particular, the optical properties are significantly affected by the interlayer distance. BiI3 is a material for photodetectors with excellent durability and stability under different bending strains, making it suitable for flexible optoelectronic devices in advanced technologies such as optical fiber communication, flexible imaging, complex environmental monitoring, and wearable light sensors [36]. In addition, BiI3 has gained attention for gamma-ray detectors and radiographic imaging owing to its strong photon stopping power, which results from its high density (5.78 g/cm³), large band gap, and high effective atomic number. These characteristics are essential for high-resolution room-temperature gamma-ray spectroscopy [37,38]. The outstanding properties of BiI3 also lead to applications in tactile smart sensors, photovoltaic cells, human-machine interfacing, and photonics. By modifying its morphology, BiI3 can be converted into single and twin plates [39].

Computational methodology for molecular graph analysis
To measure the energy and Estrada index, several computational steps are carried out with different software, as shown in Fig. 1. Following this procedure, we first used HyperChem to draw the molecular structure of BiI3, as depicted in Fig. 2.

I. Masmali et al.
Second, the adjacency matrix of the molecular graph is constructed using TopoCluj. Third, the eigenvalues of the matrix are calculated with MATLAB. Finally, a polynomial curve of degree two is fitted to the data obtained from the eigenvalues of the adjacency matrix using the Curve Fitting Toolbox (cftool) in MATLAB.
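The final curve-fitting step can be sketched in Python as an ordinary least-squares fit of a degree-two polynomial. This is an assumed stand-in for the MATLAB Curve Fitting Toolbox, not the code used in the study; the names `fit_quadratic` and `evaluate` are hypothetical, and the sample values are synthetic placeholders rather than energies from Tables 2 or 6.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y ~ a*x**2 + b*x + c."""
    # Normal equations (V^T V) beta = V^T y, with rows of V equal to [x^2, x, 1].
    rows = [[x * x, x, 1.0] for x in xs]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for j in range(col, 3):
                m[r][j] -= factor * m[col][j]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (rhs[i] - known) / m[i][i]
    return tuple(beta)

def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    a, b, c = coeffs
    return a * x * x + b * x + c

# Synthetic placeholder data (NOT values from this study): y = 2 n^2 + 3 n + 1.
cells = [1, 2, 3, 4, 5, 6]
values = [2 * n * n + 3 * n + 1 for n in cells]
coeffs = fit_quadratic(cells, values)
print([round(c, 6) for c in coeffs])
print(round(evaluate(coeffs, 10), 6))  # extrapolated estimate at n = 10
```

In practice the inputs would be the number of unit cells and the exact energy or Estrada index computed from the eigenvalues, and the fitted polynomial would then provide the estimated values.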

Energy and Estrada index of BiI3
The second-order polynomials that describe the energy and Estrada index of the bismuth tri-iodide molecule are given by Eqs. (1) and (2), respectively.
Eqs. (1) and (2) can further be written in terms of their coefficients, as shown in Eqs. (3) and (4). The numerical results for the energy and Estrada index of BiI3 obtained from Eqs. (1) and (2) at different values of m are given in Table 1.
Further, estimated values of the energy and Estrada index are computed for different numbers of unit cells by using Eqs. (1) and (2) and compared with the exact values obtained from the adjacency matrix, as shown in Figs. 3 and 4.
Here, exact values are denoted by a blue dotted line and estimated values by an orange line. These figures show good agreement between the exact and estimated values. For a closer look, we also calculate the mean absolute percentage error between these values in Tables 2 and 3. The relative error is important because it gives an assessment of the accuracy of calculations or projections. This allows us to analyze the method we use and to identify areas for potential improvement.
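The mean absolute percentage error can be written out as a short Python sketch. The function name `mape` is hypothetical, the result is expressed in percent (an assumption, since the scaling used in Tables 2 and 3 is not stated here), and the numbers are illustrative placeholders rather than values from the tables.

```python
def mape(exact, estimated):
    """Mean absolute percentage error, in percent, of estimates against exact values."""
    if len(exact) != len(estimated) or not exact:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the relative error |exact - estimated| / |exact|.
    total = sum(abs(e - f) / abs(e) for e, f in zip(exact, estimated))
    return 100.0 * total / len(exact)

# Illustrative placeholder values (NOT data from this study).
exact_values = [100.0, 200.0, 400.0]
estimated_values = [101.0, 198.0, 404.0]
print(round(mape(exact_values, estimated_values), 6))  # each relative error is 1 %
```

Because every term is scaled by the exact value, the metric is independent of the magnitude of the quantity, so errors in the energy and in the much larger Estrada index can be compared directly.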
Another way to support this study is a second statistical analysis, in which we compute the mean absolute error and the standard deviation of the errors across all data points to assess the overall accuracy of the estimation method. For the energy data in Table 2, the average absolute error is 0.0763 and the standard deviation is 0.02423519. The normal distribution curve of the data given in Table 2 is shown in Fig. 5.
For the Estrada index data in Table 3, the average absolute error is 0.0855 and the standard deviation is 0.09604087671403254. The normal distribution curve of the data given in Table 3 is shown in Fig. 6.
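This second analysis can be sketched with Python's standard `statistics` module. The function name `error_statistics` is hypothetical, the sample (n − 1) standard deviation is an assumption because the text does not say which variant was used, and the numbers are illustrative placeholders rather than the data of Tables 2 and 3.

```python
import statistics

def error_statistics(exact, estimated):
    """Mean and sample standard deviation of the absolute errors."""
    abs_errors = [abs(e - f) for e, f in zip(exact, estimated)]
    mean_error = statistics.mean(abs_errors)
    std_error = statistics.stdev(abs_errors)  # sample standard deviation (assumed)
    return mean_error, std_error

# Illustrative placeholder values (NOT data from this study).
exact_values = [10.0, 20.0, 30.0]
estimated_values = [11.0, 18.0, 30.0]
mean_error, std_error = error_statistics(exact_values, estimated_values)
print(mean_error, std_error)

# Normal curve with the same mean and standard deviation, evaluated at the mean.
curve = statistics.NormalDist(mu=mean_error, sigma=std_error)
peak_density = curve.pdf(mean_error)
print(round(peak_density, 4))
```

Plotting `curve.pdf` over a range of error values gives a normal distribution curve of the kind shown in Figs. 5 and 6; a narrow curve indicates that the absolute errors are tightly clustered around their mean.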
In Table 2, the exact values of the energy of BiI3 are smaller than the values obtained from the quadratic equations, i.e., $E_{ext}(B) < E_{est}(B)$, where $E_{ext}$ denotes the exact and $E_{est}$ the estimated value of the energy. Moreover, the errors between these energy values are positive, i.e., Error > 0. Similarly, in Table 3, the exact values of the Estrada index of BiI3 are always greater than the estimated values, i.e., $EE_{ext}(B) > EE_{est}(B)$, where $EE_{ext}$ and $EE_{est}$ represent the exact and estimated values of the Estrada index, respectively. The mean absolute percentage error is 0.035 for the energy of BiI3 and 0.00515 for the Estrada index of BiI3. This error analysis shows that the relative errors are generally very small, indicating that the estimated values are quite close to the exact values. This suggests that the estimation method is accurate for this dataset, with errors typically less than 0.1 % relative to the exact values.

The inertia, nullity and signature of BiI3
This section analyzes the molecular structure of BiI3 and its stability through its spectral properties. To this end, the numerical results for the inertia, nullity, and signature of BiI3 are given in Table 4, where p(B) denotes the positive inertia index and n(B) the negative inertia index. When the number of vertical unit cells n is increased at a constant number m of horizontal unit cells, the positive and negative inertia indices remain balanced. The difference between the numbers of positive and negative eigenvalues is called the signature s(B); it is zero because of this balance. The nullity η(B) is obtained by counting the zero eigenvalues, that is, the zero roots of the characteristic polynomial. The nullity of BiI3 increases with the number of vertical unit cells of the structure, as shown in Table 4.

Benzene Ring Embedded (BRE) in P-type-surface
P-type networks are embeddings of sp2 carbons in triply periodic surfaces with the same regularity as single-node simple cubic Bravais tilings. They fall within the 230 symmetry classes of Euclidean space. In these embeddings, the edges of the structure do not cross, and the surface splits the space into two disjoint regions. A molecular structure that consists entirely of sp2 atoms and is embeddable in a triply periodic surface with nonpositive Gaussian curvature is called a Schwarzite. Schwarzites have exceptional electronic, magnetic, and optical characteristics. The Schwarzites embedded in P-type surfaces decorate the Bravais lattice in three-dimensional Euclidean space. P-type surfaces can be covered with various tilings of polygons having more sides than hexagons, which are required to create the negative Gaussian curvature.

Table 4
The inertia, nullity and signature of BiI3.

Energy and Estrada index of BRE
The same procedure used for BiI3 is followed to measure the energy and Estrada index of BRE. The structure of BRE is constructed in HyperChem, as shown in Fig. 7.
The energy and Estrada index, expressed as second-order polynomials, are given in Eqs. (5) and (6), respectively.
To check the accuracy of the results, we compared the estimated values of the energy and Estrada index obtained from Eqs. (5) and (6) with the exact values in Figs. 8 and 9 and found good agreement.
Here, exact values are denoted by a blue dotted line and estimated values by an orange line. In addition, we calculate the mean absolute percentage error between these values in Tables 6 and 7.
For the energy data in Table 6, the average absolute error is 0.3761 and the standard deviation is 0.42581697. The normal distribution curve of these data is shown in Fig. 10.
For the Estrada index data in Table 7, the average absolute error is 0.0855 and the standard deviation is 0.09604087671403254. The normal distribution curve of these data is shown in Fig. 11.
We observed that the exact value of the energy of BRE is less than the estimated value for the first two terms, after which it gradually rises above the estimated values. Similarly, the exact value of the Estrada index of BRE is less than the estimated value for the first few terms, and this behavior changes abruptly for the remaining five values. The mean absolute percentage error is 0.04283 for the energy of BRE and 0.00523 for the Estrada index of BRE.

The inertia, nullity and signature of BRE
The molecular structure of BRE and its stability are analyzed through the numerical results for inertia, nullity, and signature in Table 8. In Table 8, p(BRE) and n(BRE) denote the positive and negative inertia indices, which are found to be balanced. The signature of the molecular structure, s(BRE), is therefore zero. The nullity, denoted by η(BRE), remains constant for all unit cells of the structure, as shown in Table 8.

Conclusion
We studied the energy, Estrada index, inertia, nullity, and signature of BiI3 and BRE in the current investigation. The inequalities $E_{ext}(B) > E_{est}(B)$, $EE_{ext}(B) > EE_{est}(B)$, $E_{ext}(BRE) > E_{est}(BRE)$, and $EE_{ext}(BRE) > EE_{est}(BRE)$ were observed between the exact and estimated values of the energy and Estrada index of BiI3 and BRE. In addition, since the nullity of BiI3 and BRE is zero, these molecules are stable and closed-shell. The numerical values of the energy reflect the correlation between the bond energy of the π-electrons and the eigenvalues, since every π-electron orbital corresponds to an eigenvalue of the graph under consideration. If we calculate the energy for a single unit, such as (1,1), we obtain a positive integer. As we increase the order horizontally, for example, moving from (1,1) to (1,2), (1,3), (1,4), and so on, the energy of each subsequent unit also increases. This is because the number of vertices grows and, consequently, the order of the adjacency matrix also increases. Similarly, we proceeded vertically, for example, from (1,1) to (2,1), (3,1), and so on. As the order increased, handling the calculations became increasingly difficult. Therefore, we generalized our graph and applied statistical methods to analyze the general behavior of the data. To determine the energy of a particular unit, we generated a polynomial of a certain order to estimate the energy of the desired unit. In Hückel theory, where the orbital energy is α + λβ with β < 0, the positive eigenvalues are linked with the bonding levels, the negative eigenvalues with the antibonding levels, and the zero eigenvalues with the nonbonding level. By employing MATLAB to extract actual eigenvalues from the data and generate general equations, we aimed to bridge the gap between the actual and estimated values of these molecular properties. We used the mean absolute percentage error (MAPE) as the standard error metric in Tables 2, 3, 6, and 7. A MAPE close to zero between actual and estimated values generally indicates that the model's predictions are accurate on average. We observed that the MAPE of the energy of BiI3 and BRE was 0.035 and 0.04, respectively, from units (3,1) to (3,10). Similarly, the MAPE of the Estrada index of BiI3 and BRE was 0.00515 and 0.0523, respectively, from units (3,1) to (3,10). We used two other methods to support our results. In the first method, we computed the relative error percentage, and in the second method, we computed the average absolute error and the standard deviation along with their normal curve. In the first method, the relative error was less than 0.1 %, which suggests that the estimation is accurate for this dataset. In the second method, a small standard deviation indicated that the data points were very close to the mean, reflecting low variability, high consistency, a tight distribution, and predictability.
For future research, we propose to conduct comparative studies with other data-driven and machine learning approaches, such as neural networks, SVMs, and decision trees, to evaluate the relative performance and applicability of different methods. We also propose to apply the polynomial curve fitting method to a wider range of molecular structures and chemical systems to test its generalizability and robustness, and to explore the integration of more advanced machine learning techniques to improve prediction accuracy and computational efficiency. Further work could investigate the use of our methodology in real-world applications, such as drug discovery, materials science, and chemical engineering, to assess its practical utility and impact, and could develop a more comprehensive framework that combines various data-driven approaches for a holistic understanding and prediction of molecular properties. The error analysis method we used can be applied to any model. The relative error percentage is scale-independent, making it useful for comparing errors across different datasets or models. It provides a direct interpretation of how large the errors are relative to the actual values, making it easier to understand the model's performance in practical terms. The standard deviation complements it by describing the consistency of the errors, which can be crucial for model evaluation and improvement.

Fig. 1. Procedure to calculate the energy and Estrada index of molecular structures.

Fig. 3. Comparison of exact and estimated values of energy of BiI3.

Fig. 5. Normal distribution curve for the absolute errors in energy estimation of BiI3.

Fig. 6. Normal distribution curve for the absolute errors in Estrada index estimation of BiI3.

Fig. 7. Benzene ring embedded in P-type surface.

Fig. 8. Comparison of exact and estimated values of energy of BRE.

Fig. 9. Comparison of exact and estimated values of Estrada index of BRE.

Fig. 10. Normal distribution curve for the absolute errors in energy estimation of BRE.

Fig. 11. Normal distribution curve for the absolute errors in Estrada index estimation of BRE.

Table 1
The quadratic equations of the energy and Estrada index of BiI3.

Table 2
The exact values $E_{ext}(B)$ and estimated values $E_{est}(B)$ of the energy of BiI3.

Table 3
The exact values $EE_{ext}(B)$ and estimated values $EE_{est}(B)$ of the Estrada index of BiI3.

Table 5
The quadratic curves for the energy and Estrada index of BRE.
… + 351.98 n + 10.5054
3.9665 n^2 + 701.3630 n + 1.6310

Table 6
The exact values $E_{ext}(BRE)$ and estimated values $E_{est}(BRE)$ of the energy of BRE.

Table 7
The exact values $EE_{ext}(BRE)$ and estimated values $EE_{est}(BRE)$ of the Estrada index of BRE.

Table 8
The inertia, nullity and signature of BRE.