Discrimination and chemical composition quantitative model of Raw Moutan Cortex and Moutan Cortex Carbon based on electronic nose and machine learning

Raw Moutan Cortex (RMC) is a traditional medicinal material commonly used in China. Moutan Cortex Carbon (MCC) is the product obtained by stir-frying RMC. Although they are the raw and processed forms of the same Chinese herbal pieces, their effects differ: RMC clears heat, cools the blood, promotes blood circulation and removes blood stasis, whereas MCC cools the blood and stops bleeding. It is therefore necessary to distinguish them effectively. The traditional quality evaluation of RMC and MCC still relies on character identification, which depends mainly on the working experience and sensory judgment of practitioners. This leads to strong subjectivity and poor repeatability, so that evaluation errors and products with inconsistent processing degrees arise in actual production, which affects clinical efficacy. In this study, electronic nose technology was introduced to objectively digitize the odor of RMC and MCC, and a discrimination model of RMC and MCC was constructed in order to establish a rapid, objective and stable method for their quality evaluation. According


Introduction
Raw Moutan Cortex (RMC) is the dried root bark of Paeonia suffruticosa Andr. [1]. It was first recorded in the Shennong Herbal Classic [2,3] and is a traditional medicine commonly used in China. It is bitter and pungent in taste and slightly cold in nature, and it clears heat, cools the blood, promotes blood circulation and removes stasis. Moutan Cortex Carbon (MCC) is the processed product of RMC and is used to cool the blood and stop bleeding. In actual production, differences in production conditions, equipment and operators' subjective judgment often cause MCC to be overprocessed or underprocessed, yielding products of different processing degrees such as light MCC (LMCC), standard MCC (SMCC) and heavy MCC (HMCC). Underprocessing or overprocessing affects the effectiveness of the drug and, in turn, its clinical efficacy.
At present, the quality control of MCC mainly relies on the traditional evaluation method of feature recognition [4]. The Chinese Pharmacopoeia and local specifications for the processing of Chinese medicines [5] describe MCC as "dark brown on the outer surface, brown on the inner surface, with a burnt aroma, slightly bitter and astringent taste". However, feature recognition depends on the experience and sensory judgment of practitioners and is easily affected by subjective impressions. The final evaluation may therefore suffer from errors and poor repeatability, and products with different processing degrees appear in actual production. Many studies have examined the chemical changes in RMC and its processed products. They showed that the contents of most compounds, such as catechin, paeonol, quercetin, kaempferide, isorhamnetin and tannin, decreased as the processing degree deepened and the temperature increased, whereas the contents of gallic acid and 5-HMF first increased with processing time and then declined [6][7][8]. Pharmacodynamic studies have shown that tannins such as catechin have astringent and hemostatic effects [9][10][11], and that paeonol promotes blood circulation and removes blood stasis [12,13]. 5-HMF, a marker of the heating process, is an aldehyde produced by the dehydration of glucose and other monosaccharides under high temperature or weakly acidic conditions; as the temperature continues to rise, it readily decomposes into levulinic acid and formic acid [14,15]. Chemical components are the material basis of pharmacodynamic effects, so differences in chemical composition between processing degrees will change the effects of the drug and, in turn, its clinical efficacy.
High performance liquid chromatography (HPLC), gas chromatography-mass spectrometry (GC-MS) [16], thin layer chromatography (TLC) and other analytical methods [17,18] have also been used for the quantitative quality identification of MCC. Although these methods offer high detection accuracy, they have disadvantages such as sample destruction, high consumption of chemical reagents and long analysis time, which make it difficult to obtain quality evaluation results quickly during production. Therefore, it is necessary to establish a fast, nondestructive and sensitive method to evaluate and control the quality of RMC and MCC.
The electronic nose is an electronic sensing instrument that simulates the human sense of smell to acquire odor data from a sample and digitize the odor objectively [19,20]. It is mainly composed of a gas sensor array, a signal processing unit and a pattern recognition unit [21], and it has the advantages of simple sample pretreatment, convenient operation and fast response. The quality, authenticity and processing degree of a traditional Chinese medicine determine, to a certain extent, the characteristics and intensity of its odor. In addition, the odor of a traditional Chinese medicine is directly related to its chemical composition, so it reflects the internal nature of the material and links its external quality characteristics with its internal material basis. Electronic noses have been applied widely. For example, a MOS sensor array and machine learning were used to classify and identify potato cultivars [22]; the MAU-9 electronic nose MOS sensor array and ANN classification were used to discriminate herb and fruit essential oils [23]; and an opto-electronic nose coupled to a silicon micro pre-concentrator device was used for the selective sensing of flavored waters [24].
In this study, an electronic nose and machine learning were used to discriminate RMC and MCC and to quantify their chemical composition. Firstly, HPLC was used to determine the contents of gallic acid, 5-hydroxymethylfurfural (5-HMF), paeoniflorin and paeonol in MCC of different processing degrees. Secondly, the electronic nose was used to acquire the odor information of MCC of different processing degrees. Then PCA, SVM and other methods were used for qualitative identification. Finally, PLSR and SVR quantitative models for the contents of gallic acid, 5-hydroxymethylfurfural, paeoniflorin and paeonol in RMC and MCC were built and compared. The study provides a rapid, simple and non-invasive method for monitoring the quality of RMC and MCC.

Officinal material
Twenty-seven batches of RMC pieces were purchased from pharmaceutical companies across China. They were identified by Associate Professor Liu Jizhu, School of Traditional Chinese Medicine, Guangdong Pharmaceutical University. Voucher specimens were deposited at the Herbarium Centre, Guangdong Pharmaceutical University. Part of each batch was processed at 180 °C for 3-5, 6-8 and 9-11 min, respectively, to obtain products of different processing degrees: Light Carbon (LMCC), Standard Carbon (SMCC) and Heavy Carbon (HMCC) [6]. All samples were crushed with a high-speed multifunctional grinder (JP-150A, Jiupin Industry and Trade Co., LTD., Yongkang, China), passed through an 80-mesh sieve, dried at 45 °C and sealed for preservation. In total, 108 batches of samples were obtained (27 batches each of RMC, LMCC, SMCC and HMCC); they are summarized in Table 1.

E-Nose equipment and measurements
Samples were analyzed with a portable PEN3 e-nose (Airsense Analytics, Schwerin, Germany), which comprises a built-in sensor array, sampling and cleaning channels and a data acquisition system, and which features automatic adjustment, calibration and system enrichment functions [25].
The sensor array is composed of ten MOS sensors sensitive to different compounds; the sensitivity characteristics of each sensor are listed in Table 2. The sensor response value was the relative resistivity G/G0, where G is the resistance of the sensor after exposure to the volatile gas of the sample under test and G0 is the resistance of the sensor after exposure to reference gas filtered through standard activated carbon. The electronic nose device is shown in Figure 1.

Preparation of sample solution
For each RMC and MCC sample of different processing degrees (passed through an 80-mesh sieve), 0.5 g was accurately weighed and placed in a stoppered conical flask, and 25 mL of 50% methanol was accurately added. The flask was weighed, extracted by ultrasound (power 100 W) for 30 min, cooled to room temperature and weighed again, and the weight loss was made up with 50% methanol. The extract was filtered, and the subsequent filtrate was taken as the test solution.

Preparation of reference solution
The reference standards of gallic acid, 5-HMF, paeoniflorin and paeonol were accurately weighed and dissolved in 50% methanol to obtain stock solutions at concentrations of 0.7000, 1.000 and 2.000 mg/mL respectively. The working standard solutions were prepared by mixing and diluting the stock solutions with methanol and were filtered through a 0.22 μm PTFE filter. The stock solutions and working solutions were stored at 4 °C until use.

Methodology validation
The linearity of the HPLC method for each analyte was evaluated with calibration curves. Each analyte was analyzed in triplicate at a series of concentrations, and the calibration curves were constructed by plotting the peak area ratios against the concentrations of the four components. The precision of the HPLC method was determined by intraday and interday measurements. The working standard solution was analyzed in six replicates on the same day to obtain the intraday precision, and the interday precision was obtained by analyzing the working standard solution daily (six replicates) on three successive days. Stability was assessed by analyzing the same sample solution (LMCC4) at 0, 3, 6, 9, 12 and 24 h. Recovery tests (LMCC4) were performed according to the Chinese Pharmacopoeia to investigate the accuracy of the developed HPLC method: mixed standard solutions at a uniform concentration level (100%) were added to 0.5 g of samples of known content, and each was prepared in triplicate and analyzed with the proposed HPLC method. The results were expressed as the relative standard deviation (RSD, %) of the measurements.
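The two quantities reported in this validation, RSD and recovery, can be sketched as follows. This is a minimal illustration using the usual definitions (sample standard deviation divided by the mean, and the found amount minus the original amount divided by the added amount); the function names and all numerical values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%): sample SD divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100


def recovery_percent(found, original, spiked):
    """Spike recovery (%): (amount found - amount originally present) / amount added."""
    return (found - original) / spiked * 100


# Hypothetical peak areas from six replicate injections (illustrative only)
peak_areas = [1520.4, 1518.9, 1525.1, 1519.7, 1522.3, 1521.0]
print(f"intraday RSD = {rsd_percent(peak_areas):.2f}%")

# Hypothetical spiked sample: 1.00 mg present, 1.00 mg added, 1.98 mg found
print(f"recovery = {recovery_percent(1.98, 1.00, 1.00):.1f}%")
```

The same `rsd_percent` function applies to the intraday, interday and stability series; only the set of measurements passed in changes.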

Experimental pretreatment
Each batch of RMC and MCC was placed in a separate quartz container, which was then sealed with a double layer of cling film. Before each test, the sample was left to stand for 30 min so that its volatile odor filled the whole container. The instrument was warmed up and the metal oxide sensors of the electronic nose were flushed for 300 s before detection.

Detecting parameters
The electronic nose was connected to a computer, and the response curves of the sensors for each sample were recorded in real time with the Winmaster workstation. After the sample had stood for 30 min, the injection needle of the electronic nose was inserted through the cling film and fixed in place, and the sample gas was drawn in at a flow rate of 150 mL/min. The intake air was passed through activated carbon. The sampling time was 120 s, the sampling interval was 1 s, and the sensor cleaning time was 120 s. During measurement, the ambient temperature and humidity were controlled at about 25 °C and 30%, respectively. Each batch of samples was measured three times in parallel, and the average response curve was taken as the test data of the sample, giving the odor response value matrix of the 108 batches of samples.
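The averaging of the three parallel measurements can be sketched as below. This is only an illustration under stated assumptions: the curves are synthetic, and taking the mean of the last 10 points as the steady-state response is an assumed convention, since the text does not specify how the equilibrium value is read from the curve.

```python
def average_curves(replicates):
    """Point-by-point mean of replicate response curves of equal length."""
    n = len(replicates)
    return [sum(points) / n for points in zip(*replicates)]


def plateau_value(curve, window=10):
    """Mean of the last `window` points, used here as the steady-state response.

    The window length is an assumption made for this sketch.
    """
    tail = curve[-window:]
    return sum(tail) / len(tail)


def synthetic_curve(final, n=120):
    """Hypothetical G/G0 curve: 120 points at 1 s intervals rising towards `final`."""
    return [1 + (final - 1) * (1 - 0.95 ** t) for t in range(n)]


# Three hypothetical parallel measurements of one sensor on one batch
replicates = [synthetic_curve(f) for f in (2.9, 3.0, 3.1)]
mean_curve = average_curves(replicates)
print(len(mean_curve), round(plateau_value(mean_curve), 3))
```

Repeating this for each of the ten sensors and each of the 108 batches would give a 108 × 10 response matrix of the kind described above.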

Method validation
The precision of the method was determined by intraday and interday measurements. The sample powder (LMCC14) was analyzed in six replicates on the same day to obtain the intraday precision, and it was analyzed daily (six replicates) on three successive days to obtain the interday precision. Stability was assessed by analyzing the same sample powder (LMCC14) at 0, 2, 4, 6 and 8 h.

Data analyzing and Statistical tests
Data analysis was carried out in the MATLAB 2020 environment, and qualitative analysis used Classification toolbox 5.2. PCA was performed with SIMCA-P+ 12.0 software. Significance was assessed with two-tailed tests.

Methodology validation
The results of the methodology validation for HPLC analysis are shown in Table 3 and Figure 2. The calibration curve of each analyte displayed good linearity over the tested concentration range (R² > 0.9997). The RSD values of the precision test were 0.10-2.74% for intraday assays and 0.52-1.64% for interday assays. The RSD values of the stability tests were 0.14-2.79%. The recoveries of the HPLC method were above 96.94%, and the RSD values were less than 3.0%. These results demonstrate that the developed HPLC method can accurately determine the contents of the four chemical ingredients in the different RMC and MCC samples.

Sample analysis
The developed HPLC method was applied to simultaneously determine the contents of the chemical ingredients in RMC and MCC. The results are shown in Table 4. The contents of the four chemical ingredients differed significantly between RMC and the different MCC products. The contents of paeoniflorin and paeonol were higher in RMC than in MCC of any processing degree, and they declined as the processing time was extended. In contrast, the contents of gallic acid and 5-HMF first increased and then declined with processing time. These results are consistent with previous studies [6].

Response curve of odor sensor
Figure 3 shows the odor response curves of the 10 sensors of the electronic nose for one sample over 120 seconds. The sensor response values increased gradually and then leveled off. This is because, during headspace injection, the concentration of volatile substances from the sample entering the sensor channel increases continuously and finally reaches dynamic equilibrium.

Response value of odor sensor
A radar diagram of the sample odor was constructed from the response value of each sensor at equilibrium (Figure 4). The radar diagram shows that the sensor with the strongest response to RMC was W5S (nitrogen oxide), followed by W1S (methyl) and W2W (organic sulfide), indicating that the volatile substances of RMC were mainly nitrogen oxides, methyl compounds and organic sulfides.
Between RMC and MCC, the response values of sensors W1C (aromatic components), W3C (ammonia) and W5C (aromatic alkanes) changed little, while the response values of sensors W5S (nitrogen oxides), W1S (methyl), W1W (sulfide), W2S (alcohols) and W2W (organic sulfide) changed greatly, which suggests that nitrogen oxides, methyl compounds, sulfides, alcohols and organic sulfides are the compounds that differentiate the odor of RMC from that of MCC. On the whole, the sensor response values differed little between carbon products of different processing degrees. It was difficult to distinguish carbon products of different degrees from the radar diagram alone, so other discrimination methods were needed for further analysis.

Optimization of sensor array
The sensitivities of the sensors to gases partly overlap and are relatively nonspecific, so collinearity and related problems may occur between some sensors. To reduce the redundant information in the sensor array and the burden of high-dimensional data on the model, Pearson correlation analysis was used to calculate the correlation coefficient between each pair of gas sensors, taking the response values of the 10 sensors as variables. The larger the correlation coefficient of two sensors, the more strongly they are correlated and the more similar the information they provide, so the two sensors can replace each other and one of them can be eliminated. Table 6 shows that the correlation coefficients of W1C and W3C, W3C and W5C, W1S and W2S, and W1W and W2W were large, namely 0.984, 0.971, 0.980 and 0.995, respectively. Therefore, sensors W1C, W5C, W2S and W1W were eliminated in the subsequent analysis. The loading plot of the electronic nose odor sensors for RMC and MCC is shown in Figure 5. The variance contribution rates of the 10 sensors on the first principal component were basically the same, and the factor loadings of sensors W1S and W2S, and of W1W and W2W, were very close, which indicates that the sensors in each of these pairs are similar to each other and that one sensor of each pair can be eliminated to optimize the sensor array. On the second principal component, the contribution rate of W5S was the smallest, so the W5S sensor was removed. According to the results of the correlation analysis and the loading analysis, sensors W3C, W6S, W1S, W2W and W3S were selected to form a new sensor array for subsequent analysis.
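The correlation-based screening described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the response values are hypothetical, and the cut-off of 0.97 is an assumption chosen only because the smallest coefficient flagged in the text is 0.971; the text itself does not state a threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(responses, threshold=0.97):
    """Return sensor pairs whose |r| is at or above the (assumed) threshold."""
    pairs = []
    for a, b in combinations(sorted(responses), 2):
        r = pearson(responses[a], responses[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical responses of three sensors over six samples (illustrative only)
responses = {
    "W1W": [1.0, 2.1, 2.9, 4.2, 5.0, 6.1],
    "W2W": [1.1, 2.0, 3.1, 4.0, 5.2, 6.0],
    "W6S": [1.0, 1.2, 0.9, 1.1, 1.3, 1.0],
}
for a, b, r in redundant_pairs(responses):
    print(a, b, r)
```

Which member of a flagged pair to drop is a separate decision; in the text it is informed by the loading analysis as well as by the correlation coefficients.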

Principal component analysis
In this experiment, the sample odor response values were used as input variables, and unsupervised identification of RMC and MCC samples was carried out by principal component analysis (PCA) [26]. The models before and after sensor array optimization were compared (Figure 6). Before the sensor array was optimized, the cumulative explained variance of nine principal components reached 99.98% (Figure 6a), of which the first two principal components accounted for 92.85% (PC1 = 86.61%, PC2 = 6.24%). After optimization, the cumulative explained variance of six principal components reached 100% (Figure 6b), and that of the first two principal components reached 92.76% (PC1 = 87.17%, PC2 = 5.59%). The results indicate that the first two principal components represent more than 92% of the odor information of the samples, so the extracted information is well representative. The scores of the first two principal components of the PCA model before and after the optimization of the sensor array are shown in Figure 7. RMC had a wide spatial distribution, indicating large differences in odor information among the raw products, which may be related to their origin and production date. However, there was no spatial overlap between the raw products and the carbon products, indicating that the odor of the samples changed significantly after processing. The odor information of MCC with different processing degrees overlapped heavily in space. After the optimization of the sensor array, the spatial distribution range of the MCC odor information was further reduced, and samples of the same category clustered better. Because the unsupervised PCA method could not effectively distinguish MCC of different processing degrees, supervised pattern recognition methods, which make use of the class labels during training, were adopted.
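The explained-variance percentages quoted above follow from the eigenvalues of the PCA decomposition: each component's share is its eigenvalue divided by the sum of all eigenvalues. The sketch below illustrates that arithmetic with hypothetical eigenvalues; they are not the values obtained in this study.

```python
def explained_variance(eigenvalues):
    """Return per-component and cumulative explained variance, both in percent."""
    total = sum(eigenvalues)
    ratios = [100 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for r in ratios:
        running += r
        cumulative.append(running)
    return ratios, cumulative


# Hypothetical eigenvalues for a five-sensor array (illustrative only)
eigenvalues = [4.3, 0.3, 0.2, 0.15, 0.05]
ratios, cumulative = explained_variance(eigenvalues)
for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.2f}% (cumulative {c:.2f}%)")
```

The same addition reproduces the figures reported in the text: 86.61% + 6.24% = 92.85% before optimization and 87.17% + 5.59% = 92.76% after optimization.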

Supervised pattern recognition
In this experiment, with the collected odor quantitative data as the independent variables and the sample category as the dependent variable, discriminant models for RMC and MCC samples of different processing degrees were established using linear discriminant analysis (LDA) [27], partial least squares-discriminant analysis (PLS-DA) [28,29] and support vector machine (SVM) [30,31]. The performance of the models was evaluated by 10-fold cross validation and external validation. The samples were divided into a training set and a validation set at a ratio of 2:1, giving 72 batches in the training set and 36 batches in the validation set. The training set was used to train the model and optimize its parameters; the validation set was used to test how well the model performs in application. The identification results of the LDA, PLS-DA and SVM discrimination models based on the response values of the electronic nose odor sensors are shown in Table 7. Comparing the three models shows that the correct classification rates of the SVM model in both cross validation and external validation were higher than 90.00%, indicating that this method can rapidly and accurately discriminate RMC from MCC of different processing degrees.
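The validation scheme (a 2:1 split followed by 10-fold cross validation on the training set) can be sketched with index bookkeeping alone. This is an illustration under assumptions: the text does not say whether the split was stratified by class, so the class-by-class split, the random seeds and the function names here are all hypothetical choices.

```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and validation sets, class by class.

    Stratification is an assumption of this sketch, not stated in the text.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


def k_fold(indices, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    shuffled = indices[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def accuracy(true, predicted):
    """Correct classification rate in percent."""
    return 100 * sum(t == p for t, p in zip(true, predicted)) / len(true)


# 27 batches in each of the four classes, 108 in total
labels = [c for c in ("RMC", "LMCC", "SMCC", "HMCC") for _ in range(27)]
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 72 36
```

Any classifier can then be fitted on each `train` list from `k_fold(train_idx)` and scored on the matching `test` list with `accuracy`, with the 36 held-out batches reserved for external validation.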

Correlation analysis
In this experiment, the odor quantitative data of RMC and MCC were correlated with the contents of the chemical components, and Pearson correlation analysis was carried out with SPSS 23.0 software. The results are shown in Table 8. The correlation between the gallic acid content and each sensor was low, with all correlation coefficients below 0.3. The contents of 5-HMF, paeoniflorin and paeonol were significantly correlated with sensors W3C, W6S, W1S, W2W and W3S (P < 0.01). The content of 5-HMF was positively correlated with sensor W3C and negatively correlated with W6S, W1S, W2W and W3S, while the contents of paeoniflorin and paeonol were negatively correlated with sensor W3C and positively correlated with W6S, W1S, W2W and W3S. In other words, to a certain extent, the higher the W3C response and the lower the W6S, W1S, W2W and W3S responses, the higher the content of 5-HMF; the opposite was true for paeoniflorin and paeonol.

Regression model for compounds content of RMC and MCC based on odor quantification
According to the results of the correlation analysis above, the contents of gallic acid, 5-HMF, paeoniflorin and paeonol were correlated to some degree with the odor characteristics. In this experiment, the odor quantitative data were taken as independent variables and the contents of gallic acid, 5-HMF, paeoniflorin and paeonol as dependent variables, and partial least squares regression (PLSR) and support vector machine regression (SVR) were used to establish regression models for the component contents. The performance of the models was evaluated by 10-fold cross validation and external validation. The determination coefficient (R²), the root mean square error (RMSE) and the residual predictive deviation (RPD) were used as the evaluation indexes of the regression models. The results of the quantitative models for the compound contents of RMC and MCC based on the odor quantitative data are shown in Table 9. According to the evaluation indexes in the table, the SVR models fitted better than the PLSR models: R²c and R²p were markedly higher, and RMSEc and RMSEp were smaller. For the gallic acid model, for example, R²c and R²p increased from 0.3773 and 0.2848 to 0.8251 and 0.8509, respectively, RMSEc and RMSEp decreased from 1.2839 and 1.3492 to 0.6600 and 0.6161, respectively, and RPD increased from 1.1992 to 2.6259. This indicates that SVR predicts well even for a component whose content is only weakly correlated with the odor characteristics, which may be related to its ability to handle nonlinear problems. The measured values and predicted values of each component are compared in Figure 8. The predicted results for 5-HMF were scattered around the regression line, and the correlation coefficient of the quantitative model was lower than 0.75, indicating that the performance of this model could be improved. The predictions for the other compounds, gallic acid, paeoniflorin and paeonol, were concentrated on the regression line. These results show that SVR was able to identify the electronic nose signals relevant to the target compounds and to predict accurately the contents of these compounds, except for 5-HMF. This predictive performance enables the electronic nose to analyze the influence of processing on MCC quality.
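The three evaluation indexes can be sketched as below. The definitions are the commonly used ones and are assumptions about what was computed in this study; in particular, RPD is taken as the standard deviation of the measured values divided by the RMSE of prediction. The gallic acid figures quoted in the text are consistent with that definition, since 1.3492 × 1.1992 and 0.6161 × 2.6259 both give about 1.618. The data in the example are hypothetical.

```python
from math import sqrt
from statistics import stdev


def rmse(measured, predicted):
    """Root mean square error between measured and predicted contents."""
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def r_squared(measured, predicted):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def rpd(measured, predicted):
    """Ratio of the SD of the measured values to the RMSE (assumed definition)."""
    return stdev(measured) / rmse(measured, predicted)


# Hypothetical measured and predicted contents (illustrative only)
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(f"R2 = {r_squared(measured, predicted):.4f}")
print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"RPD = {rpd(measured, predicted):.2f}")
```

Computing the indexes separately on the calibration and prediction sets gives the R²c/RMSEc and R²p/RMSEp pairs used in the comparison.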
Using an electronic nose to study how different conditions affect samples is a common approach. For example, Sana Tatli et al. used an electronic nose to detect differences in the emission of volatile organic compounds from cucumber in order to track the effect of different urea fertilizers [32]; Robert Rusinek et al. analyzed the effects of fiber additives on VOCs in bread using electronic nose technology [33]; and Faraneh Khodamoradi et al. investigated the effects of different amounts of nitrogen fertilizer on basil [34]. However, all of these studies were based on qualitative analysis, whereas this study combined qualitative and quantitative analysis to monitor the degree of carbon frying of Moutan Cortex more accurately.
Volume 19, Issue 9, 9079-9097.

Conclusions
An electronic nose combined with chemometrics was introduced to digitize the odor of RMC and MCC, and a discrimination model and quantitative models of chemical composition were constructed for RMC and MCC of different processing degrees. The experimental results showed the following.
1) After RMC was stir-fried, the odor response of MCC differed little between processing degrees, indicating that the volatile components did not change significantly as the processing degree deepened. With the supervised SVM model, MCC of different processing degrees could be identified and predicted accurately, and the correct rate of sample discrimination was 91.67%.
2) Based on the odor digitization of RMC and MCC, quantitative models of gallic acid, 5-HMF, paeoniflorin and paeonol in RMC and MCC were established with PLSR and SVR. Except for 5-HMF, the determination coefficients (R²) of the quantitative models of gallic acid, paeoniflorin and paeonol were higher than 0.8, showing that the odor quantitative data of RMC and MCC can be used to predict the contents of these three chemical components. The 5-HMF quantitative model based on the odor response values fitted only moderately well, and it could be improved by fusing features from other sensors.
In addition, this study established a reliable quantitative model, which is not available in most of the recent studies mentioned above. Quantitative analysis not only characterizes samples more accurately but can also be used to control the degree of processing more precisely, giving a more comprehensive perspective of analysis.

Figure 3. Response curve of electronic nose sensor.

Figure 4. Radar diagram of odor sensor of RMC and MCC.

Figure 5. Load analysis diagram of electronic nose odor sensor.

Figure 6. Explained variance and cumulative explained variance of the PCA model (a) before and (b) after sensor array optimization.

Figure 7. Scores of the PCA model (a) before and (b) after the optimization of the sensor array.

Table 1. Sample information of RMC and MCC.

Table 3. The results of methodology validation for HPLC analysis.

The RSD values of the stability test were 0.68-4.21% (Table 5). The system was considered suitable for the analysis of RMC and MCC.

Table 4. Average content of chemical components in RMC and MCC (N = 27).

Table 5. Investigation on the precision of electronic nose sensor.

Table 6. Correlation analysis results of electronic nose sensors in RMC and MCC.

Table 7. Supervised pattern recognition results.

Table 8. Correlation analysis results between odor characteristics and internal component content of RMC and MCC (number of cases = 108).

Table 9. Results of chemical composition quantitative model based on odor response value of electronic nose.