Multivariate Statistical Analysis Applied in Wine Quality Evaluation

This study applies multivariate statistical approaches to wine quality evaluation. With 27 red wine samples, four factors were identified out of 12 parameters by principal component analysis, explaining 89.06% of the total variance of the data. As the iterative weights calculated by the BP neural network differed little from the weights determined by the information entropy method, the latter were chosen to measure the importance of indicators. Weighted cluster analysis performed well in classifying the sample group further into two sub-clusters. The second cluster of red wine samples, compared with the first, was lighter in color, tasted thinner and had a fainter bouquet. The weighted TOPSIS method was used to evaluate the quality of wine in each sub-cluster. With the scores obtained, each sub-cluster was divided into three grades. On the whole, the quality of the lighter red wine was slightly better than that of the darker category. This study shows the necessity and usefulness of multivariate statistical techniques in both wine quality evaluation and parameter selection.


INTRODUCTION
Wine is widely consumed in many countries around the world (Bentlin et al., 2012) and people are increasingly concerned with its quality. Some appraise wine quality by sensory tasting, while others evaluate it by physicochemical analysis. Techniques for measuring physicochemical indices, such as site-specific natural isotopic fractionation and nuclear magnetic resonance, have been gradually developed (Jiang et al., 2008). With the improvement of measurement techniques, physicochemical analysis is being widely used.
Methods for analyzing physicochemical indices mainly include traditional statistical methods such as Principal Component Analysis (PCA), Cluster Analysis (CA) and Discriminant Analysis (DA), as well as Decision Trees (DT), Artificial Neural Networks (ANN) and Support Vector Machines (SVM), which have been frequently used in the field of classification (Hernanz et al., 2007; Aly, 2005; Kavuri and Kundu, 2011; Jin, 2005; Osorio et al., 2008). In one study, two principal components were extracted by PCA and the wine samples were then clearly clustered into two homogeneous groups by CA, which was sufficient to differentiate wines produced with different clones (Burin et al., 2011). However, previous researchers did not take the weights of the clustering indices into account. Cluster quality is strongly influenced by index weights. Because the weights reflect the importance of each index, weighted cluster analysis has an advantage here. In addition, a fuzzy clustering algorithm that defines index weights within the framework of Axiomatic Fuzzy Set (AFS) theory is based on Shannon entropy (Zhang et al., 2009). An ANN with a three-layer feed-forward architecture and back-propagation learning has been applied to update weights (Shoemaker et al., 1991). DA was used to distinguish wines from different countries based on a minimal number of the most important parameters (Römisch et al., 2006). ANN methods were used to classify Slovak white varietal wines by variety, producer, location and year of production (Kruzlicova et al., 2009).
Technique for Order Performance by Similarity to Ideal Solution (TOPSIS), a distance-based comprehensive evaluation method, is one of the most common methods for problems involving multi-criteria decisions (Cruz-Ramírez et al., 2010). To achieve a competitive edge in the market, the TOPSIS method was used to select fruits from superior locations in terms of the total natural antioxidants of the fruit (Sun et al., 2011). However, each indicator was given equal weight, which cannot express the relative importance of the indicators. A comprehensive evaluation model of coal mine safety, established with entropy weights and TOPSIS, was applied to evaluate the safety conditions of production in four coal mines (Li et al., 2011).
In this study, we used Principal Component Analysis to eliminate the correlation between indicators. The wine samples were then clustered by weighted cluster analysis, with the weights determined by information entropy. To verify the accuracy of the weights, we used a Back Propagation (BP) neural network to update them. Finally, we used the weighted TOPSIS method to evaluate the quality of each type of wine and determine its grade. For this evaluation, the weights were determined by the information entropy method separately for the first and the second category of red wine, and the BP neural network was again used to test their accuracy.

Data sources and original indicators:
The research data are taken from the 2012 China Undergraduate Mathematical Contest in Modeling and comprise 27 red wine samples measured on 12 parameters (http://www.mcm.edu.cn/problem/2012/2012.html).

PCA of indicators:
A widely used multivariate statistical technique, Principal Component Analysis can simplify a set of dependent variables to a smaller set of underlying variables based on patterns of correlation among the original variables (Lawless and Heymann, 1999). PCA replaces the original variables with fewer new variables that retain as much of the variability as possible (He et al., 2007).
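As a minimal sketch of this step, the following Python code performs PCA on the correlation matrix and keeps the fewest components whose cumulative explained variance reaches a chosen threshold. The function name `pca_correlation`, the 0.85 threshold and the synthetic 27 × 12 matrix are illustrative assumptions, not the data or settings of this study.

```python
import numpy as np

def pca_correlation(X, var_threshold=0.85):
    """PCA on the correlation matrix of X (rows = samples, columns = variables).

    Returns the component scores, the explained-variance ratio of each kept
    component, and the cumulative ratio reached by the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable to zero mean and unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    cumulative = np.cumsum(ratio)
    # Fewest components whose cumulative ratio reaches the threshold.
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(ratio))
    scores = Z @ eigvecs[:, :k]
    return scores, ratio[:k], cumulative[k - 1]

# Synthetic stand-in for a 27 x 12 data matrix (NOT the data of this study).
rng = np.random.default_rng(0)
latent = rng.normal(size=(27, 4))
X = latent @ rng.normal(size=(4, 12)) + 0.3 * rng.normal(size=(27, 12))

scores, ratio, total = pca_correlation(X, var_threshold=0.85)
print(scores.shape, np.round(ratio, 3), round(float(total), 4))
```

The resulting score columns are mutually uncorrelated, which is the property relied on when the components replace the original, correlated indicators in later steps.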

Information entropy weighted clustering:
Cluster Analysis is a tool of exploratory data analysis for solving classification problems. The degree of association is strong between members of the same cluster and weak between members of different clusters (Burin et al., 2011). Cluster quality is strongly influenced by the weights of the features. Shannon entropy is used to define the index weights (Zhang et al., 2009).
The steps of information entropy weighted clustering are as follows:
• Normalize the original data matrix. Let m be the number of wine samples and n the number of physicochemical indicators, so that the normalized data matrix is:
$$X = (x_{ij})_{m \times n}, \quad i = 1, \dots, m, \; j = 1, \dots, n \tag{1}$$
Under the j-th index, the proportion of the i-th sample is $p_{ij}$:
$$p_{ij} = \frac{x_{ij}}{\sum_{k=1}^{m} x_{kj}} \tag{2}$$
• Calculate the weights of the properties. The information entropy of the j-th index is:
$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij} \tag{3}$$
The entropy weight $w_j$ of the j-th index is:
$$w_j = \frac{1 - e_j}{\sum_{k=1}^{n} (1 - e_k)} \tag{4}$$
• Use the weights to calculate the squared Euclidean distance.
• Perform cluster analysis using Ward's method with the squared Euclidean distance.
• Analyze the evaluation results.

We applied a BP neural network model to iterate the weights calculated by the entropy method in order to check their accuracy. The BP neural network, the most widely used neural network model, is a multi-layer feed-forward network model (Xie et al.). The normalized principal component data of the red wine were used as input and the weights determined by information entropy as output. The component weights calculated by information entropy are considered accurate if there is little difference between the iterated weights and the initial weights.

Comprehensive evaluation based on TOPSIS method:
TOPSIS, developed by Hwang and Yoon (1981), is a ranking method that is simple in conception and application. The standard TOPSIS method attempts to choose alternatives that simultaneously have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution. Making full use of attribute information, TOPSIS provides a cardinal ranking of alternatives and does not require attribute preferences to be independent (Chen and Hwang, 1992; Yoon and Hwang, 1995). The evaluation objects are ranked according to their relative closeness to the ideal solution: the larger the value, the better the evaluation object.
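The relative closeness used for ranking in TOPSIS can be sketched in Python as follows. This is a sketch under stated assumptions rather than the exact procedure of this study: the function name `topsis_closeness` is hypothetical, columns are scaled by vector normalization, and every criterion is treated as a benefit criterion (larger is better).

```python
import numpy as np

def topsis_closeness(X, weights):
    """Relative closeness of each row of X to the positive ideal solution.

    Assumptions: all criteria are benefit criteria, no column is all zeros,
    and the rows are not all identical (otherwise a division by zero occurs).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Vector-normalize each column, then apply the criterion weights.
    V = w * X / np.sqrt((X ** 2).sum(axis=0))
    ideal = V.max(axis=0)   # positive ideal solution
    anti = V.min(axis=0)    # negative ideal solution
    d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))
    # Closeness lies in [0, 1]; larger means closer to the positive ideal.
    return d_neg / (d_pos + d_neg)

# Toy example with three alternatives and two equally weighted criteria.
C = topsis_closeness([[1, 1], [2, 2], [3, 3]], [0.5, 0.5])
print(C)  # approximately [0, 0.5, 1]
```

In the toy example the first alternative coincides with the negative ideal solution and the third with the positive one, so the middle alternative is equidistant from both and receives a closeness of 0.5.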

RESULTS AND DISCUSSION
Analysis of the outcome of PCA: As shown in Table 2, four principal components explained a total of 89.06% of the information in the data. It was therefore reasonable to use the principal components F1, F2, F3 and F4 to represent the original 12 indicators in the cluster analysis. The component score coefficient matrix of the red wine is presented in Table 3.
From Table 3, component 1 of the red wine contained information on anthocyanin, tannins, total phenols, flavonoids and DPPH semi-inhibition volume, and could accordingly be named the taste factor; component 2 contained information on a* (D65) and C (D65), and could be named the chromaticity factor; component 3 contained information on H (D65) and b* (D65), and could be named the cool tone factor; component 4 contained information on aromatics, L* (D65) and resveratrol, and could be named the aroma factor.

Analysis of information entropy weighted cluster:
We calculated the entropy weights of the four principal components of the red wine; the results are shown in Table 4. The more information a principal component contains, the greater its weight, so a high weight indicates an important component. As shown in Table 4, principal component 1 had the greatest impact on wine clustering.
Table 4 also shows that the iterative weights calculated by the BP neural network differed only slightly from the weights before iteration, which indicated that the weights determined by information entropy were accurate.
We divided the samples into categories using the criterion that the distance between two classes was greater than 10 and the within-class distance was about 5.
Results of the red wine classification are shown in the clustering tree (Fig. 1). According to this criterion, the red wine samples fell into two categories: the first category contained samples 2, 5, 8, 9, 14, 17, 22, 23, 24 and 27 and the second category contained samples 1, 3, 4, 6, 7, 10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 25 and 26. Except for the color indicators, the values of all physicochemical indicators in the first category were greater than those in the second category. This showed that the first class of red wine was relatively dark in color and tone, with a sour taste and rich aroma. The second class, in contrast, was lighter in color and partially brick red; it tasted thin and had a fainter aroma than the first class.
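The weighting and clustering procedure behind these results can be sketched in Python as follows. Several details are assumptions made for illustration: min-max scaling is used for normalization, the helper names `min_max_normalize` and `entropy_weights` are hypothetical, the tree is simply cut into two clusters instead of applying the distance criterion above, and the data are synthetic. The weighted squared Euclidean distance is obtained by scaling each column by the square root of its weight before calling SciPy's Ward linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def min_max_normalize(X):
    """Scale each column to [0, 1] (assumes no column is constant)."""
    X = np.asarray(X, dtype=float)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def entropy_weights(R):
    """Entropy weights for a non-negative m x n matrix R (rows = samples)."""
    R = np.asarray(R, dtype=float)
    m = R.shape[0]
    P = R / R.sum(axis=0)                      # proportion of each sample
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)   # 0 * ln 0 taken as 0
    e = -terms.sum(axis=0) / np.log(m)         # entropy of each column
    d = 1.0 - e                                # degree of diversification
    return d / d.sum()

# Synthetic two-group data: 10 + 17 samples, 4 components (NOT the study's data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 4)),
               rng.normal(5.0, 0.5, size=(17, 4))])

R = min_max_normalize(X)
w = entropy_weights(R)
# Scaling by sqrt(w) makes plain squared Euclidean distance equal to the
# weighted one: sum_j w_j * (x_ij - x_kj)^2.
tree = linkage(R * np.sqrt(w), method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(np.round(w, 3), labels)
```

Columns whose values are spread more unevenly across samples have lower entropy and therefore receive larger weights, so they contribute more to the distances on which the Ward tree is built.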

Result of TOPSIS comprehensive evaluation:
We calculated the entropy weights of the four principal components of the red wine. As shown in Table 5, the iterative weights calculated by the BP neural network differed only slightly from the weights before iteration, indicating that the weights determined by information entropy were accurate. We then conducted the TOPSIS comprehensive evaluation on the two categories of red wine samples. The optimal value, which stands for the relative closeness to the ideal solution, is calculated from the distances to the positive and negative ideal solutions. It represents the quality of the wine: the higher the score, the better the wine.
The wine grading standard was determined according to the optimal values and their distribution. We divided the optimal values of the wine samples into three intervals, so that every category of wine was divided into three grades, each interval corresponding to a particular grade of wine quality. Grade I indicated the worst quality, while Grade III stood for the best quality. The grading standards of red wine are shown in Table 6.
Table 7 shows the scores of the wine samples. An optimal value below 0.5 means that the corresponding wine sample lies closer to the negative ideal solution; in contrast, an optimal value above 0.5 means that it lies closer to the positive ideal solution. From Table 7, we found that most optimal values were less than 0.5, so we concluded that the overall quality of most wine samples was not high. The difference between the optimal values of the best and the worst wine in three categories was larger than 0.5, which showed that the TOPSIS method discriminated well. Based on the optimal values, we graded the red wine according to the standard shown in Table 6. Table 8 shows the wine classification results.
In the first category of red wine, grade I accounted for 50%, grade II for 30% and grade III for 20%. In the second category, grade I accounted for 17.65%, grade II for 52.94% and grade III for 29.41%. On the whole, the quality of the lighter category of red wine was slightly higher than that of the darker category.
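The step from closeness scores to grade shares can be illustrated with a short Python sketch. The scores and the cut points (0.35 and 0.50) below are hypothetical, since Tables 6 and 7 are not reproduced here; the scores were chosen so that 3, 9 and 5 of 17 samples fall into grades I, II and III, which are the counts consistent with the percentages reported for the second category. The function name `grade_shares` is likewise an assumption.

```python
import numpy as np

def grade_shares(closeness, cuts):
    """Assign grades 1..3 from two ascending cut points and report shares in %.

    Grade 1 (Grade I) covers the lowest scores, grade 3 (Grade III) the highest.
    """
    c = np.asarray(closeness, dtype=float)
    grades = np.digitize(c, cuts) + 1
    shares = {g: round(float(np.mean(grades == g)) * 100.0, 2) for g in (1, 2, 3)}
    return grades, shares

# Hypothetical closeness scores for 17 samples (NOT the values of Table 7).
scores = [0.20, 0.28, 0.31,
          0.36, 0.38, 0.40, 0.41, 0.43, 0.44, 0.46, 0.47, 0.49,
          0.52, 0.55, 0.61, 0.68, 0.74]
grades, shares = grade_shares(scores, cuts=(0.35, 0.50))
print(shares)  # {1: 17.65, 2: 52.94, 3: 29.41}
```

With 17 samples, shares of 17.65%, 52.94% and 29.41% correspond to 3, 9 and 5 wines, and with 10 samples, shares of 50%, 30% and 20% correspond to 5, 3 and 2 wines.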

CONCLUSION
We extracted the principal components of the physicochemical indicators using Principal Component Analysis and then calculated the weight of each principal component by the entropy weight method. To verify the accuracy of the weight calculation, we used a BP neural network model to iterate the weights. The results showed that there was little difference between the iterative weights and the initial weights, which indicated that the weights determined by information entropy were accurate. After the accuracy of the weights was verified, we clustered the red wine samples into two categories, and weighted cluster analysis worked well for this purpose. We applied the weighted TOPSIS method to objectively evaluate the quality of the various types of wine, and it showed good discrimination in the assessment of wine quality. The method has displayed good practicality and can be used in cases where no other objective criteria are available. It avoids the thorny problem of determining subjective weights in general evaluation and provides a comprehensive evaluation of wine quality, playing an important role in promoting scientific, standardized and institutionalized evaluation of wine quality. Moreover, the model can be widely used in the quality evaluation of food and other products.