Multivariate Analysis under Indeterminacy: An Application to Chemical Content Data

The Hotelling T-squared statistic is widely used to test differences in means for multivariate data. The existing statistic under classical statistics applies when all observations in the multivariate data are determined, precise, and exact. In practice, observations are not always determined and precise, because measurements are often taken in complex situations and under uncertainty. In this paper, we introduce the Hotelling T-squared statistic under neutrosophic statistics (NS), which generalizes classical statistics and applies in an uncertain environment. We discuss the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. From the comparison, we conclude that the proposed statistic is more adequate and effective under uncertainty.


Introduction
In classical statistics (CS), univariate analysis is the technique for analyzing single-variable data. Multivariate analysis has been widely used to analyze data with more than one variable. Among the multivariate techniques under CS, the Hotelling T-squared statistic has been widely applied in a variety of fields (see, for example, [1,2]) for testing whether the means of more than one population are equal or not. This statistic is the extension of the t-test, which is applied for testing the mean of a single population. Brereton [3] used the Hotelling T-squared statistic to detect outliers in chemical data. In [4], Varmuza and Filzmoser worked on multivariate analysis for chemometric data. Hervé et al. [5] applied the multivariate technique to biological data. Kitagaki et al. [6] used the Hotelling T-squared statistic in chemical and electrochemical oscillator issues. For more details about the applications of the Hotelling T-squared statistic, the reader may read [3,7] and [8].
The Hotelling T-squared statistic derived under CS can only be applied when all observations in the multivariate data are determined, precise, and certain. In practice, the data under study are not always precise and may instead be linguistic. For example, the temperature of a certain city may be described as high, low, or medium, or the measurement of a variable in a complex system may yield an interval rather than a determined value. In such situations, the Hotelling T-squared statistic under CS cannot be used for the analysis of the data. When observations are uncertain or fuzzy, the fuzzy Hotelling T-squared statistic can be applied for testing the means of multivariate populations. Taleb et al. [9] applied the fuzzy Hotelling T-squared statistic to design a control chart. D'Urso [10] provided a review of fuzzy multivariate analysis. Bakdi and Kouadri [11] presented a new adaptive principal component analysis technique to detect faults in a complex system. In [12], Ammiche et al. introduced principal component analysis for the Tennessee Eastman process using a fuzzy approach. More applications can be read in [13][14][15].
Recently, neutrosophic logic, which is the extension of fuzzy logic, has attracted many researchers due to its applications in a variety of fields. Neutrosophic logic considers the measure of indeterminacy, which fuzzy logic does not consider (see [16]). Neutrosophic statistics (NS), which is based on neutrosophic numbers, is the generalization of CS (see [17,18]). NS has been applied widely in rock-measuring issues (see, for example, [19,20]). The application of NS to the inspection of products can be seen in [21,22]. The applications of NS in the area of process control can be seen in [23,24]. The application of NS in medicine can be read in [25]. For more information on neutrosophic theory, the reader may refer to [26,27].
Aslam and Smarandache [17,18] made some suggestions for extending several concepts of CS to NS. To the best of our knowledge, and after exploring the literature, there is no work on the development of the Hotelling T-squared statistic under NS. In this paper, we introduce the Hotelling T-squared statistic under NS, which generalizes classical statistics and applies in an uncertain environment. We discuss the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. We expect that the proposed neutrosophic Hotelling T-squared statistic will perform better than the existing Hotelling T-squared statistic under uncertainty.

Preliminaries
Let $x_{jkN} \in [x_{jkL}, x_{jkU}]$ be a neutrosophic random variable representing the neutrosophic observation of the $k$th variable recorded on the $j$th item. Here, $x_{jkN}$ is expressed as an indeterminacy interval with smaller value $x_{jkL}$ and larger value $x_{jkU}$. The neutrosophic form of $x_{jkN} \in [x_{jkL}, x_{jkU}]$, with determinate part $x_{jkL}$ and indeterminate part $x_{jkU} I_N$; $I_N \in [I_L, I_U]$, can be written as follows:

$$x_{jkN} = x_{jkL} + x_{jkU} I_N; \quad I_N \in [I_L, I_U].$$

Note that the neutrosophic random variable reduces to the variable under classical statistics if no indeterminacy is recorded in the data. The neutrosophic data matrix having $n_N \in [n_L, n_U]$ neutrosophic observations of $p_N \in [p_L, p_U]$ neutrosophic variables is given as follows:

$$\mathbf{X}_N = \begin{bmatrix} x_{11N} & \cdots & x_{1kN} & \cdots & x_{1pN} \\ \vdots & & \vdots & & \vdots \\ x_{j1N} & \cdots & x_{jkN} & \cdots & x_{jpN} \\ \vdots & & \vdots & & \vdots \\ x_{n1N} & \cdots & x_{nkN} & \cdots & x_{npN} \end{bmatrix}.$$

The neutrosophic form of $\mathbf{X}_N \in [\mathbf{X}_L, \mathbf{X}_U]$ can be written as

$$\mathbf{X}_N = \mathbf{X}_L + \mathbf{X}_U I_N; \quad I_N \in [I_L, I_U].$$

Note that $\mathbf{X}_N \in [\mathbf{X}_L, \mathbf{X}_U]$ is the generalization of the data matrix under classical statistics. The neutrosophic data matrix reduces to the data matrix under classical statistics when $I_L = 0$.

The neutrosophic sample mean of the $k$th variable, computed from the $n_N$ measurements, is

$$\bar{x}_{kN} = \frac{1}{n_N} \sum_{j=1}^{n_N} x_{jkN}; \quad k = 1, 2, \ldots, p_N.$$

The neutrosophic form of $\bar{x}_{kN} \in [\bar{x}_{kL}, \bar{x}_{kU}]$ can be written as

$$\bar{x}_{kN} = \bar{x}_{kL} + \bar{x}_{kU} I_N; \quad I_N \in [I_L, I_U].$$

Note that $\bar{x}_{kN} \in [\bar{x}_{kL}, \bar{x}_{kU}]$ is the generalization of the sample mean under classical statistics. The neutrosophic sample mean reduces to the sample mean under classical statistics when $I_L = 0$.

The neutrosophic sample variances and covariances $s_{ikN}$ are computed from the same $n_N$ measurements on the $p_N$ neutrosophic variables. The neutrosophic form of $s_{ikN} \in [s_{ikL}, s_{ikU}]$ can be written as

$$s_{ikN} = s_{ikL} + s_{ikU} I_N; \quad I_N \in [I_L, I_U].$$

Finally, the neutrosophic sample correlation between the $i$th and $k$th variables is given by

$$r_{ikN} = \frac{s_{ikN}}{\sqrt{s_{iiN}} \sqrt{s_{kkN}}}.$$

The neutrosophic form of $r_{ikN} \in [r_{ikL}, r_{ikU}]$ can be written as

$$r_{ikN} = r_{ikL} + r_{ikU} I_N; \quad I_N \in [I_L, I_U].$$

Note that $r_{ikN} \in [r_{ikL}, r_{ikU}]$ is the generalization of the sample correlation under classical statistics. The neutrosophic sample correlation reduces to the sample correlation under classical statistics when there are no indeterminate observations.
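The conversion from an indeterminacy interval to its neutrosophic form can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code. The helper names `neutrosophic_form` and `column_means` and the two small data matrices are hypothetical. The sketch also assumes that the lower and upper data matrices are summarized separately, that $I_L = 0$, and that $I_U$ is chosen so that the form reproduces the upper value, that is, $I_U = (x_U - x_L)/x_U$.

```python
def neutrosophic_form(lower, upper):
    """Return (x_L, x_U, I_U) such that x_L + x_U * I_U equals x_U.

    Assumption: 0 < upper and lower <= upper, with I_L = 0.
    """
    if upper <= 0 or lower > upper:
        raise ValueError("expected lower <= upper and upper > 0")
    i_upper = (upper - lower) / upper
    return lower, upper, i_upper


def column_means(rows):
    """Sample mean of each column of a data matrix given as a list of rows."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Hypothetical lower and upper data matrices (3 items, 2 variables).
X_L = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
X_U = [[1.0, 10.0], [2.5, 12.0], [3.0, 15.0]]

mean_lower = column_means(X_L)
mean_upper = column_means(X_U)

# One neutrosophic form per variable: mean_kL + mean_kU * I_N, I_N in [0, I_U].
forms = [neutrosophic_form(lo, up) for lo, up in zip(mean_lower, mean_upper)]
for k, (lo, up, i_up) in enumerate(forms, start=1):
    print(f"variable {k}: {lo:.4f} + {up:.4f} I_N, I_N in [0, {i_up:.4f}]")
```

When an item has no indeterminacy, its lower and upper values coincide, $I_U = 0$, and the form collapses to the classical value.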
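The neutrosophic descriptive statistics for $n_N$ measurements on $p_N$ variables can be arranged in the following arrays. The neutrosophic sample means, the sample variances and covariances, and the sample correlations are presented by the arrays

$$\bar{\mathbf{x}}_N \in \left[ \begin{bmatrix} \bar{x}_{1L} \\ \vdots \\ \bar{x}_{pL} \end{bmatrix}, \begin{bmatrix} \bar{x}_{1U} \\ \vdots \\ \bar{x}_{pU} \end{bmatrix} \right],$$

$$\mathbf{S}_N \in \left[ \begin{bmatrix} s_{11L} & \cdots & s_{1kL} & \cdots & s_{1pL} \\ \vdots & & \vdots & & \vdots \\ s_{p1L} & \cdots & s_{pkL} & \cdots & s_{ppL} \end{bmatrix}, \begin{bmatrix} s_{11U} & \cdots & s_{1kU} & \cdots & s_{1pU} \\ \vdots & & \vdots & & \vdots \\ s_{p1U} & \cdots & s_{pkU} & \cdots & s_{ppU} \end{bmatrix} \right],$$

and

$$\mathbf{R}_N \in \left[ \begin{bmatrix} 1 & r_{12L} & \cdots & r_{1pL} \\ r_{21L} & 1 & \cdots & r_{2pL} \\ \vdots & \vdots & & \vdots \\ r_{p1L} & r_{p2L} & \cdots & 1 \end{bmatrix}, \begin{bmatrix} 1 & r_{12U} & \cdots & r_{1pU} \\ r_{21U} & 1 & \cdots & r_{2pU} \\ \vdots & \vdots & & \vdots \\ r_{p1U} & r_{p2U} & \cdots & 1 \end{bmatrix} \right],$$

respectively.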

Neutrosophic Hotelling $T_N^2$ Statistic
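In this section, we discuss the proposed neutrosophic Hotelling $T_N^2$ statistic. In classical statistics, Student's t-test is applied for testing the mean in the univariate case. As mentioned by [28], rejecting the null hypothesis when $|t_N|$ is large is the same as rejecting it when the square

$$t_N^2 = \frac{(\bar{x}_N - \mu_{0N})^2}{s_N^2 / n_N} = n_N (\bar{x}_N - \mu_{0N}) (s_N^2)^{-1} (\bar{x}_N - \mu_{0N})$$

is large, that is, when

$$n_N (\bar{x}_N - \mu_{0N}) (s_N^2)^{-1} (\bar{x}_N - \mu_{0N}) > t_{n_U - 1}^2(\alpha/2),$$

where $\alpha$ is the level of significance and $t_{n_U - 1}(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of the neutrosophic t-distribution with neutrosophic degrees of freedom $n_U - 1$. The generalization of this univariate statistic to the multivariate case under the neutrosophic statistical interval method (NSIM) is given by

$$T_N^2 = n_N (\bar{\mathbf{X}}_N - \boldsymbol{\mu}_{0N})' \, \mathbf{S}_N^{-1} \, (\bar{\mathbf{X}}_N - \boldsymbol{\mu}_{0N}); \quad T_N^2 \in [T_L^2, T_U^2],$$

where $\bar{\mathbf{X}}_N$ is the neutrosophic sample mean vector, $\mathbf{S}_N$ is the neutrosophic sample covariance matrix, and $\boldsymbol{\mu}_{0N}$ is the hypothesized neutrosophic mean vector. The neutrosophic form of $T_N^2 \in [T_L^2, T_U^2]$ can be written as

$$T_N^2 = T_L^2 + T_U^2 I_N; \quad I_N \in [I_L, I_U].$$

This statistic is called the neutrosophic Hotelling $T_N^2$ statistic and follows the neutrosophic F-distribution with neutrosophic degrees of freedom (ndf) $p_N$ and $(n_N - p_N)$:

$$T_N^2 \sim \frac{(n_N - 1) p_N}{n_N - p_N} F_{p_N, \, n_N - p_N}.$$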
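The computation of the statistic can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes the classical formula is applied separately to a lower and an upper data matrix, the covariance divisor is $n - 1$, and $I_U = (T_U^2 - T_L^2)/T_U^2$ with $I_L = 0$. All function names and the small example data are hypothetical. Because the statistic computed from the upper data need not be the larger one, the sketch sorts the two values before forming the interval.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix with divisor n - 1 (an assumption here)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(rows, mu0):
    """Classical one-sample statistic n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    n = len(rows)
    d = [m - m0 for m, m0 in zip(mean_vector(rows), mu0)]
    y = solve(covariance_matrix(rows), d)  # y = S^{-1} d
    return n * sum(di * yi for di, yi in zip(d, y))


def neutrosophic_t2(rows_lower, rows_upper, mu0):
    """Return (T_L, T_U, I_U) so that T_L + T_U * I_U equals T_U."""
    t_a = hotelling_t2(rows_lower, mu0)
    t_b = hotelling_t2(rows_upper, mu0)
    t_low, t_high = min(t_a, t_b), max(t_a, t_b)
    i_upper = (t_high - t_low) / t_high if t_high > 0 else 0.0
    return t_low, t_high, i_upper


# Hypothetical example: 3 items, 2 variables, one imprecise observation.
X_L = [[6.0, 9.0], [10.0, 6.0], [8.0, 3.0]]
X_U = [[6.0, 9.0], [10.0, 6.5], [8.0, 3.0]]
mu0 = [9.0, 5.0]

t_classical = hotelling_t2(X_L, mu0)
t_low, t_high, i_upper = neutrosophic_t2(X_L, X_U, mu0)
print(f"T2_N = {t_low:.4f} + {t_high:.4f} I_N, I_N in [0, {i_upper:.4f}]")
```

When the lower and upper matrices are identical, `i_upper` is zero and the result is the classical statistic alone.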

Journal of Analytical Methods in Chemistry
The neutrosophic Hotelling $T_N^2$ statistic can be used for testing the null hypothesis $H_{0N}: \boldsymbol{\mu}_N = \boldsymbol{\mu}_{0N}$ against the alternative hypothesis $H_{1N}: \boldsymbol{\mu}_N \neq \boldsymbol{\mu}_{0N}$. The null hypothesis $H_{0N}: \boldsymbol{\mu}_N = \boldsymbol{\mu}_{0N}$ will be rejected if

$$T_N^2 > \frac{(n_N - 1) p_N}{n_N - p_N} F_{p_N, \, n_N - p_N}(\alpha).$$

Statistical software provides the p value for deciding whether to accept or reject the null hypothesis. According to [18], "a neutrosophic p value is defined in the same way as in classical statistics: the smallest level of significance at which a null hypothesis H 0 can be rejected." Note that the neutrosophic p value is not an exact or determined value as in the case of classical statistics. Smarandache [18] discussed criteria to accept or reject the null hypothesis using the neutrosophic p value.
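The rejection rule can be sketched as a small Python helper. This is a hedged illustration: the function names are hypothetical, the sample size and dimension are treated as fixed numbers, and the three-way outcome (including "indeterminate" when the critical value falls inside the interval) is an assumption of this sketch rather than a rule stated in the paper. The F quantile is passed in as an argument. The value 2.44 used below is the tabulated upper 10% point of the F-distribution with 3 and 17 degrees of freedom from standard tables, not a figure taken from the paper.

```python
def critical_value(n, p, f_quantile):
    """Scaled F critical value (n - 1) p / (n - p) * F_{p, n-p}(alpha)."""
    if n <= p:
        raise ValueError("need n > p")
    return (n - 1) * p / (n - p) * f_quantile


def decide(t2_lower, t2_upper, crit):
    """Compare the interval [t2_lower, t2_upper] with the critical value.

    Assumes t2_lower <= t2_upper.
    """
    if t2_lower > crit:
        return "reject H0"
    if t2_upper <= crit:
        return "do not reject H0"
    return "indeterminate"


# Sample size, dimension, and interval as reported in this paper
# (20 women, 3 variables, T2_N in [9.7387, 11.41]).
# F_{3,17}(0.10) = 2.44 is an assumed table value.
crit = critical_value(n=20, p=3, f_quantile=2.44)
decision = decide(9.7387, 11.41, crit)
print(f"critical value = {crit:.4f}, decision: {decision}")
```

Under these assumptions, both ends of the interval exceed the critical value, so the whole interval leads to the same decision.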

Application
Now, we discuss the application of the proposed neutrosophic Hotelling $T_N^2$ statistic using data selected from the healthcare department.
The data are collected from 20 healthy women, and three variables are measured: sweat rate, sodium content, and potassium content. The observations of the variables under investigation are obtained from a measurement process, and it is expected that not all observations in the data are precise and exact. Therefore, the data cannot be analyzed using CS. Similar data for classical statistics are given by [28]. The data, which include some neutrosophic observations, are shown in Table 1. We want to test whether the population means of the three variables for healthy women equal the hypothesized values. The testing procedure is as follows:
Step 1: state the null hypothesis $H_{0N}: \boldsymbol{\mu}_N = \boldsymbol{\mu}_{0N} = ([4, 4], [50, 50], [10, 10])'$ against the alternative hypothesis $H_{1N}: \boldsymbol{\mu}_N \neq \boldsymbol{\mu}_{0N}$.
Step 2: some basic calculations for the data are given in Table 1.
Step 3: let $\alpha = [0.10, 0.10]$ be the level of significance.

Comparisons
In Section 4, we presented the testing procedure for the proposed neutrosophic Hotelling $T_N^2$ statistic, which is the generalization of the Hotelling statistic under CS. The proposed testing procedure reduces to the testing procedure under CS when all observations of the sweat data are precise. From the neutrosophic sweat data, we note that the proposed testing procedure provides the analysis values in an indeterminacy interval rather than as determined values. The neutrosophic form of the proposed Hotelling statistic is $T_N^2 = 9.7387 + 11.41 I_N$; $I_N \in [0, 0.1470]$. That is, the proposed Hotelling statistic has the indeterminacy interval from 9.73 to 11.41, so under an uncertain environment one can expect values of $T_N^2$ from 9.73 to 11.41. The first value, 9.73, is the determined part, and $11.41 I_N$ is the indeterminate part. When no imprecise observations are recorded in the sweat data, the value of $T_N^2$ is 9.73, which is the value under CS. In other words, when the level of significance is 5%, the probabilities that the null hypothesis is accepted, rejected, and indeterminate are 0.95, 0.05, and 0.1470, respectively. By comparing the proposed test with the test under CS, we note that the existing test gives no information about the probability of indeterminacy. As mentioned in [19,20], a method that provides values in an indeterminacy interval under uncertainty is considered the most effective and adequate method. Comparing the proposed testing procedure with the existing procedure under CS, our finding agrees with [19,20].
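As a quick arithmetic check, and assuming the neutrosophic form is read as determinate part plus indeterminate part (the form $x_{jkL} + x_{jkU} I_N$ used in the Preliminaries), the two ends of the indeterminacy interval can be recovered from the reported values:

$$T_N^2 \big|_{I_N = 0} = 9.7387, \qquad T_N^2 \big|_{I_N = 0.1470} = 9.7387 + 11.41 \times 0.1470 \approx 11.416.$$

The second value agrees with the reported upper end of 11.41 up to rounding of the reported figures.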

Concluding Remarks
In this paper, we introduced the Hotelling T-squared statistic under neutrosophic statistics (NS), which generalizes classical statistics and applies in an uncertain environment. We discussed the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. The proposed neutrosophic Hotelling T-squared statistic is expressed as an indeterminacy interval and is hence more flexible and informative than the Hotelling T-squared statistic under classical statistics. Based on the comparison, we recommend using the proposed neutrosophic Hotelling T-squared statistic for the analysis of data under uncertainty. Further properties of the proposed statistic, as well as its sensitivity to uncertainty and measurement errors, can be studied in future work.

Data Availability
The data used to support the findings of this study are included in the paper.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this paper.