A Statistic Method for Anatomical and Evolutionary Analysis

Rules, formulas, and statistical tests have been widely used in studies that analyze continuous variables with the normal (Gaussian) distribution or defined parameters. Nevertheless, in some studies, such as those in gross anatomy, only discrete or nominal variables are available. In fact, the existence or absence of an anatomical structure, its features and internal aspects, innervation, arterial and venous supplies, etc. can be analyzed as discrete and/or nominal variables. However, there have been no adequate methods that allow transformation of data with qualitative/nominal variables in gross anatomy into data with quantitative variables. To resolve the issue, we have proposed a new method, the statistical method for comparative anatomy (SMCA), that allows descriptions based on numerical analyses, and we have proposed a formula for comparing groups of anatomical structures among different species that allows an evolutionary perspective to be inferred. The important features of this method are as follows: (1) it allows analysis of numerical data converted from discrete or nominal variables in morphological areas and (2) it allows quantitative comparison of identical structures within the same species and across different species. The SMCA fills the lack of a specific method for statistical work in comparative anatomy, morphology in general, and evolutionary correlations.


Introduction
Statistical analysis is widely used in almost all scientific fields to support the discussion of data and the conclusions drawn from them [1]. In most cases, the variables analyzed are continuous ones that can be summarized by an average and a standard deviation, i.e., the data can be fitted approximately to the Gaussian (normal) distribution of probability. A Gaussian distribution means that most of the population, for the studied variable, is concentrated around the average, i.e., the data are grouped symmetrically around the average [2]; when plotted in the Cartesian plane, these data display a bell-like shape, called the Gaussian curve, which shows a central tendency. In a perfect Gaussian distribution, the average is located at the center of the curve, and the frequency of the data for the studied variable decreases toward the lateral extremes. Statistics that assume this type of distribution, with data spread symmetrically around the average, are called parametric statistics.
It is important that the distribution of the data be analyzed before any statistical calculation. However, variables that do not follow a Gaussian distribution are sometimes submitted to statistical calculation under the assumption of a normal distribution of probability. In these cases, applying mathematical tools that assume a normal distribution induces errors; more specifically, the acceptance or rejection of statistical hypotheses could be incorrect [1]. These kinds of errors are becoming common, mainly because of indiscriminate and incorrect use of statistical software. Statistical programs are very important because they allow fast, concise, and reliable analyses of variables, and they include tests of the normality of data. However, data that do not display a normal distribution are sometimes analyzed with these programs anyway, and the results usually indicate no statistical significance.
To understand the importance of correct statistics, imagine that the average temperature around a man is 24°C, but his feet are in a refrigerator and his head is in a stove. He would find these temperatures very uncomfortable [3]. In this case, the average hardly reflects a correct interpretation of the data. Indeed, such extreme values can produce an acceptable average. However, values of this kind mostly do not fit a Gaussian distribution. Therefore, they cannot be analyzed by parametric statistics [1][2][3].
Data that cannot be submitted to parametric designs should be analyzed using nonparametric statistics. Indeed, other types of averages can be calculated based on non-normal distributions, such as the binomial or chi-square (χ²) distribution. However, they usually do not allow central tendency analyses and require randomness among data. Furthermore, nonparametric statistics are less precise than parametric ones [4].
In gross anatomy, there are many cases in which numerical data are not available for analyses. Anatomical studies must analyze absence or presence of a structure or organ and characteristics associated with these organs; for example, presence or absence of specific nerves and vessels in muscles, and distribution of these structures if these structures are present. They are qualitative variables, but not numerical ones. This means that numbers cannot provide this type of information.
It is well known that anatomical texts include extensive descriptions of structures, relationships between the structures, axes, and positions of the body. This indicates that specific statistical methods are required, especially in comparative anatomical studies. However, no previous statistical method allows accurate analysis of anatomical data; such methods could only assist the discussion of them. Some anatomical studies tried to analyze qualitative variables more objectively using nonparametric statistics such as chi-square (χ²) [5]. The basis of the chi-square statistic is associativity among the data. Thus, the chi-square statistic is an important tool for multivariate analysis of discrete variables that are considered to be independent. However, the statistical hypothesis in this test does not agree with Darwin's theory of evolution, inter alia, because of the assumption of the existence of a common ancestor [6]. Indeed, the assumption of a common ancestor implies similarities of structures across species, i.e., structures cannot be random in organisms that evolved from the same ancestor, since the ancestral animal provided the basic structures and could pass derivative features on to its descendants (for a detailed review, see Ref. [7]). Therefore, applying statistical methods such as the chi-square statistic, which are based on the premise of randomness, could induce a hypothetical error.
It is reasonable that when the central tendency measures cannot be used, nonparametric distributions must be chosen [2], mainly due to small sample sizes [3]. The nonparametric distributions are also used when it is difficult to set up quantitative variables. Indeed, the percentage of a given structure based on the frequency of the structure in the samples is one of the key measures in nonparametric statistics used in gross anatomy. In gross anatomy, the highest percentage of occurrence of a given structure is called normal, while the lowest percentage is called variation [8]. Percentages of normal and variation could be analyzed using nonparametric methods.
Until a few years ago, gross anatomy had no specific statistical method for the analysis of noncontinuous variables regarding anatomical structures. Here, we show a new statistical method based on nonparametric statistics that is more consistent with anatomical descriptions. We also compare this new method with cladistics, which is used for evolutionary analyses, to indicate the usefulness of the new method in this discipline.

Concepts of the statistical methods for gross anatomy
In this section, we will show that the new statistical method is based on the anatomical concept of normality, that an appropriate weight is assigned to each variable (parameter for a specific feature) of a structure according to the importance of the variable, and that conclusions can be drawn from values integrated across multiple variables. Results obtained with this new method have been reported in our previous papers, in which the method was designed to compare muscles not only within the same species but also across different species in comparative gross anatomy [1,7,9-11].

Anatomical concept of normality and variation
The initial step in the statistical method for comparative anatomy (SMCA) is to calculate the frequency based on the normality and variation concept in anatomy. "A normal structure" means that it is observed in greater than 50% of cases within the same species. Therefore, the variation can be observed in less than 50% of cases [8]. The summary of the steps to calculate SMCA is shown in Table 1.
Muscles are good examples of structures to which SMCA can be applied, because describing their characteristics requires several different variables: shape, innervation, vascularization, origin, insertion, and number. Different individuals of the same species, or individuals of different species, can display different numbers of a given muscle, as in the contrahentes muscles in primates.
The formula indicating the relationship among the total number of studied structures and the numbers of normal structures and variations is shown below:

$$N = r_{v(ijk)} + n_{v(ijk)}$$

where N is the total number of analyzed structures, n v is the number of structures with variation, and r v is the number of normal structures (N − n v). The subscript i indicates a specific species, for instance humans, chimpanzees, etc., while the subscript j indicates a specific structure, and the subscript k indicates a parameter (variable) of that structure. Thus, the sum of normal structures and variations must equal the total (100%) of these structures.
In the case of muscles, the parameters should include at least the following four: (1) innervation, (2) origin, (3) insertion, and (4) vascularization. For instance, in the case of the biceps in a specific species, j is 1 (j = 1), and i is 1 (i = 1). The data analysis in this step should be performed in terms of the following four parameters: (1) innervation (r v(111), n v(111)), (2) origin (r v(112), n v(112)), (3) insertion (r v(113), n v(113)), and (4) vascularization (r v(114), n v(114)). Furthermore, (5) number of muscles (r v(115), n v(115)) and (6) shape (r v(116), n v(116)) could be added for more detailed analyses. In addition, further detailed parameters (subscript (h)) could be added (see below in detail). The next step is the calculation of the relative frequency (RF = P ijk) of normal structures in each parameter against the total number of structures, based on the frequencies of normality and variation. According to these frequencies, each RF for (1) innervation (P ij1), (2) origin (P ij2), (3) insertion (P ij3), and (4) vascularization (P ij4) can be calculated as follows:

$$P_{ijk} = \frac{r_{v(ijk)}}{N}$$

When the structure is a paired organ, N (the number of individuals in a sample) must be multiplied by 2. It is also possible to calculate P ijk separately for each body side in the case of paired organs. Although any value can be used for N, a smaller N results in lower statistical power. It is therefore essential to analyze large numbers of specimens; analyses with small numbers of specimens are not appropriate for scientific purposes.
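The relative-frequency step can be sketched in a few lines of Python. This is only an illustration, assuming that P ijk is the count of normal structures divided by N and that N is doubled for paired organs, as described above; the function names and the sample counts are hypothetical and do not come from the original studies.

```python
def total_structures(individuals: int, paired: bool = False) -> int:
    """Return N; a paired organ contributes two structures per individual."""
    if individuals <= 0:
        raise ValueError("the sample must contain at least one individual")
    return 2 * individuals if paired else individuals


def relative_frequency(normal_count: int, total: int) -> float:
    """Return P_ijk = r_v / N for one parameter of one structure."""
    if total <= 0:
        raise ValueError("N must be positive")
    if not 0 <= normal_count <= total:
        raise ValueError("r_v must lie between 0 and N")
    return normal_count / total


# Hypothetical sample: 10 individuals, a paired muscle, and 18 of the
# 20 sides showing the normal innervation pattern (n_v = 2).
N = total_structures(10, paired=True)
p_innervation = relative_frequency(normal_count=18, total=N)
print(f"N = {N}, P_ij1 = {p_innervation:.3f}")
```

The same call would be repeated for each parameter k (origin, insertion, vascularization, and any optional ones) to obtain the full set of P ijk values for a muscle.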
It is noted that qualitative features are transformed into quantitative data once the initial data are expressed as percentages. Thus, the method allows numerical description of anatomical structures, which increases the precision with which their characteristics are described. Another advantage of this method is that the value of RF can be obtained from the previous literature as the prevalence (percentage of the structure) in a given species. This is especially important when the study includes comparative anatomy. For example, the palmaris longus can be absent in humans [8,10,12], and its prevalence is around 90% [13]; therefore, RF might be 90% among all individuals. However, in analyses of the innervation, vascularization, origin, or insertion of the palmaris longus, only the 90% receives attention, and the data from the remaining 10% are sometimes discarded. Such cases are common in comparative studies, where usually only data from specific species are studied.

[Table 1: summary of the SMCA calculation steps, including weighted averages for single and for multiple muscles; the table body was not recoverable.] Note: P ijk = proportion of normal structured organs, where i represents individual species, j represents individual structures (i.e., muscles), and k represents individual parameters of the muscles.
In practical terms, a normal structure in each parameter means 0.5 < P ijk ≤ 1, because a frequency of less than 50% does not match the definition of a normal structure in anatomy. Mathematically, however, P ijk can vary as follows: 0 ≤ P ijk ≤ 1. For a given species, P ijk, according to the concept of normality, must be greater than 0.5. However, when different species are compared, the frequencies of the normal structures can differ. For example, when the dorsoepitrochlearis muscle is compared among nonhuman primates and Homo, it is rarely observed in modern humans [14], and the approximate P ijk is 0.05 (this value was derived from the literature reporting the percentage of individuals in which this muscle is present), while in nonhuman primates the dorsoepitrochlearis muscle is a normal feature, and P ijk is 1.00. Furthermore, some muscles have more than one origin or insertion, as in the triceps brachii with three heads, and in rare cases this muscle has four heads of origin in Homo [8,13]. Therefore, there are only two kinds of origins in this muscle: type 1, containing three heads, as the normal feature and type 2, containing four heads, as a variation.
For accurate and detailed analyses, it is required to calculate P ijk by adding other multiple parameters for muscles or other structures. For example, for muscles, the parameters should include, at least, (1) number or kinds of nerves, or branches of a same nerve (P ij1 ), (2) origin(s) of muscles (P ij2 ), (3) insertion of muscles (P ij3 ), and (4) vascularization of muscles by arteries or branches of one artery (P ij4 ). These parameters should be chosen according to the goal of the analysis; some of the parameters could be removed while the other could be added. Furthermore, (5) quantity of muscles (P ij5 ) and its (6) shape (P ij6 ) could be included in more detailed studies. It is noted that small number of parameters results in less characterization of the studied structure.
By introducing this parameter (P ijk) into the SMCA, anatomical characteristics (expressed by P ijk) can be compared among different samples within the same species or across different species, which is a useful characteristic of the SMCA for studies of comparative anatomy. For example, variations in the number of muscles could be compared within the same species or across different species of primates [12].

Definition of pondered average of frequency (PAF)
In the next step of the SMCA, in which multiple features (P ijk) of a given structure are compared among different species, a single variable (PAF), an integrated value over multiple parameters (P ijk), is computed. For this purpose, pondered values [the weighted coefficients (w k)], which are multiplied by P ijk, are specified. The coefficients must be specified according to the anatomical importance of a given parameter in the assessment of anatomical similarity. For example, since a small value of P ijk is ascribed to large variation, such a characteristic is not important in the assessment of structural similarity. Therefore, parameters whose P ijk is small must be associated with small weighted coefficients. On the other hand, parameters whose P ijk is large (i.e., few variations) must be associated with larger weighted coefficients (see below).
After designation of the pondered values as weighted coefficients, the pondered average of frequencies (PAF = P w(ij)) is computed according to the following formula:

$$P_{w(ij)} = \frac{\sum_{k=1}^{n} w_k \, P_{ijk}}{\sum_{k=1}^{n} w_k}$$

for any species (i = 1, 2, …, s) and any muscle (j = 1, 2, …, m), where P ijk is the relative frequency, w k is the weighted coefficient linked to a specific parameter, and n is the number of parameters. For instance, for muscle 1 of species 1, P 111 is the relative frequency of innervation, and the weighted coefficient w 1 is 3; P 112 is the relative frequency of the muscle origin, and w 2 is 2; P 113 is the relative frequency of the muscle insertion, and w 3 is 2; and P 114 is the relative frequency of vascularization, and w 4 is 1 [9].
The idea of the weighted coefficients (w k) is based on the frequency of variation in the studied structures. Parameters with less variation receive a larger weight, and more variable parameters receive a smaller weight. For example, if vessels received a larger weight in the comparison, this could compromise the final results, suggesting a larger anatomical difference among specimens or species in spite of a smaller difference in the muscles. Therefore, in the case of muscles, we gave the weighted coefficient 3 to innervation (k = 1, w 1 = 3). During embryonic development, a given nerve terminates on a given muscle [14]. Thus, variations in the innervation of muscles are few, and a variation in innervation is therefore a very sensitive indicator of differences among individuals of the same species and also among different species. Accordingly, among the four parameters for muscles noted above, i.e., innervation, origin, insertion, and vascularization, innervation shows the least variation, origin and insertion usually show similar variation, and vascularization shows the most. Thus, origin and insertion should receive the same weighted coefficient, 2 (w 2 = 2 for origin, w 3 = 2 for insertion). Finally, the parameter with the greatest variation, vascularization (k = 4), received the weighted coefficient 1 (w 4 = 1). Indeed, vascularization can differ between the same muscles on the two sides of the same individual [12].
Zero cannot be accepted as weighted coefficient (w k ). Therefore, w k must be greater than zero, i.e., w k > 0. To make the calculation easier and to keep clear parameters, the best choice is to use only integer values, i.e., w k ≥ 1. Accordingly, it is very important to keep in mind the choice of the weighted coefficients should depend on different degrees of variations of the structures; highest weighted coefficient for the parameter with the lowest variations, or the same weighted coefficients for the parameters with identical degree of variation. The designed numbers also should be integers or discrete ones since it does not make sense to look for values that represent the exact differences among descriptive or nominal variables.
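As a rough illustration of the pondered average, the sketch below reads PAF as the ordinary weighted mean of the P ijk values, i.e., the sum of w k × P ijk divided by the sum of w k, using the coefficients 3, 2, 2, and 1 given above for innervation, origin, insertion, and vascularization. The function name and the example frequencies are hypothetical.

```python
from typing import Sequence

# Weighted coefficients from the text, in the order
# innervation, origin, insertion, vascularization.
MUSCLE_WEIGHTS = (3, 2, 2, 1)


def pondered_average(frequencies: Sequence[float],
                     weights: Sequence[int] = MUSCLE_WEIGHTS) -> float:
    """Return the weighted mean of the relative frequencies P_ijk."""
    if len(frequencies) != len(weights):
        raise ValueError("one weight is needed per parameter")
    if any(w < 1 for w in weights):
        raise ValueError("weights must be integers greater than or equal to 1")
    if any(not 0.0 <= p <= 1.0 for p in frequencies):
        raise ValueError("each P_ijk must lie between 0 and 1")
    weighted_sum = sum(w * p for w, p in zip(weights, frequencies))
    return weighted_sum / sum(weights)


# Hypothetical muscle: innervation always normal, origin and insertion
# normal in 90% of cases, vascularization normal in 60% of cases.
paf = pondered_average([1.0, 0.9, 0.9, 0.6])
print(f"P_w(ij) = {paf:.3f}")
```

Because innervation carries the largest coefficient, a drop in its frequency moves the result more than the same drop in vascularization, which is the behavior the weighting is meant to produce.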

Definition of comparative anatomy index (CAI) for comparison among different species
In normal structures, P w(ij) must be greater than 0.5 and less than or equal to 1, i.e., 0.5 < P w(ij) ≤ 1. In fact, P w(ij) is 1 if every P ijk has the maximal value 1, and if every P ijk approaches its minimum (P ijk > 0.5), P w(ij) approaches 0.5 as well. Mathematically, P w(ij) can vary within the range 0 ≤ P w(ij) ≤ 1, since P w(ij) could be zero or less than 0.5 in analyses across different species in which the structure might not be normal.
It is noted that P w(ij) [which is a function of each relative frequency of each specific feature (P ijk ) and its weight (w k )] can be used to assess mathematical similarity of a population of anatomical data; equal values indicate high similarity and large difference in the values between two species indicates dissimilarities or less similarity. In order to compare structures among different species, P w(ij) has to be calculated in each species.
Before calculating P w(ij) , each P ijk must be computed according to the data of species used as a reference, i.e., the control species. For example, the coracobrachialis, a muscle of the arm, could have one or two cranial heads in different primates [15]. For the coracobrachialis, P ijk could be different depending on the number of cranial heads in the control species. Thus, P ijk must be consistently calculated in reference to control species, since different species could have different normal structures (see below in detail).
For example, there are two types of origin in the coracobrachialis (j = 1); type 1 has one origin and type 2 has two origins (k = 2). P ijk could take different values according to the number of heads in the reference species (control species) (i = 1). In noncontrol species to be studied (i = 2) in which type 1 (number of origin is 1) is normal, the P 212 will be 1 in reference to the species with one head, and P 212 of type 1 will be 0.5 in reference to the species in which type 2 (two heads) is normal. In case of the muscle that has one to three heads of origins in different species, the P ijk should be divided by maximum number (i.e., 3) of heads resulting in 1/3, because P ijk should not be greater than 1. Accordingly, when control species with three heads of cranial origin of a muscle is chosen as reference, then, P 212 is 1.000 (3 heads, i = 2), P 312 is 0.667 (two heads, i = 3), and P 412 is 0.333 (1 head, i = 4).
Therefore, P ijk must be obtained first for the control species. When two cranial heads (k = 2) are the anatomically normal form of the coracobrachialis (j = 1) in the control species (i = 1), and all individuals of this species have two cranial heads (100%), P 112 should be 1; if 90% of individuals of this species have two cranial heads, P 112 should be 9/10. In the case of a noncontrol species (i = 2) in which the anatomically normal form of the coracobrachialis is one cranial head, if all individuals have one head, P 212 should be 0.5, and if 90% of the individuals have one head, P 212 should be 0.45. The values noted above (P ijk) can be obtained from the data in previous studies and can be applied to the CAI analysis (see below).
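The worked values above are consistent with one simple reading of the rule: the prevalence of the normal form in a species is scaled by its number of heads divided by the maximum number of heads among the compared species. The sketch below encodes that reading only; it is an interpretation of the examples, not a formula stated in the text, and the function and parameter names are hypothetical.

```python
def scaled_frequency(prevalence: float, heads: int, max_heads: int) -> float:
    """Scale a prevalence by the head count relative to the maximum head count.

    Assumption: P_ijk = prevalence * heads / max_heads, inferred from the
    coracobrachialis examples (1.0 and 0.9 for two heads, 0.5 and 0.45 for one).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if not 1 <= heads <= max_heads:
        raise ValueError("heads must lie between 1 and max_heads")
    return prevalence * heads / max_heads


# Control species with two cranial heads; noncontrol species with one head.
print(scaled_frequency(1.0, heads=2, max_heads=2))
print(scaled_frequency(0.9, heads=1, max_heads=2))

# Three-headed reference: species with three, two, and one head.
for heads in (3, 2, 1):
    print(heads, round(scaled_frequency(1.0, heads, max_heads=3), 3))
```

Dividing by the maximum head count keeps every P ijk at or below 1, which is the constraint the text gives for this step.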
Although any species can be defined as the control species, a species studied for the first time or a species with abundant known data should be chosen. To compare any single structure (e.g., a muscle) between two different species (i ≠ i'), the data from any noncontrol species can be compared one by one with those from the control species using the comparative anatomy index (CAI), defined by the following formula:

$$CAI_{ii'} = \left| P_{w(ij)} - P_{w(i'j)} \right|$$

The CAI ii' formula represents the absolute difference between the weighted averages (P w(ij)) of a single structure in the control species i and in a noncontrol species i'. In this comparison, the noncontrol species i' should always be compared with the same control species i. For example, the formula to compare a structure (j = 1) with a given feature (parameter) (k = 1) between the i and i' species is as follows:

$$CAI_{ii'} = \left| P_{w(i1)} - P_{w(i'1)} \right|$$

It is noted that CAI ii' ranges from 0 to 1, i.e., 0 ≤ CAI ii' ≤ 1. This is because the maximum value of P w(ij) is 1 and the minimum is 0. Note that this equation permits the comparison of only one structure between the two species.
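A minimal sketch of the CAI calculation follows, taking the index to be the absolute difference between the pondered averages of one structure in the control and noncontrol species, as described above. The helper that computes the pondered average assumes the usual weighted mean with coefficients 3, 2, 2, and 1; all names and input frequencies are hypothetical.

```python
from typing import Sequence

WEIGHTS = (3, 2, 2, 1)  # innervation, origin, insertion, vascularization


def pondered_average(frequencies: Sequence[float],
                     weights: Sequence[int] = WEIGHTS) -> float:
    """Weighted mean of the relative frequencies of one structure."""
    if len(frequencies) != len(weights):
        raise ValueError("one weight is needed per parameter")
    return sum(w * p for w, p in zip(weights, frequencies)) / sum(weights)


def cai(pw_control: float, pw_other: float) -> float:
    """Comparative anatomy index: |P_w(ij) - P_w(i'j)| for a single structure."""
    for value in (pw_control, pw_other):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each P_w must lie between 0 and 1")
    return abs(pw_control - pw_other)


# Hypothetical frequencies for the same muscle in two species.
control = pondered_average([1.0, 1.0, 1.0, 0.8])     # control species i
noncontrol = pondered_average([1.0, 0.5, 0.5, 0.6])  # noncontrol species i'
print(f"CAI = {cai(control, noncontrol):.3f}")
```

The index is symmetric in its two arguments, so the choice of control species affects the result only through how each P ijk was computed beforehand.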

Definition of group comparative anatomy index (GCAI) for comparison of a group of structures among different species
However, the SMCA analysis of the muscles of the forearm [9] showed the need to compare many muscles of the same functional group between different species, for instance, the deep flexor muscles of the forearm among the studied species of primates. This need is reasonable because these muscles work together in functions such as closing the hand; comparing them as a group therefore seems more appropriate in terms of physiology, phylogeny, taxonomy, and evolution as well. Thus, the GCAI was proposed to compare a group of muscles among species [1,9,11], one by one, based on the sum of the P w(ij), as follows:

$$P_{w(i)} = \frac{\sum_{j=1}^{m} P_{w(ij)}}{m_j}$$

where i indicates the species (i = 1, 2, …, s), j indicates the studied structure (j = 1, 2, …, m), and m j is the number of structures studied in a sample. Usually, m j is m (m j = m), because an equal number of structures is usually studied in each species.
The GCAI, which represents the difference in P w(i) based on multiple structures between the control (i) and other noncontrol (i') species, is defined by the following formula:

$$GCAI_{ii'} = \left| P_{w(i)} - P_{w(i')} \right|$$

or

$$GCAI_{ii'} = \left| \frac{\sum_{j=1}^{m} P_{w(ij)}}{m_j} - \frac{\sum_{j=1}^{m} P_{w(i'j)}}{m_j} \right|$$

Based on the above inferences, in the SMCA, values close to 0.000 suggest high similarity of the structures between the species, and the value 1.000 indicates that the structures are completely different. Thus, we can rank similarity among species based on the SMCA. For example, we can define CAI or GCAI values of 0 as high similarity among the structures analyzed, values from 0 to 0.200 as similar structures, those from 0.200 to 0.650 as somewhat similar, and those from 0.650 to 1.000 as dissimilar. Thus, the GCAI is the absolute difference between the mean weighted averages, P w(ij), for multiple muscles in the two species.
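The group comparison and the similarity ranking can be sketched as follows, assuming that the GCAI is the absolute difference between the mean pondered averages of the two species. The text does not say to which class the boundary values 0.200 and 0.650 belong, so assigning them to the lower class here is an assumption; the function names and input values are likewise hypothetical.

```python
from typing import Sequence


def group_mean(pw_values: Sequence[float]) -> float:
    """Mean pondered average P_w(i) over the structures studied in one species."""
    if not pw_values:
        raise ValueError("at least one structure is required")
    return sum(pw_values) / len(pw_values)


def gcai(pw_control: Sequence[float], pw_other: Sequence[float]) -> float:
    """Group comparative anatomy index between a control and a noncontrol species."""
    return abs(group_mean(pw_control) - group_mean(pw_other))


def similarity_label(index: float) -> str:
    """Map a CAI or GCAI value to the verbal ranking suggested in the text."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("the index must lie between 0 and 1")
    if index == 0.0:
        return "high similarity"
    if index <= 0.200:   # boundary placement is an assumption
        return "similar"
    if index <= 0.650:   # boundary placement is an assumption
        return "somewhat similar"
    return "dissimilar"


# Hypothetical pondered averages for three muscles of one functional group.
control_group = [1.0, 0.9, 0.95]
other_group = [0.5, 0.45, 0.6]
value = gcai(control_group, other_group)
print(f"GCAI = {value:.3f} ({similarity_label(value)})")
```

Using the list length as the divisor corresponds to the usual case in which the same number of structures is studied in each species.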
In Table 2, we show that different structures (muscles) among different species can be compared in reference to a control species. Eight specimens of Sapajus sp. (16 forearms, with a total of 304 muscles) were analyzed [9], and data derived from the previous literature on chimpanzees, gorillas, baboons, and humans were compared with the muscles of the control species (Sapajus sp.). For other examples of SMCA application to muscles, see Aversi-Ferreira et al. [7,9-11]. In the same way, we also analyzed 4 Japanese monkeys for 4 muscles in the arm, resulting in a total of 16 structures [11], which were compared with the data on modern humans and other primates obtained from previous studies, except those for Sapajus sp.

Table 2. Examples of the analysis using SMCA applied to muscles of the forearm.

Advances in Statistical Methodologies and Their Application to Real Problems
For humans, and also for great apes, variation of structures is very well documented. However, other animals, except for domestic ones, have scarcely been studied, and where studies exist, the number of specimens or species may be small. Nevertheless, calculation of the SMCA based on previous studies, apart from the values of N, will provide quantitative data for the analysis of morphological distances between structures among species or within the same species.

Comparison of the SMCA with other nonparametric statistics
Another possibility for studying nominal variables is the cladistic method, which is commonly used in evolutionary studies. This method assumes that the data are binary, and any other possibility would be an error, because the features are mutually exclusive [15,16]. The method is useful for obtaining objective and/or precise information on evolution-related structures regarding the absence or presence of a structure across different species. However, this characteristic limits its application to morphological analyses of structures, since it considers just two states: 0 for the absence of a characteristic and 1 for its presence. Nevertheless, the method is important in evolutionary studies, since it can provide evolutionary information. Cladistic analyses prioritize primitive and derivative features [15,16], while the morphological analyses studied here (SMCA) prioritize the greatest possible number of characters observed in a given structure.
We previously compared SMCA with other nonparametric methods, including cladistics [7,10]. In fact, the SMCA accepts more variables to be analyzed for each structure than the cladistic method. For a more detailed comparison, see Aversi-Ferreira et al. [7].

Conclusion
It is desirable to assess any kind of data quantitatively, even in gross anatomy [7,10], which is important for more precise discussions and more reliable conclusions [17]. Indeed, according to Lord Kelvin, "When you can measure what you are speaking about, and express it in numbers, you know something about it." Our objective is to provide a statistical test for gross anatomy that numerically compares structures of different subjects within the same species and across different species, which should be useful for analyzing data in comparative anatomy more precisely and objectively.
The SMCA is a new statistical method and requires further verification with more data. We have reported SMCA analyses previously [7,9-11], and the SMCA could satisfactorily incorporate many qualitative data numerically. In conclusion, the main features of SMCA are as follows: (1) it allows numerical description of data expressed as discrete or nominal variables in comparative anatomy or in other areas of morphology and (2) it provides an at least more precise (numerical) method for comparing samples of structures from the same species and from different species. Thus, the SMCA fills the lack of an appropriate method for statistical work in comparative anatomy, in other areas of morphology, and in other disciplines such as taxonomy, phylogenetics, and evolution.