Application of PCA in Taxonomy Research – Thrips (Insecta, Thysanoptera) as a Model Group


Introduction
Phenetic taxonomy is based on analysis of many unweighted characters. The number of variables that can be analyzed for a plant or animal species is so high that it is necessary to use a mathematical tool for grouping them into units corresponding to taxa.
Principal Component Analysis enables researchers to reduce the number of possible groupings. This is important because the variables usually contain some redundancy. In this case, redundancy means that some of the variables are correlated with one another because they measure the same construct.
Principal Component Analysis replaces many original characters with only a few most significant principal components (PCs) which represent combinations of closely correlated original characters.
Principal Component Analysis was first described by Pearson (1901); in the 1930s Hotelling (1933) developed a fully functional method that generates a set of orthogonal axes, arranged in decreasing order and determining the main directions of variability of the samples.
In analysis of principal components, eigenvalues represent the relative participation of each principal component in presenting the general variability of sampled material. The numerical value of a given eigenvalue is a direct indicator of the weight of a particular component in the general characteristics of the variability of a set of data. In practice, the distribution of the elements of the analyzed set in the space of the first three or four components allows one to present almost the complete diversity of the set.
Eigenvectors are sets of numbers which show the weights of individual characters for each principal component. Like the correlation coefficient, the elements of an eigenvector are scaled from -1 to +1. The higher the absolute value, the more closely a given trait is connected with a component. On the basis of eigenvectors it is then possible to interpret the principal components, e.g. to determine which character (or characters) is the most representative.
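The relationship between eigenvalues, eigenvectors and the share of variability described above can be illustrated with a minimal sketch. It assumes NumPy and uses invented measurements (three correlated characters and one unrelated character); the function name pca_from_correlation and the data are hypothetical and do not come from the studies discussed in this chapter. Each eigenvector has unit length, so its elements lie between -1 and +1, and multiplying them by the square root of the eigenvalue gives the correlations (loadings) between the original characters and the component.

```python
import numpy as np


def pca_from_correlation(data):
    """Eigen-decomposition of the correlation matrix of `data`.

    data: 2-D array, rows = specimens, columns = characters.
    Returns eigenvalues (descending), eigenvectors (one column per PC),
    the share of total variance per PC, and the loadings.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()
    # Loadings: correlations between the original characters and the PCs.
    loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))
    return eigenvalues, eigenvectors, explained, loadings


# Hypothetical measurements: three correlated "setal lengths" driven by a
# common size factor, plus one unrelated character.
rng = np.random.default_rng(0)
size = rng.normal(size=50)
data = np.column_stack([
    size + 0.1 * rng.normal(size=50),
    size + 0.1 * rng.normal(size=50),
    size + 0.1 * rng.normal(size=50),
    rng.normal(size=50),
])

eigenvalues, eigenvectors, explained, loadings = pca_from_correlation(data)
for i, share in enumerate(explained, start=1):
    print(f"PC{i}: eigenvalue = {eigenvalues[i - 1]:.3f}, share = {share:.1%}")
```

Because the three correlated characters measure nearly the same thing, the first component should carry most of the variability in this toy example, which is the redundancy that PCA exploits.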
Before principal components are determined, the data can be preprocessed in several ways. Preprocessing is necessary if the variables (features) are expressed in different units or differ in their range of variability. The two methods used are centering and standardization. In the former, the coordinate axes are shifted so that their origin lies at the centre of inertia (centroid) of the data cloud. Standardization rescales the values of each character so that its mean is 0 and its standard deviation is 1.
If the analysis is based on the covariance matrix, the variables are centred; if, on the other hand, it is based on the matrix of correlation coefficients, the variables are both centred and standardized.
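The link between preprocessing and the choice of matrix can be checked numerically. The sketch below assumes NumPy and two invented characters expressed in different units (the values are hypothetical). It shows that the cross-product matrix of centred data is the covariance matrix, while the same computation on standardized data yields the correlation matrix.

```python
import numpy as np


def center(data):
    """Shift each character so that its mean is 0."""
    return data - data.mean(axis=0)


def standardize(data):
    """Shift and rescale each character to mean 0 and standard deviation 1."""
    return center(data) / data.std(axis=0, ddof=1)


# Hypothetical characters in different units:
# a seta length in micrometres and a body length in millimetres.
rng = np.random.default_rng(1)
data = np.column_stack([
    rng.normal(40.0, 5.0, size=30),
    rng.normal(1.5, 0.2, size=30),
])

centered = center(data)
standardized = standardize(data)
n = len(data)

cov_of_centered = centered.T @ centered / (n - 1)
cov_of_standardized = standardized.T @ standardized / (n - 1)

print(np.round(cov_of_centered, 3))      # covariance matrix (unit dependent)
print(np.round(cov_of_standardized, 3))  # correlation matrix (unit free)
```

In a covariance-based analysis the character with the larger numerical spread dominates the first component, which is why standardization is preferred when units differ.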

Characteristics of the order Thysanoptera
Almost 6000 species of the order Thysanoptera have been described worldwide so far, and more are added to the list every year. The asymmetric mouth cone, with only the left mandible developed, is a synapomorphic character state which distinguishes thrips from the other insect orders (Mound et al., 1980). In the currently accepted system, the order Thysanoptera is divided into two suborders: Terebrantia and Tubulifera. The former consists of eight families of very small insects, most of them reaching 1-3 mm in length. They are mostly herbivorous, feeding on both dicotyledonous and monocotyledonous plants; only a small proportion are facultative or obligate predators (fam. Aeolothripidae). The family Thripidae, with nearly 2,500 species, is the largest within Terebrantia, but the relationships within this group are not clear (Mound & Morris, 2007). The only tubuliferan family, Phlaeothripidae, with almost 3,500 known species, includes larger thrips; the biggest, mostly tropical taxa reach up to 15 mm. Apart from the herbivores, most of them live on dead wood or in leaf litter and feed on fungi. Some phytophagous species are regarded as pests: feeding and breeding on different parts of plants, they cause deformations of leaves, flowers and fruits and eventually stop their development. A limited number of terebrantians may transmit fungi, bacteria and viruses, which may infect the host plants, reducing the quality of yields and their market value (Lewis, 1997; Tommasini & Maini, 1995).
Originally, taxonomy referred to the description and naming of organisms (alpha taxonomy). Currently it is a science that draws on different fields of knowledge and uses various tools to classify organisms and determine the relationships amongst them. There are two systems of classification of the order Thysanoptera: a phenetic one, based mainly on the morphological characteristics of adult specimens, and a phylogenetic one, based on evolutionary relationships (Mound, 2010). The former is more practical and widely used in the construction of identification keys. The latter, based on molecular and genetic data, is not yet satisfactory because the data are insufficient. Therefore, in practice, morphology and other biological aspects, e.g. observations of developmental stages and relations with host plants, provide more data and may be useful in understanding the relationships amongst taxa (Crespi et al., 2004; Mound & Morris, 2007).
Some disagreements exist concerning the classification system of Thysanoptera at different taxonomic levels. Because of the great differences in body structure between Terebrantia and Tubulifera, Bhatti (1988) proposed raising them to the rank of orders within a new superorder, Thysanopteroidea, composed of 40 families. In subsequent works this author divided Terebrantia into 28 families (Bhatti, 2005, 2006). However, this classification is not accepted by most thysanopterologists at present. On the other hand, the current division of Terebrantia into eight families was not taken into consideration by zur Strassen (2003) in his latest key. In contrast to the currently accepted division, zur Strassen placed the species of the genera Melanthrips and Ankothrips in the family Aeolothripidae. Many revisions at the genus level took place in the past, e.g. changes within the genera Thrips Linnaeus 1758 and Taeniothrips.
The correct identification of specimens to the species level is the basis for further taxonomic study. There are often many problems with the recognition of adults as well as immature stages because of the variation within and between species, particularly in species-rich genera, e.g. Thrips (Terebrantia) and Haplothrips Amyot & Serville 1843 (Tubulifera). These genera include mainly Holarctic species feeding and breeding on dicotyledonous plants, though occasionally they are graminicolous. The very large number of species in both genera may suggest that they are relatively young in evolutionary history. Morphologically, many species are very similar and are treated by some researchers as two forms of the same species (Mound & Minaei, 2007; zur Strassen, 2003).
There are even more problems with recognizing larval stages. The larvae are less mobile and therefore have a stronger relationship with their host plants. Morphological dissimilarities amongst larvae of different species are often larger than amongst their adults (Kucharczyk, 2010). Thus, a detailed study of both adults and immature stages using statistical tools, including the PCA method, makes it possible to solve several problems in the taxonomy of thrips and at least partially explain their phylogenetic relationships.

Identification of the second larval instar of the Haplothrips genus species
The Principal Component Analysis (PCA) method may be useful in selecting, from among a great number of morphometric characters, those that have some taxonomic value. This need arises in genera whose species are very uniform in morphological structure and are differentiated only by weak qualitative characters. The only potential differences are related to the measurements of some body parts. The genus Haplothrips (Thysanoptera, Phlaeothripidae) is a good example of such a case. The species belonging to this genus are very similar in body structure and are therefore difficult to identify.
Haplothrips is one of the most species-rich genera of the family Phlaeothripidae (about 230 species). It is distributed worldwide but mainly in the northern hemisphere; almost 70% of the species known so far have been recorded in the Holarctic. Most of them are phytophagous, feeding and breeding in flowers of dicotyledons, mostly of the family Asteraceae; only a few taxa are associated with monocotyledons (Mound & Minaei, 2007; Pitkin, 1976; Zawirska & Wałkowski, 2000). Because research on larval morphology is very scarce (especially for species belonging to Phlaeothripidae), a morphological analysis of the second instar larvae of Haplothrips species was undertaken. Each larva was measured for 72 potentially important features. Most of them concerned the lengths of selected dorsal (d) and ventral (v) setae on all parts of the body (h - head, pro - pronotum, mes - mesonotum, meta - metanotum), distances between setae, and measurements of the apical abdominal segments (8-11) and antennal segments (ant, III-VII). The resulting data matrix consisted of 11880 measurements (72 characters for each of 165 individuals belonging to 11 species) (Fig. 1).
The specimens were ordinated along the first two PCA axes (transformed data). The two principal components together accounted for 62.4% of the variance: Axis 1 - 38.5%, Axis 2 - 23.9%. Figure 2 shows two clearly isolated groups of specimens belonging to different species (H. aculeatus and H. subtilissimus). The most discriminative features, those with the highest loadings, are the lengths of abdominal setae (9-d2, 9-v2, 11-d1, 11-v2). On this PCA graph the most numerous group of specimens, belonging to the nine other species, forms a compact cloud. To find characters that discriminate among the remaining species, the measurements of the two separated species (H. subtilissimus and H. aculeatus) can be removed from the primary data matrix. The next PCA separates a further group of individuals, belonging to H. dianthinus (Axis 1 - 40.5%, Axis 2 - 18.1%; Fig. 3). The most discriminative features are the short setae 9-v1 and 11-d1. There are some additional selective features (e.g. 8-v2, h-s2), but they are less significant. After the H. dianthinus data are eliminated from the matrix, two more species emerge, H. arenarius and H. angusticornis (Fig. 4), and three main discriminating characters are identified (setae 8-v2, 9-d2, 9-v2). The value ranges of the selective features can now be established for the examined species with the help of both the data matrix and the PCA graph. This would speed up further measurements of more specimens and improve the precision of the determined feature ranges.
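The stepwise procedure described above (run a PCA, note which species separate and which characters carry the largest weights, remove the separated species, repeat) can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, three simulated "species" A, B and C, and seven invented characters named s1-s4 and t1-t3; neither the data nor the helper functions pca, top_characters and simulate come from the Haplothrips study. Ranking characters by their absolute weight on the first axis is one simple choice among several.

```python
import numpy as np


def pca(data, n_components=2):
    """Correlation-matrix PCA; returns scores, eigenvectors and variance shares."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1][:n_components]
    explained = eigenvalues[order] / eigenvalues.sum()
    return z @ eigenvectors[:, order], eigenvectors[:, order], explained


def top_characters(eigenvectors, names, k):
    """Names of the k characters with the largest absolute weight on PC1."""
    ranked = np.argsort(np.abs(eigenvectors[:, 0]))[::-1][:k]
    return sorted(names[i] for i in ranked)


def simulate(rng, s_mean, t_mean, n=30):
    """n specimens with four 's' and three 't' characters (all hypothetical)."""
    s = rng.normal(s_mean, 2.0, size=(n, 4))
    t = rng.normal(t_mean, 2.0, size=(n, 3))
    return np.hstack([s, t])


names = ["s1", "s2", "s3", "s4", "t1", "t2", "t3"]
rng = np.random.default_rng(42)
species = np.repeat(["A", "B", "C"], 30)
data = np.vstack([
    simulate(rng, s_mean=60.0, t_mean=20.0),  # A: very different 's' characters
    simulate(rng, s_mean=40.0, t_mean=20.0),  # B
    simulate(rng, s_mean=40.0, t_mean=26.0),  # C: differs from B only in 't'
])

# Step 1: all specimens; species A is expected to be isolated along PC1.
scores_all, vectors_all, explained_all = pca(data)
first_round = top_characters(vectors_all, names, k=4)

# Step 2: remove species A and repeat the analysis on the remaining cloud.
keep = species != "A"
scores_rest, vectors_rest, explained_rest = pca(data[keep])
second_round = top_characters(vectors_rest, names, k=3)

print("round 1:", first_round, np.round(explained_all, 3))
print("round 2:", second_round, np.round(explained_rest, 3))
```

In the simulated data the characters that isolate species A dominate the first round, and the characters that distinguish B from C only come to the fore once A has been removed, which mirrors the logic of the successive analyses.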

Identification of the second larval instar of the Thrips genus species
The study of the morphology of the second larval instar of Central European Thrips species is another example of the use of the PCA method in taxonomic research (Kucharczyk, 2010). In contrast to the previously discussed larvae of the genus Haplothrips, larvae of the 34 studied species of the genus Thrips may be recognized mainly on the basis of qualitative characters. The data matrix covered 26 multistate discontinuous characters, the most important of which relate to the sclerotisation and sculpture of the integument and the structure of the spiracles and antennae (Tab. 1). The analysis was conducted in two steps. In the first step, all studied specimens were treated as operational taxonomic units (OTUs) characterized by 26 variables. On the graph the OTUs form clouds which correspond to the studied species. In the second step, species rather than specimens were treated as OTUs. In these cases PCA was applied as a method for ordination and for reducing the number of variables: the characters with the highest loadings on PC1 and PC2 were extracted and regarded as the most important and useful in distinguishing the studied taxa (Fig. 8). Finally, these selected features were used in constructing the identification key to the second larval instar of the studied Thrips species.
The characteristics tested with the PCA method were also used as variables in Cluster Analysis (CA). The results of the two-dimensional PCA ordination were consistent with the results of the hierarchical clustering analysis (Fig. 9). The results of the numerical analysis shed some new light on the relationships amongst the studied species. Moreover, the resulting dendrogram showed the similarity within the studied Thrips species and made it possible to propose ancestral (plesiomorphic) and advanced (apomorphic) characteristics of the immature stage, which had not been studied before.
Fig. 9. Dendrogram (CA) of the similarity amongst analyzed Thrips species based on the morphological characters of the second larval instar (after Kucharczyk, 2010)
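The agreement between ordination and clustering can be illustrated with a small sketch in which species are the OTUs. It assumes NumPy, six invented species coded for five multistate characters (the matrix is hypothetical and unrelated to Tab. 1), Euclidean distance and average linkage; the clustering method actually used in the cited study is not specified here, so average linkage is an assumption. The helper functions average_linkage and first_pc_scores are written for this example only.

```python
import numpy as np


def average_linkage(data, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(data))]
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def first_pc_scores(data):
    """Scores of the OTUs on the first axis of a covariance-matrix PCA."""
    centered = data - data.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
    return centered @ eigenvectors[:, -1]  # last column = largest eigenvalue


# Hypothetical species-by-character matrix of multistate codes (0, 1, 2).
states = np.array([
    [0, 0, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 0, 1, 1, 2],
    [2, 2, 0, 1, 0],
    [2, 1, 0, 1, 0],
    [2, 2, 0, 0, 0],
], dtype=float)

clusters = average_linkage(states, n_clusters=2)

scores = first_pc_scores(states)
positive = sorted(np.flatnonzero(scores > 0).tolist())
negative = sorted(np.flatnonzero(scores <= 0).tolist())
pca_groups = sorted([positive, negative])

print("clusters:  ", clusters)
print("PCA groups:", pca_groups)
```

Here the two clusters coincide with the two groups lying on opposite sides of the first axis, which is the kind of consistency between the two methods reported above.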
During the study of the larvae of the genus Thrips, morphological differences between the second larval instars of T. atratus and T. montanus were observed (Kucharczyk, 2010). These species also showed different food preferences. The PCA method was applied to identify the most important measurable morphological features (8 for females and 12 for males) which may be useful in recognizing these species.
On the graphs prepared separately for females and males, the specimens of both species were ordered along the first two principal components; the lengths of the vectors were correlated with the features significant in recognizing the studied taxa (Figs. 10, 11; Tab. 2).

Identification of Thrips fuscipennis Haliday 1836 and T. sambuci Heeger 1854 species
A similar problem exists in distinguishing Thrips fuscipennis Haliday 1836 and T. sambuci Heeger 1854. The former is a polyphagous species most often feeding in flowers, while the latter is a monophagous insect feeding, breeding and developing on the lower side of Sambucus spp. leaves. Although both species are recognized as valid, their adults may be distinguished mainly by differences in the color of the antennal segments (Schliephake & Klimt, 1979; zur Strassen, 2003). Because color characters are very variable in specimens originating from different populations and stations, it is not possible to identify the species accurately using color features alone (zur Strassen, 1997; Mound & Minaei, 2010).
The aim of this task was to find new characters and test their usefulness in distinguishing these taxa. Fedor et al. (2008) proposed the use of an artificial neural network (ANN) method for identifying species. This method was successfully applied to 18 species of four genera: Aeolothrips, Chirothrips, Dendrothrips and Limothrips. The authors finally selected 19 morphometric features for use in the ANN analysis. In the current study seven of them were selected to distinguish T. fuscipennis and T. sambuci. Additionally, five new quantitative features present in both sexes, three features typical of males, and one qualitative feature were proposed for use in the comparative study (Tab. 3).
Feature | Females | Males
[...] | A-VI-l | A-VI-l
distance between anterior and posterior ocelli | D-oc | D-oc
distance between CS - metanotum | D-Cs-mt | D-Cs-mt
distance between D1 - metanotum | D-D1-mt | D-D1-mt
length of posteroangular seta interna | L-p-s-int | L-p-s-int
length of posteroangular seta externa | L-p-s-ext | L-p-s-ext
number of campaniform sensillae - mesonotum | N-cs-ms | N-cs-ms
distance between setae D1 and fore edge of metanotum | D-D1-e-mt | D-D1-e-mt
width of area porosae on sternum V | - | W-ap-sV
width of area porosae on sternum VI | - | W-ap-sVI
width of area porosae on sternum VII | - | W-ap-sVII
Table 3. Features of Thrips fuscipennis and Thrips sambuci adults used in quantitative analyses
As in the case of T. atratus and T. montanus, the characters of females and males were analyzed separately. On the graphs resulting from the PCA method the specimens of the two species are well separated and are located on opposite sides of Axis 2 (Figs 12, 13). The number of campaniform sensilla on the mesonotum, two (sporadically one) in T. fuscipennis and none in T. sambuci, is the characteristic which differentiates these species to the highest degree. Additionally, the measured setae are shorter in specimens of the former species.
Males of the latter species are characterized by narrower areae porosae on abdominal sternites V and VI and very often by their absence on sternite VII. Moreover, this analysis shows that characteristics such as eye length, distance between ocelli and length of antennal segments are of low significance in recognizing these species. The characteristics mentioned above were used in the ANN analysis; the results obtained with the PCA method give an indication of their usefulness in further similar studies, particularly on the identification of Thrips species.
Fig. 13. PCA scatter diagram of male specimens of Thrips fuscipennis and Thrips sambuci as OTUs along PC1 and PC2 based on 16 quantitative characters (abbreviations as in Table 3)

Conclusion
The examples above show that the PCA method is a valuable tool in identifying species. Its application makes it possible to select morphometric and qualitative characteristics which discriminate taxa and may be useful in the construction of identification keys. This method also makes it possible to verify the significance of particular characteristics in taxonomic study and to select the most relevant ones from a large set of data.
The results obtained using the numerical taxonomy methods could objectively reflect the taxonomic position of studied taxa.

Acknowledgements
Financial support for this project was partially provided by the Slovak Science Foundation VEGA 1/0137/11.