INTRODUCTION
Coffee (Coffea spp.) is one of the plantation crops that has been developed in Indonesia since the Dutch colonial period. The crop has become a valued commodity for strengthening foreign exchange earnings, as reflected in data on coffee production, exports, and plantation area in Indonesia (Andini et al., 2021). Indonesia ranks fourth in coffee production, behind Brazil, Vietnam, and Colombia. Indonesia exports less than 0.28 million tons of coffee beans, and the plantation area reaches 1.19 million hectares. Smallholders account for 96% of the plantations, and the remainder belongs to private and national corporations (Statistics Indonesia, 2019).
The kinds of coffee developed in Indonesia are arabica (Coffea arabica L.) and robusta (Coffea canephora Pierre ex A. Froehner). Demand for both is higher than for other kinds of coffee (Statistics Indonesia, 2019). Arabica coffee is considered to have the best taste (Dias and Benassi, 2015). The arabica plantation area is smaller than that of robusta, but arabica has a higher economic value (Dias and Benassi, 2015; Nugroho et al., 2016). In general, arabica coffee is planted in the highlands at around 800-2000 m a.s.l. (Nugroho et al., 2016; Konieczka et al., 2020). Taste and resistance to Hemileia vastatrix attack are the reasons why arabica is grown in the highlands (Alnopri and Hermawan, 2015). Robusta coffee is more resistant to H. vastatrix; it can be planted below 1000 m a.s.l., or at 200-900 m a.s.l., with an optimum at 600-700 m a.s.l. (Konieczka et al., 2020). The difference in agroecosystems between the two coffees leads to different morphological characters and, indirectly, to differences in taste. Coffee taste is mainly influenced by the interaction between genetics and the agroecosystem (Bicho et al., 2013; Ramadiana et al., 2018), and some coffees are certified as specialty coffee (Traore et al., 2018). Given these distinct characters, it is of interest to identify specifically what differentiates arabica and robusta grown in the same environment.
The Indonesian Industrial & Beverages Crops Research Institute (IIBCRI) is a research institution focused on developing industrial crops, including coffee. The institution is located at Pasir Kuda, Sukabumi District, West Java, at an altitude of less than 400 m a.s.l. Based on the altitude, this environment is not optimal for the coffee plant; however, the institution maintains robusta and arabica coffee collections in the same ecosystem. Both coffees grow well at IIBCRI, and the collections do not show leaf rust disease (Anshori, 2014). It is therefore of interest to study genetic distance patterns, or the differential response between the two coffees, in the IIBCRI plantation environment. One simple method for identifying differences in response between the two coffees is morphological phenotype analysis. This method was reported by Ramadiana et al. (2018) for 15 robusta coffee accessions in Lampung, Indonesia. However, accuracy and a systematic approach are needed to minimize data bias in the analysis, especially when many morphological characters are involved (Mattjik and Sumertajaya, 2011). One suitable approach is multivariate analysis.
Multivariate analysis is an approach for compressing and combining large sets of variables into a form that is simple and easy to understand. It comprises specific analyses categorized by function: reducing the variance dimension, clustering, or identifying dependence relationships among variables (Mattjik and Sumertajaya, 2011; Mengual-Macenlle et al., 2015). The standard analysis applied to genetic distance is cluster analysis. The cluster-type analyses that can be used include clustergram analysis, principal component analysis (PCA), multidimensional scaling, and principal coordinate analysis (PCoA) (Evgenidis et al., 2011; Tounekti et al., 2017; Akpertey et al., 2019). Therefore, applying several clustering analyses in this study would be a way to detect genetic distance among coffee clones at IIBCRI. This study aimed to identify the kinship pattern and specific morphological characteristics of the coffee clone collections by using multivariate analysis.
MATERIALS AND METHODS
The experiment was carried out at the Indonesian Industrial & Beverages Crops Research Institute (IIBCRI), Sukabumi, West Java, Indonesia, from March until September 2014. In general, the IIBCRI field has an area of approximately 159.6 ha with flat to undulating topography. The soil type is latosol with a pH ranging from 5 to 6. The location has a type B climate according to the Schmidt-Ferguson classification. The specific climate information for the study location is shown in Table 1. Plant materials consisted of five clones of robusta coffee (Coffea canephora Pierre ex A. Froehner), BP 308, BP 436, BP 42, SA 237, and SA 203, and three clones of arabica coffee (Coffea arabica L.), S 795, Kartika 1, and Kartika 2. All clones were derived from the Indonesian Coffee and Cocoa Research Institute, Jember, Indonesia. The experiment was conducted through observation, and all quantitative data were justified through covariance analysis according to Anshori (2014).
The observations followed the IPGRI (1996) descriptor list, modified for the coffee plant. Forty-six morphological descriptors were evaluated, covering vegetative, inflorescence and flower, fruit, and seed characters. The vegetative description covered 17 characters: overall appearance, plant habit, vegetative development, branch angle, leaf length, leaf width, petiole length, leaf shape, leaf apex shape, stipule length, internode, leaf petiole color, young leaf color, mature leaf color, stipule shape, firmness of leaf surface waves, and firmness of leaf edge waves. The inflorescence and flower description covered 15 characters: inflorescence position, inflorescence on old wood, anther insertion, flower tip shape, flower base shape, number of flowers per axil, number of fascicles per axil, number of flowers per fascicle, corolla tube length, number of stamens, corolla length, corolla width, anther length, stigma length, and pistil length. The fruit description covered 7 characters: fruit color, fruit shape, fruit-disc shape, endocarp texture, fruit length, fruit width, and pulp thickness. The seed description covered 7 characters: husk seed length, bean length, husk seed width, bean width, husk seed thickness, bean thickness, and husk seed shape. Quantitative characters were measured with common tools such as a measuring tape, ruler, vernier calipers, pen, and digital scales. In addition, the color chart of the Royal Horticultural Society (London, UK) was used to observe color characters.
Tmin: Minimum temperature; Tmax: maximum temperature; Tavg: average temperature; RH: relative humidity; RR: rainfall; ds: duration of the sunlight.
Data analysis
The data obtained from the observations were divided into qualitative data (22 characters) and quantitative data (24 characters). Quantitative characters were first screened with principal component analysis (PCA) and a PCA biplot. Groups of characters pointing in the same direction in the biplot were chosen by taking the intersection of the characters with dominant variance on PC1 and PC2. Both analyses were performed with STAR 2.0.1 software (International Rice Research Institute [IRRI], Los Baños, Laguna, Philippines). After the quantitative characters were selected, all qualitative characters were combined with the selected quantitative characters to analyze the kinship among the IIBCRI coffee clones. The kinship analysis was carried out through cluster analysis and principal coordinate analysis (PCoA) in RStudio (RStudio, Boston, Massachusetts, USA). Cluster analysis used the cluster package, and PCoA used the ape and vegan packages. The PCoA biplot was then divided by orthogonal mapping according to the win-win-where concept, using Paint software, to determine the specific characters of the clustered clones.
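The text does not state which dissimilarity measure was used to combine the qualitative and quantitative characters. A common choice for mixed data is Gower's coefficient, which the `daisy` function of the R cluster package can compute with `metric = "gower"`. The following Python sketch assumes that choice. The clone names come from the study, but the three characters and all of their values are invented for illustration.

```python
# Gower dissimilarity for mixed data (assumed measure; all values are made up).

def gower(a, b, kinds, ranges):
    """Average per-character dissimilarity between two records.

    kinds[i] is "num" for a numeric character or "cat" for a categorical one;
    ranges[i] is the observed range of numeric character i (unused for "cat").
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Numeric characters: absolute difference scaled by the range.
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Categorical characters: 0 if identical, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

# Hypothetical characters: one numeric and two categorical.
kinds = ["num", "cat", "cat"]
clones = {
    "BP 42":     (9.0, "green", "round"),
    "BP 308":    (10.0, "green", "oval"),
    "Kartika 1": (13.0, "bronze", "oval"),
}

# Range of each numeric character across all clones.
ranges = []
for i, kind in enumerate(kinds):
    if kind == "num":
        column = [record[i] for record in clones.values()]
        ranges.append(max(column) - min(column))
    else:
        ranges.append(None)

names = list(clones)
dist = {
    (p, q): gower(clones[p], clones[q], kinds, ranges)
    for p in names
    for q in names
}

for (p, q), value in dist.items():
    if p < q:
        print(f"{p} vs {q}: {value:.3f}")
```

Each character contributes a value between 0 and 1, so the resulting dissimilarity can be read on the same 0 to 1 (or 0 to 100%) scale used for the dendrogram.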
RESULTS
Identification of variance determinants based on PCA of quantitative characters
The PCA results showed that the first two PCs were able to represent the diversity in the quantitative data, as seen from the cumulative variance, which reached 84% (Table 2). On PC1, fruit length and pulp thickness had a positive vector direction, unlike most of the other quantitative characters. The important negative characters on PC1 were leaf length, leaf width, internodes, number of flowers per axil, number of flowers per fascicle, number of fascicles per axil, anther length, corolla length, pistil length, corolla tube length, corolla width, bean length, bean width, bean thickness, husk seed width, and husk seed thickness. On PC2, stipule length, number of flowers per fascicle, bean thickness, and husk seed width were the characters with the largest negative loadings. The important positive characters on PC2 were petiole length, stigma length, corolla width, number of stamens, fruit length, pulp thickness, bean length, and husk seed length.
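The cumulative variance reported for the first two PCs is the sum of the two largest eigenvalues divided by the sum of all eigenvalues. The study ran this step in STAR; the NumPy sketch below only illustrates the calculation. It assumes a PCA on the correlation matrix (standardized characters) and uses random placeholder data, not the values behind Table 2.

```python
import numpy as np

# Placeholder data: 8 clones x 5 quantitative characters (random, not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

# Standardize each character so that all contribute on the same scale.
# This makes the analysis a PCA of the correlation matrix (an assumption here).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder so that PC1 comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance carried by each PC, and the running total.
proportion = eigvals / eigvals.sum()
cumulative = np.cumsum(proportion)
print("Cumulative variance of PC1 + PC2:", round(float(cumulative[1]), 3))

# Loadings (eigenvector entries per character) and clone scores on each PC.
loadings = eigvecs
scores = Z @ eigvecs
```

The sign of a loading column is arbitrary in any PCA implementation, so only the relative directions of characters on a PC carry meaning.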
Kinship analysis based on cluster analysis of eight coffee clones at the IIBCRI
The qualitative characters were also filtered by their variance pattern among the clones. Based on the phenotype of the qualitative characters, almost all characters varied among the clones, except plant habit, leaf petiole color, mature leaf color, inflorescence position, anther insertion, and fruit-disc shape (Table 3). The cluster analysis showed two large groups separated at a dissimilarity distance of 70% (Figure 1). The two groups corresponded to the two types of coffee tested at the IIBCRI, which indicates that the cluster analysis could adequately separate the two types, with a distant kinship between them. The first group (BP 308, SA 237, SA 203, BP 436, and BP 42) was identified as robusta coffee clones, while the second group (S 795, Kartika 1, and Kartika 2) was identified as arabica coffee clones. Each group could be further divided into two sub-groups at a dissimilarity distance of 40% (0.4).
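Reading groups off a dendrogram at a given dissimilarity amounts to merging clusters until the closest remaining pair is farther apart than the cut height. The sketch below illustrates this with average linkage, which is an assumption because the text does not name the linkage method. The clone names are from the study, but the pairwise dissimilarities are invented and do not reproduce Figure 1.

```python
from itertools import combinations

def average_linkage_groups(names, d, cut):
    """Merge clusters by average linkage until the closest pair is farther than `cut`."""
    clusters = [[n] for n in names]

    def linkage(a, b):
        # Mean pairwise dissimilarity between the members of two clusters.
        return sum(d[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and the height at which they join.
        (i, j), height = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if height > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented dissimilarities on a 0-1 scale (illustration only).
pairs = {
    ("BP 308", "SA 237"): 0.15, ("BP 308", "BP 436"): 0.50,
    ("BP 308", "BP 42"): 0.55,  ("BP 308", "S 795"): 0.75,
    ("SA 237", "BP 436"): 0.50, ("SA 237", "BP 42"): 0.45,
    ("SA 237", "S 795"): 0.70,  ("BP 436", "BP 42"): 0.25,
    ("BP 436", "S 795"): 0.80,  ("BP 42", "S 795"): 0.75,
}
d = {frozenset(k): v for k, v in pairs.items()}
names = ["BP 308", "SA 237", "BP 436", "BP 42", "S 795"]

print(average_linkage_groups(names, d, cut=0.4))  # finer sub-groups
print(average_linkage_groups(names, d, cut=0.7))  # coarser groups
```

With these made-up values, the lower cut leaves the robusta clones in two sub-groups, and the higher cut joins them into one group that stays apart from S 795.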
Kinship analysis based on PCoA of eight coffee clones at the IIBCRI, Sukabumi
The PCoA biplot showed that all clones were mapped according to the groups found in the cluster analysis. In this analysis, however, the specific differences among coffee clones could be seen from the character vectors, especially in robusta coffee (Figure 2). BP 42 and BP 436, as sub-group 2 of robusta coffee, were in a different quadrant from the other robusta clones. In arabica coffee, on the other hand, all clones were in the same quadrant, although the S 795 clone was clearly distant from Kartika 1 and Kartika 2. When the mapping was based on the win-win-where concept, the characters supporting each clone sub-group could be identified explicitly. Based on this concept, the arabica sub-group consisting of Kartika 1 and Kartika 2 was characterized specifically by leaf apex shape (LAS), flower tip shape (FTS), fruit color (FC), endocarp texture (ET), fruit length (FL), pulp thickness (PT), bean length (BL), and husk seed shape (HSS), while the S 795 clone was characterized specifically by firmness of leaf edge waves (FLEW) and fruit shape (FS). In robusta coffee, the BP 308, SA 237, and SA 203 clones were characterized by firmness of leaf surface waves (FLSW), stipule shape (SS), and flower base shape (FBS). The BP 436 and BP 42 clones, on the other hand, were characterized specifically by young leaf color (YLC), number of flowers per fascicle (NFF), inflorescence on old wood (IOW), leaf shape (LS), vegetative development (VD), bean thickness (BT), overall appearance (OA), corolla width (CW), branch angle (BA), and husk seed width (HSW). The win-win-where concept is a mapping concept in which connecting lines are drawn between the farthest clones, and the plot is then divided by lines drawn from the center point orthogonal to those connecting lines.
LAS: Leaf apex shape; FTS: flower tip shape; FC: fruit color; ET: endocarp texture; FL: fruit length; PT: pulp thickness; BL: bean length; HSS: husk seed shape; FLEW: firmness of leaf edge waves; FS: fruit shape; FLSW: firmness of leaf surface waves; SS: stipule shape; FBS: flower base shape; YLC: young leaf color; NFF: number of flowers per fascicle; IOW: inflorescence on old wood; LS: leaf shape; VD: vegetative development; BT: bean thickness; OA: overall appearance; CW: corolla width; BA: branch angle; HSW: husk seed width.
DISCUSSION
In general, PCA reduces a large number of character dimensions to a simpler form while retaining most of the diversity information in the initial data (Mattjik and Sumertajaya, 2011; Jolliffe and Cadima, 2016; Wang et al., 2021). It is an efficient way to see which dominant characters proportionally affect the observed population of genotypes. The use of this analysis to determine essential characters in a population has been reported by Anshori et al. (2021) and Farid et al. (2021). Important characters are identified in PCA by looking at the dominant characters on the two most important PCs independently (Jolliffe and Cadima, 2016; Tounekti et al., 2017). In this study, the cumulative proportion of the first two PCs was greater than the minimum limit of 80% stated by Mattjik and Sumertajaya (2011). Thus, these PCs could be used to look for the dominant characters. The eigenvector entries with the largest PCA loadings, both positive and negative, are the basis for determining the dominant characters (Mattjik and Sumertajaya, 2011; Tounekti et al., 2017; Anshori et al., 2021; Farid et al., 2021); these results are then combined to determine the main characters.
The selected characters were determined from an eigenvector threshold for each PC. For PC1, -0.2 was used as the threshold because the eigenvector values were spread over the negative side, within the interval from 0 to -0.3, although one outlier had a value greater than 0 on PC1. The selection therefore focused on characters with values below -0.2. For PC2, values < -0.1 or > 0.1 were used as the threshold. The PC2 eigenvector distribution was relatively more varied than that of PC1, and values in the interval from -0.1 to 0.1 were considered to contribute little to the PC2 variance. According to Mattjik and Sumertajaya (2011), a value close to 0 indicates that the character plays a negligible role, so such characters were not considered in the subsequent analysis. Based on the intersection of the two PCs, the number of flowers per fascicle, corolla width, fruit length, pulp thickness, bean length, bean thickness, and husk seed width were found to be important quantitative characters with a large variety of directions. These seven characters can therefore be combined with the qualitative characters to identify kinship distances between coffee clones at the IIBCRI, Sukabumi.
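The threshold rule described above can be written as a simple filter. The sketch below is a literal reading of the two stated thresholds (PC1 loading below -0.2, PC2 loading outside the interval from -0.1 to 0.1). The loading values are hypothetical and are not those of Table 2, and the sketch does not cover how characters with a positive PC1 loading were handled, since the text does not spell that rule out.

```python
# Hypothetical loadings (PC1, PC2) for illustration only; not the values in Table 2.
loadings = {
    "leaf length":    (-0.25,  0.03),
    "corolla width":  (-0.23,  0.18),
    "bean thickness": (-0.24, -0.21),
    "stipule length": (-0.12, -0.30),
    "petiole length": (-0.15,  0.25),
}

PC1_MAX = -0.2      # keep characters with a PC1 loading below this value
PC2_MIN_ABS = 0.1   # keep characters whose PC2 loading lies outside [-0.1, 0.1]

def select_characters(loadings, pc1_max=PC1_MAX, pc2_min_abs=PC2_MIN_ABS):
    """Return the characters that pass both thresholds (intersection of the two PCs)."""
    return [
        name
        for name, (pc1, pc2) in loadings.items()
        if pc1 < pc1_max and abs(pc2) > pc2_min_abs
    ]

selected = select_characters(loadings)
print(selected)
```

With these made-up loadings, only corolla width and bean thickness pass both filters; leaf length fails on PC2, and stipule length and petiole length fail on PC1.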
Quantitative character selection was carried out to identify the essential characters that influenced the diversity of the clones observed. In general, the identification of genotype kinship is closely related to the nature of the observed character data. There are several data types on a statistical scale, namely binary, nominal, ordinal, and numeric (Ranganathan and Gogtay, 2019). Qualitative, or categorical, characters cover discrete binary, nominal, and ordinal data, although some numeric data are also discrete (Ali and Bhaskar, 2016). Discrete data form a graph with distinct gaps (Ali and Bhaskar, 2016; Ranganathan and Gogtay, 2019) and are therefore well suited to differentiating a trait between genotypes. Genetically, discrete characters are influenced by only a few major genes, so they are less affected by the environment. The fewer the classes of a character, the fewer the genes that regulate the trait (Acquaah, 2012), and the easier it is for qualitative characters to distinguish between genotypes.
In contrast, quantitative characters are continuous and controlled by many genes (Li et al., 2017), and the environment greatly influences them unless most of the genes underlying the trait respond in the same direction (Acquaah, 2012). Genetically, coffee plants are relatively highly heterozygous because they are mainly propagated by cloning plants with high hybrid vigor (Mohammed, 2011; Geneti, 2019), so their characteristics are more diverse between genotypes in a given environment. Therefore, the quantitative characters in this study needed to be filtered first to obtain the main characters with large diversity.
In kinship identification, a variable should vary among the objects being compared. This helps to reveal the specific characters that determine the mapping of the objects, especially in cluster analysis and PCoA. For this reason, plant habit, leaf petiole color, mature leaf color, inflorescence position, anther insertion, and fruit-disc shape were not included in the cluster analysis and PCoA.
In general, cluster analysis is one of the multivariate analyses used to visualize the relationships between objects based on the similarity or dissimilarity distances of various characters (Mattjik and Sumertajaya, 2011). This analysis is widely used to determine the genetic distance between the genotypes of a population, using quantitative characters (Tounekti et al., 2017; Malau and Pandiangan, 2018; Anshori et al., 2020), qualitative characters (Henry et al., 2015), metabolomic data (Hanifah et al., 2018), molecular data (Li et al., 2017), or a combination of character types (Tessema et al., 2011; Solankey et al., 2015; Kachare et al., 2020). Based on the cluster analysis results, the coffee clones in the IIBCRI germplasm collection have relatively high diversity, both in arabica and in robusta coffee. The clones with the closest kinship are BP 308 and SA 237 in the robusta type, so crossing these two clones is not recommended.
The PCoA was performed to refine and detail the grouping obtained from the cluster analysis. Cluster analysis can only identify kinship distances; it cannot explain how the kinship is formed from many variables (Evgenidis et al., 2011; Anshori et al., 2020). PCA and PCoA cover this deficiency and allow the groupings to be understood comprehensively. Several studies have also reported this approach for complementing the information contained in cluster analysis (Evgenidis et al., 2011; Tounekti et al., 2017). The PCoA biplot was preferred in this study over the PCA biplot. In general, the difference between PCA and PCoA lies in the matrix from which the eigenvalues and eigenvectors are derived. PCA is based on a correlation or covariance matrix, while PCoA is based on a kinship (distance) matrix (Mattjik and Sumertajaya, 2011). PCoA allows categorical data, especially data with many 0 values, or a mixture of categorical and parametric data, to be represented linearly as in PCA (Liu et al., 2019). In this study, the analysis draws on the same data as the dendrogram, which are mostly categorical. Therefore, kinship values are a better basis for forming the characteristic roots (eigenvalues) than correlation or covariance, which are more suited to quantitative data.
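The distinction drawn above, that PCoA starts from a distance matrix rather than a correlation or covariance matrix, can be illustrated with classical PCoA: the squared distances are double-centered and the result is eigendecomposed. The study used the ape and vegan packages in R; the NumPy sketch below is only an illustration of the technique, and the dissimilarity values are invented.

```python
import numpy as np

def pcoa(D, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered (Gower) matrix
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Coordinates are eigenvectors scaled by the square root of their eigenvalues.
    # Negative eigenvalues (possible for non-Euclidean dissimilarities) are clipped to 0.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return coords, eigvals

# Invented dissimilarities between four clones (illustration only).
names = ["BP 308", "SA 237", "S 795", "Kartika 1"]
D = np.array([
    [0.00, 0.15, 0.75, 0.80],
    [0.15, 0.00, 0.70, 0.75],
    [0.75, 0.70, 0.00, 0.30],
    [0.80, 0.75, 0.30, 0.00],
])

coords, eigvals = pcoa(D)
for name, (x, y) in zip(names, coords):
    print(f"{name:10s} {x:7.3f} {y:7.3f}")
```

Because the input is only a distance matrix, any dissimilarity suited to categorical or mixed data can be used, which is the property that makes PCoA attractive for data of this kind.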
Based on the cluster analysis and PCoA, the diversity of characters in robusta coffee is higher than in arabica coffee. Robusta coffee is a cross-pollinating plant (Ramadiana et al., 2018), while arabica is a self-pollinating plant (Déchamp et al., 2015). In addition, arabica coffee generally grows best above 800 m a.s.l. (Nugroho et al., 2016; Konieczka et al., 2020). Cross-pollination makes robusta coffee more diverse than arabica coffee. Furthermore, based on the results of the kinship analysis, the IIBCRI environment was considered quite good for mapping the traits of coffee clones, so it could be recommended as an environment for identifying the diversity of coffee accessions. The results of this study need further evaluation with taste-associated characters and a larger population to increase their precision.
CONCLUSIONS
At the Indonesian Industrial & Beverages Crops Research Institute (IIBCRI), Sukabumi, coffee clones show high diversity in clone grouping, both between types and between clones within types. The clustering results showed that sub-group 1 of robusta coffee consisted of the BP 308, SA 237, and SA 203 clones, with firmness of leaf surface waves, stipule shape, and flower base shape as specific characters. Sub-group 2 of robusta coffee consisted of the BP 436 and BP 42 clones, with young leaf color, number of flowers per fascicle, inflorescence on old wood, leaf shape, vegetative development, bean thickness, overall appearance, corolla width, branch angle, and husk seed width as specific characters. Sub-group 1 of arabica coffee consisted of the S 795 clone, with firmness of leaf edge waves and fruit shape as its specific characters. In contrast, sub-group 2 of arabica coffee consisted of the Kartika 1 and Kartika 2 clones, with leaf apex shape, flower tip shape, fruit color, endocarp texture, fruit length, pulp thickness, bean length, and husk seed shape as specific characters. Based on this research, it can also be concluded that the IIBCRI environment is suitable for the selection and identification of morphological lines of coffee, especially robusta coffee.