New insights on spatial genetic structure and diversity of Coffea canephora (Rubiaceae) in Upper Guinea based on old herbaria

1 CIRAD, UMR AGAP, FR-34398 Montpellier, France
2 AGAP, Université de Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
3 Postal address: CIRAD, UMR AGAP, Avenue Agropolis, TA A-108/03, FR-34398 Montpellier Cedex 5, France
4 DIADE, Université de Montpellier, IRD, FR-34394 Montpellier, France
5 UMR 7206 Éco-Anthropologie, CNRS, MNHN, Université Paris Diderot, FR-75116 Paris, France
*Corresponding author: jean-pierre.labouisse@cirad.fr
CIRAD: Centre de coopération Internationale en Recherche Agronomique pour le Développement; AGAP: Amélioration Génétique et Adaptation des Plantes tropicales et méditerranéennes; CNRS: Centre National de la Recherche Scientifique; INRAE: Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement; DIADE: DIversité Adaptation Développement des plantes; IRD: Institut de Recherche pour le Développement; MNHN: Muséum National d’Histoire Naturelle; UMR: Unité Mixte de Recherche
REGULAR PAPER


INTRODUCTION
Coffee is a popular beverage enjoyed by millions of consumers throughout the world and a major export commodity, with about nine million tons of green coffee produced yearly (ICO 2018: 7). Among the 124 species of the genus Coffea L. (Davis et al. 2006; Davis 2011), only two are widely cultivated: C. arabica L. (arabica coffee) and C. canephora Pierre ex A.Froehner (robusta coffee). A third species, Coffea liberica W.Bull. ex Hiern (liberica coffee), is much more rarely cultivated. While arabica remains the main traded species, robusta's share has gradually increased since the beginning of the twentieth century and now accounts for almost 40% of the world's coffee market (ICO 2018: 7). Robusta production has also been boosted by the demand for instant coffee in emerging countries and by the wide price gap between arabica and robusta (Wallengren 2017). While there are vast tropical lowland reserves suitable for the production of C. canephora, it is likely that, in a few decades, optimum cultivation requirements will become increasingly difficult to reach in many arabica coffee producing regions due to global warming (Davis et al. 2012). Arabica producers in South and Central America are already experiencing a devastating outbreak of coffee leaf rust (CLR), caused by Hemileia vastatrix Berk. & Broome (Avelino et al. 2015). Coffea canephora is more productive, less sensitive to high temperatures, and more resistant to CLR than C. arabica. Although robusta beverage is less prized than arabica, there is a high variability in quality traits within this species and the presence of consistent quantitative trait loci provides breeders with promising tools to improve the cup quality of robusta coffee (Leroy et al. 2011).
Coffea canephora is a perennial forest shrub with the largest natural distribution range for a coffee tree species, stretching from Guinea to Uganda with a discontinuity in the Togo-Benin region (Dahomey Gap) (Cubry et al. 2013b). The type specimen of the species C. canephora was collected in Gabon in 1895 by the Reverend Théophile Klaine and sent to the botanist Jean Baptiste Louis Pierre at the Muséum National d'Histoire Naturelle (MNHN) in Paris, France under the name "Café indigène des Ishira" [specimen Klaine 247 (P)]. Pierre named the species Coffea canephora; Pierre's name remained unpublished, however, and was validated by Froehner (1897).
Coffea canephora is a diploid, allogamous, and self-incompatible species (Devreux et al. 1959). Its geographical range and mating system, as well as the impact from past climates, have resulted in wide genetic variability within the species (Gomez et al. 2009). Another consequence of its mating system is that the majority of cultivated C. canephora is still made up of unselected populations obtained from open-pollinated seeds (Eskes & Leroy 2012: 82).
A major breakthrough in our understanding of the genetics of C. canephora was made in the 1980s by Berthaud (1984) who identified two main diversity groups by analysing isozyme polymorphism. Following the chorological system established by White (1979), Berthaud proposed to classify the wild populations growing in Upper Guinea as "Guinean", and those from Lower Guinea and Congolia as "Congolese". Further studies based on the use of DNA markers (RFLP and SSR) validated this classification in two major genetic groups and led to the recognition of five sub-groups (SG1, SG2, B, C, and UW) within the Congolese group (Dussert et al. 2003; Gomez et al. 2009; Musoli et al. 2009; Cubry et al. 2013b; Leroy et al. 2014).
In Upper Guinea, wild C. canephora was recorded for the first time in forests of Guinea and Côte d'Ivoire by Auguste Chevalier, in 1905 and 1909 respectively (Chevalier 1905, 1920). In both countries, the cultivation of robusta began in the second decade of the twentieth century, mainly on the initiative of a few European planters (Portères 1939, 1962; Fréchou 1955; Cordier 1961). Although these planters may have used seeds and seedlings taken from the forests close to their plantations, they mainly imported robusta seeds from Gabon and the former Belgian Congo (now the Democratic Republic of Congo) to compensate for a shortage of plants and, from 1921, for losses caused by attacks by the coffee berry borer Hypothenemus hampei (Ferrari) (Portères 1939). Massive quantities of robusta seeds belonging to the Congolese group were again introduced after the outbreak of tracheomycosis, a disease caused by Gibberella xylarioides R.Heim & Saccas, which affected West African plantations from 1948 onwards (Cordier 1961). Thus, C. canephora plants of the Congolese group coexisted with local C. canephora in the plantations. In Côte d'Ivoire, Berthaud (1984) analysed the genetic diversity of several clones previously selected for their outstanding agronomic value. He demonstrated that natural hybridization occurred between the two genepools and showed that a majority of these selected clones were Guinean-Congolese F1 hybrids. In the following years, a breeding programme based on reciprocal recurrent selection was launched in Côte d'Ivoire to create hybrids from the germplasm of the two genetic groups (Leroy et al. 1997). The genetic resources conserved in ex situ gene banks were the basis of that breeding programme. The largest gene bank of C. canephora is located at the Centre National de Recherche Agronomique (CNRA) research station near Divo, Côte d'Ivoire, where about 1900 clonal accessions are conserved in the field.
Many of these accessions have been genetically characterized over the past 30 years using isozymes, RFLP, or SSR markers, e.g. by Montagnon et al. (1992a, 1993), Dussert et al. (2003), Gomez et al. (2009), and Cubry et al. (2013b). Only 205 accessions (about 10%) of C. canephora germplasm in the CNRA gene bank belong to the Guinean group. In Guinea itself, a field gene bank of 101 accessions is currently maintained by the Institut de Recherche Agronomique de Guinée (IRAG) at the Sérédou research station near Macenta, Guinea Forest Region. It consists only of clones and progeny of planting material imported in the past from the former Belgian Congo. Other worldwide collections of robusta germplasm contain mainly accessions belonging to the Congolese group (Anthony et al. 2007). Only two French institutes, the Institut de Recherche pour le Développement (IRD) and the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), conserve a few accessions of the Guinean group in their biological resource centres in Réunion and in French Guiana respectively.
Consequently, although C. canephora native to Upper Guinea may be an important source of diversity for the genetic improvement of the species, its germplasm is underrepresented in ex situ gene banks. While some populations from Côte d'Ivoire have been recently collected and characterized, very little is known about germplasm from Guinea. Since the mid-20th century, the natural forests of Côte d'Ivoire and Guinea have been highly threatened by deforestation (Brou et al. 2000), reinforcing the need to assess the existence of potentially valuable and largely untapped genetic resources for current and future breeding schemes.
To overcome the limitations of living collections and expand the geographical and temporal range of data on C. canephora belonging to the Guinean group, we turned to old herbarium specimens. Recent advances in genomic technology have revealed the potential of herbarium specimens for population genetics and phylogeographic studies (Lister et al. 2010). Genotyping studies on herbarium samples using SSR markers have been successfully conducted on various species, e.g. on 100-year-old emmer wheat (Lister et al. 2008) and on sweet potato samples dating back to the eighteenth century (Roullier et al. 2013). To carry out our study, we used the resources of two herbaria located in the MNHN in Paris. The Paris vascular plant herbarium, registered in Index Herbariorum as P (Thiers 2019), conserves 91 herbarium sheets of C. canephora collected in Côte d'Ivoire and Guinea corresponding to ca. 50 distinct plant specimens, including the oldest specimens collected in 1905 by Chevalier (1905). Roland Portères was another prolific collector of African cultivated plants (Leroy 1974). The second herbarium (acronym PAT for "Paris, Agronomie Tropicale" according to Index Herbariorum), currently managed by the Laboratoire d'Éco-Anthropologie (http://ecoanthropologie.cnrs.fr), conserves a thousand specimens of Coffea spp. collected by Portères, a quarter of which are wild and cultivated C. canephora collected in Guinea and Côte d'Ivoire between 1929 and 1961. Our study is restricted to these two countries, which represent only a part of the Upper Guinea subcentre, but cover most of its west-east extension. Large herbarium and germplasm collection programmes were conducted in these former French colonies, where coffee cultivation was particularly developed with active administration support to research and extension services.
The aim of our study is to provide an updated overview of the geographical distribution and genetic diversity of C. canephora originating in Guinea and Côte d'Ivoire. First, we provided a detailed overview of the history of collection missions in both countries and of how the coffee herbaria were assembled. Then, we investigated the pattern of genetic diversity of 126 herbarium specimens, supplemented by 36 genotypes used as controls, with a set of 23 polymorphic nuclear markers (SSRs) and a combination of principal coordinates analysis (PCoA), neighbor-joining (NJ) analysis, and model-based Bayesian analysis. We discussed the results in the light of historical documentation. Lastly, we gave an overview of deforestation trends and of other factors that could hamper further collections of native C. canephora in both countries.

Guinea
About three thousand kilometres from the type locality in Gabon, the first botanical record of wild C. canephora of West Africa was made by Chevalier in 1905 in Guinea, on the southern edge of the Fouta Djallon plateau (fig. 1) near the village of Bilima, 20 km west of Mamou (Chevalier 1905). Chevalier named it Coffea maclaudii (as Maclaudi) after Charles Maclaud, a colonial military doctor, who had reported the occurrence of coffee plants in the upper valley of the Konkouré River a few years before (Maclaud 1899). With Octave Caille, a head gardener of the Jardin des Plantes in Paris, Chevalier (1914) collected several samples from these coffee trees and sent them to P [Apr. 1905: Chevalier 12181, 12330 (type specimen), 12331 (specimen lost), 12332, 12333; May 1905: 12332bis; 8 Sep. 1905: 14893]. Caille made an additional collection in June 1913. From his monograph (Les Caféiers du Globe) onwards, Chevalier named it C. canephora var. maclaudii (Chevalier 1929: 83).
Another important contributor to the P herbarium was Pierre Barthe, an agronomist who was head of the agricultural extension service of the cercle of Macenta (a cercle was the smallest unit of administration in French Colonial Africa), Guinea Forest Region from 1934 to 1936 (Anonymous 1934). In 1936, he collected coffee in forests and plantations in the region, established a collection plot at the Macenta farm school and sent samples to P.
In 1939, the Sérédou experimental station for coffee and Cinchona officinalis L. was established near the village of Sérédou in the Ziama massif, 35 km southeast of Macenta, with Portères as director (Tourte 2005a: 185). Portères planted several collection plots mainly with clones or seeds transferred from Côte d'Ivoire, but originally imported from the Lula and Yangambi research stations of the Institut National pour l'Étude Agronomique du Congo belge (INEAC). He also collected material from local plantations and from Barthe's collection (Sérédou centre annual report 1958, unpublished report). Portères sent some samples from his collections to PAT in 1958 and 1959. In 1962, Portères published a review of Coffea species cultivated in Guinea, which remains the most detailed and comprehensive work on the topic to date (Portères 1962). He classified the C. canephora specimens that "grew spontaneously" in Guinea in three groups:
- (i) var. maclaudii A.Chev.: from Bilima near Mamou.
- (ii) cultivar Gamé (sensu stricto): from the village of Bambaradou, a few kilometres from Macenta, and named after Gamé Guilavogui, the chief of the canton (a territorial subdivision gathering several villages) of Kolibirima Toma in the early 1930s (JOGF 1935) who promoted Gamé cultivation with the support of the colonial administration. Portères reported that 50% of coffee area in the cercle of Macenta was planted with Gamé cultivar in the late 1950s. He considered this cultivar more productive than any other material introduced in Guinea and its cup quality as "excellent". In addition to Gamé s.s., he defined a "Gamé geographical-racial complex" (or Géo-Gamé) as a group of coffee populations that he thought native and growing over a vast area that extends beyond the border of the cercle of Macenta, from Kissidougou (to the north-west) to Nzérékoré and Beyla (to the east).
- (iii) cultivar Gouecké: originally from the vicinity of Gouecké in the north of the cercle of Nzérékoré.
When Guinea became independent in 1958, the Sérédou coffee collection contained coffee trees belonging to the Congolese group and only eight clones of Gamé (Sérédou centre annual report 1958, unpublished report). In 1990, IRAG research staff started to set up a new collection of clonal material with remaining plants from Portères' collection plots and a few clones collected from plantations. At the time of writing (June 2018), the IRAG gene bank comprised 101 accessions of C. canephora, most of which are clones or progenies of the Congolese group. No cv. Gamé or var. maclaudii genotypes remain.

Côte d'Ivoire
Chevalier made the first record of native C. canephora in Côte d'Ivoire in 1909, in the forests near Assikasso in the Indénié Region, 45 km northeast of Abengourou, in the far east of the country [specimens: 18 Dec. 1909: Chevalier 22589, 22590, and 22602 (P)]. Based on morphological similarities with samples collected in Bilima (Guinea), more than 1000 km to the west, he identified it as conspecific with his Coffea maclaudii A.Chev. from Guinea (Chevalier 1920: 336). In 1912, Arsène Dellabonin, an employee of the Assikasso agricultural station, collected similar populations in the same area, near the village of Kongodia, and sent samples to P (Portères 1937b). This coffee was widely cultivated from 1920 in eastern Côte d'Ivoire under the name of Café Petit Indénié, in comparison with the species C. liberica found by Chevalier in the same area and called Café Gros Indénié because of the large size of the bean (Portères 1939).
In 1912, while hunting near Koro, northeast of Touba (Western Côte d'Ivoire), the planter Jules Landré found wild C. canephora in gallery forests along the Irama-ba River and two tributaries of the Sassandra River: the Férédougouba and the Boa-ba. This coffee, and other populations gathered in the same area by Le Campion in 1913, were called Café Landré or Café Touba (Portères 1937b). In 1915, it was introduced into the collection of the agricultural station of Bingerville near Abidjan (Portères 1934) and later duplicated and sent to the station of the Institut d'Enseignement et de Recherches Tropicales (IDERT) near Adiopodoumé, from where Portères sampled it for PAT.
In 1929, Portères founded the agricultural station of Man, in western Côte d'Ivoire, and set up collection plots and comparative trials of various species of Coffea, including native and imported C. canephora (Poupart 1938; Tourte 2005a: 193). From 1929 to 1931, Portères collected samples around Man, Touba, and Béoumi. In 1937, he published a first review of Coffea spp. in Côte d'Ivoire with their geographical distribution and ecology, the maps of the collection sites in the above-mentioned areas, and the main morphological characteristics of the coffee plants (Portères 1937a, 1937b, 1937c). Later, in 1958 and again in 1962, Portères and Louis Cordier, a geneticist at the Centre de Recherches Agronomiques (CRA) in Bingerville, assembled the largest coffee herbarium collection ever made in Côte d'Ivoire. These collections included coffee plants of various origins: local or imported, from forests, smallholders' plantations, estates, and agricultural research stations. Duplicates were sent to PAT. There is no published inventory of this herbarium but handwritten documents by Portères give detailed information on the location, origin and morphological traits of the specimens.
Beginning in 1957, the coffee germplasm preserved in the first research stations (Bingerville, Akandjé, Abengourou) and agricultural stations (Man, Gagnoa) was gradually transferred to a new site, near the town of Divo ( fig. 1), which became the main research station of the Institut Français du Café et du Cacao (IFCC 1960;Tourte 2005b: 221). This station and its coffee gene bank are now managed by the CNRA.
When Côte d'Ivoire became independent, Portères (1959) and Cordier (1961) traced the history of coffee cultivation and made an inventory of the native and imported coffee material in the country. On the basis of morphological traits, they classified the populations of C. canephora into two groups: "Kouilou" and "Robusta". On average, "Kouilou" seeds, flowers, leaves, and the tree itself are smaller than those of "Robusta" (Portères 1959). Portères and Cordier classified most of the populations of C. canephora native to Côte d'Ivoire (Petit Indénié, Touba, Bandama, Tos, Kouibly, Agbo, Dianlé, etc.) as belonging to the "Kouilou" group. Only the Ébobo population, which Portères thought originated near Aboisso in the extreme southeast of Côte d'Ivoire, was classified as "Robusta". In fact, there was already a Café du Kouilou (or Café Kouilou) that was named after the Kouilou River in the Republic of the Congo, along which it was discovered in the 1880s (Chevalier 1929: 85-86). Based on similar morphological characteristics, Chevalier used the term Kouilou in 1909 to describe the C. maclaudii that he had found in Guinea in 1905 (Chevalier 1909: 260). Kouilou is also spelled Kouillou, Quillou, or Kwilu. By orthographic distortion, the term became Conilon in Brazil and is still applied to the first C. canephora populations introduced at the beginning of the twentieth century in that country (Ferrão 2007). The use of the terms Kouilou and Robusta has caused some confusion in the classification of C. canephora, an allogamous species with great variability in morphological traits, even within a single population. Berthaud's genetic works clarified the nomenclature. He found that: (i) all populations of "Robusta" (including the Ébobo population), as well as the true Café du Kouilou originating in the Republic of the Congo, belong to the Congolese group and (ii) all the "Kouilou" populations in Côte d'Ivoire belong to the Guinean group (Berthaud 1984: 111).
Following Berthaud's findings, major germplasm collecting efforts resumed from 1975, mainly in Côte d'Ivoire (Berthaud 1984: 12; Le Pierrès et al. 1989; Couturon & Montagnon 1991). Twenty-one populations of C. canephora were collected from forest sites in Côte d'Ivoire, and one population from the forest of Piné in Guinea. These represent the largest part of the Guinean coffee germplasm currently preserved in the CNRA gene bank (Dussert et al. 2003: 242). Among these populations, Cubry (2008) and Cubry et al. (2013b) genotyped seven populations using microsatellite markers with fresh samples provided by the CNRA gene bank. We used DNA of a few genotypes taken from these populations as controls for the diversity study.

Material
We genotyped 162 samples of which 126 were herbarium specimens and 36 were obtained from previous works and used as controls (table 1). More details on the material are available in supplementary file 1 (worksheets A1 to A4).
We focused on herbarium specimens reported to be "spontaneous", or "wild", or "local" by the collectors. The collection dates range from 1905 to 1993. The collection points (red dots in fig. 1) are rather regularly distributed along a north-east -south-west axis from Bilima near Mamou, Guinea to Assikasso near Abengourou, Côte d'Ivoire, two locations 1000 km apart, with a gap in the Faranah Region, Guinea. In total, 40 herbarium specimens were collected from 15 different sites in Guinea and 86 herbarium specimens from 34 sites in Côte d'Ivoire.
-Luki: genotypes collected in forests and plantations in the Mayumbe region, Democratic Republic of the Congo. They belong to the SG1 sub-group.
-Niaouli: population mainly cultivated in Togo and Benin, but originating from the Atlantic Coast of Gabon (Adibolo & Bertrand 1988). It was introduced into Côte d'Ivoire in 1914 (Portères 1934), and later to Guinea (Portères 1962). Niaouli coffee belongs to the SG1 sub-group.

Methods
DNA extraction -All the samples were processed at the Grand Plateau Technique Régional (GPTR) facility in Montpellier, France (http://www.gptr-lr-genotypage.com). Genomic DNA was extracted from approximately 20 mg of dried leaf tissue according to the MATAB (Mixed Alkyl Trimethylammonium Bromide) protocol described by Risterucci et al. (2000). The DNA concentration was estimated with a Fluoroskan Ascent Microplate Fluorimeter (ThermoFisher Scientific, Waltham, Massachusetts, USA) with a bisbenzimide DNA intercalator (Hoechst 33258) and by comparison with known standards of DNA.
Microsatellite markers - We used SSR markers defined from C. canephora and C. arabica genomes and already described (Poncet et al. 2004, 2007; Leroy et al. 2005; Cubry et al. 2008, 2013a). Because DNA is highly fragmented in herbarium specimens, we selected short size loci (less than 200 bp) as recommended by Särkinen et al. (2012) in order to increase the PCR success rates. Thirty-three markers were initially tested, but 10 were removed from the final analysis due to a high level of missing data or low genotyping quality (table 2).
Pl. Ecol. Evol. 153 (1), 2020
55°C for 60 s (0.5°C decrease at each cycle), 72°C for 1 min, followed by 25 cycles at 94°C for 45 s, 50°C for 1 min, 72°C for 1 min and a final extension at 72°C for 30 min.
Fluorescently labelled PCR products were then organized in five pools for electrophoresis, using respectively 2 µL of products labelled with 6-FAM, 2 µL of those with VIC, 2.5 µL of those with NED and 3.5 µL of those with PET, and completed to 20 µL with high purity water. Table 2 shows which fluorochrome was used to label each marker and how PCR pools were composed. 2 µL of this solution was taken and added to 10 µL of Hi-Di formamide and 0.12 µL of GeneScan 600 LIZ size standard (Applied Biosystems). Migration of PCR products was performed on an ABI 3500xL Genetic Analyzer (Life Technologies, Carlsbad, California, USA). Alleles were scored using GeneMapper v.4.1 software (Applied Biosystems).
For some low-quality samples resulting in poor PCR amplification, we used a modified version of this protocol. After extraction, genomic DNAs were purified with Agencourt AMPure XP (Beckman Coulter, Brea, California, USA) magnetic beads with 1 volume of DNA for 1.8 volume of beads, then standardized at 0.5 or 1 ng/µL with high purity water. Then, we performed PCR in 10 µL reaction volume using Qiagen Type-it Microsatellite PCR Kits with 5 µL of PCR MasterMix (HotStarTaq Plus DNA polymerase, PCR buffer, dNTP Mix), 2 µL of purified DNA (0.5-1 ng), 0.2 µL of primer mix (0.1 µL of 10 µM forward primer with an M13 tail at the 5'-end, 0.1 µL of 10 µM reverse primer), 0.2 µL of fluorescently labelled M13-tail (6-FAM, NED, VIC or PET from Applied Biosystems), and 2.7 µL of high purity water. The PCR amplification was conducted with an initial denaturation at 95°C for 5 min, followed by 10 cycles of 95°C for 30 s, 55°C for 1 min 30 s (0.5°C decrease at each cycle), 72°C for 30 s, followed by 25 cycles at 95°C for 30 s, 50°C for 1 min 30 s, 72°C for 30 s and a final extension at 60°C for 30 min. Dilution, migration of PCR products, and allele scoring were performed as previously described.
Structure analysis - To identify the pairwise genetic relationships between the individual genotypes, we computed a genetic dissimilarity matrix using simple matching index with DARwin v.6 software (Perrier & Jacquemoud-Collet 2006). An overall representation of the structure of genetic diversity was obtained by a factorial analysis (Principal Coordinates Analysis, PCoA) using distance matrices, while genetic relationships between individuals were assessed using the neighbor-joining (NJ) method (Saitou & Nei 1987), as implemented in DARwin v.6.
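As an illustration of the dissimilarity step, the following is a minimal sketch of a simple matching dissimilarity for diploid SSR genotypes. It assumes the usual allelic form of the index, d = 1 - (1/L) Σ (m_l / ploidy), where m_l is the number of alleles shared at locus l and L is the number of loci scored in both samples. The function names, the use of `None` for missing data, and the skipping of unscored loci are assumptions made for this example and do not reproduce DARwin's implementation.

```python
from collections import Counter

def shared_alleles(g1, g2):
    """Count alleles shared by two genotypes (multiset intersection)."""
    return sum((Counter(g1) & Counter(g2)).values())

def simple_matching_dissimilarity(sample_a, sample_b, ploidy=2):
    """Simple matching dissimilarity between two multilocus genotypes.

    Each sample is a list with one entry per locus: a tuple of allele
    sizes, or None when the locus failed (missing data, skipped here).
    """
    total = 0.0
    scored = 0
    for g1, g2 in zip(sample_a, sample_b):
        if g1 is None or g2 is None:
            continue  # assumption: ignore loci missing in either sample
        total += shared_alleles(g1, g2) / ploidy
        scored += 1
    if scored == 0:
        raise ValueError("no locus scored in both samples")
    return 1.0 - total / scored

# Hypothetical genotypes at three loci (allele sizes in bp)
herbarium = [(120, 124), (200, 200), None]
control = [(120, 120), (200, 204), (150, 152)]
d = simple_matching_dissimilarity(herbarium, control)
```

Applying the function to every pair of samples yields the square matrix that a PCoA or NJ analysis takes as input.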
In order to test for sample clustering and to estimate admixture proportions for each individual, we used the model-based Bayesian approach implemented in STRUCTURE v.2.3.4 (Pritchard et al. 2000). The parameters were set to a burn-in period of 100,000 with 200,000 iterations. We performed 20 independent runs for each K, K varying from 1 to 10. We used the online servers CLUMPAK (Kopelman et al. 2015) and STRUCTURE HARVESTER (Earl & vonHoldt 2012) to analyse the STRUCTURE outputs and the method of Evanno et al. (2005) to identify the number of genetic clusters corresponding to the uppermost hierarchical level of genetic partitioning between populations.
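The choice of K by the Evanno method can be sketched as follows. This is a simplified illustration, not the code of STRUCTURE HARVESTER: it assumes ΔK is computed as the absolute second-order difference of the mean log-likelihood across runs, divided by the standard deviation of the log-likelihood at K. The function name and the input values are hypothetical.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [ln P(D) of each run]}.

    Delta K is undefined for the smallest and largest K tested,
    because the second-order difference needs both neighbours.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    sds = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 not in means or k + 1 not in means or sds[k] == 0:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sds[k]
    return delta

# Hypothetical log-likelihoods, two runs per K (the study used 20)
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-785.0, -787.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
```

With these made-up values the log-likelihood gain flattens after K = 2, so ΔK peaks there.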
Firstly, we analysed the data of all the samples of both countries, assigned each sample to Congolese or Guinean groups and estimated admixture rates. Then, we removed all the Congolese samples as well as admixed ones (we chose a threshold of 95% for membership coefficient) and performed the same analyses in order to reveal a possible structure within the Guinean group.
Diversity parameters - We calculated several descriptors of genetic diversity: allele number per marker, observed heterozygosity (Ho), and expected heterozygosity (He), and performed a test of Hardy-Weinberg (HW) equilibrium, using the R package STRATAG (Archer et al. 2017). These statistics were computed for different sets of genotypes defined according to the structure analysis. We used the R package HIERFSTAT (Goudet 2005) to calculate allelic richness, corrected for the different sample size of populations, using a rarefaction method as recommended by El Mousadik & Petit (1996). We also computed the number of private alleles for the different sets of genotypes. Lastly, three different measures of population differentiation (Weir's Fst, Gst, and Jost's D), as well as their significance, were calculated between each genetic group or sub-groups using the R package STRATAG: Congolese group vs. Guinean group and between all Guinean sub-groups as defined by the structure analysis.
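For readers unfamiliar with these descriptors, the sketch below computes Ho, He, and rarefied allelic richness for a single locus. It is a hedged Python illustration rather than the R code used in the study: He is taken as 1 - Σ p_i² without small-sample correction, and rarefied richness follows the usual rarefaction formula Σ_i [1 - C(N - N_i, n) / C(N, n)], with N gene copies observed, N_i copies of allele i, and n the common rarefaction size. All function names and data are hypothetical.

```python
from collections import Counter
from math import comb

def allele_counts(genotypes):
    """Count allele copies at one locus, ignoring missing genotypes."""
    return Counter(a for g in genotypes if g is not None for a in g)

def observed_heterozygosity(genotypes):
    """Proportion of scored diploid individuals that are heterozygous."""
    scored = [g for g in genotypes if g is not None]
    return sum(g[0] != g[1] for g in scored) / len(scored)

def expected_heterozygosity(genotypes):
    """Expected heterozygosity, 1 - sum(p_i^2), uncorrected."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def rarefied_allelic_richness(genotypes, n):
    """Expected number of alleles in a random subsample of n gene copies."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    if n > total:
        raise ValueError("rarefaction size exceeds observed gene copies")
    return sum(1.0 - comb(total - c, n) / comb(total, n)
               for c in counts.values())

# Hypothetical genotypes at one locus for four individuals
locus = [(1, 1), (1, 2), (2, 2), (1, 2)]
ho = observed_heterozygosity(locus)
he = expected_heterozygosity(locus)
ar = rarefied_allelic_richness(locus, n=2)
```

Rarefying to the same n in every group is what makes richness values comparable between groups of very different sizes.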

Structure analysis
Whole sample - The genotyping data matrix of the 162 samples may be found in supplementary file 1 (worksheet B). The results of the factorial analyses of the SSR-based dissimilarity matrix are represented in fig. 2. The first axis enabled a clear separation of the Guinean and Congolese groups with 19.68% of the global inertia. The second axis (5.88% inertia) and the third axis (4.75% inertia) separated the SG1 and SG2 Congolese sub-groups. Most of the herbarium specimens were distributed in the Guinean group. However, some herbarium specimens were grouped with the Congolese controls, and others occupied an intermediate position between Guinean and Congolese groups, which allowed us to classify them as admixed genotypes. The NJ tree (supplementary file 2) confirmed that classification. Similarly, Bayesian analysis using STRUCTURE clearly differentiated two clusters corresponding to Guinean and Congolese groups (fig. 3 and supplementary file 3). Examining these results in detail and by country of collection (supplementary files 2 & 3), we found that: (i) in Guinea, eight herbarium specimens (numbers 11 to 18) collected in the villages around Kissidougou grouped together with the Congolese genotypes (SG1 or SG2 sub-groups) used as controls, while all other samples were assigned to the Guinean group (membership coefficient Q > 95%); (ii) in Côte d'Ivoire, three herbarium specimens from Ébobo (numbers 151, 152, and 153) collected by Portères near Aboisso grouped together with the Niaouli and Luki populations (SG1 Congolese sub-group); (iii) one herbarium specimen (number 142), collected in 1930 near Abengourou and stated to be "spontaneous" by Chevalier, clearly belongs to the SG1 Congolese sub-group (Q > 99.5%); (iv) unlike the herbarium specimens collected in Guinea, some of those collected in Côte d'Ivoire are Guinean-Congolese admixed genotypes.
The results of the STRUCTURE program (K = 2) showed that 12 herbarium specimens have a membership coefficient that varies from Q = 47.9% (probable F1 hybrid between Congolese and Guinean genotypes) to Q = 95%. For example, Portères collected six plants "in a plantation established from local seeds and seedlings collected in the forest of Tos" near Daloa (Centre West of Côte d'Ivoire). Among them, we found three pure Congolese (Q > 99% for numbers 101, 102, and 103), one putative F1 hybrid (Q = 52% for number 104), and two admixed (Q = 10.8% and 35.7% for numbers 99 and 100 respectively).
Within Guinean group - Once we removed the Congolese, putative Guinean-Congolese F1 hybrids, and other admixed genotypes (Q < 95%), there were 126 samples of probable Guinean origin left (99 herbarium specimens and 27 controls) that we reanalysed with the same tools. The results of factorial analysis of the matrix of the Guinean group samples did not reveal a strong structure (fig. 4). However, the first axis (9.98% of the global inertia) enabled the separation of all the specimens of C. canephora var. maclaudii near Bilima and Gamé cultivar near Macenta from more eastern origins. The second axis (5.37%) separated maclaudii specimens from Gamé specimens. In accordance with the factorial analysis, the results of the Bayesian assignment for K = 2 (best K according to Evanno's test) reflected the separation of the maclaudii and the Gamé specimens from more eastern origins (fig. 5 and supplementary file 4). From K = 5, the maclaudii and the Gamé specimens were assigned to two distinct ancestry sub-groups with little admixture. The other specimens collected from South East Guinea to East Côte d'Ivoire were distributed into three other clusters with a significant level of admixture and no strong geographical support.

Genetic diversity of the Guinean and Congolese groups
Once we removed the admixed individuals based on the STRUCTURE assignment threshold defined above (i.e. removal of individuals having Q < 95%), we calculated the diversity parameters of the 126 individuals of the Guinean group and the 24 individuals of the Congolese group. We found the Congolese group more diverse than the Guinean one with higher expected and observed heterozygosities (mean He = 0.67 and mean Ho = 0.51 for Congolese while mean He = 0.48 and mean Ho = 0.34 for Guinean, supplementary file 5, worksheet 2) and higher rarefied allelic richness per marker (mean of 6.87 for Congolese and mean of 4.21 for Guinean, supplementary file 5, worksheet 10). Several markers were found not to be at HW equilibrium within both groups (16 and 21 out of 23 markers for Congolese and Guinean groups respectively). We also found a high number of private alleles differentiating the two groups, with 79 alleles found within the Congolese group and 54 within the Guinean group (supplementary file 5, worksheet 9). The three complementary differentiation statistics computed were consistently high and highly significant between the two groups (supplementary file 5, worksheet 11).
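The notion of private alleles used here can be made concrete with a short sketch: an allele is private to a group when it is observed at a given locus in that group and in no other. The data layout (one dictionary per sample, keyed by marker name) and the function names are assumptions for this example only. The marker name and allele sizes are invented.

```python
def alleles_in_group(samples):
    """Set of (locus, allele) pairs observed in a list of samples."""
    seen = set()
    for sample in samples:
        for locus, genotype in sample.items():
            if genotype is not None:  # skip missing genotypes
                seen.update((locus, allele) for allele in genotype)
    return seen

def private_alleles(groups):
    """Return {group: set of (locus, allele) found in that group only}."""
    observed = {name: alleles_in_group(s) for name, s in groups.items()}
    private = {}
    for name, alleles in observed.items():
        others = set().union(
            *(a for other, a in observed.items() if other != name)
        )
        private[name] = alleles - others
    return private

# Hypothetical data: one marker, three samples in two groups
groups = {
    "Guinean": [{"M1": (100, 102)}, {"M1": (102, 102)}],
    "Congolese": [{"M1": (102, 106)}],
}
result = private_alleles(groups)
```

Counts obtained this way depend on sample size, which is one reason to read them alongside rarefied allelic richness.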

Genetic diversity and measure of population differentiation between STRUCTURE-defined Guinean genetic sub-groups
We assigned genotypes to sub-groups based on the within-Guinean-group STRUCTURE analysis at K = 5, using a threshold of 75% of ancestry. This left us with 91 samples (out of 126) clustered into five different Guinean sub-groups (sgG1 to sgG5) of 10, 9, 18, 29, and 25 genotypes respectively, the remaining 35 samples having admixed ancestry (supplementary file 5, worksheet 3). Mean number of alleles per marker ranged from 1.61 (sgG1) up to 4.04 (sgG3). Observed heterozygosity and expected heterozygosity remained rather high, with values ranging from 0.16 to 0.41 (Ho) and from 0.16 to 0.5 (He) depending on sub-groups. The number of markers found not to be at HW equilibrium within sub-groups ranged from 1 to 11. The number of private alleles detected was moderate, ranging from 2 to 19 depending on the sub-group considered (supplementary file 5, worksheets 4 to 9). Genetic diversity assessed through rarefied allelic richness ranged from 1.64 to 3.38, with the lowest value for sgG1 as compared to the other groups (supplementary file 5, worksheet 10). As for the measures of genetic divergence between sub-groups, they were rather high for Fst and Gst and moderate for Jost's D, but all were highly significant, whichever pairwise comparison was considered (supplementary file 5, worksheet 11).
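The threshold-based assignment described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `assign_subgroup`, the specimen labels, and the Q values are hypothetical, and the sketch assumes that the columns of the Q matrix are ordered sgG1 to sgG5 and that the 75% threshold is inclusive.

```python
SUBGROUPS = ["sgG1", "sgG2", "sgG3", "sgG4", "sgG5"]


def assign_subgroup(q_values, threshold=0.75):
    """Assign one individual from its K = 5 membership coefficients.

    Returns the sub-group with the largest coefficient if that coefficient
    reaches the threshold, otherwise the label "admixed".
    """
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best] >= threshold:
        return SUBGROUPS[best]
    return "admixed"


# Hypothetical membership coefficients (NOT data from this study)
example_q = {
    "spec_A": [0.97, 0.01, 0.01, 0.005, 0.005],
    "spec_B": [0.10, 0.05, 0.80, 0.03, 0.02],
    "spec_C": [0.40, 0.05, 0.35, 0.10, 0.10],
}
assignments = {name: assign_subgroup(q) for name, q in example_q.items()}
print(assignments)
```

Raising the threshold would shrink the sub-groups and enlarge the admixed class, so the 91/35 split reported above is specific to the 75% cut-off.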
Based on the results of the STRUCTURE analysis of the Guinean group detailed above, we represented the geographic distribution of the five Guinean sub-groups on a map (fig. 6). The first sub-group sgG1, located in the south-west of Fouta Djallon, corresponds to Coffea canephora var. maclaudii with 10 specimens grouped into a single cluster with very little admixture from other groups. The second sub-group sgG2 corresponds to the Gamé cultivar located near Macenta. The last sub-groups, less differentiated, are located in the south of Côte d'Ivoire near the towns of Divo and Tiassalé (sgG3), and in the west and centre west of Côte d'Ivoire (sgG4 and sgG5, respectively).
Figure 5 - Results of STRUCTURE analysis for 126 samples of Coffea canephora belonging to the Guinean group. Estimated population structure for K = 2 and K = 5. Each individual is represented by a thin vertical line, which is partitioned into K coloured segments that represent the individual's estimated membership fractions in K clusters.

DISCUSSION
The most recent comprehensive reviews of C. canephora in Guinea and Côte d'Ivoire, by Portères, date back to the end of the 1950s (Portères 1959, 1962). He classified coffee plants as local or introduced based on historical records (e.g. gene bank entry books), habitat description, phenotypic observations, and by questioning planters and villagers during his collection missions. By genotyping old herbarium specimens with nuclear microsatellites and appropriate controls, we were able to determine whether each specimen was native or introduced, and, if applicable, to estimate the level of admixture. In the light of our results, Portères' observations appear to be remarkably precise and correct. Two notable exceptions are specimens collected around Kissidougou in Guinea and those collected near Aboisso (Ébobo population) in Côte d'Ivoire, which both grouped with Congolese genotypes whereas their collectors considered them native to the country.

Pattern of diversity of genotypes belonging to the Guinean group
Within the Guinean group, our analyses revealed five subgroups distributed over the two countries ( fig. 6) and variable levels of admixture between them.
In Guinea, two sub-groups were clearly distinguished from the others, suggesting restricted gene flow probably due to their geographical isolation. The first sub-group (sgG1) corresponds to Coffea canephora var. maclaudii determined by Chevalier in 1905 near Bilima. The 10 analysed specimens (numbers 1 to 10), collected between 1905 and 1993, grouped into a single cluster with very little admixture from other groups. This is the most western and the most northern natural habitat of C. canephora ever found in Africa. The high plateaus of the Fouta Djallon form a natural barrier separating the Southern Rivers basin from the upper basin of Bafing-Sénégal (Boulvert 2003: 43). Bilima, located west of Fouta Djallon, is exposed to rains (about 2000 mm per year) coming from the coast and has a climate that is suitable for coffee. The only known site where coffee was observed is a forest island, approximately 1400 m long and 350 m wide, which stretches down over the western slope of the Bilima-Hénéré plateau, south of Bilima village, and is surrounded by cultivated fields and fallow land. This diversity group also had the lowest allelic richness observed in our study, suggesting both its peculiarity and vulnerability.
A second Guinean sub-group (sgG2) corresponds to the Gamé cultivar described by Portères (1962). Bilima and Macenta are separated by the southern prolongation of the Fouta Djallon to Mamou and by a forest-savanna transition zone around Faranah and Kissidougou over a distance of about 300 km. According to Portères, the area of origin of Gamé is limited to the plain near Macenta and the surrounding hills, west of the Ziama massif. The only herbarium specimen (number 19) taken from the sub-group's natural habitat was collected by Barthe in 1936 in Bambaradou, a village located 3 km from Macenta, west of the Ziama massif, and which was considered by Portères as the birth place of Gamé. The other specimens were collected by Portères in 1958 either from Barthe's collection established in Macenta (number 20) or from the plantation that Fora Camara, an assistant of Portères, set up near the village of Sérédou, east of the Ziama massif (8 genotypes of original collection whose location is unknown).
Figure 6 - Geographical distribution of Guinean sub-groups of Coffea canephora (pie charts). Assignment in STRUCTURE analysis at K = 5 assumed groups (Q ≥ 75% membership). The four main vegetation zones are delimited by coloured lines: Guinean domain (Z1: moist evergreen forest, widely degraded; Z2: moist semi-deciduous forest); Z3: transition zone; Z4: Sudanian domain. The lines for vegetation zones are drawn after distribution data presented in maps by Guillaumet & Adjanohoun (1971) for Côte d'Ivoire and Boulvert (2003) for Guinea.
Separated from the Gamé sub-group in the west by the Ziama massif, the other samples are distributed over a vast area extending from the south of the prefecture of Kérouane, Guinea to the far east of Côte d'Ivoire. The factorial and Bayesian analyses showed low differentiation and weak grouping among these individuals. Several factors could account for this situation. In terms of ecology and topography, there is no significant natural barrier that could restrict gene flow between coffee populations in that vast area. Moreover, from the first quarter of the twentieth century, there was a dramatic expansion of the plantations and intense movements of planting material throughout both French colonies from eastern Côte d'Ivoire to the Guinea Forest Region. This movement may have included coffee with a wide range of admixture levels in locations characterized by plantations and forests situated in proximity. However, the geographical distribution of the sub-groups shown in fig. 6 suggests that one sub-group (sgG3) could be located in the south of Côte d'Ivoire near the towns of Divo and Tiassalé. The last two less differentiated groups (sgG4 and sgG5) are located in the west and centre west of Côte d'Ivoire.
For the very first time, despite the limited number of markers (23) we used, compared with the number used in previous studies, e.g. 108 SSR markers in Cubry et al. (2013a), we found significant evidence for population genetic structure within the Guinean group of C. canephora. The structure detected is well organized spatially, especially in Guinea, and may reflect diverse local adaptation or a long history of isolation of the populations. Further investigations are needed to confirm these results and to elucidate the precise origin of the structure (isolation by distance, gene flow barriers, past climatic history, etc.).

Admixture between Guinean and Congolese groups
We focused on herbarium specimens reported to be "spontaneous", or "wild", or "local" by their collectors. However, we found several Guinean-Congolese admixed genotypes, mainly among the specimens collected by Portères in Côte d'Ivoire at the end of the 1950s, e.g. the ones collected from Tos plantation (see supplementary file 3). At that time, Congolese germplasm was predominant in the plantations after massive imports from the 1930s. The results of our genotyping study underline the value of the large herbaria collected by Portères for validating the information on the history of introductions and movements of coffee in West Africa that we retrieved in our literature review.

The case of two populations of the Congolese group
Robusta Kissidougou - Passing the barrier of Fouta Djallon and progressing to the southeast, coffee is cultivated in the prefecture of Kissidougou in the forest-savanna transition zone, mainly in the island forests that surround the villages. We found that the eight herbarium specimens of our study collected in 1959 from four villages around Kissidougou (Boundo, Makolo, Wendékéré, Massakoundou) belong to the Congolese group (SG1 and SG2 sub-groups), although Portères considered them to derive from local material. According to a handwritten note kept at MNHN, he located the discovery of a local C. canephora by Sagna Camara, an agriculture field assistant, between 1927 and 1930, in a forest near Boundou and the dissemination of that material only from the end of the 1940s. However, our study showed that the coffee plants sampled by Portères most probably belong either to the population Niaouli (Massakoundou specimens) or to the population that he called Robusta Congo Belge. The latter was introduced for the first time in 1914 in Côte d'Ivoire from the botanical garden of Eala (Democratic Republic of the Congo). This population, which is genetically close to the INEAC population, was widely disseminated in Côte d'Ivoire from 1922 and since the end of the 1920s in Guinea, especially in the Kissidougou region (Portères 1934, 1959, 1962; Cordier 1961). According to Fairhead & Leach (1996) "coffee became more central to Kissidougou's economy from the 1930s. In the context of the high coffee prices of the late 1940s and 1950s farmers increased their coffee production dramatically, filling virtually all available forest island space with plantations and extending forest islands to harbour new plantations where possible." Therefore, the existence of coffee of the Guinean group originating in the region of Kissidougou, as stated by Portères, remains to be demonstrated.
Robusta Ébobo - In the extreme south east of Côte d'Ivoire, the Plantation Ébobo was created by the Société des Plantations de la Tanoé in 1925 near the village of the same name on the Abi lagoon. According to Portères' handwritten notes conserved at MNHN, this plantation sent seeds to the Plantation de La Bia, an estate belonging to the Société des Plantations d'Elima (SPE) located at the mouth of the La Bia River, south of the town of Aboisso (Lemery 1937). In 1958, Portères collected 20 herbarium specimens of robusta Ébobo in the Plantation de La Bia and sent them to PAT. According to Portères (1959) and Cordier (1961), the robusta Ébobo may derive from plants native to the Montagnes Bleues (70 to 100 m a.s.l.) located between the villages of Ébobo and Elima (Laplante & Rougerie 1949). In another handwritten note, Portères mentioned a plot of robusta Ébobo established in 1935 on the Plantation Ébobo and inferred 1933-1934 as the likely date of discovery of its natural habitat. We genotyped three herbarium specimens (numbers 151, 152, and 153) and found them to belong to the Congolese group. They are genetically close to the Niaouli and Luki populations that belong to the SG1 sub-group. These results are in accordance with those of previous studies on several clones of robusta Ébobo by Berthaud (1984), Adibolo & Bertrand (1988), Montagnon et al. (1992b), and Dussert et al. (2003), the latter using the term "Aboisso" instead of "Ébobo". Therefore, there is little evidence to support Portères' hypothesis concerning a local origin of robusta Ébobo. This coffee may have been introduced earlier from Lower Guinea, possibly by Beynis, a European planter who established his plantation near Aboisso using seeds of Niaouli and Café du Kouilou imported from Gabon around 1910 (Cordier 1961). Bodard (c. 1960, CRA Bingerville, Côte d'Ivoire, unpubl. res.) mentioned other introductions of Café du Kouilou from Gabon between 1917 and 1927 that were mainly disseminated in the coastal region, particularly to the Plantation de La Bia.
The same early imports could explain why the herbarium sample number 142, collected in 1930 by Chevalier near Abengourou, although labelled as "spontaneous", is actually a specimen of the Niaouli population.

Potential sites for further collection of Guinean germplasm
As shown in fig. 6, most of the native C. canephora populations collected in the past in Côte d'Ivoire and Guinea were found in the vegetation zone (Z2) characterized by semi-deciduous moist forests, according to the classification introduced during the Conference of Yangambi in 1956 (Aubréville 1957; Guillaumet & Adjanohoun 1971: 190). As already mentioned by Portères (Portères 1937a, 1937b) and later by Berthaud (1984: 140), a number of populations were also found in a transition zone (Z3) where forest and savanna co-exist in a complex mosaic of forest islands and gallery forests along the rivers.
However, since the middle of the twentieth century, population pressure and immigration, illegal or excessive logging, land encroachment and extension of coffee and cocoa plantations, and inefficient public forest administration have led to the destruction or fragmentation of large forest areas. This historical trend has mainly been documented in Côte d'Ivoire (Brou et al. 1998, 2005b). According to the conservative estimate by Fairhead & Leach (1998: 40), in 1900, Côte d'Ivoire may have had a forest cover of 7 to 8 million hectares that remained stable until 1955, but only 2.7 million in 1990. The U.S. Geological Survey Earth Resources Observation and Science (USGS EROS) Centre (http://eros.usgs.gov/westafrica/) has mapped the land cover of West Africa for three periods in time (1975, 2000, and 2013) using many hundreds of Landsat images. By 2013, Côte d'Ivoire had lost nearly 60% of the 3.7 million hectares of dense tropical forests that existed in 1975. Similarly, degraded forest decreased by 28% and woodland area declined by 48%. In Guinea, during the same period, dense forest areas decreased by about 33% to only 0.4 million hectares. Recent studies have started to document the impact of the political-military crisis faced by Côte d'Ivoire between 2001 and 2013 on the 182 protected forests ("forêts classées" in French) of that country. Before the conflict, the forest of Haut-Sassandra, in which a coffee population (Pélézi, numbers 79 to 82) was collected in 1986, was one of the best protected forest reserves in Côte d'Ivoire with 93% of forest cover. By 2015, forest cover there had decreased to less than 28%, mainly due to cocoa planting and tree logging that followed the withdrawal of the forest ranger force (Barima et al. 2016).
Both countries have also been affected by periods of severe drought since the end of the 1960s, in step with what has been observed in the Sahel. The trend intensified during the 1980s and 1990s before a remission in the 2000s (Lubès et al. 1995; Brou et al. 2000, 2005a). Moreover, recent studies investigated the potential impact of global climate change on the natural habitats and cultivation areas of several species of Coffea (Davis et al. 2012, 2019; Bunn et al. 2015). In a few decades, changes in climate will reduce the global area suitable for coffee, leading to severe stress and a high risk of extinction. Even if the species C. canephora (as a whole, including the Congolese group) is not considered as critically endangered, the populations of the Guinean group growing in the transition zone between savanna and forest (Z3) in Guinea and Côte d'Ivoire will likely be the first affected. Lastly, as important as forest degradation, the dilution of the Guinean genetic pool resulting from hybridization with coffee belonging to the Congolese group, which constitutes the major part of the coffee plantations in Guinea and Côte d'Ivoire, could also hamper new collections of native coffee plants in both countries.
Despite those factors, a few populations of coffee trees belonging to the Guinean group were successfully collected in the last decades of the twentieth century in Côte d'Ivoire and Guinea (Le Pierrès et al. 1989; Couturon & Montagnon 1991; Montagnon et al. 1993). Berthaud (1984: 140) mentioned the presence of wild coffee in "sacred forests" that are preserved to host initiation ceremonies. These sites often contain vast biodiversity with abundant fauna and flora (Yao et al. 2013). Recent studies on farming systems in the Guinea Forest Region have highlighted the expansion of agroforests surrounding the villages (Fairhead & Leach 1996; Camara et al. 2009; Correia et al. 2010). In these agroforests, coffee, cocoa and kola are associated with various forest tree species. However, the genetic identity of the coffee populations found in these agroforests remains unknown.
Our findings on the genetic structure of C. canephora in Guinea and Côte d'Ivoire will guide further surveys in both countries. The use of recent aerial and satellite images will provide preliminary information on the current state of vegetation at past collection points. On the spot, in addition to coffee sampling, it is advisable to record detailed information about the ecology and associated flora, as well as a description of the coffee populations (age structure, sanitary status, etc.) and of the environmental and social factors that could affect the future of the forest. For sampling, priority should be given to germplasm that is absent from or insufficiently represented in ex situ gene banks, i.e. var. maclaudii, cultivar Gamé, and populations from South-East Guinea, and, in Côte d'Ivoire, forest remnants in the south (Toumodi, Kassa, and Mopri) and in the east (Abengourou and Agnibilekrou). The gene banks belonging to CNRA and IRAG, once supplemented with new material from the Guinean group, will be major resources for the reactivation of coffee breeding activities, especially given the increasing need for adapted germplasm in the context of climate change. As stated by Davis et al. (2019), wild variants of coffee species will be of primary importance in the future, especially with the increasing incidence and duration of drought and the emergence or spread of diseases and pests. This is particularly the case for populations located at the margins of the environmental range of the species, like var. maclaudii in the extreme northwest and populations located in the transition zone (Z3). These particular locations might be associated with specific local adaptations of potential interest for breeding purposes. A more in-depth analysis including phenotypic evaluation of the genotypes from these locations should be undertaken to detect particular traits associated with their specific history and ecological growing conditions.

CONCLUSIONS AND FURTHER STUDIES
The analysis of genetic data of herbarium specimens, combined with information extracted from historical records, herbarium documentation and scientific literature, allowed us to validate (or, in a very few cases, invalidate) previous observations made by botanists regarding the history of introductions and movements of robusta coffee germplasm in Guinea and Côte d'Ivoire.
We explored the genetic diversity of C. canephora in these two countries by genotyping old herbarium specimens with 23 nuclear microsatellites and appropriate controls. We were able to determine the genetic group (Congolese or Guinean) of each specimen, and, if applicable, to estimate the level of admixture. We identified several genotypes belonging to the Congolese group, introduced with the development of commercial coffee plantations from 1920, and diverse Guinean-Congolese hybrids resulting from natural hybridization between local and introduced genetic stocks. The results of factorial and Bayesian analyses provided significant evidence for population genetic structure within the Guinean group of C. canephora. The Guinean genotypes can be assigned to five sub-groups with distinct spatial distribution, especially in Guinea where two sub-groups, corresponding to the var. maclaudii and the cultivar Gamé, are characterized by a low level of admixture, which suggests restricted gene flow probably due to geographical distances and the natural barriers formed by the Fouta-Djallon and the Ziama massifs.
Because of past and current trends in forest degradation, there is an urgent need to describe and protect potential collection sites in West Africa and to evaluate the genetic diversity of the coffee populations they contain. Newly collected C. canephora genotypes of the Guinean group will be a valuable resource for future genetic improvement of robusta coffee. Due to its limited genetic diversity compared to other Guinean sub-groups, the population of C. canephora var. maclaudii near Bilima may be considered vulnerable and a priority target for conservation.
The next step could be to extend genetic analysis to a larger number of specimens taken from MNHN herbaria (P and PAT) and other European herbaria (e.g. herbarium LISU, Lisbon, Portugal and BR, Meise Botanic Garden, Belgium). With recent DNA technologies like next generation sequencing methods (chloroplast sequencing, gene capture, etc.), already successfully applied to herbarium specimens (e.g. Staats et al. 2013; Hart et al. 2016; Suchan et al. 2016), it is now possible to conduct in-depth investigation of diversity within a species, at reasonable cost. Provided the herbarium specimens are accurately documented, a detailed and comprehensive study of the species C. canephora, including Congolese and Guinean groups and their sub-groups, becomes an achievable goal.