Investigating leaf beetles (Coleoptera, Chrysomelidae) on the west coast islands of Sabah via checklist-taking and DNA barcoding

Sabah is a province of Malaysia located on the northern part of the island of Borneo. Most of the leaf beetle fauna studies from this region conducted over the past 15 years have focussed on the mainland habitats while the leaf beetle fauna from island habitats (ca. 500 islands) have largely been overlooked. This study looks into the leaf beetle fauna of 13 small satellite islands off the west coast of Sabah. All specimens were first sorted into morpho-species operational taxonomic unit (OTU) before being identified to species rank where possible based on morphological characters and species names assigned when the specimens fitted the description of species in the literature. We collected 75 OTUs from 35 genera and five subfamilies according to morphology, 12 of which were identifiable to species level. In addition, the DNA barcode for each OTU was cross checked with records in GenBank and Barcoding of Life Data system (BOLD) to verify their identity. The number of species recorded was reduced from 12 species and 63 OTUs (total 75 OTUs) to 12 species and 56 OTUs (total 68 OTUs) after removal of the colour polymorphic species based on DNA barcode analyses. Pulau Gaya has the highest species richness and Pulau Sulug has the lowest species richness. A total of 64 Barcode Index Numbers consisting of 101 DNA barcodes were obtained from the 12 leaf beetle species and 48 OTUs. Based on the DNA barcode analyses, it was possible to confirm several polymorphic OTUs and cryptic species. The mean intraspecific and interspecific genetic divergence were determined as 0.77% and 16.11%, respectively. DNA barcodes of this study show a low similarity with records in GenBank and BOLD, highlighting the lack of representation and the urgency of studying leaf beetles from this region. The study provides the first documentation of leaf beetle fauna from island habitats of Sabah and the first DNA barcoding data for leaf beetles from this part of the world, with the next steps being larger scale sampling over a wider geographical scale for a better understanding of tropical arthropod diversity.


INTRODUCTION
Chrysomelidae Latreille, 1802 is one of the most diverse beetle families, with 35,000-60,000 species around the world (Splipnski, Leschen & Lawrence, 2011;Jolivet, 2015). The study of leaf beetle fauna in Borneo started in the 19th century, with the first valid species described by Suffrian (1854). A brief history of leaf beetle studies in Borneo is discussed in The Leaf Beetle of Borneo by Mohamedsaid, Salleh & Hassan (1992). Although Borneo is recognized as one of the world's most biodiverse islands, taxonomic research on Borneo leaf beetles has been limited to a few publications. As of 2004, 635 species of leaf beetle had been recorded in Borneo (Mohamedsaid, 2004). Researchers are currently in a race against time to study the Bornean leaf beetles due to the high deforestation rates in the region (Sodhi et al., 2010).
The above mentioned recent taxonomic works are based on morphological characteristics (Mohamedsaid, 2006a;Moseyko, 2012;Medvedev & Romantsov, 2017b) and few studies have utilized the molecular approaches to infer the phylogeny of leaf beetles (Kishimoto-Yamada et al., 2013;Crampton-Platt et al., 2015). Using this conventional taxonomic approach alone is challenging because sexual dimorphism and colour pattern variants or phenotypic polymorphism are common, especially variables within the subfamily Galerucinae (Crowson, 1981;Chaboo, 2007;Prado, 2013;Gómez-Zurita et al., 2016). Consequently, DNA barcoding has been added to the taxonomist's toolkit in order to complement the species identifications that are based on morphological characters (Hebert et al., 2003;Pentinsaari, Hebert & Mutanen, 2014;Gómez-Zurita et al., 2016). To date, there are 73 records of leaf beetles with 15 Barcode Index Numbers (BINs) from Malaysia registered in the Barcoding of Life Data system (BOLD), but none of these records are from Sabah or Borneo.
For all the reasons stated above, it is timely to (1) document the species richness of leaf beetles from different habitats to obtain the general pattern of Bornean leaf beetle diversity, and (2) generate DNA barcodes for known and newly described Bornean leaf beetle species to serves as supplementary information for the morphological identification and building up of the DNA barcodes reference database and to enable rapid species identification. Therefore, this study serves as part of the effort by the second author, Haruo Takizawa towards the above mentioned goals through documentation of the species richness of leaf beetles in Sabah.

Leaf beetle sampling and processing
Standard plot sampling was carried out between January 2016 and March 2017 on the 13 islands along the west coast of Sabah ( Fig. 1; Table 1) under research permit from Sabah Park (TTS/IP/100-6/2 Jld.4 (49)) and permission from Sapangar Naval Base (MWL2.100-2/2/3-(9)). The number of sampling plot varies among the sampled islands according to their respective island area (Table 1). In each 20 Â 20 m plot, a manual hand-picking search for leaf beetles and 200 sweeps of shrubs and herbaceous vegetation were conducted using an entomological sweep net (Sánchez-Reyes, Niño-Maldonado & Jones, 2014) over two person-hours. Leaf beetles from each plot were kept in separate Falcon Tubes and brought to the laboratory for further processing. It should be noted that specimens from outside the plots were also collected.
Leaf beetle specimens collected were first killed using 70% ethanol before identified under the microscope. All the specimens were first sorted and grouped into morphospecies operational taxonomic unit (OTU) before identified to species rank when possible by the second author based on morphological characteristics and literature available. For OTUs that were unable to be identified to species rank, these specimens were identified to genus rank when possible. The usage of the term 'species' in the present study referred to those determined species based on morphological characters and literature available

DNA extraction, PCR amplification and sequencing
DNA was extracted from one to three whole legs of the leaf beetles using Qiagen DNeasy Blood and Tissue Kits, following the manufacturer's protocols. After that, all the DNA extracts were stored under -20 C prior to PCR amplification. The mitochondrial gene region, cytochrome c oxidase subunit I was PCR-amplified using universal primer LCO 1490 and HCO 2198 (Folmer et al., 1994). The 25 ml PCR reaction mixtures contained 2.5 ml of 10Â GoTaq Ò PCR buffer with 15 mM MgCl 2 , 1.5 ml of 25 mM MgCl 2 , 0.5 ml of 2.5 mM dNTP mix, 0.5 ml of 10 pmol each primer, 0.25 ml of 5 u/ml Taq DNA polymerase, one ml DNA template and 18.25 ml ddH 2 O. PCR amplification was performed in Bio-Rad T100 Thermal Cycler following thermal cycling, an initial denaturation at 94 C for 3 min, followed by 35 cycles of denaturation at 94 C for 30 s, annealing at 47 C for 45 s, extension at 72 C for 60 s, and a final extension at 72 C for 5 min. PCR products were checked for successful amplicon using the 1% agarose gel with TBE buffer. Successful PCR amplicons were sent to Genomics BioScience and Technology Co., Ltd. (Taipei, Taiwan) for sequencing.

Data analysis
Sequences were checked visually with Bioedit v7.1.9 (Hall, 1999). All the complete sequences were uploaded, registered and managed in the BOLD (Ratnasingham & Hebert, 2007) together with the information about taxonomy and collection, voucher deposition details, original sequence trace files and photographs of the specimens. Each sequence was assigned the BIN in BOLD (Ratnasingham & Hebert, 2013). Barcode Gap Analysis and distance summary for intraspecific and interspecific distance based on Kimura 2-parameter (K2P) distance model (Kimura, 1980) were performed in BOLD.
In addition, all sequences were crossed checked with the records in the National Center for Biotechnology Information GenBank and BOLD using the Basic local alignment search Tool (BLAST) (Altschul et al., 1990) in Geneious free trial v11.0.3 (Kearse et al., 2012) and identification tool in BOLD platform to search for similar DNA sequences in the database and to verify their tentative taxa identities. Resulting BLAST and BOLD top-hits for all the sequences are shown in Table S1.

DNA barcodes
The BINs for each specimen were listed. The intraspecific and interspecific distances of the species were generated using the sequence analysis in BOLD. For intraspecific distance, only species with more than one individual sequence were shown and for interspecific distance, only two or more species under the same genus were shown in the checklist. The 'Mean' represents the mean distance, 'Max' represents the maximum distance, and abbreviation 'N/A' represents data that are not available.

Species checklist
This checklist is comprised of information about the generic diagnosis of the genus: examined materials in BORNEENSIS, species distribution in west coast islands of Sabah, DNA barcode and general remarks on the species. The taxonomy nomenclature of the present study follows the Family-group names in Coleoptera (Bouchard et al., 2011) at the levels of family and subfamily, while at the levels of genus and species was adopted the Leaf Beetle genera (Seeno & Wilcox, 1982) and the Catalogue of the Malaysian Chrysomelidae (Mohamedsaid, 2004). Morpho-species that could not be identified to genus level were named after the subfamily (e.g. Galerucinae sp.). Photos of dorsal and ventral habitus for each of the species were included.

Species diversity among leaf beetles
A total of 1,104 leaf beetle specimens were collected in this study, including 68 OTUs in 35 genera and five subfamilies after the removal of polymorphic species based on the DNA barcode analyses, in which 12 OTUs were identified to species level based on morphological characters. Of all the leaf beetle subfamilies collected, Galerucinae had the highest number of genera and species recorded (18 genera, 42 OTUs), followed by the subfamily Eumolpinae (nine genera, 16 OTUs), subfamily Cassidinae (five genera, seven OTUs), subfamilies Chrysomelinae (two genera, two OTUs) and Criocerinae (one genus, one OTUs). In terms of genera, genus Monolepta was the most speciose with 18 OTUs collected, followed by genus Hoplosaenidea with seven OTUs and genus Basilepta with six OTUs. In addition, Pulau Gaya has the highest species richness recorded amongst the sampled islands (42 OTUs), followed by Pulau Tiga (22 OTUs), Pulau Sapangar (18 OTUs), Pulau Dinawan and Pulau Sapi (nine OTUs), Pulau Mantukod and Pulau Manukan (eight OTUs), Pulau Mengalum (seven OTUs), Pulau Mamutik and Pulau Udar Besar (six OTUs), Pulau Udar Kecil (five OTUs), Pulau Peduk (four OTUs) and Pulau Sulug (two OTUs). The number of leaf beetle subfamilies, genera and OTUs collected at each island is summarized in Table 2.

DNA barcoding
Total genomic DNA of 75 morphologically identified leaf beetle species was extracted, but only 67 of these were successfully sequenced resulting in 100 barcode compliant sequences and one non-barcode compliant sequence. These 101 sequences were uploaded and assigned to 64 BINs in BOLD (available at: dx.doi.org/10.5883/DS-BCHRY18). Details of the sequenced leaf beetle species, number of specimens, and respective BINs are listed in the Supplementary file, Table S2. The differences between the number of successfully sequenced OTUs and the number of BINs suggests the presence of polymorphic species, which is confirmed by the sequence analyses in BOLD and reduced the species richness recorded in the present study (from 75 to 68 OTUs). Nonetheless, the congruence of species delimitation obtained through both morphological identification and DNA barcoding methods was relatively high (55 OTUs or 82.1%). The number of BINs for each island and number of BINs shared with other islands is summarized in Table 2. Smaller size islands tends to have a lower number of BINs and the majority of these BINs are shared with the other islands, while the larger size islands harbour more BINs and some of these were found to be exclusive to a particular island. A neighbour-joining tree was constructed based on these 101 sequences via BOLD (Fig. 2), to shows the relationship between these sequences. From the sequence nucleotide composition analysis in BOLD, the average percentage of all the sequences G, C, A and T were 16.38% (14.58-18.13%), 17.19% (13.23-24.02%), 29.86% (27.11-33.13%) and 36.57% (30.66-40.99%), respectively. The overall mean AT content of the 101 sequences was 66.43% (57.85-71.66%) and strongly AT biased at the third codon position with mean AT content of 85.09% (63.64-96.51%). Intraspecific and interspecific K2P distances were easily distinguishable from each other, with overall means 0.77% (range 0-1.99%) and 16.11% (range 4.71-24.6%), respectively. Further details of the intraspecific and interspecific distances are available in the Tables S3 and S4.
All the 101 sequences submitted to GenBank through BLAST and identification tools in BOLD platform to search for identical results and the top-hit results are shown in the  Table S1. The pairwise identity percentage of the 101 sequences with records in GenBank and BOLD range from 82.6% to 100% and from 84.01% to 100%, respectively. The majority of the GenBank and BOLD top-hit search record taxonomy are under the same order or family as the queried sequences, with some exceptional records (seven GenBank and BOLD records) classified under other insect orders (e.g. Order Lepidoptera) and other Coleoptera families (e.g. Zopheridae and Staphylinidae). These BLAST and BOLD top-hits results were summarized and grouped into two categories (!90% and <90%) based on their pairwise identity percentages, as shown in Table 3.
In general, a high proportion of these 101 top-hit search records were insufficiently identified, with only 46.5% of the GenBank and 14.9% of the BOLD records being identified to species rank. Besides that, only 21 records (15 species) and 38 records (21 species) from both GenBank and BOLD have pairwise identity percentage higher than  90% with the queried sequences. For instance, only nine out of the 21 GenBank records and five out of the 38 BOLD records have pairwise percentages higher than 90% with the queried sequences were identified to species rank.

Species diversity among leaf beetles
Several earlier studies on species richness and biodiversity of Chrysomelidae from Sabah can be compared with the present study (Mohamedsaid, Salleh & Hassan, 1992;Mohamedsaid, 1995Mohamedsaid, , 2004Chung et al., 2000;Kishimoto-Yamada, Takizawa & Mahadimenakbar, 2016). For instance, Mohamedsaid, Salleh & Hassan (1992) reported 168 leaf beetle species from eight days sampling in Danum Valley Conservation Area, while Chung et al. (2000) reported 80 leaf beetle species from different habitat types in Sabah using different sampling approaches and Kishimoto-Yamada, Takizawa & Mahadimenakbar (2016) presented a checklist of 129 leaf beetle species from their 34 months sampling in Universiti Malaysia Sabah. The relatively low species richness recorded in this study is possibly caused by species poverty on the islands, different forest types sampled, different sampling methods and sampling efforts as in the previous studies (Wagner, 2000;Whittaker & Fernández-Palacios, 2007;Thormann, 2015).
The subfamily composition of the present study was also compared with the previous reports on Borneo and Oriental leaf beetle fauna (Kimoto, 1988;Mohamedsaid, 2004). The dominant leaf beetles on these islands are species of subfamilies Galerucinae (excluding Tribe Alticini) and Eumolpinae which is in accordance with the general trend throughout the Oriental region (Kimoto, 1988). Furthermore, the fact that Monolepta was the most speciose genus collected in this study also agrees with previous reports from Sabah and Borneo (Mohamedsaid, Salleh & Hassan, 1992;Mohamedsaid, 1995Mohamedsaid, , 2004Mohamedsaid, , 2006b; Kishimoto-Yamada, Takizawa & Mahadimenakbar, 2016). Although this study took samples from only 13 out of the 500 islands (∼3%) from Sabah, the number of species documented in the current study was ∼8.5% of the total number of reported species (803 species) in Borneo, which is a notable percentage in spite of the small proportion of sampled area in this study. This also suggests that species richness on the islands is possibly comparable to that of mainland habitats and that more species remain to be discovered. This checklist also reveals the distribution of agricultural pest species on the islands, which is vital for the control of their dispersal. For example, Brontispa longissima, one of the most serious coconuts pests in the Pacific region, is commonly found on the sampled islands noted for human habitation, tourist activities and resorts. The sweet potatoes leaf beetle, Colasposoma auripenne was also found on an island with the cultivated Ipomoea batatas. It was possibly introduced to the island together with its host plant. Another agricultural pest species, Monolepta sp. 4 was found to have heavily defoliated the young shoots of a wide range of fruit trees (e.g. Citrus sp., Mangifera sp. and Sauropus androgynus).

DNA barcoding
Out of the 64 generated BINs, 60 unique BINs are new to BOLD and four non-unique BINs are existing records in BOLD, of which only one of the non-unique BINs, Scelodonta granulosa (BOLD:ADE7488) was previously recorded from Malaysia in BOLD (private data in BOLD, 2018) while the remaining 63 BINs are new to Malaysia. However, existing records of Scelodonta granulosa in BOLD were not made public and hence, this study resulted in the number of public records and BINs of Chrysomelidae from Malaysia in BOLD increasing to 174 records and 79 BINs, respectively. The high number of unique BINs and relatively low number of representatives from this region in BOLD highlight the urgency of studying the biodiversity in the region, in more depth.
All the non-unique BINs samples recorded in BOLD were collected from their known geographical distribution, except for Altica aenea. The known geographical region of Altica aenea does not include Pakistan as recorded in BOLD (Reid & Beatson, 2015) and possibly represents a new locality record for this species. However, re-examination of the specimens is necessary to validate this finding as one of the specimens recorded under the same BIN was identified as Altica birmanensis. The Brontispa longissima, one of the most serious invasive agricultural pests of the coconut palm, has a wide distribution across the Asia-Pacific region (Rethinam & Singh, 2007;CABI, 2018). The geographical distribution of the DNA barcoded Brontispa longissima specimens in BOLD were concordant with its known distribution, but none of these specimens were collected from Malaysia (BOLD, 2018;CABI, 2018). Therefore, the barcode compliant sequences generated in this study represent the first record of this species from Malaysia in BOLD.
The presence of cryptic species and polymorphic species was confirmed using the sequence analysis tools in BOLD. Six sequenced Nodina sp. specimens were revealed to be five different putative species OTUs as a result of the following analysis output combination: (1) assigned to five different BINs in BOLD, (2) intraspecific divergence (19.14%) higher than the distance to the nearest neighbour Basilepta sp. 1 (16.33%) in the Barcode Gap analysis, and (3) split into five different clusters in the Taxon ID Tree analysis (Fig. 2, highlighted in blue colour). However, these five OTUs are morphologically hard to distinguish from one another and thus, they are collectively treated as a single OTU (Kishimoto-Yamada, Takizawa & Mahadimenakbar, 2016) and excluded from the overall mean intraspecific and interspecific distance analyses.
Nonetheless, 97% of the sequences obtained from this study are new to GenBank. On top of that, out of the 21 queried sequences with pairwise identity percentage higher than 90%, only five were identified to species level in BLAST top-hits results. These are Brontispa longissima, Altica birmanensis and Altica engstroemi with pairwise identity percentages of 100%, 99.2% and 99.0%, respectively (see Table S1). However, both Altica birmanensis and Altica engstroemi were previously not recorded in Borneo and the latter species' known distribution was only from northern Europe (Mohamedsaid, 2004;Reid & Beatson, 2015;GBIF, 2017). This has become complicated by the fact that the pairwise identity percentages of these two species sequences in GenBank is 99.5%, suggesting that they should be the same species, and that they were possibly misidentified as the locality of both specimens was recorded as being from Kerala, India. This conforms with previous reports on the poor quality of taxonomic identifications in GenBank (Bridge et al., 2003;Vilgalys, 2003;James Harris, 2003;Kristiansen et al., 2005). A similar situation was noted with for Altica aenea in BOLD, where one of the records under the same BIN was identified as Altica birmanensis. Re-examination of the specimen taxonomy identification confirmation is necessary to resolve this conflict. Besides that, the presence of other taxa in the top-hit search records and the high proportion of low pairwise identity percentages between queried sequences with GenBank and BOLD existing records highlights the poor representation and the urgency of this taxa from the region under study using molecular approaches.

SUBFAMILY GALERUCINAE
Tribe ALTICINI Newman 1835 Genus Altica Geoffroy, 1762 Refer to Appendix A for the generic diagnosis of this genus extracted from respective key literature of the taxa. Altica aenea (Olivier, 1808) Max: 0 Remarks. BLAST top-hit result shows 99% similarity with Altica bermanensis and Altica engstromi. However, both species have not recorded in Sabah (Mohamedsaid, 2004;Reid & Beatson, 2015). So, the records in GenBank are probably misidentified.

Genus Aphthona Chevrolat, 1837
Refer to Appendix A for the generic diagnosis of this genus extracted from respective key literature of the taxa. Aphthona sp. (Fig. 3B    Examined materials (7    Max: 1.69 Remarks. This species exhibits phenotypic polymorphism, with four different phenotypic characters, one fully milky white in colour, one with suture and elytra edge black in colour, one elytra with two dark brown bands separated by light brown bands, and one elytra with two dark brown bands interconnected by dark brown suture but separated by two light brown bands.          Possibly exhibit phenotypic polymorphism with three different patterns and colourations on the elytra. These three patterns also observed at UMS hill based on second author collection.

SUBFAMILY EUMOLPINAE
Tribe ADOXINI Jacoby, 1908 Section Scelodontites Chapius 1874 Genus Scelodonta Westwood, 1837 Refer to Appendix A for the generic diagnosis of this genus extracted from respective key literature of the taxa.   Genus Plagiodera Chevrolat, 1837 Refer to Appendix A for the generic diagnosis of this genus extracted from respective key literature of the taxa. Plagiodera sp.

CONCLUSIONS
The notable number of leaf beetle species collected from a small portion of the island habitats in Sabah (∼3%) during the course of this study, indicates that many species have yet to be discovered and that these habitats should not be neglected in the process of investigating Bornean leaf beetle diversity. Moreover, the number of DNA barcodes generated in this study not only proves the effectiveness of barcoding in species delimitation, but also highlights the unsatisfactory level of representation of this taxa in the public sequence database for this region. This study marks an attempt to improve our current understanding of leaf beetle diversity from this region and helps in building up the DNA barcode reference database of these taxa.