Chromosome-Level Genome Assembly of Anthidium xuezhongi Niu & Zhu, 2020 (Hymenoptera: Apoidea: Megachilidae: Anthidiini)

Abstract Anthidiini, a large bee tribe characterized by light-colored maculations, represents nearly 1,000 pollinator species, but no genomes are yet available for this tribe. Here, we report a chromosome-level genome assembly of Anthidium xuezhongi collected from the Tibetan Plateau. Using PacBio long reads and Hi-C data, we assembled a genome of 189.14 Mb with 99.94% of the assembly located in 16 chromosomes. Our assembly contains 23 scaffolds, with the scaffold N50 length of 12.53 Mb, and BUSCO completeness of 98.70% (n = 1,367). We masked 25.98 Mb (13.74%) of the assembly as repetitive elements, identified 385 noncoding RNAs, and predicted 10,820 protein-coding genes (99.20% BUSCO completeness). Gene family evolution analyses identified 9,251 gene families, of which 31 gene families experienced rapid evolution. Interspecific chromosomal variation among A. xuezhongi, Bombus terrestris, and Apis mellifera showed strong chromosomal syntenic relationships. This high-quality genome assembly is a valuable resource for evolutionary and comparative genomic analyses of bees.


Introduction
Among bees, the family Megachilidae, especially the subfamily Megachilinae, contains some of the most important wild pollinators for crops and pasture (Holm 1984;Free 1993;Garibaldi et al. 2013;Richards 2016). The tribe Anthidiini, with yellow or reddish-yellow maculations, is a major group within Megachilinae, with nearly 1,000 species (Michener 2007;Niu et al. 2020;Ascher and Pickering 2021).

Significance
Many species of Megachilidae are renowned for their ability to pollinate crops and pasture, and for their unusual nesting behaviors. The current lack of high-quality chromosome-level genome resources for the Anthidiini limits our understanding of their evolution and biology. Further, there are few genomic resources for high elevation species. In this study, we present a chromosome-level genome assembly and annotation of Anthidium xuezhongi, including also synteny and gene family evolution analysis. Our results represent a valuable resource for further study on the evolutionary biology of Megachilidae.
Anthidiini exhibit intriguing nesting behavior (Michener 2007;Litman et al. 2016). Like many megachilids, they nest in preexisting cavities in various substrates, including both preexistent burrows and man-made cavities, and some species show remarkable plasticity (Hicks 1933, Michener 2007. Some species build their nests on the surface or underside of rocks, stones, or stems (Anthidiellum; Pasteels 1977;Gess and Gess 2007). Cells within the nests are also lined with various materials such as resin, soil particles, pieces of leaves, chaff, fibers, and plant trichomes (Banaszak and Romasenko 1998;Michener 2007). Despite being interesting and important bees, no high-quality chromosome-level genomes have been reported from the tribe Anthidiini, and very few chromosomal assemblies are available for bees in general. To date, there are few megachilid genomes, with only three among the~70 bee genomes published (accessed on November 2021 from NCBI).
In this study, we report the first chromosome-level genome assembly of Anthidiini, Anthidium xuezhongi Niu & Zhu, 2020, collected from Tibet. We annotated the noncoding RNAs (ncRNAs), protein-coding genes, and repeat elements. We performed gene family evolution analyses on A. xuezhongi and 13 well-studied insect species from major insect orders. In addition, we analyzed interspecific chromosomal variation relationships among A. xuezhongi and the well-studied bees Bombus terrestris and Apis mellifera. This high-quality genome of A. xuezhongi is an important step toward bettering our knowledge of the megachilids, whereas also enabling further comparative studies on nesting biology and behavior.

Genome Assembly
We sequenced 101.83 Gb clean reads in total, including 31.76 Gb (168Â) PacBio reads, 28.74 Gb (152Â) Illumina reads, 35.05 Gb (185Â) Hi-C reads, and 6.28 Gb transcriptome reads (supplementary table S1, Supplementary Material online). The N50 and mean length of the raw PacBio long reads were 21.29 and 18.58 kb, respectively. Following quality control, 26.40 Gb Illumina reads were retained for the genome survey and subsequent genome polishing. We estimated the genome size with GenomeScope under the 21-k-mer parameter. The genome survey suggests that the sequenced A. xuezhongi had a genome size of 188-193 Mb, a single simple peak and statistical analyses showed that its genome had a very low repetitive content and heterozygosity.
The MAKER3 pipeline identified 10,820 protein-coding genes from 16,545 sequences in the A. xuezhongi genome with a mean length of 6,974.9 bp. The average number of exons, introns, and CDS per gene were 13.0, 11.3, 12.3, respectively, and their mean lengths were 245.3, 1827.5, 185.3 bp, respectively. The BUSCO completeness assessment for protein sequences reached 99.20% (n ¼ 1,367), identifying 1,041 single-copy, 315 duplicated, three fragmented, and eight missing BUSCOs, indicating the high quality of these predictions. Diamond searches aligned 15,594 (94.74%) genes to the SwissProt and TrEMBL databases. Integrating with InterProScan and eggnog annotation results, 7,690 GO items, 6,730 KEGG pathway, 5,906 MetaCyc, 7,699

Gene Family Evolution and Phylogenetic Relationships
Gene families were identified with OrthoFinder among 14 species, including 5 Apoidea species (see Materials and Methods). A total of 153,377 (94.60%) genes were assigned to 12,541 orthogroups (gene families). Among them, 1,897 orthogroups and 8,983 genes were assigned as being species specific; another 5,243 were orthogroups present in all species and 3,469 consisted of single-copy genes ( fig. 1c). In A. xuezhongi, 10,091 (93.30%) genes were clustered into 9,251 gene families, of which 158 gene families were expanded, 482 gene families were contracted, and 31 gene families had experienced rapid evolution. Rapidly evolving gene families include several odorant receptors that may be linked to recognition of females during territorial behavior, but we could not meaningfully connect most to biological phenomena. With deeper sampling, we will better be able to isolate exactly which trends are specifically characteristic of both Anthidiini and this species. For these purposes, and to better interpret the ancestral state of the family Megachilidae, a genome of the early-diverging fideline bees will prove extremely valuable (Michener 2007).
Filtering with "symtest" resulted in 3,123 loci (1,826,179 amino acid sites), which were used to infer the phylogenetic tree in IQ-TREE, and with 100/100 support of all nodes. The phylogenetic reconstruction results are fully consistent with Peters et al. (2017) and Branstetter et al. (2017), supporting Formicidea as the sister clade of Apoidea. Parasitica, Vespoidea, Formicidea, and Apoidea were monophyletic groups, as expected. The Apoidea originated during the early Cretaceous period (131-138 Ma). Apidae and Megachilidae diverged during the late cretaceous (68.4-77.2 Ma). The common ancestor of the three Megachilidae taxa species arose in the Eocene period (44.5-47 Ma).

Chromosomal Synteny
In comparing the chromosome-level genome assemblies of A. xuezhongi, B. terrestris, and A. mellifera, we captured 315 syntenic blocks (14,473 collinear genes). Strong collinearity in general among these three species revealed conserved chromosomal characteristics. However, a few chromosomes were derived from two or more chromosome segments, such as 16-chromosome from the fusion of segments of B. terrestris and A. mellifera chromosomes 2, 6, 7, and 11 ( fig. 1d). However, large chromosomal differences have been found based on other behaviors, such as social parasitism within bumblebees (Sun et al. 2021), so we cannot rule out other factors at this time. More study with additional chromosomal assemblies will be necessary to test these possibilities.

Samples Collection and Sequencing
Adult specimens of A. xuezhongi were collected on June 19, 2020, at Chide Village, Pulan County (30.295N 81.142E; 3,934 m), Xizang/Tibet, China. The samples were deposited in liquid nitrogen and then stored at À80 C before DNA was extracted. We removed the metasoma of all samples before sending them to the sequencing company (Berry Genomics, Beijing, China) to reduce potential gut microbe contamination. One individual specimen was used for each of PacBio, Illumina whole genome, Illumina transcriptome, and Hi-C sequencing.
Genomic DNA was extracted using Qiagen Blood and Cell Culture DNA Mini Kits, for PacBio and Illumina whole genome sequencing, respectively. PacBio sequencing was conducted on the PacBio Sequell II platform with an insert size of around 30 kb, using SMRTbell Template Prep Kit 1.0-SPv3. Sequencing libraries of 150-bp paired-end reads with an insert size of 350 bp were generated with a Truseq Nano DNA HT Sample preparation Kit (Illumina USA) on the Illumina NovaSeq 6000 platform. RNA sequencing was extracted by TRIzol Reagent, and the library was generated with TruSeq RNA v2 kit. Hi-C library was conducted on BGI MGISEQ-2000 platform with 150 -bp paired-end reads.
Genome Survey and Assembly BBTools v38.29 (Bushnell 2014) was used for quality control on the raw Illumina data: duplicates removal with clumpity.sh; bbduk.sh to trim areas with quality scores lower than 20, reads shorter than 15 bp or more than 5 Ns, polymer trimming (>10 bp or poly-A/G/C tails), and overlapping paired reads. K-mer analysis and k-distribution were estimated by khist.sh ("21 k-mer"), and Genomescope v2.0 (Vurture et al. 2017) was then performed to estimate the size, heterozygosity, and content of repetitive elements of the A. xuezhongi genome with "-k 21 -p 2 -m 10,000." Flye v2.8.1 (Kolmogorov et al. 2019) was performed to assemble PacBio long reads with minimum overlap between reads of 1,000 bp and one round self-polishing. The Illumina sequences and PacBio assembly were aligned with Minimap2 v2.17 (Li 2018;Danecek et al. 2021). Removing heterozygous regions was performed by Purge_Dups v1.0.1 (Roach et al. 2018) with a cut-off at 60% identified contigs as haploids. Hi-C reads were subjected to quality control and aligned to the assembly by Juicer v.1.6.2 (Durand et al. 2016). Primary contigs were anchored into chromosomes using 3D-DNA (Dudchenko et al. 2017). Juicebox v.1.11.08 (Durand et al. 2016) was used to manually correct possible errors. BlastNlike MMseqs2 v11-e1a1c (Steinegger and Sö ding 2017) were performed to search potential contaminant sequences via the NCBI UniVec and nucleotide (nt) database.
Genome completeness was assessed using BUSCO v3.0.2 (Waterhouse et al. 2018), with the insecta_odb10 database (n ¼ 1,367) as the reference gene set. In addition, to examine the utilization of clean data and completeness of assembly, we mapped clean reads of PacBio long reads and Illumina short reads to the final assembly with Minimap2, then calculated the mapping rate with SAMtools.

Chromosome Synteny
MMsesqs2 v.11 was performed to identify interspecific chromosomal variation among A. xuezhongi, Bombus terrestris, and A. mellifera with an e-value threshold of 1e-10. The genome and annotation files of Bombus terrestris, and A. mellifera via NCBI. TBtools was used to generate a visual diagram for interspecific chromosomal ( fig. 1d). At least eight genes were required to define a collinear block.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.