A chromosome-level genome assembly of Prosopocoilus inquinatus Westwood, 1848 (Coleoptera: Lucanidae)

Lucanidae (Coleoptera: Scarabaeoidea) are fascinating beetles that exhibit pronounced sexual dimorphism and are widely used as models for studying beetle evolution. However, the lack of high-quality genomes limits our understanding of Lucanidae. Herein, we present a chromosome-level genome assembly of a widespread species, Prosopocoilus inquinatus, generated by combining PacBio HiFi, Illumina, and Hi-C data. The assembly is 649.73 Mb in size, with a scaffold N50 of 59.50 Mb, and 99.6% (647.13 Mb) of the assembly is anchored onto 12 chromosomes. BUSCO analysis of the genome indicates a completeness of 99.6% (1,362 of 1,367 BUSCOs), comprising 98.5% single-copy and 1.1% (15) duplicated BUSCOs. Genome annotation identified 61.41% of the genome as repeat elements and predicted 13,452 protein-coding genes. This high-quality Lucanidae genome provides valuable genomic information for the study of stag beetles.


Background & Summary
The stag beetles (Coleoptera: Lucanidae) are a family in the superfamily Scarabaeoidea, comprising around 1,500 species worldwide 1. Most stag beetle species exhibit significant intraspecific or even interspecific sexual dimorphism, in which males usually bear extremely enlarged mandibles used to fight rivals and attract females in the wild. Thus, stag beetles have received much attention since Linnaeus first described Scarabaeus parallelipipedus from Europe (later transferred to the genus Dorcus) 2. Many lucanid species have been used as models for studies of behavior and functional morphology, and their striking mandibles make them popular pets and valued items in private collections [3][4][5][6][7]. In the wild, most stag beetles are closely associated with forest ecosystems, as their saproxylic larvae usually feed on decaying logs and other litter, such as leaves or fungi [8][9][10].
The geographical distribution and species diversity of Lucanidae are centered on the Indomalayan and Palearctic regions; 33 genera and nearly 400 species are known from China [11][12][13]. Current research on stag beetles focuses primarily on taxonomy and phylogeny, including new species descriptions and mitochondrial genome studies 7,[11][12][13][14]. Our understanding of stag beetle genomes, and especially the availability of high-quality genome assemblies, remains in its infancy. Only one genome, that of Dorcus hopei, has been reported 15. Given the sharply increasing number of genome assemblies for other beetles, more high-quality genome assemblies for stag beetles are needed.
To advance knowledge of the taxonomy, evolution, and ecology of Lucanidae, we present a chromosome-level genome of a widespread species, Prosopocoilus inquinatus (Westwood, 1848), assembled from a combination of PacBio HiFi, Illumina, and Hi-C data. Genome annotation, covering repeats, non-coding RNAs (ncRNAs), and protein-coding genes (PCGs), was also performed and is reported here. The high-quality genome of P. inquinatus provides valuable genomic information for the study of Lucanidae.

Methods
Sample collection and sequencing. A single male P. inquinatus specimen was collected for DNA and RNA sequencing on April 30, 2023, in Motuo County, Xizang, China. Muscle tissue from the pronotum and posterior abdomen was extracted from the specimen and washed in phosphate-buffered saline (PBS) solution for five minutes to eliminate any possible external contaminants. The specimen was then transferred into liquid nitrogen, frozen for at least 20 minutes, and kept at −80 °C for temporary storage until sequencing. Genomic DNA (gDNA) was extracted using the FastPure® Blood/Cell/Tissue/Bacteria DNA Isolation Mini Kit (Vazyme Biotech Co., Ltd, Nanjing, China). High molecular weight (HMW) gDNA was sheared into 15 kb fragments with the Megaruptor™ device (Diagenode, Liege, Belgium) and enriched using AMPure PB beads. A PCR-free short-read library for whole genome sequencing (WGS) was prepared using the TruSeq DNA PCR-free Kit. A PacBio HiFi 15 kb library was prepared using the SMRTbell™ Express Template Prep Kit 2.0, and the resulting library was sequenced on the PacBio Sequel II platform. The Hi-C library was prepared by digesting the extracted DNA with the MboI restriction enzyme. RNA was extracted from the specimen using the TRIzol™ Reagent (Invitrogen, Carlsbad, CA, USA). RNA-seq libraries were constructed using the VAHTS mRNA-seq v2 Library Prep Kit (Vazyme, Nanjing, China). All short-read libraries were sequenced on the Illumina NovaSeq 6000 platform, and long reads of the RNA library were sequenced on the Nanopore PromethION platform. Berry Genomics (Beijing, China) carried out all library construction and sequencing. In total, we obtained 272.73 Gb of sequencing data, comprising 109.10 Gb (152.68×) of Illumina reads, 42.50 Gb (65.41×) of PacBio HiFi reads, 101.03 Gb (155.40×) of Hi-C data, and 20.10 Gb of transcriptome data, the latter consisting of 9.72 Gb of short reads and 10.38 Gb of long reads (RNA-ONT) (Table 1).
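The relationship between sequencing yield and depth can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: it assumes depth is computed as sequenced bases divided by the final assembly size (649.73 Mb), with 1 Gb taken as 1,000 Mb. The names `YIELDS_GB`, `ASSEMBLY_SIZE_MB`, and `depth` are hypothetical. The reported depths may have been calculated against a different genome size estimate, so only the total yield and the HiFi depth are checked here.

```python
# Sequencing yields in gigabases (Gb), taken from the text above.
YIELDS_GB = {
    "Illumina": 109.10,
    "PacBio HiFi": 42.50,
    "Hi-C": 101.03,
    "RNA short reads": 9.72,
    "RNA long reads (ONT)": 10.38,
}

# Assumption: depth is expressed relative to the final assembly size.
ASSEMBLY_SIZE_MB = 649.73


def depth(yield_gb: float, genome_mb: float) -> float:
    """Return sequencing depth (x-fold) as sequenced bases / genome size."""
    return yield_gb * 1000.0 / genome_mb


total_gb = round(sum(YIELDS_GB.values()), 2)
hifi_depth = round(depth(YIELDS_GB["PacBio HiFi"], ASSEMBLY_SIZE_MB), 2)
print(f"total yield: {total_gb} Gb, HiFi depth: {hifi_depth}x")
```

Under these assumptions the total matches the 272.73 Gb stated above, and the HiFi depth matches the reported 65.41×.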
De novo genome assembly. Raw Illumina genomic reads generated for the genome survey were quality-controlled using fastp v0.23.2 16 to remove adapters, duplicates, and low-quality reads.
Raw PacBio HiFi reads were assembled into a primary assembly using Hifiasm v0.19.8 17. The raw HiFi reads were then mapped back to the primary assembly using Minimap2 v2.24 18 to calculate the mapping rate. The primary assembly was then polished for one round with NextPolish2 v0.2.0 19.
Raw Hi-C data were quality-controlled and deduplicated using Chromap v0.2.5-r473 20. The clean Hi-C data were then aligned to the primary assembly for haplotype identification and separation. Contigs were anchored and oriented onto chromosomes using YaHS v1.2 21 and Juicer v1.6.2 22. The resulting scaffolds were reviewed, and assembly errors were corrected manually in Juicebox v1.11.08 23. To distinguish autosomes from sex chromosomes, the raw HiFi reads were remapped to the final assembly using Minimap2, and the length of each chromosome was determined. Chromosome coverage was then calculated using SAMtools v1.9 24 by dividing the amount of mapped data by the chromosome length. In addition, the X chromosome was identified by chromosome synteny with the model beetle Tribolium castaneum and the related species Trypoxylus dichotomus, exploiting the relatively conserved nature of the insect X chromosome 25. Syntenic blocks were identified using MCScanX 26 and TBtools 27. In summary, the X chromosome was identified by its coverage, which was around half that of the other chromosomes (Table 3), and confirmed by its high synteny with the X chromosomes of other beetles (Fig. 2).
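The coverage-based criterion for the X chromosome can be illustrated with a minimal Python sketch. It assumes that a chromosome is a putative X when its read depth is roughly half the median depth across all chromosomes, as expected for a single-copy X in a male specimen. The function name `flag_putative_x`, the 0.4 to 0.65 ratio window, and the example depths for every chromosome other than chromosome 12 are illustrative assumptions, not values or code from this study.

```python
from statistics import median


def flag_putative_x(coverage, low=0.4, high=0.65):
    """Return chromosomes whose depth is roughly half the median depth.

    coverage maps chromosome name -> mean read depth. The ratio window
    (low, high) is an assumed tolerance around the expected value of 0.5.
    """
    reference = median(coverage.values())
    return [name for name, cov in coverage.items() if low <= cov / reference <= high]


# Hypothetical long-read depths: only the chromosome 12 value (37.02) is
# reported in this paper; the other three values are placeholders.
example = {"chr1": 66.0, "chr2": 65.1, "chr3": 64.8, "chr12": 37.02}
print(flag_putative_x(example))
```

Using the median as the reference keeps the estimate robust to the low-coverage chromosome itself, which would pull a simple mean downward.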
To ensure a high-quality assembly, potential contaminants were detected and removed using local software and NCBI screening. We focused on human, bacterial, viral, and plant sequences. Possible contaminants were detected using MMseqs2 v11 28, which performs BLASTN-like searches, against the NCBI nucleotide and UniVec databases. Potential vector contaminants were additionally identified by blastn (BLAST+ v2.11.0 29) against the UniVec database. Sequences with over 90% hits in the databases above were considered contaminants, and sequences with over 80% hits were rechecked by online BLASTN analysis against the NCBI nucleotide database. The final genome assembly was also uploaded to NCBI to detect and eliminate contaminants. According to the vector search, no prominent contaminant was found in our assembly, reflecting the quality of sample preparation and sequencing.
The final P. inquinatus genome assembly reached the chromosomal level with a total size of 649.73 Mb, consisting of 174 scaffolds and 195 contigs (Table 2). The scaffold and contig N50 lengths reached 59.50 Mb and 26.36 Mb, respectively. The GC content of the P. inquinatus genome was 35.67%. Most contigs (612.12 Mb, 94.21%) were firmly anchored and oriented onto 12 chromosomes. Coverage was computed for all chromosomes (Table 3). Among them, chromosome 12 has a coverage of 37.02× for long-read sequencing and 88.58× for short-read sequencing, around half that of the other chromosomes (Table 3). Hence, chromosome 12 was considered the X chromosome in P. inquinatus. The assembly comprises 11 autosomes and the X chromosome, with individual lengths ranging from 17.22 to 75.68 Mb (Tables 2, 3; Fig. 1). Compared with the assembly of the related species Trypoxylus dichotomus (Scarabaeidae) (636.37 Mb genome size and 35.11% GC content), P. inquinatus exhibits a larger genome and a higher GC content (Table 4).
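For readers less familiar with the N50 statistic used above, the following Python sketch shows one common way to compute it. The N50 is the length L such that sequences of length at least L together cover at least half of the total assembly. The function name `n50` and the toy lengths are illustrative assumptions; they are not the scaffold lengths of this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy lengths (in Mb), not the real scaffold lengths.
toy = [10, 8, 6, 4, 2]
print(n50(toy))
```

Applied to scaffold lengths this gives the scaffold N50, and applied to contig lengths it gives the contig N50.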
This specific repeat library was combined with RepBase-20230909 32 to form the custom library. Repeat elements in the P. inquinatus genome were identified and masked with RepeatMasker v4.1.4 33.

Data Records
The raw sequencing data and genome assembly of Prosopocoilus inquinatus have been deposited at the National Center for Biotechnology Information (NCBI). The Illumina, PacBio, Hi-C, transcriptome short-read, and transcriptome long-read data can be found under identification numbers SRR27127825 57, SRR27243604 58, SRR27127828 59, SRR27127827 60, and SRR27127826 61, respectively, under the BioProject accession number PRJNA1015594 and BioSample accession number SAMN37358649. The assembled genome has been deposited in GenBank at NCBI under accession number GCA_036172665.1 62. The annotation results for repeated sequences, gene structure, and functional prediction have been deposited in the Figshare database 63.

Technical Validation
Berry Genomics (Beijing, China) carried out the DNA extraction. DNA concentration was measured with both NanoDrop and Qubit during the extraction process (Table 7). Our extraction yielded a concentration of 86 ng/μl by NanoDrop and 44.65 ng/μl by Qubit. The 260/280 and 260/230 absorbance ratios were 1.78 and 1.85, respectively. Two methods were used to evaluate the quality of the genome assembly. First, BUSCO v5.4.4 64 was used to assess assembly completeness against the reference Insecta gene set (n = 1,367) in euk_genome_met mode. The final genome assembly showed a BUSCO completeness of 99.6% (1,362 BUSCOs), comprising 98.5% single-copy and 15 (1.1%) duplicated BUSCOs, with 1 (0.1%) fragmented and 4 (0.3%) missing BUSCOs. Second, Merqury v1.3 65 was used to identify possible sequence errors in the de novo assembly, based on efficient k-mer set operations and QV score calculation. The k-mer completeness of the assembly is 94.2%, and the QV score is 46.60. The k-mer completeness and QV score reflect high base-level accuracy and, together with the BUSCO results, indicate the high completeness and accuracy of our genome assembly. The annotation was also validated with BUSCO in protein mode against the reference Insecta gene set (n = 1,367). The annotated gene set exhibited a BUSCO completeness of 99.6%, including 1,079 (78.9%) single-copy BUSCOs, 283 (20.7%) duplicated BUSCOs, 1 (0.1%) fragmented BUSCO, and 4 (0.3%) missing BUSCOs. The mapping rate was also measured to assess assembly accuracy. The mapping rates for PacBio, Illumina, RNA short reads, and RNA long reads were 99.6%, 96.51%, 96.93%, and 97.59%, respectively. Taken together, these evaluations reflect the high quality of the genome assembly.
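Two of the validation metrics above can be reproduced with simple arithmetic, sketched here in Python under stated assumptions. BUSCO percentages are taken as category counts divided by the size of the reference gene set, and the QV score is treated as a Phred-scaled value, so that the per-base error rate is 10^(−QV/10). The function names `busco_percentages` and `qv_to_error_rate` are hypothetical helpers, not part of BUSCO or Merqury.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of the whole gene set."""
    total = single + duplicated + fragmented + missing
    counts = {
        "complete": single + duplicated,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


# Counts reported above for the annotated protein set (n = 1,367).
annotation = busco_percentages(single=1079, duplicated=283, fragmented=1, missing=4)
print(annotation)
print(f"QV 46.60 corresponds to an error rate of {qv_to_error_rate(46.60):.2e}")
```

With the annotation counts, the complete fraction evaluates to 99.6%, in agreement with the reported value. A QV of 46.60 corresponds to roughly two errors per 100,000 bases.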

Fig. 1 Genome-wide chromosomal heatmap of Prosopocoilus inquinatus, with each chromosome and contig framed in blue and green, respectively. "ChrX" represents the sex chromosome.

Fig. 3 Genome characteristics of Prosopocoilus inquinatus. Circos plot showing the genomic characters of P. inquinatus from outer to inner: chromosome length (Chr) (Mb), the density of GC content (GC), the density of protein-coding genes (GENE), the density of TEs (DNA, SINE, LINE, and LTR), and simple repeats (Simple). Densities are calculated in sliding windows of 10 kb.

Table 1. Statistics of the sequencing data generated for Prosopocoilus inquinatus.
, respectively, under the BioProject accession number PRJNA1015594 and BioSample accession number SAMN37358649.The assembled genome has been deposited

Table 5. Species taxonomic information and accession code of all samples used in this study.

Table 6. Summary statistics of genome annotations in the Prosopocoilus inquinatus genome.