Chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill Platalea minor

Platalea minor, or black-faced spoonbill (Threskiornithidae), is a wading bird confined to coastal areas in East Asia. Due to habitat destruction, it was classified as globally endangered by the International Union for Conservation of Nature. However, the lack of genomic resources for this species hinders the understanding of its biology and diversity, and the development of conservation measures. Here, we report the first chromosomal-level genome assembly of P. minor using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (1.24 Gb) contains 95.33% of the sequences anchored to 31 pseudomolecules. The genome assembly has high sequence continuity with scaffold length N50 = 53 Mb. We predicted 18,780 protein-coding genes and measured high BUSCO score completeness (97.3%). Finally, we revealed 6,155,417 bi-allelic single nucleotide polymorphisms, accounting for ∼0.5% of the genome. This resource offers new opportunities for studying the black-faced spoonbill and developing conservation measures for this species.


INTRODUCTION
The black-faced spoonbill Platalea minor (Threskiornithidae) (NCBI:txid259913, Figure 1A) is confined to coastal areas in East Asia, including Hong Kong, Macau, Taiwan, Vietnam, North Korea, South Korea, and Japan. The natural habitats of P. minor have been disturbed by human activities and industrialization, leading to a decline in the bird population over the last century [1,2]. With an estimated population of more than 6,000 individuals worldwide, the black-faced spoonbill has been categorised as a globally endangered species by the International Union for Conservation of Nature (IUCN). A quarter of the worldwide population of P. minor can be found in Hong Kong, where it is protected under the Wild Animals Protection Ordinance Cap 170. Genetic methods, including studies on genetic diversity and population structure, have been used to help conserve this species of high conservation value [3,4]. Nevertheless, a reference genome of this species was missing.

Sample collection
Tissue samples of 14 P. minor individuals were collected from the north and northwestern parts of the New Territories, Hong Kong, between February 2015 and February 2020, with help from Kadoorie Farm and Botanic Garden. These samples were stored in 95% ethanol. Details of the sample collection are listed in Table 1.

AMPure PB beads were used to remove short fragments. The final preparation of the library was performed using the Sequel® II binding kit 3.2 (PacBio Ref. No. 102-194-100). In brief, Sequel II primer 3.2 and Sequel II DNA polymerase 2.2 were added to anneal and bind to the SMRTbell templates, respectively. An internal control provided by the kit was also added. Finally, the library was loaded on the PacBio Sequel IIe System at an on-plate concentration of 90 pM with the diffusion loading mode. The sequencing was run in 30-h movies, with 120 min pre-extension. In total, one SMRT cell was used to output high-fidelity (HiFi) reads, and the sequencing data details are listed in Table 1.

Omni-C library preparation and sequencing
An Omni-C library was constructed using the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005), following the manufacturer's protocol. A total of 80 mg of tissue was ground into a powder with liquid nitrogen, transferred to 1 mL 1× PBS, and then subjected to crosslinking with formaldehyde and digestion with the endonuclease DNase I. An aliquot of 2.5 μL lysate was used to quantify the lysate and assess its fragment size distribution using the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape, respectively. The final library was sequenced on an Illumina HiSeq-PE150 platform at Novogene. The details of the sequencing data are listed in Table 1.

Genome assembly and gene model prediction
De novo genome assembly was performed using Hifiasm (RRID:SCR_021069) [5]. Haplotypic duplications were identified and removed using purge_dups (RRID:SCR_021173) based on the depth of HiFi reads [6]. Proximity ligation data from the Omni-C library were used to scaffold the genome assembly with YaHS (RRID:SCR_022965) [7]. Transposable elements (TEs) were annotated using the automated Earl Grey TE annotation pipeline (version 1.2) as previously described [8]. Genome annotation was performed using Braker (v3.0.8) (RRID:SCR_018964) [9] with default parameters. Briefly, the genome was soft-masked using redmask (v0.0.2) [10]. A total of 2,468,534 Aves reference protein sequences were downloaded from NCBI as protein references. A blood RNA-Seq dataset (SRR6650848) [11] was also downloaded from NCBI and aligned to the soft-masked genome using hisat2 (RRID:SCR_015530) [12] to generate the bam file. The protein and bam files were used as input to Braker for genome annotation.
Details of the resequencing data are listed in Table 1.

Data validation and quality control
During DNA extraction and PacBio library preparation, the samples were subjected to quality control with the NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer, the Qubit® Fluorometer, and overnight pulsed-field gel electrophoresis. The Omni-C library was inspected with the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape.
Regarding the genome assembly, the Hifiasm output was searched against the NT database using BLAST, and the resulting output was used as input for Blobtools (v1.1.1, RRID:SCR_017618) [18].

Genome assembly of P. minor
A total of 25.35 Gb of HiFi bases was generated with an average HiFi read length of 9,365 bp and 20× data coverage (Table 1). After scaffolding with 77.79 Gb of Omni-C sequencing data, the assembled genome size was 1.24 Gb in 468 scaffolds, with a scaffold N50 of 53 Mb and L50 of 8 (Tables 1, 3 and 4; Figures 1B and C). The genome size is comparable to those of other bird species in the family Threskiornithidae, which have genome sizes of around 1.0-1.3 Gb according to the data available in NCBI GenBank, such as Theristicus caerulescens (1.20 Gb, GCA_020745775.1), Nipponia nippon (1.31 Gb, GCA_035839065.1), and Mesembrinibis cayennensis (1.19 Gb, GCA_013399675.1). The genome completeness was estimated by BUSCO (RRID:SCR_015008) with a value of 97.3% (aves_odb10) (Table 3; Figure 1B). The GC content was 42.98%. A total of 14,673 gene models were generated with 18,780 predicted protein-coding genes, having a mean coding-sequence length of 516 amino acids and a complete protein BUSCO value of 78.3% (Table 3).
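As a reading aid for the contiguity statistics above, the following minimal sketch shows how scaffold N50 and L50 are conventionally derived from a list of scaffold lengths, and how the reported HiFi depth follows from the yield and the assembly size. The function name and the toy lengths are hypothetical; the sketch is not the tool used in this study to produce Table 3.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp.

    N50 is the length of the scaffold at which the running total of
    lengths, sorted from longest to shortest, first reaches half of the
    assembly size. L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Hypothetical toy lengths for illustration, not the P. minor scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (40, 2)

# Depth implied by the figures reported in the text: 25.35 Gb of HiFi
# bases over a 1.24 Gb assembly.
hifi_bases_gb = 25.35
assembly_size_gb = 1.24
print(round(hifi_bases_gb / assembly_size_gb, 1))  # 20.4
```

Applied to the full scaffold length list, the same function would be expected to return the reported N50 of 53 Mb and L50 of 8, meaning that the eight longest scaffolds together cover at least half of the 1.24 Gb assembly.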

Repeat content
Repeats accounted for 11.94% of the genome, a low level similar to that of other avian genomes [32], with 2.49% unclassified elements. Of the remaining repeats, long interspersed nuclear elements (LINE) were the most abundant (5.10%), followed by long terminal repeats (LTR) (1.62%). In contrast, DNA, short interspersed nuclear elements (SINE), Penelope, and rolling circle elements were only present in low proportions (DNA: 0.63%, SINE: 0.09%, Penelope: 0.06%, rolling circle: 0.02%). A complete catalogue of the repeat content of the genome can be found in Table 5 and Figure 1D.
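The percentages itemised in this paragraph can be tallied against the reported total as a quick consistency check. The sketch below uses only the values quoted in the text; the interpretation of the remainder as repeat categories that appear only in Table 5 is an assumption.

```python
# Repeat classes and genome percentages as quoted in the text.
itemised = {
    "Unclassified": 2.49,
    "LINE": 5.10,
    "LTR": 1.62,
    "DNA": 0.63,
    "SINE": 0.09,
    "Penelope": 0.06,
    "Rolling circle": 0.02,
}
total_repeat_percent = 11.94

# Round to two decimals to match the precision of the reported values.
itemised_sum = round(sum(itemised.values()), 2)
remainder = round(total_repeat_percent - itemised_sum, 2)

print(itemised_sum)  # 10.01
print(remainder)     # 1.93
```

The itemised classes sum to 10.01%, so about 1.93 percentage points of the total are not broken down in the text and are presumably covered by the remaining categories in Table 5.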

Single nucleotide polymorphism sites
A total of 6,046,878 bi-allelic SNPs were called from 13 P. minor individuals, accounting for ∼0.5% of the genome. The mean individual heterozygosity was 0.142%. The lowest individual heterozygosity (0.077%) was close to that of other endangered bird species, such as Pelecanus crispus (0.060%) and Nestor notabilis (0.091%) [33]. The heterozygosity levels of five individuals (0.108% to 0.116%) were comparable to previous reports on spoonbills: black-faced spoonbill (0.101%-0.116%, mean 0.109%, n = 11) and royal spoonbill (0.098%-0.109%, mean 0.105%, n = 9) [4]. The remaining heterozygosity levels observed in this study were below the mean (0.221%) and median (0.213%) of heterozygosity reported from 40 avian species [33]. Signals of inbreeding were observed among the samples, with the inbreeding coefficient (F_IS) ranging from 0.331 to 0.720 (Table 6), providing additional evidence of a recent genetic bottleneck in the black-faced spoonbill population [4]. High levels of F_IS have also been observed in other bird populations suffering from past bottlenecks [34]. These results highlight the need for continuous efforts in monitoring P. minor.
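The two summary quantities in this paragraph can be illustrated with a short sketch. The SNP density uses the SNP count quoted above and the 1.24 Gb assembly size. The inbreeding coefficient is shown in its textbook form, F = 1 − Ho/He (observed over expected heterozygosity); the text does not state which software or estimator produced the values in Table 6, so this formula and the example inputs are assumptions for illustration only.

```python
def snp_density_percent(n_snps, genome_bp):
    """Percentage of genome positions that carry a called SNP."""
    return 100 * n_snps / genome_bp


def inbreeding_coefficient(observed_het, expected_het):
    """Textbook inbreeding coefficient, F = 1 - Ho / He.

    Assumed estimator: the study does not specify how F_IS was computed.
    """
    if expected_het <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1 - observed_het / expected_het


# SNP count from the text over the 1.24 Gb assembly.
print(round(snp_density_percent(6_046_878, 1.24e9), 2))  # 0.49

# Hypothetical heterozygosity values, not taken from Table 6.
print(round(inbreeding_coefficient(0.10, 0.25), 3))  # 0.6
```

Under this definition, F approaches 0 when observed heterozygosity matches the expectation under random mating and approaches 1 as heterozygotes become rarer than expected, which is why the range of 0.331 to 0.720 is read as a signal of inbreeding.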

CONCLUSION AND REUSE POTENTIAL
This study presents the first chromosomal-level genome assembly and single-nucleotide polymorphism sites of the black-faced spoonbill Platalea minor. These are useful and valuable resources for future population genomic studies aimed at better understanding spoonbill species numbers and conservation.

Figure 1. (A) Picture of Platalea minor; (B) Statistics of the genome assembly generated in this study; (C) Hi-C contact map of the assembly visualised using Juicebox (v1.11.08); (D) Repetitive elements distribution.
Then, end polishing, bridge ligation, and proximity ligation were carried out in the crosslinked DNA fragments. Next, crosslink reversal was performed, followed by DNA purification and size selection with SPRIselect™ Beads (Beckman Coulter Product No. B23317). The library preparation was continued with end repair and adapter ligation using the Dovetail™ Library Module for Illumina (Dovetail Cat. No. 21004), followed by DNA purification with SPRIselect™ Beads. The DNA fragments were then captured with Streptavidin Beads, and Universal and Index PCR Primers from the Dovetail™ Primer Set for Illumina (Dovetail Cat. No. 25005) were added to amplify the DNA library. A final size selection was carried out using SPRIselect™ Beads to retain DNA fragments ranging between 350 bp and 1000 bp. The quantity and fragment size distribution of the library were inspected by the Qubit® Fluorometer and the TapeStation D5000 HS ScreenTape, respectively. The final library was sequenced on an Illumina HiSeq-PE150 platform at

Figure 2. Genome assembly quality control and contaminant/cobiont detection. The upper panel shows the BlobPlot of the assembly. Each circle represents a scaffold with its size scaled according to its scaffold length, while the colour of the circle indicates the taxonomic assignment from BLAST similarity search results. The lower panel shows the ReadCovPlot of the assembly, illustrating the proportion of unmapped and mapped sequences in the BLAST similarity search results on the left. The latter is further dissected according to the rank of phylum on the right.

Table 1. Summary of sequencing data.
to remove any non-SMRTbell structures, and a subsequent size-selection step with 35%

Table 4. Scaffold information with a length larger than 1 Mb.

Table 5. Summary of the repetitive elements analysis.

Table 6. Number of SNPs, statistics of heterozygosity and inbreeding coefficient of 13 Platalea minor individuals.

SAMN40731791) and PacBio HiFi (SAMN35152374) data, have been deposited in the NCBI database under the BioProject accession number PRJNA973839. The genome, genomic and repeat annotation files have been deposited and are publicly available in Figshare [35].