The genome sequence of the Mauritius parakeet, Alexandrinus eques (formerly Psittacula eques) (A. Newton & E. Newton, 1876)

We present a genome assembly from an individual male Alexandrinus eques, formerly Psittacula eques (the Mauritius Parakeet; Chordata; Aves; Psittaciformes; Psittacidae). The genome sequence is 1203.8 megabases in span. Most of the assembly is scaffolded into 35 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 18.86 kilobases in length.


Background
The Mauritius Parakeet (Alexandrinus eques; formerly Psittacula eques) is the only surviving endemic species of parrot in Mauritius and the Mascarenes. Characterised by its bright green plumage and red and black markings around the beak and eyes (Figure 1A), this parakeet feeds predominantly on the fruits, flowers, and leaves of native forest plants. Mauritius Parakeets nest in natural cavities within mature trees, where they lay two to four eggs each breeding season, which runs mainly from September to January.
The population of the Mauritius Parakeet plummeted due to habitat loss and invasive species, dwindling to just 20 birds by 1986 and facing imminent extinction (Jones, 1987; Tatayah et al., 2007). Through three decades of dedicated conservation efforts, including captive breeding, reintroductions, nest management, predator control, supplementary feeding, and habitat restoration, the population rebounded (Jones, 2010; Jones et al., 2013; Jones et al., 1998) (Figure 1B). By 2019, it reached an estimated 750 birds, leading to its downlisting in the IUCN Red List from Critically Endangered to Vulnerable (BirdLife International, 2019).
In 2005, Psittacine Beak and Feather Disease (PBFD), a condition characterised by feather dystrophy and immunosuppression, was detected in the population. Its causative agent, Beak and Feather Disease Virus (BFDV), is one of the most common infections of parrots (Kundu et al., 2012; Ritchie et al., 1989). Although PBFD can be fatal, with juveniles being particularly susceptible (Todd, 2000), the species continued to recover despite significant sub-lethal effects detected in the free-living population (Tollington et al., 2015).
During the decline and recovery of the species, there was a significant loss of genetic diversity, including reduced heterozygosity and allelic richness at microsatellite loci (Tollington et al., 2013). Initial genetic structure showed differentiation between subpopulations, which has diminished as their size and range expanded due to intensive conservation efforts (Raisin et al., 2012; Tollington et al., 2013). The ongoing conservation management includes supplementary feeding, which boosts reproductive fitness but may increase BFDV transmission (Fogell et al., 2019; Fogell et al., 2021; Tollington et al., 2019). Continuous monitoring of genetic diversity, viral prevalence, productivity, and population viability is in place. A vast archive of biological samples and decades of fitness data position the Mauritius Parakeet as an ideal model for studying genomic changes during population recovery and during outbreaks of emergent infectious diseases (EID).
Currently, hundreds of whole genomes are being re-sequenced from historical (pre-1900), recent (1990-2000) and contemporary samples to address these questions. These research efforts are part of a collaboration between several universities (UK: University of Kent and University of East Anglia; Denmark: University of Copenhagen), the Government of Mauritius' National Parks and Conservation Service (NPCS) and the Mauritian Wildlife Foundation (MWF; a conservation NGO in Mauritius). The conservation monitoring and management of the Mauritius Parakeet is led by the MWF in collaboration with the NPCS under guidance from the university partners. Recent conservation actions have also been implemented by Ebony Forest Reserve (a conservation group).

Genome sequence report
The genome was sequenced from a blood sample from a male Alexandrinus eques collected from Black River Gorges, Mauritius (-20.39, 57.45). A total of 79-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 41 missing joins or mis-joins, reducing the scaffold number by 13.13% and increasing the scaffold N50 by 1.02%.
The final assembly has a total length of 1203.8 Mb in 171 sequence scaffolds with a scaffold N50 of 107.0 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (97.61%) of the assembly sequence was assigned to 35 chromosomal-level scaffolds, representing 34 autosomes and the Z sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The Z chromosome was identified based on the Hi-C signal from a female sample (the PacBio HiFi data used for de novo assembly were derived from a male). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission. The estimated Quality Value (QV) of the final assembly is 65.6 with k-mer completeness of 100.0%, and the assembly has a BUSCO v5.4.3 completeness of 97.1% (single = 96.8%, duplicated = 0.3%), using the aves_odb10 reference set (n = 8,338).
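To make the reported QV easier to interpret, the short sketch below converts a quality value into a per-base error probability and an expected number of erroneous bases. It assumes that the QV is Phred-scaled (QV = -10 log10 of the error probability), which is the usual convention for consensus quality estimates but is not stated explicitly in the text. The function and variable names are illustrative only.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: Phred-scaled QV for a per-base error probability."""
    return -10 * math.log10(error_rate)


# Values reported in this note: QV 65.6 and an assembly span of 1,203,831,919 bp.
# Treating the QV as Phred-scaled is an assumption of this sketch.
REPORTED_QV = 65.6
ASSEMBLY_SPAN_BP = 1_203_831_919

error_rate = qv_to_error_rate(REPORTED_QV)
expected_errors = error_rate * ASSEMBLY_SPAN_BP

print(f"per-base error probability: {error_rate:.2e}")
print(f"expected erroneous bases:   {expected_errors:.0f}")
```

Under this assumption, a QV of 65.6 corresponds to roughly three errors per ten million bases, or a few hundred bases across the whole assembly.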

Sample acquisition and nucleic acid extraction
Blood sampling of Mauritius Parakeets is routinely conducted by the MWF and university researchers, overseen by the International Zoo Veterinary Group (IZVG). Samples are taken

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute (WSI) on Pacific Biosciences Sequel IIe and Revio (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from blood from bPsiEch1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
The sanger-tol/blobtoolkit pipeline is a Nextflow port of the previous Snakemake BlobToolKit pipeline (Challis et al., 2020). It aligns the PacBio reads with minimap2 (Li, 2018) and SAMtools, and generates coverage tracks for regions of fixed size. In parallel, it queries the GoaT database (Challis et al., 2023) to identify all matching BUSCO lineages to run BUSCO (Manni et al., 2021). For the three domain-level BUSCO lineages, the pipeline aligns the BUSCO genes to the UniProt Reference Proteomes database (Bateman et al., 2023) with DIAMOND (Buchfink et al., 2021) blastp. The genome is also split into chunks according to the density of the BUSCO genes from the closest taxonomic lineage, and each chunk is aligned to the UniProt Reference Proteomes database with DIAMOND blastx. Genome sequences that have no hit are then chunked with seqtk and aligned to the NT database with blastn (Altschul et al., 1990). All of these outputs are combined with the blobtools suite into a blobdir for visualisation.
Table 3 contains a list of relevant software tool versions and sources.
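The coverage-track step described above, in which read alignments are summarised over regions of fixed size, can be illustrated with a minimal sketch. This is not the pipeline's own code, which relies on the dedicated tools listed in Table 3. The function name `windowed_coverage`, the 0-based half-open interval convention and the toy input are all assumptions made for illustration.

```python
def windowed_coverage(alignments, seq_lengths, window=1000):
    """Mean read depth per fixed-size window.

    alignments  : iterable of (seq_name, start, end), 0-based, half-open
    seq_lengths : dict mapping seq_name -> sequence length in bp
    window      : window size in bp (hypothetical default)

    Returns a dict mapping seq_name -> list of (win_start, win_end, mean_depth).
    """
    # One counter of aligned bases per window; the last window may be shorter.
    aligned_bases = {
        name: [0] * ((length + window - 1) // window)
        for name, length in seq_lengths.items()
    }

    for name, start, end in alignments:
        length = seq_lengths[name]
        pos, end = max(0, start), min(end, length)
        # Split each alignment at window boundaries and add the overlap.
        while pos < end:
            idx = pos // window
            piece_end = min((idx + 1) * window, end)
            aligned_bases[name][idx] += piece_end - pos
            pos = piece_end

    tracks = {}
    for name, counts in aligned_bases.items():
        length = seq_lengths[name]
        track = []
        for idx, bases in enumerate(counts):
            win_start = idx * window
            win_end = min(win_start + window, length)
            track.append((win_start, win_end, bases / (win_end - win_start)))
        tracks[name] = track
    return tracks


# Toy example: one 2.5 kb scaffold and two overlapping reads.
toy_tracks = windowed_coverage(
    alignments=[("scaffold_1", 0, 1500), ("scaffold_1", 500, 2500)],
    seq_lengths={"scaffold_1": 2500},
    window=1000,
)
for win_start, win_end, depth in toy_tracks["scaffold_1"]:
    print(win_start, win_end, depth)
```

Dividing by the actual window length, rather than the nominal window size, keeps the mean depth of the final, shorter window comparable with the others.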

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Tree of Life collaborator. The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible.
The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Jana Wold
University of Canterbury, Christchurch, New Zealand
Here, Morales et al. provide a high-quality genome assembly for a threatened parrot. This assembly will become a foundational genomic resource for investigation into the genomic diversity of the Mauritius Parakeet. The assembly, and the RNA-seq informed annotation, may be a powerful tool in the continued conservation efforts for this species.
The methods implemented here are well considered and appropriate. I have full confidence that the assembly will be a useful resource going forward.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Conservation genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
although not identical to its sister species; maybe worth mentioning somewhere in the manuscript
Note that on GenBank, the genome project is for Psittacula echo, which is the subspecies endemic to Mauritius. It may be worth homogenizing the names between GenBank and the present manuscript, either by using Alexandrinus eques echo throughout the manuscript and mentioning that the extinct nominate subspecies is found on Reunion Island, or by changing the name on GenBank.
As part of a genome description publication, it may be useful for further comparative analyses to have:
○ The percentage of repeat sequences
Reviewer Expertise: Molecular Systematics, Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. The fall and rise of the Mauritius Parakeet. (A) A male Mauritius Parakeet (Alexandrinus eques; formerly Psittacula eques; photo credit Jacques de Speville). (B) Demographic trajectory over time (bottleneck and recovery). The line represents the number of known breeding pairs from the monitoring programme. Total population census sizes are estimated to include non-breeding individuals and subadults. The bars represent the number of captive-bred individuals released into the free-living population.

Figure 2. Genome assembly of Alexandrinus eques: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 1,203,831,919 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (168,074,981 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (106,987,384 and 12,426,226 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the aves_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Psittacula%20echo/dataset/CAUJLS01/snail.
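The N50 and N90 values and the size-ordered bins described in this caption can be related to each other with a small sketch. It assumes the standard definition of Nx (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly) and one plausible reading of the binning, in which each bin reports the scaffold reached at that cumulative fraction of the span. The function names and the toy scaffold lengths are hypothetical, and this is not BlobToolKit's own implementation.

```python
def nx(lengths, x):
    """Scaffold Nx: length of the scaffold at which the cumulative sum of
    size-ordered scaffold lengths first reaches x% of the total span."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Cross-multiplied form of running / total >= x / 100.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


def size_ordered_bins(lengths, n_bins=1000):
    """For each of n_bins equal slices of the total span, report the length
    of the scaffold reached at the end of that slice (longest first)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    total = sum(ordered)
    bins, running, idx = [], 0, 0
    for b in range(1, n_bins + 1):
        # Advance until the cumulative length covers b / n_bins of the span.
        while (running + ordered[idx]) * n_bins < b * total:
            running += ordered[idx]
            idx += 1
        bins.append(ordered[idx])
    return bins


# Toy scaffold lengths (arbitrary units), not the real assembly.
toy_lengths = [50, 30, 10, 5, 5]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)
toy_bins = size_ordered_bins(toy_lengths, n_bins=10)
print(toy_n50, toy_n90, toy_bins)
```

With this reading, the value in the bin at the halfway point of the circumference equals the N50, and the value in the bin at 90% equals the N90.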

Figure 3. Genome assembly of Alexandrinus eques (formerly Psittacula eques), bPsiEch3.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Psittacula%20echo/dataset/CAUJLS01/blob.

Figure 5. Genome assembly of Alexandrinus eques (formerly Psittacula eques), bPsiEch3.1: Hi-C contact map of the bPsiEch3.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=fHq_ahaYS1eLkq54w6IhGA.

Reviewer Report 05 September 2024
https://doi.org/10.21956/wellcomeopenres.24881.r94861
© 2024 Fuchs J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Jérôme Fuchs
Institut de Systématique Evolution Biodiversité, Muséum national d'Histoire naturelle CNRS SU EPHE UA, Paris, France
The following manuscript describes a chromosome-level assembly of the endemic Mauritius Parakeet Alexandrinus eques, a species that went through a very strong population bottleneck.
Maybe it is worth mentioning that, as currently understood, the Mauritius Parakeet is nested within or very closely related to the Ring-necked Parakeet (A. krameri).
I have very few comments that the authors may consider.
"Most of the assembly sequence was assigned to 35 pseudomolecules, representing 34 autosomes and the Z sex chromosome."
2n = 70 is coherent with what is known for Alexandrinus species (Ray-Chaudhuri et al. 1969),

○ Number of protein coding genes (since RNA was sequenced)
○ and possibly: Number of SNPs in the genome and per chromosome
The two former descriptive statistics are routinely mentioned in genome papers.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 2. Chromosomal pseudomolecules in the genome assembly of Alexandrinus eques (formerly Psittacula eques), bPsiEch3.
INSDC accession
RNA was extracted from the bPsiEch3 blood sample in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).

Table 1.
Each transfer of samples is undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Tree of Life collaborator, Genome Research Limited (operating as the Wellcome Sanger Institute) and in some circumstances other Tree of Life collaborators.
annotated using available RNA-Seq data and presented through the Ensembl pipeline at the European Bioinformatics Institute. Raw data and assembly accession identifiers are reported in