Chromosome-level Genomes Reveal the Genetic Basis of Descending Dysploidy and Sex Determination in Morus Plants

Multiple plant lineages have independently evolved sex chromosomes and variable karyotypes to maintain their sessile lifestyles through constant biological innovation. Morus notabilis, a dioecious mulberry species, has the fewest chromosomes among Morus spp., but the genetic basis of sex determination and karyotype evolution in this species has not been identified. In this study, three high-quality genome assemblies were generated for Morus spp. [including dioecious M. notabilis (male and female) and Morus yunnanensis (female)] with genome sizes of 301–329 Mb and were grouped into six pseudochromosomes. Using a combination of genomic approaches, we found that the putative ancestral karyotype of Morus species was close to 14 protochromosomes, and that several chromosome fusion events resulted in descending dysploidy (2n = 2x = 12). We also characterized a ∼ 6.2-Mb sex-determining region on chromosome 3. Four potential male-specific genes, a partially duplicated DNA helicase gene (named MSDH) and three Ty3_Gypsy long terminal repeat retrotransposons (named MSTG1/2/3), were identified in the Y-linked area and considered to be strong candidate genes for sex determination or differentiation. Population genomic analysis showed that Guangdong accessions in China were genetically similar to Japanese accessions of mulberry. In addition, genomic areas containing selective sweeps that distinguish domesticated mulberry from wild populations in terms of flowering and disease resistance were identified. Our study provides an important genetic resource for sex identification research and molecular breeding in mulberry.


Introduction
Mulberry (Morus spp.) is a major member of the family Moraceae, which also includes various other important plant species, such as banyan tree and paper mulberry. As one of the earliest domesticated plants, mulberry is considered an ''oriental sacred wood", because it is not only a good food source for rearing silkworms but has also been utilized in a number of other ways, including as a fruit, for landscaping, in medicine, in ecological protection, and as forage for animal production [1][2][3]. The draft genome sequence assembly of male Morus notabilis was published in 2013 based on short sequencing reads [4]. In 2020, genome sequencing research revealed two diploid karyotypes in mulberry species [5]. Congeneric species commonly display varying chromosome counts due to two opposing trends: 1) an increased chromosome copy number as a result of polyploidy [whole-genome duplication (WGD)] and 2) a reduced basic chromosome number via structural rearrangements (descending dysploidy) [6,7]. These processes have long been considered important in speciation due to the shock of chromosome number discrepancies in reproductive isolation [8]. Multi-karyotype evolutionary models for mulberry are limited due to the existence of only a few comparative analyses of multiple chromosome-scale genomes. Without considering genomic synteny and large-scale rearrangement events, misleading conclusions on pathway evolution have likely been obtained from evolutionary and comparative analyses of genes for important agronomic traits. Similar to M. notabilis, Morus yunnanensis is also commonly found in Southwest China due to its unique altitude and humidity requirements [9]. The phylogenetic relationship between wild M. notabilis and M. yunnanensis is unknown. In addition, our previous study revealed the population structure of cultivated mulberry but lacked sampling of gene pools within wild mulberry and in the Guangdong region, leaving gaps in our knowledge and raising important questions about the evolutionary history of domesticated perennial mulberry [5].
Mulberry species are either dioecious or hermaphroditic, thus providing abundant resources for research on plant sex determination [10,11]. The sex determination system of mulberry is similar to that in humans (XY type) [12]. Specific DNA markers associated with the sex determination of male flowers in mulberry have been identified using restriction site-associated DNA sequencing [13]. However, the genotypes of these identified markers were not correlated with sex determination among cultivars, and genomic evidence remains unavailable. Previously, clonal propagation was widely used in mulberry propagation, and consequently, sex determination has been little studied in this genus. Moreover, the key genetic basis of sex chromosomes in mulberry has yet to be discovered.
With the development of the fruit mulberry industry, cultivation of mulberry cultivars with stable sex characteristics has become increasingly important to ensure high fruit production. Knowledge of sex-specific gene expression is a prerequisite for developing an effective breeding program for mulberry (especially fruit mulberry). Dissecting the genetic mechanisms underlying dioecy (i.e., separate male and female trees) is crucial for understanding the evolution of this widespread reproductive strategy. M. notabilis always shows dioecious, which is not affected by the environment, making it an ideal system for studying the genetic basis of sex determination and sex chromosome evolution.
In this study, we performed de novo assembly of two chromosome-level genomes of female M. yunnanensis and female M. notabilis to improve the previous draft sequence of male M. notabilis, and revealed the evolutionary genomic basis of descending dysploidy and sex determination. Compared with our previously reported Morus alba genome, the genomes of the two wild species were more diverse in terms of chromosome number and environmental adaptation level, providing new insights into karyotype evolution in Morus. Through a multidata combination analysis, we not only identified the putative sex-determining region (SDR) and candidate loci responsible for sex determination but also provided a clear framework for broader studies of sex determination in mulberry species in the future. Population genomic analyses further revealed the phylogenetic relationships within the gene pool of different Morus species and the genetic architecture of domestication traits of cultivated mulberry. These sequencing results will provide a valuable resource for future mulberry research and breeding programs.

Assembly and annotation of three high-quality Morus genomes
To assemble the genomes of male and female individuals of dioecious M. notabilis and female M. yunnanensis, we used a combination of sequencing technologies including Pacific Biosciences (PacBio) long reads, Illumina paired-end reads, and high-throughput chromosome conformation capture (Hi-C) sequencing reads. Based on k-mer counting, the estimated genome sizes of M. notabilis and M. yunnanensis were approximately 280 Mb and 296 Mb, respectively (Figure S1A-C). The genome assemblies of female M. notabilis and female M. yunnanensis were highly contiguous, with 96.4% and 95.1% of genome contigs anchored to chromosomes by Hi-C scaffolding, respectively (with genome assembly sizes of 301 Mb and 313 Mb and contig N50 values of 2.7 Mb and 6.5 Mb, respectively). Moreover, Hi-C scaffolding revealed that 91.8% of the original male M. notabilis sequences were anchored on six pseudochromosomes with a contig N50 of 6.8 Mb, indicating considerably improved assembly contiguity compared with the original version (Figure S1D-F; Table S1). The coverage of the genome assembly obtained here was evaluated using high-coverage Illumina sequencing data and transcriptome reads mapped against the assembled genome (Table S2). The scores of the long terminal repeatretrotransposon (LTR-RT) assembly index [14] for male M. notabilis, female M. notabilis, and female M. yunnanensis were 19.98, 20.48, and 21.25, respectively, and the Benchmarking Universal Single-Copy Orthologs (BUSCO) values were 93.5%, 93.6%, and 94.1%, respectively. These results imply that the three genomes are of high quality ( Table 1).

Repetitive elements drive genome expansion of mulberry species
Synteny analysis of two Morus genomes with grape (Vitis vinifera) did not reveal WGD after the triplication event shared by eudicots, similar to the pattern previously observed in the M. alba genome [5] ( Figure S2). Furthermore, the collinearities of the genomes of M. notabilis, M. yunnanensis, Ficus microcarpa, and Ficus hispida revealed a high frequency of chromosomal rearrangement events, confirming that M. notabilis and M. yunnanensis were diploid (2n = 2x = 12) ( Figure 1A and B). Phylogenetic analysis of eight angiosperms using singlecopy orthologues identified by OrthoFinder [15] revealed that M. notabilis and M. yunnanensis diverged only approximately 3 million years ago (MYA) ( Figure S3), indicating their close phylogenetic relationship. Combined with similar histological morphologies ( Figure S4), these results suggest that M. notabilis and M. yunnanensis belonged to the same subgenus. Moreover, using a combination of several wild mulberry accessions and Ficus as the outgroup, we constructed an evolutionary tree with single nucleotide polymorphisms (SNPs) of fourfold degenerate (neutral) evolving sites based on the maximum-likelihood method in IQ-TREE [16]. The evolutionary distances and divergence time between M. notabilis and domesticated M. alba were far greater than those observed between other wild mulberry trees ( Figure 1C), suggesting that these two species may belong to parallel evolutionary clades.
Whole-genome alignment analysis showed that 30.2 Mb (9% of the assembled genome) covering 870 genes in M. notabilis and 51.4 Mb (15%) covering 948 genes in M. alba were segmental duplications (SDs) ( Figure 1D; Table S3), indicating that SD was a major contributor to genome-size expansion. Genes overlapping within the female M. notabilis SD regions were significantly enriched in fatty acid degradation biological processes (''ath00071", hypergeometric test, adjusted P value < 8.10 Â 10 À5 ) ( Figure S5). We identified 54% of the male M. notabilis genome, 49% of the female M. notabilis genome, and 51% of the female M. yunnanensis genome as transposable elements (TEs) ( Figure 1D; Table S3).  Genome-wide proliferation of intact LTR-RTs from M. yunnanensis and female and male M. notabilis occurred approximately 1.5 MYA. Moreover, these LTR-RTs were dated to more recent than M. alba formation ( Figure 1D; Table S3). Therefore, we concluded the occurrence of a high level of divergence in SDs and TEs between M. notabilis and M. alba. We also identified an average of 13,179,790 SNPs between the M. notabilis and M. yunnanensis genomes in one-to-one aligned regions (Table S4). The total length of small insertions and deletions (50-500 bp) in those one-to-one aligned regions was 644,198 bp, which accounted for approximately 0.21% of the M. notabilis genome (Table S4). Notably, fewer SNPs, small insertions, and insertion-deletion mutations (indels) were observed between the M. notabilis and M. yunnanensis genomes than between the M. notabilis and M. alba genomes, suggesting a strong relationship between the two wild-grown plants, a finding consistent with the results from the phylogenetic tree described above. Two sequences of 31.91 Mb and 125.95 Mb were affected by structural variants (SVs) in the two comparisons. In particular, most (71%) of the SV sequences in the comparison between M. notabilis and M. alba showed repeat expansion and contraction.

Chromosomal fusion in mulberry genomes is associated with adaptive evolution
Chromosomal evolution is associated with genome size, gene family evolution, and speciation. Genome structural changes led to the present-day karyotypes of M. notabilis (2n = 12) and M. alba (2n = 28). Using the available genomes of M. notabilis, M. alba, Ficus microcarpa, Ficus hispida, and Cannabis sativa (as the outgroup), we reconstructed the ancestral karyotype of the Moraceae (n = 21) (Figure 2A), which corresponded to the ancestral eudicot karyotype (AEK) [17]. Compared with the putative ancestral chromosomes, 22 and 24 large syntenic blocks were identified in M. notabilis and M. alba, respectively (Table S5), which enabled us to deduce the arrangements of ancestral chromosome segments in mulberry. Karyotyping of M. alba revealed that at least ten major chromosomal fusions (CFUs) and one chromosomal fission of 21 chromosomes of the paleohexaploid ancestor may have been involved. Approximately half of the chromosomes of M. alba were found to have descended from a single ancient chromosome, a finding similar to the results reported for paper mulberry [18]. Moreover, pseudochromosomes of M. notabilis were constructed from the ancestor karyotype via at least 18 CFU events and one chromosomal fission event, resulting in a substantial decrease in the haploid chromosome number from 21 to 6. Furthermore, MnChr1, MnChr2, MnChr3, and MnChr4 were each derived from at least two M. alba chromosomes via complex translocations. The high-quality genome assemblies revealed that the karyotype of M. alba is more similar than that of M. notabilis to the ancestral karyotype, whereas the karyotype of M. notabilis was derived from more CFUs. We performed repeat element annotation analysis to investigate the genome features of evolutionary fusion regions (EFRs) and found that the EFRs in the M. notabilis genomes were mainly enriched with LTR elements (50.29%) ( Table S6).
We identified 26 genes located in six CFU regions (Figure 2B). The Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis confirmed that the genes in the rearranged regions of M. notabilis and M. alba chromosomes were mainly enriched in functions related to photosynthesis (''ath00195", hypergeometric test, adjusted P value < 6.35 Â 10 À10 ) and metabolic pathways (''ath01100", hypergeometric test, adjusted P value < 1.55 Â 10 À4 ) ( Figure S6A). The Gene Ontology (GO) annotation results showed that the rearranged regions were mainly involved in adenosine diphosphate-binding sites (''GO: 0043531", hypergeometric test, adjusted P value < 4.34 Â 10 À9 ) and positive regulation of gene expression (''GO: 0010628", hypergeometric test, adjusted P value < 2.13 Â 10 À5 ) ( Figure S6B), which prompted us to investigate the function of chromosomal shuffling loci in Morus species. We then examined the transcriptome expression levels of these genes in M. notabilis, M. yunnanensis, and the cultivars of M. alba ( Figure 2C; Table S7). Most of these fusion genes were differentially expressed in the two karyotypes of mulberry. These findings provide novel insights into chromosome evolution and link chromosomal rearrangements to the evolution of functional genes.

Location of SDR and identification of candidate sex-determining genes
Short Illumina reads of four male and four female plants of M. notabilis were subsequently catalogued into 40-bp k-mers in different categories using the genome-wide classification method to identify the putative SDR and sex locus regions [19,20]. Sex-specific k-mers (detected in all samples of one sex, but not in the other sex) were obtained, including 333,348 male-specific k-mers (MSKs) and 1664 femalespecific k-mers (FSKs). The higher MSK count in males was consistent with the results obtained for persimmon and ginkgo [19,20], suggesting unique genomic regions in male individuals. This result suggested that the sex determination system of M. notabilis is an XY system.
Based on the position of male-specific reads in the genome assembly, we identified a candidate SDR (Chr3: 38,911,287-45,186,478) that contained male-specific reads with 100-kb windows, whereas female-specific reads were found to be uniformly distributed throughout the genome ( Figure 3A, tracks a and b; Figure S7). We then detected higher densities of SNPs and indels in the SDR in male M. notabilis individuals than in female individuals but not in the rest of the genome, indicating early divergence between the Y and X chromosomes (Figure 3A, tracks c-e). In addition, we further analyzed the genome-wide methylation levels of male and female flowers, and found that CG and CHG methylation levels were higher in the candidate SDR of males ( Figure 3A, tracks f-h).
We delineated the SDR-X and SDR-Y haplotypes in this region ( Figure 3B). The collinearity results indicated that some fragments of the X and Y haplotypes were not aligned, which motivated us to identify any inversions present on the Y chromosome in M. notabilis that masked recombination, consistent with the results obtained from banyan tree [21]. Based on this result, these sex-specific regions contained additional candidate genes for sex determination. From the analysis, 404 genes were predicted in the SDR-Y haplotype, and differentiation of Y and X haplotypes in this region provided strong evidence for the presence of a fully linked region on Chr3. A total of 306 conserved gene pairs (75.76%) from synteny blocks within the SDR and X counterpart were detected using MCscan (Figure S8; Table S8). We further evaluated two genomic datasets for sex determination as follows: 1) based on resequencing data of different sexes, we determined the regions present only in male Morus genomes, and 2) the expression of sex biasrelated genes in male flowers was measured. Notably, all analyses revealed the existence of a 5-kb region (Chr3: 44,129,165-44,134,608; Figure 3C and D, Figures S9 and S10).
Manual annotation and curation revealed four genes in this region: three male-specific Ty3_Gypsy retrotransposon (RT) gene models (designated MSTG1, MSTG2, and MSTG3) and one male-specific DNA helicase gene (designated MSDH) resulting from partial duplication of the RECQL1 gene (located on Chr2 and found in both male and female genomes) ( Figure 3E). The transcriptomic analysis showed that the four male-specific genes were expressed at high levels in male flowers, suggesting their importance in male development and maintenance in M. notabilis ( Figure 3F; Table S9).
In particular, the coding region of the MSDH gene showed 95.25% nucleotide identity with that of canonical RECQL1, with considerable divergence at the N-terminus. The RECQL1 gene contained 10 exons, while MSDH had only seven exons ( Figure 3E). Moreover, the MSDH gene in M. notabilis appeared to have undergone a splicing event involving its second exon. MSDH was predicted to have helicase_ATP_BIND and helicase_C domains and structural similarities to RECQL1 ( Figure S11), which are critical for anti-crossover signaling during meiotic recombination [22].
We used polymerase chain reaction (PCR) primers designed to amplify four separate gene fragments and detected target fragments of each sex in a collection of dioecious mulberry species, including M. notabilis and species from other subgenera. For the MSDH gene, we designed a specific gene primer fragment (named MSDH-sp1; physical location shown in Figure 3G) for male-specific amplification. Four gene fragments were successfully amplified in all male individuals, whereas no amplified products were obtained from female individuals ( Figure 3G, Figures S12 and S13). Based on these results, we revealed a possible XX/XY-determining system and putative candidate sex-determining genes in Morus.

Population structure analysis improves the landscape of genetic affinity in mulberry accessions
Wild relatives are expected to be important sources of genetic diversity in mulberry, and understanding the phylogenetic relationship between wild and cultivated mulberry accessions might greatly facilitate mulberry breeding. In the present study, we resequenced 32 representative wild and landrace specimens from various regions, including Cambodia, Sri Lanka, China, and other countries. Using these data and pub-licly available genomic sequences for 123 cultivars/landraces, the genetic divergence among multiple wild and domesticated species was studied (Table S10). All sequence reads were mapped to the M. alba genome with an average coverage depth of $ 21.6Â (Table S11), and a total of 29,185,577 SNPs were identified and used in subsequent population-based genomic analyses ( Figure 4A; Table S12).
We performed ADMIXTURE analysis, neighbor-joining (NJ) tree analysis, and principal component analysis (PCA) using genomic SNPs to investigate the phylogenetic relationships between wild relatives and cultivated mulberry. The NJ tree showed clustering of mulberry accessions into six separate genetic groups with the Ficus genus (including F. hispida and Ficus macrocarpa) at the root ( Figure 4B). The first group (CHS) included the wild-growing plants M. notabilis and M. yunnanensis from Southwest China. The second group (WD) included 11 wild plants collected in China and other countries. The third group (JP) mainly included landraces and cultivars from Japan and other countries. The fourth group (GD) consisted of landraces and cultivated species mainly from Guangdong Province. The fifth group (HU) was mainly derived from Taihu Basin in the southern Yangtze delta plain in China, whereas the sixth group (NH) consisted of cultivated mulberry accessions from other places, mainly distributed throughout northern China. The phylogeny of the wild species revealed their close relationships with the cultivated relatives in the JP and GD groups. PCA was performed to confirm these phylogenetic relationships ( Figure 4C and D). When higher k values were used (Table S13), the NH group was further divided into subgroups, including those from Northeast China, Southwest China, Northwest China, North-central China, and Northeast China ( Figure 4E, Figure S14).
The results presented in Figure 4F show that nucleotide diversity (p) was highest in WD, followed by JP and GD (Table S14). The highest p (5.51 Â 10 À3 ) among the four domesticated groups was observed in JP, which also showed the most singletons and the highest linkage disequilibrium (LD) decay rate ( Figure 4G). We further explored the phylogeny and migration history of cultivated and wild mulberry using Treemix [23]. When using CHS as an outgroup, a bifurcation pattern similar to the initial phylogenetic result was observed when up to two migration events were included, and gene flow to HU cultivated mulberry was observed in the JP group ( Figure S15).

Screening for selective sweeps related to domestication
Present-day mulberry cultivars exhibit diversity in many agronomic characteristics, such as flowering time, disease resistance, and leaf development. Among them, traits such as resistance strength and delayed flowering have been recognized as important agronomic traits in cultivated mulberry. We      sweeps. Merging outlier windows yielded 103 unique regions that contained 411 positively selected genes. We also performed functional enrichment analysis by identifying GO terms for these overlapping genes (Table S15). The GO analysis revealed three significantly enriched biological processes (hypergeometric test, adjusted P value < 0.01) that were associated with disease resistance (adjusted P value = 3.10 Â 10 À5 to 2.42 Â 10 À9 ), flowering (adjusted P value = 7.52 Â 10 À6 to 4.67 Â 10 À7 ), and plant hormones (adjusted P value = 4.41 Â 10 À5 to 2.02 Â 10 À6 ) ( Figure 5B). Changes in flowering time are a major goal of mulberry breeding programs for the production of cultivars optimally adapted to local environments. Therefore, we identified three overlapping genes on Chr4 that were associated with flowering time in cultivated mulberry (DCNL4, NUP96, and DESL-like) using the two selection methods mentioned above (F ST = 0.74 and p ln-ratio WD/domestic = 2.133). This analysis indicated significant selection on these genes in response to the domestication of mulberry. The positive selection signals around this region were further confirmed by significantly lower values for Tajima's D and long haplotype patterns in WD ( Figure 5C). Of these three genes, NUP96 functions as a negative regulator of long day-induced flowering [24], whereas DCNL4 is proposed to be involved in pollen development and embryogenesis [25]. Current knowledge of these genes suggests that mulberry domestication involved selection on flowering time.
Mulberry production is severely threatened by diseases, which highlights the importance of breeding programs with a focus on disease resistance for crop improvement. Our selective sweep analyses revealed genomic regions and candidate genes associated with disease resistance, and some selected genes were involved in increasing plant defence responses (e.g., NHL10, LPXD2, CRK10/25, and JMJ25) ( Figure 5D) [26][27][28][29][30][31]. Signals for positive selection in cultivars compared with WD were also detected for the gene G2OX8, which encodes the gibberellin 2-beta-hydroxylase enzyme, resulting in changes in hypocotyl length or plant height via the gibberellin signaling pathway ( Figure 5E) [32,33]. These results should be valuable for future mulberry research and breeding plans and should facilitate mulberry improvement.

Discussion
Genome rearrangement, a common genetic process associated with speciation, has long been postulated to be a key phenomenon in the evolution of higher eukaryotes such as land plants. Recently, with the surge in plant genome sequencing projects and advances in bioinformatics tools, more examples of selection-induced plant genome evolution driven by genome rearrangement, such as in rye (Secale cereale) [34], Brassica crops [35], and Miscanthus floridulus [36], have been reported. Our findings on M. notabilis and findings from a previous study [21] have suggested that the formation and diversification of karyotypes in plants are largely attributable to gene gain and reuse via SVs.
Three high-quality mulberry genomes and their distinct, descending dysploidy status enabled us to infer the origins of and steps leading to the current mulberry karyotype. The ancestral karyotype of mulberry was particularly similar to that of modern M. alba (2n = 28) and was most similar to the inferred ancestral chromosomal arrangement, a finding that is consistent with the results reported for paper mulberry [18]. Due to adaptation in response to unique environmental changes, karyotype formation in M. notabilis involved more CFU events than in other plants and caused the observed descending dysploidy (from x = 14 to x = 6), which indicated parallel evolution and selection throughout the genus. Researchers previously hypothesized that this type of chromosomal fusion-fission event may be a common phenomenon in M. notabilis during growth and development [37]. Some of the genes present in regions that have undergone evolutionary rearrangement may be associated with the adaptation and evolution of the species [38], a hypothesis consistent with our findings. Alternative theories for descending dysploidy generation, such as ''telomere sequence directivity", may also be applicable but will require analysis with further sequencing data and genome analysis. The two unsolved studies on chromosomal behavior by Barbara McClintock promoted further experiments on decreasing chromosome numbers, suggesting that centromere breaks, inactivation, and fusions, in combination with telomeres, may be common mechanisms of karyotype evolution in plants and may extend to all eukaryotes [39]. However, the mechanisms that coregulate plant chromosomal rearrangements, such as descending dysploidy or WGD, at the telomere and centromere levels remain largely unclear and require more research.
Genetically determined dioecy occurs through mutations in two linked genes, one causing male infertility and the other causing female infertility, according to a theoretical concept of the formation of sex chromosomes [40,41]. Dual-gene models have    been discovered in campion (Silene spp.), papaya, asparagus, and kiwifruit in recent empirical studies [42][43][44][45]. In persimmon, however, a single gene appears to be sufficient for the expression of male features and inhibition of female development [19]. The pattern of sequence differences between males and females observed in the present study suggested the existence of a large nonrecombining region containing genes involved in sex determination and that male sex in mulberry is associated with heterogamy. We hypothesize that the hemizygosity of this Yspecific region is caused by mutations in sex-determining genes, which might also explain why regions containing sexdetermining genes do not recombine between the Y and X haploids. In plants, most sex-biased genes are not carried on sex chromosomes, and their expression levels are most likely regulated by at least one upstream sex-determining gene [46,47]. As TE insertion is rarely fixed in all individuals within a species, the presence of TE sequences in the same location is unusual [48]. Insertion of a complete sequence over a long evolutionary period is necessary for males to be retained in a specific area. Three MSTG transposon sequences are absent from the genomes of Arabidopsis thaliana and Oryza sativa. However, they exist stably in the mulberry genus ( Figure S12), suggesting that their specific functions are conserved. TE insertions have been shown to silence the expression of neighboring genes, such as those that control sexual forms in monoecious individuals [49]. This finding is surprising and implies that TE insertions have specialized functions in plants, such as improved male function. Although the specific activities of these genes are unknown, Ty3_Gypsy in budding yeast and humans selects binding sites for essential meiosis transcription factors, linking their transcriptional activity to the meiosis process [50,51]. Ty3_Gypsy LTR-RTs contribute to the development of flowers on male Populus deltoides plants by producing long noncoding RNAs (lncRNAs) [52]. Based on a de novo repeat library constructed from M. notabilis genome sequences (see the Materials and methods section), MSTGs are annotated as transposable proteins in the LTR/Gypsy transposon. Consistent with their functions as transposable proteins, transcripts were detected to have the ability to encode proteins using Long non-coding RNA-sequencing (lncRNA-seq) and general RNA-sequencing (RNA-seq) ( Figure S16), confirming that these transcripts do not produce lncRNAs. These results suggested that these transposon genes may have new regulatory patterns at the level of sex differentiation.
In plants, DNA helicases, such as RECQ4A and RECQ4B [22], and their interacting partners play a role in meiotic recombination [53]. In Drosophila, one male-specific lethal (msl) gene encodes a protein with sequence similarity to members of a superfamily of RNA and DNA helicases [54]. MSDH and RECQL1 have very similar sequence structures, with alternative splicing occurring in exon 2 of MSDH ( Figure 3E). The phylogenetic analysis revealed that the MSDH gene is a partial duplicate of the RECQL1 gene in male M. notabilis (Figure S17). The transcriptomic analysis showed that canonical RECQL1 was expressed in both leaves and female flowers, with a higher expression level in female flowers, whereas MSDH expression was specific to male flowers ( Figure 3F, Figure S18). Expression data from strand-specific lncRNAseq and small RNA-seq revealed that MSDH is transcribed into long transcripts that do not generate small interfering RNAs (siRNAs; Figure S16). We also used bisulfite sequencing to analyze methylation levels in the RECQL1 gene region and discovered no significant sex-biased differences (Figure S19), implying that MSDH and RECQL1 function independently. MSDH is therefore either a new gene that evolved de novo or a gene that transposed to new locations, followed by partial loss of the duplicated sequences. However, MSDH genes may have a very complex regulatory network, which is currently not well characterized, and this network will be the focus of future research.
Understanding population structure and phylogenetic relationships is very important for the management and utilization of gene pools of germplasm resources. Despite its widespread usage in basic mulberry research, the CHS group is not valuable in mulberry breeding, probably due to reproductive obstacles. Our evolutionary study answered various questions about the taxonomic status of these distantly related species and highlighted a potential future path for germplasm collection and exploitation. Based on the samples surveyed, Japan and the Guangdong region of China were the most promising sources of germplasm resources because the populations in these regions had the greatest nucleotide diversity and shared the same genetic composition, extending our previous understanding of their classification. East Asia has been identified as a significant ancient hotspot for the domestication of crops, including rice, sorghum, millet, soybeans, foxnut, apricot, and peach [55,56]. The domestication and diversification of mulberry, similar to those of other woody plant species, involve several complex steps, leading to geographic radiation and deliberate breeding of varieties. The selection of traits to maximize yield and quality necessitates the collection of accessions and additional evidence in the future to test our proposed evolutionary scenario [57].

Plant materials
Young leaves of two wild mulberry species, M. notabilis (male and female) and M. yunnanensis (female), were collected for whole-genome and de novo assembly. M. notabilis grows in Yingjing County, Sichuan Province, China (29°80 0 N 102°8 5 0 E, altitude 1100-1400 m), and M. yunnanensis grows in Pingbian County, Yunnan Province, China (22°68 0 N 103°6 7 0 E, altitude 1900-2200 m). Young leaves were collected from wild mulberry species in various regions in China and other countries, grafted, and stored at the Mulberry Breeding Center, Southwest University, for genome resequencing.

Genome size estimation
Short Illumina reads were obtained to estimate the size of the two genomes using a publicly available Perl script (https:// github.com/josephryan/estimate_genome_size.pl) to determine the distribution of k-mer values with Jellyfish [58]. Genome size was calculated by dividing the total number of k-mers by the k-mer distribution peak. For visualization, the online web software GenomeScope [59] was used.

Genome sequencing and assembly
Illumina short-read sequencing Cetyltrimethylammonium bromide (CTAB) was used to extract genomic DNA [60]. The Illumina HiSeq platform was used to sequence a library with a 350-bp insert size, yielding 150-bp paired-end reads (Table S1). The raw reads were subsequently trimmed and filtered to acquire clean reads.

PacBio library construction and sequencing
A portion of the DNA samples were transferred to Annoroad (Ningbo, China) for the construction of circular consensus sequence (CCS) libraries (male M. notabilis) and 20 K de novo libraries (female M. notabilis and female M. yunnanensis) using PacBio methodology and sequenced using the PacBio Sequel platform.

Hi-C library construction and sequencing
The library for Hi-C sequencing was created from young leaves crosslinked with the MboI restriction enzyme, as described previously [5]. Then, the Hi-C libraries were amplified using 12-14 cycles of PCR and sequenced on the Illumina HiSeq platform. An Illumina HiSeq instrument combined with 2150-bp reads was used to infer the sequencing interaction pattern.

Genome assembly and pseudomolecule construction
The genomes mentioned here were assembled as follows. 1) For male M. notabilis, we used hifiasm [61] with the default parameters to construct contigs from PacBio High Fidelity (HiFi) CCS clean reads. PacBio SMRT subreads were corrected and assembled into contigs for the two females using Canu [62] with the parameters ''corOutCoverage = 1000, minReadLength = 1000, and correctedErrorRate = 0.085". 2) Sequencing errors were repaired using Arrow (Pacific Biosciences) with the default parameters (only the females) and the Illumina paired-end reads obtained with Pilon [63] to increase the accuracy of the reference assembly. 3) The improved contigs were further rebuilt into two subassemblies (ref and alt) with HaploMerger2 [64]. 4) Based on the reference subassembly, clean Hi-C reads were analyzed using Juicer v1.6.2 [65], and 3D-DNA [66] was then used to scaffold the contigs into pseudomolecules.

Validation of the genome assembly
The genome completeness from the contig to chromosomelevel assemblies was assessed using BUSCO v3.0217 [67]. The completed assembly was compared to the Plantae BUSCO ''Embryophyta odb9" database, which contains 1440 protein sequences and orthogroup annotations for key clades, using the default parameters. This result was then compared with that obtained for M. alba genomes (Table S2).
Furthermore, HISAT2 [68] was used to align the RNA-seq data to the two wild mulberry genomes, and the results showed 98.07% and 98.15% single-base mapping accuracy for M. notabilis and M. yunnanensis, respectively. We used BWA v0.7.8 [69] to map Illumina reads from short-insert-size libraries back to genome assemblies. Our findings indicated that 97.90% of the reads from M. notabilis and 99.50% of the reads from M. yunnanensis were mapped to the assemblies, implying that the assemblies were highly complete (Table S2).
The Hi-C heatmap revealed a well-organized interaction contact pattern along the diagonals within each pseudochromosome ( Figure 1D and F). The LTR-RT assembly index [14], a metric used to evaluate the completeness of a genome assembly based on the quality of the assembly of repeat sequences, was also used for all the aforementioned genomes in the LTR_retriever pipeline [70].

RNA-seq and transcriptome assembly
RNA-seq data were generated from six tissues (root, bark, stem, male flower, female flower, and leaf), and total RNA was extracted using RNAiso Plus (Catalog No. 9108, Takara, Dalian, China) according to the manufacturer's protocols.
The Illumina HiSeq XTen platform was used to create and sequence 15 paired-end libraries containing sequences with a 150-bp read length, and Trimmomatic v0.36 [71] software was used to trim the adapter sequences of the RNA-seq reads. Genome-guided transcriptome assembly was performed with HISAT2 and StringTie v1.3.475 [72]. HISAT2-build was used to construct the genome index, and HISAT2 was used to map the clean transcriptome reads to the M. notabilis genome. The findings were integrated with StringTie in merge mode after the transcripts for each sample were assembled. HISAT2 and the StringTie pipeline were used to calculate reads per kilobase per million (RPKM) values. The R package ''edgeR" [73] was used to investigate differentially expressed genes.

Repeat annotation and gene prediction
A combination of homology searching and ab initio prediction was used to identify the repetitive sequences in the mulberry genome. We searched against Repbase with RepeatMasker [74] and RepeatProteinMask for homology-based prediction. We employed Tandem Repeats Finder [75], LTR FINDER [76], and RepeatScout [77] with default parameters for ab initio predictions.
The predictions of protein-coding genes were performed using previously reported methods with minor revisions [5]. Briefly, using Augustus [78], GlimmerHMM [79], and SNAP [80], we performed ab initio coding region prediction in the repeat-masked genome. PASA-H-set gene models trained Augustus, SNAP, and GlimmerHMM, which were then used to predict three masked mulberry genomes. The annotated proteins from M. notabilis, M. alba, C. sativa, Fragaria vesca, Malus domestica, Prunus persica, A. thaliana, and O. sativa were used to obtain protein evidence. Next, we used Trinity to assemble the transcriptome based on RNA-seq data and then PASA [81] to align the assembled sequence to the genome for gene predictions. Additionally, TransDecoder (https:// github.com/TransDecoder) was also used to identify putative coding regions in transcript sequences. EVidenceModeler [82] was then used to combine the aforementioned findings to forecast the complete set of nonredundant genes. All predicted proteins were annotated using InterProScan v5. 35-74.0 [83] and by running a BLASTP [84] search against the KEGG [85], Swiss-Prot, and TrEMBL [86] databases with an Evalue threshold of 1EÀ5.

Identification of LTR-RTs
A comparative analysis of LTR-RTs was performed using the genome sequences of M. notabilis (male and female), M. yunnanensis (female), and M. alba. LTR-FINDER [76] (parameters: -w 2 -d 0 -l 100) was used to detect LTR-RTs.

Estimation of the insertion time of LTR-RTs
All LTR sequences with complete 5 0 -LTR and 3 0 -LTR were used. MUSCLE [87] (with default parameters) was used to align the 5 0 -LTR flanking and 3 0 -LTR flanking sequences, and the distance between the alignment sequences was calculated using distMat (https://www.bioinformatics.nl/cgi-bin/ emboss/distmat, with the parameter -nucmethod 2). T = K/2r (divergence between LTRs/substitution per site per year) was used to calculate the insertion time. The mutation rate (per base per year) used was 1.8 Â 10 À8 .

Chromosome evolution
We used the genomes of M. notabilis, M. alba, F. hispida, F. microcarpa, and C. sativa (with C. sativa as the outgroup) to reconstruct the ancestral chromosome karyotype of the Moraceae family based on a previously published technique with minor revisions [88]. Briefly, using M. alba as the reference genome, we performed pairwise alignments with other species as targets using LAST v1.1 with the default parameters. Subsequently, axtChain, chainMergeSort, chainPreNet, and ChainNet were used to generate ''chain" and ''net" files as inputs for DESCHRAMBLER (https://github.com/jkimlab/ DESCHRAMBLER). We then identified 455 conserved segments using DESCHRAMBLER at a 1200-kb resolution and reconstructed 21 predicted ancestral chromosomes with a total length of $ 305 Mb (Table S5).

Comparative genomic analysis and detection of EFRs in mulberry genomes
The MCscan toolkit (https://github.com/tanghaibao/jcvi/wiki/ MCscan) was used to identify homologous gene pairs between the genomes of M. notabilis, M. yunnanensis, and M. alba. We estimated large-scale homologous synteny blocks (HSBs) in the pairwise whole-genome alignment utilizing the chromosomal sequences of the M. notabilis and M. alba genomes to detect probable EFRs. Raw local synteny blocks between the two genomes were detected using the MCscan toolkit. An EFR was defined as the interval between two large-scale HSBs demarcated by the end-sequence coordinates of large-scale HSBs on each side. The relative gene density, SD content, and repetitive content within the EFRs of each chromosome were compared to the complete chromosome using an inhouse script.

Analyses of gene families and phylogenetic evolution
Orthologous gene families in M. notabilis, M. yunnanensis, M. alba, and five other species (V. vinifera, P. persica, C. sativa, F. hispida, and F. microcarpa) were identified based on annotated genes using OrthoFinder v2.2.7 [15]. The expansion and contraction of the M. notabilis and M. yunnanensis gene families were examined using CAFÉ [89]. Single-copy orthologous genes were subsequently extracted, aligned with MUSCLE v8.2.10 [87], and analyzed phylogenetically using RAxML v8.2.10 [90] with the GTRGAMMA model. The species divergence time was estimated using MCMCTree in PAML v4.8 [91], and calibration times were determined using the TimeTree database (https://www.timetree.org/). Fourfold degenerate sites (4DTv) from the whole-genome SNP collection were extracted and concatenated into a ''supergene" format for each species to construct a Morus genus tree. The seven aligned 4DTv supergenes were used to construct a phylogenetic tree using the IQ-TREE [16] program.

Detection of SNPs, small indels, SVs, and SDs
SNPs and small indels (length 500 bp) were compared between the two wild genomes assembled in this work and the M. alba genome using MUMmer v3.2394 [92]. First, we used the number from MUMmer with the parameter ''-mum -g 1000 -c 90 -l 40" to generate the alignment. The files were then filtered using the delta-filter comparison with the query ''-r -q" to create a one-to-one map. Show-snps (a module of MUMmer) with the parameter ''-ClrTH" was used to call SNPs and small indels from the one-to-one alignment module. Furthermore, the web-based SV analysis tool Assemblytics [93] was used to analyze large SVs. In addition, we identified SDs in the mulberry genomes with reference to the Ficus genomes (https://github.com/tangerzhang/popCNV).

Identification of SDRs in M. notabilis
A previously described technique based on k-mers [19,20] was used to identify sex-determining genes in the M. notabilis genome, with a slight modification. Using Jellyfish [58], the reads from male and female M. notabilis (four individual replicates per sex) were catalogd into 40-bp k-mers. The total cumulative counts of k-mers starting with ''AG" in females and males were greater than ten, and the reads were defined as valid kmers. In addition, k-mers with a count of zero in the female group were defined as MSKs, and those with a count of zero in the male group were defined as FSKs. We complemented this approach by mapping the sex-specific reads to the assembled male M. notabilis genome using BWA-MEM with the default parameters, and we prioritized genomic regions (10-kb windows) with a high depth of sex-specific reads. After deleting duplicates, SAMtools v1.9 [94] was used to calculate the mapping depth of the reference genome (by parsing specific 50-kb windows). Additionally, deduplicated BAM data were normalized using the deepTools bamCoverage tool [95].
Quantifying the digital expression and analyzing the potential function of candidate sex genes In the analysis of RNA-seq, lncRNA-seq, and small RNA (sRNA)-seq data, rRNA/tRNA contaminants were removed by mapping the reads to rRNA/tRNA sequences from public databases. Transcripts of lncRNAs were assembled using Trinity V2.6.6 with the parameter settings for strand-specific reads (''--SS_lib_type RF"). The obtained sequences were used to search the National Center for Biotechnology Information (NCBI) RefSeq non-redundant proteins (NR) database. Small RNA reads and bisulfite sequencing reads were mapped to the reference genome using Bowtie2 v2.3.4.1 and Bismark v0.16.3 [96], respectively. The alignment results were visualized using Integrative Genomics Viewer (IGV) v2.4.14 [97].

Population structure and phylogenetic analyses
The NJ tree, PCA, and ADMIXTURE methods were used to explore the genetic relationships among wild and domesticated mulberry populations. An NJ tree was constructed for the whole-genome SNP set using MEGA v6.0 [99] based on a pairwise genetic distance matrix, which was calculated using PLINK v1.9 [100] with the option ''--distance-matrix". PCA was performed using smartpca in EIGENSOFT v6.1 [101]. The significance of eigenvectors was assessed using the Tracy-Widom test. Population genetic structure was inferred using ADMIXTURE v1.3.0 [102] considering k = 2 to k = 10 ( Figure S11), and the analysis was repeated 20 times for each k value. For the PCA and ADMIXTURE analysis, we used a nonredundant SNP data set obtained after removing rare alleles with the option ''--indep-pairwise 50 5 0.4" in PLINK and further excluding SNPs with intrachromosomal LD r 2 < 0.4 to remove the bias caused by LD.
The value of p was calculated using VCFtools v0.1.15 [103] based on the high-confidence filtered SNPs. The p value for each SNP was calculated, and the nucleotide diversity level was measured using ''--window-pi 50000 --window-pi-step 20000" for each subpopulation. LD decay was calculated for all pairs of SNPs within 500 kb using PopLDdecay v3.27 [104] with the default parameters.
Treemix (v1.13) was used to infer models of population split and migration between groups. Treemix was run using the allele frequencies calculated from the LD-pruned SNP set with the parameters ''-bootstrap 5000 -global" and ''migration event -m (range 0-4)".

Identification of selective sweeps
We performed the analysis described below to detect selective sweeps during mulberry domestication. 1) The high divergence in genetic diversity (p ln-ratio) and high fixation index F ST were analyzed by parsing specific 50-kb windows between the domesticated group (JP, GD, HU, and NH) and WD. 2) Putative selective sweeps were defined as windows with outlier signals (top 1%) overlapping for the two statistics (F ST > 0.564, p ln-ratio WD/JP-GD-HU-NH > 2.041). 3) Tajima's D and comparison haplotypes were applied to confirm the top signals.

Functional enrichment analyses
We characterized the most relevant functions of the proteincoding genes with chromosomal break regions and selective sweeps by searching for overrepresented KEGG pathways and GO terms. Target protein sequences were used to conduct functional enrichment tests of the target genes using KOBAS 3.0 (https://kobas.cbi.pku.edu.cn/kobas3/annotate/). The P value was calculated using a hypergeometric distribution, and P < 0.05 was considered significantly enriched.

RNA extraction and qRT-PCR analysis
Total RNA was extracted as described above, and cDNA was generated with a PrimeScript RT Reagent Kit with gDNA Eraser (Catalog No. RR047B, Takara, Japan). We performed qRT-PCR with TB Green Premix Ex Taq II (Catalog No. RR820Q, Takara, Japan). Relative expression was calculated using the 2 ÀDDCt method [105]. The primers used for the gene expression analysis are listed in Table S16. The values are presented as the mean ± SD from three biological replicates ( ** , P < 0.01; *** , P < 0.001; and **** , P < 0.0001), as determined using a one-way analysis of variance (ANOVA).

PCR validation of candidate sex-determining genes
Degenerate primers were designed for four candidate genes to verify their male specificity (Table S16). The PCR mix was composed of 10 ll of 2Â Ex Taq MasterMix (Catalog No. CW0682, CWBIO, China), 1 ll of each primer, and 1 ll of genomic DNA at a concentration of $ 100 ng/ll, and ddH 2 O was added to obtain a total reaction volume of 20 ll. The thermocycling conditions were as follows: an initial cycle of 5 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 57°C, 30 s at 72°C, and 2 min of extension at 72°C. The PCR products were loaded on a 1.0% agarose gel and run at 150 V for 15 min. The samples were imaged under ultraviolet light.