Identiﬁcation of Growth-Related SNPs and Genes in the Genome of the Pearl Oyster ( Pinctada fucata ) Using GWAS

: Pinctada fucata , the pearl oyster, is a bivalve primarily cultivated for the production of saltwater pearls. In this study, the genome-wide association study (GWAS) for the growth-related traits and a principal components analysis (PCA) in P. fucata were performed. Genomic parameters of 6 growth-related traits in 60 individuals were estimated by using 4,937,162 single-nucleotide polymorphisms (SNPs). A total of 45 SNPs associated with growth traits were thus identiﬁed. Furthermore, 165 candidate genes were identiﬁed, including collagen alpha-3 (VI), serine/threonine-protein kinase mos-like harboring signiﬁcant markers, and histidine-rich protein PFHRP-III-like, which may inﬂuence growth-related traits associated with various biological processes. The results of this study can facilitate marker-assisted selection and breeding programs designed to enhance growth and also offer a theoretical foundation for the further development and utilization of genomic resources in P. fucata .


Introduction
Pinctada fucata, the pearl oyster, is a bivalve primarily cultivated for the production of saltwater pearls [1]. Since the successful hatchery production in 1965, this species has played a crucial role in the country's economy [2,3]. However, in recent years, saltwater pearl farming has faced some challenges, including lower growth rates, weak disease resistance, and decreased mariculture production [4]. To address these issues, it has been suggested that improving breeding programs would be an effective solution. Improving production has always been a top priority for the pearl oyster farming sector. To this end, genetic breeding with the objective of enhancing growth rates is a crucial aspect of pearl oyster culture. Determining genomic loci and genes related to economically relevant traits can not only play a vital role in facilitating and speeding up the breeding process, but also increase production of pearl oyster.
With the advancement of high-throughput sequencing techniques, the exploration of growth-related genes in pearl oysters has become more comprehensive [1]. RNA-seq databases constructed from the mantle tissue of diploid and triploid P. fucata have revealed a total of 11 growth-related genes, including N151, shematrin-partial, mantle gene 2, mantle 11 and paramyosin-like isoform X1 [5]. The identification of these genes provides markers

DNA Extraction and Sequencing
The adductor muscle was collected from each pearl oyster, and frozen in liquid nitrogen for DNA extraction. Genomic DNA was extracted using the CTAB (cetyltrimethylammonium bromide) method, described in detail in [19]. Agarose 1.5% gel electrophoresis was used to analyze degradation and contamination. Qubit and Nanodrop (Thermo Fisher Scientific, Waltham, MA, USA) were used to assess DNA purity and quantity, respectively. Only high-quality DNA samples with optical density (OD) 260/280 between 1.8 and 2.0 were used for library construction. Using Illumina Inc.'s Illumina NovaSeq 6000 platform (San Diego, CA, USA) at Genedenovo (Guangzhou, China), paired-end libraries of 500 bp insert size were constructed for each sample, utilizing genomic DNA and the Paired-End DNA Sample Prep Kit.

SNP Discovery and Genotyping
To obtain high-quality data, strict filtering criteria were applied to the raw data (Gen-Bank accession:GCA_002216045.1) [20] using FASTP (version 0.18.0, Beijing, China) [21], including removing reads with ≥10% uncalled nucleotides (N), medium-low-quality reads (those where >50% of all bases had Phred quality scores ≤20), removing reads aligned to the barcode adapter. SNP and Indel discovery was performed using the Burrows-Wheeler Aligner (BWA) [21] and GATK's Unified Genotyper [22]. The BWA mem tool, with parameters (-k 32-M−k being the minimum seed length, -M being an option to denote shorter split alignment hits as secondary alignments) [23] was employed to align the clean reads to the pearl oyster genome. GATK's Variant Filtration module (-Window 4, -filter "QD < 4.0|| FS > 60.0|| MQ < 40.0", -G _filter "GQ < 20", QD: Variant Confidence/Quality by Depth; FS: Fisher's exact test-scaled p-value to detect strand bias; MQ: RMS Mapping Quality; GQ: Genotype Quality) was employed to filter SNPs and Indels. Those displaying segregation distortion or sequencing errors were then discarded. Uniquely, SNPs placed in well-assembled chromosomes were retained. ANNOVAR was employed to ascertain the physical position of SNPs and indels in the genome by aligning and annotating them [24].

Analysis of Population Structure and Linkage Disequilibrium (LD) Decay
Subjecting the phenotypic data to principal component analysis (PCA) to minimize redundancy, GCTA (v1.94.1, GCTA, Phoenix, AZ, USA) software was employed to perform the PCA of all samples to understand the community structure of P. fucata based on SNP markers obtained from screening [25]. The samples were also subjected to kinship analysis by GCTA software to avoid false positives, the resulting kinship matrix was added to the GWAS model as the K matrix. Linkage disequilibrium (LD) decay of the population was analyzed using PopLDdecay software (version 3.40 Beta, Wuhan, China) [26] to calculate the LD size and intensity between markers in terms of D' or ΠAΠaΠBΠb) ), the resulting LD coefficients were then fitted at various distance levels to ascertain the pattern of LD decay. Plink software (v1.07, Santa Monica, CA, USA) [27] removed marker pairs with r 2 greater than 0.2 according to a step size of 100 kb and a window size of 10 nt (lower physical location loci were removed).

Statistical Analyses
Data analysis was performed with R package stats software version 3.5.0 [28]. The significance of differences between L group and S group was assessed through a oneway ANOVA with the Bonferroni post hoc test. Statistically significant correlations were considered to be present when p-value < 0.05.
Genome-wide association analyses were conducted using TASSEL 3.0 (San Francisco, CA, USA) [29], which gained a mixed liner model (MLM) by the Q + K method. Admixture V1.3.0 software (Los Angeles, CA, USA) [30], which is based on the admixture model, was also used for the estimation of the population structure (Q matrix). The SPAGeDi V 1.5 software (Paris, France) [31] was used to calculate the kinship matrix (K-matrix) using the method of Loiselle et al. [32]. The correlation between genotypes and growth traits were assessed using the Q + K method, where Q is the optimal value of K (corresponding to the population structure matrix), K is the genetic relationship matrix between samples. The mixed linear model was established by the Q + K method, which could be specified as follows: y = Xα + Qβ + Kµ + eα (1) y, the phenotype vector, X, the genotype matrix, α, the genotype effect vector, Q, the fixed effect matrix, β, the fixed effect vector, K, the random effect matrix, mainly the kinship matrix, and e, the residuals vector, all constitute the random effect vector.
For each SNP site, we test if α is equal to 0 and use the p-value, which represents the probability of α being 0, to assess the degree of association between the marker genotype and phenotype. The smaller the p-value, the lower the probability of α being 0, the stronger the likelihood of the marker is associated with the trait. The p-value obtained was transformed to -log 10 and displayed in Manhattan and Q-Q plots, being the significant threshold a -log10(p-value) of 6. The Manhattan plot was generated using the software MATLAB (R2012a v7.14.0.739, Mathworks, Natick, MA, USA) (http://www.mathworks. com/products/matlab, accessed on 29 April 2020).

Selection of Candidate Gene
The relationship between significant SNPs and growth-related traits was further verified through the resequencing of 60 collected from two breeding lines constructed for growth selection selected P. fucata individuals. Sequences of key SNPs centered markers were identified in the GWAS dataset based on the mixed linear model (MLM). Further, in the case of a reference genome, the QTL regions were defined as the most strongly associated SNP (lead SNP) of each significant peak plus 50 kb upstream and 50 kb downstream regions. The genes within QTL regions were defined as candidate-related genes.

KEGG Enrichment Analysis
Further, KEGG enrichment analysis was also performed to investigate the potential functional categories and metabolic pathways associated with candidate genes. Based on the KEGG annotation and classification, genes in selected regions were classified and analyzed in terms of their biological pathways, leading to the identification of six growth traits associated to growth traits. Pathway enrichment analysis, when compared to the entire genome background, revealed significantly enriched metabolic or signal transduction pathways in candidate associated genes. The p-value is: the number of all genes that with KEGG annotation, n the number of candidate associated genes in N, M the amount of genes annotated to particular pathways, and m the number of candidate associated genes in M). The calculated p-value was corrected for false discovery rate (FDR), using FDR < 0.05 as threshold. Pathways meeting this condition were defined as significantly enriched pathways in candidate associated genes.

Descriptive Statistics of Growth-Related Traits
The growth traits of shell length varied from 46.64 mm to 70.68 mm, with a mean of 57.88 mm and a CV of 14.65. The shell height ranged from 48.22 mm to 75.06 mm, with a mean of 60.10 mm and a CV of 13.52%. The shell width spanned from 16.69 mm to 25.02 mm, with a mean of 20.39 mm and a CV of 9.80. The shell weight was between 7.47 g and 20.53 g, with a mean of 13.38 g and a CV of 30.20. The soft tissue weight had a mean of 13.05 g and a CV of 33.80, with a minimum of 5.92 g and a maximum of 22.69 g. The total weight ranged from 15.12 to 39.75 g, with an average of 26.44 g and a CV of 31.20. (Tables 1 and 2). The correlation between growth-related traits was illustrated in Figure 1. (All correlations were significant.) The results indicated that the six growth-related traits basically followed a bimodal distribution using two breeding lines. The correlation, measured as the Pearson correlation coefficient, between sl and SW, as well as between sh and SW, was estimated to be within the range of 0.75 to 0.80. In addition, a significant correlation was observed between stw and tw, as well as between sw and tw. The results showed that the growth traits were the expression of the morphological, structure, physiological and ecological characteristics of P. fucata, and all the traits were related to each other.

Quality Control of the Sequencing Data
The whole-genome resequencing of the 60 pearl oyster individuals generated a total of 510 Gb raw data. The sequencing depth for each sample and SNP was 9.5× and 6.5×, respectively. After the elimination of low-quality reads, 500 Gb of pristine data were left. The average clean data for samples ± SD was 8.2 Gb ± 0.6 Gb. The SNP call identified a total of 12,270,990 SNPs. After the site-based filtration and screening, 4,937,162 SNPs were retained for GWAS analysis.  showed that the growth traits were the expression of the morphological, structure, physiological and ecological characteristics of P. fucata, and all the traits were related to each other.

Analysis of Population Structure and Genetic Relationship
The results of the PCA showed that PC1, PC2 and PC3 accounted for 13.8% of the total phenotypic variation ( Figure 2). Specifically, PC1 explained 5.61% of the total genetic variation and was found to divide the samples into two distinct clusters. Meanwhile, PC2 and PC3 accounted for 4.29% and 3.9% of the total genetic variation, respectively.

Analysis of Population Structure and Genetic Relationship
The results of the PCA showed that PC1, PC2 and PC3 accounted for 13.8% of the total phenotypic variation ( Figure 2). Specifically, PC1 explained 5.61% of the total genetic variation and was found to divide the samples into two distinct clusters. Meanwhile, PC2 and PC3 accounted for 4.29% and 3.9% of the total genetic variation, respectively.

GWAS of Growth Traits
The GWAS of the 6 growth-related traits was conducted using TASSEL3.0 with 4,937,162 SNPs and 60 samples. A total of 45 genome-wide significant SNPs (−log10 p > 6) were identified across all traits, located on 1-14 ( Figure 3 and Table S1), with 7 of them significant for multiple traits. These SNPs were mainly located on chromosomes 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14. Among them, PIN_chr3_76865070 and PIN_chr3_76865288 were significant for sl, sh, sw, and tw, with SNP effects ranging from 0.27 to 0.32. PIN_chr10_36697210 and PIN_chr12_8906589 were significant for stw and tw, with SNP effects of 0.24, 0.34, and 0.30, respectively. PIN_chr2_43686383 and PIN_chr7_35952421 were significant for sw and tw, with SNP effects of 0.23 and 0.30, respectively. Finally, PIN_chr2_50316499 was significant for stw and tw, with SNP effects of 0.19 and 0.18, respectively. Some markers or SNPS are associated with two or more traits at the same time, indicating that there may be a certain degree of genetic correlation between traits.

GWAS of Growth Traits
The GWAS of the 6 growth-related traits was conducted using TASSEL3.0 with 4,937,162 SNPs and 60 samples. A total of 45 genome-wide significant SNPs (−log 10 p > 6) were identified across all traits, located on 1-14 ( Figure 3 and Table S1), with 7 of them significant for multiple traits. These SNPs were mainly located on chromosomes 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14. Among them, PIN_chr3_76865070 and PIN_chr3_76865288 were significant for sl, sh, sw, and tw, with SNP effects ranging from 0.27 to 0.32. PIN_chr10_36697210 and PIN_chr12_8906589 were significant for stw and tw, with SNP effects of 0.24, 0.34, and 0.30, respectively. PIN_chr2_43686383 and PIN_chr7_35952421 were significant for sw and tw, with SNP effects of 0.23 and 0.30, respectively. Finally, PIN_chr2_50316499 was significant for stw and tw, with SNP effects of 0.19 and 0.18, respectively. Some markers or SNPS are associated with two or more traits at the same time, indicating that there may be a certain degree of genetic correlation between traits.
The statistical analysis revealed a significant association between these markers and growth-related traits.

Identification of Candidate Genes and KEGG Pathway Analysis
As presented in Figure 4, the squared allele frequency correlation (r 2 ) declined drastically in the experimental group, with the overall mean at a distance approximately 80 kb. The KEGG database revealed the top 20 enriching pathways, which are mainly linked to the glycosphingolipid biosynthesis-globo and the isoglobo series, glycosphingolipid biosynthesis-lacto and neolacto series, ECM-receptor interaction, protein digestion and absorption, purine metabolism, the Toll and lmd signaling pathway, and the TNF signaling pathway, as illustrated in Figure 5 and beyond. A total of 165 genes were identified across all significant SNPs of the six traits, with a subset of these genes presented in Table S2.  The statistical analysis revealed a significant association between these markers and growth-related traits.

Identification of Candidate Genes and KEGG Pathway Analysis
As presented in Figure 4, the squared allele frequency correlation (r 2 ) declined drastically in the experimental group, with the overall mean at a distance approximately 80 kb. The KEGG database revealed the top 20 enriching pathways, which are mainly linked to the glycosphingolipid biosynthesis-globo and the isoglobo series, glycosphingolipid biosynthesis-lacto and neolacto series, ECM-receptor interaction, protein digestion and absorption, purine metabolism, the Toll and lmd signaling pathway, and the TNF signaling pathway, as illustrated in Figure 5 and beyond. A total of 165 genes were identified across all significant SNPs of the six traits, with a subset of these genes presented in Table S2.

Discussion
The pearl oyster, P. fucata, is a crucial tropical aquaculture species that is widely farmed for its highly prized "South Sea" pearls. It has been estimated that P. fucata accounts for approximately 90% of the total marine pearl production in China [33]. Despite this, the pearl culture industry is still in its early stages when compared to terrestrial animal production systems and has not yet established advanced selective breeding programs. Among the primary objectives of selective breeding programs is improvement of growth rates, which could potentially reduce the culture cycle. Decades of artificial breeding have drastically enhanced shellfish, particularly scallops, abalone

Discussion
The pearl oyster, P. fucata, is a crucial tropical aquaculture species that is widely farmed for its highly prized "South Sea" pearls. It has been estimated that P. fucata accounts for approximately 90% of the total marine pearl production in China [33]. Despite this, the pearl culture industry is still in its early stages when compared to terrestrial animal production systems and has not yet established advanced selective breeding programs. Among the primary objectives of selective breeding programs is improvement of growth rates, which could potentially reduce the culture cycle. Decades of artificial breeding have drastically enhanced shellfish, particularly scallops, abalone and pacific oysters, through selective breeding [34][35][36]. In this study, the growth traits of P. fucata were detected by GWAS and phenotypic correlation analysis, the results revealed that the six growth traits exhibited normal or biased normal distribution and demonstrated positive correlation in pairwise comparisons between traits. The findings suggested that these traits may be jointly regulated, which is consistent with the results of the GWAS. Therefore, it is necessary to carefully consider the mutual influence of various traits and make a comprehensive judgment in the breeding process of P. fucata.
Previous investigations have concentrated on the genetic mapping and trait location examination of P. fucata. For instance, genetic linkage maps were constructed utilizing 230 and 189 markers, including 15 and 10 microsatellites, respectively [37]. Although preliminary, the genetic maps can be employed for genome-wide scanning of QTLs. Li et al. using restriction-site associated DNA sequencing (RAD-seq), a genetic mapping and QTL analysis of growth-related traits was conducted. Subsequently, single-marker analysis and a non-parametric mapping Kruskal-Wallis (KW) test were employed to identify thirty-nine QTL-peak loci for seven growth-related traits [38]. These previous studies employed AFLP, SSR, or RAD-seq techniques to identify markers. One shared characteristic among these techniques is the identification is the limited number of detected markers. In 2015, Brondum and associates [17] showed that the use of QTL markers taken from whole-genome sequence data augmented the dependability of genomic forecasts. Likewise, Bentley [39] emphasized the need for whole-genome resequencing data in reference groups to obtain abundant genetic information and gene resources. Here, LD analysis revealed a rapid LD decay in a set of 60 samples of P. fucata, with an r2 value falling below 0.1 within 1 kb, which indicated that LD levels were low in the investigated populations. Consequently, markers of high density are indispensable for capturing markers in close proximity to the actual causal mutations in this species.
GWAS is an efficient way to uncover the genetic variation associated with complex quantitative traits and identify the corresponding candidate gene [40]. In this study, we screened 45 SNPs that were significant for six growth-related traits (-log10p > 6). Some of these SNPs were linked to multiple traits, suggesting coordinated regulatory mechanisms. Our results, consistent with previous studies, indicated that growth traits are polygenic in the pearl oyster [41].
Interactions between genes, enabling coordinated expression, are the basis of the coordination of various biological functions in biology. In our study, the candidate genes significantly enriched into a variety of pathways. The pathways were mainly in glycan biosynthesis and metabolism, ECM-receptor interaction, oocyte meiosis, the C-type lectin receptor signaling pathway and metabolic pathways. These are related to growth and development, including Sec1, FUT2, PTPN11 (tyrosine-protein phosphatase non-receptor type 11), and COL6A (collagen, type VI, alpha). It is previously reported that transcriptome analysis was used to analyze the molecular mechanism of the different growth rates of the knockout layer in the shell of P. martensii, and a large number of traits and genes were significantly enriched in many pathways, such as glycan biosynthesis and metabolism and the ECM-receptor interaction pathway, and the genes (C1q, C-type lectin) may be involved in the process of nacres formation [1]. Moreover, some studies also pointed out that the ECM-receptor interaction pathway was significantly enriched in the group with fast growth, and all the genes in it were considerably up-regulated, indicating that this pathway exerted a crucial role in the growth of nacres and participated in nacres formation [42]. These results provide theoretical basis and a new perspective for studying P. fucata growth traits.
In this study, 165 candidate genes with significant SNPs were identified. Among the identified genes, ColVI, belongs to the collagen family and represents a primary component of connective tissues. In vertebrates, ColVI has been implicated in the process of bone calcification, and collagen alpha-3 (VI) has also been associated with the growth of P. fucata in other studies [43]. Previous work has demonstrated that collagen VI-like gene (PmColVI) displays significant expression levels within the pallial mantle tissue and is responsible for nacre layer formation in shells through the VWA domain interacting with a variety of matrix proteins [44]. The association of the PTPN11 (tyrosine-protein phosphatase non-receptor type 11) gene with growth-related traits has been investigated. Du et al. [20] have done WGCNA analysis on 234 matrix proteins and found that 27 genes were located in important sites, including VWAP, tyrosinase, and HSP70. These genes were found in the transcriptome of different growth rates in P. fucata. Other candidate genes, such as nacreserine protease inhibitor 2, serine/threonine-protein kinase mos-like, and histidine-rich protein PFHRP-III-like, have not yet been studied in P. fucata. Nevertheless, these genes have been shown to exert an impact on traits associated with growth through a multitude of biological processes such as digestion, activation of the complement system, cell differentiation, and hemostasis [45][46][47][48][49][50][51][52][53]. Serine/threonine-protein kinase mos-like, also known as MASTL, is an enzyme that plays a crucial role in cell cycle regulation. To our knowledge, no study has demonstrated a direct association between serine/threonine-protein kinase mos-like and growth traits. Nevertheless, serine/threonine-protein kinase mos-like relevant to vitellogenesis and oocyte maturation was validated by qRT-PCR in Chinese mitten crab (Eriocheir sinensis) [54]. We identified candidate genes that are associated with growthrelated traits in P. fucata, which gives valuable insights into the genetic mechanisms of its growth. These results will help us understand the genetic structures and molecular mechanisms of the growth phenotype, and provide useful SNPs loci for marker-assisted selection and breeding of P. fucata.

Conclusions
Crucial for the breeding program of P. fucata and the pearl industry is the creation of dependable markers linked to growth and the augmentation of growth-related characteristics. In this study, the growth traits of P. fucata were detected by GWAS and phenotypic correlation analysis, and the results revealed that the six growth traits exhibited normal or biased normal distribution and demonstrated positive correlation in pairwise comparisons between traits. In addition, a total of 45 SNPs and 165 candidate genes related to growth traits were identified. In this study, we found the relationship between multiple multi-effect loci and traits, which provided reference for explaining the genes and heredity associations among different traits. The findings of this study may advance the marker-assisted selection and breeding aimed at improving growth, providing a theoretical foundation for the further development and utilization of gnomic resources in P. fucata.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/fishes8060296/s1, Table S1 reveals genome-wide significant SNPs associated with growth traits; Table S2   Institutional Review Board Statement: The Guangdong Ocean University Research Council, after an exhaustive review, gave its approval for the animal protocols in this study, as declared by the Institutional Review Board; approval no. GDOU-LACUC-2022-A3122.

Data Availability Statement:
The data underlying this article are available in the NCBI, and can be accessed with GenBank accession: PRJNA962298.