Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea

Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for the association and conservation studies. Genome-wide SNP development in many teleost species are still challenging because of the genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provided an efficient reduced representative method to squeeze cost for SNP detection; however, most of recent GBS applications were reported on plant organisms. In this work, we used an EcoRI-NlaIII based GBS protocol to teleost large yellow croaker, an important commercial fish in China and East-Asia, and reported the first whole-genome SNP development for the species. 69,845 high quality SNP markers that evenly distributed along genome were detected in at least 80% of 500 individuals. Nearly 95% randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. The association studies with the muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, contributing as high up to ∼63% genetic variance that explained by all markers. Functional genes that involved in fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, PPT2 Gene, previously identified in the association study of the plasma n-3 and n-6 polyunsaturated fatty acid level in human, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS could produce quality SNP markers in a cost-efficient manner in teleost genome. The developed SNP markers and the EPA and DHA associated SNP loci provided invaluable resources for the population structure, conservation genetics and genomic selection of large yellow croaker and other fish organisms.


127
The sample collection and experiments in the study was approved by the Animal Care and Use Committee 128 of Fisheries College of Jimei University (Animal Ethics no. 1067).
129 Sample preparation and DNA extraction

130
The mixed reference population of 500 individuals was bred by a random fertilization of 30 males and 30 131 females at the large yellow croaker breeding base of Jimei University in Ningde, Fujian, China. All fish 132 individuals were 1.5 year old with the total length and weight of 24.5~25.9 cm and 217.8~234.1 g (95% 133 confidence interval), respectively. To extract genomic DNA respectively from 500 individuals, the dorsal fins 134 (20-30 mg) of the fish individuals were collected, frozen in liquid nitrogen for the following DNA extraction.
135 Total genomic DNA was prepared in 1.5 ml microcentrifuge tubes containing 550 μl TE buffer (100 mM NaCl, 136 10 mM Tris, pH 8, 25 mM EDTA, 0.5% SDS and proteinase K, 0.1 mg/ml). The samples were incubated at 55 137°C overnight and subsequently extracted twice using phenol and then phenol/chloroform (1:1) method. DNA 138 was precipitated by adding two and a half volumes of ethanol, collected by brief centrifugation, washed twice 139 with 70% ethanol, air dried, re-dissolved in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7.5). DNA 140 concentration and quality were estimated with an ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, 141 USA) and by electrophoresis in 0.8% agarose gels with a lambda DNA standard. 150 and NlaIII (NEB, Ipswich, MA, USA). A pilot GBS experiment was performed before the library construction 151 to optimize the temperature and time parameters for yield, size distribution. Based on the pilot experiment, the 152 GBS libraries of large yellow croaker based on EcoRI and NlaIII were constructed following the similar 153 method in previous report (Beissinger et al. 2013 165 Sequencing read quality control and genotyping

166
The raw sequencing reads generated by Illumina HiSeq 2500 from the GBS libraries were treated and 167 cleaned for SNP detection. First, the adaptors were removed and the resulted reads were split by sample-168 specific barcode sequences. Only reads begins with the digest site sequences of EcoRI and NlaIII were retained 169 for the following quality control. Second, the overall base and read quality were accessed by FastQC. To avoid 170 the negative influence of ambiguous bases for SNP detection, reads with more than 5% of N were removed. 171 Then, the resulted reads were cleaned by the following steps: 1) discarding the reads that the quality lower than 172 20; 2) deleting 5bp windows in reads end that the average quality smaller than 20; 3) removing read pairs if 173 one end was shorter than 50 bp.

174
The cleaned reads were mapped to large yellow croaker genome by BWA 0.7.6a (Li & Durbin 2009). The 175 mapping was preceded by a short reads alignment with BWA-MEM algorithm. The alignment were then 176 sorted by coordinates and duplicate marked by SortSam and MarkDuplicates programs in Picard tools 1.107 177 (picard.sourceforge.net), respectively. To reduce the false positives of SNP detection in this study, three 178 processes were carried out: 1) short read mapping were re-aligned by local bases matches; 2) base Quality 179 Score Recalibration (BQSR) was employed to adjust the accuracy of the base and mapping quality scores; 3) 180 only reads pairs that both aligned on genome with a mapping score higher than 30 were used for SNP calling. 181 Then, the SNP markers were detected by GATK UnifiedGenotyper utility.

183
Genomic DNA was extracted from dorsal fin ray tissue as the method described before. PCR 184 amplification was performed in the reaction system (5μl total volume) containing 20 ng of genomic DNA, 185 0.5U HotstarTaq (Qiagen), 0.5μl 10×PCR buffer, 0.1μl dNTPs for each nucleotide and 0.5 pmol of each primer.

194
To evaluate the accuracy of SNP detection in this study, the genotypes from GATK SNP calling were 195 compared with those from MassARRAY assay. If the genotypes of one SNP locus from GATK calling were 196 identical with that in MassARRAY, then the locus was called a correct genotype. As a result, 1,421 of 1500 197 SNP loci were correctly genotyped by GATK and the success rate of SNP calling was ~94.7%. The specificity 198 and sensitivity of SNP calling in the study were also evaluated. The reference homozygous genotypes (AA) 199 both from MassARRAY and GATK were called true negatives, and the heterozygous genotypes or allelic 211 After saponification with 1ml of 50% KOH in 15ml ethanol, the lipid was then esterified in 80°C for 20 min 212 using 6.7% boron trifluoride (BF3) in methanol (Morita chemical industries Co., Ltd., Osaka, Japan). After 213 making up in hexane (20 mg/ml), fatty acid methyl esters (FAME) preparations were analyzed by gas 214 chromatography (GC). The temperature increase of 170 to 260°C at 2 °C /min was set and helium was used as 215 the carrier gas. Since the muscle contents of EPA and DHA were highly correlated, we combined those two 216 components together in fish muscle.

217
Prior to the association study, pairwise clustering based on the alleles shared identical by state (IBS) 218 between any two individuals was performed to assess the genetic relatedness.