Development of a 38K Single Nucleotide Polymorphism Array and Application in Henomic Selection for Resistance Against Vibrio harveyi in Chinese Tongue sole, Cynoglossus Semilaevis


 Background: In recent years, the disease outbreak caused by Vibrio harveyi upset the booming development of the Chinese tongue sole (Cynoglossus semilaevis) farming industry. Genomic selection (GS) is a powerful method to improve the traits of interest, which has been proved in livestock and some fishes. Besides, the single nucleotide polymorphism (SNP) array is an efficient genotyping platform that can be used for genetic studies. To improve V. harveyi resistance in C. semilaevis, we firstly constructed a reference group of 1,572 individuals and investigated accuracies of four genomic methods (genomic best linear unbiased prediction (GBLUP), weighted GBLUP, BayesB, and BayesC) at predicting the genomic estimated breeding value (GEBV) using five-fold cross-validation and SNPs varying from 0.5 k to 500 k. Then, an SNP array was developed using the Affymetrix Axiom technology, and its accuracy in genotyping was evaluated by comparing SNPs generated by the array and by the re-sequencing technology. Finally, we selected 44 candidates as the parents of 23 families of C. semilaevis to evaluate the feasibility of the SNP array for GS.Results: all genomic methods outperformed the pedigree-based BLUP (ABLUP) when at least 50 k SNPs used for prediction, of which GBLUP resulted in better estimation than ABLUP when more than 1 k SNPs used. A 38 k SNP array, “Solechip No.1”, was developed with an average of 10.5 kb inter-spacing between two adjacent SNPs. The SNPs generated by the array and by the re-sequencing reached an average consistency of 94.8 %, of which 79.3 % of loci had a more than 90 % of the consistency. The survival rates of these 23 offspring families had a correlation of 0.706 with the family GEBVs (mid-parental GEBVs), and the average survival rate of the top five families in GEBVs (79.1 %) is higher than the bottom five families (58.1 %).Conclusion: GS is an efficient method to improve the V. harveyi resistance in C. semilaevis, and the SNP array “Solechip No.1” is a convenient and reliable tool for the Chinses tongue sole selective breeding practice.


Background
Chinese tongue sole (Cynoglossus semilaevis) is an economic marine at sh which was cultured widely across the coastal area of China and known for its delicious taste, high nutrition, and easy domestication. During the past decades, the booming development of large-scale and industrialized culturing of C. semilaevis has provided highquality sh foods for consumers and brought signi cant economic pro ts to the aquaculture practitioners. However, in recent years, the farming practice of C. semilaevis has encountered great challenges, such as frequent disease occurrence, reduction in fertility, and germplasm degeneration, which have resulted in huge economic losses. Vibrio harveyi (Gram-negative) is one of the most epidemic and high lethal pathogens in the C. semilaevis farming industry. Usually, the infected shes appeared some typical symptoms, including dermal ulceration, eye lesions, and tail bleeding. Since many studies have proven that selective breeding is an e cient method to improve the traits of interest [1], disease-resistance breeding became an important concern for aquatic breeders.
In aquaculture, the arti cial challenge test using a speci c pathogeny is a common strategy to investigate individuals' ability of disease resistance. Since the disease resistance is the trait that could not be measured directly as well as the surviving shes from the test were not suitable to be parents, selection based on the survival rate or estimation based on pedigree (ABLUP) is not precise enough for identifying individuals with great breeding potential. In 2001, Meuwissen et al. proposed a methodology that utilizes SNP markers to predict the genetic breeding values for increasing the genetic gains substantially, i.e. genomic selection (GS) [2]. After Schaeffer et al. exhibited the bene ts of GS in dairy cattle breeding with simulated data [3], research and application of GS has been rapidly developed in livestock and poultry [4,5]. Recent years, studies about genomic prediction have been reported in some farmed shes, such as Atlantic salmon (Salmo salar) [6][7][8][9], rainbow trout (Oncorhynchus mykiss) [10][11][12], common carp (Cyprinus carpio) [13], large yellow croaker (Larimichthys crocea) [14], Japanese ounder (Paralichthys olivaceus) [15], and Nile tilapia (Oreochromis niloticus) [16], etc., and all genomic methods revealed more accurate than the ABLUP at selecting individuals with excellent breeding potential. However, studies on GS about resistance to V. harveyi in sh have not been reported yet.
With the rapid development and dramatic reduction of costs on next-generation sequencing technology, the genome sequencing, re-sequencing, and genome-wide marker discovery have become accessible to aquatic species. Except for the sequencing technology, the genotyping array also provides an e cient and convenient way to detect hundreds of thousands of single nucleotide polymorphism (SNP) simultaneously, which is a bene t for the genetic studies, such as population structure identi cation, genome-wide association study (GWAS), quantitative trait loci (QTL) mapping, and GS, etc. In aquaculture, the SNP array has been developed for several arti cially cultured shes, such as Atlantic salmon, rainbow trout, and cat sh. For Atlantic salmon, a 50 k SNP array and two high-density (132 k and 200 k) arrays have been developed and applied to genetics studies [17,18]. For cat sh, a 250 k SNP array was developed and exhibited good application in population and genetic analysis [19]. For rainbow trout, a 57 k SNP array was also developed [20].
Based on the data of challenge test tted in four models, Li et al. reported that V. harveyi resistance in C. semilaevis belongs to a low heritability trait (ranging from 0.11 to 0.28), and there are moderate positive genetic correlations (0.27 to 0.51) between the resistance and growth performance (body weight and body length) [21]. Bene t from the availability of the whole-genome sequence of C. semilaevis [22], studies on the genetics of complex traits and highe cient SNP array could be developed in Chinese tongue sole. Zhou et al. identi ed several genes, such as plekha7, nucb2, and fgfr2, etc., that might be potentially related to resistance against V. harveyi in Chinese tongue sole using GWAS with 505 re-sequenced individuals [23]. However, no GS study about C. semilaevis has been reported, especially for the V. harveyi resistance. Besides, no SNP array is available for the disease-resistance selective breeding in the Chinese tongue sole yet. The main purposes of this study are to 1) investigate the accuracies of four genomic methods on predicting the genomic estimated breeding value (GEBV); 2) develop an SNP array for C. semilaevis breeding practice; 3) evaluate the selection e ciency of GS using the SNP array.

Fish materials and trait de nition
From 2014 to 2018, we established tens of families of C. semilaevis each year for the disease resistance and growth performance breeding. After mating, each family was reared separately and provided culture conditions as identical as possible before tagging. Details about family establishing was followed the describes of Chen et al. [24].
When shes grew up to approximately 10 cm, 80 to 100 juveniles were selected randomly from each family for challenging with V. harveyi via intraperitoneal injection. The concentration of V. harveyi for formal challenge test was determined through a pre-experiment. In 2014, the concentration for formal test was 2.5 × 10 7 CFU/mL [21]. After injection, all shes from the same family were moved to a 0.7 to 1.0 m 3 separate tank and reared in ltered running water. Every 8 h observed the status of subjects, and the test terminated when no more dead juveniles appeared. The dead sh was moved from the tank at each observation and recorded relevant details, such as family information, body weight and body length. Besides, the tail n was also sampled and preserved in absolute ethanol.
In this study, survival trait was de ned as a binary trait, i.e. scoring 0 when the subject died in the test, scoring 1 for the alive. Based on the results of challenge test, individuals from the families established in 2014, 2016, and 2018 were selected as the reference group for GS study.
Whole genome re-sequencing and SNP calling Genomic DNA of these shes was isolated and puri ed from the tail n using the TIANamp Marine Animals DNA Kit (Tiangen Biotech, Beijing, China). Illumina pair-ended (PE) libraries were constructed for each sh with an insert size of approximately 300 bp according to the manufacturer's protocol (Illumina). The PE libraries were then sequenced using Illumina HiSeq 2000 platform, generating 2*100 bp paired reads. The raw sequencing reads were subjected to quality control steps using QC-Chain and the tag sequences, low quality bases/reads, and ambiguous nucleotides were removed [25]. Then the clean reads were aligned to the reference genome of C. semilaevis (NCBI accession No. AGRG00000000.1) using BWA with default settings, generating the alignment results in BAM format. Prior to SNP identi cation, the BAM les were processed with Picard tools (version 1.119) to decrease false positive SNPs from duplicated genome regions. The SNP calling was then performed by SAMtools [26], with multiple ltering standards, including the mapping quality ≥ 20, the base quality ≥ 30 and the SNP quality score ≥ 20.

Quality control and SNP selection
To obtain reliable variants for the SNP array and GS study, the SNP selection was performed with several quality control (QC) steps, which decreased the false positives and ensured an even distribution of the SNPs across the genome. Firstly, raw SNPs were ltered based on the minor allele frequency (MAF), missing rate, and the Hardy-Weinberg equilibrium (HWE). SNPs with MAF less than 0.01, missing rate more than 0.1, and deviation from HWE (p = 0.001) were discarded using PLINK (version 1.90b6.6) [27]. For the GS analysis, ten subsets of genotypes with the number of SNP varying from 0.5 k to 500 k were extracted from these ltered SNPs.
Secondly, the preliminary selected SNPs were further ltered for the SNP array if (1) they were located in repetitive sequences; (2) there were other SNPs occurring in each side of 35 bp anking sequences; (3) Fewer than two copies of each allele were detected in the sequence reads; (4) GC content of their 35 bp anking sequences were more than 70% or less than 30%. In addition, the pairwise linkage disequilibrium was estimated to avoid the SNPs that fell within the same haplotype were simultaneously selected.
Finally, the remainder SNPs were submitted to Affymetrix® in-silico analytical pipeline to assess the potential of probe design. The p-converted value represents the probability of conversion to a reliable SNP assay. Based on that, potential probes were designed for each SNP in both the forward and reverse direction, and were categorized as "recommended", "neutral", "not recommended" or "not possible". Only probes classi ed into "recommended" or "neutral" were selected. The position and spacing of the selected SNPs were assessed to ensure an entire coverage in 20 autosomes. In addition, to lter the background signal from the assay, totally 2000 dish quality control (DQC) probes were designed and anchored on the SNP array as negative controls. The effects of SNPs selected for the SNP array were predicted by the software SnpEff (version 4.3t) [28]. The MAF distribution of all SNPs tilling on the SNP array was plotted using R-ggplot2 [29].

Statistical methods
Four genomic methods, including genomic best linear unbiased prediction (GBLUP), weighted GBLUP (wGBLUP), BayesB and BayesC, were used to predict the GEBVs of V. harveyi resistance in C. semilaevis. The threshold model (probit) used for these GS methods was de ned as: where \varvecy is the vector of phenotypes, the element of 0 and 1 represent the sh died and survived from the challenge test, respectively; vector \varvecb contains xed effects, including intercept, locations and years of families establishing; vector \varvecg contains random additive genomic effects distributed as N(0, \varvecGs 2 g ), of which \varvecG is the genomic relationship matrix that was constructed using \raisebox1ex$(\varvecM − \varvecP)\varvecD(\varvecM − \varvecP) ' $ \raisebox− 1ex$2? p k (1 − p k )$ [30]. In brief, the element of matrix \varvecM (M jk ) represents the genotype of sh j at locus k, and the genotypes AA/Aa/aa were recoded as 0/1/2; the element of matrix \varvecP is de ned as 2p k , and p k represents the observed second allele frequency at locus k. In addition, formula ? \varvecG + (1 − ? )\varvecA 22 was used for adjusting matrix \varvecG to avoid the potential of singularity, of which ω equals 0.95, subscript 2 expresses the genotyped individual, and matrix \varvecA is the relationship matrix based of pedigree (four generations). Matrix \varvecX and \varvecZ are incidence matrices that relate phenotypes to xed effects and individuals, respectively. The difference between GBLUP and wGBLUP is whether matrix \varvecG was weighted through matrix \varvecD. For GBLUP, matrix \varvecD is the identity matrix. For wGBLUP, matrix \varvecG was weighted through an iterative approach which was described by Wang et al. [31]. In brief, the weight of each SNP (d k ) was estimated from the SNP effect and frequency of the minor allele of SNP. At the rst iteration, each SNP had same weights which was 1, i.e. matrix \varvecD is the identity matrix. The SNP effects (\varvecu) were calculated using 2? p k (1 − p k )\varvecD(\varvecM − \varvecP) ' \varvecG \varvec * \ varvecg, of which \ varvecg is the GEBVs from GBLUP. Then, element of matrix \varvecD (d k ) for weighting matrix \varvecG in next iterations was estimated as described by Zhang et al.: \ varvecu 2 k 2? p k (1 − p k ) [32]. In this study, results of wGBLUP were derived from the second iteration, and GEBVs were re-estimated in each iteration. Genetic variances and GEBVs of GBLUP and wGBLUP were estimated by R-ASReml 3 [33].
In two Bayesian methods, the GEBV of individual j (\varvecg j ) was expressed as ? m k = 1 M jk u k , where m is the number of SNP; M jk is the genotypes of individual j at locus k; u k is the SNP genetic effects. For BayesB, π is the proportion of SNP that had genetic effects. We tested π equals 0.01, 0.02, 0.03, 0.04, and 0.05 via ve-fold cross validation to determine the value of π (results not shown here). In this study, π = 0.02 was selected for the nal GS analysis. The genetic effects of SNPs in BayesB and BayesC were estimated using R-BGLR through the MCMC Gibbs sampling with 30,000 iterations and the rst 10,000 runs being as burn-in (Pérez and de los Campos, 2014).

Cross validation
In this study, the predictive accuracies of four genomic methods with the number of SNP varying from 0.5 k to 500 k on predicting GEBVs were investigated by ve-fold cross validation. In brief, the full data set was randomly divided into ve non-overlapping subsets. In a single independent estimation, one subset was used as the validation set that the phenotypes of this subset were treated as missing, the rest of the subsets were used for model training. The above steps repeat ve times, i.e. each non-overlapping subset was used as the validation set once. The predictive accuracy was de ned as Acc = \raisebox1ex$r ( GEBV , y ) $/\raisebox− 1ex$h$ [34], where r ( GEBV , y ) is the Pearson's correlation between GEBVs and phenotypes of validation set; h is the square root of the heritability of V. harveyi resistance in C. semilaevis. In this study, heritability used here equals 0.16, which was reported by Li et al. [21]. The nal accuracy is the average accuracy of ve validation sets.
Evaluation of the SNP array performance / Page 6/17 The DNA hybridization, staining and array scanning were completed using GeneTitan Multi-Channel Instrument (Thermo Fisher, USA). The Axiom Analysis Suite software was used to control the quality of samples with parameters: DQC ≥ 0.82; call rate ≥ 97%; percent of passing samples ≥ 95%; and average call rate for passing samples ≥ 98.5%. In addition, the default settings of SNP QC criteria were used to lter the SNPs. After the computer automatically analysis, SNPs could be divided into six classi cations, including "NoMinorHom" (SNPs formed two good clusters but none of the minor homozygous genotypes were detected), "MonoHighResolution" (SNPs were detected as the monomorphic SNPs with good cluster resolution), "PolyHighResolution" (SNPs were detected as the polymorphic SNPs with three high-resolution clusters), "Other" (SNPs were identi ed as the low-quality genotypes), "CallRateBelowThreshold" (SNPs that were below the threshold of SNP call rate), and " Off Target Variant" (OTV, SNPs that few hybridization signals were captured). Therefore, SNPs with "PolyHighResolution", "MonoHighResolution", and "NoMinorHom" classi cation were recommended for downstream analysis.
To further evaluate the performance of the SNP array, the genotyping accuracy was also evaluated through comparing the SNPs generated by the SNP array and by the re-sequencing technology. For that purpose, 24 individuals of Chinese tongue sole were selected randomly from the reference group.
Application of the SNP array in breeding

Challenge test
The description of challenge test of families established in 2014 was reported by Li et al. [21]. removed. These ltering resulted in a rapid decline of the SNPs number from 23.57 M to 2.08 M. Finally, we extracted ten subsets of genotypes containing 0.5 k, 1 k, 5 k, 8 k, 10 k, 30 k, 50 k, 100 k, 300 k, and 500 k markers from these ltered 2.08 M SNPs for downstream analysis.
The accuracies of GBLUP, wGBLUP, BayesB, and BayesC in estimating GEBVs of V. harveyi resistance in C. semilaevis were evaluated using the ve-fold cross validation strategy and the ten SNP subsets. As showed in Fig. 1, the dashed line in grey represented the predictive accuracy of ABLUP (0.599), and solid line in black with rhombi, squares, triangles, and circles showed the accuracy variation of GBLUP, wGBLUP, BayesB, and BayesC, respectively. At rst, the accuracy of the four genomic methods had a growing tendency with the increasing number of SNPs used for prediction, and then basically maintained steady or declined slightly after reached the highest accuracy. For GBLUP, more than 1 k SNPs could result in better estimation than ABLUP. For BayesB and BayesC, 8 k or more SNPs were essential for good prediction. For wGBLUP, at least 50 k SNPs were needed. The accuracy of GBLUP increased sharply from 0.468 to 0.681 when SNPs added from 0.5 k to 5 k and increased gently to 0.705 with the number of markers added to 30 k. When 50 k SNPs were used for prediction, the accuracy of GBLUP reached the peak (0.724) and then remained steady when the number of SNPs continually increased to 500 k. BayesB and BayesC showed similar  Based on the criteria of Affymetrix in-silico probe design, the anking sequence of SNPs and GC content, and the potential effects of the SNPs predicted by SNP annotation, etc., 38,295 SNPs were included in the SNP array with an average p-convert value of 0.687. Since 2,000 DQC probes were serving as negative controls, the total number of SNP probes on the array was 40,295. This 38 k SNP array was named as "Solechip No.1". These 38,295 SNPs distributed throughout the whole genome with an average of 10.5 kb inter-spacing between two adjacent SNPs (Fig. 2). A total of 2,242 SNPs had an inter-spacing less than 4 kb and 1,202 SNPs had an interval that is more than 10 kb. The greatest intronic (30.51%) variants, and 19.12% and 12.30% of SNPs were identi ed as the upstream and downstream of the gene, respectively (Table 1).   To test the genotyping quality of the SNP array "Solechip No.1", 24 individuals of Chinese tongue sole were selected randomly from the reference group for genotyping again using the SNP array. After automatic analysis with the Axiom Analysis Suite software, all subjects passed the quality control processing, and 28,016 SNPs (73.2%) were recommended for downstream analysis. Among these loci, 63.2% of SNPs (including 6,668 (23.8%) "PloyHighResolution" and 11,043 (39.4%) "NoMinorHom") were classi ed into the polymorphic and 36.8% (including 10,305 "MonoHighResolution") were categorized as the monomorphic. Besides, to evaluate the genotyping accuracy, we compared the SNP of each locus generated by the array and by the re-sequencing technology. Among the recommended 28,016 SNPs, the average consistency of SNPs reached 94.8%, of which 53.5% of these SNPs had a consistency of 100% (Fig. 4). Cumulatively, a total of 79.3% of loci had a more than 90% of the consistency; 91.4% of loci had a more than 85% of the consistency. A proportion of 8.5% of loci had a consistency between 0 to 85%, and 0.2% of loci are completely inconsistency. Genomic selection using the SNP array "Solechip No.1" We selected 44 shes as the candidates and were genotyped through the SNP array "Solechip No.1". Then, these candidates were used as the parents of 23 families of C. semilaevis, and their GEBVs were estimated by the GBLUP.
The GEBVs of these 23 families were expressed as the mid-parental GEBVs. The Pearson correlation between the family GEBVs and survival rate of the corresponding family is 0.706, which is belonging to a strong positive correlation. The average survival rate of the top ve families in GEBVs is 79.1%, which is higher than the bottom ve families (58.1%).

Discussion
The arti cial challenge test is a commonly used method to identify subjects' ability of disease resistance. Based on that, the evaluation of genetic parameters and the selective breeding schemes have been achieved in some farmed shes [1]. From the results of the test conducted in 2016 and 2018, we observed abundant genetic variation of V. harveyi resistance among these Chinese tongue sole populations, which implied that the V. harveyi resistance of these populations could be improved by an appropriate strategy of selective breeding.
The accuracies of four genomic methods at estimating GEBVs of V. harveyi resistance in C. semilaevis were investigated using the ve-fold cross validation strategy and the number of SNP varying from 500 to 500 k. The accuracy of GBLUP reached the peak in the 50 k SNPs subset and became steady when more than 50 k SNPs were used for prediction. This might imply that it is su cient for GBLUP to construct a precise genetic relationship matrix using 50 k SNPs. Different from the GBLUP, BayesB and BayesC relied on the magnitude of linkage disequilibrium (LD) between genetic markers and QTLs [35]. Though BayesB and BayesC had the highest accuracy using 50 k SNPs, a few decreases were observed when more than 50 k SNPs were used for prediction. Theoretically, more SNPs meant more LD information could be captured, likewise implied more SNPs would be LD phase with each other. Since BayesB and BayesC assumed that only a fraction of SNPs had genetic effects, an excess of markers used for prediction might affect the sampling results of MCMC. This might be the reason why slight decreases in accuracy were observed when more than 50 k SNPs were used for estimation in BayesB and BayesC. Besides, it is interesting to note that wGBLUP showed erce changes in accuracy than the other three GS methods. This phenomenon might be caused by the nature of estimation of weights of SNPs. Since the SNP effect is a regressed value instead of the true value, the SNP effects derived from the GEBVs were suboptimal [31]. Therefore, it is worth to study the substitution of the regressed SNP effects by the effects estimated from the Bayesian methods.
In the 50 k SNPs subset, the accuracy of GBLUP outperformed the other three GS methods with an accuracy of 0.724 and followed by the wGBLUP (0.688). BayesB and BayesC yielded a close estimation with an accuracy of 0.659 and 0.662, respectively. Some studies on GS about the trait of disease resistance in sh reported that Bayesian methods had a close or higher accuracy than GBLUP [10,15,36,37]. However, Vallejo et al. reported that single-step GBLUP (ssGBLUP) had a better prediction than weighted ssGBLUP, BayesB, and BayesC at improving resistance to bacterial cold water disease in rainbow trout [38], which was similar to the situation showed here. According to the results of GWAS conducted by Zhou et al. [23], we could conclude that resistance against V. harveyi in Chinses tongue sole belonged to a polygenic genetic architecture. Therefore, the method based on an assumption of equal variance for all SNPs, such as GBLUP and ssGBLUP, might be more suitable than that method based on different variance for a fraction of SNPs, such as Bayesian methods and weighted single-step method for estimation of GEBV of resistance to V. harveyi in C. semilaevis. However, GBLUP or Bayesian methods only utilized the data of genotyped individuals. Therefore, the performance of the ssGBLUP which integrated phenotypes, pedigree, and genotypes of all breeding populations is worth investigating in the follow-up study [39,40].
The re-sequencing technology and SNP array are both e cient tools to obtain abundant SNPs. However, the SNP array has advantages at convenience and timesaving, which might have a priority in breeding practice. In this study, we developed a 38 k SNP array for the disease resistance selective breeding of Chinese tongue sole, which is names as "Solechip No.1". After several steps of selection, 38,295 SNPs were selected for the SNP array from 1.07 M highquality SNPs with an average of 10.5 kb inter-spacing between two adjacent SNPs and an average MAF of 0.055. A high proportion of SNPs with MAF between 0.01 to 0.05 were observed. These rare variants (MAF less than 0.05) might exist in some families, which would be bene cial to the GWAS and GS analysis. If the SNPs obtained by the resequencing were considered to precise enough, it is worth comparing the SNPs generated by the array and by the resequencing to evaluate the genotyping accuracy of the "Solechip No.1". We randomly selected 24 re-sequencing individuals and genotyped them again using the SNP array "Solechip No.1". All subjects passed the QC procedures and yielded 28,016 SNPs for subsequent analysis. Among these markers, 53.5% of SNPs had a complete consistency with the SNPs obtained by the re-sequencing and 0.2% of SNPs were fully different. We inferred that all these inconsistent SNPs might be caused by the false positive, rare bases among some families, and genotype imputation of the reference group. From these results, we could conclude that the SNP array "Solechip No.1" possessed high accuracy at genotyping, which could generate high-quality SNP to meet the needs of genetic analysis, such as GWAS and GS.
GS refers to the use of genomic prediction in selection. Here, we selected 44 candidates as the parents of 23 families of C. semilaevis and genotyped them using the SNP array "Solechip No.1". Their GEBVs were estimated by GBLUP, and the family GEBV was expressed as the mid-parental GEBV. The Pearson's correlation between the family GEBVs and survival rate of these 23 offspring families is 0.706, and the average survival rate of the top ve families in GEBVs (79.1%) is higher than the bottom ve families (58.1%). It suggested that families with high GEBV are more likely to possess low mortality after infection, which is accordant with the conclusion of the high e ciency in the selection of GS in Japanese ounder (Paralichthys olivaceus) disease-resistant breeding reported by Liu et al. [15].

Conclusion
We successfully constructed a reference group of C. semilaevis containing 1572 individuals for V. harveyi resistance breeding and investigated the accuracies of GBLUP, wGBLUP, BayesB, and BayesC at estimating the GEBVs. Among these genomic methods, GBLUP had a better estimation than ABLUP when more than 1 k SNPs used for prediction. Therefore, it could be an optimal method to improve the V. harveyi resistance in C. semilaevis. Besides, we developed the 38 k SNP array "Solechip No.1" for the Chinese tongue sole breeding practice. This array could generate highquality SNPs for genetic analysis and had a high accuracy at genotyping. Last, we selected 44 candidates of C. semilaevis as the parents of 23 families. The GEBVs of these candidates were estimated using GBLUP and SNPs generated by the SNP array "Solechip No.1". The Pearson's correlation between the family GEBVs and the survival rates of these 23 offspring families was 0.706, which indicated that selection based on high GEBV is practicable. Our study demonstrated that GS is a feasible and e cient method to improve the V. harveyi resistance in C. semilaevis, and the SNP array "Solechip No.1" is a convenient and reliable tool for the Chinses tongue sole breeding practice. Abbreviations ABLUP best linear unbiased prediction using pedigree CallRateBelowThreshold SNPs that were below the threshold of SNP call rate in the SNP array analysis DQC dish quality control, probes used as the negative control in the SNP array GBLUP genomic best linear unbiased prediction The accuracy curves of GBLUP with SNPs varying from 0.5 k to 500k.  Distribution of inter-spacing of adjacent SNPs on the SNP array "Solechip No.1".

Figure 3
Minor allele frequency distribution of SNPs tilling on the SNP array "Solechip No.1".

Figure 4
Consistency of genotypes yielded by the SNP array "Solechip No.1" and by the re-sequencing technology.