The micro-RNA content of unsorted cryopreserved bovine sperm and its relation to the fertility of sperm after sex-sorting

The use of sex-sorted sperm in cattle assisted reproduction is constantly increasing. However, sperm fertility can substantially differ between unsorted (conventional) and sex-sorted semen batches of the same sire. Sperm microRNAs (miRNA) have been suggested as promising biomarkers of bull fertility the last years. In this study, we hypothesized that the miRNA profile of cryopreserved conventional sperm is related to bull fertility after artificial insemination with X-bearing sperm. For this purpose, we analyzed the miRNA profile of 18 conventional sperm samples obtained from nine high- (HF) and nine low-fertility (LF) bulls that were contemporaneously used to produce conventional and sex-sorted semen batches. The annual 56-day non-return rate for each semen type (NRRconv and NRRss, respectively) was recorded for each bull. In total, 85 miRNAs were detected. MiR-34b-3p and miR-100-5p were the two most highly expressed miRNAs with their relative abundance reaching 30% in total. MiR-10a-5p and miR-9-5p were differentially expressed in LF and HF samples (false discovery rate < 10%). The expression levels of miR-9-5p, miR-34c, miR-423-5p, miR-449a, miR-5193-5p, miR-1246, miR-2483-5p, miR-92a, miR-21–5p were significantly correlated to NRRss but not to NRRconv. Based on robust regression analysis, miR-34c, miR-7859 and miR-342 showed the highest contribution to the prediction of NRRss. A set of miRNAs detected in conventionally produced semen batches were linked to the fertilizing potential of bovine sperm after sex-sorting. These miRNAs should be further evaluated as potential biomarkers of a sire’s suitability for the production of sex-sorted sperm.


Background
Manipulating the calf sex ratio can be a powerful tool for increasing the profitability and for accelerating the genetic gain in dairy and beef cattle farming [1][2][3]. Thus, it is not a surprise that the use of sex-sorted sperm in bovine assisted reproduction has steadily increased in the last years [4,5]. Although alternative methodologies have been described [6][7][8], the separation of X-and Y-bearing spermatozoa by means of flow cytometry after Hoechst 33342 labeling is still the technique of choice applied in most sorting facilities, mainly due to its high accuracy, repeatability and suitability for commercial application [9]. Nevertheless, several research groups had already reported that inseminating dairy heifers with a dose of 1 to 2 million frozen-thawed Xbearing sperm resulted in conception rates not higher than 70-90% of these achieved with unsorted sperm (henceforward mentioned as "conventional" in the text; [10][11][12][13][14][15]). Consequently, along with the higher price of sex-sorted products, a variable loss in bull fertility appeared to be the major cost of artificial insemination (AI) with sex-sorted sperm [16,17] and, thus, a considerable drawback to the global expansion of its use.
Recent advancements in sorting technology in combination with an almost two-fold increase of the number of sperm per AI dose (i.e. 4 instead of 2.1 million sex-sorted sperm per dose) are expected to address the fertility problem both in heifers and cows, resulting in non-return rate (NRR) values of approximately 90% of those obtained after AI with conventional sperm [18][19][20]. Nonetheless, the production of sex-sorted sperm remains an expensive procedure and processing ejaculates of sires that do not perform optimally after sex-sorting costs a considerable amount of resources. Post-thaw quality characteristics of sex-sorted sperm can be of some predictive value for its fertilizing potential after AI [21]; however, this information is available only at late stages of the production process, when sire and ejaculate selection, semen logistics, sperm sorting and cryopreservation, all time-consuming and costly procedures, have already taken place. Not surprisingly, the NRR for conventional semen (NRR conv ) has not been proven a reliable indicator of the NRR for sexsorted semen (NRR ss ) either, even when equal doses of both semen types were used for the generation of NRR data [22,23]. Indeed, a large study on dairy bulls used for the production of both conventional and sex-sorted sperm in the U.S.A. demonstrated that sire fertility rankings significantly differ between the two semen types [24].
Several studies have shown that the fertilizing potential of sperm after sorting largely varies between bulls when used either for field AI [12,20,23,25,26] or for in vitro embryo production [27][28][29]. It is well known that NRR values respond to increasing sperm doses in a bulldependent manner; this response pattern is linked to the level of non-compensable defects present in sperm and has been documented for both conventional [30,31] and sex-sorted sperm [25]. There is also indication that sperm tolerance to mechanical stress (i.e. sorting pressure) and prolonged storage prior to sorting varies between individuals [12]. Interestingly, sex-sorting affects sperm molecular mechanisms in a bull-dependent manner too. In a split-ejaculate experiment, Carvalho et al. (2012) observed that the effects of the sorting procedure on the methylation profile of the IGF2R gene of Ybearing sperm differed significantly between bulls [32]. Thus, a better understanding of bull-specific factors that affect the functional status and molecular biology of sperm cells after sorting would substantially contribute to fertility prognostics of sex-sorted sperm [9,33].
Studies about the impact of sex-sorting on the molecular features of sperm and the respective consequences for male fertility are rather scarce. It has been shown that both sexsorting and cryopreservation induce epigenetic changes to sperm, particularly related to their gene methylation pattern [32] and transcriptome profile [34,35]. In the same direction, Morton et al. (2007) described differences in the relative transcript abundance of developmentally relevant genes between day-7 bovine embryos that were in vitro produced using conventional and sex-sorted sperm [36]. Similar findings have also been reported in other ruminant species [37]. The authors attributed the differential expression of these embryonic genes to alterations of sperm molecular characteristics after sex-sorting; however, the nature of these alterations was not further investigated [36].
Among other RNA molecules, small non-coding RNAs (sncRNA), i.e. transcripts with length of less than 200 nucleotides that do not serve as template for protein synthesis, have rapidly attracted the interest of researchers in the field of animal reproduction in the last decade, mainly due to their potential use as fertility biomarkers [38]. Bovine spermatozoa are equipped with a wide array of sncRNAs including microRNAs (miRNA) and Piwi-interacting RNAs (piRNA) [39][40][41][42]. Several studies have focused on the relation between sperm miRNA profile and bull fertility [40,[43][44][45]. Although mature sperm are considered transcriptionally silent, their miRNA content shows a dynamic response to stressful procedures, like cryopreservation [34,46] and induction of capacitation [47]. Indeed, the transcriptome profile of porcine spermatozoa has recently been suggested as an indicator of their freezability and, thus, their ability to tolerate stress related to semen processing [48]. Moreover, it is known that miRNA genes located on the X chromosome are capable of escaping the meiotic sex chromosome inactivation, i.e. the transcriptional silencing of the unsynapsed X-and Y-chromosomal region at the onset of pachynema in mammalian male germ cells [49]. X-linked miRNAs remain active even until the onset of spermiogenesis and serve as post-transcriptional regulators of spermatogenesis at the late meiotic and post-meiotic phases [50]. Despite the increasing evidence about the dynamics of miRNAs in mature sperm and their role in the inactivation/activation cycle of the X and Y chromosome during sperm cell development, their profile in sperm lined up for sex-sorting has not been adequately studied yet.
In the present study, we tested the hypothesis that the miRNA profile of conventional semen is related to the fertility outcome of AI with X-bearing sperm. For this purpose, we assessed the sperm functional status and miRNA profile in conventional AI doses produced from proven sires with diverse fertility after sorting.

Descriptive statistics Sperm quality traits
Descriptive statistics (mean value±SD, min and max values) of sperm quality characteristics are presented in Table 1. The samples examined in our study were commercially produced doses that had already passed the postcryopreservation quality control before being released in the market; thus, not surprisingly the percentage of plasma membrane-and acrosome-intact sperm (PMAI) in both high-(HF) and low-fertility (LF) groups was higher than the commonly applied threshold of 40% (45.96 ± 8.63 and 48.98% ± 8.75% for the LF and HF bulls, respectively). As demonstrated in Table 1, LF bulls had a lower percentage of sperm with high esterase activity, intact plasma membrane, unstained acrosome, low intracellular Ca 2+ levels and high mitochondrial membrane potential (C pos PI neg P-NA neg F neg M pos ; 31.12 ± 6.66 and 35.08% ± 8.19% for the LF and HF group, respectively). The percentage of sperm with high DNA fragmentation index (%DFI) was similar between the two fertility groups (4.01 ± 1.59 and 4.57% ± 2.21% for the LF and HF group, respectively).

Small RNA sequencing data
In total, 48,170 to 1,070,345 reads were identified in each sample (279,763 ± 235,489 reads per sample). More than 50% of the total reads (50.78 to 72.62%) were 18-to 30-nucleotide long (Fig. 1). Across the 18 samples, 4209 unique sequences were identified after filtering of sequences with neglectable read counts. Alignment of unique sequences against bovine and human non-coding and coding sequences revealed 683 sncRNA transcripts in total, with the number of uniquely mapped reads per sample ranging between 5788 and 277,775 reads. Eighty-five miRNAs were identified across the 18 analyzed samples. Counts per million reads (cpm) of the 85 detected miR-NAs in the pooled sperm sample of each bull are available in Additional file 1, Table S1. A subset of 55 out of the 85 miRNAs was found in common with miRNAs detected in our previous studies on 30 bovine sperm samples from two cohorts of bulls [43]. The cpm values of the 10 most abundant miRNAs in samples of the LF and HF group are presented in Fig. 2a. MiR-34b-3p and miR-100-5p were the two most highly expressed miRNAs, with their relative abundance reaching approximately 30% in total (Fig. 2b).

Principal component analysis (PCA)
PCA was performed in an attempt to capture and visualize potential redundancy in the miRNA expression dataset. In total, 14 principal components (PC) were extracted, with the first five of them explaining 68.46% of the dataset's variance (27.52, 12.35, 10.94, 9.21 and 8.44%, respectively; Additional file 3, Table S3). The coordinates, quality of representation and contribution of the 85 miRNAs to the first five PCs are presented in Additional file 3, Tables S4-S6. The correlations between the first two PCs and the expression levels of the 85 identified miRNAs across the two experimental groups are demonstrated by means of a PCA correlation circle in Fig. 3a. The most characteristic miRNAs for each of the first five PCs, i.e. miRNAs whose expression levels are correlated with single PCs at significance level < 0.05, are presented in Additional file 3, Table S7. The sperm samples obtained from LF and HF bulls could not be distinctly separated when plotting their miRNA expression profile against the first and second PC (Fig. 3b). PCA plots were created for all pairs of the five PCs; however, the results were similar and, thus, not presented here.

Differential expression analysis
Two out of 85 miRNAs, miR-10a-5p and miR-9-5p, were differentially expressed (DE) between samples of the LF and HF group with a false discovery rate (FDR) of < 10% in both cases. In particular, miR-9-5p was downregulated and miR-10a-5p was upregulated in HF vs. LF sperm samples (− 1.26 and 1.31 log fold change, respectively).

Robust regression
Forward model selection revealed five miRNAs with the highest contribution to the prediction of NRR ss : miR-34c, miR-7859, miR-342, miR-106b-5p and miR-92a. Thus, the following robust regression line (M ss ) was fit: where NRR ss is the estimated value of NRR ss for individual i, a the intercept of the regression line, b 1-5 the coefficients of the respective linear regressors, [miR-x] the expression levels (cpm) of the selected miRNA, and e the additive error term of the model. The variance inflation factor (VIF) of each regressor was computed to evaluate the multicollinearity of expression levels in the subset of the five miRNAs. The regression coefficients b (±SEM) and their respective P and VIF values are shown in Table 2. Values of NRR ss were negatively related to the expression levels of miR-34c (b = − 0.011 ± 0.003, P = 0.002) and miR-342 (b = − 0.005 ± 0.001, P = 0.022), while the miR-7859 appeared to have a positive effect on NRR ss (b = 0.041 ± 0.014, P = 0.009; Table 2). The effect of miR-106b-5p expression levels on the latter was proven not significant (b = − 0.016 ± 0.010, P = 0.122; Table 2). Although NRR ss values were positively related to cpm of miR-92a, this trend was not statistically significant (0.016 ± 0.005, P = 0.058; Table 2). The NRR ss values predicted with robust regression for each of the five miRNAs (when all other regressors are kept constant at their mean value) are demonstrated in Fig. 4a. The observed expression levels of the five miRNAs in samples of the LF and HF group are presented in Fig.  4b. Three bulls (A, I, K) were identified as outliers based on their sperm miRNA profile (with overall outlying expression of the five regressor miRNAs) and were treated by the robust regression model as so; the above mentioned samples are marked in Fig. 4a. In a following step, we tried to explore whether the five selected miRNAs made an actual contribution to the variance of the outcome variable NRR ss or indirectly affected the sire's performance after sex-sorting through its overall fertility status. Therefore, NRR conv was modeled as a function of the five miRNAs (model M conv ). The standardized b coefficients and the confidence intervals of models M ss and M conv are graphically demonstrated in Fig. 5. The regression coefficients describing the relation of the five selected miRNAs to NRR conv were closer to zero, while their 95% confidence intervals crossed the vertical zero-threshold line, apparently indicating non-significance of the M conv model parameters. Therefore, it was confirmed that the five selected miR-NAs had a direct effect on the fertilizing potential of  sex-sorted sperm and did not affect its performance through the general fertility status of the bull.
Functional annotation of miRNA predicted targets DE miRNAs (miR-9-5p and miR-10a-5p) In total, 442 potential target genes were detected for the two DE miRNAs (FDR<5%; Additional file 4, Table S8). The most significant enriched GO terms were related to developmental process, cellular component organization or biogenesis, immune system process, locomotion and response to stimulus (Fig. 6). Other interesting GO terms were associated to regulation of biological process, localization, cell proliferation, biological adhesion and reproductive process (Fig. 6). The most representative enriched GO terms of the top 20 clusters (one term per cluster) along with their P and multi-test adjusted P values (q) are presented in Table 3. The complete list of GO terms with a score ≥ 1.3 is available in Additional file 4, Table S9.
To further capture the relationships between the enriched GO terms, a subset of them was selected and rendered as a network plot, where terms with a similarity > 0.3 were connected by edges. We selected the terms with the best P values from each of the 20 clusters, with the constraint that there were no more than 15 terms per cluster and no more than 250 terms in total. For network visualization, each node represented an enriched term and was colored by its cluster ID (Additional file 5, Figure A) and by its P value (Additional file 5, Figure B).
Robust regression predictor miRNAs (miR-34c, miR-7859, miR-342, miR-106b-5p and miR-92a) Six hundred and eight potential target genes (Additional file 4, Table S10) of four out of five miRNAs used for robust regression (miR-34c, miR-342, miR-106b-5p, miR-92a) were identified (FDR < 5%); miR-7859 was not included in existing databases for human species. The top enriched GO terms were associated to metabolic processes, transcription factor binding, response to stimulus, cell cycle and protein kinase binding. GO terms linked to vesicle-mediated transport, ubiquitin-like protein ligase binding, valine, leucine and isoleucine degradation, autophagy and mitophagy were also identified.
The complete list of GO terms with a score ≥ 1.3 is available in Additional file 4, Table S11.

Discussion
The use of sex-sorted sperm in bovine assisted reproduction is constantly expanding; however, it is frequently observed that bulls with high fertility after AI with conventional sperm may not perform optimally when sex-sorted doses are used for inseminating either cows or heifers. In the present study, we explored the relation between the miRNA profile of conventionally produced semen doses and the fertility status of the bull after AI with sex-sorted sperm.
Our analysis revealed a wide variety of miRNAs in bovine sperm, with miR-34b-3p and miR-100-5p comprising approximately 30% of the analyzed sperm miRNAome. The most abundant miRNA, miR-34b-3p, was present in both HF and LF bulls with a relative abundancy of approximately 15%. Similar values have been previously reported for bovine sperm in other studies [51,52]. It has been shown that transcripts of the miR-34b/c cluster are preferentially expressed in the testis and play a crucial role in sperm chromatin condensation in the stage of pachytene spermatocytes and round spermatids [53]. Based on the outcome of robust regression analysis, miR-34c but not miR-34b-3p was highlighted as a deciding predictor of NRR ss . Even though the two miRNAs are co-transcribed from a common cluster (miR-34b/c) on bovine chromosome 15, their expression and function can largely vary within the same cell type [51]. This could probably explain the relation of miR-34c but not of miR-34b-3p with NRR ss in our study.
Our results suggested a negative correlation between miR-34c expression levels and the fertility of sperm after sex-sorting. MiR-34c has been detected in sperm of several species, including the equine [54], porcine [55], murine [56] and human [57][58][59], and is considered as one of the most abundant miRNAs in bull sperm and male germ cells [42,51,60]. MiR-34c profoundly plays a role in the growth, differentiation and apoptosis of the male germ cell line through regulation of the transforming growth factor beta and the notch signaling pathways [61]. Individuals not able to express the seed sequence of miR-34c (i.e. lacking miR-34c and miR-449 simultaneously) have lower sperm concentration and impaired sperm kinetics [53]. In the same direction, Capra et al. (2017) reported elevated miR-34c expression levels in the high-motile fraction of cryopreserved bovine sperm selected by means of Percoll density gradient centrifugation [52]. In our study, miR-34c cpm were related neither to the functional traits of conventional sperm nor to NRR conv . This could be apparently attributed to the low between-bull variability of the latter. However, it is not easy to explain why sperm with lower miR-34c expression performs better after sex-sorting, as indicated by our analysis. Recent studies have highlighted the importance of an intense co-expression and target network of  miRNAs involved in the manifestation of complex phenotypic traits including bovine fertility [42,62]. In the light of this information, one could speculate that pre-sorting elevated miR-34c expression indirectly affects the fertilizing ability of sperm after sorting, possibly through alterations in the expression or function of interrelated miRNAs. Interestingly, the profile of sperm miR-34c is susceptible to semen processing and particularly cryopreservation [46]. Future split-ejaculate experiments focusing on the effect of sex-sorting on sperm miRNA profile could elucidate the role of miR-34c in the fertility of sex-sorted sperm. As suggested by our results, miR-342 was also included in the predictors of NRR ss . This miRNA has been previously identified in various bovine tissues [63,64] as well as in the testes and sperm of other mammalian species [65][66][67]. MiR-342 belongs to the long list of miRNAs whose relative abundance substantially changes during epididymal maturation [56], a process that renders sperm their ability to capacitate once deposited in the female genital tract. Furthermore, miR-342 can block the lipogenesischolesterogenesis pathway [68] which is directly linked to the capacitation of mature sperm [69,70]. Premature capacitation-like changes of sperm structure and function during sorting are counted among the main causes for the reduced longevity of sex-sorted sperm [71][72][73]. In our study, miR-342 expression levels in conventional sperm samples were negatively correlated with NRR ss but not with NRR conv or the inducibility of acrosome reaction of conventional sperm (flow cytometrically assessed after challenging sperm with calcium ionophore A23187 and dual PI/PNA staining; data not shown). Thus, any potential effect of miR-342 expression on the fertilizing ability of X-bearing sperm was apparently not linked to the capacitation status of spermatozoa before sorting.
Remarkably, our analysis and particularly forward model selection pointed out miR-7859, a miRNA that had not been previously described in bovine germ cells or sperm, as one of the five predictors of NRR ss . MiR-7859 has been previously documented as part of the microRNAome of the bovine [74] and caprine mammary gland [75], but information about its functional role are rather scarce. Based on our results, the expression levels of miR-7859 appeared to have a Fig. 5 Standardized b coefficients for two robust regression models describing the relation between the expression levels of five sperm miRNAs and the 56-day non-return rate of 15 bulls recorded after artificial insemination with unsorted (orange) and sex-sorted (blue) cryopreserved sperm, respectively. Transparent points represent the mean of the standardized coefficient b, the horizontal line the 95% confidence intervals and the thicker part of the line the 90% confidence intervals of the estimated coefficients positive predictive value for NRR ss ; however, miR-7859 transcripts were on average more abundant in LF than in HF bulls. At this point, one should note that miR-7859 and miR-106b-5p, another predictor of NRR ss values in our robust regression model, were listed among the five least abundant miRNAs in the examined samples. It is most likely that the low expression levels of miR-7859 in sperm samples of both groups along with the relatively small sample size of our experiment are responsible for these inconsistent observations.
Regarding miR-92a, its expression levels were distinctly lower in conventional sperm produced by bulls with suboptimal performance after sex-sorting. However, the positive predictive value of miR-92a for NRR ss was proven marginally not significant. Even when the weakly expressed miR-7859 and miR-106b-5p were excluded from robust regression analysis, the effect of miR-92a on NRR ss was still not significant (data not shown). MiR-92a and miR-106b-5p are members of the multifunctional paralog gene clusters miR-17-92 and miR-106b-25, respectively, and are considered essential for the Fig. 6 The top 20 gene ontology enriched clusters (biological processes) for the predicted targets of the two differentially expressed miRNAs (miR-9-5p, miR-10a-5p; top panel), and four out of five robust regression predictors (miR-34c, miR-342, miR-106b-5p, miR-92a; bottom panel), colored for P value. log10(P), P value in log base 10 undisrupted progress of spermatogenesis [76][77][78]. MiR-17-92-and miR-106b-25-deficient mice display extended loss of spermatogonia and spermatogonial stem cells, and, consequently, testicular atrophy and reduced sperm production [76,77]. The small number of sperm that reach the epididymis of miR-17-92-mutant animals exhibit normal morphology and motility characteristics [77], which implies that miR-92a might have an effect on male fertility but in a manner not directly linked to conventional sperm quality characteristics. Interestingly, Tong et al. (2012) showed that the expression of miR-106b-25 cluster increases dramatically in the germ cells of miR-17-92 knockout male mice, while sperm quality of the latter remains unchanged [76]. This observation urged the authors to suggest that the members of the two paralog clusters miR-17-92 and miR-106b-25 function in a redundant way in sperm cells. Apparently, impairment of the function of either of the two miRNA clusters can be compensated by the other, so that no phenotypic alterations are observed in the produced sperm [76]. In the light of this information, one could attribute the inconsistent effect of miR-92a and miR-106b-5p on NRR ss to the well documented functional interrelation of the two miRNAs and potential collinearity issues in the robust regression model.
When we compared the miRNA profiles between sperm samples of the two artificially created groups LF and HF, two out of the 85 sperm miRNAs were found to be differentially expressed. MiR-9-5p and miR-10a-5p were downand upregulated in the HF and LF sperm, respectively. Transcripts of miR-10a have previously been detected in mature bovine sperm [52,60], while miR-9-5p has been frequently reported in porcine sperm [34,46,79]. It has been shown that both miRNAs are upregulated in porcine sperm after cryopreservation [34,46]; however, there is no information regarding the effect of sex-sorting on miR-10a-5p and miR-9-5p profiles. In the bull, miR-10a-5p is upregulated in sperm with low motility [52]. Furthermore, overexpression of miR-10a-5p in late spermatogenesis can adversely affect the DNA repair capacity of murine spermatocytes leading to severe testicular atrophy and infertility in adulthood [80]. Based on the literature described above, the upregulation of miR-10a-5p is linked to impaired sperm quality, which makes it difficult to explain the overexpression of miR-10a-5p in sperm samples obtained from HF bulls in our study.
The production of cryopreserved sex-sorted sperm comes with inherent stress, like the elevated sorting pressure and the subsequent freezing steps, which substantially impairs Count, number of genes in the user-provided lists with membership in the given ontology term; %, percentage of all user-provided genes in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation); Log10(P), P value in log base 10, Log10(q), multi-test adjusted P value in log base 10 sperm quality. Changes of sperm morphometry [81] and function [82] are well documented consequences of mechanical stress during sorting. Although there is some indication for the importance of miRNAs for sperm freezability [34], their role in the resistance of sperm against mechanical stimuli has not been studied yet. Several miRNAs mediate metabolic changes induced by mechanical stress in other types of cells [83][84][85], for example miR-92a, miR-34c and miR-21-5p that were correlated with NRR ss values in our study. Nevertheless, the contribution of miRNAs, if any, to the response of bovine sperm to sorting stress is still to be clarified.
In the present study, we intentionally selected bulls with uniform NRR conv but with diverse fertility after AI with sex-sorted sperm, in order to avoid the masked effect of a sire's general fertility status on his performance after sex-sorting. The examined semen doses had successfully passed the post-thaw quality control before being used for commercial AI in the field, and laboratory assessed traits of conventional sperm showed relatively low between-bull variability. Given the limited variance of NRR conv and sperm quality characteristics, it is not surprising that only few miRNAs were related to the above-mentioned parameters. In contrast to differential expression analysis, robust regression enabled the analysis of sperm miRNAome in relation to continuous fertility and sperm variables without losing within-group phenotypic variation; however, limitations in the interpretation of the results due to the small sample size of our study must be considered.

Conclusions
In conclusion, we were able to detect 85 miRNAs in conventional cryopreserved semen doses collected from proven sires that were in parallel used for the production of sex-sorted sperm. Our analysis revealed that NRR ss values were related to the expression levels of five out of the 85 identified miRNAs (miR-34c, miR-342, miR-7859, miR-106b-5p and miR-92a) but not to the post-thaw quality characteristics of conventional sperm. This finding raises questions regarding the mechanisms by which these miRNAs could affect the performance of sperm after sexsorting. Whether they are responsible for subtle alterations of spermatozoal phenotypes that are only manifested after sex-sorting or they specifically interfere with the performance of X-bearing sperm, remains to be clarified. Studies with higher numbers of samples and split-ejaculate experiments focusing on miRNA profiles of sex-sorted compared to conventional sperm are apparently necessary to validate the suitability of miRNAs as markers for the fertilizing potential of sex-sorted sperm.

Biological material Bull and sperm sample selection
For this study, 18 bulls (Bos taurus taurus) were selected from a pool of 50 sires housed in a single AI station under uniform management and feeding conditions. All bulls were in parallel used for the production of conventional and sex-sorted (X-bearing sperm at 90% purity) cryopreserved sperm doses. Fertility of sires after AI with conventional and sex-sorted semen was systematically recorded in form of an annual % 56-day NRR after a minimum of 500 first services (NRR conv and NRR ss , respectively). For each bull the relative change of the NRR (Δ NRR ) after AI with conventional and sex-sorted semen was computed according to the following formula: Eighteen bulls with Δ NRR values lying at the extremes of the Δ NRR distribution of the 50 sires, i.e. outside the range of mean Δ NRR ± SD, were selected and assigned in two equal groups: nine bulls (five Holstein-Friesian, one Red Holstein, two Swiss Fleckvieh and one Simmental bull) with low fertility (LF; n LF = 9), and nine bulls (three Holstein-Friesian, two Red Holstein, three Brown Swiss and one Limousin bull) with high fertility (HF; n HF = 9) after sex-sorting. The mean values±SD of fertility data (number of first services with conventional and sexsorted sperm, NRR conv , NRR ss , Δ NRR ) in relation to fertility group are presented in Table 4. Bulls of both groups showed similar NRR conv values (69.57 ± 3.33 and 68.61% ± 2.09% for LF and HF bulls, respectively); however, the NRR values of LF bulls were reduced by 22.33% after AI with sex-sorted sperm against a reduction of only 8.63% observed for the HF bulls (Table 4).

Semen collection and processing
The cryopreserved sperm samples examined in this study originated from the regular semen collection schedule of the AI center. Semen was collected by using a pre-warmed (38°C) artificial vagina after the bulls mounted on a dummy bull or cow. Ejaculates were evaluated immediately after ejaculation in terms of ejaculate volume, sperm concentration, progressive motility and morphology using a phase contrast microscopy with 100× magnification. Only ejaculates fulfilling the criteria of volume ≥ 1 mL, sperm concentration ≥ 300 × 10 6 sperm/mL and progressive motility ≥70% were further processed and cryopreserved.
In total, 72 unsorted and cryopreserved ejaculates (four conventional semen batches per bull) were used as input for laboratory semen examination and analysis of their small non-coding RNA profile.

Laboratory sperm analysis Preparation of semen prior to analysis
Four straws from each batch were thawed in a water bath (38°C, 30 s) and pooled in a pre-warmed (38°C) 1.5-mL laboratory tube. Pooled samples were further assessed with computer-assisted sperm analysis and flow cytometry immediately after thawing (0 h) at 38°C. Materials and buffers or staining solutions contacting with sperm were pre-warmed at 38°C.

Computer-assisted sperm analysis (CASA)
Sperm concentration, motility and kinematics were assessed using an IVOS II CASA system driven by the software version 1.10.1 (Hamilton Thorne Inc., Beverly, U.S.A.). A pre-warmed (38°C) 20 μm-deep 4-chamber Leja slide (IMV Technologies; L'Aigle, France) was filled with 6 μL of semen, and a minimum of 1000 cells were analyzed in at least eight randomly selected fields with 30 frames acquired per field at a frame rate of 60 Hz. Sperm with straightness ≥70% and average path velocity ≥ 50 μm/s were considered progressively motile. In each sample the percentage of progressively motile sperm (progressive motility) was recorded. Sperm chromatin structure assay™ (SCSA™) 1 The SCSA was performed to evaluate the susceptibility of sperm to acid-induced DNA fragmentation at 0 h, using a COULTER EPICS XL flow cytometer driven by EXPO32 ADC XL 4 COLOR software (Beckman Coulter Inc., Krefeld, Germany). In short, 400 μL of acid detergent solution were added to 200 μL of semen previously diluted with TNE buffer to a final concentration of 1 to 2 × 10 6 sperm/mL. Following the mixing of the sample for 30 s, 1.2 mL of AO staining solution (6.0 μg AO/mL AO staining buffer) were added and stained samples were assessed by flow cytometry after exactly 3 min. Cells were excited by a 488-nm argon laser and the emitted green and red fluorescence was captured by means of a 525/20 and a 620/15 band-pass (BP) filter, respectively. In total, 10,000 cells were analyzed for each sample at a flow rate of 200 cells/sec. Flow cytometric  (5) was employed for the simultaneous evaluation of intracellular esterase activity, plasma membrane integrity, acrosomal status, intracellular Ca 2+ levels, and mitochondrial membrane potential of sperm, respectively. The laser and band-pass (BP) filters used for the excitation and detection of emission signal of each fluorophore, respectively, are presented in Additional file 7.

Flow cytometric analysis of sperm
For the examination of each sperm sample with the multicolor assay, sperm was diluted to a concentration of 1.2 × 10 6 sperm/mL with Tyrode's solution at a final volume of 244.75 μL in a 250-μL reaction well of a 96-well plate. Just prior to the performance of the assay, the fluorescent probes were combined in a master mix solution consisting of 0.375 μL calcein violet AM, 1.5 μL PI, 0.5 μL PE-PNA, 2.5 μL Fluo-4 AM, and 0.375 μL DiIC 1 (5) per reaction well. Thus, 5.25 μL of master mix were added to each reaction well. After 15 min of incubation at 38°C, sperm were analyzed by flow cytometry.
The % size of the sperm sub-population simultaneously exhibiting the following features was quantified: high esterase activity, intact plasma membrane, PE-PNA-unstained acrosome, low [Ca 2+ ] i and high mitochondrial membrane potential. Values of PMAI were also determined.
Analysis of sperm small non-coding RNA profile Sperm RNA extraction Sperm homogenization Total RNA was extracted from cryopreserved bovine sperm using a modified heated TRIzol® Reagent-based protocol (TRIzol® Reagent, Invitrogen, ThermoFisher Scientific, Waltham MA, U.S.A.). Using the CASA system, the sperm concentration of single 0.25-ml straws was assessed post-thaw; straws of the LF and HF group had a mean concentration of 60.4 × 10 6 and 49.9 × 10 6 sperm/mL, respectively; thus, 50-60 × 10 6 cells were used as input for total RNA extraction. Sperm samples of each bull (four cryopreserved ejaculates per bull, one straw per ejaculate) were thawed on ice and pooled immediately after thawing; the sperm pellet was separated from semen extender after centrifugation at 5000×g for 5 min (4°C). Homogenization of harvested sperm was performed according to the procedure previously described by Rauber (2008) [89] in order to control potential somatic cell contamination. In short, the sperm pellet was re-suspended with 1 mL of ice-cold hypotonic somatic cell lysis buffer (10 mM Tris HCl, 50 mM KCl, 2.5 MgCl 2 , 4 mM DTT, 0.05% w/v SDS, 0.5 v/v Triton-X 100; pH 7.4), incubated on ice for 10 min and thereafter centrifuged at 5000×g for 5 min (4°C). After discarding the supernatant, the harvested pellet was washed with 1 mL of ice-cold 1× PBS (Invitrogen, ThermoFisher Scientific, Vilnius, LT) and centrifuged at 5000 for 5 min (4°C). Then, the re-harvested pellet was suspended with 1.5 mL of icecold TRIzol® Reagent (1.5 mL TRIzol®/1.2 × 10 6 sperm). In order to disrupt the plasma membrane of sperm, the TRI-zol®-sperm suspension was passed four times through a 25G needle and vigorously mixed for 30 s.
Total RNA extraction Samples were centrifuged at 12, 000×g for 10 min (4°C). The harvested pellet consisted of insoluble material, such as membranes, polysaccharides and high molecular weight DNA [89] and was, therefore, discarded; RNA was contained in the supernatant, which was transferred in a new tube, and heated at 65°C for 10 min while being mixed at 600 rpm. Right after, 200 μL chloroform/mL TRIzol® were added and samples were vigorously mixed for 30 s. Following a 3min incubation at room temperature, samples were centrifuged (12,000×g, 4°C, 15 min) and 500 μL of the aqueous phase containing the spermatozoal total RNA were separated and transferred to a new tube that included 500 μL of 100% Isopropanol/mL TRIzol®. Total RNA was allowed to precipitate for 45 min at room temperature after the addition of 10 μg RNase-free glycogen/mL TRIzol®. The RNA pellet was harvested after centrifugation at maximum relative centrifugal force (rcf) for 30 min (4°C), twice washed with 1 mL 75% Ethanol/mL TRIzol® (centrifugation between the washing steps at maximum rcf for 5 min, 4°C) and resuspended with 10 μL pyrogen free DEPC-treated water (Invitrogen, Carlsbad, CA). Then, the pellet was dissolved after 10 min incubation on ice.
DNase-treatment Collected spermatozoal RNA was treated with RNase-free recombinant DNase I (Invitrogen, ThermoFisher Scientific, Waltham MA, U.S.A.) according to the protocol provided by the manufacturer. Shortly, 1 μl 10 DNase I reaction buffer and 1 μl DNase I (1 U/μl) were added per 10 μl of DEPC-treated water used for the re-suspension of the RNA pellet, and RNA samples were incubated for 15 min at room temperature. DNase was inactivated by addition of 1 μl of 25 mM EDTA and heating at 65°C for 10 min, and was then removed through a second precipitation step. Following the addition of 10 μg Glycogen/mL TRIzol®, the tube was filled with RNase−/DNase-free water to 100 μL. Thereafter, 10 μL 3 M NaOAc (Sigma-Aldrich, Buchs, Switzerland) and 100 μL 100% isopropanol (Sigma-Aldrich, Buchs, Switzerland) were added at 1/10 volume. After incubation of samples at room temperature for 45 min, the washing steps with 75% ethanol (Sigma-Aldrich, Buchs, Switzerland) were twice performed as described above. Finally, the RNA pellet was suspended in 10 μL of RNase−/DNase-free water (Gibco, Life Technologies), incubated for 10 min on ice and right after for 10 min at 55°C before being used for downstream analysis.
Total RNA quality control Each sample was subjected to quality control regarding the quantity, purity and integrity of harvested RNA, using spectrometric evaluation (NanoDrop® 3300 Fluorospectrometer, ThermoFisher Scientific Inc., V 2.8 software, U.S.A.), and an electrophoretic assay (Agilent RNA 6000 Pico Kit and Agilent BioAnalyzer 2100 system, Agilent Technologies, Santa Clara CA, U.S.A.). RNA samples of both fertility groups showed comparable total RNA concentration, i.e. 1.6 ± 1.1 ng/μL and 1.6 ± 1.2 ng/μL for the LF and HF group, respectively. The BioAnalyzer electropherograms of the examined samples showed no 18S or 28S peaks that would be indicative of somatic cell contamination; a representative electropherogram is presented in Additional file 8, Figure C. The isolated total RNA samples were stored at − 80°C until used for library preparation.

Small RNA library preparation
Using a total RNA input of 1 ng, small RNA libraries were prepared with the NEXTflex® Small RNA Sequencing Kit v3 for Illumina® Platforms (Bioo Scientific, Austin TX, U.S.A) according to the manufacturer's instructions. A sufficient amount of~150 bp product was detected after 22 cycles of polymerase chain reaction amplification, but also a considerable amount of~130 bp adapter-only product was observed in the analysis of the small RNA libraries with an Agilent High Sensitivity DNA Kit (Agilent Technologies, Santa Clara CA, U.S.A.). To reduce the amount of adapter-only products, a polyacrylamide gel size selection was performed. A hand-cast 10% polyacrylamide gel was prepared with the following reagents: 0.8 mL of 50× Tris-acetate-EDTA (TAE), 29.2 mL of H 2 O, 0.25 g ammonium persulfate (APS) 25% dissolved in 1 mL H 2 O, 32 mL of N,N,N,N′-tetramethylenediamine (TEME D) which acts as a catalyst and 10 mL of 40% acrylamide. In a next step, all the reagents were combined in a beaker for the gel solution, except for the TEMED and APS that were degassed for 15 min. Just prior to pouring, TEMED and APS were added to the solution to polymerize in a prepared cast tray of two glass plates and two spacers. The solution was poured 1 cm bellow the teeth of the comb and it was left to cast for 1 h. After 1 h, when it is solid, the comb was removed and positioned to stand in the gel stand and 1× TAE buffer was poured. Gel wells were washed with a pipette. Then, 5 μL of 10 bp (2 μg/lane) of ready-to-load Low range DNA ladder (Gibco BRL, Life Technologies), 18 μL of sample and 2 μL of 6× loading dye (ThermoFisher Scientific Inc., Waltham, U.S.A.) were loaded. Samples were loaded slowly and allowed to settle evenly on the bottom of the well. After loading the samples, the gel was started at 100 V and left to run overnight. Then, the voltage was increased up to 300 V for 1 h until the dye bands reached the bottom of the gel. After 1 h, the gel was removed carefully from the glass plate and stained with SYBR Gold. Forty milliliters of 1× TAE and 4 μL SYBR Gold were prepared. The gel was stained on a shaker for 1 h. The area between 147 and 165 bp was cut out on a UV transilluminator (ChemiDoc™ MP Imaging System, Bio-Rad) using a clean scalpel. The gel slices were placed into a clean 1.5 mL tube and crushed thoroughly with a disposable pestle.
Purification from the gel slice 300 μL of elution buffer were added to the crushed gel slices. To obtain as much gel as possible, the pestle was also washed in the tube with the buffer. Washed gel slices were incubated around 3.5 h at room temperature on a shaker at 400 rpm. The eluate including crushed gel was transferred carefully to spin columns (Millipore Centrifugal Filter units). After centrifugation at 16,000 rcf for 2 min, spin filters were disposed. Then, 50 μL of NEXTflex Cleanup Beads and 350 μL of isopropanol were added and incubated at room temperature for 10 min, vortexed and spinned down for 10 s. The supernatant was separated in clean tubes and magnetized for 2 min or until the solution appeared clear. After discarding the supernatant, 950 μL of freshly prepared 80% ethanol was added and incubated for 30 s, then the supernatant was removed. This washing step was repeated twice and thereafter samples were dried for 3 min.
Then, samples were removed from the magnetic stand and bead pellets were re-suspended in 13 μL of resuspension buffer. Finally, the samples were incubated for 2 min and magnetized for 3 min. Thereafter, 12 μL of supernatant, which was the sequencing library, were transferred to clean 1.5 mL tubes.

Sequencing and bioinformatics analysis
Small RNA libraries were sequenced at the Functional Genomics Center Zurich (Zurich, Switzerland) as one pool of 18 barcode-tagged samples on one lane of an Illumina HiSeq 2500 instrument (75 bp single-end reads). Analysis of the obtained fastq files was performed on a local Galaxy server installation [90] as previously described [91,92]. Briefly, the tool 'clip adapter sequences' was used to remove adapter sequences. Nonclipped sequences were discarded. Removal of PCR duplicates was based on the four random nucleotides on each side of the cDNA inserts that were introduced during ligation of the 5′ and 3′ RNA adapters during library preparation. All identical cDNA sequences containing the same four nucleotides at the ends were removed by using the tool 'collapse'. Afterwards, the four random bases were cut at each side of the sequences and unique sequences and their read counts were obtained by running the tool 'collapse' again. Then, a count table was generated by joining the lists of all samples in one table and read counts were scaled by total number of reads. The sequences with neglectable read counts, mainly derived from sequencing errors were removed using the cpm per sample filtering tool. Then, the cutoff was set to 42.84 cpm corresponding to an average of 20 reads per library for at least 5 out of 18 libraries. After that, the annotation of small RNA sequences was performed using Basic Local Alignment Search Tool (BLAST). The BLAST databases involved bovine and human sequences from miRBase (release 22.1), Rfam 14.1, transcript sequences from Ensembl and National Center for Biotechnology Information (NCBI), including non-coding RNAs, and predicted piRNA sequences.
The analysis of DE miRNAs was performed using the Bioconductor package EdgeR [93]. An adjusted P value (FDR) of 10% was used as a threshold to determine DE miRNAs. Subsequently, the MicroRNA ENrichment TURned NETwork (MIENTURNET; http://userver.bio. uniroma1.it/apps/mienturnet/) was used to identify target genes of two lists of miRNAs, the two DE miRNAs and the five miRNAs used for robust regression, with FDR < 5%,. The miRNA-target interactions were provided from the miRTarBase database via the MIENTURNET tool [94]. In addition, following the identification of target genes, the biological DataBase network bioDBnet (http:// biodbnet.abcc.ncifcrf.gov) allowed us to match gene symbols with Entrez Gene IDs (bovine and putative homolog orthologs) [95]. Gene ontology enrichment analysis (biological process, molecular function, cellular localization) and pathway analysis were conducted using the Metascape web platform (http://metascape.org/gp/index.html#/main/ step1) [96]. For each given gene list, pathway and process enrichment analysis was carried out with the following ontology sources: KEGG Functional Sets, GO Biological Processes, KEGG Pathway, GO Molecular Functions, GO Cellular Components, KEGG Structural Complexes, Reactome Gene Sets, Canonical Pathways, BioCarta Gene Sets, CORUM, TRRUST and Transcription Factor Targets. All genes in the genome have been used as the enrichment background. Terms with a P value < 0.01, a minimum count of 3, and an enrichment factor (i.e. the ratio between the observed counts and the counts expected by chance) > 1.5 were collected and grouped into clusters based on their membership similarities. More specifically, P values were calculated based on the accumulative hypergeometric distribution [97], and q-values were calculated using the Benjamini-Hochberg procedure to account for multiple testing [98]. Kappa scores [99] were used as the similarity metric when performing hierachical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 were considered a cluster. The most statistically significant term within a cluster was chosen to represent the cluster. The functional network of the representative enriched GO terms was visualized using the Cytoscape software (v3.1.2) [100] with "force-directed" layout and edge bundled for clarity.

Statistical analysis
Three out of the 18 bulls (two LF and one HF bulls) were not included in the statistical analysis because of their outlier miRNA expression levels. The statistical analysis was performed using the R Language for Statistical Computing version 3.6.1 [101].

Descriptive statistics
The set of continuous variables that was used as input for statistical analysis included: a) sperm quality traits (progressive motility, %DFI, PMAI sperm, C pos PI neg PNA neg Fneg M pos sperm), b) fertility measures (NRR conv , NRR ss , Δ NRR ), and c) miRNA expression levels (cpm). The mean value, standard deviation, min and max values were reported as descriptive measures of sperm quality and fertility traits. The association between miRNA expression levels and sperm quality traits or fertility variables was tested using the Spearman's rank correlation coefficient r s ; the respective P values were corrected for multiple testing using the Holm-Bonferroni method. In this case, r s values were computed at bull level, i.e. between the cpm of individual miRNAs and the mean value of sperm quality traits across ejaculates of the same bull. The Spearman's correlation test was carried out on the overall dataset, including data of both experimental groups.

Principal component analysis
The expression levels of the 85 identified miRNAs were assumed to be linearly inter-correlated; thus, PCA was performed to summarize and visualize the variance of the miRNA expression dataset. The standardization of miRNA expression levels, extraction of principal components and graphical demonstration of the PCA output was performed using the toolset of the FactoMineR [102] and factoextra [103] statistical packages for R.

Robust regression
Forward model selection was employed to identify the subset of miRNAs whose expression made the most valuable contribution to explaining the variance of NRR ss . The selection process was performed twice using two different datasets: a dataset including all 85 miRNAs detected in the examined sperm samples, and a dataset including only the 72 miRNAs with the highest contribution to the first 5 PCs (Additional file 3, Table S7). In both cases, the same subset of miR-NAs was recovered. Thus, a series of linear models including a maximum of five predictors were built using the regsubsets function in the leaps package for R [104]. The linear model with the best overall fit was selected based upon the values of the Bayesian Information Criterion. After determining the five most important miRNAs, the relationship between NRR ss and the latter was assessed with robust regression. The robust regression approach for RNA-seq data has been suggested by Seo et al. (2016) as a way to overcome the issue of influential observations in experiments of small sample size [105]. Shortly, the rlm function of the MASS package for R was applied for fitting the regression line with an MM-estimator [106], with the expression levels of the selected miR-NAs and NRR ss functioning as predicting and response variables of the model M ss , respectively. The use of MM-estimation, a combination of S-and Mestimators, is robust against and allows the detection of outliers in the dataset. Because of the small sample size in our study, model parameters were determined using the Wald test. Similar to model M ss , a second robust regression line (M conv ) was fit with the expression levels of the five selected miRNAs functioning as predictors of NRR conv . Values of the outcome and predicting variables in M ss and M conv were meancentered and scaled by one SD, in order to compare the standardized b coefficients between the two models.
the Functional Genomics Center Zurich for the Illumina sequencing of the small RNA libraries. The study has been conducted in the facilities of the joint research and educational institute AgroVet-Strickhof (Eschikon 27, CH-8315 Lindau, Switzerland); the authors are grateful to the AgroVet-Strickhof cooperation for providing the necessary infrastructure and resources.
Authors' contributions EK performed the lab work (total RNA extraction and quality control, preparation of small RNA libraries), carried out part of the bioinformatic analysis, reviewed the existing literature and prepared the manuscript. EM supervised the experiment, designed the study, performed the statistical analysis and made major contribution to literature search and manuscript writing. SiB carried out part of the lab work (total RNA extraction and quality control, preparation of small RNA libraries). MS performed the computerassisted sperm analysis and flow cytometric sperm examinations. SW and UW contributed to the conception of the work, the collection of the samples and fertility data. StB supervised the experiment, analyzed the RNA sequencing data and critically reviewed the manuscript. HB supervised the experiment, designed the study and critically reviewed the manuscript. All authors read and approved the final manuscript.
Authors' information EM was holding a researcher position in the Veterinary Research Institute, Hellenic Agricultural Organization Demeter, Thermi 57001, Thessaloniki, Greece at the time of manuscript preparation.

Funding
This work was partially supported from the Turkish Government Scholarship that was granted from the Ministry of Agriculture and Forestry General Directorate of Agricultural Research and Policies, Republic of Turkey (Law-Nr. 1416) to EK for sample and data analysis as well as for manuscript preparation during her doctoral studies. The funding body played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials
The bulls included in this study as well as the biological material and data (small RNA sequencing data, artificial insemination fertility data) are a property of Swissgenetics (Swissgenetics, Meielenfeldweg 12, CH-3052 Zollikofen BE, Switzerland; https://swissgenetics.com/). The small RNA sequencing data that support the findings of this study were used under license for this current study only. Data are, however, available from the authors upon reasonable request and with permission of Swissgenetics; restrictions apply to the availability of these data.

Ethics approval and consent to participate
The cryopreserved bovine sperm doses used in this study were commercially produced and made available by Swissgenetics (Swissgenetics, Meielenfeldweg 12, CH-3052 Zollikofen BE, Switzerland; https://swissgenetics. com/). The sperm donors were owned by Swissgenetics, housed in the Swissgenetics Semen Production Center in CH-5243 Mülligen AG, Switzerland and routinely used for commercial bovine semen production. No additional animal handling or treatment were required for the acquirement of the sperm samples. Thus, according to the European, Swiss federal and cantonal legislation, a study approval of animal use for research purposes was not required. In addition, the present study had no effect on the well-being and fate of sperm donors.

Consent for publication
Not applicable.
Competing interests UW and SW were employees of Swissgenetics (Swissgenetics, Meielenfeldweg 12, CH-3052 Zollikofen BE, Switzerland; https://swissgenetics. com/) at the time of the conception of the study, generation and analysis of the data. The biological material and data needed for the realization of the study are a property of and were made available from Swissgenetics. StB was an associate editor of the BMC Genomics journal at the time the study was realized.