Genome-wide association and expression quantitative trait loci in cattle reveals common genes regulating mammalian fertility

Most genetic variants associated with fertility in mammals fall in non-coding regions of the genome and it is unclear how these variants affect fertility. Here we use genome-wide association summary statistics for Heifer puberty (pubertal or not at 600 days) from 27,707 Bos indicus, Bos taurus and crossbred cattle; multi-trait GWAS signals from 2119 indicine cattle for four fertility traits, including days to calving, age at first calving, pregnancy status, and foetus age in weeks (assessed by rectal palpation of the foetus); and expression quantitative trait locus for whole blood from 489 indicine cattle, to identify 87 putatively functional genes affecting cattle fertility. Our analysis reveals a significant overlap between the set of cattle and previously reported human fertility-related genes, implying the existence of a shared pool of genes that regulate fertility in mammals. These findings are crucial for developing approaches to improve fertility in cattle and potentially other mammals.


Introduction
Line 52-53 Some more explanation is required here to indicate that SNP chips, with specific predefined content, are less likely to include variants with low heritability associated with fertility.
Could the authors mention the breed or sub-species in the introduction and describe the study population further? Were all the datasets from the same population, including the 28K heifers the results were validated in? This information is provided in the results, but this paragraph in the introduction requires some additional detail to make the rationale for the study clearer. I think it would even be useful to define the discovery and validation populations here.
In the introduction it is also important to mention the ChIP and ATAC data and the analysis of the eQTLs in relation to linking fertility traits to functional genomic regions.

Results and Discussion
I realise it is a prerequisite of the journal, but having the methods after the results and discussion makes the narrative for this paper quite difficult to follow, as there are several different datasets and populations being analysed. This is why I think describing the study populations at the end of the introduction might make following the narrative easier.
Line 112-113 Please add details of the species when describing expression levels of genes here and throughout the manuscript. The same is true for the GWAS results, where the species is often not reported, which makes the manuscript really hard to follow.
Line 128 to 130 When the results are from this study, make sure this is stated so the narrative is easier to follow. Unless the text refers explicitly to the gene expression results from the cohort of cattle with whole-blood expression profiles analysed in this study, it is really difficult to follow which study is being referred to.
Line 134 The expression of PLAG1 is only measured in whole blood in this study though; could it not be higher in other tissues that were not measured, such as liver, which is associated with growth, and in fetal tissues, as shown in Fig 4 and Fang et al. 2020?
Line 156-157 Could the authors determine if this is the case by performing the analysis only with Bos indicus individuals in the validation population?
Line 165-166 Include the breed of the heifers here, or whether they were Bos indicus, so the narrative is easier to follow.
Line 165-166 More detail in Fig 1 on the populations would really help here to show which dataset and breeds etc. were used for which component of the analysis. It would also be useful to make clear here that the cattle in the discovery population were Bos indicus.
Line 207-207 More description is required here. What is the functional relevance of the lead trans eQTLs being located in ATAC-Seq or ChIP peaks? What is the significance of the datasets used, e.g. for the purposes of annotation of the bovine genome as a resource to link complex traits to functional genomic regions?
For figure 3 could any of the GWAS hits be annotated?

Methods
Including some of this information about the study population earlier in the manuscript would make the rationale for the study and narrative of the results and discussion much clearer.

Could more information in Fig 1 be included showing the different breeds in the datasets used for each part of the analysis?
Were the red blood cells lysed before the RNA extractions were performed? Would the expression profiles be predominantly associated with white blood cells?
Line 411 After filtering for expression 'of the 10,455 genes included in this analysis' were these genes particularly associated with any functional roles e.g. immunity?
The datasets haven't been deposited in the public repositories yet but the authors indicate where the accession will be added in the manuscript. This should ideally be done before the manuscript is resubmitted.

Specific line changes
Line 50 Change 'demonstrates' to 'demonstrating'
Line 82 Add 'in humans' after 'menarche'
Line 160-161 Change to 'The current results provide'
Line 183-184 Include whether this study is on cattle.
Line 184-185 And presumably also sex and age of animals?
Line 190-192 Provide details of the genome and annotation used to define these regions. Add citation or accession and some description of the ChIP data, which tissue etc.?
Line 206 Add 'a' before 'previous' and add whether the study is on cattle.
Line 218 Delete 'variants'?
Line 220 On the expression data from whole blood?
Line 230 Add reference for the previous study.
Line 237 Change 'the' to 'a'
Line 241 Add that the effect might be more obvious in other tissues, but it is difficult to sample other tissues at scale in the same way as blood.
Line 275 Remove the extra 'the'
Line 290 I can't see how fetal age in weeks is a heritable trait as it is explained here?
Line 392-393 How were the blood samples stored and preserved prior to RNA extraction?
Line 400 What was the sequencing depth per sample?
Line 452 Please include the ENA accessions for these datasets and also add the cattle gene expression atlas reference here for the data shown in Fig 4.

Reviewer #2 (Remarks to the Author):
Forutan et al. studied the genetic underpinning of female fertility, an economically important yet challenging trait in cattle. They use multiple proxy phenotypes in two cattle populations and integrate RNA seq data to identify 87 genes putatively associated with female fertility. They further show evidence for a shared pool of genes regulating fertility in cattle and human. The paper is thus of interest to the broad readership of the journal.
The paper is interesting and generally well written. However, I have the following concerns:
1. The authors motivate the integration of eQTL data (Line 217) by stating that significant GWAS loci which are also eQTL are more likely to be causal variants. Is there a reason why splicing QTL were not studied? Given there is limited overlap between GWAS and eQTL signals (Line 237), it might be worth looking into splicing QTL.
2. It is not clear in the result section that the trans-eQTL scan was limited to variants significant in GWA and cis-eQTL mapping analyses. A discussion on how many cis-eQTL showed trans effects is perhaps interesting.
3. The proportion of variance explained by the 255 associated variants is reported only for one trait (heifer puberty). A discussion on variance explained for the other four fertility traits could be interesting.
A few other minor points:
1. In methods (example #358), heifer puberty is referred to as CL600.
2. Line #80: "some overlap in"; perhaps the authors could provide numbers here.
3. Line #124: It is not clear what the authors mean by "full GWAS".

Reviewer #3 (Remarks to the Author):
The manuscript reports identification of genes affecting fertility in cattle. Fertility traits are economically important to beef cattle production. The authors conducted a comprehensive study and the results are very valuable to both the science community and the industry. In general, the experimental design and data analyses were well described and the results were well presented and discussed. The manuscript can be accepted for publication after addressing the following comments:
Line 154, "these loci explained 2.3% of the variation". Was it phenotypic or genetic variance? Please clarify in the manuscript.
Line 187, "Seventy-eight percent of the lead cis-eQTLs were close to the respective gene start site". How many cis-eQTL were within the respective gene, i.e. a DNA variant within a gene was found to be associated with the gene expression of the same gene?
Line 311, "Only 29 autosomes and variants with imputation accuracy of greater than 0.4 and MAF>0.01 (31,140,417) were used for GWAS study". An imputation accuracy of 0.4 seems very low (also in line 413). Please justify why an accuracy of greater than 0.4 was used. In addition, please briefly describe how the imputation accuracy of DNA variants was assessed.
Line 324, please indicate how the genotypes were coded and what variant effect (i.e. allele substitution or additive effect) was estimated. Based on the supplementary tables, it looks like the allele substitution effect (b) was estimated, but please include the SNP effect estimation information in the manuscript.
Line 327, please describe how the binary trait "Heifer pregnancy status" was analyzed.
Line 328, please describe how the contemporary groups were defined, and how many levels of contemporary groups there were.
Line 330, "for trait AFC, calving success defined as 0 and 1, based upon whether they calved before or after 900 days of age was considered as a fixed effect". Does this mean that if the cow calved before 900 days of age, its calving success was assigned "1", and otherwise "0"? Please justify why calving success was considered as a fixed effect for AFC.
Line 366, please describe phenotype values of the validation population in more detail. Were they pre-adjusted for non-genetic effects?
Line 380. Did the 489 Bos indicus heifers and cows used for expression analysis have the same or similar four fertility phenotypes? If yes, did the 489 cattle have similar mean and ranges of variation as the population used for the GWAS?
Table S1, it would be more informative to add mean, range of phenotypic values, SD of the four traits and their estimates of genomic heritability from the GWAS.
Table S2 and Table S4, it would be more informative to add SNP annotation (A1, A2) and indicate which SNP allele is the minor allele with the allele frequency, i.e. similar to Table S6 on SNP information (A1 A2).

Introduction
Line 52-53 Some more explanation is required here to indicate that SNP chips, with specific predefined content, are less likely to include variants with low heritability associated with fertility.
authors: More explanation has been added to this paragraph.
Could the authors mention the breed or sub-species in the introduction and describe the study population further? Were all the datasets from the same population, including the 28K heifers the results were validated in? This information is provided in the results, but this paragraph in the introduction requires some additional detail to make the rationale for the study clearer. I think it would even be useful to define the discovery and validation populations here.
authors: The discovery and validation populations have now been more clearly described in the introduction.
In the introduction it is also important to mention the ChIP and ATAC data and the analysis of the eQTLs in relation to linking fertility traits to functional genomic regions.
authors: Additional detail has been added in the introduction to describe this.

Results and Discussion
I realise it is a prerequisite of the journal, but having the methods after the results and discussion makes the narrative for this paper quite difficult to follow, as there are several different datasets and populations being analysed. This is why I think describing the study populations at the end of the introduction might make following the narrative easier.