Defining the optimal animal model for translational research using gene set enrichment analysis

Abstract The mouse is the main model organism used to study the functions of human genes because most biological processes in the mouse are highly conserved in humans. Recent reports that compared identical transcriptomic datasets of human inflammatory diseases with datasets from mouse models using traditional gene‐to‐gene comparison techniques resulted in contradictory conclusions regarding the relevance of animal models for translational research. To reduce susceptibility to biased interpretation, all genes of interest for the biological question under investigation should be considered. Thus, standardized approaches for systematic data analysis are needed. We analyzed the same datasets using gene set enrichment analysis focusing on pathways assigned to inflammatory processes in either humans or mice. The analyses revealed a moderate overlap between all human and mouse datasets, with average positive and negative predictive values of 48 and 57% significant correlations. Subgroups of the septic mouse models (i.e., Staphylococcus aureus injection) correlated very well with most human studies. These findings support the applicability of targeted strategies to identify the optimal animal model and protocol to improve the success of translational research.

Thank you for the submission of your manuscript to EMBO Molecular Medicine, and many apologies for the delay, which was due to the intervening holiday season and to the wait for the evaluation from one reviewer, who ultimately did not deliver.
We have now heard back from the three Reviewers whom we asked to evaluate your manuscript.
As you will see, the Reviewers are globally positive, but Reviewers 1 and 2 raise a number of important issues. Although I will not go into much detail, I would like to highlight the main points.
Reviewer 1 mentions the dilemma that on one hand different conditions to induce inflammation lead to different outcomes and on the other, particular triggers produce similar effects. While s/he is not asking you to solve the conundrum, you should address and discuss it. The reviewer also lists a number of instances where additional and more detailed information and analysis should be provided.
Reviewer 2 is also positive, although more reserved. S/he raises a general issue of novelty, which is of great concern for us, and suggests a number of approaches, both experimental and in the discussion, to resolve this aspect. Like Reviewer 1, Reviewer 2 also requests a number of clarifications.
In conclusion, while publication of the manuscript cannot be considered at this stage, given the potential interest of your findings and after internal discussion, we have decided to give you the opportunity to address the above concerns.
We are thus prepared to consider a revised submission, with the understanding that the Reviewers' concerns must be addressed with additional experimental data where appropriate and that acceptance of the manuscript will entail a second round of review. The overall aim is to significantly upgrade the relevance and usefulness of the manuscript.
Please note that it is EMBO Molecular Medicine policy to allow a single round of revision only and that, therefore, acceptance or rejection of the manuscript will depend on the completeness of your responses included in the next, final version of the manuscript.
EMBO Molecular Medicine now requires a complete author checklist (http://embomolmed.embopress.org/authorguide#editorial3) to be submitted with all revised manuscripts. Provision of the author checklist is mandatory at revision stage; The checklist is designed to enhance and standardize reporting of key information in research papers and to support reanalysis and repetition of experiments by the community. The list covers key information for figure panels and captions and focuses on statistics, the reporting of reagents, animal models and human subject-derived data, as well as guidance to optimise data accessibility. The checklist will be published with the Peer-Review process file in case of acceptance of your manuscript, in accordance with our Transparent Review Process.
As you know, EMBO Molecular Medicine has a "scooping protection" policy, whereby similar findings that are published by others during review or revision are not a criterion for rejection. However, I do ask you to get in touch with us after three months if you have not completed your revision, to update us on the status. Please also contact us as soon as possible if similar work is published elsewhere.
I look forward to seeing a revised form of your manuscript as soon as possible.

***** Reviewer's comments *****

Referee #1 (Remarks):
In this manuscript the authors re-analyzed the dataset of Seok et al (PNAS 2013) that has resulted in questioning in more general terms the relevance of mouse models to mimic disease conditions in human. This same dataset has been re-analyzed recently also by Takao and Miyakawa (2015) leading to a different conclusion. The diverse outcomes and conclusions drawn raise the question where the problem lies. Are the models poorly reproducing the phenotypes seen in man, have we not yet used the right tools to determine what the right models are, or do we not apply the right analytical tools to analyze them? In the current study this same dataset was analyzed using Gene set enrichment analysis (GSEA) that utilizes the expression data from all the transcripts, independent of their level of expression, belonging to a particular pathway. In using this strategy the authors found that a subset of the mouse models actually well mimicked the human condition associated with sepsis. The authors conclude from their study that it is important to select the "right" mouse model for a particular disease indication for translational research purposes.
The authors mention thousands of genes being used in their analysis as to not discard information that can be acquired from relatively small differences. They should indicate which gene sets they have actually used for their calculations. Are these the genes listed in Table 1? Is the selection of genes as belonging to a "particular pathway" robust enough? Most pathways are only superficially known whereas especially in case of comparing different species the unidentified components could well play a prominent role.
Some sentences need revision. E.g. In the paragraph "the Paper explained" they cite two groups but with the same sentence "genomic responses in mouse models poorly mimic human inflammatory diseases". I presume that is not what they meant. It also helps the review process when page numbers are indicated. CD14 and CD41 are both used for denoting CD14. There are additional sentences that do not read properly.
It would be helpful if the authors included an enrichment plot with normalized enrichment score and q values for false discovery rates of expression levels of genes acting in the Toll receptor pathway for both human disease and mouse models.
Similarly, it would be useful to explore the similarities and dissimilarities between samples: a sample to sample heat map or sample to sample distance matrix for upregulated and downregulated genes in the inflammatory pathways in both mouse and man.

Referee #2 (Remarks):
Weidner and collaborators tackle the question of the relevance of biological data in different mouse models to mimic and study human inflammatory diseases. This question has been addressed in two previous articles (Seok et al, 2013 and Takao & Miyakawa, 2015) using the same data resulting in contradictory results. In contrast to previous approaches, the authors follow a strategy that does not restrict the analysis to genes that are highly up or downregulated in the disease samples. Instead, they performed a gene set enrichment analysis that compares the transcriptional modulation of predefined gene sets of pathways that are assigned to inflammatory processes in either human or mouse. This approach has the advantage of analyzing a whole set of related genes rather than a selected subset of genes, and focuses on the analysis of biological pathways and not individual genes.
The authors claim that this approach is fundamentally different from assigning GO terms or pathways to genes after filtering for strongly regulated genes. I only partially agree with the authors, as a pathway where most of the gene members are strongly up or downregulated should be captured by both a gene-based approach and a GSEA method. Accordingly, one would expect an overlap in the significantly enriched pathways detected by both methods. In this context, I miss a detailed comparison of the significantly regulated pathways identified in this study and those found by Takao and Miyakawa, 2015 using a gene-based approach. In this comparison, the authors should pay particular attention to the gain in biological information that their approach provides compared with Takao and Miyakawa, 2015, in terms of the enriched pathways that both methods found.
In general it is unclear what novelty this analysis provides compared to the study of Takao and Miyakawa, 2015. I suggest that the authors highlight in the manuscript the distinct contribution of this analysis to the subject. It seems that the method is able to point to the mouse model that better mimics the human disease. However, cannot the gene-based approach proposed by Takao and Miyakawa, 2015 do the same? It would be interesting to compare their results to the correlations between mouse-human samples of Takao and Miyakawa, 2015 and analyze if the results are qualitatively different.
I find the paragraph describing the results of the GSEA a bit confusing. This paragraph does not reflect the large differences between human-human, mouse-mouse and human-mouse comparisons that the authors claim to observe in the manuscript (they say that human datasets correlate with each other very well and that various distinct mouse datasets only showed a slight correlation). Human datasets have average positive and negative predictive values of 61%. The comparison between mouse datasets reveals an average positive and negative predictive value of 44%. The authors add that "strikingly, the overlap between mouse and human revealed average positive and negative predicted value of 48% for all human and mouse datasets". The numbers of the human-human comparison (61%) are just slightly higher than those of human-mouse comparison (48%) and the latter is even higher than the mouse-mouse comparison (44%). These numbers show that similar differences in pathway regulation are found within and across species, and suggest that these differences are due to differences in samples rather than species differences. There is no evaluation of how significant the observed differences are. I would suggest doing this in order to substantiate their claims.

Referee #3 (Remarks):
Albeit I am not a statistician, I really enjoyed and followed the points made by Dr. Weidner and colleagues. In their paper, the authors compared transcriptomic datasets between human/human, mouse/mouse and mouse/human addressing the IMPORTANT question if data obtained in preclinical (mouse) models can be translated in the human setting. Their main finding is a moderate overlap between mouse/mouse and human/mouse data for most datasets. Interestingly, in some diseases a high correlation was observed.
Overall, this paper adds to setting the value of preclinical animal models, which need to be determined for each model individually.

Referee #1 (Remarks):
In this manuscript the authors re-analyzed the dataset of Seok et al (PNAS 2013) that has resulted in questioning in more general terms the relevance of mouse models to mimic disease conditions in human. This same dataset has been re-analyzed recently also by Takao and Miyakawa (2015) leading to a different conclusion. The diverse outcomes and conclusions drawn raise the question where the problem lies.
Are the models poorly reproducing the phenotypes seen in man, have we not yet used the right tools to determine what the right models are, or do we not apply the right analytical tools to analyze them?
In the current study this same dataset was analyzed using Gene set enrichment analysis (GSEA) that utilizes the expression data from all the transcripts, independent of their level of expression, belonging to a particular pathway.
In using this strategy the authors found that a subset of the mouse models actually well mimicked the human condition associated with sepsis. The authors conclude from their study that it is important to select the "right" mouse model for a particular disease indication for translational research purposes.

Critique:
Although it is satisfying that by using GSEA one can find quite good correlations between human disease and some of the mouse models the problem remains how then to identify the appropriate mouse model. According to the authors testing a series of mouse models on the basis of GSEA would be the approach. However it becomes problematic if inducers of disease conditions in the mouse would significantly divert from what causes the condition in man. In this case, one would like to understand on the one hand why quite different conditions to induce inflammation give such diverse outcome and on the other hand why particular triggers to induce inflammation show a high degree of similarity. This is especially relevant since the selection of models is in general based on introducing similar genetic defects (e.g. in KO or transgenic models) or using the same agents to induce the disease. It would be helpful if the authors more clearly point out this dilemma and discuss their observations in this context.

Authors reply:
We fully agree with the reviewer that establishing animal models to understand human disease is a complicated task and a challenge for most of translational science. Indeed, more and more reports are being published that discuss why new therapies or interventions shown to be effective in animal studies are often less effective or ineffective in clinical trials (Hooijmans & Ritskes-Hoitinga, 2013, see the full reference below). In this context, several reasons are discussed why animal models of disease can or cannot be reliably translated to humans (van der Worp, Howells et al., 2010). Although we cannot solve this dilemma, we added a discussion in the corresponding paragraph and also more clearly discussed the benefits and limitations of our method in this context (pg. 9).
We would like to emphasize that our GSEA approach will not overcome some basic difficulties in developing the appropriate animal model before any (transcript)omics data of this model have been generated. Mostly, similar agents are used in both humans and mice to induce diseases with similar phenotypes. The same holds true for the introduction of genetic defects that target species homologues. This does not reflect the possible variety of outcomes resulting from the diversity of biological processes across species barriers (like mechanisms of adaptation or different pharmacokinetics upon gene deletion or treatment with similar triggers/inducers). Importantly, GSEA is not a tool to predict how to design new model systems but is an effective tool to decide how to interpret existing data in a standardized way, which may add value to the careful selection of the right animal model, thus avoiding unnecessary and misleading translational studies.

Critique:
The authors mention thousands of genes being used in their analysis as to not discard information that can be acquired from relatively small differences. They should indicate which gene sets they have actually used for their calculations. Are these the genes listed in Table 1?

Authors reply:
We attached the gene sets used for the GSEA analyses as Expanded View Dataset EV1 named 'Gene_sets_Inflammation_BIOCARTA_KEGG_REACTOME.gmt'. Table 1 only lists genes that are involved in the Toll receptor cascade (Reactome) pathway and that were upregulated in at least 9 of 11 datasets. Table 1 shows one result from the GSEA and comprises one of the most commonly regulated inflammatory pathways.
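
For readers who want to inspect such gene sets, the following minimal sketch shows how a file in the GMT layout might be read. It assumes the usual tab-separated GMT convention (set name, description, then gene symbols on one line per set). The function name `parse_gmt` and the two example sets are hypothetical and are not taken from Dataset EV1.

```python
import io


def parse_gmt(handle):
    """Parse GMT-formatted lines into {gene_set_name: set_of_gene_symbols}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Hypothetical two-line example (illustrative only, not the authors' data).
example = io.StringIO(
    "EXAMPLE_TOLL_PATHWAY\tsource_A\tTLR2\tTLR4\tCD14\tMYD88\n"
    "EXAMPLE_IL1_PATHWAY\tsource_B\tIL1R2\tIL1RN\tMYD88\n"
)
gene_sets = parse_gmt(example)
for name, genes in sorted(gene_sets.items()):
    print(name, len(genes))
```

For a real file, the same function could be called on an opened text file handle instead of the in-memory example.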

Critique:
Is the selection of genes as belonging to a "particular pathway" robust enough? Most pathways are only superficially known whereas especially in case of comparing different species the unidentified components could well play a prominent role.

Authors reply:
We agree with the reviewer that for many signaling pathways, especially in the field of immunology, probably not all important genes and their functions are identified so far. Thus, our biological understanding of how those pathways are exactly regulated is subject to constant improvements. However, assigning single genes to groups of similar biological functions, signaling processes and pathways is a valuable strategy to objectively include biological knowledge into the data analyses. GSEA makes use of a gene set (=pathway) permutation in order to determine the statistical significance for the measured pathway. We used 1000 permutations to create random pathways that are used as background against the defined inflammatory pathways. This strategy ensures a robust measurement for the pathway, albeit future knowledge acquisition in particular pathways will improve the GSEA approach.
Of course, comparing pathway regulation between different species can be prone to errors due to species-dependent genome constitution and regulation mechanisms, raising the main question of the current debate: how conserved is the regulation of genes and biological processes between men and mice? We hypothesized that the conservation is higher at pathway level than on single gene level, and used the GSEA approach to identify murine disease models that resemble human disorders at pathway level.
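
The gene set permutation idea described in this reply can be illustrated with a deliberately simplified sketch. It is not the GSEA software itself: the running-sum statistic below is unweighted, the p-value is two-sided, and no normalization or false discovery rate is computed. All names (`enrichment_score`, `gene_set_permutation_p`) and the toy gene list are assumptions made for illustration.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: largest signed deviation from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random same-size gene sets."""
    rng = random.Random(seed)
    members = set(ranked_genes) & set(gene_set)
    observed = enrichment_score(ranked_genes, members)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, len(members)))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy ranked list (most upregulated first); the pathway sits at the top.
ranked = [f"G{i}" for i in range(200)]
pathway = {f"G{i}" for i in range(10)}
score, p_value = gene_set_permutation_p(ranked, pathway)
print(score, p_value)
```

In this toy case the pathway genes occupy the top of the ranking, so random gene sets of the same size rarely reach a comparable score.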

Critique:
Some sentences need revision. E.g. In the paragraph "the Paper explained" they cite two groups but with the same sentence "genomic responses in mouse models poorly mimic human inflammatory diseases". I presume that is not what they meant.

Authors reply:
We apologize for any confusion and corrected the sentence for the second group of authors accordingly ('greatly' instead of 'poorly'; pg. 13).

Critique:
It also helps the review process when page numbers are indicated.

Authors reply:
We inserted page numbers in order to facilitate the review process.

Critique:
CD14 and CD41 are both used for denoting CD14.

Authors reply:
We have corrected this term in the text (pg. 8).

Critique:
There are additional sentences that do not read properly.

Authors reply:
We are grateful for that note and carefully revised the language style of our manuscript.

Critique:
It would be helpful if the authors included an enrichment plot with normalized enrichment score and q values for false discovery rates of expression levels of genes acting in the Toll receptor pathway for both human disease and mouse models.

Authors reply:
To improve comprehensibility, we added enrichment plots for the Toll-like receptor pathway for both human disease and mouse models in Appendix Figure S1 of our revised manuscript. The figure shows normalized enrichment scores, nominal (uncorrected) P values and false discovery rates.

Critique:
Similarly, it would be useful to explore the similarities and dissimilarities between samples: a sample to sample heat map or sample to sample distance matrix for upregulated and downregulated genes in the inflammatory pathways in both mouse and man.

Authors reply:
We fully recognize the importance of systematically comparing upregulated and downregulated genes between human and mouse, e.g. for defining potential drug targets. However, we have to admit that the identification of congruently or differently regulated single genes is i) beyond the scope of our study and ii) not possible with the approach we presented here. The aim of our study was to add value to the discussion of how to interpret data in order to identify suitable animal models based on existing data, using pathway-derived gene set enrichment analyses. As discussed above, this could be helpful e.g. when choosing the suitable genetic background of mouse strains.
However, to fulfill the reviewer's request, we additionally performed principal component analyses of the inflammatory gene expression profiles to properly answer this question (see Figure Ref#1_1A below, which is not supposed to be included in the supplement of the revised manuscript). Thus, we could identify several genes that were congruently and differently regulated in mouse and man, respectively. Genes that were induced throughout all human and mouse datasets included the interleukin 1 receptor type II (IL1R2), the interleukin 1 receptor antagonist (IL1RN), the matrix metallopeptidase 9 (MMP9), the peptidoglycan recognition protein 1 (PGLYRP1) and the suppressor of cytokine signalling 3 (SOCS3) (Fig Ref#1_1B and Table Ref#1_1, which are not supposed to be included in the supplement of the revised manuscript).
However, the example of MMP9 demonstrates that promising data from animal models may not be successfully translated into clinical practice. The MMP family has been a pharmaceutical target for a long time, but none of the developed drugs has passed clinical trials so far (Fingleton, 2008, see the full reference below).
Nonetheless, the therapeutic potential of the MMP family is still of great interest because of its role in many pathological processes, including inflammation [...]
[...] control (blue, decreased; red, increased). Expression of IL1R2, MMP9, PGLYRP1, SOCS3 and IL1RN was concordantly increased throughout all human and mouse data sets (see Table Ref#1_1 below).

Table Ref#1_1
Expression data of genes concordantly induced in inflammatory studies in both human and mouse.
Gene expression data are presented as linear fold-change ratio over the appropriate control group with respective nominal P value.
Gene symbols refer to the human gene nomenclature. Genes were selected based on principal component analyses (Fig Ref#1_1, see above).

Referee #2 (Remarks):
Weidner and collaborators tackle the question of the relevance of biological data in different mouse models to mimic and study human inflammatory diseases. This question has been addressed in two previous articles (Seok et al, 2013 and Takao & Miyakawa, 2015) using the same data resulting in contradictory results. In contrast to previous approaches, the authors follow a strategy that does not restrict the analysis to genes that are highly up or downregulated in the disease samples.
Instead, they performed a gene set enrichment analysis that compares the transcriptional modulation of predefined gene sets of pathways that are assigned to inflammatory processes in either human or mouse. This approach has the advantage of analyzing a whole set of related genes rather than a selected subset of genes, and focuses on the analysis of biological pathways and not individual genes.

Critique:
The authors claim that this approach is fundamentally different from assigning GO terms or pathways to genes after filtering for strongly regulated genes. I only partially agree with the authors, as a pathway where most of the gene members are strongly up or downregulated should be captured by both a gene-based approach and a GSEA method.

Authors reply:
We agree with the reviewer that for studies, in which the individual gene effect is [...], Fig 4 B) in the human sepsis study, which shows a significant activation with the GSEA approach (our revised manuscript, Fig 2, dataset GSE28750, line 17). In general, a detailed comparison between our results and those of Takao and Miyakawa is not possible due to missing information concerning their data handling (e.g. it is unclear what time points were used for the human studies).

Authors reply:
The novelty of our approach has several key aspects. First, by using a GSEA approach instead of single-gene analyses we are able to circumvent any problems associated with the subjective setting of gene expression thresholds that led to opposite conclusions by Seok et al. vs. Takao & Miyakawa, thus allowing us to analyze the datasets in an unbiased, standardized manner. Second, we included all datasets presented by Seok et al. (besides the ARDS study due to lack of healthy controls), whereas Takao & Miyakawa only presented analyses for a selection of datasets. Third, by using pathway-based GSEA we focused exclusively on genes that were annotated to be involved in inflammatory processes, thus specifically addressing the (patho)physiological process in question. Fourth, we thus could identify key pathways that might play dominant roles for the translation of human disease conditions to the mouse model. Fifth, our GSEA approach is able to separate mouse models with high predictivity for the human condition from those with low predictivity.
Since Takao [...] Indeed, there exists a fundamental qualitative difference between both approaches.
When the mouse models are ranked according to the average predictive capability using the GSEA approach on the one hand and the single-gene-based approach used by Takao & Miyakawa on the other hand, the correlation between both rankings is low (r = -0.35) and statistically not significant (P = 0.36, test for correlation between paired samples using Spearman's ρ, applying the R function cor.test).
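
As an illustration of the rank comparison described above, the following sketch computes Spearman's ρ for two rankings. It is a plain Python stand-in, not the R call the reply mentions, and it does not compute a P value. The two example rankings are invented for illustration and are not the model rankings from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical rankings of five models under two methods (illustrative only).
ranking_method_a = [1, 2, 3, 4, 5]
ranking_method_b = [2, 1, 4, 3, 5]
rho = spearman_rho(ranking_method_a, ranking_method_b)
print(round(rho, 2))
```

A ρ near +1 would indicate that both methods order the models similarly, whereas a value near zero or below indicates that the orderings disagree.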

Critique:
I find the paragraph describing the results of the GSEA a bit confusing. This paragraph does not reflect the large differences between human-human, mouse-mouse and human-mouse comparisons that the authors claim to observe in the manuscript (they say that human datasets correlate with each other very well and that various distinct mouse datasets only showed a slight correlation). Human datasets have average positive and negative predictive values of 61%. The comparison between mouse datasets reveals an average positive and negative predictive value of 44%. The authors add that "strikingly, the overlap between mouse and human revealed average positive and negative predicted value of 48% for all human and mouse datasets". The numbers of the human-human comparison (61%) are just slightly higher than those of human-mouse comparison (48%) and the latter is even higher than the mouse-mouse comparison (44%). These numbers show that similar differences in pathway regulation are found within and across species, and suggest that these differences are due to differences in samples rather than species differences. There is no evaluation of how significant the observed differences are. I would suggest doing this in order to substantiate their claims.

Authors reply:
We thank the reviewer for raising this very important point and performed a statistical rank sum test with those three comparisons (h-h vs. m-m vs. h-m). We calculated a statistically significant difference (p<0.0001, Kruskal-Wallis test followed by Dunn's multiple comparisons test) between human-human and mouse-mouse correlations underpinning our claim of interspecies differences. We added this aspect in the main text (pg. 5) and amended Figure 1A of our revised manuscript accordingly (pg. 17).
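
The rank sum comparison of three groups mentioned above can be sketched as follows. This is a minimal Kruskal-Wallis calculation for exactly three groups, using the chi-square approximation with two degrees of freedom; Dunn's post hoc test is not implemented here. The function name and the example values are hypothetical and do not reproduce the data of the study.

```python
import math


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups; p uses the chi-square approximation."""
    groups = [a, b, c]
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> average 1-based rank in the pooled sample
    tie_term = 0   # sum of (t^3 - t) over tie groups, for the tie correction
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_term = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total**3 - n_total)
    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p


# Hypothetical overlap values per comparison type (illustrative only).
human_human = [0.61, 0.70, 0.66, 0.58]
mouse_mouse = [0.40, 0.44, 0.47, 0.38]
human_mouse = [0.48, 0.52, 0.45, 0.50]
h_stat, p_value = kruskal_wallis_three_groups(
    human_human, mouse_mouse, human_mouse
)
print(round(h_stat, 3), round(p_value, 4))
```

With small groups such as these, the chi-square approximation is only rough, so the resulting p-value should be read as indicative.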
We would also like to emphasize that positive and negative predictive values (i.e., the proportion of overlapping pathways) alone do not fully address the question of how meaningful the overlap in pathway regulation is, since it could be a matter of chance.
Instead, the increase over estimation by chance is a better variable to evaluate the overlap in pathway regulation. As written in the main text, the increase over estimation by chance is +35%, +11% and +19% for all correlations between human-human, mouse-mouse and human-mouse, respectively (pg. 5). To highlight this statistical aspect, we also supplemented Fig EV1 ([...]

Thank you for the submission of your revised manuscript to EMBO Molecular Medicine. We have now received the enclosed reports from the referees that were asked to re-assess it. As you will see, the reviewers are now globally supportive and I am pleased to inform you that we will be able to accept your manuscript pending final minor amendments.

***** Reviewer's comments *****

Referee #1 (Comments on Novelty/Model System):
It is valuable to have different approaches to assess the utility of experimental models of human disease. This manuscript adds to the discussion and provides a different, likely more robust and also balanced approach.
Reporting Checklist For Life Sciences Articles (Rev. July 2015)
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. These guidelines are consistent with the Principles and Guidelines for Reporting Preclinical Research issued by the NIH in 2014. Please follow the journal's authorship guidelines in preparing your manuscript. PLEASE NOTE THAT THIS CHECKLIST WILL BE PUBLISHED ALONGSIDE YOUR PAPER