Big Data in Large-Scale Systemic Mouse Phenotyping

Systemic phenotyping of mutant mice has been established at large scale in the last decade as a new tool to uncover the relations between genotype, phenotype and environment. Recent advances in the field have led to the generation of a valuable open-access data resource that can be used to better understand the underlying causes of human diseases. From an ethical perspective, systemic phenotyping contributes significantly to reducing the number of experimental animals and to refining animal experiments by enforcing standardisation efforts. Systemic large-scale mouse phenotyping poses particular logistical, experimental and analytical challenges. On all levels, IT solutions are critical to implement and efficiently support the breeding, phenotyping and data analysis processes that lead to the generation of high-quality systemic phenotyping data accessible to the scientific community.


Introduction
Phenotypic characterisation of mutant mouse lines has been used as a tool to study gene function for many decades. Usually, only a limited set of phenotypic traits is analysed in hypothesis-testing studies. Two decades ago, another paradigm of mouse phenotyping emerged: the idea of systemic phenotyping. Large-scale random (ENU-driven) mutagenesis projects tried to identify novel phenotype-causing mutations by characterising every potential mutant for a large set of phenotypes in many organs [1][2][3]. Such genome-wide screening required a large panel of test procedures covering the whole physiological system of a mouse, hence the term systemic phenotyping. This led to the formation of so-called mouse clinics, with the German Mouse Clinic (GMC) as the first one, established in 2001 [4][5][6][7][8]. Mouse clinics implement standardised breeding and phenotyping pipelines for large-scale production of phenotype data to generate evidence-based hypotheses. Applying standardised systemic phenotyping to cohorts of mutant mouse strains is also known as "primary screening" [8], because it provides a first, unbiased and hypothesis-free look for phenotypic deviations in such mutants. As a next step, consortia of mouse clinics were formed in projects like Eumorphia and Eumodic [9][10][11][12] on the European level as well as the ongoing International Mouse Phenotyping Consortium (IMPC) [13,14] on a global scale. Thus, in this review, the term systemic phenotyping also implies a large-scale, high-throughput component.

A rationale for large-scale systemic mouse phenotyping
Systemic phenotyping is no replacement for specific, hypothesis-driven mouse studies. Rather, primary screening of mutant mouse lines helps to generate new hypotheses by providing a full picture of the system-wide effects of a specific mutation. Pleiotropic effects (i.e. one gene influences more than one phenotypic trait), which are often not visible in focussed, hypothesis-driven studies, can be uncovered this way. This has been shown many times by mouse clinics [15][16][17][18][19][20]. For example, a recent joint study showed that spermidine treatment protects the heart from age-associated deterioration and extends lifespan. The cardioprotective effects of spermidine may be due to several underlying mechanisms, including both direct cardiac effects and extracardiac (systemic and renal) effects. Systemic effects of spermidine might involve anti-inflammatory processes as well as a blood-pressure-lowering effect [17].

Systemic phenotyping can be understood as physiology-wide or alternatively, as a phenome-wide approach. On a genome scale, the IMPC aims to produce an encyclopaedia of gene function for all mouse genes [14].
Evidently, an open-access resource of well-structured systemic phenotype data can be subjected much better to data mining methods in order to identify biological mechanisms that cannot be uncovered otherwise. This is impressively supported by a series of first publications from the Eumodic and the IMPC projects [21][22][23][24][25]. For example, in order to study sexual dimorphism, IMPC scientists analysed up to 234 characteristics of more than 50,000 mice, including over 40,000 mutant mice. It was shown that sex influences the prevalence, course and severity of the majority of common diseases and disorders [24]. The IMPC provides full public access to the generated results, including data visualisation tools and machine-to-machine interfaces (APIs) [26]. Phenotypic similarities between inherited human diseases and knockout mouse lines are presented at the IMPC webpage and can be used to find suitable disease models for clinical researchers [27].
Due to the rather large panel of tests applied to a single animal, systemic phenotyping requires a well-defined pipeline that specifies the composition as well as the order and time point of every test procedure. An example of such a primary screening pipeline from the GMC is shown in table 1.
Standardisation also applies to phenotyping methods. In the IMPC, IMPReSS

Such standardisation efforts ensure good scientific practice, improve reproducibility, enhance data quality and make data more suitable for common analysis methods [28].
From an animal welfare perspective, systemic phenotyping directly reduces the use of experimental animals, as several hundred phenotypic parameters can be measured on an individual animal.

Systemic phenotyping as a process
Mouse clinic operations can best be described and handled in the form of modular processes. A generic mouse clinic business process model was originally developed at the GMC, but can also be applied to other mouse clinics, as shown in [29]. Large-scale mouse phenotyping requires a well-organised mutant generation and breeding pipeline to provide enough age-matched mice of the desired genotype for phenotyping. Figure 1 shows a simplified version of such a process model as a flowchart.

The importance of electronic data management
Systemic phenotyping produces large amounts of data, which need to be highly structured to be suitable for subsequent software-assisted data processing. For instance, any measured parameter (e.g. blood glucose concentration) needs attributes like data type (float, integer, text) or unit (mmol/l, g). Demographic data (sex, genotype, date of birth) are also captured and stored for every mouse. Customised laboratory information management systems (LIMS) have been developed at different phenotyping facilities; these are described and discussed in detail in [29].
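The structuring requirement described above can be illustrated with a minimal sketch. The class and field names below are hypothetical and are not taken from any existing LIMS; the example only shows how a parameter definition (data type, unit) and demographic data might be attached to each measurement so that software can validate it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class ParameterDefinition:
    """Describes one measured parameter (hypothetical schema)."""
    name: str        # e.g. "blood glucose concentration"
    data_type: type  # float, int or str
    unit: str        # e.g. "mmol/l"; empty string if unitless


@dataclass(frozen=True)
class Mouse:
    """Demographic data stored once per animal."""
    mouse_id: str
    sex: str
    genotype: str
    date_of_birth: date


@dataclass(frozen=True)
class Measurement:
    """A single value, linked to its mouse and parameter definition."""
    mouse: Mouse
    parameter: ParameterDefinition
    value: Union[float, int, str]

    def __post_init__(self) -> None:
        # Reject values whose type does not match the declared data type.
        if not isinstance(self.value, self.parameter.data_type):
            raise TypeError(
                f"{self.parameter.name}: expected "
                f"{self.parameter.data_type.__name__}, "
                f"got {type(self.value).__name__}"
            )


# Illustrative records; identifiers and values are made up.
glucose = ParameterDefinition("blood glucose concentration", float, "mmol/l")
mouse = Mouse("M-0001", "female", "mutant", date(2017, 3, 1))
reading = Measurement(mouse, glucose, 8.4)
```

Storing the definition separately from the values means that the unit and data type are declared once per parameter and every incoming value can be checked against them.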

Standardised quality control and data analysis
In large-scale systemic phenotyping, data need to undergo standardised procedures for quality control (QC) due to the sheer amount of data and the possibility of errors. In the IMPC, a thorough QC process has been established at the Data Coordination Centre PhenoDCC [30]. The process involves automated and manual checking of data consistency and out-of-range data points as well as a ticket system to track possible data quality issues. Such massive data quality control and standardisation efforts are usually not possible in individual research labs.
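As an illustration of what automated consistency and range checks could look like, the following sketch flags records for manual review. It is not the PhenoDCC implementation; the record layout, the function names and the plausibility limits are assumptions made for this example only.

```python
from datetime import date

# Hypothetical plausibility limits for blood glucose in mmol/l (not IMPC values).
GLUCOSE_LIMITS = (1.0, 40.0)


def out_of_range(records, limits):
    """Return the IDs of mice whose value lies outside the given limits."""
    lower, upper = limits
    return [r["mouse_id"] for r in records if not lower <= r["value"] <= upper]


def inconsistent_dates(records):
    """Return the IDs of mice measured before their recorded date of birth."""
    return [
        r["mouse_id"] for r in records if r["measured_on"] < r["date_of_birth"]
    ]


# Made-up example records.
records = [
    {"mouse_id": "M-0001", "value": 8.4,
     "date_of_birth": date(2017, 3, 1), "measured_on": date(2017, 6, 1)},
    {"mouse_id": "M-0002", "value": 84.0,  # likely a misplaced decimal point
     "date_of_birth": date(2017, 3, 1), "measured_on": date(2017, 6, 1)},
    {"mouse_id": "M-0003", "value": 7.9,   # measured before birth: inconsistent
     "date_of_birth": date(2017, 3, 1), "measured_on": date(2017, 2, 1)},
]

# Records to be raised as QC tickets for manual inspection.
flagged = sorted(set(out_of_range(records, GLUCOSE_LIMITS))
                 | set(inconsistent_dates(records)))
```

In this sketch flagged records are only reported, not corrected, which mirrors the combination of automated checks and manual follow-up described above.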
Automated data analysis is crucial for working with large-scale phenotyping data. At the GMC, standardised R scripts [31] for data visualisation and statistics are developed for every single phenotyping test and routinely applied to the data in order to determine genotype-related phenotype deviations. Another such toolkit, PhenStat, has been developed by the IMPC consortium for the same purpose [32].
Systemic phenotyping also faces the n<<p problem, because a large number of parameters (p) is measured on a single mouse, while the sample size (n) is much lower. In the standard pipeline of the GMC, a cohort size of n=15 animals per sex and genotype is used (15+15 male/female mutants, 15+15 male/female controls), while in the IMPC pipeline, this number is lower (n=7). The sample size n directly affects the α and β error probabilities (false positives and false negatives). Keeping them low requires a larger sample size, which would result in higher statistical power, but also in higher costs and animal use. Thus, the currently chosen numbers for n are a trade-off between these opposing requirements.
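The trade-off between sample size and statistical power can be made concrete with a rough calculation. The sketch below uses a normal approximation for a two-sided comparison of two equally sized groups; the standardised effect size of 1.0 is a hypothetical value chosen for illustration, and for group sizes this small the approximation is somewhat optimistic compared with a t-test.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the difference in means divided by the common
    standard deviation (assumed known in this approximation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Cohort sizes mentioned in the text; the effect size is an assumption.
power_n15 = approx_power(1.0, 15)
power_n7 = approx_power(1.0, 7)
```

Under these assumptions, the larger cohort yields clearly higher power for the same effect size, at the cost of roughly twice as many animals per group.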
A further statistical challenge is the multiple testing problem. Simultaneously testing large numbers of parameters with inferential methods requires α adjustment to avoid inflation of the false positive detection rate.
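The text does not state which adjustment is applied. As an illustration only, the following sketch implements two widely used procedures: the Bonferroni correction, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k with p_(k) <= k / m * alpha and rejects
    the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for i in order[:max_k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for five parameters measured on the same mutant line.
p_values = [0.5, 0.004, 0.035, 0.015, 0.025]
strict = bonferroni(p_values)
less_strict = benjamini_hochberg(p_values)
```

With these example values, Bonferroni keeps only the smallest p-value as significant, whereas Benjamini-Hochberg retains four of the five, which shows how the choice of adjustment shifts the balance between false positives and false negatives.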
Minimising metadata variability is another requirement. For instance, mutant and control mice should be measured by the same experimenter. Otherwise, a positive result could just indicate a possible experimenter influence rather than a genotype effect.
Naturally, procedures that use a human scoring step are most prone to experimenter bias, although this can be reduced by experimenter training and procedure standardisation. While human scoring is not part of every test procedure, experimenter metadata is still routinely captured to allow retrospective studies of such influence on a growing data pool. Thus, experimenter bias can routinely be monitored and corrected for.

Standardisation of phenotyping results: the use of ontologies
Direct raw data comparison is not always possible, since metadata may differ between mouse clinics and accordingly, shifts in data ranges can be observed. In such cases, mutant and control mice usually exhibit a similar shift and statistical analysis still leads to comparable results between clinics. However, a qualitative results level ("phenotypic difference yes/no?") is required in order to facilitate data comparison.
Ontologies are well suited to this task. Ontologies are hierarchically organised structures of controlled vocabulary. The Mammalian Phenotype Ontology (MP) [33] currently includes almost 13,000 classes to describe any mammalian phenotype in a standardised way, e.g. "MP:0005559 increased circulating glucose level". Because they provide unique IDs, MP terms allow systematic and programmatic exploitation of phenotyping result databases [34]. The assignment of a distinct MP term to a mutant mouse line is based on statistical analysis of raw data to make a binary decision between "MP term assigned" and "MP term not assigned". The possibility of cross-linking phenotyping results with other public databases allows mapping of mouse phenotyping results to phenotyping results of other species, e.g. via the Human Phenotype Ontology (HPO) [35,36]. Building such "data bridges" from biology to medicine has been the subject of the recent EU-funded BioMedBridges project (http://www.biomedbridges.eu) and is followed up in the CORBEL project (http://www.corbel-project.eu).
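The binary decision described above can be sketched as a simple rule. The function name, the significance threshold and the use of group means to determine the direction of the effect are assumptions made for this example; the actual decision rules used by mouse clinics may differ. Only the MP term quoted in the text is used.

```python
from typing import Optional

# MP term quoted in the text: "increased circulating glucose level".
INCREASED_GLUCOSE = "MP:0005559"


def assign_increased_glucose_term(
    p_value: float,
    mutant_mean: float,
    control_mean: float,
    alpha: float = 0.05,  # hypothetical threshold, not a published cut-off
) -> Optional[str]:
    """Return the MP term ID if the mutants are significantly higher, else None."""
    if p_value < alpha and mutant_mean > control_mean:
        return INCREASED_GLUCOSE
    return None


# Made-up results for two mutant lines.
hit = assign_increased_glucose_term(0.001, mutant_mean=11.2, control_mean=8.1)
no_hit = assign_increased_glucose_term(0.40, mutant_mean=8.3, control_mean=8.1)
```

Because the result is either a term ID or nothing, annotations produced this way can be compared across clinics even when the underlying raw data ranges differ.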

Working with systemic phenotyping data: interactive, data mining and multivariate approaches
While the use of ontologies requires bioinformatics expertise, a much simpler and even more intuitive approach is to apply so-called phenomaps, an adaptation of the heatmap visualisation well known from the transcriptomics field. In this case, the phenotyping results are drastically reduced to a qualitative yes/no statement. As shown in figure 2, a simple matrix of mutant lines vs. physiological category allows intuitive identification of mutant mouse lines of interest for the non-expert user. An application of phenomaps is the use of clustering methods to identify mutant mouse lines that show a similar overall or partial phenotype profile.
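A phenomap can be represented as a binary matrix, and the similarity of phenotype profiles can then be computed from it. The sketch below uses the Jaccard index as one possible similarity measure on which a clustering method could build; the line names, categories and entries are invented, and the choice of measure is an assumption rather than the method used for figure 2.

```python
from itertools import combinations

# Hypothetical phenomap: 1 = phenotype detected in that category, 0 = not.
phenomap = {
    "line_A": {"metabolism": 1, "bone": 0, "behaviour": 1},
    "line_B": {"metabolism": 1, "bone": 1, "behaviour": 1},
    "line_C": {"metabolism": 0, "bone": 1, "behaviour": 0},
}


def jaccard(profile_a, profile_b):
    """Shared hits divided by all hits found in either profile."""
    hits_a = {category for category, hit in profile_a.items() if hit}
    hits_b = {category for category, hit in profile_b.items() if hit}
    union = hits_a | hits_b
    # Two profiles without any hit are treated as identical.
    return len(hits_a & hits_b) / len(union) if union else 1.0


def ranked_pairs(matrix):
    """Return all pairs of lines, most similar first."""
    pairs = [
        (jaccard(matrix[a], matrix[b]), a, b)
        for a, b in combinations(sorted(matrix), 2)
    ]
    return sorted(pairs, reverse=True)


most_similar = ranked_pairs(phenomap)[0]
```

In this toy example, line_A and line_B share two of three affected categories and therefore rank as the most similar pair.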
A still intuitive but quantitative approach uses the full raw data set. It can be applied to interval-scaled phenotype parameters. For a given parameter, the mean value of mutants is divided by the mean value of control animals to form a mutant/control ratio. Mutant/control ratios from many lines can be plotted as a histogram, as shown in figure 3. In the resulting distribution, mutant/control ratios near 1.0 correspond to "no genotype-related phenotype deviation", whereas mutant/control ratios at the left and right margins of the histogram mean "decreased/increased parameter phenotype". Being quantitative, this approach allows an individual threshold to be applied to factor in biological relevance. For example, this method can be used to rapidly identify mutant mouse lines showing an extreme deviation in blood glucose levels compared to control animals by selecting the lower and upper 5% from a distribution of several thousand mutant lines. These can be considered candidate genes for a "low/high glucose" phenotype and used in a gene set enrichment analysis.
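The ratio calculation and the selection of the distribution margins can be sketched in a few lines. The function names and the example values are assumptions for illustration; the 5% fraction follows the example in the text.

```python
from statistics import mean


def mutant_control_ratio(mutant_values, control_values):
    """Mean of the mutant cohort divided by the mean of the control cohort."""
    return mean(mutant_values) / mean(control_values)


def extreme_lines(ratios, fraction=0.05):
    """Return the lines in the lower and upper tail of the ratio distribution.

    ratios maps a line name to its mutant/control ratio. At least one
    line is returned per tail, even for very small data sets.
    """
    ranked = sorted(ratios, key=ratios.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Made-up ratios for 20 mutant lines, spread around 1.0.
ratios = {f"line_{i}": 0.9 + 0.01 * i for i in range(20)}
low_candidates, high_candidates = extreme_lines(ratios)
example_ratio = mutant_control_ratio([6.0, 8.0], [5.0, 5.0])
```

The two returned lists correspond to candidate lines for a decreased and an increased parameter phenotype, respectively, and could serve as input gene sets for an enrichment analysis.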

More advanced data mining methods include data integration approaches to link phenotype information with other public databases (e.g. Gene Ontology (GO) [37] or KEGG [38,39]) and then apply data mining algorithms or gene set enrichment analysis (GSEA). For instance, PhenoDigm (Phenotype comparisons for DIsease Genes and Models) [40] uses an association rule mining approach to automatically integrate data from a variety of model organisms using several scoring methods to identify only
Phenotypic readout for a given disease or syndrome typically involves several phenotypic parameters. A well-known example is the metabolic syndrome, which involves abdominal obesity, elevated blood pressure, plasma glucose, serum triglycerides and low HDL levels [41,42]. Multivariate methods are therefore needed to address the large-scale analysis of such phenotypic patterns and comorbidity. Clustering and heatmap displays of phenotype data, as performed in [22], support the visual identification of such patterns. However, this approach has limitations, e.g. the handling of missing data.
In general, missing data is frequently observed in large-scale mouse phenotyping. One reason is that phenotyping may not have been performed due to animal welfare procedures. Far more frequent are scheduling issues: as every test is scheduled for a particular age of mice, no data can be taken if a test is skipped due to a broken phenotyping device. However, a certain portion of missing data is counterbalanced by the advantage of having a large, consistent data set that cannot be obtained by collecting data from individual labs in a meta-study approach.

Conclusions
The complex logistics needed to organise high-throughput phenotyping, to manage large data sets and to ensure standardised and transparent QC and analysis of the data are a major challenge in large-scale systemic phenotyping performed by mouse clinics. Electronic data management solutions support these processes. Large-scale data sets obtained in a well-structured, quality-controlled and standardised format enable further and comprehensive analysis of systemic phenotyping data, even across different centres.
Employing data analysis tools that link phenotypic traits to known biological pathways and other information in public databases aims at the discovery of disease-associated networks of genes that can be investigated in more depth as a next step. Importantly, these data sets are the prerequisite for reaching a new level in combining data sets from different species and disciplines to unravel the complexity of health and disease.
Currently, about a quarter of all mouse genes have phenotyping data in the IMPC project. However, phenotyping data collection continues while, at the same time, automation (e.g. histological image analysis), data integration and analysis methods (e.g. machine learning) are further developed, investigated and improved. During the next years, this will result in a high-quality and highly annotated comprehensive phenotyping data set for every mouse gene. The complete, publicly available data set and the associated methods and tools will provide a valuable resource for big data projects involving mammalian gene function.

Phenotyping consortium (IMPC) led to the discovery of 410 genes whose genetic deactivation impaired the development of embryos so strongly that they were not viable. In addition, mutations in a further 198 genes led to fewer offspring. Interestingly, many of the essential genes found here also play a key role in human diseases.

Table Legends
Table 1 Overview of the primary phenotyping pipeline performed by the German Mouse Clinic
The tests of the primary phenotyping pipeline cover all relevant organ systems in order to provide a full picture of the disease-associated alterations a deficient gene might cause in the organism. The first column ("Screens") names the principal physiological field or organ system; the second column ("Methods") specifies the applied phenotyping procedure, e.g. IpGTT (intraperitoneal glucose tolerance test).

Figure 1 Systemic phenotyping as a process
Systemic phenotyping is not an isolated task, but a complex process embedded in a larger workflow. Mouse clinic operations can be described and handled in the form of modular processes to cope with the complex logistics needed for the different areas.
In the top row, the basic processes of systemic phenotyping are depicted in boxes, connected by arrows. In the middle row, the processes are described in more detail, starting from timely production of age-matched mutant and control cohorts, followed by the actual phenotyping procedures according to the pipeline and finally the assignment of MP terms after data quality control and analysis. Throughout the whole process, tight scheduling and tracking of activities and resources are necessary (bottom row) in order to identify workflow problems and to ensure continuous data flow.