Comparative systems biology between human and animal models based on next-generation sequencing methods

Animal models provide myriad benefits to both experimental and clinical research. Unfortunately, in many situations they fall short of expectations or yield contradictory results. In part, this can be attributed to traditional molecular biological approaches, which are relatively inefficient at elucidating the underlying molecular mechanisms. To improve the efficacy of animal models, a technological breakthrough is required. The growing availability and application of high-throughput methods make systematic comparisons between human and animal models easier to perform. Here, we introduce the concept of comparative systems biology, which we define as "comparisons of biological systems in different states or species used to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels". Furthermore, we discuss the applications of RNA-seq and ChIP-seq technologies to comparative systems biology between human and animal models and assess the potential applications of this approach in future studies.

Accurately modeling human physiology and pathology requires the establishment of a high-quality animal model (Alvarado & Tsonis, 2006; Francia et al, 2006, 2011; Götz & Ittner, 2008; Hasenfuss, 1998; Lieschke & Currie, 2007). In general, how closely the model should mimic the human disease depends on the scientific question under investigation. Only when the causal connections (structure-function relationships or regulation of gene expression) are definitive do the differences between human and animal models have a minor effect on the results of the analysis (Hasenfuss, 1998). For example, although the zebrafish (Danio rerio) is phylogenetically distant from humans, its use as a complete animal model for in vivo drug discovery and development is growing rapidly (Chakraborty et al, 2009). However, if pathophysiological processes are studied, especially those of complex diseases, models should mimic clinical settings as closely as possible; otherwise the expected results may not be achieved, or the findings will be of limited value. Accordingly, comparisons between human and animal models are becoming increasingly important for both clinical and fundamental applications (Alini et al, 2008; Cox et al, 2009; Fuentes et al, 2009; Huh et al, 2010; Merchenthaler & Shughrue, 2005; Nestler & Hyman, 2010; Northoff, 2009). Among the available strategies to assess this connection, comparative systems biology has begun attracting special attention (Cox et al, 2009). In this review, we introduce the concept of comparative systems biology. Next, we focus on the applications of next-generation sequencing methods, including RNA-seq and ChIP-seq, to comparative systems biology between human and animal models, before outlining some general directions of future developments and impacts of these types of studies.

The rise of comparative systems biology
One of the greatest twentieth century achievements in biological research is undoubtedly the sequencing of different genomes. There are now complete genome sequences for more than 1,000 organisms (excluding bacteria and archaea), with more sequences being completed (Henkelman, 2010). Once the genome of a species is available, researchers are able to begin mapping its sequences against the human genome, find candidate disease genes and build a proper disease model. However, the ability to fundamentally understand the genotype-phenotype relationship in a distinct species is often hindered by the inherent complexity of biological systems. The difference in genotype-phenotype relationships between human and animal models may originate from three sources (Figure 1): (1) functional divergence of genes or proteins; (2) gene deletions or duplications; and (3) divergent up- or down-stream components, out of which gene deletions or duplications may play the leading role (Jaillon et al, 2004).
Figure 1 The difference in genotype-phenotype relationships between human and animal models may originate from three sources: (A) functional divergence of genes or proteins; (B) gene deletions or duplications; and (C) divergent up- or down-stream components, out of which gene deletions or duplications may play the leading role. In the schematic drawing, Gene A and Gene A' are orthologs while Gene A' and Gene A'' are paralogs due to gene duplication.
Over the last decade, this third mechanism has received more attention in systems biology. The Rb (Retinoblastoma) gene family is a good example, because the members of this family are functionally conserved while the pathways they are involved in have diverged between C. elegans and humans (van den Heuvel & Dyson, 2008). Likewise, a previous study reported that over 20% of the genes essential for humans are non-essential for mice (Liao & Zhang, 2008). Consequently, traditional molecular biology techniques, while providing valuable insights into individual or simple genotype-phenotype relationships, are insufficient for deducing complex genotype-phenotype relationships. More systematic methods at the systems biology level are therefore necessary.
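As an illustration of how the second source of divergence (gene deletions or duplications) might be screened for computationally, the following minimal Python sketch classifies human genes by their orthology relationship to a model species. The input format (a list of human/model ortholog pairs), the function name `classify_orthology` and the gene identifiers are assumptions made for this example; they do not correspond to any specific database or tool. The other two sources, functional divergence and divergent up- or down-stream components, cannot be detected from orthology tables alone and require functional data.

```python
from collections import defaultdict


def classify_orthology(pairs, human_genes):
    """Classify each human gene by its orthology relationship to a model species.

    pairs: iterable of (human_gene, model_gene) ortholog pairs (hypothetical input).
    human_genes: iterable of human gene identifiers to classify.
    """
    human_to_model = defaultdict(set)
    model_to_human = defaultdict(set)
    for human_gene, model_gene in pairs:
        human_to_model[human_gene].add(model_gene)
        model_to_human[model_gene].add(human_gene)

    result = {}
    for gene in human_genes:
        partners = human_to_model.get(gene, set())
        if not partners:
            # No ortholog listed: candidate deletion in the model (or gain in human)
            result[gene] = "no_ortholog"
        elif len(partners) > 1:
            # One human gene, several model genes (many-to-many cases also land here)
            result[gene] = "duplicated_in_model"
        elif len(model_to_human[next(iter(partners))]) > 1:
            # Several human genes share a single model gene
            result[gene] = "duplicated_in_human"
        else:
            result[gene] = "one_to_one"
    return result


# Toy data with made-up identifiers
example_pairs = [
    ("GENE_A", "gene_a"),
    ("GENE_B", "gene_b1"), ("GENE_B", "gene_b2"),
    ("GENE_C1", "gene_c"), ("GENE_C2", "gene_c"),
]
example_genes = ["GENE_A", "GENE_B", "GENE_C1", "GENE_C2", "GENE_D"]
print(classify_orthology(example_pairs, example_genes))
```

Genes falling outside the one-to-one class are the ones for which a simple transfer of findings between human and the model would deserve the most scrutiny.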
The ultimate goal of systems biology is generating successful models to comprehensively describe living organisms. Comparative systems biology, an important subfield of systems biology, has no straightforward definition. In animal model research, the term first appeared in Ogawa et al's (2008) work, reporting a comparative study of circadian oscillatory network models of Drosophila. Here, we define comparative systems biology as "comparisons of biological systems in different states or species to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels." The comparison can be performed either horizontally (e.g., between individuals or states) or longitudinally (between species). The latter, which is mainly focused on human and animal models, is reviewed in detail here.
Over the past decade, comparative systems biology has attracted widespread interest, especially for its utility in comparisons between human and animal models of complex diseases. Miller et al (2010) used a systems biology approach to find a number of divergent network modules relevant to Alzheimer disease between humans and mice. In a previous work, we compared humans and four common animal models of cardiovascular disease through comparative transcriptome and pathway analysis, revealing that a few pathways have functionally diverged (Zhao et al, 2012). A recent review highlighted that the emerging technologies in comparative systems biology between human and animal models offer a platform to systematically explore not only the molecular mechanism of a particular disease, thus leading to the identification of disease modules and pathways, but also the molecular relationships among distinct (patho)phenotypes (Barabasi et al, 2011).
The majority of recent comparative systems biology studies obtain their data through traditional high-throughput technologies, such as microarray and ChIP-chip. Despite the experimental and statistical rigor of these methods and the substantial insights gained through them, there has been a fundamental shift over the last five years from these first-generation technologies (microarray and ChIP-chip) to next-generation sequencing (RNA-seq and ChIP-seq). We surmise that next-generation sequencing methods will serve a crucial function in the field of comparative systems biology between human and animal models, offering a number of potential advantages.

RNA-seq in transcriptome studies
Previous studies demonstrated that changes in gene expression underlie many or even most of the phenotypic differences between species (Marques et al, 2008; Yanai et al, 2004). As a result, comparative transcriptome analysis potentially provides information on functional conservation for candidate human disease genes within animal models.
Initial transcriptomics studies largely relied on hybridization-based microarray technologies and have yielded valuable insights into the functional divergence between human and model animals (Enard et al, 2002; Liao & Zhang, 2006). However, microarray technology has several limitations: overreliance on existing knowledge about genome sequences; high background levels owing to cross-hybridization; and a limited dynamic range of detection owing to both background and saturation of signals (Wang et al, 2009). Recent advances in DNA sequencing technology have enabled sequencing of cDNA derived from cellular RNA by massively parallel sequencing strategies, a process termed RNA-seq (Garber et al, 2011; Mortazavi et al, 2008). Compared with the microarray, RNA-seq has the advantage of allowing high-resolution characterization and quantification of transcriptomes with low background noise and the ability to distinguish different isoforms. Figure 2 shows the key procedures performed during RNA-seq analysis of comparative transcriptomes between human and animal models. Because the computational challenges in this process have been reviewed in detail by Garber et al (2011), we mainly illustrate the potential advantages of RNA-seq in comparative systems biology, including (a) comparisons between human and non-model animals, and (b) actual biological systems induced by the states of gene expression.
Figure 2 There are two strategies for sequencing animal models. If the genome is incomplete or poorly annotated, the genome-independent approach should be used (right part). The genome-guided approach is more typical (left part).
Although a variety of organisms have had their genomes sequenced, the majority of these are model organisms. Because microarrays rely on genome information, this technique has serious limitations in both quantifying and comparing gene expression profiles from non-model animals. RNA-seq, meanwhile, can be applied to reconstruct complete, high-resolution transcriptomes across all species. To build the transcriptome, several methods based on RNA-seq have been developed, usually falling into two main classes: genome-guided methods (Guttman et al, 2010; Trapnell et al, 2010) and genome-independent methods, i.e., de novo assembly (Birol et al, 2009; Schulz et al, 2012). Genome-guided methods first map all the RNA-seq reads to a reference genome and then assemble overlapping reads into transcripts. Unfortunately, this approach is not always effective, for two reasons. First, despite a large drop in the cost of next-generation sequencing, sequencing a complete genome is still costly and difficult, especially for non-model organisms. Second, the particular model being studied may differ substantially from its reference genome because it comes from a different strain or line. In such cases, de novo assembly is particularly suitable for obtaining accurate reconstructions. A recent study that reported a large RNA-seq data set obtained from six organs of nine different mammals (human, chimpanzee, bonobo, gorilla, orangutan, macaque, mouse, opossum, and platypus) and one bird (chicken), including both males and females (Brawand et al, 2011), demonstrated the utility of applying comparative systems biology between human and non-model animals and elucidated the large evolutionary gaps among these model organisms.

Zoological Research
www.zoores.ac.cn

Determining the expression states (i.e., the presence or absence) of low-abundance genes is a challenge for microarrays. Consequently, reconstructing the actual biological networks (e.g., protein-protein interaction, transcriptional regulation, or metabolic networks) of either human or animal models under a specific condition is very difficult, to say nothing of comparing the dynamic networks (Farmer et al, 2012). Moreover, abnormal variations in alternative splicing are also implicated in disease, so alternative splicing is a critical factor to consider in building a proper and viable animal model (Luco et al, 2011). Unfortunately, obtaining a precise alternative splicing map using the microarray technique is almost impossible.
RNA-seq data are highly replicable, with relatively little technical variation. For many purposes, it may be sufficient to sequence each mRNA sample once. The information obtained in a single lane of RNA-seq data appears to be comparable to that in a single array, and is therefore useful for identifying differentially expressed genes and for further analyses, such as the detection of low-expressed genes, novel transcripts and alternative splice variants. Using this method, researchers can obtain actual biological networks in both human and animal models, and garner biologically meaningful results by comparing these two networks. Rowley et al (2011), for example, compared the actual transcriptome in platelets between humans and mice, providing critical information used in the design of mouse models of hemostasis and in catalyzing the discovery of new platelet functions.
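The idea of deriving condition-specific ("actual") networks from expression states and comparing them between species can be sketched as follows. This is a minimal illustration under stated assumptions, not a published pipeline: the abundance cutoff of 1.0, the one-to-one ortholog map, the function names and all gene identifiers are hypothetical, and a reference network expressed in human gene identifiers is assumed to be available.

```python
def expressed_genes(expression, threshold=1.0):
    """Return genes whose abundance reaches the (assumed) detection threshold."""
    return {gene for gene, value in expression.items() if value >= threshold}


def active_edges(network, expressed):
    """Keep only interactions whose partners are both expressed."""
    return {frozenset(edge) for edge in network if all(g in expressed for g in edge)}


def compare_active_networks(human_expr, model_expr, network, orthologs, threshold=1.0):
    """Compare condition-specific networks between human and a model species.

    human_expr, model_expr: dicts mapping gene -> abundance (e.g., normalized counts).
    network: iterable of (gene1, gene2) interactions in human gene identifiers.
    orthologs: dict mapping human gene -> model gene (assumed one-to-one).
    """
    human_on = expressed_genes(human_expr, threshold)
    model_on_raw = expressed_genes(model_expr, threshold)
    # Project the model's expression states onto human identifiers
    model_on = {h for h, m in orthologs.items() if m in model_on_raw}
    # Only interactions whose genes all have orthologs can be compared
    comparable = [edge for edge in network if all(g in orthologs for g in edge)]
    human_edges = active_edges(comparable, human_on)
    model_edges = active_edges(comparable, model_on)
    return {
        "shared": human_edges & model_edges,
        "human_only": human_edges - model_edges,
        "model_only": model_edges - human_edges,
    }


# Toy data with made-up identifiers and abundances
human_expr = {"A": 10.0, "B": 5.0, "C": 0.2, "D": 3.0}
model_expr = {"a": 8.0, "b": 0.1, "c": 4.0, "d": 6.0}
orthologs = {"A": "a", "B": "b", "C": "c", "D": "d"}
network = [("A", "B"), ("A", "D"), ("C", "D")]
print(compare_active_networks(human_expr, model_expr, network, orthologs))
```

Because the classification depends directly on the expression threshold, the ability of RNA-seq to detect low-abundance transcripts matters here: a gene missed by the assay would silently remove all of its interactions from the comparison.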

ChIP-seq for detecting regulation changes
Molecular interactions between proteins and DNA play an essential role in the regulation of gene expression (Cawley et al, 2004;Pokholok et al, 2006). Accordingly, changes in protein-DNA interactions between human and animal models may lead to the divergent functions of homologous pathways (Brown et al, 2011;Greber et al, 2010), which is also an important aspect of comparative systems biology.
Chromatin immunoprecipitation (ChIP) followed by genomic tiling microarray hybridization (ChIP-chip) became the most widely used approach for genome-wide identification and characterization of in vivo protein-DNA interactions during the past decade (Ho et al, 2011). Specifically, when applied to the study of animal models of human disease, ChIP-chip approaches led to many important discoveries in relation to transcriptional regulation (Chen et al, 2008), epigenetic regulation through histone modification (Heintzman et al, 2007), and the evolution of protein-DNA interactions (Kim et al, 2007).
Like the microarray technique, ChIP-chip has some limitations arising from the innate characteristics of microarray hybridization. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) makes it possible to obtain accurate genome-wide profiles of protein-DNA interactions. Compared to ChIP-chip, ChIP-seq has higher resolution, fewer artifacts, larger coverage and a more extensive dynamic range (Blow et al, 2010; Johnson et al, 2007; Mardis, 2007; Schmid & Bucher, 2007; Visel et al, 2009). Below, we introduce the practical applications of ChIP-seq in comparisons between human and animals, including (1) identifying regulatory sequences, and (2) tracing the evolution of epigenetic regulation.
The human genome project, while providing the complete genomic sequence, left unanswered the question of how to identify the regulatory sequences that control the spatial and temporal expression of genes (Birney et al, 2007; McGaughey et al, 2008). By applying ChIP-seq with the enhancer-associated protein p300 to mouse embryonic heart tissue, Blow et al (2010) attempted to identify candidate heart enhancers on a genomic scale, revealing that most of the candidate heart enhancers were less deeply conserved in vertebrate evolution than enhancers that are active in other tissues. Such methods could also be applied to other transcription factors (TFs), and are therefore helpful in reconstructing the transcriptional regulation networks of human and animal models. The decreasing cost of ChIP-seq has already extended comparative systems biology investigations to some TFs. For example, Schmidt et al (2010) used ChIP-seq to determine experimentally the genome-wide occupancy of two TFs, i.e., CCAAT/enhancer-binding protein alpha and hepatocyte nuclear factor 4 alpha, in the livers of five vertebrates, revealing large interspecies differences in transcriptional regulation and providing insight into the evolution of regulatory networks.
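One simple way to quantify interspecies differences in TF occupancy of the kind described above is to compare the sets of genes bound by the same TF in two species after mapping them through orthologs. The sketch below computes the shared targets and a Jaccard index. It is an illustrative assumption rather than the method of any study cited here: the function name, the one-to-one ortholog map and the gene identifiers are hypothetical, and the assignment of ChIP-seq peaks to target genes is assumed to have been done beforehand.

```python
def target_conservation(human_targets, model_targets, orthologs):
    """Compare TF target gene sets between human and a model species.

    human_targets: set of human genes bound by the TF.
    model_targets: set of model-species genes bound by the orthologous TF.
    orthologs: dict mapping human gene -> model gene (assumed one-to-one).
    Returns (shared targets in human identifiers, Jaccard index).
    """
    model_to_human = {m: h for h, m in orthologs.items()}
    # Restrict both sets to genes that can be compared through orthology
    human_set = {g for g in human_targets if g in orthologs}
    model_set = {model_to_human[g] for g in model_targets if g in model_to_human}
    shared = human_set & model_set
    union = human_set | model_set
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Toy data with made-up identifiers
orthologs = {"A": "a", "B": "b", "C": "c", "D": "d"}
shared, jaccard = target_conservation({"A", "B", "C", "X"}, {"a", "b", "d"}, orthologs)
print(sorted(shared), jaccard)
```

A gene-level comparison like this ignores whether binding occurs at aligned genomic positions, so it gives only a coarse view of regulatory conservation.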
Epigenetic regulation is now accepted as being closely associated with human development, and many developmental disorders may therefore be caused by the dysfunction of this regulation (Gottesman & Hanson, 2005). However, because knowledge of this phenomenon in other animals is deficient, building proper animal models for these studies is difficult. Nevertheless, a recent study that employed the ChIP-seq technique to investigate the epigenetic regulation of histone H3 K4 in frogs (Xenopus tropicalis) revealed a hierarchy in the spatial control of zygotic gene activation (Akkers et al, 2009). Taken together, these advances lead us to speculate that the applications of ChIP-seq in comparative systems biology will be of great help in understanding embryonic diseases.
Despite the advances that ChIP-seq offers, researchers should be cautious when performing ChIP-seq analysis because the experimental steps involve several potential sources of artefacts (Park, 2009). For example, one challenge in this technique is that the identified enriched regions are of different types for different proteins (for details, see Park, 2009). Another potential source of artefacts is the divergence of both protein and DNA; therefore, control experiments should be designed carefully when using this analysis.

Prospective applications of comparative systems biology
Comparative systems biology takes advantage of the systematic information from other organisms and can be used to great effect in studying human physiology and disease. Over the coming years, we expect many exciting developments as this field evolves in several potential directions.

Dynamic networks
Biological systems exhibit complex dynamic behavior, enabling cells to react to various conditions or cell states such as cell cycle progression (Zhu et al, 2007). Although static biological systems have been well studied (Benfey & Mitchell-Olds, 2008; Gianchandani et al, 2006; Macilwain, 2011; Werner, 2007), the information gained from such studies is of limited use because static interactions are often identified from cells exposed to a single condition or at a single time point, i.e., under non-native conditions. Only recently have approaches emerged that attempt to analyze the dynamics of complex biological networks. For transcriptional regulatory interactions, ChIP-seq technology is likely to become increasingly popular as it can be used to uncover contextual and temporal variation. For context-specific metabolic networks, RNA-seq could provide the dynamic states of metabolic enzymes.

Biological engineering
The ability to manipulate living organisms is at the heart of a range of emerging technologies aimed at addressing critical problems in environment, energy, and health. Because of their complexity and interconnectivity, however, animal models have been less than useful for engineered manipulation. To employ animal models with greater breadth and application, we vitally need more detailed information, which can be obtained using new methods like those outlined in the present review, for instance by utilizing real-time RNA-seq techniques to obtain information about the effects of perturbations on biological systems (Faith et al, 2011). Next-generation sequencing technology and the concurrent development of its applications form a fast-moving area of biomedical research that greatly advances the development of comparative systems biology.