A systems-based framework for understanding complex metabolic and cardiovascular disorders.

Common forms of metabolic and cardiovascular diseases involve the interplay of numerous genes as well as important environmental factors. Traditional biochemical and genetic approaches generally attempt to dissect these diseases one gene at a time, for example, by analysis of Mendelian forms or genetically engineered experimental organisms. However, it is also important to understand how the genes interact with each other and with the environment, and how these interactions change in disease states. Technological advances, such as the development of expression arrays that allow quantification of all transcript levels in a cell or tissue, have made it feasible to globally monitor the molecular phenotypes that underlie disease states. By applying statistical methods, relationships between DNA variation, gene expression patterns, and diseases can be modeled.

Common forms of metabolic and cardiovascular diseases are exceptionally complex. Although only a small fraction of the underlying genes has been identified, it seems likely that common or rare variants of hundreds or even thousands of genes will be involved (1,2). Environmental factors such as overnutrition, sedentary lifestyle, and smoking also play a crucial role. Most biochemical and genetic studies to date, such as those involving transgenic animals, have focused on identifying and characterizing the individual genes that contribute to disease (3). Likewise, genome-wide association and other genetic approaches typically focus on finding the specific genes responsible for the association (4). While these approaches continue to be informative, it is also important to address the interactions between genes and the environment (5). In this brief review, we discuss how systems-based approaches involving the integration of genomic, molecular, and physiological/clinical data can complement traditional approaches to address the complexity of these disorders.
A systems perspective on disease envisions the integration of multiple elements, from genome through phenotype as depicted in Fig. 1. The technological advances that allow large-scale and high-throughput quantification of elements at each level are critical. Thus far, only genome and transcriptome come near the level necessary, and so we focus on these as exemplary of the systems approach.

IDENTIFYING DISEASE-ASSOCIATED GENES BY INTEGRATING GENETIC AND GENE EXPRESSION INFORMATION
Candidate gene studies and recent genome-wide association studies have identified an impressive list of genes contributing to complex disorders such as atherosclerosis (6), hyperlipidemia (7,8), obesity (9), and diabetes (10), but altogether these account for a minority of the genetic effect on disease development. This suggests that many genes carrying small to modest effect contribute to complex disease, as has long been postulated. The established biology underlying these disorders has identified at least hundreds of genes for each. Genetic studies in humans and animal models have made significant contributions to elucidating the pathogenesis of complex diseases, though the number of specific genes identified has been only a fraction of the total expected to be involved for any given disease. Accordingly, one of the major challenges in studying complex disease is to understand how genes that carry causal variants interact with each other, and with "downstream" genes, to regulate disease expression.
Traditional genetic studies in humans or animal models establish the relationship between genotype and phenotype, but provide little inference as to the intervening biology or, correspondingly, guidance for selecting likely causative genes. Incorporating global gene expression analyses into genetic studies has significantly enhanced both these aspects (1,11,12). Gene expression can be measured as a quantitative trait and has been observed to be highly heritable (12-14). Linkage or association analysis can be employed to identify genetic loci or single nucleotide polymorphisms that perturb the abundance and activity of gene products. The linkage-based identification of loci for gene expression is termed eQTL (expression quantitative trait locus) mapping, and the same concept can be extended to discover single nucleotide polymorphisms associated with transcript abundance, termed eSNPs (expression single nucleotide polymorphisms) (15,16). Observation of an eQTL for a gene indicates that genetic factors are partially responsible for its transcript abundance. Based on the proximity between the genetic factors influencing gene expression and the location of the gene, eQTL can be categorized as cis- or trans-eQTL. When an eQTL localizes close to the gene encoding the transcript, it is likely that the causative genetic variation resides within the gene or its regulatory elements and directly influences transcription or transcript stability, acting in a cis manner; such an eQTL is termed a cis-eQTL. Conversely, when the eQTL does not encompass the physical location of the gene and its flanking regions, the gene is defined as being regulated in a trans manner, and the eQTL is termed a trans-eQTL (Fig. 2A). In several genetic crosses between inbred strains of mice, hundreds or thousands of expressed genes in a given tissue have at least one eQTL, including both cis and trans ones. Among all eQTL observed in crosses of several hundred or more mice, the total number of trans-eQTL is approximately 10 times higher than that of cis-eQTL.
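The cis/trans distinction described above can be illustrated with a short sketch. It is not taken from any of the cited studies: the record layout, the gene names, and the 1 Mb flanking window are assumptions chosen for illustration only. In linkage studies the eQTL is an interval rather than a point, so a real analysis would more likely test whether the support interval overlaps the gene.

```python
from dataclasses import dataclass

# Assumed flanking window (in base pairs) around the gene within which an
# eQTL peak is still called cis. The value is illustrative, not from the text.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Eqtl:
    """Hypothetical record pairing a transcript with one mapped eQTL peak."""
    gene: str
    gene_chrom: str
    gene_start: int
    gene_end: int
    locus_chrom: str
    locus_pos: int


def classify_eqtl(eqtl: Eqtl, window: int = CIS_WINDOW_BP) -> str:
    """Return "cis" if the peak lies within the gene plus flanks, else "trans"."""
    if eqtl.gene_chrom != eqtl.locus_chrom:
        return "trans"
    lower = eqtl.gene_start - window
    upper = eqtl.gene_end + window
    return "cis" if lower <= eqtl.locus_pos <= upper else "trans"


def count_by_class(eqtls):
    """Tally cis and trans calls over a collection of eQTL records."""
    counts = {"cis": 0, "trans": 0}
    for eqtl in eqtls:
        counts[classify_eqtl(eqtl)] += 1
    return counts
```

Applied to every transcript with a significant eQTL, a tally of this kind is what underlies statements about the relative numbers of cis- and trans-eQTL in a cross.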
Cis-acting eQTL are obvious candidates for genes underlying a phenotypic trait (the quantitative trait gene, or QTG) when the eQTL and the quantitative trait locus (QTL) for the trait coincide. Though causative mutations may not act by altering transcript levels, many do. Applying this concept can significantly reduce the time and effort involved in positional cloning of genes. For example, this was a critical factor in our discovery of Abcc6 as the major causal gene in cardiac calcification in a genetically randomized mouse population (17). Coincidence of a trans-eQTL and a trait QTL may also be informative for downstream genes involved in trait expression, as discussed below.

INTEGRATIVE GENETICS ALLOWS CAUSAL INFERENCE BETWEEN TRAIT-TRANSCRIPT CORRELATIONS
The application of global gene expression analyses to studies of cells and tissues has provided a wealth of data relevant to complex diseases. Finding statistically significant correlations between a trait and particular genes suggests a biologic relationship between them (18,19). Many databases and analytical tools have been developed to help identify and characterize which functional classes of genes may be involved in a given process. Gene Set Enrichment Analysis (20), Database for Annotation Visualization and Integrated Discovery (21,22), and Ingenuity Pathway Analysis are analytical tools developed to test the enrichment of particular biological processes and molecular functions of gene sets by examining information collected by databases such as Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway.
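As a rough illustration of what testing a gene set for enrichment means, the sketch below computes a one-sided hypergeometric over-representation p-value. This is a generic test, not the specific statistic used by any of the tools named above, each of which applies its own method. The function name and arguments are assumptions introduced for this example.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_universe  : number of genes measured (the background)
    n_annotated : background genes annotated to the category of interest
    n_selected  : genes in the list being tested (e.g. trait-correlated genes)
    n_overlap   : selected genes that carry the annotation

    Returns the probability of seeing at least n_overlap annotated genes
    when n_selected genes are drawn at random from the background.
    """
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

In practice the test is repeated over many annotation categories, so the resulting p-values need correction for multiple testing.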
However, as has long been recognized in statistics, correlation does not prove causality. In complex diseases, genes may be correlated with a given trait because they are directly involved in the development of that process (i.e., "causal") or because the process itself secondarily alters the expression of the genes ("reactive"). A major contribution of integrative genetics has been the development of analytical tools to allow causal inferences to be made between correlated genes and traits when genetic data are incorporated (23,24). This is elaborated in Fig. 2B, which shows possible relationships between trait, transcript, and genetic location when correlations among these are observed. Because genetic variation is for practical purposes always primary, this can be utilized to order the relationship between transcript and trait.
Several analytical approaches have been proposed to assess potential causality in this setting, and undoubtedly more will be developed given the importance of the problem. Schadt et al. (23) developed the likelihood-based causality model selection procedure and applied it to predicting liver expressed genes causal for abdominal obesity in a mouse intercross setting. Subsequent studies using transgenic or knockout models have validated eight of nine predicted genes. Structural equation modeling has also been applied to assess directionality between correlated traits and transcripts, termed edge orientation. An algorithm based on this approach was developed by Aten et al. (25), termed Network Edge Orientation.
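The idea of using a genetic locus to order a correlated transcript and trait can be sketched as a comparison of three simple models: causal (locus, then transcript, then trait), reactive (locus, then trait, then transcript), and independent (locus acting separately on both). The code below is a deliberately simplified illustration that assumes Gaussian linear relationships and ranks the models by AIC. It is not the published likelihood-based causality model selection procedure, and all function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(y, predictors):
    """Maximized log-likelihood of a linear Gaussian model y ~ predictors.

    Returns the log-likelihood and the number of fitted parameters
    (regression coefficients including the intercept, plus the variance).
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(predictors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / n  # maximum-likelihood variance estimate
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, design.shape[1] + 1


def rank_models(locus, transcript, trait):
    """Rank causal, reactive, and independent models by AIC (lower is better).

    The term for the locus itself is common to all three models and is omitted.
    """
    models = {
        "causal": [(transcript, [locus]), (trait, [transcript])],
        "reactive": [(trait, [locus]), (transcript, [trait])],
        "independent": [(transcript, [locus]), (trait, [locus])],
    }
    aic = {}
    for name, terms in models.items():
        loglik, n_params = 0.0, 0
        for response, preds in terms:
            ll, k = gaussian_loglik(response, preds)
            loglik += ll
            n_params += k
        aic[name] = 2 * n_params - 2 * loglik
    ranking = sorted(aic, key=aic.get)
    return ranking, aic
```

Because the three models here have the same number of parameters, the AIC ranking reduces to a comparison of likelihoods. A fuller treatment would also assess how reliable the chosen model is, for example by resampling.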

CONSTRUCTING GENE EXPRESSION NETWORKS FOR COMPLEX DISEASES
Rather than uncovering single genes for complex diseases such as atherosclerosis, a systems-based perspective is interested in elucidating the interactions of genes and environment operating on a complex multicellular biological system (26). Such a systems approach involves modeling the relationships among elements of the system, such as transcript levels, in the form of a network. Two major modeling approaches have been employed to decipher network patterns underlying complex traits: forward and reverse engineering. Forward approaches apply a set of equations generated a priori from previously defined biologic relationships, which are then tested and revised as needed. This approach generally is used for small-scale networks. The reverse approach does not apply a predefined set of relationships. Rather, it utilizes general mathematical tools for network construction and lets the data itself define the relationships among the elements being studied, such as transcript levels (27). This approach typically utilizes large data sets and is computationally intensive. When the data are obtained from populations with genetic and/or environmental variation, such analyses allow one to infer the relationships and interactions among all such elements. This is valuable for studying complex diseases, because we do not yet adequately understand the relationships between gene expression and trait variability.
Network analysis provides a useful framework to identify and visualize interactions among genes by creating a graphic model. A network is composed of elements, such as specific gene transcripts (referred to technically as "nodes"), and connections (relationships) among these ("edges"). Edges can indicate a relationship between genes at the level of transcript abundance, protein interaction, or any other measurement that describes a meaningful association between two elements of the system. A gene transcriptional network is composed of individual gene transcripts as nodes, while the edges represent a measure of pair-wise correlation of transcript levels. A given gene can correlate with multiple genes, and a measure of the relative number of such connections is referred to as connectivity (28,29). An important feature of networks constructed with biological data is their "scale-free" nature. In a scale-free network, there is a small number of highly connected genes and many more with far fewer connections. Such highly connected genes are often referred to as hubs. Targeting hub genes has been found to disrupt the structure of gene networks, and hub genes are more likely to impact biological processes when disrupted in animal models. For example, in our analyses of coexpression networks for activated endothelial cells, Atf4, Xbp1, and Insig1 were identified as hub genes (among others) (30). Targeted knockout of these genes has been shown to be lethal in mouse models (31-34).
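The notions of edges, connectivity, and hubs can be made concrete with a minimal sketch. It assumes an expression matrix with samples in rows and genes in columns, and it defines an edge by a hard threshold on the absolute Pearson correlation. Both the threshold value and the function name are assumptions for illustration; weighted (soft-threshold) definitions of connectivity are also in common use.

```python
import numpy as np


def connectivity(expr, threshold=0.4):
    """Count, for each gene, the edges linking it to other genes.

    expr      : 2-D array, rows are samples and columns are genes
    threshold : assumed cutoff on |Pearson r| above which two genes
                are considered connected by an edge
    """
    corr = np.corrcoef(expr, rowvar=False)   # gene-by-gene correlation matrix
    adjacency = np.abs(corr) >= threshold    # unweighted, undirected network
    np.fill_diagonal(adjacency, False)       # ignore self-correlation
    return adjacency.sum(axis=0)


def hub_genes(expr, gene_names, top=3, threshold=0.4):
    """Return the `top` most highly connected genes with their connectivity."""
    k = connectivity(expr, threshold)
    order = np.argsort(-k, kind="stable")[:top]
    return [(gene_names[i], int(k[i])) for i in order]
```

In a scale-free network the connectivity values returned here would be strongly skewed, with a few genes having many edges and most genes having few.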
Similar to other biologic networks, genes in coexpression networks are found to organize into "modules," which are clusters of genes that have a higher degree of connectedness with other members of the same module than with genes in different modules (29). Genes composing a module therefore tend to behave more similarly to one another with regard to correlation with phenotypes (Fig. 3) and are often enriched for particular functional categories of genes. Using data-reduction methods such as principal component analysis, the aggregate of genes in a module can be characterized by a single value to use for such analyses (35). One example from our work is the identification of the unfolded protein response (UPR) pathway as being important in the response of endothelial cells to oxidized lipids. Coexpression networks were constructed of transcripts induced by oxPAPC in primary endothelial cells isolated from human aorta (30,36). Two of fifteen modules were strongly correlated with interleukin-8 induction. These modules were enriched for UPR pathway genes, and several of the most highly connected genes were members of the UPR pathway. Knockdown experiments with certain UPR genes revealed the role of the UPR pathway in the regulation of interleukin-8 and other cytokines. In other experiments, a hub gene in the UPR module, MGC4504, proved to contribute to an apoptosis response. These findings led to the discovery of a novel gene that is critical for UPR function in this process, based on its being a hub gene in the same module and closely associated with known UPR genes. By traditional analyses, it most likely would not have been recognized as being particularly important.
Fig. 3. Network construction approach reveals molecular signatures associated with complex diseases. An association network comprising gene modules (small nodes) and edges (connecting lines) was constructed by the k-means clustering method from gene expression profiles in whole livers of a genetically randomized mouse population. Blue and orange edges represent positive and negative correlations between modules. Phenotypes related to metabolic and cardiovascular diseases were selected in this figure to indicate that examining molecular signatures at the transcript level can reveal trait interconnectedness. Purple and orange modules represent unique and common modules among relevant phenotypes in the network (unpublished work).
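The data-reduction step mentioned above, in which a module is summarized by a single value per sample, can be sketched as the first principal component of the standardized module expression matrix. This is a minimal illustration under assumed conventions (samples in rows, genes in columns, sign chosen to follow mean module expression); the function name is hypothetical and the cited work may differ in detail.

```python
import numpy as np


def module_summary(expr):
    """Summarize a module by its first principal component.

    expr : 2-D array, rows are samples and columns are the module's genes

    Returns one score per sample and the fraction of the module's
    total variance captured by that component.
    """
    # Standardize each gene so that highly expressed genes do not dominate.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary; orient it so that
    # it correlates positively with the average expression of the module.
    if np.corrcoef(scores, z.mean(axis=1))[0, 1] < 0:
        scores = -scores
    variance_explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, variance_explained
```

The per-sample scores can then be correlated with a phenotype, such as interleukin-8 induction, in place of testing each gene in the module separately.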
One promising approach to reconstructing a directed gene network is Bayesian modeling. Instead of examining strictly gene connectivity and module formation, Bayesian modeling leverages genetic information to infer causality among genes in a directed network (39). As a probabilistic modeling approach, Bayesian network reconstruction utilizes posterior probability to map traits to particular markers, exploiting the increased information from joint mapping of correlated transcripts. Yang et al. have incorporated genetic data into Bayesian networks for hepatic gene expression from genetically randomized mouse populations, using the likelihood-based causality model selection approach described above for causal gene detection (Yang et al., in press). By incorporating predicted causal relationships among genes, this provides a significant improvement over gene coexpression networks constructed without the genetic data. Such "directed" networks can elucidate the mechanisms by which causal regulators give rise to changes in the expression of various genes and thereby underlie phenotypes. More recently, using expression data from a genetically randomized yeast population, Zhu et al. (40) applied a Bayesian modeling approach to integrate noisy protein interaction data collected from various sources, as well as genetic information, into a gene expression network.
Recently, a genome-wide functional network for the mouse has also been established and validated, in which Bayesian integrative modeling brings protein interaction patterns together with gene expression profiles to produce a network of probabilistic functional linkages among over 20,000 genes (38). In addition to Bayesian probabilistic models, there are various ways to integrate biological information, especially transcriptional regulatory mechanisms, into the gene networks underlying complex diseases. Success in identifying critical regulators has been demonstrated in various organisms by employing integrative genetics approaches (41). Furthermore, accounting for tissue specificity and sex effects in complex disease pathogenesis, with the assistance of systems biology approaches, can provide further information relevant to personalized medicine.
We are a long way from understanding complex diseases from a systems perspective. However, the use of high-throughput global gene expression assays in the context of genetic analyses has shown how an integrative genetics approach can reveal higher order interactions for traits as complex as diabetes and heart disease (1,2,5,8). As analogous methods for the metabolomic and proteomic elements are developed, progressively richer models of complex disease will be developed (42,43).