Iterative Systems Biology for Medicine – time for advancing from network signature to mechanistic equations

The rise and growth of Systems Biology following the sequencing of the human genome has been astounding. Early on, an iterative wet-dry methodology was formulated, which turned out to be a successful approach for deciphering biological complexity. This type of analysis effectively identified and associated molecular network signatures operative in biological processes across different systems. Yet, it has proven difficult to distinguish between causes and consequences, which makes it challenging to attack medical questions where we require precise causative drug targets and disease mechanisms beyond a web of associated markers. Here we review principal advances with regard to the identification of structure, dynamics, control, and design of biological systems, following the structure of the visionary 2002 review by Dr. Kitano. Yet, here we find that the underlying challenge of finding the governing mechanistic system equations enabling precision medicine


M A N U S C R I P T A C C E P T E D ACCEPTED MANUSCRIPT
In our view, systems biology has now become an accepted paradigm in biological research [1]. This is in part reflected in the sheer number and quality of publications that utilize systems approaches, acknowledging and embracing the complexity of biology [2][3][4]. Recently, the application of such frameworks to clinical challenges has led to the emergence of what could be referred to as systems or network medicine [1,5,6]. It is therefore timely to ask to what extent real progress has been achieved, and to critically assess the nature of the conceptual and technical hurdles that remain in meeting the needs from a medical standpoint. Here, we use the structure of the very influential position paper (close to 4000 citations) by Kitano in 2002 [7] to assess achievements and challenges on the basis of the research agendas put forward. Next, on the basis of this analysis, we argue that despite conceptual and technical advances, there remains a fundamental gap between finding associated features (biomarkers) of a given process and the more challenging task of obtaining a causal (e.g. mechanistic) understanding of the process. This gap, in our view an ultimate one, becomes even more glaring in a medical context, since there we would like to ask therapeutic questions such as what happens if we do X to a (human) system. At the end of the day, X is an intervention based on causal understanding in the sense that "if X is executed" then "the relevant processes become properly modified". We conclude this opinion paper with the sentiment that the time is ripe for bridging this gap, and that algorithmic tools in combination with richer data and more powerful computational platforms have the potential to operationally address the inherent challenges in wordings such as 'relevant' and 'properly' above.

Systems-based analysis a la Kitano.
Since the sequencing of the human genome, there has been a shift in biomedical research from reductionism towards a holistic view, in the sense of acknowledging the complexity and myriad of parallel and interconnected processes, including the multiple spatio-temporal scales involved in almost any biological phenomenon. Interestingly, technological advances rather than theory itself have largely driven this shift of perspective. It has generated a multitude of novel methodologies (or creative applications of existing methodologies), many of them labeled under the fields of Systems Biology [7] or Systems Medicine [8,9]. While multiple complementary definitions of Systems Biology do exist [10,11], we frame our discussion using the landmark paper from Prof. Kitano in 2002 [7]. Prof. Kitano provided a comprehensive concept, and what could be referred to as a normative account, in turn translated into an operational pipeline defining Systems Biology as a methodology to understand biological systems. Specifically, an iterative standpoint was formulated such that a cycle of research combining dry-lab and wet-lab efforts would generate, validate or reject a hypothesis, and finally incorporate the outcomes of the analysis into the state of the art, ready for a new iteration of the cycle. In this, Prof. Kitano emphasized four vital avenues of investigation that jointly would admit system-level understanding: (1) system structures (for instance the network of interactions), (2) system dynamics (mathematical description and analysis), (3) control method (identification of the biological targets that can modulate or control the state of the cell) and (4) design method (aiming to construct systems de novo to make use of, or to validate, the properties identified or hypotheses generated). Remarkably, in hindsight Kitano's 2002 vision has turned out to be truly predictive in that we have witnessed remarkable progress in all four areas, yet at different paces, and in part evolving in separate communities. For example, the emergence of the young and dynamic field of synthetic biology can be viewed as a response to the need for design, which in turn can be traced back to Feynman's classic dictum on what you can't
create you don't understand. At this juncture, we could conceptually ask whether these four areas are necessary, sufficient, or both to achieve systems understanding [12]. To shed light on this issue we will first briefly review progress in each area, finding that the aforementioned gap between biomarkers and mechanisms cuts across all four areas.

Structure, Dynamics, Control, and Design: progress and gaps.
In engineering, or more specifically control theory, system identification is defined as a method for developing mathematical and computer-based models that represent the characteristics of a system from measurements of the system inputs and outputs [13]. Traditionally, linear systems have been in focus and the mathematical model captures the transfer function between input and output, thus not necessarily incorporating either the underlying biophysical components or the non-linear dynamics governing the interactions between the components over time. In contrast, in biology we aspire to identify not only the structure of cellular networks but also their dynamics, in order to achieve engineered control of the system [14,15]. This motivates the division of labor between finding the structure, dynamics, and control respectively, as originally conceptualized by Kitano. The identification of System Structures can be attained by data-driven reverse-engineering approaches [16], either augmented by prior knowledge as a structural scaffold or by direct experimental analysis requiring structural learning directly from the data [17]. With the advent of high-throughput technologies (including both microarray and Next-Generation Sequencing technologies), reverse-engineering approaches have been a major research area in systems biology since the original 2002 publication. Purely data-driven reverse-engineering methods have as a rule only used time-series and/or perturbation experiments to uncover associations (not necessarily causal) between features, e.g. a transcription factor
and the expression of the corresponding target genes [18]. Such relationships can readily be represented using different modelling formalisms, such as mutual information [11], Boolean networks, Bayesian networks (BNs) [12], Petri nets [19], constraint-based models, differential equations [20], rule-based models [21], cellular automata or agent-based models [22], all being parts of a growing toolkit for data-driven reverse-engineering approaches. Yet, causal parameterization remains challenging due to uncertainties in model structure and parameters [23]. A second line of reasoning is to define a prior network structure or scaffold through a literature review. Examples include the modeling of atherosclerosis [20], brain functioning [24], or the immune system [25], to name a few among many. Alternatively, the prior structural template can be collected from systematic experiments, as in the case of protein-protein interactions and the generation of the Proteome-Scale Map of the Human Interactome Network [17]. Of the three approaches, the experimental and data-driven ones are in our view set to become even more prevalent due to the exponential growth of data in public repositories [26][27][28] and the decreasing cost of sequencing [29]. The knowledge-based approach appears to be at a turning point in the sense that "classical" text-mining methodologies [30,31] have not, in our view, provided a significant edge when compared with other approaches, whereas recent advances using deep learning [32] methodologies hold promise to disrupt the current state of the art in text mining, similarly to recent achievements in genomic analysis [33,34]. In summary, these advances in network biology have enriched the notion of a biomarker from a single or very few features to include a larger set of (inter-connected) features (i.e.
a network signature) associated with a disease or a biological process.

In contrast, achieving a mathematical description of a biological system from first or derived principles has been challenging ever since the pioneering work of Hodgkin and Huxley and still constitutes a fundamental barrier. If we lack what could be referred to as fundamental guiding principles for modeling dynamic biological processes and how these are controlled (e.g. the least action principle in physics), and given that we are dealing with large inter-connected living systems, the challenge is how to formulate unbiased mathematical models under these modeling conditions. Regardless of how the interactions between elements are represented, the importance of system dynamics remains, as illustrated by the central dogma of Systems Biology: "system dynamics that gives rise to the functioning and function of cells" [35]. One of the best examples is the pioneering work by Tyson and colleagues on the cell cycle in different systems [36,37]. For this problem, it is imperative to model and understand the system dynamics, i.e.
what governs the switch from one phase to another, as it depends on the dynamics between many variables, which differ in each of the phases of the cell cycle. A wiring diagram is simply not sufficient. Such a system dynamics analysis requires: (a) an accurate definition of the system structure [23], and analysis of how variations of the structure may affect the dynamics [38,39], (b) a mathematical parameterization, linear or non-linear [40], (c) a mathematical analysis such as bifurcation analysis [41] or the identification of slow manifolds [12,42], and (d) sensitivity analysis of structure and parameters in the model using the system dynamics as a readout [43,44]. From experience within the community, we would like to argue that essential factors for success include dealing with a "small confined" system (e.g. the cell cycle or nerve impulse propagation) combined with physical insights enabling a simplified or reduced (phenomenological) description of the system at hand. Yet, despite this, in our hands any such analysis targeting system dynamics is sooner or later challenged by the conundrum of "how to generate hypotheses under uncertainty in both structure and dynamics?". This becomes exponentially more challenging the larger the system is and the fewer experimental constraints we can in principle impose on the state variables and parameters governing the system dynamics.

In our view, because of these difficulties the issue of uncertainty has mainly been addressed in the area of Control Method, where 'what if' questions are typically asked with the aim of uncovering what controls a system, as a rule a small confined one. Broadly speaking, to meet this challenge investigators have developed two different approaches: ordinary differential equation (ODE) based dynamical models versus structural network-based models. Using ODEs, the challenge of generating hypotheses under uncertainty is an active field of research [45], and methods are emerging for exploring the
space of feasible models [23,39] explaining existing data, and for generating robust hypotheses by studying them [20,46,47]. In the case of network models, the majority of models or wiring diagrams, such as protein-protein interactions [17,48], gene-gene interactions or co-morbidity oriented models [49], suffer from a lack of dynamics because the modeling framework is an interactome-based description of the system. However, interactome or structural models are in many cases considered as systems biology models, or as a "gateway to systems biology" [50], because they provide some insights into control aspects by, for instance: (a) identifying candidates for master regulators [41] among highly connected genes or genes with a high betweenness centrality value [51] (which measures the centrality of a node in a network by its prevalence in the shortest paths between other nodes); (b) identifying groups of highly connected nodes associated with a specific functional role or disease [48]; or, among others, (c) identifying recurrent motifs [52] in the network that may be associated with robustness [7]. Yet, a wiring diagram is not sufficient to address what happens if those nodes or edges are modified, i.e. causal interventions are as a rule beyond the scope of such models. In our experience, and very importantly, both approaches,
ODE and network-based models, can be complementary. A first-order approach is simply to use network approaches combined with knowledge-based methods to select the most relevant features (nodes and edges) to be incorporated in a dynamical model [53]. A major bottleneck for such an approach is that, in most cases, the information used for variable selection may not suffice for the generation of accurate mathematical models [54]. We would argue that a critical but fair assessment is that while dynamical system analysis and causal control analysis have been ongoing in the modeling community, those efforts have not been successfully adopted in systems medicine [55] or impacted the clinical community. This is not surprising, since this state of affairs reflects the current gap between network biomarkers and a true causal understanding of a system [12].

The fourth area, Design Method, as pointed out by Kitano, entails the use of combined in-silico and wet-lab strategies to design new systems [56] or to modify existing systems on demand [57]. In this regard, Synthetic Biology plays a fundamental role in Systems Biology because it opens the door to investigating basic design principles in biological systems [58,59]. Interestingly, some investigators consider Synthetic Biology as a logical end point of Systems Biology [10]. Several milestones have been achieved since 2002, including the generation of a minimal bacterial cell with a 531-kb genome encoding 438 proteins and 35 RNAs [60]. Additionally, several techniques allow various degrees of manipulation, such as silencing RNA [61], shRNA and CRISPR-Cas9 [62,63] perturbation methodologies. Moreover, several advances in synthetic biology are already approaching clinical utility [64]. An impressive lineup of success stories has enabled interventions and reprogramming of systems without requiring complete systems understanding.
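
To make the structural control analysis discussed in this section concrete, the following minimal sketch computes betweenness centrality, i.e. the prevalence of a node in the shortest paths between other nodes, for a small undirected network. The toy network and its node names (TF_A, g1 to g4) are hypothetical and not taken from any of the cited studies, and the use of Brandes' algorithm is our choice for illustration only.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths to every node.
        order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # In an undirected graph every pair of end points is visited twice.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical wiring diagram: a hub TF_A with three neighbours, one of which links to g4.
network = {
    "TF_A": ["g1", "g2", "g3"],
    "g1": ["TF_A"],
    "g2": ["TF_A"],
    "g3": ["TF_A", "g4"],
    "g4": ["g3"],
}
scores = betweenness_centrality(network)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, score)
# TF_A scores 5.0, g3 scores 3.0, and the leaves g1, g2 and g4 score 0.0
```

Such a ranking nominates TF_A as a candidate regulator, but, in line with the argument above, it does not by itself predict what happens to the system when TF_A is perturbed.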

Timely to close the gap between network biomarkers and causal mechanisms?
In the previous section, we have briefly enumerated some advances since 2002 in Systems Biology (and Systems Medicine) with special reference to structure, dynamics, control, and design. Yet, there are several practical, technical, and conceptual hurdles impeding smooth transitions and iterations of the Kitano cycle. These include for instance: (i) an insufficient amount and quality of data to support not only accurate reconstruction of the network structure of a system but, importantly, the generation of accurate mechanistic models [16]; (ii) the prohibitive computational expense of many of the proposed methodologies; and (iii) the many regulatory layers uncovered by the research community, beyond PPI or TF-gene interactions, that must be included in any faithful biological model, for example epigenetic regulation [65,66], chromatin accessibility [67] or transcriptional control exerted by miRNAs [68], which make the problem of formulating a dynamical model even more difficult. Hence, the astute reader may well ask why we should believe that it is timely to advance beyond network biomarkers to dynamical models capable of addressing causal or therapeutic interventions in "large" systems. In our view, we have attained a technological stage where hypothesis-driven research in Systems Biology is becoming (or can become) a reality. We now highlight the most relevant elements that could spur such a development.

Computational speed: in 2000, the Pentium 4 was the new Intel processor; in contrast, by 2015 the latest Intel processor was not only 84 times faster [69] but also significantly cheaper. Furthermore, recent advances in the utilization of GPUs have fueled large-scale parallel computations [70]. These developments clearly support massive ensemble approaches [38] and computationally driven parameter explorations [39] that previously were restricted to supercomputer centers.
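
As a hedged illustration of what an ensemble approach with computational parameter exploration can look like, the sketch below samples many parameter sets for one small ODE, keeps those that reproduce a measurement, and then asks a 'what if' question across all retained models. Everything in it is an assumption made for illustration: the one-variable Hill-type model dY/dt = beta x^n / (K^n + x^n) - gamma Y, the parameter ranges, the "observed" steady state of 1.0 with tolerance 0.1, and the halving of the input x as the intervention. It is not the method of the cited works.

```python
import random


def simulate(beta, K, n, gamma, x, n_steps=400, dt=0.05):
    """Forward-Euler integration of dY/dt = beta*x^n/(K^n + x^n) - gamma*Y from Y(0) = 0."""
    y = 0.0
    production = beta * x**n / (K**n + x**n)  # constant, because the input x is held fixed
    for _ in range(n_steps):
        y += dt * (production - gamma * y)
    return y


def explore(n_samples=500, observed=1.0, tolerance=0.1, seed=0):
    """Sample parameter sets and keep those whose simulated output matches the observation."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(n_samples):
        params = {
            "beta": rng.uniform(0.5, 5.0),   # maximal production rate (assumed range)
            "K": rng.uniform(0.1, 3.0),      # half-saturation constant (assumed range)
            "n": rng.choice([1, 2, 4]),      # Hill coefficient (assumed options)
            "gamma": rng.uniform(0.5, 2.0),  # degradation rate (assumed range)
        }
        baseline = simulate(x=1.0, **params)
        if abs(baseline - observed) <= tolerance:
            # Intervention: halve the input and record the model's prediction.
            knockdown = simulate(x=0.5, **params)
            feasible.append((params, baseline, knockdown))
    return feasible


feasible = explore()
ratios = [knockdown / baseline for _, baseline, knockdown in feasible]
print(len(feasible), "of 500 sampled models are consistent with the observation")
if ratios:
    print(f"predicted output after intervention: {min(ratios):.2f} to {max(ratios):.2f} of baseline")
```

All retained models fit the same observation, yet they may disagree on the size of the response to the intervention; the spread of `ratios` is one simple way to express how far the available data constrain a causal prediction.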

Importantly, in addition to the continuous increase in computational power following Moore's law, high-performance computing centers centralized in large institutions and cloud-based systems, such as Amazon Web Services (https://aws.amazon.com/hpc/), are complementing and to a large extent eventually replacing small and medium computational solutions. It is clear that research teams have access to affordable, powerful computational resources at an unprecedented scale compared to a decade ago. In summary, the sheer size and number of models that can be explored in any given application is now enhanced by several orders of magnitude.

Size and variety of data types: the advent of NGS technologies initially provided the tools for whole-genome sequencing [71] and a better estimation of mRNA expression [72], but applications were soon extended to all DNA- or RNA-associated regulatory layers, such as transcription factor binding or histone mark profiles by ChIP-seq [73], miRNA sequencing [68], chromatin accessibility [74] or DNA methylation [65], among others. Through the combination of several NGS technologies we have in principle access to a complete regulatory view of a biological system or disease, and existing large-scale collaborative projects such as TCGA, ENCODE, IHEC or STATegra [75] are examples in that direction. Hence, the capacity to capture the relevant correlated features in a given process has improved dramatically.

Increased biological resolution in data: microarray and NGS technologies were originally capable of generating molecular profiles only by averaging across several millions of cells per sample. For example, an expression profile estimated from millions of cells masks cell-to-cell variation, which has been a serious concern when analyzing heterogeneous tumors. Fortunately, recent advances in single-cell technologies, such as C1 Fluidigm, Drop-seq or 10X Genomics, are completely changing the game [76][77][78][79]. Most importantly, single-cell technologies are not
exclusively targeting RNA-seq but also other omics such as DNA methylation [80] and chromatin conformation [81,82], and efforts are currently under way to profile several omics data types from the same cell [83]. A second major value of single-cell techniques is that we can profile large numbers of single cells, within budgetary limits, which in the context of sample-hungry reverse-engineering algorithms is a major advantage. In essence, single-cell techniques hold the promise of delivering a fundamental description of cells and their dynamics, thus setting the stage for modeling larger systems as well as accelerating designer or synthetic biology approaches.

Method development: despite the technical advances referred to above, we would like to emphasize that there is a deep need for developing powerful algorithms and computational modeling techniques in close alignment with the new types of high-precision data, while making use of the new raw computational power. We have witnessed this kind of synergistic method evolution during the last two decades in the area of network analysis [84][85][86]. Furthermore, there have been advances in mechanistic modeling, analysis [45] and visualization [87]. From our point of view, the most relevant and timely challenge is still the systematic generation of in-silico derived causal hypotheses in larger systems in the face of uncertainty, which is addressed in detail in [45].

Workforce: Systems Biology clearly requires multi-skilled groups, covering multiple fields from mathematics to biology, and even more so when entering the medical
domain. While courses and communities [88,89] responding to these needs have developed over the years, there is still a need for funding soft infrastructures, including education, to further nurture the advances and the communication across knowledge silos, and to properly integrate and support advanced data analysis close to biology and medicine.

Conclusions:
Systems Biology has evolved over the last two decades to become a practical methodology taking center stage in the analysis of complex biological processes. Much has developed along the lines of Kitano's prophetic vision. Here we have argued why the gaps and bottlenecks that remained at the beginning of the 21st century are now ready to be bridged. The overarching challenge is to level the field towards a quantitative and dynamical account of biological processes during health and disease. Dramatic improvements in raw computational power and rich high-precision multi-level data, in conjunction with new advanced algorithms, hold the promise to bridge the gap between associated biomarkers and mechanistic elucidation of the governing processes. It is therefore timely that systems-based approaches, using a combined and integrative in-silico and experimental approach, address this gap and in doing so set the stage to advance systems biology into targeting medical challenges with high precision [90,91]. Key here for drug development and personalized medicine is our ability to identify causal mechanisms enabling robust and precise therapeutic interventions beyond enumerating single biomarkers or network signatures. Last, but critically, such work crossing traditional boundaries needs multi-skilled research teams, or collaborations, that make use of all tools and resources [91]. In conclusion, we are truly excited to be part of this era in the analysis of living systems during health and disease, where systems-based research and techniques can further revolutionize the ever-changing biomedical research field, possibly even beyond what Kitano could imagine after the sequencing of the human genome.

Figure: Hypothesis-driven research in systems biology. The inner circle reproduces Kitano's classical Iterative Systems Biology workflow. The outer boxes denote gaps, and the current advances and tools mitigating those gaps.