Programming cells: towards an automated ‘Genetic Compiler’
Introduction
Numerous genetic circuits have been built that encode functions analogous to those of electronic circuits [1, 2, 3]. When multiple circuits are connected to sensors and actuators, they form a genetic program. For example, we constructed an ‘edge detector’ program that combines an ANDN gate, a light sensor, and cell–cell communication to give bacteria the ability to draw the edge between the light and dark regions of an image projected onto a plate [4]. Other genetic programs have been built that combine circuits to produce a push-on/push-off circuit [5•], implement a counter [6••], and reproduce predator–prey dynamics [7]. These represent toy systems, but the implementation of such programs in applications for industrial biotechnology is inevitable.
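The logic of the edge detector can be illustrated with a minimal sketch. It assumes, beyond what is stated above, that cells in the dark emit a diffusible signal and that a cell produces pigment when it senses the signal AND is NOT in the dark. The grid representation, the function names, and the `reach` parameter (how far the signal travels, in grid cells) are hypothetical and chosen only for illustration.

```python
def andn(a, b):
    """ANDN gate: true only when a is present and b is absent."""
    return a and not b


def edge_detector(image, reach=1):
    """Return a grid marking where pigment would be produced.

    image: list of rows; each entry is True (light) or False (dark).
    reach: assumed signalling distance, in grid cells.
    """
    rows, cols = len(image), len(image[0])

    def signal_at(r, c):
        # The signal is present if any cell within `reach` is dark.
        for rr in range(max(0, r - reach), min(rows, r + reach + 1)):
            for cc in range(max(0, c - reach), min(cols, c + reach + 1)):
                if not image[rr][cc]:  # a dark cell emits the signal
                    return True
        return False

    # Pigment = signal AND NOT dark, evaluated independently in every cell.
    return [[andn(signal_at(r, c), not image[r][c]) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    plate = [[False, False, True, True, True]]  # two dark cells, three lit
    print(edge_detector(plate))
```

In this sketch only the lit cell adjacent to the dark region switches on, which is the one-cell-wide edge on the light side of the boundary.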
Automated DNA synthesis gives genetic engineers an unprecedented design capacity [8]. This technology enables the specification of every basepair for long sequences, without having to be concerned about the path to construction. Together with methods to rapidly combine genetic parts [9••, 10] and assembly methods that scale to whole genomes [11, 12, 13], our capacity for DNA construction has far outpaced our capacity for design [14•]. A good example of this is the project of the 2006 UCSF iGEM team to build a ‘remote-controlled bacterium.’ DNA synthesis was used to build the first construct (requiring a few weeks), but after four years of additional tinkering, the paper will be submitted in 2010.
Our ability to design programs has been hampered by three problems. First, there is a lack of good, robust genetic circuits that can be easily connected. Second, there are few design rules that are sufficiently quantitative to be carried out algorithmically. Modeling can be helpful before the experiments to determine the topologies and parameter regimes required to obtain a particular function. However, simulations cannot be used to ‘reach down’ to the DNA and suggest a specific mutation or select a part. Third, the frequency of mistakes in the DNA sequence increases quickly with size. Currently, scanning for potential errors (e.g. transposon insertion sites or putative internal promoters) requires running multiple (usually web-based) programs. There is no unified software package to date that addresses all of these issues.
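As an illustration of the kind of sequence check described above, the following sketch scans a design for one class of potential error, a putative internal promoter. It assumes a deliberately crude criterion: an exact match to the E. coli σ70 consensus hexamers (TTGACA and TATAAT) separated by a 15 to 19 bp spacer. The `CHECKS` table, the function names, and the spacer range are assumptions made for this example; a realistic screen would tolerate mismatches and cover further error classes.

```python
import re

# Assumed check table: one entry per error class, each a regular expression.
# Only a crude sigma70-like promoter pattern is included here.
CHECKS = {
    "putative internal promoter": re.compile(r"TTGACA[ACGT]{15,19}TATAAT"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq):
    """Return (check name, strand, start, matched text) for every hit.

    Start positions on the '-' strand are given in reverse-complement
    coordinates, not converted back to the forward strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for name, pattern in CHECKS.items():
            for match in pattern.finditer(s):
                hits.append((name, strand, match.start(), match.group()))
    return hits


if __name__ == "__main__":
    design = "GG" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "CC"
    for hit in scan(design):
        print(hit)
```

Keeping the checks in a single table is one way to fold several separate scans into a single pass over the sequence.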
The creation of a simulation environment for genetic engineering is complicated by the diversity of cellular functions. When studying natural networks, there is a feeling of ‘peeling an onion,’ where there are seemingly endless redundancies and classes of biochemical interactions. Even within the Registry of Standard Biological Parts (www.partsregistry.org), there are a wide variety of cellular functions: from enzymes and transcription factors to multi-gene gas vesicles and secretion chaperones. Each specific problem requires its own style of simulation; a dynamic program may be well satisfied by sets of differential equations, pattern formation by cellular automata, and enzymes by metabolic flux analysis. It would be daunting to create a simulation package that could encompass all of this diversity.
To reduce the problem complexity and to frame recent computational work, we introduce the concept of a ‘Genetic Compiler,’ whose inputs are high-level instructions (equivalent to VHDL or Verilog) and whose output is a DNA sequence. The sequence can be sent to a company for DNA synthesis or a robot for automated assembly. The problem is constrained by focusing on genetic programs that encode a desired logical or dynamical function, which can be integrated into many applications in biotechnology (Figure 1). This avoids the application-specific portions of the problem; for example, building a butanol sensor or a particular metabolic pathway. It is distinct from tools for protein or metabolic engineering [15].
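The input/output contract of such a compiler can be made concrete with a toy sketch. Everything in it is hypothetical: the arrow-separated instruction syntax, the part names, the placeholder sequences, and the spacer are invented for illustration and do not correspond to real parts or to any existing tool. It only shows the mapping from a high-level description to a single DNA string.

```python
# Hypothetical part library: names and sequences are placeholders,
# not real biological parts.
PART_LIBRARY = {
    "light_sensor": "ATGAAA",
    "NOT_gate": "ATGCCC",
    "reporter": "ATGTTT",
}

SPACER = "TTTT"  # placeholder sequence inserted between adjacent parts


def compile_program(source):
    """Translate an arrow-separated program into one DNA string.

    source: text such as "light_sensor -> NOT_gate -> reporter"
            (an invented syntax, used only in this sketch).
    """
    names = [token.strip() for token in source.split("->")]
    unknown = [name for name in names if name not in PART_LIBRARY]
    if unknown:
        raise ValueError(f"no part in the library for: {unknown}")
    return SPACER.join(PART_LIBRARY[name] for name in names)


if __name__ == "__main__":
    print(compile_program("light_sensor -> NOT_gate -> reporter"))
```

A lookup and concatenation of this kind leaves out everything that makes the problem hard, such as matching the signal levels between connected circuits, which is where the biophysical methods discussed in this review come in.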
This review focuses on the underlying algorithms and biophysical methods that would power such a compiler (Figure 2). Realizing this goal will require: 1. libraries of reliable genetic circuits designed specifically to be part of a CAD program, 2. the definition of a higher-level language, 3. algorithms to assemble circuits according to a specified program, 4. biophysical methods to connect and optimize circuits, 5. simulation programs to debug the program dynamics, 6. algorithms for DNA assembly and experimental design. The scope has been limited to exclude several topics that are crucial to synthetic biology, but have been well-reviewed elsewhere, notably codon optimization and tools from systems biology and metabolic engineering [15, 16, 17].
Section snippets
Robust combinatorial logic
Combinatorial logic is implemented by Boolean circuits and is the basis for digital computing. It is used to build circuits that apply Boolean algebra on a set of inputs to transform them into a set of desired outputs. Simple circuits can be layered in different configurations in order to achieve a computational operation. This has enabled the automated design that underlies VLSI. The ability for digital circuits to be flexibly used and easily captured by CAD comes at a cost of speed, design
Conclusions
Genetic engineering is moving towards becoming an information science. The model of storing and distributing genetic material is slowly losing relevance. It is routine to outsource the task of constructing DNA from the designer to synthesis facilities. This has created a strong need for computer-aided design programs that are able to facilitate the organization and construction of large projects. Once the parts are experimentally characterized, it is unnecessary to distribute the DNA. Rather,
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
The authors thank Ron Weiss (MIT), Rahul Sarpeshkar (MIT), Alan Mishchenko (UC-Berkeley), Jean Peccoud (VPI), Costas Maranas (Penn State), and Douglas Densmore (BU) for helpful discussions. CAV is supported by Life Technologies, ONR, Packard Foundation, NIH, NSF (synBERC: Synthetic Biology Engineering Research Center, www.synberg.org) and a Sandpit on Synthetic Biology hosted by EPSRC/NSF.
References (97)
Genetic parts to program bacteria. Curr Opin Biotechnol (2006)
Ultra Low Power Bioelectronics (2010)
et al. Semi-synthetic mammalian gene regulatory networks. Metabol Eng (2005)
et al. Environmental signal integration by a modular AND gate. Mol Syst Biol (2007)
et al. A universal RNAi-based logic evaluator that operates in mammalian cells. Nat Biotechnol (2007)
et al. Eugene—a domain specific language for specifying and constraining synthetic biological parts, devices, and systems. PLoS ONE (2010)
Mishchenko, A. ABC: A system for sequential synthesis and verification. 2010; Available from:...
et al. A synchronized quorum of genetic clocks. Nature (2010)
et al. Desynchronization: synthesis of asynchronous circuits from synchronous specification. IEEE Trans Comput Aided Des Integr Circuits Syst (2006)
et al. The virtual instrument: support for grid-enabled MCell simulations. Int J High Perform Comput Appl (2004)