Synthetic genomics: a new venture to dissect genome fundamentals and engineer new functions

Highlights
• Viral, organelle and bacterial genomes can be synthesized.
• Sc2.0 is on the way to generating the first synthetic designer eukaryote.
• 6 of 17 S. cerevisiae chromosomes are synthesized and published.
• GP-write will open a new era of synthetic genomics.

Daniel Schindler 1, Junbiao Dai 2 and Yizhi Cai 1,2

Since the first synthetic gene was synthesized in the 1970s, the efficiency and capacity of made-to-order DNA synthesis have increased by several orders of magnitude. Advances in DNA synthesis and assembly over the past years have resulted in a steep drop in the price of custom-made DNA. Similar effects were observed in the DNA sequencing technologies which underpin DNA-reading projects. Today, synthetic DNA sequences of more than 10 000 bp are commercially available with turn-around times of a few weeks. This enables researchers to perform large-scale projects to write synthetic chromosomes and characterize their functionalities in vivo. Synthetic genomics opens up new paradigms to study genome fundamentals and engineer novel biological functions.

Introduction
One of the major challenges in biological sciences was the determination of DNA sequences. In the beginning, only single DNA fragments were sequenced using the chain termination sequencing technique [1]. However, the Human Genome Project (HGP-read) accelerated the evolution of new sequencing techniques by setting the ambitious goal of sequencing the human genome within 15 years. The development of Next Generation Sequencing techniques now allows a human genome to be sequenced within days. Nevertheless, most eukaryotic genomes are not fully sequenced and new sequencing techniques are still being developed. As an exemplary achievement of this development, one of the highly repetitive human centromeres was sequenced in 2017 [2]. Scientists are now performing well in reading genomes, a measurable output being the growing number of genome sequences in public databases. However, reading books alone does not make a good writer; one must write extensively and creatively to master the art, which ultimately leads to a better understanding of grammar and expression. Likewise, one needs to write synthetic DNA sequences in order to better understand the grammar of life.
Writing DNA starts with short single-stranded fragments: the oligonucleotides. Since the development of the Polymerase Chain Reaction and the first complete synthesis of a gene, writing DNA in vitro has progressed impressively (Figure 1) [3,4]. Recent drops in DNA synthesis costs and the improved capability of synthesizing longer stretches of DNA allow the design and construction of whole synthetic chromosomes in the megabase range. Recent publications report the construction of viral and microbial synthetic genomes, and the Sc2.0 project aims to generate the first synthetic eukaryotic genome. How to define whether a chromosome or genome is synthetic remains an open discussion. In this review, chromosomes and genomes are defined as synthetic when all building blocks of the final DNA molecule are generated by chemical synthesis. Chromosomes and genomes which are not completely synthesized are considered 'engineered' or 'modified' and are outside the scope of this review. We define synthetic genomics as a new field in which biology is engineered at the genome level, at the intersection of synthetic biology and systems biology. This review aims to discuss neither assembly methods nor the dual-use character of synthetic genomics. The authors are fully aware of the potential dual-use character, especially of the synthesis of viral genomes. However, these issues are discussed and reviewed extensively elsewhere [5-7].
changes to the genetic content, but nonetheless resulted in the breakthrough in synthesizing, assembling and ultimately transplanting chromosome-scale synthetic DNA [11,12]. With increasing knowledge and progress in chromosome-scale DNA synthesis, the designs of synthetic sequences are becoming more complex and ambitious [8,13]. Many genome synthesis projects use a hierarchical genome assembly strategy: small building blocks are assembled, by the technique of choice, into larger building blocks of around 50-100 kb. These fragments are then used either to assemble the synthetic chromosome in a heterologous host or to replace the corresponding wild-type sequence in a stepwise manner. Each of the techniques has advantages and disadvantages (Box 1) and should be chosen carefully based on the use case.
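The hierarchical strategy described above can be illustrated with a short calculation. The sketch below is only a rough illustration: the function name `assembly_levels` and the block sizes of 1 kb, 10 kb and 100 kb are assumptions chosen for the example, overlaps between neighbouring blocks are ignored, and the 1.08 Mb genome size is the one quoted in this review for Mycoplasma mycoides.

```python
import math


def assembly_levels(genome_bp, block_sizes_bp):
    """Return (block size, number of blocks) for each assembly level.

    Overlaps between neighbouring blocks are ignored, so the counts
    are lower bounds rather than an actual build plan.
    """
    return [(size, math.ceil(genome_bp / size)) for size in block_sizes_bp]


# Hypothetical three-level plan for a 1.08 Mb genome.
levels = assembly_levels(1_080_000, [1_000, 10_000, 100_000])
for size, count in levels:
    print(f"{count:>5} blocks of {size:>7} bp")
```

Each level reduces the number of pieces to handle by roughly the ratio of the block sizes, which is why a few assembly rounds are enough to reach chromosome scale.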

Synthesizing DNA goes viral
Although viruses and phages are not considered to be 'alive', they have a genome. They can reproduce themselves by leveraging the resources of a host. Viral genomes are rather small, with sizes between 1759 bp (Porcine circovirus [14]) and 1259 kb (Megavirus chilensis [15]), and can consist of DNA or RNA. The first complete synthesis of a viral RNA genome, that of poliovirus, was accomplished in 2002 [16]. The 7.5 kb synthesized cDNA genome was transcribed in vitro by RNA polymerase and generated infectious virus particles after transfer into a cell-free extract. Further viral RNA and DNA genomes of up to 212 kb have been synthesized in recent years (Table 1). Synthesizing viral genomes, as well as engineering variants of them to produce genome libraries, has enormous potential for therapeutic applications. Vaccines and drugs could be quickly generated in response to the emergence of a certain virus variant, which may help to prevent wider outbreaks [7].

Figure 1. The number of genome sequences uploaded to databases is exploding and it is impossible to give a number which would remain accurate and valid for some time. The knowledge gained by genome sequencing and the advances in gene synthesis mark the dawn of writing chromosomes. By now the number of bases incorporated into completely synthetic chromosomes is 6.1 Mb. The cost today would be roughly $425 000, assuming the current competitive price of 7 cents per base for nonclonal 1.8 kb DNA fragments. The synthesis cost for a haploid human genome would today be roughly $45 000 000. However, lowering DNA synthesis costs is one of the major goals of GP-write. In the future, the DNA synthesis cost of a human genome will be less than the price of the Mycoplasma mycoides JCVI-syn1.0 project (estimated $40 000 000 [33]).
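The synthesis cost quoted above for the 6.1 Mb of synthetic chromosomes is simply the number of bases multiplied by the price per base. The following minimal sketch reproduces that arithmetic; the function name `synthesis_cost` and the `reduction_factor` parameter are illustrative assumptions, the latter being used here to show the effect of the 1000-fold cost reduction that GP-write aims for.

```python
def synthesis_cost(bases, price_per_base=0.07, reduction_factor=1.0):
    """Estimated synthesis cost in US dollars.

    price_per_base defaults to the 7 cents per base quoted in the text;
    reduction_factor models a hypothetical future price drop.
    """
    return bases * price_per_base / reduction_factor


synthetic_bases = 6_100_000  # 6.1 Mb in completely synthetic chromosomes

today = synthesis_cost(synthetic_bases)
future = synthesis_cost(synthetic_bases, reduction_factor=1000)

print(f"cost at 7 cents per base:       ${today:,.0f}")
print(f"cost after 1000-fold reduction: ${future:,.0f}")
```

At 7 cents per base the estimate comes to about $427 000, in line with the "roughly $425 000" given in the text. The calculation covers raw synthesis only and leaves out assembly, sequence verification and labour.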

Synthesizing genomes of organelles
Mitochondria in general, and the plastids of plants, contain a genome. These genomes are rather small but vary widely in size and content. Studying these organelles is interesting but challenging: they must be transformed by biolistic (bio-ballistic) transformation [17], and the efficiency of transformation with synthetic DNA is rather low. Mitochondria are the only organelles for which a complete genome has been synthesized so far. The synthesis of the 16.3 kb mouse mtDNA genome was achieved using 600 60-mer oligonucleotides in four consecutive assembly rounds [18]. This work was predominantly a proof of principle for a DNA assembly method. However, it is an intriguing question why organelles still contain genetic content and have not migrated all necessary genes to the nucleus. There are exceptions in nature in which the mitochondria do not contain any DNA [19].

Synthesizing microbial genomes
The genome of Mycoplasma genitalium was the first completely chemically synthesized genome [11]. However, the genome could not be transferred to produce a viable synthetic M. genitalium strain, presumably because of an interruption of rnpB, which encodes a subunit of RNase P. In 2010, the same group demonstrated, in a remarkable work, the complete chemical synthesis of the 1.08 Mb Mycoplasma mycoides genome and its transfer into the close relative Mycoplasma capricolum [12]. This is the first organism controlled by a synthetic genome and is referred to as M. mycoides JCVI-syn1.0. The genomic differences to M. mycoides are marginal: designed 'watermark' sequences were added, 14 genes are deleted or disrupted, and 19 harmless polymorphisms were acquired during the building process.
This successful project was the starting point for generating a minimal Mycoplasma organism based on JCVI-syn1.0. Briefly, two independent teams failed to generate a viable cell from scratch on the basis of existing knowledge and genome synthesis alone. However, multiple rounds of transposon mutagenesis and genome reduction, carried out as design, build and test cycles, finally generated M. mycoides JCVI-syn3.0, a minimal genome reduced in size by 50.8% [20,13]. The 901 genes of M. mycoides JCVI-syn1.0 were reduced to 473 genes, of which 149 are of unknown function and will give deeper insights into the essentiality of genes [21].
An interesting ongoing project is the generation of a synthetic Escherichia coli genome. The genome of a previously reduced E. coli strain has been redesigned in 87 segments of ca. 50 kb each to eliminate seven codons from the coding sequences in a stepwise manner [9]. The 62 214 (5.4%) excluded codon instances are replaced by synonymous codons to maintain viability. The freed-up codons may be used in the future to incorporate non-natural amino acids into proteins. The absence of seven codons and the corresponding tRNAs should, in addition, provide resistance to phages, making this strain of great general interest. Currently 55 of the 87 segments have been tested experimentally, but a fully synthetic E. coli genome incorporating them still needs to be proven functional.
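The principle of synonymous codon replacement can be sketched in a few lines. This is a simplified illustration and not the design pipeline of the project: the set of seven forbidden codons and the replacement chosen for each are assumptions made for the example (the text above does not list them), and the names `REPLACEMENTS`, `translate` and `recode` are hypothetical. The standard genetic code is used to check that the encoded protein is unchanged.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS)
)

# Assumed forbidden codons, each mapped to an assumed synonymous replacement.
REPLACEMENTS = {
    "TTA": "CTG", "TTG": "CTG",  # leucine
    "AGA": "CGT", "AGG": "CGT",  # arginine
    "AGC": "TCT", "AGT": "TCT",  # serine
    "TAG": "TAA",                # stop
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds):
    """Replace every forbidden codon by its synonymous alternative."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    recoded = "".join(REPLACEMENTS.get(codon, codon) for codon in codons)
    if translate(recoded) != translate(cds):
        raise ValueError("recoding changed the encoded protein")
    return recoded


original = "ATGTTAAGAAGCTAG"
print(original, "->", recode(original))
```

A real genome design additionally has to respect overlapping genes, regulatory motifs and RNA secondary structure, which is one reason why the recoded segments need to be tested experimentally.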

Synthesizing eukaryotic genomes
As of today, there is no complete synthetic eukaryotic genome. However, the synthetic yeast genome project Sc2.0 (www.syntheticyeast.org) aims to generate the first eukaryotic cell operated by a synthetic genome. The 16 chromosomes are synthesized in individual strains by teams of scientists within the Sc2.0 Consortium. The chromosomes are redesigned far more extensively than in any other existing write project [8].
The major changes include the removal of most introns, transposons and repetitive elements. One central element of the Sc2.0 design is the relocation of all tRNA genes to an independent 17th Saccharomyces cerevisiae chromosome, designated the tRNA neochromosome. tRNA genes are heavily transcribed and are therefore hotspots of genomic instability caused by replication stress and transposon insertions. In addition, all non-essential genes are flanked by loxPsym sites, which allow inducible large-scale genomic rearrangements mediated by Cre recombinase. This built-in genome rearrangement technique is referred to as Synthetic Chromosome Rearrangement and Modification by LoxP-mediated Evolution (SCRaMbLE) and has already proven its functionality [20,22,23].
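The idea behind SCRaMbLE can be illustrated with a toy simulation. The sketch below rests on several simplifying assumptions that are not taken from the text: a loxPsym site is placed between every pair of neighbouring genes, only deletions and inversions are modelled and are chosen with equal probability, and any deletion that would remove an essential gene is discarded because such a cell would not survive. The function name `scramble` and the gene labels are hypothetical.

```python
import random


def scramble(chromosome, essential, events=5, seed=0):
    """Apply random Cre-like rearrangements between loxPsym sites.

    chromosome: ordered list of gene names.
    essential:  set of gene names that must not be deleted.
    """
    rng = random.Random(seed)
    genes = list(chromosome)
    for _ in range(events):
        if len(genes) < 2:
            break
        # Pick two distinct site positions; the genes between them form the segment.
        i, j = sorted(rng.sample(range(len(genes) + 1), 2))
        segment = genes[i:j]
        if rng.random() < 0.5:
            # Deletion: skipped if it would remove an essential gene.
            if any(gene in essential for gene in segment):
                continue
            genes = genes[:i] + genes[j:]
        else:
            # Inversion: the segment is reinserted in reverse order.
            genes = genes[:i] + segment[::-1] + genes[j:]
    return genes


chromosome = ["A", "B", "C", "D", "E", "F"]
result = scramble(chromosome, essential={"B", "E"}, events=20, seed=1)
print(result)
```

Running the function with different seeds yields a population of distinct gene orders and gene contents, which mirrors how SCRaMbLE generates diversity from a single synthetic strain.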
Recent publications report the synthesis and characterization of six Sc2.0 chromosomes and the right arm of synthetic chromosome IX (Table 1), which collectively correspond to 32% of the yeast genome [20,24,25-29]. Strikingly, the individually synthesized chromosomes can be merged in a single cell by mating, using a technique called endoreduplication intercross [8]. Currently, the strain with the most synthetic chromosomes in one cell contains synIII, synVI and synIXR. As the Sc2.0 project progresses, more synthetic chromosomes will be finalised and ultimately merged into the final Sc2.0 strain.

GP-write: a sneak preview into the future of synthetic genomics
The successes of current genome synthesis projects are leading to the next grand challenge in modern biological science: the Genome Project-write (GP-write). This project uses synthesis, gene editing and other technologies to understand, engineer and test living systems, with the overarching goal of understanding the blueprint for life provided by the Human Genome Project (HGP-read) [30-32]. To this end, a new international consortium was formed and first meetings were held in 2016 and 2017. The consortium is an open, interdisciplinary and international research group that focuses efforts on realizing GP-write.
GP-write has several goals, one being to develop new techniques and accelerate the evolution of existing ones, with an overall goal of reducing synthesis costs 1000-fold within ten years. Similar effects were achieved by HGP-read: today the cost of sequencing a human genome is orders of magnitude lower than that of the initial human genome sequence. The open nature of GP-write allows everyone to submit project proposals, which are evaluated by the Scientific Executive Committee. As of January 2018 there are 13 pilot projects approved (Table 2). The projects cover many aspects of synthetic genomics. Two highlights are the projects dedicated to Concepts and Ethics in GP-write and to Anticipating and Understanding Governance Systems and the Public's Views on HGP-write, which show the importance of considering ethics and the public's views within GP-write.

Table 2 Approved GP-write pilot projects (by January 2018)

| Project title | Project goals a | Project lead(s) |
| --- | --- | --- |
| UltraSafe Cell Line | The project aims to generate an ultrasafe cell line by altering roughly 1% of the human genome. Some key goals are: virus and prion resistance, removal of transposable elements, recoding of triplet repeats, recoding to a human consensus sequence with regard to SNPs and indels, and implementing the bespoke SCRaMbLE system, besides further alterations. | Jef Boeke & George Church |
| High-throughput HAC Design to Test Connections Between Gene Expression, Location and Conformation | There is still a lack of understanding of the regulation of gene expression. The project will build two 1 Mb regions of the human genome. The regions will be constructed as combinatorial libraries with different promoters and insulators to investigate 'rules' for optimal gene expression. | |
| | Codon alteration is an important part of GP-write. The project aims to develop: firstly, a rapid method for multiplex targeted genome modification; secondly, a corresponding rapid and robust screening system for living cells in 96-well format; thirdly, a strategy for rapid evaluation of heterogeneous cell populations; and finally, software to design the synthetic DNA fragments and evaluate the viability of codon replacements. | Marc Lajoie |
| The Seven Signals Toolbox: Leveraging Synthetic Biology to Define the Logic of Stem-Cell Programming | Cell differentiation is mainly driven by seven signal types. This project aims to generate a toolbox which allows the in vitro differentiation of GP-write cell lines. This is a crucial step for future applications in the fields of cell therapy, tissue replacement and organ transplantation. | Liam Holt |
| Precision Human Genome Engineering of Disease-Associated Noncoding Variants | Efficient and precise engineering of the human genome is still a challenge. This project aims to create a complete pipeline for rapid engineering of human cells with an enrichment for homologous recombination repair. The project will also provide bioinformatic tools to optimize CRISPR-based engineering. | |
| Synthesizing a Prototrophic Human Genome | This project proposes to introduce pathways for the nine amino acids and a variety of vitamins which cannot be synthesized by humans and are instead derived from the diet. It investigates whether the milieu in the cell makes a prototrophic cell line feasible. If the project succeeds, further engineering would be performed, and the first achievement would be a drastic reduction in the cost of cell line cultivation media. | Harris Wang |
| Through the Looking Glass: Anticipating and Understanding Governance Systems and the Public's Views on HGP-write | Including the public's views and governance systems in GP-write is an important step. This project will generate a dialogue between scientists and the public. Involving society will enable acceptance of and support for GP-write. | Todd Kuiken & Gigi Gronvall |
| Synthetic Screening for Essential Introns and Retroelements in Human Cell and Animals | This project aims to perform systematic screens of introns and retroelements in the genome. Combinatorial variants of chosen genes will be investigated in a diploid background. The outcome will indicate whether the removal of these elements, as in Sc2.0, is feasible in GP-write. | Yasunori Aizawa |
| Isothermal Amplification Array | This project proposes a new method to synthesize DNA. It depends on two steps. Firstly, short oligonucleotides are generated by isothermal amplification on an array. Secondly, the amplified oligonucleotides anneal according to their design and nicks are sealed by a ligase. | Max Berry |
| Recombinase-Mediated Assembly | This project proposes a new method to assemble DNA fragments using a RecA-like recombinase (UvsX). Together with the Isothermal Amplification Array, the method should allow assembly from short oligos to chromosome-sized DNA with a significant reduction in labour. | Max Berry |
| Synthetic Regulatory Genomics | The project aims to study regulatory variation in non-coding regions. It will use multi-edited regulatory DNA sequences and analyse their function with multiple techniques. This project will give deeper insights into the non-coding regions of complex genomes. | Matt Maurano |
| Concepts & Ethics in GP-write: Understand, Question, Advance | This project aims to build a model for deep analysis of concepts and ethics in GP-write, including the dynamics of science and society. It aims to expand collaborations between the sciences and the humanities and to provide proper education and training. | Jeantine Lunshof |

a More detailed information can be found at http://www.engineeringbiologycenter.org/.
One major remaining question is: what can we learn from GP-write? On the one hand, there will be advances in enabling technologies; on the other hand, there will be an immense gain of knowledge in the biological sciences. Our knowledge of complex genomes is still limited. For instance, roughly 1% of the genome encodes all proteins in the cell. The remaining 99% is often referred to as the 'dark matter' of the genome. Stepwise replacement of these elements, as in the Sc2.0 project, will potentially help us decipher the functions of this dark matter. On the application front, the pilot project to engineer a stable and safe cell line has profound implications for biomanufacturing and bioproduction (Table 2). GP-write still has a long way to go. However, the scientific community is curious about the outcome of the first pilot projects in the GP-write framework.

Conclusion
The initial genome writing projects summarized here show that individual native chromosomes and whole genomes can be replaced by chemically synthesized counterparts. So far, the changes to DNA sequences have been relatively modest, but with growing knowledge of biological systems the designs will become more aggressive and adventurous, leading us into previously unexplored territories. The exciting field of synthetic genomics will give new insights into basic research and open new possibilities in applied science.

Funding
The work in the UK is funded through a Biotechnology and Biological Sciences Research Council grant (BB/P02114X/1) and the University of Manchester President's Award for Research Excellence to YC. This work is also supported by the National Natural Science Foundation of China (31471254 and 31725002) and partially supported by the Bureau of International Cooperation, Chinese Academy of Sciences (172644KYSB20170042) to JD.