Towards Plant Synthetic Genomics

Rapid advances in DNA synthesis techniques have allowed the assembly and engineering of viral and microbial genomes. Multicellular eukaryotic organisms, with their larger genomes, abundant transposons, and prevalent epigenetic regulation, present a new frontier for synthetic genomics. Plant synthetic genomics has long been proposed, and exciting progress has been made using the top-down approach. In this perspective, we propose applying bottom-up genome synthesis to multicellular plants, starting with the model moss Physcomitrium patens, in which homologous recombination, DNA delivery, and regeneration are feasible, although further optimization is necessary. We then discuss the technical barriers, including genome assembly and plant transformation, associated with synthetic genomics in seed plants.


Introduction
It is well recognized that DNA carries the genetic information of living organisms. The ability to read DNA sequences has enormously expanded our understanding of biology. Following the sequencing of complete genomes, gene editing techniques have enabled us to modify genomes at desired sites, even in a high-throughput manner. More recently, advances in DNA synthesis techniques have made genome synthesis a new frontier [1]. Viral and bacterial genomes and yeast chromosomes have been engineered and reassembled using bottom-up approaches [2][3][4][5][6][7] (Fig. 1), enabling synthesis-based assays such as accelerated evolution, multiplex gene deletion, and the introduction or elimination of codons. Despite the inevitable surge of interest in applying these technologies to more complex organisms, genome synthesis remains largely unexplored in multicellular organisms.
In viruses, bacteria, and yeast, the first step in genome synthesis is the design of the synthetic genome. Genome design involves considerations at multiple levels. A primary requirement is the ability to distinguish synthetic from native genomes. This is often achieved by altering or adding a few dozen nucleotides, sometimes called watermarks, in intergenic regions that tolerate sequence changes [4]. PCRTags, an alternative watermarking system, introduce synonymous changes into coding sequences within open reading frames and enable polymerase chain reaction (PCR)-based assays that detect not only the incorporation of synthetic sequences but also the elimination of native sequences [2]. Additional designer alterations and inserted elements can confer new functions on synthetic genomes. In the synthetic yeast genome, a loxPsym site has been inserted downstream of the stop codon of each nonessential gene, allowing genome-wide rearrangement upon Cre activation. This technique, termed synthetic chromosome rearrangement and modification by loxPsym-mediated evolution (SCRaMbLE), restructures the synthetic genome upon induced Cre expression, yielding highly variable structures and contents [2]. In bacteria and yeast, TAG stop codons have been replaced with TAA, freeing the TAG codon for a different function [2,6]. Repetitive sequences, such as transposable elements, have been eliminated in bacterial and yeast genome designs, although they represent only a small portion of these genomes [2,8].
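The sequence-level edits described above can be illustrated in silico. The following Python sketch is a toy model, not the published yeast design pipeline: it swaps a terminal TAG stop codon for TAA, appends a loxPsym site directly after the stop codon (a simplification; real designs may place the site a few bases downstream), and checks that a PCRTag-style edit is synonymous by translating both versions with the standard genetic code. All function names are hypothetical, and the loxPsym string is the commonly reported 34-bp palindromic site, which should be verified against the primary literature before any real use.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Commonly reported 34-bp palindromic loxPsym site (assumption: verify before use).
LOXPSYM = "ATAACTTCGTATAATGTACATTATACGAAGTTAT"


def translate(cds: str) -> str:
    """Translate an uppercase DNA coding sequence codon by codon ('*' = stop)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def recode_stop(cds: str) -> str:
    """Replace a terminal TAG stop codon with TAA; TAA and TGA stops are left as-is."""
    if cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


def add_loxpsym(cds: str) -> str:
    """Append a loxPsym site immediately after the stop codon (toy simplification)."""
    if translate(cds[-3:]) != "*":
        raise ValueError("CDS must end with a stop codon")
    return cds + LOXPSYM


def is_synonymous(native: str, tagged: str) -> bool:
    """PCRTag-style check: equal-length CDSs that encode the same protein."""
    return len(native) == len(tagged) and translate(native) == translate(tagged)


def design_gene(cds: str) -> str:
    """Hypothetical designer gene: TAG -> TAA recoding, then loxPsym insertion."""
    return add_loxpsym(recode_stop(cds))


if __name__ == "__main__":
    native = "ATGAAATTGTAG"   # Met-Lys-Leu-stop(TAG)
    tagged = "ATGAAGCTGTAG"   # synonymous changes at the Lys and Leu codons
    print(is_synonymous(native, tagged))  # True
    print(design_gene(native))
```

Translating both versions, rather than comparing codons directly, makes the synonymy check independent of which codon substitutions a watermarking scheme happens to choose.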
Advances over the past two decades have enabled cost-effective, high-fidelity synthesis of ~150-nt oligonucleotides. Several in vitro and in vivo methods have been developed to assemble these into building blocks of several to tens of kilobases. Among them, Gibson assembly is widely used for the in vitro assembly of large building blocks [4]. Budding yeast has highly efficient homologous recombination and has been used as the chassis for assembling large fragments. The combined use of Escherichia coli and budding yeast has efficiently yielded assemblies at the hundred-kilobase level [9]. Delivery of synthetic genomes is relatively simple in unicellular bacteria and budding yeast. Nevertheless, yeast genome synthesis has had to rely on stepwise substitution, in which synthetic DNA chunks are iteratively introduced to replace their native counterparts by homologous recombination [2].
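The overlap-based logic underlying both Gibson assembly and yeast homologous recombination can be sketched as a string operation. This Python toy model, with hypothetical function names, splits a target sequence into blocks whose ends overlap their neighbors and checks that the blocks rejoin into the original sequence. It ignores chemistry, synthesis errors, and overlap design rules; the block and overlap sizes in the usage example are illustrative assumptions, apart from the ~150-nt oligonucleotide length mentioned above.

```python
def split_with_overlaps(seq: str, block_len: int, overlap: int) -> list[str]:
    """Split seq into blocks of at most block_len nt; neighbors share `overlap` nt."""
    if not 0 < overlap < block_len:
        raise ValueError("require 0 < overlap < block_len")
    step = block_len - overlap
    blocks, start = [], 0
    while True:
        end = min(start + block_len, len(seq))
        blocks.append(seq[start:end])
        if end == len(seq):
            return blocks
        start += step


def merge_by_overlap(blocks: list[str], overlap: int) -> str:
    """Join blocks in order, requiring each junction to match over `overlap` nt."""
    assembled = blocks[0]
    for block in blocks[1:]:
        if assembled[-overlap:] != block[:overlap]:
            raise ValueError("overlap mismatch at junction")
        assembled += block[overlap:]
    return assembled


if __name__ == "__main__":
    import random

    random.seed(0)
    target = "".join(random.choice("ACGT") for _ in range(1000))
    # Two hierarchical levels: larger chunks, each built from oligo-sized pieces.
    chunks = split_with_overlaps(target, block_len=400, overlap=40)
    for chunk in chunks:
        oligos = split_with_overlaps(chunk, block_len=150, overlap=20)
        assert merge_by_overlap(oligos, overlap=20) == chunk
    assert merge_by_overlap(chunks, overlap=40) == target
    print(len(chunks), "chunks reassembled")
```

Applying the same split/merge pair at successive levels mirrors the hierarchical strategy of building tens-of-kilobase blocks from oligonucleotides and then larger assemblies from those blocks.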
Multicellularity is accompanied by a substantial increase in gene number and genome size. Variable and often vast amounts of repetitive DNA derived from transposable elements account for much of this increase in genome size. Although transposable elements in certain chromosomal regions, such as centromeres and heterochromatin, have been found to be essential for chromosomal integrity and organismal survival [10], a substantial portion of transposable elements is speculated to be dispensable. Natural selection appears not to be effective enough to remove these elements. Chromosome-level elimination of transposable elements is expected to resolve whether large amounts of transposable elements can be eliminated simultaneously. In addition, epigenetic diversity expanded rapidly with the emergence of multicellularity, representing a new challenge to genome synthesis. It is unclear whether the epigenetic landscape of chromosome-scale DNA chunks can be restored on the basis of sequence alone. Finally, transformation and assembly of large DNA fragments are difficult in multicellular organisms, and regeneration of whole organisms from transformed cells presents an additional hurdle.

The Top-Down Approach for Plant Synthetic Genomics
Because of these difficulties, the bottom-up approach described above has not been applied in multicellular organisms, including land plants. In plants, as in most species studied so far, centromeres are difficult to synthesize because of their length (a few hundred kilobases or longer) and highly repetitive nature. Furthermore, sequence alone may be insufficient to establish centromeres de novo [11]. Therefore, a top-down approach has been proposed, in which a small chromosome, such as a supernumerary B chromosome of maize, is simplified. By inserting telomeric sequences, chromosome arms can be truncated to form a minichromosome [12]. By simultaneously inserting a loxP site, additional DNA fragments can be added through site-specific recombination, an approach termed gene stacking.
These pioneering studies have made exciting breakthroughs, but broad application of the top-down strategy must overcome certain limitations [13]. In particular, it is not yet possible to isolate the constructed minichromosomes and transfer them to another cell, let alone another species, which makes sequence modification tedious and time-consuming. In addition, chromosome truncation and assembly often rely on site-directed manipulation, which at a minimum requires the insertion of recombinase recognition sites to guide the integration of large fragments delivered by Agrobacterium-mediated T-DNA vectors. Fortunately, prime editing and other new methods could facilitate such site-specific modifications.

Bottom-Up Synthetic Genomics in Moss
Advances in genome synthesis and chromosome assembly in yeast have shed new light on the field and have encouraged us to test an alternative, bottom-up approach to building partially or fully synthetic plant chromosomes [13]. The early-diverging land plant Physcomitrium (Physcomitrella) patens is a well-established model organism for nonseed plants [14]. As a model broadly used in evolutionary-developmental and cell biology studies, P. patens has a fully sequenced genome [15]. Several characteristics make P. patens a suitable testbed for bottom-up genome synthesis in a multicellular organism. First, P. patens has efficient homologous recombination [16], providing a starting point for exploring the assembly of large (>10-kb) fragments. Second, P. patens protoplasts have a high regeneration ability, making it possible to obtain viable plants from transformed cells. This property makes it possible to bypass Agrobacterium-mediated transformation: protoplasts can directly take up large DNA fragments, which may integrate into the desired genomic loci through homologous recombination.
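Integration by homologous recombination relies on a donor fragment whose ends match the sequences flanking the target region. The Python sketch below models this as string replacement under stated assumptions: a clean double crossover between uniquely matching homology arms. The function names and the toy sequence are hypothetical, and the arm length is a free parameter; real targeting constructs use far longer arms than the toy value shown.

```python
def is_unique(genome: str, motif: str) -> bool:
    """True if motif occurs exactly once in genome (overlapping hits included)."""
    first = genome.find(motif)
    return first != -1 and first == genome.rfind(motif)


def build_donor(genome: str, start: int, end: int, payload: str, arm_len: int) -> str:
    """Flank payload with homology arms copied from either side of genome[start:end]."""
    if start < arm_len or end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for homology arms")
    left_arm = genome[start - arm_len:start]
    right_arm = genome[end:end + arm_len]
    if not (is_unique(genome, left_arm) and is_unique(genome, right_arm)):
        raise ValueError("homology arms must match the genome uniquely")
    return left_arm + payload + right_arm


def simulate_double_crossover(genome: str, donor: str, arm_len: int) -> str:
    """Replace everything between the donor's arms with the donor's payload."""
    left_arm, right_arm = donor[:arm_len], donor[-arm_len:]
    i = genome.find(left_arm)
    j = genome.find(right_arm, i + arm_len)
    if i == -1 or j == -1:
        raise ValueError("arms not found in the expected order")
    return genome[:i] + donor + genome[j + arm_len:]


if __name__ == "__main__":
    # Toy locus: left flank, native segment to replace, right flank.
    genome = "GATTACA" + "TTTTTT" + "CCGGTAA"
    donor = build_donor(genome, start=7, end=13, payload="GGG", arm_len=7)
    print(simulate_double_crossover(genome, donor, arm_len=7))
```

Checking arm uniqueness before building the donor reflects the practical concern that repetitive flanks, common in TE-rich genomes, could direct integration to the wrong locus.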
The P. patens genome is also an excellent system for testing genome simplification, as it has a relatively high transposable element (TE) content (~60%) as well as an epigenetic landscape comparable in sophistication to those of seed plants [15], so strategies validated in P. patens can be broadly translated and applied to seed plants. In addition, P. patens has been used for decades as a versatile chassis for synthetic biology, producing recombinant therapeutic proteins and commercially valuable small natural products. All these characteristics make P. patens an ideal platform for exploring bottom-up synthetic genomics in multicellular organisms. Reconstructing the epigenetic landscape, however, remains a challenge. For example, sequence analysis indicates that P. patens centromeres are comparable in size to those of other plants and animals [15] and distinct from the point centromeres of budding yeast, implying that epigenetic modifications are likely key to centromere function in P. patens. In cases where epigenetic modifications are not established de novo under the guidance of the DNA sequence, epigenetic editing can be used to precisely modify the epigenetic profile [17].
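Genome simplification of this kind starts from an annotation of TE coordinates. The Python sketch below uses hypothetical half-open [start, end) intervals rather than any particular annotation format or tool; it merges overlapping TE annotations, computes the fraction of a sequence they cover, and returns a TE-deleted version of the sequence. It deliberately ignores which TEs are essential, such as those contributing to centromeres and heterochromatin, which a real design would have to protect.

```python
Interval = tuple[int, int]  # half-open [start, end) coordinates


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals: list[Interval], seq_len: int) -> float:
    """Fraction of a sequence of length seq_len covered by the intervals."""
    return sum(end - start for start, end in merge_intervals(intervals)) / seq_len


def delete_intervals(seq: str, intervals: list[Interval]) -> str:
    """Return seq with every annotated interval removed."""
    kept, cursor = [], 0
    for start, end in merge_intervals(intervals):
        kept.append(seq[cursor:start])
        cursor = end
    kept.append(seq[cursor:])
    return "".join(kept)


if __name__ == "__main__":
    seq = "ABCDEFGHIJKLMNOPQRST"            # stand-in for a 20-nt sequence
    tes = [(2, 5), (4, 8), (10, 12)]         # hypothetical, partly overlapping TE calls
    print(merge_intervals(tes))              # [(2, 8), (10, 12)]
    print(covered_fraction(tes, len(seq)))   # 0.4
    print(delete_intervals(seq, tes))        # ABIJMNOPQRST
```

Merging first matters because TE annotations frequently overlap or nest; summing raw interval lengths would overstate TE content.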

Challenges in Seed Plant Synthetic Genomics
Building synthetic chromosomes with the bottom-up approach in plant species other than P. patens requires overcoming additional barriers (Fig. 1). We highlight three outstanding challenges: (a) chromosome assembly, (b) functional centromeres, and (c) transformation and regeneration.
Chromosome assembly has proven challenging. Whereas homologous recombination is widely used in bacterial and yeast genome assembly, its efficiency is extremely low in seed plants. In addition to the insertion of loxP or comparable recombination sites [12], clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) technology has been applied to create targeted chromosome cleavage, which can induce segment translocations between chromosome arms [18]. Such controlled, megabase-scale exchange between heterologous chromosome arms paves the way for future chromosome assembly in seed plants.
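Targeted cleavage of this kind begins with choosing nuclease target sites. As an illustration only, the Python sketch below assumes Streptococcus pyogenes Cas9 (SpCas9), whose 20-nt protospacer must be followed by an NGG protospacer-adjacent motif (PAM); the nuclease used in the cited translocation work may differ. The sketch scans both strands and reports the blunt cut position, 3 bp upstream of the PAM, in forward-strand coordinates. Off-target scoring, which any real design would require, is omitted, and all names are hypothetical.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class CasSite:
    start: int        # forward-strand index where the protospacer+PAM window begins
    strand: str       # "+" or "-"
    protospacer: str  # 5'->3' on the targeted strand, PAM excluded
    cut: int          # forward-strand boundary at which the blunt cut falls


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[CasSite]:
    """Find NGG-PAM target sites on both strands of seq."""
    sites = []
    window_len = spacer_len + 3
    for i in range(len(seq) - window_len + 1):
        window = seq[i:i + window_len]
        # "+" strand: protospacer followed by N-G-G; cut 3 nt before the PAM.
        if window.endswith("GG"):
            sites.append(CasSite(i, "+", window[:spacer_len], i + spacer_len - 3))
        # "-" strand appears on the forward strand as C-C-N + revcomp(protospacer);
        # the cut lies 3 nt past the 3-nt PAM in forward coordinates.
        if window.startswith("CC"):
            sites.append(CasSite(i, "-", revcomp(window[3:]), i + 3 + 3))
    return sites


if __name__ == "__main__":
    for site in find_spcas9_sites("A" * 20 + "TGG"):
        print(site)
```

Reporting both strands in one coordinate system makes it straightforward to pair cut sites on two different chromosomes when planning a reciprocal exchange.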
Establishing functional centromeres must satisfy several criteria. As mentioned above, centromeres in seed plants are not only large but also subject to complex epigenetic regulation that is key to their function. Recent advances in maize and Arabidopsis have shown that a tethering approach, which recruits centromeric histone H3 (CENH3) to a synthetic repeat array, can activate the formation of functional centromeres [19,20]. In maize, a LexA-CENH3 fusion protein organizes functional kinetochores at synthetic LexO repeat arrays, which can lead to chromosome breakage. The resulting chromosome fragments form neochromosomes that are stably inherited and self-sustaining even in the absence of the LexA-CENH3 activator [19].
Even if a plant chromosome could be assembled in yeast or bacteria, it would still have to be transferred into a plant cell. Agrobacterium-mediated transformation, biolistic transformation, and protoplast transformation each have their advantages and limitations. If protoplast transformation is chosen for its possibly higher tolerance of large DNA fragments, regeneration efficiency must be improved to obtain viable plants.

Conclusions
Compared with animals, plants raise fewer ethical issues and are more likely to regenerate from a single transformed cell. Hence, plants will likely pioneer genome synthesis in multicellular organisms, serving to test aggressive deletion of repetitive sequences, epigenetic reconstruction of synthetic sequences, and the optimization of design principles. Genome design, synthesis, and assembly are easier to test-run in P. patens, and the experience accumulated there should prove valuable when applied to seed plants, including crops. Knowledge obtained from this effort, combined with top-down and bottom-up approaches in other species, will likely teach us much about what is needed for a functional plant chromosome. Looking forward, these endeavors may also lead to biotechnology breakthroughs beyond providing vectors for large DNA chunks.

Fig. 1. Comparison of genome size and other properties of existing and potential model species for synthetic genomics. HR, homologous recombination.