An Epigenetics‐Inspired DNA‐Based Data Storage System

Abstract: Biopolymers are an attractive alternative to store and circulate information. DNA, for example, combines remarkable longevity with high data storage densities and has been demonstrated as a means for preserving digital information. Inspired by the dynamic, biological regulation of (epi)genetic information, we herein present how binary data can undergo controlled changes when encoded in synthetic DNA strands. By exploiting differential kinetics of hydrolytic deamination reactions of cytosine and its naturally occurring derivatives, we demonstrate how multiple layers of information can be stored in a single DNA template. Moreover, we show that controlled redox reactions allow for interconversion of these DNA‐encoded layers of information. Overall, such interlacing of multiple messages on synthetic DNA libraries showcases the potential of chemical reactions to manipulate digital information on (bio)polymers.

Means to access, circulate, and preserve information have shaped human society by increasing knowledge, stimulating the economy, and enriching culture. In this respect, the development of optical and magnetic storage devices has facilitated an unprecedented increase of accessible information, but their limited shelf lives and storage densities have prompted a search for alternative data carriers. [1] Current lines of research focus on further increasing storage densities by compacting information into single atoms, [2] supramolecular systems, [3] or biopolymers. [4] Nucleic acids, for example, are remarkably compact and long-lived, and have been proposed for storing digital information. The advent of high-throughput oligonucleotide synthesis [5] and DNA sequencing [6] has allowed DNA-based data storage to rapidly progress from proof-of-concept studies toward systems that can rival established storage media. [7] While such systems have enabled writing and reading of non-trivial amounts of information with synthetic DNA templates, the "one template, one information layer" coding scheme employed (Figure 1A) is in stark contrast to nature's dynamic control over the primary information encoded in genomes. In order to produce a complex organism from a single genetic makeup, cells regulate access to different layers of information by modifying histone proteins and DNA nucleobases (Figure 1B). [8] This epigenetic regulation orchestrates processes such as gene expression and ultimately drives cell differentiation. Herein, we apply principles from biological regulation toward DNA data storage, through the controlled chemical transformations of nucleobases [9] and their associated binary value. As a result, we were able to (reversibly) recover multiple layers of binary data from a single DNA template (Figure 1C).
Moving away from the conventional concatenation approach to encode information on DNA, [7] we examined the possibility of merging several strings of information within the same DNA molecule. In a first step, we investigated an irreversible chemical transformation to recover two separate layers of information from a single DNA template (Figure 2A). For this, we exploited the bisulfite ion (HSO3−)-catalyzed hydrolytic deamination of cytosine (C) to uracil (U, Figure 2B). [10] The distinct Watson-Crick base pairing properties of C and U mean that after PCR amplification, bisulfite-converted positions yield 1:1 mixtures of C:thymine (T) for the forward strand and 1:1 mixtures of guanine (G):adenine (A) for the reverse strand (Figure 2C). These splits arise because cytosine deamination reactions on one strand are accompanied by retention of the cognate G base on the complementary one. We surmised that bisulfite-mediated C-to-U conversions could be used to alter bits encoded by C nucleobases and thus allow for merging of two information layers in the same DNA template. Specifically, while C positions encode for 0 when sequenced directly, the 1:1 mixture of C and T that results from chemical conversion is registered as 1. Conversely, G positions that encode for 1 before bisulfite conversion are transformed to 0 following the chemical treatment (Figure 2C). When combined with A's and T's (encoding for 0 and 1, respectively) that are unaffected by bisulfite treatment, appropriately designed sequences (see Supporting Information) carry two meaningful readouts, one before and one after bisulfite conversion. For example, Figure 2D reports our sequencing results for a stretch of 40 nucleotides that encodes for ASCII representations of two words. While we obtained "BLACK" in the absence of chemical treatment, bisulfite-induced C-to-U conversion generates "WHITE".
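The bit assignments described above can be illustrated with a minimal Python sketch. It assumes 8-bit ASCII with the most significant bit first and an idealized, error-free readout; the function names are hypothetical, and the template it produces is not claimed to match the sequence shown in Figure 2D (which, for instance, may also contain 5mC).

```python
"""Two-layer encoding sketch: one template, two readouts (before/after bisulfite)."""

# (bit before conversion, bit after conversion) -> template base, as in the text:
# A = 0 in both readouts, T = 1 in both, C switches 0 -> 1, G switches 1 -> 0.
PAIR_TO_BASE = {(0, 0): "A", (1, 1): "T", (0, 1): "C", (1, 0): "G"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def text_to_bits(text: str) -> list[int]:
    """Convert ASCII text to bits (8 bits per character, MSB first; an assumption)."""
    return [int(bit) for byte in text.encode("ascii") for bit in f"{byte:08b}"]


def bits_to_text(bits: list[int]) -> str:
    """Convert a bit list (length divisible by 8) back to ASCII text."""
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int("".join(map(str, g)), 2) for g in groups).decode("ascii")


def encode(first: str, second: str) -> str:
    """Merge two equally long messages into a single DNA template."""
    a, b = text_to_bits(first), text_to_bits(second)
    if len(a) != len(b):
        raise ValueError("both messages must have the same length")
    return "".join(PAIR_TO_BASE[pair] for pair in zip(a, b))


def decode(template: str, bisulfite: bool) -> str:
    """Read the first layer (no treatment) or the second layer (after bisulfite)."""
    layer = 1 if bisulfite else 0
    return bits_to_text([BASE_TO_PAIR[base][layer] for base in template])


template = encode("BLACK", "WHITE")
print(len(template), decode(template, bisulfite=False), decode(template, bisulfite=True))
```

Because all four combinations of two bits map to one of the four canonical bases, any pair of equally long messages can be merged in this simplified model; sequence-design constraints such as GC content are ignored here.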
To assess the robustness of this two-layer encoding strategy, we designed and synthesized a library of oligonucleotides encoding a binary representation of the first stanza of The Raven by Edgar A. Poe (for details, see Figure S1 and the Supporting Information). [11] When we subjected the same library to bisulfite-catalyzed cytosine conversion before sequencing, we recovered the second stanza of the poem as the readout. Overall, our strategy proved to be efficient and selective (Figure 2E). Without assuming any prior knowledge of the information, we recovered both stanzas error-free. In the process, 860 positions were selectively deaminated upon chemical treatment (99.35 ± 0.35 % conversion, Figure S1). In the design we introduced 5-methylcytosine (5mC), which is largely resistant to bisulfite conversion, at 68 positions to balance the GC content (37 % in the library prior to conversion) and avoid extended homopolymer runs, which can be problematic for high-throughput sequencing techniques. [12] Since 5mC deaminates 100 times slower than C upon bisulfite treatment, [13] these positions register as 0 in both readouts. We confirmed that at 5mC positions 97.0 ± 2.1 % of sequencing reads indicated non-conversion (i.e. still read as C during sequencing), and therefore retention of the initial binary value of 0 (Figure 2E and Figure S1). Our two-layer encoding scheme is reminiscent of the C deamination process catalyzed by enzymes, such as the Activation-Induced Cytidine Deaminase, to produce antibody diversity in B cells. [14]

Angewandte Chemie Communications

A three-layer encoding strategy can be achieved when incorporating an additional chemical transformation that alters a nucleobase and, as a result, its associated binary information. As depicted in Figure 3A, this would give rise to another information state, which encodes for a third, distinct message within a synthetic DNA template. Toward this end, we took advantage of the selective potassium perruthenate (KRuO4) oxidation of 5-hydroxymethylcytosine (5hmC) into 5-formylcytosine (5fC) and 5-carboxycytosine (5caC, Figure 3B). [15] 5fC and 5caC are both converted to uracil upon reaction with bisulfite, while 5hmC forms cytosine-5-methylenesulfonate, which is read as a C upon DNA sequencing. As such, the resultant primary sequence readout of DNA that initially comprises 5hmC differs depending on whether or not chemical oxidation is carried out prior to a bisulfite reaction. When a DNA library also comprises 5mC, three-layer encoding can be achieved. Assigning bit switches at positions that undergo changes following the use of the described chemical transformations, we recovered three strings of information from the same template (Figure 3C). First, in the absence of chemical treatment, sequencing of the DNA library revealed a first message. Next, by combining KRuO4 oxidation with bisulfite-catalyzed hydrolytic deamination, a second information layer is revealed. This process also identifies all 5mC positions present in the DNA library, as they are the only cytosine species read as C. Finally, by inverting the binary values at these positions in the third information state, which is obtained by omitting the oxidation step before bisulfite treatment, we recover a third information layer. Following this scheme, we designed an oligonucleotide that encodes simultaneously for ASCII representations of the words "BLACK", "WHITE", and "COLOR" (Figure 3D).
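The three readouts described above can be sketched in Python as follows. This is a simplified, idealized model rather than the authors' decoding pipeline: the single-letter symbols `M` (5mC) and `H` (5hmC) are ad hoc, converted positions are written with the IUPAC ambiguity codes `Y` (C/T mixture) and `R` (G/A mixture), and the bit values assigned to modified cytosines are inferred from the description in the text.

```python
"""Three-layer readout sketch. 'M' = 5mC and 'H' = 5hmC are ad hoc symbols."""

# Bit assigned to each base call. 'Y' and 'R' stand for the 1:1 C:T and G:A
# mixtures observed at converted C and G positions, respectively.
BIT = {"A": 0, "C": 0, "R": 0, "G": 1, "T": 1, "Y": 1}


def sequence(template: str, bisulfite: bool = False, oxidize: bool = False) -> str:
    """Return idealized base calls for a template after the chosen treatment."""
    calls = []
    for base in template:
        if base not in "ACGTMH":
            raise ValueError(f"unknown template symbol: {base!r}")
        if not bisulfite:
            # Direct sequencing: every cytosine species is read as C.
            calls.append("C" if base in "MH" else base)
        elif base == "G":
            calls.append("R")  # cognate C on the other strand is converted
        elif base == "C" or (base == "H" and oxidize):
            calls.append("Y")  # deaminated: C, or 5hmC oxidized beforehand
        elif base in "MH":
            calls.append("C")  # 5mC, or unoxidized 5hmC, still read as C
        else:
            calls.append(base)  # A and T are unaffected
    return "".join(calls)


def decode_layers(template: str) -> tuple[list[int], list[int], list[int]]:
    """Recover the three bit layers from the three readouts alone."""
    s1 = sequence(template)                                  # no treatment
    s2 = sequence(template, bisulfite=True, oxidize=True)    # KRuO4 + bisulfite
    s3 = sequence(template, bisulfite=True)                  # bisulfite only
    layer1 = [BIT[call] for call in s1]
    layer2 = [BIT[call] for call in s2]
    # 5mC positions are those still read as C in s2; invert their bits in s3.
    layer3 = [BIT[c3] ^ int(c2 == "C") for c3, c2 in zip(s3, s2)]
    return layer1, layer2, layer3


print(decode_layers("ATCGMH"))
```

In this model each template symbol carries a fixed triple of bits across the three layers, so a designed sequence has to be checked position by position against the available triples; how the published library handles that design step is described in the Supporting Information, not here.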
To exemplify the robustness and generality of our three-layer encoding strategy, we designed and synthesized a library of oligonucleotides comprising A, C, G, T, 5mC and 5hmC that simultaneously encodes for three images (Figure 3E and the Supporting Information). [16] Sequencing data from the original DNA template can be decoded to give a picture of Charles Darwin. Consecutive treatment with KRuO4 and NaHSO3 oxidized 413 5hmCs and deaminated a total of 892 positions (i.e. all 5hmCs and Cs are converted to T in the final sequence readout) in the library and revealed a portrait of Rosalind Franklin. As described above, this process also identified 488 5mC positions. When assigning the opposite binary values to these positions in the third information state obtained by bisulfite treatment, we recovered a picture of Alan Turing (Figure 3E). Both the oxidation and bisulfite reactions proved to be robust and selective: while A, T, and 5mC positions remained unchanged by the oxidation and/or bisulfite treatment in all experiments (> 98 % retention of bases), 5hmCs were efficiently converted (96.2 ± 2.1 %) when oxidized but were retained (99.4 ± 0.3 %) in the absence of KRuO4. Bisulfite conversion of unmodified C's to T's (> 98.0 %) was independent of the oxidation step (Table S1). The efficiency and selectivity of all employed transformations support the scalability of the overall approach (see the Supporting Information for further discussion).

Figure 3. Chemical alteration of modified nucleobases enables three-layer encoding in DNA. A) Chemical transformations T1 and T2 can be combined to access a new information state, S2, while the use of transformation T1 generates a third information state, S3, from the initial state S1. Upon sequencing, this strategy results in three distinct readouts. B) KRuO4-mediated conversion of 5hmC (T2) to 5fC or 5caC. C) Use of KRuO4 and bisulfite-mediated transformation for the decoding of three layers of information encoded within the same DNA template. Interconversion of bits relies on identifying 1:1 mixtures of C:T and G:A positions after chemical conversion (same color code as Figure 2C is used). Sequencing before chemical conversion uncovers the first message, while oxidation and subsequent bisulfite treatment (left side) reveals a second layer of information. 5mC positions are also identified by this procedure, as they are the only cytosine species to be read as C when sequenced. By assigning the opposite binary values at the 5mC positions to the information state obtained by the bisulfite reaction of the original template (S3, right side), a third message is revealed. D) Three-layer encoding proof-of-concept. Shown is a 40 base pair region of an oligonucleotide, which encodes for binary representation of the ASCII text "BLACK", "WHITE" and "COLOR" before and after chemical […]
The reversible addition, removal and interconversion of DNA modifications, through the demethylation of DNA mediated by the Ten-eleven translocation (TET) enzymes for example, are vital to control the expression of information encoded in genomes. [17] To mimic this type of control in synthetic DNA templates we envisioned incorporating the oxidation reaction of 5hmC into a redox cycle (Figure 4A). Oxidation conditions were optimized to enable the selective transformation of 5hmC to 5fC (Figure S2), and we employed NaBH4 as a reducing agent to transform the oxidation-derived 5fC back to 5hmC (Figure 4B). [18] These alternating reactions enabled the interconversion of informational states (5hmC → 5fC → 5hmC) as exemplified in Figure 4C. To assess the proficiency of the employed chemical transformations we performed five consecutive redox cycles on the portraits-encoding library. When following the conversion efficiency of 5hmC positions over the course of these 10 transformations we observed the desired cycling behavior, while C and 5mC positions remained largely unaffected (Figure 4D). Oxidation and reduction steps displayed mean conversion efficiencies over the 5 full cycles of 83.28 ± 4.43 % and 83.55 ± 10.77 %, respectively, for 5hmC and 5fC (see Table S2). The apparent decrease in 5hmC reactivity over five cycles may reflect a degree of over-oxidation to 5caC, which cannot be reduced by NaBH4. Overall, our redox chemistry for this reversible recovery of multiple information layers was efficient and selective, and in its current state enabled the correct bit recovery for 5hmC positions over 4 full cycles, and > 95 % after the fifth reduction (Table S2).
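The redox cycle described above can be pictured as a small state machine. The Python sketch below is an idealized illustration under stated assumptions: every reaction goes to completion, over-oxidation is a manual switch rather than a modeled rate, and the function names are hypothetical. It only encodes the qualitative rules given in the text (KRuO4 oxidizes 5hmC, NaBH4 reduces 5fC but not 5caC, and bisulfite converts C, 5fC and 5caC while 5mC and 5hmC are still read as C).

```python
"""Idealized redox cycling of a single cytosine position (illustrative only)."""


def oxidize_base(base: str, over_oxidize: bool = False) -> str:
    """KRuO4 step: 5hmC -> 5fC, or 5caC if over-oxidation is switched on."""
    if base == "5hmC":
        return "5caC" if over_oxidize else "5fC"
    return base  # other bases are treated as unaffected


def reduce_base(base: str) -> str:
    """NaBH4 step: 5fC -> 5hmC; 5caC cannot be reduced and stays as it is."""
    return "5hmC" if base == "5fC" else base


def bisulfite_call(base: str) -> str:
    """Base call after bisulfite treatment of the given cytosine species."""
    if base in {"5mC", "5hmC"}:
        return "C"  # resistant or protected: still read as cytosine
    if base in {"C", "5fC", "5caC"}:
        return "T"  # deaminated to uracil, read as T
    raise ValueError(f"not a cytosine species: {base!r}")


def run_cycles(base: str, cycles: int) -> list[str]:
    """Return the bisulfite call after every oxidation and reduction step."""
    calls = []
    for _ in range(cycles):
        base = oxidize_base(base)
        calls.append(bisulfite_call(base))
        base = reduce_base(base)
        calls.append(bisulfite_call(base))
    return calls


print(run_cycles("5hmC", 5))  # alternates between converted and protected
print(run_cycles("5mC", 5))   # 5mC is read as C throughout
```

A position that has been over-oxidized to 5caC stays in the converted state in this model, which mirrors the explanation offered in the text for the gradual loss of 5hmC reactivity over repeated cycles.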
In this manuscript we demonstrate the potential of chemical reactions to manipulate digital information encoded within DNA. While our work focused on storing multiple data sets in one library (a strategy reminiscent of steganography), it is noteworthy that multilayer encoding represents an enticing approach to maximize storage capabilities of DNA templates. The information content of additional layers could be repurposed for different tasks, such as error-correcting algorithms or encoding barcodes that are usually installed into synthetic libraries at the expense of storage space. [7] The use of additional modified nucleobases (see Figure S3 for an example) [19] together with reversible chemical reactions and direct sequencing readouts (e.g. nanopore sequencing) should enable the development of more complex systems. [20] One significant challenge lies in the engineering of polymerases and other enzymes that will allow for amplification and sequencing of templates containing modified nucleobases. [21] Future efforts are likely to expand on our approach by also employing non-natural, sequence-specific oligomers [22] that will enable greater control over the encoded digital information. Ultimately, such developments may permit the design of multistable DNA systems that could facilitate the development of operative, molecular computers, such as Turing machines. [23]

Figure 4. Interconversion of three information layers encoded in DNA. A) Combining reversible chemical transformations (T2 and T3) allows for cycling between the information states S1 and S1', and enables the reversible encoding of three layers of information in DNA. B) Redox interconversion between 5hmC and 5fC by KRuO4 (T2) and NaBH4 (T3). C) Sequencing results of 413 5hmC positions within the portraits-encoding DNA library (Figure 3E) over one full redox cycle. Starting from a fully deaminated reduced state (S1), mild oxidation prior to bisulfite treatment converts all 5hmC positions into S1', while subsequent reduction restores the initial state of the oligonucleotide library. Conversion efficiencies are measured by subsequent bisulfite treatment to recover S2 and S3. D) Sequencing results for 5hmC positions (in % cytosine called, data represent mean values ± standard deviation, n = 413) over five full cycles (red and black points represent the oxidized and reduced states of the 5hmC's, respectively). C's (yellow, n = 479) and 5mC's (blue, n = 488) are not affected by the chemical transformations. 5hmC positions, on average, remain distinct from other cytosine species after five cycles.

number FP7-PEOPLE-2013-IEF/624885). The S.B. lab is supported by a program grant and core funding from Cancer Research UK (C9681/A18618), an ERC Advanced grant (339778) and by a Senior Investigator Award of the Wellcome Trust (099232/Z/12/Z). We thank Eun-Ang Raiber and Dario Beraldi for stimulating discussions and proofreading the manuscript.