The recent announcement by George Church and co-workers at Harvard University of plans to bring mammoths 'back from extinction' — more properly, to create elephants with mammoth features by splicing their genomes with fragments of those preserved in mammoth remains — has provoked some scepticism. But that optimistic scheme does at least remind us how good DNA is at preserving information over very long times.

That's worth remembering if the notion of storing computer information in the chemical structure of a 'fragile' organic molecule seems risky. Church has also been one of the leading proponents of efforts to do just that: to use DNA as a storage medium for information technology. Given that more than 20 years have passed since the notion of DNA computing was introduced1, and that the capacity of DNA for encoding digital information has in fact been apparent ever since Watson and Crick's discovery of the double helix in 1953, it might seem surprising that it has taken so long for DNA storage to be seriously contemplated. What has made the difference is the rapid evolution over the past two decades of technologies for making long stretches of DNA to precise specifications at commercially viable cost, and likewise for reading out the information. Those techniques were not, of course, developed with data storage in mind, but to support gene manipulation and sequencing for biotechnology and genomics. They are now augmented with tools for pinpoint editing of DNA sequences using the CRISPR/Cas9 system, giving would-be DNA data engineers a complete set of affordable tools for input, editing and readout.

The very real potential of DNA as a storage medium has been amply demonstrated. In 2012 Church et al. stored and read out a 5.27-megabit file (a book) using microchips and advanced sequencing2; Goldman et al. subsequently showed how the approach might be scaled up while retaining 100% accuracy3. CRISPR/Cas9 has been used to find and edit specific stored strings in a DNA random-access memory4. All the same, encoding arbitrary data remains compromised by several factors. Whereas in principle each nucleotide (A, T, G and C) can encode two bits (00, 01, 10, 11), this coding capacity can't be reached in practice. For one thing, some sequences, such as those with high guanine-cytosine content, are hard to synthesize and read out accurately. Furthermore, variability in the synthesis and stability of some oligonucleotide sequences makes them unavailable for coding. These constraints limit the actual storage capacity to about 1.8 bits per nucleotide, and previous studies have attained no more than 60% of even this reduced limit.
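
To make those constraints concrete, here is a minimal Python sketch of the naive two-bits-per-nucleotide encoding together with a screen that rejects hard-to-handle sequences. The GC-content window and maximum homopolymer run length are illustrative assumptions, not thresholds from any of the studies cited; the point is simply that rejecting a fraction of all possible sequences is what pushes the usable capacity below two bits per nucleotide.

```python
# Naive 2-bits-per-base encoding plus a biochemical screen.
# All thresholds below are illustrative assumptions only.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def bits_to_dna(bits: str) -> str:
    """Encode a bit string (even length) as DNA, 2 bits per nucleotide."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def is_synthesizable(seq: str, gc_lo: float = 0.45, gc_hi: float = 0.55,
                     max_run: int = 3) -> bool:
    """Reject sequences with extreme GC content or long single-base
    (homopolymer) runs, the kinds of strings that synthesis and
    sequencing handle poorly."""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not gc_lo <= gc <= gc_hi:
        return False
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return False
    return True

print(bits_to_dna("10101010"))       # 'GGGG': a homopolymer with GC = 1.0
print(is_synthesizable("GGGG"))      # False: fails both tests
print(is_synthesizable("ACGTTACG"))  # True: GC = 0.5, longest run = 2
```

Every rejected string is one fewer usable codeword, which is why the information carried per nucleotide falls below the naive two bits.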

Erlich and Zielinski have now drawn on a coding strategy well known in computer science, called a fountain code, to improve this performance5. The data set is divided into short, non-overlapping segments, and random subsets of these are combined into message packets called droplets; any droplet whose DNA sequence would be error-prone is screened out and replaced. Each droplet is tagged with the random-number 'barcode' that generated it, allowing the original segments to be reassembled on readout.
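
The core of a fountain code is the Luby transform: each droplet is the bitwise XOR of a pseudo-randomly chosen subset of segments, and the seed that drove the choice travels with the droplet as its barcode. The sketch below illustrates the idea under stated simplifications: a uniform degree distribution stands in for the carefully tuned soliton distribution of real fountain codes, and `screen` is a placeholder for a biochemical filter like `is_synthesizable` above; names and parameters are illustrative, not Erlich and Zielinski's implementation.

```python
import random

def make_droplet(segments: list[bytes], seed: int, max_degree: int = 4):
    """Luby-transform step: XOR a pseudo-random subset of equal-length
    segments. The seed fully determines the subset, so it doubles as
    the droplet's 'barcode'. (A uniform degree distribution is a
    simplification of the soliton distribution used in practice.)"""
    rng = random.Random(seed)
    degree = rng.randint(1, min(max_degree, len(segments)))
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))  # all-zero start
    for i in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[i]))
    return seed, payload

def droplet_stream(segments, screen):
    """Yield an endless stream of droplets, silently skipping any that
    fail the biochemical screen (GC content, homopolymer runs and so
    on). Error-prone sequences are thus eliminated without losing
    information: the fountain just keeps pouring."""
    seed = 0
    while True:
        seed += 1
        barcode, payload = make_droplet(segments, seed)
        if screen(payload):  # in practice the screen inspects the DNA encoding
            yield barcode, payload
```

Because any sufficiently large collection of distinct droplets suffices to reconstruct the file, discarding awkward sequences costs a little redundancy rather than any data.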

In this way Erlich and Zielinski could encode various data files (a computer operating system, an early motion picture, a computer virus and others), and not only read them back with perfect fidelity but also create error-free copies. The fountain algorithm achieves a capacity per nucleotide of 86% of the theoretical maximum, or roughly 1.55 bits.
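
Reading the data back means inverting the Luby transform. A standard way to do this, sketched here to match the toy encoder above (as an illustration of the principle rather than the authors' actual implementation), is a 'peeling' decoder: find a droplet that depends on only one unresolved segment, solve it, subtract it from every other droplet, and repeat.

```python
import random

def decode(droplets, n_segments: int, seg_len: int, max_degree: int = 4):
    """Peeling decoder for the toy encoder above. Each droplet's seed
    is replayed to recover which segments were XORed into it; degree-1
    droplets then reveal the segments one by one."""
    pending = []
    for seed, payload in droplets:
        rng = random.Random(seed)  # replay the encoder's PRNG choices
        degree = rng.randint(1, min(max_degree, n_segments))
        chosen = set(rng.sample(range(n_segments), degree))
        pending.append((chosen, bytearray(payload)))

    solved = {}
    progress = True
    while progress and len(solved) < n_segments:
        progress = False
        for chosen, payload in pending:
            for i in [i for i in chosen if i in solved]:
                chosen.discard(i)          # XOR out already-known segments
                for k in range(seg_len):
                    payload[k] ^= solved[i][k]
            if len(chosen) == 1:           # one unknown left: segment solved
                i = chosen.pop()
                if i not in solved:
                    solved[i] = bytes(payload)
                    progress = True
    return [solved[i] for i in range(n_segments)]
```

Any sufficiently large subset of distinct droplets will do, which is why copies of copies can still be read back without error.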

This won't yet make DNA data storage a commercial reality: it is still far too costly (about US$3,500 per MB here). Yet not only are these costs still falling, but schemes like this can also tolerate cheaper, less accurate synthesis than the methods developed for genomic technologies.