Harvard Scientists Write Book In DNA And Accurately Copy, Read It Back
August 17, 2012

Lawrence LeBlond for redOrbit.com - Your Universe Online

DNA, the building block of life, is now home to more than just the world's living creatures. Scientists from Harvard University report that they have written an entire book in DNA, a feat that could revolutionize the way we store data.

DNA can, in principle, pack billions of gigabytes into a single gram, far more information than a microchip could ever hold. In fact, a single milligram of DNA could encode the entire Library of Congress with room to spare.
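The "billions of gigabytes per gram" figure can be sanity-checked with a back-of-envelope calculation. The sketch below gives a theoretical upper bound under assumptions that do not come from the article: an average mass of about 330 g/mol per single-stranded nucleotide, 2 bits stored per base, and no overhead for addressing, redundancy or error correction. The function name `gigabytes_per_gram` is hypothetical.

```python
# Back-of-envelope upper bound on DNA storage density.
# ASSUMPTIONS (not from the article): ~330 g/mol per single-stranded
# nucleotide, 2 bits per base, no redundancy, addressing or error correction.
AVOGADRO = 6.022e23            # molecules per mole
GRAMS_PER_MOL_PER_BASE = 330.0
BITS_PER_BASE = 2


def gigabytes_per_gram() -> float:
    """Hypothetical helper: capacity of one gram of single-stranded DNA, in GB."""
    bases_per_gram = AVOGADRO / GRAMS_PER_MOL_PER_BASE
    bits = bases_per_gram * BITS_PER_BASE
    return bits / 8 / 1e9      # bits -> bytes -> gigabytes (10^9 bytes)


if __name__ == "__main__":
    print(f"{gigabytes_per_gram():.2e} GB per gram")
```

Under these assumptions one gram comes out at a few hundred billion gigabytes. Practical systems reach lower densities, because each block is copied many times and carries extra addressing sequence.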

Long considered only a theory, data storage in DNA has now tipped the genetic scale and become a reality. George Church of Harvard Medical School and his colleagues stored an entire genetics book in less than a picogram (a trillionth of a gram) of DNA.

The experiment, reported in Thursday's edition of the journal Science, could pave the way for data-storage systems that handle vast amounts of information, perhaps millions of times more than a single hard drive can hold. Using next-generation sequencing technology, the Harvard team was able not only to encode the book in DNA but also to copy and read it back accurately.

A few other teams have tried to write data into the DNA of living cells, but that approach has drawbacks that may make it impractical. Cells die, so data written into their genetic code could ultimately be lost. Cells also replicate, and new mutations arising during replication could alter the data.

To avoid these problems, Church and his colleagues created a DNA information-archiving system that uses no cells at all. Instead, they used an inkjet printer to embed short fragments of chemically synthesized DNA onto the surface of a tiny glass chip. To encode the file, the team divided it into small blocks of data and converted them not into the 1s and 0s of conventional digital storage, but into DNA's four-letter alphabet of As (adenine), Cs (cytosine), Gs (guanine) and Ts (thymine).
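As a rough illustration of the encoding step, the sketch below splits a file into fixed-size blocks and maps every pair of bits to one base. The 2-bit mapping (00→A, 01→C, 10→G, 11→T), the 12-byte block size and all function names are hypothetical choices made for this example. The article does not describe the Harvard team's exact scheme, which may differ.

```python
# Minimal sketch of turning a file into DNA strings.
# HYPOTHETICAL choices: 2 bits per base (00->A, 01->C, 10->G, 11->T)
# and 12-byte blocks. Not the published Harvard scheme.
BASES = "ACGT"
BLOCK_BYTES = 12


def split_into_blocks(data: bytes, size: int = BLOCK_BYTES) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; raises ValueError on a bad length or letter."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)  # ValueError if not A/C/G/T
        out.append(value)
    return bytes(out)
```

For example, `bytes_to_dna(b"Hi")` yields `"CAGACGGC"`, and `dna_to_bytes` recovers the original bytes. A practical system would likely also need to avoid awkward sequences such as long runs of a single base, which this naive mapping does not do.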

The team explained that each DNA fragment also carries a digital “barcode” that records its location in the original file. Reading the data back requires a DNA sequencer, plus a computer that reassembles the fragments in the correct order and converts them into digital form. The computer also corrects errors: each block of data is replicated thousands of times, so a chance glitch in one copy can be identified and fixed by comparing it with the other copies.
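The barcode-and-copies idea can be sketched in a few lines. In this minimal sketch, each sequencing read is assumed (hypothetically) to arrive as an `(address, payload)` pair, with the address standing in for the article's barcode. Reads sharing an address are combined by a per-position majority vote, and the blocks are joined in address order. This is a simple stand-in for error correction, not the team's published pipeline, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL read format: (address, payload) pairs. The address plays the
# role of the article's "barcode"; the payload is one copy of a data block.


def consensus(copies: list[str]) -> str:
    """Majority vote at each position across copies of the same block.

    Ties go to the base seen first (Counter keeps insertion order).
    """
    length = max(len(c) for c in copies)
    result = []
    for i in range(length):
        votes = Counter(c[i] for c in copies if i < len(c))
        result.append(votes.most_common(1)[0][0])
    return "".join(result)


def reassemble(reads: list[tuple[int, str]]) -> str:
    """Group reads by address, take a consensus per block, join in address order."""
    by_address: dict[int, list[str]] = defaultdict(list)
    for address, payload in reads:
        by_address[address].append(payload)
    return "".join(consensus(by_address[a]) for a in sorted(by_address))


if __name__ == "__main__":
    reads = [(1, "GGTT"), (0, "ACGT"), (0, "ACGA"),
             (1, "GGTT"), (0, "ACGT"), (1, "GCTT")]
    print(reassemble(reads))
```

With three noisy reads of block 0 (`"ACGT"`, `"ACGA"`, `"ACGT"`) and three of block 1 (`"GGTT"`, `"GGTT"`, `"GCTT"`), the vote discards the single-letter errors and `reassemble` returns `"ACGTGGTT"`. With only a handful of copies, ties and coincident errors are possible, which is why having many copies per block helps.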

To demonstrate the technology, the team used the DNA chips to encode a genetics book co-authored by Church, “Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves.” After converting the book into DNA and translating it back into digital form, the team's system produced only about two errors per million bits of information, amounting to a few single-letter typos. That error rate is on par with DVDs and far better than magnetic hard drives.
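Combining the reported error rate with the book's size (5.27 megabits, given later in the article) shows the scale involved. The helper below, whose name is hypothetical, simply multiplies the two figures quoted in the article.

```python
# Expected number of bit errors at a given per-bit error rate.
# Inputs are the article's figures: ~2 errors per million bits, 5.27 megabits.
def expected_errors(total_bits: float, errors_per_million: float) -> float:
    """Hypothetical helper: errors expected across total_bits."""
    return total_bits * errors_per_million / 1_000_000


if __name__ == "__main__":
    print(round(expected_errors(5.27e6, 2), 1))
```

At roughly two errors per million bits, the whole book would be expected to contain about ten erroneous bits.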

For now, however, such a system is not practical. Sequencing DNA is costly and not yet feasible for general use, according to Daniel Gibson, a synthetic biologist at the J. Craig Venter Institute in Rockville, Maryland. Still, he noted, “the field is moving fast and the technology will soon be cheaper, faster, and smaller.”

The cost of generating raw, unassembled DNA sequence data has dropped from $10,000 per million base pairs in 2001 to around 10 cents per million base pairs in 2012, according to the National Human Genome Research Institute.
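The cost figures above imply a 100,000-fold drop over 11 years. The sketch below computes that ratio and an average halving time. The halving time assumes a smooth exponential decline, which is a simplification because the real decline was uneven, and the function names are hypothetical.

```python
import math

# Figures from the article: US dollars per million base pairs of raw sequence.
COST_2001 = 10_000.00
COST_2012 = 0.10
YEARS = 2012 - 2001


def fold_decrease(old: float, new: float) -> float:
    """How many times cheaper the new cost is."""
    return old / new


def halving_time_years(old: float, new: float, years: float) -> float:
    """Average time for the cost to halve, ASSUMING a steady exponential decline."""
    return years / math.log2(old / new)


if __name__ == "__main__":
    print(f"{fold_decrease(COST_2001, COST_2012):,.0f}-fold cheaper")
    months = halving_time_years(COST_2001, COST_2012, YEARS) * 12
    print(f"cost halved roughly every {months:.1f} months")
```

Under that assumption, the cost halved roughly every eight months on average.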

Gibson led the team that created the first completely synthetic genome, which included a “watermark” of extra data encoded in its DNA. His team used a three-letter coding system that is less efficient than Church's, but it has built-in safeguards to keep living cells from translating the DNA into proteins.

“If DNA is going to be used for this purpose, and outside a laboratory setting, then you would want to use DNA sequence that is least likely to be expressed in the environment,” Gibson told John Bohannon of Science Magazine.

Church disagrees. Unless someone deliberately “subverts” his DNA data-archiving system, he sees little danger.

Church, a founding core faculty member of the Wyss Institute for Biologically Inspired Engineering at Harvard University and the Robert Winthrop Professor of Genetics at Harvard Medical School, said that, unlike some experimental media that require extremely cold temperatures and tremendous energy, DNA is stable at room temperature. “You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later.”

Church admits that reading and writing in DNA is slower than in other media formats, which would make it better suited for archival storage of massive amounts of data, rather than for quick retrieval or data processing.

“The information density and scale compare favorably with other experimental storage methods from biology and physics,” said Sriram Kosuri, a senior scientist at the Wyss Institute and senior author on the paper.

“A device the size of your thumb could store as much information as the whole Internet,” said Church. A billion copies of his book could easily fit into a test tube, he added.

In all, Church's book contains 53,426 words, 11 illustrations and a JavaScript computer program. Its 5.27 megabits of data make it more than 600 times larger than the largest data set previously encoded in DNA, roughly the storage capacity of a 3.5-inch floppy disk.
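The sizes in this paragraph are easy to check with simple unit conversions. The floppy capacities used here (737,280 bytes for a double-density "720 KB" disk and 1,474,560 bytes for a high-density "1.44 MB" disk) are standard figures but are not taken from the article. The function names are hypothetical.

```python
# Unit conversions for the sizes quoted in the article.
BOOK_BITS = 5_270_000
# ASSUMED floppy formats (not from the article): 3.5-inch DD and HD disks.
FLOPPY_BYTES = {"720 KB": 737_280, "1.44 MB": 1_474_560}


def bits_to_bytes(bits: int) -> float:
    """8 bits per byte."""
    return bits / 8


def previous_record_upper_bound(bits: int, factor: int = 600) -> float:
    """'More than 600 times bigger' implies the older record was below bits / 600."""
    return bits / factor


if __name__ == "__main__":
    book_bytes = bits_to_bytes(BOOK_BITS)
    print(f"book: {book_bytes:,.0f} bytes")
    print(f"previous record: under {previous_record_upper_bound(BOOK_BITS):,.0f} bits")
    for name, capacity in FLOPPY_BYTES.items():
        print(f"share of a {name} floppy: {book_bytes / capacity:.0%}")
```

The book works out to about 659,000 bytes, close to 90% of a 720 KB floppy and under half of a 1.44 MB one, and the previous record must have been below roughly 8,800 bits.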

“This new work demonstrates that there is a whole new market for these technologies, to synthesize DNA for people who want to store information,” pioneering synthetic biologist Drew Endy of Stanford University told The Wall Street Journal.

Church said he first considered encoding the novel "Moby Dick," but chose his own manuscript instead because its combination of words, pictures and JavaScript code would better showcase DNA's capacity to handle different kinds of information.

The study was supported by the US Office of Naval Research, Agilent Technologies, and the Wyss Institute.