January 24, 2013
Shakespeare Sonnets Among Some Of The Data Recently Stored In DNA
April Flowers for redOrbit.com - Your Universe Online
Scientists from Cambridge University have developed a new method of long-term information storage: synthesized DNA.
"We´re using DNA here as a chemical molecule of storage. It just happens to be the same molecule that is used in our bodies as well," Ewan Birney, senior author of the study and geneticist at the United Kingdom's European Bioinformatics Institute told CNN.
DNA that is kept cold, dry and dark will last for a very long time. For example, consider that scientists are sequencing the DNA of woolly mammoths tens of thousands of years after they were stored by chance.
“There must be some point in time when it's cheaper to store information for that length of time as DNA than as something that requires electricity or some other maintenance cost to keep it around,” Birney said.
The math supports this theory. The research team found that although DNA storage is expensive, in the end it is more cost-effective than other methods for preserving a file for 600 to 5,000 years. The study, published in a recent issue of Nature, suggests that the cost of synthesizing DNA will decrease, making it possible that DNA storage could ensure that your grandchildren have access to your wedding pictures.
"The idea that DNA, which people think of as a biological molecule, can be used as a physical storage tape in a non-biological function is pretty incredible," Drew Endy, a Stanford University bioengineer who was not involved in the work, told USA Today.
"It's a really nice example of how a fundamental investment in a basic scientific tool can lead to (amazing things)."
"Anything that you want to store we could store," Birney said. "Really, the only limit is the expense."
DNA storage is definitely expensive. Agilent Technologies provided the DNA for this study free of charge, but the team reports that the commercial rates for DNA synthesis are between $10,000 and $30,000.
According to the research team, this method of storage could encode a zettabyte's worth of data. In other words, the total amount of digital information currently in existence on Earth could fit in this storage. However, this would be "breathtakingly expensive," Birney explains.
The team used five different kinds of digital information to highlight the versatility of their storage method. The files included a text file with William Shakespeare's 154 sonnets, a PDF of the Watson and Crick paper describing the double helical nature of DNA, a photo in JPEG format of the European Bioinformatics Institute, and an MP3 audio excerpt of Martin Luther King's “I Have a Dream” speech.
The team encoded these files in DNA and then, by sequencing it, reconstructed them with 100 percent accuracy.
Files on your computer are encoded in binary, a set of zeroes and ones. To encode information onto the DNA, the team took the digital binary and converted it to base 3: zeroes, ones and twos. This is then translated to DNA's nucleic acid bases, which are represented by the letters A, C, G, and T.
Each block of eight binary digits (one byte, enough for a single text character) was translated into a short string of DNA letters. For example, the first word in "Thou art more lovely and more temperate" from Shakespeare's sonnet 18 becomes TAGATGTGTACAGACTACGC, five letters for each of the word's four characters.
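The binary-to-base-3 step can be illustrated with a short Python sketch. This is a simplified stand-in and not the team's published code: it assumes a fixed width of six base-3 digits per byte (the smallest width that covers all 256 byte values), and the function names are hypothetical. The study's own code is evidently more compact, since the example above spends only five DNA letters on each character.

```python
def byte_to_trits(value, width=6):
    """Return one byte (0-255) as a fixed-width list of base-3 digits.

    The six-digit width is an assumption made for this sketch.
    """
    if not 0 <= value <= 255:
        raise ValueError("expected a single byte (0-255)")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]  # most significant digit first


def trits_to_byte(digits):
    """Inverse of byte_to_trits: rebuild the byte from its base-3 digits."""
    value = 0
    for digit in digits:
        value = value * 3 + digit
    return value


def text_to_trits(text):
    """Convert ASCII text into one flat list of base-3 digits."""
    trits = []
    for byte in text.encode("ascii"):
        trits.extend(byte_to_trits(byte))
    return trits


if __name__ == "__main__":
    print(byte_to_trits(ord("T")))     # the letter "T" is byte 84
    print(len(text_to_trits("Thou")))  # six digits for each of four bytes
```

A fixed width keeps decoding trivial, because the reader always knows where one byte ends and the next begins, at the cost of using more digits than strictly necessary.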
To test their theory, the team converted the files into DNA code, and then emailed it to Agilent. Agilent made the physical strands of DNA and mailed back a small test tube to the scientists, filled with a speck of DNA that encoded all the information they sent.
Nick Goldman, the study's lead author, and Birney mixed this into a solution and ran it through a gene sequencing machine to make sure the DNA stored the information correctly. This allowed them to read the complete files again. The Associated Press reports that the "reading" of the information took a little over two weeks, but the team says technological advances are driving that time down.
The European science team is not the first to encode data in DNA. In 2012, a Harvard University research team published a paper in Science describing their own method of DNA storage, in which George Church encoded a copy of his book "Regenesis," 11 images and a computer program in DNA.
Goldman says the difference in the new study is in error correction. The method has built-in measures that adjust for possible errors in translation.
For example, Goldman's method does not allow for identical letters of DNA to be adjacent. In other words, there would be no instances of "AA" or "GG" in the final code. This kind of repetition could cause errors, Birney says. The method also encodes the information multiple times, in multiple ways — including recording it twice backwards, just in case something goes wrong with one copy.
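One way to guarantee that no letter ever repeats is to let each base-3 digit choose among the three DNA letters that differ from the letter just written. The Python sketch below illustrates that idea. The particular ordering of choices, the starting letter "A", and the function names are assumptions made for illustration, not the mapping used in the study.

```python
BASES = "ACGT"


def trits_to_dna(trits, prev="A"):
    """Encode base-3 digits as DNA letters with no two adjacent letters equal.

    Each digit (0, 1 or 2) picks one of the three bases that differ from
    the previously written base. `prev` is an assumed starting context.
    """
    letters = []
    for trit in trits:
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        letters.append(prev)
    return "".join(letters)


def dna_to_trits(dna, prev="A"):
    """Decode a strand produced by trits_to_dna back into base-3 digits."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


if __name__ == "__main__":
    digits = [0, 1, 0, 0, 1, 0]
    strand = trits_to_dna(digits)
    print(strand)
    print(dna_to_trits(strand) == digits)
```

Because a letter can never follow itself, only three choices remain at each position, which is why the digital data is converted to base 3 and not base 4 before it is written as DNA.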
DNA storage devices have the advantage of being light and small. Just one of Shakespeare's sonnets would fit in 0.3 picograms of DNA. A small test tube holds approximately a petabyte of data, or a billion megabytes, in a space about as small as the space between the top two joints on your little finger.
“A gram of DNA would hold the same information as a bit over a million compact discs,” Goldman said. “Your storage options are: one thing a bit smaller than your little finger, or a million CDs.”
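The scale of these comparisons can be checked with a few lines of arithmetic. The Python sketch below assumes decimal units (a megabyte as one million bytes, a petabyte as 10^15 bytes) and a capacity of 700 megabytes per compact disc. The CD figure is an assumption for illustration and is not taken from the study.

```python
MEGABYTE = 10**6               # bytes, decimal convention
PETABYTE = 10**15              # bytes, decimal convention
CD_BYTES = 700 * MEGABYTE      # assumed capacity of one compact disc

# How many megabytes make up one petabyte?
megabytes_per_petabyte = PETABYTE // MEGABYTE

# How much data would a million CDs hold, expressed in petabytes?
million_cds_bytes = 1_000_000 * CD_BYTES
million_cds_petabytes = million_cds_bytes / PETABYTE

if __name__ == "__main__":
    print(megabytes_per_petabyte)
    print(million_cds_petabytes)
```

Under these assumptions a million CDs come to a little under one petabyte, the same order of magnitude as the test-tube figure quoted above.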
Goldman was asked if the DNA could pose any danger to health. He responded, "The DNA we've created can't be incorporated accidentally into a genome, it uses a completely different code to that used by the cells of living bodies. If you did end up with any of this DNA inside you it would just be degraded and disposed of."
Church says search engine companies and storage media manufacturers have approached him since the publication of his paper in Science. They are interested in learning more about the technology and the possibility of developing it for commercial viability.
"I thought this was really refreshing that they were willing to think out of the box even though this could conceivably be disruptive to their industry," he said.
With the twin advantages of small size and long endurance, this method of DNA storage could be used to propagate information about our current lives thousands of years into the future. That is, assuming our descendants in the year 4013 understand the language as we speak and write it currently.