How a synthetic version of our genetic code could become the world's most efficient hard drive

A quick riddle: What do 100 works of classic literature, a seed database from the nonprofit Crop Trust and the Universal Declaration of Human Rights have in common? All of them were recently converted from bits of digital data to strands of synthetic DNA. In addition to these weighty files, researchers at Microsoft and the University of Washington converted a high-definition music video of "This Too Shall Pass" by the alternative rock band OK Go. The video is an homage to Rube Goldberg-like contraptions, which bear more than a passing resemblance to the labyrinthine process of transforming data into the genetic instructions that shape all living things.

This recent data-to-DNA conversion, completed in July, totaled 200 megabytes—which would barely register on a 16-gigabyte iPhone. It's not a huge amount of information, but it bested the previous DNA storage record, set by scientists at Harvard University, by a factor of about 10. To achieve this, researchers concocted a convoluted process to encode the data, store it in synthetic DNA and then use DNA sequencing machines to retrieve and, finally, decode the data. The result? The exact same files they began with.

Which raises the question: Why bother?

"We are seeing this explosion in the amount of data that needs to be stored," says Karin Strauss, the principal Microsoft researcher on the project. "To continue storing this information, we need radical new approaches." In an age of gargantuan, power-sucking data centers, the space-saving potential of data stored in DNA is staggering. "You can archive all the data on the internet in a shoebox," says Luis Ceze, an associate professor of computer science and engineering at the University of Washington.
Step 1: From 1’s and 0’s to A’s, C’s, G’s and T’s
Every data file can be reduced to binary code, the 1’s and 0’s of computing. Researchers have developed programs that convert that code into the four-letter alphabet (A, C, G and T) corresponding to the building blocks, or bases, of our genetic code: adenine, cytosine, guanine and thymine. Such a program can translate any file, from this article to a high-definition film, into a sequence of letters that serves as the blueprint for strands of synthetic DNA.
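To make the idea concrete, here is a minimal sketch of that translation. It assumes the simplest possible scheme, in which each pair of bits maps to one base (00→A, 01→C, 10→G, 11→T). That table is a hypothetical choice for illustration, not the encoding the Microsoft and University of Washington team used. The functions `encode` and `decode` are likewise illustrative names.

```python
# Hypothetical 2-bit mapping for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate raw bytes into a string of A, C, G and T."""
    # Write every byte as exactly 8 binary digits, e.g. 79 -> "01001111".
    bits = "".join(f"{byte:08b}" for byte in data)
    # Read the bit string two digits at a time and look up the matching base.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): turn a sequence of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    # Regroup the bits into 8-digit chunks and convert each one back to a byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"OK Go"
    strand = encode(message)
    print(strand)  # CATTCAGTAGAACACTCGTT
    assert decode(strand) == message  # the round trip returns the same bytes
```

Under this scheme every byte becomes four bases, and decoding simply runs the table backwards, which is why the researchers could recover exactly the files they started with. Real systems are generally more elaborate: they typically add error-correcting redundancy, split data into short strands carrying address information, and avoid sequences that are hard to synthesize or sequence accurately, such as long runs of the same base.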

Easy, economical data access could also address concerns about data sovereignty, a hot-button topic in Europe, where regulators are pressing companies that hold sensitive information, such as financial services firms and health care organizations, to store it locally. DNA storage could, eventually, provide a cheaper and more eco-friendly alternative to huge server farms.

Perhaps more important, DNA could prove a far more durable storage medium than our present options. "If you look at digital data storage, it's an ephemeral thing," says Bill Peck, chief technology officer at Twist Bioscience, a San Francisco startup that's creating synthetic DNA for the Microsoft-University of Washington team. Hard disks and flash drives can crash without warning, and some last just a few years. Magnetic tape may survive a few decades, and DVDs even longer, but they are by no means immortal. Data stored in DNA, provided it's kept cold and dry, could last for thousands of years.

While the concept is promising, the technology is years, perhaps even decades, from moving out of labs and into everyday use. This convoluted process is exorbitantly expensive, above all the step of creating synthetic DNA. Synthesis has not seen the price drop that DNA sequencing has enjoyed as medical applications such as disease screening drove up demand. But if researchers can prove the viability of DNA data storage, they could spark the same sort of market dynamics for synthesis. And that could reduce the Rube Goldberg complexity as well.