DNA data
© ymgerman/Shutterstock
If we used DNA the way we use magnetic tape to store data today, it would theoretically be possible to store all of the information humans have ever recorded in a space roughly the size of a double garage.

Sharing their goals with MIT Technology Review this week, Microsoft Research computer architects say they want to start storing their data on strands of DNA within the next few years, and expect to have an operational storage system using DNA within a data centre by the end of the decade.

As antiquated as it seems, one of the best ways to store a lot of information in a small space right now is good, old-fashioned magnetic tape - not only is it cheap, it's rugged enough to hold information for up to 30 years, and can hold as much as a terabyte of data per roll.

But when we consider that more data has been generated in just the past two years than in all of human history before that, it seems even magnetic tape might not cut it in the next few decades.

A biological material such as DNA might appear to be an odd choice for backing up large amounts of digital information, yet its ability to pack enormous amounts of data in a tiny space has been clear for more than 70 years.

Back in the 1940s, physicist Erwin "cat in a box" Schrödinger proposed that a hereditary "code-script" could be packed into a non-repeating structure he described as an aperiodic crystal.

His suggestion famously inspired James Watson and Francis Crick to determine DNA's helical structure based on the research of Rosalind Franklin, sparking a revolution in understanding the mechanics of life.

While strings of nucleic acid have been used to cram information into living cells for billions of years, their role in IT data storage was demonstrated for the first time just five years ago, when a Harvard University geneticist encoded his book - including jpg data for illustrations - in just under 55,000 strands of DNA.


Since then, the technology has progressed to the point where scientists have been able to record a whopping 215 petabytes (215 million gigabytes) of information on a single gram of DNA.

It might be compact, but recording data in the form of a nucleic acid sequence isn't fast. Or cheap.

Last year, Microsoft demonstrated its DNA data storage technology by encoding roughly 200 megabytes of data, in the form of 100 literary classics, into DNA's four bases in a single process.
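To give a feel for what writing data in DNA's four bases means, here is a minimal sketch that maps every two bits of a file to one base. The mapping (A=00, C=01, G=10, T=11) and the function names are assumptions made for illustration only; this is not Microsoft's encoding, and real schemes typically add error correction and avoid troublesome repeats of the same base.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only:
# A = 00, C = 01, G = 10, T = 11.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = bytes_to_dna(b"Hi")
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

Under this toy mapping each byte becomes four bases, so the two-character message "Hi" turns into an eight-base strand and decodes back unchanged.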

According to MIT Technology Review, this process would have cost around US$800,000 using materials on the open market, meaning it would need to be thousands of times cheaper to become a competitive option.

It was also incredibly slow, with data stored at a rate of about 400 bytes per second. Microsoft says it needs to get to around 100 megabytes per second to be feasible.
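A quick back-of-the-envelope calculation puts those figures in perspective. The sketch below uses only the numbers quoted above and assumes decimal units (1 megabyte = 1,000,000 bytes); it also assumes, purely for illustration, that the 400 bytes per second rate applied to the whole 200-megabyte job.

```python
# Figures quoted in the article; decimal units are an assumption.
DATA_BYTES = 200 * 10**6    # roughly 200 megabytes encoded
COST_USD = 800_000          # estimated open-market cost
CURRENT_RATE = 400          # bytes per second achieved
TARGET_RATE = 100 * 10**6   # bytes per second Microsoft says it needs

seconds_needed = DATA_BYTES / CURRENT_RATE
days_needed = seconds_needed / 86_400          # 86,400 seconds in a day
speedup_needed = TARGET_RATE / CURRENT_RATE
cost_per_mb = COST_USD / (DATA_BYTES / 10**6)

print(f"Write time at 400 B/s: {days_needed:.1f} days")   # 5.8 days
print(f"Speed-up needed: {speedup_needed:,.0f}x")         # 250,000x
print(f"Cost per megabyte: ${cost_per_mb:,.0f}")          # $4,000
```

On these assumptions, the demonstration works out to nearly six days of writing at about US$4,000 per megabyte, and the target rate is 250,000 times faster than the one achieved.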

It's not clear what efficiencies Microsoft may have found to lower the costs of the process and speed it up, but new technologies have driven down the cost of gene sequencing in recent years, so its end-of-the-decade target may be realistic.

Even then, it's likely it would only be used in select circumstances for customers willing to pay for a specialised storage solution - like critical archives of medical or legal data - rather than as a replacement for current large-scale storage methods.

But while we're speculating, a somewhat more sci-fi use for DNA-based data storage could one day involve living computers.

While Microsoft's DNA storage solution will be based on chips, there's every possibility that future versions of storage could involve enzymes or bacteria engineered to carry out computations.

Even outside of cells, DNA potentially offers novel ways to compute data, opening ways to rapidly crunch numbers for certain problems much as quantum computers do for other areas of mathematics.

For now, it's looking as if DNA has a solid role to play in solving a very real problem that will only get worse.