
Why the future of data is in our DNA

Forget USB drives: we can now store data – even text, music and video – in DNA, writes SCIENCE AND SOCIETY

In nearly every cell of your body there is one of the most information-dense storage materials that have ever existed: DNA.

Your genetic data contains the information your body needs to grow, change and maintain each of its organs, along with a great deal more information that is apparently unused.

The data is stored as a sequence of four chemical bases, which form the rungs of the famous helical ladder of the DNA molecule.

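Each rung is a pair of bases drawn from the four we denote A, C, G and T, with A always pairing with T and C with G. The sequence of these four letters along a strand is analogous to the 0s and 1s in computer binary code.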
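To make the binary analogy concrete, here is a minimal Python sketch that packs two bits into each base. The mapping (00 → A, 01 → C, 10 → G, 11 → T) and the function names `encode` and `decode` are illustrative assumptions, not the scheme used by any particular research team.

```python
# Illustrative 2-bits-per-base mapping (an assumption for this sketch,
# not a standard): each pair of bits selects one of the four bases.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): read each base back as two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Practical schemes typically go further, adding error correction and avoiding long runs of a single repeated base, which are harder to synthesise and read accurately.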

The reading, copying and recombining of DNA is what has produced all life on Earth. What’s more, the simplicity and ubiquity of the code allow us to trace the origins of, and relationships between, every species.

The data is stored so securely that it can be retrieved from fossils. The oldest human DNA extracted so far is about 45,000 years old, although the most extreme claim comes from scientists who reported extracting bacterial DNA trapped in salt crystals 450 million years ago.

Ancient DNA research is booming, little more than 30 years after the first ancient DNA was read.

The explosion in DNA sequencing reached the mainstream in the form of at-home DNA testing kits, one of which was among Amazon’s top five Black Friday bestsellers in 2018.

These kits analyse parts of your genome and return information about your genetic predisposition to health conditions, the geographic origins of your ancestors and your relatedness to other people.

Aside from warnings about the risk of receiving a potentially life-changing result about how likely you are to develop a serious medical condition, many commentators have warned of the dangers of mass genetic decoding.

The insurance industry is looking to profit from medical genotyping, and the information gathered from an estimated 12 million new users already belongs to just a handful of companies.

And they are selling the information as fast as they can: Alphabet (Google’s parent company) and GlaxoSmithKline have deals with Ancestry.com and 23andMe respectively, the largest and second-largest DNA databases in the world.

Even without having their own genome sequenced, individuals can be pinpointed through the publicly available DNA of distant relatives, as in last year’s high-profile case of the “Golden State Killer”, identified and arrested decades after the rapes and murders at whose scenes his DNA was found.

A voluntary gene-sharing site was trawled for people whose DNA partly matched the crime-scene sample, and these multiple distant relatives were compared to narrow the search down to 17 suspects.

Even when the data comes from amateur genetics enthusiasts, the potential for individual and mass genetic surveillance appears world-changing.

FamilyTreeDNA, which allows people to search for relatives, confirmed this month that the FBI uses its dataset of two million individuals to search for suspects.

The potential dataset of all human DNA is enormous. And yet, in an age of big data, when everyday appliances can record and network vast amounts of data every millisecond, generating and manipulating information on this scale is feasible.
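A rough sense of “enormous” can be had from a back-of-envelope calculation. The sketch below assumes about 3.1 billion bases per genome, two bits per base, one uncompressed genome per person and a world population of about 7.7 billion; all of these are approximations chosen for illustration.

```python
# Back-of-envelope size of "all human DNA". Every constant is an
# assumed approximation for illustration, not a measured figure.
BASES_PER_GENOME = 3.1e9   # roughly one copy of each chromosome
BITS_PER_BASE = 2          # four possible bases -> two bits each
WORLD_POPULATION = 7.7e9   # approximate 2019 figure

bytes_per_genome = BASES_PER_GENOME * BITS_PER_BASE / 8
total_bytes = bytes_per_genome * WORLD_POPULATION

print(f"one genome: {bytes_per_genome / 1e6:.0f} MB")  # 775 MB
print(f"everyone:   {total_bytes / 1e18:.1f} EB")       # 6.0 EB
```

That comes to roughly 775 megabytes per genome and about six exabytes for everyone. Counting both copies of each chromosome would double the figure, while storing only the small fraction of positions at which people differ would shrink it dramatically.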

The only question, and it is far from trivial, is how we will store the exponentially growing volumes of data we have grown accustomed to keeping.

World data storage capacity is estimated at more than a billion terabytes (a zettabyte) and is growing by 50 per cent every year. This data is stored as 0s and 1s, either in magnetic domains or as charged and discharged cells in solid state drives.
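Growth of 50 per cent a year compounds quickly. The Python sketch below, which simply assumes that rate holds steady, works out how often capacity doubles and how much it multiplies over a decade.

```python
import math

GROWTH_RATE = 0.5  # 50 per cent a year, as cited in the text; assumed constant

# Years for capacity to double under steady compound growth:
# solve (1 + r) ** t == 2 for t.
doubling_time = math.log(2) / math.log(1 + GROWTH_RATE)

# Multiplier after ten years of the same growth.
ten_year_factor = (1 + GROWTH_RATE) ** 10

print(f"doubling time: {doubling_time:.2f} years")      # 1.71 years
print(f"growth over 10 years: x{ten_year_factor:.1f}")  # x57.7
```

At that rate capacity doubles about every 1.7 years and grows nearly 58-fold in ten years, so a billion terabytes today would become tens of billions within a decade if the trend continued.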

While improving digital storage technology is an area of intense research, our current ability to store data long-term is extremely limited.

Even magnetic tape, one of the longest-lasting current technologies, has a lifespan of only around 20 years.

Over time, magnetic domains degrade and the charge stored in solid state drives leaks away, wiping the information. Without regular back-up, rewriting and transfer, data stored on these media is eventually lost forever.

If we continue to produce unmanageably large data sets, we face an information crisis, because high-volume data must be continually copied onto fresh media, even as the amount of data at risk grows ever greater.

And this is where DNA becomes part of the picture again. Just as researchers have streamlined the reading of DNA, it is now also possible to write artificial strands of DNA that code for no living thing, encoding any data at all rather than just the blueprint for life.

Given DNA’s extremely high density, its resilience and the fact that it needs no electricity supply to preserve it, it is a resource ready to be exploited. DNA, which has stored our vital statistics for millions of years, could thus become the databank of future generations.
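The claim about density can be checked with a little chemistry. This sketch assumes an average mass of about 330 grams per mole for a single-stranded nucleotide and a perfect two bits per nucleotide, with no copies, indexing or error correction, so it gives a theoretical ceiling rather than what real systems achieve.

```python
AVOGADRO = 6.02214076e23         # molecules per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330  # assumed average mass of one single-stranded nucleotide
BITS_PER_NUCLEOTIDE = 2          # ideal case: no redundancy or error correction

nucleotides_per_gram = AVOGADRO / NUCLEOTIDE_MASS_G_PER_MOL
bytes_per_gram = nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8

print(f"{bytes_per_gram / 1e18:.0f} exabytes per gram")  # 456
# Grams needed to hold one zettabyte (10**21 bytes) at this ideal density.
print(f"{1e21 / bytes_per_gram:.1f} g per zettabyte")    # 2.2
```

On these assumptions a single gram could in principle hold around 456 exabytes, so a zettabyte would fit in a couple of grams. Practical systems store many redundant copies plus extra bookkeeping, and reach far lower densities.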

The technique still has some problems. The most serious is DNA’s slow read-write time: adding each element of the code currently takes about five minutes, instead of the microseconds taken on a hard drive, which makes it suitable only for data that need not be immediately accessible.
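To see why a five-minute step matters, the sketch below takes that figure at face value and assumes bases are added strictly one at a time, with two bits per base. The function name `serial_write_days` is hypothetical.

```python
SECONDS_PER_BASE = 5 * 60  # "about five minutes" per element, from the text
BITS_PER_BASE = 2          # assumed ideal packing, as in a simple 2-bit mapping
SECONDS_PER_DAY = 86_400


def serial_write_days(n_bytes: int) -> float:
    """Days to write n_bytes if bases were added strictly one at a time."""
    n_bases = n_bytes * 8 // BITS_PER_BASE
    return n_bases * SECONDS_PER_BASE / SECONDS_PER_DAY


print(f"1 KB: {serial_write_days(1024):.1f} days")  # 14.2 days
```

Writing a single kilobyte (1,024 bytes) this way would take about two weeks. In practice, chemical synthesis builds many strands in parallel, with each cycle adding one base to every strand at once, so real throughput is far higher than this worst case; the calculation only illustrates the gap with electronic storage.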

Once data is encoded as strands of DNA, it can be kept in a storage liquid or even reinserted into living cells: a Slovenian team showed back in 2016 that living plants could reproduce a short artificial DNA sequence in new cells with perfect accuracy.

When the data is needed, the DNA is simply read out again to reproduce the digital information.

One significant hurdle is finding specific data within the pool in which it is stored. Imagine having to print out the entire contents of your computer when you were looking for just one file.

The ability to selectively decode just one specific part of the stored DNA would give DNA data storage random access: the ability to retrieve any item directly, without reading everything else.

It is this ability that researchers demonstrated last year. A team at the University of Washington, working with Microsoft researchers, showed that they could store 35 separate files, including text and a music video, in DNA and then read them back out selectively, paving the way for true DNA memory.
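The idea behind random access can be modelled in a few lines of Python. In this simplified sketch, each strand carries a hypothetical file-specific address tag (standing in for the short sequences that chemical methods such as PCR use to copy only the wanted strands), an index giving its position in the file, and a chunk of data. The names `Strand`, `write_file` and `read_file`, the tags and the chunk size are all illustrative assumptions, not the University of Washington and Microsoft team’s actual protocol.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    address: str  # hypothetical file-specific tag (stands in for a primer site)
    index: int    # position of this chunk within its file
    payload: str  # the data bases carried by this strand


def write_file(pool: list[Strand], address: str, bases: str, chunk: int = 8) -> None:
    """Split a file's bases into short strands and add them to the pool."""
    for i in range(0, len(bases), chunk):
        pool.append(Strand(address, i // chunk, bases[i:i + chunk]))


def read_file(pool: list[Strand], address: str) -> str:
    """Keep only strands carrying the address, then reassemble them by index."""
    selected = [s for s in pool if s.address == address]
    return "".join(s.payload for s in sorted(selected, key=lambda s: s.index))


pool: list[Strand] = []
write_file(pool, "ACGTACGT", "CAGACGGCTTAACCGGAT")
write_file(pool, "TTGGCCAA", "GATTACAGATTACA")
random.shuffle(pool)  # strands in a tube have no physical order
print(read_file(pool, "TTGGCCAA"))  # GATTACAGATTACA
```

The index matters because strands floating in a tube have no physical order. The model is also a simplification in one key respect: the list comprehension still inspects every strand, whereas the point of chemical selection is that only the tagged strands are copied and sequenced.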

It remains to be seen whether the technology can be streamlined to make it marketable, but for long-term mass data storage the future may be in our DNA.

With the ability to store information on this scale, the data age might become sustainable in the long term. Once data can be recorded in molecules that last untouched for hundreds of generations, every piece of data could be recorded by default.

As each step increases our capacity for indiscriminate and indefinite data storage, questions about whose data is kept, and who has access to it, become ever more pressing.

Science and Society is a fortnightly Morning Star column from Joel Hellewell, Rox Middleton and Liam Shaw.
