Test Tube Holds Artificial DNA Encoding Thousands of Copies of the Bible in Two Languages

Let all the food of these good years that are coming be gathered, and let the grain be collected under Pharaoh’s authority as food to be stored in the cities. Genesis 41:35 (The Israel Bible™)

The world is choking on digital data, the information that appears on the Internet, from YouTube and Wikipedia to social networks like Facebook. Where is all that stored? On server “farms”: massive warehouses filled with servers. A server is a computer program or device that provides a service to another computer program and its user, also known as the client. In a data center, the physical computer that a server program runs on is also frequently referred to as a server.

Not only do they take up space, but they also consume a huge amount of electricity and create pollution. They account for about 2% of all global carbon emissions, a rate similar to the cumulative emissions of global air traffic, and for about 3% of global electricity consumption, which is more than the electricity consumption of the entire UK.

But now, researchers at the Technion-Israel Institute of Technology in Haifa and the Interdisciplinary Center (IDC) in Herzliya have shown how to store such information on “artificial DNA.”  

In living things, including humans, sequences of DNA make up the genes found in all cells. Genes contain genetic information and can influence the phenotype (the set of observable characteristics of an individual resulting from the interaction of its genotype with the environment).

The Israeli researchers, who showed significant improvement in the efficiency of the process needed to store digital information in artificial DNA, have just published their research in the prestigious journal Nature Biotechnology.

On one of the shelves in the Technion lab sits a small test tube containing about 10 nanograms (billionths of a gram) of artificial DNA, encoding thousands of copies of a bilingual version of the Bible.

They showed that computerized data can be stored at a density of more than 10 petabytes (one petabyte, or PB, is one million gigabytes) in a single gram of DNA while significantly improving the writing process. Such density means that all the information stored on YouTube could be placed in a single teaspoon.
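To get a feel for what such a density means at laboratory scale, the short sketch below converts a density and a sample mass into a capacity. It is only unit arithmetic under stated assumptions: it uses exactly 10 PB per gram (the article says "more than"), decimal units, and a hypothetical helper name, and it says nothing about how any actual sample was prepared.

```python
# Unit arithmetic only; capacity_bytes is a hypothetical helper name.
BYTES_PER_PETABYTE = 10**15   # one petabyte = one million gigabytes (decimal units)
NANOGRAMS_PER_GRAM = 10**9


def capacity_bytes(density_pb_per_gram: int, mass_nanograms: int) -> int:
    """Bytes that fit in a DNA sample of the given mass at the given density."""
    bytes_per_nanogram = density_pb_per_gram * BYTES_PER_PETABYTE // NANOGRAMS_PER_GRAM
    return bytes_per_nanogram * mass_nanograms


# Assumption: exactly 10 PB per gram and a 10-nanogram sample.
print(f"{capacity_bytes(10, 10) / 10**6} MB")  # prints "100.0 MB"
```

At that assumed density, 10 nanograms of DNA would correspond to roughly 100 megabytes of raw capacity.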

The study was led by Leon Anavy, a research student in the Technion Faculty of Computer Science, under the guidance of Prof. Zohar Yakhini of the Technion Faculty of Computer Science and the Efi Arazi School of Computer Science at the Interdisciplinary Center Herzliya. The study was conducted in collaboration with Prof. Roee Amit’s Synthetic Biology Laboratory at the Technion Faculty of Biotechnology and Food Engineering.

The amount of digital information available to humanity has grown at a tremendous speed since IBM invented the hard disk in the 1950s. Storing this information has become a major challenge, not only in the technological context but also with regard to economic and environmental aspects. 

Server farms are currently responsible for about 2% of global carbon emissions. Against this backdrop, a new technological approach has developed over the last decade: information storage in DNA. This technology allows for significant miniaturization, longer-term (thousand-fold) retention of information, and zero energy and economic cost of maintenance.

The basic idea of encoding information on DNA is that the DNA molecule is a chain made up of links called nucleotides. The nucleotides are divided into four types marked with letters A, C, G and T. To store information on DNA, each binary sequence (consisting of the 0 and 1 symbols) must be translated into a sequence consisting of these letters. In the next step, in a process called synthesis, actual DNA molecules are produced representing these same sequences. To read the data, these DNA molecules are sequenced. DNA sequencing produces an output that represents the nucleotide sequence that makes up each molecule in the input. That output is then translated into a binary sequence that represents the original message that was coded. Modern technologies support the synthesis of many thousands of different nucleotide series in parallel.
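The translation between binary data and nucleotide letters can be sketched in a few lines. The mapping below (two bits per letter, with 00, 01, 10, 11 assigned to A, C, G, T) is an assumption chosen for illustration, not the scheme used in the study, and the synthesis and sequencing steps are of course not simulated.

```python
# Illustrative two-bits-per-nucleotide mapping; the assignment is an assumption.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of the letters A, C, G and T."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


dna = encode(b"Hi")
print(dna)          # CAGACGGC
print(decode(dna))  # b'Hi'
```

Each byte becomes four letters under this mapping, which is the two-bits-per-letter baseline of a four-letter alphabet.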

The storage of information on artificial genes is a very complex technological challenge. In the field of information reading (sequencing), there has been tremendous progress driven by the genome revolution; for the writing of information, however, there are still significant technological difficulties and the costs are higher. This is where the importance of the breakthrough achieved at the Technion and IDC Herzliya lies. It allows for: (1) increasing the number of letters used to encode the information (beyond the original 4 letters); (2) significantly reducing the number of synthesis rounds required to store information on DNA; (3) improving the error correction mechanism used.

The Israeli scientists have increased the effective number of letters beyond the four building blocks in natural DNA, using new letters that are unique combinations of the original letters. The idea is similar to the formation of new colors using mixtures of base colors. Increasing the number of letters allows more information to be encoded in each letter in the sequence.
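The color-mixing analogy can be made concrete with a toy model. In the sketch below, each letter is a set of mixing ratios over the four bases, and a letter is recovered by comparing those ratios with the bases observed at one position across many sequenced copies. The six-letter alphabet, the 50/50 ratios, the names M and K, and the nearest-ratio rule are all assumptions made for illustration; the article does not describe the researchers' actual alphabet or decoding method.

```python
import math
from collections import Counter

# Hypothetical alphabet: the four natural letters plus two 50/50 mixtures.
ALPHABET = {
    "A": {"A": 1.0},
    "C": {"C": 1.0},
    "G": {"G": 1.0},
    "T": {"T": 1.0},
    "M": {"A": 0.5, "C": 0.5},  # assumed mixture of A and C
    "K": {"G": 0.5, "T": 0.5},  # assumed mixture of G and T
}


def call_letter(observed_bases: str) -> str:
    """Return the letter whose mixing ratios best match the observed bases."""
    if not observed_bases:
        raise ValueError("need at least one observed base")
    counts = Counter(observed_bases)
    total = len(observed_bases)

    def distance(ratios: dict) -> float:
        # Squared distance between observed frequencies and expected ratios.
        return sum((counts.get(b, 0) / total - ratios.get(b, 0.0)) ** 2
                   for b in "ACGT")

    return min(ALPHABET, key=lambda letter: distance(ALPHABET[letter]))


print(call_letter("AAAAAAAAAA"))  # A
print(call_letter("ACAACCACCA"))  # M
print(call_letter("GGTTGTGTGA"))  # K (one stray A is tolerated)
print(round(math.log2(len(ALPHABET)), 3))  # 2.585 bits per letter
```

With six letters instead of four, each position can carry about 2.585 bits rather than 2, which is the sense in which a larger alphabet encodes more information per letter.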

According to Yakhini, “The current synthesis and sequencing processes are inherently redundant because each molecule is produced in large numbers and is read in multiple copies during sequencing. The method we developed leverages this redundancy to increase the effective number of letters well over the original four letters, making it possible for us to encode and write each unit of information in fewer cycles of synthesis.”

The team demonstrated a reduction of the number of synthesis rounds required per unit of information by 20%. They also showed that the number of synthesis rounds could be reduced in the future by 75% without significant development efforts. This means that the storage process will be faster and less expensive.
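As a back-of-the-envelope illustration of how alphabet size relates to synthesis rounds, the sketch below uses an idealized model that is not taken from the study: one synthesis round writes one letter, a four-letter alphabet carries 2 bits per letter, and error-correction overhead is ignored. The function names are hypothetical.

```python
import math

BASELINE_BITS_PER_LETTER = math.log2(4)  # 2 bits with the letters A, C, G, T


def cycle_reduction(alphabet_size: int) -> float:
    """Fractional saving in synthesis rounds per bit versus four letters."""
    return 1 - BASELINE_BITS_PER_LETTER / math.log2(alphabet_size)


def alphabet_size_for(reduction: float) -> float:
    """Effective alphabet size needed for a given fractional saving."""
    return 2 ** (BASELINE_BITS_PER_LETTER / (1 - reduction))


print(round(cycle_reduction(6), 3))      # saving with a six-letter alphabet
print(round(alphabet_size_for(0.2), 2))  # letters needed for a 20% saving
print(alphabet_size_for(0.75))           # letters needed for a 75% saving
```

Under these assumptions, a 20% reduction corresponds to an effective alphabet of about 5.7 letters and a 75% reduction to 256 letters. Real figures would differ once coding overhead is included.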

“In this work, we have implemented a DNA-based storage system that encodes information with synthesis efficiency that is significantly better than the standard approach,” explained Amit. “The study included the actual implementation of the new coding technique for storing large-volume information on DNA molecules and reconstructing it for testing the process.”

The research group has developed advanced error correction mechanisms to overcome errors that are an integral part of biological-physical processes, like the one used here. Part of the DNA sequence of the molecules that store the information, designed by Anavy and Yakhini, is used for this error correction.

According to Anavy, “thanks to the use of error-correction codes that are tailored to the unique encoding we created, we were able to perform highly efficient coding and to successfully recover the information. When working in a system consisting of millions of parts (molecules), even one-in-a-million events occur, which can disrupt the reading. Careful coding allowed us to overcome these problems.” 
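The general principle of setting aside part of a sequence for error correction can be illustrated with a textbook code. The sketch below uses the classic Hamming(7,4) code on bits, which adds three parity bits to every four data bits and can repair any single flipped bit. It is a stand-in chosen for brevity, not the tailored error-correction codes the researchers describe, which the article does not specify.

```python
def hamming74_encode(data):
    """Encode four data bits as a seven-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the four data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
received = hamming74_encode(data)
received[4] ^= 1  # simulate a single read error
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

The trade-off is the same one the article points to: some of the stored sequence carries redundancy rather than payload, in exchange for recovering the message when rare errors occur.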

According to the researchers, “the technology we presented in the paper has the potential to streamline further processes in synthetic biology and biotechnology. We believe that in the coming years, we will see a significant increase in the use of synthetic DNA in research and industry.”