Will DNA Data Storage Be The Future?

Contents

  1. What Is Data?
  2. What is DNA and why are we storing data in it?
  3. Storing the DNA
  4. Why is DNA a good alternative?
  5. Other Alternatives?
  6. How would you encode Data into DNA?
  7. Why aren’t we using DNA storage now?
  8. Conclusion
  9. References

What Is Data?

Computers store all data in a base-2 system called binary, which has only two values, 1 and 0. In hardware, these values are represented by transistors acting as switches: when the switch is on, it represents a 1, and when it is off, a 0. A computer’s memory is essentially a large collection of these binary digits, grouped into sets called ‘bytes’.

Each binary digit is one bit, and 8 bits make a byte. From there we have the kilobyte (KB, 1024 bytes), the megabyte (MB, 1024 KB) and so on. A single byte can represent any number from 0 to 255. This is how we store data in the modern age, as a collection of 1s and 0s, with the amount of data held on the internet estimated to exceed 1.2 million terabytes [1].
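As a small illustration of the paragraph above, the following Python sketch shows the range of a single byte and how a piece of text becomes a string of bits. The helper name `to_bits` is made up for this example, and UTF-8 is assumed as the text encoding.

```python
# Hypothetical helper for illustration: turns text into a string of 1s and 0s.
def to_bits(text):
    # Encode the text as bytes (UTF-8 assumed), then write each byte as 8 binary digits
    return ''.join(format(byte, '08b') for byte in text.encode('utf-8'))

# One byte is 8 bits, so it can hold 2**8 = 256 different values (0 to 255)
print(2 ** 8 - 1)      # largest value a single byte can hold
print(to_bits("Hi"))   # 'H' is 72 and 'i' is 105, each written as 8 bits
```

Every character here takes up exactly one byte, so the two-letter string becomes 16 bits.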

What is DNA and why are we storing data in it?

DNA (deoxyribonucleic acid) is a self-replicating biological molecule that carries the genetic information required to build life. It is often described as the “building blocks” of our biology and is the foundation of all natural life. It is made up of units called “nucleotides”, each containing one of 4 nitrogen-containing bases: Adenine (A), Thymine (T), Cytosine (C) and Guanine (G).

Figure 1: DNA Structure

Since DNA is the foundation of all life, it has to be incredibly efficient at storing and duplicating information. This stored information is used as a “blueprint” to build everything in our bodies, from the epidermis to the leukocytes. Over the millennia, DNA has evolved to become one of the most efficient ways of storing information that mother nature has created.

Storing the DNA

DNA storage is the process of encoding binary data (bytes) from data servers and storing it in the ATGC combinations of synthesised DNA. Currently, the stored sequence is read from a single strand of DNA by labelling the individual ATGC bases with fluorescent markers [2]. The fluorescent markers can then be detected with scanners and cameras to decode the DNA sequence.

Why is DNA a good alternative?

DNA is capable of storing incredible amounts of data in a small space. In 2017, a collaboration between Columbia University and the New York Genome Center published a method known as ‘DNA Fountain’ [3], in which data was stored at a density of 215 petabytes (2.15 x 10^8 GB) per gram of DNA. This exceeded the scientists’ expectations, achieving 85% of the theoretical limit of 253 petabytes per gram given by the Shannon limit (the maximum amount of data that can be sent over a channel with a very small error rate [4]).
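The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the text, and it assumes decimal units (1 petabyte = 1,000,000 gigabytes); the variable names are made up for illustration.

```python
# Figures quoted in the text above
achieved_pb_per_gram = 215      # DNA Fountain result
theoretical_pb_per_gram = 253   # theoretical limit quoted in the text

# Share of the theoretical limit that was reached
fraction = achieved_pb_per_gram / theoretical_pb_per_gram
print(f"{fraction:.0%}")

# Assumption: decimal units, so 1 petabyte = 1,000,000 gigabytes
achieved_gb_per_gram = achieved_pb_per_gram * 1_000_000
print(f"{achieved_gb_per_gram:.2e} GB per gram")
```

The ratio comes out at roughly 85%, which matches the figure given in the study.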

DNA also needs very little energy to keep in storage, which reduces the resources required to keep the data clean and error free. It only requires a room kept at -18 °C to survive for what has been estimated at more than a million years [5]. Because of this, an automated, temperature-controlled room, with inert materials to embed the molecules in, would be enough to preserve the DNA with almost no maintenance over time.

Other Alternatives?

Optical

In the mid-20th century, the first optical disc was created. Over time, optical discs became more sophisticated and efficient, eventually becoming the DVDs we used in the recent past. To write data, a laser beam burns ‘pits’ into the tracks that divide up the disc [6]; the parts that are not indented are called ‘lands’. To read the data, a low-power laser scans the tracks, and the variations in reflected light intensity caused by the pits and lands are converted into electrical signals, which are read as ‘0’s and ‘1’s. At first this technology could only store data once and read it back, but as time went by, the discs became rewritable as well.

While they are an amazing non-volatile (not vulnerable to data loss due to power failure) way of storing data, unfortunately, they have already become largely obsolete. The invention of other, more efficient methods of data storage, such as flash memory, has pushed optical storage aside because of its bulky size and slower read/write speeds [6]. So while optical discs may have been the superior storage medium in the past, they are now becoming redundant.

Figure 2: Optical Disc - Encyclopaedia Britannica

Magnetic

Magnetic drives, or HDDs (hard disk drives) as we usually refer to them, are the most conventional type of non-volatile storage media in use today. They can store massive amounts of data, and thanks to this technology even everyday computers can have many terabytes of storage. They work, as the name suggests, using magnetism. They consist of a number of ‘disks’ coated in a magnetic material. The changes in the direction of the magnetic field are detected and recorded as bits, being either a ‘1’ or a ‘0’ [7].

While this is the most used type of storage, HDDs may also become obsolete in the future due to the rise of solid-state memory. As an HDD has moving parts, it wears out faster over time than solid-state storage.

Flash

The most modern type of non-volatile storage is flash memory. Currently, it is mostly used in compact SD cards and flash drives; however, it is now also being built into everyday desktop computers as SSDs (solid-state drives). A major benefit of an SSD, apart from its small size, is that it has no moving parts. While both optical and magnetic drives have spinning disks, a solid-state drive works using floating gate transistors, relying on Heisenberg’s uncertainty principle and quantum tunneling.

Figure 3: Floating Gate Transistors - Flashdba

While this is very complicated and mostly goes over my head as it involves quantum mechanics, I’ll try to explain it as best as possible. When a high voltage is applied to the transistor, the electrons gain energy. If an electron has enough energy and the oxide layer separating it from the floating gate is thin enough, it can pass through the oxide layer into the floating gate. This is called quantum tunneling, and it is permitted by the Heisenberg uncertainty principle [8]. Once inside the gate, the electrons are trapped by the oxide layer, so the charge remains even when the power is off, which is what makes the memory non-volatile. The trapped charge fixes the transistor in the ‘on’ or ‘off’ position, a ‘1’ or a ‘0’.

However, flash memory does degrade over time. This is not due to its age but to the number of write cycles it has executed, as the high voltages used for tunneling eventually wear out the transistor [8].

How would you encode Data into DNA?

In bioinformatics, the standard text file format used to store the nucleotide sequence of DNA is the FASTA file. It is convenient for scientists because nucleotides or amino acids are represented by single-letter codes, which makes sequences easier to read and requires less memory to store.

A simplified example of how binary data could be encoded as a nucleotide sequence:

# Defines the function where the binary to DNA conversion takes place
def DNA(Binary):
    # Maps each 2-bit pair to one of the four bases
    Conversion = {
        "00": "A",
        "01": "T",
        "10": "G",
        "11": "C",
    }

    # Each base encodes exactly two bits, so the input needs an even number of bits
    if len(Binary) % 2 != 0:
        raise ValueError("Binary string must contain an even number of bits")

    # Creates an empty list to collect the converted bases
    Output = []
    # Steps through the input two bits at a time
    for i in range(0, len(Binary), 2):
        # Looks up the base for the current 2-bit pair directly in the dictionary
        Output.append(Conversion[Binary[i:i+2]])
    # Joins the list of bases into a single string and prints it
    print(''.join(Output))

DNA("10001100110111100111")

# Output: GACACTCGTC

To improve on this, a FASTA file reader could be added to retrieve nucleotide data from a file, along with a more efficient way to encode the binary.
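As a rough sketch of what such a reader might look like, the snippet below pulls the sequence out of FASTA-formatted lines and converts the bases back into bits, using the reverse of the 2-bit mapping from the example above. The function names `read_fasta` and `decode`, and the record shown, are made up for illustration; real FASTA files can hold several records, which this sketch does not handle.

```python
# Reverse of the 2-bit mapping used in the encoding example
BASE_TO_BITS = {"A": "00", "T": "01", "G": "10", "C": "11"}

def read_fasta(lines):
    """Collects the sequence lines of FASTA text, skipping '>' header lines."""
    sequence = []
    for line in lines:
        line = line.strip()
        # Header lines start with '>' and carry a description, not bases
        if line and not line.startswith(">"):
            sequence.append(line.upper())
    return "".join(sequence)

def decode(sequence):
    """Turns a string of bases back into a string of bits."""
    return "".join(BASE_TO_BITS[base] for base in sequence)

# Made-up single-record FASTA content, split over two sequence lines
example = [">example_record", "GACAC", "TCGTC"]
print(decode(read_fasta(example)))
```

Because `read_fasta` accepts any iterable of lines, an open file object can be passed in the same way as the list used here.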

Once the sequence has been encoded, CRISPR gene-editing technology can be used to insert the synthetic DNA into the genome of a cell, where it can be stored securely [9].

Why aren’t we using DNA storage now?

One of the reasons we can’t use DNA storage right now is mutation. DNA is prone to mutations, which cause errors in the bases and therefore errors in the data. The problematic mutations are insertions, deletions or substitutions of nucleotide bases. However, there have been improvements in this area, with Reed–Solomon error-correcting codes being used to detect and correct these errors [10].
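To give a feel for how error correction can undo a mutation, here is a toy sketch. It is not Reed–Solomon, which is considerably more involved. Instead it uses a much simpler repetition code with a majority vote, which only copes with substitutions and is far less space efficient. The function names are made up for illustration.

```python
from collections import Counter

def encode_repeat(sequence, copies=3):
    """Repeats every base `copies` times, e.g. 'GA' -> 'GGGAAA'."""
    return "".join(base * copies for base in sequence)

def decode_repeat(sequence, copies=3):
    """Takes a majority vote over each group of `copies` bases."""
    decoded = []
    for i in range(0, len(sequence), copies):
        group = sequence[i:i + copies]
        # The most common base in the group wins the vote
        decoded.append(Counter(group).most_common(1)[0][0])
    return "".join(decoded)

stored = encode_repeat("GACA")             # each base written three times
mutated = stored[:4] + "T" + stored[5:]    # substitution: one A becomes a T
print(decode_repeat(mutated))              # the vote still recovers the original
```

An insertion or deletion would shift every later group out of step, which is one reason real schemes rely on stronger codes.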

Another problem with DNA storage is the price. In the DNA Fountain study [3], it cost $7000 (£5230) to synthesise 2 MB of data and an additional $2000 (£1490) to decode it.
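To put those numbers in perspective, the short calculation below scales them up to a gigabyte. It is a back-of-the-envelope sketch only: it assumes the cost grows linearly with the amount of data and that 1 GB is 1024 MB, and the variable names are made up for illustration.

```python
synthesis_cost_usd = 7000   # cost to write the data, from the study quoted above
decoding_cost_usd = 2000    # cost to read it back
data_mb = 2                 # amount of data stored in the study

cost_per_mb = (synthesis_cost_usd + decoding_cost_usd) / data_mb
# Assumptions: 1 GB = 1024 MB, and cost scales linearly with data size
cost_per_gb = cost_per_mb * 1024
print(f"${cost_per_mb:,.0f} per MB, ${cost_per_gb:,.0f} per GB")
```

Even as a rough estimate, millions of dollars per gigabyte shows how far the technology is from everyday use.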

A paper published by Microsoft in collaboration with the University of Washington in 2019 [11] demonstrated a fully automated system to encode data into DNA and decode it again. However, this is still in the early stages and requires more research and development.

Conclusion

DNA storage is a very useful alternative and may be the future of data storage. It overcomes flash memory’s problem of degradation over time, as DNA can be stored for millions of years, and it overcomes the bulk of optical and magnetic media by being incredibly efficient at storing data in a very small space. It is still in development and is not currently as convenient as magnetic, optical and flash storage, as it still has problems with errors and with overwriting the data that’s on the DNA. While I do see a future where it could be used in data centers, reducing the space required to store data, I don’t think we will see it used in everyday computers any time soon, as it is not feasible right now.

However, DNA data storage is a very exciting piece of technology, and it is always amazing to see how incredible nature really is at tasks that it was not meant to handle in the first place. No one would have thought that you could use the DNA that holds your own genetic material to store bits made up of 0s and 1s, yet here we are, on the verge of such brilliant technology.

By Owais M Siddiqi


References

[1] Science Focus, Written by Gareth Mitchell, How much data is on the internet?, [Accessed August 26, 2020]

[2] Ceze L, Nivala J, Strauss K (August 2019). “Molecular digital data storage using DNA”. Nature Reviews. Genetics. 20 (8): 456–466.

[3] Erlich Y, Zielinski D (March 2017). “DNA Fountain enables a robust and efficient storage architecture”. Science. 355 (6328): 950–954.

[4] Saleem Bhatti. “Channel capacity”. Lecture notes for M.Sc. Data Communication Networks and Distributed Systems D51 – Basic Communications and Networks

[5] Reuters, Science News, Written by Matthew Stock, DNA data storage could last thousands of years, published 22 March, 2016 [Accessed August 27, 2020]

[6] Britannica, Written by The Editors of Encyclopaedia Britannica, Optical storage, published 9 July, 2020 [Accessed August 27, 2020]

[7] Chron, Written by Jane Williams, Are Flash Drives Optical or Magnetic?, [Accessed August 27, 2020]

[8] Bez, R., Camerlenghi, E., Modelli, A. and Visconti, A., 2003. Introduction to flash memory. Proceedings of the IEEE, 91(4), pp.489-502.

[9] Ceze L, Nivala J, Strauss K (August 2019). “Molecular digital data storage using DNA”. Nature Reviews. Genetics. 20 (8): 456–466.

[10] Grass RN, Heckel R, Puddu M, Paunescu D, Stark WJ (February 2015). “Robust chemical preservation of digital information on DNA in silica with error-correcting codes”. Angewandte Chemie. 54 (8): 2552–5.

[11] Microsoft, Written by Jennifer Langston, “Microsoft, UW demonstrate first fully automated DNA data storage”, Innovation Stories, published March 21, 2019 [Accessed August 29, 2020]