This post is part of a series on the fact and fiction of biological computing.
How DNA used to be read
In 1977, Fred Sanger and friends changed science when they discovered a way to read DNA sequences. The ability to read DNA sequences meant that scientists could start to unlock the DNA code. They could start to work out things that we now take for granted, like the sequence of individual genes, how genes are regulated, how much of the genome codes for genes, how much codes for other stuff, and what that other stuff is.
Sanger’s method is, unsurprisingly, called Sanger sequencing. Although it is still used in some labs, and is still the best way to read long sequences of DNA (over 500 base pairs long), it has largely been superseded by Next-Generation sequencing.
In Sanger sequencing, DNA is first denatured, or separated, into single strands. A short sequence of DNA called a primer binds to the separated DNA strands. Then a DNA polymerase enzyme extends the primer by reading the DNA sequence and adding complementary DNA letters. (DNA is made up of two strands of DNA bases, or letters, that are bound together: A always binds with T, and C always binds with G.)
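The pairing rule is simple enough to write down as a lookup table. Here is a minimal sketch in Python of what the polymerase is doing when it adds complementary letters. The names `PAIRS` and `complement` are made up for illustration, and the sketch ignores the fact that the two strands of real DNA run in opposite directions.

```python
# Pairing rules from the text: A binds with T, C binds with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(template: str) -> str:
    """Return the letter that pairs with each position of the template."""
    return "".join(PAIRS[base] for base in template)


template = "ATGCCGTA"
new_strand = complement(template)
print(template)    # the strand being read
print(new_strand)  # the strand being built, letter by letter
```

Taking the complement twice gives back the original strand, which is why either strand carries enough information to rebuild the other.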
The extension reaction includes two versions of each DNA letter. One version is normal and the other is fluorescently labelled and chemically altered so that no other base can be added after it. That means that every time a fluorescent base is added to the growing strand of DNA, it terminates the DNA copying reaction.
These DNA segments are then separated by running them along a gel in a thin capillary tube. Smaller segments run faster than larger segments. The separated segments are excited with light. Each segment ends with one of the fluorescently labelled DNA bases, and each base emits light with a different wavelength. This is detected as different coloured peaks – see the picture above. The coloured peaks are then lined up into a continuous trace (shortest segment to longest segment), which identifies which base is at each position in a sequence of DNA.
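To see why sorting the segments by size spells out the sequence, here is a toy simulation of the idea, not a model of a real sequencing machine. The function names, the 10% chance of picking up a terminating base, and the number of copies are all assumptions chosen for illustration.

```python
import random

# Pairing rules: A binds with T, C binds with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template, n_copies=2000, p_terminate=0.1, seed=0):
    """Simulate many copying reactions that each stop at a random labelled base.

    Returns a list of (segment_length, labelled_end_base) pairs.
    Copies that never pick up a labelled base are ignored.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        for position, base in enumerate(template, start=1):
            # Assumed 10% chance that the labelled, chain-ending version is added.
            if rng.random() < p_terminate:
                fragments.append((position, PAIRS[base]))
                break
    return fragments


def read_trace(fragments):
    """Order segments shortest to longest and read the label on each end."""
    label_at_length = {}
    for length, end_base in fragments:
        label_at_length[length] = end_base
    # A length with no segments would simply be missing from the trace.
    return "".join(label_at_length[n] for n in sorted(label_at_length))


template = "ATGCCGTAGGCT"
print(read_trace(sanger_fragments(template)))
```

Every segment of length 5 ends in the same labelled letter, because they all stopped at the same position on the template. Reading the end labels from the shortest segment to the longest therefore gives the newly built strand one letter at a time.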
How DNA is read now
Most labs now use Next-Generation sequencing to sequence DNA. The human genome can be sequenced in several hours for $1000 using Next-Generation sequencing.
In this type of sequencing, DNA is first denatured, fragmented into small pieces and tagged with adaptors. The DNA is then put onto a glass slide that has lanes coated in short pieces of DNA (oligonucleotides) that bind to the adaptors on the fragments, which holds the fragments in place on the slide. DNA polymerase then makes many copies of each fragment, which is called cloning or amplifying the DNA. Next, sequencing primers are added and DNA polymerase again copies the fragments by adding DNA bases to the end of the primers. This time, just like in Sanger sequencing, fluorescently tagged bases are used. The difference is that the growing DNA strand is excited with light every time a new base is added. The new base emits a characteristic signal that tells the machine which base it is: A, T, C or G. So as the copies grow, their sequence is read in real time. This is called sequencing by synthesis.
Once each fragment is fully copied (or read), an indexing primer is added and a very short code sequence is copied, which tells scientists which sample each DNA fragment came from. Then the reads are lined up, by matching their overlaps or by comparing them against a reference genome, and turned back into one continuous DNA sequence.
It sounds slow, but the sequencing process is happening for millions of short segments of DNA at the same time, so it is faster and cheaper than Sanger sequencing. It is, however, still far too slow to be a useful way to read computer files.
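The reason the parallelism matters can be shown with a small sketch. In this toy version of sequencing by synthesis, every fragment on the slide gains one base per cycle and the "machine" records all of them at once. The function name and the three example fragments are made up for illustration; a real run handles millions of fragments.

```python
# Pairing rules: A binds with T, C binds with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(fragments, n_cycles):
    """Read every fragment one base per cycle, all fragments in the same cycle."""
    reads = ["" for _ in fragments]
    for cycle in range(n_cycles):
        # One cycle: add a tagged base to every growing strand, then take a picture.
        for i, fragment in enumerate(fragments):
            if cycle < len(fragment):
                # The colour of the newly added base tells the machine its letter.
                reads[i] += PAIRS[fragment[cycle]]
    return reads


fragments = ["ATGC", "GGCA", "TTAC"]
reads = sequence_by_synthesis(fragments, n_cycles=4)
for fragment, read in zip(fragments, reads):
    print(fragment, "->", read)
```

The number of cycles depends on how long the fragments are, not on how many there are. Adding more fragments to the slide adds more reads without adding more cycles, which is where the speed comes from.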
Here’s a video from ImGenTechWPI showing the difference between Sanger and Next-Generation sequencing:
You can find the other posts in the bio-computing series here: