Hey everyone. Sorry for the late reply; I was out of town all weekend. Sections 8.3-8.5 are pretty basic, just giving a bit of biological background on how scientists go about determining the A, G, C, T letter sequence of an unknown DNA sample.
8.3 – DNA Sequencing
This section talks about fragment assembly, a method of DNA sequencing in which one determines an entire genome by cutting it into thousands of short DNA fragments and then reassembling them based on their overlaps. One extremely successful method of DNA sequencing, and probably the most renowned, was developed by Fred Sanger. By leaving out one of the four bases (A, G, C, or T) when making copies of the unknown DNA sequence, and running this experiment once for each base, Sanger found that he could produce copies of different lengths, each stopping just before an occurrence of the left-out base. He then sorted these fragments by length and read off the sequence accordingly. In other words, each location of the left-out base is marked by the length of one of the fragments. For example, when T is left out, the sequence ACGTAAGCTA yields the fragments ACG and ACGTAAGC, showing that the base following G (at the end of the first fragment) and the base following C (at the end of the second) is a T. If we took the same sequence, ACGTAAGCTA, and left out G, we would get back the fragments AC and ACGTAA. Once we’ve run the experiment four times, we combine the four “ladders” of fragments: sorting all fragments by length and noting which experiment produced each one reconstructs the original sequence one position at a time. Though the process seems simple enough, it becomes complicated in practice because billions of fragments must be measured in order to read the ladder.
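The ladder-reading step can be sketched in a few lines of Python. This is an idealized model, assuming (as in the example above) that each experiment returns exactly the prefixes that stop just before the left-out base, with no measurement error; the function names `sanger_fragments` and `read_ladder` are hypothetical.

```python
def sanger_fragments(seq, base):
    """Idealized experiment: prefixes of seq that stop just before each `base`."""
    return [seq[:i] for i, b in enumerate(seq) if b == base]

def read_ladder(ladders, n):
    """Rebuild a length-n sequence. A fragment of length k from the
    experiment that left out base X means position k (0-indexed) holds X."""
    result = ["?"] * n
    for base, fragments in ladders.items():
        for fragment in fragments:
            result[len(fragment)] = base
    return "".join(result)

seq = "ACGTAAGCTA"
ladders = {b: sanger_fragments(seq, b) for b in "ACGT"}
print(ladders["T"])                     # ['ACG', 'ACGTAAGC']
print(ladders["G"])                     # ['AC', 'ACGTAA']
print(read_ladder(ladders, len(seq)))   # ACGTAAGCTA
```

Note that the reconstruction only uses fragment lengths, never their contents: that is why sorting by length is enough to read the sequence.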
8.4 – Shortest Superstring Problem
A sequence that contains every one of the readings as a substring is known as a superstring. There can be more than one superstring for any given set of data, but the most useful one is the shortest, which overlaps the fragments as much as possible so that nothing is repeated unnecessarily. For example, given the set of strings {000, 001, 010, 011, 100, 101, 110, 111}, we can create a superstring by simply linking the fragments one after another to get the 24-character sequence 000 001 010 011 100 101 110 111. However, by taking the overlaps between our eight strings into account, we can see that the 10-character string 0001110100 also contains every string from the original set, and does so far more compactly. This shortest superstring turns out to be a reasonable first guess at the unknown genomic DNA sequence.
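To make the overlap idea concrete, here is a brute-force sketch in Python: it tries every ordering of the fragments, merges neighbours using their maximal overlap, and keeps the shortest result. The helper names are hypothetical, and it assumes no fragment is a substring of another. It only scales to tiny inputs, since it tries n! orderings; the general shortest superstring problem is NP-complete, so practical assemblers rely on heuristics instead.

```python
from itertools import permutations, product

def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def shortest_superstring(strings):
    """Try every order, gluing each string onto the previous one with maximal overlap."""
    best = None
    for order in permutations(strings):
        s = order[0]
        for prev, nxt in zip(order, order[1:]):
            s += nxt[overlap(prev, nxt):]  # append only the non-overlapping tail
        if best is None or len(s) < len(best):
            best = s
    return best

reads = ["".join(bits) for bits in product("01", repeat=3)]  # 000, 001, ..., 111
print(len("".join(reads)))                      # 24 (plain concatenation)
sup = shortest_superstring(reads)
print(len(sup))                                 # 10
print(all(r in "0001110100" for r in reads))    # True
```

Ten characters is the best possible here: eight distinct 3-character strings need at least 8 + 2 characters, and consecutive fragments can overlap by at most two.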
8.5 – DNA Arrays as an Alternative Sequencing Technique
An alternative sequencing technique, Sequencing by Hybridization, was developed in an attempt to overcome the time-consuming nature of the Sanger method. This method involves constructing a miniature DNA array, or chip, that contains thousands of short DNA fragments called probes. A fluorescently labeled strand of DNA is then applied to the array. Probes that are complementary to a substring of the added target strand will hybridize (bind) to the target at that location. For example, the probe ACCGTGGA would hybridize to the target CCCTGGCACCTA at the substring TGGCACCT (positions 4 through 11), because ACCGTGGA is complementary to TGGCACCT. This method of sequencing didn’t really take off until 1991 with Fodor’s light-directed polymer synthesis, because many were skeptical that synthesizing a large number of probes and assembling them into arrays would be any more efficient than Sanger’s method. Yet with Fodor’s technique, building an array of all $4^l$ probes of length $l$ requires only $4 \cdot l$ reactions, rather than the expected $4^l$.
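Both ideas in this section, complementary probes and Fodor’s layer-by-layer synthesis, can be sketched in Python. This is a toy model under stated assumptions: it uses position-by-position complementarity as in the example above (real strands pair antiparallel, i.e. with the reverse complement), and it treats each light-directed reaction as a hypothetical mask that exposes exactly the spots needing one particular nucleotide at one particular position. The function names are illustrative only.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Position-by-position Watson-Crick complement (no reversal)."""
    return "".join(COMPLEMENT[base] for base in strand)

def hybridization_sites(probe, target):
    """0-indexed start positions where the probe's complement occurs in target."""
    site = complement(probe)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]

def light_directed_synthesis(l):
    """Toy model: grow every length-l probe, one nucleotide layer at a time.
    Returns (probes on the chip, number of reactions used)."""
    wanted = ["".join(p) for p in product("ACGT", repeat=l)]  # all 4**l probes
    spots = [""] * len(wanted)
    reactions = 0
    for position in range(l):
        for base in "ACGT":
            # One reaction: a (hypothetical) mask exposes exactly the spots whose
            # probe needs `base` here, and `base` is attached to all of them at once.
            for i, probe in enumerate(wanted):
                if probe[position] == base:
                    spots[i] += base
            reactions += 1
    return spots, reactions

print(complement("ACCGTGGA"))                           # TGGCACCT
print(hybridization_sites("ACCGTGGA", "CCCTGGCACCTA"))  # [3], i.e. positions 4-11 counting from 1
chip, reactions = light_directed_synthesis(3)
print(len(chip), reactions)                             # 64 12
```

The reaction count depends only on the probe length: each of the l positions needs four reactions (one per nucleotide), no matter how many of the 4^l probes share that nucleotide.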
Lauren,
Section 8.3 reminded me of a TV show I watched where they were allowed to film inside the NSA (I think they may have been the first TV show to do so). They interviewed someone down in the basement where they handle the waste, and he showed how the NSA not only shreds everything but then puts it through pulping machines that turn the shredded paper back into mush.*
I’m sure the NSA (and other countries’ spy agencies) has very good algorithms for reassembling the bits of paper that come out of a shredder (even if it’s many thousands of pages all mixed together, maybe with repeated pages, maybe with pages missing) back into the original documents.
* Amusingly, even though all they were showing on the TV was this sludge going by on a conveyor belt (which couldn’t possibly look like anything other than grey mush), the NSA made the TV program blur out the conveyor belt!
The other thing I was going to mention is that the DNA chips described in Section 8.5 are widely used these days. I know of researchers who use them. Depending on what you’re doing, you may have to custom order them so that they have the correct collection of probes to detect what you’re looking for, and they can be quite expensive (i.e., thousands of dollars per chip).
By: jonkujawa on August 10, 2011 at 3:32 pm