February 23, 2007
NMPDR - The 48-Hour Key to Understanding DNA
Bruce's day job, as regular readers know, is maintaining a database of genetic information. The goal of any genetics project is to find out which parts of the DNA do what. Insiders call this genetic annotation, and this article is about a major step forward in that art.
To understand annotation, you first need to understand DNA. The basic principle of DNA is simple: it's a set of instructions written in an alphabet of four letters: adenine, thymine, cytosine, and guanine. Any sequence of three letters corresponds to a single amino acid. Most genes specify a string of amino acids that is converted into a protein. The mix of proteins floating around a cell determines what chemical reactions take place. Now, if I had designed DNA, I would section it into rigid, three-letter sections and have a start and end marker around each gene. God, however, designed DNA so you can add and delete stuff willy-nilly without breaking the process. As a result, the distance between the end of one gene and the start of the next is not necessarily a multiple of three, there's extra garbage that can be used to create entirely new genes in your children, and some of the genes point backwards.
Basically, DNA is horribly inconvenient for scientists to understand, but it's very intelligently designed for evolution. This is one reason why I believe that God has a sense of humor. (The other reason is that cats can get sick from swimming but love to eat fish.)
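For readers who like to see these things spelled out, here is a minimal Python sketch of the three-letter reading described above. It is a toy, not anything from the lab: the codon table holds only five of the 64 real entries, and the function names and sample sequence are made up for illustration. It shows why the starting position matters and what it means for a gene to point backwards.

```python
# Toy codon table: only a handful of the 64 real codons, enough for a demo.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual start signal
    "GCT": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",  # one of the stop signals
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the opposite strand, read in its own forward direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(dna, offset=0):
    """Read dna three letters at a time from offset, stopping at a stop codon.

    Codons missing from the toy table are reported as "???".
    """
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "???")
        if amino == "Stop":
            break
        protein.append(amino)
    return protein


gene = "ATGGCTAAATGGTAA"                  # made-up sequence
print(translate(gene))                    # read from the first letter
print(translate("C" + gene))              # one stray letter garbles the reading
print(translate("C" + gene, offset=1))    # the right offset recovers it
backwards = reverse_complement(gene)      # the same gene, pointing backwards
print(translate(reverse_complement(backwards)))
```

Because genes are not lined up on three-letter boundaries and can sit on either strand, a gene finder has to consider three offsets in each of two directions, which is a large part of why finding genes is work.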
The DNA for an organism is called its genome. A simple bacterial genome contains a few thousand genes. The human genome contains roughly twenty thousand protein-coding genes. To annotate the genome, you have to first find all the genes (while not knowing how many there are), and then figure out what they do.
The old-school way of doing this is to find similar DNA sequences in other organisms and assume they have similar functions. This process is symbolized by the blue lines in the diagram on the left. There is, however, a better way, and it relates to the one thing God made simple: genes that are involved in the same process tend to be close to one another and pointing in the same direction. If you take a single process and find its genes on a whole bunch of DNA sequences, you can organize your work in a table, which makes it both easier and faster. This method is called subsystem annotation, and it's represented by the yellow arrow in the diagram.
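The "close together and pointing the same way" observation is easy to sketch in code. The Python below is only an illustration of that idea, not the method the SEED annotators use: the Gene record, the gene names, the coordinates, and the 200-letter gap limit are all assumptions invented for the example.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int    # position of the first letter
    end: int      # position of the last letter
    strand: str   # "+" for forward, "-" for backwards


def cluster_neighbors(genes, max_gap=200):
    """Group genes that sit close together and point in the same direction.

    max_gap is an arbitrary illustrative threshold, not a published value.
    """
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        last = clusters[-1][-1] if clusters else None
        if last and gene.strand == last.strand and gene.start - last.end <= max_gap:
            clusters[-1].append(gene)
        else:
            clusters.append([gene])
    return clusters


genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 450, 900, "+"),
    Gene("geneC", 950, 1300, "-"),
    Gene("geneD", 5000, 5600, "+"),
]
for cluster in cluster_neighbors(genes):
    print([g.name for g in cluster])
```

Each cluster is a candidate row for the table: if geneA has a known role in some process, geneB is a good place to start looking for the next role in the same process.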
For several years now, a team of microbiology experts all over the world has been using subsystem annotation to find and annotate genes in the DNA for pathogens and other species. This data is kept in a database called SEED, and it is processed by Bruce every few weeks to create the web pages on the NMPDR web site.
For the past two years, the research group Bruce works for has been developing tools for automating gene annotation, using everything they've learned while creating the SEED data. The result is a really astounding advance called The 48-Hour Server. A biologist who has a registered account on the server can upload a genome and get back annotated genes in 48 hours.
The diagram on the left shows how this works. The Rapid Propagation tool finds the genes in the new genome. Next, the Quality Check displays statistics on problems with the genes found. These statistics enable a staff biologist to make improvements to the list of genes. The Similarity Computation phase looks for similar genes in genomes already in the system. Finally, the Automated Assignment phase annotates the genes.
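To give a feel for the last two phases, here is a toy Python sketch of "find the most similar known gene and borrow its annotation". This is not the Server's actual algorithm, which this article does not describe: the scoring by shared three-letter fragments, the 0.5 threshold, the function names, and the sample sequences are all assumptions made up for illustration. Real tools use proper sequence alignment rather than anything this crude.

```python
def kmers(seq, k=3):
    """Return the set of all k-letter fragments in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Fraction of distinct k-letter fragments the two sequences share."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def assign_function(new_gene, known, threshold=0.5):
    """Borrow the annotation of the most similar known gene, if similar enough."""
    best_name, best_score = None, 0.0
    for function, seq in known.items():
        score = similarity(new_gene, seq)
        if score > best_score:
            best_name, best_score = function, score
    # Genes with no convincing match get a placeholder annotation.
    return best_name if best_score >= threshold else "hypothetical protein"


# Made-up "genomes already in the system".
known = {
    "toy kinase": "ATGGCTAAATGGTAA",
    "toy transporter": "ATGCCCGGGTTTTAA",
}
print(assign_function("ATGGCTAAATGGTGA", known))
print(assign_function("CCCCCCCCC", known))
```

The threshold is the interesting design choice: set it too low and genes get confident-looking annotations they have not earned, set it too high and everything comes back as a hypothetical protein.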
The 48-Hour Server is currently running on a test machine at the lab, where they are pushing new genomes through it to shake out all the bugs. Eventually, the real microbiologists on the project will be writing a paper on how the Server works and how to use it, at which point it will become available to the public. None of them are cats, however, so whatever paper they write can't possibly be as interesting as this article.
But consider this: if we can determine my complete DNA sequence, then it could be analyzed in only two days, and then we can begin figuring out why I'm so much smarter than everybody else. It would be a huge step forward for modern science, so long as the sequencing process doesn't involve putting me in a centrifuge or something equally messy.
Anyway, the point is that our understanding of gene annotation has reached a critical mass, and a lot of good stuff will be happening real soon.
Respectfully submitted,
Ferdinand T. Cat
Posted Fri 6:12 PM | Tags: automated bioinformatics DNA DNA annotation microbiology subsystems

