MB3 Club Pizza Seminar
October 5, 2007
Friday 12pm-1pm
SM 340


Dr. Jeremy Selengut
Staff Scientist III
Applied Bioinformatics Department
J. Craig Venter Institute

selengut@jcvi.org

"On Beyond BLAST: Calculating Biological Processes from Genomic Data"

ABSTRACT:

Phenotypes, the observable properties of organisms, are related to the functions of sets of genes. In order to predict whether a genome encodes the potential to express a particular phenotype, one must have 1) a definition of the underlying components of the system and 2) methods for detecting the genes encoding those components. Definitions must allow for the complexity observed in nature, in which a process may be achieved by differing modules in combinatorial ways. Detection methods must be sensitive and specific, identifying candidate genes and separating them from ones with similar but distinct functions.

Pairwise homology techniques such as BLAST generate measures of relationships between sequences. One of the major pitfalls of bioinformatics is the transitive annotation of functions via pairwise alignment scores alone. Such scores are not accompanied by guidelines that say how good a score must be to conclude that two sequences are equivalent in function: one size does not fit all. Additionally, not all annotations are based on sound analysis: the transfer of bad annotation based on good evidence is still bad annotation.

The failure to recognize these two issues has polluted the gene-to-function information found in public databases. Relying on these annotations to make assertions about the presence or absence of biological processes is therefore prone to error.

Reliable automated gene function determination can be achieved by 1) providing threshold scores below which assignment of function is unsupported, 2) predicating the homology method on comparison to high-confidence (e.g. laboratory-characterized) assignments of function, and 3) utilizing multiple sequence alignment and phylogeny methods. The TIGRFAMs library of equivalog HMMs (Hidden Markov Models) is one example of such a resource.
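The first point, per-family threshold scores, can be illustrated with a minimal Python sketch. It is not the TIGRFAMs implementation: the FamilyModel class, the assign_functions helper, and the field name trusted_cutoff are hypothetical names chosen for illustration, and the scores are assumed to come from some upstream search step that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyModel:
    """A hypothetical family model with its own score threshold."""
    name: str
    function: str
    trusted_cutoff: float  # scores below this do not support the function


def assign_functions(scores, models):
    """Assign a function to each gene only when a hit clears its model's cutoff.

    scores: dict mapping gene id -> dict of model name -> score
    models: dict mapping model name -> FamilyModel
    Returns a dict mapping gene id -> function string, or None if unsupported.
    """
    assignments = {}
    for gene, hits in scores.items():
        best = None  # (score, function) of the best supported hit so far
        for model_name, score in hits.items():
            model = models[model_name]
            # Each model carries its own threshold: one size does not fit all.
            if score >= model.trusted_cutoff:
                if best is None or score > best[0]:
                    best = (score, model.function)
        assignments[gene] = best[1] if best else None
    return assignments
```

In this sketch the same raw score can support an assignment for one family and fail for another, because the decision depends on the threshold attached to each model and not on a single global cutoff.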

The Genome Properties system combines HMMs (and other suitable methods) to create and evaluate definitions of biological processes. The system implements robust logical and hierarchical structures to make assertions about biological processes.
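As a rough illustration of what a logical and hierarchical definition might look like, the following Python sketch evaluates a process from a set of detected models. It is an assumption-laden toy, not the Genome Properties system itself: the Step and Property classes, the evaluate and is_satisfied functions, and the three result labels are all hypothetical. Each required step is satisfied by any one of several alternative pieces of evidence, and a piece of evidence may itself be another property.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One component of a process; any single evidence item satisfies it."""
    name: str
    evidence: list  # names of models or of other properties (alternatives)
    required: bool = True


@dataclass
class Property:
    """A hypothetical process definition made of steps."""
    name: str
    steps: list


def is_satisfied(evidence, properties, detected_models):
    """Evidence is met by a detected model or by a fully supported sub-property."""
    if evidence in properties:
        return evaluate(evidence, properties, detected_models) == "YES"
    return evidence in detected_models


def evaluate(prop_name, properties, detected_models):
    """Return "YES", "PARTIAL", or "NO" for the named property."""
    required = [s for s in properties[prop_name].steps if s.required]
    found = sum(
        1
        for step in required
        if any(is_satisfied(e, properties, detected_models) for e in step.evidence)
    )
    if required and found == len(required):
        return "YES"
    if found > 0:
        return "PARTIAL"
    return "NO"
```

Listing several evidence items per step is one way to let a process be achieved by differing modules, and letting a step refer to another property gives the hierarchy. The sketch assumes definitions contain no circular references.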