User:Morgan G. I. Langille/Notebook/Unknown Genes
Characterization and prediction of unknown genes
Perhaps one of the most frustrating aspects of genome and metagenome analysis is that, for many protein families, we cannot make any functional predictions using similarity search methods. Such "hypothetical" or "unknown" proteins represent a significant fraction of the proteins in most genomes or metagenomes (sometimes up to 50%). This percentage will probably continue to increase as sequencing technology continues to outpace the lab experiments that can shed light on these genes, and it severely limits our ability to use metagenome data to understand communities. We propose to extend some of the work from the initial iSEEM project to develop new computational approaches that will improve our ability to use and interpret unknown gene families in metagenomic data.
This project will result in two major outcomes. The first is a resource that would allow researchers to identify particular genes of unknown function that are of high interest because of their presence across the tree of life, their possible role in pathogenesis, or their contribution to species in particular environments. These high-interest families could then be targeted for functional characterization using more traditional lab experiments. The second is a completely novel method for predicting gene function that does not rely on sequence similarity and that, in theory, would improve as more metagenomic datasets become available. This method would help annotate the vast number of proteins that we currently cannot annotate, a problem that will otherwise continue to grow in biology.
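One criterion for the resource is a family's presence across the tree of life. The sketch below, written in Python, shows how unknown families could be ranked by that criterion alone. It assumes a hypothetical input of `(family, phylum, annotation)` records. The function name `rank_unknown_families`, the choice of phylum as the taxonomic level, and the label strings treated as "unknown" are illustrative assumptions, not part of the proposed resource.

```python
from collections import defaultdict

# Hypothetical input format (an assumption): one (family_id, phylum, annotation)
# record per protein, e.g. exported from family hits across sequenced genomes.
def rank_unknown_families(records, unknown_labels=("hypothetical protein", "unknown")):
    """Rank families with no known function by the number of distinct phyla they occur in."""
    phyla_by_family = defaultdict(set)
    annotated = set()  # families where at least one member has a real annotation
    for family, phylum, annotation in records:
        phyla_by_family[family].add(phylum)
        if annotation.lower() not in unknown_labels:
            annotated.add(family)
    unknown = [f for f in phyla_by_family if f not in annotated]
    # Broadest taxonomic distribution first; ties broken by family id for stable output.
    return sorted(
        ((f, len(phyla_by_family[f])) for f in unknown),
        key=lambda item: (-item[1], item[0]),
    )

# Toy records (made up for illustration only).
records = [
    ("fam1", "Proteobacteria", "hypothetical protein"),
    ("fam1", "Firmicutes", "hypothetical protein"),
    ("fam1", "Euryarchaeota", "unknown"),
    ("fam2", "Proteobacteria", "hypothetical protein"),
    ("fam3", "Firmicutes", "DNA polymerase III"),
    ("fam3", "Proteobacteria", "hypothetical protein"),
]
ranking = rank_unknown_families(records)
print(ranking)  # [('fam1', 3), ('fam2', 1)]
```

Here `fam3` is dropped because one member carries a functional annotation. A real resource would likely combine this breadth score with the other criteria mentioned (pathogenesis, environment specificity) rather than use it alone.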