Wikiomics:Protein function prediction

Many proteins still have no known function. Automated function prediction (AFP) is an active research field with a growing community of bioinformaticians, as seen at the AFP-SIG meetings held at the ISMB 2005 conference and at the University of California, San Diego in 2006.

Most often only the sequence of the protein is known, but the structural genomics centers have also released hundreds of protein structures of unknown function. For proteins from prokaryotes, operon organization often makes it possible to infer the function of a protein from its genomic context; this is more complicated in eukaryotes. More generally, prediction is easier when a protein has well-described homologs than when it belongs to a family of unknown biological role.

The notion of protein function is broad and cannot easily be encoded without a structured vocabulary. To that end, the Gene Ontology (GO) provides a hierarchical set of keywords, called GO terms, which describe different aspects of protein function at different levels of precision. GO is becoming a standard for proteome annotation and protein function prediction.
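To illustrate what "different levels of precision" means in practice, here is a minimal sketch of a GO-like hierarchy. The `PARENTS` table is a hand-written toy fragment of the molecular function branch (the real ontology is far larger and terms can have several parents), and the helper names `ancestors` and `is_consistent` are hypothetical, not part of any GO library.

```python
# Toy fragment of the GO molecular_function branch (child -> parents).
# Hand-written for illustration only; load the real ontology for actual work.
PARENTS = {
    "GO:0008233": ["GO:0016787"],  # peptidase activity -> hydrolase activity
    "GO:0016787": ["GO:0003824"],  # hydrolase activity -> catalytic activity
    "GO:0003824": ["GO:0003674"],  # catalytic activity -> molecular_function
    "GO:0003674": [],              # root of the branch
}


def ancestors(term, parents=PARENTS):
    """Return every term that is more general than `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_consistent(predicted, true_term, parents=PARENTS):
    """True if the prediction equals the true term or is a less precise ancestor of it."""
    return predicted == true_term or predicted in ancestors(true_term, parents)
```

Under this sketch, predicting "catalytic activity" for a protein annotated as a peptidase counts as correct but imprecise, whereas predicting "peptidase activity" for a protein only known to be a hydrolase is an over-specific claim.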

Several main strategies can be distinguished among current software tools:
 * homology search and transfer of annotations:
   * sequence alignment
   * structure alignment
 * function inference by genomic context
 * phylogenomic approaches
 * prediction from structure using similarities that are not homology-based:
   * local sequence patterns
   * physico-chemical sequence features
   * 3D local sites
   * 3D physico-chemical features
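
The first strategy, homology search followed by transfer of annotations, can be sketched as follows. This is only an illustration under stated assumptions: the function name `transfer_annotations`, the E-value cutoff, and the scoring rule (summing -log10 of the E-value over supporting hits) are hypothetical choices and do not reproduce the scoring of any server listed on this page.

```python
import math
from collections import defaultdict


def transfer_annotations(hits, annotations, max_evalue=1e-5):
    """Rank GO terms carried by significant homologs of a query protein.

    hits        -- iterable of (hit_id, evalue) pairs from a homology search
    annotations -- dict mapping hit_id to a set of GO terms
    max_evalue  -- hits with a larger E-value are ignored (assumed cutoff)
    """
    scores = defaultdict(float)
    for hit_id, evalue in hits:
        if evalue > max_evalue:
            continue
        # Clamp very small E-values (including 0.0) so the log stays finite.
        weight = -math.log10(max(evalue, 1e-180))
        for term in annotations.get(hit_id, ()):
            scores[term] += weight
    # Best-supported terms first; ties broken by term identifier.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Terms shared by several strong hits rise to the top, while annotations that come only from weak hits are dropped. Real tools typically add further steps, such as propagating scores to more general GO terms.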

Servers which competed at the AFP-SIG 2005
See also the short summaries by the authors themselves at the official site of AFP-SIG 2005.

These servers transfer function annotations on the basis of homology:
 * DBAli Annolite dbali
 * PFP pfp
 * ProteomeAnalyst pa pa-gosub
 * GOPET gopet

The other servers are:
 * SpearMint and RuleBase (not public yet) wieser2004
 * PhydBac enault2003 phydbac phydbac2 fusiondb phydbac2005 analyzes bacterial proteins using genomic context.
 * ProKnow proknow searches for known 3D folds, sequences, motifs, and functional linkages

Basic tools

 * Wikiomics:BLAST and Wikiomics:PSI-BLAST blast psiblast are commonly used to search for homologous protein sequences by sequence alignment.
 * Prosite prosite_first prosite_last is a searchable database of sequence patterns that are associated with some biological functions.
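
As an illustration of how sequence patterns of this kind can be scanned, the sketch below converts a pattern written in PROSITE syntax into a Python regular expression. The function name `prosite_to_regex` is hypothetical, and the converter covers only the common elements (`x`, `[...]`, `{...}`, repetition counts, and the `<` and `>` anchors at the ends of the pattern); rarer constructs are rejected rather than guessed.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repetition such as (3) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return ("^" if anchored_start else "") + "".join(pieces) + ("$" if anchored_end else "")


# Example with a zinc-finger-style pattern (used here purely as an illustration).
zinc_finger = re.compile(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
```

A compiled pattern can then be applied with `zinc_finger.search(sequence)`. Short patterns match by chance fairly often, so a hit is a hint about function rather than proof.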

Other protein function prediction servers
JAFA is a meta-server for protein function prediction: it produces a prediction by aggregating the results of other servers. It is a good starting point, since it queries five servers (GOFigure, GOblet, InterProScan, GOtcha, PhydBac) and shows where their results agree and where they differ.
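
The idea of comparing several servers can be sketched in a few lines. This is not JAFA's actual aggregation method, which is not described here; the function name `compare_predictions` and the rule used (a term is "agreed" only if every server reports it) are assumptions made for illustration.

```python
from collections import Counter


def compare_predictions(predictions):
    """Split predicted GO terms into those all servers agree on and the rest.

    predictions -- dict mapping a server name to the set of GO terms it predicts
    Returns (agreed, disputed): a sorted list of terms reported by every server,
    and a dict mapping each remaining term to the sorted servers that report it.
    """
    counts = Counter(term for terms in predictions.values() for term in set(terms))
    n_servers = len(predictions)
    agreed = sorted(term for term, count in counts.items() if count == n_servers)
    disputed = {
        term: sorted(server for server, terms in predictions.items() if term in terms)
        for term, count in sorted(counts.items())
        if count < n_servers
    }
    return agreed, disputed
```

For example, if three servers all report a hydrolase term but only one adds a peptidase term, the first ends up in `agreed` and the second in `disputed` together with the name of the server that proposed it.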

Miscellaneous servers:
 * Protein Function Prediction Server dobson-doig predicts protein function from PDB structures. An enzyme/non-enzyme predictor and an enzyme class predictor are available.
 * GOFigure gofigure predicts the function of a gene or protein
 * ProFunc profunc_a profunc_b performs predictions from a protein structure

See also Wikiomics:Searching for 3D functional sites in a protein structure.

Methods using non-sequential sequence features:
 * ProtFun protfun2002 protfun2003
 * PropSearch propsearch

These methods are based on function transfer after homology searches:
 * Blast2GO blast2go
 * OntoBlast ontoblast
 * GOblet goblet2003 goblet2004
 * GOtcha gotcha
 * Phunctioner phunctioner is a method based on the association of GO terms with conserved residues in 3D structural alignments

Phylogenomic approaches:
 * Orthostrapper orthostrapper
 * RIO rio
 * SIFTER sifter
 * FIGENIX figenix

Credits

 * Martin Jambon: introduction plus the initial list of tools and papers, put together after the AFP-SIG 2005 conference (at ISMB 2005)
 * other Wikiomics authors