Wikiomics:Protein function prediction
Many proteins still have no known function. Automated function prediction is an active research field with a growing community of bioinformaticians, as seen at the AFP-SIG meetings held at the ISMB 2005 conference and at the University of California San Diego in 2006.
Most often, only the sequence of the protein is known, but structural genomics centers have also released hundreds of protein structures of unknown function. Some proteins come from prokaryotes, where operon organization makes it possible to infer the function of a protein from its genomic context; this is more complicated in eukaryotes. More generally, prediction is easier when a protein has well-described homologs than when it belongs to a family of unknown biological role.
The notion of protein function is broad and cannot easily be encoded without a structured vocabulary. The Gene Ontology (GO) provides a hierarchical set of keywords, called GO terms, which describe different aspects of protein function at different levels of precision. GO is becoming a standard for proteome annotation and protein function prediction.
Among current software tools, several main strategies can be distinguished:
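The hierarchical nature of GO means that a specific term implies all of its more general ancestors. The following is a minimal sketch of that idea, assuming a toy hierarchy: the term names are GO molecular-function labels, but the `PARENTS` graph is a simplified illustration and not the actual ontology, and the `ancestors` helper is hypothetical.

```python
# Toy fragment of a GO-like hierarchy: child term -> list of parent terms.
# Assumption: this graph is simplified for illustration; the real ontology
# is a much larger directed acyclic graph with several relation types.
PARENTS = {
    "peptidase activity": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def ancestors(term, parents=PARENTS):
    """Return the set containing the term and all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Walk upwards through every parent of the current term.
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    # Annotating a protein with a precise term implies the broader ones.
    print(sorted(ancestors("peptidase activity")))
```

A prediction at a general level (for example "catalytic activity") can therefore be correct but less precise than a prediction at a deeper level of the hierarchy.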
- homology search and transfer of annotations:
  - sequence alignment
  - structure alignment
- function inference by genomic context
- phylogenomic approaches
- prediction from structure using similarities that are not homology-based:
  - local sequence patterns
  - physico-chemical sequence features
  - 3D local sites
  - 3D physico-chemical features
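The first strategy, homology search followed by transfer of annotations, can be sketched in a few lines. This is only an illustration under stated assumptions: the `transfer_annotations` function, the input format, and the E-value cutoff of 1e-5 are hypothetical choices, not the method of any particular server listed on this page.

```python
from collections import defaultdict


def transfer_annotations(hits, annotations, max_evalue=1e-5):
    """Count, for each GO term, how many significant homologs carry it.

    hits: list of (subject_id, evalue) pairs, e.g. parsed from a
          sequence similarity search (format assumed for this sketch).
    annotations: dict mapping subject_id -> set of GO terms.
    max_evalue: hits with a larger E-value are ignored (assumed cutoff).
    """
    support = defaultdict(int)
    for subject_id, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to justify transferring annotations
        for term in annotations.get(subject_id, ()):
            support[term] += 1
    return dict(support)


if __name__ == "__main__":
    # Hypothetical identifiers and annotations, for illustration only.
    hits = [("protA", 1e-40), ("protB", 1e-12), ("protC", 0.5)]
    annotations = {
        "protA": {"hydrolase activity", "peptidase activity"},
        "protB": {"hydrolase activity"},
        "protC": {"DNA binding"},
    }
    print(transfer_annotations(hits, annotations))
```

Real tools typically weight each hit by its score and propagate terms through the GO hierarchy rather than simply counting, so the counts here should be read as a rough measure of support.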
Servers which competed at the AFP-SIG 2005
See also the short summaries by the authors themselves at the official site of AFP-SIG 2005.
These servers transfer function on the basis of homology:
The other servers are:
- SpearMint and RuleBase (not public yet) 
- PhydBac [7, 8, 9, 10, 11] analyzes bacterial proteins using genomic context.
- ProKnow [12] searches for known 3D folds, sequences, motifs, and functional linkages.
- Wikiomics:BLAST and Wikiomics:PSI-BLAST [13, 14] are commonly used to search for homologous protein sequences by sequence alignment.
- Prosite [15, 16] is a searchable database of sequence patterns that are associated with some biological functions.
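Sequence patterns of the kind stored in Prosite can be scanned with ordinary regular expressions once the pattern syntax is translated. The sketch below assumes the basic PROSITE pattern conventions (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` for the sequence termini). The `prosite_to_regex` helper is hypothetical and does not handle every corner case of the syntax, such as terminus anchors placed inside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string into a Python regex string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by '.'
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repetition count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x stands for any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    # N-glycosylation site pattern, used here as an example.
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    for match in re.finditer(regex, "AANGSAA"):
        print(match.start(), match.group())
```

Short patterns like this one match by chance in many sequences, so a hit is a hint of function rather than evidence on its own.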
Other protein function prediction servers
JAFA is a meta-server for protein function prediction: it aggregates the predictions of other servers. It is a good starting point, since it queries five servers (GoFigure, GOblet, InterProScan, GOtcha, PhydBac) and shows where their results agree and differ.
- Protein Function Prediction Server [17] predicts protein function from PDB structures. An enzyme/non-enzyme predictor and an enzyme class predictor are available.
- GoFigure [18] predicts the function of a gene or protein.
- ProFunc [19, 20] performs predictions from a protein structure.
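The idea of comparing where several predictors agree and differ can be illustrated with a small sketch. This is not JAFA's actual aggregation method, which is not described here; the `compare_predictions` function and the example predictions are hypothetical.

```python
from collections import Counter


def compare_predictions(predictions):
    """Summarize agreement between servers.

    predictions: dict mapping server name -> set of predicted GO terms.
    Returns (consensus, votes): consensus is the set of terms predicted
    by every server, votes counts how many servers predicted each term.
    """
    votes = Counter(term for terms in predictions.values() for term in terms)
    n_servers = len(predictions)
    consensus = {term for term, count in votes.items() if count == n_servers}
    return consensus, votes


if __name__ == "__main__":
    # Made-up predictions for one query protein, for illustration only.
    predictions = {
        "server1": {"hydrolase activity", "peptidase activity"},
        "server2": {"hydrolase activity"},
        "server3": {"hydrolase activity", "metal ion binding"},
    }
    consensus, votes = compare_predictions(predictions)
    print("agreed by all:", consensus)
    for term, count in votes.most_common():
        print(count, term)
```

Terms supported by only one server are not necessarily wrong; servers based on different strategies may each capture a different aspect of function.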
Methods using non-sequential sequence features:
These methods are based on function transfer after homology searches:
- Blast2GO [24]
- OntoBlast [25]
- GOblet [26, 27]
- GOtcha [28]
- Phunctioner [29] is a method based on the association of GO terms with conserved residues in 3D structural alignments.
- Wikiomics:Automated function prediction of genes and proteins, our local community pages
References
1. Martí-Renom MA, Ilyin VA, and Sali A. DBAli: a database of protein structure alignments. Bioinformatics. 2001 Aug;17(8):746-7. DOI:10.1093/bioinformatics/17.8.746
2. Hawkins T, Luban S, and Kihara D. Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Sci. 2006 Jun;15(6):1550-6. DOI:10.1110/ps.062153506
3. Szafron D, Lu P, Greiner R, Wishart DS, Poulin B, Eisner R, Lu Z, Anvik J, Macdonell C, Fyshe A, and Meeuwis D. Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W365-71. DOI:10.1093/nar/gkh485
4. Lu P, Szafron D, Greiner R, Wishart DS, Fyshe A, Pearcy B, Poulin B, Eisner R, Ngo D, and Lamb N. PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D147-53. DOI:10.1093/nar/gki120
5. Vinayagam A, König R, Moormann J, Schubert F, Eils R, Glatting KH, and Suhai S. Applying Support Vector Machines for Gene Ontology based gene function prediction. BMC Bioinformatics. 2004 Aug 26;5:116. DOI:10.1186/1471-2105-5-116
6. Wieser D, Kretschmann E, and Apweiler R. Filtering erroneous protein annotation. Bioinformatics. 2004 Aug 4;20 Suppl 1:i342-7. DOI:10.1093/bioinformatics/bth938
7. Enault F, Suhre K, Abergel C, Poirot O, and Claverie JM. Annotation of bacterial genomes using improved phylogenomic profiles. Bioinformatics. 2003;19 Suppl 1:i105-7. DOI:10.1093/bioinformatics/btg1013
8. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. Phydbac (phylogenomic display of bacterial genes): An interactive resource for the annotation of bacterial genomes. Nucleic Acids Res. 2003 Jul 1;31(13):3720-2. DOI:10.1093/nar/gkg603
9. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. Phydbac2: improved inference of gene function using interactive phylogenomic profiling and chromosomal location analysis. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W336-9. DOI:10.1093/nar/gkh365
10. Suhre K and Claverie JM. FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D273-6. DOI:10.1093/nar/gkh053
11. Enault F, Suhre K, and Claverie JM. Phydbac "Gene Function Predictor": a gene annotation tool based on genomic context analysis. BMC Bioinformatics. 2005 Oct 12;6:247. DOI:10.1186/1471-2105-6-247
12. Pal D and Eisenberg D. Inference of protein function from protein structure. Structure. 2005 Jan;13(1):121-30. DOI:10.1016/j.str.2004.10.015
13. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10. DOI:10.1016/S0022-2836(05)80360-2
14. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402. DOI:10.1093/nar/25.17.3389
15. Bucher P and Bairoch A. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. Proc Int Conf Intell Syst Mol Biol. 1994;2:53-61.
16. Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, and Bairoch A. Recent improvements to the PROSITE database. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D134-7. DOI:10.1093/nar/gkh044
17. Dobson PD and Doig AJ. Distinguishing enzyme structures from non-enzymes without alignments. J Mol Biol. 2003 Jul 18;330(4):771-83. DOI:10.1016/s0022-2836(03)00628-4
18. Khan S, Situ G, Decker K, and Schmidt CJ. GoFigure: automated Gene Ontology annotation. Bioinformatics. 2003 Dec 12;19(18):2484-5. DOI:10.1093/bioinformatics/btg338
19. Laskowski RA, Watson JD, and Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W89-93. DOI:10.1093/nar/gki414
20. Laskowski RA, Watson JD, and Thornton JM. Protein function prediction using local 3D templates. J Mol Biol. 2005 Aug 19;351(3):614-26. DOI:10.1016/j.jmb.2005.05.067
21. Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, and Brunak S. Prediction of human protein function from post-translational modifications and localization features. J Mol Biol. 2002 Jun 21;319(5):1257-65. DOI:10.1016/S0022-2836(02)00379-0
22. Jensen LJ, Gupta R, Staerfeldt HH, and Brunak S. Prediction of human protein function according to Gene Ontology categories. Bioinformatics. 2003 Mar 22;19(5):635-42. DOI:10.1093/bioinformatics/btg036
23. Hobohm U and Sander C. A sequence property approach to searching protein databases. J Mol Biol. 1995 Aug 18;251(3):390-9. DOI:10.1006/jmbi.1995.0442
24. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, and Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005 Sep 15;21(18):3674-6. DOI:10.1093/bioinformatics/bti610
25. Zehetner G. OntoBlast function: From sequence similarities directly to potential functional annotations by ontology terms. Nucleic Acids Res. 2003 Jul 1;31(13):3799-803. DOI:10.1093/nar/gkg555
26. Hennig S, Groth D, and Lehrach H. Automated Gene Ontology annotation for anonymous sequence data. Nucleic Acids Res. 2003 Jul 1;31(13):3712-5. DOI:10.1093/nar/gkg582
27. Groth D, Lehrach H, and Hennig S. GOblet: a platform for Gene Ontology annotation of anonymous sequence data. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W313-7. DOI:10.1093/nar/gkh406
28. Martin DM, Berriman M, and Barton GJ. GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics. 2004 Nov 18;5:178. DOI:10.1186/1471-2105-5-178
29. Pazos F and Sternberg MJ. Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci U S A. 2004 Oct 12;101(41):14754-9. DOI:10.1073/pnas.0404569101
30. Storm CE and Sonnhammer EL. Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics. 2002 Jan;18(1):92-9. DOI:10.1093/bioinformatics/18.1.92
31. Zmasek CM and Eddy SR. RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics. 2002 May 16;3:14.
32. Engelhardt BE, Jordan MI, Muratore KE, and Brenner SE. Protein molecular function prediction by Bayesian phylogenomics. PLoS Comput Biol. 2005 Oct;1(5):e45. DOI:10.1371/journal.pcbi.0010045
33. Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, and Danchin EG. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics. 2005 Aug 5;6:198. DOI:10.1186/1471-2105-6-198
34. Friedberg I. Automated protein function prediction--the genomic challenge. Brief Bioinform. 2006 Sep;7(3):225-42. DOI:10.1093/bib/bbl004
- Martin Jambon: introduction plus the initial list of tools and papers, put together after the AFP-SIG 2005 conference (at ISMB 2005)
- other Wikiomics authors