Wikiomics:Protein function prediction

From OpenWetWare

Revision as of 01:41, 20 November 2007 by Bill Flanagan

Many proteins still have a completely unknown function. Automated function prediction (AFP) is an active research field with a growing community of bioinformaticians, as shown by the AFP special interest group (AFP-SIG) meetings held at the ISMB 2005 conference and at the University of California, San Diego, in 2006.

Most often only the sequence of a protein is known, but the structural genomics centers have also produced hundreds of protein structures of unknown function. Proteins from prokaryotes can often be assigned a function from their genomic context, because genes in the same operon tend to be functionally related; this is harder in eukaryotes, where operons are rare. More generally, function is easier to predict correctly when a protein has well-characterized homologs than when it belongs to a family of unknown biological role.
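
The operon argument can be made concrete with a minimal sketch. Assuming that neighboring genes on the same strand separated by a short intergenic distance belong to the same operon (the cutoff `MAX_INTERGENIC_BP = 50` is a hypothetical value, not a published one), the code groups genes into putative operons and lists the known functions of each unknown gene's operon partners as hints. `Gene`, `predict_operons` and `context_hints` are illustrative helpers, and all gene names, coordinates and functions are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff (an assumption, not a published value): adjacent
# same-strand genes separated by at most this many base pairs are grouped.
MAX_INTERGENIC_BP = 50


@dataclass
class Gene:
    name: str
    start: int                      # leftmost coordinate on the chromosome
    end: int                        # rightmost coordinate (inclusive)
    strand: str                     # "+" or "-"
    function: Optional[str] = None  # None when the function is unknown


def predict_operons(genes, max_gap=MAX_INTERGENIC_BP):
    """Group genes into putative operons: runs of neighboring genes on the
    same strand separated by short intergenic distances."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def context_hints(genes, max_gap=MAX_INTERGENIC_BP):
    """Map each gene of unknown function to the known functions of its
    putative operon partners (a hint for curation, not an annotation)."""
    hints = {}
    for operon in predict_operons(genes, max_gap):
        known = sorted({g.function for g in operon if g.function})
        for g in operon:
            if g.function is None and known:
                hints[g.name] = known
    return hints


# Hypothetical genes on one bacterial chromosome.
genes = [
    Gene("geneA", 100, 1000, "+", "histidine biosynthesis"),
    Gene("geneB", 1020, 1800, "+"),
    Gene("geneC", 1830, 2500, "+", "histidine biosynthesis"),
    Gene("geneD", 3000, 3900, "-"),
]
print(context_hints(genes))  # {'geneB': ['histidine biosynthesis']}
```

Real operon predictors usually combine intergenic distance with conservation of gene order across several genomes; a single distance cutoff is used here only to keep the idea visible.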

The notion of protein function is broad and cannot easily be encoded without a rich controlled vocabulary. The Gene Ontology (GO) addresses this with a hierarchical set of keywords, called GO terms, that describe different aspects of protein function (molecular function, biological process and cellular component) at different levels of precision. GO is becoming the standard vocabulary for proteome annotation and protein function prediction.
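
Because GO terms form a hierarchy, annotating a protein with a specific term implicitly annotates it with every more general ancestor term. The minimal sketch below shows that propagation on a toy, simplified fragment of the molecular-function branch. The `PARENTS` table is an assumption for illustration only: real GO terms can have several parents and use more relation types than `is_a`.

```python
# Toy, simplified fragment of the GO molecular-function branch.
# Each term maps to its parent terms (is_a links only, for illustration).
PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "hydrolase activity": ["catalytic activity"],
    "peptidase activity": ["hydrolase activity"],
    "binding": ["molecular_function"],
    "ion binding": ["binding"],
}


def ancestors(term, parents=PARENTS):
    """Return the term together with every ancestor reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand a set of specific GO terms with all the terms they imply."""
    result = set()
    for term in annotations:
        result |= ancestors(term, parents)
    return result


print(sorted(propagate({"peptidase activity"})))
# ['catalytic activity', 'hydrolase activity', 'molecular_function', 'peptidase activity']
```

This is also why a prediction at a less precise level (here "hydrolase activity") can still be counted as consistent with a more specific annotation.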

Current software tools follow several main strategies:

  • homology search and transfer of annotations:
    • sequence alignment
    • structure alignment
  • function inference by genomic context
  • phylogenomic approaches
  • prediction from sequence or structure using similarities that are not homology-based:
    • local sequence patterns
    • physico-chemical sequence features
    • 3D local sites
    • 3D physico-chemical features
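
As an example of the first strategy, the simplest form of annotation transfer copies the GO terms of the best-scoring homolog found by a sequence search. The sketch below assumes the search results have already been parsed into hypothetical `Hit` records; the identity and E-value cutoffs are illustrative assumptions. Transferring from a single hit can propagate annotation errors, so many tools instead weigh evidence from several hits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str          # identifier of the database protein (hypothetical)
    identity: float       # percent identity of the alignment
    evalue: float         # alignment E-value reported by the search tool
    go_terms: frozenset   # GO terms already assigned to the subject


# Hypothetical cutoffs: safe values depend on the protein family and on how
# specific the transferred function is.
MIN_IDENTITY = 40.0
MAX_EVALUE = 1e-10


def transfer_from_best_hit(hits, min_identity=MIN_IDENTITY, max_evalue=MAX_EVALUE):
    """Return the GO terms of the best annotated hit passing both cutoffs,
    or an empty frozenset if no hit qualifies."""
    candidates = [h for h in hits
                  if h.go_terms and h.identity >= min_identity and h.evalue <= max_evalue]
    if not candidates:
        return frozenset()
    # Best = lowest E-value; ties broken by higher identity.
    best = min(candidates, key=lambda h: (h.evalue, -h.identity))
    return best.go_terms


hits = [
    Hit("P1", 35.0, 1e-30, frozenset({"kinase activity"})),     # fails identity cutoff
    Hit("P2", 62.0, 1e-40, frozenset({"hydrolase activity"})),
    Hit("P3", 90.0, 1e-50, frozenset()),                        # unannotated
]
print(transfer_from_best_hit(hits))  # frozenset({'hydrolase activity'})
```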


Servers which competed at the AFP-SIG 2005

See also the short summaries by the authors themselves at the official site of AFP-SIG 2005.

These servers transfer function annotations from homologs:

  • DBAli [1]
  • PFP [2]
  • Proteome Analyst [3] and PA-GOSUB [4]
  • GOPET [5]

The other servers are:

  • SpearMint and RuleBase (not yet public) [6]
  • PhydBac [7, 8, 9, 10, 11] analyzes bacterial proteins using genomic context.
  • ProKnow [12] searches for known 3D folds, sequences, motifs, and functional linkages.
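
Phylogenetic profiles are one simple genomic-context signal for bacterial proteins: each protein is represented by the presence or absence of a homolog in each of a set of genomes, and proteins with matching profiles are candidate functional partners. The sketch below illustrates only that idea, with hypothetical profiles and a Hamming-distance cutoff; it is not PhydBac's actual scoring, which is described in the cited papers.

```python
# Hypothetical presence (1) / absence (0) of a homolog in each of six genomes.
PROFILES = {
    "protX": (1, 0, 1, 1, 0, 1),
    "protY": (1, 0, 1, 1, 0, 1),
    "protZ": (0, 1, 0, 0, 1, 0),
    "protW": (1, 1, 1, 1, 1, 1),
}


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def profile_partners(query, profiles, max_distance=0):
    """Proteins whose profile differs from the query's in at most
    max_distance genomes (candidate functional partners)."""
    q = profiles[query]
    return sorted(name for name, p in profiles.items()
                  if name != query and hamming(q, p) <= max_distance)


print(profile_partners("protX", PROFILES))                  # ['protY']
print(profile_partners("protX", PROFILES, max_distance=2))  # ['protW', 'protY']
```

Profiles present in almost every genome (like `protW`) match many others at loose cutoffs and carry little information, which is one reason real analyses prefer probabilistic similarity scores to a raw distance.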

Basic tools

  • Wikiomics:BLAST and Wikiomics:PSI-BLAST [13, 14] are commonly used to search for homologous protein sequences by sequence alignment.
  • Prosite [15, 16] is a searchable database of sequence patterns associated with specific biological functions.
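
A PROSITE pattern can be searched with an ordinary regular-expression engine once it is translated. The minimal sketch below converts the common pattern elements (`x`, `[..]`, `{..}`, `(n)` and `(n,m)` repeats, and the `<`/`>` terminal anchors) into a Python regex. `prosite_to_regex` is an illustrative helper, less common syntax such as `>` inside brackets is deliberately not handled, and the zinc-finger-style pattern and the sequence in the example are for illustration only.

```python
import re

# One PROSITE element: a residue, 'x', [allowed], or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'C-x(2,4)-C' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):        # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):          # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    tokens = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"
        else:
            token = residue            # single letter or [..] class: already regex syntax
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return prefix + "".join(tokens) + suffix


zf = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
seq = "AACGGCKKKL" + "A" * 8 + "HSSSHAA"   # hypothetical sequence
match = re.search(zf, seq)
print(zf, match.start())   # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H 2
```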

Other protein function prediction servers

JAFA is a meta-server for protein function prediction: it combines the predictions of other servers into an aggregate prediction. It is a good place to start, since it queries five servers (GoFigure, GOblet, InterProScan, GOtcha and PhydBac) and shows where their results agree and where they differ.
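
The agreement/disagreement view that a meta-server offers can be illustrated with a simple vote count. This sketch is not JAFA's algorithm: it assumes each server's output has already been reduced to a set of GO terms, and the server names and terms are hypothetical.

```python
from collections import Counter


def consensus(predictions):
    """Given {server_name: set_of_GO_terms}, return (term, n_servers, servers)
    tuples, most widely supported terms first, ties sorted by term name."""
    support = Counter()
    sources = {}
    for server, terms in predictions.items():
        for term in terms:
            support[term] += 1
            sources.setdefault(term, []).append(server)
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, count, sorted(sources[term])) for term, count in ranked]


# Hypothetical outputs of three servers for one query protein.
predictions = {
    "serverA": {"catalytic activity", "hydrolase activity"},
    "serverB": {"hydrolase activity"},
    "serverC": {"protein binding"},
}
for term, n, servers in consensus(predictions):
    print(f"{term}: {n} server(s) {servers}")
```

Counting only identical terms understates agreement; expanding each server's terms with their GO ancestors before counting would also reveal servers that agree at a more general level.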

Miscellaneous servers:

  • Protein Function Prediction Server [17] predicts protein function from PDB structures; it offers an enzyme/non-enzyme predictor and an enzyme class predictor.
  • GoFigure [18] assigns GO terms to a gene or protein sequence.
  • ProFunc [19, 20] predicts function from a protein's 3D structure.

See also Wikiomics:Searching for 3D functional sites in a protein structure.

Methods using non-sequential sequence features:

  • ProtFun [21, 22]
  • PropSearch [23]
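
Methods in this family compare proteins through properties that do not depend on residue order, rather than through alignments. As a minimal illustration, assuming the simplest such feature, the sketch below compares amino acid composition vectors; the dedicated tools use richer feature sets, so this is not their method. Because composition ignores order, a shuffled sequence has distance zero to the original.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq (order-independent)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors; small distances
    suggest similar global properties even without alignable similarity."""
    a, b = composition(seq_a), composition(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(composition_distance("ACDE", "EDCA"))  # 0.0
```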

These methods transfer function after homology searches:

  • Blast2GO [24]
  • OntoBlast [25]
  • GOblet [26, 27]
  • GOtcha [28]
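
Instead of trusting a single best hit, a transfer method can weigh every significant hit. The sketch below scores each GO term by summing -log10(E-value) over the hits that carry it and normalizing by the total over all retained hits, so a score of 1.0 means every retained hit supports the term. This weighting scheme and the `max_evalue` cutoff are assumptions for illustration, not the published scoring of any particular tool.

```python
from math import log10


def score_terms(hits, max_evalue=1e-3):
    """Score GO terms from (evalue, set_of_go_terms) pairs.

    Each retained hit contributes -log10(E-value) to every term it carries;
    scores are divided by the total weight of all retained hits."""
    scores = {}
    total = 0.0
    for evalue, terms in hits:
        if evalue > max_evalue or not terms:
            continue
        weight = -log10(max(evalue, 1e-300))  # guard: search tools may report E = 0
        total += weight
        for term in terms:
            scores[term] = scores.get(term, 0.0) + weight
    if total == 0:
        return {}
    return {term: s / total for term, s in scores.items()}


# Hypothetical hits for one query protein.
hits = [
    (1e-50, {"hydrolase activity"}),
    (1e-20, {"hydrolase activity", "metal ion binding"}),
    (1e-2, {"kinase activity"}),           # above max_evalue: ignored
]
for term, score in sorted(score_terms(hits).items()):
    print(f"{term}: {score:.2f}")
```

In practice, each hit's terms would first be expanded with their GO ancestors, so that support for general terms accumulates from hits with different specific annotations.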

Phylogenomic approaches:

  • Orthostrapper [30]
  • RIO [31]
  • SIFTER [32]
  • FIGENIX [33]
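
Phylogenomic methods rest on the idea that function is more reliably shared between orthologs (genes separated by a speciation event) than between paralogs (genes separated by a duplication event). Assuming a gene tree whose internal nodes are already labelled as speciation or duplication events, the sketch below transfers GO terms only from leaves whose last common ancestor with the query is a speciation node. Building and reconciling that tree, which is the hard part, is not shown; the tree, gene names and terms are hypothetical.

```python
class Node:
    """Node of a reconciled gene tree. Internal nodes carry the event that
    produced them: 'speciation' or 'duplication'."""

    def __init__(self, name=None, event=None, children=(), go_terms=()):
        self.name = name
        self.event = event
        self.children = list(children)
        self.go_terms = set(go_terms)
        self.parent = None
        for child in self.children:
            child.parent = self


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    """Last common ancestor of two nodes of the same tree."""
    ancestors_of_a = set(path_to_root(a))
    for n in path_to_root(b):
        if n in ancestors_of_a:
            return n
    return None


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def ortholog_annotations(query, root):
    """GO terms of annotated leaves whose LCA with the query is a speciation
    node (orthologs); leaves related by a duplication (paralogs) are ignored."""
    terms = set()
    for leaf in leaves(root):
        if leaf is query or not leaf.go_terms:
            continue
        if lca(query, leaf).event == "speciation":
            terms |= leaf.go_terms
    return terms


# Hypothetical tree: an ancient duplication followed by speciation in each copy.
human_a = Node("human_A")
mouse_a = Node("mouse_A", go_terms={"activity P"})
human_b = Node("human_B", go_terms={"activity Q"})
mouse_b = Node("mouse_B", go_terms={"activity Q"})
root = Node(event="duplication", children=[
    Node(event="speciation", children=[human_a, mouse_a]),
    Node(event="speciation", children=[human_b, mouse_b]),
])
print(ortholog_annotations(human_a, root))  # {'activity P'}
```

A plain best-hit transfer could just as well have given `human_A` the paralogous function "activity Q"; restricting transfer to orthologs avoids that.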

See also

  • A review of automated protein function prediction [34]


  1. Martí-Renom MA, Ilyin VA, and Sali A. PMID: 11524379. [dbali]
  2. Hawkins T, Luban S, and Kihara D. PMID: 16672240. [pfp]
  3. Szafron D, Lu P, Greiner R, Wishart DS, Poulin B, Eisner R, Lu Z, Anvik J, Macdonell C, Fyshe A, and Meeuwis D. PMID: 15215412. [pa]
  4. Lu P, Szafron D, Greiner R, Wishart DS, Fyshe A, Pearcy B, Poulin B, Eisner R, Ngo D, and Lamb N. PMID: 15608166. [pa-gosub]
  5. Vinayagam A, König R, Moormann J, Schubert F, Eils R, Glatting KH, and Suhai S. PMID: 15333146. [gopet]
  6. Wieser D, Kretschmann E, and Apweiler R. PMID: 15262818. [wieser2004]
  7. Enault F, Suhre K, Abergel C, Poirot O, and Claverie JM. PMID: 12855445. [enault2003]
  8. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. PMID: 12824402. [phydbac]
  9. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. PMID: 15215406. [phydbac2]
  10. Suhre K and Claverie JM. PMID: 14681411. [fusiondb]
  11. Enault F, Suhre K, and Claverie JM. PMID: 16221304. [phydbac2005]
  12. Pal D and Eisenberg D. PMID: 15642267. [proknow]
  13. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. PMID: 2231712. [blast]
  14. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. PMID: 9254694. [psiblast]
  15. Bucher P and Bairoch A. PMID: 7584418. [prosite_first]
  16. Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, and Bairoch A. PMID: 14681377. [prosite_last]
  17. Dobson PD and Doig AJ. PMID: 12850146. [dobson-doig]
  18. Khan S, Situ G, Decker K, and Schmidt CJ. PMID: 14668239. [gofigure]
  19. Laskowski RA, Watson JD, and Thornton JM. PMID: 15980588. [profunc_a]
  20. Laskowski RA, Watson JD, and Thornton JM. PMID: 16019027. [profunc_b]
  21. Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, and Brunak S. PMID: 12079362. [protfun2002]
  22. Jensen LJ, Gupta R, Staerfeldt HH, and Brunak S. PMID: 12651722. [protfun2003]
  23. Hobohm U and Sander C. PMID: 7650738. [propsearch]
  24. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, and Robles M. PMID: 16081474. [blast2go]
  25. Zehetner G. PMID: 12824422. [ontoblast]
  26. Hennig S, Groth D, and Lehrach H. PMID: 12824400. [goblet2003]
  27. Groth D, Lehrach H, and Hennig S. PMID: 15215401. [goblet2004]
  28. Martin DM, Berriman M, and Barton GJ. PMID: 15550167. [gotcha]
  29. Pazos F and Sternberg MJ. PMID: 15456910. [phunctioner]
  30. Storm CE and Sonnhammer EL. PMID: 11836216. [orthostrapper]
  31. Zmasek CM and Eddy SR. PMID: 12028595. [rio]
  32. Engelhardt BE, Jordan MI, Muratore KE, and Brenner SE. PMID: 16217548. [sifter]
  33. Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, and Danchin EG. PMID: 16083500. [figenix]
  34. Friedberg I. PMID: 16772267. [afpreview]



  • Martin Jambon: introduction plus the initial list of tools and papers, put together after the AFP-SIG 2005 conference (at ISMB 2005)
  • other Wikiomics authors