Wikiomics:Searching for 3D functional sites in a protein structure

From OpenWetWare

Given a protein structure, which are the potentially interesting sites? Approaches which are based only on sequence patterns or backbone architecture are often insufficient to find similarities between sites of similar biochemical function.

The methods presented here use the 3D arrangement of protein atoms to find putative functional sites, such as ligand binding sites or catalytic sites.

Search by comparison against annotated sites

Comparing 3D structures locally at the atomic level is not a simple problem, and there is no standard method in this field. However, many recent techniques are available through web servers, which makes them relatively easy to use.

An advantage of comparing a query protein structure against 3D sites of known biological activity is that both sites can be compared and the similarity can be further investigated either visually or using other tools.

Methods and tools

pdbFun [1] is a web server for identifying local structural similarities between annotated residues in proteins. It gives fast access to the whole PDB, organized as a database of annotated residues, and lets the user select any subset of residues by combining the available features. Query and target selections are then compared with a fast, sequence-independent 3D comparison algorithm that represents each amino acid by a single point located at its centroid.
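The one-point-per-residue representation can be illustrated with a minimal sketch. It assumes atoms have already been parsed into plain tuples and uses an unweighted mean of atom coordinates; the function name and input layout are hypothetical and are not taken from the server itself.

```python
from collections import defaultdict

def residue_centroids(atoms):
    """Reduce each residue to one point: the mean of its atom coordinates.

    atoms: iterable of (chain_id, residue_number, x, y, z) tuples
           (a hypothetical input layout chosen for this sketch).
    Returns {(chain_id, residue_number): (cx, cy, cz)}.
    """
    # Running sums of x, y, z and an atom count for every residue.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for chain, resnum, x, y, z in atoms:
        acc = sums[(chain, resnum)]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {key: (sx / n, sy / n, sz / n)
            for key, (sx, sy, sz, n) in sums.items()}

# Toy input: two atoms in residue 1, one atom in residue 2.
example_atoms = [
    ("A", 1, 0.0, 0.0, 0.0),
    ("A", 1, 2.0, 0.0, 0.0),
    ("A", 2, 1.0, 1.0, 1.0),
]
example_centroids = residue_centroids(example_atoms)
```

Reducing residues to single points makes comparisons fast, at the cost of ignoring side-chain orientation.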

PDBSiteScan [2] scans a protein structure against the PDBSite [3] database. Each amino acid is represented by its three backbone atoms (N, C-alpha, C).

PINTS [4, 5] assigns types to selected side-chain atoms of amino acids. Two atoms of the same type, such as the carboxyl oxygens of Asp and Glu, are treated as equivalent. The search is based on interatomic distances and the scoring is based on RMSD values.
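The two ideas in this description, atom typing and distance-based comparison, can be sketched as follows. This is not the PINTS implementation: the type label is invented for illustration, and the score shown is a distance RMSD (computed from internal distances, so no superposition is needed) rather than the coordinate RMSD after optimal superposition that is usually reported.

```python
import math
from itertools import combinations

# Hypothetical atom-type table; the label "carboxyl_O" is made up for this sketch.
ATOM_TYPES = {
    ("ASP", "OD1"): "carboxyl_O", ("ASP", "OD2"): "carboxyl_O",
    ("GLU", "OE1"): "carboxyl_O", ("GLU", "OE2"): "carboxyl_O",
}

def equivalent(atom_a, atom_b):
    """atom = (residue_name, atom_name); True if both map to the same type."""
    type_a = ATOM_TYPES.get(atom_a)
    return type_a is not None and type_a == ATOM_TYPES.get(atom_b)

def distance_rmsd(coords_a, coords_b):
    """Root mean square difference between corresponding interatomic distances.

    coords_a, coords_b: equal-length lists of (x, y, z); atom i of the first
    site is assumed to be matched to atom i of the second site.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("sites must contain the same number of matched atoms")
    if len(coords_a) < 2:
        raise ValueError("at least two matched atoms are required")
    squared = []
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared.append((d_a - d_b) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

A distance-based score is invariant to rotation and translation, which is why interatomic distances are convenient for the search step. Note that it cannot distinguish a site from its mirror image.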

PROCAT [6] and, more recently, the Catalytic Site Atlas [7, 8, 9] use the TESS [10] and Jess [11] methods to search a database of 3D templates of catalytic sites.

pvSOAR [12, 13] uses the centroids of the amino acids forming a pocket and the pseudosequence they form when read in sequence order: if a pocket is made of amino acids Ala45, Tyr12, Ser124 and His32, then the corresponding pseudosequence is Tyr-His-Ala-Ser. The default comparison procedure uses an alignment between the pseudosequences associated with two pockets. This constraint can be removed if only two pockets are being compared.
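Building a pocket pseudosequence is simple enough to sketch directly. The example assumes that residues are ordered by their position in the chain; the function name and the illustrative pocket (a serine protease-like triad) are not taken from pvSOAR.

```python
def pocket_pseudosequence(residues):
    """Order pocket residues by chain position and join their names.

    residues: iterable of (three_letter_name, residue_number) pairs,
              a hypothetical input layout chosen for this sketch.
    """
    ordered = sorted(residues, key=lambda residue: residue[1])
    return "-".join(name for name, _ in ordered)

# Illustrative pocket, given in arbitrary order.
triad = [("Asp", 102), ("Ser", 195), ("His", 57)]
triad_sequence = pocket_pseudosequence(triad)
```

Because the pseudosequence keeps the order of residues along the chain, two pockets can then be compared with ordinary sequence alignment tools before any 3D comparison is attempted.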

SiteEngine [14] uses surface-exposed functional groups that describe the physico-chemical properties of amino acids. It is possible to compare a protein structure against a given site on the web server. The program is also available for download.

SPASM/RIGOR [15] was the first web server to offer sequence- and fold-independent searches in 3D structures of proteins. It represents each residue by its C-alpha atom or by the centroid of its side chain.

Figure: Poster showing the main concepts of SuMo.

SuMo [16, 17, 18] uses chemical groups with their own geometry and symmetry, plus a complementary local shape comparison technique. It does not require a low RMSD between two sites to consider them similar, although local pairwise matching is required. Given a protein structure, it will scan the PDB for similar ligand binding sites and return a list of sites, sorted by decreasing size. Clicking on each individual result gives a parallel view of the matched sites.

Prediction of functional sites from geometrical or physico-chemical properties

These tools do not try to match the query against 3D sites of known biological importance. Instead, they predict the function of a site from its geometry or its chemistry.

  • SARIG [19] predicts functional sites using residue interaction graphs (contact maps).
  • WebFEATURE [20, 21] scans a protein structure for local environments of a given type. An RNA version, naFEATURE [22], also exists.
  • THEMATICS [23, 24, 25] predicts catalytic sites from deviations in the theoretical titration curves of proteins.
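The residue interaction graph mentioned for SARIG can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: residues are reduced to single points, the 6.5 Å contact cutoff is arbitrary, and closeness centrality is used as one plausible graph measure, not necessarily the exact scoring applied by the server.

```python
import math
from collections import deque

def contact_graph(points, cutoff=6.5):
    """Link residues whose representative points lie within `cutoff` angstroms.

    points: {residue_id: (x, y, z)}; the cutoff value is an arbitrary choice.
    Returns an adjacency map {residue_id: set of neighbouring residue_ids}.
    """
    ids = list(points)
    graph = {residue: set() for residue in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(points[a], points[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def closeness(graph, source):
    """Closeness centrality of `source`: reachable nodes / sum of path lengths.

    Shortest paths are found by breadth-first search; an isolated node scores 0.0.
    """
    steps = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                queue.append(neighbour)
    total = sum(steps.values())
    return (len(steps) - 1) / total if total else 0.0

# Toy example: three residues in a row, 5 angstroms apart.
toy_points = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0)}
toy_graph = contact_graph(toy_points)
toy_scores = {residue: closeness(toy_graph, residue) for residue in toy_graph}
```

In this toy chain the middle residue gets the highest score, which is the intuition behind ranking residues by how central they are in the contact network.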

Prediction using phylogenetic information

Combined with projections onto 3D structures, the degree of conservation of aligned residues within a family of proteins can indicate amino acids which are functionally important. Examples of this approach are the Evolutionary Trace method [26, 27, 28] and ConSurf [29].
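Column-wise conservation can be estimated in several ways. The sketch below uses Shannon entropy, which is an assumption made here for illustration and not the scoring scheme of any particular tool; the function names and the gap handling are likewise hypothetical. Low entropy means a well conserved position.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Gap characters ('-') are ignored; a column made only of gaps gives None.
    """
    residues = [symbol for symbol in column if symbol != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """alignment: list of aligned sequences of equal length.

    Returns one entropy value per alignment column.
    """
    return [column_entropy(column) for column in zip(*alignment)]

# Toy alignment: the first column is invariant, the second is split evenly.
toy_profile = conservation_profile(["AC", "AG"])
```

The per-column values can then be mapped onto the residues of a 3D structure, for instance by colouring the surface, to see whether conserved positions cluster in space.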

See also


  1. Ausiello G, Zanzoni A, Peluso D, Via A, and Helmer-Citterich M. pdbFun: mass selection and fast comparison of annotated PDB residues. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W133-7. DOI:10.1093/nar/gki499 | PubMed ID:15980442 | HubMed [pdbfun]
  2. Ivanisenko VA, Pintus SS, Grigorovich DA, and Kolchanov NA. PDBSiteScan: a program for searching for active, binding and posttranslational modification sites in the 3D structures of proteins. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W549-54. DOI:10.1093/nar/gkh439 | PubMed ID:15215447 | HubMed [pdbsitescan]
  3. Ivanisenko VA, Pintus SS, Grigorovich DA, and Kolchanov NA. PDBSite: a database of the 3D structure of protein functional sites. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D183-7. DOI:10.1093/nar/gki105 | PubMed ID:15608173 | HubMed [pdbsite]
  4. Russell RB. Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution. J Mol Biol. 1998 Jun 26;279(5):1211-27. DOI:10.1006/jmbi.1998.1844 | PubMed ID:9642096 | HubMed [pints_method]
  5. Stark A, Sunyaev S, and Russell RB. A model for statistical significance of local similarities in structure. J Mol Biol. 2003 Mar 7;326(5):1307-16. DOI:10.1016/s0022-2836(03)00045-7 | PubMed ID:12595245 | HubMed [pints_assessment]

    read [4] first

  6. Wallace AC, Laskowski RA, and Thornton JM. Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 1996 Jun;5(6):1001-13. DOI:10.1002/pro.5560050603 | PubMed ID:8762132 | HubMed [procat]
  7. Porter CT, Bartlett GJ, and Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D129-33. DOI:10.1093/nar/gkh028 | PubMed ID:14681376 | HubMed [csa1]
  8. Bartlett GJ, Porter CT, Borkakoti N, and Thornton JM. Analysis of catalytic residues in enzyme active sites. J Mol Biol. 2002 Nov 15;324(1):105-21. DOI:10.1016/s0022-2836(02)01036-7 | PubMed ID:12421562 | HubMed [csa2]
  9. Torrance JW, Bartlett GJ, Porter CT, and Thornton JM. Using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families. J Mol Biol. 2005 Apr 1;347(3):565-81. DOI:10.1016/j.jmb.2005.01.044 | PubMed ID:15755451 | HubMed [csa3]
  10. Wallace AC, Borkakoti N, and Thornton JM. TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci. 1997 Nov;6(11):2308-23. DOI:10.1002/pro.5560061104 | PubMed ID:9385633 | HubMed [tess]

    successor of PROCAT [6]

  11. Barker JA and Thornton JM. An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics. 2003 Sep 1;19(13):1644-9. DOI:10.1093/bioinformatics/btg226 | PubMed ID:12967960 | HubMed [jess]

    successor of TESS [10]

  12. Binkowski TA, Adamian L, and Liang J. Inferring functional relationships of proteins from local sequence and spatial surface patterns. J Mol Biol. 2003 Sep 12;332(2):505-26. DOI:10.1016/s0022-2836(03)00882-9 | PubMed ID:12948498 | HubMed [pvsoar_method]
  13. Binkowski TA, Freeman P, and Liang J. pvSOAR: detecting similar surface patterns of pocket and void surfaces of amino acid residues on proteins. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W555-8. DOI:10.1093/nar/gkh390 | PubMed ID:15215448 | HubMed [pvsoar_server]
  14. Shulman-Peleg A, Nussinov R, and Wolfson HJ. Recognition of functional sites in protein structures. J Mol Biol. 2004 Jun 4;339(3):607-33. DOI:10.1016/j.jmb.2004.04.012 | PubMed ID:15147845 | HubMed [siteengine]
  15. Kleywegt GJ. Recognition of spatial motifs in protein structures. J Mol Biol. 1999 Jan 29;285(4):1887-97. DOI:10.1006/jmbi.1998.2393 | PubMed ID:9917419 | HubMed [spasm_rigor]
  16. Jambon M, Imberty A, Deléage G, and Geourjon C. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins. 2003 Aug 1;52(2):137-45. DOI:10.1002/prot.10339 | PubMed ID:12833538 | HubMed [sumo2003]

    describes the basic method, which has been considerably refined since. Read [1] for a good understanding of the current method and the concepts on which it relies.

  17. Jambon M, Andrieu O, Combet C, Deléage G, Delfaud F, and Geourjon C. The SuMo server: 3D search for protein functional sites. Bioinformatics. 2005 Oct 15;21(20):3929-30. DOI:10.1093/bioinformatics/bti645 | PubMed ID:16141250 | HubMed [sumo2005]

    application note about the SuMo web server

  18. Jambon M. A bioinformatic system for searching functional similarities in 3D structures of proteins. PhD thesis, 2003.

  19. Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, and Pietrokovski S. Network analysis of protein structures identifies functional residues. J Mol Biol. 2004 Dec 3;344(4):1135-46. DOI:10.1016/j.jmb.2004.10.055 | PubMed ID:15544817 | HubMed [sarig]
  20. Wei L and Altman RB. Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac Symp Biocomput. 1998:497-508. PubMed ID:9697207 | HubMed [feature]
  21. Liang MP, Banatao DR, Klein TE, Brutlag DL, and Altman RB. WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res. 2003 Jul 1;31(13):3324-7. DOI:10.1093/nar/gkg553 | PubMed ID:12824318 | HubMed [webfeature]
  22. Banatao DR, Altman RB, and Klein TE. Microenvironment analysis and identification of magnesium binding sites in RNA. Nucleic Acids Res. 2003 Aug 1;31(15):4450-60. DOI:10.1093/nar/gkg471 | PubMed ID:12888505 | HubMed [nafeature]
  23. Ko J, Murga LF, Wei Y, and Ondrechen MJ. Prediction of active sites for protein structures from computed chemical properties. Bioinformatics. 2005 Jun;21 Suppl 1:i258-65. DOI:10.1093/bioinformatics/bti1039 | PubMed ID:15961465 | HubMed [thematics2005a]
  24. Shehadi IA, Abyzov A, Uzun A, Wei Y, Murga LF, Ilyin V, and Ondrechen MJ. Active site prediction for comparative model structures with thematics. J Bioinform Comput Biol. 2005 Feb;3(1):127-43. DOI:10.1142/s0219720005000916 | PubMed ID:15751116 | HubMed [thematics2005b]
  25. Ko J, Murga LF, André P, Yang H, Ondrechen MJ, Williams RJ, Agunwamba A, and Budil DE. Statistical criteria for the identification of protein active sites using Theoretical Microscopic Titration Curves. Proteins. 2005 May 1;59(2):183-95. DOI:10.1002/prot.20418 | PubMed ID:15739204 | HubMed [thematics2005c]
  26. Lichtarge O, Bourne HR, and Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996 Mar 29;257(2):342-58. DOI:10.1006/jmbi.1996.0167 | PubMed ID:8609628 | HubMed [et1996]
  27. Innis CA, Shi J, and Blundell TL. Evolutionary trace analysis of TGF-beta and related growth factors: implications for site-directed mutagenesis. Protein Eng. 2000 Dec;13(12):839-47. DOI:10.1093/protein/13.12.839 | PubMed ID:11239083 | HubMed [et2000]
  28. Mihalek I, Res I, and Lichtarge O. A family of evolution-entropy hybrid methods for ranking protein residues by importance. J Mol Biol. 2004 Mar 5;336(5):1265-82. DOI:10.1016/j.jmb.2003.12.078 | PubMed ID:15037084 | HubMed [et2004]
  29. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, and Ben-Tal N. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics. 2003 Jan;19(1):163-4. DOI:10.1093/bioinformatics/19.1.163 | PubMed ID:12499312 | HubMed [consurf]
  30. Polacco BJ and Babbitt PC. Automated discovery of 3D motifs for protein function annotation. Bioinformatics. 2006 Mar 15;22(6):723-30. DOI:10.1093/bioinformatics/btk038 | PubMed ID:16410325 | HubMed [polacco]

    uses the same technique as SPASM [15]

  31. Schmitt S, Kuhn D, and Klebe G. A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol. 2002 Oct 18;323(2):387-406. DOI:10.1016/s0022-2836(02)00811-2 | PubMed ID:12381328 | HubMed [schmitt2002]

    one of the most advanced techniques, along with SuMo [16, 17, 18], but not available online (?). (more details needed)
