Prediction of functional sites from protein multiple sequence alignments
Author(s): Jonathan Manning, Emily Jefferson and Geoff Barton
Affiliations: University of Dundee
Keywords: 'protein' 'alignment' 'function' 'site'
Protein multiple sequence alignments are powerful tools that may reveal residues important to the structure and function of protein family members. Alignment positions that conserve identical amino acids, or amino acids with similar physico-chemical properties, are routinely used to predict potential functional sites by inspection. Automated procedures that predict catalytic residues by considering conservation across all sequences in a family [1] have also met with some success in identifying functional sites. More sophisticated methods seek to exploit the evolutionary information present in a family of sequences by considering sub-families or trees. For example, the AMAS algorithm [2] identifies positions that have conserved physico-chemical properties within sub-families of proteins (e.g. positive charge), yet exhibit different properties between the sub-families (e.g. positive charge in one sub-family and negative charge in another). A large number of algorithms have been based on this principle and have met with varying degrees of success [3-5].
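The two ideas above, family-wide conservation and properties that are conserved within sub-families but differ between them, can be illustrated with a minimal Python sketch. This is a toy illustration only and is not the AMAS algorithm or any published method: the charge table is deliberately simplified (histidine is treated as neutral), the sub-family membership is supplied by hand, and all names (CHARGE, identity_conservation, subfamily_specific_columns, the example sequences) are hypothetical.

```python
from collections import Counter

# Simplified, assumed property table: only K/R (positive) and D/E (negative).
CHARGE = {"K": "pos", "R": "pos", "D": "neg", "E": "neg"}


def charge_class(residue):
    """Map a one-letter residue code to 'pos', 'neg', 'neutral' or 'gap'."""
    if residue == "-":
        return "gap"
    return CHARGE.get(residue.upper(), "neutral")


def identity_conservation(column):
    """Fraction of non-gap residues equal to the most common residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)


def subfamily_specific_columns(alignment, groups):
    """Return indices of columns whose charge class is uniform within every
    sub-family but not the same across all sub-families.

    alignment: list of equal-length aligned sequences (strings).
    groups: list of lists of row indices, one list per sub-family.
    """
    hits = []
    for idx, column in enumerate(zip(*alignment)):
        per_group = []
        for members in groups:
            seen = {charge_class(column[i]) for i in members}
            # Require one class per sub-family and no gaps.
            if len(seen) != 1 or "gap" in seen:
                break
            per_group.append(seen.pop())
        else:
            if len(set(per_group)) > 1:
                hits.append(idx)
    return hits


# Hypothetical six-sequence alignment split into two sub-families.
example_alignment = ["GKDA", "GKEA", "GRDS", "GDDA", "GEET", "GDDS"]
example_groups = [[0, 1, 2], [3, 4, 5]]

if __name__ == "__main__":
    for i, col in enumerate(zip(*example_alignment)):
        print(i, "".join(col), round(identity_conservation(col), 2))
    print(subfamily_specific_columns(example_alignment, example_groups))
```

In this example the second column (index 1) is positively charged throughout the first sub-family and negatively charged throughout the second, so it is the only column reported, even though its family-wide identity score is low. The sketch depends on a fixed, pre-defined grouping of the sequences.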
We detail a new method, named SMERFS, for the prediction of functionally significant positions in multiple sequence alignments. The algorithm exploits patterns present in the homology relationships within protein multiple sequence alignments to propose functionally significant regions. However, in contrast to many other techniques based on this premise, SMERFS requires neither fixed subgrouping nor a phylogenetic tree. We validate the method using structurally derived data, present a preliminary evaluation by comparison to hierarchical analysis methods [2] and conventional conservation measures [6], and discuss future prospects.
1. Zvelebil, M.J., et al., Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J Mol Biol, 1987. 195(4): p. 957-61.
2. Livingstone, C.D. and G.J. Barton, Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput Appl Biosci, 1993. 9(6): p. 745-56.
3. Lichtarge, O., H.R. Bourne, and F.E. Cohen, An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol, 1996. 257(2): p. 342-58.
4. Armon, A., D. Graur, and N. Ben-Tal, ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol, 2001. 307(1): p. 447-63.
5. La, D. and D.R. Livesay, Predicting functional sites with an automated algorithm suitable for heterogeneous datasets. BMC Bioinformatics, 2005. 6(1): p. 116.
6. Valdar, W.S., Scoring residue conservation. Proteins, 2002. 48(2): p. 227-41.