Multiple Sequence Alignment Using ClustalW
The alignment and subsequent analysis of protein amino acid sequences can provide insights into their structure, function, and evolutionary relationships. As a complement to your molecular modeling work with RasMol, you will use a software program called ClustalW to compare an E. coli β-galactosidase sequence with β-galactosidase sequences obtained from other bacterial species. An alignment of several sequences at once is called a multiple sequence alignment. The primary sequences were obtained from the National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/. We will use the ClustalW software available at the EMBL-EBI website: http://www.ebi.ac.uk/clustalw. The sequences of the various β-galactosidases are in the RasMol folder (as a Word document) on the lab’s computers. When you access the EMBL-EBI website, you will find a form to fill out. Make sure Protein is selected in the box at the top of the form. Copy the entire contents of the Word file and paste them into the sequence box (all of the sequences can be pasted at once), then click the Submit button.
Once you see the output, scroll down to the button that toggles between “Show Colors” and “Hide Colors”. Click it to color-code the alignment (the key below shows the meaning of the colors). The excerpts below, taken from the Help file at the ClustalW site, should help you interpret the output. The Help file at the EMBL-EBI website also contains a very useful tutorial.
Using the alignment, determine the degree of conservation of the amino acids you located in the active site with the RasMol program. Jacobsen et al. (1994) and Roth et al. (1998) conducted similar analyses. What are the potential implications if an amino acid in the active site is highly conserved across all of the species tested? See the introduction of the paper by Juers et al. (2001) for additional information on amino acids that have previously been shown to be important for catalysis.
The “yes” option below refers to the “Show Colors” selection. If you set this option to “yes”, the alignment will be shown in color. Note: this option only works when you have chosen ALN or GCG as the output format. Residues are colored according to the following physicochemical criteria:
AVFPMILW - Red: Small (small + hydrophobic [includes aromatic, minus Y])
STYHCNGQ - Green: Hydroxyl + Amine + Basic + Q
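The two color groups listed above amount to a simple lookup from residue to color. The following Python sketch is illustrative only: the name `color_group` is hypothetical, and any residue outside the two groups shown in this excerpt is reported as "other" rather than assigned one of the remaining colors in the ClustalW key.

```python
# Color groups copied from the two rows listed above. Residues not
# covered by this excerpt are reported as "other" (an assumption made
# for illustration, not part of the ClustalW key).
COLOR_GROUPS = {
    "red": set("AVFPMILW"),
    "green": set("STYHCNGQ"),
}

def color_group(residue: str) -> str:
    """Return the color group for a one-letter amino acid code."""
    residue = residue.upper()
    for color, members in COLOR_GROUPS.items():
        if residue in members:
            return color
    return "other"

print(color_group("W"))  # red
print(color_group("Y"))  # green
print(color_group("E"))  # other
```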
Consensus Symbols: The following symbols denote the degree of conservation observed in each column:
* (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the alignment.
: (colon) means that conserved substitutions have been observed, according to the color table above.
. (period) means that semi-conserved substitutions are observed.
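When you check the active-site residues you located with RasMol, remember that gaps shift the numbering: residue 3 of a sequence is not necessarily column 3 of the alignment. The Python sketch below shows one way to translate a residue number into an alignment column and then test whether that column is fully identical (the * case). It assumes the aligned sequences have already been copied into equal-length strings with `-` marking gaps. The function names and the toy sequences are invented for illustration and are not real β-galactosidase data, and the sketch does not attempt to reproduce how ClustalW assigns the : and . symbols.

```python
def column_for_residue(aligned_seq: str, residue_number: int) -> int:
    """Map a 1-based residue number (gaps not counted) to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number is beyond the end of the sequence")

def is_identical_column(aligned_seqs: list[str], column: int) -> bool:
    """True when every sequence has the same residue, and no gap, in this column."""
    residues = {seq[column] for seq in aligned_seqs}
    return len(residues) == 1 and "-" not in residues

# Toy alignment (hypothetical sequences, not real data).
alignment = [
    "MT-EHW",  # reference sequence, e.g. the one whose numbering you know
    "MSAEHW",
    "MT-EHF",
]

col = column_for_residue(alignment[0], 3)  # third residue of the reference, 'E'
print(col, is_identical_column(alignment, col))  # prints: 3 True
```

A residue that sits in a fully identical column across all of the species is a natural candidate to discuss when you answer the conservation question above.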