User:Kishor Krishnarao Shende/Notebook/Bioinformatics Tutorials/Entry Base

From OpenWetWare

Bioinformatics Tutorials

1. Phylogenetic Analysis

Phylogenetic analysis clusters organisms on the basis of their shared and differing characters. Several data types can be used to study the phylogenetic relationships among organisms: morphological and anatomical characters, biochemical and physiological characters, cytological and molecular characters, or nucleotide and protein sequences. Many software tools are available for phylogenetic analysis, such as PHYLIP, MEGA and ClustalW. In this practical we will use ClustalW, PHYLIP, MEGA and TreeView.

Phylogenetic Analysis Based on Nucleotide Sequence (16S rRNA) Data

The following is a list of Firmicutes bacteria from the genera Bacillus, Lactobacillus, Streptococcus, Staphylococcus, Clostridium and Mycoplasma. A total of 18 species are given with their GI numbers and accession numbers.

SN   GI Number   Accession Number   Scientific Name
1    301429543   AB569641.1         Bacillus subtilis
2    301429542   AB569640.1         Bacillus licheniformis
3    188039790   EU624434.1         Bacillus cereus
4    126013493   EF412983.1         Lactobacillus brevis
5    291088158   AB550297.1         Lactobacillus fermentum
6    126013496   EF412986.1         Lactobacillus salivarius
7    1944114     AB002521.1         Streptococcus pyogenes
8    3323429     AF009506.1         Streptococcus suis
9    32396625    AY281084.1         Streptococcus pneumoniae
10   1199939     D83357.1           Staphylococcus aureus
11   1199949     D83367.1           Staphylococcus haemolyticus
12   1199945     D83363.1           Staphylococcus epidermidis
13   302129347   AB573713.1         Clostridium perfringens
14   257961      S46735.1           Clostridium acetobutylicum
15   295147944   AB558166.1         Clostridium thermocellum
16   39653271    AY466443.1         Mycoplasma genitalium
17   281186801   GU227406.1         Mycoplasma hyopneumoniae
18   2443519     AF009837.1         Mycoplasma capricolum


• Retrieve these 16S rRNA sequences from the NCBI Nucleotide database into a single file.
• Name this file “16rna”.
• Go to the EBI website (www.ebi.ac.uk). Under the Tools menu, go to Sequence Analysis and click the ClustalW submenu.
• Paste the 16S rRNA sequences into the text box.
• Select “aln” in the Output Format list.
• Select NJ (Neighbor-Joining) in the Clustering list.
• Run the program.
• In the “Results of Search” table, save the Alignment File as “16srna” and the Guide Tree File as “16srnatree-clst”.
• The “16srna” file, which contains the multiple sequence alignment in ClustalW format, will be used as the input file for MEGA (phylogenetic analysis). This is detailed later.
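The retrieval step can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities EFetch service is reachable and that FASTA is an acceptable format for pasting into ClustalW. The output file name "16rna.fasta" and the function names are illustrative choices, not part of the practical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Accession numbers copied from the species table in this tutorial.
ACCESSIONS = [
    "AB569641.1", "AB569640.1", "EU624434.1", "EF412983.1", "AB550297.1",
    "EF412986.1", "AB002521.1", "AF009506.1", "AY281084.1", "D83357.1",
    "D83367.1", "D83363.1", "AB573713.1", "S46735.1", "AB558166.1",
    "AY466443.1", "GU227406.1", "AF009837.1",
]

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions):
    """Return an EFetch URL that requests all accessions as FASTA text."""
    query = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_fasta(accessions, out_path="16rna.fasta"):
    """Fetch the sequences and save them in one file (needs network access).

    The default file name is an assumption made for this sketch.
    """
    with urlopen(build_efetch_url(accessions)) as response:
        data = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(data)
    return out_path


# Building the URL needs no network; call download_fasta(ACCESSIONS) to fetch.
print(build_efetch_url(ACCESSIONS))
```

Calling download_fasta(ACCESSIONS) should write all 18 records to one file, which can then be pasted or uploaded into ClustalW as described in the steps above.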

Visualizing the Dendrogram (Phylogenetic Tree)

The guide tree file “16srnatree-clst” can be visualized using BioEdit or MEGA.

A. Visualizing the Phylogenetic Tree in MEGA

Download MEGA5 and install it.
• Open the File menu and open the file “16srnatree-clst”, which was obtained from ClustalW.
• Visualize the dendrogram in each of the styles listed below:
  i. Radial Tree
  ii. Slanted Cladogram
  iii. Rectangular Cladogram
  iv. Phylogram
• To save an image of each dendrogram style, go to the Edit menu, click Copy, and paste into an MS Word file. Label each figure with its tree type.
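Before opening a guide tree in a viewer, it can be useful to confirm that the file lists every expected taxon. The sketch below assumes the guide tree is stored in Newick format, with unquoted labels that contain no spaces. The example string and the function name are made up for illustration.

```python
import re


def newick_leaf_names(newick):
    """Return the leaf labels of a Newick string in order of appearance."""
    # A leaf label follows "(" or "," and runs until ":", ",", ")" or ";".
    # Internal node labels follow ")" and are therefore not captured.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Hypothetical three-taxon tree with branch lengths.
example = (
    "((Bacillus_subtilis:0.01,Bacillus_cereus:0.02):0.05,"
    "Lactobacillus_brevis:0.10);"
)
print(newick_leaf_names(example))
```

For a real file, read its contents with open(path).read() and compare the number of labels returned with the 18 species in the table.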


Phylogenetic Analysis using MEGA software

Open MEGA, perform the operations as directed by the instructor, and answer the following questions.

1. Take a printout of each type of dendrogram you generated.
2. What differences did you observe between the neighbor-joining and UPGMA trees?
3. What differences did you observe among the three phylogenetic trees obtained using the Distance, Maximum Parsimony and Maximum Likelihood methods?
4. Comment on the phylogenetic relationships among the various Firmicutes bacteria, based on the tree generated from the 16S rRNA data.

B. Phylogenetic Analysis using PHYLIP and Tree Drawing

The MSA file in PHYLIP format obtained after the ClustalW analysis, “16srnaphylip”, will be used to perform the phylogenetic analysis and to generate the dendrogram.
• Download the PHYLIP package (by Joe Felsenstein) from the website of the Department of Genome Sciences and the Department of Biology at the University of Washington (http://evolution.gs.washington.edu/phylip.html). Extract it and save the PHYLIP folder on drive “C:”.
• Inside the PHYLIP folder, find the folder “exe”. Open this folder and place your file (16srnaphylip), the MSA of the 16S rRNA sequences in PHYLIP format, in it.
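For reference, the layout of a PHYLIP alignment file can be sketched in a few lines. This is an illustration only, assuming the classic format in which the first line gives the number of taxa and the number of sites, and each name occupies exactly 10 characters. The taxon names and sequences are invented, and the function name is hypothetical.

```python
def to_phylip(alignment):
    """Format {name: aligned_sequence} as a single-block PHYLIP alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Header line: number of taxa, then number of alignment columns.
    lines = [f" {len(alignment)} {lengths.pop()}"]
    for name, seq in alignment.items():
        # Names are truncated or space-padded to exactly 10 characters.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Invented two-taxon alignment, used only to show the layout.
demo = {"B_subtilis": "ACGT-ACGT", "L_brevis": "ACGTTACGA"}
print(to_phylip(demo), end="")
```

Because each sequence is written on one line in a single block, the result reads the same way whether a program expects the interleaved or the sequential layout.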

B.1 Distance Matrix Method
• You have the multiple sequence alignment in PHYLIP format. You now need to create a dissimilarity (distance) matrix among the taxa, based on sequence similarity. The distance matrix can be calculated with the PHYLIP program “dnadist”.
• Double-click the program “dnadist”. A new DOS command window will open and ask you for the input file name.
• Type the input file name “16srnaphylip.aln”.
• The program will list all the parameters that can be set. Option “D” sets the algorithm used to calculate the distance matrix. Four algorithms are currently available: F84, Kimura, Jukes-Cantor and LogDet. The Kimura and Jukes-Cantor methods are the most widely used. The default is F84; type “D” and press Enter to change it.
• Check the default options and change them as given below.
  - Type “D” and press Enter. This changes the distance matrix algorithm to “Kimura”.
  - Ignore option “G”, gamma-distributed rates across sites.
  - Ignore option “T”, the transition/transversion ratio.
  - Ignore option “C”, one category of substitution rates.
  - Ignore option “W”, the use of weights for sites.
  - Ignore option “L”, the form of the distance matrix.
  - Ignore option “M”, the number of data sets. We are using a single replicate of the data set, so there is no need to change this option.
  - Ignore option “I”, the PHYLIP format type. Our file is interleaved, so there is no need to change this option either.
  - Ignore option “0”, the terminal type.
  - Ignore option “1”. If changed, it writes the input data to the outfile.
  - Ignore option “2”, which shows the progress of the computation.
  - Type “Y” and press Enter.
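To make the idea of a sequence distance concrete, the sketch below computes the textbook closed-form Jukes-Cantor and Kimura two-parameter distances for one pair of aligned sequences. It is an illustration under simplifying assumptions: gaps and ambiguity codes are skipped, and the formulas are the standard published ones, so the numbers need not match what “dnadist” reports with its own implementation and settings. The function name and example sequences are invented.

```python
import math

PURINES = {"A", "G"}


def pairwise_distances(seq1, seq2):
    """Return (Jukes-Cantor, Kimura 2-parameter) distances for two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    P = transitions / compared    # proportion of transitions
    Q = transversions / compared  # proportion of transversions
    p = P + Q                     # total proportion of differing sites
    jc = -0.75 * math.log(1 - 4 * p / 3)
    k2p = -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
    return jc, k2p


# Invented 10-site example with two transitions and no transversions.
jc, k2p = pairwise_distances("ACGTACGTAC", "ACGTACGTGT")
print(round(jc, 4), round(k2p, 4))
```

Both corrections grow faster than the raw proportion of differences, because they estimate substitutions that are hidden by multiple changes at the same site.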

• The program will run and calculate the distance matrix. A new file named “outfile” will be created.
• Rename “outfile” to “16srnadist”. This file contains the distance matrix, i.e. the distances among all the taxa under study. You can open this file in WordPad to inspect it, but do not edit it.
• The new file “16srnadist” will be used to perform the clustering (phylogenetic analysis) of the taxa.
• You now need to create the dendrogram (phylogenetic tree) from the distance matrix data. There are 4 methods of clustering the taxa: Fitch, Kitsch, Neighbor-Joining and UPGMA (Unweighted Pair Group Method with Arithmetic Mean).
• Three programs are available in PHYLIP to cluster the taxa: “fitch”, “kitsch” and “neighbor”. The “neighbor” program contains two algorithms: 1. Neighbor-Joining and 2. UPGMA. Both algorithms are widely used in diversity analysis.
• For the present study we will use the “neighbor” program of PHYLIP with both algorithms (1. Neighbor-Joining and 2. UPGMA) to draw the dendrogram.
• Double-click the program “neighbor”. A new DOS command window will open. When asked for the input file name, type “16srnadist”.
• The program will list a number of options, to be set as given below.
  - Ignore option “N”. This sets the clustering algorithm; we will use the default Neighbor-Joining method. To obtain the UPGMA tree, run “neighbor” again and type “N” to switch the method.
  - Ignore option “O”, the outgroup root. No particular species is chosen as the outgroup.
  - Ignore option “L”, used when the distance matrix file is lower-triangular.
  - Ignore option “R”, used when the distance matrix file is upper-triangular. We have a square matrix file.
  - Ignore option “S”, as we do not have any subreplicates.
  - Ignore option “J”. This option randomizes the input order of the data. If you use it, you must supply a random number seed, which PHYLIP asks to be an odd integer.
  - Ignore option “M”, as our data are a single replicate. This option is used for multiple data sets, such as replicates produced by bootstrapping or permutation.
  - Ignore option “0”.
  - Type “1” and press Enter to change this option, so that the input data are printed in the output file.
  - Ignore options “2”, “3” and “4”; they are not important here.
  - Type “Y” and press Enter. This runs the program “neighbor”, and two files will be created: 1. “outfile” and 2. “outtree”.
• The two files obtained are:
  “outfile”: contains the details of the data, the tree and the branch lengths calculated during tree building. Take a printout of this file to submit.
  “outtree”: contains the tree in a phylogenetic tree format that can be read by tree visualization programs such as TreeView, MEGA, BioEdit, and the Drawgram or Drawtree programs of PHYLIP.
• Start the program “TreeView” (already installed) and open the file “outtree”.
• Visualize the dendrogram in each of the styles listed below:
  i. Radial Tree
  ii. Slanted Cladogram
  iii. Rectangular Cladogram
  iv. Phylogram
• To save an image of each dendrogram style, go to the Edit menu, click Copy, and paste into an MS Word file. Label each figure with its tree type.
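As an illustration of how clustering from a distance matrix works, the following is a minimal UPGMA sketch. It is not the code used by PHYLIP. It returns only the tree topology as a Newick string, without branch lengths, and the four taxa and their distances are invented for the example.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as a Newick string."""
    # Each cluster is (newick_label, size), keyed by an integer id.
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(d, key=d.get)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        # Drop every distance that involved the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    (label, _), = clusters.values()
    return label + ";"


# Invented distances: A and B are closest, then C joins, then D.
taxa = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(taxa, matrix))
```

UPGMA assumes that all lineages evolve at the same rate, which is one reason its tree can differ from the Neighbor-Joining tree built from the same matrix.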

B.2 Maximum Parsimony Method
• Keep your PHYLIP-format MSA file in the “exe” folder inside the PHYLIP folder.
• Double-click the program “dnapars”. A new DOS command window will open and ask for the input file name.
• Type the input file name “16srnaphylip.aln”. A list of options will appear.
• Keep all the options at their defaults. Type “Y” and press Enter.
• Two new files will be generated: 1. “outfile” and 2. “outtree”.
• Visualize the “outtree” file in TreeView. Display the dendrogram as:
  i. Radial Tree
  ii. Slanted Cladogram
  iii. Rectangular Cladogram
  iv. Phylogram
• To save an image of each dendrogram style, go to the Edit menu, click Copy, and paste into an MS Word file. Label each figure with its tree type.
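The parsimony criterion can be illustrated with the Fitch counting rule, which gives the minimum number of base changes needed to explain one alignment column on a given tree. The sketch below scores a single site on a fixed rooted binary tree. It does not search for the best tree as “dnapars” does, and the taxa, bases and function name are invented.

```python
def fitch_score(tree, site):
    """Return (state_set, changes) for one alignment column on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right); `site` maps leaf -> base.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], site)
    right_set, right_changes = fitch_score(tree[1], site)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # No shared state: one substitution is required at this node.
    return left_set | right_set, left_changes + right_changes + 1


# Invented column: taxa A and B carry G, taxa C and D carry T.
column = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))  # groups identical bases together
tree_2 = (("A", "C"), ("B", "D"))  # splits them apart
print(fitch_score(tree_1, column)[1], fitch_score(tree_2, column)[1])
```

Summing this count over all columns gives the length of a tree, and the most parsimonious tree is the one with the smallest total.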

B.3 Maximum Likelihood Method
• Keep your PHYLIP-format MSA file in the “exe” folder inside the PHYLIP folder.
• Double-click the program “dnaml”. A new DOS command window will open and ask for the input file name.
• Type the input file name “16srnaphylip.aln”. A list of options will appear.
• Keep all the options at their defaults. Type “Y” and press Enter.
• Two new files will be generated: 1. “outfile” and 2. “outtree”.
• Visualize the “outtree” file in TreeView. Display the dendrogram as:
  i. Radial Tree
  ii. Slanted Cladogram
  iii. Rectangular Cladogram
  iv. Phylogram
• To save an image of each dendrogram style, go to the Edit menu, click Copy, and paste into an MS Word file. Label each figure with its tree type.
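The likelihood principle can be shown on the smallest possible case: two aligned sequences and a single distance to estimate. The sketch below assumes the Jukes-Cantor model and a plain grid search, which is far simpler than the model and tree search used by “dnaml”. The site counts and function names are invented for illustration.

```python
import math


def jc_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a pairwise alignment at distance d under Jukes-Cantor."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability that a site keeps its base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    # Each site contributes (base frequency 1/4) * (transition probability).
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def ml_distance(n_same, n_diff, steps=2000, d_max=2.0):
    """Return the grid distance with the highest likelihood."""
    grid = [d_max * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))


# Invented data: 80 identical sites and 20 differing sites.
best = ml_distance(80, 20)
closed_form = -0.75 * math.log(1 - 4 * 0.2 / 3)
print(round(best, 3), round(closed_form, 3))
```

In this two-sequence case the grid maximum should fall next to the closed-form Jukes-Cantor distance. With many sequences the same idea is applied to every branch of every candidate tree.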


Questions:
5. Take a printout of each type of dendrogram you generated.
6. What differences did you observe between the neighbor-joining and UPGMA trees?
7. What differences did you observe among the three phylogenetic trees obtained using the Distance, Maximum Parsimony and Maximum Likelihood methods?
8. Comment on the phylogenetic relationships among the various Firmicutes bacteria, based on the tree generated from the 16S rRNA data.