Kai Yuet/Protocols:Bootstrapping

From OpenWetWare

The Bootstrapping Process (KPY)

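Bootstrapping is a method of estimating how strongly the data support each inferred relationship (each branch) in a phylogenetic tree, by re-analyzing resampled versions of the sequence alignment.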

  • 1. Align the given sequences using PileUp, a multiple sequence alignment program that utilizes a progressive, pairwise alignment method:
!!AA_MULTIPLE_ALIGNMENT 1.0
PileUp of: *.pep

Symbol comparison table: GenRunData:blosum62.cmp  CompCheck: 1102

                  GapWeight: 8
            GapLengthWeight: 2 

pileup.msf  MSF: 529  Type: P  January 16, 2007 15:39  Check: 970 ..

Name: FuguC            Len:   529  Check: 4363  Weight:  1.00
Name: TetraC           Len:   529  Check: 3118  Weight:  1.00
Name: ZebraC           Len:   529  Check: 3684  Weight:  1.00
Name: HumanC           Len:   529  Check: 2247  Weight:  1.00
Name: FuguA            Len:   529  Check: 6929  Weight:  1.00
Name: TetraA           Len:   529  Check: 4354  Weight:  1.00
Name: ZebraA           Len:   529  Check: 3876  Weight:  1.00
Name: HumanA           Len:   529  Check: 2399  Weight:  1.00

//

       1                                                   50
 FuguC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
TetraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
ZebraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
HumanC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
 FuguA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
TetraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
ZebraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
HumanA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
  • 2. Re-align the PileUp output using ClustalW with the output format set to PHYLIP. With this setting ClustalW writes a PHYLIP-format alignment (typically a .phy file) rather than its native Clustal .aln format. This file, renamed as 'infile', will become the input for the SEQBOOT bootstrapping program.
    8    543
 FuguC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
TetraC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
ZebraC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
HumanC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
 FuguA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
TetraA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
ZebraA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS 
HumanA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
  • 3. SEQBOOT resamples the input data set to create multiple bootstrapped or jackknifed data sets. Bootstrapping draws N characters (alignment columns) at random, with replacement, from the N characters of the original alignment, so each new data set has the original length but repeats some columns and omits others. Analyzing these new data sets gives a statistical estimate of the sampling distribution of the result. SEQBOOT results are placed in a file named 'outfile'. See also: PHYLIP Package, SEQBOOT Web Server.
   8   543
FuguC      MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
TetraC     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
ZebraC     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
HumanC     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
FuguA      MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
TetraA     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
ZebraA     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
HumanA     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
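
The resampling in step 3 can be illustrated with a minimal Python sketch. This is not SEQBOOT's own code: the function name bootstrap_columns and the two-sequence toy alignment (the first ten residues shown above) are assumptions made for illustration. The sampled columns are kept in their original left-to-right order, an assumption chosen to mimic the sample output above, where repeated residues sit side by side.

```python
import random

def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that a run can be reproduced.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw N column indices with replacement, then sort them so the
    # replicate keeps the original column order (assumption, see lead-in).
    picks = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    # The same columns are taken from every sequence, so the alignment
    # stays consistent across species.
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

# Toy input: first ten aligned residues of two of the sequences above.
alignment = {
    "FuguC": "MGRKKIQITR",
    "HumanA": "MGRKKIQITR",
}
replicate = bootstrap_columns(alignment, random.Random(42))
for name, seq in replicate.items():
    print(f"{name:<10} {seq}")
```

Calling the function repeatedly with the same rng object gives further replicates, one per bootstrapped data set.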
  • 4. To find the phylogeny estimate for these new data sets, rename the 'outfile' to 'infile', and run PROTPARS (PROTein PARSimony) for protein sequences, or similarly DNAPARS (DNA PARSimony) for nucleotide sequences, with the multiple data sets option (M) turned on and set to the number of data sets that SEQBOOT produced. PROTPARS infers an unrooted phylogeny from each data set by parsimony: for each candidate phylogeny it counts the minimum number of substitutions required to explain the observed sequences, and it keeps the phylogeny that requires the fewest. PROTPARS writes two files: 'outfile', and 'outtree', which contains all of the trees generated from the data sets. See also: PROTPARS Web Server, PROTPARS Mirror.
Protein parsimony algorithm, version 3.66
Data set # 1:
One most parsimonious tree found:

             +--------HumanA    
          +--7  
          !  !  +-----ZebraA    
          !  +--6  
       +--4     !  +--TetraA    
       !  !     +--5  
       !  !        +--FuguA     
    +--3  !  
    !  !  +-----------HumanC    
 +--2  !  
 !  !  +--------------ZebraC    
 1  !  
 !  +-----------------TetraC    
 !  
 +--------------------FuguC     

 remember: this is an unrooted tree!
requires a total of    823.000
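
The idea of counting the minimum number of changes on a given phylogeny can be sketched with Fitch's algorithm, shown below in Python. This is a deliberate simplification and not what PROTPARS runs: Fitch's algorithm charges one step for any residue change, whereas PROTPARS scores amino acid changes with reference to the genetic code. The nested-tuple tree format, the function names, and the four toy sequences A to D are all made up for illustration.

```python
def fitch_changes(tree, states):
    """Fitch pass for one alignment column on a binary tree.

    tree: nested 2-tuples with leaf names as strings, e.g. (("A", "B"), "C").
    states: dict mapping leaf name -> residue observed in this column.
    Returns (candidate residues at this node, changes counted below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a residue: no extra change needed.
        return shared, left_cost + right_cost
    # No residue in common: at least one change is required here.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_changes(tree, column)[1]
    return total

# Toy data (invented): A and B share K in column 2, C and D share L.
toy = {"A": "MK", "B": "MK", "C": "ML", "D": "ML"}
grouped_by_residue = (("A", "B"), ("C", "D"))
mixed = (("A", "C"), ("B", "D"))
print(parsimony_score(grouped_by_residue, toy))  # 1
print(parsimony_score(mixed, toy))               # 2
```

The tree that groups A with B needs one change and the alternative needs two, so parsimony prefers the first. The "requires a total of" line in the output above reports the corresponding total for the real data under PROTPARS's own scoring.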
  • 5. Finally, the 'outtree' file needs to be renamed to 'intree', the input file for CONSENSE, the program that outputs the majority rule consensus tree. See also: CONSENSE Web Server.
Consensus tree program, version 3.66
Species in order: 

 1. HumanA
 2. ZebraA
 3. TetraA
 4. FuguA
 5. HumanC
 6. ZebraC
 7. TetraC
 8. FuguC

Sets included in the consensus tree

Set (species in order)     How many times out of  100.00

....****                   100.00
..**....                   100.00
.....***                   100.00
......**                   100.00
.***....                   99.00

Sets NOT included in consensus tree:

Set (species in order)     How many times out of  100.00

.*..****                    1.00

Extended majority rule consensus tree

CONSENSUS TREE:
the numbers on the branches indicate the number
of times the partition of the species into the two sets
which are separated by that branch occurred
among the trees, out of 100.00 trees

                             +------FuguC
                      +100.0-|
               +100.0-|      +------TetraC
               |      |
        +100.0-|      +-------------ZebraC
        |      |
 +------|      +--------------------HumanC
 |      |
 |      |             +-------------ZebraA
 |      +--------99.0-|
 |                    |      +------TetraA
 |                    +100.0-|
 |                           +------FuguA
 |
 +----------------------------------HumanA
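
The "Sets included in the consensus tree" table above can be read as group frequencies: each pattern marks with * the species (in the listed order) that form a group, followed by how often that group appeared among the trees. The Python sketch below shows the counting on three invented toy trees over four of the species names. It is a simplification under stated assumptions: trees are nested tuples treated as rooted, and groups are counted as clades, whereas CONSENSE works with partitions of unrooted trees and applies the extended majority rule. All function names are hypothetical.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set, list of clades at internal nodes) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def clade_support(trees):
    """Percentage of trees in which each non-trivial clade occurs."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # Skip the trivial clade that contains every species.
        counts.update(c for c in found if c != all_leaves)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}

def as_pattern(clade, species_order):
    """Render a clade in the dot-and-star notation used in the table above."""
    return "".join("*" if name in clade else "." for name in species_order)

# Toy input (invented): three trees, two of which agree.
species_order = ["HumanA", "ZebraA", "TetraA", "FuguA"]
trees = [
    ("HumanA", ("ZebraA", ("TetraA", "FuguA"))),
    ("HumanA", ("ZebraA", ("TetraA", "FuguA"))),
    ("HumanA", ("TetraA", ("ZebraA", "FuguA"))),
]
support = clade_support(trees)
for clade, percent in sorted(support.items(), key=lambda item: -item[1]):
    verdict = "included" if percent > 50.0 else "not included"
    print(f"{as_pattern(clade, species_order)}  {percent:6.2f}  {verdict}")
```

Groups found in more than half of the trees are the ones a majority rule consensus keeps, which is why the 1.00 group in the output above is listed under "Sets NOT included in consensus tree".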
  • 6. The 'outtree' file written by CONSENSE can be opened in TREEVIEW for further manipulation, such as rooting the tree.
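
Because the PHYLIP programs in this protocol read and write fixed file names, the renaming between steps is easy to get wrong by hand. The Python sketch below shows one way to script the rename from step 3 to step 4 while keeping a descriptively named copy. The helper name stage_for_next_step, its parameters, and the file name seqboot_replicates.phy are hypothetical, and the demonstration uses a scratch directory with a stand-in 'outfile' rather than real SEQBOOT output.

```python
import shutil
import tempfile
from pathlib import Path

def stage_for_next_step(workdir, produced="outfile", expected="infile", keep_as=None):
    """Rename one program's output to the name the next program reads.

    keep_as: optional descriptive file name under which a copy is kept,
    since the next program also writes a file named 'outfile'.
    """
    workdir = Path(workdir)
    source = workdir / produced
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; did the previous step run?")
    if keep_as is not None:
        shutil.copyfile(source, workdir / keep_as)
    target = workdir / expected
    source.replace(target)  # overwrites any stale file with the target name
    return target

# Demonstration in a scratch directory with a stand-in 'outfile'.
with tempfile.TemporaryDirectory() as scratch:
    (Path(scratch) / "outfile").write_text("    8   543\n")
    staged = stage_for_next_step(scratch, keep_as="seqboot_replicates.phy")
    staged_name = staged.name
    kept_copy_exists = (Path(scratch) / "seqboot_replicates.phy").is_file()
    old_name_gone = not (Path(scratch) / "outfile").exists()
    print(staged_name, kept_copy_exists, old_name_gone)
```

Passing different produced and expected names lets the same helper cover the other hand-offs in the protocol.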