Kai Yuet/Protocols:Bootstrapping

=The Bootstrapping Process (KPY)=

Bootstrapping is a method of estimating confidence levels of inferred relationships.

 * 1. Align the given sequences using PileUp, a multiple sequence alignment program that utilizes a progressive, pairwise alignment method:

 !!AA_MULTIPLE_ALIGNMENT 1.0
 PileUp of: *.pep
 Symbol comparison table: GenRunData:blosum62.cmp CompCheck: 1102
 GapWeight: 8
 GapLengthWeight: 2
 pileup.msf MSF: 529  Type: P  January 16, 2007 15:39  Check: 970 ..
 Name: FuguC           Len:   529  Check: 4363  Weight:  1.00
 Name: TetraC          Len:   529  Check: 3118  Weight:  1.00
 Name: ZebraC          Len:   529  Check: 3684  Weight:  1.00
 Name: HumanC          Len:   529  Check: 2247  Weight:  1.00
 Name: FuguA           Len:   529  Check: 6929  Weight:  1.00
 Name: TetraA          Len:   529  Check: 4354  Weight:  1.00
 Name: ZebraA          Len:   529  Check: 3876  Weight:  1.00
 Name: HumanA          Len:   529  Check: 2399  Weight:  1.00
 //
         1                                                   50
 FuguC   MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 TetraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 ZebraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 HumanC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 FuguA   MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 TetraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 ZebraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 HumanA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS


 * 2. Re-align the PileUp output using ClustalW, with PHYLIP selected as the output format. The resulting alignment file, renamed 'infile', becomes the input for the SEQBOOT bootstrapping program.

 8   543
 FuguC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 TetraC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 ZebraC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 HumanC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 FuguA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 TetraA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 ZebraA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
 HumanA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
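The layout of the PHYLIP file in step 2 (a header line with the number of sequences and the number of characters, then one name padded to 10 characters followed by its sequence) can be sketched in Python. This is only an illustration of the format: the function name `to_phylip` and the short sequences are hypothetical, and ClustalW produces this file for you.

```python
def to_phylip(records):
    """Format (name, aligned_sequence) pairs as a sequential PHYLIP alignment."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Header line: number of sequences, then number of characters per sequence.
    lines = [f"{len(records)} {n_sites}"]
    for name, seq in records:
        # The classic PHYLIP format reserves exactly 10 characters for the name.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence example (first 10 residues only).
records = [("FuguC", "MGRKKIQITR"), ("TetraC", "MGRKKIQITR")]
print(to_phylip(records), end="")
```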


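 * 3. SEQBOOT resamples the input data set to create multiple bootstrapped or jackknifed data sets. Bootstrapping draws N characters (alignment columns) at random, with replacement, from the N characters of the original alignment, so each new data set has the same size as the original but some columns appear several times and others not at all. These new data sets can then be analyzed to estimate statistically the sampling distribution of the data.  SEQBOOT results are placed in a file named 'outfile'.  PHYLIP Package. SEQBOOT Web Server.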

 8  543
 FuguC     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 TetraC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 ZebraC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 HumanC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 FuguA     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 TetraA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 ZebraA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
 HumanA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
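The resampling in step 3 can be sketched in a few lines of Python. This is a minimal illustration of bootstrapping alignment columns, not SEQBOOT's own code: the function name `bootstrap_alignment` and the toy alignment are assumptions made for the example. The sampled column indices are sorted so that repeated columns sit side by side, as in the sample output above.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment (dict of name -> sequence)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw N column indices with replacement; the same columns are used for
    # every sequence so that the replicate is still a valid alignment.
    cols = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


# Hypothetical toy alignment (first 10 residues of two sequences).
alignment = {"FuguC": "MGRKKIQITR", "HumanA": "MGRKKIQITR"}
rng = random.Random(0)  # fixed seed so the run is repeatable
for replicate in (bootstrap_alignment(alignment, rng) for _ in range(3)):
    print(replicate)
```

Each replicate has the same number of columns as the input; running this 100 times would give the kind of 100-replicate set that is summarized later in the protocol.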


 * 4. To find the phylogeny estimate for these new data sets, rename the 'outfile' to 'infile', and run PROTPARS (PROTein PARSimony) for protein sequences, or similarly DNAPARS (DNA PARSimony) for nucleotide sequences, with the multiple data sets option turned on. PROTPARS infers an unrooted phylogeny from each data set by parsimony: for each candidate phylogeny it counts the minimum number of substitutions required to evolve the observed proteins along that tree, and it prefers the phylogeny that requires the fewest changes.  Two files are produced: the 'outfile' file and the 'outtree' file, which contains all of the trees generated from the data sets. PROTPARS Web Server. PROTPARS Mirror.

 Protein parsimony algorithm, version 3.66
 Data set # 1:
 One most parsimonious tree found:
 [tree diagram; layout not recoverable. Tips: HumanA, ZebraA, TetraA, FuguA, HumanC, ZebraC, TetraC, FuguC; internal nodes numbered 1 to 7]
 remember: this is an unrooted tree!
 requires a total of   823.000
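The idea of counting the minimum number of changes a tree requires can be illustrated with Fitch's small-parsimony algorithm. Note that this is a simplified stand-in, not the method PROTPARS actually implements (PROTPARS also takes the genetic code into account). The nested-tuple tree representation, the function names, and the four toy sequences are all assumptions made for this sketch.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one column on a binary tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its residue in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    # No shared state: at least one substitution is needed at this node.
    return lset | rset, lcost + rcost + 1


def parsimony_length(tree, alignment):
    """Sum the per-column Fitch counts over the whole alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, three aligned residues each.
alignment = {"A": "MKT", "B": "MKS", "C": "MRS", "D": "MRT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_length(tree1, alignment))  # 3
print(parsimony_length(tree2, alignment))  # 4
```

Here `tree1` needs fewer changes than `tree2`, so it would be the more parsimonious of the two; the "requires a total of" line in the PROTPARS output reports the analogous count for the real data.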


 * 5. Finally, rename the 'outtree' file to 'intree', the input file for CONSENSE, the program that outputs the majority rule consensus tree. CONSENSE Web Server.

 Consensus tree program, version 3.66
 Species in order:
   1. HumanA
   2. ZebraA
   3. TetraA
   4. FuguA
   5. HumanC
   6. ZebraC
   7. TetraC
   8. FuguC
 Sets included in the consensus tree
 Set (species in order)    How many times out of  100.00
 ....****                  100.00
 ..**....                  100.00
 .....***                  100.00
 ......**                  100.00
 .***....                   99.00
 Sets NOT included in consensus tree:
 Set (species in order)    How many times out of  100.00
 .*..****                    1.00
 Extended majority rule consensus tree
 CONSENSUS TREE:
 the numbers on the branches indicate the number
 of times the partition of the species into the two sets
 which are separated by that branch occurred
 among the trees, out of 100.00 trees
 [tree diagram; layout not recoverable. Its groupings and support values are those listed under "Sets included in the consensus tree"]
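The counts in the "Sets included" table can be understood as a tally of how often each grouping of species appears among the bootstrap trees. The sketch below shows plain majority-rule counting in Python under simplifying assumptions: trees are rooted nested tuples (CONSENSE itself works on partitions of unrooted trees and uses an extended majority rule), and the names `clades`, `majority_rule` and the three toy trees are hypothetical.

```python
from collections import Counter


def clades(tree, found):
    """Append the leaf set under every internal node of `tree` to `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, found) for child in tree))
    found.append(leaves)
    return leaves


def majority_rule(trees):
    """Return the groupings found in more than half of the trees, with counts."""
    counts = Counter()
    for tree in trees:
        found = []
        all_leaves = clades(tree, found)
        # Skip the trivial grouping that contains every species.
        counts.update(c for c in set(found) if c != all_leaves)
    cutoff = len(trees) / 2
    return {c: n for c, n in counts.items() if n > cutoff}


# Hypothetical set of three "bootstrap" trees over four taxa.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
for group, n in majority_rule(trees).items():
    print(sorted(group), n)
```

In the CONSENSE output above, the grouping `.***....` plays the role of a set seen in 99 of 100 trees, while `.*..****` was seen only once and is therefore left out of the consensus.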


 * 6. The 'outtree' file written by CONSENSE can be further manipulated (for example, rooted) using TREEVIEW.