Kai Yuet/Protocols:Bootstrapping
From OpenWetWare
The Bootstrapping Process (KPY)
Bootstrapping is a resampling method for estimating the confidence levels of inferred relationships, here the groupings in a phylogenetic tree.
- 1. Align the given sequences using PileUp, a multiple sequence alignment program that utilizes a progressive, pairwise alignment method:
!!AA_MULTIPLE_ALIGNMENT 1.0
PileUp of: *.pep

 Symbol comparison table: GenRunData:blosum62.cmp  CompCheck: 1102

 GapWeight: 8
 GapLengthWeight: 2

 pileup.msf  MSF: 529  Type: P  January 16, 2007 15:39  Check: 970 ..

 Name: FuguC    Len: 529  Check: 4363  Weight: 1.00
 Name: TetraC   Len: 529  Check: 3118  Weight: 1.00
 Name: ZebraC   Len: 529  Check: 3684  Weight: 1.00
 Name: HumanC   Len: 529  Check: 2247  Weight: 1.00
 Name: FuguA    Len: 529  Check: 6929  Weight: 1.00
 Name: TetraA   Len: 529  Check: 4354  Weight: 1.00
 Name: ZebraA   Len: 529  Check: 3876  Weight: 1.00
 Name: HumanA   Len: 529  Check: 2399  Weight: 1.00

//

        1                                                   50
FuguC   MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
TetraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
ZebraC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
HumanC  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
FuguA   MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
TetraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
ZebraA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
HumanA  MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
- 2. Re-align the PileUp output using ClustalW, with the output format set to PHYLIP, to generate the alignment file. This file, renamed 'infile', becomes the input for the SEQBOOT bootstrapping program.
    8   543
FuguC     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
TetraC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
ZebraC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
HumanC    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
FuguA     MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
TetraA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
ZebraA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
HumanA    MGRKKIQITR IMDERNRQVT FTKRKFGLMK KAYELSVLCD CEIALIIFNS
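The PHYLIP layout shown above (a header line with the number of sequences and the alignment length, then one name per row padded to 10 characters) can be produced by hand if ClustalW is not available. The following is a minimal Python sketch under that assumption; the function name `to_phylip` and the two-sequence toy alignment are hypothetical and only for illustration.

```python
def to_phylip(seqs):
    """Format a dict of equal-length aligned sequences as sequential PHYLIP text."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_chars = lengths.pop()
    # Header: number of sequences, then number of aligned characters.
    lines = [f"{len(seqs):5d} {n_chars:5d}"]
    for name, seq in seqs.items():
        # Names occupy exactly 10 characters, padded with spaces.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


# Toy alignment (first 10 residues only), not the full data set.
alignment = {"FuguC": "MGRKKIQITR", "TetraC": "MGRKKIQITR"}
text = to_phylip(alignment)
print(text)
```

Writing `text` to a file named 'infile' would give SEQBOOT its expected input name.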
- 3. SEQBOOT resamples the input data set to create multiple bootstrapped or jackknifed data sets. Bootstrapping draws N characters (alignment columns) at random, with replacement, from the N original characters, so a replicate may contain some columns several times and others not at all; analyzing these new data sets gives a statistical estimate of the sampling distribution of the data. SEQBOOT places its results in a file named 'outfile'. PHYLIP Package. SEQBOOT Web Server.
    8   543
FuguC     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
TetraC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
ZebraC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
HumanC    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
FuguA     MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
TetraA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
ZebraA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
HumanA    MMMKKKQTRR RERRRRRQQQ VVTFTKRKKK GLLLKAYEEE LLVLLCDIAA IIIINSLYYK
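To make the resampling concrete, here is a minimal Python sketch of column-wise bootstrapping. It is not SEQBOOT's code: the function name `bootstrap_columns`, the toy alignment, and the fixed seed are all assumptions for illustration. Sorting the drawn column indices is a presentation choice that keeps repeated columns adjacent, as they appear in the sample above.

```python
import random


def bootstrap_columns(seqs, rng):
    """Return one bootstrap replicate: N columns drawn with replacement."""
    n_chars = len(next(iter(seqs.values())))
    # Draw N column indices with replacement; some repeat, some are never drawn.
    picks = sorted(rng.randrange(n_chars) for _ in range(n_chars))
    # Apply the same picks to every sequence so columns stay aligned.
    return {name: "".join(seq[i] for i in picks) for name, seq in seqs.items()}


# Toy alignment (first 10 residues only), not the full data set.
alignment = {
    "FuguC": "MGRKKIQITR",
    "HumanA": "MGRKKIQITR",
}
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
replicates = [bootstrap_columns(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate has the same number of columns as the original alignment, which is what allows the tree-building step to treat it as an ordinary data set.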
- 4. To estimate the phylogeny for each of these new data sets, rename 'outfile' to 'infile' and run PROTPARS (PROTein PARSimony) for protein sequences, or similarly DNAPARS (DNA PARSimony) for nucleotide sequences, with the multiple data sets option turned on. PROTPARS infers an unrooted phylogeny from each data set by parsimony: it counts the minimum number of substitutions required to explain the sequences on each candidate phylogeny and keeps the tree or trees requiring the fewest. The run produces two files: 'outfile', and 'outtree', which contains all of the trees generated from the data sets. PROTPARS Web Server. PROTPARS Mirror.
Protein parsimony algorithm, version 3.66

Data set # 1:

One most parsimonious tree found:

              +--------HumanA
           +--7
           !  !  +-----ZebraA
           !  +--6
        +--4     !  +--TetraA
        !  !     +--5
        !  !        +--FuguA
     +--3  !
     !  !  +-----------HumanC
  +--2  !
  !  !  +--------------ZebraC
  1  !
  !  +-----------------TetraC
  !
  +--------------------FuguC

  remember: this is an unrooted tree!

requires a total of 823.000
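The idea of counting changes on a candidate tree can be illustrated with the classic Fitch counting procedure. This is a deliberately simplified sketch, not the scoring that PROTPARS itself uses: every substitution is assumed to cost one step, trees are written as nested tuples, and the names `fitch_changes`, `tree_length`, and the four-taxon toy data are hypothetical. With equal-cost changes the count does not depend on where the tree is rooted, so a rooted tuple can stand in for an unrooted tree.

```python
def fitch_changes(tree, states):
    """Minimum number of changes at one site on a binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its residue at this site
    """
    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a residue: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: one change is required on this branch pair.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def tree_length(tree, seqs):
    """Sum the per-site change counts over all alignment columns."""
    n_chars = len(next(iter(seqs.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in seqs.items()})
        for i in range(n_chars)
    )


# Toy data: four taxa, two sites.
seqs = {"A": "MK", "B": "MK", "C": "MR", "D": "LR"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(tree_length(tree1, seqs), tree_length(tree2, seqs))
```

The tree with the smaller total is the more parsimonious of the two; a parsimony program searches over many such candidates and reports the best, along with its total as in the "requires a total of" line above.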
- 5. Finally, the 'outtree' file written by PROTPARS needs to be renamed to 'intree', the input file for CONSENSE, the program that reads the trees and outputs the majority rule consensus tree. CONSENSE Web Server.
Consensus tree program, version 3.66

Species in order:

  1. HumanA
  2. ZebraA
  3. TetraA
  4. FuguA
  5. HumanC
  6. ZebraC
  7. TetraC
  8. FuguC

Sets included in the consensus tree

Set (species in order)    How many times out of 100.00

....****                  100.00
..**....                  100.00
.....***                  100.00
......**                  100.00
.***....                   99.00

Sets NOT included in consensus tree:

Set (species in order)    How many times out of 100.00

.*..****                    1.00

Extended majority rule consensus tree

CONSENSUS TREE:
the numbers on the branches indicate the number
of times the partition of the species into the two sets
which are separated by that branch occurred
among the trees, out of 100.00 trees

                              +------FuguC
                       +100.0-|
                +100.0-|      +------TetraC
                |      |
         +100.0-|      +-------------ZebraC
         |      |
  +------|      +--------------------HumanC
  |      |
  |      |             +-------------ZebraA
  |      +--------99.0-|
  |                    |      +------TetraA
  |                    +100.0-|
  |                           +------FuguA
  |
  +----------------------------------HumanA
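The "how many times out of 100" column is a count of how often each partition of the species appears among the bootstrap trees. The following Python sketch shows that counting for plain majority rule (partitions found in more than half of the trees); it does not implement the extended majority rule variant named in the output above. Trees are written as nested tuples, and `clades`, `split_support`, and the five-taxon toy trees are hypothetical names and data for illustration.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, list of internal clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found += child_found
    found.append(leaves)
    return leaves, found


def split_support(trees):
    """Percentage of trees that contain each non-trivial partition."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        ref = min(all_leaves)  # fixed reference taxon to normalize each split
        splits = set()
        for clade in found:
            # On an unrooted tree a clade and its complement are the same
            # partition, so always keep the side that excludes the reference.
            side = all_leaves - clade if ref in clade else clade
            # Skip trivial partitions (empty side or a single species cut off).
            if 1 < len(side) < len(all_leaves) - 1:
                splits.add(side)
        counts.update(splits)
    return {side: 100.0 * n / len(trees) for side, n in counts.items()}


# Three toy trees standing in for the trees in 'outtree'.
trees = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
]
support = split_support(trees)
majority = {side for side, pct in support.items() if pct > 50.0}
for side, pct in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(side), round(pct, 1))
```

Partitions in `majority` are the ones a majority rule consensus tree would show, each labelled with its percentage, in the same spirit as the 100.0 and 99.0 branch labels above.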
- 6. The 'outtree' file written by CONSENSE can be further manipulated (for example, rooted) using TREEVIEW.