Much of the content on this page is based on Chapters 6 and 7 in Essential Bioinformatics by Jin Xiong.

Concepts/glossary

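Bootstrapping: A statistical resampling technique that assesses the sampling error of a phylogenetic tree. Alignment columns are resampled with replacement to build many replicate data sets, a tree is inferred from each replicate, and the support for a branch is the percentage of replicate trees that contain it.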
Homoplasy: The obscuring of true evolutionary distances caused by multiple successive substitutions at the same nucleotide position.
Among-site variation/among-site heterogeneity: Differences in evolutionary rates among nucleotide/amino acid positions. Generally, a portion of sites are variable and the rest are invariant. The rates at the variable sites are usually modelled with a gamma distribution.
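The resampling step of a non-parametric bootstrap can be sketched in Python as below. This is an illustrative sketch, not the implementation used by any particular program: `bootstrap_replicates` is a hypothetical helper, the alignment is assumed to be a dict of equal-length aligned sequences, and building a tree from each replicate (and counting how often each clade recurs) is left out.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Yield pseudo-alignments made by resampling alignment columns with replacement.

    Hypothetical helper. `alignment` maps taxon name -> aligned sequence;
    all sequences are assumed to have the same length.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns; some columns
        # appear several times in a replicate and others not at all.
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


alignment = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "ACCTTCGA"}  # toy data
replicates = list(bootstrap_replicates(alignment, n_replicates=100, seed=1))
# Each replicate would then be passed to a tree-building method; the bootstrap
# support of a clade is the percentage of replicate trees that contain it.
```

Because whole columns are resampled, every replicate keeps the same taxa and alignment length, and each column in a replicate is a copy of an original column.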

Newick format: A tree representation format that uses nested parentheses written on a single line. Taxa are separated by commas. For scaled trees, each branch length is given immediately after the taxon name (or closing parenthesis), separated from it by a colon. Example:

(((B,C),A),(D,E));
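To make the notation concrete, here is a minimal Python parser for simple Newick strings. It is a sketch under stated assumptions: labels are unquoted, comments in square brackets are not supported, and all whitespace is discarded. `parse_newick` and `leaves` are illustrative helpers, not part of any library. Each node becomes a (name, branch_length, children) tuple, with branch_length set to None when the tree is unscaled.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, branch_length, children) tuples.

    Illustrative sketch: assumes unquoted labels and no [comments];
    all whitespace is removed before parsing.
    """
    text = "".join(text.split())
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def read_until(stop_chars):
        # Advance past characters not in stop_chars and return them.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        name = read_until(",():")  # taxon or internal-node label (may be empty)
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":" and read the branch length
            length = float(read_until(",()"))
        return (name, length, children)

    return parse_node()


def leaves(node):
    """Return taxon names in left-to-right order."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


tree = parse_newick("(((B,C),A),(D,E));")
print(leaves(tree))  # ['B', 'C', 'A', 'D', 'E']

scaled = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print(scaled[2][1])  # ('C', 0.3, [])
```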

Phylogenetic markers

16S rRNA
RpoB
GyrB
EF-Tu
pgk
dnaK
16S–23S ITS

Software

MEGA 5

MrBayes

DIVERGE: http://www.ncbi.nlm.nih.gov/pubmed/11934757

Substitution models

Statistical models used to correct for homoplasy are called substitution models or evolutionary models.

Jukes-Cantor model:

Assumes that all nucleotide substitutions occur with equal probability (unrealistic).

Can only handle reasonably closely related sequences.

Formula:

d_AB = -(3/4) ln [1-(4/3)p_AB]

d_AB: Evolutionary distance between sequences A and B. p_AB: Observed sequence distance, measured as the proportion of substituted sites over the entire length of the alignment.
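A short Python sketch of the Jukes-Cantor correction. The helper names are illustrative, and gaps or ambiguity codes are not treated specially (every mismatching column counts as a difference). The guard follows from the formula itself: the logarithm is undefined once p_AB reaches 0.75, which is one reason the model only suits fairly closely related sequences.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p)."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences over 10 sites
d = jukes_cantor(p)
print(round(p, 2), round(d, 4))  # 0.2 0.2326
```

The corrected distance is larger than the observed p-distance because it accounts for substitutions hidden by multiple hits at the same site.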

Formula corrected for among-site variation:

d_AB = (3/4) alpha [(1-(4/3)p_AB)^(-1/alpha) - 1]   (the version printed in Xiong's book is incomplete)

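alpha: The gamma correction factor (the shape parameter of the gamma distribution of rates across sites; smaller values mean stronger rate variation).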
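The effect of the gamma correction can be checked numerically. This sketch assumes the standard gamma-corrected Jukes-Cantor form d_AB = (3/4) alpha [(1-(4/3)p_AB)^(-1/alpha) - 1]; the function names are illustrative. As alpha grows large (rates nearly equal across sites) the corrected distance converges to the uncorrected one, while small alpha values inflate it.

```python
import math


def jukes_cantor(p):
    """Uncorrected Jukes-Cantor distance."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jukes_cantor_gamma(p, alpha):
    """Gamma-corrected Jukes-Cantor distance (standard form assumed):
    d = (3/4) * alpha * [(1 - (4/3) p)^(-1/alpha) - 1]
    """
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


# A small alpha (strong rate variation) inflates the distance; a very large
# alpha (nearly uniform rates) converges to the uncorrected value.
for alpha in (0.5, 1.0, 10.0, 1e6):
    print(alpha, round(jukes_cantor_gamma(0.2, alpha), 4))
print("uncorrected", round(jukes_cantor(0.2), 4))
```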

Kimura model:

Mutation rates for transitions and transversions are assumed to be different (more realistic than Jukes-Cantor).

Formula:

d_AB = -(1/2) ln(1- 2 p_ti - p_tv) - (1/4) ln (1-2 p_tv)

p_ti: Observed proportion of sites showing a transition. p_tv: Observed proportion of sites showing a transversion.
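Counting transitions and transversions and applying the Kimura formula can be sketched in Python as follows. This is an illustrative sketch: the helper names are hypothetical, and the sequences are assumed to contain only A, C, G and T (a gap or ambiguity code would wrongly be counted as a transversion).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_proportions(seq_a, seq_b):
    """Return (p_ti, p_tv) for two aligned DNA sequences.

    Illustrative sketch: assumes only A, C, G and T (no gaps or ambiguity codes).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A <-> G or C <-> T
        else:
            transversions += 1    # purine <-> pyrimidine
    n = len(seq_a)
    return transitions / n, transversions / n


def kimura_2p(p_ti, p_tv):
    """Kimura two-parameter distance."""
    return -0.5 * math.log(1 - 2 * p_ti - p_tv) - 0.25 * math.log(1 - 2 * p_tv)


p_ti, p_tv = ti_tv_proportions("ACGTACGTAC", "GCGTACATCC")  # 2 transitions, 1 transversion
d = kimura_2p(p_ti, p_tv)
```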

Formula adjusted for among-site variation:

d_AB = (alpha/2)[(1-2p_ti-p_tv)^(-1/alpha) + (1/2)(1-2p_tv)^(-1/alpha) - 3/2]

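alpha: The gamma correction factor (the shape parameter of the gamma distribution of rates across sites; smaller values mean stronger rate variation).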
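The gamma-adjusted Kimura distance can be sketched the same way. The code assumes the standard form d_AB = (alpha/2)[(1-2p_ti-p_tv)^(-1/alpha) + (1/2)(1-2p_tv)^(-1/alpha) - 3/2], which reduces to the uncorrected Kimura formula as alpha becomes large; the function names are illustrative.

```python
import math


def kimura_2p(p_ti, p_tv):
    """Uncorrected Kimura two-parameter distance."""
    return -0.5 * math.log(1 - 2 * p_ti - p_tv) - 0.25 * math.log(1 - 2 * p_tv)


def kimura_2p_gamma(p_ti, p_tv, alpha):
    """Gamma-adjusted Kimura distance (standard form assumed):
    (alpha/2) * [(1 - 2p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2p_tv)^(-1/alpha) - 3/2]
    """
    a = 1 - 2 * p_ti - p_tv
    b = 1 - 2 * p_tv
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return (alpha / 2) * (a ** (-1 / alpha) + 0.5 * b ** (-1 / alpha) - 1.5)


for alpha in (0.5, 2.0, 1e6):
    print(alpha, round(kimura_2p_gamma(0.2, 0.1, alpha), 4))
print("uncorrected", round(kimura_2p(0.2, 0.1), 4))
```

With alpha = 1e6 the two values agree closely, which is a quick sanity check that the gamma-adjusted expression is consistent with the uncorrected one.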

Kimura model for protein distance:

d = -ln(1- p -0.2p^2)

p: Observed pairwise distance between two sequences.
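A minimal Python sketch of the Kimura protein distance for two aligned sequences. The helper name and the toy sequences are illustrative, and every mismatching column (including gaps) is counted as a difference. The expression inside the logarithm reaches zero at p of about 0.854, beyond which the distance is undefined.

```python
import math


def kimura_protein_distance(seq_a, seq_b):
    """Kimura's protein distance d = -ln(1 - p - 0.2 p^2) for two aligned sequences.

    Illustrative sketch: every mismatching column, including gaps, counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    inner = 1 - p - 0.2 * p ** 2
    if inner <= 0:
        # 1 - p - 0.2 p^2 reaches zero at p of about 0.854.
        raise ValueError("distance undefined: sequences too divergent")
    return -math.log(inner)


d = kimura_protein_distance("MKTAYIAKQR", "MKTAHIAKQL")  # p = 0.2
```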

More advanced models: TN93, HKY, GTR. These take more parameters into account but are not normally used in practice (the calculations are complex and the estimates have high variance).

Three estimation methods

Clustering-based methods:

Unweighted Pair Group Method Using Arithmetic Average (UPGMA):

The simplest clustering method.
Basic assumption: All taxa evolve at a constant rate and are equally distant from the root ("molecular clock" hypothesis). Unlikely to hold for real data.
Fast calculation speed.
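The clustering loop can be sketched in Python as follows. This sketch assumes a symmetric distance matrix given as a list of lists and returns a rooted Newick string; the function name and input layout are illustrative. Each new node is placed at half the distance between the two clusters it joins, which is where the molecular-clock assumption enters, and distances to a merged cluster are the size-weighted arithmetic average of the old distances.

```python
def upgma(names, matrix):
    """UPGMA clustering of a symmetric distance matrix into a rooted Newick tree.

    Illustrative sketch: `matrix` is a list of lists indexed like `names`.
    """
    # Each cluster: (Newick text, number of taxa, height of its node above the tips).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    n = len(names)
    d = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        height = dij / 2  # molecular clock: the new node is equidistant from all its tips
        merged = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # Distance to the merged cluster = size-weighted average of the old distances.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        d = {key: value for key, value in d.items() if i not in key and j not in key}
        d.update(new_d)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(["A", "B", "C", "D"], matrix)
print(tree)  # (D:5,(C:3,(A:1,B:1):2):2);
```

Every tip in the output sits at the same height (5 here), as the constant-rate assumption requires; on non-clock-like data this forced equality is what makes UPGMA trees unreliable.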

Neighbour Joining (NJ)

The most widely used tree estimation method.

http://www.ncbi.nlm.nih.gov/pubmed/3447015

http://en.wikipedia.org/wiki/Neighbor_joining
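The core loop of neighbour joining can be sketched in Python. This is a minimal illustration, assuming a symmetric distance matrix with at least three taxa and returning an unrooted tree in Newick format; `neighbour_joining` is a hypothetical helper, and refinements found in real programs (such as handling negative branch lengths) are omitted. At each step the pair minimising Q(i, j) = (n - 2) d_ij - r_i - r_j, where r_i is the sum of row i, is joined into a new internal node.

```python
def neighbour_joining(names, matrix):
    """Neighbour joining on a symmetric distance matrix (at least three taxa).

    Illustrative sketch; returns an unrooted tree as a Newick string.
    Negative branch lengths are not corrected.
    """
    nodes = list(names)                # Newick text for each current node
    d = [list(row) for row in matrix]  # working copy of the distances
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair (i, j) with the smallest Q = (n - 2) * d_ij - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distance from u to every other node k.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [du[a]] for a in keep] + [[du[k] for k in keep] + [0.0]]
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes remain: connect them to a single central node.
    l0 = (d[0][1] + d[0][2] - d[1][2]) / 2
    l1 = (d[0][1] + d[1][2] - d[0][2]) / 2
    l2 = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});"


matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(["a", "b", "c", "d", "e"], matrix)
print(tree)  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Unlike UPGMA, the tips end up at different distances from any point in the tree, because neighbour joining does not assume a molecular clock.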

Optimality-based methods:

Fitch-Margoliash (FM)

Minimum Evolution (ME)

Character-based methods:

Maximum Parsimony (MP)

One of the first methods applied to phylogenetic tree construction.

Maximum Likelihood (ML)

Bayesian Inference (BI)

Tree representation

Phylogram (scaled tree): The branch lengths represent the amount of evolutionary divergence.
Cladogram (unscaled tree): Branch lengths have no phylogenetic meaning.

Reclassifications

Examples from the literature of (proposals for) reclassifications of taxonomies:

Links

http://peter.unmack.net/molecular/index.html

http://www.kuleuven.be/aidslab/phylogenybook/home.html

http://asserttrue.blogspot.no/2013/07/do-it-yourself-phylogenetic-trees.html

Phylogeny.fr: http://www.phylogeny.fr/

Bibliography

Articles:

A daily-updated tree of (sequenced) life as a reference for genome research: http://www.nature.com/srep/2013/130618/srep02015/full/srep02015.html

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062510

Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies: http://nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e

Molecular phylogenetics: State of the art methods for looking into the past. Trends Genet. 17:262-72.

Books:

Phylogenetic Trees Made Easy - a how-to manual. Fourth edition. Barry G Hall.

Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates.

The Phylogenetic Handbook: http://www.amazon.com/dp/0521730716/ref=rdr_ext_tmb

Essential Bioinformatics. Jin Xiong.

User:Jarle Pahr/Phylogenetics
