User:Jarle Pahr/Phylogenetics

From OpenWetWare

Much of the content on this page is based on Chapters 6 and 7 in Essential Bioinformatics by Jin Xiong.

Concepts/glossary

  • Bootstrapping: A statistical resampling technique used to assess the sampling error of a phylogenetic tree, typically by rebuilding the tree from alignment columns resampled with replacement.
  • Homoplasy: The obscuring of evolutionary distance which occurs because of several consecutive mutations at the same nucleotide positions.
  • Among-site variation/among-site heterogeneity: Differences in evolutionary rates among nucleotide/amino acid positions. Generally, a portion of sites are variant and the rest are invariant. The distribution of rates among the variant sites is commonly modelled with a gamma distribution.
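
The resampling step behind bootstrapping can be sketched as follows. This is a minimal illustration, not taken from Xiong's book: the function name `bootstrap_alignment` is hypothetical, and the alignment is assumed to be a list of equal-length strings. Each replicate would then be passed to a tree estimation method, and the fraction of replicates recovering a given clade is reported as its bootstrap support.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Return one bootstrap replicate of an alignment.

    alignment: list of aligned sequences (equal-length strings).
    Columns are drawn with replacement, so the replicate has the same
    dimensions as the input but some columns repeated and others missing.
    """
    rng = rng or random.Random()
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Draw column indices with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]
```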


Newick format: A tree representation format using linear nested parentheses. Taxa are separated by commas. For scaled trees, branch lengths are given after a colon immediately following the taxon name. Example:

(((B,C),A),(D,E))
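
As a rough sketch of how the nesting maps onto a tree structure, the snippet below writes a small tree out in Newick notation. The `Node` class and `to_newick` function are hypothetical names introduced only for this illustration; established libraries have their own tree classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node: leaves have a name, internal nodes have children."""
    name: str = ""
    length: Optional[float] = None  # branch length leading to this node
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node) -> str:
    """Serialize a subtree as nested parentheses.

    The trailing ';' that usually terminates a Newick string is not added.
    """
    if node.children:
        text = "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name
    else:
        text = node.name
    if node.length is not None:
        # Branch lengths follow a colon after the (possibly empty) name.
        text += f":{node.length}"
    return text
```

Building the tree from the example above with `Node` objects and calling `to_newick` on the root gives back the same parenthesised string; setting `length` on the nodes produces the scaled form, for instance `(A:0.1,B:0.2)`.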

Phylogenetic markers

  • 16S rRNA
  • RpoB
  • GyrB
  • EF-Tu
  • pgk
  • dnaK
  • 16S–23S ITS


Software

MEGA 5

MrBayes

DIVERGE: http://www.ncbi.nlm.nih.gov/pubmed/11934757

Substitution models

Statistical models used to correct for homoplasy are called substitution models or evolutionary models.

Jukes-Cantor model:

Assumes that all nucleotides are substituted with equal probability (unrealistic).

  • Can only handle reasonably closely related sequences.

Formula:

d_AB = -(3/4) ln [1-(4/3)p_AB]

d_AB: Evolutionary distance between sequences A and B. p_AB: Observed sequence distance, measured by proportion of substitutions over the entire length of the alignment.
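
A minimal sketch of the calculation, assuming two already-aligned sequences of equal length with no gaps. The function names `p_distance` and `jukes_cantor_distance` are made up for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing positions over the whole alignment length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance: d = -(3/4) ln[1 - (4/3) p]."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)
```

For small p the corrected distance is only slightly larger than p itself; the correction grows as p approaches the saturation value of 0.75, where the formula breaks down. This is one way to see why the model only handles reasonably closely related sequences.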

Formula corrected for among-site variation:


d_AB = (3/4) alpha [(1 - (4/3) p_AB)^(-1/alpha) - 1]

(The formula is incomplete in Xiong's book; this is the standard gamma-corrected Jukes-Cantor form.)

alpha: The gamma correction factor.
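
A small sketch of the gamma-corrected calculation, assuming the standard form d = (3/4) alpha [(1 - (4/3) p)^(-1/alpha) - 1]. The function name `jukes_cantor_gamma_distance` is hypothetical.

```python
def jukes_cantor_gamma_distance(p, alpha):
    """Gamma-corrected Jukes-Cantor distance.

    Assumes d = (3/4) * alpha * [(1 - (4/3) p)^(-1/alpha) - 1],
    where alpha is the gamma correction factor (alpha > 0).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return 0.75 * alpha * ((1 - (4 / 3) * p) ** (-1 / alpha) - 1)
```

Under this form, a very large alpha (little rate variation among sites) gives nearly the same value as the uncorrected formula, while a small alpha gives a larger distance for the same observed p.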


Kimura model:

Mutation rates for transitions and transversions are assumed to be different (more realistic than Jukes-Cantor).

Formula:

d_AB = -(1/2) ln(1- 2 p_ti - p_tv) - (1/4) ln (1-2 p_tv)

p_ti: Observed frequency for transition. p_tv: Observed frequency for transversion.
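
A sketch of how p_ti and p_tv might be counted from a pairwise alignment and fed into the formula. The function names are invented for this example, and skipping positions with gaps or ambiguous bases is an assumption of the sketch, not something stated in the source.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_transversion_freqs(seq_a, seq_b):
    """Return (p_ti, p_tv) for two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    return transitions / compared, transversions / compared


def kimura_distance(p_ti, p_tv):
    """d = -(1/2) ln(1 - 2 p_ti - p_tv) - (1/4) ln(1 - 2 p_tv)."""
    a = 1 - 2 * p_ti - p_tv
    b = 1 - 2 * p_tv
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for this formula")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```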

Formula adjusted for among-site variation:

d_AB = (alpha/2) [(1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2]


alpha: The gamma correction factor.
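
One way to see where a gamma-corrected Kimura formula comes from is sketched below. It assumes the usual substitution rule for gamma-distributed rates, in which each term -ln(x) of the uncorrected formula is replaced by alpha (x^(-1/alpha) - 1). This is the same rule that turns the Jukes-Cantor formula into its gamma-corrected form.

```latex
% Assumed gamma substitution rule: -\ln(x) \;\to\; \alpha\,(x^{-1/\alpha} - 1)
\begin{align}
d_{AB} &= -\tfrac{1}{2}\ln(1 - 2p_{ti} - p_{tv}) - \tfrac{1}{4}\ln(1 - 2p_{tv}) \\
d_{AB}^{\Gamma} &= \tfrac{\alpha}{2}\left[(1 - 2p_{ti} - p_{tv})^{-1/\alpha} - 1\right]
                 + \tfrac{\alpha}{4}\left[(1 - 2p_{tv})^{-1/\alpha} - 1\right] \\
                &= \tfrac{\alpha}{2}\left[(1 - 2p_{ti} - p_{tv})^{-1/\alpha}
                 + \tfrac{1}{2}(1 - 2p_{tv})^{-1/\alpha} - \tfrac{3}{2}\right]
\end{align}
```

Two quick checks on the result under this assumption: with p_ti = p_tv = 0 the bracket evaluates to 1 + 1/2 - 3/2 = 0, and as alpha grows large the expression approaches the uncorrected Kimura distance.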

Kimura model for protein distance:

d = -ln(1- p -0.2p^2)

p: Observed pairwise distance between two sequences.
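
A short sketch of the protein distance calculation. The function name `kimura_protein_distance` is hypothetical; the upper limit on p follows from requiring the argument of the logarithm to stay positive.

```python
import math


def kimura_protein_distance(p):
    """Kimura protein distance: d = -ln(1 - p - 0.2 p^2)."""
    argument = 1 - p - 0.2 * p * p
    if p < 0 or argument <= 0:
        # The argument reaches zero at p of roughly 0.854.
        raise ValueError("p out of range for this formula")
    return -math.log(argument)
```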

More advanced models: TN93, HKY, GTR. These take more parameters into consideration but, according to Xiong, are not normally used in practice (complex calculation, high variance).


Tree estimation methods

Clustering-based methods:

Unweighted Pair Group Method Using Arithmetic Average (UPGMA):

  • The simplest clustering method.
  • Basic assumption: All taxa evolve at a constant rate and are equally distant from the root ("molecular clock" hypothesis). Unlikely to hold for real data.
  • Fast calculation speed.
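
The clustering procedure can be sketched in a few lines. This is a bare-bones illustration under stated assumptions, not a reference implementation: the function name `upgma` and the input layout (a dict keyed by pairs of taxon names) are chosen only for this example, and ties between equally close pairs are broken arbitrarily.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns (tree, root_height), where tree is a nested tuple of labels.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    height = 0.0
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        # Under the molecular clock assumption the new node sits halfway.
        height = d[frozenset((a, b))] / 2
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Average distance over all taxa in the merged cluster,
            # computed as a size-weighted mean of the two old distances.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root, height
```

For example, with distances A-B = 2, A-C = 6 and B-C = 6, A and B are joined first at height 1, and C is then joined to that pair at height 3. Because every leaf ends up at the same distance from the root, the output is only trustworthy when the constant-rate assumption roughly holds.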


Neighbour Joining (NJ)

  • The most widely used tree estimation method.

http://www.ncbi.nlm.nih.gov/pubmed/3447015

http://en.wikipedia.org/wiki/Neighbor_joining


Optimality-based methods:


Fitch-Margoliash (FM)


Minimum Evolution (ME)


Character-based methods:

Maximum Parsimony (MP)

  • One of the first methods applied to phylogenetic tree construction.

Maximum Likelihood (ML)

Bayesian Inference (BI)


Tree representation

  • Phylogram (scaled tree): The branch lengths represent the amount of evolutionary divergence.
  • Cladogram (unscaled tree): Branch lengths have no phylogenetic meaning.

Reclassifications

Examples from the literature of (proposals for) reclassifications of taxonomies:

Links

http://peter.unmack.net/molecular/index.html

http://www.kuleuven.be/aidslab/phylogenybook/home.html

http://asserttrue.blogspot.no/2013/07/do-it-yourself-phylogenetic-trees.html

Phylogeny.fr: http://www.phylogeny.fr/

Bibliography

Articles:

A daily-updated tree of (sequenced) life as a reference for genome research: http://www.nature.com/srep/2013/130618/srep02015/full/srep02015.html

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062510

Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies: http://nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e

Molecular phylogenetics: State of the art methods for looking into the past. Trends Genet. 17:262-72.

Books:

Phylogenetic Trees Made Easy - a how-to manual. Fourth edition. Barry G Hall.

Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates.

The Phylogenetic Handbook: http://www.amazon.com/dp/0521730716/ref=rdr_ext_tmb

Jin Xiong: Essential Bioinformatics