# User:Jarle Pahr/Phylogenetics

Much of the content on this page is based on Chapters 6 and 7 in Essential Bioinformatics by Jin Xiong.

# Concepts/glossary

- Bootstrapping: A statistical technique that tests the sampling errors of a phylogenetic tree.
- Homoplasy: The obscuring of evolutionary distance which occurs because of several consecutive mutations at the same nucleotide positions.
- Among-site variation/among-site heterogeneity: Differences in evolutionary rates among nucleotide/amino acid positions. Generally, a portion of sites are variant and the rest are invariant. The distribution of rates across the variant sites is commonly modeled with a gamma distribution.
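
Bootstrapping is usually carried out by resampling the alignment columns with replacement and rebuilding the tree from each pseudo-replicate. The following is a minimal sketch of the resampling step only; the function name `bootstrap_alignment` and the toy alignment are made up for illustration.

```python
import random

def bootstrap_alignment(alignment, rng=None):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Columns are drawn with replacement; every sequence receives the
    same drawn columns, so the replicate is still a valid alignment.
    """
    rng = rng or random.Random()
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in alignment]

# Toy alignment (hypothetical data), seeded for reproducibility.
alignment = ["ACGTACGT", "ACGTTCGT", "ACGAACGA"]
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)
```

In a full analysis this step would be repeated many times, a tree estimated from each replicate, and the fraction of replicates supporting each clade reported as its bootstrap support.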

Newick format: A tree representation format using linear nested parentheses. Taxa are separated by commas. For scaled trees, branch lengths are given immediately after the taxon name, separated from it by a colon. Example:

(((B,C),A),(D,E))
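
To make the nesting concrete, here is a small sketch of a recursive Newick reader. It handles only names, parentheses, commas and optional `:length` suffixes (no quoted labels or comments), and the names `parse_newick` and `leaves` are hypothetical helpers, not part of any library.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # skip ")"
        # Optional label (empty for unnamed internal nodes).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Optional branch length after a colon.
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree

def leaves(tree):
    """List the leaf names of a parsed tree, left to right."""
    name, _, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]

unscaled = parse_newick("(((B,C),A),(D,E))")
scaled = parse_newick("((B:0.2,C:0.3):0.1,A:0.5);")
print(leaves(unscaled))
```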

# Phylogenetic markers

- 16S rRNA
- RpoB
- GyrB
- EF-Tu
- pgk
- dnaK
- 16S–23S ITS

# Software

MEGA 5

MrBayes

DIVERGE: http://www.ncbi.nlm.nih.gov/pubmed/11934757

# Substitution models

Statistical models used to correct for homoplasy are called substitution models or evolutionary models.

**Jukes-Cantor model:**

Assumes that all nucleotides are substituted with equal probability (unrealistic).

- Can only handle reasonably closely related sequences.

Formula:

d_AB = -(3/4) ln [1-(4/3)p_AB]

d_AB: Evolutionary distance between sequences A and B. p_AB: Observed sequence distance, measured by proportion of substitutions over the entire length of the alignment.
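
As a worked illustration of the formula, the sketch below computes p_AB from two aligned sequences and applies the Jukes-Cantor correction. The function names and example sequences are made up for illustration; every alignment column is counted, as in the definition of p_AB above.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites over the whole alignment."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p)."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4, which is why the
        # model only works for reasonably closely related sequences.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)

p_ab = p_distance("ACGTACGTAC", "ACGTACGTTT")  # 2 differences in 10 sites
d_ab = jukes_cantor(p_ab)
print(p_ab, d_ab)
```

The corrected distance is always at least as large as the observed one, and the gap widens as p_AB grows.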

Formula corrected for among-site variation:

d_AB = (3/4) alpha [(1 - (4/3) p_AB)^(-1/alpha) - 1]

(The formula is printed incompletely in Xiong's book; the form given here is the commonly cited gamma-corrected Jukes-Cantor distance.)

alpha: The gamma correction factor.
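
A quick numerical sketch of the gamma-corrected Jukes-Cantor distance, using the commonly cited form d = (3/4) alpha [(1 - (4/3) p)^(-1/alpha) - 1]. The function name is hypothetical. For large alpha (little rate variation) the value approaches the uncorrected Jukes-Cantor distance, which is a useful sanity check on the formula.

```python
def jukes_cantor_gamma(p, alpha):
    """Gamma-corrected Jukes-Cantor distance.

    p: observed proportion of differing sites (0 <= p < 0.75).
    alpha: gamma shape parameter (> 0); small alpha = strong rate variation.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - (4 / 3) * p) ** (-1 / alpha) - 1)

for alpha in (0.5, 1.0, 5.0, 1e6):
    print(alpha, jukes_cantor_gamma(0.3, alpha))
```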

**Kimura model:**

Mutation rates for transitions and transversions are assumed to be different (more realistic than Jukes-Cantor).

Formula:

d_AB = -(1/2) ln(1- 2 p_ti - p_tv) - (1/4) ln (1-2 p_tv)

p_ti: Observed frequency for transition. p_tv: Observed frequency for transversion.
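
The sketch below shows one way to obtain p_ti and p_tv from a pair of aligned sequences and plug them into the Kimura formula. It assumes the sequences contain only A, C, G and T (no gaps or ambiguity codes); the function names and example sequences are made up for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def transition_transversion_freqs(seq_a, seq_b):
    """Return (p_ti, p_tv) as fractions of the alignment length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    ti = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            raise ValueError("only A, C, G and T are supported in this sketch")
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti / len(seq_a), tv / len(seq_a)

def kimura_2p(p_ti, p_tv):
    """d = -(1/2) ln(1 - 2 p_ti - p_tv) - (1/4) ln(1 - 2 p_tv)."""
    x = 1 - 2 * p_ti - p_tv
    y = 1 - 2 * p_tv
    if x <= 0 or y <= 0:
        raise ValueError("sequences are too divergent for this model")
    return -0.5 * math.log(x) - 0.25 * math.log(y)

# One transition (A->G) and one transversion (A->C) in ten sites.
p_ti, p_tv = transition_transversion_freqs("AAAAAAAAAA", "GAAAAAAAAC")
print(p_ti, p_tv, kimura_2p(p_ti, p_tv))
```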

Formula adjusted for among-site variation:

d_AB = (alpha/2) [(1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2]

alpha: The gamma correction factor.
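
As a numerical check, the sketch below evaluates the gamma-corrected Kimura distance with the bracket written as (1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2. This is the form that reduces to the uncorrected Kimura formula as alpha grows large, which the loop illustrates. The function name is hypothetical.

```python
def kimura_2p_gamma(p_ti, p_tv, alpha):
    """Gamma-corrected Kimura two-parameter distance.

    p_ti, p_tv: observed transition and transversion frequencies.
    alpha: gamma shape parameter (> 0).
    """
    x = 1 - 2 * p_ti - p_tv
    y = 1 - 2 * p_tv
    if x <= 0 or y <= 0:
        raise ValueError("sequences are too divergent for this model")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (alpha / 2) * (x ** (-1 / alpha) + 0.5 * y ** (-1 / alpha) - 1.5)

# The values shrink towards the uncorrected distance as alpha increases.
for alpha in (0.5, 1.0, 5.0, 1e6):
    print(alpha, kimura_2p_gamma(0.1, 0.1, alpha))
```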

Kimura model for protein distance:

d = -ln(1- p -0.2p^2)

p: Observed pairwise distance between two sequences.
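
The protein formula can be evaluated the same way. A minimal sketch with a hypothetical function name:

```python
import math

def kimura_protein_distance(p):
    """Kimura protein distance d = -ln(1 - p - 0.2 p^2).

    p: observed proportion of differing amino acid positions.
    """
    arg = 1 - p - 0.2 * p * p
    if p < 0 or arg <= 0:
        raise ValueError("p is outside the range where the formula is defined")
    return -math.log(arg)

for p in (0.05, 0.2, 0.5):
    print(p, kimura_protein_distance(p))
```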

**More advanced models:** TN93, HKY, GTR. These take more parameters into consideration, but are not normally used in practice (complex calculation, high variance).

# Tree estimation methods

**Clustering-based methods:**

Unweighted Pair Group Method Using Arithmetic Average (UPGMA):

- The simplest clustering method.
- Basic assumption: All taxa evolve at a constant rate and are equally distant from the root ("molecular clock" hypothesis). Unlikely to hold for real data.
- Fast calculation speed.
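
The clustering procedure can be sketched in a few lines: repeatedly merge the two closest clusters, place the new node at half their distance, and average the distances to the remaining clusters weighted by cluster size. This is an illustrative sketch, not an optimized implementation; the function name and the toy distance matrix are made up.

```python
def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: list of taxon labels.
    dist: symmetric distance matrix as a list of lists.
    """
    # cluster id -> (Newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        text_i, size_i, height_i = clusters.pop(i)
        text_j, size_j, height_j = clusters.pop(j)
        height = dij / 2  # molecular clock: both sides equally far from the node
        text = (f"({text_i}:{height - height_i:.3f},"
                f"{text_j}:{height - height_j:.3f})")
        # Size-weighted average distance from the merged cluster to the others.
        merged = {}
        for k in clusters:
            d_ik = d[tuple(sorted((i, k)))]
            d_jk = d[tuple(sorted((j, k)))]
            merged[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in merged.items():
            d[(k, next_id)] = v  # next_id is larger than every existing id
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    (text, _, _), = clusters.values()
    return text + ";"

# Toy ultrametric distances (hypothetical): A and B are closest.
matrix = [[0, 2, 6],
          [2, 0, 6],
          [6, 6, 0]]
tree = upgma(["A", "B", "C"], matrix)
print(tree)
```

Because every merge places the node at half the inter-cluster distance, the resulting tree is always ultrametric, which is exactly the molecular clock assumption noted above.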

Neighbour Joining (NJ)

- The most widely used tree estimation method.

http://www.ncbi.nlm.nih.gov/pubmed/3447015

http://en.wikipedia.org/wiki/Neighbor_joining
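
Unlike UPGMA, neighbour joining does not simply merge the closest pair. At each step it picks the pair minimizing Q(i,j) = (n - 2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), which corrects for taxa that are far from everything else. The sketch below covers only this pair-selection step; the function name and the distance values are illustrative assumptions.

```python
def nj_pick_pair(dist):
    """Return ((i, j), Q) for the pair with the smallest Q value.

    dist: symmetric distance matrix as a list of lists (n >= 3).
    """
    n = len(dist)
    totals = [sum(row) for row in dist]  # sum of distances from each taxon
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q

# Illustrative five-taxon matrix. The smallest raw distance is between
# taxa 3 and 4, but the Q criterion selects taxa 0 and 1 instead.
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
pair, q = nj_pick_pair(matrix)
print(pair, q)
```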

**Optimality-based methods:**

Fitch-Margoliash (FM)

Minimum Evolution (ME)

**Character-based methods:**

Maximum Parsimony (MP)

- One of the first methods applied to phylogenetic tree construction.
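
Maximum parsimony prefers the tree that requires the fewest character changes. For a fixed tree and a single alignment column, that minimum can be counted with Fitch's algorithm, sketched below for rooted binary trees written as nested tuples. The function name, trees and character states are made up for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its character at one site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state: no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

site = {"A": "A", "B": "A", "C": "G", "D": "G", "E": "G"}
tree_1 = ((("B", "C"), "A"), ("D", "E"))
tree_2 = ((("C", "D"), "E"), ("A", "B"))
score_1 = fitch(tree_1, site)[1]
score_2 = fitch(tree_2, site)[1]
print(score_1, score_2)
```

For this single site the second topology needs fewer changes. A full parsimony analysis would sum such scores over all informative columns and search over many candidate topologies.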

Maximum Likelihood (ML)

Bayesian Inference (BI)

# Tree representation

- Phylogram (scaled tree): The branch lengths represent the amount of evolutionary divergence.
- Cladogram (unscaled tree): Branch lengths have no phylogenetic meaning.

# Reclassifications

Examples from the literature of (proposals for) reclassifications of taxonomies:

# Links

http://peter.unmack.net/molecular/index.html

http://www.kuleuven.be/aidslab/phylogenybook/home.html

http://asserttrue.blogspot.no/2013/07/do-it-yourself-phylogenetic-trees.html

Phylogeny.fr: http://www.phylogeny.fr/

# Bibliography

**Articles:**

A daily-updated tree of (sequenced) life as a reference for genome research: http://www.nature.com/srep/2013/130618/srep02015/full/srep02015.html

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062510

Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies: http://nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e

Molecular phylogenetics: State of the art methods for looking into the past. Trends Genet. 17:262-72.

**Books:**

Phylogenetic Trees Made Easy - a how-to manual. Fourth edition. Barry G Hall.

Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates.

The Phylogenetic Handbook: http://www.amazon.com/dp/0521730716/ref=rdr_ext_tmb

Jin Xiong: Essential Bioinformatics