Julius B. Lucks/Meetings and Notes/SMBE2007/models protein evolution 2

= Ziheng Yang : A mutation selection model of codon substitution = Tue Jun 26 14:02:43 EDT 2007
 * first one to propose codon models - analyze codon changes dN/dS (rather than nucleotides and AAs)
 * UCL
 * Rasmus Nielsen - Univ. Copenhagen
 * Alan Moses thinks this will be a revolutionary talk

Talk

 * Goldman & Yang 1994 model - codon substitution model
 * Mol. Biol. Evol., 11, 725, 1994
 * Yang and Nielsen, 1998, J. Mol. Evol.
 * Subst. rate to codon j proportional to equil. freq of codon j
 * does not separate mutational bias and selection on codon usage
 * Bierne & Eyre-Walker, 2003, Genetics, 165, 1587
 * Yang 2006, Computational Molecular Evolution, Oxford, 284
 * TTT,TTC,TCT,TCC transitions - only 2 rates
 * TTC, TCC preferred, others rare
 * realistically want 3 rates
 * model with 3 rates - Nielsen et al., 2007, Mol. Biol. Evol., 24, 228
 * large rate from unpreferred to pref
 * small rate for reverse
 * middle rate from pref to pref and unpref to unpref
 * requires a priori partitioning of codons (Hiroshi Akashi)
 * Codon usage generally believed to be under selection in bacteria and Drosophila
 * mammals case is less clear
 * Akashi H, 1994, Genetics
 * synonymous changes change protein structure and function
 * Kimchi-Sarfaty, 2007, Science, 315, 525
 * Komar AA, 2007, Science, 315, 466
 * protein folding co-translational
 * silent SNP - altered protein translation kinetics - final protein diff conformation and function

Model

 * mutation rate from nucl i to j described by HKY85 or GTR (REV) applied to all 3 processes
  * $$\mu_{ij} = a_{ij}\pi_j^*$$ - a's symmetric
  * $$\pi^*$$ mutational bias parameters
  * codons $$I = i_1i_2i_3$$
 * fixation probability function of selection coefficient
  * Kimura M, 1962, Genetics, 47, 713 - use Kimura formula
   * $$S_{ij} = 2Ns_{ij} = 2N(f_j - f_i)$$
   * N number of chromosomes
 * Selection on protein is modeled using $$\omega$$
 * parameters in the model
 * 4 mutation rates
 * 60 codon fitness parameters
 * sequence distance or branch lengths
 * time reversible
 * Markov chain is time reversible iff rate matrix is product of a symmetric matrix and a diagonal matrix
 * equil. freq of codon $$J = j_1j_2j_3$$: $$\pi_J \propto (\pi_{j_1}^*\pi_{j_2}^*\pi_{j_3}^*)e^{F_J}$$
 * comments
 * use of omega to detect selection on the protein does not rely on assump that synon sites evolve neutrally
 * old models in codeml such as F1x4, F3x4, Fcodon - not special cases of mutation selection model
 * Muse and Gaut, 1994, Mol. Biol. Evol., 11, 715
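
A minimal sketch of how the mutation and selection pieces above might combine. It assumes the relative fixation rate has the form h(S) = S / (1 - e^{-S}), the usual weak-selection form derived from Kimura 1962 (the notes only say "use Kimura formula"). The function names and toy numbers are hypothetical, not from the talk. The final check illustrates the time-reversibility bullet: with equilibrium weights proportional to pi* times e^F, the flux i -> j equals the flux j -> i.

```python
import math

def relative_fixation_rate(S):
    """Fixation rate relative to a neutral mutation: h(S) = S / (1 - exp(-S)).

    Assumed form; S = 2N(f_j - f_i) is the scaled selection coefficient.
    """
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0          # limit as S -> 0
    return S / -math.expm1(-S)        # -expm1(-S) == 1 - exp(-S)

def substitution_rate(a_ij, pi_star_j, F_i, F_j):
    """Rate i -> j: mutation rate (a_ij * pi*_j) times relative fixation rate."""
    return a_ij * pi_star_j * relative_fixation_rate(F_j - F_i)

# Toy two-state example (all values hypothetical).
a = 1.3                               # symmetric exchangeability, a_ij == a_ji
pi_star = {"i": 0.2, "j": 0.4}        # mutational bias parameters
F = {"i": 0.0, "j": 1.5}              # scaled fitnesses

# Unnormalized equilibrium weights: pi proportional to pi* times exp(F).
w = {k: pi_star[k] * math.exp(F[k]) for k in F}

flux_ij = w["i"] * substitution_rate(a, pi_star["j"], F["i"], F["j"])
flux_ji = w["j"] * substitution_rate(a, pi_star["i"], F["j"], F["i"])
print(flux_ij, flux_ji)               # the two fluxes agree up to rounding
```

The balance holds because h(S) / h(-S) = e^S, which cancels the e^F factor in the equilibrium weights.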

Results

 * why little correlation between $$\omega_{human-macaque}$$ and $$\omega_{mouse-rat}$$?
 * likelihood ratio test of selection on synonymous codon usage
 * null model assumes synonymous codons have same fitness
 * most genes are under selection on codon usage
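
A rough sketch of the likelihood ratio test mentioned above, using only the standard library. The log-likelihood values are made up, and df = 41 is my reading (60 free codon fitness parameters versus 19 if synonymous codons share one fitness per amino acid), not something stated in the talk. It also assumes the usual chi-square approximation applies. `chi2_sf` and `likelihood_ratio_test` are hypothetical helper names.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    Adequate for moderate x; loses precision for extremely small p-values.
    """
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    return max(0.0, 1.0 - total)

def likelihood_ratio_test(lnL_null, lnL_alt, df):
    """Return the statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value."""
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, chi2_sf(stat, df)

# Hypothetical fits: null = synonymous codons share a fitness, alt = free codon fitnesses.
stat, p = likelihood_ratio_test(lnL_null=-2350.0, lnL_alt=-2310.0, df=41)
print(stat, p)
```

A small p-value here would reject the null that synonymous codons have the same fitness for that gene.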

Summary

 * estimation of distances using old models fine
 * in most (90%) genes - sig evidence for nat selection driving evol of codon usage
 * most mutations have fitness in range |S| < 1 or 2, implying weak selection on codon usage or nearly neutral evolution

Questions

 * drosophila and bacteria - codon bias and gene expression found
 * expts - optimal codons can be used in bacteria to translate more efficiently
 * mammals not as clear

= Tal Pupko : An evolutionary model that accounts for selection on synonymous mutations = Tue Jun 26 14:03:00 EDT 2007
 * Cell Res Immunology - Tel-Aviv
 * Ka/Ks webserver
 * collaborated with Nir Friedman

Words to Look Up

 * positive selection vs. purifying selection

Talk

 * codon models
 * inference of evolutionary selection forces on a protein
 * purifying selection
 * phylogeny
 * converting empirical AA replacement matrices into codon-based subst matrices
 * methods for computing Ka/Ks
 * subst. matrix rates 61x61
 * Yang's M model (2000)
 * $$\kappa$$ - transition/transversion ratio
 * $$\Pi$$ - codon frequency
 * $$\omega$$ - factor of selection
 * problems
 * assumes rate of Leu (UUG) to Trp (UGG) = rate of Leu (UUG) to Phe (UUU) (both single transversions)
 * 1st 5 times more likely
 * model does not account for exact identity of AA
 * assumes instantaneous rate between two AAs that differ by one mutation ...
 * propose model
 * Mechanistic Empirical Combined (MEC)
 * expand 20x20 empirical AA matrix into 61x61 codon matrix
 * assumptions
 * sum of rate of all codons = sum of rates of AAs, but take into account codon and AA probabilities
 * intensity of selection - omega - assume gamma distributed
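
A minimal sketch of the kappa / codon frequency / omega rate rule noted above, to make the Leu -> Trp versus Leu -> Phe problem concrete. It assumes the usual form of such models: rate 0 for changes at more than one position, otherwise the target codon frequency multiplied by kappa for transitions and by omega for nonsynonymous changes. Codons are written with DNA letters (T for U). The function name and parameter values are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]       # 61 sense codons
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def codon_rate(ci, cj, kappa, omega, pi):
    """Instantaneous rate from sense codon ci to sense codon cj (assumed form)."""
    diffs = [(x, y) for x, y in zip(ci, cj) if x != y]
    if len(diffs) != 1:
        return 0.0                    # identical codons or multi-nucleotide change
    rate = pi[cj]
    if diffs[0] in TRANSITIONS:
        rate *= kappa                 # transition rather than transversion
    if CODE[ci] != CODE[cj]:
        rate *= omega                 # nonsynonymous change
    return rate

pi = {c: 1.0 / len(SENSE) for c in SENSE}                # equal codon frequencies
leu_to_trp = codon_rate("TTG", "TGG", kappa=2.0, omega=0.3, pi=pi)
leu_to_phe = codon_rate("TTG", "TTT", kappa=2.0, omega=0.3, pi=pi)
print(leu_to_trp, leu_to_phe)         # same rate: amino acid identity is ignored
```

Both changes are single nonsynonymous transversions, so the rule cannot tell them apart. That gap is what an empirical 20x20 AA matrix is meant to fill.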

Ks conservation

 * most models assume Ks (synonymous) same for all sites (reflects neutral rate of evolution)
 * is this true?

HIV

 * vif and pol overlap in diff frames - reduced Ks in these regions

Further

 * large scale search for conserved ks in mammals, viruses, bacteria and yeasts
 * impact of Ks conservation on positive selection inference
 * characterization of conserved Ks regions
 * Goren, Mol. Cell, 2006

Questions
= Claudia Kleinman : Protein structure and sequence evolution - statistical potentials for phylogeny = Tue Jun 26 14:03:26 EDT 2007

Talk

 * probabilistic models of sequence evolution
 * try to incorporate protein structure explicitly into the models
 * site-dependent approaches
 * simulation of evolution: Parisi & Echave 2001
 * statistical potential
 * knowledge-based energy function derived from analysis of known protein structures
 * $$Q_{lm}r_le^{\beta(G_l - G_m)}$$
 * coarse grain structure
 * accounts implicitly for poorly understood complex effects
 * pairwise potential that depends on distance between residues (w/ solvent accessibility potentials)
 * contact potentials
 * optimized for structure prediction problem

Devise stat potential for an evol context

 * $$E = E_{contact} + E_{solvent} + E_{torsion} + E_{SS}$$
 * derive contact map (binary n.n.'s)
 * contact energy parameter
 * solvent accessibility - arb # of classes
 * torsion - use main chain angles
 * Kleinman, BMC, 7, 326, 2006
 * likelihood proportional to exp of negative of this energy (chemical potential)
 * maximize (maximum likelihood)
 * estimate gradient by MCMC - follow to find the maximum
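
A toy sketch of scoring a sequence on a fixed structure with a potential of the kind described above. Only the contact and solvent terms are shown; the torsion and secondary-structure terms are left out. The two-letter alphabet, parameter values, contact map and function names are all invented for illustration and are not the potential from the talk.

```python
import math

def contact_energy(seq, contacts, eps):
    """Sum pairwise contact parameters over residue pairs marked as in contact."""
    return sum(eps[frozenset((seq[i], seq[j]))] for i, j in contacts)

def solvent_energy(seq, access_class, alpha):
    """Sum per-residue terms that depend on the site's solvent accessibility class."""
    return sum(alpha[(aa, c)] for aa, c in zip(seq, access_class))

def total_energy(seq, contacts, access_class, eps, alpha):
    return contact_energy(seq, contacts, eps) + solvent_energy(seq, access_class, alpha)

# Hypothetical parameters for a two-letter alphabet (H, P).
eps = {frozenset(("H", "H")): -1.0, frozenset(("H", "P")): 0.0, frozenset(("P", "P")): 0.2}
alpha = {("H", 0): -0.5, ("H", 1): 0.3, ("P", 0): 0.4, ("P", 1): -0.2}
contacts = [(0, 2), (0, 3), (1, 3)]   # binary contact map as index pairs
access_class = [0, 1, 0, 0]           # 0 = buried, 1 = exposed

E_a = total_energy("HPHH", contacts, access_class, eps, alpha)
E_b = total_energy("PPHH", contacts, access_class, eps, alpha)

# Relative weight of the two sequences if weight is proportional to exp(-E).
print(E_a, E_b, math.exp(-(E_a - E_b)))
```

The weights are unnormalized. Normalizing over all sequences is the expensive part, which is presumably why the gradient is estimated by MCMC.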

model comparison using Bayes factors

 * Rodrigue, MBE, 23, 1762, 2006
 * Poisson distr ref model
 * thermodynamic integration

Questions
= Mario Fares : The three-dimensionality of molecular evolution = Tue Jun 26 14:03:53 EDT 2007
 * Trinity College Dublin

Talk

 * detecting selective constraints in protein-coding genes: survival of the fittest
 * $$\omega = dN/dS$$
 * $$\omega < 1$$ - purifying
 * $$\omega > 1$$ - positive selection

Questions
= Allan Drummond : Modeling evolution when ribosomes fail = Tue Jun 26 14:04:15 EDT 2007
 * w/ Claus Wilke - UT Austin

Talk

 * ribosomes fail - don't ignore when model protein evol
 * near-universal observations
 * coding sequences evolve at very diff rates
 * dN and dS correlate
 * high expressed proteins evolve slowly
 * codons matching abundant tRNAs preferred
 * high expressed genes
 * conserved sites (Akashi 1994 - trans accuracy)
 * codon biased genes have fewer dS and dN
 * in matrix form - matrices look like block structure
 * bad news - not independent - PCA would predict just one factor
 * 1 protein in 5 mistranslated
 * can still fold
 * or can misfold
 * selection can act to favor protein sequences that are robust to mistranslation
 * certain codons translated 6 times more accurately (model)
 * lattice protein model
 * Bloom, PNAS (2005,2006)
 * Taverna, Goldstein, Proteins (2001)
 * anything within 5 kcal/mol of ground state (gs) will fold
 * translational selection alone sufficient to explain the observed correlation matrix patterns
 * Akashi 1994
 * select for speed - don't matter where opt codons are - have to go through all of them
 * select for acc - should put opt codons at most highly conserved sites
 * this allows a within gene test to see what matters most
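
Back-of-envelope check on the "1 protein in 5 mistranslated" figure above. The per-codon error rate (5e-4) and protein length (400 residues) are assumed illustrative values, not numbers recorded in these notes, and errors are treated as independent across codons.

```python
def prob_mistranslated(length, error_rate):
    """P(at least one misincorporated residue), assuming independent per-codon errors."""
    return 1.0 - (1.0 - error_rate) ** length

# Hypothetical inputs: 400-residue protein, 5 errors per 10,000 codons.
p = prob_mistranslated(length=400, error_rate=5e-4)
print(round(p, 3))
```

With these inputs the probability comes out at about 0.18, roughly one protein in five. Longer proteins, or codons that are translated less accurately, push the fraction higher.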

Conclusions

 * evol rate should be considered as regulatory as well as functional signal
 * translational selection suffices to explain many evol patterns
 * brute-force modelling of protein evol possible