User:R. Eric Collins/MBL/Popgen

From OpenWetWare

Peter Beerli

Population Genetics

  • the assumption of independence among sequences is violated within a population (the sequences share a genealogy)
  • Wright-Fisher
    • each individual draws its parent from the previous generation at random
    • two gene copies wait on average 2N generations to coalesce
    • this waiting time is geometrically distributed, which assumes discrete, non-overlapping generations
    • real populations do not necessarily behave this way, but we assume the model can be extrapolated to them
    • because every sample carries information about the MRCA (most recent common ancestor), even a sample of a few individuals can often recover the same TMRCA (time to the MRCA) as a large sample
    • problems arise when the sample is too large relative to the population size (n > sqrt(4N)), because the coalescent approximation assumes that at most one coalescence happens per generation
    • large samples coalesce on average in about 4N generations
  • mutation-scaled population size (Θ = 4Nμ)
    • hard to disentangle: a large population with a small mutation rate gives the same Θ as a small population with a large mutation rate
    • confounding between migration rate and divergence rate
  • recombination
    • if you know where the recombination breakpoints are, you can (and should) split the sequence into separate loci at those points
    • NetRecodon can be used to simulate sequences with recombination
    • for recombination to make a big difference, recombination rates must be VERY high
      • at least 1 in 50 per generation in a 2-population model
      • when migration is included, recombination doesn't even seem to matter
  • unless you have time-series data, don't bother with estimating population size changes through time (e.g. skyline/skyride)
  • even a 5x5 migration model (five populations: 5 Θ values and 20 migration rates) is a large model
    • simplify the model if possible to improve confidence/power
    • "test hypotheses... don't go on fishing expeditions"
  • to get good estimates, you need: 1) a lot of data 2) a good computer
  • people with lots of data often don't run analyses long enough to guarantee convergence
  • F_ST and coalescent methods rest on the same or similar assumptions, so for recent divergence neither is really better than the other
  • changes in population size over time can strongly affect coalescence, but we need to know how, and how that in turn affects parameter estimation
    • e.g. bottlenecks, recoveries, expansions, contractions
  • when the effective population size is about the same as the number of generations since divergence, it becomes difficult to separate divergence from migration
  • no existing coalescent program takes selection into account
  • Felsenstein (2005): beyond ~10 individuals, adding another locus helps more than adding more individuals
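The Wright-Fisher waiting times in the notes can be checked numerically. The following Python is a minimal sketch. It assumes a diploid population of constant size N (so 2N gene copies) and the standard coalescent approximation, in which k lineages wait an expected 4N/(k(k-1)) generations for the next coalescence. All function names are hypothetical and do not come from any program mentioned in the lecture.

```python
import math
import random

def expected_pair_coalescence(N):
    # Two gene copies in a diploid Wright-Fisher population of N individuals
    # (2N gene copies) share a parent copy with probability 1/(2N) each
    # generation, so the waiting time is geometric with mean 2N generations.
    return 2 * N

def simulate_pair_coalescence(N, rng):
    # Count generations until both lineages happen to pick the same parent copy.
    p = 1.0 / (2 * N)
    t = 1
    while rng.random() >= p:
        t += 1
    return t

def expected_tmrca(n, N):
    # With k lineages the expected wait is 4N / (k(k-1)) generations;
    # summing from k = n down to 2 gives 4N(1 - 1/n), which approaches 4N.
    return sum(4 * N / (k * (k - 1)) for k in range(2, n + 1))

def coalescent_ok(n, N):
    # Rule of thumb from the notes: the one-coalescence-per-generation
    # approximation becomes shaky once the sample size n exceeds sqrt(4N).
    return n <= math.sqrt(4 * N)

if __name__ == "__main__":
    rng = random.Random(42)
    N = 50
    draws = [simulate_pair_coalescence(N, rng) for _ in range(10000)]
    print(sum(draws) / len(draws), expected_pair_coalescence(N))
    print(expected_tmrca(50, 1000), 4 * 1000)
    print(coalescent_ok(50, 1000), coalescent_ok(100, 1000))
```

For a pair the sum gives 2N, and it approaches 4N as the sample grows. The simulation only covers the two-lineage case.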

Species Tree Estimation

  • with long times between speciation events, the gene tree matches the species tree with increasing probability
  • there are two coalescence orders that yield ((A,B),(C,D)) (A with B first, or C with D first), but only one each that yields (((C,D),B),A) or (((C,D),A),B)
    • so symmetric trees can be overrepresented
  • concatenating gene sequences is not the way to add information; it can lead to statistically inconsistent results
    • but with long branch lengths and lots of genes you get enough power that it's ok
    • the bootstrap can be positively misleading in this situation
  • STEM: assumes that the only source of variability among single-gene histories is the coalescent process
  • species definition? a group of individuals that fit a model of random branching

  • questions:
    • if generations are ordered, can we follow the min and max to find the ancestor of all existing species?
    • migration as horizontal gene transfer?
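The counting argument for the four-taxon trees can be checked by brute force. This sketch assumes that all four lineages sit in a single panmictic population. Under that assumption every pair is equally likely to coalesce next, so every ranked history is equally probable. The code enumerates the ranked histories and counts how many produce each rooted topology. Names such as `ranked_histories` and `clades` are hypothetical helpers.

```python
from collections import Counter
from itertools import combinations

def ranked_histories(lineages):
    # Yield every ordered sequence of pairwise coalescences (a ranked
    # history) as the list of clades created, in order of creation.
    if len(lineages) == 1:
        yield []
        return
    for a, b in combinations(lineages, 2):
        merged = a | b
        rest = [x for x in lineages if x != a and x != b]
        for tail in ranked_histories(rest + [merged]):
            yield [merged] + tail

def topology_counts(taxa):
    # A rooted topology is identified by its set of clades, so histories
    # that create the same clades in a different order share one key.
    leaves = [frozenset(t) for t in taxa]
    return Counter(frozenset(h) for h in ranked_histories(leaves))

def clades(*groups):
    # Spell a topology as its clades, e.g. clades("AB", "CD", "ABCD").
    return frozenset(frozenset(g) for g in groups)

if __name__ == "__main__":
    counts = topology_counts("ABCD")
    print(sum(counts.values()), len(counts))  # 18 ranked histories, 15 topologies
    balanced = clades("AB", "CD", "ABCD")      # ((A,B),(C,D))
    caterpillar = clades("CD", "BCD", "ABCD")  # (((C,D),B),A)
    print(counts[balanced], counts[caterpillar])  # 2 1
```

Each balanced topology therefore has probability 2/18 = 1/9, versus 1/18 for each caterpillar topology. This is the overrepresentation of symmetric trees noted in the list.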

"the reason to do a Bayesian analysis is not to get a tree but to get the posterior distributions"

HGT versus huge ancestral population size + long coalescent times "bacteria are special"


  • if there were exponential growth in a population, estimating the mutation rate assuming a constant population size will UNDERESTIMATE the instantaneous mutation rate.
  • given the instantaneous mutation rate and a population growth model you can estimate the past mutation rate
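The back-projection in the last point can be sketched for the simplest growth model. This assumes exponential growth at a constant rate g, with time t measured backwards from the present. Under that model, a size-proportional quantity x (for example Θ) with present-day value x0 was x0 * exp(-g t) at time t. The harmonic-mean function is an added rough heuristic, not the estimator used by any particular program. The idea is that a constant-size fit over a period of change tracks something like the harmonic mean of past sizes, so under growth it falls below the present-day value. The parameter values in the usage lines are hypothetical.

```python
import math

def past_value(x0, g, t):
    # Exponential growth forward in time is exponential decline looking
    # backwards: x(t) = x0 * exp(-g * t), with t measured into the past.
    return x0 * math.exp(-g * t)

def harmonic_mean_value(x0, g, T):
    # Harmonic mean of x(t) over [0, T]:
    #   T / integral_0^T exp(g t) / x0 dt = x0 * g * T / (exp(g T) - 1)
    # math.expm1 keeps the result accurate when g * T is small.
    if g == 0:
        return x0  # constant size: the harmonic mean is just x0
    return x0 * g * T / math.expm1(g * T)

if __name__ == "__main__":
    x0, g = 0.01, 2.0  # hypothetical present-day value and growth rate
    for t in (0.0, 0.5, 1.0):
        print(t, past_value(x0, g, t))
    # Under growth (g > 0) the harmonic mean over [0, 1] lies below x0.
    print(harmonic_mean_value(x0, g, 1.0), x0)
```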