User:R. Eric Collins/MBL/PAUP

From OpenWetWare

David Swofford is the author of PAUP* and goes uncredited for many of the methods implemented in it

Model Selection

  • Parsimony
    • prone to long-branch attraction: branch lengths are not taken into account, so identical bases could equally result from equilibration (saturation along long branches) or from conservation
  • Models are _always_ wrong
    • don't/can't expect them to match reality
  • What is a _good_ model?
    • as simple as necessary but no simpler
    • a balance between under- and over-fitting
  • heterotachy: differential rates of evolution at different sites on different branches
    • can mislead maximum likelihood into choosing the tree that groups the long branches together
    • new mixture models are being written to address this issue
  • Model Selection Criteria
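    • Likelihood ratio tests: δ = -2(ln L0 - ln L1), where L0 is the likelihood of the simpler model and L1 that of the more complex model
      • δ is compared against a chi-squared distribution (frequentist), so there is always a possibility of Type I error, set by the error tolerance (α)
      • i.e. rejecting the simple model in favor of the more complex model even though the simple model is true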
    • Akaike information criterion (AIC)
      • AIC_i = -2 ln L_i + 2K, where K is the number of free parameters in model i
      • tends to be liberal, i.e. to favor models with more parameters than needed
    • Bayesian information criterion (BIC)
      • BIC_i = -2 ln L_i + K ln n, where n is the sample size (typically the number of sites)
      • converges on the correct model as more data are added (provided the true model is among the candidates)
  • PAUP
    • tips
      • restrict dataset after loading datafile instead of making multiple copies of data subsets
      • PAUP reports -ln L, so every score is minimized (lower is better)
    • ModelTest
      • any reasonable tree can be used; the actual tree topology has little effect on model selection
      • shouldn't have to run ModelTest; it is better to understand model selection well enough to winnow down the candidate models manually
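
The three model selection criteria listed above (LRT, AIC, BIC) can be sketched in a few lines of Python. This is only an illustration, not what PAUP* or ModelTest does internally. The log-likelihoods, the number of sites, and the parameter counts (substitution-model parameters only, ignoring branch lengths) are made-up values chosen for the example, and the helper names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form sum.
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 case and add the recurrence terms.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def lrt(lnL0, lnL1, k0, k1):
    """Return (delta, p-value) for a simple model (lnL0, k0) nested in a complex one (lnL1, k1)."""
    delta = -2.0 * (lnL0 - lnL1)
    return delta, chi2_sf(delta, k1 - k0)

def aic(lnL, k):
    return -2.0 * lnL + 2.0 * k

def bic(lnL, k, n):
    return -2.0 * lnL + k * math.log(n)

# Hypothetical scores: model name -> (ln L, number of free parameters K).
scores = {"JC69": (-2500.0, 0), "HKY85": (-2450.0, 4), "GTR": (-2444.0, 8)}
n_sites = 1000  # assumed sample size

delta, p_value = lrt(*scores["HKY85"][:1], scores["GTR"][0], scores["HKY85"][1], scores["GTR"][1])
best_aic = min(scores, key=lambda m: aic(*scores[m]))
best_bic = min(scores, key=lambda m: bic(*scores[m], n_sites))

print(f"LRT HKY85 vs GTR: delta = {delta:.2f}, p = {p_value:.4f}")
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

With these made-up numbers AIC prefers GTR while BIC prefers HKY85, which illustrates the note that AIC tends to be more liberal with parameters. As in PAUP, lower scores are better.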

to specify Tamura-Nei (all transversions share one rate; the two transitions, A↔G and C↔T, each have their own rate): lscores/nst=6 rclass=(abaaca), where the six rate classes are given in the order AC AG AT CG CT GT
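
To see which substitutions an rclass string ties together, the letters can be mapped onto the six substitution types in the order AC, AG, AT, CG, CT, GT. The sketch below is plain Python, not PAUP* code, and the function names are hypothetical. It shows that (abaaca) puts the four transversions in one class and gives each transition its own class.

```python
# Order of the six substitution types in an rclass specification.
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]
TRANSITIONS = {"AG", "CT"}  # purine<->purine and pyrimidine<->pyrimidine

def rate_classes(rclass):
    """Group substitution types by their rclass letter, e.g. '(abaaca)' or '(a b a a c a)'."""
    letters = [c for c in rclass if c.isalpha()]
    if len(letters) != len(PAIRS):
        raise ValueError("rclass needs exactly six class letters")
    groups = {}
    for pair, letter in zip(PAIRS, letters):
        groups.setdefault(letter, []).append(pair)
    return groups

def free_rates(rclass):
    """Number of free relative rates: one class serves as the reference."""
    return len(rate_classes(rclass)) - 1

tamura_nei = rate_classes("(abaaca)")
for letter, pairs in tamura_nei.items():
    kinds = {"transition" if p in TRANSITIONS else "transversion" for p in pairs}
    print(letter, pairs, sorted(kinds))
print("free relative rates:", free_rates("(abaaca)"))
```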

  • to only operate on a subset of data
    • first load all the data
    • taxset: define a named set (macro) of taxon names
    • delete: delete a certain subset of taxa
    • exclude: exclude subset of characters
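
The idea behind these commands, keeping one loaded dataset and masking parts of it rather than copying subsets, can be mimicked with a toy Python class. This is only an analogy under stated assumptions: the class, its method names, and the example sequences are all hypothetical and do not reproduce PAUP* syntax or behavior.

```python
class Dataset:
    """Toy alignment with non-destructive taxon and character masks."""

    def __init__(self, alignment):
        self.alignment = dict(alignment)  # taxon name -> sequence, never modified
        self.taxsets = {}                 # set name -> list of taxon names
        self.deleted = set()              # taxa currently masked out
        self.excluded = set()             # 1-based character positions masked out

    def taxset(self, name, taxa):
        """Define a named set of taxa for later reuse."""
        self.taxsets[name] = list(taxa)

    def delete(self, name):
        """Mask a taxset (or a single taxon if no taxset has that name)."""
        self.deleted.update(self.taxsets.get(name, [name]))

    def exclude(self, first, last):
        """Mask characters first..last (1-based, inclusive)."""
        self.excluded.update(range(first, last + 1))

    def active(self):
        """Return the data that analyses would currently see."""
        return {
            taxon: "".join(b for i, b in enumerate(seq, start=1) if i not in self.excluded)
            for taxon, seq in self.alignment.items()
            if taxon not in self.deleted
        }

data = Dataset({"human": "ACGTACGT", "chimp": "ACGTACGA", "yeast": "TCGAACGA"})
data.taxset("outgroup", ["yeast"])
data.delete("outgroup")
data.exclude(1, 2)
print(data.active())
print(len(data.alignment), "taxa still loaded")
```

Because the masks are kept separately from the data, the full alignment stays loaded and other subsets can be selected later without maintaining multiple data files.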