User:R. Eric Collins/MBL/PAUP
From OpenWetWare
David Swofford is the author of PAUP* and is uncredited for many of the methods implemented within it.
Model Selection
- Parsimony
- prone to long-branch attraction: branch lengths are not taken into account, so similar bases could equally result from equilibration (saturation) or from conservation
- Models are _always_ wrong
- don't/can't expect them to match reality
- What is a _good_ model?
- as simple as necessary but no simpler
- a balance between under- and over-fitting
- heterotachy: rates of evolution differ among sites and, for a given site, among branches
- can mislead maximum likelihood into choosing the long-branch tree
- new mixture models are being developed to address this issue
- Model Selection Criteria
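- Likelihood ratio tests: δ = -2(ln L0 - ln L1), where L0 is the likelihood of the simpler model and L1 that of the more complex model
- based on the chi-squared distribution (frequentist), so there is always some probability of a Type I error, set by the chosen error tolerance (α)
- i.e. rejecting the simple model in favor of the more complex model even though the simple model is true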
- Akaike information criterion (AIC)
- AIC_i = -2 ln L_i + 2K, where K is the number of free parameters in model i
- tends to be liberal, i.e. to select models with too many parameters
- Bayesian information criterion (BIC)
- BIC_i = -2 ln L_i + K ln n, where K is the number of free parameters and n is the sample size (typically the number of sites)
- converges on the correct answer as more data are added
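The three criteria above can be compared side by side in a few lines of Python. This is a minimal sketch, not part of PAUP* or ModelTest. The function names and the example scores are hypothetical. The inputs are ln L values (PAUP reports -ln L, so flip the sign first). The LRT p-value assumes the two models are nested, with degrees of freedom equal to the difference in the number of free parameters.

```python
import math

def lrt_statistic(lnL0, lnL1):
    """delta = -2(ln L0 - ln L1); model 0 is the simpler, nested model."""
    return -2.0 * (lnL0 - lnL1)

def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer df >= 1 (stdlib only)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # ... then step up with Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def aic(lnL, k):
    """AIC = -2 ln L + 2K"""
    return -2.0 * lnL + 2.0 * k

def bic(lnL, k, n):
    """BIC = -2 ln L + K ln n"""
    return -2.0 * lnL + k * math.log(n)

# Hypothetical scores for two nested models on a 1000-site alignment
simple = {"lnL": -2350.0, "k": 1}
complex_ = {"lnL": -2340.0, "k": 5}
n_sites = 1000

delta = lrt_statistic(simple["lnL"], complex_["lnL"])
p_value = chi2_sf(delta, complex_["k"] - simple["k"])
print("LRT: delta =", delta, "p =", p_value)

for name, m in (("simple", simple), ("complex", complex_)):
    print(name, "AIC =", aic(m["lnL"], m["k"]),
          "BIC =", bic(m["lnL"], m["k"], n_sites))
```

With these made-up numbers the LRT and AIC both favor the more complex model, while BIC, which penalizes each parameter by ln n instead of 2, favors the simpler one.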
- PAUP
- tips
- restrict dataset after loading datafile instead of making multiple copies of data subsets
- PAUP reports -ln L, so every score is minimized (lower is better)
- ModelTest
- any reasonable tree can be used; the actual tree topology has little effect on model selection
- running ModelTest shouldn't be necessary; it is better to understand model selection well enough to winnow down the candidate models manually
- tips
- to specify Tamura-Nei (all transversions share one rate, each of the two transition types has its own rate): lscores/nst=6 rclass=(abaaca)
- to only operate on a subset of data
- first load all the data
- taxset: define a named set (macro) of taxon names
- delete: delete a certain subset of taxa
- exclude: exclude subset of characters
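The rclass string from the Tamura-Nei tip can be unpacked to check which substitution types share a rate. This is a minimal sketch, assuming the six positions follow the order AC, AG, AT, CG, CT, GT. `decode_rclass` is a hypothetical helper written for illustration, not a PAUP command.

```python
# Assumed order of the six substitution types in an rclass string
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")
TRANSITIONS = {"AG", "CT"}  # purine<->purine and pyrimidine<->pyrimidine

def decode_rclass(rclass):
    """Group the six substitution types by their rate-class letter."""
    letters = [c for c in rclass if c.isalpha()]
    if len(letters) != len(PAIRS):
        raise ValueError("rclass must contain exactly six class letters")
    classes = {}
    for pair, letter in zip(PAIRS, letters):
        classes.setdefault(letter, []).append(pair)
    return classes

for letter, pairs in decode_rclass("abaaca").items():
    kinds = sorted({"transition" if p in TRANSITIONS else "transversion"
                    for p in pairs})
    print(letter, pairs, kinds)
```

For `abaaca`, class a holds the four transversions (AC, AT, CG, GT), while AG and CT each get their own class.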