User:Vanessa E Apkenas/Notebook/Phylogeny

From OpenWetWare

Making a phylogeny

Finding sequence data for your taxa

  • Search on Phylota
  • Look at Phylo. Inform. Seq. Clusters; click on "See complete taxon coverage matrix" and note which genes are sequenced for most of your species (these are the most informative)
  • If you're looking at multiple families etc., go back to the main taxa page and click on Phylo. Inform. Seq. Clusters for all (at the top of the chart); click on the "See complete taxon coverage matrix" link at the top again; then click the cluster number for each gene you want to use
  • Download alignments > Use All > Submit; copy and paste results into a new text file; repeat for each gene into new separate files
  • If the alignment doesn't come up: Download cluster > Use All > Submit; copy and paste into a text file for each gene; then go to MUSCLE, paste the sequences from each downloaded cluster into the input box, and submit to get the alignment
  • Delete undesired taxa if necessary, along with any duplicated species (you only want one sequence per species per gene; otherwise you'll have to prune them from the tree later)
  • Sequences can also be added from GenBank for other species before the MUSCLE alignment step (just copy and paste them into the text files)
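
The "one sequence per species per gene" rule above can also be applied with a short script rather than by hand. The following is a minimal Python sketch, not part of the original workflow. It assumes that each FASTA header ends in Genus_species (the same layout the header clean-up pattern later on this page relies on), and it simply keeps the first sequence it sees for each species. The function names are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def species_of(header):
    """Assumption: the species is the last two underscore-separated fields."""
    return "_".join(header.split("_")[-2:])


def keep_one_per_species(records):
    """Keep the first record seen for each species, preserving file order."""
    seen = set()
    kept = []
    for header, seq in records:
        species = species_of(header)
        if species not in seen:
            seen.add(species)
            kept.append((header, seq))
    return kept
```

Keeping the first record is an arbitrary choice; you may prefer to keep the longest sequence instead, which would only change the logic inside keep_one_per_species.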

Finding fossil calibration data for your tree

  • Find names of fossil species in extant genera or subclades etc. in literature (need the minimum age) OR on fossilworks.org --> the age of the stratigraphic layer where the fossil was found is the approx. age
  • Example: Elephantulus antiquus 2.5-3.6 Ma --> 2.5 Ma minimum age for that genus (Elephantulus is an extant genus!)
  • Example: Rhynchocyon sp. 23-28.4 Ma --> 23 Ma minimum age for that genus
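
The two examples above follow the same rule: the younger bound of the fossil's age range becomes the minimum age for the genus. A minimal Python sketch of that arithmetic, using only the two ranges listed above (the function name is hypothetical):

```python
def minimum_age(age_range_ma):
    """Return the younger bound (the smaller number, in Ma) of an age range."""
    younger, _older = sorted(age_range_ma)
    return younger


# Age ranges in Ma, taken from the examples above.
fossils = {
    "Elephantulus": (2.5, 3.6),   # Elephantulus antiquus
    "Rhynchocyon": (23.0, 28.4),  # Rhynchocyon sp.
}

calibrations = {genus: minimum_age(r) for genus, r in fossils.items()}
print(calibrations)  # {'Elephantulus': 2.5, 'Rhynchocyon': 23.0}
```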

Preparing alignment FASTA files

  • Save the alignment FASTA files again with only the species name in each header (dropping the gi number and other leading identifiers), keeping the original files for cross-reference just in case
  • Grep in a text editor --> Find: >(\w+)_(\w+)_(\w+)_(\w+) Replace: >\3_\4 Replace All
  • Put all of these new files in a new subfolder
  • Convert these FASTA files to Nexus format with Geneious (export as Nexus files)
  • Run Beastin R script to add placeholders for missing species, make all sequences the same length, and to help you check for repeats of species (e.g., typos)
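
The header clean-up and the checks described above can be sketched in Python as follows. This is not the Beastin R script, only an illustration of what those steps do. The regular expression is the same Find/Replace pattern given above; the function names, the use of "?" as the placeholder character, and padding on the right are assumptions made for this sketch.

```python
import re
from collections import Counter

# Same pattern as the Find/Replace step: four underscore-separated fields,
# of which only the last two (genus and species) are kept.
HEADER_PATTERN = re.compile(r">(\w+)_(\w+)_(\w+)_(\w+)")


def rename_headers(fasta_text):
    """Rewrite every matching FASTA header as >Genus_species."""
    return HEADER_PATTERN.sub(r">\3_\4", fasta_text)


def repeated_names(names):
    """Return names that occur more than once (exact repeats only)."""
    return sorted(name for name, n in Counter(names).items() if n > 1)


def pad_alignment(seqs, all_species, fill="?"):
    """Give every species a sequence of equal length for one gene.

    seqs maps species name -> aligned sequence. Species missing from this
    gene get a placeholder made entirely of the fill character, and shorter
    sequences are padded on the right (an assumption of this sketch).
    """
    length = max((len(s) for s in seqs.values()), default=0)
    return {sp: seqs.get(sp, "").ljust(length, fill) for sp in all_species}
```

For example, pad_alignment({"A_a": "ACGT", "B_b": "AC"}, ["A_a", "B_b", "C_c"]) gives "AC??" for B_b and "????" for C_c. Note that repeated_names only catches exact repeats; names that differ by a typo still need to be checked by eye.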

Generating the XML file in BEAUti for BEAST

  • Look up whether each gene you're using is coding or non-coding on genecards.org (the codon-position partition only makes sense for coding genes)
  • In BEAUti: link subst. models (select all); substitution model: HKY, Gamma, codon partitions (1+2)3; clock: lognormal relaxed clock (estimate); speciation: Yule; priors: Mu's = lognormal, ucld's = Gamma, 1, 1, 1, 0; Taxa: select groups and set ages according to fossil dates; MCMC; generate the XML file for BEAST

Running BEAST online on CIPRES (a computing cluster)

  • Register to use CIPRES
  • Upload XML file
  • Task --> New
  • BEAST

Wait for the output files to be emailed to you many hours later

Checking your run output on Tracer

  • Open Tracer
  • Import log file from CIPRES run
  • Set the burn-in value to the same as the run (e.g., 10,000,000); increase the burn-in if the distribution looks irregular (e.g., the trace has not leveled off) or if ESS values are coming up yellow or red
  • Re-adjust the burn-in from the previous step if necessary; this is just a good visual check
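
To make concrete what the burn-in setting does, here is a minimal Python sketch that drops the early samples from a log file before summarizing a column. It is not how Tracer is implemented. It assumes the log is tab-delimited, that comment lines start with "#", and that the first column holds the state number; whether the sample at exactly the burn-in state is kept is a choice made here. The function names are hypothetical.

```python
def read_log(text):
    """Parse a tab-delimited trace log into (column_names, rows_of_floats)."""
    columns, rows = None, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if columns is None:
            columns = fields  # first non-comment line is the header
        else:
            rows.append([float(x) for x in fields])
    return columns, rows


def discard_burnin(rows, burnin_states):
    """Keep only samples whose state number (first column) is >= burn-in."""
    return [row for row in rows if row[0] >= burnin_states]


def column_mean(columns, rows, name):
    """Mean of one named column over the given rows."""
    i = columns.index(name)
    return sum(row[i] for row in rows) / len(rows)
```

A typical use would be columns, rows = read_log(open("run.log").read()), then column_mean(columns, discard_burnin(rows, 10_000_000), "posterior"), where the file name and column name are placeholders for whatever your run produced.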

Making a single tree file in TreeAnnotator from all of the CIPRES Bayesian tree files that were generated

  • Open TreeAnnotator
  • Burnin = 1000 (10% of the total set previously); this is the number of initial trees that will be discarded
  • Max clade credibility tree
  • Mean heights
  • Input = .trees file from CIPRES
  • Output = whatever name you want
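
The two burn-in numbers on this page are in different units: Tracer's is in states (e.g., 10,000,000) and TreeAnnotator's is in trees (1000). The sketch below shows how they would line up under assumed run settings, namely a chain of 100,000,000 states with a tree saved every 10,000 states. Those two values are not stated in these notes; they are hypothetical and chosen only because they are consistent with the numbers above.

```python
def total_trees(chain_length, sample_every):
    """Number of trees written during the run (ignoring the initial state)."""
    return chain_length // sample_every


def burnin_in_trees(burnin_states, sample_every):
    """Convert a burn-in given in states to a burn-in given in trees."""
    return burnin_states // sample_every


# Hypothetical run settings (assumptions, not taken from the notes).
chain_length = 100_000_000
sample_every = 10_000

n_trees = total_trees(chain_length, sample_every)
ten_percent = n_trees * 10 // 100
print(n_trees, ten_percent)                       # 10000 1000
print(burnin_in_trees(10_000_000, sample_every))  # 1000
```

If your chain length or sampling interval differs, substitute your own values to get the matching number of trees to discard.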

Visualizing the tree in FigTree

  • Open FigTree
  • Import the file made from TreeAnnotator
  • Adjust settings as desired: Ordering > Decreasing, Scale axis (not scale bar) > Reverse

Pruning the tree if there are duplicates for species

  • Use TreePruning script in R
  • Import back into FigTree