User:Tara K. Luckau/Notebook/Team ConGen/2011/08/18

From OpenWetWare


Meeting with Andy Bohonak

Politics of Academia

  • clarify - authorship (including if I don't publish by a certain time), data ownership, sample ownership, lab notebooks
  • make paper trail ... happy email
  • SDSU form gets filled out with authorship, etc ... might be a good excuse to bring it up

verify Mendelian inheritance - high priority

  • pregnant females, have babies (watch Rosemary's baby)
    • maybe keep offspring alive
    • screen mom and babies
  • find out likelihood that a clutch all has same father
  • will show whether a marker has weird null allele problems
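
A minimal sketch of the mom-and-babies screen at a single microsatellite locus, assuming genotypes are stored as pairs of allele sizes and that there is no mutation or scoring error. The function names and the allele sizes in the usage note are made up for illustration. An offspring that shares no allele with its mother is the kind of mismatch a null allele can produce, and more than two clearly paternal alleles in one clutch means more than one father.

```python
def compatible_with_mother(mother, offspring):
    """True if the offspring carries at least one of the mother's alleles."""
    return bool(set(mother) & set(offspring))


def paternal_alleles(mother, offspring):
    """Alleles the offspring could have inherited from its father."""
    a, b = offspring
    candidates = set()
    if a in mother:
        candidates.add(b)  # a can be maternal, so b can be paternal
    if b in mother:
        candidates.add(a)  # b can be maternal, so a can be paternal
    return candidates


def min_fathers_at_locus(mother, clutch):
    """Lower bound on the number of fathers, from unambiguous paternal alleles."""
    certain = set()
    for offspring in clutch:
        candidates = paternal_alleles(mother, offspring)
        if len(candidates) == 1:  # skip ambiguous and incompatible offspring
            certain |= candidates
    # One diploid father contributes at most two distinct alleles per locus.
    return max(1, -(-len(certain) // 2))
```

For example, with a hypothetical mother (150, 154), an offspring scored (158, 158) fails compatible_with_mother and is worth re-checking for a null allele, while a clutch of (150, 160), (154, 162), (150, 164) needs at least two fathers.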

Duplicate data

  • To find error rate of data (in extraction, PCR, analysis, etc)
  • In text, able to say "total sample size was 300 individuals per species, of which 30 were repeated 3 times, 50 repeated twice; 2 errors found, traceable to such-and-such a process, giving error rate of 1.5%"
  • for this project, not necessary to repeat all the way back to extraction
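
A rough sketch of how the error rate could be tallied from the duplicated samples. It assumes one particular definition (mismatched repeat genotypes divided by the number of repeat-versus-original comparisons); other definitions, such as per-allele rates, would give different numbers. The function name and sample labels are hypothetical.

```python
def genotyping_error_rate(replicates):
    """replicates: dict of sample label -> list of genotype calls at one locus.

    The first call for each sample is treated as the reference; every later
    call is one comparison. Allele order within a genotype is ignored.
    """
    comparisons = 0
    mismatches = 0
    for calls in replicates.values():
        reference = sorted(calls[0])
        for repeat in calls[1:]:
            comparisons += 1
            if sorted(repeat) != reference:
                mismatches += 1
    if comparisons == 0:
        raise ValueError("no repeated samples to compare")
    return mismatches / comparisons
```

Samples genotyped only once contribute no comparisons, so they can be left in the dictionary without changing the result.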

Spatial Scale

  • gotta get:
    • more consistent distances across roads
    • more points across same road

Literature search

  • Lots of pop gen papers at spatial scale

Software

  • STRUCTURE - easy and quick, do it with prelim data!
  • CREATE - formats microsat data for STRUCTURE, etc.



Playing with Structure software

  • Structure 2.3.3 for MacOS
  • spent 4 or 5 hours playing around with Structure, just trying to get it to work
  • Here's what I learned today, so I don't forget!

Formatting Data

MS Excel

  • first column is sample names (structure term = "Label")
  • second column is population designation (structure term = "PopData", in integer form)
    • 1 = green cluster
    • 2 = blue cluster
    • 3 = red cluster
  • third column is location designation (structure term = "LocData", in integer form)
    • 1 = Camp Pendleton
    • 2 = Rancho Jamul / Hollenbeck
    • 3 = Point Loma / Cabrillo National Monument
    • 4 = Santa Ysabel Open Space Preserve
    • 5 = Torrey Pines State Natural Reserve
  • first row is marker names (structure term = "Marker Name")
    • since diploid, each marker takes two columns (one per allele), so leave a blank cell after each marker name for the second allele


MS Word

  • copy data from Excel, Paste Special into Word
    • as Unformatted Text
  • Save As ...
    • Plain Text (.txt)
    • File Conversion: Latin-US (DOS), CR/LF
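
The Excel and Word round trip above could also be done with a short script. This is a sketch of an alternative, not what was done in this entry. It assumes the layout described above (Label, PopData, LocData, then two allele columns per marker, with blank header cells over the non-marker columns), uses -9 as a placeholder missing-data code (it has to match whatever is entered in the New Project steps), and writes CR/LF line endings. The function names are hypothetical.

```python
def format_structure_rows(marker_names, individuals, missing=-9):
    """Return tab-delimited lines, one row per individual."""
    # Header: blank cells over Label, PopData, LocData, then each marker
    # name followed by a blank cell over the second allele column.
    header = ["", "", ""]
    for name in marker_names:
        header += [name, ""]
    lines = ["\t".join(header)]
    for ind in individuals:
        row = [ind["label"], str(ind["pop"]), str(ind["loc"])]
        for name in marker_names:
            a1, a2 = ind["genotypes"].get(name, (missing, missing))
            row += [str(a1), str(a2)]
        lines.append("\t".join(row))
    return lines


def write_structure_file(path, lines):
    """Write the lines as plain text with explicit CR/LF endings."""
    # newline="" stops Python from translating the "\r\n" written below.
    # ASCII is enough for labels and integers.
    with open(path, "w", encoding="ascii", newline="") as handle:
        for line in lines:
            handle.write(line + "\r\n")
```

Each individual is a dictionary with "label", "pop", "loc", and a "genotypes" dictionary keyed by marker name; a marker missing from "genotypes" is written as two missing-data codes.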


Structure

New Project
  • File: New Project
  • Step 1
    • Name the project: whatever you want
    • Select directory: make a folder for all Structure projects to be saved in
    • Choose data file: browse to the .txt file you just made (from Excel and Word)
  • Step 2, 3, 4
-- --
Run Project
  • Parameter Set: New...
  • Run Length: 10,000 for Length of Burnin Period and 10,000 for Number of MCMC Reps after Burnin is a good place to start
  • Ancestry Model: Admixture is appropriate for the ConGen project
  • Allele Frequency Model: run both Correlated and Independent
    • correlated: frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K
    • independent: we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure)
  • Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster (leave it checked for runs used to compare values of K)
  • Parameter Set: Run
  • do several iterations for each value of K
  • to get that pretty graphic that's in all the papers:
    • in the left-hand pane, select the run you want (K=?)
    • in the right-hand pane, click Bar plot: Show, then group by PopID
  • TA-DA!
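
To keep track of which runs still need doing, the combinations from the Run Project notes (both allele frequency models, several iterations per K, 10,000 and 10,000 for the run length) can be listed out as a checklist. This is only a bookkeeping sketch; the K range and number of iterations are placeholders, and the dictionary keys are made-up names, not Structure settings.

```python
from itertools import product


def build_run_plan(k_values, iterations, burnin=10_000, mcmc_reps=10_000):
    """List every (K, allele frequency model, iteration) combination to run."""
    models = ("correlated", "independent")
    plan = []
    for k, model, iteration in product(k_values, models, range(1, iterations + 1)):
        plan.append({
            "K": k,
            "allele_freq_model": model,
            "iteration": iteration,
            "burnin": burnin,
            "mcmc_reps": mcmc_reps,
        })
    return plan
```

For instance, build_run_plan(range(1, 6), 3) gives 30 entries: 5 values of K, 2 models, 3 iterations each.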