User:Tara K. Luckau/Notebook/Team ConGen/2011/11/06

From OpenWetWare
Tara's Lab Notebook


SCOC MP1 - Prelim Data

Fragment Analysis Scoring

  • first pass = one marker at a time (able to zoom in)
  • second pass = one sample at a time (able to look at sample-wide patterns) - haven't done this one yet



Structure

  • Structure 2.3.3 for MacOS


Formatting Data

MS Excel

  • ROW 1 = marker names (structure term = "Marker Name")
    • since diploid, leave an extra column for the second allele between markers
  • COLUMN 1 = sample names (structure term = "Label")
  • COLUMN 2 = population designation (structure term = "PopData", in integer form)
    • 1 = red cluster
    • 2 = green cluster
    • 3 = blue cluster
  • COLUMN 3 = location designation (structure term = "LocData", in integer form)
    • 1 = Camp Pendleton
    • 2 = Rancho Jamul / Hollenbeck
    • 3 = Point Loma / Cabrillo National Monument
    • 4 = Santa Ysabel Open Space Preserve
    • 5 = Torrey Pines State Natural Reserve



MS Word

  • copy data from Excel, Paste Special into Word
    • as Unformatted Text
  • Save As ...
    • Plain Text (.txt)
    • File Conversion: Latin-US (DOS), CR/LF
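
The Excel layout and the plain-text export above could also be produced by a script. This is only a minimal sketch, not what was done for this entry (the file here went through Excel and Word). The function names, the marker names "M1"/"M2" and the sample "S01" in the usage note are hypothetical placeholders; tab-delimited output and ASCII encoding are assumptions.

```python
def build_structure_lines(marker_names, samples, missing=0):
    """Return the rows of a Structure input file, one row per individual.

    samples: list of (label, pop_id, loc_id, genotypes), where genotypes is a
    list of (allele1, allele2) tuples, one per marker. None marks a missing call.
    """
    # ROW 1: three blank cells above Label / PopData / LocData, then each
    # marker name followed by a blank cell for the second allele (diploid).
    header = ["", "", ""]
    for name in marker_names:
        header += [name, ""]
    lines = ["\t".join(header)]

    for label, pop_id, loc_id, genotypes in samples:
        if len(genotypes) != len(marker_names):
            raise ValueError("%s: expected %d genotypes, got %d"
                             % (label, len(marker_names), len(genotypes)))
        # COLUMN 1 = Label, COLUMN 2 = PopData, COLUMN 3 = LocData
        row = [str(label), str(pop_id), str(loc_id)]
        for allele_pair in genotypes:
            # Missing calls are written as the missing data value (0 here).
            row += [str(missing if a is None else a) for a in allele_pair]
        lines.append("\t".join(row))
    return lines


def write_structure_file(path, lines):
    """Write the rows as plain text with CR/LF line endings."""
    # newline="\r\n" makes Python translate each "\n" into CR/LF on write.
    # ASCII is assumed to be enough for allele sizes and simple sample names.
    with open(path, "w", encoding="ascii", newline="\r\n") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `build_structure_lines(["M1", "M2"], [("S01", 1, 2, [(150, 154), (None, 200)])])` gives a marker row plus one data row in which the missing allele is written as 0.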


Structure

New Project
  • File: New Project
  • Step 1
    • Name the project: 20111106AXRJ
    • Select directory: Research/Structure/StructureDirectory
    • Choose data file: browse to the .txt file you just made (from Excel and Word)
  • Step 2
    • Number of individuals: 93
    • Ploidy of data: 2
    • Number of loci: 10
    • Missing data value: 0
  • Step 3
    • Row of marker names - check
    • Data file stores data for individuals in a single line - check
  • Step 4
    • Individual ID for each individual - check
    • Putative population origin for each individual - check
    • Sampling location information - check
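
Before loading the file, the numbers entered in Steps 2 to 4 can be checked against the .txt file. With 10 diploid loci and the three leading columns (Label, PopData, LocData), each data row should have 3 + 2 × 10 = 23 values, and there should be 93 data rows. A rough sketch of such a check follows. The function name is hypothetical, and it assumes whitespace-delimited values with no spaces inside sample names.

```python
def check_structure_rows(lines, n_individuals, n_loci, ploidy=2,
                         n_leading_cols=3, has_marker_row=True):
    """Return a list of layout problems; an empty list means the file matches."""
    problems = []
    # Split on any whitespace and skip blank lines.
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file has no data"]

    if has_marker_row:
        marker_row, rows = rows[0], rows[1:]
        if len(marker_row) != n_loci:
            problems.append("marker row has %d names, expected %d"
                            % (len(marker_row), n_loci))

    if len(rows) != n_individuals:
        problems.append("found %d individuals, expected %d"
                        % (len(rows), n_individuals))

    # One line per individual: leading columns plus ploidy alleles per locus.
    expected = n_leading_cols + ploidy * n_loci
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            problems.append("data row %d has %d values, expected %d"
                            % (number, len(row), expected))
    return problems
```

For this project the call would be `check_structure_rows(lines, n_individuals=93, n_loci=10)`, where `lines` is the content of the exported .txt file split into lines.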


Run Project
  • Parameter Set: New...
  • Run Length: Length of Burnin Period = 10,000; Number of MCMC Reps after Burnin = 10,000
  • Ancestry Model: Admixture
  • Allele Frequency Model: Correlated (frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K)
  • Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster
  • Parameter Set: Run
  • K=1
  • K=2
  • K=3
  • K=4
  • K=5
this doesn't look helpful ... population and location designations might have to be changed around ... but try Allele Frequency Model: Independent, just to see how it changes
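
For reference, the GUI settings for this run can be written down as Structure parameter-file style `#define` lines, one set per K. The runs in this entry were done through the GUI. The sketch below is only a record-keeping aid, and the parameter names are the ones I recall from the Structure command-line parameter files (mainparams / extraparams), so treat them as assumptions and check them against the Structure documentation before use. The function name is hypothetical.

```python
def structure_param_lines(k, burnin=10000, numreps=10000, freqs_correlated=True):
    """Return (mainparams_lines, extraparams_lines) for one value of K."""
    main = [
        ("MAXPOPS", k),         # K for this run
        ("BURNIN", burnin),     # length of burn-in period
        ("NUMREPS", numreps),   # MCMC reps after burn-in
        ("NUMINDS", 93),        # number of individuals
        ("NUMLOCI", 10),        # number of loci
        ("PLOIDY", 2),          # diploid
        ("MISSING", 0),         # missing data value
        ("ONEROWPERIND", 1),    # one line per individual
        ("LABEL", 1),           # individual ID column present
        ("POPDATA", 1),         # putative population column present
        ("LOCDATA", 1),         # sampling location column present
        ("MARKERNAMES", 1),     # row of marker names present
    ]
    extra = [
        ("NOADMIX", 0),                       # 0 = admixture ancestry model
        ("FREQSCORR", int(freqs_correlated)), # 1 = correlated, 0 = independent
        ("COMPUTEPROB", 0),                   # skip probability of data
    ]

    def fmt(pairs):
        return ["#define %s %s" % (name, value) for name, value in pairs]

    return fmt(main), fmt(extra)


# One parameter set per K in the sweep K = 1 to 5.
sweep = {k: structure_param_lines(k) for k in range(1, 6)}
```

Passing `freqs_correlated=False` gives the same settings with the independent allele frequency model.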


Structure, try #2

  • change Allele Frequency Model to Independent (we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure))
  • K=1
  • K=2
  • K=3
  • K=4
  • K=5
still not helpful ...


Structure, try #3

  • switch COLUMN 2 and COLUMN 3
  • COLUMN 2
    • 1 = Camp Pendleton
    • 2 = Rancho Jamul / Hollenbeck
    • 3 = Point Loma / Cabrillo National Monument
    • 4 = Santa Ysabel Open Space Preserve
    • 5 = Torrey Pines State Natural Reserve
  • COLUMN 3
    • 1 = red cluster
    • 2 = green cluster
    • 3 = blue cluster
  • saved as "20111106_AXRJb.txt"
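
The column switch can also be done on the exported text file instead of in Excel. A minimal sketch, assuming a tab-delimited file with a marker-name row on the first line (the function name is hypothetical):

```python
def swap_pop_and_loc(lines, has_marker_row=True):
    """Swap COLUMN 2 (PopData) and COLUMN 3 (LocData) in every data row."""
    swapped = []
    for index, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if has_marker_row and index == 0:
            # The marker-name row has no PopData/LocData values to swap.
            swapped.append(line)
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            fields[1], fields[2] = fields[2], fields[1]
        swapped.append("\t".join(fields))
    return swapped
```

Only the second and third values in each data row change; sample names and allele calls stay where they are.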


  • K=1
  • K=2
  • K=3
  • K=4
  • K=5


  • this definitely looks like I've designated the populations and locations correctly
    • COLUMN 2 (PopData) = county area
    • COLUMN 3 (LocData) = array cluster
  • what do these patterns mean?
    • Pop ID 3 = Point Loma / Cabrillo ... breaks out a little bit if K=4 or K=5