User:Tara K. Luckau/Notebook/Team ConGen/2011/11/06



SCOC MP1 - Prelim Data

Fragment Analysis Scoring

  • first pass = one marker at a time (able to zoom in)
  • second pass = one sample at a time (able to look at sample-wide patterns) - not done yet


  • 20111106 SCOCMP1 FragPhero.png


Structure

  • Structure 2.3.3 for MacOS


Formatting Data

MS Excel

  • ROW 1 = marker names (structure term = "Marker Name")
    • since diploid, leave an extra column for the second allele between markers
  • COLUMN 1 = sample names (structure term = "Label")
  • COLUMN 2 = population designation (structure term = "PopData", in integer form)
    • 1 = red cluster
    • 2 = green cluster
    • 3 = blue cluster
    • 20111106 RoadSchematic.png
  • COLUMN 3 = location designation (structure term = "LocData", in integer form)
    • 1 = Camp Pendleton
    • 2 = Rancho Jamul / Hollenbeck
    • 3 = Point Loma / Cabrillo National Monument
    • 4 = Santa Ysabel Open Space Preserve
    • 5 = Torrey Pines State Natural Reserve
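The layout above (marker names in row 1, then Label, PopData, LocData and two allele columns per marker) can also be produced directly as text. Below is a minimal sketch, assuming genotypes are already held as allele pairs; the function name, marker names, and sample label are hypothetical and not from this data set.

```python
def format_structure_rows(marker_names, samples):
    """Return the lines of a one-row-per-individual Structure input file.

    samples: list of (label, pop, loc, genotypes), where genotypes holds one
    (allele1, allele2) tuple per marker and 0 marks a missing allele.
    """
    lines = ["\t".join(marker_names)]  # row 1 = marker names
    for label, pop, loc, genotypes in samples:
        if len(genotypes) != len(marker_names):
            raise ValueError(f"{label}: expected {len(marker_names)} loci")
        # Diploid data: flatten each pair into two adjacent columns.
        alleles = [str(a) for pair in genotypes for a in pair]
        lines.append("\t".join([label, str(pop), str(loc)] + alleles))
    return lines


# Hypothetical example: two markers, one individual with the second locus missing.
example = format_structure_rows(
    ["locA", "locB"],
    [("ind01", 1, 2, [(152, 156), (0, 0)])],
)
```

Joining the returned lines with "\r\n" would give the same CR/LF line endings as the Word export described below.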


20111106 StructureExcel.png


MS Word

  • copy data from Excel, Paste Special into Word
    • as Unformatted Text
  • Save As ...
    • Plain Text (.txt)
    • File Conversion: Latin-US (DOS), CR/LF


Structure

New Project
  • File: New Project
  • Step 1
    • Name the project: 20111106AXRJ
    • Select directory: Research/Structure/StructureDirectory
    • Choose data file: browse to the .txt file you just made (from Excel and Word)
  • Step 2
    • Number of individuals: 93
    • Ploidy of data: 2
    • Number of loci: 10
    • Missing data value: 0
  • Step 3
    • Row of marker names - check
    • Data file stores data for individuals in a single line - check
  • Step 4
    • Individual ID for each individual - check
    • Putative population origin for each individual - check
    • Sampling location information - check
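Before loading the file, it can help to confirm that the text file matches the numbers entered in Steps 2 to 4. The following is a minimal sketch, assuming whitespace-separated columns and sample labels without spaces; the function name and the tiny example file are hypothetical.

```python
def check_structure_file(lines, n_inds=93, n_loci=10, ploidy=2, n_extra=3):
    """Return a list of layout problems; an empty list means the file matches.

    n_extra counts the columns before the genotypes (Label, PopData, LocData).
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    header, data = rows[0], rows[1:]
    if len(header) != n_loci:
        problems.append(f"header has {len(header)} marker names, expected {n_loci}")
    if len(data) != n_inds:
        problems.append(f"{len(data)} individuals, expected {n_inds}")
    expected = n_extra + ploidy * n_loci
    for row in data:
        if len(row) != expected:
            problems.append(f"{row[0]}: {len(row)} columns, expected {expected}")
    return problems


# Hypothetical two-locus file with one complete and one truncated individual.
good = ["locA locB", "ind01 1 2 152 156 0 0"]
bad = ["locA locB", "ind01 1 2 152 156 0"]
```

With the defaults (93 individuals, 10 loci, ploidy 2), every data row should have 3 + 2 × 10 = 23 columns.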


Run Project
  • Parameter Set: New...
  • Run Length: Burnin Period = 10,000 and MCMC Reps after Burnin = 10,000
  • Ancestry Model: Admixture
  • Allele Frequency Model: Correlated (frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K)
  • Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster
  • Parameter Set: Run
  • K=1
  • K=2
  • K=3
  • K=4
  • K=5
20111106 StructureBarPlot a1.jpg
20111106 StructureBarPlot a2.jpg
20111106 StructureBarPlot a3.jpg
20111106 StructureBarPlot a4.jpg
20111106 StructureBarPlot a5.jpg
this doesn't look helpful ... population and location designations might have to be changed around ... but try Allele Frequency Model: Independent, just to see how it changes
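For reference, the settings used in these runs can be written down as text in the style of Structure's command-line parameter files. This is only a sketch: the runs in this entry were done through the graphical front end, and the parameter names below are given as I understand them from the command-line mainparams and extraparams files, so they should be checked against the documentation for the installed version. The function name is hypothetical.

```python
def render_params(k, burnin=10000, numreps=10000):
    """Return (mainparams_text, extraparams_text) for one value of K."""
    main = {
        "MAXPOPS": k,          # K for this run
        "BURNIN": burnin,      # length of burn-in period
        "NUMREPS": numreps,    # MCMC reps after burn-in
        "NUMINDS": 93,
        "NUMLOCI": 10,
        "PLOIDY": 2,
        "MISSING": 0,
        "ONEROWPERIND": 1,     # each individual on a single line
        "LABEL": 1,
        "POPDATA": 1,
        "LOCDATA": 1,
        "MARKERNAMES": 1,
    }
    extra = {
        "NOADMIX": 0,          # 0 = admixture ancestry model
        "FREQSCORR": 1,        # 1 = correlated allele frequencies
        "COMPUTEPROB": 0,      # skip the probability-of-data estimate
    }

    def render(settings):
        return "\n".join(f"#define {name} {value}" for name, value in settings.items())

    return render(main), render(extra)


# One parameter set per K, matching the K=1..5 runs listed above.
param_sets = {k: render_params(k) for k in range(1, 6)}
```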


Structure, try #2

  • change Allele Frequency Model: Independent (we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure))
  • K=1
  • K=2
  • K=3
  • K=4
  • K=5
20111106 StructureBarPlot b1.jpg
20111106 StructureBarPlot b2.jpg
20111106 StructureBarPlot b3.jpg
20111106 StructureBarPlot b4.jpg
20111106 StructureBarPlot b5.jpg
still not helpful ...


Structure, try #3

  • switch COLUMN 2 and COLUMN 3
  • COLUMN 2
    • 1 = Camp Pendleton
    • 2 = Rancho Jamul / Hollenbeck
    • 3 = Point Loma / Cabrillo National Monument
    • 4 = Santa Ysabel Open Space Preserve
    • 5 = Torrey Pines State Natural Reserve
  • COLUMN 3
    • 1 = red cluster
    • 2 = green cluster
    • 3 = blue cluster
  • saved as "20111106_AXRJb.txt"
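The column switch can also be done on the saved text file rather than in Excel. This is a minimal sketch, assuming the first line is the marker-name row and the remaining lines are whitespace-separated with Label, PopData, LocData as the first three columns; the function name and example row are hypothetical.

```python
def swap_pop_and_loc(lines):
    """Swap the second and third columns on every data row; keep the header as is."""
    if not lines:
        return []
    out = [lines[0]]  # marker-name row has no Pop/Loc columns
    for line in lines[1:]:
        fields = line.split()
        if len(fields) >= 3:
            fields[1], fields[2] = fields[2], fields[1]
        out.append("\t".join(fields))
    return out


# Hypothetical row: population 1, location 2 becomes population 2, location 1.
swapped = swap_pop_and_loc(["locA locB", "ind01 1 2 152 156 0 0"])
```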


  • K=1
  • K=2
  • K=3
  • K=4
  • K=5
20111106 StructureBarPlot c1.jpg
20111106 StructureBarPlot c2.jpg
20111106 StructureBarPlot c3.jpg
20111106 StructureBarPlot c4.jpg
20111106 StructureBarPlot c5.jpg


  • this definitely looks like I've designated the populations and locations correctly
    • COLUMN 2 (PopData) = county area
    • COLUMN 3 (LocData) = array cluster
  • what do these patterns mean?
    • Pop ID 3 = Point Loma / Cabrillo ... breaks out a little bit if K=4 or K=5