SCOC MP1 - Prelim Data
Fragment Analysis Scoring
- first pass = one marker at a time (able to zoom in)
- second pass = one sample at a time (able to look at sample-wide patterns); not done yet
Structure
- Structure 2.3.3 for MacOS
Formatting Data
MS Excel
- ROW 1 = marker names (structure term = "Marker Name")
- since diploid, leave an extra column for the second allele between markers
- COLUMN 1 = sample names (structure term = "Label")
- COLUMN 2 = population designation (structure term = "PopData", in integer form)
- 1 = red cluster
- 2 = green cluster
- 3 = blue cluster
- COLUMN 3 = location designation (structure term = "LocData", in integer form)
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
MS Word
- copy data from Excel, Paste Special into Word
- Save As ...
- Plain Text (.txt)
- File Conversion: Latin-US (DOS), CR/LF
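The Excel and Word steps above could probably be replaced by a short script. This is only a minimal sketch, assuming the genotypes are already in Python as (allele 1, allele 2) pairs; the function names `format_structure_rows` and `write_structure_file` are made up for illustration and are not part of Structure. It writes one tab-delimited row per individual with Label, PopData, LocData, then two allele columns per marker.

```python
def format_structure_rows(marker_names, samples, missing=0):
    """Build the lines of a Structure input file, one row per individual.

    samples: list of (label, popdata, locdata, genotypes), where genotypes
    is a list of (allele1, allele2) tuples, one per marker. None marks a
    missing allele and is written as the `missing` code.
    """
    lines = ["\t".join(marker_names)]  # ROW 1 = marker names
    for label, pop, loc, genotypes in samples:
        if len(genotypes) != len(marker_names):
            raise ValueError(f"{label}: expected {len(marker_names)} loci")
        fields = [label, str(pop), str(loc)]  # COLUMNS 1-3
        for allele1, allele2 in genotypes:
            # diploid: two adjacent columns per marker
            fields.append(str(missing if allele1 is None else allele1))
            fields.append(str(missing if allele2 is None else allele2))
        lines.append("\t".join(fields))
    return lines


def write_structure_file(path, lines):
    # newline="\r\n" reproduces the CR/LF line endings chosen in Word
    with open(path, "w", encoding="ascii", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Structure reads whitespace-delimited fields, so sample names must not contain spaces for this layout to work.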
Structure
New Project
- File: New Project
- Step 1
- Name the project: 20111106AXRJ
- Select directory: Research/Structure/StructureDirectory
- Choose data file: browse to the .txt file you just made (from Excel and Word)
- Step 2
- Number of individuals: 93
- Ploidy of data: 2
- Number of loci: 10
- Missing data value: 0
- Step 3
- Row of marker names - check
- Data file stores data for individuals in a single line - check
- Step 4
- Individual ID for each individual - check
- Putative population origin for each individual - check
- Sampling location information - check
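Before entering these numbers in the wizard, it may help to confirm that the text file actually matches them. The sketch below is a hypothetical helper (`check_structure_lines` is not a Structure function) and assumes the layout described above: a marker-name row, then one row per individual with three leading columns (Label, PopData, LocData) followed by ploidy × loci allele columns.

```python
def check_structure_lines(lines, n_individuals, n_loci, ploidy=2, n_extra_cols=3):
    """Return a list of layout problems; an empty list means the file matches."""
    # Split on any whitespace, as Structure does; blank lines are ignored.
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]

    problems = []
    header, data = rows[0], rows[1:]
    if len(header) != n_loci:
        problems.append(f"marker row has {len(header)} names, expected {n_loci}")
    if len(data) != n_individuals:
        problems.append(f"found {len(data)} individuals, expected {n_individuals}")

    expected = n_extra_cols + n_loci * ploidy
    for fields in data:
        if len(fields) != expected:
            problems.append(
                f"{fields[0]}: {len(fields)} columns, expected {expected}"
            )
    return problems
```

For the project above the call would be something like `check_structure_lines(open(path).read().splitlines(), 93, 10)`, where `path` is the .txt file chosen in Step 1.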
Run Project
- Run Length: Length of Burnin Period = 10,000; Number of MCMC Reps after Burnin = 10,000
- Ancestry Model: Admixture
- Allele Frequency Model: Correlated (frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K)
- Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster
- result doesn't look helpful; the population and location designations might have to be swapped, but first try Allele Frequency Model: Independent, just to see how it changes
Structure, try #2
- change Allele Frequency Model to Independent (we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure))
- result is still not helpful
Structure, try #3
- switch COLUMN 2 and COLUMN 3
- COLUMN 2
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
- COLUMN 3
- 1 = red cluster
- 2 = green cluster
- 3 = blue cluster
- saved as "20111106_AXRJb.txt"
- with this layout, it definitely looks like the populations and locations are designated correctly
- COLUMN 2 (PopData) = county area
- COLUMN 3 (LocData) = array cluster
- what do these patterns mean?
- Pop ID 3 (Point Loma / Cabrillo) breaks out a little at K=4 or K=5
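The column swap in try #3 was done by hand in Excel. Done in code, it might look like the sketch below. It assumes a tab-delimited file whose first row holds the marker names; `swap_pop_and_loc` is a made-up helper name, not part of Structure.

```python
def swap_pop_and_loc(lines):
    """Swap COLUMN 2 and COLUMN 3 in every data row.

    The first line (marker names) is copied unchanged.
    """
    if not lines:
        return []
    swapped = [lines[0]]
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) >= 3:
            fields[1], fields[2] = fields[2], fields[1]
        swapped.append("\t".join(fields))
    return swapped
```

The allele columns are untouched, so only the meaning of PopData and LocData changes between the two files.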