User:Tara K. Luckau/Notebook/Team ConGen/2011/11/06

Tara's Lab Notebook


SCOC MP1 - Prelim Data

Fragment Analysis Scoring

first pass = one marker at a time (able to zoom in)
second pass = one sample at a time (able to look at sample-wide patterns) - haven't done this one, yet!!!

Structure

Structure 2.3.3 for MacOS

Formatting Data

MS Excel

ROW 1 = marker names (structure term = "Marker Name")
- since diploid, leave an extra column for the second allele between markers
COLUMN 1 = sample names (structure term = "Label")
COLUMN 2 = population designation (structure term = "PopData", in integer form)
- 1 = red cluster
- 2 = green cluster
- 3 = blue cluster
COLUMN 3 = location designation (structure term = "LocData", in integer form)
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
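The same layout could also be built outside Excel. A minimal sketch, assuming genotypes are held as (allele 1, allele 2) pairs keyed by marker name. The function name and the input structure are made up for illustration and are not part of Structure. As far as I understand, Structure reads whitespace-separated values, so the header here lists each marker name once and leaves out the blank spacer cells.

```python
def format_structure_rows(marker_names, samples, missing=0):
    """Build tab-delimited lines: a marker-name header, then one line per individual.

    samples is a list of (label, pop, loc, genotypes) tuples, where genotypes
    maps a marker name to an (allele1, allele2) pair. Markers with no entry
    are filled with the missing-data value.
    """
    lines = ["\t".join(marker_names)]
    for label, pop, loc, genotypes in samples:
        fields = [label, str(pop), str(loc)]
        for name in marker_names:
            allele1, allele2 = genotypes.get(name, (missing, missing))
            fields += [str(allele1), str(allele2)]
        lines.append("\t".join(fields))
    return lines
```

Each data line comes out as Label, PopData, LocData, then two allele columns per marker, which is the "one line per individual" layout described above.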

MS Word

copy data from Excel, Paste Special into Word
- as Unformatted Text
Save As ...
- Plain Text (.txt)
- File Conversion: Latin-US (DOS), CR/LF
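The Word step is only there to get a plain-text file with DOS line endings. A rough equivalent in Python, assuming "Latin-US (DOS)" corresponds to code page 437 (for plain ASCII sample names and allele sizes the choice of encoding makes no difference). The function name is hypothetical.

```python
def write_dos_text(path, lines):
    """Write lines as plain text with CR/LF line endings."""
    # newline="\r\n" makes Python translate each "\n" written into CR/LF.
    with open(path, "w", encoding="cp437", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")
```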

Structure

New Project

File: New Project
Step 1
- Name the project: 20111106AXRJ
- Select directory: Research/Structure/StructureDirectory
- Choose data file: browse to the .txt file you just made (from Excel and Word)
Step 2
- Number of individuals: 93
- Ploidy of data: 2
- Number of loci: 10
- Missing data value: 0
Step 3
- Row of marker names - check
- Data file stores data for individuals in a single line - check
Step 4
- Individual ID for each individual - check
- Putative population origin for each individual - check
- Sampling location information - check
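Before loading the file, the numbers entered in Steps 2 to 4 can be checked against it. A sketch, assuming whitespace-separated fields and sample names with no spaces in them. The function name is made up. For this data set each individual's line should have 3 + 10 x 2 = 23 columns.

```python
def check_structure_lines(lines, n_individuals, n_loci, ploidy=2, n_id_cols=3):
    """Return a list of problems found; an empty list means the layout matches."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    header, data = rows[0], rows[1:]
    if len(header) != n_loci:
        problems.append(f"expected {n_loci} marker names, found {len(header)}")
    if len(data) != n_individuals:
        problems.append(f"expected {n_individuals} individuals, found {len(data)}")
    # Label + PopData + LocData, then one column per allele copy per locus.
    expected_width = n_id_cols + n_loci * ploidy
    for row in data:
        if len(row) != expected_width:
            problems.append(
                f"{row[0]}: expected {expected_width} columns, found {len(row)}"
            )
    return problems
```

For the file described here the call would be check_structure_lines(lines, 93, 10).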

Run Project

Parameter Set: New...

Run Length: burn-in period = 10,000; MCMC reps after burn-in = 10,000
Ancestry Model: Admixture
Allele Frequency Model: Correlated (frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K)
Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster
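For reference, the same settings as they might appear in the parameter files of the command-line version of Structure. This is only a sketch: the runs here were set up in the GUI, and the #define names below are my recollection of the mainparams and extraparams keywords, so check them against the Structure documentation before relying on them. The function name is hypothetical.

```python
def render_params(k, burnin=10000, numreps=10000,
                  admixture=True, correlated=True, compute_prob=False):
    """Return (mainparams_lines, extraparams_lines) for one value of K."""
    main = [
        f"#define MAXPOPS {k}",        # K, the number of clusters assumed
        f"#define BURNIN {burnin}",    # length of burn-in period
        f"#define NUMREPS {numreps}",  # MCMC reps after burn-in
    ]
    extra = [
        f"#define NOADMIX {0 if admixture else 1}",      # 0 = admixture model
        f"#define FREQSCORR {1 if correlated else 0}",   # 1 = correlated allele frequencies
        f"#define COMPUTEPROB {1 if compute_prob else 0}",
    ]
    return main, extra
```

Calling render_params(k) for k in range(1, 6) gives one set of lines per K from 1 to 5. Passing correlated=False corresponds to the Independent allele frequency model.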

Parameter Set: Run

K=1
K=2
K=3
K=4
K=5

this doesn't look helpful ... population and location designations might have to be changed around ... but try Allele Frequency Model: Independent first, just to see how it changes

Structure, try #2

change Allele Frequency Model: Independent (we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure))

K=1
K=2
K=3
K=4
K=5

still not helpful ...

Structure, try #3

switch COLUMN 2 and COLUMN 3
COLUMN 2
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
COLUMN 3
- 1 = red cluster
- 2 = green cluster
- 3 = blue cluster
saved as "20111106_AXRJb.txt"
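The column switch can also be done on the text file directly instead of going back through Excel and Word. A minimal sketch, assuming tab-delimited lines with the marker names in the first line. The function name is made up.

```python
def swap_pop_and_loc(lines):
    """Swap COLUMN 2 and COLUMN 3 on every data line; the header is left alone."""
    swapped = [lines[0]]
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) >= 3:
            fields[1], fields[2] = fields[2], fields[1]
        swapped.append("\t".join(fields))
    return swapped
```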

K=1
K=2
K=3
K=4
K=5

this definitely looks like I've designated the populations and locations correctly

COLUMN 2 (PopData) = county area

COLUMN 3 (LocData) = array cluster

what do these patterns mean?

Pop ID 3 = Point Loma / Cabrillo ... breaks out a little bit if K=4 or K=5
