User:Tara K. Luckau/Notebook/Team ConGen/2011/08/18

Tara's Lab Notebook

Meeting with Andy Bohonak

Politics of Academia

clarify - authorship (including if I don't publish by a certain time), data ownership, sample ownership, lab notebooks
make paper trail ... happy email
SDSU form gets filled out with authorship, etc ... might be a good excuse to bring it up

verify Mendelian inheritance - high priority

pregnant females, have babies (watch Rosemary's baby)
- maybe keep offspring alive
- screen mom and babies
find out likelihood that a clutch all has same father
will let us see whether a marker has weird null allele things
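
A rough sketch of the mom-and-babies screen for one locus, so I remember the logic. Assumptions: genotypes are stored as pairs of allele sizes, and the function names and example alleles are made up for illustration. A baby that shares no allele with mom is a mismatch; if both look homozygous for different alleles, a null allele is one possible explanation (not the only one).

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes at one diploid locus


def shares_allele(mother: Genotype, offspring: Genotype) -> bool:
    """True if the offspring carries at least one of the mother's alleles."""
    return bool(set(mother) & set(offspring))


def possible_null_allele(mother: Genotype, offspring: Genotype) -> bool:
    """True if both appear homozygous, but for different alleles."""
    mother_hom = mother[0] == mother[1]
    offspring_hom = offspring[0] == offspring[1]
    return mother_hom and offspring_hom and mother[0] != offspring[0]


def screen_clutch(mother: Genotype, clutch: List[Genotype]) -> List[Tuple[int, str]]:
    """Label each offspring in the clutch at a single locus."""
    results = []
    for index, baby in enumerate(clutch):
        if shares_allele(mother, baby):
            verdict = "consistent"
        elif possible_null_allele(mother, baby):
            verdict = "possible null allele"
        else:
            verdict = "mismatch"
        results.append((index, verdict))
    return results


# Hypothetical genotypes, for illustration only
mother = (152, 152)
clutch = [(152, 160), (160, 160), (148, 156)]
report = screen_clutch(mother, clutch)
for index, verdict in report:
    print(index, verdict)
```

This only checks the maternal side; it says nothing about whether the whole clutch shares one father.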

Duplicate data

To find error rate of data (in extraction, PCR, analysis, etc)
In text, able to say "total sample size was 300 individuals per species, of which 30 were repeated 3 times, 50 repeated twice; 2 errors found, traceable to such-and-such a process, giving error rate of 1.5%"
for this project, not necessary to repeat all the way back to extraction
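
One way the error rate could be tallied from repeated genotypes, as a sketch. Assumptions: each repeat is compared against the first call for that sample, the rate is errors divided by the number of repeat comparisons, and the sample names and alleles are invented. A different denominator (per allele, per locus, per reaction) would give a different number, so whichever one is used should be stated in the text.

```python
def genotyping_error_rate(replicates):
    """Count mismatches between repeated genotype calls.

    replicates maps a sample name to a list of genotype calls (pairs of
    alleles); the first call is treated as the reference.
    """
    comparisons = 0
    errors = 0
    for calls in replicates.values():
        reference = sorted(calls[0])
        for repeat in calls[1:]:
            comparisons += 1
            if sorted(repeat) != reference:  # allele order does not matter
                errors += 1
    rate = errors / comparisons if comparisons else 0.0
    return errors, comparisons, rate


# Hypothetical data: two samples run three times, two run twice
replicates = {
    "S01": [(152, 160), (152, 160), (152, 160)],
    "S02": [(148, 148), (148, 156)],
    "S03": [(160, 164), (164, 160)],
    "S04": [(152, 156), (152, 156), (152, 152)],
}
errors, comparisons, rate = genotyping_error_rate(replicates)
print(f"{errors} errors in {comparisons} repeat comparisons ({rate:.1%})")
```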

Spatial Scale

gotta get:
- more consistent distances across roads
- more points across same road

Literature search

Lots of pop gen papers at spatial scale

Software

STRUCTURE - easy and quick, do it with prelim data!
CREATE - formats microsat data for STRUCTURE, etc.

Playing with Structure software

Structure 2.3.3 for MacOS
spent 4 or 5 hours playing around with Structure, just trying to get it to work
Here's what I learned today, so I don't forget!

Formatting Data

MS Excel

first column is sample names (structure term = "Label")
second column is population designation (structure term = "PopData", in integer form)
- 1 = green cluster
- 2 = blue cluster
- 3 = red cluster
third column is location designation (structure term = "LocData", in integer form)
- 1 = Camp Pendleton
- 2 = Rancho Jamul / Hollenbeck
- 3 = Point Loma / Cabrillo National Monument
- 4 = Santa Ysabel Open Space Preserve
- 5 = Torrey Pines State Natural Reserve
first row is marker names (structure term = "Marker Name")
- since diploid, leave an extra column for the second allele between markers
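
The same layout could be built as tab-delimited text instead of by hand in Excel. This is only a sketch: the marker names, sample labels, and genotypes are invented, and using -9 for missing data is an assumption (it has to match whatever missing-data value the Structure project is told to expect).

```python
def structure_rows(marker_names, individuals):
    """Return tab-delimited lines laid out like the spreadsheet described above."""
    header_cells = ["", "", ""]  # blank cells above Label, PopData, LocData
    for name in marker_names:
        header_cells += [name, ""]  # blank cell sits above the second allele
    lines = ["\t".join(header_cells)]
    for label, pop, loc, genotypes in individuals:
        cells = [label, str(pop), str(loc)]
        for allele_1, allele_2 in genotypes:
            cells += [str(allele_1), str(allele_2)]
        lines.append("\t".join(cells))
    return lines


# Hypothetical markers and samples, for illustration only
markers = ["LocusA", "LocusB"]
individuals = [
    ("TKL001", 1, 1, [(152, 160), (201, 201)]),
    ("TKL002", 2, 3, [(148, 152), (-9, -9)]),  # -9 = assumed missing-data code
]
lines = structure_rows(markers, individuals)
print("\n".join(lines))
```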

MS Word

copy data from Excel, Paste Special into Word
- as Unformatted Text
Save As ...
- Plain Text (.txt)

- File Conversion: Latin-US (DOS), CR/LF
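
The Word step is really just producing a plain text file with DOS line endings, so a script could probably do the same thing. A minimal sketch, assuming Python's cp437 codec is an acceptable stand-in for Word's "Latin-US (DOS)" option; the file name and the example lines are made up.

```python
import os
import tempfile


def write_plain_text(lines, path):
    """Write lines as plain text with CR/LF line endings."""
    # newline="\r\n" makes Python write every "\n" as CR/LF
    with open(path, "w", encoding="cp437", newline="\r\n") as handle:
        for line in lines:
            handle.write(line + "\n")


# Hypothetical tab-delimited content, for illustration only
example_lines = [
    "\t\t\tLocusA\t",
    "TKL001\t1\t1\t152\t160",
    "TKL002\t2\t3\t148\t152",
]
path = os.path.join(tempfile.mkdtemp(), "structure_input.txt")
write_plain_text(example_lines, path)

with open(path, "rb") as handle:
    raw = handle.read()
print(raw)
```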

Structure

New Project

File: New Project
Step 1
- Name the project: whatever you want
- Select directory: make a folder for all Structure projects to be saved in
- Choose data file: browse to the .txt file you just made (from Excel and Word)
Step 2, 3, 4

--

Run Project

Parameter Set: New...

Run Length: 10,000 for the burn-in period and 10,000 MCMC reps after burn-in is a good place to start
Ancestry Model: Admixture is appropriate for the ConGen project
Allele Frequency Model: run both Correlated and Independent

correlated: frequencies in the different populations are likely to be similar, due to migration or shared ancestry; improves clustering for closely related populations (subtle structure), but may increase the risk of overestimating K

independent: we expect allele frequencies in different populations to be reasonably different from each other; works well for many data sets (strong structure)

Advanced: uncheck "Compute probability of data (for estimating K)" to make the program run faster

Parameter Set: Run

do several iterations where K=?
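
To keep track of which runs to set up, a run list could be generated ahead of time. This is just bookkeeping, not a way of driving Structure itself. Assumptions: K from 1 to 5, three repeats per K, and both allele frequency models; the run names and dictionary keys are invented.

```python
from itertools import product


def run_plan(k_values, replicates, models=("correlated", "independent")):
    """List every combination of frequency model, K, and replicate number."""
    plan = []
    for model, k, rep in product(models, k_values, range(1, replicates + 1)):
        plan.append({
            "name": f"{model}_K{k}_rep{rep}",
            "model": model,
            "K": k,
            "replicate": rep,
            "burnin": 10000,     # burn-in length from the notes above
            "mcmc_reps": 10000,  # MCMC reps after burn-in
        })
    return plan


plan = run_plan(k_values=range(1, 6), replicates=3)  # assumed K range and repeats
for run in plan[:4]:
    print(run["name"])
print(len(plan), "runs in total")
```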

that pretty graphic that's in all the papers:

in the left-hand pane, select the run you want (K=?)
in the right-hand pane, click Bar plot: Show, then group by PopID

TA-DA!
