User:Timothee Flutre/Notebook/Postdoc/2012/01/22: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
(→‎Entry title: add gnuplot cmd-line)
Line 25: Line 25:
  ind3 1.8 2.0
  ind3 1.8 2.0


Here is the file with the commands:
Here is the file with the commands ("+0.1" is an offset which needs to be tuned with respect to the data):
  $ cat commands.gnuplot
  $ cat commands.gnuplot
  set terminal postscript color
  set terminal postscript color

Revision as of 11:40, 22 January 2012

Project name <html><img src="/images/9/94/Report.png" border="0" /></html> Main project page
<html><img src="/images/c/c3/Resultset_previous.png" border="0" /></html>Previous entry<html>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</html>Next entry<html><img src="/images/5/5c/Resultset_next.png" border="0" /></html>

Entry title

  • Use EIGENSOFT to apply PCA on genotypes in order to check how individuals sampled in the study are related with each other.

Assuming that the genotypes are in the IMPUTE format, we need to convert them into the EIGENSTRAT format thanks to this script:

impute2eigenstrat.py -i genotypes_allchrs.impute.gz -o mystudy -l MyStudy

Now we can run the PCA:

smartpca.perl -i mystudy_eigenstrat_genotypes.txt -a mystudy_eigenstrat_snp.txt -b mystudy_eigenstrat_indiv.txt \
-o mystudy_eigenstrat_pca.txt -p mystudy_eigenstrat_plot -e mystudy_eigenstrat_eigenvalues.txt -l mystudy_eigenstrat_log.txt -s 6.0
  • If GNUPLOT is installed, we now have a plot of the first two principal components (in a pdf document). It may be useful to add the labels of the individuals on this plot. For this, one can modify the file "mystudy_eigenstrat_plot.xtxt" containing the commands for GNUPLOT.

As an example, let's assume that the output data file from EIGENSOFT containing the PCs is like this (3 individuals, 2 PCs):

$ cat data.txt
ind1	2.0	2.0
ind2	2.5	1.8
ind3	1.8	2.0

Here is the file with the commands ("+0.1" is an offset which needs to be tuned with respect to the data):

$ cat commands.gnuplot
set terminal postscript color
set title  "" 
set xlabel  "eigenvector 1" 
set ylabel  "eigenvector 2" 
#plot  "mystudy_eigenstrat_pca.txt.evec" using 2:3 title "MyStudy"
plot  "data.txt" using 2:3 title "Data", "" using 2:($3+0.1):1 with labels notitle
## pause 9999

We can obtain the new plot like this:

$ gnuplot < commands.gnuplot > newplot.ps; ps2pdf newplot.ps newplot.pdf