User:Lindenb/Notebook/UMR915/20110714
From OpenWetWare

(allonzenfan)
playing with dbNSFP
curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | funzip -t | head -n1 |tr " " "\n" | cat -n
1 #chr
2 pos(1-based)
3 ref
4 alt
5 aaref
6 aaalt
7 hg19pos(1-based)
8 genename
9 geneid
10 CCDSid
11 refcodon
12 codonpos
13 fold-degenerate
14 aapos
15 cds_strand
16 LRT_Omega
17 PhyloP_score
18 PlyloP_pred
19 SIFT_score
20 SIFT_pred
21 Polyphen2_score
22 Polyphen2_pred
23 LRT_score
24 LRT_pred
25 MutationTaster_score
26 MutationTaster_pred
27 Ancestral_allele
28 UniSNP_ids
29 Allele_freq
30 Alt_gene_name
31 dbXrefs
32 Descriptive_gene_name
33 1000_genomes_high_coverage
34 1000_genomes_low_coverage
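Rather than reading the numbers off this listing, a column number can be looked up by name. The following is a minimal sketch, not part of the original notebook: `col_index` is a hypothetical helper, and it assumes the header is the first line of the input and uses the same single-space delimiter as the commands on this page (pass another delimiter as the second argument if the file differs).

```shell
# Hypothetical helper: print the 1-based index of a named column.
# Reads the header (first line) from stdin.
# $1 = exact column name, $2 = delimiter (assumed default: one space).
col_index() {
    name="$1"
    delim="${2:- }"
    # One field per line, then report the line number of the exact match.
    head -n 1 | tr "$delim" '\n' | grep -n -x -F -- "$name" | cut -d: -f1
}

# Example on a shortened header; prints 5
printf '#chr pos(1-based) ref alt aaref aaalt\n' | col_index aaref
```

On the real file the header would come from the same `curl ... | funzip` pipeline as above, with `col_index SIFT_score` in place of `head -n1 | tr ... | cat -n`.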
getting the columns
AA1, AA2 (aaref, aaalt) plus the SIFT and PolyPhen-2 (pph2) scores and predictions.
curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | zcat | cut -d ' ' -f 5,6,19,20,21,22 | head
aaref aaalt SIFT_score SIFT_pred Polyphen2_score Polyphen2_pred
M L 1.0 D 0.997 D
M V 0.945248 NA 0.999 D
M L 1.0 D 0.997 D
M K 1.0 D 0.999 D
M T 1.0 D 0.999 D
M R 0.942261 NA 0.999 D
M I 1.0 D 0.999 D
M I 1.0 D 0.999 D
M I 1.0 D 0.999 D
Plotting code (predictions.cpp): https://gist.github.com/1082406
Compile and run
g++ -I /usr/include/cairo predictions.cpp -lcairo
curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | zcat |\
cut -d ' ' -f 5,6,19,20,21,22 | egrep '^[A-Z] [A-Z]'| ./a.out
Result
SIFT scores (x-axis) vs PolyPhen-2 (PPH2) scores (y-axis) for each amino acid substitution in chr1 of dbNSFP ( http://sites.google.com/site/jpopgen/dbNSFP ).
Red crosses = SIFT and PolyPhen-2 are *BOTH* damaging.
The gray level reflects the BLOSUM62 matrix.
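The "both damaging" subset can also be listed as text, without plotting. This is a rough sketch under two assumptions: that a `D` in SIFT_pred and in Polyphen2_pred is what marks a damaging call (as the sample rows suggest), and that the input is the six-column extract used above. `both_damaging` is a hypothetical helper, and the gist may apply a different rule (for example score thresholds).

```shell
# Hypothetical helper: keep substitutions that both predictors label "D".
# Assumed input columns:
#   1 aaref  2 aaalt  3 SIFT_score  4 SIFT_pred  5 Polyphen2_score  6 Polyphen2_pred
# The header line is dropped implicitly, since its 4th field is not "D".
both_damaging() {
    awk '$4 == "D" && $6 == "D" { print $1, $2, $3, $5 }'
}

# Example with two of the rows shown earlier; only the first one passes.
printf 'M L 1.0 D 0.997 D\nM V 0.945248 NA 0.999 D\n' | both_damaging
```

Piping the `cut` output into `both_damaging | wc -l` would give a count of such substitutions.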