User:Lindenb/Notebook/UMR915/20110714

From OpenWetWare



(allonzenfan)

playing with dbNSFP

curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | funzip -t | head -n1 | tr "\t" "\n" | cat -n

     1	#chr
     2	pos(1-based)
     3	ref
     4	alt
     5	aaref
     6	aaalt
     7	hg19pos(1-based)
     8	genename
     9	geneid
    10	CCDSid
    11	refcodon
    12	codonpos
    13	fold-degenerate
    14	aapos
    15	cds_strand
    16	LRT_Omega
    17	PhyloP_score
    18	PlyloP_pred
    19	SIFT_score
    20	SIFT_pred
    21	Polyphen2_score
    22	Polyphen2_pred
    23	LRT_score
    24	LRT_pred
    25	MutationTaster_score
    26	MutationTaster_pred
    27	Ancestral_allele
    28	UniSNP_ids
    29	Allele_freq
    30	Alt_gene_name
    31	dbXrefs
    32	Descriptive_gene_name
    33	1000_genomes_high_coverage
    34	1000_genomes_low_coverage
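
Counting columns by hand gets fragile if the header changes between releases. A minimal sketch for looking up a column number by name, assuming a tab-separated header line on standard input; `col_index` is a hypothetical helper name, not something shipped with dbNSFP:

```shell
# col_index NAME: print the 1-based position of column NAME in the
# tab-separated header read from standard input (prints nothing if absent).
col_index() {
    # one field per line, then an exact, fixed-string match with line number
    head -n 1 | tr '\t' '\n' | grep -n -x -F -- "$1" | cut -d: -f1
}

# Example on a shortened, made-up header (not the full dbNSFP header)
printf '#chr\tpos(1-based)\tref\talt\taaref\taaalt\n' | col_index aaref   # prints 5
```

Feeding it the real header (the same `curl ... | funzip` pipeline as above) with `SIFT_score` as the argument should give 19, if the listing above still matches the file.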

getting the columns

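Reference and alternate amino acids (aaref, aaalt) with their SIFT and PolyPhen-2 scores and predictions (columns 5, 6, 19, 20, 21, 22).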

curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | zcat | cut -f 5,6,19,20,21,22 | head
aaref	aaalt	SIFT_score	SIFT_pred	Polyphen2_score	Polyphen2_pred
M	L	1.0	D	0.997	D
M	V	0.945248	NA	0.999	D
M	L	1.0	D	0.997	D
M	K	1.0	D	0.999	D
M	T	1.0	D	0.999	D
M	R	0.942261	NA	0.999	D
M	I	1.0	D	0.999	D
M	I	1.0	D	0.999	D
M	I	1.0	D	0.999	D


Source code: predictions.cpp ( https://gist.github.com/1082406 )

Compile and run

g++ -I /usr/include/cairo predictions.cpp -lcairo
curl -s "http://dl.dropbox.com/u/17001647/dbNSFP/dbNSFP1.1.chr1-22XY.zip" | zcat |\
cut -f 5,6,19,20,21,22 | egrep '^[A-Z][[:space:]][A-Z]' | ./a.out
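
The `egrep` step keeps only the rows whose first field is a single uppercase letter followed by a second field that starts with an uppercase letter, so the header line (aaref, aaalt, ...) never reaches the program. A quick way to check that on a few made-up rows, assuming tab-separated input; `keep_substitutions` is a hypothetical name for the filter:

```shell
# keep_substitutions: hypothetical wrapper around the filter used before ./a.out.
# LC_ALL=C makes [A-Z] mean exactly the 26 ASCII uppercase letters.
keep_substitutions() {
    LC_ALL=C grep -E '^[A-Z][[:space:]][A-Z]'
}

# Made-up sample: a header, two valid rows, and one row with a non-letter code
printf 'aaref\taaalt\tSIFT_score\nM\tL\t1.0\n.\tV\t0.9\nM\tK\t1.0\n' | keep_substitutions
```

Only the `M L` and `M K` rows should come out.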

Result

20110715SiftvsPolyphen.png

SIFT scores (x axis) vs PolyPhen-2 scores (y axis) for each amino acid substitution in chr1 of dbNSFP ( http://sites.google.com/site/jpopgen/dbNSFP ).

Red crosses = SIFT and PolyPhen-2 *BOTH* predict the substitution as damaging.

The gray level reflects the BLOSUM62 score of the amino acid substitution.
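
The red-cross class can also be counted directly from the six-column extract, without drawing anything. A rough sketch, assuming that "damaging" means a `D` in both the SIFT_pred and Polyphen2_pred columns (fields 4 and 6 of the extract); whether predictions.cpp uses exactly this rule is not checked here, and `count_both_damaging` is a hypothetical helper name:

```shell
# count_both_damaging: hypothetical helper. Reads tab-separated rows
# aaref, aaalt, SIFT_score, SIFT_pred, Polyphen2_score, Polyphen2_pred
# and prints how many rows have "D" in both prediction columns.
count_both_damaging() {
    awk -F '\t' '$4 == "D" && $6 == "D" { n++ } END { print n + 0 }'
}

# Two rows taken from the sample output shown earlier on this page
printf 'M\tL\t1.0\tD\t0.997\tD\nM\tV\t0.945248\tNA\t0.999\tD\n' | count_both_damaging   # prints 1
```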