User:Lindenb/Notebook/UMR915/20100716
From OpenWetWare
- showed the interface to RR
- started creating a linkage with BDB
- normalized the database with UMR915DBNormalisation
- downloaded the data for the new project (see mail Jul 16 2010 09H52)
- added critical distance between variations.
loading GATK variations
loading the GATK variations called by UnifiedGenotyper, with something like:
~/bin/insertvariants.sh -C -s X1 -d "bwa/recal GATK UnifiedGenotyper X1 20100709" -t vcf -cfg ~/.umr915.properties gatk_call_X1.vcf.gz
generating new input for sift
split -C 900k jeter.sift.input.txt sift_
and for polyphen
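The `split -C` call above caps each chunk by size while keeping lines whole, which matters when every line is one variant record. A minimal sketch of that behaviour on a toy file, assuming GNU coreutils `split`; the scratch directory, the file names and the 25-byte cap are made up for the demonstration and are not the real 900k setting:

```shell
# Scratch directory and file names are made up for this demonstration.
tmp=$(mktemp -d)

# Five lines of exactly 10 bytes each (9 characters plus the newline).
for i in 1 2 3 4 5; do printf 'line%05d\n' "$i"; done > "$tmp/demo.txt"

# At most 25 bytes of whole lines per chunk, so two lines fit and a third does not.
split -C 25 "$tmp/demo.txt" "$tmp/part_"

# Line count of each chunk, then check that the chunks concatenate back to the input.
wc -l "$tmp"/part_*
cat "$tmp"/part_* | cmp - "$tmp/demo.txt" && echo "chunks reassemble to the original"
```

With `-b` instead of `-C` the cut would fall in the middle of a line, so a record could end up split across two chunks.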
removing duplicates in the database
after normalisation, remove duplicates in sift and polyphen:
mysql -N -u anonymous -e 'select id from sift group by variation_id,alt having count(*)!=1' umr915 |\
awk '{printf("delete from sift where id=%s;\n",$1);}'
mysql -N -u anonymous -e 'select id from polyphen group by variation_id,alt,library having count(*)!=1' umr915 |\
awk '{printf("delete from polyphen where id=%s;\n",$1);}' > jeter.sql
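The awk step only turns a column of ids into DELETE statements, so it can be checked as a dry run without touching the database. A minimal sketch, where the three ids are made up and stand in for the output of the `mysql -N` query:

```shell
# Hypothetical ids, standing in for the output of the mysql -N query.
ids='12
57
103'

# Same awk program as in the pipeline: one DELETE statement per id.
sql=$(printf '%s\n' "$ids" | awk '{printf("delete from sift where id=%s;\n",$1);}')

# Review the generated statements before feeding them to mysql.
printf '%s\n' "$sql"
```

Since the query returns a single id per duplicated group, one pass removes one row per group; a group holding more than two copies would presumably need the query and the delete repeated until the query returns nothing. Selecting `id` without grouping on it also relies on MySQL's permissive GROUP BY handling, so which copy is deleted is not specified.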
compare SIFT/polyphen
select P.prediction as "polyphen",S.prediction as "sift",count(*) from polyphen as P, sift as S where P.variation_id=S.variation_id and P.alt=S.alt and P.library="HumVar" group by 1,2 order by 1,2
Polyphen HumVar
polyphen | sift | count(*) |
---|---|---|
PROBABLY_DAMAGING | NULL | 5 |
PROBABLY_DAMAGING | UNSCORED | 437 |
PROBABLY_DAMAGING | TOLERATED | 3156 |
PROBABLY_DAMAGING | DAMAGING_LOW | 3616 |
PROBABLY_DAMAGING | DAMAGING | 5773 |
POSSIBLY_DAMAGING | NULL | 23 |
POSSIBLY_DAMAGING | UNSCORED | 576 |
POSSIBLY_DAMAGING | TOLERATED | 6934 |
POSSIBLY_DAMAGING | DAMAGING_LOW | 4549 |
POSSIBLY_DAMAGING | DAMAGING | 3053 |
BENIGN | NULL | 40 |
BENIGN | UNSCORED | 1009 |
BENIGN | TOLERATED | 20032 |
BENIGN | DAMAGING_LOW | 5135 |
BENIGN | DAMAGING | 1692 |
UNKNOWN | NULL | 6 |
UNKNOWN | UNSCORED | 636 |
UNKNOWN | TOLERATED | 3037 |
UNKNOWN | DAMAGING_LOW | 2838 |
UNKNOWN | DAMAGING | 255 |
Polyphen HumDiv
polyphen | sift | count(*) |
---|---|---|
PROBABLY_DAMAGING | NULL | 12 |
PROBABLY_DAMAGING | UNSCORED | 681 |
PROBABLY_DAMAGING | TOLERATED | 5545 |
PROBABLY_DAMAGING | DAMAGING_LOW | 5597 |
PROBABLY_DAMAGING | DAMAGING | 6951 |
POSSIBLY_DAMAGING | NULL | 15 |
POSSIBLY_DAMAGING | UNSCORED | 407 |
POSSIBLY_DAMAGING | TOLERATED | 5413 |
POSSIBLY_DAMAGING | DAMAGING_LOW | 3283 |
POSSIBLY_DAMAGING | DAMAGING | 1854 |
BENIGN | NULL | 41 |
BENIGN | UNSCORED | 934 |
BENIGN | TOLERATED | 19164 |
BENIGN | DAMAGING_LOW | 4420 |
BENIGN | DAMAGING | 1713 |
UNKNOWN | NULL | 6 |
UNKNOWN | UNSCORED | 636 |
UNKNOWN | TOLERATED | 3037 |
UNKNOWN | DAMAGING_LOW | 2838 |
UNKNOWN | DAMAGING | 255 |
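The raw counts are easier to compare across polyphen classes as row percentages. A minimal sketch with awk, using only the PROBABLY_DAMAGING rows of the HumVar table copied verbatim from above; the variable names and the output layout are illustrative choices, not part of the original workflow:

```shell
# PROBABLY_DAMAGING rows of the HumVar table, copied from above.
rows='PROBABLY_DAMAGING | NULL | 5 |
PROBABLY_DAMAGING | UNSCORED | 437 |
PROBABLY_DAMAGING | TOLERATED | 3156 |
PROBABLY_DAMAGING | DAMAGING_LOW | 3616 |
PROBABLY_DAMAGING | DAMAGING | 5773 |'

# Split on "|", strip spaces from the SIFT label, then print each SIFT class
# with its count and its share of the row total.
pct=$(printf '%s\n' "$rows" | awk -F'|' '
  { gsub(/ /, "", $2); name[NR] = $2; n[NR] = $3 + 0; total += n[NR] }
  END { for (i = 1; i <= NR; i++) printf("%s\t%d\t%.1f%%\n", name[i], n[i], 100 * n[i] / total) }')
printf '%s\n' "$pct"
```

The same awk program should work on any other block of rows pasted into `rows`, as long as the block covers a single polyphen class.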