User:Lindenb/Notebook/UMR915/20100716

=loading GATK variations=
Loading the GATK variations called with UnifiedGenotyper. Something like:
<pre>
~/bin/insertvariants.sh -C -s X1 -d "bwa/recal GATK UnifiedGenotyper X1 20100709" -t vcf -cfg ~/.umr915.properties gatk_call_X1.vcf.gz
</pre>
Generating new input for sift:
<pre>
split -C 900k jeter.sift.input.txt sift_
</pre>
and for polyphen
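The point of `split -C` here is that it cuts only at line boundaries, so no input record is broken across two chunks. A minimal sketch to check this on a toy file, assuming GNU coreutils `split`; the file name `toy.input.txt`, its made-up rows and the 2k chunk size are stand-ins, not the real `jeter.sift.input.txt`:

```shell
# Sketch: check that GNU `split -C` keeps lines whole.
# All file names, rows and sizes are illustrative only.
workdir=$(mktemp -d)

# Toy input of 1000 short tab-delimited lines (made-up content).
seq 1 1000 | awk '{printf("chr1\t%d\tA/G\n", $1 * 100)}' > "$workdir/toy.input.txt"

# -C 2k: pack as many complete lines as fit into chunks of at most 2048 bytes.
split -C 2k "$workdir/toy.input.txt" "$workdir/sift_"

# Record the size of the largest chunk.
max_bytes=0
for f in "$workdir"/sift_*; do
    size=$(wc -c < "$f")
    if [ "$size" -gt "$max_bytes" ]; then
        max_bytes=$size
    fi
done

# Concatenating the chunks in name order should give back the input.
if cat "$workdir"/sift_* | cmp -s - "$workdir/toy.input.txt"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "largest chunk: $max_bytes bytes, round trip: $roundtrip"
```

With `-b` instead of `-C` the chunks would have a fixed byte size and a line could be cut in the middle.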
 * showed the interface to RR
 * started creating a linkage with BDB
 * normalized the database with UMR915DBNormalisation
 * downloaded the data for the new project (see mail Jul 16 2010 09H52)
 * added critical distance between variations.

=removing duplicates in the database=
After normalisation, remove the duplicates in sift and polyphen:

<pre>
mysql -N -u anonymous -e 'select id from sift group by variation_id,alt having count(*)!=1' umr915 |\
awk '{printf("delete from sift where id=%s;\n",$1);}'
mysql -N -u anonymous -e 'select id from polyphen group by variation_id,alt,library having count(*)!=1' umr915 |\
awk '{printf("delete from polyphen where id=%s;\n",$1);}' > jeter.sql
</pre>

=compare SIFT/polyphen=
<pre>
select
  P.prediction as "polyphen",
  S.prediction as "sift",
  count(*)
from
  polyphen as P,
  sift as S
where
  P.variation_id=S.variation_id and
  P.alt=S.alt and
  P.library="HumVar"
group by 1,2
order by 1,2
</pre>
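To make explicit what the comparison query counts, here is a minimal sketch of the same cross-tabulation done with awk on toy data. It assumes tab-delimited dumps with columns variation_id, alt, prediction for sift and variation_id, alt, library, prediction for polyphen. The file names, the column order and every row are made up for illustration. It also assumes sift holds one row per (variation_id, alt), i.e. the duplicates were already removed.

```shell
# Sketch: SIFT x polyphen cross-tabulation on made-up rows.
tmpdir=$(mktemp -d)

# Hypothetical sift dump: variation_id, alt, prediction
printf '1\tA\tTOLERATED\n2\tG\tDAMAGING\n3\tT\tDAMAGING\n4\tC\tTOLERATED\n6\tA\tTOLERATED\n' \
    > "$tmpdir/sift.tsv"

# Hypothetical polyphen dump: variation_id, alt, library, prediction
printf '1\tA\tHumVar\tbenign\n2\tG\tHumVar\tprobably damaging\n3\tT\tHumVar\tbenign\n3\tT\tHumDiv\tpossibly damaging\n5\tC\tHumVar\tbenign\n6\tA\tHumVar\tbenign\n' \
    > "$tmpdir/polyphen.tsv"

# crosstab SIFT_FILE POLYPHEN_FILE
# Prints: polyphen prediction, sift prediction, count (tab-delimited, sorted).
crosstab() {
    awk -F '\t' '
        # First file: remember the sift prediction for each (variation_id, alt).
        NR == FNR { sift[$1 FS $2] = $3; next }
        # Second file: keep HumVar rows that have a sift partner, count the pair.
        $3 == "HumVar" && (($1 FS $2) in sift) { n[$4 FS sift[$1 FS $2]]++ }
        END { for (k in n) print k FS n[k] }
    ' "$1" "$2" | LC_ALL=C sort
}

result=$(crosstab "$tmpdir/sift.tsv" "$tmpdir/polyphen.tsv")
printf '%s\n' "$result"
```

Rows present in only one of the two tables (variation 4 and 5 in the toy data) and the HumDiv row are dropped, as in the inner join of the query.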