User:Lindenb/Notebook/UMR915/20110103

Bonne année !

VCFAnnotator
wrote a tool to annotate VCf files. see my blog: http://plindenbaum.blogspot.com/2011/01/my-tool-to-annotate-vcf-files.html Running this tool with belgium data.

Comparing mask/no mask
using ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/NA20772/sequence_read/ERR004053_*.recal.fastq.gz

mysql -D hg18 -u anonymous -e 'select distinct chrom,txStart,txEnd from knownGene where chrom="chr1" ' -N > /tmp/jeter.txt java -jar dist/maskfasta.jar -r /tmp/jeter.txt -e 10000 /GENOTYPAGE/data/users/lindenb/MASKBWA/NO_MASK/chr1.fa > /GENOTYPAGE/data/users/lindenb/MASKBWA/WITH_MASK/chr1.fa

[lindenb@srv-clc-02 NO_MASK]$ sed 's/N/\nN\n/g' NO_MASK/chr1.fa | grep N | wc -l 22250000 [lindenb@srv-clc-02 NO_MASK]$ sed 's/N/\nN\n/g' WITH_MASK/chr1.fa | grep N | wc -l 108399153

for each chr1.fa: /usr/local/package/bwa-0.5.9rc1/bwa index  chr1.fa  /usr/local/package/bwa-0.5.9rc1/bwa aln chr1.fa ../FASTQ/ERR004053_1.recal.fastq.gz > aln1.sai /usr/local/package/bwa-0.5.9rc1/bwa aln chr1.fa ../FASTQ/ERR004053_2.recal.fastq.gz > aln2.sai /usr/local/package/bwa-0.5.9rc1/bwa sampe chr1.fa aln1.sai aln2.sai ../FASTQ/ERR004053_1.recal.fastq.gz ../FASTQ/ERR004053_2.recal.fastq.gz > file.sam /usr/local/package/samtools-0.1.10/samtools faidx chr1.fa /usr/local/package/samtools-0.1.10/samtools view -b -t chr1.fa.fai file.sam > file.bam /usr/local/package/samtools-0.1.10/samtools sort file.bam sorted /usr/local/package/samtools-0.1.10/samtools index sorted.bam /usr/local/package/samtools-0.1.10/samtools pileup -vcf chr1.fa sorted.bam | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' | cut -d '     ' -f 1-4 | sort | uniq > pileup.txt

[lindenb@srv-clc-02 WITH_MASK]$ wc -l pileup.txt 24921 pileup.txt [lindenb@srv-clc-02 NO_MASK]$ wc -l pileup.txt 26062 pileup.txt [lindenb@srv-clc-02 MASKBWA]$ comm -12 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 13573   54292  266327 [lindenb@srv-clc-02 MASKBWA]$ comm -13 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 12489   49956  249431 [lindenb@srv-clc-02 MASKBWA]$ comm -23 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 11348   45392  229152 on position appears masked but not w/o mask: chr1	100005960	c	A in Masked /usr/local/package/samtools-0.1.10/samtools tview WITH_MASK/sorted.bam WITH_MASK/chr1.fa

100005921 100005931 100005941 100005951 100005961 100005971 tgctaattggtcagattggagatggaatca*tggggggtcgacgtgaggttttcttgctgtcttct ....G.........G............... MM.......R.A....RK................. ..             ,,,,,,,,,,,,,,cac,,,,,,,,,a,,,,a,,,  ,,,,,,,,,,,,, ....               ..........*CA.......A.A.......N....... ,,,,,, ..                 ..........*CA.......A.A............... ,,,, G...G..G.. ..........*CA.......A.A............... ....G.........G..T.. ...........T................. ....G.......                        .....A.....T................. ,,,,g,,,,,,,,,g,,,,,,,,,,,,,,,*,,,,,      ..............G........

NO MASK: /usr/local/package/samtools-0.1.10/samtools tview NO_MASK/sorted.bam NO_MASK/chr1.fa

100005931 100005941 100005951 100005961 100005971 100005981 tcagattggagatggaatcatggggggtcgacgtgaggttttcttgctgtcttctgttcctgggtg ..........CA.......R.A.....K............................         ..........CA.......A.A.......N.......                          ...........T.........................                          .....A.....T.........................                              .A..................................                                ..............G...................