User:Lindenb/Notebook/UMR915/20101124

From OpenWetWare



Cover

waiting for http://www.cell.com/immunity/home ...

Sysadmin

Meeting with SC: we need space.

Dindel

Installing dindel from source (http://www.sanger.ac.uk/resources/software/dindel). Edit the Makefile to set the path to the samtools sources, then run 'make'.

g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o DInDel.o DInDel.cpp
Library.hpp: In member function ‘void Library::calcProb(const std::vector<double, std::allocator<double> >&)’:
Library.hpp:89: warning: converting to ‘int’ from ‘const double’
g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o HapBlock.o HapBlock.cpp
g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o HaplotypeDistribution.o HaplotypeDistribution.cpp
g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o ObservationModelFB.o ObservationModelFB.cpp
Library.hpp: In member function ‘void Library::calcProb(const std::vector<double, std::allocator<double> >&)’:
Library.hpp:89: warning: converting to ‘int’ from ‘const double’
g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o GetCandidates.o GetCandidates.cpp
Library.hpp: In member function ‘void Library::calcProb(const std::vector<double, std::allocator<double> >&)’:
Library.hpp:89: warning: converting to ‘int’ from ‘const double’
g++ -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3  -DNDEBUG -D_IOLIB=2 -DMINREADS=2 -DDINDEL  -c -o Faster.o Faster.cpp
Library.hpp: In member function ‘void Library::calcProb(const std::vector<double, std::allocator<double> >&)’:
Library.hpp:89: warning: converting to ‘int’ from ‘const double’
g++ -o dindel -I/usr/local/package/samtools-0.1.10 -Iseqan_library/ -I./  -Wno-deprecated  -O3   DInDel.o HapBlock.o HaplotypeDistribution.o ObservationModelFB.o GetCandidates.o Faster.o   -L/usr/local/package/samtools-0.1.10  -lbam -lz -lboost_program_options -static  
/usr/local/package/samtools-0.1.10/libbam.a(knetfile.o): In function `socket_connect':
/home/lindenb/samtools-0.1.10/knetfile.c:99: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
The build completes, but running the binary fails:
[lindenb@srv-clc-02]$ /usr/local/package/dindel-1.01-src/dindel --analysis getCIGARindels --bamFile sampe.sorted.bam  --outputFile dindel_output  --ref mygene.fa
Error parsing input options. Usage: (...)

Fixed the problem by changing "bamFiles" to "bamList" in /usr/local/package/dindel-1.01-src/DInDel.cpp (this seems to be a bug in Boost, see http://lists.boost.org/Archives/boost/2006/01/98811.php ).

/usr/local/package/dindel-1.01-src/dindel --analysis getCIGARindels --bamFile sampe.sorted.bam  --outputFile dindel_output  --ref mygene.fa
Reading BAM file: sampe.sorted.bam
Parsing indels from CIGAR strings...
Library: dindel_default mean: 207.5 stddev: 40.9283
Wrote indels in CIGARS for target XXXXXXX to file dindel_output
Wrote library insert sizes to dindel_output.libraries.txt
done!

head  dindel_output.libraries.txt
#LIB dindel_default
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
(...)
But makeWindows.py, which is described in the manual, is not included in the source distribution. Switching to the binary distribution...

ok, restart

 python /usr/local/package/dindel-1.01/dindel-1.01-python/makeWindows.py --inputVarFile dindel_output.variants.txt --windowFilePrefix sample.realign_windows --numWindowsPerFile 1000
(...)
Number of candidates: 604 Number of windows: 494 Maximum window size: 161 Mean window size: 121
Chromosome: chr7_random Total lines: 173 at minimum distance 20
Number of candidates: 173 Number of windows: 131 Maximum window size: 141 Mean window size: 121
Chromosome: chrX_random Total lines: 200 at minimum distance 20
Number of candidates: 200 Number of windows: 163 Maximum window size: 141 Mean window size: 121
Chromosome: chr9 Total lines: 41545 at minimum distance 20
Number of candidates: 41545 Number of windows: 27168 Maximum window size: 848 Mean window size: 122
Chromosome: chr8 Total lines: 38569 at minimum distance 20
Number of candidates: 38569 Number of windows: 25891 Maximum window size: 201 Mean window size: 122
Chromosome: chr16_random Total lines: 12 at minimum distance 20
Number of candidates: 12 Number of windows: 10 Maximum window size: 123 Mean window size: 119
Chromosome: chr10 Total lines: 46560 at minimum distance 20
Number of candidates: 46560 Number of windows: 30113 Maximum window size: 321 Mean window size: 122
Chromosome: chr17_random Total lines: 763 at minimum distance 20
Number of candidates: 763 Number of windows: 587 Maximum window size: 161 Mean window size: 121
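
To get an overall count from the makeWindows.py log above, the per-chromosome summary lines can be added up. This is only a minimal sketch: it assumes the log lines look exactly like the ones printed above, and the function name `total_windows` is hypothetical, not part of Dindel.

```python
import re

# Matches the per-chromosome summary lines printed by makeWindows.py above.
_SUMMARY = re.compile(
    r"Number of candidates: (\d+) Number of windows: (\d+) "
    r"Maximum window size: (\d+) Mean window size: (\d+)"
)

def total_windows(log_lines):
    """Return (total candidates, total windows) over all summary lines."""
    candidates = 0
    windows = 0
    for line in log_lines:
        match = _SUMMARY.search(line)
        if match:
            candidates += int(match.group(1))
            windows += int(match.group(2))
    return candidates, windows
```

Lines that do not match the summary pattern (for example the "Chromosome: ..." lines) are simply ignored.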

Many window files are generated. Then, for each of these files, one should run:

/usr/local/package/dindel-1.01/binaries/dindel-1.01-linux-64bit --analysis indels --doDiploid --bamFile recal_bwa_rmdup_XXXX.bam  --outputFile sample.dindels.1  --ref /GENOTYPAGE/data/pubdb/ucsc/hg18/chromosomes/hg18.fa --varFile sample.realign_windows.1.txt --libFile dindel_output.libraries.txt 
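
Since the command above has to be repeated for every window file, it could be scripted. The sketch below only assembles the command lines, with the same options as the command above. The helper name `build_dindel_commands` and the assumption that window files are named `<prefix>.<N>.txt` are mine (the naming is inferred from `sample.realign_windows.1.txt`), not taken from the Dindel manual.

```python
import re

def build_dindel_commands(window_files, dindel_bin, bam, ref, lib_file,
                          out_prefix="sample.dindels"):
    """Return one dindel argument list per realign-window file."""
    commands = []
    for path in window_files:
        # Assumed naming scheme: <prefix>.<N>.txt ; other files are skipped.
        match = re.search(r"\.(\d+)\.txt$", path)
        if match is None:
            continue
        commands.append([
            dindel_bin, "--analysis", "indels", "--doDiploid",
            "--bamFile", bam,
            "--outputFile", f"{out_prefix}.{match.group(1)}",
            "--ref", ref,
            "--varFile", path,
            "--libFile", lib_file,
        ])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`, or the commands could be written to a file and submitted to a cluster queue.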

head sample.dindels.1.glf.txt 
msg index analysis_type tid lpos rpos center_position realigned_position was_candidate_in_window ref_all nref_all num_reads post_prob_variant qual est_freq logZ hapfreqs indidx msq numOffAll num_indel num_cover_forward num_cover_reverse num_unmapped_realigned var_coverage_forward var_coverage_reverse nBQT nmmBQT mLogBQ nMMLeft nMMRight glf
error_too_few_reads 1 NA chr6_random 417 536 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
error_too_few_reads 2 NA chr6_random 635 755 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
error_too_few_reads 3 NA chr6_random 2043 2162 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
ok 4 dip.map chr6_random 5183 5302 5242 5243 1 NA +T 3 NA 0.0518175 NA NA NA 0 60 NA NA 1 0 0 1 0 NA NA NA NA NA 1/1:0.0518175
ok 4 dip chr6_random 5183 5302 5242 5243 1 NA +T,+TT 3 NA NA NA -21.4465 NA 0 60 0 0 1 0 0 1,0 0,0 129 1 -3.22403 0 0 0/0:-26.2281,0/1:-22.1231,0/2:-25.6365,1/1:-21.4403,1/2:-22.1099,2/2:-25.2702
ok 4 dip chr6_random 5183 5302 5242 5280 0 NA R=>A 3 NA NA NA -24.8196 NA 0 60 0 1 1 0 0 0 0 128 0 -3.22188 0 0 0/0:-26.2159,0/1:-25.4643,1/1:-24.8013
ok 5 dip.map chr6_random 21407 21538 21473 21467 1 NA +C 3 NA 0.000108848 NA NA NA 0 60 NA NA 1 1 0 0 0 NA NA NA NA NA 0/1:0.000108848
ok 5 dip chr6_random 21407 21538 21473 21467 1 NA +C 3 NA NA NA -17.8254 NA 0 53.4447 0 0 0 0 0 0 0 162 0 -3.27222 0 0 0/0:-17.8254,0/1:-19.2092,1/1:-34.9632
ok 5 dip chr6_random 21407 21538 21473 21468 1 NA +T 3 NA NA NA -17.8254 NA 0 53.4447 0 0 0 0 0 0 0 162 0 -3.27222 0 0 0/0:-17.8254,0/1:-19.2114,1/1:-35.423
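
Many windows in the GLF output above are reported as error_too_few_reads. A quick way to see how many windows were actually processed is to count the values of the first column (msg). A minimal sketch, assuming the file is whitespace-separated with a header line starting with "msg", as in the output above; `count_glf_messages` is a hypothetical helper name.

```python
from collections import Counter

def count_glf_messages(lines):
    """Count the values of the first column (msg) of a dindel GLF file."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header line.
        if not fields or fields[0] == "msg":
            continue
        counts[fields[0]] += 1
    return counts
```

For example, `count_glf_messages(open("sample.dindels.1.glf.txt"))` returns a Counter keyed by message, such as "ok" and "error_too_few_reads".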
Save the names of the output files:
echo "sample.dindels.1.glf.txt" > sample.dindel_stage2_outputfiles.txt
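
With many GLF files, the list can be generated rather than typed. A minimal sketch, assuming all stage-2 outputs match one glob pattern such as `sample.dindels.*.glf.txt`; the function name `write_glf_list` is hypothetical.

```python
import glob

def write_glf_list(pattern, list_path):
    """Write the file names matching `pattern` to `list_path`, one per line."""
    files = sorted(glob.glob(pattern))
    with open(list_path, "w") as out:
        for name in files:
            out.write(name + "\n")
    return files
```

For example: `write_glf_list("sample.dindels.*.glf.txt", "sample.dindel_stage2_outputfiles.txt")`. The sort is lexicographic, which only fixes the order of the list.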

and then merge the results into a VCF file:

python /usr/local/package/dindel-1.01/dindel-1.01-python/mergeOutputDiploid.py --inputFiles sample.dindel_stage2_outputfiles.txt --outputFile dindel.variants.vcf --ref /GENOTYPAGE/data/pubdb/ucsc/hg18/chromosomes/hg18.fa

Number of non-empty GLF files: 1
Calling variants from GLF file sample.dindels.1.glf.txt


 more dindel.variants.vcf 
##fileformat=VCFv4.0
##source=Dindel
##reference=/GENOTYPAGE/data/pubdb/ucsc/hg18/chromosomes/hg18.fa
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##ALT=<ID=DEL,Description="Deletion">
##FILTER=<ID=q20,Description="Quality below 20">
##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
chrX_random	46862	.	CTT	C	42	PASS	DP=2;NF=1;NR=1;NRS=1;NFS=1;HP=2	GT:GQ	1/1:6
chrX_random	49403	.	g	gC	3	q20	DP=2;NF=1;NR=0;NRS=1;NFS=1;HP=1	GT:GQ	0/1:3
chrX_random	61565	.	ga	g	49	PASS	DP=5;NF=1;NR=1;NRS=1;NFS=1;HP=2	GT:GQ	1/1:6
chrX_random	84517	.	G	GA	9	q20	DP=2;NF=1;NR=1;NRS=1;NFS=1;HP=9	GT:GQ	1/1:6
chrX_random	149111	.	c	cTG	11	q20	DP=4;NF=0;NR=1;NRS=0;NFS=1;HP=1	GT:GQ	1/1:4
chrX_random	1462787	.	atg	a	59	PASS	DP=2;NF=1;NR=1;NRS=1;NFS=1;HP=1	GT:GQ	1/1:6
chrX_random	1468928	.	A	AAG	11	q20	DP=3;NF=1;NR=1;NRS=2;NFS=1;HP=2	GT:GQ	1/1:6
chrX_random	1472124	.	C	CGT	4	q20	DP=3;NF=0;NR=1;NRS=0;NFS=1;HP=1	GT:GQ	1/1:4
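
Most of the calls above are flagged q20. To keep only the records whose FILTER column is PASS, something like the following could be used. This is a minimal sketch that assumes the tab-separated VCF 4.0 layout shown above (FILTER is the 7th column); `passing_variants` is a hypothetical helper name.

```python
def passing_variants(vcf_lines):
    """Return (chrom, pos, ref, alt, qual) for records with FILTER == PASS."""
    records = []
    for line in vcf_lines:
        # Skip meta-information lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] == "PASS":
            records.append(
                (fields[0], int(fields[1]), fields[3], fields[4], float(fields[5]))
            )
    return records
```

For example: `passing_variants(open("dindel.variants.vcf"))`.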

Cover for Immunity

http://www.cell.com/immunity/abstract/S1074-7613%2810%2900401-2 : Cell-Cell Propagation of NF-κB Transcription Factor and MAP Kinase Activation Amplifies Innate Immunity against Bacterial Infection. Immunity, Volume 33, Issue 5, 804-816, 18 November 2010. Christoph Alexander Kasper, Isabel Sorg, Christoph Schmutz, Therese Tschon, Harry Wischnewski, Man Lyang Kim, Cécile Arrieumerlou.


CellImmunityCover20101124.jpg