User:Lindenb/Notebook/UMR915/20110222

From OpenWetWare

Jump to: navigation, search

20110221        Top        20110223       


Belgium

re-aligning on hg19 . see Makefile in

 /GENOTYPAGE/data/users/lindenb/20101108_belgium/20110222
/usr/local/package/mosaik-aligner/bin/MosaikBuild -fr /GENOTYPAGE/data/pubdb/ucsc/hg19/chromosomes/hg19.fa -oa _ignore.backup/hg19/reference.dat
------------------------------------------------------------------------------
MosaikBuild 1.1.0018                                                2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- converting /GENOTYPAGE/data/pubdb/ucsc/hg19/chromosomes/hg19.fa to a reference sequence archive.

- parsing reference sequences:
ref seqs: 25 (0.2884 ref seqs/s)

- writing reference sequences:
100%[===========================================================================================================]    0.8177 ref seqs/s        in 30 s  

- calculating MD5 checksums:
100%[===========================================================================================================]      1.43 ref seqs/s        in 17 s  

- writing reference sequence index:
100%[===========================================================================================================]      25.0 ref seqs/s        in  1 s  

- creating concatenated reference sequence:
100%[===========================================================================================================]      7.13 ref seqs/s        in  3 s  

- writing concatenated reference sequence...        finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence...  finished.
- writing masking vector...                         finished.

MosaikBuild CPU time: 187.420 s, wall time: 200.587 s
/usr/local/package/mosaik-aligner/bin/MosaikJump -ia _ignore.backup/hg19/reference.dat -hs 15 -out _ignore.backup/hg19/reference_hs15
------------------------------------------------------------------------------
MosaikJump 1.1.0018                                                 2010-10-29
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------


- retrieving reference sequence... finished.

- hashing reference sequence:
100%[=============================================================================================================] 2,301,657 hashes/s       in 22:24  

- serializing final sorting vector... finished.

- writing jump positions database:
100%[=====================================================================================================] 1,423,918 hash positions/s       in 33:29  

- serializing jump keys database (17 blocks):
blocks: 17 (0.9169 blocks/s)

MosaikJump CPU time: 3327.950 s, wall time: 3405.920 s

mean length for sample 1 is 385.282

gunzip -c 1.TCA.454Reads.fna.gz | egrep ">" | tr " " "\n" | grep length | cut -d '=' -f 2 | awk '{n+=1.0; total+=int($1)*1.0;} END { print total/n;}'

using

 -mmp 0.05 -act 26 -bw 51 -mhp 100 

for the Align param ( http://code.google.com/p/mosaik-aligner/wiki/ParameterSettings )

Align

/usr/local/package/mosaik-aligner/bin/MosaikAligner -mmp 0.05 -act 26 -bw 51 -mhp 100  -p 7 -in ../reads1.mkb -ia _ignore.backup/hg19/reference.dat -j _ignore.backup/hg19/reference_hs15 -out align1.hg19.mka
------------------------------------------------------------------------------
MosaikAligner 1.1.0018                                              2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch percent threshold of 0.05
- Using 7 processors
- Using a Smith-Waterman bandwidth of 51
- Using an alignment candidate threshold of 26bp.
- Setting hash position threshold to 100
- Using a jump database for hashing. Storing keys & positions in memory.
- Using a homo-polymer gap open penalty of 4
- loading reference sequence... finished.
- loading jump key database into memory... finished.
- loading jump positions database into memory... finished.

Aligning read library (317992):
100%[============================================================================================================================]      83.7 reads/s   in   1:03:18  

Alignment statistics (mates):
===================================
# filtered out:     19943 (  6.3 %)
# unique:          288449 ( 90.7 %)
# non-unique:        9600 (  3.0 %)
-----------------------------------
total:             317992
total aligned:     298049 ( 93.7 %)

MosaikAligner CPU time: 26744.380 s, wall time: 4015.994 s
/usr/local/package/mosaik-aligner/bin/MosaikAligner -mmp 0.05 -act 26 -bw 51 -mhp 100  -p 7 -in ../reads2.mkb -ia _ignore.backup/hg19/reference.dat -j _ignore.backup/hg19/reference_hs15 -out align2.hg19.mka
------------------------------------------------------------------------------
MosaikAligner 1.1.0018                                              2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch percent threshold of 0.05
- Using 7 processors
- Using a Smith-Waterman bandwidth of 51
- Using an alignment candidate threshold of 26bp.
- Setting hash position threshold to 100
- Using a jump database for hashing. Storing keys & positions in memory.
- Using a homo-polymer gap open penalty of 4
- loading reference sequence... finished.
- loading jump key database into memory... finished.
- loading jump positions database into memory... finished.

Aligning read library (563204):
100%[============================================================================================================================]     131.1 reads/s   in   1:11:35  

Alignment statistics (mates):
===================================
# filtered out:     28749 (  5.1 %)
# unique:          519486 ( 92.2 %)
# non-unique:       14969 (  2.7 %)
-----------------------------------
total:             563204
total aligned:     534455 ( 94.9 %)

MosaikAligner CPU time: 30219.450 s, wall time: 4500.977 s

Result

A few reads on the other chromosomes: e.g:

CTCACCAGACCTCCTAGGCGACCCAGACAATTATACCCTAGCCAACCCCTTAAACACCCCTCCCCACATCAAGCCCGAATGATA
TTTCCTATTCGCCTACACAATTCTCCGATCCGTCCCTAACAAACTAGGAGGCGTCCTTGCCCTATTACTATCCATCCTCATCCT
AGCAATAATCCCCATCCTCCATATATCCAAACAACAAAGCATAATATTTCGCCCACTAAGCCAATCACTTTATTGACTCCTAGC
CGCAGACCTCCTCATTCTAACCTGAATCGGAGGACAACCAGTAAGCTACCCTTTTACCATCATTGGACAAGTAGCATCCGTACT
ATACTTCACAACAATCCTAATCCTAATACCAACTATCTCCCTAATTGAAAACAAAATACTCAAATGGGCCTGTCCTTGTAGTAT
AAACTAATACACCAGTCTTGTAAACCGGAGATGAAAACCTTTTTCCAAGGACAAATCAGAGAAAAAGTCTTTAACTCCACCATT
AGCACCCAAAGCTAAGATTCTAATTTAAACTATTCTCTG

e.g. BLAT confirms on chrM (!):

   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details YourSeq          541     1   543   543  99.9%     M   +      15482     16024    543


00001 ctcaccagacctcctaggcgacccagacaattataccctagccaacccct 00050
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15482 ctcaccagacctcctaggcgacccagacaattataccctagccaacccct 15531

00051 taaacacccctccccacatcaagcccgaatgatatttcctattcgcctac 00100
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15532 taaacacccctccccacatcaagcccgaatgatatttcctattcgcctac 15581

00101 acaattctccgatccgtccctaacaaactaggaggcgtccttgccctatt 00150
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15582 acaattctccgatccgtccctaacaaactaggaggcgtccttgccctatt 15631

00151 actatccatcctcatcctagcaataatccccatcctccatatatccaaac 00200
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15632 actatccatcctcatcctagcaataatccccatcctccatatatccaaac 15681

00201 aacaaagcataatatttcgcccactaagccaatcactttattgactccta 00250
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15682 aacaaagcataatatttcgcccactaagccaatcactttattgactccta 15731

00251 gccgcagacctcctcattctaacctgaatcggaggacaaccagtaagcta 00300
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15732 gccgcagacctcctcattctaacctgaatcggaggacaaccagtaagcta 15781

00301 cccttttaccatcattggacaagtagcatccgtactatacttcacaacaa 00350
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15782 cccttttaccatcattggacaagtagcatccgtactatacttcacaacaa 15831

00351 tcctaatcctaataccaactatctccctaattgaaaacaaaatactcaaa 00400
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15832 tcctaatcctaataccaactatctccctaattgaaaacaaaatactcaaa 15881

00401 tgggcctgtccttgtagtataaactaatacaccagtcttgtaaaccggag 00450
>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15882 tgggcctgtccttgtagtataaactaatacaccagtcttgtaaaccggag 15931

00451 atgaaaacctttttccaaggacaaatcagagaaaaagtctttaactccac 00500
>>>>> | |||||||||||||||||||||||||||||||||||||||||||||||| >>>>>
15932 acgaaaacctttttccaaggacaaatcagagaaaaagtctttaactccac 15981

00501 cattagcacccaaagctaagattctaatttaaactattctctg 00543
>>>>> ||||||||||||||||||||||||||||||||||||||||||| >>>>>
15982 cattagcacccaaagctaagattctaatttaaactattctctg 16024
Personal tools