User:Lindenb/Notebook/UMR915/20101108

From OpenWetWare



Bioinformatics Course

http://www.ncbi.nlm.nih.gov/sra/SRX005999

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_1.fastq.bz2

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_2.fastq.bz2
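The two mate URLs above differ only in the mate number, so they can be generated rather than typed. This is a minimal sketch: `sra_fastq_url` is a hypothetical helper, and the rule that the directory level is the first six characters of the experiment accession is an assumption inferred from these two URLs only.

```shell
# Hypothetical helper: print the FTP URL of one mate file.
# Assumption: the directory level (SRX005) is the first six characters
# of the experiment accession, as in the two URLs listed above.
sra_fastq_url() {
    srx="$1"   # experiment accession, e.g. SRX005999
    srr="$2"   # run accession, e.g. SRR018110
    mate="$3"  # mate number: 1 or 2
    prefix=$(printf '%s' "$srx" | cut -c1-6)
    printf 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/%s/%s/%s_%s.fastq.bz2\n' \
        "$prefix" "$srx" "$srr" "$mate"
}

# Print both mate URLs; pipe the output to a downloader of your choice.
for m in 1 2; do
    sra_fastq_url SRX005999 SRR018110 "$m"
done
```

The script only prints the URLs and does not touch the network, so it can be checked against the links above before anything is downloaded.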

MOSAIK

Build the reference sequence archive

MosaikBuild -fr <reference.fasta> -oa <reference.dat>
------------------------------------------------------------------------------
MosaikBuild 1.1.0018                                                2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- converting chr4_19.fa to a reference sequence archive.

- parsing reference sequences:
ref seqs: 2 (0.2849 ref seqs/s)

- writing reference sequences:
100%[==========================================================================================================]    0.9975 ref seqs/s        in  2 s  

- calculating MD5 checksums:
100%[==========================================================================================================]      2.00 ref seqs/s        in  1 s  

- writing reference sequence index:
100%[==========================================================================================================]      2.00 ref seqs/s        in  1 s  

- creating concatenated reference sequence:
100%[==========================================================================================================]      2.00 ref seqs/s        in  1 s  

- writing concatenated reference sequence...        finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence...  finished.
- writing masking vector...                         finished.

MosaikBuild CPU time: 14.570 s, wall time: 17.410 s

For a long reference sequence, build a jump database with MosaikJump

MosaikJump -ia reference.dat -hs 15 -out reference_hs15.dat
------------------------------------------------------------------------------
MosaikJump 1.1.0018                                                 2010-10-29
Michael Stromberg                 Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------


- retrieving reference sequence... finished.

- hashing reference sequence:
100%[================================================================================================================] 3,463,196 hashes/s       in 01:13  

- serializing final sorting vector... finished.

- writing jump positions database:
100%[========================================================================================================] 869,312.8 hash positions/s       in 04:39  

- serializing jump keys database (17 blocks):
blocks: 17 (1.89 blocks/s)

Build the read database (454 FASTA and quality files)

MosaikBuild -fr  1.TCA.454Reads.fna.gz -fq 1.TCA.454Reads.qual.gz -st 454 -out reads1.mkb
------------------------------------------------------------------------------
MosaikBuild 1.1.0018                                                2010-10-29
Michael Stromberg & Wan-Ping Lee  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- setting read group ID to: ZDI4H940NP2
- setting sample name to: unknown
- setting sequencing technology to: 454
- trimming leading and lagging N's. Mates with >4 interior N's will be deleted.

- parsing FASTA files:
reads: 318,127 (4,773.9 reads/s)

Filtering statistics:
============================================
# reads deleted:               135 (  0.0 %)
# leading N's trimmed:           1
# lagging N's trimmed:        2703
--------------------------------------------
# reads written:            317992
# bases written:         122512849

MosaikBuild CPU time: 65.390 s, wall time: 66.805 s
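The filtering statistics can be sanity-checked with a little arithmetic. The sketch below copies the numbers from the log above; the variable names are illustrative only.

```shell
# Values copied from the MosaikBuild log above.
reads_parsed=318127
reads_deleted=135
reads_written=317992
bases_written=122512849

# Parsed reads minus deleted reads should equal the reads written.
if [ $((reads_parsed - reads_deleted)) -eq "$reads_written" ]; then
    echo "read counts are consistent"
fi

# Mean read length in bases, to one decimal place (C locale for the decimal point).
mean_len=$(LC_ALL=C awk -v b="$bases_written" -v r="$reads_written" \
    'BEGIN { printf "%.1f", b / r }')
echo "mean read length: $mean_len"
```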

Align

MosaikAligner -in reads1.mkb -ia reference.dat -j reference_hs15.dat -out align1.mka
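The four steps above can be collected into one dry-run script. This is a sketch, not a tested pipeline: `mosaik_commands` is a hypothetical wrapper, the intermediate file names are the ones used on this page, and the flags are only those that appear in the commands above.

```shell
# Hypothetical wrapper: print the MOSAIK commands for one reference
# and one 454 read set, in the order used in this notebook.
mosaik_commands() {
    ref_fasta="$1"    # reference FASTA, e.g. chr4_19.fa
    reads_fna="$2"    # 454 reads (FASTA)
    reads_qual="$3"   # 454 base qualities
    echo "MosaikBuild -fr $ref_fasta -oa reference.dat"
    echo "MosaikJump -ia reference.dat -hs 15 -out reference_hs15.dat"
    echo "MosaikBuild -fr $reads_fna -fq $reads_qual -st 454 -out reads1.mkb"
    echo "MosaikAligner -in reads1.mkb -ia reference.dat -j reference_hs15.dat -out align1.mka"
}

# Dry run: print the commands without executing them.
mosaik_commands chr4_19.fa 1.TCA.454Reads.fna.gz 1.TCA.454Reads.qual.gz
```

Printing the commands first makes it easy to review them; piping the output to `sh -e` would run them in order and stop at the first failure.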