User:Lindenb/Notebook/UMR915/20101108

Bioinformatics Course
http://www.ncbi.nlm.nih.gov/sra/SRX005999

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_1.fastq.bz2

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_2.fastq.bz2
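The two files are the paired mates (_1 and _2) of run SRR018110, compressed with bzip2. A minimal Python sketch for streaming reads out of such a file without decompressing it to disk first; it assumes the plain four-line FASTQ record layout (header, sequence, '+' line, qualities), and the function name is hypothetical.

```python
import bz2
from typing import BinaryIO, Iterator, Tuple, Union


def read_fastq_bz2(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a bzip2-compressed FASTQ file.

    `source` may be a path or an open binary file object.
    Assumes four-line records (no line-wrapped sequences or qualities).
    """
    with bz2.open(source, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header[1:].rstrip("\n"), seq, qual
```

Usage: `sum(1 for _ in read_fastq_bz2("SRR018110_1.fastq.bz2"))` counts the reads in a local copy of the first mate file.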

Build ref sequence
MosaikBuild -fr  -oa

MosaikBuild 1.1.0018   2010-10-29
Michael Stromberg & Wan-Ping Lee   Marth Lab, Boston College Biology Department

- converting chr4_19.fa to a reference sequence archive.

- parsing reference sequences: ref seqs: 2 (0.2849 ref seqs/s)

- writing reference sequences: 100%, 0.9975 ref seqs/s, in 2 s

- calculating MD5 checksums: 100%, 2.00 ref seqs/s, in 1 s

- writing reference sequence index: 100%, 2.00 ref seqs/s, in 1 s

- creating concatenated reference sequence: 100%, 2.00 ref seqs/s, in 1 s

- writing concatenated reference sequence... finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence... finished.
- writing masking vector... finished.

MosaikBuild CPU time: 14.570 s, wall time: 17.410 s
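The log lists what MosaikBuild does with the reference (here 2 sequences): parse them, checksum each one, concatenate them, and pack the concatenation at 2 bits per base with a masking vector. The Python sketch below only illustrates those ideas on an in-memory FASTA string; it does not reproduce MOSAIK's archive format. Hashing the uppercased sequence (as for the SAM @SQ M5 tag) and storing non-ACGT bases as A with a mask flag are assumptions, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

TWO_BIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {name: sequence} for a multi-FASTA string; name = first word of the header."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def md5_of(seq: str) -> str:
    # Assumption: checksum over the uppercased sequence, as in the SAM @SQ M5 tag.
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def pack_2bit(seq: str) -> Tuple[bytearray, List[bool]]:
    """Pack bases 4 per byte (first base in the high bits).

    Non-ACGT bases (e.g. N) are stored as A (0) and flagged True in the mask.
    """
    packed = bytearray((len(seq) + 3) // 4)
    mask: List[bool] = []
    for i, base in enumerate(seq.upper()):
        code = TWO_BIT.get(base)
        mask.append(code is None)
        packed[i // 4] |= (code or 0) << (6 - 2 * (i % 4))
    return packed, mask


def build_reference(text: str) -> Tuple[Dict[str, str], Dict[str, int], bytearray, List[bool]]:
    """Checksums, start offsets in the concatenation, packed concatenation, and mask."""
    seqs = parse_fasta(text)
    checksums = {name: md5_of(s) for name, s in seqs.items()}
    offsets: Dict[str, int] = {}
    pos = 0
    for name, s in seqs.items():
        offsets[name] = pos  # where this sequence starts in the concatenation
        pos += len(s)
    packed, mask = pack_2bit("".join(seqs.values()))
    return checksums, offsets, packed, mask
```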

For a long reference sequence, run MosaikJump to build a jump database (hash size 15):
MosaikJump -ia reference.dat -hs 15 -out reference_hs15.dat

MosaikJump 1.1.0018   2010-10-29
Michael Stromberg   Marth Lab, Boston College Biology Department

- retrieving reference sequence... finished.

- hashing reference sequence: 100%, 3,463,196 hashes/s, in 01:13

- serializing final sorting vector... finished.

- writing jump positions database: 100%, 869,312.8 hash positions/s, in 04:39

- serializing jump keys database (17 blocks): blocks: 17 (1.89 blocks/s)
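MosaikJump pre-computes, for every k-mer of length `-hs` (here 15) in the reference, the positions where it occurs, so that the aligner can jump directly to candidate locations. A minimal Python sketch of such a hash-to-positions table, assuming each k-mer is encoded as a 2k-bit integer and k-mers containing non-ACGT bases are skipped; it illustrates the idea only, not MOSAIK's on-disk layout, and the names are hypothetical. With k = 15 each key fits in 30 bits and there are 4^15 = 1,073,741,824 possible keys.

```python
from collections import defaultdict
from typing import Dict, List

BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Encode a k-mer of A/C/G/T as an integer, 2 bits per base."""
    key = 0
    for base in kmer.upper():
        key = (key << 2) | BASE_BITS[base]
    return key


def build_jump_table(ref: str, k: int = 15) -> Dict[int, List[int]]:
    """Map each k-mer key to the sorted 0-based positions where it starts in `ref`."""
    table: Dict[int, List[int]] = defaultdict(list)
    mask = (1 << (2 * k)) - 1  # keep only the bits of the last k bases
    key = 0
    valid = 0  # consecutive A/C/G/T bases ending at the current position
    for i, base in enumerate(ref.upper()):
        bits = BASE_BITS.get(base)
        if bits is None:
            # an N (or other symbol) breaks every k-mer that spans it
            key = 0
            valid = 0
            continue
        key = ((key << 2) | bits) & mask  # rolling update: shift in the new base
        valid += 1
        if valid >= k:
            table[key].append(i - k + 1)
    return dict(table)
```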

Build the read database (454 FASTA and quality files)
MosaikBuild -fr 1.TCA.454Reads.fna.gz -fq 1.TCA.454Reads.qual.gz -st 454 -out reads1.mkb

MosaikBuild 1.1.0018   2010-10-29
Michael Stromberg & Wan-Ping Lee   Marth Lab, Boston College Biology Department

- setting read group ID to: ZDI4H940NP2
- setting sample name to: unknown
- setting sequencing technology to: 454
- trimming leading and lagging N's. Mates with >4 interior N's will be deleted.

- parsing FASTA files: reads: 318,127 (4,773.9 reads/s)

Filtering statistics:
================================
reads deleted:              135 (0.0 %)
leading N's trimmed:          1
lagging N's trimmed:       2703

reads written:           317992
bases written:        122512849

MosaikBuild CPU time: 65.390 s, wall time: 66.805 s
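The log says MosaikBuild trims leading and lagging N's and deletes reads with more than 4 interior N's. A Python sketch of that rule as the log message describes it; the order of operations, trimming the quality values alongside the bases, and deleting reads that are entirely N are assumptions, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_INTERIOR_N = 4  # from the log: "Mates with >4 interior N's will be deleted."


@dataclass
class FilterStats:
    deleted: int = 0
    leading_trimmed: int = 0
    lagging_trimmed: int = 0
    written: int = 0
    bases_written: int = 0


def filter_read(seq: str, qual: List[int], stats: FilterStats) -> Optional[Tuple[str, List[int]]]:
    """Trim leading/trailing N's, then drop the read if more than 4 N's remain inside it."""
    start = 0
    while start < len(seq) and seq[start] in "Nn":
        start += 1
    end = len(seq)
    while end > start and seq[end - 1] in "Nn":
        end -= 1
    if start > 0:
        stats.leading_trimmed += 1
    if end < len(seq):
        stats.lagging_trimmed += 1
    core = seq[start:end]
    # Assumption: a read that is all N after trimming is also deleted.
    if not core or sum(1 for b in core if b in "Nn") > MAX_INTERIOR_N:
        stats.deleted += 1
        return None
    stats.written += 1
    stats.bases_written += len(core)
    return core, qual[start:end]  # keep qualities aligned with the trimmed bases
```

The summary counts are consistent with the parse step: 318,127 reads parsed - 135 deleted = 317,992 written, and 122,512,849 bases / 317,992 reads is about 385 bp per read on average.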

Align
MosaikAligner -in reads1.mkb -ia reference.dat -j reference_hs15.dat -out align1.mka
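MosaikAligner uses the jump database to find candidate reference positions for each read before aligning it. A toy Python sketch of that seeding idea, assuming exact k-mer seeds and a simple vote on the implied read start (reference position minus offset in the read); MOSAIK's actual candidate selection and its Smith-Waterman alignment stage are not reproduced, and all names are hypothetical. The k-mer indexing is repeated here so the block stands alone.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Optional, Tuple

ACGT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_keys(seq: str, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, key) for every k-mer of `seq` made only of A/C/G/T."""
    mask = (1 << (2 * k)) - 1
    key = valid = 0
    for i, base in enumerate(seq.upper()):
        bits = ACGT.get(base)
        if bits is None:
            key = valid = 0  # restart after a non-ACGT base
            continue
        key = ((key << 2) | bits) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, key


def index_reference(ref: str, k: int) -> Dict[int, List[int]]:
    """k-mer key -> positions in the reference (a tiny stand-in for the jump database)."""
    table: Dict[int, List[int]] = defaultdict(list)
    for pos, key in kmer_keys(ref, k):
        table[key].append(pos)
    return table


def best_diagonal(read: str, table: Dict[int, List[int]], k: int, min_votes: int = 2) -> Optional[int]:
    """Return the reference start position supported by the most seed hits, or None."""
    votes: Counter = Counter()
    for offset, key in kmer_keys(read, k):
        for ref_pos in table.get(key, ()):
            votes[ref_pos - offset] += 1  # each hit votes for an implied read start
    if not votes:
        return None
    diag, count = votes.most_common(1)[0]
    return diag if count >= min_votes else None
```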