User:Lindenb/Notebook/UMR915/20101108
From OpenWetWare
Bioinformatics Course
http://www.ncbi.nlm.nih.gov/sra/SRX005999
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_1.fastq.bz2
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999/SRR018110_2.fastq.bz2
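A minimal sketch for fetching and decompressing the two mate files listed above, assuming the FTP URLs are still reachable and that wget and bunzip2 are installed. The `run` helper and the `DRY_RUN` switch are hypothetical names added for illustration; by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: download and decompress the two mate files of SRR018110.
# DRY_RUN defaults to 1, so commands are printed but not executed.
# Set DRY_RUN=0 to actually run them (assumption: the URLs still resolve).
DRY_RUN="${DRY_RUN:-1}"
BASE_URL="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/static/SRX005/SRX005999"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

fetch_mates() {
    for mate in 1 2; do
        file="SRR018110_${mate}.fastq.bz2"
        # -c resumes a partial download instead of starting over.
        run wget -c "${BASE_URL}/${file}" || return 1
        # -k keeps the compressed file next to the decompressed one.
        run bunzip2 -k "${file}" || return 1
    done
}

fetch_mates
```

Run as-is to inspect the four commands, then rerun with `DRY_RUN=0` to perform the download.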
MOSAIK
Build the reference sequence archive
MosaikBuild -fr <reference.fasta> -oa <reference.dat>
------------------------------------------------------------------------------
MosaikBuild 1.1.0018 2010-10-29
Michael Stromberg & Wan-Ping Lee Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------
- converting chr4_19.fa to a reference sequence archive.
- parsing reference sequences:
ref seqs: 2 (0.2849 ref seqs/s)
- writing reference sequences:
100%[==========================================================================================================] 0.9975 ref seqs/s in 2 s
- calculating MD5 checksums:
100%[==========================================================================================================] 2.00 ref seqs/s in 1 s
- writing reference sequence index:
100%[==========================================================================================================] 2.00 ref seqs/s in 1 s
- creating concatenated reference sequence:
100%[==========================================================================================================] 2.00 ref seqs/s in 1 s
- writing concatenated reference sequence... finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence... finished.
- writing masking vector... finished.
MosaikBuild CPU time: 14.570 s, wall time: 17.410 s
For a long reference sequence, run MosaikJump to build a jump database
MosaikJump -ia reference.dat -hs 15 -out reference_hs15.dat
------------------------------------------------------------------------------
MosaikJump 1.1.0018 2010-10-29
Michael Stromberg Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------
- retrieving reference sequence... finished.
- hashing reference sequence:
100%[================================================================================================================] 3,463,196 hashes/s in 01:13
- serializing final sorting vector... finished.
- writing jump positions database:
100%[========================================================================================================] 869,312.8 hash positions/s in 04:39
- serializing jump keys database (17 blocks):
blocks: 17 (1.89 blocks/s)
Build the read database (here from 454 FASTA and quality files)
MosaikBuild -fr 1.TCA.454Reads.fna.gz -fq 1.TCA.454Reads.qual.gz -st 454 -out reads1.mkb
------------------------------------------------------------------------------
MosaikBuild 1.1.0018 2010-10-29
Michael Stromberg & Wan-Ping Lee Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------
- setting read group ID to: ZDI4H940NP2
- setting sample name to: unknown
- setting sequencing technology to: 454
- trimming leading and lagging N's. Mates with >4 interior N's will be deleted.
- parsing FASTA files:
reads: 318,127 (4,773.9 reads/s)
Filtering statistics:
============================================
# reads deleted: 135 ( 0.0 %)
# leading N's trimmed: 1
# lagging N's trimmed: 2703
--------------------------------------------
# reads written: 317992
# bases written: 122512849
MosaikBuild CPU time: 65.390 s, wall time: 66.805 s
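A quick sanity check on the filtering statistics reported above, as a sketch: reads parsed minus reads deleted should match reads written, and bases written divided by reads written gives a rough mean read length. The variable names are illustrative; the numbers are copied from the log.

```shell
#!/bin/sh
# Numbers copied from the MosaikBuild log above.
reads_parsed=318127
reads_deleted=135
reads_written=317992
bases_written=122512849

# Reads parsed minus reads deleted should equal reads written.
expected_written=$((reads_parsed - reads_deleted))

# Integer mean read length (shell arithmetic truncates toward zero).
mean_read_length=$((bases_written / reads_written))

if [ "$expected_written" -eq "$reads_written" ]; then
    echo "read counts consistent: $expected_written"
else
    echo "read counts differ: expected $expected_written, got $reads_written"
fi
echo "mean read length (truncated): $mean_read_length"
```

The counts agree, and the truncated mean read length comes out at 385 bases.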
Align
MosaikAligner -in reads1.mkb -ia reference.dat -j reference_hs15.dat -out align1.mka
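The four steps above can be chained in one script. This is only a sketch: the flags are copied from the commands in these notes, while the variable names, the `run` helper and the `DRY_RUN` switch are hypothetical additions. It assumes the MOSAIK binaries are on the PATH when run for real; by default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the MOSAIK steps from these notes as a single script.
# DRY_RUN defaults to 1: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# File names are placeholders taken from the notes above.
REF_FASTA="reference.fasta"
REF_DAT="reference.dat"
HASH_SIZE=15
JUMP_DB="reference_hs${HASH_SIZE}.dat"
READS_FASTA="1.TCA.454Reads.fna.gz"
READS_QUAL="1.TCA.454Reads.qual.gz"
READS_DB="reads1.mkb"
ALIGNMENTS="align1.mka"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

mosaik_pipeline() {
    # Each step runs only if the previous one succeeded.
    run MosaikBuild -fr "$REF_FASTA" -oa "$REF_DAT" &&
    run MosaikJump -ia "$REF_DAT" -hs "$HASH_SIZE" -out "$JUMP_DB" &&
    run MosaikBuild -fr "$READS_FASTA" -fq "$READS_QUAL" -st 454 -out "$READS_DB" &&
    run MosaikAligner -in "$READS_DB" -ia "$REF_DAT" -j "$JUMP_DB" -out "$ALIGNMENTS"
}

mosaik_pipeline
```

Keeping the hash size in one variable means the jump database name and the `-hs` value cannot drift apart when the value is changed.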