RRedon:Protocols/Variation pipeline/BWA

=Download=


 * http://bio-bwa.sourceforge.net

=Indexing the reference sequence=

See main article: Reference genome

=Map=

 * Align each FASTQ file separately

-l INT: take the first INT bases of each read as the seed (32 here).

-q INT: quality threshold for read trimming (15 here).

-t INT: number of threads (10 on server 2).

bwa aln -l 32 -q 15 -t 10 -f output1.aln hg18.fasta file1.fastq.gz

bwa aln -l 32 -q 15 -t 10 -f output2.aln hg18.fasta file2.fastq.gz
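The two alignment commands differ only in the file number, so they can be generated in a loop. The following is a minimal sketch, assuming the file names used above. `REF`, `THREADS`, and `align_cmd` are hypothetical names introduced here for illustration, and the function only prints each command rather than running bwa.

```shell
# Hypothetical settings; adjust to the local reference and server.
REF=hg18.fasta
THREADS=10

# Print the bwa aln command for one FASTQ file (dry run, nothing is executed).
# $1 = input FASTQ file, $2 = output .aln file
align_cmd() {
    printf 'bwa aln -l 32 -q 15 -t %s -f %s %s %s\n' "$THREADS" "$2" "$REF" "$1"
}

# One command per mate file of the paired-end run.
for i in 1 2; do
    align_cmd "file${i}.fastq.gz" "output${i}.aln"
done
```

Printing first makes it easy to check the commands before committing 10 threads to each job; piping the output to sh would then run them.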


 * Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.

bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz | gzip --best > output.sam.gz


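 * Export to BAM

samtools view -bS output.sam > output.bam

Here output.sam is the uncompressed SAM file (e.g. after gunzip output.sam.gz); -S reads SAM input and -b writes BAM.

Alternatively, use the reference genome index produced by samtools (hg18.fa.fai), then sort and index the result:

samtools import hg18.fa.fai output.sam output.bam

samtools sort output.bam output.bam.sorted

samtools index output.bam.sorted.bam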


 * Sort BAM

samtools sort output.bam sorted_prefix

Compute insert size statistics, e.g. use the 99.8th percentile as the MAQ maximum insert size.
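One way to get such a percentile is to read the insert size column directly from the SAM output. This is a minimal sketch, not part of the original protocol: `insert_size_percentile` is a hypothetical helper, it uses the nearest-rank definition of a percentile, and it assumes column 9 of each SAM record holds the inferred insert size. It does not filter on the properly-paired flag, which may be worth adding.

```shell
# Hypothetical helper: print the nearest-rank percentile of positive insert sizes.
# Usage: insert_size_percentile PERCENTILE < file.sam
insert_size_percentile() {
    # Skip "@" header lines and keep only positive values in column 9,
    # so that each read pair is counted once.
    awk -F '\t' '!/^@/ && $9 > 0 { print $9 }' |
        sort -n |
        awk -v p="$1" '
            { v[NR] = $1 }
            END {
                if (NR == 0) exit 1          # no positive insert sizes found
                x = p / 100 * NR
                k = int(x); if (k < x) k++   # ceiling of x
                if (k < 1) k = 1
                if (k > NR) k = NR
                print v[k]
            }'
}
```

For the compressed file produced above, this could be called as gunzip -c output.sam.gz | insert_size_percentile 99.8.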