User:Timothee Flutre/Notebook/Postdoc/2013/12/01: Difference between revisions
From OpenWetWare
(→One-liners for high-throughput sequencing data: add strand) |
(→One-liners for high-throughput sequencing data: add info diff chroms) |
||
Line 108: | Line 108: | ||
samtools view -f 0x0008 -F 0x0004 -F 0x0100 -c $OUT | samtools view -f 0x0008 -F 0x0004 -F 0x0100 -c $OUT | ||
* ''' | * '''different chromosomes''': same as lines 12 and 13 | ||
samtools view -F 0x0004 -F 0x0008 -F 0x0100 $OUT | cut -f7 | sort | uniq -c # to list all possible values the 7-th field is taking | |||
samtools view -F 0x0004 -F 0x0008 -F 0x0100 $OUT | awk '($7 != "=")' | wc -l | |||
samtools view -q 5 -F 0x0004 -F 0x0008 -F 0x0100 $OUT | awk '($7 != "=")' | wc -l | |||
* '''strand''': get all alignments on the reverse strand (use <code>-F 0x10</code> for the forward) | |||
samtools view -f 0x10 $OUT | cut -f2 | sort | uniq -c | samtools view -f 0x10 $OUT | cut -f2 | sort | uniq -c |
Revision as of 03:26, 14 July 2015
Project name | <html><img src="/images/9/94/Report.png" border="0" /></html> Main project page Next entry<html><img src="/images/5/5c/Resultset_next.png" border="0" /></html> |
One-liners for high-throughput sequencing data
IN1="reads_R1.fastq.gz" IN2="reads_R2.fastq.gz" OUT="alignments.bam"
nbLines=$(zcat $IN1 | wc -l); echo "scale=0; "${nbLines}"/4" | bc -l
samtools view $OUT | head -1 | tr "\t" "\n" | nl -n ln
samtools flagstat $OUT which returns something like: line 1: 4635834 + 0 in total (QC-passed reads + QC-failed reads) line 2: 20290 + 0 secondary line 3: 0 + 0 supplimentary line 4: 0 + 0 duplicates line 5: 4443270 + 0 mapped (95.85%:-nan%) line 6: 4615544 + 0 paired in sequencing line 7: 2307772 + 0 read1 line 8: 2307772 + 0 read2 line 9: 4299122 + 0 properly paired (93.14%:-nan%) line 10: 4412810 + 0 with itself and mate mapped line 11: 10170 + 0 singletons (0.22%:-nan%) line 12: 57898 + 0 with mate mapped to a different chr line 13: 44330 + 0 with mate mapped to a different chr (mapQ>=5)
samtools view $OUT | cut -f2 | sort | uniq -c | sort -k1,1nr -k2,2n # | awk '{sum+=$1}END{print sum}' # <- same as line 1 list of bits used to form the flags this website is useful to interpret flags for instance, if there are N entries with flag 99, there should also be N entries with flag 147; idem for 83 and 163; idem for 77 and 141; etc
samtools view -c $OUT
samtools view -F 0x0004 -c $OUT
samtools view -f 0x0004 -c $OUT
samtools view $OUT | cut -f1 | sort | uniq | wc -l
samtools view -f 0x0100 -c $OUT # same as line 2 samtools view -f 0x0800 -c $OUT
nbE1=$(samtools view -f 0x0040 -c $OUT) nbE2=$(samtools view -f 0x0080 -c $OUT) echo "$nbE1 + $nbE2" | bc -l # same as line 1
samtools view -f 0x0040 $OUT | cut -f1 | sort | uniq -c > tmp1 samtools view -f 0x0080 $OUT | cut -f1 | sort | uniq -c > tmp2 wc -l < tmp1 # same as line 7 wc -l < tmp2 # same as line 8 cat tmp1 | awk '{print $1}' | sort | uniq -c cat tmp2 | awk '{print $1}' | sort | uniq -c
samtools view -f 0x0001 -c $OUT # same as line 1 if both input fastq files have the same number of reads samtools view -f 0x0001 -F 0x0100 -c $OUT # same as line 6 Note that this is different from line 6 which is equal to line 1 - line 2.
samtools view -f 0x0002 -F 0x0100 -c $OUT
samtools view -f 0x0002 -F 0x0100 $OUT | cut -f1 | sort | uniq | wc -l
samtools view -f 0x0008 -F 0x0004 -F 0x0100 -c $OUT
samtools view -F 0x0004 -F 0x0008 -F 0x0100 $OUT | cut -f7 | sort | uniq -c # to list all possible values the 7-th field is taking samtools view -F 0x0004 -F 0x0008 -F 0x0100 $OUT | awk '($7 != "=")' | wc -l samtools view -q 5 -F 0x0004 -F 0x0008 -F 0x0100 $OUT | awk '($7 != "=")' | wc -l
samtools view -f 0x10 $OUT | cut -f2 | sort | uniq -c samtools flags 113 # idem for 117, 121, etc |