User:Timothee Flutre/Notebook/Postdoc/2012/05/25: Difference between revisions
From OpenWetWare
(→About one-liners in data wrangling: add link to bash guide)
(→About one-liners in data wrangling: add get first bases in fastq)
(2 intermediate revisions by the same user not shown)
** [http://www.tldp.org/LDP/abs/html/ Advanced Bash-Scripting Guide] by Mendel Cooper
** [http://quinlanlab.org/tutorials/cshl2013/bedtools.html tutorial for bedtools]
** [http://www.commentcamarche.net/faq/8386-kit-de-survie-linux kit de survie Linux] (Linux survival kit, in French)
$ echo -e "gene1\tsnp1" > file_subset.txt
$ awk 'NR==FNR{a[$1$2]++;next;}{x=$1$2;if(x in a)print $0}' file_subset.txt <(sed 1d file_all.txt)
* '''Get length of each sequence in a fasta file''':
$ awk 'BEGIN{RS=">"} {split($0,a,"\n"); if(length(a)==0) next; seqlen=0; for(i=2;i<=length(a);++i){seqlen += length(a[i])}; printf "%s\t%d\n", a[1], seqlen}' sequences.fa
* '''Get the bases 6 to 9 of each sequence in a fastq file''': provided that each read spans exactly 4 lines
$ zcat reads.fq.gz | awk '(NR % 4 == 2)' | cut -c 6-9
Revision as of 07:47, 15 January 2015
About one-liners in data wrangling
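* '''Delete lines 3 to 6 of the input''':
$ for i in {1..10}; do echo $i; done | sed 3,6d
* '''Print only lines 3 to 5 of the input''':
$ for i in {1..20}; do echo $i; done | sed -n 3,5p
* '''Compute absolute values with awk''' (awk has no built-in abs function):
$ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'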
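* '''For each gene, keep the SNP with the lowest p-value''': sort by gene then by p-value (numerically, with <code>g</code>), swap the columns so that <code>uniq -f2</code> compares genes only, then swap them back
$ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
$ cat dat.txt
gene	snp	pvalue
g1	s1	0.3
g1	s2	0.002
g2	s2	0.7
g2	s3	0.05
$ cat dat.txt | sed 1d | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2 | awk '{print $3"\t"$2"\t"$1}'
g1	s2	0.002
g2	s3	0.05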
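The sort-then-deduplicate pipeline above can also be wrapped in a small function. This is only a sketch: the function name <code>best_snp_per_gene</code> is hypothetical, and it swaps the column-swapping <code>uniq -f2</code> trick for the awk idiom <code>!seen[$1]++</code>, which keeps the first row seen for each gene. It assumes tab-separated input with a header line and a <code>sort</code> that supports <code>-g</code> (GNU and BSD do).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original notebook.
# Reads a tab-separated table (header + columns gene, snp, pvalue) on stdin
# and prints, for each gene, the row with the smallest p-value.
best_snp_per_gene() {
  sed 1d |                                    # drop the header line
    LC_ALL=C sort -t $'\t' -k1,1 -k3,3g |     # by gene, then p-value (general numeric)
    awk -F'\t' '!seen[$1]++'                  # keep only the first row of each gene
}

# Same toy data as in the one-liner above.
printf 'gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05\n' |
  best_snp_per_gene
```

Using <code>-g</code> rather than a plain lexical sort matters as soon as p-values are written in scientific notation such as <code>1e-8</code>.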
* '''Loop over all pairs of elements of an array''':
$ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
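* '''Convert a fasta file into a fastq file with dummy quality scores''': assumes that each sequence is on a single line
$ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@%s\n%s\n+\n", a[1], a[2]; for(i=1;i<=length(a[2]);i++)printf "}"; printf "\n"}' probes.fa > probes.fq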
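* '''Sort a file on its second column while keeping the header line first''': the producer is grouped in parentheses so that the header also goes through the pipe
$ (echo -e "x\ty"; for i in {1..10}; do echo -e $i"\t"$RANDOM; done) | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)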
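The header-preserving sort above relies on <code>read</code> consuming exactly one line from the pipe, so that <code>sort</code> only sees the remaining lines. A minimal sketch of the same idea as a reusable function follows; the name <code>sort_keep_header</code> is hypothetical, and any arguments are passed straight to <code>sort</code>.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original notebook.
# Prints the first line of stdin unchanged, then sorts the rest with the
# options given as arguments.
sort_keep_header() {
  # read takes one line off stdin; sort then consumes whatever is left.
  IFS= read -r header && printf '%s\n' "$header"
  sort "$@"
}

# Toy table: the header stays on top, rows are ordered by the second column.
printf 'x\ty\n1\t30\n2\t10\n3\t20\n' | sort_keep_header -k2,2n
```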
* '''Extract from a file the rows whose first two columns are listed in another file''':
$ echo -e "gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1" > file_all.txt
$ echo -e "gene1\tsnp1" > file_subset.txt
$ awk 'NR==FNR{a[$1$2]++;next;}{x=$1$2;if(x in a)print $0}' file_subset.txt <(sed 1d file_all.txt)
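The awk command above uses the two-file idiom: <code>NR==FNR</code> is true only while awk reads its first file, so that block builds a lookup table, and the second block filters the other file against it. The sketch below spells this out in a function with the hypothetical name <code>filter_by_pairs</code>. As a variation on the original, it indexes the array with <code>$1, $2</code> (joined by awk's <code>SUBSEP</code>) instead of the plain concatenation <code>$1$2</code>, so that pairs such as "ab"/"c" and "a"/"bc" are not confused. Note that the idiom assumes the first file is not empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original notebook.
# $1: file listing the (gene, snp) pairs to keep, tab-separated
# $2: full tab-separated table, with a header line
filter_by_pairs() {
  awk -F'\t' '
    NR == FNR { keep[$1, $2] = 1; next }   # first file: remember each pair
    FNR == 1  { next }                     # second file: skip the header
    ($1, $2) in keep                       # print rows whose pair was remembered
  ' "$1" "$2"
}

# Same toy data as above, written to a temporary directory.
tmpdir=$(mktemp -d)
printf 'gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1\n' > "$tmpdir/file_all.txt"
printf 'gene1\tsnp1\n' > "$tmpdir/file_subset.txt"
filter_by_pairs "$tmpdir/file_subset.txt" "$tmpdir/file_all.txt"
rm -r "$tmpdir"
```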
* '''Get length of each sequence in a fasta file''':
$ awk 'BEGIN{RS=">"} {split($0,a,"\n"); if(length(a)==0) next; seqlen=0; for(i=2;i<=length(a);++i){seqlen += length(a[i])}; printf "%s\t%d\n", a[1], seqlen}' sequences.fa
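For comparison, here is a sketch of the same computation done line by line rather than by setting <code>RS=">"</code>. It is not the approach used above: it accumulates the length of every sequence line until the next header, which also covers sequences wrapped over several lines. The function name <code>fasta_lengths</code> is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original notebook.
# Reads a fasta file on stdin and prints "header<TAB>length" per sequence.
fasta_lengths() {
  awk '
    /^>/ {                                   # header line: flush the previous record
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($0, 2); len = 0; next    # drop the leading ">" and reset the count
    }
    { len += length($0) }                    # sequence line: add its length
    END { if (name != "") printf "%s\t%d\n", name, len }
  '
}

# Toy input: seq1 is wrapped over two lines.
printf '>seq1 desc\nACGT\nAC\n>seq2\nGGG\n' | fasta_lengths
```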
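* '''Get the bases 6 to 9 of each sequence in a fastq file''': provided that each read spans exactly 4 lines
$ zcat reads.fq.gz | awk '(NR % 4 == 2)' | cut -c 6-9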
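The fastq one-liner above picks every second line out of four (the sequence line) and cuts a fixed range of characters. A sketch of a parameterized variant follows, doing both steps in awk with <code>substr</code>; the function name <code>fastq_bases</code> and its two arguments are hypothetical. It makes the same assumption that each read spans exactly 4 lines, and it expects uncompressed input on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original notebook.
# $1, $2: 1-based positions of the first and last base to keep.
# Reads an uncompressed 4-line-per-read fastq file on stdin.
fastq_bases() {
  awk -v first="$1" -v last="$2" \
    'NR % 4 == 2 { print substr($0, first, last - first + 1) }'
}

# Toy input with two reads of 10 bases each.
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nTTTTTGGGGC\n+\nIIIIIIIIII\n' |
  fastq_bases 6 9
```

For a compressed file, the input could be piped in as in the original command, for instance with <code>zcat reads.fq.gz | fastq_bases 6 9</code>.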