User:Timothee Flutre/Notebook/Postdoc/2012/05/25: Difference between revisions
From OpenWetWare
(→One-liners with GNU tools: add "loop over pairs")
==One-liners with GNU tools==
* '''Tutorial''': [http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html Introduction to text manipulation on UNIX-based systems] by Brad Yoes (IBM)
* '''Toolbox''':
** [http://en.wikipedia.org/wiki/AWK AWK]
** grep
** sed
** cut
** tr
** wc
* '''Skip a subset of successive lines''':
 $ for i in {1..10}; do echo $i; done | sed 3,6d
* '''Extract a subset of successive lines''':
 $ for i in {1..20}; do echo $i; done | sed -n 3,5p
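The two sed range commands above can be wrapped into small reusable functions. This is only a sketch: the names "skip_lines" and "extract_lines" are hypothetical and not part of the original notes.

```shell
# Hypothetical helper names, chosen for illustration only.
# Both read from standard input and take a first and a last line number.
skip_lines() { sed "${1},${2}d"; }        # drop lines $1..$2
extract_lines() { sed -n "${1},${2}p"; }  # keep only lines $1..$2

seq 1 10 | skip_lines 3 6     # prints 1 2 7 8 9 10, one per line
seq 1 20 | extract_lines 3 5  # prints 3 4 5, one per line
```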
* '''Use absolute values''':
 $ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'
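A user-defined "abs" function is also handy in an awk pattern, for instance to keep only the values that are far from zero. The sketch below assumes GNU seq (which accepts a negative start); the function name "filter_abs" and the cutoff of 3 are arbitrary choices for illustration.

```shell
# Hypothetical helper: keep the input values whose absolute value
# is at least the cutoff given as first argument.
filter_abs() {
  awk -v cutoff="$1" '
    function abs(x) { return (x < 0) ? -x : x }
    abs($1) >= cutoff { print $1 }'
}

seq -5 5 | filter_abs 3   # prints -5 -4 -3 3 4 5, one per line
```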
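* '''Extract the best SNP per gene''' (the one with the smallest p-value): sort by gene and then by p-value ("-g" compares numbers, including scientific notation), move the gene to the last column, and let "uniq -f2" compare only that column so that the first row of each gene is kept:
 $ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
 $ cat dat.txt
 gene snp pvalue
 g1 s1 0.3
 g1 s2 0.002
 g2 s2 0.7
 g2 s3 0.05
 $ cat dat.txt | sed 1d | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2
 0.002 s2 g1
 0.05 s3 g2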
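As an alternative to the sort and uniq pipeline, the same selection can be done in a single awk pass that remembers, for each gene, the row with the smallest p-value. This is a sketch, not the method used in the original notes: the function name "best_snp_per_gene" is hypothetical, and the input is assumed to be tab-separated with one header line, as in "dat.txt".

```shell
# Recreate the toy input used in the example above.
printf 'gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05\n' > dat.txt

# Hypothetical helper: print, for each gene, the row with the smallest p-value.
# The "+ 0" forces a numeric comparison; the header line (NR == 1) is skipped.
best_snp_per_gene() {
  awk -F'\t' 'NR > 1 {
      if (!($1 in best) || $3 + 0 < best[$1] + 0) { best[$1] = $3; row[$1] = $0 }
    }
    END { for (g in row) print row[g] }' "$1" | sort -k1,1
}

best_snp_per_gene dat.txt
# g1	s2	0.002
# g2	s3	0.05
```

Unlike the pipeline, this keeps the original column order, and the final sort is only there to make the output order predictable.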
* '''Loop over pairs''':
 $ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
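The one-liner above hard-codes the array bounds (0..2 and 3). A possible variant, sketched here with the positional parameters instead of a bash array, works for any number of items and in any POSIX shell. The names "all_pairs", "first" and "other" are hypothetical.

```shell
# Hypothetical helper: print every unordered pair of its arguments.
# At each step the first argument is removed and paired with all that remain.
all_pairs() {
  while [ "$#" -gt 1 ]; do
    first=$1
    shift
    for other in "$@"; do
      printf '%s %s\n' "$first" "$other"
    done
  done
}

all_pairs s1 s2 s3 s4
# s1 s2
# s1 s3
# s1 s4
# s2 s3
# s2 s4
# s3 s4
```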
* '''Convert a file from FASTA to FASTQ''': set the built-in variable "RS" ([http://www.gnu.org/software/gawk/manual/gawk.html#Records split records]) to ">" so that each record holds one sequence, then use "split" ([http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions string function]) to separate the header from the sequence. This assumes that each sequence is written on a single line, and it gives every base the same placeholder quality character "}":
 <nowiki>
 $ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@%s\n%s\n+\n",a[1],a[2]; \
 for(i=1;i<=length(a[2]);i++)printf "}"; printf "\n"}' probes.fa > probes.fq
 </nowiki>
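If the FASTA file wraps sequences over several lines, the same idea can be extended by joining all lines after the header before printing. This is only a sketch: the function name "fasta_to_fastq" and the file "example.fa" are hypothetical, and headers are assumed not to contain the ">" character.

```shell
# Hypothetical helper: convert a FASTA file (possibly with wrapped sequences)
# to FASTQ, using "}" as a placeholder quality for every base.
fasta_to_fastq() {
  awk 'BEGIN { RS = ">" }
    NF == 0 { next }                       # skip the empty record before the first ">"
    {
      n = split($0, lines, "\n")           # lines[1] is the header
      bases = ""
      for (i = 2; i <= n; i++) bases = bases lines[i]   # join wrapped sequence lines
      qual = ""
      for (i = 1; i <= length(bases); i++) qual = qual "}"
      printf "@%s\n%s\n+\n%s\n", lines[1], bases, qual
    }' "$1"
}

# Toy input made up for this example.
printf '>p1 desc\nACGT\nAC\n>p2\nGG\n' > example.fa
fasta_to_fastq example.fa
# @p1 desc
# ACGTAC
# +
# }}}}}}
# @p2
# GG
# +
# }}
```

Passing the header and sequence as arguments to "%s" rather than inside the format string avoids problems when a header contains a "%" character.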
Revision as of 11:11, 22 April 2013