User:Timothee Flutre/Notebook/Postdoc/2012/05/25: Difference between revisions

Revision as of 11:19, 26 June 2013



One-liners with GNU tools

Tutorial: Introduction to text manipulation on UNIX-based systems by Brad Yoes (IBM)

Toolbox:
- AWK
- grep
- sed
- cut
- tr
- wc

Skip a subset of successive lines:

$ for i in {1..10}; do echo $i; done | sed 3,6d
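The same range deletion can be parameterized. A minimal sketch, assuming a hypothetical helper named skip_lines (not part of the original notes) that takes the first and last line numbers to drop:

```shell
# skip_lines FIRST LAST: hypothetical helper, drops lines FIRST..LAST of stdin
skip_lines() {
  awk -v first="$1" -v last="$2" 'NR < first || NR > last'
}

# Drop lines 3 to 6 of a ten-line input
seq 1 10 | skip_lines 3 6
```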

Extract a subset of successive lines:

$ for i in {1..20}; do echo $i; done | sed -n 3,5p
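On large inputs it can help to stop reading once the last wanted line has been printed. A minimal sketch, assuming a hypothetical helper named extract_lines (not part of the original notes):

```shell
# extract_lines FIRST LAST: hypothetical helper, prints lines FIRST..LAST of
# stdin and quits right after LAST, so the rest of the input is never read
extract_lines() {
  sed -n "${1},${2}p;${2}q"
}

# Print lines 3 to 5 of a twenty-line input
seq 1 20 | extract_lines 3 5
```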

Use absolute values:

$ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'

Extract the best SNP per gene (the one with the smallest p-value):

$ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
$ cat dat.txt
gene    snp     pvalue
g1      s1      0.3
g1      s2      0.002
g2      s2      0.7
g2      s3      0.05

$ sed 1d dat.txt | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2
0.002   s2      g1
0.05    s3      g2
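A single awk pass can also pick the minimum without sorting first, comparing p-values as numbers so that values such as 1e-5 are handled. This is a sketch under assumptions: the helper name best_snp_per_gene is hypothetical, and the input is taken to be tab-separated with a header line and the columns gene, snp, pvalue, as in dat.txt.

```shell
# best_snp_per_gene FILE: hypothetical helper; FILE is tab-separated with a
# header line and the columns gene, snp, pvalue
best_snp_per_gene() {
  awk -F'\t' '
    # Skip the header; keep a row if its gene is new or its p-value is smaller
    NR > 1 && (!($1 in best) || $3 + 0 < best[$1]) {
      best[$1] = $3 + 0
      line[$1] = $0
    }
    END { for (g in line) print line[g] }
  ' "$1" | sort -k1,1   # awk array order is unspecified, so sort by gene
}

# Demo on a temporary copy of the example data
demo=$(mktemp)
printf 'gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05\n' > "$demo"
best_snp_per_gene "$demo"
rm -f "$demo"
```

Unlike the sort and uniq pipeline, this keeps the original column order (gene, snp, pvalue).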

Loop over pairs:

$ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
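The pair loop can also be written without hard-coded indices, so it adapts to any number of items. A minimal sketch, assuming a hypothetical helper named print_pairs (not part of the original notes) that receives the items as arguments:

```shell
# print_pairs ITEM...: hypothetical helper, prints every unordered pair once
print_pairs() {
  while [ "$#" -gt 1 ]; do
    pair_first=$1
    shift                       # the remaining arguments are the partners
    for pair_second in "$@"; do
      echo "$pair_first $pair_second"
    done
  done
}

print_pairs s1 s2 s3 s4
```

Shifting the argument list plays the role of the inner index starting at i+1 in the one-liner.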

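Convert a file from FASTA to FASTQ: set the built-in variable "RS" (record separator) to ">" so that each record holds one sequence, then use "split" (string function) to separate the header line from the sequence. The sequence is assumed to sit on a single line, and every base gets the placeholder quality character "}":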

$ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@%s\n%s\n+\n", a[1], a[2]; \
for(i=1;i<=length(a[2]);i++)printf "}"; printf "\n"}' probes.fa > probes.fq
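The one-liner above reads only the first sequence line of each record (a[2]), so a sequence wrapped over several lines would be truncated. A minimal sketch of a variant that joins wrapped lines, assuming a hypothetical helper named fasta_to_fastq and keeping the same placeholder quality character:

```shell
# fasta_to_fastq FILE: hypothetical helper; joins wrapped sequence lines and
# writes a placeholder quality string (one "}" per base, as in the one-liner)
fasta_to_fastq() {
  awk '
    function flush_record(   qual, i) {
      if (!have) return                      # nothing read yet
      qual = ""
      for (i = 1; i <= length(seq); i++) qual = qual "}"
      printf "@%s\n%s\n+\n%s\n", name, seq, qual
    }
    /^>/ { flush_record(); name = substr($0, 2); seq = ""; have = 1; next }
         { seq = seq $0 }                    # append a sequence line
    END  { flush_record() }
  ' "$1"
}

# Demo on a temporary file with one wrapped and one single-line sequence
demo=$(mktemp)
printf '>p1\nACGT\nAC\n>p2\nGG\n' > "$demo"
fasta_to_fastq "$demo"
rm -f "$demo"
```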

Sort a file with a header line, that is, keep the first line in place and sort only the remaining lines:

$ { echo -e "x\ty"; for i in {1..10}; do echo -e "$i\t$RANDOM"; done; } | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)
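The read-then-sort idea can be wrapped in a reusable function. A minimal sketch, assuming a hypothetical helper named sort_keep_header (not part of the original notes) that forwards its arguments to sort:

```shell
# sort_keep_header [SORT_OPTIONS...]: hypothetical helper; passes the first
# line of stdin through unchanged and sorts the remaining lines
sort_keep_header() {
  IFS= read -r header_line && printf '%s\n' "$header_line"
  sort "$@"
}

# Sort numerically on the second column, leaving the header on top
printf 'x\ty\n1\t30\n2\t10\n3\t20\n' | sort_keep_header -k2,2n
```

This works on a pipe because read consumes exactly one line, leaving the rest of the stream for sort.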

@@ Line 6: / Line 6: @@
 | colspan="2"|
 <!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
-==Awk one-liner to convert fasta file into fastq format==
+==One-liners with GNU tools==
-* We can use the built-in variable "RS" ([http://www.gnu.org/software/gawk/manual/gawk.html#Records split records]) and use "split" ([http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions string function]):
+* '''Tutorial''': [http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html Introduction to text manipulation on UNIX-based systems] by Brad Yoes (IBM)
+* '''Toolbox''':
+** [http://en.wikipedia.org/wiki/AWK AWK]
+** grep
+** sed
+** cut
+** tr
+** wc
+* '''Skip a subset of successive lines''':
+ for i in {1..10}; do echo $i; done | sed 3,6d
+* '''Extract a subset of successive lines''':
+ $ for i in {1..20}; do echo $i; done | sed -n 3,5p
+* '''Use absolute values:'''
+ $ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'
+* '''Extract the best snp per gene''':
+ $ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
+ gene    snp     pvalue
+ g1      s1      0.3
+ g1      s2      0.002
+ g2      s2      0.7
+ g2      s3      0.05
+ $ sed 1d dat.txt | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2
+ 0.002   s2      g1
+ 0.05    s3      g2
+* '''Loop over pairs''':
+ $ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
+* '''Convert file from fasta to fastq''': we can use the built-in variable "RS" ([http://www.gnu.org/software/gawk/manual/gawk.html#Records split records]) and use "split" ([http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions string function]):
   <nowiki>
-awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@"a[1]"\n"a[2]"\n+\n"; \
+$ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@"a[1]"\n"a[2]"\n+\n"; \
 for(i=1;i<=length(a[2]);i++)printf "}"; printf"\n"}' probes.fa > probes.fq
 </nowiki>
+* '''Sort a file with header line''': that is, we don't want the first line to be sorted
+ $ { echo -e "x\ty"; for i in {1..10}; do echo -e "$i\t$RANDOM"; done; } | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)
 <!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
