User:Timothee Flutre/Notebook/Postdoc/2012/05/25

From OpenWetWare

(Difference between revisions)
(One-liners with GNU tools: add "extract the best snp per gene")
(One-liners with GNU tools: add tuto cmd-line)
(3 intermediate revisions not shown.)
Line 8: Line 8:
==One-liners with GNU tools==
==One-liners with GNU tools==
-
* '''Tutorial''': [http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html Introduction to text manipulation on UNIX-based systems] by Brad Yoes (IBM)
+
* '''Toolbox''': usually available by default on computers running GNU/Linux
 +
** [https://en.wikipedia.org/wiki/Bash_%28Unix_shell%29 Bash]
 +
** [https://en.wikipedia.org/wiki/AWK AWK]
 +
** [https://en.wikipedia.org/wiki/Grep grep]
 +
** [https://en.wikipedia.org/wiki/Sed sed]
 +
** [https://en.wikipedia.org/wiki/GNU_Core_Utilities GNU coreutils] (head, tail, cut, uniq, sort, tr, ...)  
-
* '''Toolbox''':
+
* '''Tutorials''':
-
** [http://en.wikipedia.org/wiki/AWK AWK]
+
** [http://en.flossmanuals.net/command-line/index/ Introduction to the command-line]
-
** grep
+
** [http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html Introduction to text manipulation on UNIX-based systems] by Brad Yoes (IBM)
-
** sed
+
-
** cut
+
-
** tr
+
-
** wc
+
Line 44: Line 45:
  g1      s2      0.002
  g1      s2      0.002
  g2      s3      0.05
  g2      s3      0.05
 +
 +
 +
* '''Loop over pairs''':
 +
 +
$ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
Line 52: Line 58:
for(i=1;i<=length(a[2]);i++)printf "}"; printf"\n"}' probes.fa > probes.fq
for(i=1;i<=length(a[2]);i++)printf "}"; printf"\n"}' probes.fa > probes.fq
</nowiki>
</nowiki>
 +
 +
 +
* '''Sort a file with header line''': that is, we don't want the first line to be sorted
 +
 +
$ { echo -e "x\ty"; for i in {1..10}; do echo -e "$i\t$RANDOM"; done; } | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)
 +
 +
 +
* '''Get rows from a big file which are also in a small file''': awk reads two input files here. It first loads the keys (gene and snp) from the small file into an in-memory array, then reads the big file line by line and prints each line whose key is in the array
 +
 +
$ echo -e "gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1" > file_all.txt
 +
$ echo -e "gene1\tsnp1" > file_subset.txt
 +
$ awk 'NR==FNR{a[$1,$2];next} ($1,$2) in a' file_subset.txt <(sed 1d file_all.txt)
 +
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->

Revision as of 01:00, 4 November 2013


One-liners with GNU tools

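  • Toolbox: usually available by default on computers running GNU/Linux
      ◦ Bash
      ◦ AWK
      ◦ grep
      ◦ sed
      ◦ GNU coreutils (head, tail, cut, uniq, sort, tr, ...)
  • Tutorials:
      ◦ Introduction to the command-line (http://en.flossmanuals.net/command-line/index/)
      ◦ Introduction to text manipulation on UNIX-based systems, by Brad Yoes (IBM) (http://www.ibm.com/developerworks/aix/library/au-unixtext/index.html)
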

  • Skip a subset of successive lines:
$ for i in {1..10}; do echo $i; done | sed 3,6d


  • Extract a subset of successive lines:
$ for i in {1..20}; do echo $i; done | sed -n 3,5p
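
On a large input, `sed -n 3,5p` keeps reading to the end of the stream even though nothing more will be printed. A minimal sketch that quits right after the last wanted line; the helper name `extract_lines` is hypothetical and not part of the original notes:

```shell
# extract_lines START END: print lines START..END of stdin, then stop reading.
# (hypothetical helper name, used here for illustration only)
extract_lines() {
  # "p" prints the selected range; "q" on line END makes sed exit immediately
  sed -n "${1},${2}p;${2}q"
}

seq 1 20 | extract_lines 3 5
```

The `q` command only saves time; the printed lines are the same as with the plain `sed -n 3,5p` form.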


  • Use absolute values:
$ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'


  • Extract the best snp per gene:
$ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
$ cat dat.txt
gene    snp     pvalue
g1      s1      0.3
g1      s2      0.002
g2      s2      0.7
g2      s3      0.05
$ sed 1d dat.txt | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2 | awk '{print $3"\t"$2"\t"$1}'
g1      s2      0.002
g2      s3      0.05
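
The same result can be obtained in a single pass with awk, without sorting the whole file first. This is only a sketch under a few assumptions: the input is tab-separated with one header line, the p-value is in column 3, and the names `dat_demo.txt` and `best_snp_per_gene` are hypothetical.

```shell
# Recreate the example input in a separate (hypothetical) file
printf 'gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05\n' > dat_demo.txt

# best_snp_per_gene FILE: keep, for each gene, the line with the smallest p-value
best_snp_per_gene() {
  awk -F'\t' 'NR > 1 {
      p = $3 + 0                          # force a numeric comparison
      if (!($1 in best) || p < best[$1]) { best[$1] = p; line[$1] = $0 }
    }
    END { for (g in line) print line[g] }' "$1" | sort -k1,1
}

best_snp_per_gene dat_demo.txt
```

Only one line per gene is kept in memory, and the final `sort` is there just to make the output order deterministic, since awk does not guarantee the order of `for (g in line)`.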


  • Loop over pairs:
$ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
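
A variant that does not hard-code the number of subgroups, sketched with positional parameters instead of array indices. The function name `print_pairs` is hypothetical:

```shell
# print_pairs A B C ...: print every unordered pair of the arguments, one per line.
# (hypothetical helper name, used here for illustration only)
print_pairs() {
  while [ "$#" -gt 1 ]; do
    first=$1
    shift                         # the remaining arguments are the possible partners
    for other in "$@"; do
      echo "$first $other"
    done
  done
}

print_pairs s1 s2 s3 s4
```

With a Bash array, the call becomes `print_pairs "${subgroups[@]}"`.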


  • Convert a fasta file into a fastq file: each base gets the same dummy quality character "}"
$ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@%s\n%s\n+\n", a[1], a[2]; \
for(i=1;i<=length(a[2]);i++)printf "}"; printf "\n"}' probes.fa > probes.fq
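
The awk command above appears to turn FASTA records into FASTQ records with a constant dummy quality, and it uses only the first sequence line of each record. Below is a sketch of a variant that also joins sequences wrapped over several lines. The names `probes_demo.fa` and `fasta_to_fastq` are hypothetical, and headers are assumed not to contain a ">" character.

```shell
# Small hypothetical input: probe1 has its sequence wrapped over two lines
printf '>probe1\nACGT\nAC\n>probe2\nGG\n' > probes_demo.fa

# fasta_to_fastq FILE: write FASTQ to stdout, using "}" as dummy quality
fasta_to_fastq() {
  awk 'BEGIN { RS = ">"; FS = "\n" }
       NF == 0 { next }                              # skip the empty record before the first ">"
       {
         seq = ""
         for (i = 2; i <= NF; i++) seq = seq $i      # join wrapped sequence lines
         qual = seq
         gsub(/./, "}", qual)                        # one quality character per base
         printf "@%s\n%s\n+\n%s\n", $1, seq, qual
       }' "$1"
}

fasta_to_fastq probes_demo.fa
```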


  • Sort a file with header line: that is, we don't want the first line to be sorted
$ { echo -e "x\ty"; for i in {1..10}; do echo -e "$i\t$RANDOM"; done; } | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)
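
When the data are already in a file rather than coming from a pipe, the header can be split off with `head` and `tail`. A minimal sketch; the file name `xy_demo.txt` and the function name `sort_keep_header` are hypothetical:

```shell
# Small hypothetical input with a header line
printf 'x\ty\n1\t30\n2\t10\n3\t20\n' > xy_demo.txt

# sort_keep_header FILE: print the first line as is, then the rest sorted on column 2
sort_keep_header() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -k2,2n
}

sort_keep_header xy_demo.txt
```

This reads the file twice, so it does not work on a stream; the `read -r` form is the one to use inside a pipeline.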


  • Get rows from a big file which are also in a small file: awk reads two input files here. It first loads the keys (gene and snp) from the small file into an in-memory array, then reads the big file line by line and prints each line whose key is in the array
$ echo -e "gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1" > file_all.txt
$ echo -e "gene1\tsnp1" > file_subset.txt
$ awk 'NR==FNR{a[$1,$2];next} ($1,$2) in a' file_subset.txt <(sed 1d file_all.txt)
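
The two-file idiom can be wrapped in a small function that skips the header of the big file inside awk, so no process substitution is needed. This is a sketch: the names `all_demo.txt`, `subset_demo.txt` and `rows_in_subset` are hypothetical, and the small file is assumed to be non-empty (otherwise `NR == FNR` would also be true for the big file).

```shell
# Recreate the example inputs in separate (hypothetical) files
printf 'gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1\n' > all_demo.txt
printf 'gene1\tsnp1\n' > subset_demo.txt

# rows_in_subset SMALL BIG: print rows of BIG whose first two columns appear in SMALL
rows_in_subset() {
  awk 'NR == FNR { keep[$1, $2]; next }      # first file: remember each (gene, snp) pair
       FNR > 1 && ($1, $2) in keep           # second file: skip the header, print matches
      ' "$1" "$2"
}

rows_in_subset subset_demo.txt all_demo.txt
```

Using `keep[$1, $2]` rather than a plain concatenation of the two fields keeps the pairs distinct, because awk joins the subscripts with its `SUBSEP` character.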


