User:Timothee Flutre/Notebook/Postdoc/2012/05/25: Difference between revisions

Revision as of 11:42, 11 October 2013

Project name


One-liners with GNU tools

Toolbox (available by default on many Linux systems):
- Bash
- AWK
- grep
- sed
- GNU coreutils (head, tail, cut, uniq, sort, tr, ...)

Tutorial: Introduction to text manipulation on UNIX-based systems by Brad Yoes (IBM)

Skip a subset of successive lines:

$ for i in {1..10}; do echo $i; done | sed 3,6d

Extract a subset of successive lines:

$ for i in {1..20}; do echo $i; done | sed -n 3,5p
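
The two sed commands above can also be written with awk's built-in line counter NR, which may be easier to combine with other conditions. This is only a sketch: the function names skip_lines and keep_lines are hypothetical helpers introduced here, not existing tools.

```shell
# skip_lines FIRST LAST: drop lines FIRST..LAST from stdin (like sed 'FIRST,LASTd')
skip_lines() {
  awk -v first="$1" -v last="$2" 'NR < first || NR > last'
}

# keep_lines FIRST LAST: print only lines FIRST..LAST from stdin (like sed -n 'FIRST,LASTp')
keep_lines() {
  awk -v first="$1" -v last="$2" 'NR >= first && NR <= last'
}

seq 1 10 | skip_lines 3 6   # prints 1 2 7 8 9 10, one per line
seq 1 20 | keep_lines 3 5   # prints 3 4 5, one per line
```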

Use absolute values:

$ for i in {-5..5}; do echo $i; done | awk 'function abs(x){return (((x < 0.0) ? -x : x) + 0.0)} {print abs($1)}'

Extract the best snp per gene:

$ echo -e "gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05" > dat.txt
$ cat dat.txt
gene    snp     pvalue
g1      s1      0.3
g1      s2      0.002
g2      s2      0.7
g2      s3      0.05

$ sed 1d dat.txt | sort -k1,1 -k3,3g | awk '{print $3"\t"$2"\t"$1}' | uniq -f2 | awk '{print $3"\t"$2"\t"$1}'
g1      s2      0.002
g2      s3      0.05
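
As an alternative to the sort/uniq pipeline, the minimum p-value per gene can be tracked in a single awk pass. The sketch below assumes a tab-separated file with one header line and the gene in column 1 and the p-value in column 3, as in dat.txt. The function name best_snp_per_gene is hypothetical.

```shell
# Rebuild the example data (same content as dat.txt above).
printf 'gene\tsnp\tpvalue\ng1\ts1\t0.3\ng1\ts2\t0.002\ng2\ts2\t0.7\ng2\ts3\t0.05\n' > dat.txt

# best_snp_per_gene FILE: for each gene (column 1), keep the row with the
# smallest p-value (column 3). The header line is skipped.
best_snp_per_gene() {
  awk -F'\t' 'NR > 1 {
      p = $3 + 0                       # force numeric comparison
      if (!($1 in best) || p < best[$1]) {
        best[$1] = p
        row[$1] = $0
      }
    }
    END { for (g in row) print row[g] }' "$1" | sort -k1,1
}

best_snp_per_gene dat.txt
```

The final sort is there only because awk does not guarantee any order when looping over an array.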

Loop over pairs:

$ subgroups=("s1" "s2" "s3" "s4"); for i in {0..2}; do let a=$i+1; for j in $(seq $a 3); do s1=${subgroups[$i]}; s2=${subgroups[$j]}; echo $s1 $s2; done; done
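
The same enumeration of unordered pairs can be sketched without array indices, by shifting the argument list. This version works for any number of subgroups. The function name pairs is a hypothetical helper.

```shell
# pairs ITEM...: print every unordered pair of the arguments, one pair per line.
pairs() {
  while [ "$#" -gt 1 ]; do
    first="$1"
    shift                              # remaining arguments are the possible partners
    for other in "$@"; do
      echo "$first $other"
    done
  done
}

pairs s1 s2 s3 s4
```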

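Convert a file from FASTA to FASTQ: set the built-in variable RS (record separator) to ">" so that each record holds one sequence, then use the string function split() to separate the header line from the sequence. The quality line is filled with a dummy character ("}"), and each sequence is assumed to sit on a single line:

$ awk 'BEGIN{RS=">"} {if(NF==0)next; split($0,a,"\n"); printf "@%s\n%s\n+\n", a[1], a[2]; \
for(i=1;i<=length(a[2]);i++)printf "}"; printf "\n"}' probes.fa > probes.fq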
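
FASTA files often wrap a sequence over several lines. A possible extension of the idea above, given as a sketch only, also sets FS to a newline so that field 1 is the header and the remaining fields are the sequence lines to join. The function name fasta_to_fastq is hypothetical, and the "}" quality character is the same placeholder as above, not a real quality score.

```shell
# fasta_to_fastq [FILE]: convert FASTA to FASTQ with a dummy quality string.
# Sequences wrapped over several lines are joined into one.
fasta_to_fastq() {
  awk 'BEGIN { RS = ">"; FS = "\n" }
    NF == 0 { next }                   # skip the empty record before the first ">"
    {
      seq = ""
      for (i = 2; i <= NF; i++) seq = seq $i               # join wrapped sequence lines
      qual = ""
      for (i = 1; i <= length(seq); i++) qual = qual "}"   # placeholder quality
      printf "@%s\n%s\n+\n%s\n", $1, seq, qual
    }' "$@"
}

printf '>probe1\nACGT\nAC\n>probe2\nGGC\n' | fasta_to_fastq
```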

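Sort a file with a header line: that is, keep the first line in place and sort only the remaining lines. The header and the data are grouped in a subshell so that both go through the pipe:

$ (echo -e "x\ty"; for i in {1..10}; do echo -e "$i\t$RANDOM"; done) | (read -r; printf "%s\n" "$REPLY"; sort -k2,2n)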
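
When the data already sits in a file rather than in a stream, the header can be split off with head and tail instead of read. A minimal sketch, where the function name sort_keep_header and the file xy.txt are made up for the illustration:

```shell
# sort_keep_header FILE [SORT OPTIONS...]: print the first line of FILE
# unchanged, then the remaining lines sorted with the given options.
sort_keep_header() {
  file="$1"
  shift
  head -n 1 "$file"
  tail -n +2 "$file" | sort "$@"
}

printf 'x\ty\n1\t30\n2\t10\n3\t20\n' > xy.txt
sort_keep_header xy.txt -k2,2n
```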

Get rows from a big file which are also in a small file: awk can read two input files in turn. The key columns of the small file are first loaded into an array in memory; the big file is then parsed line by line, and a line is printed only if its key is in the array:

$ echo -e "gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1" > file_all.txt
$ echo -e "gene1\tsnp1" > file_subset.txt
$ awk 'NR==FNR{a[$1,$2]++;next;}{if(($1,$2) in a)print $0}' file_subset.txt <(sed 1d file_all.txt)
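
The two-file idiom relies on NR (lines read so far over all files) being equal to FNR (lines read so far in the current file) only while the first file is being read. An expanded, commented sketch of the same filter follows. The function name rows_in_subset and the array name keep are hypothetical.

```shell
printf 'gene\tsnp\tpvalue\ngene1\tsnp1\t0.002\ngene2\tsnp2\t0.8\ngene2\tsnp3\t0.1\n' > file_all.txt
printf 'gene1\tsnp1\n' > file_subset.txt

# rows_in_subset SMALL BIG: print rows of BIG whose first two columns
# also appear as a row of SMALL.
rows_in_subset() {
  awk -F'\t' '
    NR == FNR { keep[$1, $2] = 1; next }   # first file: remember each (gene, snp) pair
    ($1, $2) in keep                       # second file: print rows whose pair was seen
  ' "$1" "$2"
}

rows_in_subset file_subset.txt file_all.txt
```

Here the header of file_all.txt is simply not printed because its first two columns do not occur in file_subset.txt. Note that the NR==FNR test assumes the first file is not empty.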

