User:Timothee Flutre/Notebook/Postdoc/2013/12/27

From OpenWetWare

Revision as of 12:41, 22 January 2014

About organizing computer-based research

  • Motivation: when starting a new project, it is very handy to quickly and easily set up a portable structure, so that the project can be backed up on other machines and shared with collaborators, and so that the work can be reproduced/replicated by colleagues.
  • OS choice: everyone usually has a preferred operating system. Yet, in scientific projects where computing is an important aspect of the research, the most frequently used one is GNU/Linux. Thus, even if it is always good to know one's way around other operating systems, such as Microsoft Windows and Apple Mac OS X, I focus on GNU/Linux in what follows.
  • My history: in 2006, during an internship in a bioinformatics lab, I discovered GNU/Linux. More specifically, I worked on a Fedora distribution and was able to install it on my laptop. From 2007 to 2010, during my PhD, I switched to Debian and then Ubuntu on my laptop, and I used several computer clusters running Solaris and CentOS.
    • This looks like an anthology of weird names but, fundamentally, all these systems are more or less similar to each other and can be described as Unix-like systems. Please note, however, that they are not all equivalent in terms of protecting your freedom. Michael Kerrisk presents this quite well (pdf: http://man7.org/conf/udes2012/Linux_and_Free_Software.pdf). It is indeed important to know the difference between GNU and Linux and, for those who have read the biography of Steve Jobs, I highly recommend reading the biography of Richard Stallman (founder of GNU).
  • Home:
    • I create a set of directories, via mkdir -p bin include lib share src src_ext texmf tmp work:
      • bin: contains executables;
      • include: contains C/C++ header files;
      • lib: contains C/C++ shared libraries;
      • share: contains documentation;
      • src: contains source code from my own packages;
      • src_ext: contains source code from external packages;
      • texmf: contains LaTeX packages;
      • tmp: contains files for temporary tasks;
      • work: contains projects.
    • This structure is reflected in my ~/.bash_profile file:
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$HOME/bin:$PATH
export PATH
CFLAGS="$CFLAGS -I$HOME/include"
export CFLAGS
LDFLAGS="$LDFLAGS -L$HOME/lib"
export LDFLAGS
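
The layout and the PATH setting above can be tried out safely on a throw-away directory before touching the real home directory. The following is only a minimal sketch: the helper names setup_home_layout and prepend_to_path, and the variable demo_root, are hypothetical and are not part of my actual configuration.

```shell
#!/usr/bin/env bash
# Sketch: create the home layout under a given root and put its bin/ on PATH.
# "setup_home_layout", "prepend_to_path" and "demo_root" are hypothetical names.

setup_home_layout () {
  # $1: root directory under which the layout is created
  for d in bin include lib share src src_ext texmf tmp work; do
    mkdir -p "$1/$d"
  done
}

# Prepend a directory to PATH only if it is not already listed,
# so that sourcing the profile twice does not duplicate entries.
prepend_to_path () {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present: nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

# Example on a temporary directory rather than the real $HOME
demo_root="$(mktemp -d)"
setup_home_layout "$demo_root"
prepend_to_path "$demo_root/bin"
prepend_to_path "$demo_root/bin"   # second call is a no-op
export PATH
ls "$demo_root"
```

On a real account, one would call setup_home_layout "$HOME" once, and the duplicate check is optional: the plain PATH=$HOME/bin:$PATH line works too, it simply adds the entry again each time the profile is sourced.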

  • External packages: for each external package in src_ext, I create a directory whose name is in upper case, e.g. EMACS, in which I create a file install.bash with the commands needed to compile and install the package:
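#!/usr/bin/env bash
# Stop at the first failing command
set -e
# Download and unpack the source archive
wget http://gnu.mirrors.hoobly.com/gnu/emacs/emacs-24.3.tar.gz
tar xzvf emacs-24.3.tar.gz
cd emacs-24.3
# Install under $HOME, without X toolkit and image libraries
./configure --prefix="$HOME" --with-x-toolkit=no --with-xpm=no --with-jpeg=no --with-gif=no --with-tiff=no
make
make install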
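
Since every package directory follows the same convention, all the install.bash files can in principle be replayed in one go, for instance on a new machine. This is a sketch under that assumption only: the function install_all and the log file name install.log are hypothetical, not something I describe above.

```shell
#!/usr/bin/env bash
# Sketch: run the install.bash of every package directory under src_ext.
# "install_all" and "install.log" are hypothetical names.

install_all () {
  # $1: directory holding one sub-directory per package (default: ~/src_ext)
  src_ext="${1:-$HOME/src_ext}"
  for script in "$src_ext"/*/install.bash; do
    [ -f "$script" ] || continue          # skip when the glob matches nothing
    pkg_dir="$(dirname "$script")"
    echo "installing $(basename "$pkg_dir")"
    # Run in a subshell so that the "cd" does not leak into the caller,
    # and keep a log next to the script.
    ( cd "$pkg_dir" && bash install.bash > install.log 2>&1 ) \
      || echo "failed: $(basename "$pkg_dir") (see $pkg_dir/install.log)" >&2
  done
}
```

Calling install_all without argument would process ~/src_ext; a failing package is reported on stderr and the loop moves on to the next one.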

  • Projects:
    • for each project in work, I create a set of directories, via mkdir -p analysis doc download figures preprocessing scripts src:
      • analysis: contains the outputs (exploratory, temporary, final) of the analyses;
      • doc: contains the documentation needed to replicate the whole project, usually as a README.org file;
      • download: contains the data sets obtained externally;
      • figures: contains all figures, used in the README and the manuscript;
      • preprocessing: contains the outputs of the preprocessing, then used to obtain outputs in analysis;
      • scripts: contains all scripts, usually used for preprocessing;
      • src: contains my own source code, usually used in the analyses and not yet mature enough to be in ~/src.
    • to share my work with colleagues, I use tar -czvf project.tar.gz --exclude=project/download project
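
The project layout and the archive command above can be wrapped into two small functions. This is a minimal sketch, run here on a temporary directory: the names new_project, share_project and work_dir are hypothetical, and the two example files are made up only to show that download/ is left out of the archive.

```shell
#!/usr/bin/env bash
# Sketch: create a project skeleton and archive it without download/.
# "new_project", "share_project" and "work_dir" are hypothetical names.

new_project () {
  # $1: path of the project directory to create
  for d in analysis doc download figures preprocessing scripts src; do
    mkdir -p "$1/$d"
  done
}

share_project () {
  # $1: parent directory (e.g. ~/work), $2: project name
  # -C makes tar store paths relative to the parent directory.
  tar -czf "$1/$2.tar.gz" --exclude="$2/download" -C "$1" "$2"
}

# Example on a temporary directory
work_dir="$(mktemp -d)"
new_project "$work_dir/project"
echo "raw data" > "$work_dir/project/download/data.txt"   # made-up file
echo "notes" > "$work_dir/project/doc/README.org"         # made-up file
share_project "$work_dir" project
tar -tzf "$work_dir/project.tar.gz"    # list the content of the archive
```

Excluding download/ keeps the archive small; the README in doc/ should then say where the external data sets can be obtained.
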
  • Choices: I strive for freedom-protection (à la free software), portability, longevity, robustness and modularity:
    • editing text: I use Emacs;
    • documenting projects: I use org-mode (a major argument for using Emacs);
    • writing code: I start from one of my templates for bash, Python, R and C++;
    • versioning: I use git;
    • developing packages: I use the Autotools;
    • presenting: I use LaTeX (for papers) and Beamer (for talks), and occasionally LibreOffice Writer and Impress (for papers and talks, respectively);
    • drawing: I use GIMP and Inkscape;
    • backup: I use rsync, via a script backup.bash:
#!/usr/bin/env bash
# Usage: backup.bash <path_to_backup> >& backup.log &
date
# Options are stored in an array so that the pattern "*~" stays quoted
RSYNC_OPT=(--compress --recursive --times --perms --links --exclude="*~" --delete --delete-excluded --progress)
rsync "${RSYNC_OPT[@]}" ~/remote1/work/project1 "$1"
rsync "${RSYNC_OPT[@]}" ~/remote1/work/project2 "$1"
date
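
Because --delete and --delete-excluded remove files on the destination, it can be reassuring to preview a backup first. The sketch below is one possible way to do it, not the script I actually use: the function backup_projects and the variables RSYNC and DRY_RUN are hypothetical, while --dry-run is a standard rsync option.

```shell
#!/usr/bin/env bash
# Sketch: back up several projects in one loop, with an optional preview.
# "backup_projects", RSYNC and DRY_RUN are hypothetical names.

backup_projects () {
  # $1: destination, remaining arguments: project directories
  dest="$1"
  shift
  # With --dry-run, rsync lists what would change but copies nothing
  extra=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    extra="--dry-run"
  fi
  for project in "$@"; do
    # RSYNC can be overridden (e.g. RSYNC=echo) to print the command only
    ${RSYNC:-rsync} --compress --recursive --times --perms --links \
      --exclude="*~" --delete --delete-excluded $extra "$project" "$dest" \
      || echo "backup failed for $project" >&2
  done
}
```

For instance, DRY_RUN=1 backup_projects <path_to_backup> ~/remote1/work/project1 ~/remote1/work/project2 previews both transfers, and the same call without DRY_RUN=1 performs them.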

