Wayne:High Throughput Sequencing Resources: Difference between revisions

Revision as of 08:44, 20 February 2013

Basic unix and usage of Sirius (our lab server)

Sirius is our analytical powerhouse (64 cores, amazing for parallel computing; 512 GB of memory; 64-bit x86_64 architecture), and we have specific locations on the server for specific jobs. It lives in a lovely server closet, so you access it through a secure shell (ssh). Your username and password are obtained from our IT staff. Once you have logged on, there is a series of commands and "server etiquette" you will need to follow. A PDF version of these rules is available at http://openwetware.org/images/5/5e/Sirius_rules.pdf.

Login

ssh user@sirius.eeb.ucla.edu
slogin user@sirius.eeb.ucla.edu
To learn about the server:
- uname -a --- prints the kernel name, hostname, kernel release and machine architecture
To change the default password you are given, use:
- passwd
To log out of the server:
- logout (or control+D)
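To confirm the hardware described above from the command line, a read-only check like the following should work on any Linux machine, Sirius included. The expected values in the comments (64 cores, x86_64) come from the description at the top of this page; the memory line assumes a Linux /proc file system.

```shell
#!/usr/bin/env bash
# Read-only look at the machine you are logged in to (Linux).
uname -a                      # kernel name, hostname, kernel release, architecture, ...
uname -m                      # architecture only; x86_64 on Sirius
nproc                         # processing units available to you; 64 on Sirius
grep MemTotal /proc/meminfo   # total memory in kB (Linux /proc file system)
```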

Structure and organization

Your home (user) directory can hold only up to 5 GB of data (be aware!)
- /home/user
For genomes and databases
- /databases
Location of installed programs
- /usr/local/bin
- /opt/
The location to store your data
- /data/
- /data/user
  - You can create your own personal directory if you'd like (see below for commands)
The location to place scripts and data ONLY while you are working with them
- /work/user
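Because the home directory is capped, it can help to check its size before it fills up. The sketch below wraps du in a small function; check_quota is a hypothetical helper name (not a Sirius tool), and the 5 GB limit is the one quoted above, interpreted here in binary units (1 GB = 1024*1024 KiB).

```shell
#!/usr/bin/env bash
# Sketch: warn when a directory grows past a size limit given in GB.
# check_quota is a hypothetical helper, not an installed Sirius command.

check_quota() {
  local dir="$1" limit_gb="$2"
  local used_kb limit_kb
  used_kb=$(du -sk "$dir" | cut -f1)        # disk usage of dir in KiB
  limit_kb=$(( limit_gb * 1024 * 1024 ))     # GB -> KiB (binary units)
  if [ "$used_kb" -gt "$limit_kb" ]; then
    echo "WARNING: $dir uses $used_kb KiB, over the limit of $limit_kb KiB" >&2
    return 1                                 # non-zero status: over the limit
  fi
  echo "$dir uses $used_kb KiB of $limit_kb KiB"
}

# Usage on Sirius (after logging in):
#   check_quota "$HOME" 5
```

A non-zero exit status means you are over the limit and should move data to /data/user.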

Rules

Developing a pipeline:
- copy a small but representative part of your data to sirius
- run all the programs you need on them
- debug, and save the final version of the pipeline (e.g. in a text file)
- copy all your data
- run your pipeline on all data
- debug and update the pipeline
- mv the results wherever you want
- erase the data
Never start more jobs than the number of available cores (e.g. if 50 jobs are already running, do NOT submit more than 14, for a total of 64 jobs)!!
Look at the memory and CPU usage before you start loading Sirius with commands (cmd):
- htop --- view real-time CPU and memory usage
- top --- displays the most CPU-intensive processes/jobs and gives an ongoing, real-time look at processor activity. It provides an interactive interface for manipulating processes and can sort tasks by CPU usage, memory usage and runtime.
- mpstat --- reports processor-related statistics (by default, the average over all CPUs)
- mpstat -P ALL --- reports activity for each available processor, processor 0 being the first one, along with the global average over all processors
- sar --- displays the contents of selected cumulative activity counters in the operating system
If you don't know something, use the manual:
- man ls --- look up what the ls tool does (works for any command)
- otherwise, use Google, or ask the admins (Jonathan or Ron) or in-lab (Rena or Pedro)
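The core-counting rule above (64 cores minus 50 running jobs leaves room for 14 more) can be checked before you launch anything. This is a sketch under one loud assumption: the 1-minute load average from /proc/loadavg (Linux only) is used as a rough proxy for the number of jobs already running, rounded up to stay on the safe side. htop, top and mpstat -P ALL give a more detailed picture. free_cores is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: how many more single-core jobs can I start right now?
# ASSUMPTION: the 1-minute load average approximates "jobs already running".

free_cores() {
  local total load
  total=$(nproc)                          # e.g. 64 on Sirius
  load=$(cut -d ' ' -f1 /proc/loadavg)    # e.g. 50.12 (Linux only)
  # round the load up, subtract it from the core count, never go below 0
  awk -v t="$total" -v l="$load" 'BEGIN {
    busy = int(l); if (l > busy) busy++
    f = t - busy; if (f < 0) f = 0
    print f
  }'
}

n=$(free_cores)
echo "You can start at most $n more jobs right now."
```

Rounding the load up means a load of 50.1 on 64 cores reports 13 free cores rather than 14, which errs toward leaving a core idle.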

Installing programs yourself

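Check if it's already installed (e.g. which programName)
mkdir ~/bin --- create a bin directory in your home folder
cat ~/.bash_profile --- check whether ~/bin is already in your PATH; if not, add these two lines to ~/.bash_profile (then log out and back in, or run source ~/.bash_profile):
PATH=$PATH:$HOME/bin
export PATH
compile the program and install it into ~/bin (for autotools-style programs, ./configure --prefix=$HOME followed by make and make install puts the executables in ~/bin)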
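The steps above can be scripted so that they are safe to repeat. In this sketch, is_installed and add_bin_to_path are hypothetical helper names, and samtools is used only as an example program name; the profile file is passed in as an argument so you can try it on a scratch file first, and the lines it appends are the same two PATH lines shown above.

```shell
#!/usr/bin/env bash
# Sketch: check for a program, then make sure ~/bin is on your PATH.

# Returns success if an executable called $1 is found on the PATH.
is_installed() {
  command -v "$1" >/dev/null 2>&1
}

# Appends the two PATH lines to the given profile file, but only once.
add_bin_to_path() {
  local profile="$1"           # normally "$HOME/.bash_profile"
  mkdir -p "$HOME/bin"         # -p: no error if ~/bin already exists
  touch "$profile"
  if ! grep -qF 'PATH=$PATH:$HOME/bin' "$profile"; then
    # single quotes keep $PATH and $HOME literal, so they expand at login time
    printf '%s\n' 'PATH=$PATH:$HOME/bin' 'export PATH' >> "$profile"
  fi
}

# Example check (samtools is only an example name here)
if is_installed samtools; then
  echo "samtools is already installed at $(command -v samtools)"
else
  echo "samtools not found; install it into ~/bin"
fi

# For autotools-style source packages, installing under your home directory
# usually looks like this (run inside the unpacked source directory):
#   ./configure --prefix="$HOME" && make && make install
```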

Data transfer (network)

scp options user@host_source:path/to/file1 user@host_dest:/dest/path/to/file2 --- Command Line Interface (CLI) for moving files
scp -r user@host_source:path/to/dir user@host_dest:/dest/path --- CLI for moving directories (-r = recursive)
FileZilla, Cyberduck, Fugu, etc. --- Graphical User Interface (GUI)
Before moving data, check that there is enough space available:
df -h --- check disk usage and free space on each file system
du -hs /path --- check the total disk space used by a directory
du -h --max-depth=1 /path --- check the disk space used by each subdirectory of a directory
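Following the check-space-first advice above, a small sketch can compare the size of what you are about to move with the free space on the destination. enough_space is a hypothetical helper; it uses du -sk and the POSIX df -Pk output format (one line per file system, free space in column 4). It only handles local paths, so for a remote destination you would run the df part on that machine, e.g. via ssh.

```shell
#!/usr/bin/env bash
# Sketch: is there room on the destination file system for this data?
# enough_space is a hypothetical helper; local paths only.

enough_space() {
  local src="$1" dest="$2" need have
  need=$(du -sk "$src" | cut -f1)                    # size of source in KiB
  have=$(df -Pk "$dest" | awk 'NR == 2 {print $4}')  # free KiB on dest's file system
  echo "need $need KiB, have $have KiB"
  [ "$need" -lt "$have" ]                            # success only if it fits
}

# Usage (placeholder paths):
#   enough_space /work/user/results /data/user && cp -r /work/user/results /data/user/
```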

Data editing

- vim filename --- make small edits to a file directly on the server

History

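ctrl+r --- search your command history (type part of a command; press ctrl+r again for older matches)
history --- display your numbered command history
!cmd_num --- rerun the command with that number in the history list (e.g. !42)
The up arrow is a shortcut to scroll through recently used commands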

Files

ls --- lists your files
ls -l --- lists your files in long format
ls -a --- shows hidden files
ls -t --- sorted by time modified instead of name
more filename --- shows a file one screen at a time; hit the space bar to see more
head filename --- print the first 10 lines (by default) of the specified file to the screen
tail filename --- print the last 10 lines (by default) of the specified file to the screen
emacs filename --- an editor for editing a file
cp filename1 filename2 --- copies filename1 to filename2 in your current location
cp path/to/filename1 path/to/filename2 --- copies a file to or from another location
rm filename --- permanently remove a file (Caution! This cannot be undone!)
diff filename1 filename2 --- compares files and shows where they differ
wc filename --- tells you how many lines, words (whitespace-delimited) and bytes are in a file
wc -l filename --- tells you how many lines (newline-delimited) are in a file
wc -w filename --- tells you how many words (whitespace-delimited) are in a file
wc -c filename --- tells you how many bytes are in a file
chmod options filename --- change the read, write and execute permissions for a file (see man chmod, or Google this!)
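As a worked example for sequencing data, the commands above can count reads in a FASTQ file, assuming the standard four lines per read (header, sequence, '+', qualities). The same head command is a quick way to cut the small test subset recommended in the pipeline rules. The file names are made up, and the first reads of a run are not guaranteed to be representative of the whole file.

```shell
#!/usr/bin/env bash
# Sketch: everyday file commands applied to a (tiny, made-up) FASTQ file.
cd "$(mktemp -d)" || exit 1

printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCAACC\n+\nIIIIIIII\n' > example.fastq

head -n 4 example.fastq                  # first record only
lines=$(wc -l < example.fastq)           # '<' makes wc print just the number
echo "reads: $(( lines / 4 ))"           # 4 lines per read -> prints "reads: 2"

# Pipeline test stage: keep the first 10,000 reads (40,000 lines)
head -n 40000 example.fastq > test_subset.fastq
wc -l test_subset.fastq
```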

File compression

gzip filename --- compresses a file, replacing it with filename.gz
gzip -c filename >filename.gz --- writes a compressed copy to filename.gz and keeps the original; -c sends the output to the screen (stdout) and ">" redirects it into filename.gz
gunzip filename.gz --- uncompress a gzip file
tar -xzf filename.tar.gz --- extract (decompress and unpack) a tar.gz archive
zcat filename.gz --- lets you look at a gzipped file without having to gunzip it (the command is called gzcat on macOS)
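The difference between a plain .gz file and a .tar.gz archive is easiest to see in a round trip. This sketch uses placeholder names (sample.fastq, results/) created in a temporary directory; tar -czf is the create counterpart of the tar -xzf command listed above.

```shell
#!/usr/bin/env bash
# Sketch: compressing, peeking into, and archiving files.
cd "$(mktemp -d)" || exit 1
printf '@r1\nACGT\n+\nIIII\n' > sample.fastq

gzip -c sample.fastq > sample.fastq.gz   # keeps sample.fastq, writes a compressed copy
zcat sample.fastq.gz | head -n 2         # peek inside (gzcat on macOS)
zcat sample.fastq.gz | wc -l             # count lines without gunzipping

# A .tar.gz bundles a whole directory: create with -c, extract with -x
mkdir -p results && cp sample.fastq results/
tar -czf results.tar.gz results          # c = create, z = gzip, f = archive file name
rm -r results
tar -xzf results.tar.gz                  # restores results/
```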

Directories

pwd --- prints working directory (your current location)
cd /path/to/desired/location --- change directories by providing path
cd ../ --- go up one directory
mkdir directoryName --- make a new directory
rmdir directoryName --- remove a directory (it must be empty)... Remember that you cannot undo this!
rm -r directoryName --- recursively remove a directory and all the files it contains... Remember that you cannot undo this!
rm filename --- remove the specified file (rmdir only works on empty directories)... Remember that you cannot undo this!
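The distinction between rmdir and rm -r is easy to test safely in a scratch directory. In this sketch, project/ and its contents are throwaway names created in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: rmdir only removes empty directories; rm -r removes a whole tree.
cd "$(mktemp -d)" || exit 1

mkdir -p project/raw project/results    # -p creates parent dirs, no error if they exist
touch project/results/out.txt

rmdir project/raw                        # works: raw/ is empty
if ! rmdir project/results 2>/dev/null; then
  echo "rmdir refused: project/results is not empty"
fi
rm -r project                            # removes project/ and everything in it (no undo!)
```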

Finding things

whereis command --- locates the binary, source and manual page files for a command
~/path --- the tilde (~) is a shortcut for the path to your home directory
nohup commands & --- start a no-hangup background job that keeps running after you log out (writes stdout to nohup.out unless you redirect it)
screen --- start a new screen session to run a background job (press ctrl+a, then d, to detach; screen -ls to list running screens; screen -r pid to reattach)
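For long pipeline runs, the nohup pattern above is usually combined with an explicit log file and the job's process ID. This is a minimal sketch; the sh -c command is a placeholder for your own program, and screen remains the interactive alternative when you want to come back to a live session.

```shell
#!/usr/bin/env bash
# Sketch: run a job that keeps going after you log out, with its own log file.
# The sh -c '...' part is a stand-in; replace it with your real command.
cd "$(mktemp -d)" || exit 1

# redirecting stdout and stderr means nohup does not create nohup.out
nohup sh -c 'sleep 1; echo "step finished"' > step.log 2>&1 &
pid=$!                                   # process ID of the background job
echo "started job $pid; follow it with: tail -f step.log"

# In an interactive session you could now log out. In a script,
# wait blocks until the job is done and returns its exit status.
wait "$pid"
```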


High throughput (HT) platform and read types

CBI Collaboratory

File formats and conversions

Deplexing using barcoded sequence tags

Quality control

Trimming and clipping

FASTQC and FASTX tools

BED and SAM tools

GATK variant calling

R basics

Python basics

HT sequence analysis using R (and Bioconductor)

DNA sequence analysis

RNA-seq analysis

SOLiD software tools
