Wayne:High Throughput Sequencing Resources: Difference between revisions

Revision as of 16:12, 19 February 2013

Sirius Usage (our lab server)

Sirius is our analytical powerhouse (64 cores, well suited to parallel computing; 512 GB of memory; a 64-bit x86_64 system), and we have specific locations on the server for specific jobs. It is stored in a lovely server closet, so the way to access it is through a secure shell (ssh). Your username and password are obtained through our IT staff. Once you have logged on, there is a set of commands and "server etiquette" you will need to follow.

Login

ssh user@sirius.eeb.ucla.edu
slogin user@sirius.eeb.ucla.edu
to learn about the server:
- uname -a
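
Once you are logged in, a few more standard commands describe the machine you have landed on. The sketch below wraps them in one function; the name server_info is made up for illustration and is not something installed on Sirius.

```shell
# Hypothetical helper: summarize the machine you are logged in to.
server_info() {
    uname -a                                     # kernel, hostname, architecture
    echo "cores: $(getconf _NPROCESSORS_ONLN)"   # number of online CPU cores
    df -h "${HOME:-.}"                           # free space on the disk holding your home
}

server_info
```

Run it right after ssh to confirm you are on the server you expect before starting any work.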

Structure and organization

Your home (user) directory holds <5 GB of data (be aware!)
- /home/user
For genomes and databases
- /databases
Location of installed programs
- /usr/local/bin
- /opt/
The location to store your data
- /data/
- /data/user
  - You can create your own personal directory if you'd like (see below for commands)
The location to place scripts
- /work/user
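
As a rough sketch of how the layout above might be used, the two helpers below create a personal directory under the data area and report how much space a directory takes up. The function names are hypothetical, and the default root /data simply follows the list above.

```shell
# Hypothetical helper: create a personal directory under a data root (default /data).
make_data_dir() {
    root="${1:-/data}"
    name="${2:-$(id -un)}"          # defaults to your own username
    mkdir -p "$root/$name" && echo "$root/$name"
}

# Hypothetical helper: disk usage of a directory (default: your home) in kilobytes.
dir_usage_kb() {
    du -sk "${1:-$HOME}" | cut -f1
}

# Typical use on the server (not run here):
#   make_data_dir      # creates /data/<your username>
#   dir_usage_kb       # compare with the home limit, about 5242880 KB (5 GB)
```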

Rules

Developing a pipeline:
- copy a small but representative part of your data to sirius
- run all the programs you need on them
- debug and save final version of pipeline e.g. in a text file
- copy all your data
- run your pipeline on all data
- debug and update pipeline
- mv results wherever you want
- erase data
- never start more jobs than the number of available cores
- look at the memory and cpu usage before you start to load sirius with commands (cmd): htop or top
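
Two of these rules are easy to script. The sketch below assumes your reads are in standard 4-line FASTQ format (an assumption; adjust for other formats) and uses hypothetical function names: one cuts a small test subset from a file, the other caps a requested job count at the number of cores on the machine.

```shell
# Hypothetical helper: keep the first N reads of a FASTQ file (4 lines per read).
subset_fastq() {
    reads="$1"
    infile="$2"
    head -n "$((reads * 4))" "$infile"
}

# Hypothetical helper: cap a requested job count at the number of online cores.
max_jobs() {
    wanted="$1"
    cores=$(getconf _NPROCESSORS_ONLN)
    if [ "$wanted" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$wanted"
    fi
}

# Typical use (file names are placeholders):
#   subset_fastq 10000 sample.fastq > sample.test.fastq
#   max_jobs 128
```

Note that max_jobs only looks at the total core count; it does not know what other users are running, so checking htop or top first is still needed.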


Command	Usage
ssh username@sirius.eeb.ucla.edu	Secure shell login to the Sirius server
logout (or control+D)	Logout of the Sirius server
pwd	Print working directory (your current location)
ls	List the contents of the current location
ls options	ls -a (hidden files), ls -l (long/detailed list), ls -t (sorted by time modified instead of name)
cd /give/path	Change directories
cd ..	Go up one directory
mkdir directoryName	Make a new directory
rmdir directoryName	Remove directory (must be empty). Remember that you cannot undo this!
rm -r directoryName	Recursively remove a directory and the files it contains. Remember that you cannot undo this!
rm filename	Remove the specified file. Remember that you cannot undo this!
head filename	Print the first 10 lines (by default) of the specified file to the screen
tail filename	Print the last 10 lines (by default) of the specified file to the screen
more filename	Send file contents or piped output to the screen one page at a time
less filename	Like more, but also lets you scroll backward and search within the file
wc filename	Print line, word, and byte counts
wc [options] filename	-c (bytes); -l (lines); -w (words) delimited by whitespace or newline
whereis [filename, command]	Lists the locations of the binary, source, and manual page files for the given name
mv current/path destination/path	Move (akin to cut/paste); the file no longer exists in its original location. Also used to rename files
cp current/path destination/path	Copy (akin to copy/paste); the original stays in the current path. Copying to a new name in the same directory duplicates the file
~/path	The tilde (~) is a shortcut for the path to your home directory
nohup command &	Start a no-hangup background job, which keeps running after you log out
screen	Start a new screen session; jobs started inside it keep running after you detach or disconnect
tar -xzf filename.tar.gz	Decompress and extract a tar.gz archive
gzip -c filename > filename.gz	Compress a file into filename.gz and keep the original; the ">" redirects the output to filename.gz
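
A short session tying several of these commands together may help. This is only a sketch: it works in a throwaway temporary directory, and the file names and the sleep command standing in for a real job are made up.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Make a small file, compress it (keeping the original), and read it back.
printf 'ACGT\nTTGA\n' > reads.txt
gzip -c reads.txt > reads.txt.gz
gunzip -c reads.txt.gz | wc -l

# Bundle a directory into a tar.gz archive, remove it, then extract it again.
mkdir results
cp reads.txt results/
tar -czf results.tar.gz results
rm -r results
tar -xzf results.tar.gz
ls -l results

# Start a no-hangup background job and capture its output in a log file.
nohup sh -c 'sleep 1; echo finished' > job.log 2>&1 &
wait "$!"
cat job.log
```

On the server you would normally not wait for the nohup job; you can log out and check job.log later.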

Command	Usage
top	Display top CPU processes/jobs and provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system, and can provide an interactive interface for manipulating processes. It can sort the tasks by CPU usage, memory usage and runtime.
mpstat	Reports processor-related statistics. Without options it shows the average across all CPUs.
mpstat -P ALL	Displays activity for each available processor, processor 0 being the first one. Global average activity across all processors is also reported.
sar	Displays the contents of selected cumulative activity counters in the operating system
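
Before launching anything heavy, a one-shot snapshot can be more convenient than an interactive display. The sketch below is an assumption-laden example, not a lab tool: the function name is hypothetical, top -b assumes the Linux (procps) version of top, and mpstat is only called if the sysstat package is installed.

```shell
# Hypothetical helper: print a one-time summary of how busy the machine is.
server_snapshot() {
    echo "cores: $(getconf _NPROCESSORS_ONLN)"
    if command -v top >/dev/null 2>&1; then
        # Batch mode, one iteration; keep only the summary lines at the top.
        top -b -n 1 2>/dev/null | head -n 5 || true
    fi
    if command -v mpstat >/dev/null 2>&1; then
        # One 1-second sample of every processor plus the overall average.
        mpstat -P ALL 1 1 || true
    fi
}

server_snapshot
```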

Sirius Usage (our lab server)

Basic server commands (for Sirius)

High throughput (HT) platform and read types

CBI Collaboratory

File formats and conversions

Deplexing using barcoded sequence tags

Quality control

Trimming and clipping

FASTQC and FASTX tools

BED and SAM tools

GATK variant calling

R basics

HT sequence analysis using R (and Bioconductor)

DNA sequence analysis

RNA-seq analysis

SOLiD software tools
