VonHoldt:High Throughput Sequencing Resources



Della and Tigress

The DELLA processing and TIGRESS data storage servers of Princeton's High Performance Computing center are our analytical powerhouses, and we use specific locations on each server for specific jobs. The hardware lives in a lovely server closet, so the way to access it is through a secure shell (ssh). Your username and password are obtained from the IT staff. Once you have logged in, there is a set of commands and "server etiquette" you will need to follow. The PU website has more information on basic usage and tutorials if you are interested.

You should familiarize yourself with some basic Unix commands by doing a few tutorials. Here is also a nice website with a large number of Linux commands.


Login

  • ssh netid@della.princeton.edu --- to log in securely
  • slogin netid@della.princeton.edu --- an equivalent secure-login command
  • uname -a --- to learn about the server
  • passwd --- to change the default password you are given
  • logout (or control+D) --- to logout


Rules

  • Della is only to be used to execute code via a formal job submission program (the qsub command)
    • You only have 1GB of space in your Della home directory
    • You have 500GB of SCRATCH space on Della, separate from your home directory
  • Tigress is for storage of all data! Write all output to this server as well
    • ln -s --- soft-link your Tigress data files into your Della home with this command (see the sketch below)
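A minimal sketch of the soft-link workflow; the netid and project names here are hypothetical:

 cd ~                                    # your small (1GB) Della home directory
 ln -s /tigress/netid/project_data data  # "data" now points at the Tigress directory
 ls -l data                              # the arrow in the listing shows where the link resolves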


Submitting jobs using qsub

 #!/bin/bash
 #PBS -l nodes=1:ppn=1,walltime=1:05:00   # request 1 node, 1 processor, and 1h05m of walltime
 #PBS -m abe                              # email when the job aborts, begins, or ends
 #PBS -N cluster_data                     # job name shown in the queue
 #PBS -M vonholdt@princeton.edu           # address for those emails
 cd /tigress/vonholdt/RRBS_foxes          # work from the Tigress data directory
 ./cluster_data.sh                        # the script that does the actual work
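To run it, save those lines in a job script and hand it to qsub (the file name here is hypothetical):

 qsub cluster_data_job.sh    # the scheduler replies with a job ID such as 12345.della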

Usage

  • qsub --- to submit a script (e.g. jobs_to_run.sh) on Della, which can point to perl/python/R/shell scripts on Tigress that do the actual work
  • Job length: initially request about 2x the amount of time you think your job will take to complete; you can refine this estimate over time.
    • Test queue
      • 1-hour limit
      • 2-job maximum per user; NOT to be used for production runs
    • Short queue
      • 24-hour limit
      • 40-job maximum
    • Medium queue
      • 72-hour limit
      • 16-job maximum per user
      • 432 total cores
  • qstat --- to check the job progress on Della
  • You can ssh into any node once you have the node ID from your qsub to check on the job status using traditional commands (see the sketch after this list):
    • htop --- use to view real-time CPU usage
    • top --- displays the top CPU processes and provides an ongoing look at processor activity in real time; it lists the most CPU-intensive tasks on the system, offers an interactive interface for manipulating processes, and can sort tasks by CPU usage, memory usage, and runtime
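A minimal submit-and-monitor cycle might look like this (the job script and node names are hypothetical):

 qsub jobs_to_run.sh     # prints the job ID, e.g. 12345.della
 qstat -u $USER          # list only your jobs and their queue states
 ssh della-r1c1n1        # hypothetical node name taken from the qstat output
 top                     # watch your process on that node; press q to quit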


Basic unix

  • If you don't know something, use the manual
    • man ls --- to look up the functionality of the ls tool; otherwise use Google, or ask the admins (Jonathan or Ron) or in-lab experts (Rena or Pedro)
  • mpstat --- displays the utilization of each CPU individually; it reports processor-related statistics
  • mpstat -P ALL --- displays activities for each available processor, processor 0 being the first one; global average activities across all processors are also reported
  • sar --- displays the contents of selected cumulative activity counters in the operating system
  • ps -u username --- lists all the current processes for the specified user (use your own username to see your jobs)
  • kill PID --- kills (ends) the process with that process ID (see the sketch below)
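A quick sketch of finding and stopping one of your own processes (the PID shown is hypothetical):

 ps -u $USER       # list your processes; the PID is in the first column
 kill 28451        # sends SIGTERM, a polite request to stop
 kill -9 28451     # last resort if the process ignores SIGTERM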


Installing programs yourself (locally on the lab computers)

  • Check if it's already installed
  • mkdir ~/bin --- to create a directory in your home folder
  • cat ~/.bash_profile --- check whether ~/bin is already on your PATH; if not, add the next two lines to that file
  • PATH=$PATH:$HOME/bin
  • export PATH
  • compile it with a prefix pointing at your home directory so programs install into ~/bin (see the sketch below)
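A sketch of a local install, assuming an autotools-style source tarball (the program name is hypothetical); with --prefix=$HOME, the binaries land in $HOME/bin:

 tar -xzf someprogram-1.0.tar.gz    # unpack the hypothetical source
 cd someprogram-1.0
 ./configure --prefix=$HOME         # install locations become $HOME/bin, $HOME/lib, etc.
 make
 make install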


Data transfer (network)

  • scp options user@host_source:path/to/file1 user@host_dest:/dest/path/to/file2 --- Command Line Interface (CLI) for moving files (see the example below)
  • scp -r user@host_source:path/to/dir user@host_dest:/dest/path --- Command Line Interface (CLI) for moving directories
  • FileZilla, Cyberduck, Fugu, etc. --- Graphical User Interface (GUI) options
  • df -h --- check disk usage of mounted filesystems
  • du -hs /path --- check total disk space used by a directory
  • du -h --max-depth=1 /path --- check disk space used by each subdirectory, one level deep
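A concrete sketch with hypothetical netid and paths: push a directory of reads to Tigress, then pull one result file back to the current local directory:

 scp -r RRBS_reads/ netid@della.princeton.edu:/tigress/netid/
 scp netid@della.princeton.edu:/tigress/netid/results.txt .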


Files

  • ls --- lists your files
  • ls -l --- lists your files in long format
  • ls -a --- shows hidden files; this is actually a critical command! If you *think* you are using little space, a million hidden files may be the culprit... voila, now they can be managed.
  • ls -t --- sorted by time modified instead of name
  • ls -h --- lists your files in "human" format
  • ls -hla --- combines the three options above; it's beautiful.
  • more filename --- shows the first part of a file; hit the space bar to see more
  • head filename --- print the first 10 lines (by default) of the specified file to the screen
  • tail filename --- print the last 10 lines (by default) of the specified file to the screen
  • emacs filename --- an editor for editing a file
  • cp filename1 filename2 --- copies a file in your current location
  • cp path/to/filename1 path/to/filename2 --- you can specify a file copy at another location
  • rm filename --- permanently remove a file (Caution! This cannot be undone!)
  • diff filename1 filename2 --- compares files and shows where they differ
  • wc filename --- tells you how many lines, words, and characters (bytes) are in a file
  • wc -l filename --- tells you how many newline-delimited lines are in a file
  • wc -w filename --- tells you how many whitespace-delimited words are in a file
  • wc -c filename --- tells you how many characters (bytes) are in a file
  • chmod options filename --- change the read, write, and execute permissions for a file (Google the option syntax; examples below)
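A couple of common chmod patterns (the file names are hypothetical):

 chmod u+x cluster_data.sh    # make a script executable by you (the owner)
 chmod 644 results.txt        # owner can read/write; everyone else read-only
 ls -l cluster_data.sh        # confirm the x bit in the permission string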


File compression [see also the gzip usage website]

  • gzip filename --- compresses a file, producing filename.gz
  • gzip -c filename > filename.gz --- writes the compressed stream to stdout; the ">" redirects it into filename.gz
  • gunzip filename --- uncompress a gzipped file
  • tar -xzf filename.tar.gz --- extract a gzip-compressed tar archive
  • gzcat filename --- lets you look at a gzipped file without having to gunzip it (the command is zcat on many Linux systems)
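Compression plays nicely with pipes. For instance (the file name is hypothetical), you can count the reads in a gzipped fastq without unpacking it, since each read occupies four lines:

 gzcat sample1.fastq.gz | wc -l    # divide the line count by 4 for the read count (use zcat on Linux)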


Directories

  • pwd --- prints working directory (your current location)
  • cd /path/to/desired/location --- change directories by providing path
  • cd ../ --- go up one directory
  • mkdir directoryName --- make a new directory
  • rmdir directoryName --- remove directory (must be empty)...Remember that you cannot undo this move!
  • rm -r directoryName --- recursively remove a directory and the files it contains... Remember that you cannot undo this move!


Finding things

  • whereis [filename, command] --- lists all occurrences of the filename or command
  • ff --- finds files anywhere on the system
  • ff -p --- finds a file by just typing in the beginning of the file name
  • grep string filename(s) --- looks for strings in the files (use man grep for more information)
  • ~/path --- the tilde designates a shortcut for the path to your home directory
  • nohup commands & --- to initiate a no-hangup background job (writes stdout to nohup.out unless redirected; see the example below)
  • screen --- to initiate a new screen session for a background job (ctrl+a then d to detach; screen -ls to list running screens; screen -r pid to reattach)
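For example (the barcode and file name are hypothetical), you can count reads carrying a barcode and keep the search running after you log out:

 nohup grep -c ACGTGT sample1.fastq > barcode_count.txt &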


Data editing

  • vim filename --- to edit the file


History

  • ctrl+r --- search the command history incrementally
  • history --- display the numbered command history
  • !n --- re-run command number n from the history list
  • The up arrow is a shortcut to scroll through recently used commands




High throughput (HT) platform and read types

Take a moment to check out this Cornell site describing the specs of a few platforms!

  • ABI-SOLiD
  • Illumina single-end vs. paired-end
  • Ion Torrent
  • MiSeq
  • Roche-454
  • Solexa


CBI Collaboratory

UCLA's Computational Biosciences Institute Collaboratory hosts a variety of 3-day workshops that provide both a general introduction to genome/bioinformatic sciences and more advanced, focused workshops (e.g. ChIP-Seq, BS-Seq, exome sequencing). The CBI Collaboratory teaches a set of publicly available resources, ranging from the web-based bioinformatic tool Galaxy/UCLA (a central hub of HT workflows and tools for multiple platforms and data types) to tools such as R and Matlab. The introductory workshops do not require any programming experience, and the Collaboratory Fellows additionally serve as a counseling resource for data analysis.


Getting your HT sequence data

1. Walk a hard drive over (e.g. Freimer Lab)

  • Not deplexed
  • bcl files hold the base calls the machine writes out during sample sequencing... this is the NEW way of producing results files
  • Convert to qseq using the program CASAVA

2. rsync (e.g. Pellegrini Lab)

  • Retrieve qseq (not deplexed) files

3. ftp site (e.g. Berkeley)

  • Added cost for library prep ($150/sample), Bioanalyzer run, qPCR, and quantification
  • Conversion of bcl to qseq
  • Option to retrieve data as fastq
  • Added cost (?) for deplexing

4. MiSeq (e.g. UCLA Human Genetics Core)

  • Retrieve fasta file formats
  • They can deplex and map data


File formats and conversions

  • bcl
  • qseq
  • fastq


Deplexing using barcoded sequence tags

  • Edit (or Hamming) distance --- how many barcode mismatches to tolerate when assigning reads to samples


Quality control

  • Fastx tools
  • Using mapping as the quality control for reads
  • For PE data, FastQC is preferable to Fastx



Trimming and clipping

    • Trim based on low quality scores per nucleotide position within a read
    • Clip sequence artefacts (e.g. adapters, primers); see the sketch after this list
      • cutadapt for SE reads --- download and run from your personal programs or scripts folder
      • trimgalore for PE reads --- download and run from your personal programs or scripts folder (it also runs fastqc, which is installed on sirius)
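A minimal sketch of both tools (the adapter sequence and file names are hypothetical; check the flags against the installed versions):

 cutadapt -a AGATCGGAAGAGC -o sample1.trimmed.fastq sample1.fastq    # SE reads
 trim_galore --paired --fastqc sample1_R1.fastq sample1_R2.fastq     # PE reads, with a fastqc pass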


FASTQC and FASTX tools


BED and SAM tools


GATK variant calling

GATK and GATK Guide


R basics

Here is a file with some helpful R commands for inputting data, making basic plots, statistics, etc. courtesy of Los Lobos.

Also, refer to the following websites for help:

Additionally, with high throughput genome sequence data, we often need modules that are implemented in R's Bioconductor. Here is a great website and course material from a short course on using R and Bioconductor


Python basics

Here is a file with helpful commands in Python, BioPython, EggLib, etc., from Los Lobos.

Also, here are several links to help you get going:


HT sequence analysis using R (and Bioconductor)


DNA sequence analysis


DNA methylation analysis

Primarily, the vonHoldt lab works on reduced representation bisulfite sequencing (RRBS) of wild populations. Here are some suggested analytical methods with tutorials.

To use methylKit the data must have the following columns:
[1]ID (e.g. Chromosome.position)
[2]Chromosome
[3]Position
[4]Read direction (F/R)
[5]# of reads
[6]%Methylated Reads (as a number 0-100)
[7]%Unmethylated Reads (as a number 0-100)

E.g.:
chr01.11979 chr01 11979 F 4 100.0 0.0

Then follow the tutorial for analysis steps.

Other notes:

  • The analysis is memory-limited.
  • To look at CpG islands, you must provide defined islands (either from genomic data or from a predefined grouping of the data).


And for any sort of GO enrichment analysis



RNA-seq analysis

Common objectives of transcriptome analysis:

For a reasonably thorough list of RNA-seq bioinformatic tools, please see this site!


SOLiD software tools


Passing Arguments to Scripts and Programs Using xargs

  • xargs passes commands from the bash shell command line to a shell script and to other scripts or programs called in the script.
    • Although the argument is always simply referenced as $1 in the script, xargs works iteratively, going through the script with the first argument, then the second, and so on.
  • Create this simple script:
 #! /bin/bash
 #check that a base file name argument was supplied
 if [ $# -eq 0 ]  # if no arguments were entered the script will complain and then stop
   then
     echo "Please supply argument .... "
     echo "Useage: echo arg1 arg2 ... argn | xargs -n 1 scriptname.sh"
 else
     echo $1
 fi
  • Call it using:
 echo arg1 arg2 arg3 | xargs -n 1 script.sh 
  • The -n flag to xargs specifies how many arguments at a time to supply to the given command. -n 1 tells xargs to supply 1 argument to the command. The command will be invoked repeatedly until all input is exhausted.
    • This means you can also use xargs for a command that needs two or more arguments.
      • For instance you could use this to supply read group information to the picard AddReadGroups command.
  • Another option, -P #, tells xargs to split the job across # different cores; -P 4 uses 4 cores.
    • This only works if you have multiple jobs that can be run in PARALLEL, i.e. one command run multiple times, once with each xarg or set of xargs (see the sketch below)
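For example (reusing the hypothetical scriptname.sh from the usage strings above), to run the script once per chromosome, four chromosomes at a time:

 echo chr1 chr2 chr3 chr4 chr5 | xargs -n 1 -P 4 ./scriptname.sh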


  • You can pass arguments to a program like fastqc, tophat, samtools etc.
    • I split up my aligned reads by chromosome to speed up processing.
      • With xargs I can call them all at once and process them on more than one core, something that samtools can't do by itself.
      • The following command would pile up three samples and do it sequentially for however many chromosomes I call in xargs.
 #! /bin/bash
 #check that a base file name argument was supplied
 if [ $# -eq 0 ]  # if no arguments were entered
   then
     echo "Please supply argument .... "
     echo "Useage: echo arg1 arg2 ... argn | xargs -n 1 scriptname.sh"
 else
     samtools mpileup -uf referencefilename /path/sample1$1.bam /path/sample2$1.bam /path/sample3$1.bam | bcftools view -bvcg - > /path/$1var.raw.bcf
 fi


  • You can pass the arguments to a python script by using sys.argv to supply arguments to the python script and calling the python script as myscript.py arg1
  • Save this simple script:
 #! /bin/bash
 if [ $# -eq 0 ]  # if no arguments were entered
   then
     echo "Please supply argument .... "
     echo "Useage: echo arg1 arg2 ... argn | xargs -n 1 scriptname.sh"
 else
     test.py $1
 fi
  • Save the following as test.py. It will be called by the last shell script above.
    • This is a very simple example but number could just as easily designate a file to be opened by the python script.
 #! /usr/bin/env python
 import sys                                  # command-line arguments live in sys.argv
 number = sys.argv[1]                        # the first argument after the script name
 print "This is argument number ", number    # Python 2 print statement