Wayne:High Throughput Sequencing Resources: Difference between revisions
Revision as of 16:12, 19 February 2013
Sirius Usage (our lab server)
Sirius is our analytical powerhouse (64 cores, ideal for parallel computing; 512 GB of memory; 64-bit file system in the x86_64 configuration), and we have specific locations on the server for specific jobs. It is housed in a server closet, so the way to access it is through a secure shell (ssh). Your username and password are obtained from our IT staff. Once you have logged on, there is a set of commands and "server etiquette" you will need to follow.
Login
- ssh user@sirius.eeb.ucla.edu
- slogin user@sirius.eeb.ucla.edu
- to learn about the server:
- uname -a
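Once logged in, a few read-only commands give a quick picture of the machine. The sketch below assumes a Linux host; `free` is not mentioned above and may not be installed everywhere, so it is guarded.

```shell
# Read-only look at the machine you are logged in to.
kernel_name=$(uname -s)     # operating system kernel, e.g. Linux
kernel_release=$(uname -r)  # kernel version string
machine=$(uname -m)         # hardware architecture, e.g. x86_64

echo "Kernel:       $kernel_name $kernel_release"
echo "Architecture: $machine"

# Memory summary in gigabytes, only if the 'free' utility is available.
if command -v free >/dev/null 2>&1; then
    free -g
fi
```

`uname -a` prints all of these fields on one line; the individual flags are handy when a script needs only one of them.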
Structure and organization
- Your home (user) directory holds <5 GB of data (be aware!)
- /home/user
- For genomes and databases
- /databases
- Location of installed programs
- /usr/local/bin
- /opt/
- The location to store your data
- /data/
- /data/user
- You can create your own personal directory if you'd like (see below for commands)
- The location to place scripts
- /work/user
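Creating a personal data directory could look roughly like the sketch below. `DATA_ROOT` and `make_personal_dir` are hypothetical names introduced for illustration; `DATA_ROOT` defaults to a scratch directory so the example can be tried safely, and on Sirius it would presumably be set to `/data`.

```shell
# Hypothetical helper: create a personal directory under a data root.
# DATA_ROOT defaults to a throwaway location (assumption for this sketch).
DATA_ROOT=${DATA_ROOT:-$(mktemp -d)}
me=$(id -un)   # your login name

make_personal_dir() {
    # -p: create parent directories as needed, no error if it already exists
    mkdir -p "$1/$2" && echo "$1/$2"
}

my_data=$(make_personal_dir "$DATA_ROOT" "$me")
echo "Data directory: $my_data"
```

Because the home directory is small, `du -sh ~` is a quick way to see how much space you are using there.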
Rules
Developing a pipeline:
- copy a small but representative part of your data to sirius
- run all the programs you need on them
- debug and save final version of pipeline, e.g. in a text file
- copy all your data
- run your pipeline on all data
- debug and update pipeline
- mv results wherever you want
- erase data
- never start more jobs than the number of available cores
- look at the memory and cpu usage before you start to load sirius with commands (cmd) htop or top
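The first step of the pipeline rules, working on a small part of the data, might be sketched as follows. `first_reads` is a hypothetical helper, the file names are made up, and the code assumes the common four-lines-per-read FASTQ layout. Taking the first reads is the simplest choice but not necessarily a representative one.

```shell
# Hypothetical helper: keep the first N reads of a FASTQ file.
# Assumes 4 lines per read (header, bases, "+", qualities).
first_reads() {
    head -n $(( $2 * 4 )) "$1"
}

# Demonstration on a tiny made-up file in a scratch directory.
scratch=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$scratch/all.fastq"

# Keep 2 of the 3 reads as a test set for debugging the pipeline.
first_reads "$scratch/all.fastq" 2 > "$scratch/test.fastq"
subset_lines=$(wc -l < "$scratch/test.fastq" | tr -d ' ')
echo "Lines in test subset: $subset_lines"
```

Once the pipeline runs cleanly on the test subset, the same commands can be pointed at the full data set.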
Basic server commands (for Sirius)
Here is a list of commonly used linux commands:
Command | Usage |
ssh username@sirius.eeb.ucla.edu | Secure shell login to the Sirius server |
logout (or control+D) | Logout of the Sirius server |
pwd | Print working directory (your current location) |
ls | List (all contents of current location) |
ls options | ls -a (hidden files), ls -l (long/detailed list), ls -t (sorted by time modified instead of name) |
cd /give/path | Change directories |
cd .. | Go up one directory |
mkdir directoryName | Make a new directory |
rmdir directoryName | Remove directory (must be empty)...Remember that you cannot undo this move! |
rm -r directoryName | Recursively remove directory and the files it contains...Remember that you cannot undo this move! |
rm filename | Remove specified file...Remember that you cannot undo this move! |
head filename | Print to screen the top 10 lines or so of the specified file |
tail filename | Print to screen the last 10 lines or so of the specified file |
more filename | Allows file contents or piped output to be sent to the screen one page at a time |
less filename | Like more, but also allows scrolling backward and searching within the file |
wc filename | Print line, word, and byte counts |
wc [options] filename | -c (bytes); -l (lines); -w (words) delimited by whitespace or newline |
whereis command | Locate the binary, source, and manual page files for a command |
mv current/path destination/path | Move (akin to cut/paste); the file is removed from the current location. Also used to rename files |
cp current/path destination/path | Copy; the original stays in the current path (copying to a new name in the same directory makes a duplicate under that name) |
~/path | Tilde (~) is a shortcut for the path to your home directory |
nohup commands & | To initiate a no-hangup background job |
screen | To initiate a new screen session, which keeps running after you disconnect, so it can hold long-running jobs |
tar -xzf filename.tar.gz | Decompress and extract a tar.gz archive |
gzip -c filename >filename.gz | Compress file into a .gz file (not a tar archive); the ">" redirects the output to filename.gz |
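Several of the commands in the table can be tried safely in a throwaway directory. The walk-through below is only a sketch with made-up file names. Note that `rm`, not `rmdir`, removes files and non-empty directories, and that `gzip` compresses a single file while `tar` bundles a directory.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir project                      # make a directory
echo "sample data" > project/a.txt
cp project/a.txt project/b.txt     # copy: a.txt and b.txt both exist
mv project/b.txt project/c.txt     # move/rename: b.txt is gone, c.txt exists

# gzip compresses one file; -c writes to stdout and leaves the original in place.
gzip -c project/a.txt > a.txt.gz

# tar bundles a whole directory: -c create, -z gzip-compress, -f archive name.
tar -czf project.tar.gz project

rm project/c.txt                   # remove one file (cannot be undone)
rm -r project                      # remove directory and contents (cannot be undone)

# Restore the directory from the archive: -x extract.
tar -xzf project.tar.gz
restored=$(gzip -dc a.txt.gz)      # -d decompress, -c write to stdout
ls -l project
```

The archive was made before the `rm` calls, so extracting it brings back both `a.txt` and `c.txt`.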
Here is a list of commonly used linux commands for learning about the CPU utilization:
Command | Usage |
top | Display top CPU processes/jobs and provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system, and can provide an interactive interface for manipulating processes. It can sort the tasks by CPU usage, memory usage and runtime. |
mpstat | To display the utilization of each CPU individually. It reports processor-related statistics. |
mpstat -P ALL | Displays activities for each available processor, processor 0 being the first one. Global average activities among all processors are also reported. |
sar | Displays the contents of selected cumulative activity counters in the operating system |
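Before starting jobs, the monitoring commands above can be combined into a quick check of how busy the server is. This is a rough sketch for a Linux host: the "roughly idle cores" figure is a hypothetical rule of thumb (cores minus the 1-minute load average), and `nproc` and `/proc/loadavg` are assumptions not mentioned in the table.

```shell
# Number of cores visible to you (GNU coreutils nproc, with a getconf fallback).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# 1-minute load average: first field of /proc/loadavg on Linux.
if [ -r /proc/loadavg ]; then
    load=$(cut -d ' ' -f 1 /proc/loadavg)
else
    load=0   # assumption for this sketch: treat unknown load as idle
fi

# Rough count of idle cores, never below zero.
idle=$(awk -v c="$cores" -v l="$load" 'BEGIN { n = int(c - l); if (n < 0) n = 0; print n }')
echo "cores=$cores load=$load roughly_idle=$idle"

# One non-interactive snapshot of the busiest processes
# (-b batch mode, -n 1 a single iteration).
top -b -n 1 2>/dev/null | head -n 15

# Per-processor statistics, only if mpstat is installed.
if command -v mpstat >/dev/null 2>&1; then
    mpstat -P ALL
fi
```

If the idle count is low, it is better to wait or start fewer jobs than to push the total above the number of cores.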
High throughput (HT) platform and read types
- ABI-SOLiD
- Illumina single-end vs. paired-end
- Ion Torrent
- MiSeq
- Roche-454
- Solexa
CBI Collaboratory
UCLA's
File formats and conversions
- bcl
- qseq
- fastq
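As an illustration of a simple conversion, FASTQ can be reduced to FASTA by keeping the header and sequence lines. `fastq_to_fasta` is a hypothetical helper and the reads are made up; the sketch assumes each read occupies exactly four lines, which is common but not guaranteed for every FASTQ file.

```shell
# Hypothetical helper: convert FASTQ to FASTA, assuming 4 lines per read.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading @ for >
         NR % 4 == 2 { print }                      # sequence line, unchanged
        ' "$1"
}

# Demonstration on two made-up reads.
scratch=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nFFFF\n' > "$scratch/reads.fastq"
fasta=$(fastq_to_fasta "$scratch/reads.fastq")
echo "$fasta"
```

The quality line is discarded, so this conversion cannot be reversed.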
Deplexing using barcoded sequence tags
- Edit (or Hamming) distance
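To make the distance idea concrete: the Hamming distance counts the positions at which two equal-length barcodes differ (substitutions only, whereas a general edit distance also allows insertions and deletions). The sketch below uses hypothetical helpers `hamming` and `assign_barcode` and made-up 6-base barcodes; it is not the method of any particular deplexing tool.

```shell
# Hypothetical helper: Hamming distance between two equal-length strings.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) { print "length mismatch" > "/dev/stderr"; exit 1 }
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

# Assign an observed barcode to the closest known barcode, accepting it
# only when at most max_mismatch positions differ. On ties the first
# known barcode listed wins.
assign_barcode() {
    observed=$1; max_mismatch=$2; shift 2
    best=""; best_d=$(( max_mismatch + 1 ))
    for known in "$@"; do
        d=$(hamming "$observed" "$known") || continue
        if [ "$d" -lt "$best_d" ]; then best=$known; best_d=$d; fi
    done
    echo "${best:-unassigned}"
}

# One sequencing error in the last base still maps to ACGTAC.
assign_barcode ACGTAA 1 ACGTAC TTGCAA GGATCC
```

Allowing one mismatch is only safe when every pair of known barcodes differs in at least three positions; otherwise a single error can make a read equally close to two samples.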
Quality control
- Fastx tools
- Using mapping as the quality control for reads
Trimming and clipping
- Trim based on low quality scores per nucleotide position within a read
- Clip sequence artefacts (e.g. adapters, primers)
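Quality-based trimming, the first item above, can be illustrated with a small awk sketch. `trim_3prime` is a hypothetical helper; it assumes four-line FASTQ records and Phred+33 quality encoding, and it only removes low-quality bases from the 3' end. Dedicated trimming tools handle more cases (adapter clipping, other encodings) and are the usual choice for real data.

```shell
# Hypothetical helper: trim low-quality bases from the 3' end of each read.
# Usage: trim_3prime file.fastq min_quality
trim_3prime() {
    awk -v minq="$2" '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { bases = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            keep = length($0)
            # step back from the 3-prime end while the quality is below minq
            while (keep > 0 && ord[substr($0, keep, 1)] - 33 < minq) keep--
            print header; print substr(bases, 1, keep)
            print plus;   print substr($0, 1, keep)
        }' "$1"
}

# Made-up read: six high-quality bases (I = Q40) followed by two poor ones (# = Q2).
scratch=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIII##\n' > "$scratch/reads.fastq"
trimmed=$(trim_3prime "$scratch/reads.fastq" 20)
echo "$trimmed"
```

Bases and qualities are cut to the same length so the record stays valid FASTQ.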
FASTQC and FASTX tools
BED and SAM tools
GATK variant calling
R basics
HT sequence analysis using R (and Bioconductor)
DNA sequence analysis
RNA-seq analysis
Common objectives of transcriptome analysis:
- Quantifying and annotating aligned reads
- Normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG) (R packages):
- easyRNASeq (simplifies read counting per genome feature)
- DEXSeq (Inference of differential exon usage)
- baySeq (also see: segmentSeq)
- Genominator (Bullard et al. 2010)
- Detection of alternative splice junctions
SOLiD software tools