User talk:Darek Kedra/sandbox 28

From OpenWetWare
Revision as of 04:10, 11 October 2013 by Darek Kedra (init page)

Winterschool program

Introduction to Linux and the command line

  1. why Linux?
  2. logging in, connecting to other servers with ssh / sftp
  3. copy, rename/move files, create directories, symbolic links
  4. view files (more/less, head, tail), count (wc)
  5. search for strings / replace strings (grep & sed)
  6. compressing / uncompressing files (gzip, bzip2, tar)
  7. pipelines and redirection
  8. awk in 5 minutes
  9. where to go from here (clusters, python)

FASTQ

  1. Illumina file formats (quality encodings)
  2. paired / unpaired reads
  3. quality checking (fastqc)
  4. trimming & filtering (TagDust)
  5. sources of published FASTQ data: Sequence Read Archive (SRA) vs ENA
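
As an illustration of the quality-encoding topic above, here is a minimal Python sketch that reads FASTQ records and converts a quality string into Phred scores. It assumes the simple four-lines-per-record layout and a well-formed input; the function names `parse_fastq` and `phred_scores` are hypothetical helpers made up for this example, not part of any course material. The default offset of 33 corresponds to Sanger / Illumina 1.8+ encoding, while older Illumina pipelines (1.3 to 1.7) used an offset of 64.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    offset=33 is the Sanger / Illumina 1.8+ encoding;
    offset=64 was used by Illumina 1.3 to 1.7.
    """
    return [ord(ch) - offset for ch in quality]


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from FASTQ lines.

    Assumes exactly four lines per record (no wrapped sequences)
    and a complete final record.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line, ignored here
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


# A tiny in-memory example instead of a real file
example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
records = list(parse_fastq(example))
scores = phred_scores(records[0][2])
print(records[0][0], scores)
```

With a real file, the same generator can be fed an open file handle, for example `parse_fastq(open("reads.fastq"))`, where the file name is a placeholder.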

Genomic FASTA and GTF/GFF gene annotation

  1. resources at ENSEMBL
  2. basic checks and reformatting

Mapping genomic reads

  1. overview of mappers
    1. GEM
    2. bwa, with or without stampy
    3. last / bowtie
  2. mapping steps (for each mapper)
    1. genome indexing
    2. mapping
    3. optional postprocessing

SAM and BAM file formats

  1. Analyzing BAM files
  2. sorting / indexing
  3. viewing the mappings in IGV
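
When looking at SAM records, the FLAG column (the second field) is a bitwise combination of properties defined in the SAM specification. The sketch below decodes a flag value into readable names; the bit values follow the specification, while the short labels and the `decode_flag` helper are naming choices made for this example only.

```python
# Bit values as defined in the SAM specification; the labels are
# informal names chosen for this example.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


print(decode_flag(99))
print(decode_flag(4))
```

A flag of 99, common for the first read of a properly mapped pair, decodes to paired, proper_pair, mate_reverse_strand and first_in_pair; a flag of 4 marks an unmapped read.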

Tools for processing BAM files

  1. samtools
  2. picard
  3. bamtools

Getting mapping stats

  1. extracting reads mapping to regions
  2. getting coverage info for selected regions

Detecting SNPs

  1. general procedure
  2. GATK pipeline
  3. other SNP calling programs [tba]

Working with VCF files

  1. VCF file format
  2. viewing VCFs in IGV
  3. filtering SNPs by quality
  4. set operations on VCF files (common SNPs, unique SNPs)
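
The set operations listed above (common SNPs, unique SNPs) can be sketched in plain Python by reducing each VCF record to a key and using built-in sets. This is only a rough illustration: it assumes tab-separated, uncompressed VCF lines, treats a variant as the tuple (CHROM, POS, REF, ALT), and does not split multi-allelic ALT fields or normalize indels. The `variant_keys` helper and the sample records are invented for the example.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF lines.

    Header lines (starting with '#') and blank lines are skipped.
    Multi-allelic ALT values are kept as a single string.
    """
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


# Invented sample records, standing in for two real VCF files
header = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
sample_a = header + [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t30\tPASS\t.\n",
]
sample_b = header + [
    "chr1\t200\t.\tC\tT\t45\tPASS\t.\n",
    "chr2\t300\t.\tG\tA\t60\tPASS\t.\n",
]

a = variant_keys(sample_a)
b = variant_keys(sample_b)

common = a & b      # SNPs present in both samples
only_in_a = a - b   # SNPs unique to sample A
only_in_b = b - a   # SNPs unique to sample B
print(sorted(common), sorted(only_in_a), sorted(only_in_b))
```

For real data sets, dedicated tools handle compression, indexing and allele normalization far better than this sketch; it is meant only to show the idea behind the set operations.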

RNASeq

  1. caveats (ribosomal RNA contamination)
  2. mapping RNASeq
    1. tophat
    2. GRAPE
  3. creating gene models from RNASeq (cufflinks)