User talk:Darek Kedra/sandbox 28
From OpenWetWare
Winterschool program
Introduction to Linux and the command line
- why Linux?
- logging in, connecting to other servers with ssh / sftp
- copy, rename/move files, create directories, symbolic links
- view files (more/less, head, tail), count (wc)
- search for strings / replace strings (grep & sed)
- compressing / uncompressing files (gzip, bzip2, tar)
- pipelines and redirection
- awk in 5 minutes
- where to go from there (clusters, python)
FASTQ
- Illumina file formats (quality encodings)
- paired / unpaired reads
- quality checking (fastqc)
- trimming & filtering (TagDust)
- sources of published FASTQ data: Sequence Read Archive (SRA) vs ENA
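As a rough illustration of the quality-encoding topic above, here is a minimal Python sketch that reads four-line FASTQ records and converts a quality string to Phred scores. It assumes the Phred+33 (Sanger / Illumina 1.8+) encoding; older Illumina files use an offset of 64. The function names `parse_fastq` and `phred_scores` are made up for this example and are not part of any course material.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:          # end of input
            return
        header = header.strip()
        if not header:              # tolerate blank lines between records
            continue
        seq = next(it, "").strip()
        sep = next(it, "").strip()
        qual = next(it, "").strip()
        # A valid record starts with '@', has a '+' separator line,
        # and carries one quality character per base.
        if (not header.startswith("@") or not sep.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (offset 33 assumed)."""
    return [ord(char) - offset for char in quality]


# Hypothetical single-read input, just for illustration.
example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
for read_id, seq, qual in parse_fastq(example):
    print(read_id, seq, phred_scores(qual))
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so 'I' (Q40) is a very confident base call and '!' (Q0) carries no confidence at all.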
Genomic FASTA and GTF/GFF gene annotation
- resources at Ensembl
- basic checks and reformatting
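One possible "basic check" on a genomic FASTA file is to list sequence names and lengths, so they can be compared with the chromosome names used in the GTF/GFF annotation. The sketch below is only an assumption about what such a check might look like; `fasta_lengths` is a hypothetical helper name.

```python
def fasta_lengths(lines):
    """Return {sequence_name: length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence name")
            name = fields[0]        # keep only the ID, drop the description
            if name in lengths:
                raise ValueError(f"duplicate sequence name: {name}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            lengths[name] += len(line)
    return lengths


# Hypothetical two-sequence input, just for illustration.
example = [">chr1 test sequence", "ACGT", "AC", ">chr2", "GG"]
print(fasta_lengths(example))
```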
Mapping genomic reads
- overview of mappers
  - GEM
  - bwa +/- stampy
  - last / bowtie
- mapping steps (for each mapper)
  - genome indexing
  - mapping
  - +/- postprocessing
SAM and BAM file formats
- Analyzing BAM files
- sorting / indexing
- viewing the mappings in IGV
Tools for processing BAM files
- samtools
- picard
- bamtools
Getting mapping stats
- extracting reads mapping to regions
- getting coverage info for selected regions
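To make the idea of mapping statistics a little more concrete, here is a minimal Python sketch that decodes the bitwise FLAG field of SAM records and counts mapped reads. It works on plain-text SAM lines only; BAM files are binary and would first need converting to SAM (for instance with samtools). The names `decode_flag` and `mapping_stats` are invented for this example, and the counts are per alignment record, so secondary and supplementary alignments are not excluded.

```python
# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def mapping_stats(sam_lines):
    """Count total and mapped alignment records, skipping '@' header lines."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])   # FLAG is the second column
        total += 1
        if not flag & 0x4:                # bit 0x4 set means unmapped
            mapped += 1
    return {"total": total, "mapped": mapped}


# Hypothetical SAM lines, just for illustration.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(decode_flag(99))
print(mapping_stats(example))
```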
Detecting SNPs
- general procedure
- GATK pipeline
- other SNP calling programs [tba]
Working with VCF files
- VCF file format
- viewing VCFs in IGV
- filtering SNPs by quality
- set operations on VCF files (common SNPs, unique SNPs)
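The quality filtering and set operations listed above can be sketched in plain Python as follows. This is a simplified illustration, not a replacement for dedicated VCF tools: it assumes uncompressed, tab-separated VCF lines, identifies a variant by (CHROM, POS, REF, ALT), and skips records whose QUAL is missing ("."). The function name `read_variants` and the threshold of 30 are arbitrary choices for the example.

```python
def read_variants(vcf_lines, min_qual=0.0):
    """Return a set of (chrom, pos, ref, alt) for records with QUAL >= min_qual."""
    variants = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip meta-information and header lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_qual:
            continue                      # missing or too-low quality
        variants.add((chrom, int(pos), ref, alt))
    return variants


# Hypothetical records from two samples, just for illustration.
header = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
sample_a = header + [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t10\tPASS\t.",
    "chr2\t300\t.\tG\tA\t99\tPASS\t.",
]
sample_b = header + [
    "chr1\t100\t.\tA\tG\t45\tPASS\t.",
    "chr2\t400\t.\tT\tC\t80\tPASS\t.",
]

a = read_variants(sample_a, min_qual=30)
b = read_variants(sample_b, min_qual=30)
print("common:", sorted(a & b))           # SNPs found in both samples
print("unique to A:", sorted(a - b))      # SNPs found only in sample A
```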
RNASeq
- caveats (ribosomal RNA contamination)
- mapping RNASeq
  - tophat
  - GRAPE
- creating gene models from RNASeq (cufflinks)