User talk:Darek Kedra/sandbox 28: Difference between revisions
Darek Kedra (talk | contribs) (init page) |
Darek Kedra (talk | contribs) (links to software) |
Revision as of 06:47, 11 October 2013
Winterschool program
Software list
Basics
- Linux: Ubuntu 12.04.3 vs Debian 7.1 (consider 32-bit vs 64-bit versions)
- Java http://www.java.com/en/download/linux_manual.jsp?locale=en
Specific tools 1
- TagDust: http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz
- fastareformat (from exonerate-2.2.0)
- fixing FASTA headers to match GFF fields (small Python script?)
- GEM (problem with cores on different laptops...) http://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%203/
- BWA http://sourceforge.net/projects/bio-bwa/files/
- Stampy http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-1.0.22r1848.tgz
- last http://last.cbrc.jp/
- bowtie http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (bowtie2)
- samtools http://sourceforge.net/projects/samtools/files/
- picard http://sourceforge.net/projects/picard/files/
- IGV/ IGVtools http://www.broadinstitute.org/software/igv/download
- bamtools https://github.com/pezmaster31/bamtools
- bedtools http://code.google.com/p/bedtools/downloads/list
- GATK http://www.broadinstitute.org/gatk/auth?package=GATK (download yourself: license!)
- vcftools http://sourceforge.net/projects/vcftools/files/
Specific tools 2/RNA-Seq
- tophat http://tophat.cbcb.umd.edu/
- cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
- GEMtools https://github.com/gemtools/gemtools
Introduction to Linux and the command line
- why Linux?
- logging in, connecting to other servers with ssh / sftp
- copy, rename/move files, create directories, symbolic links
- view files (more/less, head, tail), count (wc)
- search for strings / replace strings (grep & sed)
- compressing / uncompressing files (gzip, bzip2, tar)
- pipelines and redirection
- awk in 5 minutes
- where to go from there (clusters, python)
FASTQ
- Illumina file formats (quality encodings)
- paired / unpaired reads
- quality checking (fastqc)
- trimming & filtering (TagDust)
- sources of published FASTQ data: Sequence Read Archive (SRA) vs ENA
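The quality-encoding item above can be illustrated with a small Python sketch. It is not part of the course material: the function names are hypothetical, it assumes plain 4-line FASTQ records, and it relies on Illumina-style score ranges (characters below ';' only occur with the Phred+33 offset, characters above 'J' suggest Phred+64). Reads from other platforms may need different cut-offs.

```python
def guess_quality_offset(quality_strings):
    """Guess the Phred ASCII offset (33 or 64) from FASTQ quality strings.

    Returns 33, 64, or None when the observed characters fit both encodings.
    """
    lowest, highest = 255, 0
    for qual in quality_strings:
        codes = [ord(ch) for ch in qual]
        if not codes:
            continue
        lowest = min(lowest, min(codes))
        highest = max(highest, max(codes))
    if lowest < 59:   # below ';': not possible with a +64 offset
        return 33
    if highest > 74:  # above 'J': not expected for Illumina Phred+33 reads
        return 64
    return None       # ambiguous (or no data): inspect more reads


def fastq_qualities(lines):
    """Yield the quality line (every 4th line) of 4-line FASTQ records."""
    for index, line in enumerate(lines):
        if index % 4 == 3:
            yield line.rstrip("\n")
```

With a real file one could call `guess_quality_offset(fastq_qualities(open("reads.fastq")))`, where `reads.fastq` is a placeholder name. A result of None means the sampled reads do not settle the question.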
Genomic fasta and gtf/gff gene annotation
- resources at ENSEMBL
- basic checks and reformatting
- grepping fasta headers
- reformatting FASTA with fastareformat from exonerate (?)
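As a possible shape for the "small script" mentioned in the software list, here is a hedged Python sketch of the two header tasks: listing FASTA headers (the equivalent of grepping for lines starting with '>') and trimming each header to its first word so that sequence names can match the first column of a GFF/GTF file. The function names are made up for illustration, and whether the first word is the right identifier depends on the annotation source, so check a few headers before relying on it.

```python
def fasta_headers(lines):
    """Yield header lines without the leading '>' (similar to grep '^>')."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].rstrip("\n")


def simplify_headers(lines):
    """Yield FASTA lines with every header reduced to its first word.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            yield ">" + name + "\n"
        else:
            yield line
```

Both functions work on any iterable of lines, so they can be applied to an open file handle and the output of `simplify_headers` written straight to a new file.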
Mapping genomic reads
- overview of mappers
- GEM
- bwa +/- stampy
- last / bowtie
- mapping steps (for each mapper)
- genome indexing
- mapping
- +/- postprocessing
SAM and BAM file formats
- Analyzing BAM files
- sorting / indexing
- viewing the mappings in IGV
Tools for processing BAM files
- samtools
- picard
- bamtools
Getting mapping stats
- extracting reads mapping to regions
- getting coverage info for selected regions
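To show what mapping statistics and region extraction involve at the file-format level, the following Python sketch reads SAM text lines directly. It is only an illustration under stated assumptions: the function names are hypothetical, the input is uncompressed SAM rather than BAM, and region membership is decided by the leftmost mapped position alone (the CIGAR string is ignored). In practice samtools, picard, or bamtools would do this work on BAM files.

```python
from collections import Counter

# FLAG bits defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_stats(sam_lines):
    """Count mapped and unmapped primary records in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            counts["non_primary"] += 1
            continue
        counts["total"] += 1
        counts["unmapped" if flag & UNMAPPED else "mapped"] += 1
    return counts


def reads_in_region(sam_lines, chrom, start, end):
    """Return names of mapped reads whose leftmost position (1-based)
    falls within [start, end] on the given reference sequence."""
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & UNMAPPED:
            continue
        if rname == chrom and start <= pos <= end:
            names.append(fields[0])
    return names
```

Counting only primary records keeps the total equal to the number of reads (or read ends) in the file, which is usually the figure of interest when comparing mappers.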
Detecting SNPs
- general procedure
- GATK pipeline
- other SNP calling programs [tba]
Working with VCF files
- VCF file format
- viewing VCFs in IGV
- filtering SNPs by quality
- set operations on VCF files (common SNPs, unique SNPs)
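The quality filter and the set operations listed above can be sketched in a few lines of Python. This is a simplified illustration, not the course's method: the function name is hypothetical, variants are identified by (CHROM, POS, REF, ALT) only, multi-allelic ALT fields are compared as whole strings, and records with a missing QUAL (".") are dropped, which is an assumption rather than a rule of the format. vcftools provides production-grade equivalents.

```python
def read_variants(vcf_lines, min_qual=0.0):
    """Return the set of (CHROM, POS, REF, ALT) keys with QUAL >= min_qual."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == ".":
            continue  # missing QUAL: treated here as failing the filter
        if float(qual) >= min_qual:
            variants.add((chrom, int(pos), ref, alt))
    return variants
```

With two call sets loaded as `a = read_variants(lines_a, 30)` and `b = read_variants(lines_b, 30)`, the common SNPs are `a & b` and the SNPs unique to the first file are `a - b`.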
RNA-Seq
- caveats (ribosomal RNA contamination)
- mapping RNA-Seq reads
- tophat
- GRAPE
- creating gene models from RNA-Seq data (cufflinks)