User talk:Darek Kedra/sandbox 28

From OpenWetWare

Revision as of 06:20, 24 October 2013

Winterschool program

Software list

Basics

  1. Linux: Ubuntu 12.04.3 vs Debian 7.1 (consider 32- vs 64-bit versions)
  2. Java http://www.java.com/en/download/linux_manual.jsp?locale=en

Specific tools 1

  1. FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  2. TagDust http://genome.gsc.riken.jp/osc/english/software/src/tagdust.tgz
  3. fastareformat from exonerate-2.2.0 http://www.ebi.ac.uk/~guy/exonerate/
  4. fixing FASTA headers (GFF fields) with Python? (small script)
  5. GEM http://algorithms.cnag.cat/wiki/The_GEM_library
    1. CAVEAT: (problem with cores on different laptops...)
    2. binaries: http://sourceforge.net/projects/gemlibrary/files/gem-library/Binary%20pre-release%203/
  6. BWA http://sourceforge.net/projects/bio-bwa/files/
  7. Stampy http://www.well.ox.ac.uk/~gerton/software/Stampy/stampy-1.0.22r1848.tgz
  8. last http://last.cbrc.jp/ (version 362 has split and splice-mapping options)
  9. bowtie http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (bowtie2)
  10. samtools http://sourceforge.net/projects/samtools/files/
  11. picard http://sourceforge.net/projects/picard/files/
  12. IGV / IGVtools http://www.broadinstitute.org/software/igv/download
  13. bamtools https://github.com/pezmaster31/bamtools
    1. requires cmake: http://www.cmake.org/files/v2.8/cmake-2.8.12.tar.gz (or apt-get)
  14. bedtools http://code.google.com/p/bedtools/downloads/list
  15. GATK http://www.broadinstitute.org/gatk/auth?package=GATK (download yourself: license!)
  16. vcftools http://sourceforge.net/projects/vcftools/files/

Specific tools 2/RNA-Seq

  1. tophat http://tophat.cbcb.umd.edu/
  2. cufflinks http://cufflinks.cbcb.umd.edu/ (may require Boost libs!)
  3. GEMtools https://github.com/gemtools/gemtools

Introduction to Linux and the command line

  1. why Linux?
  2. logging in, connecting to other servers with ssh / sftp
  3. copy, rename/move files, create directories, symbolic links
  4. view files (more/less, head, tail), count (wc)
  5. search for strings / replace strings (grep & sed)
  6. compressing / uncompressing files (gzip, bzip2, tar)
  7. pipelines and redirection
  8. awk in 5 minutes
  9. where to go from there (clusters, python)

FASTQ

  1. Illumina file formats (quality encodings)
  2. paired / unpaired reads
  3. quality checking (fastqc)
  4. trimming & filtering (TagDust)
  5. source of published FASTQ data: Sequence Read Archive (SRA) vs ENA
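
A minimal Python sketch for the quality-encoding item above. FASTQ quality strings store Phred scores as ASCII characters with an offset of 33 (Sanger, Illumina 1.8+) or 64 (Illumina 1.3 to 1.7). The function names `guess_offset` and `decode_quals` are hypothetical, and the guess is only a heuristic: high-quality Phred+33 data with no low scores would be misread as Phred+64.

```python
def guess_offset(quality_strings):
    """Guess the Phred offset (33 or 64) from a sample of quality strings.

    Heuristic (an assumption of this sketch): characters below ASCII 59
    (';') cannot occur in a +64 file, so seeing one implies +33.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    return 33 if lowest < 59 else 64


def decode_quals(quality_string, offset):
    """Convert one quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


if __name__ == "__main__":
    sample = ["!''*((((***+))%%%++", "IIIIIIIIIIIIIIIIIII"]
    offset = guess_offset(sample)
    print("guessed offset:", offset)
    print("first read scores:", decode_quals(sample[0], offset))
```

Checking a few thousand reads rather than one or two makes the guess more reliable, since the lowest characters are rare in good data.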

Genomic fasta and gtf/gff gene annotation

  1. resources at ENSEMBL
  2. basic checks and reformatting
  • grepping fasta headers
  • fastareformat from exonerate?
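
The header checks above could be sketched in Python as follows. This assumes the goal is to list the headers (like grepping for lines starting with `>`) and to shorten each header to its first whitespace-delimited word, which many mappers use as the sequence name anyway. The function names and the Ensembl-style example header are illustrative only.

```python
def list_headers(lines):
    """Return all FASTA header lines, without the trailing newline."""
    return [line.rstrip("\n") for line in lines if line.startswith(">")]


def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first word."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep only the sequence ID; drop the free-text description.
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


if __name__ == "__main__":
    fasta = [
        ">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1\n",
        "ACGTACGT\n",
        ">MT dna:chromosome\n",
        "GATTACA\n",
    ]
    print(list_headers(fasta))
    for out_line in simplify_fasta_headers(fasta):
        print(out_line)
```

Whatever names end up in the FASTA file should match the first column of the GTF/GFF annotation, otherwise downstream tools will not pair the two.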

Mapping genomic reads

  1. overview of mappers
    1. GEM
    2. bwa +/- stampy
    3. last / bowtie
  2. mapping steps (for each mapper)
    1. genome indexing
    2. mapping
    3. +/- postprocessing

SAM and BAM file formats

  1. Analyzing BAM files
  2. sorting / indexing
  3. viewing the mappings in IGV
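
To illustrate the SAM format, here is a small Python sketch that reads the first six mandatory columns of a SAM alignment line and decodes the bitwise FLAG field. The bit values follow the SAM specification; the function names, the label strings and the example record are assumptions made for this sketch.

```python
# (bit, label) pairs for the SAM FLAG field; labels are this sketch's own.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


def parse_sam_line(line):
    """Parse the first six mandatory columns of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


if __name__ == "__main__":
    record = parse_sam_line("read1\t99\tchr1\t1000\t60\t50M\t=\t1200\t250\t*\t*")
    print(record)
    print(decode_flag(record["flag"]))
```

BAM is the compressed binary form of the same records, so it cannot be read line by line like this; it has to be converted to SAM text first or read through a library.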

Tools for processing BAM files

  1. samtools
  2. picard
  3. bamtools

Getting mapping stats

  1. extracting reads mapping to regions
  2. getting coverage info for selected regions
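
The idea behind coverage for a selected region can be sketched in a few lines of Python. This is a simplified illustration, not what the listed tools do internally: it assumes each read is given as a (start, length) pair with 1-based coordinates and aligns without gaps, and the function names are hypothetical.

```python
def region_coverage(reads, start, end):
    """Per-base depth over the region [start, end] (1-based, inclusive).

    `reads` is an iterable of (pos, length) tuples; each read is assumed
    to cover pos .. pos + length - 1 with no gaps or clipping.
    """
    depth = [0] * (end - start + 1)
    for pos, length in reads:
        first = max(pos, start)
        last = min(pos + length - 1, end)
        for p in range(first, last + 1):
            depth[p - start] += 1
    return depth


def mean_coverage(reads, start, end):
    """Average depth over the region."""
    depth = region_coverage(reads, start, end)
    return sum(depth) / len(depth)


if __name__ == "__main__":
    reads = [(1, 4), (3, 4), (10, 5)]
    print(region_coverage(reads, 2, 5))
    print(mean_coverage(reads, 2, 5))
```

Real alignments contain insertions, deletions and clipped bases (described in the CIGAR string), so real data needs a dedicated tool rather than this interval arithmetic.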

Detecting SNPs

  1. general procedure
  2. GATK pipeline
  3. other SNP calling programs [tba]

Working with VCF files

  1. VCF file format
  2. viewing VCFs in IGV
  3. filtering SNPs by quality
  4. set operations on VCF files (common SNPs, unique SNPs)
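
Quality filtering and set operations on VCF files can be sketched in plain Python as below. The sketch assumes tab-separated VCF lines, identifies a variant by (CHROM, POS, REF, ALT), and drops records whose QUAL is missing (`.`) or below a threshold; the function name, the threshold of 30 and the example records are made up for illustration.

```python
def read_variants(lines, min_qual=0.0):
    """Return the set of (chrom, pos, ref, alt) with QUAL >= min_qual.

    Header lines (starting with '#') and blank lines are skipped.
    Records with a missing QUAL ('.') are dropped, a choice made for
    this sketch rather than a rule of the format.
    """
    variants = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if qual == "." or float(qual) < min_qual:
            continue
        variants.add((chrom, int(pos), ref, alt))
    return variants


if __name__ == "__main__":
    sample_a = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t50\tPASS\t.",
        "1\t200\t.\tC\tT\t10\tPASS\t.",
        "2\t300\t.\tG\tA\t99\tPASS\t.",
    ]
    sample_b = [
        "1\t100\t.\tA\tG\t60\tPASS\t.",
        "2\t400\t.\tT\tC\t80\tPASS\t.",
    ]
    a = read_variants(sample_a, min_qual=30)
    b = read_variants(sample_b, min_qual=30)
    print("common:", sorted(a & b))
    print("unique to A:", sorted(a - b))
    print("unique to B:", sorted(b - a))
```

Using Python sets keeps the "common SNPs" and "unique SNPs" questions down to `&` and `-`; for whole-genome files, the dedicated VCF tools in the software list are the more practical route.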

RNASeq

  1. caveats (ribosomal RNA contamination)
  2. mapping RNASeq
  3. tophat
  4. GRAPE
  5. creating gene models from RNASeq (cufflinks)