User talk:Darek Kedra/sandbox 29: Difference between revisions
From OpenWetWare
Revision as of 18:05, 16 September 2014
EMBO Tunis 2014
From sequencing data to knowledge
00 Programs used
sequence pre-processing
- SRA_toolkit (current version)
- FastQC ver 0.11.2
- Trimmomatic ver 0.32
- TagDust ver 2.13
- Coral ver 1.4
general tools
- fastx_toolkit ver 0.0.13
- Samtools classic ver 0.1.19
- samtools/HTSlib ver 1.0
- Picard ver 1.119
mappers
Spliced leader mapping
- fqgrep (GitHub version), which also needs
- TRE_library ver 0.8.0
viewers
quantification
SNP discovery
- GATK ver 3.2-2
01 Data files used
FASTQ files
L.amazonensis RNA-Seq
L.mexicana genomic DNA
(extra set) L.enriettii genomic DNA
Stuff to read / compare
File formats
- http://biobits.org/samtools_primer.html (file formats)
VCF
- http://vcftools.sourceforge.net/ (VCFTools)
BED
- http://genome.ucsc.edu/FAQ/FAQformat.html#format1
- http://www.broadinstitute.org/igv/BED
- http://www.ensembl.org/info/website/upload/bed.html
- http://bedtools.readthedocs.org/en/latest/ BEDTOOLS
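When comparing BED with the other formats listed here, the coordinate system is the usual stumbling block: BED uses 0-based starts and end-exclusive ends, while GFF/GTF, SAM and VCF positions are 1-based. The sketch below only illustrates that conversion. It assumes a tab-delimited BED line with at least three columns; the function names and the contig name `LmjF.01` are made up for the example and do not come from any of the tools above.

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from one tab-delimited BED line.

    Assumes at least three columns; any extra columns are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])  # 0-based, inclusive
    end = int(fields[2])    # 0-based, exclusive
    return chrom, start, end


def bed_to_one_based(start, end):
    """Convert a BED interval to 1-based, inclusive coordinates (GFF style)."""
    return start + 1, end


def interval_length(start, end):
    """Length of a BED interval; no +1 is needed because the end is exclusive."""
    return end - start


# Hypothetical example line: the first 100 bases of a contig.
chrom, start, end = parse_bed_line("LmjF.01\t0\t100\tfeature_1\n")
gff_start, gff_end = bed_to_one_based(start, end)
print(chrom, gff_start, gff_end, interval_length(start, end))
```

The same 100 bp feature is written as 0-100 in BED and as 1-100 in GFF, which is why mixing the two without conversion shifts features by one base.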
GFF / GTF
Genomes and annotations
- L.mexicana
- L.amazonensis
- L.enriettii
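- L.major: http://tritrypdb.org/common/downloads/release-8.0/LmajorFriedlin/fasta/data/TriTrypDB-8.0_LmajorFriedlin_Genome.fasta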
Extra material
Whole Genome Sequencing
Tips before you start
- if possible, sequence a haploid genome (see the bee genome project, which used haploid drones)
- the next best thing: highly inbred, nearly homozygous lines
Sometimes you find unexpected treasures: there is probably an (almost) homozygous cow breed: http://en.wikipedia.org/wiki/Chillingham_cattle
- in some cases extra-chromosomal DNA contributes a non-trivial portion of the total DNA (chloroplast genomes are often in the 150 kb range, with 40-70 chloroplasts per plant cell and 40-70 genome copies per organelle). Select tissues/stages with fewer multi-copy DNAs
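To get a feeling for how much of a sequencing run organellar DNA can consume, the last tip can be turned into back-of-the-envelope arithmetic. The sketch below is only an illustration: the chloroplast figures are rough midpoints of the ranges quoted above, and the nuclear genome size (500 Mb haploid, diploid cell) is a purely hypothetical value, as is the function name.

```python
def organelle_fraction(nuclear_genome_bp, ploidy,
                       organelle_genome_bp, organelles_per_cell,
                       copies_per_organelle):
    """Fraction of a cell's total DNA (in bp) that is organellar."""
    nuclear_bp = nuclear_genome_bp * ploidy
    organelle_bp = (organelle_genome_bp
                    * organelles_per_cell
                    * copies_per_organelle)
    return organelle_bp / (nuclear_bp + organelle_bp)


# Assumed values, for illustration only:
fraction = organelle_fraction(
    nuclear_genome_bp=500_000_000,  # hypothetical haploid nuclear genome
    ploidy=2,                       # diploid cell
    organelle_genome_bp=150_000,    # chloroplast genome, ~150 kb
    organelles_per_cell=50,         # within the 40-70 range
    copies_per_organelle=50,        # within the 40-70 range
)
print(f"chloroplast share of total DNA: {fraction:.1%}")
```

With these assumed numbers the chloroplast DNA adds up to 375 Mb per cell against 1000 Mb of nuclear DNA, so roughly a quarter of the reads would not come from the nuclear genome. Real proportions depend strongly on species, tissue and developmental stage.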