Cronn Lab:Informatics: Difference between revisions

From OpenWetWare
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Template:Cronn}}
{{Template:Cronn}}


==Informatics Infrastructure==
==Informatics infrastructure==
Much of our computational needs are facilitated through dedicated nodes on Oregon State University's [http://corelabs.cgrb.oregonstate.edu/biocomputing Center for Genome Research and Biocomputing] high-performance computing cluster.  We currently own the following nodes:
Much of our computational needs are facilitated through dedicated nodes on Oregon State University's [http://corelabs.cgrb.oregonstate.edu/biocomputing Center for Genome Research and Biocomputing]high-performance computing cluster.  We currently own the following resources:


* pine1 - dual quad core 2.66 GHz Intel processors with a total of 16 GB of RAM.
* pine1 - The original. Dual quad core 2.66 GHz Intel processors with 32 GB of RAM.
* pine2 - coming online late 2009: dual 12 core 2.66 GHz Intel processors with a total of 96 GB of RAM.
* pine2 - Dual quad core 2.13 GHz Intel processors with 96 GB of RAM.
* pine3 - still thinkin' about it.
* smokey - 20 TB RAID system.


These systems are currently run through a 64 bit version of [http://www.redhat.com/ Enterprise Red Hat] Linux.
These systems are currently run through a 64 bit version of [http://www.redhat.com/ Enterprise Red Hat] Linux.


==Solexa Barcode Sorting==
==Solexa barcode sorting==
Most of our Solexa runs include multiplex massively parallel sequencing (MMPS).  Because these micro-reads include a sample-specific barcode (as well as the quality control 'T') a first step is to sort these reads by barcode and to remove the barcode.  This is facilitated by a custom perl script.
Most of our Solexa runs include multiplex massively parallel sequencing (MMPS).  Because these micro-reads include a sample-specific barcode (as well as the quality control 'T') a first step is to sort these reads by barcode and to remove the barcode.  This is facilitated by a custom perl script.


==De Novo Assembly==
*[http://nar.oxfordjournals.org/cgi/content/full/36/19/e122?maxtoshow=&hits=10&RESULTFORMAT=1&author1=cronn&andorexacttitle=and&andorexacttitleabs=and&andorexactfulltext=and&searchid=1&FIRSTINDEX=0&sortspec=relevance&resourcetype=HWCIT Nucl. Acids. Res.] - Article describing barcoding.
For ''de novo'' assembly of micro-reads we typically use [http://www.ebi.ac.uk/~zerbino/velvet/ velvet].
*[http://brianknaus.com/software/srtoolbox/shortread.html Short read toolbox] - Includes barcode sorting script.


==Reference Based Assembly==
''--- notice! --- As of summer 2011, we have been increasingly moving to Illumina's index sequencing method. Our experience to date from ~50 libraries has been very positive, and we are starting to explore novel indexes beyond the currently-available list of 24.''
 
==De novo assembly==
For ''de novo'' assembly of micro-reads we typically use [http://www.ebi.ac.uk/~zerbino/velvet/ velvet] for genomic DNA. We are now using the [http://trinityrnaseq.sourceforge.net/ Trinity] package for de novo assembly of RNA-seq data.
 
==Reference based assembly==
When we have a reasonable reference we use either [http://rga.cgrb.oregonstate.edu/ RGA] or [http://maq.sourceforge.net/ MAQ].
When we have a reasonable reference we use either [http://rga.cgrb.oregonstate.edu/ RGA] or [http://maq.sourceforge.net/ MAQ].


==Pine2 Software==
==Lists of software==
The lab group is getting a new server named 'pine2.'  During the setup process our wonderful sysadmin will be involved with installing software.  In order to help get as much software as we would like on the system we're using this space as a list of softwares for him to install.  In the interest of stability, he usually uses old versions and does not update them.  If you're interested in a particular version please state it.
[[Short read toolbox]]
 
*R 2.1.0
**ape
**seqinr
**RMySQL
*Perl 5.10.1
*BioPerl 1.6.0
*Circos
*Python 2.6.3
*BioPython 1.52
*MySQL
*Emacs
*Emacs Speaks Statistics
*Seaview

Latest revision as of 23:23, 12 September 2011


Informatics infrastructure

Most of our computational needs are met by dedicated nodes on the high-performance computing cluster at Oregon State University's Center for Genome Research and Biocomputing. We currently own the following resources:

  • pine1 - The original. Dual quad-core 2.66 GHz Intel processors with 32 GB of RAM.
  • pine2 - Dual quad-core 2.13 GHz Intel processors with 96 GB of RAM.
  • smokey - 20 TB RAID system.

These systems currently run a 64-bit version of Red Hat Enterprise Linux.

Solexa barcode sorting

Most of our Solexa runs use multiplex massively parallel sequencing (MMPS). Because these micro-reads include a sample-specific barcode (as well as the quality-control 'T'), the first step is to sort the reads by barcode and then remove the barcode. We do this with a custom Perl script.
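
As a rough illustration of this sorting step, here is a minimal Python sketch. It is not the lab's Perl script. It assumes that each read begins with the barcode followed immediately by the 'T', that all barcodes are the same length, and that reads arrive as four-line FASTQ records. The function names, barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict


def parse_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def sort_by_barcode(records, barcodes, check_base="T"):
    """Bin reads by leading barcode and trim the barcode plus check base.

    records  : iterable of (header, sequence, quality) tuples
    barcodes : dict mapping barcode sequence -> sample name (hypothetical)

    Reads that do not start with a known barcode followed by the check
    base are placed, untrimmed, in the 'unmatched' bin.
    """
    bins = defaultdict(list)
    for header, seq, qual in records:
        for barcode, sample in barcodes.items():
            tag = barcode + check_base
            if seq.startswith(tag):
                trim = len(tag)
                # Trim the quality string too so it stays aligned with the bases.
                bins[sample].append((header, seq[trim:], qual[trim:]))
                break
        else:
            bins["unmatched"].append((header, seq, qual))
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical 3-nt barcodes and toy reads, for illustration only.
    barcodes = {"ACG": "sampleA", "CTG": "sampleB"}
    fastq = [
        "@r1", "ACGTGGGCCC", "+", "IIIIIIIIII",
        "@r2", "CTGTAAAT", "+", "IIIIIIII",
        "@r3", "ACGAGGG", "+", "IIIIIII",
    ]
    for sample, reads in sort_by_barcode(parse_fastq(fastq), barcodes).items():
        print(sample, [seq for _, seq, _ in reads])
```

Keeping unmatched reads in their own bin, rather than discarding them, makes it easy to check how many reads failed the barcode or 'T' test in a given lane.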

--- notice! --- As of summer 2011, we have been moving increasingly to Illumina's index sequencing method. Our experience to date with ~50 libraries has been very positive, and we are starting to explore novel indexes beyond the currently available list of 24.

De novo assembly

For de novo assembly of micro-reads from genomic DNA we typically use Velvet. For de novo assembly of RNA-seq data we now use the Trinity package.

Reference-based assembly

When a suitable reference sequence is available, we use either RGA or MAQ.

Lists of software

Short read toolbox