BioMicroCenter:Software

From OpenWetWare


Revision as of 13:19, 30 December 2013


A large amount of bioinformatics software is available at MIT. This page summarizes the packages we are asked about most often. The BioMicro Center collaborates with the Koch Institute Bioinformatics Computing Core and the MIT Libraries to support different packages.


Desktop Software

Desktop software is available from our Download Page. Access may be limited to MIT users only. Below is a list of the software available:

  • Agilent 2100 Expert This software package is used to control the Agilent 2100 Bioanalyzer and to perform analysis of the output, including microfluidic and electrophoretic assays for RNA, DNA and proteins, as well as two-color flow cytometry. The software can be installed on your desktop to allow users to do additional analyses.
  • SSH This is the software we recommend for UNIX access to rous and for downloading files from our servers.
  • Spotfire is a widely used data analysis and visualization tool. It can handle a number of clustering functions and statistical tests and has very robust graphical capabilities. The BioMicro Center operates a Spotfire server that is available to anyone at MIT. Licenses for Spotfire are available through the BioMicro Center on a yearly basis.
  • MATLAB A mathematical programming language used for mathematical modeling, as well as analyzing and visualizing data. Contact Stephen Goldman for access.
  • Tecan EvoWare Standard This software is available as part of our robotics service. Identical to the software used on the Tecan EVO 150s, the software contains a simulator that can be used to design your robotics experiments at your bench. Note that this software is on a different server.
  • COMSOL Multiphysics This software package creates a simulation environment that facilitates all steps in the modeling process.
  • MacVector A comprehensive Macintosh application that provides sequence editing, primer design, internet database searching, protein analysis, sequence confirmation, multiple sequence alignment, phylogenetic reconstruction, coding region analysis, and a wide variety of other functions.
  • Lasergene v8.0 A software package that provides sequence assembly, including next-generation sequence analysis, simplified primer design, and expanded SNP reporting and management.

Galaxy

Front Page of the MIT Galaxy Site

Galaxy is a bioinformatics platform that is designed to bring complicated informatics tools to bench scientists. Galaxy allows you to do analyses you cannot do anywhere else, without the need to install or download anything. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples and much more. For many users, the public Galaxy instance at Penn State is a very robust tool.

To make things even easier, we have created a Galaxy server here at MIT. The Galaxy server acts as a separate head node for ROUS. Users are required to have data storage space on Rowley or BMC-PUB and may be required to purchase a queue on ROUS.

Additional Resources

Software from MIT Libraries

  • BioBASE The BIOBASE Knowledge Library (BKL) contains comprehensive sets of protein databases such as HumanPSD, WormPD, GPGR-PD, PombePD, and MycopathPD in addition to analysis tools such as TRANSFAC, TRANSPATH, and ExPlain. BKL brings together curated data, analysis tools, and gene-centered information. BKL is one of the best ways to quickly assess a vast set of protein properties for a given protein or set of proteins.
  • GeneGO Metacore GeneGo is a leading provider of data mining & analysis solutions in systems biology. MetaCore, GeneGo's flagship product, is an integrated software suite for functional analysis of experimental data. MetaCore is based on a curated database of human protein-protein, protein-DNA interactions, transcription factors, signaling and metabolic pathways, disease and toxicity, and the effects of bioactive molecules.
  • INGENUITY PATHWAY ANALYSIS Software that helps researchers model, analyze, and understand complex biological and chemical systems relevant to their experimental data. Researchers can search the scientific literature and find insights most relevant to their experimental data; analyze and build pathway models related to their experimental data; and share and collaborate with colleagues. IPA is currently licensed through June 2012.

UNIX SERVER

A large amount of software is installed on our cluster server. Please look at the ROUS page (http://openwetware.org/wiki/BioMicroCenter:Servers#Server_Software_Installed_on_ROUS).

BMC-BCC Pipeline

The pipeline processes flowcell directories as they are generated by the Illumina sequencer software and postprocesses the output for use in downstream biological analyses. It is intended for core facilities that own and/or operate Illumina sequencers and want automated, consistent processing of Illumina data. The pipeline is a collection of command-line utilities written primarily in the Python programming language. The commands are tied together using the ruffus pipelining package.

Release Notes 1.2 (01/01/2014)

  • An information site about the pipeline run is delivered to MIT users
  • Sample data directory includes the flowcell code
  • Bug fix for pipeline re-run. When the pipeline was re-run, data could be duplicated in the fastq files. This is now fixed.
  • Performance enhancement. Data is written directly to the published directory for users, and copying is avoided whenever possible. This not only reduces disk usage, but also allows users to get their data faster.

Release Notes 1.0.2 (08/19/2013)

  • Switch from Bowtie to BWA for default alignments for generating SAM files.

    BWA version 0.7.5a is used by default for alignment. For Illumina sequence reads up to 70bp, the alignment is done by aln/samse/sampe (the BWA-backtrack algorithm). For longer reads (> 70bp), the mem subcommand (the BWA-MEM algorithm) is used.

  • Bug fix for large SAM/BAM files

    When large fastq files were processed to generate a SAM file, the SAM file could, under certain circumstances, be corrupted at the end of the file if it was larger than 40GB. As a result, the SAM-to-BAM conversion could fail with a core dump. This is now fixed.
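
The read-length rule in the 1.0.2 notes can be sketched as a small Python helper. This is only an illustration of the stated 70bp threshold, not the pipeline's actual code: the function name, the constant name and the paired_end parameter are hypothetical, and the mapping of samse to single-end and sampe to paired-end runs follows standard BWA usage rather than anything stated above.

```python
# Hypothetical helper illustrating the aligner choice described in the
# 1.0.2 release notes. Names are assumptions made for this sketch.

BACKTRACK_MAX_READ_LENGTH = 70  # threshold in bp, taken from the release notes


def bwa_subcommands(read_length, paired_end=False):
    """Return the BWA subcommands to run, in order, for one sample."""
    if read_length <= 0:
        raise ValueError("read_length must be a positive number of bases")
    if read_length <= BACKTRACK_MAX_READ_LENGTH:
        # BWA-backtrack: 'aln' builds the index of hits, then 'samse'
        # (single-end) or 'sampe' (paired-end) writes the SAM output.
        return ["aln", "sampe" if paired_end else "samse"]
    # Reads longer than 70bp go to BWA-MEM, which writes SAM directly.
    return ["mem"]


if __name__ == "__main__":
    for length, paired in [(36, False), (70, True), (101, True)]:
        print(length, paired, bwa_subcommands(length, paired))
```

Keeping the threshold in one named constant makes the boundary case explicit: a 70bp read still goes to BWA-backtrack, and a 71bp read goes to BWA-MEM.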

Release Notes 0.9 (10/18/2011)

Implemented all core functionality:

  • setting up and converting qseq files
  • qseq to fastq
  • fastqc and tag count statistics on flowcell-level sequences
  • splitting of barcoded samples into individual directories
  • individual fastqc
  • genome alignment using bowtie plus statistics
  • contamination qc checking
  • tag counts
  • conversion of alignments from SAM to BAM
  • production of bigWig files from SAM alignments
  • publishing user data to web directories
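
As an illustration of the "qseq to fastq" step listed above, the following is a minimal Python sketch of converting one qseq record. It is not the pipeline's own converter. It assumes the usual 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), Phred+64 quality characters in the input, and a read header layout chosen for this example; the function name and the convert_quality parameter are hypothetical.

```python
# Hypothetical sketch of a qseq-to-fastq record conversion.
# Assumptions: 11 tab-separated qseq columns and Phred+64 input qualities.

QSEQ_FIELD_COUNT = 11
PHRED64_TO_PHRED33_SHIFT = 31  # 64 - 33


def qseq_to_fastq(line, convert_quality=True):
    """Convert one qseq line into a four-line fastq record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != QSEQ_FIELD_COUNT:
        raise ValueError("expected %d tab-separated fields, got %d"
                         % (QSEQ_FIELD_COUNT, len(fields)))
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _flag = fields
    # qseq marks uncalled bases with '.', fastq conventionally uses 'N'.
    seq = seq.replace(".", "N")
    if convert_quality:
        # Shift each quality character from Phred+64 to Phred+33.
        qual = "".join(chr(ord(c) - PHRED64_TO_PHRED33_SHIFT) for c in qual)
    # Header layout is an assumption made for this example.
    header = "@%s_%s:%s:%s:%s:%s#%s/%s" % (
        machine, run, lane, tile, x, y, index, read_no)
    return "%s\n%s\n+\n%s\n" % (header, seq, qual)


if __name__ == "__main__":
    example = "M1\t7\t1\t12\t100\t200\tACGT\t1\tAC.T\thhhB\t1"
    print(qseq_to_fastq(example), end="")
```

The filter flag in the last column is read but not applied here; whether failed reads are dropped or kept is a policy decision this sketch leaves open.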
