IC Bioinfs/Project1/BioSQL

From OpenWetWare

About BioSQL - Official BioSQL Wiki site

BioSQL is a generic relational model covering sequences, features, sequence and feature annotation, a reference taxonomy, and ontologies (or controlled vocabularies).

Although BioSQL was originally conceived by Ewan Birney in 2001 as a local relational store for GenBank, the project has since become a collaboration between the BioPerl, Biopython, BioJava, and BioRuby projects. The goal is to build a sufficiently generic schema for persistent storage of sequences, features, and annotation in a way that is interoperable between the Bio* projects. Each Bio* project has a language binding (object-relational mapping, ORM) to BioSQL.

Schema Overview

  • This wiki page describes some of the tables and fields in the BioSQL schema. It also aims to demonstrate functional capabilities using example SQL. Design philosophies and expectations are presented with reasoning.
  • PDF copy of the relational model behind BioSQL 1.0.
  • This wiki page breaks down GenBank and GFF3 files into their respective BioSQL tables and columns.

BioSQL on Google App Engine (pasted article) from here

The BioSQL project provides a well-thought-out relational database schema for storing biological sequences and annotations. For developers responsible for setting up local stores of biological data, BioSQL offers a major advantage in reusability. In my experience, some of the best features of BioSQL are:

  • Available interfaces for several languages (via Biopython, BioPerl, BioJava and BioRuby).
  • Flexible storage of data via a key/value pair model. This models information in an extensible manner, and helps with understanding distributed key/value stores like SimpleDB and CouchDB.
  • Overall data model based on GenBank flat files. This makes teaching the model to biology-oriented users much easier; you can pull up a text file from NCBI with a sequence and directly show how items map to the database.
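The key/value pair model mentioned above can be sketched with a heavily simplified version of three BioSQL tables (`bioentry`, `term`, and `bioentry_qualifier_value`). This is a minimal illustration using an in-memory SQLite database, not the full BioSQL schema: the real tables carry more columns (for example, `term` also references an ontology), and the helper names `build_demo_db`, `add_annotation`, and `get_annotations` are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a simplified subset of BioSQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL,
            description TEXT
        );
        -- Each annotation key is stored once as a term.
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        -- Values point at an entry and a term; "rank" orders repeated keys.
        CREATE TABLE bioentry_qualifier_value (
            bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
            term_id     INTEGER NOT NULL REFERENCES term(term_id),
            value       TEXT,
            "rank"      INTEGER NOT NULL DEFAULT 0,
            UNIQUE (bioentry_id, term_id, "rank")
        );
    """)
    return conn


def add_annotation(conn, bioentry_id, key, value):
    """Attach a key/value annotation to an entry, creating the key if needed."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (key,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (key,)
    ).fetchone()[0]
    # The next rank is the number of values already stored under this key.
    rank = conn.execute(
        "SELECT COUNT(*) FROM bioentry_qualifier_value "
        "WHERE bioentry_id = ? AND term_id = ?",
        (bioentry_id, term_id),
    ).fetchone()[0]
    conn.execute(
        'INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, value, "rank") '
        "VALUES (?, ?, ?, ?)",
        (bioentry_id, term_id, value, rank),
    )


def get_annotations(conn, bioentry_id):
    """Return all annotations of an entry as a dict mapping key to list of values."""
    rows = conn.execute(
        'SELECT t.name, q.value FROM bioentry_qualifier_value q '
        'JOIN term t ON t.term_id = q.term_id '
        'WHERE q.bioentry_id = ? ORDER BY t.name, q."rank"',
        (bioentry_id,),
    )
    result = {}
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result
```

Because new keys are just new rows in `term`, an annotation type that was not anticipated when the schema was designed can be stored without altering any table, which is the extensibility the list above refers to.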

Given the usefulness of BioSQL for local relational data storage, I would like to see it move into the rapidly expanding cloud development community. Tying BioSQL data storage in with Web frameworks will help researchers make their data publicly available earlier and in standard formats. As a nice recent example, George has a series of posts on using BioSQL with Ruby on Rails. There have also been several discussions on the BioSQL mailing list around standard web tools and APIs to sit on top of the database; see this thread for a recent example.

Towards these goals, I have been working on a BioSQL-backed interface for Google App Engine. Google App Engine is a Python-based framework for quickly developing and deploying web applications. For data storage, Google’s Datastore provides an object interface to a distributed, scalable storage backend. Practically, App Engine has free hosting quotas which can scale to larger instances as demand for the data in the application increases; this will appeal to cost-conscious researchers by removing an initial barrier to making their data available.

My work on this was accelerated by the Open Bioinformatics Foundation’s move to apply for participation in Google’s Summer of Code. OpenBio is a great community that helps organize projects like BioPerl, Biopython, BioJava and BioRuby. After writing up a project idea for BioSQL on Google App Engine in our application, I was inspired to finish a demonstration of the idea.

I am happy to announce a simple demonstration server running a BioSQL-based backend: BioSQL Web. The source code is available from my git repository. Currently the server allows uploads of GenBank-formatted files and provides a simple view of the records, annotations and sequences. The data is stored in the Google Datastore with an object interface that mimics the BioSQL relational model.
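To give a rough idea of what an object interface that mimics the relational model might look like, here is a minimal sketch using plain Python dataclasses. It is not the code from the demonstration server and does not use the App Engine Datastore API; the class names loosely follow BioSQL table names, while the fields, the in-memory `BioDatabase` container, and the helpers `add_entry` and `features_of_type` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SeqFeature:
    """Roughly the BioSQL seqfeature and location tables folded into one object."""
    type_name: str                  # feature type, e.g. "CDS" or "gene"
    start: int
    end: int
    strand: int = 1
    # Key/value qualifiers, as in seqfeature_qualifier_value.
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class BioEntry:
    """Roughly the BioSQL bioentry table plus its sequence and annotations."""
    accession: str
    description: str = ""
    sequence: str = ""              # stands in for the biosequence table
    # Key/value annotations, as in bioentry_qualifier_value.
    annotations: Dict[str, List[str]] = field(default_factory=dict)
    features: List[SeqFeature] = field(default_factory=list)


@dataclass
class BioDatabase:
    """Roughly the BioSQL biodatabase table: a named collection of entries."""
    name: str
    entries: Dict[str, BioEntry] = field(default_factory=dict)


def add_entry(db: BioDatabase, entry: BioEntry) -> None:
    """Store an entry under its accession, replacing any earlier copy."""
    db.entries[entry.accession] = entry


def features_of_type(entry: BioEntry, type_name: str) -> List[SeqFeature]:
    """Return the features of one type, in the order they were added."""
    return [f for f in entry.features if f.type_name == type_name]
```

A record parsed from an uploaded GenBank file would become one `BioEntry`, with each line of the feature table becoming a `SeqFeature`. In a Datastore-backed version, each class would presumably be replaced by a persistent model type, with references between objects taking the place of foreign keys.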

Future posts will provide more details on the internals of the server end and client interface as they develop. As always, feedback, thoughts and code contributions are very welcome.

HOW TO INSTALL BIOSQL