Distributed Annotation System for DNA Part Registries

Vincent 05:55, 28 August 2009 (EDT): This is a work in progress. If you are interested in contributing, or would like more information, please feel free to contact me.

Overview

...
...

Objectives

...
...

DNA Part DAS Server

Server address:
Typical queries:
- retrieve all parts
- retrieve all supported annotation types
- retrieve DNA from a given part
- retrieve all annotation from a given part
- retrieve subparts from a given part
- retrieve superparts from a given part
Statistics about parts
Source code @ Git
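
The typical queries listed above map fairly naturally onto DAS 1.53 commands (entry_points, types, dna, features). The sketch below only builds request URLs; it is an illustration, not the server's documented interface. The base URL and data source name are hypothetical placeholders, since the server address is not given here, and the use of the "component" and "supercomponent" feature categories for subparts and superparts is an assumption about how assembly information might be exposed.

```python
from urllib.parse import quote

# Hypothetical placeholders: the real server address and source name
# are not stated in this page.
BASE_URL = "http://example.org/das"
SOURCE = "parts"


def das_url(command, segment=None, category=None):
    """Build a DAS 1.53 style request URL for a single command."""
    url = f"{BASE_URL}/{SOURCE}/{command}"
    params = []
    if segment is not None:
        params.append("segment=" + quote(segment))
    if category is not None:
        params.append("category=" + quote(category))
    # DAS 1.53 separates query arguments with ';'
    return url + ("?" + ";".join(params) if params else "")


def typical_queries(part_id):
    """Return one candidate URL per typical query for the given part."""
    return {
        "all parts": das_url("entry_points"),
        "annotation types": das_url("types"),
        "DNA of a part": das_url("dna", segment=part_id),
        "annotations of a part": das_url("features", segment=part_id),
        # Assumption: subparts/superparts are served as features in the
        # 'component' and 'supercomponent' categories.
        "subparts": das_url("features", segment=part_id, category="component"),
        "superparts": das_url("features", segment=part_id,
                              category="supercomponent"),
    }


if __name__ == "__main__":
    for label, url in typical_queries("BBa_B0034").items():
        print(f"{label}: {url}")
```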

Software architecture

Implementation Steps

We summarise here the different steps undertaken during this project.

Run Dazzle on the Google App Engine (GAE)

Dazzle is a Java application that usually runs on a Tomcat server. However, GAE supports Java applications, and no tweaking is necessary to run Dazzle on GAE.
Instructions@BioJava

Implement a BioSQL subset on top of the Google datastore

BioSQL is a popular relational database model to store DNA sequences and annotations.
The BioPython, BioJava, and BioPerl projects provide easy connectivity to the schema.
The Google datastore is not a relational database, so the BioSQL schema has to be reformatted into a more object-oriented data model.
Only a subset of BioSQL was considered for this project. The implemented BioSQL tables are listed below:
- Ontology and Term
- Biodatabase, Bioentry, Biosequence, Bioentry_Qualifier_Value, Seqfeature, Location
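
To make the idea of an object-oriented reformulation more concrete, here is a minimal sketch using plain Python dataclasses. It is not the model used in this project: the class and field names are hypothetical, loosely following the BioSQL tables listed above, and the datastore itself is left out. Instead of joining tables, a bioentry simply holds its sequence, qualifiers, and features. Coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """An ontology term, e.g. an annotation type (Ontology + Term)."""
    ontology: str
    name: str


@dataclass
class Location:
    """A span on a sequence; assumed 1-based, inclusive coordinates."""
    start: int
    end: int
    strand: int = 1


@dataclass
class SeqFeature:
    """An annotation: a typed feature with one or more locations."""
    type_term: Term
    locations: List[Location] = field(default_factory=list)


@dataclass
class BioEntry:
    """A part: replaces the Bioentry/Biosequence/Qualifier joins."""
    biodatabase: str
    accession: str
    description: str = ""
    sequence: str = ""
    qualifiers: Dict[str, str] = field(default_factory=dict)
    features: List[SeqFeature] = field(default_factory=list)

    def feature_sequence(self, feature):
        """Concatenate the sequence under each location of a feature."""
        return "".join(self.sequence[loc.start - 1:loc.end]
                       for loc in feature.locations)


if __name__ == "__main__":
    # Hypothetical example entry, for illustration only.
    rbs = SeqFeature(Term("feature types", "rbs"), [Location(3, 6)])
    entry = BioEntry("example_registry", "part_X", sequence="aaggagcc",
                     qualifiers={"author": "unknown"}, features=[rbs])
    print(entry.feature_sequence(rbs))
```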

Implement a Dazzle plugin to support BioSQL/datastore queries

Instructions on how to write a new Dazzle plugin can be found here.
The new plugin implements the following methods:
- ...
- ...

Process and Upload data from MIT Part Registry to Google App Engine (GAE)

The MIT Part Registry implements a limited API to access its data:
- limited FASTA description of parts (part dump in FASTA)
- limited DAS description of parts (no assembly information for example)
A Biopython script was used to process the FASTA dump file and generate GAE upload files. The following BioBrick information was processed:
- BioBrick Sequence
- BioBrick Author
- BioBrick Category
- BioBrick DNA Status
- BioBrick Short Description
- BioBrick Assembly information (subpart + superparts from BLAST queries within Biopython script)
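
The assembly step can be illustrated with a small standard-library sketch. It is not the project's script: the original used Biopython and BLAST queries, whereas this example reads FASTA text by hand and substitutes exact substring matching for BLAST, which is far cruder but shows how the subpart and superpart relations are derived from sequences alone. The part identifiers in the usage example are hypothetical.

```python
def parse_fasta(text):
    """Return {part_id: sequence}; the id is the first word of each header."""
    parts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else None
            if current is not None:
                parts[current] = ""
        elif current is not None:
            parts[current] += line.lower()
    return parts


def find_subparts(parts):
    """Map each part to the other parts whose full sequence it contains.

    Stand-in for the BLAST step: exact substring containment only.
    """
    subparts = {pid: [] for pid in parts}
    for pid, seq in parts.items():
        for other, other_seq in parts.items():
            if other != pid and other_seq and other_seq in seq:
                subparts[pid].append(other)
    return subparts


def invert(subparts):
    """Derive superparts by reversing the subpart relation."""
    superparts = {pid: [] for pid in subparts}
    for pid, subs in subparts.items():
        for sub in subs:
            superparts[sub].append(pid)
    return superparts


if __name__ == "__main__":
    # Hypothetical three-part dump: partC is partA followed by partB.
    dump = ">partA\nATGC\n>partB\nGGTT\n>partC\nATGCGG\nTT\n"
    parts = parse_fasta(dump)
    subparts = find_subparts(parts)
    print(subparts)
    print(invert(subparts))
```

Exact matching misses subparts separated by assembly scars or carrying point mutations, which is presumably one reason a similarity search such as BLAST was preferred.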

Project resources

DAS standard and its current specifications (v.1.53)
Dazzle DAS server
BioSQL schema
BioPython and BioJava
Google App Engine documentation
BioSQL on GAE from Brad Chapman, see his blog post.

User:Vincent Rouilly/Distributed Annotation System (DAS) for DNA Part Registries
