User:Vincent Rouilly/Distributed Annotation System (DAS) for DNA Part Registries

=Distributed Annotation System for DNA Part Registries=

Vincent 05:55, 28 August 2009 (EDT): This is a work in progress. If you are interested in contributing, or would like more information, please feel free to contact me.

DNA Part DAS Server

 * Server address:
 * Typical queries:
   * retrieve all parts
   * retrieve all supported annotation types
   * retrieve the DNA sequence of a given part
   * retrieve all annotations of a given part
   * retrieve the subparts of a given part
   * retrieve the superparts of a given part
 * Statistics about parts
 * Source code @ Git
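The typical queries above map naturally onto DAS 1.53 commands (`entry_points`, `types`, `sequence`, `features`). The following is a minimal sketch of how such request URLs could be built. The server address and data source name are not given on this page, so `BASE_URL` and `DATA_SOURCE` are placeholders, and `BBa_B0034` is used only as an illustrative part identifier. Using the `component` and `supercomponent` feature categories for subparts and superparts is an assumption about how this server exposes assembly information.

```python
from urllib.parse import urlencode

# Placeholders: the page does not state the server address or source name.
BASE_URL = "http://example.org/das"
DATA_SOURCE = "parts"


def segment(part_id, start=None, stop=None):
    """Format a DAS segment argument: 'ID' or 'ID:start,stop'."""
    if start is None or stop is None:
        return part_id
    return f"{part_id}:{start},{stop}"


def das_url(command, source=DATA_SOURCE, base=BASE_URL, **params):
    """Build a DAS 1.53 style URL: <base>/<source>/<command>?<params>."""
    url = f"{base}/{source}/{command}"
    if params:
        # Keep ':' and ',' readable inside the segment argument.
        url += "?" + urlencode(params, safe=":,")
    return url


part = "BBa_B0034"  # illustrative part identifier
queries = {
    "all parts": das_url("entry_points"),
    "annotation types": das_url("types"),
    "DNA of a part": das_url("sequence", segment=segment(part)),
    "annotations of a part": das_url("features", segment=segment(part)),
    # Assumption: assembly data is exposed through feature categories.
    "subparts": das_url("features", segment=segment(part),
                        category="component"),
    "superparts": das_url("features", segment=segment(part),
                          category="supercomponent"),
}

for label, url in queries.items():
    print(f"{label}: {url}")
```

Each URL can then be fetched with any HTTP client; the response is an XML document as defined by the DAS specification.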

Implementation Steps
This section summarises the steps undertaken during this project.

Run Dazzle on the Google App Engine (GAE)

 * Dazzle is a Java application that usually runs on a Tomcat server. However, GAE supports Java applications, and no tweaking is necessary to run Dazzle on GAE.
 * Instructions @ BioJava

Implement a BioSQL subset on top of the Google datastore

 * BioSQL is a popular relational database model to store DNA sequences and annotations.
 * The Biopython, BioJava, and BioPerl projects provide easy connectivity to the schema.
 * The Google datastore is not a relational database, so the BioSQL schema has to be reformatted into a more object-oriented data model.
 * Only a subset of BioSQL was considered for this project. The implemented BioSQL tables are listed below:
   * Ontology and Term
   * Biodatabase, Bioentry, Biosequence, Bioentry_Qualifier_Value, Seqfeature, Location
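To illustrate what reshaping these relational tables into an object-oriented model can look like, here is a minimal sketch using plain Python dataclasses. It is not the project's actual datastore code and does not use the GAE API. The class and field names are assumptions chosen to mirror the BioSQL table names, the decision to fold Biosequence and Bioentry_Qualifier_Value into the entry object is one possible design, and all values in the example are toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """An ontology term (BioSQL Term), tagged with its Ontology name."""
    name: str
    ontology: str


@dataclass
class Location:
    """A feature location, 1-based and inclusive as in BioSQL."""
    start: int
    end: int
    strand: int = 1


@dataclass
class SeqFeature:
    """An annotation (BioSQL Seqfeature) with its nested Locations."""
    type_term: Term
    locations: List[Location] = field(default_factory=list)


@dataclass
class BioEntry:
    """One part. Joins become nested attributes instead of foreign keys."""
    accession: str
    biodatabase: str                      # BioSQL Biodatabase name
    description: str = ""
    sequence: str = ""                    # BioSQL Biosequence
    qualifiers: Dict[str, str] = field(default_factory=dict)
    features: List[SeqFeature] = field(default_factory=list)


def feature_sequence(entry: BioEntry, feature: SeqFeature) -> str:
    """Concatenate the sequence covered by each location of a feature."""
    return "".join(entry.sequence[loc.start - 1:loc.end]
                   for loc in feature.locations)


# Toy data only; none of these values come from a real registry.
entry = BioEntry(accession="part_1", biodatabase="toy_registry",
                 sequence="ATGCATGC", qualifiers={"author": "someone"})
entry.features.append(
    SeqFeature(Term("example_type", "toy_ontology"), [Location(3, 6)]))
print(feature_sequence(entry, entry.features[0]))
```

Nesting features and locations inside the entry avoids the joins that a relational BioSQL database would use, at the cost of loading the whole entry to read a single feature.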

Implement a Dazzle plugin to support BioSQL/datastore queries

 * Instructions on how to write a new Dazzle plugin can be found here.
 * The new plugin implements the following methods:

Process and Upload data from MIT Part Registry to Google App Engine (GAE)

 * The MIT Part Registry implements a limited API to access its data:
   * limited FASTA description of parts (part dump in FASTA)
   * limited DAS description of parts (for example, no assembly information)
 * A Biopython script was used to process the FASTA dump file and generate GAE upload files. The following BioBrick information was processed:
   * BioBrick Sequence
   * BioBrick Author
   * BioBrick Category
   * BioBrick DNA Status
   * BioBrick Short Description
   * BioBrick Assembly information (subparts and superparts, from BLAST queries within the Biopython script)
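The processing step above can be sketched as follows. This is a simplified, standard-library-only illustration, not the original script: the original used Biopython (where `SeqIO.parse(handle, "fasta")` provides the record iteration) and BLAST queries to find assembly relationships. Here exact substring matching stands in for BLAST, the header is split only into an identifier and a free-text description because the exact layout of the registry dump is not described on this page, and the records are toy data.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def _record(header, chunks):
    # The first whitespace-delimited token is taken as the identifier.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks).lower()


def find_subparts(sequences):
    """Map each part id to the other parts whose sequence it contains.

    Exact substring matching is a stand-in for the BLAST step.
    """
    return {
        pid: sorted(other for other, oseq in sequences.items()
                    if other != pid and oseq and oseq in seq)
        for pid, seq in sequences.items()
    }


def find_superparts(subparts):
    """Invert the subpart map: part id -> parts that contain it."""
    supers = {pid: [] for pid in subparts}
    for pid, children in subparts.items():
        for child in children:
            supers[child].append(pid)
    return {pid: sorted(parents) for pid, parents in supers.items()}


# Toy FASTA dump; identifiers and sequences are made up.
dump = io.StringIO(
    ">part_a first toy part\naaacc\n"
    ">part_b second toy part\nggt\n"
    ">part_c toy composite\naaacc\nggt\n"
)
sequences = {pid: seq for pid, _desc, seq in parse_fasta(dump)}
subparts = find_subparts(sequences)
superparts = find_superparts(subparts)
print(subparts)
print(superparts)
```

Exact matching misses subparts separated by assembly scars or carrying point mutations, which is one reason an alignment tool such as BLAST is a better fit for real registry data.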

Project resources

 * DAS standard and its current specifications (v.1.53)
 * Dazzle DAS server
 * BioSQL schema
 * Biopython and BioJava
 * Google App Engine documentation
 * BioSQL on GAE by Brad Chapman; see his blog post.