User:Vincent Rouilly/Distributed Annotation System (DAS) for DNA Part Registries
- 1 Distributed Annotation System for DNA Part Registries
- 1.1 Overview
- 1.2 Objectives
- 1.3 DNA Part DAS Server
- 1.4 Software architecture
- 1.5 Implementation Steps
- 1.6 Project resources
Distributed Annotation System for DNA Part Registries
Vincent 05:55, 28 August 2009 (EDT): This is a work in progress. If you are interested in contributing, or would like more information, please feel free to contact me.
DNA Part DAS Server
- Server address:
- Typical queries:
  - retrieve all parts
  - retrieve all supported annotation types
  - retrieve the DNA sequence of a given part
  - retrieve all annotations of a given part
  - retrieve the subparts of a given part
  - retrieve the superparts of a given part
- Statistics about parts
- Source code @ Gut
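The typical queries above can presumably be expressed with the standard DAS/1.x commands (entry_points, types, sequence, features). The following Python sketch builds such request URLs. BASE_URL and SOURCE are hypothetical placeholders, since the server address is not given here, and the mapping of queries to commands is an assumption based on the DAS 1.x specification rather than on this server's documentation. How the subpart and superpart queries are expressed is not covered.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# Hypothetical placeholders: the actual server address is not given above.
BASE_URL = "http://example.org/das"
SOURCE = "parts"


def das_url(command: str, segment: Optional[str] = None,
            types: Iterable[str] = ()) -> str:
    """Build a DAS/1.x request URL of the form BASE_URL/SOURCE/command?args."""
    params = []
    if segment is not None:
        # A segment is a part ID, optionally followed by ':start,stop'.
        params.append(("segment", segment))
    for type_name in types:
        # Repeating 'type' restricts a features request to several types.
        params.append(("type", type_name))
    url = f"{BASE_URL}/{SOURCE}/{command}"
    return f"{url}?{urlencode(params)}" if params else url


# Assumed mapping of the typical queries onto DAS commands.
QUERIES = {
    "all parts": das_url("entry_points"),
    "supported annotation types": das_url("types"),
    "DNA of a part": das_url("sequence", segment="BBa_B0034"),
    "annotations of a part": das_url("features", segment="BBa_B0034"),
}
```

urlencode percent-encodes the ':' and ',' of a ranged segment such as "BBa_B0034:1,12"; a DAS server decodes these before parsing the segment.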
Implementation Steps
This section summarises the steps undertaken during the project.
Run Dazzle on the Google App Engine (GAE)
- Dazzle is a Java DAS server that usually runs in a servlet container such as Tomcat. Because GAE supports Java web applications, Dazzle runs on GAE without modification.
Implement a BioSQL subset on top of the Google datastore
- BioSQL is a widely used relational database schema for storing DNA sequences and their annotations.
- The Biopython, BioJava and BioPerl projects all provide easy access to the schema.
- The Google datastore is not a relational database, so the BioSQL schema had to be restructured into a more object-oriented data model.
- Only a subset of BioSQL was implemented for this project. The implemented BioSQL tables are:
  - Ontology and Term
  - Biodatabase, Bioentry, Biosequence, Bioentry_Qualifier_Value, Seqfeature, Location
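One way to make these tables object-oriented is to embed child rows in their parent instead of linking them by foreign key. The Python sketch that follows illustrates that idea with plain dataclasses; it does not use the GAE datastore API, and the particular embedding choices (ontology and biodatabase stored by name, Biosequence and qualifier values folded into Bioentry, locations embedded in features) are assumptions, not a description of the project's actual model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    # BioSQL 'term' row; the owning 'ontology' is stored by name, not by key.
    name: str
    ontology: str


@dataclass
class Location:
    # BioSQL 'location' row, embedded directly in its feature.
    start_pos: int
    end_pos: int
    strand: int = 1


@dataclass
class SeqFeature:
    # BioSQL 'seqfeature' row; its locations are embedded rather than joined.
    type_term: Term
    locations: List[Location] = field(default_factory=list)


@dataclass
class Bioentry:
    # BioSQL 'bioentry' row with 'biosequence' and
    # 'bioentry_qualifier_value' folded in.
    biodatabase: str   # name of the owning 'biodatabase'
    accession: str
    description: str = ""
    seq: str = ""      # Biosequence.seq
    qualifiers: Dict[str, str] = field(default_factory=dict)  # term -> value
    features: List[SeqFeature] = field(default_factory=list)

    def features_of_type(self, type_name: str) -> List[SeqFeature]:
        """Return features whose type term matches; no join is needed."""
        return [f for f in self.features if f.type_term.name == type_name]


# Toy usage with made-up names (hypothetical data).
part = Bioentry(biodatabase="registry", accession="PART_A", seq="ttgaca",
                qualifiers={"status": "example"})
part.features.append(
    SeqFeature(Term("promoter", "example_ontology"), [Location(1, 6)]))
```

Embedding trades some duplication (term and database names repeated on every entity) for lookups that need no joins, which suits a non-relational store.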
Implement a Dazzle plugin to support BioSQL/datastore queries
- Instructions on how to write a new Dazzle plugin can be found here.
- The new plugin implements the following methods:
Process and upload data from the MIT Part Registry to Google App Engine (GAE)
- The MIT Part Registry provides a limited API for accessing its data:
- A Biopython script was used to process the FASTA dump file and generate GAE upload files. The following BioBrick information was processed:
  - BioBrick Sequence
  - BioBrick Author
  - BioBrick Category
  - BioBrick DNA Status
  - BioBrick Short Description
  - BioBrick Assembly information (subparts and superparts, identified by BLAST queries run within the Biopython script)
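The processing step above can be sketched in plain Python. This is not the original Biopython script. The standard library replaces Bio.SeqIO so the sketch runs without dependencies, the FASTA header layout and CSV columns are assumptions, and exact substring matching stands in for the BLAST queries used to find subparts and superparts. Author, category and DNA status are omitted because their place in the dump is not described here.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (part name, description, sequence)


def parse_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one record per FASTA entry.

    Assumes headers of the form '>PART_NAME free-text description';
    the real layout of the Registry dump may differ.
    """
    name, desc, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, desc, "".join(chunks)
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            desc = fields[1] if len(fields) > 1 else ""
            chunks = []
        else:
            # Lower-case so that sequence comparisons are case-insensitive.
            chunks.append(line.lower())
    if name is not None:
        yield name, desc, "".join(chunks)


def find_subparts(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each part to the other parts whose sequence it contains verbatim.

    Exact substring matching is a simplified stand-in for BLAST: it is
    quadratic in the number of parts and misses inexact matches.
    """
    return {
        outer: sorted(
            inner for inner, inner_seq in seqs.items()
            if inner != outer and inner_seq and inner_seq in outer_seq
        )
        for outer, outer_seq in seqs.items()
    }


def find_superparts(subparts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the subpart map: part -> parts that contain it."""
    superparts: Dict[str, List[str]] = {name: [] for name in subparts}
    for outer, inners in subparts.items():
        for inner in inners:
            superparts[inner].append(outer)
    return superparts


def write_upload_csv(records: List[Record], subparts: Dict[str, List[str]],
                     superparts: Dict[str, List[str]], out: TextIO) -> None:
    """Write one CSV row per part; the column layout is hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["name", "description", "sequence",
                     "subparts", "superparts"])
    for name, desc, seq in records:
        writer.writerow([name, desc, seq,
                         " ".join(subparts.get(name, [])),
                         " ".join(superparts.get(name, []))])


# Toy input with made-up part names, not real Registry data.
dump = io.StringIO(">PART_A toy promoter\nTTGACA\n"
                   ">PART_B toy composite\nGGTTGACA\nGGATCC\n")
records = list(parse_fasta(dump))
subparts = find_subparts({name: seq for name, _, seq in records})
superparts = find_superparts(subparts)
out = io.StringIO()
write_upload_csv(records, subparts, superparts, out)
```

Exact matching keeps the sketch dependency-free. BLAST, as used in the project, also reports near matches, for example where a component's sequence differs slightly inside a composite part.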