User:Vincent Rouilly/Distributed Annotation System (DAS) for DNA Part Registries

From OpenWetWare

Distributed Annotation System for DNA Part Registries

Vincent 05:55, 28 August 2009 (EDT): This is a work in progress. If you are interested in contributing, or would like more information, please feel free to contact me.


  • ...
  • ...


  • ...
  • ...

DNA Part DAS Server

  • Server address:
  • Typical queries:
    • retrieve all parts
    • retrieve all supported annotation types
    • retrieve DNA from a given part
    • retrieve all annotations from a given part
    • retrieve subparts from a given part
    • retrieve superparts from a given part
  • Statistics about parts
  • Source code @ Gut
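The typical queries above map fairly directly onto DAS 1.x commands (entry_points, types, sequence, features). The Python sketch below only builds the request URLs a client would send. The prefix http://example.org/das and the data source name partsregistry are placeholders, since the server address is not given here, and the type=subpart / type=superpart filters are hypothetical feature-type labels, not necessarily what this server uses.

```python
from urllib.parse import quote

# Placeholders: the real server address and data source name are not given above.
DAS_PREFIX = "http://example.org/das"
DSN = "partsregistry"


def das_url(command, **params):
    """Build a DAS 1.x style request URL: PREFIX/DSN/command?key=value;key=value."""
    url = f"{DAS_PREFIX}/{DSN}/{command}"
    if params:
        # DAS 1.x examples separate query parameters with ';'.
        url += "?" + ";".join(
            f"{key}={quote(str(value), safe='')}" for key, value in params.items()
        )
    return url


def all_parts():
    # entry_points lists the top-level segments (here: parts) served by the source.
    return das_url("entry_points")


def annotation_types():
    return das_url("types")


def part_dna(part_id):
    return das_url("sequence", segment=part_id)


def part_annotations(part_id):
    return das_url("features", segment=part_id)


def part_subparts(part_id):
    # Hypothetical feature type label for subparts.
    return das_url("features", segment=part_id, type="subpart")


def part_superparts(part_id):
    # Hypothetical feature type label for superparts.
    return das_url("features", segment=part_id, type="superpart")


if __name__ == "__main__":
    for url in (all_parts(), annotation_types(), part_dna("BBa_EX1"),
                part_annotations("BBa_EX1"), part_subparts("BBa_EX1")):
        print(url)
```

A client would fetch each URL with an HTTP GET and parse the XML returned by the server.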

Software architecture

Implementation Steps

We summarise here the different steps undertaken during this project.

Run Dazzle on the Google App Engine (GAE)

  • Dazzle is a Java application that usually runs on a Tomcat server. However, GAE supports Java applications, and no tweaking is necessary to run Dazzle on GAE.
  • Instructions @ BioJava

Implement a BioSQL subset on top of the Google datastore

  • BioSQL is a popular relational database schema for storing DNA sequences and annotations.
  • The Biopython, BioJava, and BioPerl projects provide easy connectivity to the schema.
  • The Google datastore is not a relational database, so the BioSQL schema has to be reformatted into a more object-oriented data model.
  • Only a subset of BioSQL was considered for this project. The implemented BioSQL tables are:
    • Ontology and Term
    • Biodatabase, Bioentry, Biosequence, Bioentry_Qualifier_Value, Seqfeature, Location
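One way to picture the move from relational tables to objects is sketched below in Python. It is a hypothetical model, not the project's code: plain dataclasses and a dictionary stand in for the Google datastore (no App Engine API is used), Biosequence and Bioentry_Qualifier_Value rows are folded into the Bioentry object, Ontology and Term are collapsed into one Term class, and the field names are assumptions loosely based on BioSQL columns. Entities are fetched by a composite key rather than assembled with SQL joins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    name: str
    ontology: str  # Ontology name stored inline instead of a separate table


@dataclass
class Location:
    start: int
    end: int
    strand: int = 1


@dataclass
class SeqFeature:
    type_term: Term
    locations: List[Location] = field(default_factory=list)
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Bioentry:
    accession: str    # e.g. a BioBrick part name
    biodatabase: str  # name of the owning Biodatabase
    description: str = ""
    sequence: str = ""  # Biosequence folded into the entry
    qualifiers: Dict[str, str] = field(default_factory=dict)  # Bioentry_Qualifier_Value
    features: List[SeqFeature] = field(default_factory=list)  # Seqfeature + Location


class InMemoryStore:
    """Stand-in for a key-value datastore: entities are fetched by key, no joins."""

    def __init__(self):
        self._entries: Dict[str, Bioentry] = {}

    def put(self, entry: Bioentry) -> str:
        key = f"{entry.biodatabase}:{entry.accession}"
        self._entries[key] = entry
        return key

    def get(self, biodatabase: str, accession: str) -> Optional[Bioentry]:
        return self._entries.get(f"{biodatabase}:{accession}")

    def entries(self, biodatabase: str) -> List[Bioentry]:
        return [e for e in self._entries.values() if e.biodatabase == biodatabase]


if __name__ == "__main__":
    store = InMemoryStore()
    # Hypothetical part: name, sequence and qualifiers are illustrative only.
    store.put(Bioentry(accession="BBa_EX1", biodatabase="parts",
                       sequence="aaagaggagaaa", qualifiers={"author": "Alice"},
                       features=[SeqFeature(Term("rbs", "example-ontology"),
                                            [Location(1, 12)])]))
    print(store.get("parts", "BBa_EX1"))
```

The trade-off of this denormalised layout is that one fetch returns a whole part with its sequence and features, at the cost of duplicating shared data such as term names.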

Implement a Dazzle plugin to support BioSQL/datastore queries

  • Instructions for writing a new Dazzle plugin can be found here.
  • The new plugin implements the following methods:
    • ...
    • ...
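Whatever the exact plugin methods are, the plugin ultimately has to turn stored annotations into a DAS features response. The Python sketch below illustrates only that translation step for one segment, producing a DASGFF document along the lines of the DAS 1.x XML layout. The Feature record, the "registry" METHOD label and the href value are hypothetical; an actual Dazzle plugin is written in Java and would presumably return feature objects and leave XML serialisation to Dazzle.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    """Hypothetical flat feature record, as a plugin might read it from the datastore."""
    feature_id: str
    type_id: str
    start: int
    end: int
    orientation: str = "+"


def features_response(segment_id: str, seq_length: int,
                      features: Iterable[Feature], href: str) -> str:
    """Serialise the features of one segment as a DAS 1.x style DASGFF document."""
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF", version="1.0", href=href)
    segment = ET.SubElement(gff, "SEGMENT", id=segment_id,
                            start="1", stop=str(seq_length))
    for f in features:
        feat = ET.SubElement(segment, "FEATURE", id=f.feature_id)
        ET.SubElement(feat, "TYPE", id=f.type_id).text = f.type_id
        ET.SubElement(feat, "METHOD", id="registry")  # hypothetical method label
        ET.SubElement(feat, "START").text = str(f.start)
        ET.SubElement(feat, "END").text = str(f.end)
        ET.SubElement(feat, "SCORE").text = "-"  # no score available
        ET.SubElement(feat, "ORIENTATION").text = f.orientation
        ET.SubElement(feat, "PHASE").text = "-"  # phase not applicable
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feats = [Feature("f1", "rbs", 1, 12), Feature("f2", "cds", 13, 18)]
    print(features_response("BBa_EX2", 18, feats,
                            "http://example.org/das/partsregistry/features"))
```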

Process and upload data from the MIT Part Registry to Google App Engine (GAE)

  • The MIT Part Registry provides a limited API for accessing its data:
  • A Biopython script processed the FASTA dump file and generated the GAE upload files. The following BioBrick information was processed:
    • BioBrick Sequence
    • BioBrick Author
    • BioBrick Category
    • BioBrick DNA Status
    • BioBrick Short Description
    • BioBrick Assembly information (subparts and superparts, identified by BLAST queries within the Biopython script)
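The processing step can be sketched in Python as follows. Several parts are assumptions: the pipe-separated FASTA header layout is hypothetical (the actual dump format is not described here), a small hand-written FASTA reader replaces Biopython's Bio.SeqIO.parse(handle, "fasta") so the sketch has no dependencies, exact substring matching replaces the BLAST queries used to find subparts and superparts, and the CSV output only illustrates what an upload file might contain.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple

# Hypothetical header layout: >NAME|author|category|status|short description
FIELDS = ["name", "author", "category", "status", "description"]


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_parts(handle) -> List[Dict[str, str]]:
    parts = []
    for header, seq in read_fasta(handle):
        values = header.split("|", len(FIELDS) - 1)
        values += [""] * (len(FIELDS) - len(values))  # pad missing fields
        record = dict(zip(FIELDS, values))
        record["sequence"] = seq.lower()
        parts.append(record)
    return parts


def find_subparts(parts, min_length=10) -> Dict[str, List[str]]:
    """Map each part to the parts whose sequence it contains exactly (stand-in for BLAST)."""
    subparts = {p["name"]: [] for p in parts}
    for outer in parts:
        for inner in parts:
            if (inner is not outer and len(inner["sequence"]) >= min_length
                    and inner["sequence"] in outer["sequence"]):
                subparts[outer["name"]].append(inner["name"])
    return subparts


def invert(subparts) -> Dict[str, List[str]]:
    """Derive superparts from the subpart map."""
    superparts = {name: [] for name in subparts}
    for outer, inners in subparts.items():
        for inner in inners:
            superparts[inner].append(outer)
    return superparts


def write_upload_csv(parts, subparts, out) -> None:
    writer = csv.writer(out)
    writer.writerow(FIELDS + ["sequence", "subparts"])
    for p in parts:
        writer.writerow([p[f] for f in FIELDS]
                        + [p["sequence"], " ".join(subparts[p["name"]])])


if __name__ == "__main__":
    dump = io.StringIO(
        ">BBa_EX1|Alice|RBS|Available|example RBS\naaagaggagaaa\n"
        ">BBa_EX2|Bob|Composite|Planning|RBS plus CDS\naaagaggagaaaatggct\n"
    )
    parts = parse_parts(dump)
    out = io.StringIO()
    write_upload_csv(parts, find_subparts(parts), out)
    print(out.getvalue())
```

Exact matching misses the approximate hits (mismatches, gaps) that BLAST would report, and two parts with identical sequences end up listed as subparts of each other; min_length only filters out very short sequences.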

Project resources