The BioBricks Foundation:Standards/Technical: Difference between revisions

Revision as of 08:41, 7 February 2008

Technical Resources

Data Exchange Standards

This working group aims to define standards for the description of biobricks and formats / technologies for the exchange (or networking) of biobrick-related data.

This breaks down into the following questions (discuss and answer!):

1. What is a Biobrick?

2. What is the data model needed to describe a biobrick?

3. What is the best format / technology for exchange?

What is a Biobrick?

Definition

A final definition is beyond the scope of this group. For data exchange purposes we adopt the following draft:

  • BioBricks™ are standard DNA parts that encode basic biological functions (see BBF home).
  • A BioBrick has a unique DNA sequence.
  • Basic parts are defined by this DNA sequence.
  • Composite parts are defined as a "sequence" of basic BioBricks, along with intervening "scar" sequences.
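The draft definition above can be sketched as a small data structure. This is only an illustration, not an agreed model: the class names, the part IDs and the short sequences are made up, and the scar value is an assumption that in practice depends on the assembly standard in use.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed scar sequence; the real value depends on the assembly standard.
SCAR = "TACTAGAG"


@dataclass(frozen=True)
class BasicPart:
    part_id: str
    sequence: str  # the DNA sequence that defines the part


@dataclass(frozen=True)
class CompositePart:
    part_id: str
    subparts: Tuple[BasicPart, ...]  # ordered basic parts
    scar: str = SCAR

    @property
    def sequence(self) -> str:
        # Join the sub-part sequences with the scar between neighbours.
        return self.scar.join(p.sequence for p in self.subparts)


# Hypothetical IDs and toy sequences, for illustration only.
part_a = BasicPart("BBa_X0001", "TTGACA")
part_b = BasicPart("BBa_X0002", "AGGAGG")
device = CompositePart("BBa_X0003", (part_a, part_b))
print(device.sequence)
```

Note that a composite part stores no sequence of its own here; it is derived from the ordered basic parts, which is one possible reading of the definition.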

Issue: closely related BioBricks

(Mac) Should there be a one-to-one relationship between a part's functional definition and its sequence? What if you introduce a silent mutation into a BioBrick - is there a "different sequence, different part" doctrine, even if the two are functionally equivalent? ... Is this a source code vs. compiled code issue?

(Raik) Right now we seem to follow the unspoken rule that a part is defined by its exact DNA sequence. Any modification creates a new part, which is kind of logical to the experimentalist because it maps a biobrick to exactly one DNA fragment (which you either have in your freezer or not) and vice versa. Options:

  • keep/fix the sequence-based definition but introduce relations like "ortholog to", "equivalent to", etc.
  • define "reference biobricks" and link variants to them
  • find a more abstract definition ... and create the concept of BB 'implementation' or 'instance'.

(Mac) Perhaps we could do both? Assuming a biobrick always has one and only one DNA sequence, perhaps we could build the data model to support organizing biobricks into families or sets of functionally related parts? Each family could have one canonical biobrick associated with it that works, is available, and exemplifies the function that the family is supposed to have.
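A rough sketch of the "families" idea, assuming each part ID maps to exactly one sequence and each family names one canonical member. The class name, part IDs and sequences are hypothetical; the two toy sequences differ by a silent mutation (GCT and GCC both encode alanine).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartFamily:
    name: str
    canonical_id: str  # the reference biobrick for this family
    members: Dict[str, str] = field(default_factory=dict)  # part id -> sequence

    def add(self, part_id: str, sequence: str) -> None:
        # "Different sequence, different part": an id never changes sequence.
        existing = self.members.get(part_id)
        if existing is not None and existing != sequence:
            raise ValueError(f"{part_id} already has a different sequence")
        self.members[part_id] = sequence

    def canonical_sequence(self) -> str:
        return self.members[self.canonical_id]


# Hypothetical family of two functionally equivalent variants.
family = PartFamily(name="example-family", canonical_id="BBa_X0010")
family.add("BBa_X0010", "ATGGCT")  # canonical member
family.add("BBa_X0011", "ATGGCC")  # silent-mutation variant, separate part
print(sorted(family.members))
```

The sequence-based definition stays intact (two sequences, two parts), while the family records that they are meant to do the same thing.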

Issue: BioBrick formats

(Raik) You can have the "same" Biobrick in different formats, e.g. with prefix/suffix from one of the two suggested protein fusion formats. Now the sequence is exactly the same, but having a sample of biobrick X with biofusion flanks may be of no use if the other biobricks in your freezer are formatted differently. Does a different prefix / suffix create a different biobrick (to the assembling experimentalist in the lab it does, to the user of gene synthesis it doesn't really)?
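One way to picture this question is to keep the part and its format as separate records, so that "same part" and "compatible for assembly" become two different tests. This is only a sketch of that option, not a decision; the class names are hypothetical and the prefix/suffix strings are placeholders, not real flanking sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    name: str
    prefix: str  # placeholder flank, not a real prefix sequence
    suffix: str  # placeholder flank, not a real suffix sequence


@dataclass(frozen=True)
class Sample:
    part_id: str
    sequence: str  # the part's own sequence, without flanks
    fmt: Format

    def physical_sequence(self) -> str:
        # What is actually in the tube: flanks plus the part sequence.
        return self.fmt.prefix + self.sequence + self.fmt.suffix


def same_part(a: Sample, b: Sample) -> bool:
    # View of the gene-synthesis user: only the part sequence counts.
    return a.sequence == b.sequence


def assembly_compatible(a: Sample, b: Sample) -> bool:
    # View of the assembling experimentalist: the formats must match.
    return a.fmt == b.fmt


standard = Format("format-A", "PREFIXA", "SUFFIXA")
fusion = Format("format-B", "PREFIXB", "SUFFIXB")
s1 = Sample("BBa_X0001", "AAACCCGGG", standard)
s2 = Sample("BBa_X0001", "AAACCCGGG", fusion)
print(same_part(s1, s2), assembly_compatible(s1, s2))
```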

What is the data model needed to describe a biobrick?

Once the data model is firmly in place, the format should follow as the one that best implements that data model. For example, if we settle on an RDF-like 'everything is a relationship triplet' approach, then some format that can handle these triplets would be most appropriate. In addition, with a model like this, there are XML-based and more human-readable formats that can both implement the model equally well.

I think that tying ourselves to a format too early will keep us from having a clear model in mind, and will cause us to hack up the format. It is best to do the model first, then the format.

So the thing to think about in a model is: what types of relationships do we want to convey?

  • Inheritance (where a particular part was derived from, and by whom = link + data)
  • Characterization (something quantitative about that part by itself = data)
  • Plays well with others (what other parts can this one interact with - with possible data associated with this interaction = link + data)
  • ...
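The relationship types listed above could be held in a format-neutral way before any file format is chosen. The sketch below is one guess at such a structure: each statement has a subject, a relation, an optional link target and optional data. All names, IDs and values are hypothetical placeholders.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str            # the part being described
    relation: str           # kind of relationship
    target: Optional[str]   # linked part, or None for a pure data statement
    data: Optional[dict]    # payload attached to the relationship


# Placeholder statements; none of these values are real measurements.
statements: List[Statement] = [
    # Inheritance: link + data
    Statement("BBa_X0002", "derivedFrom", "BBa_X0001", {"by": "some lab"}),
    # Characterization: data only
    Statement("BBa_X0002", "characterizedBy", None,
              {"quantity": "placeholder", "value": 0.0}),
    # Plays well with others: link + data
    Statement("BBa_X0002", "interactsWith", "BBa_X0003", {"note": "placeholder"}),
]


def links_of(part_id: str) -> List[str]:
    # All parts that the given part points to.
    return [s.target for s in statements
            if s.subject == part_id and s.target is not None]


def data_only(part_id: str) -> List[Statement]:
    # Statements that carry data but no link.
    return [s for s in statements
            if s.subject == part_id and s.target is None]


print(links_of("BBa_X0002"))
```

A structure like this maps directly onto triples, and could later be written out in whichever exchange format is chosen.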


What is the best format / technology for exchange?

Suggestions

Please fill in these sections with details

create a new XML format

adapt existing CellML, SBML XML formats

create a custom file format

use Turtle/N3 notation for semantic web documents

Example of N3

I somewhat share the reservation about completely new file formats, but the non-readability and general nastiness of XML is also an issue. A good solution, IMO, would be to use the Notation3 format developed by the semantic web folks. It is concise, human-readable and editable (I used it myself some years ago) *AND* is equivalent to the XML serialization of RDF. That means there is a well-defined translation back and forth, and many libraries and tools do the conversion. Being semantic web, it also solves the linking problem (everything is a link).

Quick Example:

# shortcut definitions for frequently used resources ...
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix bbf: <http://biobricks.org/ontology/1.1/>.
@prefix harvard: <http://harvard.edu/registry/parts#>.
# the empty prefix refers to this document
@prefix : <#>.

# define a biobrick hosted at this address
:BBa_0001
       rdf:type        bbf:biobrick;
       bbf:sequence    "AAACCCGGG";
       bbf:similarTo   :BBa_0003, harvard:BBa_J1000, :BBa_00010.

# add information to biobrick defined elsewhere
harvard:BBa_J1000
       owl:sameAs      :BBa_0999.

OK, one can argue about human-readability, but it's at least possible to understand and edit these documents (and much better than the equivalent XML).
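To make the "everything is a link" point concrete, here is a minimal sketch that expands some of the prefixed names from the example into full URIs and prints them one triple per line, in the style of N-Triples. It is a toy expansion, not an N3 parser. The prefix table mirrors the example, with `harvard:` bound to the Harvard registry URL; the base URL used for the empty `:` prefix is a made-up placeholder.

```python
from typing import Iterable, Tuple

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bbf": "http://biobricks.org/ontology/1.1/",
    "harvard": "http://harvard.edu/registry/parts#",
    "": "http://example.org/registry#",  # assumed base for ":" names
}


def expand(name: str) -> str:
    # Turn "prefix:local" into "<full-uri>".
    prefix, _, local = name.partition(":")
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    lines = []
    for s, p, o in triples:
        # Quoted objects are literals and are kept as they are.
        obj = o if o.startswith('"') else expand(o)
        lines.append(f"{expand(s)} {expand(p)} {obj} .")
    return "\n".join(lines)


triples = [
    (":BBa_0001", "rdf:type", "bbf:biobrick"),
    (":BBa_0001", "bbf:sequence", '"AAACCCGGG"'),
    (":BBa_0001", "bbf:similarTo", "harvard:BBa_J1000"),
]
print(to_ntriples(triples))
```

Every subject, predicate and non-literal object ends up as a plain URI, which is what lets a document in one registry add statements about a part hosted in another.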