The BioBricks Foundation:Standards/Technical/Exchange/Core Data Model

= Overview =



The core data model covers the low-level description of DNA constructs. The definition of a "Part" is at the heart of the model. Parts can be combined with Vectors (which are a sub-class of Part) to describe a full DNA molecule (for example a complete plasmid). This DNA molecule can then be associated with a DNA or cell-stock sample. Note that the image does not show all data fields of every class.

= Definitions =

Part
A part is a building block for synthetic biology. At the moment, we are mostly concerned with the DNA-level description. DNA parts MUST map to a continuous stretch of DNA. Multiple disconnected segments have to be split into separate parts. Prefix or suffix sequences from a particular assembly standard do NOT belong to the part. They are handled by the Vector object.

Informally, a part can be either "basic" or "composite". The sequence of composite parts is a concatenation of smaller sub-Parts -- often with intervening "scar" sequences from assembly reactions. However, we do not define special sub-classes for basic and composite parts. Whenever available, the sub-part composition should be described in addition to the plain text sequence.
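
As an illustration of the paragraph above, here is a minimal Python sketch of how the sequence of a composite part could be derived by concatenating sub-parts and scars. The class names `BasicPart` and `Scar`, the function `composite_sequence`, and all sequences are hypothetical and not part of the model; the scar string is only a placeholder.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicPart:
    """Hypothetical stand-in for a part that is not decomposed further."""
    name: str
    dna_sequence: str


@dataclass
class Scar:
    """Hypothetical stand-in for a scar left behind by an assembly reaction."""
    dna_sequence: str


def composite_sequence(elements: List[Union[BasicPart, Scar]]) -> str:
    """Concatenate sub-part and scar sequences in the given order."""
    return "".join(element.dna_sequence for element in elements)


# Placeholder sequences chosen for illustration only.
promoter = BasicPart("promoter", "TTGACA")
rbs = BasicPart("rbs", "AGGAGG")
scar = Scar("TACTAGAG")

sequence = composite_sequence([promoter, scar, rbs])
```

Keeping the ordered list of elements next to the plain text sequence means the sub-part composition stays recoverable, which is what the model asks for whenever that information is available.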

Later, we also want to represent RNA or protein parts, and we need to represent the relations between them. For example, several different DNA parts may translate to the same protein part. Exactly how we do this remains a matter of discussion. My (Raik's) suggestion is to introduce different "Description levels" (i.e. sub-classes of parts) and allow multiple inheritance between part objects. For example, a part with the beta-lactamase DNA sequence would be of "Type" DnaPart (description level) and would be the child (inheritance) of a ProteinPart that describes the amino acid sequence, structure and activity of this beta-lactamase enzyme.
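
One possible reading of this suggestion can be sketched in Python as follows. The description level is modelled as the Python class of the object (`DnaPart`, `ProteinPart`), and inheritance between part objects is modelled as a list of parent objects. The attribute `parents`, the method `ancestors`, and the short sequences are assumptions made for illustration; the discussion above does not fix any of them.

```python
class Part:
    """Base class; `parents` holds the part objects this part is derived from."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)  # several parents allowed (multiple inheritance)

    def ancestors(self):
        """Return all direct and indirect parent parts, without duplicates."""
        found = []
        stack = list(self.parents)
        while stack:
            part = stack.pop()
            if part not in found:
                found.append(part)
                stack.extend(part.parents)
        return found


class DnaPart(Part):
    """Description level: DNA."""

    def __init__(self, name, dna_sequence, parents=()):
        super().__init__(name, parents)
        self.dna_sequence = dna_sequence


class ProteinPart(Part):
    """Description level: protein."""

    def __init__(self, name, protein_sequence, parents=()):
        super().__init__(name, parents)
        self.protein_sequence = protein_sequence


# Short illustrative fragments, not complete beta-lactamase sequences.
bla_protein = ProteinPart("beta-lactamase", "MSIQHF")
bla_dna_a = DnaPart("bla variant A", "ATGAGTATTCAACATTTC", parents=[bla_protein])
bla_dna_b = DnaPart("bla variant B", "ATGTCTATTCAGCATTTT", parents=[bla_protein])
```

Here two different DNA parts point to the same protein part, which is the relation described in the example above.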

Fields

 * name -- string [1], common name of the part


 * shortDescription -- string [0..1], very brief description (less than 100 characters) for display in tables and lists


 * longDescription -- string [0..1], detailed human readable description


 * author -- Person object(s) [0..n], (rename this to the equivalent foaf:maker from the FOAF (Friend of a Friend) vocabulary?)


 * ?? owl:type -- pointer to class [1..n], which is used as description level


 * dnaSequence -- string [0..1], only applies to DNA parts


 * partSequence -- PartSequence object [0..1], describing the sequence of sub-Parts and scar sequences from which one can construct this part. This should be a sequence of "basic" parts that cannot be decomposed further.


 * feature -- pointer to a SequenceAnnotation object [0..n], which links regions of the sequence to a GenBank-classified functional description (work in progress).
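
The field list above could be expressed as a Python dataclass along the following lines. This is only a sketch: the cardinalities [0..1] and [0..n] are mapped to `Optional` and `List`, the `Person`, `PartSequence` and `SequenceAnnotation` classes are empty placeholders because their content is not yet defined here, and the tentative owl:type field is left out. The validation in `__post_init__` is an assumption about how the "[1]" and "less than 100 characters" constraints might be enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str  # placeholder; the Person class is not specified in this section


@dataclass
class PartSequence:
    pass  # placeholder; work in progress in the model


@dataclass
class SequenceAnnotation:
    pass  # placeholder; work in progress in the model


@dataclass
class Part:
    name: str                                          # [1]
    shortDescription: Optional[str] = None             # [0..1]
    longDescription: Optional[str] = None              # [0..1]
    author: List[Person] = field(default_factory=list)                # [0..n]
    dnaSequence: Optional[str] = None                  # [0..1], DNA parts only
    partSequence: Optional[PartSequence] = None        # [0..1]
    feature: List[SequenceAnnotation] = field(default_factory=list)   # [0..n]

    def __post_init__(self):
        # Assumed checks; the model only states the constraints informally.
        if not self.name:
            raise ValueError("name is required")
        if self.shortDescription is not None and len(self.shortDescription) >= 100:
            raise ValueError("shortDescription must be less than 100 characters")


example = Part(name="example part", shortDescription="illustrative entry")
```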

Missing bits and pieces
 * Assembly Format: A part should ideally be described independent of assembly formats. However we should probably introduce a 'compatibleWith' or similar field that keeps track of assembly issues like incompatible restriction sites within the sequence.

 * Keeping track of rating and experiences: We should consider using the RDFa vocabulary for user reviews -- it's supported by a large industry consortium including Google.

 * Categorization of parts: I suggest using multiple part inheritance for grouping parts into families and categories.

 * Characterization, experimental data, systems biology models, ...

Vector (Part)
A sub-class of Part describing plasmid backbones.


 * Michal Galdzicki 14:40, 24 August 2009 (EDT): Consider renaming to "VectorBackbone" as the whole circular plasmid is often referred to as the vector.

AssemblyFormat
Description of a (more or less) standardized DNA assembly format. Usually this includes certain prefix, suffix, and scar sequences. According to our naming convention (BBF RFC 29), assembly formats are identified by the BBF RFC that describes them.

PhysicalDna
A light-weight container that connects a part (insert) with the plasmid backbone (vector) in which it is contained. Note: DNA samples can also be linear, without a plasmid backbone (primers, digestion products), and plasmid backbones can be shipped without part inserts.

DnaSample
Description of a tube or well containing a certain DNA molecule (usually a plasmid) -- either as naked DNA or within a stock of bacterial cells.

Cell
A strain of bacteria, yeast, or any other homogeneous cell population.
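
The relations between Vector, PhysicalDna, DnaSample and Cell described above might be sketched like this in Python. The attribute names (`insert`, `vector`, `dna`, `container`, `cell`), the `is_naked_dna` helper, and the rule that a PhysicalDna needs at least an insert or a vector are assumptions for illustration; the definitions above only say that either one may be absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    name: str
    dnaSequence: Optional[str] = None


@dataclass
class Vector(Part):
    """Sub-class of Part describing a plasmid backbone."""


@dataclass
class Cell:
    name: str  # e.g. a strain designation


@dataclass
class PhysicalDna:
    insert: Optional[Part] = None    # absent for an empty backbone
    vector: Optional[Vector] = None  # absent for linear DNA such as primers

    def __post_init__(self):
        # Assumed rule: an entirely empty container is not meaningful.
        if self.insert is None and self.vector is None:
            raise ValueError("PhysicalDna needs an insert, a vector, or both")


@dataclass
class DnaSample:
    dna: PhysicalDna
    container: str               # hypothetical label for the tube or well
    cell: Optional[Cell] = None  # set when the DNA is kept in a cell stock

    @property
    def is_naked_dna(self) -> bool:
        return self.cell is None


plasmid = PhysicalDna(insert=Part("example insert"), vector=Vector("example backbone"))
tube = DnaSample(dna=plasmid, container="tube 1")
stock = DnaSample(dna=plasmid, container="well A1", cell=Cell("example strain"))
```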

PartSequence
work in progress ...

SequenceAnnotation
work in progress ...