BBF RFC 31
Vincent, 16th May 2009
It is a pleasant surprise to read about some new PoBoL development. I fully agree with the authors that the Synthetic Biology community needs an open and standardized data model to represent and exchange BioBricks. However, I need to be convinced that this brand new RDF schema is the best way to engage with the rest of the SynBio community, as well as the wider scientific community that works with existing DNA sequence formats.
Please find below some comments about the proposal. I hope they are clear and useful.
> related to [6. Motivation]
What about compatibility with existing DNA sequence standards, and their respective databases and tools? I understand that SynBio will require some specific features, but is it really necessary to start from scratch? Defining a standard that extends previous standards might help us avoid reinventing the wheel, as well as reach out to other communities (in terms of people and software solutions / infrastructures). At the very least, I would be interested to read the authors' reasons for not considering any of the existing standards for representing DNA parts.
I would also find it useful if the authors could describe two or three relevant scenarios in which such a data format would be required.
*Michal Galdzicki 14:20, 25 May 2009 (PDT): Establishing a relationship with other standards communities is a really important idea. The MIBBI project seems especially relevant in this case. However, the DNA sequence formats you link to above are well-established, broad-use standards, not research-domain-specific models. Some of these are powerful due to their simplicity (Plain sequence format, FASTA format), but are complex nonetheless, as they can contain a lot of information in an implied description (for example FASTA), which must be interpreted correctly for maximal payoff. I hope that using an OWL-based solution will help us make this kind of information explicit in PoBoL itself. The other formats are well-known repository and software standards (EMBL, IG, GCG, GenBank) which we could try to extend, but we would limit ourselves to a nucleic acid point of view, and would have only satisfied the supporters of those standards. One of the most important aspects of PoBoL should be to build this data model or language as a community, with a sense of buy-in from as broad a network of researchers as possible (as long as something concrete is still achieved). :) Scenarios illustrating the need for PoBoL over a "classic" DNA format could be (I'm making these up right now, so they're a bit terse): 1. The need for describing a relationship between a stretch of DNA sequence composed of two DNA sequences using Assembly Standard 10; 2. How one DNA sequence relates to two Samples containing some amount of DNA molecules which encode that sequence; and 3. Differentiating between the record for a DNA sequence that is theoretically possible but hasn't been constructed yet, and one that has the same potential but that I can actually use from "Sean's" freezer. Answering this comment stirred up the issue that PoBoL does not yet specify what characters can be used to specify the DNA Sequence string, and maybe using FASTA within that data property may be a good idea.
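On the open question of which characters may appear in the DNA Sequence string, here is a minimal sketch of what such a check could look like. It assumes the allowed alphabet is the IUPAC nucleotide code set without U, which PoBoL does not specify; the function names `is_valid_dna_sequence` and `sequence_from_fasta` are hypothetical and only illustrate the idea of storing FASTA-style content in that data property.

```python
import re

# Assumption: the allowed alphabet is the IUPAC nucleotide codes (DNA, no U).
# PoBoL itself does not define this; it is only one possible choice.
IUPAC_DNA = "ACGTRYSWKMBDHVN"
_VALID_DNA = re.compile(f"[{IUPAC_DNA}]+", re.IGNORECASE)


def is_valid_dna_sequence(seq: str) -> bool:
    """Return True if seq is non-empty and uses only IUPAC DNA codes."""
    return _VALID_DNA.fullmatch(seq) is not None


def sequence_from_fasta(text: str) -> str:
    """Join the sequence lines of a single FASTA record, dropping the '>' header."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))
```

With this sketch, a FASTA record would first be reduced to its bare sequence with `sequence_from_fasta`, and the result would then be checked with `is_valid_dna_sequence` before being stored.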
> related to [10.1.1 Class BioBrick]
Is there a concept of uniqueness for a BioBrick? Or does the proposed standard accept duplicates? For example, two BioBricks with the same DNASequences and the same Format, but different ShortDescriptions? Also, if uniqueness is required, at which level: within the same lab or across the whole world? Would it be useful to consider unique identifiers?
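To make the question concrete, one possible notion of uniqueness is sketched below. It assumes that identity is determined by the DNA sequence and the format only, so that two records differing only in their ShortDescription would count as duplicates. This is an assumption for illustration, not something the RFC states, and the function name `biobrick_key` is hypothetical.

```python
import hashlib


def biobrick_key(dna_sequence: str, fmt: str) -> str:
    """Derive a content-based key from sequence and format (hypothetical scheme)."""
    # Normalise the sequence: drop whitespace and ignore case.
    normalized = "".join(dna_sequence.split()).upper()
    payload = f"{fmt.strip()}\n{normalized}".encode("utf-8")
    # SHA-256 hex digest: the same sequence and format give the same key anywhere.
    return hashlib.sha256(payload).hexdigest()
```

A content-based key like this would be the same in every lab, which addresses the "which level" question, but it cannot tell apart two parts that are deliberately registered twice with different descriptions. An identifier assigned by a registry would behave the opposite way.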
"The BioBrick class May be extended at any time". This built-in flexibility might be difficult to deal with when people try to practically implement the schema.
What about sequence annotations?
> related to [10.1.3 BioBrickBasic]
Quick clarification: Let's say that I use direct DNA synthesis to get a 5 kb (4 genes) metabolic pathway, with a prefix and suffix chosen to satisfy a particular BioBrick standard (and no incompatible restriction sites within the 5 kb). From what I understand, this would constitute a BioBrickBasic instance. Is that right?
> related to [10.1.5 BioBrickFormat]
Recombinant DNA is one method among others for joining two pieces of DNA. For example, in vitro recombination could very well become a popular way of physically assembling DNA (no resulting scar). It looks like the proposed scheme only considers recombinant-DNA-type assemblies. Is this a limitation? Is it acceptable? Or are we saying that pieces of DNA assembled by homologous recombination will never be considered BioBricks?
In the end, if this proposal is restricted to BioBricks, as opposed to generic "DNA-parts" (or assembled DNA sequences), I would say that its scope is too limited to accommodate future genetic circuit assemblies.
> related to [Class Sample]
What if the sample is a PCR product (linear DNA, no vector)? How would you distinguish between a mini-prep in buffer, dry DNA, or a stab?
In the end, I am not sure that this type of information is very useful. I would prefer to see community agreement on key attributes before getting into details that are more relevant to a Laboratory Information System.
> General comments
Without denying the descriptive power of RDF, I feel that using an RDF framework at this stage might prevent a majority of people within the community from engaging with this important process of describing the essential features of "DNA-parts". I would prefer to see a "Minimum Information Required for the description of a DNA-part" discussion before getting into a specific knowledge representation, such as RDF.
- See MIBBI: Minimum Information for Biological and Biomedical Investigations
- See The minimum information about a genome sequence (MIGS) specification
- A possible way to work toward a DNA-part format could be:
  - Step 1: Get the community to agree on a Minimum Information Required document
  - Step 2: Generate a Data Model (UML)
  - Step 3: Create a proof of concept implementation with associated software tools to validate/read/write the standard (in RDF for example)
Could the authors comment on the impact of such an information model on the current MIT registry, and on future part registries?
Characterization of BioBricks is one of the highest priorities for our community. How do the authors suggest integrating this new type of information into PoBoL? Will it be part of the standard, or will it require a different system?