BBF RFC 31
Vincent, 16th May 2009
It is a pleasant surprise to read about some new PoBoL development. I fully agree with the authors that the Synthetic Biology community needs an open and standardized data model to represent and exchange BioBricks. However, I need to be convinced that this brand new RDF schema is the best way to engage with the rest of the SynBio community, as well as the wider scientific community that works with existing DNA sequence formats.
Please find below some comments about the proposal. I hope they are clear and useful.
> related to [6. Motivation]
What about compatibility with existing DNA sequence standards, and their respective databases and tools? I understand that SynBio will require some specific features, but is it really necessary to start from scratch? Defining a standard that extends previous standards might help us avoid reinventing the wheel, as well as reach out to other communities (in terms of people and software solutions / infrastructures). At the least, I would be interested to read about the authors' reasons for not considering any of the existing standards to represent DNA parts.
I would also find it useful if the authors could describe two or three relevant scenarios where such a data format would be required.
*Michal Galdzicki 18:04, 25 May 2009 (PDT): Establishing a relationship with other standards communities is a really important idea. The MIBBI project seems especially relevant in this case. However, the DNA sequence formats you link to above are well-established, broad-use standards, not research-domain-specific models. Some of these are powerful due to their simplicity (Plain sequence format, FASTA format), but are complex nonetheless, as they can contain a lot of information in an implied description (for example FASTA), which must be interpreted correctly for maximal payoff. I hope that using an OWL-based solution will help us make this kind of information explicit in PoBoL itself. The other formats are well-known repository and software standards (EMBL, IG, GCG, GenBank) which we could try to extend, but we would limit ourselves to a nucleic acid point of view, and would satisfy only the supporters of those standards. One of the most important aspects of PoBoL should be to build this data model or language as a community, with a sense of buy-in from as broad a network of researchers as possible. ... As long as something concrete is still achieved. :) Scenarios illustrating the need for PoBoL over a "classic" DNA format could be (I'm making these up right now, so they're a bit terse): 1. Describing how a stretch of DNA sequence is composed of two DNA sequences joined using Assembly Standard 10; 2. Describing how one DNA sequence relates to two Samples, each containing some amount of DNA molecules which encode that sequence; and 3. Differentiating between the record for a DNA sequence that is theoretically possible but hasn't been constructed yet, and one that has the same potential but that I can get from "Sean's" freezer. Are DNA sequence and DNA part synonymous in the context of this comment? If they are not, it is confusing. If so, please point me to the standards for DNA parts.
Otherwise, point taken, and the next revision should include a discussion of DNA sequence formats and how they relate to this effort. Answering this comment stirred up the issue that PoBoL does not yet specify what characters can be used to specify the DNA Sequence string, and maybe using FASTA within that data property may be a good idea.
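To make the open question about allowed characters concrete, here is a minimal sketch of one possible rule: accept only the IUPAC nucleotide codes, ignoring case and whitespace. This is an assumption for illustration, not something PoBoL specifies; the function names and the choice between the strict and the ambiguity-aware alphabet are hypothetical.

```python
# IUPAC nucleotide codes: the four bases plus the standard ambiguity symbols.
IUPAC_DNA = frozenset("ACGTRYSWKMBDHVN")
# A stricter alternative that allows unambiguous bases only.
STRICT_DNA = frozenset("ACGT")


def invalid_characters(sequence, alphabet=IUPAC_DNA):
    """Return the sorted list of characters in `sequence` outside `alphabet`.

    Comparison is case-insensitive and whitespace is ignored, so a
    line-wrapped sequence body (as in FASTA) can be passed in directly.
    """
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - alphabet)


def is_valid_dna(sequence, alphabet=IUPAC_DNA):
    """True if the sequence is non-empty and uses only allowed characters."""
    cleaned = "".join(sequence.split())
    return bool(cleaned) and not invalid_characters(sequence, alphabet)
```

Whether ambiguity codes such as N should be legal in a BioBrick sequence is exactly the kind of decision the next revision would need to make; the sketch only shows that either choice is cheap to enforce.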
**Vincent 05:37, 6 June 2009 (EDT): From my understanding, the BioBrick space is a subset of the DNA sequence space, due to the constraints that a BioBrick has to comply with. In other words, a BioBrick is a DNA sequence, but the converse is not always true. As I tried to explain before, what I am concerned about is the development of PoBoL without any consideration of the existing landscape when it comes to describing a DNA sequence. It could result in the isolation of BioBrick-related information and community from the rest of the DNA sequence information and community. I would really like to see this type of discussion integrated in this RFC.
> related to [10.1.1 Class BioBrick]
Is there a concept of unicity for a BioBrick? Or is it acceptable for the proposed standard to have duplicates? For example, two BioBricks with the same DNASequences and the same Format, but different ShortDescriptions? Also, if unicity is required, at which level: within the same lab, or across the whole world? Would it be useful to consider unique identifiers?
"The BioBrick class May be extended at any time". This built-in flexibility might be difficult to deal with when people try to practically implement the schema.
What about sequence annotations ?
*Michal Galdzicki 18:04, 25 May 2009 (PDT): "Unicity" sounds like it is hard to enforce. Are you implying a specific benefit to establishing it? I can see advantages for the individual laboratory as a matter of practicality, to aid in selecting the "F2620 that worked last winter for Bob". It would be nice on a global scale, but anything beyond exact matches raising a consistency error locally could be restrictive. We would need the BBF to provide authoritative unique IDs upon verification of unicity, and to be sure of continued support. Unique IDs could be provided by an authority like the BBF or a representative body. I think in the meantime using appropriate namespaces for locally created data instances could be OK for a while, until we have something more complicated than a few installations "trying" to use PoBoL. It is absolutely necessary for anyone to feel like they can experiment with extending or even changing PoBoL. If the idea is valuable to the community it will be adopted broadly. Knowing this ahead of time should help developers to expect and anticipate changes in the data model. From my own experience, I know how difficult it is to keep up with the evolution of a schema in the research domain context. I hope that the careful design of this approach will allow for changes that account for unanticipated developments. In some more philosophical way PoBoL is sequence annotation (the kind needed for ideas about BioBricks...).
**Vincent 05:53, 6 June 2009 (EDT): I guess we have to consider that we have multiple BioBrick repositories. To some extent, they might end up holding different information on the same BioBrick parts (same DNA sequence), from characterisation for example. In that case, you might be interested in aggregating the information known about a given BioBrick across all the publicly available repositories. You could always BLAST, but it doesn't seem very elegant. I agree that a unique ID is difficult to enforce, but we should have a look at other standards to check how they have approached the problem. I am not sure I understand your point about my annotation-related question. How would you annotate a BioBrick sequence with PoBoL?
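One way to picture the aggregation case without a central ID authority is to derive a key from the sequence itself. The following is a minimal sketch under stated assumptions: exact sequence identity only, SHA-256 as the digest, and made-up repository names and local IDs. Nothing here is part of PoBoL or of any existing registry.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence):
    """Derive a repository-independent key from a DNA sequence.

    The sequence is normalised (whitespace removed, upper-cased) and then
    hashed, so two records with identical sequences get the same key.
    """
    cleaned = "".join(sequence.split()).upper()
    return hashlib.sha256(cleaned.encode("ascii")).hexdigest()


def aggregate_by_sequence(records):
    """Group (repository, local_id, sequence) tuples by sequence key."""
    grouped = defaultdict(list)
    for repository, local_id, sequence in records:
        grouped[sequence_key(sequence)].append((repository, local_id))
    return dict(grouped)


# Hypothetical records from two repositories; names and IDs are invented.
records = [
    ("repo_a", "part-1", "GAATTCGCGG"),
    ("repo_b", "xyz-42", "gaattc gcgg"),
    ("repo_b", "xyz-43", "TTTTAAAACC"),
]
groups = aggregate_by_sequence(records)
```

This only catches exact matches after normalisation. Reverse complements, near-identical variants, and differing prefix/suffix conventions would still need a similarity search such as BLAST or an agreed canonical form.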
> related to [10.1.3 BioBrickBasic]
Quick clarification: let's say that I use direct DNA synthesis to get a 5 kb (4-gene) metabolic pathway, with prefix and suffix chosen to satisfy a particular BioBrick standard (and no incompatible restriction sites within the 5 kb). From what I understand, this would constitute a BioBrickBasic instance, no?
*Michal Galdzicki 18:04, 25 May 2009 (PDT): Short answer: yes. The "basic" in BioBrickBasic was intended to mean not constructed from other BioBricks, with no other hidden meaning. But this is obviously of limited value in the bigger world, where it is really hard to expect that there will be no other types of composition, or sequence manipulations not used for combining parts.
**Vincent 06:17, 6 June 2009 (EDT): OK, thanks for clarifying. It might be useful to think about the scenario where someone comes along to 'BioBrick' this 4-gene metabolic pathway into its individual components (promoters, RBSs, ORFs, terminators). How would the information be updated to ensure consistency?
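The "no incompatible restriction sites" condition in the question above can be checked mechanically. Here is a minimal sketch, assuming Assembly Standard 10 and the enzyme list usually associated with it (EcoRI, XbaI, SpeI, PstI, NotI); the list should be verified against BBF RFC 10 before relying on it, and the function name is hypothetical.

```python
# Recognition sites assumed to be disallowed inside an Assembly Standard 10 part.
# All five are palindromic, so scanning one strand is sufficient.
RFC10_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
}


def find_internal_sites(part_sequence, sites=RFC10_SITES):
    """Return {enzyme: [0-based positions]} for every site found in the part.

    `part_sequence` is the part itself, without prefix and suffix.
    An empty dict means no disallowed site was found.
    """
    seq = "".join(part_sequence.split()).upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Step by one so overlapping occurrences are not missed.
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits
```

For a synthesised 5 kb pathway, an empty result from `find_internal_sites` would be one necessary condition for treating it as a single BioBrickBasic instance under that standard; a different assembly standard would need a different site table.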
> related to [10.1.5 BioBrickFormat]
Recombinant DNA is one method among others to put together two pieces of DNA. For example, in vitro recombination could very well become a popular way to physically assemble DNA (no resulting scar). It looks like this proposed scheme only considers Recombinant DNA-type assemblies. Is it a limitation? Is it OK? Or are we saying that pieces of DNA that use homologous recombination for assembly will never be considered BioBricks?
*Michal Galdzicki 18:04, 25 May 2009 (PDT): I gather from this: 1. make Scars optional; 2. the "BioBrickFormat" is more precisely the potential property of being manipulated by more than one method.
**Vincent 06:17, 6 June 2009 (EDT): More than the Scar, it also means that Prefix and Suffix are optional.
In the end, if this proposal is restricted to BioBricks, as opposed to generic "DNA-parts" (or assembled DNA sequences), I would say that its limited scope will make it hard to accommodate future genetic circuit assemblies.
*Michal Galdzicki 18:04, 25 May 2009 (PDT): This is one of the critical issues with the current idea; it has to change. Alec (one of the co-authors) also discussed this issue in detail in one of our meetings while preparing the RFC. To solve this problem we will have to separate the DNA sequence property from the BioBrick class, and acknowledge and make possible the description of DNA-parts which are not BioBricks. This is something that I asked we put off until after RFC31, because it was not mentioned at the Standards Workshop in 2008, and would drastically change the proposed standard (not to mention its potential influence on the PoBoL faux-acronym itself). Additionally, your comment about the potential for assembly using homologous recombination requires careful attention: it brings up the question of whether the BioBrick itself must conform to a physical composability standard or just have standardized behavior.
**Vincent 06:17, 6 June 2009 (EDT): It is likely that the issue of physical DNA assembly is going to move fast over the coming years. I would find it essential to ensure that an information-based standard established to represent physical DNA assembly is generic enough to cope with these changes before a method or methods are agreed upon. Otherwise, it could severely restrict its adoption by the community. What exactly do you mean by "standardized behavior"? Does it relate to BioBrick experimental characterization?
> related to [Class Sample]
What if the sample is a PCR product (linear DNA, no vector)? How would you distinguish between a mini-prep in buffer, dry DNA, or a stab?
In the end, I am not sure that this type of information is very useful. I would prefer to see a community agreement on key attributes before getting into those details, which are more relevant to a Laboratory Information System.
*Michal Galdzicki 18:04, 25 May 2009 (PDT): Note that you can create BioBrick instances and not create the samples, so if you choose not to use it, it's OK. Others may want to create the relationship to the concepts used at the bench. I agree this class needs to be expanded to include some of those considerations. This RFC is one mechanism by which we hope to ask for input from the community. Could you say which concepts you believe do not belong in a "core" PoBoL, but should be saved for a LIMS? I'm not completely sure, but Raik's BrickIt may serve the function of a LIMS? PoBoL will probably have to be expanded, and the "core data model" established in context and contrast to some greater set of possible PoBoL classes. This sort of modularity is what I hope for, as each focus will need its own special considerations. Maintaining consistency in the face of a complex set of needs is why this is challenging.
> General comments
Without denying the descriptive power of RDF, I feel that using an RDF framework at this stage might prevent a majority of people within the community from engaging with this important process of describing the essential features of "DNA-parts". I would prefer to see a "Minimum Information Required for the description of a DNA-part" discussion before getting into a specific knowledge representation, such as RDF.
- See MIBBI: Minimum Information for Biological and Biomedical Investigations
- See The minimum information about a genome sequence (MIGS) specification
- A possible way to work toward a DNA-part format could be:
- Step 1: Get the community to agree on a Minimum Information Required document
- Step 2: Generate a Data Model (UML)
- Step 3: Create a proof of concept implementation with associated software tools to validate/read/write the standard (in RDF for example)
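As a rough illustration of what the validation part of Step 3 could look like once Step 1 has produced a checklist, here is a minimal sketch. The field names are hypothetical placeholders loosely borrowed from properties mentioned in this discussion (sequence, format, short description); the actual minimum-information list would be whatever the community agrees on.

```python
# Hypothetical minimum-information checklist; a real one would come from Step 1.
REQUIRED_FIELDS = ("identifier", "sequence", "format", "short_description")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or blank in `record`."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Invented example record with one blank field and one missing field.
draft = {
    "identifier": "local:part-1",
    "sequence": "ATGAAACCC",
    "short_description": "  ",
}
problems = missing_fields(draft)
```

A checker like this is independent of the serialisation, so the same checklist could be applied to records read from RDF, from a UML-derived schema, or from an existing sequence format.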
*Michal Galdzicki 18:04, 25 May 2009 (PDT): Getting into MIBBI sounds like a good idea, but I am a proponent of moving forward on all fronts. If there is someone who could help with UML, I would welcome that with open arms. However, we need a working example of how PoBoL is useful to demonstrate its value (if we saved someone money that would be great, but I hope we can save them time first, and that can serve as a really powerful motivation to adopt). How about "Minimum Information for Synthetic Biology", MisB? :)
Could the authors comment on the impact of such an information model on the current MIT registry, and on future part registries?
*Michal Galdzicki 18:04, 25 May 2009 (PDT): I hope that PoBoL will somehow help, and I hope someone who is involved in the MIT registry would help with PoBoL.
Characterization of BioBricks is one of the highest priorities for our community. How do the authors suggest integrating this new type of information in PoBoL? Will it be part of the standard, or will it require a different system?
*Michal Galdzicki 18:04, 25 May 2009 (PDT): Yes, PoBoL will take over the world! (I am going home for the day now; I'll get back to this interesting question when I have a fresh outlook on it.) I didn't want to leave it blank, so I wrote that... I'll erase it later because I can and it's a wiki ;)
Thanks for your input Vincent.