BBF RFC 30

From OpenWetWare

Add your comments to RFC 30 here. See BBF_RFC_20 for an example of how comments can be formatted and signed.

Michal Galdzicki, 24/04/09

Raik, This RFC is really helpful. I appreciate the direction it gives for definitions of data formats and your support of RDF/OWL for this purpose. Below I detail some nit-picky comments that came up while reading. I have also included a paragraph which fits as a conclusion. I will consider RFC30 while drafting RFC 31, and that may result in some more comments.

5.2

"Whenever appropriate, extension authors SHOULD re-use definitions from well supported other RDF ontologies"

Consider: "Whenever appropriate, extension authors SHOULD re-use definitions from well established RDF/OWL ontologies, as they constitute standards for other domains of science."

5.3

"In any case, a owl:sameAs link SHOULD connect the new standard back to the RDF document of the original proposal."

Consider that owl:sameAs will be interpreted to mean sameAs reciprocally.

Consider another, self-defined versioning scheme (an active research area, I believe).

   *Raik 08:42, 25 April 2009 (EDT):
   I was assuming the BBF would accept the new ontology without changes but may
   prefer to have the elements defined in the BBF name space (which makes life
   easier for everyone). I guess, in this case owl:sameAs would be ok because the
   two copies really are mutually the same. Versioning is indeed a whole different
   issue though... for data *and* for schemata.  
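
A minimal sketch of the reciprocity point above, using plain Python tuples rather than an RDF library. The two URIs are hypothetical placeholders, not terms from RFC 30; only the owl:sameAs URI is the real OWL one. It shows that a link asserted from the BBF copy back to the proposal also entails the reverse statement.

```python
# owl:sameAs is symmetric: stating it in one direction entails the other.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_closure(triples):
    """Return the triples plus the reverse of every owl:sameAs statement."""
    closed = set(triples)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            closed.add((o, p, s))
    return closed


# Hypothetical URIs for the BBF copy and the original proposal.
bbf_term = "http://example.org/bbf/core#Part"
proposal_term = "http://example.org/proposal#Part"

asserted = {(bbf_term, OWL_SAME_AS, proposal_term)}
entailed = same_as_closure(asserted)
```

If the two copies ever diverge, the reverse statement in `entailed` is the one that becomes wrong, which is why a one-directional versioning property may be the safer choice.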

5.4

"The data documents SHOULD be serialized to XML but, depending on the situation, other formats, like Turtle/N3 or JSON MAY be preferred."

Consider adding: " If another serialization is used the format chosen MUST NOT lose or leave out information in the conversion from the original XML serialization."
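
One way to read "MUST NOT lose information" is as a round-trip test: the set of triples must be identical after converting to another serialization and back. The sketch below assumes URI-only triples, a simplified N-Triples-like line format, and a made-up JSON layout; none of these are prescribed by RFC 30.

```python
import json


def to_ntriples(triples):
    """Serialize (s, p, o) URI triples as simplified N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def from_ntriples(text):
    """Parse the simplified format written by to_ntriples (URIs only)."""
    triples = set()
    for line in text.splitlines():
        s, p, o = (part.strip("<>") for part in line.rstrip(" .").split(" "))
        triples.add((s, p, o))
    return triples


def to_json(triples):
    """Serialize triples as a JSON list of [s, p, o] lists (assumed layout)."""
    return json.dumps(sorted(list(t) for t in triples))


def from_json(text):
    return {tuple(t) for t in json.loads(text)}


# Hypothetical example data.
original = {
    ("http://example.org/part/1", "http://example.org/vocab#name",
     "http://example.org/names/demo"),
    ("http://example.org/part/1", "http://example.org/vocab#partOf",
     "http://example.org/device/7"),
}

lossless_nt = from_ntriples(to_ntriples(original)) == original
lossless_json = from_json(to_json(original)) == original
```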

5.6

"That means a simple HTTP GET MUST serve the document just as it would serve an html formatted web page about it. That also means data access SHALL NOT require the initialization of web services or any other kind of remote procedure calls."

Potentially a technical contradiction

HTTP GET may itself technically be considered a web service. The "SHALL NOT require" phrase does not tell me what action I should take when interpreting the RFC document. Personally, I agree with the sentiment that I think you are expressing: as a first choice, data should be served in a way that everyone can use.
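
For concreteness, a sketch of what "a simple HTTP GET" could look like on the client side, using only the Python standard library. The URL is a hypothetical placeholder; `application/rdf+xml` is the registered media type for RDF/XML. The request is only built here, not sent.

```python
from urllib.request import Request


def build_rdf_request(url):
    """Build a plain HTTP GET that asks for RDF/XML via the Accept header."""
    return Request(url, headers={"Accept": "application/rdf+xml"}, method="GET")


# Hypothetical record address; no service initialization or RPC setup needed.
req = build_rdf_request("http://example.org/parts/example-part")

# urllib.request.urlopen(req).read() would then fetch the document.
```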

5.6

"Software that consumes Synthetic Biology data records MUST be able to open, parse and interpret RDF documents. The software SHOULD, at least, parse XML-formatted RDF documents. Support of more specialised and readable formats like turtle/N3 is RECOMMENDED."

This is a little confusing:

1. MUST parse RDF (at least one serialization)

a. SHOULD parse XML-RDF

b. RECOMMENDED Turtle / N3

According to RFC 0, 'The word "SHOULD" or the adjective "RECOMMENDED" mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.' Therefore a = b in objective terms, so you are not recommending one over the other.

One solution, change to: MUST parse RDF-XML

Another solution, change to say: MAY support Turtle/N3

5.6

"At least in the long term, data stores are also RECOMMENDED to support the SPARQL W3C standard for more complex queries."

SPARQL endpoints are web services, as far as I understand. http://semanticweb.org/wiki/SPARQL_endpoint

Refers back to confusion that may arise from "SHALL NOT require the initialization of web services"
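
To illustrate the overlap: under the SPARQL Protocol's HTTP binding, a query can be sent as an ordinary HTTP GET with the query text in a `query` parameter, so a SPARQL endpoint and a "simple HTTP GET" are not mutually exclusive. The sketch below only builds such a URL; the endpoint address is a hypothetical placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Return the GET URL that carries a SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url("http://example.org/sparql", QUERY)  # hypothetical endpoint

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```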

6.1

"Soap" Change to SOAP

*Raik 08:43, 25 April 2009 (EDT):

Thanks a lot for the comments! Looks like I was a bit too quick in pushing this out. I suggest we collect some more and then write a joint RFC that replaces RFC 30.

Douglas Densmore 20:27, 28 April 2009 (EDT)

Disclaimer: I am not very familiar with RDF:)

In general I like the RFC and do not have any concrete reasons why I would not support it. The broad goals of establishing a core data model and a way to share it and extend it are mine as well. Again, I really am a consumer of data at the end of the day. In fact, ultimately I want to get tools built so I can explore the intersection of bio and EECS through a set of much more abstract representations of biological data. I want to make tools that help us get standards in place and I think this is a step in the right direction.

Section by section comments:

4

You mention that "third party RDF data will therefore rarely immediately map into a pre-existing relational database schemas". So is the expectation that folks using a relational database have another layer on top which allows RDF data to populate their database? Would this be the same software that serves up their data as RDF?

   *Raik 02:45, 29 April 2009 (EDT):
   That was the idea. Web frameworks are starting to have such layers built in. The most advanced right now seems to be
   the RDF layer of Ruby on Rails: [1]. There is a similar effort for Django (but not
   yet updated to the new version): [2]. In the best case, you would just need
   to activate that layer, create some custom mappings, and your data should be available as RDF. The
   consumption of RDF would probably still need some tweaking.
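
A rough sketch of what such a "custom mapping" layer might do, independent of any particular web framework. The table layout, base URI, and property URIs are all invented for illustration; this is not how the Rails or Django layers mentioned above actually work.

```python
import sqlite3

BASE = "http://example.org/parts/"    # hypothetical base URI for records
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace

# Custom mapping from relational column name to RDF property URI (assumed).
COLUMN_TO_PROPERTY = {
    "name": VOCAB + "name",
    "sequence": VOCAB + "sequence",
}


def rows_to_triples(conn):
    """Turn every row of the 'parts' table into (subject, property, value) triples."""
    triples = []
    cursor = conn.execute("SELECT id, name, sequence FROM parts ORDER BY id")
    for part_id, name, sequence in cursor:
        subject = BASE + str(part_id)
        for column, value in (("name", name), ("sequence", sequence)):
            triples.append((subject, COLUMN_TO_PROPERTY[column], value))
    return triples


# In-memory database standing in for an existing relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, sequence TEXT)")
conn.execute("INSERT INTO parts (name, sequence) VALUES (?, ?)", ("demo-part", "atgc"))
triples = rows_to_triples(conn)
```

The reverse direction (populating the database from third-party RDF) is the harder part, since incoming properties may have no matching column.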

You mention 4 things needed:

  • RDF definition of core data model - I see that there is an RFC for PoBoL now. We have created one as well (#33). I would be more than happy to capture ours as RDF. I would be interested to see how easy it is for two data models to exist in the RDF space. Your RFC makes it sound as if this is possible and that they could extend each other. Is this true?
   *Raik 02:45, 29 April 2009 (EDT):
   Yes, absolutely. You can mix definitions from different documents (the unique address prevents confusion). And new
   definitions can mix and extend types and properties from many others -- I guess you have to watch out for circular 
   "imports" and other inconsistencies, like in normal programming. RDF also supports multiple inheritance.
  • Some guidelines for extensions - I agree that this is key.
  • Recommendations for data publication and synchronization - I am not clear on what this means. Do you mean the technical aspect of this (e.g. race conditions, data coherency) or do you mean from a "community organization" standpoint?
  *Raik 02:45, 29 April 2009 (EDT):
  Yes, I was more thinking of the technical aspect.
  * synchronization: a description of the RSS data exchange, if this is indeed practical.
  * data publication: mhm... not clear myself now. That needs to be clarified.
  • Software or servers that can read or write RDF - Sign Clotho up!
  *Raik 02:45, 29 April 2009 (EDT): Cool :) !
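
The point above about mixing definitions from different documents can be sketched with plain tuples. The three namespaces and all class names are hypothetical; only the rdfs:subClassOf URI is real. An extension class inherits from classes in two different data models, and the traversal guards against the circular definitions mentioned in the reply.

```python
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespaces for two independent data models and one extension.
A = "http://example.org/model-a#"
B = "http://example.org/model-b#"
EXT = "http://example.org/ext#"

triples = {
    # Multiple inheritance across documents; full URIs keep names unambiguous.
    (EXT + "TaggedPromoter", RDFS_SUBCLASS, A + "Promoter"),
    (EXT + "TaggedPromoter", RDFS_SUBCLASS, B + "Feature"),
    (A + "Promoter", RDFS_SUBCLASS, A + "Part"),
}


def superclasses(cls, triples):
    """Collect direct and inherited superclasses; 'seen' stops circular chains."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


inherited = superclasses(EXT + "TaggedPromoter", triples)
```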

5.1

"Biobrick" as a term is used a lot. In my brief exposure to biology this seems like a loaded term and one that is not meaningful to all synthetic biologists (not to mention all biologists). Is something more generic warranted? I am all for bioBricks but it seems like biobricks and their assembly can be represented in more flexible schemes as well. I only mention this since some folks are wary of my tools thinking that they only work for folks doing biobricks.

   *Raik 02:45, 29 April 2009 (EDT): Good point.

5.2

Can you only extend the core model (as opposed to remove items from it)? What if fields are not used? Does something still adhere to the core data model in that case? Is there a notion of backward compatibility? Expressiveness? Basically I would like to know how we are going to compare data models and have tools that require more than the data model provides.

   *Raik 02:45, 29 April 2009 (EDT): 
   RDF is by default much more flexible. You can define that a certain type of data MUST have a certain
   property. But, most of the time, you only say that a certain property CAN be associated to a certain
   subject. And as a tool developer, you can always create new property definitions and try to convince
   data providers to use them... Versioning is an open issue though.    


5.4

You state "RDF data documents SHOULD thus be hosted at permanent immutable locations". How robust is this? What if a server goes down? Does RDF have any notion of data distribution or replication? Has anyone built a system that traverses the RDF "space" and creates mirror sites? Just wondering...

  *Raik 02:45, 29 April 2009 (EDT):
  No idea. Mike?? We will need to live with some broken links though... that's pretty sure.

5.6

Speaking for Clotho, I am fine with the requirements for software and can commit to getting this capability into Clotho this summer.



Right now we are capturing our idea of a data model for RFC 33. I can volunteer to specify it as RDF to get feedback. It would be great if those making a more PoBoL-based version could do so as well.

Also, if Java code exists to process and use RDF, it would be great if someone could point me at it. Thanks.