Harvard:Biophysics 101/2007/Notebook:Denizkural/2007-3-22

From OpenWetWare

To illustrate what I had in mind when passionately arguing that NCBI should give the public query access, consider Facebook: the Facebook Query Language.

Another approach to this problem is SPARQL, "the query language for the semantic web". A rather rarefied document describing it is here: [1]. SPARQL does not even need a "database" or "machine access" to do querying; all it needs is good, web-accessible, consistent data.

My main point remains: NCBI should work out how to provide its data in the most flexible, accessible, and well-documented way, and stop trying to second-guess what we might do with it. I am sure many people will develop "GUI layers" on top of this data to serve the specific needs of biologists, who would then no longer be confined to whatever method or GUI of access NCBI deems appropriate.

Note that providing data in an unfettered way would also free NCBI from engaging in format and ontology wars: let people structure the data as they wish and upload "certificates", e.g. "this sequence upload is in line with Ant Society of New Zealand Annotation Standards".


1. The format and availability of data from NCBI is a peripheral issue with respect to our class project. Perhaps you have valid suggestions for improving NCBI, but I'd encourage you to focus your passion instead on things that are more immediately relevant to the success and broader impact of our efforts.

2. As an end user, I personally don't find the NCBI interfaces inflexible or in drastic need of improvement. You might consider reading the NCBI Handbook to learn what types of data are available and how they can be accessed, and perhaps to get some historical perspective. If you actively use their resources and have questions or suggestions, you can contact user services directly.

3. The utility and accessibility (i.e. machine readability) of a data set is often directly proportional to its conformity to a well-defined specification. You should come to understand this firsthand as you attempt to parse OMIM records. Another example is NCBI's description of the flow of submitted sequence data. The alternative, remaining completely agnostic with respect to data formats (even with submitter self-certification, etc.), quickly creates a barrier so high for end users (mostly non-programmers) trying to extract meaningful information from a non-uniform data set that it effectively renders the database useless.

smd 14:49, 22 March 2007 (EDT)

Hi Shawn, of course I am still learning a lot of this material right now; thanks for the pointers, they are greatly appreciated. I will take your advice and focus my efforts on the OMIM project and on clearing the backlog. Thank you for the input!