OpenWetWare talk:Software/Flexible Science Databases

This is the discussion page for Flexible Science Databases. Below you will see the original discussion that sparked the creation of this wiki page. Please, please, please comment below - all of your feedback will eventually form the application we have all been waiting for!

Flexible Science Databases with OWW

 * Lucks 17:45, 24 March 2006 (EST): I recently came across the desire/need for a flexible database system in my research (bioinformatics). For example, I want to know about phage gene expression.  Not knowing enough biology to perform an efficient literature search (coming from a physics background), I have made a little database that allows me to input author name, citation info, abstract, and some relevancy measure while I search in the literature.  (For the specific implementation, please visit PubMed Searches for Phage Gene Expression.) The point that this little example illustrates is the creative use of small, easily customizable databases in scientific investigations.  Accumulation of literature search results is one example (you can see that common citation managers like EndNote are just specific, albeit not very flexible, views of a literature database), but you can imagine many more, from entering specific observations from experiments, to bioinformatic data collection, ...  The goal is some flexible framework where it is easy to create and modify databases on the fly, with nice user interfaces for database entry. Which is why I bring this discussion to OWW.  There are a couple of ways to do this.  The easiest method (almost available) comes from Wufoo (used in the example above).  I have been using this recently with a beta-key and it has many nice features, but I think something could be made that is more geared towards scientists (and when released, Wufoo will require a monthly fee).  I am wondering if this could be done in the framework of a wiki such as OWW.  I imagine a scenario where through template features, users could easily define fields in a database in wikimarkup.  Once this is done, pages would be generated to allow data input, as well as retrieval (dumps in CSV format, or even automatically formatting a wiki table ...)
This idea is more along the lines of the Science 2.0 article, but I think it would be very useful, could be integrated into many aspects of scientific investigation, and could be implemented in something like OWW.
 * Smeister 07:58, 25 March 2006 (EST): If I understand you right - you are trying to implement database functionality in OWW - this is the direction I always thought and hoped projects like OWW will move into, eventually - after all, almost everything in science has to do with huge amounts of data... Flexible online databases, customizable and maintained by a big group of people with little computer knowledge - but expert knowledge on the subject covered by the database - would add an enormous value to OWW, for sure. One can also envision some sort of peer evaluation for the validity of the entries (i.e. something along the lines of what they use for stories and comments, over at digg.com). That's a huge project, though...
 * Lucks 10:57, 25 March 2006 (EST): While a big project, with some discussion, it could be possible to come up with a few hacks that would be relatively easy to implement so that the idea could at least be tested. I have some experience with running a mediawiki, but I have never looked into the code.  Alternatively, some software could be developed external to OWW, but linked so that new database entries are automatically inserted into pages in OWW - a lot of alternatives here.  I have put together a database web interface before with Perl-CGI (using CGI::Builder), which works great (see the Pica Literature Database), but requires the user to make an HTML form with limited, but ugly Perl HTML::Template variable references - basically not very user friendly at the initial setup.  I am now learning a little bit about Ruby on Rails, which is fantastically flexible for this sort of thing. (I am pretty sure this is the framework that Wufoo uses.)  I can imagine a wiki design page that allows someone to make a form design via wiki-markup, which then gets fed to a Rails program that creates the database, and then passes entries back to the wiki.
 * Lucks 18:42, 26 March 2006 (EST): I am not sure how the templates in Mediawiki work, but perhaps a user could define a database layout by designing a template page. A new record in the database would then be created by making a new page (with some systematic page name) using this template.  There could be some facility for conglomerating all the records together to provide a global view of the dataset, and possibly a database dump for further analysis elsewhere.  Not that mediawiki would be the best end-solution, but it might be an easy testbed to vet out the idea in general.  This also keeps everything within OWW.
 * RS 13:54, 25 March 2006 (EST): I think this is an interesting idea. Why don't we discuss it at the next Steering committee meeting from 4p-5p on April 3.  Can you both make it (i.e. attend in person or phone in)?  Just to be clear, I think we should also discuss it on the wiki, I just think we might want to discuss it in realtime as well.
 * Lucks 18:08, 25 March 2006 (EST): Fantastic - I think I can make the meeting in person.
 * Smeister 04:29, 27 March 2006 (EST): I already wanted to join the last time around, but the time difference is a bit of a problem for me...
 * RS 19:00, 29 March 2006 (EST): Sorry about this. It can be a bit of a problem scheduling a time that works for everyone.  Regardless, we'll post meeting notes on the wiki.
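
To make the template idea in the thread above concrete, here is a minimal sketch in Python of how it might look: each record lives on its own wiki page as a template call, and a small script gathers the records and dumps them as CSV or as a wiki table. The template name `LitRecord`, the field list, the page names, and the sample values are all made up for illustration; nothing here is an existing OWW or MediaWiki feature, and the parser is deliberately naive (it does not handle nested templates or `|` inside values).

```python
import csv
import io
import re

# Hypothetical field list and template name, chosen only for this sketch.
FIELDS = ["author", "citation", "abstract", "relevance"]
TEMPLATE = "LitRecord"


def parse_record(page_text, template=TEMPLATE):
    """Return the key=value pairs of the first {{template|...}} call, or None."""
    pattern = r"\{\{" + re.escape(template) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, page_text, re.DOTALL)
    if match is None:
        return None
    record = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore positional arguments that have no "="
            record[key.strip()] = value.strip()
    return record


def collect(pages, fields=FIELDS):
    """Build one row per page that uses the template; missing fields become ''."""
    rows = []
    for text in pages.values():
        record = parse_record(text)
        if record is not None:
            rows.append({f: record.get(f, "") for f in fields})
    return rows


def to_csv(rows, fields=FIELDS):
    """Dump the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_wikitable(rows, fields=FIELDS):
    """Format the rows as wiki table markup."""
    lines = ['{| class="wikitable"', "! " + " !! ".join(fields)]
    for row in rows:
        lines.append("|-")
        lines.append("| " + " || ".join(row[f] for f in fields))
    lines.append("|}")
    return "\n".join(lines)


# Placeholder pages standing in for wiki content (invented sample data).
pages = {
    "LitRecord/1": "{{LitRecord|author=Doe J|citation=Example J. 1:1"
                   "|abstract=Placeholder text|relevance=3}}",
    "LitRecord/2": "{{LitRecord|author=Roe R|citation=Example J. 2:5|relevance=1}}",
    "Notes": "A page with no record on it.",
}

rows = collect(pages)
print(to_csv(rows))
print(to_wikitable(rows))
```

The appeal of this layout is that changing the database means editing one field list and one template page, which is roughly the "on the fly" flexibility asked for above. A real implementation would probably read pages through the wiki's own export or API rather than from a dictionary.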

Comments
Just a quick thought. Check out Jotspot if you haven't already. It may give you some ideas. --Gtaylor 09:03, 13 August 2006 (EDT)

Never mind. They've changed their business model and feature set. It's not applicable anymore. --Gtaylor 16:35, 16 August 2006 (EDT)

Investigate Prova?
A really great 'integration platform' (kind of middleware) for building and sharing databases / analysis, models, rules, pipelines, etc., is a software language called 'Prova' http://www.prova.ws/

The name (and the language) is an amalgam of Prolog and Java, allowing complex pipelines to be encoded as simple declarative rules. The nature of Prolog provides a perfect environment for multiple data redundancies to be handled easily, in addition to solving that old chestnut of 'integrating multiple heterogeneous datasets'. The underlying basis in Java allows complex analysis to be performed via a natural representation of method calls.

The problem formulation, as I am sure you are aware, discussing flexible science databases as you are, is that although it may be trivial to link two databases, to do so in a meaningful way is a scientific challenge. The scientist is the one who knows what data to combine, what analysis to perform, and the conceptual meaning of the resulting analysis.

Prova allows sources of data, be they SQL queries, API calls or file lookups, to be encoded as simple declarations. Complex analysis is facilitated by Java method calls, but they are additionally declared as rules. Rules can be shuffled into pipelines which are easy to maintain and explore. Finally, the resulting analysis pipeline can be declared as a rule, thereby forming the basis for further analysis. These rules can be published to a resource server for automatic resource discovery and open access to scientific protocols.
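
For readers who have not seen rule-based pipelines, the idea in the paragraph above can be loosely illustrated in plain Python. This is not Prova syntax and does not use any Prova API; it is only an analogy under stated assumptions. The `rule` decorator, the `solve` function, the rule names, and the toy sequences are all invented for this sketch. Data sources are declared as rules with no inputs, analyses are rules that name the rules they depend on, and a finished pipeline is itself just another rule that later rules can build on.

```python
# A plain-Python analogy for declarative rules; NOT Prova code.
RULES = {}


def rule(name, *inputs):
    """Register a function as a named rule; `inputs` are names of other rules."""
    def register(func):
        RULES[name] = (inputs, func)
        return func
    return register


def solve(name, cache=None):
    """Evaluate a rule after evaluating the rules it depends on (no cycle check)."""
    if cache is None:
        cache = {}
    if name not in cache:
        inputs, func = RULES[name]
        cache[name] = func(*(solve(dep, cache) for dep in inputs))
    return cache[name]


@rule("lab_a_sequences")
def lab_a():
    # Stand-in for a SQL query; the sequences are toy values.
    return {"geneX": "ATGC", "geneY": "GGCC"}


@rule("lab_b_sequences")
def lab_b():
    # Stand-in for a file lookup or API call; toy values again.
    return {"geneY": "GGCC", "geneZ": "TTAA"}


@rule("all_sequences", "lab_a_sequences", "lab_b_sequences")
def merge_sources(a, b):
    # Union of both sources; a duplicate entry simply collapses into one key.
    merged = dict(a)
    merged.update(b)
    return merged


@rule("gc_content", "all_sequences")
def gc_content(sequences):
    # An analysis step declared as a rule on top of the merged data.
    return {
        name: (seq.count("G") + seq.count("C")) / len(seq)
        for name, seq in sequences.items()
    }


print(solve("gc_content"))
```

Because `gc_content` is registered like any other rule, a further analysis can list it as an input without knowing how the sequences were obtained, which is the composability the paragraph describes. Prova itself adds logic-programming features such as unification and backtracking that this sketch does not attempt to imitate.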

What is the use of a rule? Think about rules in the context of a scientific ontology. Imagine that any lab could annotate genes with simple 'is.a' rules... Imagine collecting your dataset of protein sequences from a union of the data provided by the best labs in the world. Imagine the logic of your analysis methods becoming transparent via ontological typing of the input and output variables.

Well... as they say, there are more integration technologies than there are data to integrate. Have a look anyway. --Dan 14:01, 13 February 2008 (CST)


 * Wow... first edit in 2 years! --Dan 14:02, 13 February 2008 (CST)