BioSysBio:abstracts/2007/Matthew Pocock

From OpenWetWare

Fluxion and SBO: Integrating systems biology data

Author(s): Matthew Pocock
Affiliations: Newcastle University
Contact: email: matthew.pocock@ncl.ac.uk
Keywords: 'data integration' 'SBO' 'semantic web'

Background/Introduction

To engage effectively with systems biology, it is necessary to bring together techniques and data from a wide range of disciplines. There are many potential sources of relevant data, ranging from unstructured text in publications and queryable web pages through to legacy flat files and relational databases. Each data source has its own schema, and any two schemas will often be incompatible. Even where the structures of two schemas are similar, the interpretation of the data captured in them may differ. These impedance mismatches combine to make the task of integrating the data relevant to systems biology very time-consuming. To do it well, the integrator must have a deep cross-disciplinary understanding and the ability to translate knowledge from the original domains into that of systems biology.

Throughout bioinformatics, these kinds of issues are being addressed by a combination of common data formats, domain-specific nomenclatures, and applications exposed as services. In systems biology, there are now interchange formats for capturing the properties of simulations (e.g. SBML) and controlled vocabularies for the related terminology (e.g. SBO).

Semantic web technologies provide a means to capture what is known and not known about a domain in a format that can then be reasoned about automatically. This shifts the researcher's responsibility from developing code that performs the integration to describing the concepts they wish to learn about and how the integration should be performed.

As part of ComparaGRID, which aims to integrate comparative genomics data, we have developed a semantic web data integration architecture called Fluxion. Fluxion is a web-service architecture for publishing and querying data exposed in the Web Ontology Language (OWL). In the first layer, multiple underlying data sources, such as relational databases, are published directly as OWL. A second layer of services then transforms these database-specific OWL representations into the domain ontology. A third layer of services combines multiple sources into a single, virtual warehouse of integrated data. The knowledge required to map from a database-specific OWL representation into the domain ontology is captured in the Fluxion rules language, called Runcible. Runcible is to OWL what XSLT is to XML: it pattern-matches concepts in the source ontology and populates concepts in the target ontology by filling out a template or skeleton statement.
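
The pattern-match-and-fill-template idea can be illustrated with a minimal Python sketch. This is not Runcible syntax and not Fluxion code: the triple representation, the `?variable` convention, the function names, and the `db:` and `domain:` identifiers are all assumptions made for illustration only.

```python
# Illustrative only: a toy rule engine over (subject, predicate, object) triples.
# None of these names come from Fluxion or Runcible.

def is_var(term):
    """Terms starting with '?' are treated as rule variables (an assumption)."""
    return term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Unify one pattern triple with one data triple.

    Returns the extended variable bindings, or None if they do not match.
    """
    extended = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it agrees with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended

def match_all(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all patterns together."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match_pattern(patterns[0], triple, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], triples, extended)

def apply_rule(patterns, template, triples):
    """Fill the template once for each match of the patterns."""
    produced = set()
    for bindings in match_all(patterns, triples):
        for skeleton in template:
            produced.add(tuple(bindings.get(term, term) for term in skeleton))
    return produced

# Hypothetical database-specific statements (first layer).
source_triples = [
    ("db:row42", "rdf:type", "db:CompoundRow"),
    ("db:row42", "db:name", "ATP"),
]
# Hypothetical rule: match source concepts, populate domain concepts.
rule_patterns = [("?x", "rdf:type", "db:CompoundRow"), ("?x", "db:name", "?n")]
rule_template = [("?x", "rdf:type", "domain:Species"), ("?x", "domain:label", "?n")]

domain_triples = apply_rule(rule_patterns, rule_template, source_triples)
```

Here the rule plays the role of the second layer: statements in the database-specific vocabulary are matched and re-expressed in the domain vocabulary, much as an XSLT template matches source elements and emits target elements.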

We are applying Fluxion to the systems biology domain by exposing a number of domain-relevant data sources, including KEGG and BioModels, and transforming them into a domain ontology adapted from SBML and SBO. The resulting semantically integrated knowledge base will allow us to answer questions that cannot be addressed by any single data source. By using the power of the ontological model, we are able to use information from multiple sources to classify the original data more accurately, allowing more informative models to be built.
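
As a rough illustration of how combining sources can sharpen classification, the sketch below merges statements from two hypothetical sources and applies a hand-written classification rule. In an OWL setting a reasoner would derive this from a class definition; the Python function is only a stand-in for that step, and the `ex:` and `domain:` identifiers and the "enzyme" rule are assumptions, not part of the Fluxion domain ontology.

```python
# Illustrative only: a hand-coded stand-in for ontology-based classification.

def classify_enzymes(triples):
    """Return entities that are typed as proteins AND catalyse something."""
    proteins = {s for s, p, o in triples
                if p == "rdf:type" and o == "domain:Protein"}
    catalysts = {s for s, p, o in triples if p == "domain:catalyses"}
    return proteins & catalysts

# Hypothetical source A knows only what kind of thing ex:P1 is.
protein_source = {("ex:P1", "rdf:type", "domain:Protein")}
# Hypothetical source B knows only which reaction ex:P1 takes part in.
reaction_source = {("ex:P1", "domain:catalyses", "ex:R1")}

# Neither source alone supports the classification...
from_a_only = classify_enzymes(protein_source)
from_b_only = classify_enzymes(reaction_source)
# ...but the integrated view (set union of the statements) does.
from_merged = classify_enzymes(protein_source | reaction_source)
```

Each source on its own leaves the entity unclassified; only the merged set of statements contains enough information for the rule to apply.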

The ComparaGRID project is hosted at http://www.comparagrid.org, and the source code for Fluxion can be browsed at http://deanmoor.ncl.ac.uk/websvn/listing.php?repname=fluxion&path=%2Ftrunk%2F&rev=0&sc=0