Implementing the FuGE Object Model: a Systems Biology Data Portal and Archive
Author(s): A. L. Lister, A. Jones, O. Shaw, A. Wipat
Affiliations: Newcastle University (A. L. Lister, O. Shaw, and A. Wipat), University of Manchester (A. Jones)
Keywords: 'Functional Genomics' 'Data Standards' 'High-Throughput Experiments' 'Databases'
Purpose of the CISBAN Data Portal and Archive (DPA)
The Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN) has developed a Data Portal and Archive (the CISBAN DPA) to allow the archiving, storage, and retrieval of the raw data produced by experimentalists in the Centre. A central data repository prevents deletion, loss, or accidental modification of primary data, while giving convenient access to the data for publication and analysis. The CISBAN DPA will also provide a central location for storage of metadata for the high-throughput (HT) data sets, and will facilitate subsequent data integration strategies.
Many journals and research bodies already require gene, genome, protein and transcriptome data to be submitted according to established industry standards. The Functional Genomics Experiment (FuGE) data standard contains a model of samples, protocols, instruments, and software, and provides extension points for the creation of technology-specific data standards. The microarray and proteomics communities have already adopted FuGE. As the CISBAN DPA primarily stores microarray, proteomics, and imaging data, FuGE is a logical choice, and CISBAN has implemented it for metadata storage for all HT data sets.
The CISBAN DPA consists of a lightweight web front-end (accessed via the CISBAN website, http://www.cisban.ac.uk) and an Oracle 10g database back-end for storing selected HT data created within CISBAN. The front-end is a web form based on the FuGE Object Model (OM), and is the main entry point through which experimentalists within CISBAN store and archive raw data and associated metadata. The back-end is a modified implementation of the latest release of the FuGE STK.
Further development was required to enable the implementation to handle large raw data sets, and to produce a web front-end that is easy to use and easy to implement. An additional blob store (for the raw experimental data) was integrated into the database schema created by the STK, and a web application was created that writes FuGE XML to the database and reads it back.
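The separation between descriptive metadata and a dedicated blob store can be illustrated with a minimal sketch. It is not the CISBAN implementation: SQLite stands in for the Oracle 10g back-end, and the table names (`raw_blob`, `data_record`), the functions (`open_store`, `archive`, `retrieve`), and the checksum column are all assumptions introduced here for illustration. The idea shown is only that a metadata record holds a reference to the raw bytes, which live in their own table.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with separate metadata and blob tables.

    Table and column names are hypothetical; SQLite is a stand-in for Oracle.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_blob (
            blob_id INTEGER PRIMARY KEY,
            sha256  TEXT NOT NULL,
            content BLOB NOT NULL
        );
        CREATE TABLE data_record (
            identifier TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            blob_id    INTEGER NOT NULL REFERENCES raw_blob(blob_id)
        );
        """
    )
    return conn


def archive(conn: sqlite3.Connection, identifier: str, name: str, raw: bytes) -> str:
    """Store the raw bytes, then a metadata record that points at them."""
    digest = hashlib.sha256(raw).hexdigest()
    with conn:  # commit both inserts together, or roll both back on error
        cur = conn.execute(
            "INSERT INTO raw_blob (sha256, content) VALUES (?, ?)", (digest, raw)
        )
        conn.execute(
            "INSERT INTO data_record (identifier, name, blob_id) VALUES (?, ?, ?)",
            (identifier, name, cur.lastrowid),
        )
    return digest


def retrieve(conn: sqlite3.Connection, identifier: str) -> bytes:
    """Fetch the raw bytes for a record and verify them against the stored checksum."""
    row = conn.execute(
        "SELECT b.sha256, b.content FROM data_record d "
        "JOIN raw_blob b ON b.blob_id = d.blob_id WHERE d.identifier = ?",
        (identifier,),
    ).fetchone()
    if row is None:
        raise KeyError(identifier)
    digest, content = row[0], bytes(row[1])
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError("stored data does not match its checksum")
    return content
```

Keeping the raw bytes out of the metadata table means that queries over experiment descriptions never have to touch the large files, and the checksum gives a simple guard against accidental modification of the archived data.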
From requirements-gathering sessions between bioinformaticians and experimentalists within CISBAN, the qualities deemed most important in the CISBAN DPA were ease-of-use and, more importantly, speed-of-use. CISBAN researchers currently require only a few sections of the exhaustive FuGE standard; therefore, while the DPA back-end is capable of storing everything modelled in the OM, the front-end provides a simpler view. Many sections of the front-end application are pre-filled with information that CISBAN bioinformaticians have gathered about the known instruments and methodologies of the CISBAN laboratory groups.
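One way to picture a pre-filled, simplified view is as a set of per-technology defaults merged with whatever the experimentalist types in. The sketch below makes that assumption explicit; the field names, the `PREFILLED` values, and the XML element names are all hypothetical and do not follow the actual FuGE XML schema or the CISBAN forms.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults a bioinformatician might record per technology.
PREFILLED = {
    "microarray": {
        "instrument": "example scanner",
        "protocol": "example hybridisation protocol",
    },
}

# The simple view exposes only these fields (an assumed subset).
SIMPLE_VIEW_FIELDS = ("experiment_name", "instrument", "protocol", "data_file")


def build_submission(technology: str, user_input: dict) -> dict:
    """Merge pre-filled defaults with user input; non-empty user values win."""
    defaults = PREFILLED.get(technology, {})
    supplied = {key: value for key, value in user_input.items() if value}
    merged = {**defaults, **supplied}
    missing = [field for field in SIMPLE_VIEW_FIELDS if field not in merged]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return {field: merged[field] for field in SIMPLE_VIEW_FIELDS}


def to_xml(submission: dict) -> str:
    """Serialise the submission as a flat XML fragment (illustrative, not FuGE XML)."""
    root = ET.Element("Submission")
    for field, value in submission.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")
```

With defaults in place, the user only has to supply the fields that actually vary between runs, which is where most of the speed-of-use gain would come from under these assumptions.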
New standards are constantly being developed and announced. The importance of using standards in database and application development is growing as publishing and research bodies set stringent, standardized format requirements for published data. Capturing CISBAN data in a standard format is an essential part of simplifying integration with external data sources, and it supports the systems-level approach to research that is the foundation of the Centre's work methods.
The success of the CISBAN DPA and its availability for testing and download (as a modified implementation of the FuGE STK) show how straightforward and practical the use of a standard can be. Future work will include expansion of the web front-end to allow full input of all FuGE sections, and more complex search functions. However, the default view will always remain simple and clear. The option to write complete FuGE documents will be present, but the ability to produce simple documents for basic use will remain the main focus.
- Jones AR, Pizarro A, Spellman P, Miller M, and FuGE Working Group. pmid:16901224.
- Brazma A, Krestyaninova M, and Sarkans U. pmid:16847461.