Abhishek Tiwari: Workflow technology


=Applications of Workflow Technology in Cheminformatics, Bioinformatics and Drug Discovery=

Workflow System for Mass Spectrometry in Cancer Diagnosis Using the INCOGEN VIBE Software

Workflow technology
About Workflow technology

Workflow technology is a mechanism for integrating data, applications and services. It enables scientists to dynamically build their own research protocols for scientific analysis and decision making by connecting various resources and software applications together. Workflow technology is increasingly applied in discovery informatics to organize and analyze data. SciTegic's Pipeline Pilot is a chemically intelligent implementation of a workflow technology known as data pipelining. It allows scientists to construct and execute workflows using components that encapsulate many cheminformatics algorithms. Because workflow technology is generic, analytical workflows can be built for almost any area, such as gene expression analysis, sequence analysis, proteomics and systems biology. It also provides an interface through which software from different vendors can be assembled according to scientific requirements.

Data Pipelining & Workflow technology

Increasingly, research organizations are looking for information technology that goes beyond point solutions targeted to specific problems. They want to integrate such solutions to increase the efficiency of the wider R&D process.

This trend can be viewed in terms of the R&D 'workflow': the sequence of tasks, processes and decisions undertaken in completing a research project. Optimizing research requires technologies that smooth this workflow, enabling data and results to flow more readily from step to step. Such technologies may even allow the workflow to be automated. One key enabling technology is data pipelining.
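As a minimal sketch of that idea, assuming each step is a plain Python function that receives the previous step's result (a convention chosen for illustration, not taken from any product described here), an automated workflow reduces to running the steps in order. The step names `remove_missing`, `normalise` and `flag_hits` are hypothetical.

```python
from typing import Any, Callable, Sequence

# Hypothetical convention for this sketch: a task is any function that
# takes the previous step's result and returns a new result.
Task = Callable[[Any], Any]


def run_workflow(tasks: Sequence[Task], data: Any) -> Any:
    """Run tasks in order, feeding each task's output into the next task."""
    for task in tasks:
        data = task(data)
    return data


# Illustrative research steps applied to a list of assay readings.
def remove_missing(values):
    """Drop readings that were not recorded."""
    return [v for v in values if v is not None]


def normalise(values):
    """Scale readings so that the largest one is 1.0."""
    top = max(values)
    return [v / top for v in values]


def flag_hits(values, threshold=0.5):
    """Mark readings at or above the threshold as hits."""
    return [v >= threshold for v in values]


if __name__ == "__main__":
    steps = [remove_missing, normalise, flag_hits]
    print(run_workflow(steps, [2.0, None, 8.0, 4.0]))  # [False, True, True]
```

Keeping each step a small function means steps can be reordered or swapped without touching the runner.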

Data pipelining is a relatively simple concept. Any computational component has data inputs and data outputs. Data pipelining views these components as being connected together by 'pipes' through which data flows.
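The pipe idea maps naturally onto Python generators. The sketch below assumes records are plain dictionaries and uses hypothetical components (`read_records`, `keep_above`, `add_log10`). It illustrates data pipelining in general, not how Pipeline Pilot is implemented. Each component consumes records from its input pipe and yields records to its output pipe, so one record can reach the end of the pipeline before the next one has been read.

```python
import math
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Source component: parse 'id,value' text lines into records."""
    for line in lines:
        ident, value = line.strip().split(",")
        yield {"id": ident, "value": float(value)}


def keep_above(records: Iterable[Record], threshold: float) -> Iterator[Record]:
    """Filter component: pass on only records whose value exceeds the threshold."""
    for rec in records:
        if rec["value"] > threshold:
            yield rec


def add_log10(records: Iterable[Record]) -> Iterator[Record]:
    """Calculator component: annotate each record with a derived property."""
    for rec in records:
        yield {**rec, "log10": math.log10(rec["value"])}


def run_pipeline(lines: Iterable[str], threshold: float = 1.0) -> List[Record]:
    # Connecting components means nesting generators. Nothing is computed
    # until list() pulls records through the pipes, one record at a time.
    return list(add_log10(keep_above(read_records(lines), threshold)))


if __name__ == "__main__":
    for rec in run_pipeline(["a,10", "b,0.5", "c,100"]):
        print(rec)
```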

Workflow technology based Solutions for Chemoinformatics
SciTegic  http://www.scitegic.com/

SciTegic develops enterprise informatics software for the scientific discovery and development industries. It pioneered a technology called "Data Pipelining", which is widely used to process drug discovery data with unprecedented flexibility. SciTegic's Pipeline Pilot™ technology 'wraps' computational components and data sources in code that makes them easy to integrate, and provides powerful scripting capabilities for constructing flexible workflows. A simple user interface allows non-programmers to connect pre-defined components and build their own workflows. SciTegic technology is already widely applied in drug discovery, particularly for high-throughput data processing. Pipeline Pilot streamlines the integration and analysis of the vast quantities of data flooding the research informatics world.
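One common way to 'wrap' an existing program so that it can be connected to other components is to run it as a subprocess, passing data on standard input and reading results from standard output. The sketch below assumes a text-in, text-out convention and uses a Python one-liner as a stand-in for a real external tool. `wrap_command` is a hypothetical helper and does not reflect Pipeline Pilot's actual wrapping mechanism.

```python
import subprocess
import sys
from typing import Callable, Sequence


def wrap_command(argv: Sequence[str]) -> Callable[[str], str]:
    """Wrap a command-line program as a component: text in on stdin, text out on stdout."""

    def component(text: str) -> str:
        # check=True turns a non-zero exit status into CalledProcessError,
        # so a failing tool stops the workflow instead of passing on bad data.
        result = subprocess.run(
            list(argv), input=text, capture_output=True, text=True, check=True
        )
        return result.stdout

    return component


# Stand-in "external tool" for this sketch: a Python one-liner that upper-cases
# its input. In practice argv would name a real program installed on the system.
to_upper = wrap_command(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)

if __name__ == "__main__":
    print(to_upper("acgt\n"), end="")  # ACGT
```

Once wrapped, the external program looks like any other function, so it can sit in the same chain as native components.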

InforSense  http://www.inforsense.com/

InforSense KDE (Knowledge Discovery Environment) enables analysts to accelerate their research by accessing and integrating all their tools through the intuitive InforSense analytical workflow interface. In addition, InforSense TextSense, ChemSense and BioSense are specialized extensions of KDE for literature analytics, cheminformatics and bioinformatics, enabling end users to rapidly create and customize their own cross-domain analytical applications. Built on InforSense KDE, InforSense ChemSense provides a vendor-neutral, integrative analytics environment for designing and executing scalable, high-performance cheminformatics solutions. This vendor-neutral environment provides access to tools and cartridges from leading vendors including ChemAxon, Daylight, MDL, Molecular Networks, Tripos and more. Applications built using InforSense ChemSense range from the analysis and visualization of chemical libraries to the development of combinatorial chemistry libraries, and include a wide range of QSAR, ADME-Tox prediction, molecular modeling and evaluation methods.

BioLog  http://www.biolog-tech.com/

BioLib from BioLog is an open-architecture informatics system that builds on recent technical progress in biotech and pharmaceutical information technology to create a unique set of drug discovery IT tools. BioLib is an integrated suite of methods and applications for prototyping and developing Bio-IT solutions. It helps customize existing applications, automate application development pipelines and support Bio-IT data warehousing. BioLib has a comprehensive Methods Library and a customizable Algorithm Library. The BioLib Suite includes a visual workflow environment that can easily and rapidly integrate and automate a comprehensive internal set of drug discovery applications, along with external applications. The workflow layer makes it quick and flexible to build and automate a specific pipeline for each drug discovery effort. The Suite also includes a Standard Developing Kit (SDK) with unique components spanning bioinformatics and cheminformatics, enabling rapid development of solutions.

Our Friend is KNIME (Open Source Freeware)  http://knime.org

KNIME (Konstanz Information Miner), pronounced [naIm], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models. KNIME is available under a dual licensing scheme. A non-profit open source license allows users to download, distribute and use KNIME freely as long as the software or its use is not distributed for profit. KNIME is based on the Eclipse platform and, through its modular API, is easily extensible. Custom nodes and types can be integrated within hours, which makes KNIME suitable not only for production environments but also for teaching and research prototyping.
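Selective execution can be illustrated with a small node graph in which asking for one node's result runs only that node and the upstream nodes it depends on, caching results so that later requests do not recompute them. This is a simplified sketch under that assumption, not KNIME's API or execution engine; the class `Workflow` and its methods are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


class Workflow:
    """Minimal node graph: each node has a function and the names of its upstream nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}
        self.cache: Dict[str, Any] = {}
        self.executed: List[str] = []  # order in which nodes actually ran

    def add(self, name: str, func: Callable[..., Any], *inputs: str) -> None:
        self.nodes[name] = (func, inputs)

    def execute(self, name: str) -> Any:
        """Run `name` plus whatever upstream nodes it needs, reusing cached results."""
        if name in self.cache:
            return self.cache[name]
        func, inputs = self.nodes[name]
        args = [self.execute(upstream) for upstream in inputs]
        self.cache[name] = func(*args)
        self.executed.append(name)
        return self.cache[name]

    def reset(self, name: str) -> None:
        """Discard the cached result of `name` and of every node downstream of it."""
        self.cache.pop(name, None)
        for other, (_, inputs) in self.nodes.items():
            if name in inputs:
                self.reset(other)


if __name__ == "__main__":
    wf = Workflow()
    wf.add("read", lambda: [3, 1, 2])
    wf.add("sort", sorted, "read")
    wf.add("total", sum, "read")
    wf.add("report", lambda s, t: f"{s} sum={t}", "sort", "total")
    wf.execute("sort")            # runs only "read" and "sort"
    print(wf.executed)            # ['read', 'sort']
    print(wf.execute("report"))   # [1, 2, 3] sum=6
    print(wf.executed)            # ['read', 'sort', 'total', 'report']
```

Calling `reset` on a node discards its cached result and those of everything downstream, so a changed setting forces recomputation only where needed. The sketch assumes the graph has no cycles.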

Workflow technology based Solutions for Bioinformatics
InforSense http://www.inforsense.com

Built on InforSense KDE, InforSense BioSense provides an integrative, interactive analytics environment for designing and executing scalable, high-performance bioinformatics solutions ranging from sequence analysis to microarray informatics and remote database annotation. It also fully integrates Perl and shell scripts, R/Bioconductor and Matlab programs. It supports interactive analysis through interactive chart visualization, a web-enabled sequence annotation browser, a dendrogram viewer and an interactive cluster browser. Its comprehensive, integrated toolbox addresses sequence analysis, proteomics, expression analysis and SNP analysis. Extensive classification, clustering, statistical and preprocessing operations can be applied to results from life science tools such as EMBOSS, ClustalW, FASTA, BLAST and more. Combined with the features of InforSense ChemSense and TextSense, it makes an excellent analytical workflow solution.

INCOGEN http://www.incogen.com/

The INCOGEN Visual Integrated Bioinformatics Environment (VIBE) is a state-of-the-art, drag-and-drop analysis workflow management environment. The VIBE system can interface with a variety of environments, including high-throughput platforms such as Sun Microsystems' Grid Engine. Its rich visualization and data mining environments, combined with a sophisticated server architecture, offer life science researchers a powerful system for data analysis, mining and knowledge discovery. The software can be deployed using one of two architectures: client/server (Workgroup Edition) or client-only (Desktop Edition). In its client/server form, VIBE is designed for a multitiered environment, ideally with multiple database servers including a DeCypher query accelerator. It provides implementations of many standard bioinformatics algorithms. A Java API allows users to write their own modules that are independent of the tool's underlying data schema; implementers of new analysis techniques need to extend the four Java classes provided by INCOGEN. VIBE makes extensive use of XML for configuration, data exchange, data storage and communications.
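Describing a workflow in XML and interpreting it at run time is one way a tool can use XML for configuration. The schema below (a `<workflow>` element containing `<step id=... op=...>` elements) and the operation names are invented for this sketch and are not VIBE's format. The example reads a reverse-complement workflow with Python's standard `xml.etree.ElementTree` and executes it step by step.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow schema invented for this sketch (not VIBE's format).
# Steps are deliberately listed out of order; they run in order of their id.
WORKFLOW_XML = """
<workflow name="reverse-complement">
  <step id="2" op="complement"/>
  <step id="1" op="reverse"/>
</workflow>
"""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Registry mapping each 'op' attribute to the Python function that implements it.
OPERATIONS = {
    "upper": str.upper,
    "reverse": lambda seq: seq[::-1],
    "complement": lambda seq: seq.translate(_COMPLEMENT),
}


def run_xml_workflow(xml_text: str, seq: str) -> str:
    """Parse the XML description and apply its steps in order of their id."""
    root = ET.fromstring(xml_text.strip())
    steps = sorted(root.findall("step"), key=lambda s: int(s.get("id")))
    for step in steps:
        op = step.get("op")
        if op not in OPERATIONS:
            raise ValueError(f"unknown operation in workflow: {op!r}")
        seq = OPERATIONS[op](seq)
    return seq


if __name__ == "__main__":
    print(run_xml_workflow(WORKFLOW_XML, "AACG"))  # CGTT
```

Separating the workflow description (data) from the operations (code) lets the same engine run different workflows without recompilation.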

Science Factory http://www.science-factory.com/iindex.html

Science Factory's überTOOL is a software system for the integration and analysis of molecular biological data. The system provides scientists with over 200 types of bioinformatics methods and gives access to public biological databases and proprietary data, including all überTOOL results. Using an integrated programming language, the user can also easily extend the core functions. Interactive browsers for different biological data types and diverse expansion options simplify collaboration among users within the scalable client-server system. A graphical interface makes constructing bioinformatics workflows fast and intuitive for bioscientists and bioprogrammers alike. If you can visualize it, you can implement it.

Ptolemy II http://ptolemy.eecs.berkeley.edu/ptolemyII/

Ptolemy is a graphical dataflow engine originally intended for simulating electrical engineering phenomena, but it is equally applicable to bioinformatics. Ptolemy II includes a growing suite of domains, each of which realizes a model of computation. Ptolemy supplies a dataflow engine designed for a discrete data model, in which each separate field from a database query flows through a separate path in the dataflow graph. Several scientific workflow systems, such as Kepler, are currently being developed on top of Ptolemy II.

Kepler http://kepler-project.org/

Kepler is an open source cross-project, cross-institution collaboration to build and evolve a scientific workflow system on top of the (also evolving) Ptolemy II system. Ptolemy II was developed by the members of the Ptolemy project at UC Berkeley. Although not originally intended for scientific workflows, it provides a mature platform for building and executing workflows, and supports multiple models of computation.

MIGenAS http://www.migenas.org/home/index.jsp

MIGenAS (Max-Planck Integrated Gene Analysis System) provides an integrated software environment for bioinformatics tools and databases. The MIGenAS workflow engine is an integrated bioinformatics toolkit for web-based sequence analysis. MIGenAS facilitates similarity searches in public or user-supplied sequence databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein structure prediction. MIGenAS allows seamless chaining of different tools into pipelines, with no need for format conversions or parsing of intermediate results. It supports efficient processing of predefined workflows and offers programmatic access via web services.
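The benefit of chaining tools without format conversion can be sketched by parsing the input once into an in-memory record type and letting every later step work on those records. `FastaEntry`, `min_length` and `gc_content` are hypothetical names for this illustration, and the simple parser assumes well-formed FASTA, keeping only the first word of each header as the identifier. This is not how MIGenAS is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FastaEntry:
    """Hypothetical in-memory record shared by every step (not Biopython's SeqRecord)."""
    id: str
    seq: str


def parse_fasta(text: str) -> List[FastaEntry]:
    """Parse FASTA text once; later steps work on FastaEntry objects, not text."""
    entries: List[FastaEntry] = []
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                entries.append(FastaEntry(ident, "".join(chunks)))
            ident, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if ident is not None:
        entries.append(FastaEntry(ident, "".join(chunks)))
    return entries


def min_length(entries: List[FastaEntry], n: int) -> List[FastaEntry]:
    """Step 1: drop sequences shorter than n residues."""
    return [e for e in entries if len(e.seq) >= n]


def gc_content(entries: List[FastaEntry]) -> Dict[str, float]:
    """Step 2: fraction of G and C residues per sequence."""
    return {e.id: (e.seq.count("G") + e.seq.count("C")) / len(e.seq) for e in entries}


FASTA = ">s1 first\nGGCC\nAT\n>s2\nAT\n>s3\natgc\n"

if __name__ == "__main__":
    print(gc_content(min_length(parse_fasta(FASTA), 4)))
```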

Taverna http://taverna.sourceforge.net/

Taverna is an open source project for bioinformatics workflows. It provides a GUI backed by a workflow engine. Taverna allows a biologist or bioinformatician with a limited computing background, and limited technical resources and support, to construct highly complex analyses over public and private data and computational resources, all from a standard PC, UNIX box or Apple computer. Taverna is a collaboration between the European Bioinformatics Institute (EBI), IT Innovation, the School of Computer Science at the University of Newcastle, the Newcastle Centre for Life, the School of Computer Science at the University of Manchester and the Nottingham University Mixed Reality Lab. Additional development effort has come from the BioMoby project, SeqHound, BioMart and various individuals around the world. Development is coordinated through facilities generously provided by SourceForge.net and is predominantly driven by the requirements of biologists in the UK life science community. There is built-in support for web services, local Java functions, BioMoby and Soaplab. The workflow language (XScufl) is completely proprietary, and the implementation is still in the beta release cycle. There is currently little support for passing complex XML parameters and XML transformations.

GeneBeans http://www.uncw.edu/csc/bioinformatics/

The GeneBeans Dataflow Interface is a program that lets you use a graphical language to describe the bioinformatics questions you want to ask of a database. You specify a workflow: what actions the computer should take, in what order they should be taken and how information should be routed. The GeneBeans system uses a three-layer architecture: (1) user interface presentation, isolated from (2) a dataflow engine that executes commands on data retrieved from (3) a database (or several federated data sources).
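A three-layer design of this kind can be sketched with an in-memory SQLite database as the data layer, a function that routes fetched rows through processing steps as the dataflow engine, and a formatter as the presentation layer. The table, its columns and the `long_genes` step are hypothetical; the point is that each layer only talks to the one next to it. This is not GeneBeans code.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple

Row = Tuple
Step = Callable[[List[Row]], List[Row]]


# Layer 3: data source. An in-memory SQLite table stands in for a real database.
def make_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT, organism TEXT, length INTEGER)")
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [("geneA", "human", 1200), ("geneB", "mouse", 800), ("geneC", "human", 300)],
    )
    return conn


# Layer 2: dataflow engine. Fetches rows, then routes them through the steps in order.
def run_engine(conn: sqlite3.Connection, query: str, params: Sequence,
               steps: Sequence[Step]) -> List[Row]:
    rows = conn.execute(query, params).fetchall()
    for step in steps:
        rows = step(rows)
    return rows


def long_genes(rows: List[Row], min_length: int = 500) -> List[Row]:
    """Example engine step: keep (name, length) rows with length >= min_length."""
    return [r for r in rows if r[1] >= min_length]


# Layer 1: presentation. Knows only how to display rows, not where they came from.
def render(rows: List[Row]) -> str:
    return "\n".join(f"{name}\t{length}" for name, length in rows)


if __name__ == "__main__":
    conn = make_database()
    rows = run_engine(
        conn, "SELECT name, length FROM genes WHERE organism = ?", ("human",), [long_genes]
    )
    print(render(rows))
```

Because the presentation layer receives only rows, the SQLite source could be replaced by several federated sources without changing how results are displayed.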

IBM's WsBAW http://www.alphaworks.ibm.com/tech/wsbaw

WsBAW is an application that automates bioinformatic analysis workflows by deploying a Web service. It consists of a Java™ client application through which users can send batch requests to a specific bioinformatic workflow execution engine, such as BioWBI, via a Web service.

Without this Web service to handle bioinformatic analysis workflows, a researcher typically has to repeat operations manually on the same input data, or else work through multiple sequences of different analytical steps using multiple analysis algorithms. Using WsBAW, researchers can run the complete analysis in batch mode by writing and organizing workflows that include all the operations that would otherwise need to be done by hand.
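Batch mode amounts to applying one workflow to many inputs. The sketch below assumes a hypothetical per-sequence workflow, `analyse`, and uses a thread pool from Python's standard library; threads are a reasonable choice when each step spends most of its time waiting on a remote service. It neither uses nor imitates the WsBAW or BioWBI interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Mapping, TypeVar

R = TypeVar("R")


def analyse(seq: str) -> Dict[str, float]:
    """Hypothetical per-sequence workflow: a few analysis steps run back to back."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return {"length": len(seq), "gc": gc}


def run_batch(workflow: Callable[[str], R], inputs: Mapping[str, str],
              max_workers: int = 4) -> Dict[str, R]:
    """Apply the same workflow to every input and collect results by input name."""
    names = list(inputs)
    # pool.map returns results in the same order as the inputs,
    # so zipping them back onto the names is safe.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(workflow, (inputs[n] for n in names)))
    return dict(zip(names, results))


if __name__ == "__main__":
    batch = {"query1": "acgt", "query2": "GGGGCC", "query3": "ATATAT"}
    for name, result in run_batch(analyse, batch).items():
        print(name, result)
```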

Workflows are then simply sent to a workflow execution engine, the Bioinformatic Workflow Builder Interface (BioWBI). This complementary technology is an easy-to-use, Web-based working environment in which a life sciences researcher can graphically build and execute bioinformatic workflows and share analysis processes. BioWBI can be used in several ways. It can form the basic element of large bioinformatics portals that private or public research groups set up to supply simplified bioinformatic analysis services to end users, even on demand. For example, a pharmaceutical company can use this technology to set up a large internal portal covering its need for bioinformatic workflow analysis in target discovery and identification, two early phases of the drug discovery process.

The major drawback is that you need to register through IBM alphaWorks and use BioWBI online.

Others
Vision http://mgltools.scripps.edu/

Vision is a visual-programming environment in which a user can interactively build networks describing novel combinations of computational methods, yielding new visualizations of their data without actually writing code. Nodes encapsulating specific computational methods are organized in libraries and displayed in Vision. The user can drag and drop them onto a canvas and connect their input and output ports to define an execution flow. Subnetworks can be encapsulated into macro nodes, allowing networks to be nested. Tooltips and balloon help provide runtime information about a node's function, inputs and outputs. Data flowing through nodes can be interactively monitored and introspected. A data type manager holds a predefined set of data type objects, and new types can be added to this table interactively through a GUI. Data types are optional, but declaring one makes it possible to specify the appearance (i.e. the color and shape) of the port's icon. This provides helpful visual hints for connecting the proper outputs to the proper inputs. New nodes can be created interactively during a working session, and the computational method of any given node can be changed while the network is running. Vision nodes are essentially lightweight wrappers of functionality that is otherwise available in Python, C and Fortran, which makes Vision useful beyond biological applications. To date, Vision comprises a set of standard nodes, including an OpenGL 3-D visualization library. The SymServ library implements a set of nodes defining geometric transformations such as point symmetries, translation, rotation and helical arrangements. These nodes allow complex hierarchical symmetries, such as icosahedral symmetry, to be described as tree-like structures. The MolKit node library, based on the Python MolKit package, allows reading, writing, representing and querying molecular data structures.
The Imaging library exposes the Python Imaging Library, enabling image processing in Vision. Many more custom nodes have been written, including nodes to read electron density maps, compute isocontour surfaces, map textures onto geometry, convert a 2-D image into a 3-D heightfield, and convert molecular data into various 3-D file formats (VRML, STL). Vision has been used successfully to combine 3-D electron microscopy and atomic force microscopy data to build and refine models of supramolecular assemblies.
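Optional port data types can help decide whether an output may be connected to an input. The sketch below assumes a simple rule: a connection is allowed if either port is untyped, or if the output type is a subclass of the input type. The `Port`, `Node` and `can_connect` names are hypothetical and are not Vision's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Port:
    name: str
    datatype: Optional[type] = None  # None means untyped: accepts any data


@dataclass
class Node:
    name: str
    func: Callable[..., Any]
    inputs: List[Port] = field(default_factory=list)
    output: Port = field(default_factory=lambda: Port("out"))


def can_connect(src: Node, dst: Node, input_index: int) -> bool:
    """Allow a connection unless both ports are typed and the types are incompatible."""
    out_type = src.output.datatype
    in_type = dst.inputs[input_index].datatype
    if out_type is None or in_type is None:
        return True
    return issubclass(out_type, in_type)


if __name__ == "__main__":
    reader = Node("read_text", lambda: "ACGT", output=Port("out", str))
    length = Node("length", len, inputs=[Port("seq", str)], output=Port("out", int))
    square = Node("square", lambda x: x * x, inputs=[Port("x", int)], output=Port("out", int))
    show = Node("show", print, inputs=[Port("value")])  # untyped input
    print(can_connect(reader, length, 0))  # True: str -> str
    print(can_connect(reader, square, 0))  # False: str -> int
    print(can_connect(length, square, 0))  # True: int -> int
    print(can_connect(reader, show, 0))    # True: untyped input
```

Leaving types optional keeps quick experiments easy, while typed ports catch mismatched connections before the network runs.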

Read More
Workflows management: new abilities for the biological information overflow 

Different data, different analytical tools, different users. How can scientific computing cope with diversity in the life sciences?

A Calculus for Propagating Semantic Annotations through Scientific Workflow Queries

Scientific Workflow Management and the Kepler System 

CAFTAN: a tool for fast mapping, and quality assessment of cDNAs 

A Typical Biology Workflow 

Using Biopython for Laboratory Analysis Pipelines 

Bioinformatics Workflow using ASSIST on GRID 

 