Umie Kalsum: Difference between revisions


Revision as of 10:55, 11 July 2008

Umie Kalsum Bt Hassan (an artistic interpretation)

A holistic framework for protein domain detection


ABSTRACT
The protein domain is a fundamental unit of protein structure, folding, function, evolution and design. Knowing the domains of a protein sequence enables us to probe the function of the protein, to perform drug design, and to construct novel proteins. The proposed holistic framework uses several different implementations to detect protein domains from a protein sequence and its secondary structure information. In this framework, the SSpro algorithm is applied to predict the protein secondary structure. Measures of entropy, correlation, protein sequence termination, contact profile, protein secondary structure, physico-chemical properties and intron-exon boundaries are defined to extract information about the sequence from this secondary structure. A neural network is trained on the scores obtained from these measures to assign protein domain boundaries. Protein domain regions are then predicted by pattern matching and by assigning a number to each protein domain segment. To evaluate the results, false positives among the predicted domain boundary regions are identified in order to calculate the sensitivity and specificity of protein domain prediction. The proposed holistic framework is evaluated by comparing it with other existing methods. An analysis of the results, based on evaluation criteria including the sensitivity and specificity of protein domain prediction, demonstrates that the framework performs better than other ab initio methods. The source of the proposed holistic framework is available at http://openwetware.org/wiki/User:Umie_Kalsum.



Introduction
A protein domain is a basic unit of protein structure that can evolve, function and exist independently of the rest of the protein chain, even though it is part of the protein sequence. Each protein domain forms a compact structure that folds and is stable independently.

There is no signal in a sequence that indicates where a protein domain starts and ends. Because domain boundaries are not marked, several methods have been developed to detect domains. One of them is ab initio prediction, which uses machine learning techniques. The ab initio approach is based on an understanding of how the three-dimensional (3D) structure of a protein is attained and on deducing that 3D structure from the protein sequence. One drawback of this approach is that it is computationally intensive (Cheng et al., 2006). An ab initio method published by Nagarajan and Yona (2004) attempts to predict domain boundaries using a neural network. This method is based on analysing the protein primary structure through multiple sequence alignments drawn from a Non-Redundant (NR) database (Henikoff et al., 1999). Multiple measures are defined to quantify the protein domain information content of each position along the sequence, and these are combined into a single predictor using a neural network. The output of this neural network is then used to predict protein domain boundaries. However, prediction of protein domains using secondary structure predicted by the SSpro algorithm is believed to be more accurate (Cheng et al., 2005a).
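
As an illustration of one such per-position measure, the sketch below computes the Shannon entropy of each column of a multiple sequence alignment. It is a minimal sketch under stated assumptions, not the measure as implemented by Nagarajan and Yona or in this framework: the function names `column_entropy` and `alignment_entropy`, the use of `-` as the gap character, and the choice to ignore gaps are all assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy in bits of one alignment column.

    Gap characters are ignored (an assumption of this sketch).
    A column with no residues at all is given entropy 0.0.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p) over the observed residue types
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(alignment):
    """Entropy profile: one value per column of equal-length aligned sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical toy alignment, only to show the call.
toy_alignment = ["MKVL", "MKIL", "MRV-"]
profile = alignment_entropy(toy_alignment)
print(profile)
```

Low values mark conserved columns and high values mark variable ones. How such a profile is scaled and combined with the other measures is not specified here and would depend on the predictor.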

The SSpro algorithm is used to predict the secondary structure of each protein chain. SSpro is based on an ensemble of one-dimensional Recursive Neural Network (1D-RNN) architectures. The 1D-RNN architecture combines the theory of probabilistic graphical models with neural network parameterization to accelerate belief propagation and learning. Several methods built on SSpro have reported successful results. Two of them are DISpro (Cheng et al., 2005b) and DOMpro (Cheng et al., 2006). DISpro is a predictor of protein disordered regions that relies on a neural network. Its reported results are roughly equal to or slightly better than those of the other predictors in terms of Receiver Operating Characteristic (ROC) and False Positive Rate (FPR). DOMpro predicts protein domain boundaries using a bidirectional recurrent neural network and a statistical method. In this framework, the secondary structure predicted by SSpro enables protein domains to be detected from the secondary structure information obtained, as in DOMpro. The sensitivity and specificity obtained with this framework show an improvement in protein domain prediction.

The framework used in this research is based on an analysis of multiple sequence alignments derived from the NR database, which is used to predict whether or not a residue belongs to a protein domain boundary region. Residues within 20 amino acids of the actual protein domain boundary in the Structural Classification of Proteins (SCOP) database (Murzin et al., 1995) are considered to be part of the protein domain boundary region. The SSpro algorithm is used to predict the secondary structure from which protein domains are detected. The output is further smoothed and post-processed using a neural network model that predicts the protein domain boundaries.
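
The 20-residue rule for labelling boundary regions can be sketched as follows. This is an illustrative sketch only: the function name `boundary_region_labels`, the 0-based residue indexing, and the treatment of "within 20 amino acids" as inclusive on both sides are assumptions, and the boundary positions would in practice come from SCOP domain definitions.

```python
def boundary_region_labels(seq_len, boundaries, window=20):
    """Label each residue 1 if it lies within `window` residues of a boundary, else 0.

    seq_len    -- length of the protein sequence
    boundaries -- iterable of 0-based boundary positions (assumed convention)
    window     -- half-width of the boundary region, 20 in the text above
    """
    labels = [0] * seq_len
    for b in boundaries:
        # clip the region to the ends of the sequence
        start = max(0, b - window)
        end = min(seq_len, b + window + 1)
        for i in range(start, end):
            labels[i] = 1
    return labels


# Hypothetical two-domain protein of 150 residues with a boundary at position 70.
labels = boundary_region_labels(150, [70])
print(sum(labels), "residues fall in the boundary region")
```

Labels of this kind can serve as per-residue training targets for a boundary predictor. Single-domain proteins simply yield an all-zero label vector.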

To show that the framework used in this research produces more accurate results than other methods such as Biozon (Nagarajan and Yona, 2004), DOMpro and Dompred-DPS (Marsden et al., 2002), their sensitivity and specificity are compared. Sensitivity and specificity are measured separately for single-domain, two-domain and multiple-domain prediction.
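
For reference, the sketch below computes sensitivity and specificity from per-residue boundary labels. The text does not state the exact definitions used in the evaluation, so this assumes the standard ones, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), applied residue by residue. The function name `sensitivity_specificity` and the 0/1 label encoding are assumptions made for the example.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for two equal-length 0/1 label lists.

    Assumes the standard definitions:
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
    A ratio with a zero denominator is returned as NaN.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for a six-residue stretch, only to show the call.
truth = [0, 1, 1, 1, 0, 0]
prediction = [0, 1, 1, 0, 1, 0]
print(sensitivity_specificity(truth, prediction))
```

Computing the pair separately for the single-domain, two-domain and multiple-domain subsets would give the per-class comparison described above.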

Research Interests

  1. Predict protein domain from protein sequence and secondary structure information
  2. Split protein sequence into segments to predict protein domains

Resources

  1. Datasets: SCOP 1.57
  2. Download the SSpro source code for protein secondary structure prediction
  3. Biozon domain prediction server or email niranjan@cs.cornell.edu for source code
  4. Hydrophobicity and molecular weight scores for the physico-chemical properties measure

Main References

  1. Automatic prediction of protein domains from sequence information using a hybrid learning system - Niranjan Nagarajan
  2. DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks - Jianlin Cheng


Group Members

  1. Razib M. Othman
  2. Zuraini A. Shah

Contact Info


Kalsum U Hassan
Laboratory of Computational Intelligence and Bioinformatics
Department of Software Engineering
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
81310 UTM Skudai, Johor Bahru, Malaysia
Email: ukalsum8@siswa.utm.my
Tel: +60197499786/+60123674711

Education

  1. Expected 2009, Master in Computer Science (Research in Bioinformatics), Universiti Teknologi Malaysia
  2. 2006, Degree in Computer Science (Major in Information System), Universiti Teknologi Malaysia
  3. 2003, Diploma in Computer Science (Information Technology), Universiti Teknologi Malaysia