Abhishek Tiwari: SYSTEMS BIOLOGY
BMC Bioinformatics Volume 7 | FEBRUARY 2007
- Classification of microarray data using gene networks
The authors introduce a general mathematical formalism for incorporating a priori knowledge of a gene network into the analysis of gene expression data. The method is independent of the nature of the network, although the paper focuses on the gene metabolic network as an illustration. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, which attenuates the high-frequency components of the expression profiles with respect to the topology of the graph. The authors show how to derive unsupervised and supervised classification algorithms for expression profiles, yielding classifiers with biological relevance. These algorithms for unsupervised clustering and supervised classification enforce some level of smoothness of the classifier on the gene network. This constraint can be viewed as a way of reducing the high dimension of the variable space using the available knowledge about the gene network. No prior decomposition of the gene network into modules or pathways is needed, and the method can in principle work with a variety of gene networks.
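To make the idea of attenuating high-frequency components concrete, here is a minimal sketch in plain Python. It does not reproduce the paper's algorithm: instead of computing the eigenfunctions explicitly, it applies repeated diffusion steps x ← x − αLx with the graph Laplacian L, which multiplies the component along each eigenvector (eigenvalue λ) by (1 − αλ)^k and so damps the high-λ, rapidly varying components most. The toy network, the profile values, and the parameters `alpha` and `steps` are all assumptions chosen for illustration.

```python
def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def roughness(L, x):
    """Quadratic form x^T L x, i.e. the sum over edges of (x_i - x_j)^2."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def smooth(L, x, alpha=0.1, steps=10):
    """Apply (I - alpha * L)^steps to x.

    alpha must stay below 2 / (largest eigenvalue of L) for the
    filter to damp every non-constant component.
    """
    n = len(x)
    for _ in range(steps):
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - alpha * Lx[i] for i in range(n)]
    return x


# Hypothetical toy network: five genes connected in a path.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
L = laplacian(5, edges)

# Hypothetical expression profile that alternates between neighbours,
# so it is dominated by high-frequency components on this graph.
profile = [1.0, -1.0, 1.0, -1.0, 1.0]
smoothed = smooth(L, profile)

print("roughness before:", roughness(L, profile))
print("roughness after: ", roughness(L, smoothed))
```

The smoothed profile varies less between neighbouring genes while its sum is unchanged, because the constant vector is the zero-eigenvalue eigenfunction and passes through the filter untouched.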
BMC Bioinformatics Volume 7 | JANUARY 2007
- Identification of functional modules using network topology and high-throughput data
The authors report a novel computational technique for the integrated analysis of network and similarity data. The method aims to jointly dissect the topological properties of gene or protein networks and other high-throughput data. It was used to analyze large-scale protein interaction networks and genome-wide transcription profiles in yeast and human. The method was shown to identify functionally sound modules, i.e., connected subnetworks with highly coherent expression showing significant functional enrichment. Compared with the existing Co-clustering method, which aims to integrate similar data, the method demonstrated a substantial improvement in solution quality. Comparison with solutions produced by clustering highlights the advantage of using topological connectivity in the search for functionally sound modules. By construction, the method is particularly powerful at detecting regulatory modules and less suited to detecting metabolic modules. The technique, implemented in the program MATISSE, is efficient and can analyze genome-scale interaction and expression data within minutes. The proposed algorithm is very flexible and, unlike Co-clustering, can handle situations where not all genes in the network have similarity information or expression patterns. In particular, MATISSE can determine the subset on which similarity is computed using various criteria, e.g., initial probe filtering, differential expression confidence values, etc.
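The two properties that define a module in the summary above, connectivity in the network and coherent expression, can be checked for a candidate gene set with a few lines of Python. This is only an illustrative sketch and not the MATISSE algorithm or its scoring scheme: it tests connectivity with a breadth-first search and measures coherence as the mean pairwise Pearson correlation, skipping genes that have no expression profile. The gene names, network, and expression values are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def is_connected(genes, adjacency):
    """True if the subnetwork induced by `genes` is connected (BFS)."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen = {start}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        for neighbour in adjacency.get(gene, ()):
            if neighbour in genes and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == genes


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coherence(genes, expression):
    """Mean pairwise correlation over the genes that have an expression profile."""
    profiled = [g for g in genes if g in expression]
    pairs = list(combinations(profiled, 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Hypothetical interaction network (symmetric adjacency sets).
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
# Hypothetical expression profiles; gene C has none and is ignored when scoring.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "D": [4.0, 3.0, 2.0, 1.0],
}

candidate = {"A", "B", "C"}
print(is_connected(candidate, adjacency), coherence(candidate, expression))
```

A real module finder would also have to search over candidate sets and assess statistical significance; the sketch only evaluates one given set.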
Oxford Bioinformatics Volume 22 | Number 18 | 15 November 2006
- Discovering disease-genes by topological features in human protein–protein interaction network
Mining hereditary disease-genes from the human genome is one of the most important tasks in bioinformatics research. A variety of sequence features and functional similarities between known human hereditary disease-genes and genes not known to be involved in disease have been systematically examined, and efficient classifiers have been constructed from the common patterns identified. The availability of human genome-wide protein–protein interactions (PPIs) provides a new opportunity for discovering hereditary disease-genes from topological features of the PPI network. Analysis reveals that the hereditary disease-genes ascertained from OMIM in the literature-curated (LC) PPI network are characterized by a larger degree, a tendency to interact with other disease-genes, more common neighbors, and quick communication with each other, whereas these properties could not be detected in the networks derived from high-throughput yeast two-hybrid mapping (EXP) or from predicted interactions (PDT). A KNN classifier based on these features was built and achieved an average overall prediction accuracy of 0.76 in cross-validation. The classifier was then applied to 5262 genes in the human genome and predicted 178 novel disease-genes. Some of the predictions have been validated by biological experiments.
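As a rough illustration of classifying genes from topological features, the sketch below computes two simple features per gene (degree and the fraction of neighbours that are known disease-genes) and applies a plain k-nearest-neighbour majority vote. It is not the paper's classifier: the feature set is a reduced stand-in for the properties listed above, the features are left unscaled, and the network, gene names, labels, and `k` are hypothetical.

```python
from collections import Counter
from math import dist


def build_adjacency(edges):
    """Symmetric adjacency sets from an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def features(gene, adjacency, disease):
    """(degree, fraction of neighbours that are known disease-genes)."""
    neighbours = adjacency.get(gene, set())
    degree = len(neighbours)
    ratio = len(neighbours & disease) / degree if degree else 0.0
    return (float(degree), ratio)


def knn_predict(query, training, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical PPI network: D* are known disease-genes, N* are not, Q is unlabelled.
edges = [
    ("D1", "D2"), ("D1", "D3"), ("D2", "D3"),
    ("D1", "N1"), ("N1", "N2"), ("N2", "N3"),
    ("D1", "Q"), ("D2", "Q"), ("D3", "Q"),
]
adjacency = build_adjacency(edges)
disease = {"D1", "D2", "D3"}
other = {"N1", "N2", "N3"}

# Labelled training set of (feature vector, label) pairs.
training = [(features(g, adjacency, disease), "disease") for g in sorted(disease)]
training += [(features(g, adjacency, disease), "other") for g in sorted(other)]

prediction = knn_predict(features("Q", adjacency, disease), training)
print("Q ->", prediction)
```

In practice the features would be normalized before computing distances, and accuracy would be estimated by cross-validation rather than read off a single toy query.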