Abhishek Tiwari:TEXT MINING

= TEXT MINING =

Oxford Bioinformatics Volume 22 | Number 18 | 15 September 2006
 * Text similarity: an alternative way to search MEDLINE

Synopsis

Harold Garner et al. have created and optimized a hybrid search system for MEDLINE that takes natural text as input and delivers results with high precision and recall. The system works in two passes: a fast, low-sensitivity, weighted keyword-based algorithm first casts a wide net to gather an initial set of literature, and a sentence-alignment-based similarity algorithm then rank-orders those results. The authors describe the combination as sensitive, fast and easy to use. These literature-searching algorithms are implemented in a system called eTBLAST, a search engine for biomedical literature that works very differently from PubMed. While PubMed searches for "keywords", eTBLAST lets you input an entire paragraph and returns the MEDLINE abstracts that are similar to it. This is something like PubMed's "Related Articles" feature, only better because it runs on your own text and therefore on your own set of interests. You no longer need to guess whether your keywords have found all the right papers, or sort through hundreds of papers you don't care about to find the handful you were looking for; eTBLAST does that for you.
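The two-pass design described above can be illustrated with a minimal sketch. This is not eTBLAST's actual algorithm, which the synopsis does not specify in detail: the IDF-style keyword weights, the Jaccard word-overlap measure standing in for sentence alignment, and all function names (`first_pass`, `second_pass`, `search`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def idf_weights(corpus):
    """Assumed weighting scheme: rarer words get higher (smoothed IDF) weights."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}


def first_pass(query, corpus, weights, keep=50):
    """Cheap wide net: score documents by the summed weight of shared keywords."""
    q = set(tokenize(query))
    scored = []
    for i, doc in enumerate(corpus):
        score = sum(weights.get(w, 0.0) for w in q & set(tokenize(doc)))
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:keep]]


def sentence_similarity(a, b):
    """Stand-in for sentence alignment: Jaccard overlap of the word sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def second_pass(query, corpus, candidates):
    """Re-rank candidates by the average best-matching sentence per query sentence."""
    q_sents = split_sentences(query)
    ranked = []
    for i in candidates:
        d_sents = split_sentences(corpus[i])
        total = sum(
            max((sentence_similarity(qs, ds) for ds in d_sents), default=0.0)
            for qs in q_sents
        )
        ranked.append((total / len(q_sents) if q_sents else 0.0, i))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


def search(query, corpus, keep=50):
    """Run both passes and return (score, document index) pairs, best first."""
    weights = idf_weights(corpus)
    return second_pass(query, corpus, first_pass(query, corpus, weights, keep))
```

Calling `search(paragraph, abstracts)` on a list of abstract strings returns the surviving candidates ordered by the second-pass score. The point of the split is cost: the keyword pass touches every document but is cheap, while the more expensive sentence-level comparison runs only on the shortlist.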

Oxford Bioinformatics Volume 22 | Number 17 | 1 September 2006


 * Combination of text-mining algorithms increases the performance

Synopsis

In this paper, Rainer Malik et al. show that combining different algorithms and their outcomes improves the results significantly. The approach is implemented in CONAN, a text-mining system that combines different programs and their output. Its methods include tagging gene and protein names, finding interaction and mutation data, tagging biological concepts, and linking to MeSH and Gene Ontology terms. CONAN can automatically extract and display the following information: protein/gene names, protein point mutations, protein-protein interactions and biologically interesting keywords. It is offered as a command-line query tool and as a web server, and its protein-protein interaction data are integrated into a human gene interaction network. Through this network, "hidden" information can also be extracted.
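One simple way to combine the output of several programs is voting over their annotations. The sketch below illustrates that general idea only; the synopsis does not say how CONAN merges its components, and the `(start, end, label)` annotation tuples, the function name `combine_annotations` and the `min_votes` parameter are all hypothetical.

```python
from collections import Counter


def combine_annotations(tagger_outputs, min_votes=2):
    """Keep every annotation that at least `min_votes` taggers agree on.

    Each tagger output is an iterable of hashable annotations, here assumed
    to be (start, end, label) tuples. min_votes=1 gives the union (favouring
    recall); min_votes=len(tagger_outputs) gives the intersection (favouring
    precision).
    """
    votes = Counter()
    for output in tagger_outputs:
        votes.update(set(output))  # each tagger votes at most once per annotation
    return {annotation for annotation, n in votes.items() if n >= min_votes}
```

Raising `min_votes` trades recall for precision, which is one reason a combination can outperform any single component.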

BMC Bioinformatics Volume 6 | Supplements 1 | 2005


 * A critical assessment of text mining methods in molecular biology

Synopsis

The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented at a workshop held in Granada, Spain, on March 28–31, 2004. The articles collected in this BMC Bioinformatics supplement, entitled "A critical assessment of text mining methods in molecular biology", describe the BioCreAtIvE tasks, systems, results and their independent evaluation.

BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles.
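The first task, mapping gene or protein mentions to standardized identifiers, can be pictured with a toy dictionary lookup. This is only a sketch of the task's input and output, not any participating system: the lexicon is hypothetical, and the `EX:` identifiers are placeholders rather than real model organism database IDs.

```python
import re

# Hypothetical synonym lexicon; the identifiers are placeholders, not real IDs.
LEXICON = {
    "cdc28": "EX:0001",
    "cdk1": "EX:0001",
    "wingless": "EX:0002",
    "wg": "EX:0002",
}


def normalize_mentions(text, lexicon):
    """Return the distinct identifiers whose synonyms occur in the text,
    in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        gene_id = lexicon.get(token.lower())
        if gene_id is not None and gene_id not in found:
            found.append(gene_id)
    return found
```

A plain lookup like this ignores ambiguous names, multi-word names and spelling variants, which is why it serves only as an illustration of the task.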

This BMC Bioinformatics supplement is a very good collection of papers and reports, and a must for anyone who wants good material on biological text mining.

 