CH391L/S13/Metagenomics & Bioprospecting

From OpenWetWare
Revision as of 06:02, 18 February 2013 by Andre C Maranhao

Introduction & History

Metagenomics and bioprospecting are two 'umbrella' terms that cover many activities within biological research. Both terms are relatively new, having been coined in the 1990s. Because they share many of the same activities and techniques, these related concepts are easily confused. A helpful analogy describes metagenomics and bioprospecting as "different sides of the same coin". The coin represents the drive to access and utilize the wealth of information that biodiversity has to offer, and the distinction between basic and applied research roughly delineates its two sides and the concept each embodies. As a scientific field in its own right, metagenomics represents the accumulation of genetic information from a broad range of environmental samples. Bioprospecting, by contrast, is application-driven research aimed at discovering commercially relevant material. Whereas metagenomics is a field of research, bioprospecting is closer to a strategy or collection of techniques, although it may also be considered a field.

Of these fraternal concepts, bioprospecting is the older sibling. It originates from the field of chemical ecology, in which the discovery and commercialization of natural products had previously been known as 'chemical prospecting.' While similar in principle, chemical prospecting ultimately relied on chemical synthesis of newly discovered, commercially relevant compounds. The advent of recombinant DNA techniques, DNA sequencing, and biotechnology in general allowed bioprospecting to develop as a unique and separate 'field'. Those same technological advances, together with continued interest in natural products, would later pave the way for metagenomics.

Bioprospecting: Hunting for Utility in Nature

Bioprospecting covers the many activities involved in the discovery and utilization of biological material. Historically, bioprospecting has focused primarily on natural products and drug discovery. It has also led to the discovery of numerous enzymes and proteins that are widely used as tools in both the pharmaceutical and research communities. Current research efforts, combined with improvements in sequencing technologies, may expand the range of activities defined as bioprospecting.

Therapeutics & Drug Discovery

There are many examples of bioprospecting geared toward drug discovery. As an outgrowth of chemical prospecting, considerable bioprospecting efforts, both past and present, have centered on plant secondary metabolites. The potent chemotherapy drug paclitaxel (Taxol) is an excellent example of the transition from chemical prospecting to bioprospecting. Discovered in the bark of the Pacific yew tree, this isoprenoid therapeutic was initially produced through low-yield chemical extraction before semi-synthetic production was adopted in 1988 [1]. Using metabolic engineering techniques, researchers created transgenic Arabidopsis thaliana capable of producing taxadiene, the first committed precursor in paclitaxel biosynthesis [2]. Further research has since led to production by plant cell fermentation. Other work generated strains of E. coli and yeast containing the metabolic pathways necessary to produce taxadiene and other isoprenoid compounds [1][3]. The story of paclitaxel is principally considered a feat of metabolic engineering. However, those engineered strains of E. coli and yeast also serve as platform technologies for the tractable expression of newly discovered enzymes and the production of their isoprenoid products.

Although terrestrial plants remain an important focus of bioprospecting, increasing attention is being paid to marine biodiversity in the search for new therapeutics. The study of tunicates has led to the discovery of numerous cytotoxic compounds with potential as cancer therapies [4]. Macroalgae, more commonly known as seaweeds, present another considerable opportunity for bioprospecting [5].

Biofuels

In the pursuit of second-generation or advanced biofuels, bioprospecting is increasingly used as a strategy for metabolic pathway engineering and optimization. In a 2010 publication, researchers at LS9, Inc. reported the discovery of alkane biosynthesis pathways in a diverse set of cyanobacteria, which were subsequently expressed in E. coli [6]. That body of work provides an excellent demonstration of several bioprospecting techniques.

Research Tools

Fluorescent proteins are likely the most famous research tools derived from bioprospecting. Examples include DsRed as well as GFP and its many derivatives, which are used across the spectrum of biological research. Interestingly, fluorescent proteins are finding new purpose in medicine as visual guides in surgery. In one such approach, a recombinant form of GFP accumulates on the cells of blood vessels, providing a visual cue to the surgeon. Demonstrated so far in mice, this technique could greatly reduce the chances of an accidental incision during surgery.

DNA and RNA polymerases are the workhorses of biotechnology. Almost every area of modern biological research depends on nucleic acid polymerases in one way or another; recombinant cloning, Sanger sequencing, and qPCR are a few of the most common uses. These examples also highlight the shared importance of nucleic acid polymerases and the polymerase chain reaction (PCR). Because PCR repeatedly heats the reaction to denature DNA, a polymerase from a thermophile such as Thermus aquaticus, the source of Taq, can survive cycles that would inactivate an ordinary enzyme. The development of PCR using Taq polymerase began the drive to bioprospect for DNA and RNA polymerases. Over the years, several other polymerases of thermophilic origin have been discovered and rapidly commercialized. One area of considerable interest is the discovery or development of high-fidelity, thermostable reverse transcriptases.
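PCR is so useful because each cycle can, in principle, double the amount of template, so the yield grows exponentially with the number of cycles. The short Python sketch below illustrates that idealized growth, N = N0(1 + E)^n, where E is an assumed per-cycle efficiency (1.0 for perfect doubling). The function name `pcr_copies` and the efficiency values are illustrative assumptions, and the model deliberately ignores the plateau that real reactions reach.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the template by (1 + efficiency).

    efficiency = 1.0 means every template molecule is copied once per cycle
    (perfect doubling). Real reactions fall short of this and eventually
    plateau as reagents are consumed; this toy model ignores that.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 30 cycles of perfect doubling from a single template molecule
    print(pcr_copies(1, 30))  # 1073741824.0
    # The same reaction at an assumed 90% per-cycle efficiency
    print(round(pcr_copies(1, 30, 0.9)))
```

The exponential form is also why a thermostable enzyme matters: the polymerase has to remain active through every one of those heating cycles for the growth to continue.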

Using bioprospecting techniques, one research group isolated and cultured a novel thermophilic bacterium from a hot spring. That bacterium's DNA polymerase I gene was then cloned and engineered to change its template specificity from DNA to RNA. In this manner, the researchers converted a DNA-dependent DNA polymerase into an RNA-dependent DNA polymerase (i.e. a reverse transcriptase).

Metagenomics: Biological Data Mining

Considering the levels of biological organization helps clarify what metagenomics means. Within that conceptual framework, the metagenome is a higher-level element, comparable to the population or community tiers of biological organization. In brief, the metagenome is the sum of all genetic information present in an environmental sample, and metagenomics is the study of it. The term itself was coined in 1998 [7]. Shortly thereafter, researchers characterized the first bacterial rhodopsin (proteorhodopsin), which was identified in genomic DNA fragments isolated from seawater [8].

Since the turn of the century, metagenomics has blossomed as a field. The decreasing per-base-pair cost of pyrosequencing has greatly increased the number of metagenomic research projects. The April 2012 release of the UniProt database comprised an impressive 20.6 million protein sequences. However, only 2.8% of those sequences (roughly 577,000) had been confirmed to exist by evidence at the protein and/or transcript level. The matter is further complicated because the probability of identifying a feature increases with read length. Consequently, there is a significant difference between the information derived from short pyrosequencing reads and that derived from longer Sanger reads [9].
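To see why read length matters, consider a toy model (an illustrative assumption, not the analysis in [9]): a read is placed uniformly at random within a gene, and a feature such as a conserved domain can only be recognized if the read spans it completely. The hypothetical function `prob_read_covers_feature` below computes that probability. The gene and domain sizes in the usage example are assumed round numbers, and roughly 400 bp for pyrosequencing and 800 bp for Sanger reads are ballpark assumptions.

```python
def prob_read_covers_feature(gene_len: int, feature_start: int,
                             feature_len: int, read_len: int) -> float:
    """Probability that a read placed uniformly at random within a gene
    spans an entire feature (e.g. a conserved domain) of that gene.

    Toy model: positions are 0-based and the read may start anywhere from
    0 to gene_len - read_len inclusive.
    """
    if read_len > gene_len:
        raise ValueError("reads longer than the gene are not modeled")
    if feature_start < 0 or feature_start + feature_len > gene_len:
        raise ValueError("feature must lie within the gene")
    n_starts = gene_len - read_len + 1
    # A read starting at s covers [feature_start, feature_start + feature_len)
    # iff s <= feature_start and s + read_len >= feature_start + feature_len.
    lo = max(0, feature_start + feature_len - read_len)
    hi = min(feature_start, gene_len - read_len)
    good_starts = max(0, hi - lo + 1)
    return good_starts / n_starts


if __name__ == "__main__":
    # Hypothetical 1,500 bp gene with a 300 bp domain starting at position 600
    for read_len in (100, 400, 800):
        p = prob_read_covers_feature(1500, 600, 300, read_len)
        print(f"{read_len} bp reads: P(domain fully covered) = {p:.2f}")
    # 100 bp reads: P(domain fully covered) = 0.00
    # 400 bp reads: P(domain fully covered) = 0.09
    # 800 bp reads: P(domain fully covered) = 0.71
```

Even in this simplified setting, reads shorter than the feature can never capture it whole, and the chance of full coverage climbs steeply as reads lengthen, which is the qualitative gap between pyrosequencing and Sanger data described above.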

Global Ocean Sampling Expedition

Craig Venter and his Yacht

In the early 2000s, the J. Craig Venter Institute set out to survey the genomic diversity of the oceans. Craig Venter used his personal yacht, the Sorcerer II, to traverse the Earth's oceans, collecting samples of marine life and sequencing them by whole-genome shotgun sequencing. The expedition uncovered roughly 6 million predicted proteins, about double the number in existing databases at the time, including 1,700 clusters of gene families with no known homology. The data also provided homologs for 6,000 previously unmatched ORF families (ORFans). A very high proportion of the new genes belonged to viruses (likely marine phages), which existing databases had underrepresented [10].
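An open reading frame (ORF) is a stretch of DNA running from a start codon to an in-frame stop codon that could encode a protein, and predicting proteins from shotgun sequence starts with calling ORFs. The Python sketch below scans all six reading frames (three per strand) for ATG-to-stop ORFs. It is a minimal illustration, not the pipeline used in [10]; the function names and the default `min_codons` threshold are assumptions chosen for clarity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) for ATG-to-stop ORFs in six frames.

    Coordinates are 0-based on the strand being scanned; end is exclusive
    and includes the stop codon. min_codons counts codons before the stop.
    ORFs that run off the end of the sequence (common in short shotgun
    reads) are ignored in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through complete codons in this reading frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Tiny made-up example: ATG AAA TAG encodes Met-Lys then stops.
    print(find_orfs("ATGAAATAG", min_codons=2))  # [('+', 0, 0, 9)]
```

A real metagenomic workflow would also handle partial ORFs at read ends and alternative start codons, then cluster the translated ORFs into families by sequence similarity; those steps are beyond this sketch.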

The Human Microbiome Project



  1. Boghigian BA, Myint M, Wu J, Pfeifer BA. Simultaneous production and partitioning of heterologous polyketide and isoprenoid natural products in an Escherichia coli two-phase bioprocess. J Ind Microbiol Biotechnol. 2011 Nov;38(11):1809-20. DOI:10.1007/s10295-011-0969-9. PubMed ID:21487833.
  2. Besumbes O. Metabolic engineering of isoprenoid biosynthesis in Arabidopsis for the production of taxadiene, the first committed precursor of Taxol. Biotechnol Bioeng. 2004.
  3. Engels B. Metabolic engineering of taxadiene biosynthesis in yeast as a first step towards Taxol (Paclitaxel) production. Metab Eng. 2008.
  4. Rinehart KL. Antitumor compounds from tunicates. Med Res Rev. 1999.
  5. Pereira RC. Bioprospecting for bioactives from seaweeds: potential, obstacles and alternatives. Braz J Pharmacogn. 2012.
  6. Schirmer A. Microbial biosynthesis of alkanes. Science. 2010.
  7. Handelsman J. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem Biol. 1998.
  8. Beja O. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000.
  9. Temperton B. Metagenomics: microbial diversity through a scratched lens. Curr Opin Microbiol. 2012.
  10. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007 Mar;5(3):e16. DOI:10.1371/journal.pbio.0050016. PubMed ID:17355171.