User:Pakpoom Subsoontorn/Notebook/Genetically Encoded Memory/2008/10/14: Difference between revisions

From OpenWetWare

Revision as of 23:56, 24 October 2008


Quantitative functional profiles of site-specific recombinases

  • Compile known information about recombinases
  • Focus on information that will be useful for in vivo DNA manipulation. For now, let's focus on simple model organisms such as E. coli and S. cerevisiae
  • Two major sources of information:
    • mining the literature
    • large-scale, parallel experiments


    Site-specific recombination systems generally consist of a group of enzymes that can cut and paste DNA at very specific sequences. As far as I know, hundreds of such systems exist, both in nature and developed in the laboratory. They have been exploited extensively for making transgenic organisms and in gene therapy.
    There is a large body of literature reporting such information. The problem is that people have used different assays, different host cells, and different interpretations. It would be great to organize this information in one place. We would like to be able to update it, compare entries, and systematically refer back to the source of each existing piece of information.
    Moreover, we would like to know "how much we already know" and "how did we know it." Say I have two stored pieces of information: enzyme X has an N-terminal catalytic domain and a C-terminal binding domain, while enzyme Y has intermixed catalytic and binding domains. I would like a systematic way to go back and check whether these conclusions came from totally different methods, from two different research groups, and were reported 15 years apart!
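
The "how did we know it" idea can be sketched as a stored claim that always carries its source. This is only a minimal illustration, not a proposed final design: the class names, field names, and the placeholder methods, groups, years, and reference keys are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Where a stored claim came from (all field names are illustrative)."""
    method: str     # e.g. an assay or a structural method
    group: str      # research group that reported the result
    year: int       # publication year
    reference: str  # placeholder citation key, not a real reference


@dataclass(frozen=True)
class Claim:
    """One stored piece of information together with its source."""
    enzyme: str
    statement: str
    evidence: Evidence


def compare_provenance(a: Claim, b: Claim) -> dict:
    """Summarize how the sources of two claims differ."""
    return {
        "same_method": a.evidence.method == b.evidence.method,
        "same_group": a.evidence.group == b.evidence.group,
        "years_apart": abs(a.evidence.year - b.evidence.year),
    }


# Placeholder records mirroring the enzyme X / enzyme Y example in the text.
claim_x = Claim(
    enzyme="X",
    statement="N-terminal domain is catalytic; C-terminal domain binds DNA",
    evidence=Evidence("method A", "group 1", 1990, "ref-placeholder-1"),
)
claim_y = Claim(
    enzyme="Y",
    statement="catalytic and binding domains are intermixed",
    evidence=Evidence("method B", "group 2", 2005, "ref-placeholder-2"),
)

summary = compare_provenance(claim_x, claim_y)
print(summary)
```

Keeping the evidence as its own object means the same source can be attached to several claims, and two claims can be compared without re-reading the papers.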

The information of interest includes:

          -The sequences of the recombinase enzyme and its target DNA. We can probably get the protein sequences from GenBank. However, as far as I know, there is no public database for the target sequences yet. Determining the minimal target sequences is still a subject of intensive research, even for the best-known recombinase systems.
          -Structural information, in particular functional domains: DNA-binding domain, catalytic domain, etc.
          -Mutation/chimera studies.
          -Efficiency in different host cells. What are the rates of recombination? How specific is the recombination? How toxic is it to the host cells?
          -Different assays used in the studies.

Database/method features

  • Key features of the database:
    • Quantitative description
    • Capability to compare and contrast
      • Standardized description
  • Database size can expand without disrupting the core structure
  • Literature cross-referencing and updating
  • Scalable experiments/measurements

Quantitative functional information

The list of information fields (need to decide: mining vs. experimenting, set priorities, black-box):

  • Enzyme name
  • Natural source of enzyme, host
  • Coding sequence for enzyme itself
  • Enzyme structure: amino acid sequence, functional domains
  • Target DNA sequence: natural target
  • Natural/synthetic topologies of the target DNA
  • Required auxiliary factors:
  • Recombination efficiency in standard host, natural host, etc.
  • Recombination speed
  • Studies of mutated enzymes vs. target sequences
  • Mechanism:
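
The field list above could be captured as one record type per enzyme. The sketch below is only one possible layout: every attribute name, the types, and the `extra` dictionary (a hypothetical place for fields added later without touching the core structure) are assumptions, not a settled schema.

```python
from dataclasses import dataclass, field
from typing import Optional


def _is_empty(value) -> bool:
    """True for a field that has not been filled in yet."""
    return value is None or value == [] or value == {}


@dataclass
class RecombinaseRecord:
    """One database entry; attribute names follow the field list above."""
    name: str
    natural_source: Optional[str] = None
    coding_sequence: Optional[str] = None
    amino_acid_sequence: Optional[str] = None
    functional_domains: list = field(default_factory=list)
    natural_target_sequence: Optional[str] = None
    target_topologies: list = field(default_factory=list)
    auxiliary_factors: list = field(default_factory=list)
    efficiency_by_host: dict = field(default_factory=dict)  # host name -> measurement
    recombination_speed: Optional[float] = None              # units still to be decided
    mutant_studies: list = field(default_factory=list)
    mechanism: Optional[str] = None
    extra: dict = field(default_factory=dict)                # room for later fields

    def missing_fields(self) -> list:
        """Names of core fields that are still empty, in declaration order."""
        return [
            key for key, value in vars(self).items()
            if key != "extra" and _is_empty(value)
        ]


# A record with only the name filled in; everything else is still to be mined or measured.
record = RecombinaseRecord(name="phiC31 integrase")
print(record.missing_fields())
```

Listing the empty fields per enzyme gives a rough picture of "how much we already know" and can help decide which fields to mine from the literature and which to measure.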


Standard system

  • What should our standards be:
    • Host cells
    • Plasmid
    • System for expressing enzyme

Parameters we want to tune

  • Enzyme concentration
  • Expression time
    • How long does the enzyme need to be present?

Tools

Informatics tools

  • Automated system for updating the list of information in the public domain.
    • The list of enzymes and the referenced scientific literature (e.g. from PubMed)
    • Roughly split up the information according to fields
    • Standard information such as sequence, structure, ontology, etc. from GenBank or PDB; keep it up to date
  • Decouple methods (mutants, assays, etc.) from implications (sequence, structure, models) reported in the literature
  • Need an effective way to make references. Even better to refer to the paper reporting the original experiment
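
As a starting point for the automated literature update, the search step could be sketched with NCBI's E-utilities `esearch` endpoint for PubMed. The code below only builds the query URLs and does not contact the server; the helper name, the query wording, and the default result limit are assumptions made for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(enzyme_name: str, max_results: int = 20) -> str:
    """Build an esearch URL for PubMed records mentioning the enzyme.

    The query wording is a guess at a useful search, not a tuned one.
    """
    query = f'"{enzyme_name}" AND recombinase'
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": query,           # search expression
        "retmax": max_results,   # maximum number of IDs to return
        "retmode": "json",       # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# One URL per enzyme in the list; fetching and parsing would be a separate step.
enzymes = ["lambda integrase", "phiC31 integrase"]
urls = {name: pubmed_search_url(name) for name in enzymes}
for name, url in urls.items():
    print(name, url)
```

Running the same queries on a schedule and storing the returned PubMed IDs per enzyme would show which papers are new since the last update.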

Molecular Biology tools

  • Standard cassettes on plasmids so that we can try a wide variety of recombination sites and recombination enzymes
  • Tunable promoters and reporter tags that report the level of recombinase
  • A method for timing the recombination process. At what time point is the recombination of the DNA complete? This time scale could be much shorter than the time until the reporter of the new DNA configuration becomes observable. Can we pause/terminate recombination at different time points?
  • Some mechanism to accommodate the fact that the substrate (sites on genomic DNA) is present at very low copy number.

Notes/questions

  • I would suggest that we start with the systems for which we have strong references: lambda and phiC31