User:Pakpoom Subsoontorn/Notebook/Genetically Encoded Memory/2008/10/14: Difference between revisions

Revision as of 23:56, 24 October 2008

Project name


Quantitative functional profiles of site-specific recombinase

Compiling known information about recombinases
Focus on information that will be useful for in vivo DNA manipulation. For now, let's concentrate on simple model organisms such as E. coli and S. cerevisiae.
Two major sources of information:
- mining the literature
- large-scale, parallel experiments

A site-specific recombination system generally consists of a group of enzymes that can cut and paste DNA at very specific sequences. As far as I know, hundreds of such systems exist, both found in nature and developed in the laboratory. They have been exploited extensively for making transgenic organisms and in gene therapy.

A large body of literature reports such information. The problem is that people have used different assays, different host cells, and different interpretations. It would be great if we could organize this information in one place. We would like to be able to update it, compare it, and systematically refer back to the source of each existing piece of information.
Moreover, we would like to know "how much we already know" and "how did we know it." Say I have two stored pieces of information: enzyme X has an N-terminal catalytic domain and a C-terminal binding domain, while enzyme Y has intermixed catalytic and binding domains. I would like a systematic way to go back and check whether these conclusions came from totally different methods, from two different research groups, and were reported 15 years apart!
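The "how did we know it" check could be sketched roughly as follows. This is only an illustration: the `Claim` record, its fields, and the `provenance_gap` helper are hypothetical names, and the example values (methods, groups, years) are placeholders, not data from any paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One stored statement about an enzyme, plus where it came from."""
    enzyme: str
    statement: str
    method: str   # assay or technique behind the claim
    group: str    # research group that reported it
    year: int     # publication year


def provenance_gap(a: Claim, b: Claim) -> dict:
    """Summarize how differently two claims were obtained."""
    return {
        "same_method": a.method == b.method,
        "same_group": a.group == b.group,
        "years_apart": abs(a.year - b.year),
    }


# Placeholder records: methods, groups and years are invented for illustration.
x = Claim("enzyme X", "N-terminal catalytic, C-terminal binding",
          method="deletion mutants", group="group A", year=1990)
y = Claim("enzyme Y", "intermixed catalytic and binding domains",
          method="crystal structure", group="group B", year=2005)

gap = provenance_gap(x, y)
print(gap)
```

Keeping the method, group, and year next to every statement means a comparison between two enzymes can always be qualified by how comparable the underlying evidence is.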

The information of interest includes:

- The sequences of the recombinase enzyme and its target DNA. We can probably get the protein sequence from GenBank. However, as far as I know, there is no public database for the target sequences yet. Determining the minimal target sequence is still a subject of intensive research, even for the best-known recombinase systems.
- Structural information, in particular the functional domains (DNA-binding domain, catalytic domain, etc.).
- Mutation and chimera studies.
- Efficiency in different host cells. What are the rates of recombination? How specific is the recombination? How toxic is it to the host cells?
- The different assays used in the studies.

Database/method features

Key features of the database:
- Quantitative description
- Capability to compare and contrast
  - Standardized description
- Database size can expand without disturbing the core structure
- Literature cross-referencing and updating
- Scalable experiments/measurements

Quantitative functional information

The list of information fields (need to decide: mining vs. experimenting, set priorities, black-box):

Enzyme name
Natural source of enzyme, host
Coding sequence for enzyme itself
Enzyme structure: amino acid sequence, functional domains
Target DNA sequence: natural target
Natural/synthetic topologies of the target DNA
Required auxiliary factors:
Recombination efficiency in standard host, natural host, etc.
Recombination speed
Studies of mutated enzymes vs. target sequences
Mechanism:
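As a rough sketch of how the fields above might map onto a single record, one option is a dataclass in which every field except the name is optional, so that gaps in our knowledge stay visible. The class name, field names, and the `missing_fields` helper are assumptions made for illustration, not a settled schema.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class RecombinaseProfile:
    """One entry per enzyme; empty fields mean 'not known / not mined yet'."""
    name: str
    natural_source: Optional[str] = None
    coding_sequence: Optional[str] = None
    amino_acid_sequence: Optional[str] = None
    functional_domains: list = field(default_factory=list)
    natural_target_sequence: Optional[str] = None
    target_topologies: list = field(default_factory=list)
    auxiliary_factors: list = field(default_factory=list)
    efficiency_by_host: dict = field(default_factory=dict)  # host -> efficiency
    recombination_speed: Optional[float] = None
    mutant_studies: list = field(default_factory=list)
    mechanism: Optional[str] = None

    def missing_fields(self) -> list:
        """Names of fields that are still empty for this enzyme."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, [], {})]


# Mostly empty on purpose: only the name and source are filled in.
profile = RecombinaseProfile(name="phiC31 integrase",
                             natural_source="bacteriophage phiC31")
missing = profile.missing_fields()
print(missing)
```

Listing the empty fields per enzyme is one simple way to decide where mining is enough and where a new experiment is needed.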

Standard system

What should our standards be?
- Host cells
- Plasmid
- System for expressing the enzyme

Parameters we want to tune

Enzyme concentration
Expression time
- How long does the enzyme need to be present?

Tools

Informatics tools

An automated system for updating the list of information available in the public domain:
- The list of enzymes and the scientific literature that references them (e.g., from PubMed)
- Information roughly split up according to fields
- Standard information such as sequence, structure, ontology, etc. from GenBank or PDB, kept up to date
Decouple methods (mutants, assays, etc.) from implications (sequence, structure, models) in the literature.
We need effective ways to make references; even better, to refer to the paper describing the original experiment.
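One possible starting point for the automated literature update is a periodic PubMed search per enzyme through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query URL and does not contact the network; the helper name `pubmed_search_url` and the choice of search term are assumptions, and a real query would need tuning for each enzyme.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(enzyme_name: str, max_results: int = 20) -> str:
    """Build an esearch URL listing PubMed IDs that mention the enzyme."""
    # Hypothetical search term: quoted enzyme name plus the word 'recombinase'.
    term = f'"{enzyme_name}" AND recombinase'
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = pubmed_search_url("phiC31 integrase")
print(url)
```

The returned IDs could then be stored alongside each enzyme entry, so that new papers show up as unreviewed references rather than silently changing existing fields.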

Molecular Biology tools

Standard cassettes on plasmids so that we can try a wide variety of recombination sites and recombinases
A tunable promoter and reporter tags that report the level of recombinase
A method for timing the recombination process. At what time point is recombination of the DNA complete? This time scale could be much shorter than the time until the reporter of the new DNA configuration becomes observable. Can we pause or terminate recombination at different time points?
Some mechanism to accommodate the fact that the substrate (sites on genomic DNA) is present at very low copy number.

Notes/questions

I would suggest that we start with the systems for which we have strong references: lambda and phiC31.

