The BioBricks Foundation:Standards/Technical/E.coli promoter standard


Jason R. Kelly 02:11, 30 March 2008 (EDT): It seems like we should just decide where to locate the transcription start site (the +1 site) in E. coli sigma70 BB promoters. There has been excellent previous discussion on the topic (I also copied a portion of it below).


A proposal

Proposed standard A. The green box identifies the 7 bp sequence that would be required at the end of all standard BB promoters. The -10 box should sit directly upstream of these 7 bp and would vary by promoter. The n's are there to ensure standard spacing. The BB junction is in CAPS (it's not part of the promoter).

This basically follows from Chris & Reshma's discussion below. The only addition is that the standard requires a defined spacing between the -10 box and the CAT sequence. The A in the CAT sequence should be the transcription start site. The reasonable spacing, along with the use of CAT (which is the 'consensus' -1, +1, +2 sequence; see table), should hopefully lead to a predictable transcription start at the 'A'. Unfortunately, the current Berkeley promoter library and R0040 don't conform to this standard. I suspect their transcriptional start is at the first A in the BB junction, but it's hard to know: the sequence isn't obviously optimal for a transcription start, though it's not bad (see fig).

Table with nucleotide frequencies from 168 promoters[1]
Berkeley / R0040 promoters (popular registry promoters) have unknown transcription start sites (IMO). The bold sequence is where I suspect transcription could start, based on data from 112 known promoters[1]. The numbers 1, 2 and 4 are the counts of promoters that had the same spacing and the same -1,+1 bases (e.g. only 1 promoter had 4 n's followed by gc where the c was the transcription start; 2 had the right spacing for ct and 4 had the right spacing for ta). In contrast, 9 promoters had ca with the same spacing as proposed standard A above. The underline is my best guess at the start site. I chose T in R0040 because that is the known +1 position for the wt (non-BB'd) promoter; however, in the wt promoter that position is an A rather than a T, and an A is much more likely to be a start.


  • Jason R. Kelly 04:59, 30 March 2008 (EDT): The other option is to adopt the R0040/Berkeley promoter set as the promoter standard, e.g. (-10box)nnnnnc(BBjunction). The downsides here are that (1) I don't know how dependent the "non-consensus" transcriptional start will be on the n's (so we might get different start sites with different promoters) and (2) I don't know exactly where the transcriptional start is in the first place, though I might just not be up to speed on the best way to predict the start site (the ref below was the best I could find). Of course, this standard would have the advantage that we already have some parts in this format ;)
proposed standard B. Same nomenclature as above.
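
As a rough illustration of how the two proposals could be checked mechanically, here is a minimal Python sketch. It rests on assumptions that are not spelled out in the text: the figures are not reproduced here, so the 7 bp tail of standard A is taken to be four spacer bases followed by CAT, and standard B is taken literally as (-10box)nnnnnc. The function name `conforms`, the `TAILS` table and the example sequences are hypothetical, and the promoter string is assumed to exclude the BB junction.

```python
import re

# Assumed tail patterns (inferred from the discussion, not from the figures):
#   standard A: 4 spacer bases + "cat" (7 bp; the "a" is the intended +1 site)
#   standard B: 5 spacer bases + "c"   (from "(-10box)nnnnnc(BBjunction)")
TAILS = {
    "A": r"[acgt]{4}cat",
    "B": r"[acgt]{5}c",
}


def conforms(promoter: str, minus10_box: str, standard: str) -> bool:
    """Return True if the promoter ends with the -10 box plus the standard tail.

    `promoter` must not include the BioBrick junction.
    """
    pattern = re.escape(minus10_box.lower()) + TAILS[standard] + r"\Z"
    return re.search(pattern, promoter.lower()) is not None


# Made-up example sequences, for illustration only.
example_a = "ttgacagctagctcagtcctaggtataatgctacat"  # -10 box + gcta + cat
example_b = "ttgacagctagctcagtcctaggtataatgctagc"   # -10 box + gctag + c

print(conforms(example_a, "tataat", "A"))
print(conforms(example_b, "tataat", "B"))
```

The -10 box is passed in explicitly because it varies by promoter; only the tail after it is fixed by either standard.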

Continued Discussion

Joey Davis 07:30, 31 March 2008 (EDT):

I've added a few thoughts about standard promoters:

I wholeheartedly agree with the need to standardize promoter construction and decided to throw in a few other thoughts. In particular, I think we should focus on creating a good standard for the interface between a promoter and an RBS – I mention it briefly below.

1. Focus on one family of sigma factors:

I would recommend focusing on the sigma70 family, as this encompasses most of the factors we commonly think about (RpoD,S,H,E). Further, the sigma54 family is commonly activated by remote elements, which would either make for very large promoters or greatly limit composability.

2. Functional organization of the promoter:

(A) Each column indicates the number of different repressor proteins that have at least one site centered within the indicated 10-bp interval. The leftmost and rightmost columns indicate the number of repressors upstream of -101 and downstream of +61. (B) Each column indicates the number of different activator proteins that have at least one site centered within the indicated 5-bp interval. At the extreme left, a single column indicates the number of activators with sites upstream of -91. From Jay D. Gralla and Julio Collado-Vides, “Organization and function of transcription regulatory elements”.

Naively, promoter strength is a function of 3 variables: holoenzyme affinity for the promoter, the equilibrium between closed and open complex, and the efficiency/frequency of promoter escape (this includes all events between opening of the DNA helix and clearance of the promoter: synthesis of the first phosphodiester bond, idling/stuttering that generates nonproductive RNA oligos, and finally clearance of the promoter). In the absence of activation/repression, the -40 to +15 region controls all 3 of these variables (-35 and -10 define binding, -10 to +4 define DNA opening, +1 to +20 define promoter escape). I think this is the absolute minimum we should use to define a promoter. More realistically, we should probably include back to -100 and forward to +20, as the vast majority of both repressors and activators act in this region (see the figure sampling ~200 coli promoters).

  • Joey Davis 21:44, 31 March 2008 (EDT): Moreover, the "up" region, which binds the alpha subunit, sits around -40 to -60, making this region all the more important to include.

It seems using this larger definition would be particularly useful when composing parts – you wouldn’t have to worry about an unforeseen repressor binding site which is coincidentally just up- or down-stream.
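
To make the two candidate definitions concrete (-40 to +15 as the minimum, -100 to +20 as the larger one), here is a small Python sketch that slices such a window out of a longer sequence. The function name `promoter_window` and its parameters are hypothetical. It assumes the promoter is on the top strand, that the +1 base is given as a 0-based index, and that there is no position 0, so -1 sits immediately upstream of +1.

```python
def promoter_window(sequence: str, tss_index: int,
                    upstream: int = 40, downstream: int = 15) -> str:
    """Return the bases from -upstream to +downstream around the +1 base.

    tss_index is the 0-based index of the +1 base in `sequence`.
    Because there is no position 0, the window holds
    `upstream` bases (-upstream..-1) plus `downstream` bases (+1..+downstream).
    """
    start = tss_index - upstream
    end = tss_index + downstream  # exclusive slice end
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the ends of the sequence")
    return sequence[start:end]


# Made-up sequence with the +1 base marked as an uppercase G.
toy = "a" * 100 + "G" + "c" * 50

minimal = promoter_window(toy, 100)                               # -40 .. +15
extended = promoter_window(toy, 100, upstream=100, downstream=20)  # -100 .. +20
print(len(minimal), len(extended))
```

In the returned string the +1 base always sits at index `upstream`, which makes it easy to annotate the start site after slicing.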

  1. Collado-Vides J, Magasanik B, and Gralla JD. pmid:1943993. [Gralla]
  2. Pérez-Rueda E, Gralla JD, and Collado-Vides J. pmid:9466899. [Rueda]
  3. Estrem ST, Gaal T, Ross W, and Gourse RL. pmid:9707549. [Estrem]

3. Transcript stability

The promoter definition itself can have no influence on 3’ dependent or internal cleavage dependent mRNA stability – so that sucks – when we put different parts downstream of a given promoter, we will get different absolute steady-state levels of mRNA. However, ideally, the rank order (in terms of strength) of different promoters should be maintained. While we could include a standardized 5’ UTR in every promoter definition, I think (although with no profound justification) that this would limit promoter design too drastically (so much regulation and strength determination occurs downstream of the start site). Thus, if +20 were included in the promoter definition, the 5’ stability of the transcript would be an inherent characteristic of a given promoter and not of the downstream transcript it is driving (admittedly a bit odd).

  • Jason R. Kelly 21:44, 31 March 2008 (EDT): So to be clear, this wouldn't constrain the promoter sequence up to +20 (i.e. you need to define the promoter up to the +20 position, but there are no sequence requirements for positions +3 through +20; positions +1 and +2 are A and T respectively).
    • Promoter escape module from EcoSal - review covering why promoter should run through the +20 site.
    • Jason R. Kelly 13:51, 4 April 2008 (EDT): I wonder how much of a problem the RNA engineering community would have with giving up control of the first 20 bp of the mRNA to the promoter. For instance, stuff like a 5' stem-loop can stabilize mRNA. [5]
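
The rank-order argument in this section can be sketched with a toy model. This is only an illustration under an assumption that is not in the discussion itself: mRNA is made at a constant rate set by the promoter and degraded by first-order decay set by the downstream part, so the steady-state level is the synthesis rate divided by the decay rate. All names and numbers below are hypothetical.

```python
def steady_state_mrna(k_tx: float, k_deg: float) -> float:
    """Steady state of dm/dt = k_tx - k_deg * m, i.e. m = k_tx / k_deg."""
    return k_tx / k_deg


# Hypothetical transcription rates for three promoters (arbitrary units).
promoter_rates = {"P_weak": 1.0, "P_mid": 5.0, "P_strong": 20.0}

# Hypothetical decay rates for two downstream parts (per minute).
part_decay = {"part_X": 0.1, "part_Y": 0.5}


def rank_order(k_deg: float) -> list:
    """Promoters sorted by steady-state mRNA level for a given decay rate."""
    return sorted(promoter_rates,
                  key=lambda p: steady_state_mrna(promoter_rates[p], k_deg))


for part, k_deg in part_decay.items():
    levels = {p: steady_state_mrna(k, k_deg) for p, k in promoter_rates.items()}
    print(part, levels, rank_order(k_deg))
```

Under this model the absolute levels change with the downstream part, but the ordering of promoters does not, because every promoter is divided by the same decay rate. If the first 20 bases belong to the promoter and affect 5' stability, the decay rate becomes promoter-dependent and this simple guarantee no longer holds.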

4. Transcription start site

I like Jason’s proposal. In particular, I don’t think that enforcing a standard length between the -10 and the start, or a defined CAT start, is too limiting (some people decrease promoter strength using this region, but I think there are sufficient alternatives to justify always knowing the initiating position). It might be worthwhile, however, to require something of the form TATTATnnnnBCAT (where B is C, G or T), as an adenosine 5 nt from the -10 can be used (albeit infrequently) to initiate; excluding A at that position removes it as an alternative start.
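
The TATTATnnnnBCAT form uses IUPAC nucleotide codes (n is any base, B is C, G or T), so it translates directly into a regular expression. The following Python sketch is one way to do that; the function names and the example sequences are hypothetical, and the promoter string is assumed to end at the T of CAT (i.e. to exclude the BB junction).

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'TATTATnnnnBCAT' into a regex."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def predicted_start(promoter: str, motif: str = "TATTATnnnnBCAT"):
    """Return the 0-based index of the A in the terminal CAT, or None.

    The motif must match at the very 3' end of the promoter.
    """
    match = re.search(iupac_to_regex(motif) + r"\Z", promoter.upper())
    if match is None:
        return None
    return match.end() - 2  # CAT occupies the last three positions


# Made-up sequences: the first has C at the B position, the second has A.
print(predicted_start("GGTATTATGCTACCAT"))  # prints 14
print(predicted_start("GGTATTATGCTAACAT"))  # prints None
```

A different -10 box can be checked by passing another motif, for example one with the promoter's own hexamer in place of TATTAT.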

5. Composition between promoters and RBS

Because folding of a 5’UTR can profoundly influence the apparent ribosome binding/translation efficiency, it seems we should provide some insulation between these components. One could imagine placing standardized insulation at the 3’ end of the promoter part, at the 5’ end of the RBS part, or both. Maybe someone with a background in RNA folding could weigh in on the amount of insulation required, or on whether this is even feasible. I guess the biggest concern is that you would get unexpected base-pairing interactions between the “RBS part” and the “promoter part” on the completed transcript. Because the loops in stem-loops can be so large, I don’t see any easy way to fix this…ideas???

Previous discussion

JCAnderson: The page http://parts.mit.edu/registry/index.php/Help:BioBrick_Prefix_and_Suffix, under the BioBrick Prefix section, has a really critical piece of information on how to design biobrick basic parts. I think we should add to that a preferred way of biobricking the promoter initiation site relative to the polylinker, to avoid heterogeneity 5' to the biobrick junction. Again, it is an arbitrary standard, and the options are (with the transcription start in bold): define it like R0040 (and what iGEM2006 did for the family of constitutive promoters):

...ctACTAGT

Or have it in the biobrick site explicitly, something like:

...ACTAGT

So that nothing has to be re-made, and so that more native promoter sequence can be present in the part, I lean towards defining the standard as the R0040-compatible version.

Clearly not all promoters are going to be compatible with this standard. Some promoters have operators that overlap or extend beyond the transcriptional start. When making basic promoter parts, one currently has to make an arbitrary decision as to where to put the 3' end of the promoter. It would be preferable to have a standard.

  • Reshma 11:12, 21 August 2006 (EDT): I agree that we should have a default standard for the promoter-RBS junction. But in looking at the sequence logo for E. coli promoters, I think the typical nucleotides for the -1 and +1 positions are CA. In the absence of any strong reason to go with another scheme, why not go with E. coli promoter consensus? So perhaps something like ...

...caTACTAGAG

i.e. Pretty similar to the R0040-compatible version but with the transcription start site being a defined nucleotide where possible along with the nucleotide before. It might make it a bit more likely that the transcription start site occurs where we think it should occur.


Reference

  1. Hawley DK and McClure WR. pmid:6344016. [McClure]
  2. Emory SA, Bouvet P, and Belasco JG. pmid:1370426. [Emory-1992]
  3. Horwitz MS and Loeb LA. pmid:3049585. [Loeb]

Promoter design references

  1. Mayo AE, Setty Y, Shavit S, Zaslaver A, and Alon U. pmid:16602820. [Alon2006]
  2. Cox RS 3rd, Surette MG, and Elowitz MB. pmid:18004278. [Elowitz2007]
  3. Alper H, Fischer C, Nevoigt E, and Stephanopoulos G. pmid:16123130. [Alper2005]
  4. Miksch G, Bettenworth F, Friehs K, Flaschel E, Saalbach A, Twellmann T, and Nattkemper TW. pmid:16019099. [Miksch2005]
  5. Shimada T, Makinoshima H, Ogawa Y, Miki T, Maeda M, and Ishihama A. pmid:15489422. [Shimada2004]
  6. De Mey M, Maertens J, Lequeux GJ, Soetaert WK, and Vandamme EJ. pmid:17572914. [DeMay2007]
  7. Murphy KF, Balázsi G, and Collins JJ. pmid:17652177. [Collins2007]

Transcriptional start site references

  1. Hershberg R, Bejerano G, Santos-Zavaleta A, and Margalit H. pmid:11125111. [Hershberg-2001]
  2. Gordon L, Chervonenkis AY, Gammerman AJ, Shahmuradov IA, and Solovyev VV. pmid:14555630. [Solovyev-2003]
  3. Gordon JJ, Towsey MW, Hogan JM, Mathews SA, and Timms P. pmid:16287942. [Timms-2006]

Repression references

http://jb.asm.org/cgi/content/full/181/10/2987

Activation references

  1. Ross W, Schneider DA, Paul BJ, Mertens A, and Gourse RL. pmid:12756230. [Ross2003]

http://arjournals.annualreviews.org/doi/pdf/10.1146/annurev.ge.18.120184.001133?cookieSet=1

Initiation references

  1. Leibman M and Hochschild A. pmid:17332752. [Leibman2007]

http://arjournals.annualreviews.org/doi/pdf/10.1146/annurev.ge.19.120185.002035
http://www.ias.ac.in/jarch/jbiosci/18/13-25.pdf
http://www.castu.tsinghua.edu.cn/course/ref/section-3/McClure85.pdf

Measuring Kit references

  1. Lu C, Bentley WE, and Rao G. pmid:15575693. [Lu2004]


Ponderings

1: Should we really rely on the sigma70-type promoters, or would it be better to pursue an orthogonal promoter system (maybe a polymerase/sigma system from another organism) so that we don't deplete the pool of coli pols?

2: Should the measuring kit have a second internal standard (like RFP under a known promoter) - this might help with things like copy number variations between different promoters, saturation of some transcriptional/translational component?

Notes from the MIT SB lunch

  • Consider putting a terminator upstream of every promoter to ensure there is no transcriptional read-through.
  • Make a sub-set of promoters that have a defined transcriptional start and then 20 defined bases afterwards (phomoters); these would ensure identical mRNA from any phomoter put upstream of the same coding region.