# Talk:Synthetic Biology:BioBricks/Standardization

Note that this document is in REVERSE CHRONOLOGICAL ORDER

### JM, 8/20/06

This came up earlier when we were starting out with the screening plasmid (which unfortunately doesn't work all that well for characterizing parts). If we trust our measurements (that is, we think they accurately reflect the system behavior), then I definitely think it's better to start characterizing parts now. As mentioned below, all our measurements right now are going to be relative: let's pick the best standard we can think of now and, if things change in the future, just recalibrate (cf. changing from atm to bar).

The important question, for me, is whether our measurements are valid, and here I think the data is inconclusive. Someone (and I'm not volunteering) would need to show that the measurements are both internally consistent (if A is twice as strong as B, and C is twice as strong as A, then C should be four times as strong as B) and externally consistent (if A is twice as strong as B when GFP is used as the reporter, it should also be twice as strong when RFP is used as a reporter). As long as someone's willing to do this for the 'standard composite' then I think it would be really valuable.
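The internal-consistency test described above can be made mechanical. The sketch below is illustrative only: it assumes pairwise strength ratios are stored in a dictionary keyed by hypothetical part names, and it flags any three parts whose directly measured ratio disagrees with the ratio implied by going through the third part. The tolerance (0.1 in log10 units, roughly 26%) is an arbitrary choice.

```python
import itertools
import math


def ratio(ratios, x, y):
    """Strength of x relative to y, if measured directly (in either direction)."""
    if (x, y) in ratios:
        return ratios[(x, y)]
    if (y, x) in ratios:
        return 1.0 / ratios[(y, x)]
    return None


def inconsistent_triples(ratios, tol=0.1):
    """Return triples (x, y, z) whose direct ratio x/z disagrees with (x/y)*(y/z).

    tol is in log10 units, so 0.1 allows roughly a 26% discrepancy. One ordering
    per set of three parts is enough: the discrepancy has the same magnitude for
    every ordering.
    """
    parts = sorted({p for pair in ratios for p in pair})
    bad = []
    for x, y, z in itertools.combinations(parts, 3):
        xy, yz, xz = ratio(ratios, x, y), ratio(ratios, y, z), ratio(ratios, x, z)
        if None in (xy, yz, xz):
            continue  # a triple can only be tested if all three pairs were measured
        if abs(math.log10(xz) - math.log10(xy * yz)) > tol:
            bad.append((x, y, z))
    return bad


# Hypothetical data: A = 2x B, C = 2x A, and C also measured directly against B
consistent = {("A", "B"): 2.0, ("C", "A"): 2.0, ("C", "B"): 4.0}
inconsistent = {("A", "B"): 2.0, ("C", "A"): 2.0, ("C", "B"): 8.0}
print(inconsistent_triples(consistent))    # []
print(inconsistent_triples(inconsistent))  # [('A', 'B', 'C')]
```

External consistency could be checked the same way by comparing the ratio tables obtained with two different reporters.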

I think this is really the first issue that needs to be settled. Once you have consistent measurements in a completely standardized situation, we can start figuring out how the measurements map onto a new set of conditions. If you don't trust the measurements in the first place, though, trying to compare measurements in different conditions is going to be a mess. -JM

• Reshma 11:12, 21 August 2006 (EDT): This is certainly a major issue. But making sure that experiments are internally and externally consistent is actually pretty hard given that we don't know how to physically and functionally compose standard biological parts. I'm not sure that we should wait to define a "standard reference composite" until that problem is solved. We might be waiting a long time.
• Jkm 12:05, 21 August 2006 (EDT): But is the reference going to be useful until we have consistency? It's great to talk about a standard, but if that standard doesn't allow us to compare two things that we haven't directly measured (if A is twice as strong as B, and C is twice as strong as A, but we can't compare B to C), then I don't think we've gained much (other than the qualitative comparisons that we already have). I can live without external consistency if necessary (call it more standardization if you have to), but I think internal consistency is necessary for any data to be useful.
• JCAnderson 22:40, 21 August 2006 (EDT): There may be issues with basing standardization on FPs, due to nonlinearities that give rise to the internal/external inconsistency you describe. We may have to do some mathematical conversion to map the data back to quantitative mRNA-based experiments to get consistency. This all needs to be investigated by just jumping in, taking some measurements, and seeing which mode of measurement and data massaging gives rise to a usable standards system.

None of that stops us from declaring a temporary standard composite. To get at the measurement issue, we will need to make a set of very similar constructs covering a very narrow set of perturbations on the standard composite, measure them by Tecan, cytometry, qPCR, Northern, Western, or who knows what else, and figure out the measurement issues from there. Step 1 is to declare the standard. Step 2 is to figure out how to standardize it.

### JCAnderson 16:48, 20 August 2006 (EDT)

It doesn't sound like we are going to fully agree on a specific standard. It is my belief that the "standard composite" is an entirely arbitrary reference point as long as the choice doesn't in hindsight turn out to be an oddball.

For now, it seems like we do agree on several points:

• Eventually one "standard composite" should be defined, and all activities can be a reference to that
• Neidhardt medium is preferable to LB to minimize media-based variation
• A single-copy standard would be ideal, but we do not currently have an appropriate FP and single-copy plasmid to do it
• Reshma 11:12, 21 August 2006 (EDT): The terminator should be B0015.
• cmc 17:45, 19 September 2006 (EDT) : I think this is still open for discussion. B0011 or B0014 might be a better choice.
• Reshma 11:12, 21 August 2006 (EDT): The promoter should be R0040?
• cmc 17:45, 19 September 2006 (EDT) : I would support R0040, given its characterization and use in various systems.

I still believe we should pick a temporary standard to get the ball rolling, it should reflect the content of the current registry rather than trying to anticipate what people will use in the future, and it should be as easy to use as possible. It very well may be true that a high-copy standard poorly maps to single copy experiments. We don't yet know that's true.

The real heart of the problem is not what the standard is but rather what the mapping functions are that allow us to predict the behavior of a part in one context versus another. How many of these changes (switching to a slower-growing strain, changing the copy number of a plasmid, changing the media, changing the temperature) result in simple scalar multiples of the activity of the part measured in a different context? In essence, the initial question that needs to be addressed is what the sources of variation are beyond the ones we already know about. How common and significant are the various types of context effects, what are these effects, and how can we eliminate them?
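One way to ask whether a context change acts as a simple scalar multiple is to measure the same panel of parts in both contexts, fit a single scale factor, and look at the worst residual. This is a minimal sketch under that assumption; the part names and numbers are hypothetical, and fitting the geometric mean of the per-part ratios is one reasonable choice among several.

```python
import math


def fit_scalar_mapping(context_a, context_b):
    """Fit activity_b ~= k * activity_a across a panel of parts.

    Returns (k, worst_fold): k is the geometric mean of the per-part ratios and
    worst_fold is the largest fold deviation of any part from that single scalar.
    A worst_fold near 1 is consistent with a purely scalar context effect.
    """
    log_ratios = [math.log(context_b[p] / context_a[p]) for p in context_a]
    log_k = sum(log_ratios) / len(log_ratios)
    worst_fold = max(math.exp(abs(lr - log_k)) for lr in log_ratios)
    return math.exp(log_k), worst_fold


# Hypothetical panel measured in two contexts (e.g. high copy vs. low copy)
high_copy = {"P1": 1.0, "P2": 0.5, "P3": 0.1}
low_copy_scalar = {"P1": 0.3, "P2": 0.15, "P3": 0.03}     # every part scaled by 0.3
low_copy_nonscalar = {"P1": 0.3, "P2": 0.15, "P3": 0.09}  # P3 behaves differently

print(fit_scalar_mapping(high_copy, low_copy_scalar))     # k near 0.3, worst fold near 1
print(fit_scalar_mapping(high_copy, low_copy_nonscalar))  # worst fold about 2.08
```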

• Reshma 11:12, 21 August 2006 (EDT): Going with a temporary standard sounds like the right plan ... so some open questions (in my mind, please add to the list).
1. Which FP? GFP or wait for Emerald?
2. FP with or without an LVA/LAA tag?
3. Which RBS? (Seems like there is consensus on R0040 and B0015?)
4. Which plasmid? pSB1 series? Another option is a pSB4 or pSB3 series plasmid, which are low but not single copy. I'm synthesizing a new set of pSB4* series plasmids but they won't be ready for at least a month.
5. Others?

#### the modularity problem with promoters

Drew's example of the promoter and ribosome binding site that produce a hybrid 5'UTR hairpin that disrupts translation is a separate standardization issue. I think it is more pressing, though.

The page http://parts.mit.edu/registry/index.php/Help:BioBrick_Prefix_and_Suffix, under the BioBrick Prefix section, has a critical piece of information on how to design BioBrick basic parts. I think we should add to it a preferred way of BioBricking the promoter's transcription initiation site relative to the polylinker, to avoid heterogeneity 5' to the BioBrick junction. Again, it is an arbitrary standard, and the options are (with the transcription start in bold): define it like r0040 (and as iGEM2006 did for the family of constitutive promoters):

...ctACTAGT

Or have it in the biobrick site explicitly, something like:

...ACTAGT

So that nothing has to be re-made, and so that more native promoter sequence can be present in the part I lean towards defining the standard as the r0040-compatible version.

Clearly not all promoters are going to be compatible with this standard. Some promoters have operators that overlap or extend beyond the transcriptional start. When making basic promoter parts, one currently has to make an arbitrary decision about where to put the 3' end of the promoter. It would be preferable to have a standard.

• Reshma 11:12, 21 August 2006 (EDT): I agree that we should have a default standard for the promoter-RBS junction. But in looking at the sequence logo for E. coli promoters, I think the typical nucleotides for the -1 and +1 positions are CA. In the absence of any strong reason to go with another scheme, why not go with E. coli promoter consensus? So perhaps something like ...

...caTACTAGAG

i.e., pretty similar to the R0040-compatible version, but with the transcription start site and the nucleotide before it fixed where possible. It might make it a bit more likely that transcription starts where we think it should.

On a second point, this information is really buried on the registry, but it is really important. I only became aware of the different standard for ORF parts when I was examining sequence data and noticed that their start codon needed to overlap the XbaI site. Until then, I thought all parts were supposed to be built the same way. The Berkeley iGEM students last year BioBricked their open reading frame and lock (RBS) parts incorrectly according to this standard, so I'm not the only one who missed it. This information needs to be declared more clearly on the registry's "add a basic part" page.
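The difference between ordinary and ORF junctions is easiest to see by simulating a SpeI/XbaI assembly on the top strand. The prefix and suffix sequences below are the standard BioBrick flanks as we understand them from the Registry's prefix/suffix help page; treat them as an assumption and check them against that page. The part sequences are hypothetical. The sketch shows why an ordinary junction leaves the scar TACTAGAG (compare ...caTACTAGAG above), while a junction into an ORF part leaves TACTAG running straight into the ATG, the tactagatg visible in the hairpin diagram further down this page.

```python
# Standard BioBrick flanks as we understand them from the Registry's
# Help:BioBrick_Prefix_and_Suffix page (assumption: verify before relying on them).
PREFIX = "GAATTCGCGGCCGCTTCTAGAG"    # EcoRI, NotI, XbaI (TCTAGA), then G
PREFIX_CDS = "GAATTCGCGGCCGCTTCTAG"  # ORF parts: the part's own ATG completes XbaI
SUFFIX = "TACTAGTAGCGGCCGCTGCAG"     # SpeI (ACTAGT), NotI, PstI
SPEI = "ACTAGT"  # cuts A^CTAGT
XBAI = "TCTAGA"  # cuts T^CTAGA; both enzymes leave a CTAG overhang


def flank(part, coding=False):
    """Return the part as it sits between prefix and suffix (top strand only)."""
    part = part.upper()
    if coding and not part.startswith("ATG"):
        raise ValueError("ORF parts are expected to start with ATG")
    for site in (SPEI, XBAI):
        if site in part:
            raise ValueError(f"part contains an internal {site} site")
    return (PREFIX_CDS if coding else PREFIX) + part + SUFFIX


def assemble(upstream, downstream, downstream_is_orf=False):
    """Join upstream (cut at its suffix SpeI) to downstream (cut at its prefix XbaI)."""
    up = flank(upstream)
    down = flank(downstream, coding=downstream_is_orf)
    i = up.index(SPEI, len(PREFIX) + len(upstream))  # SpeI site in the suffix
    j = down.index(XBAI)                             # XbaI site in the prefix
    # Top strand after ligation of the matching CTAG overhangs: ...A + CTAG...
    return up[: i + 1] + down[j + 1:]


promoter_rbs = assemble("TTCA", "AAAGAGGAGAAA")  # hypothetical promoter tail + RBS
rbs_orf = assemble("AAAGAGGAGAAA", "ATGCGTAAAGGAGAAGAA", downstream_is_orf=True)
print(("TTCA" + "TACTAGAG" + "AAAGAGGAGAAA") in promoter_rbs)  # True: standard scar
print(("AAAGAGGAGAAA" + "TACTAG" + "ATGCGT") in rbs_orf)       # True: scar runs into ATG
```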

• Reshma 11:12, 21 August 2006 (EDT): Agreed. There are lots of holes in explanations on the Registry. That's why the Registry was converted to a wiki format ... to allow more people the chance to fix these explanations. But there is still a lot more work to be done.

### On 8/17/06, Barry Canton wrote:

1.) We might want to specify a couple of reference constructs so that people can confirm the linearity of whatever measurement system they use. One point on the curve is very useful; 2-3 might be better still.

• Chris: I agree. At the heart of it, I think one construct needs to be the "absolute standard composite", whose parts are all defined as having activity = 1. Everything, regardless of the method of measurement, can be described relative to that standard composite. So, to standardize promoters, you would hold the rbs, FP, and terminator the same and vary the promoter. To standardize rbs, you'd hold the promoter, FP, and terminator the same and vary just the rbs. Then, of course, you'd want to build double-perturbation variants and show that their function is predicted from the single-perturbation standards. At the end of the standardization, I agree you'd want a panel of variants for each type of part, so that the measurement for a new part can be done in parallel with the standard panel.
• Reshma 17:57, 17 August 2006 (EDT): Agreed.
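Chris's scheme reduces to ratios against the standard composite. The sketch below assumes, as a hypothesis to be tested rather than a known property, that promoter and RBS substitutions compose multiplicatively, so a double-perturbation construct should read roughly the product of its two single-perturbation activities. All signal values are hypothetical arbitrary units.

```python
def relative_activity(signal, standard_signal):
    """Activity of a construct relative to the standard composite (defined as 1)."""
    return signal / standard_signal


def predicted_double(promoter_activity, rbs_activity):
    """Assumed multiplicative composition of independent promoter and RBS swaps."""
    return promoter_activity * rbs_activity


def fold_error(predicted, measured):
    """Symmetric fold difference; 1.0 means perfect agreement."""
    return max(predicted / measured, measured / predicted)


# Hypothetical signals (any consistent method: plate reader, cytometry, qPCR...)
standard = 1000.0          # standard composite
promoter_variant = 500.0   # new promoter, standard RBS/FP/terminator
rbs_variant = 2000.0       # standard promoter, new RBS
double_variant = 1100.0    # new promoter + new RBS

p = relative_activity(promoter_variant, standard)       # 0.5
r = relative_activity(rbs_variant, standard)            # 2.0
pred = predicted_double(p, r)                           # 1.0
measured = relative_activity(double_variant, standard)  # 1.1
print(fold_error(pred, measured))                       # 1.1
```

A fold error that stays small across the whole double-perturbation panel would support using the single-perturbation standards predictively.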

2.) While I'm in favor of just picking something and then sticking with it, we might want to wait until we get one of the new FPs (Emerald?) BioBricked, since it should be much brighter than our current standard, E0040, and would allow us to work at much lower copy number. Jason has been working on this FP and might have more to add.

• Chris: If the delay for having a functioning better variant of the FP is fairly short, I'm all for it. It probably would be good to include the LVA/LAA tag on it in the standard.
• Reshma 17:57, 17 August 2006 (EDT): I am not sure about including an LVA/LAA tag. In my hands it often reduces fluorescence levels sufficiently that it can be hard to measure at low-medium copy number or for reporters other than GFP. I prefer to omit the tag.

3.) I'd like to pick a reference construct right now, but the real benefit of such a construct would probably be felt by next year's iGEM teams; it's probably too late for most of this year's teams to standardize. So it seems like we have time to make a few improvements to the standard (a different promoter as John suggests, maybe a new FP, etc.). We can gain experience with it ourselves over the winter so that it can be distributed to iGEM next year as a well-tested reference construct.

• Chris: It's definitely too late for this year's teams to standardize things. I'm not sure that the teams are the ones that will be using this at first anyway. The standardization needs to begin with very simple parts--rbs, FPs, terminators, and simple promoters (constitutive promoters and operons with all trans-acting elements encoded on the genome). I doubt any iGEM teams (other than us) will have focused on those types of parts.
There isn't a rush to standardize, but I think it is important to have a standard defined if people want to standardize, and also valuable to make an attempt to standardize the more popular parts, or introduce a set of new standardized parts.

Excited by the prospect of coast-coast standardization.

Barry

### On 8/17/06, Reshma P. Shetty wrote:

Hey Chris and Drew,

See

and associated pages linked from there.

In particular our discussions on standard operating conditions have differed from your choices on the following counts ...

1) media -> defined rich media like Neidhardt EZ Rich media (LB tends to vary a lot batch to batch) http://openwetware.org/wiki/Synthetic_Biology:Media

• Chris: Neidhardt it is.

2) strain -> MG1655 or derivative thereof (or really Blattner's strain but since no one can get ahold of it and it is not redistributable ... not very practical). I don't think DH10B is sequenced. http://openwetware.org/wiki/Standard_E._coli_Strain_for_BioBricks

• Chris: Not sure I agree with that one. On the one hand, a sequenced genome is a big advantage since you know exactly what's in it. On the other hand, MG1655/W3110 is a pain to work with--it doesn't transform very well, it's recA and endA positive, etc. I don't think my iGEM team could handle MG1655. I often have difficulties working with it myself. DH10B is partially sequenced; contigs covering most of it are on the web, but it isn't complete. DH10B and its homologs and derivatives (GeneHogs, invalpha, etc.) are the most popular strains in use today and are very user-friendly. Also, I think "having been sequenced" is a red herring. In probably a year or two DH10B will be finished, as will the sequences of most other strains.
In the end, I think it is going to be arbitrary what strain you pick as long as the fundamental standard is relational rather than absolute. Here, the activity of any part is defined as the activity of the absolute standard composite with that part substituted in, relative to the activity of the absolute standard composite itself--independent of measurement strategy. So, that could be Western-based PoPS and RIPS values, qPCR, Tecan measurements, FACS, or whatever else. In practice, we need to do all these measurements and establish 1) what the types and sources of variation are, and 2) which methods of measurement produce similar standard values. I think what you will find is that a q. Western and a FACS experiment give the exact same standards, and they will correlate well with qPCR and Tecan data once nonlinearity is accounted for.
Once you have a set of standards defined in an arbitrary reference strain, you can take the panel of variants and probe them in a MG1655, the new Blattner strain, or whatever other strain and ask whether the relational standards hold up.
The bottom line though is that most people are going to DO their experiment in DH10B and are therefore best served by a set of standards that will apply to their experiment.
• Reshma 17:57, 17 August 2006 (EDT): The main thing that MG1655 has going for it is that it was in some sense chosen by the E. coli community as the "standard" K12 strain. That's why it was the one sequenced. Hence, it is as close to standard E. coli as there is. In my mind, choosing any other strain is pretty arbitrary (granted, standards are often picked somewhat arbitrarily).
Also, keep in mind that we are just talking about a strain used for characterization purposes. Systems can be assembled in other, friendlier strains and then moved to MG1655 in the final step for characterization. Therefore, it shouldn't be so much of a burden to transform intact plasmid into it once for characterization (in my opinion).
Also, is DH10B really the most popular strain? We don't actually use it much in the MIT SBWG.

3) vector -> single copy F plasmid or chromosomal integration (try to minimize cell-to-cell copy number variation) (in the process of getting a standard vector scaffold synthesized but still need a single copy origin) http://openwetware.org/wiki/Synthetic_Biology:Vectors

• Chris: In general I am very much in favor of encouraging the use of F plasmids and having the absolute standard construct based on single copy. I think that will make a world of difference for many applications. Their use has been a critical development in my own work. The only caveats to that are:
1. You may never get good FP-based reporter systems doing single copy. Unless the promoter and rbs are very strong, you can't see GFP coming from single copy. Even the Pbad promoter with a b0034-like rbs on GFP shows up in the first decade by cytometry. It's a major problem. Perhaps the Emerald FP described by Barry would address this, but I'm skeptical about whether that is doable. I have a feeling this would make the characterization of weaker rbs and promoters impossible by protein-detection based assays.
• Reshma 17:57, 17 August 2006 (EDT): You're definitely right that this might make FP-based reporter systems very difficult to use.
2. Many of the things that people do with syn biol--including protein overexpression, translational recoding, engineered biosynthesis of small molecules, biofuels, materials production--are not done on single-copy plasmids. It would be best if the standards system could account for activity at both high and single copy, though I recognize this will be difficult to do.
• Reshma 17:57, 17 August 2006 (EDT): Most people do tend to do things at high copy. But I am not sure that is a reason to make the standard characterization vector high copy. I figure that eventually everyone will be assembling their systems at medium/high copy and then characterizing/running their systems at single copy. Hence I would favor a single copy characterization standard. In some sense, I am trying to pick the characterization standard based on what I anticipate people are going to do in the future rather than what they are doing now ... which might be a very bad idea. :) It seems like many of the things we do now (use high copy plasmids, for instance) are an artifact of what was easiest to do in the past rather than what might be required in the future.
3. You need to have the plasmid, and it really needs to have an R6K or other second origin to move the copy number back to high in a conditional manner. These plasmids are incredibly painful to work with without a means of amplification. I've attached the plasmid I've been using to do single copy with BioBricks. All the BioBrick sites are unique except PstI.
• Reshma 17:57, 17 August 2006 (EDT): My plan is to try to avoid a second conditional high copy replicon on a single copy plasmid. Instead, I include a high copy origin in the multiple cloning site. That way, isolating large amounts of the plasmid is easy. Then, when you clone in your BioBrick system for characterization, you replace the high copy origin and the entire plasmid goes to single copy for characterization work. I like this approach because you don't use up an inducible promoter. (Again, I assume you only use the single copy plasmid at the final stage, for your intact assembly. All previous clonings and assemblies are done in a medium/high copy plasmid.)

4) temp -> this one I am unsure about. We tend to run everything at 37 degrees but it might actually be preferable to run everything at 30 degrees where the cells don't grow quite as fast ... might help with load issues.

• Chris: 37 is the physiologically natural temp for these strains and is the temp everyone in biology uses unless there is a necessity to do a different temperature. The standard conditions need to reflect what people already do.
• Reshma 17:57, 17 August 2006 (EDT): Again, I feel that the standard conditions should reflect how people are going to be running their synthetic biological systems in the future. It might be that the cells and our systems are happier at 30. But I don't have a definitive reason to go for 30 rather than 37. So probably it makes sense to do 37 for now.

I think an OD of 0.5 is reasonable and don't have a particular preference on culture volume. There might be issues with the size of the culture flask, though. In an effort to make sure that oxygen isn't limiting, we've tried to standardize on a flask:culture volume ratio of 10 or higher (i.e., use at least a 500 mL flask for a 50 mL culture).

As for a reference construct, the ones you've chosen seem reasonable though I think more characterization work has been done already on B0032 and B0030? I agree regarding R0040, E0040 and B0015.

Happy to discuss/debate further. Definitely important issues.

-Reshma

### On Aug 17, 2006, at 2:28 AM, Drew Endy wrote:

Are there any OWW pages that talk about standard measurement and standard conditions? See below. At the least, I'd like to suggest a defined media et cetera. I know that we've talked about this a lot -- has any of that been captured?

Thanks! Drew

### From: "John Anderson"

Date: August 17, 2006 1:06:29 AM EDT
To: "Drew Endy"
Subject: Re: PoPS standardization

As for a standard reference construct/condition, I think it should be:

(r0040).b0034.E0040.b0015 in pSB1AK3, in DH10B, 37degree, 50 mL LB, at OD=0.5

I put r0040 in parenthesis because of the previous issue of having 5' UTR in the promoter. I might even make a family of variants of r0040 that put their transcriptional start in the biobrick linker.

Since b0034 has already been defined as "1" it seems like the place to start. GFP is the most sensitive FP, so E0040 is the logical choice, and the Tet-based promoter and b0015 seem to be everyone's go-to parts right now.

On 8/16/06, John Anderson wrote: Thanks for sending your presentation. I remembered the BBa_I7108 result, but I had interpreted that as a context effect between the junction of the RBS part and the ORF part. I now see the issue:

          38
|      31
a tctttaatttcttgTgaatttaatagatgatttccttagaaa 5'
c<     ||||||||
taga gattaaagaggagaaatactagatgcgtaaaggagaaga 3'
|      57
50


(promoter in red, transcription start in orange, GFP in green, biobrick scars in blue)

Part of the difficulty with standardizing promoter and rbs parts is that the promoters haven't been fully modularized--the 5' UTR gets pieces of the promoter with it. If the transcriptional start sites were within the BioBrick scar, this would not be a source of variability. As is, chimeric constructs aren't really separating the transcription- and translation-related functions of the device.

Unfortunately, r0040 and our synthetic constitutive promoters also share a nub of 5' UTR within the promoter part.

I wonder if it might be prudent to define a specific site within the biobrick scar as the (at least predicted) transcriptional start and emphasize that people should design promoter parts to initiate within that scar.

r0040 tctagagtccctatcagtgatagagattgacatccctatcagtgatagagatactgagcact actagt

• Reshma 17:57, 17 August 2006 (EDT): I think you're definitely right that we need to define the promoter-RBS divide more carefully. Your suggestion to choose a specific location within the BioBricks scar seems reasonable.

### On 8/16/06, Drew Endy wrote:

Chris,

Aye; sorry about my delay in responding.

We have measured context dependent variation (as you describe) with our constructs. I reported this at SB2.0. We use "functional composition" to describe this effect. The idea is summarized on slide 23 of my talk and then some early experiments are presented starting on slide 42. For example, R0053:B0030:E0040:B0015:pSB4A3 should make protein but fails to do so. PDF of slides is online here:

I appreciate the need to get started somewhere. But, I'm also pretty keen on measuring the absolute physical performance of the cell (molecules per cell) at some level and then developing relative standards on top of that. For example, you can develop mapping functions to go from a small number of q. western measurements to relative measurements on a plate reader.
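A mapping function of the kind described here could be as simple as a straight-line calibration fitted to a few constructs measured both ways. This sketch assumes the plate-reader signal is linear in protein molecules per cell over the range of interest (JCAnderson's nonlinearity caveat above would break that assumption), and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_calibration(molecules_per_cell, plate_signal):
    """Return a function converting plate-reader signal back to molecules per cell."""
    slope, intercept = fit_line(molecules_per_cell, plate_signal)
    return lambda signal: (signal - intercept) / slope


# Hypothetical calibration set: q. Western counts vs. plate-reader fluorescence/OD
western = [1000.0, 5000.0, 20000.0]
plate = [70.0, 150.0, 450.0]  # constructed as exactly 0.02 * molecules + 50
to_molecules = make_calibration(western, plate)
print(to_molecules(250.0))    # close to 10000 molecules per cell
```

Checking the residuals of such a fit over two or three reference constructs is also one way to confirm the linearity Barry asks about.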

I'm not sure what you most want to do first / are doing but let me know if you want to agree to a standard cell, plasmid, media, growth conditions, reporter construct and we'd be happy to support that 100%.

Drew

### On Aug 6, 2006, at 1:24 PM, John Anderson wrote:

My take on the issue is that we need several tiers of characterization to balance compliance and utility.

If we define the sole "standard experiment" as a specific BioBrick reporter involving Northerns/qPCR, Westerns/ELISA, Southerns, chemostats or other advanced methods, I think it would be difficult to get compliance. However, everyone should have access to a Tecan or cytometer to measure fluorescence of an RBS.GFP, RFP, or CFP.terminator reporter (such as pSB1A2-I13504 or pSB1A2-I13507). So, I think the minimal first tier of standardization is to measure the activity of basic promoter parts in some arbitrarily chosen reporter at a specific OD. We define the activity of the r0040 variant of the reporter as "1" and express the activity of other promoters relative to r0040. We just need to pick the reporter plasmid.
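The first-tier calculation can be sketched in a few lines. This is an illustration only: it assumes readings are already corrected for the media blank, that fluorescence is normalized by OD600, and that a strain carrying no fluorescent protein serves as the autofluorescence control. The readings are hypothetical.

```python
def specific_signal(fluorescence, od600):
    """Fluorescence per unit OD600 (both assumed already media-blank corrected)."""
    return fluorescence / od600


def relative_promoter_activity(sample, reference, autofluorescence):
    """Promoter activity relative to the r0040 reporter (defined as 1).

    Each argument is a (fluorescence, od600) pair. The autofluorescence control is
    assumed to be the same strain carrying no fluorescent protein.
    """
    background = specific_signal(*autofluorescence)
    return (specific_signal(*sample) - background) / (
        specific_signal(*reference) - background
    )


# Hypothetical plate-reader readings taken at OD600 = 0.5
activity = relative_promoter_activity(
    sample=(1100.0, 0.5), reference=(4100.0, 0.5), autofluorescence=(100.0, 0.5)
)
print(activity)  # 0.25
```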

The three major issues (that we know about) are junction variation, copy number variation, and growth condition variation. By junction variation, I mean that a pSB1A2-[r0040-GFP-b0015] construct makes a different amount of mRNA than does a pSB1A2-[r0040-RFP-b0015]. That could take two forms: the variation could be relative or context-dependent. So, we take a panel of constitutive promoters defined relative to Ptet=P_1.0, say P_0.1, P_0.5, and P_0.8, put them all in front of RFP and GFP, and measure fluorescence. If we measured the activities 0.1, 0.5, 0.8, and 1.0 with our standard GFP reporter and there is only relative variation, we'd observe the same values with the RFP reporter. If there is context-dependent variation, we might observe that the P_0.5 promoter fails altogether, giving a 0.0 reading. I don't know if this type of context-dependent variation even exists (I've never observed it myself), but relative variation certainly exists. Any standardization method that describes promoter activity relative to Ptet should be sufficient to take care of relative variation.
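The GFP/RFP panel experiment above can be scored by normalizing each reporter's readings to Ptet and flagging promoters whose normalized values disagree. In this minimal sketch, purely relative variation cancels on normalization, and anything left over beyond a fold threshold (1.5 here, an arbitrary choice) is treated as context-dependent. The readings are hypothetical and assumed background-corrected.

```python
def normalize(panel, reference="Ptet"):
    """Express each promoter's signal relative to Ptet measured with the same reporter."""
    return {name: value / panel[reference] for name, value in panel.items()}


def context_dependent(gfp_panel, rfp_panel, reference="Ptet", max_fold=1.5):
    """Return promoters whose Ptet-relative activity differs between reporters."""
    gfp, rfp = normalize(gfp_panel, reference), normalize(rfp_panel, reference)
    flagged = []
    for name in gfp:
        g, r = gfp[name], rfp[name]
        if g == 0 or r == 0:
            if g != r:  # one reporter reads nothing while the other does not
                flagged.append(name)
        elif max(g / r, r / g) > max_fold:
            flagged.append(name)
    return flagged


# Hypothetical readings: RFP is uniformly 30% as bright as GFP (relative variation)
gfp = {"Ptet": 1000.0, "P_0.8": 800.0, "P_0.5": 500.0, "P_0.1": 100.0}
rfp_relative = {"Ptet": 300.0, "P_0.8": 240.0, "P_0.5": 150.0, "P_0.1": 30.0}
rfp_broken = {"Ptet": 300.0, "P_0.8": 240.0, "P_0.5": 0.0, "P_0.1": 30.0}

print(context_dependent(gfp, rfp_relative))  # []
print(context_dependent(gfp, rfp_broken))    # ['P_0.5']
```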

The second issue is the copy-number variation which will result when moving the construct from pSB1A2 onto a p15A, F plasmid, or genomic replicon. It could also occur if features on a pSB1A2 plasmid change its copy number, or a different strain is used.

With respect to "simple" promoters (constitutive promoters or promoters with regulatory sequence in trans), the standardization should largely be insulated from copy number variation. For example, if you put r0040 in front of mevalonate biosynthesis genes on a single-copy replicon, it is probably still true that the amount of transcript generated is twice that of the same plasmid with a P_constitutive promoter measured as 0.5, barring context-dependent variation.

The difficulty will occur when we try to standardize parts that are not simple. Take for example I0500--the arabinose promoter, which is a basic part in the registry but is in theory a composite part/genetic circuit of promoter.rbs.araC.terminator.promoter. If you lower the copy number of the construct, the amount of AraC will be different. The transfer function of PoPS with respect to arabinose, and the uninduced and induced PoPS values relative to Ptet, may change. So, measuring the activity of a pSB1A2/GFP reporter may not describe even the relative behavior of the part. It suggests that maybe there should be a single-copy standard reporter with qPCR measurements as a second tier of standardization.

The third variation, growth-dependent, occurs when the promoter responds to the media or OD. If we define the "standard" as fluorescence at OD600=0.5 in LB, a growth-phase dependent promoter such as katE wouldn't even be on under the standard conditions. Similarly, a promoter that was only activated in minimal media would give an irrelevant value. I have no idea how this type of activity can be standardized, but it probably requires more detailed experiments as you described. Nevertheless, knowing the activity of the element in question relative to Ptet under the copy-number and growth conditions in which it is intended to be used is still useful information.

So, yeah, it's a hard problem, but I think if we pick a reporter plasmid, cell strain, and growth conditions and describe activity relative to Ptet, we'd have something which is better than nothing...so we need to arbitrarily pick a standard.

### On 8/6/06, Drew Endy wrote:

Chris,

Awesome. New parts, techniques and outright improvements and upgrades to the BioBricks framework are desperately needed. Anything that helps enable and improve the reliable physical and functional composition of standard biological parts.

As for quantifying PoPS-generating devices: we've measured the absolute physical activity of such devices by measuring the steady-state levels of DNA, RNA, and protein in a simple chemostat via traditional methods (Southern, Northern, Western). At steady state we can use a simple ODE model to estimate the absolute number of polymerase molecules initiating per second per DNA copy and also the absolute number of ribosomes initiating per RBS per mRNA copy. It's a little tedious but works OK. We've also adapted this method to work in batch culture on a PE Victor III plate reader.
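The exact ODE model isn't given here, so the sketch below assumes the simplest generic one: first-order mRNA and protein decay plus dilution at the chemostat dilution rate, with one RBS per mRNA. Setting both derivatives to zero gives closed-form estimates of PoPS and RiPS from steady-state copy numbers. The function name and all numbers are hypothetical.

```python
def steady_state_rates(dna, mrna, protein, dilution, mrna_decay, protein_decay):
    """Estimate PoPS and RiPS from steady-state copy numbers in a chemostat.

    Assumed model (a generic sketch, not necessarily the one used in the lab):
        dM/dt = PoPS * D - (mrna_decay + dilution) * M
        dP/dt = RiPS * M - (protein_decay + dilution) * P
    where D, M, P are DNA, mRNA, and protein copies per cell and all rates are 1/s.
    Setting both derivatives to zero and solving gives the expressions below.
    """
    pops = (mrna_decay + dilution) * mrna / dna         # initiations/s per DNA copy
    rips = (protein_decay + dilution) * protein / mrna  # initiations/s per mRNA copy
    return pops, rips


# Hypothetical per-cell measurements (Southern, Northern, Western) and rates
pops, rips = steady_state_rates(
    dna=10.0, mrna=20.0, protein=10000.0,
    dilution=0.0002, mrna_decay=0.0048, protein_decay=0.0,  # stable protein: dilution only
)
print(pops, rips)  # approximately 0.01 and 0.1
```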

You hit the issue on the head though re: what reporter device should be used to define the activity of a PoPS generator? Just to say the obvious, the reason the choice of reporter is an issue is that there may be a failure of functional composition created by a weird ass junction between a particular PoPS generator and the reporter device being used. We don't yet know enough to say that any one of our fluorescent protein reporter devices is the right one(s) to be using.

We could send you a bunch of our different reporter devices (GFP and RFP based) if you want to try them out? And/or, we'd be happy to try measuring any constructs you have in parallel (i.e., we could try making the same measurements in both labs and see if things agree). This would be an important first step down the road of standardization. Chris V. and I keep talking about this but nothing's ever happened (yet).

Let me know how we can help. It's very exciting that you are pushing this forward!

All best, Drew

### On Aug 6, 2006, at 11:30 AM, John Anderson wrote:

Hi Drew,

How are things? I thought you might like to know that I have become a Biobricks convert. Stay tuned for some interesting new parts and techniques.

I'm writing with respect to iGEM, though. Specifically, how to quantify PoPS generating devices. Our team has made a family of synthetic constitutive promoters that span the full physiological range of promoter activity. We want to know how to quantify their activity.

I recognize that we can only measure PoPS relative to some standard, but I was wondering if there is one chosen standard. Do you standardize it as an absolute fluorescence when the part is inserted into a specific CFP reporter? Or do we make it reporter-independent and reference activity to, say, R0040 in the same reporter context?

Thanks! Chris Anderson