User:Janet B. Matsen/never forget

Math/Stats

Significant Figures

If you have a measurement like 12.34, and you are showing the error, would you write
- (a) 12.34 ± 1.56
- (b) or do you include the number of significant figures in your measurement: 12.34 ± 1.567?
- Mary: I would go with (a). That is certainly how all of the journals I publish in handle this.
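Option (a) can be applied mechanically when formatting results. The sketch below is only an illustration: `format_with_error` is a hypothetical helper name, and it assumes the measurement is passed as a string so that its decimal places are preserved. Note that it rounds the error rather than truncating it.

```python
from decimal import Decimal

def format_with_error(value: str, error: float) -> str:
    """Report the error to the same number of decimal places as the measurement.

    `value` is a string (e.g. "12.34") so the number of decimal places
    written by the experimenter is not lost to float conversion.
    """
    # Decimal("12.34") has exponent -2, i.e. two decimal places.
    places = max(0, -Decimal(value).as_tuple().exponent)
    return f"{float(value):.{places}f} ± {error:.{places}f}"

print(format_with_error("12.34", 1.567))
```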

If you are using a lac promoter and you add IPTG, aren't you also inducing expression of LacZYA?
- I think so. But if you have a high copy plasmid, you are making much much more of your enzyme.

Standard deviation vs. standard error

useful summary: Standard deviations and standard errors
standard error is standard deviation divided by the square root of the sample size.

	Standard Deviation	Standard Error
Formula		(standard deviation)/sqrt(n)
Scale with Sampling	Doesn't change much with increased sampling	Scales by 1/(sqrt(n)), where n is the sample size. (Always smaller than standard deviation!)
Default Use	Describe how scattered samples are	Describe uncertainty of the estimate of the mean. Calculate confidence interval & P-value in many cases.
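A minimal sketch of the relationship in the table, using only the Python standard library. The sample values are made up for illustration, and `sd_and_se` is a hypothetical helper name.

```python
import math
import statistics

def sd_and_se(samples):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(samples)        # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(len(samples))     # SE = SD / sqrt(n)
    return sd, se

# Hypothetical measurements, for illustration only.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, se = sd_and_se(samples)
print(f"SD = {sd:.3f}, SE = {se:.3f}")
```

Collecting more samples should leave the SD roughly unchanged while the SE shrinks as 1/sqrt(n).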

Plotting

Never plot absolute values

If you want to change a sign from negative to positive for a plot, plot the negative of the number, not the absolute value.
- In 5/2014 I was measuring the activity of some low-activity enzymes and plotted abs(Vmax), which led some small numbers with the opposite sign to appear a little more promising than they were.
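A small sketch of the difference, with made-up Vmax values (the numbers are hypothetical, not from the 5/2014 experiment):

```python
# Hypothetical Vmax values; most are negative by convention, one is a small positive.
vmax = [-3.2, -2.9, -0.1, 0.2, -3.0]

flipped = [-v for v in vmax]        # sign flip: keeps opposite-sign values visible
absolute = [abs(v) for v in vmax]   # abs(): silently folds them onto the same side

# Indices where abs() hides a value that had the opposite sign.
hidden = [i for i in range(len(vmax)) if flipped[i] != absolute[i]]
print(flipped)
print(absolute)
print(hidden)
```

Plotting `flipped` shows the odd sample as a small negative bar; plotting `absolute` makes it look like weak activity.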

Chemicals

NH3 = ammonia, NO2 = nitrite, NO3 = nitrate
- "The preferred nitrogen electron acceptors in order of most to least thermodynamically favorable include nitrate (NO3−), nitrite (NO2−), nitric oxide (NO), nitrous oxide (N2O) finally resulting in the production of dinitrogen (N2) completing the nitrogen cycle." Wikipedia
- Nitrifying bacteria: ammonia --> nitrite OR nitrite --> nitrate
- Denitrification = reduction of nitrates back into the largely inert nitrogen gas (N2): NO3− → NO2− → NO + N2O → N2 (g)
  - performed by bacterial species such as Pseudomonas and Clostridium in anaerobic conditions: use the nitrate as an electron acceptor in the place of oxygen during respiration.
- Some types eat ammonia; they produce nitrites as waste. Others eat nitrites; they produce nitrates as waste.
DTT is a reducing agent which will help keep the Cys thiol groups from getting oxidized
Tris-HCl and Tris are NOT the same! (2012/11/14)
Many commercial enzyme preps have ammonium sulfate in them because the enzymes are purified by ammonium sulfate precipitation (2014/1/11)

Biotechnology

Why are viruses a big problem for bacterial biotech, but not for eukaryotic biotech?

Ans (from dialogue with Mary Lidstrom 8/2012)

Phage can be encapsulated in tough capsids. However, viruses infecting multicellular organisms evade the immune system by coating themselves with a portion of the cell membrane, but that makes them fragile outside a host.
Yeast viruses seem to be transmitted through the yeast mating process and don't infect the host cells from the outside. I don't know why that would be, except perhaps because it is a safe mechanism for transfer and yeast do mate at a high frequency. That means of course that even if an infection occurs, sterilization to get rid of the infected yeast gets rid of the virus also. That's much less problematic than having to scrub all the ventilation systems, walls, etc. to get rid of phage.

Metrics of yield/success

Yield (%), titer (g/L), productivity (g/L/hr) - Eli Groban of Intrexon 2/2013
- byproduct profile & strain robustness - "From the first drop to the first truckload: commercialization of microbial processes for renewable chemicals" by Stephen Van Dien 2013
Also in Van Dien paper:
- "For production of basic and intermediate chemicals with selling price near $1.00/lb or lower, the raw material cost of sugar represents a significant fraction of the value of the product even at near theoretical yield. Thus target yields generally need to be at least 80% of theoretical yield even be considered for commercialization. In contrast, with a few notable exceptions cells do not direct a high percentage of carbon flux to these compounds, if any at all. Therefore, the metabolic engineer is faced with the challenge of redirecting a major portion of flux away from biomass production and natural fermentation products, and toward the product of interest."
- "As a rule of thumb, 50 g/L is the minimum acceptable titer for any basic or intermediate chemical, and may be higher in many cases."
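The three metrics and the two rules of thumb quoted from Van Dien can be computed from basic fermentation numbers. This is a sketch only: `fermentation_metrics` is a hypothetical helper, and the run values below are invented for illustration.

```python
def fermentation_metrics(product_g, substrate_g, volume_l, hours,
                         theoretical_yield_g_per_g):
    """Compute titer, productivity, and yield for one fermentation run."""
    titer = product_g / volume_l                      # g/L
    productivity = titer / hours                      # g/L/hr
    yield_g_per_g = product_g / substrate_g           # g product per g substrate
    percent_of_theoretical = 100 * yield_g_per_g / theoretical_yield_g_per_g
    return {
        "titer_g_per_l": titer,
        "productivity_g_per_l_hr": productivity,
        "percent_of_theoretical_yield": percent_of_theoretical,
    }

# Hypothetical run: 60 g product from 150 g sugar in 1 L over 48 hr,
# with an assumed theoretical yield of 0.5 g/g.
m = fermentation_metrics(60.0, 150.0, 1.0, 48.0, 0.5)
meets_rules_of_thumb = (m["percent_of_theoretical_yield"] >= 80
                        and m["titer_g_per_l"] >= 50)
print(m, meets_rules_of_thumb)
```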

Maximum Theoretical Yield Calculations

Options:
- Use FBA
  - example: "Metabolic flux analysis is used to determine the relative flux of all the reactions in a cell while satisfying all cellular constraints of mass, energy, and redox. When the uptake rates are experimentally determined, metabolic flux analysis can be used to determine the maximum production rate, and subsequently yield, of the product of interest."
- Use chemical formula of dry biomass and the desired product
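For the chemical-formula option, one common back-of-the-envelope approach is an electron (degree of reduction) balance. The sketch below assumes C/H/O-only compounds, no carbon diverted to biomass, and no external electron acceptor, so it gives an upper bound rather than a pathway-specific yield. The function names are hypothetical, and it does not include the biomass formula mentioned above.

```python
def degree_of_reduction(c, h, o):
    """Available electrons per mole for a C/H/O compound (C=+4, H=+1, O=-2)."""
    return 4 * c + h - 2 * o

def max_theoretical_yield(substrate, product):
    """Upper bound on mol product per mol substrate.

    substrate, product: (C, H, O) atom counts.
    The bound is the smaller of the electron limit and the carbon limit.
    """
    electron_limit = degree_of_reduction(*substrate) / degree_of_reduction(*product)
    carbon_limit = substrate[0] / product[0]
    return min(electron_limit, carbon_limit)

glucose = (6, 12, 6)   # C6H12O6, 180.16 g/mol
ethanol = (2, 6, 1)    # C2H6O, 46.07 g/mol

mol_yield = max_theoretical_yield(glucose, ethanol)   # mol ethanol / mol glucose
mass_yield = mol_yield * 46.07 / 180.16               # g ethanol / g glucose
print(mol_yield, round(mass_yield, 3))
```

For glucose to ethanol this reproduces the familiar 2 mol/mol (about 0.51 g/g). FBA would refine the number by accounting for ATP, cofactors, and biomass.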

Chemostat versus batch process

You can't measure μ_max in a chemostat the way you can in a batch process, because a chemostat can't be run at growth rates as high as μ_max. At steady state the growth rate equals the dilution rate, so the dilution rate has to stay below μ_max or by definition washout would occur.
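The washout constraint can be illustrated with the Monod model, which is an assumption here and is not stated in these notes. The parameter values are hypothetical, and `chemostat_steady_state` is an illustrative helper name.

```python
def chemostat_steady_state(D, mu_max, Ks, S0):
    """Steady-state residual substrate in a Monod chemostat, or None on washout.

    D      dilution rate (1/hr); at steady state the growth rate equals D
    mu_max maximum specific growth rate (1/hr)
    Ks     half-saturation constant (g/L)
    S0     substrate concentration in the feed (g/L)
    """
    # Highest growth rate the cells can reach on the feed; always below mu_max.
    d_crit = mu_max * S0 / (Ks + S0)
    if D >= d_crit:
        return None  # cells are diluted out faster than they can grow
    return Ks * D / (mu_max - D)

# Hypothetical parameters.
s_half = chemostat_steady_state(D=0.25, mu_max=0.5, Ks=0.1, S0=10.0)
s_wash = chemostat_steady_state(D=0.5, mu_max=0.5, Ks=0.1, S0=10.0)
print(s_half, s_wash)
```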

Desirable metabolic products

branched chain alcohols
- "branched" is key because these compounds' properties include...

Why are yeast a favorite industrial organism?

Oxygen transfer is the biggest challenge in scaling up industrial processes, so you want an organism that can grow anaerobically. You don't, however, want an obligate anaerobe because it is so much harder to work with and to grow starter cultures of.
- Example: Clostridium naturally produces butanol, but the fact that it is an obligate anaerobe motivated scientists to move its butanol production pathway into E. coli.

Ingredients for a good metabolic engineering project

Intermediates and products can be purchased & measured experimentally
System can be taken apart when it doesn't work
- Enzymes can be assayed individually. Ideally with a spectrophotometer-based assay or HPLC, not more laborious & expensive mass spectrometry.
System is composed of well characterized parts
- Basic science has been done on the enzymes. Post-translational regulation is understood. How it interacts with other pathways and regulation is understood.
Screens/selections are available
- And are powerful enough to overcome issues with false positives
Organism is good to work with
- Plasmids can be used. Ideally multiple origins are available, like with E. coli
- Knockouts can be made
- Genome is sequenced. Ideally transcriptome is, too.
- Grows fast on liquids & plates & is easily centrifuged (for assays!).

Why microbial catalysts are not as malleable as those in synthetic organic chemistry

Metabolic engineers must weigh many trade-offs in the development of microbial catalysts: (from Keasling PNAS talk)

cost and availability of starting materials (e.g., carbon substrates)
metabolic route and corresponding genes encoding the enzymes in the pathway to produce the desired product
most appropriate microbial host
robust and responsive genetic control system for the desired pathways and chosen host
methods for debugging and debottlenecking the constructed pathway
ways to maximize yields, titers, and productivities

Unfortunately, these design decisions cannot be made independently of each other: Genes cannot be expressed, nor will the resulting enzymes function, in every host; products or metabolic intermediates may be toxic to one host but not another host; different hosts have different levels of sophistication of genetic tools available; and processing conditions (e.g., growth, production, product separation and purification) are not compatible with all hosts.

Choosing a host

(from Keasling PNAS talk) Some of the most important qualities one must consider when choosing a host are whether:

the desired metabolic pathway exists or can be reconstituted in that host
if the host can survive (and thrive) under the desired process conditions (e.g., ambient versus extremes of temperature, pH, ionic strength, etc.)
if the host is genetically stable (both with the introduced pathway and not susceptible to phage attack)
if good genetic tools are available to manipulate the host.

Protein Modeling/Fusion Proteins

Can you use a protein structure from a database to predict whether you can use that protein as part of a fusion protein?

Ans: from 2012_09_11 summary from e-mail from Justin about visualizing FLS & ADH

You may have the structure file, but you are going to run into two issues:
- "First, the floppy ends of the proteins are not in the structures. At the N and C terminal, you will often see a discrepancy between the actual sequence and what you see in a crystal. This is because you can only see parts of the protein that are essentially fixed in space and not moving around much. So where exactly to overlay the proteins won't be totally clear.
- Second, as long as the ends are floppy you really won't be able to predict how it looks and the units interact. The only reason to look at a structure for fusions is to know if the ends are highly structured and an integral part of the protein (i.e. in a sheet or helix). If this is the case it is best to avoid fusing those sections. However, if the ends are "floppy" or "unstructured" (as in not a helix or sheet, and not an integral part of the protein... this is for sure the case if you can't see the actual ends in the structure) you can most likely make a functional fusion. Now... there may be empirical differences between how well an N vs C terminal function, but there will really be no sound way to predict that. This is generally speaking; there are special cases, such as a protein that ends in a helix and a fusion made starting at a helix so the two proteins are very specifically oriented relative to each other. But these cases are extremely rare."

Basic Biology

How do cells incorporate N from minimal media?

reduced nitrogen (NH4+) is assimilated into glutamate and glutamine, then into other nitrogen-containing biomolecules.
- assimilation of NH4+ into glutamate requires two reactions:
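  - glutamine synthetase catalyzes the reaction of glutamate and NH4+ to yield glutamine (in 2 steps)
  - glutamate synthase then transfers the amide nitrogen of glutamine to α-ketoglutarate, yielding two molecules of glutamate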
- Glutamate is the source of amino groups for most other amino acids, through transamination reactions.
If NO3- (nitrate) is provided, they must expend NADH to reduce it to ammonium before it can be assimilated.
- Some organisms prefer to do this rather than start with ammonium. See MM1/MM2/MM3/HY

Bacterial Nomenclature

From the journal Molecular and Cellular Biology: instructions to authors
capital 1st letter is for protein product of a gene. + or - mean whether a strain has the WT phenotype
- (i) Phenotype designations must be employed when mutant loci have not been identified or mapped. They can also be used to identify the protein product of a gene, e.g., the OmpA protein. Phenotype designations generally consist of three-letter symbols; these are not italicized, and the first letter of the symbol is capitalized (e.g., Pol). Wild-type characteristics can be designated with a superscript plus (Pol+), and, when necessary for clarity, negative superscripts (Pol–) can be used to designate mutant characteristics. Lowercase superscript letters may be used to further delineate phenotypes (e.g., Strr for streptomycin resistance). Phenotype designations should be defined.
Use lowercase italic letters for genes. You can indicate promoter and terminator sites. The + and - applies at the gene level, too.
- (ii) Genotype designations are also indicated by a three-letter symbol. In contrast to phenotype designations, genotype designations are lowercase italic (e.g., ara his rps). If several loci govern related functions, these are distinguished by an italicized capital letter following the locus symbol (e.g., araA araB). Mutation sites are distinguished by placing serial isolation numbers (allele numbers) after the locus symbol (e.g., ara-1 hisB5). Promoter, terminator, and operator sites should be indicated as described by Bachmann and Low (Microbiol. Rev. 44:1-56, 1980): e.g., lacZp, lacAt, and lacZo. It is essential in papers reporting the isolation of new mutants that allele numbers be given to the mutations. For Escherichia coli, there is a registry of such numbers: E. coli Genetic Stock Center, (http://cgsc.biology.yale.edu/). For the genus Salmonella, the registry is Salmonella Genetic Stock Centre (http://people.ucalgary.ca/~Kesander).
(?) Not sure about 2nd sentence...
- (iii) Wild-type alleles are indicated with a superscript plus (ara+ his+). A superscript minus is not used to indicate a mutant locus; thus, one refers to an ara mutant rather than an ara– strain.
There is special notation for the kind of mutation
- (iv) The use of superscripts with genotypes (other than + to indicate wild-type alleles) should be avoided. Designations indicating amber mutations (Am), temperature-sensitive mutations (Ts), constitutive mutations (Con), cold-sensitive mutations (Cs), and production of a hybrid protein (Hyb) should follow the allele number [e.g., araA230(Am) hisD21(Ts)]. All other such designations of phenotype must be defined at the first occurrence. If superscripts must be used, they must be approved by the editor and defined at the first occurrence in the text.
Subscripts/superscripts distinguish between genetic elements with the same name
- Subscripts may be used in two situations. Subscripts may be used to distinguish between genes (having the same name) from different organisms or strains; e.g., his_(E. coli) or his_(K-12) for the his gene of E. coli or strain K-12, respectively, may be used to distinguish this gene from the his gene in another species or strain. An abbreviation may also be used if it is explained. Similarly, a subscript can also be used to distinguish between genetic elements that have the same name. For example, the promoters of the gln operon can be designated glnAp1 and glnAp2.
Don't refer to a strain by its mutation.
- (v) Avoid the use of a genotype as a name (e.g., "subsequent use of leuC6 for transduction"). If a strain designation has not been chosen, select an appropriate word combination (e.g., "either strain PA3092 or another strain containing the leuC6 mutation").

Gram + bacteria

Don't have a periplasm. There are gram + methylotrophs, so in these methylotrophs all metabolic reactions occur in the cytoplasm.
Are less easy to work with (Mary 10/18/2013)
- usually use replicons isolated from gram + bacteria (e.g. Mycoplasma)
Bacillus subtilis is gram + and came out around when E. coli did.
Protoplast fusion can be done.
Many are obligate anaerobes. e.g. Clostridium

Methylotrophy

Methylotrophic yeasts

There are methylotrophic yeasts that oxidize methanol with the enzyme alcohol oxidase, in a reaction that produces formaldehyde and hydrogen peroxide. To avoid damage to the cell by these very reactive compounds, this reaction occurs in peroxisomes, where catalase decomposes the hydrogen peroxide into water and oxygen. The formaldehyde produced can be assimilated by the dihydroxyacetone cycle or further oxidized to CO₂ for energy using a glutathione-dependent pathway. (summary)

Sounds pretty inefficient for metabolic engineering!

Fluorescent In Situ Hybridization (FISH)

use formaldehyde to fix cells before applying probes. This makes cells stick together, creating the illusion that consortia are tighter than they may be in nature.

Methanotrophy

All pure methanotroph cultures are aerobic. ?? True of methylotrophs, too??
Methanotrophs are found at the interface between anoxic and oxygenated environments. Recent evidence points toward the possibility that they ferment some of the formaldehyde produced from methane. The cultures cannot grow in strictly anaerobic conditions.
?? For a long time, it was thought that the EDD pathway was used to ____, but recent evidence points toward use of the EMP pathway. The EMP pathway is more efficient.
All genes for mixed acid fermentation are present.
5GB1 = Type 1 = RuMP cycle (superior for metabolic engineering energetics)
(virtually?) all methanotrophs that have been isolated make their own methanobactin or methanobactin-like molecule to scavenge copper. Is this a required function? Can organisms in nature rely on nearby methanotrophs to synthesize the expensive compound for them?

E. coli groups (B, K, etc.)

"Population genetic studies based on both multi-locus enzyme electrophoresis [11–13] and various DNA markers [14–18] have identified four major phylogenetic groups (A, B1, D and B2) and a potential fifth group (E) among E. coli strains. Strains of these groups differ in their phenotypic characteristics, including the ability to use certain sugars, antibiotic resistance profiles and growth rate–temperature relationships [19]. The distribution (presence/absence) of a range of virulence factors thought to be involved in the ability of a strain to cause diverse diseases also varies among strains of these phylogenetic groups [20–22], indicating a role of the genetic background in the expression of virulence [23]. Consequently, these groups are differently associated with certain ecological niches, life-history characteristics and propensity to cause disease. For example, group B2 and D strains are less frequently isolated from the environment [24], but more frequently recovered from extra-intestinal body sites [23]. While B2 strains represent 30 to 50% of the strains isolated from the faeces of healthy humans living in industrialized countries, they account for less than 5% in French Guyana Amerindians [25–26]." (source)
"Escherichia coli K–12 and B strains are among the most frequently used bacterial hosts for production of recombinant proteins on an industrial scale." (source)

E. coli vs. Salmonella

Mila claimed they are essentially the same organism.
- The Genome of Salmonella enterica Serovar Typhi, 2007
  - "If the DNA sequences of genes in the core genome of different enteric bacteria are compared, E. coli and S. enterica are found to differ by ∼10%, and Salmonella serovars within S. enterica differ by ∼1%. This 10% divergence between the core sequences of E. coli and S. enterica most likely represents evolutionary drift over the ∼100 million years since the 2 species separated from a common ancestor."

Basic Metabolic Engineering

When you knock out one gene, often other genes will compensate.
- Example: Aaron knocked out a key glycogen gene in 5GB1, but it wasn't clear that the exopolysaccharide production was reduced. Mila thinks that the metabolism of the RuMP cycle produces an excess of energy that requires dumping of energy into such a compound.
Whenever you do selections for an enzyme variant, you usually have to knock out sinks for the metabolic intermediates used in the selection. There may be multiple because genes can compensate when one is removed as mentioned above.
When doing analytics to determine the concentration of a cellular component that requires extraction (e.g. glycogen), it is often best to include an internal standard to control for differences in extraction efficiency. The compound you add should have a similar extraction efficiency to that of your compound.
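A minimal sketch of the internal-standard idea, assuming the simple ratio method and that the standard and the analyte are lost in the same proportion during extraction. `corrected_amount` and all signal values are hypothetical.

```python
def corrected_amount(analyte_signal, standard_signal, standard_added,
                     response_factor=1.0):
    """Estimate analyte amount from the analyte/internal-standard signal ratio.

    standard_added  known amount of internal standard spiked in before extraction
    response_factor detector response of analyte relative to standard (assumed 1.0)
    """
    return (analyte_signal / standard_signal) * standard_added / response_factor

# Two hypothetical extractions of the same sample: the second recovered
# only half as much material, but the ratio to the standard is unchanged.
good_extraction = corrected_amount(800.0, 400.0, standard_added=10.0)
poor_extraction = corrected_amount(400.0, 200.0, standard_added=10.0)
print(good_extraction, poor_extraction)
```

Both extractions give the same estimate, which is the point of spiking the standard in before extraction rather than after.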

Mass Spec

Why is it important to correct for natural abundance & pool size in MS experiments? (incomplete)

If you see an increase/decrease in the 13C metabolite pool over time, it is possible (or likely!) that it is due to the total pool size changing. This is called natural abundance correction by Amanda.
See Correction of 13C mass isotopomer distributions for natural stable isotope abundance in Mendeley library
- O and H don't have large natural abundances, but C does:
  - "For elements having low natural abundance of heavy isotopes, such as hydrogen or oxygen, the difference between natural abundance MIDs of unlabeled and labeled fragments is negligible and consideration of natural abundance MID skew is unnecessary. For elements with non-negligible abundances of naturally occurring stable isotopes, such as carbon, the natural abundance MIDs of labeled isotopomers differ significantly from that of the unlabeled ion."
- They say the best way to correct for natural abundance is to make standards of labeled and unlabeled versions of the compounds you are detecting. However, this is costly and often impossible, so they propose a mathematical method to estimate a correction.
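To get a feel for the size of the effect, here is a simplified carbon-only sketch. It is not the method from the paper: it assumes a 13C natural abundance of about 1.07%, ignores heavy isotopes of other elements (H, O, N, and any derivatization atoms), and assumes a perfectly pure tracer. The function names are hypothetical.

```python
from math import comb

P13C = 0.0107  # approximate natural abundance of 13C (assumed value)

def natural_mid(n_carbons, p=P13C):
    """Mass isotopomer distribution (M+0..M+n) of an unlabeled n-carbon ion."""
    return [comb(n_carbons, k) * p**k * (1 - p)**(n_carbons - k)
            for k in range(n_carbons + 1)]

def correct_mid(measured, p=P13C):
    """Remove natural-abundance 13C from a measured MID (carbon only).

    A species with j tracer-derived 13C atoms still has n - j carbons at
    natural abundance, so part of its signal spills into M+j+1, M+j+2, ...
    The resulting system is triangular and is solved by forward substitution.
    """
    n = len(measured) - 1
    corrected = []
    for m in range(n + 1):
        spill = sum(corrected[j] * comb(n - j, m - j) * p**(m - j) * (1 - p)**(n - m)
                    for j in range(m))
        corrected.append((measured[m] - spill) / (1 - p)**(n - m))
    total = sum(corrected)
    return [x / total for x in corrected]

unlabeled = natural_mid(3)          # what an unlabeled 3-carbon ion looks like
recovered = correct_mid(unlabeled)  # should come back as ~[1, 0, 0, 0]
print([round(x, 4) for x in unlabeled])
print([round(abs(x), 4) for x in recovered])
```

Even with no tracer, a 3-carbon ion shows roughly 3% of its signal at M+1, which is why uncorrected data can look slightly labeled.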

What can internal standard correct for?

ion suppression
metabolite loss/degradation

Types of internal standards

¹³C, deuterated (²H)

How does ion pairing harm a MS system?

link: Chromatographers frequently have discussed the effect of the ion-pairing reagents on the stationary phase for columns used for ion-pairing chromatography. Apparently, ion-pairing reagents such as octanesulfonic acid (used for cations) and tetraalkylammonium bromide (used for anions) strongly sorb on the surfaces of bonded-silica columns at certain concentrations of organic modifier. The columns become contaminated and cannot be regenerated to their original state, and the story goes that any column used for ion-pairing work should be dedicated to that technique and never used again for regular reversed-phase chromatography. Bidlingmeyer (9) disagrees with this generality and feels that the aggressive pH values used for the ion-pairing coupling actually can change the nature of some columns by either hydrolysis of the bonded phase or endcapping silane under acidic conditions (pH 1–3) or by silica dissolution at higher pH values (pH 7–8).

Our FLS pathway

Alternate enzyme options

ACS

E. coli ACS is a bifunctional protein, adhE. We used to use it, but it had stability issues.
- Amanda suggested that we used only part of the enzyme, and that this is what caused the stability issues.

ADH

There are several acetaldehyde dehydrogenases.
- Some are from E. coli metabolism.
  - Ethanol is produced as an anaerobic fermentation product. E. coli sometimes wants to utilize this carbon and energy source once oxygen becomes available. To utilize ethanol, it is converted to acetaldehyde, then acetaldehyde is converted to acetyl-CoA by an ADH. The enzyme in E. coli does both reactions.

DHAK

E. coli DHAK is PEP linked