20.109(S08):Start-up protein engineering (Day1)
Contrary to how it may be taught in some laboratory classrooms, the process of scientific inquiry encompasses much more than the collection and interpretation of data. A key part of the process is design – of experiments that specifically address a hypothesis and of new materials or technologies. Moreover, any design is subject to continued revision. You might redesign an experiment or tool based on your own research, or you might consult the vast body of scientific literature for other perspectives. As the old graduate student saying goes, “A month in the lab might save you a day in the library!” In other words, although the process of combing the literature can be arduous or even tedious at times, it beats wasting a month of your time repeating experiments already proven not to work or reinventing the wheel.
During this module, each of you will design and test a new version of inverse pericam (IPC). Today, we will refer to a few primary research articles in order to familiarize ourselves with this recombinant protein and its constituent parts. The fluorescent component of IPC is an enhanced yellow fluorescent protein (abbreviated EYFP), one of the many derivatives of green fluorescent protein (GFP). GFP is naturally produced by jellyfish and was cloned into other organisms in the early 1990s. It has since been exploited as a genetically encodable reporter and mutagenized to vary its excitation and emission spectra. The other key component of inverse pericam is the protein calmodulin (CaM), a natural calcium sensor that is present in all eukaryotes (and briefly reviewed here). Calmodulin has many ligands that it binds only in the presence of calcium ion, including the peptide fragment M13. This conditional specificity for M13 binding is enabled by the change in CaM’s conformation when it binds calcium.
Within inverse pericam, M13 and CaM are located at opposite ends, surrounding a permuted (i.e., rearranged) version of EYFP. In the absence of calcium, this EYFP exhibits strong fluorescence. However, when calcium is added to a solution of inverse pericam, CaM and M13 interact, disrupting the conformation and, as a result, the fluorescence of EYFP. The transition from bright to dim fluorescence occurs over a particular concentration range of calcium. Your goal today is to propose a mutation (actually, two) that will shift the concentration range over which IPC fluorescence decreases. Specifically, you will modify the calcium sensor portion of inverse pericam in a manner that is likely to increase or decrease its affinity for calcium ion.
In order to make reasonable modifications to inverse pericam, we will use several protein analysis tools. Proteins are modular materials that may be described and examined at multiple levels of a structural hierarchy (from primary to quaternary in the classical paradigm). Primary structure refers to a protein’s amino acid sequence, which might reveal a cluster of charged residues, say, or a pattern of alternating polar and nonpolar residues. One cannot predict the conformation of a protein merely from its linear sequence, however, due to rotational flexibility of bonds and non-covalent interactions between non-adjacent amino acids (as well as covalent disulfide bonds).
Physical methods used to interrogate 3D protein structure include X-ray diffraction (XRD), electron microscopy, and nuclear magnetic resonance (NMR) spectroscopy. The paper by Zhang et al. that you will refer to today describes the decoding of calmodulin’s structure using NMR, which depends on subjecting molecules to electromagnetic fields and analyzing the resulting energy absorption spectra of their nuclei. Scientists who elucidate protein structures, in addition to publishing their results, will often add them to public databases such as the Protein Data Bank (PDB). Because many proteins have structural motifs in common (e.g., alpha helices and beta sheets at the secondary level, or leucine-rich repeats at the tertiary level), which ultimately arise from their amino acid sequence, such databases can be useful for making predictions about proteins with known amino acid sequences but unknown structures. Today we will use a computer program that harnesses information in the Protein Data Bank to display interactive 3D models.
After examining both two- and three-dimensional protein information, you will propose two mutations to the wild-type inverse pericam protein, and finally design primers for incorporating these modifications at the genetic level.
Part 1: protein backbone
Perhaps nothing is so conducive to a feeling of intimate familiarity with a protein as studying it at the amino acid level (primary structure). For the first part of lab today, you will dissect a two-dimensional representation of inverse pericam into its component parts. Begin by downloading this document, which contains the DNA and amino acid sequences of inverse pericam (IPC). You will annotate these sequences to guide your design work.
- Figure 1 of the paper by Nagai et al. depicts the inverse pericam construct. Use this diagram to locate the M13 peptide in your sequence file. (Hint: the last paragraph of Nagai’s introduction states how many residues this version of M13 contains, and also cites a paper that gives the M13 amino acid sequence.) Bold the first and last codons of this component to indicate its location, and also write the location (according to residue) at the top of the page (e.g., M13 = aa10 - aa50).
- Next, look for the two parts of the permuted EYFP, keeping in mind that each component may be separated by a short linker of a few residues. You might look for the DNA or amino acid sequence of EYFP on a site such as NCBI (choose Protein or Nucleotide in the Search pull-down menu to get an amino acid sequence or a nucleotide sequence, respectively). Note that many variants of this protein exist, but they have a lot of sequence overlap. It may help you orient yourself to know that the version of EYFP in IPC was modified at residues 68 and 69 (see beginning of Materials and Methods in Nagai et al. for details). Again, bold the first and last codon of each component, and note the numerical location.
- You can use a similar method – i.e., finding sequence data online – to locate calmodulin (CaM) within inverse pericam. The CaM sequence is highly conserved across species, so a CaM sequence from almost any organism should be close enough to let you find it.
- Finally, mark the linkers in between each component in blue. As part of your homework, you will determine how these sequences compare to the linkers described in the Results section of the Nagai paper.
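If you would rather script the search than scan by eye, the lookup can be sketched in a few lines of Python. This is only an illustration and not part of the course materials: `translate` and `locate` are hypothetical helper names, and the short DNA fragment is the example primer sequence from Part 4 of this page, used as a stand-in for the full IPC sequence in your document.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon from base 1."""
    seq = "".join(dna.upper().split())
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def locate(protein, fragment):
    """Return the 1-based (first, last) residue numbers of fragment, or None."""
    start = protein.find(fragment)
    if start == -1:
        return None
    return start + 1, start + len(fragment)


# Stand-in sequence (assumption): replace with the IPC DNA from your document.
dna = "GCT GAT GGC AAT GGA ACG ATT GAC TTT CCT GAA TTT CTT ACT ATG"
protein = translate(dna)

# Stand-in query (assumption): replace with the M13, EYFP, or CaM sequence.
hit = locate(protein, "DFPEF")
print(protein)
if hit is not None:
    print("fragment = aa%d - aa%d" % hit)
```

The residue numbers are counted from the first codon of whatever sequence you paste in, which matches the "aa10 - aa50" style of annotation requested above. An exact-match search will miss components that differ by even one residue (for example, the modified EYFP), so in those cases search with a shorter fragment or fall back on an alignment tool.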
Part 2: higher-order protein features
Unless we are precocious bioengineers indeed, looking at the amino acid sequence alone is unlikely to tell us too much about the protein. We might be left wondering where the chromophore (colour-forming portion) is located in EYFP, or where the binding sites for M13 and for calcium ions are located in calmodulin. You will use a tool called Protein Explorer to examine some of these questions. Your work will also be informed by the primary literature. Let’s begin there.
- Now that you know the major components of inverse pericam, you might like to locate the notable features of each component. To help you locate calmodulin’s calcium binding sites, read the following portions of the Zhang paper, along with skimming whatever you find useful: abstract, first two paragraphs, “Linker and loop flexibility” section.
- In your IPC sequence document, mark the amino acid residues that make up the calcium-binding loops in CaM. Do they share any common features? If you find other areas of calmodulin that you may be interested in mutagenizing (e.g., hydrophobic pockets), mark these as well.
Print out your annotated document and hang on to it for reference. Now let’s put some visuals to all those letters!
- Protein Explorer is a free web-based viewer for biological molecules. To access it, open the Firefox browser and load proteinexplorer.org. Choose “Protein Explorer in Chime” and proceed to the Front Door.
- Structures are organized according to PDB (Protein Data Bank) identification codes, which may be input at the prompt in the middle of the page. Begin by looking at the molecule with PDB ID number 1CLL, which is a calcium-bound form of calmodulin. Later you will search for an example of the ligand-free form, also called apo calmodulin.
- The program will open in FirstView mode for the structure you’ve chosen (ensure that popup blockers are off if the structure fails to load). On the right is the image panel, which shows your protein along with associated water molecules and ligands. Try clicking and dragging on the rotating image to see what happens.
- Now look at the control panel on the upper left: here you can modify the image. Try removing the water molecules to get an unobstructed look at your protein’s backbone. Use the “backbone trace” link in the control panel to explore what this term means, then click the Back button to return. You can explore other definitions (e.g., for disulfide bonds) and always return by using Back.
- As you explore the features of the control panel and image panel, be sure to observe the message frame window on the lower left for any relevant information that may pop up. If you click on an atom in the image panel, its atomic identity will be displayed in the message frame, along with its encompassing amino acid residue and position.
- From the control panel, you can link to More Features/QuickView. This leads you both to detailed information about the publication upon which the model image is based, and to further options for modifying how you view the image. Try looking at the protein secondary structure, or use slab mode to investigate the amino acids buried in the core of the protein.
- Right-clicking on the image will bring up more options for viewing – for example, you can highlight specific amino acids, or change from a backbone trace to a space-filling model. Explore these features. For example, you might use colour to highlight all the acidic amino acids in calmodulin.
- Be sure to note any useful information in your notebook. You might ask:
- what method was used to elucidate the structure of this protein?
- how good is the image resolution?
- which species did this protein come from?
- when did the authors publish their results?
- what are the major components of the molecule’s secondary structure?
- what do the calcium binding loops (or other areas of interest you found) look like?
- Once you are satisfied with your understanding of calcium-bound calmodulin, bring up an apo calmodulin structure (or two) for comparison. You might find the structure directly by using PDB, or by using the NCBI Structure database. Write a few sentences in your lab notebook describing the differences between the calcium-bound and apo forms of calmodulin.
- If you have time, take a moment to look at some yellow fluorescent protein structures. We will not be making mutations to this portion of inverse pericam, because there tends to be more need for optimization of protein folding in this case. Why might this be? What are the major features of YFP?
Part 3: choice of mutation sites
You will now integrate the information you learned about calmodulin’s binding sites at the structural and residue levels. Decide on two modifications that might plausibly increase or decrease CaM’s affinity for calcium, and briefly state your hypothesis for the effect of each mutation in your notebook. To simplify your primer design, please choose a single residue to modify and design two different substitutions at that position, one for each mutant. Part of your writing assignment for this module will involve describing your design process and reasoning more fully.
Part 4: primer design for mutagenesis
In Module 1, you designed primers for creating a 32 bp deletion in a GFP plasmid. Primer design for site-directed mutagenesis is in one respect simpler than in the previous exercise: both primers will be directed at the same location on each strand, and thus will be precisely complementary. You should also heed the following design guidelines:
- The desired mutation (1-3 bp) must be present on both strands.
- The mutation should occur approximately in the middle of the sequence.
- The primer should be 25-45 bp long.
- A G/C content of > 40% is desired.
- Both primers should terminate in at least one G or C base.
- The melting temperature (Tm) should exceed 78 °C, according to Tm = 81.5 + 0.41 × (%GC) – 675/N – (%mismatch), where N is the primer length in bases and the two percentages are expressed as whole numbers.
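These guidelines are easy to check mechanically. The following Python sketch is one possible way to do so, under stated assumptions: the function names are hypothetical, percentages are rounded to whole numbers before use, a G/C content of exactly 40% is treated as acceptable, and "terminate in G or C" is interpreted as applying to both ends of the forward primer (which, because the two primers are exact complements, also covers both ends of the reverse primer).

```python
def clean(primer):
    """Upper-case the primer and strip any spaces between codons."""
    return "".join(primer.upper().split())


def gc_percent(primer):
    """Percentage of bases in the primer that are G or C."""
    seq = clean(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temp(primer, mismatched_bases):
    """Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch, with whole-number percentages."""
    seq = clean(primer)
    n = len(seq)
    pct_gc = round(gc_percent(seq))
    pct_mismatch = round(100.0 * mismatched_bases / n)
    return 81.5 + 0.41 * pct_gc - 675.0 / n - pct_mismatch


def check_primer(primer, mismatched_bases):
    """Return a pass/fail flag for each design guideline listed above."""
    seq = clean(primer)
    return {
        "length 25-45 bases": 25 <= len(seq) <= 45,
        "G/C content at least 40%": round(gc_percent(seq)) >= 40,  # assumption: 40% passes
        "G or C at both ends": seq[0] in "GC" and seq[-1] in "GC",
        "Tm above 78 C": melting_temp(seq, mismatched_bases) > 78,
    }


# The example forward primer given later in this part, with one changed base.
example = "GCT GAT GGC AAT GGA ACG ATT GAC TTT CCT GAA TTT CTT ACT ATG"
print("Tm = %.1f" % melting_temp(example, 1))
for rule, passed in check_primer(example, 1).items():
    print(rule, "->", "ok" if passed else "FAIL")
```

For the example primer the formula works out to 81.5 + 16.4 – 15 – 2 = 80.9 °C, in line with the "almost 81" quoted for it. The check does not test whether the mutation sits near the middle of the primer; that still needs a look by eye.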
Once you have chosen your mutation site, you might begin by writing out the codon of interest. Next consider the options for coding a new amino acid. Finally, you might pick the codon that incurs the fewest point mutations. An example is outlined below.
Residue 64 of calmodulin is tyrosine, encoded by TAC. To change it to aspartate, one might choose GAT or GAC (GAU or GAC in the mRNA). Since GAC requires only one point mutation (rather than two for GAT), we choose this codon. Ultimately, your primer might look like the following, which has a Tm of almost 81 °C and a G/C content of ~40%.
5’ GCT GAT GGC AAT GGA ACG ATT GAC TTT CCT GAA TTT CTT ACT ATG 3’
3’ CGA CTA CCG TTA CCT TGC TAA CTG AAA GGA CTT AAA GAA TGA TAC 5’
When you are done, remember to reverse your second primer so it reads 5’-3’ (for ease of ordering).
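Both the codon choice and the final reversal can be double-checked with a short script. The sketch below is illustrative only: `closest_codons` and `reverse_complement` are hypothetical helper names, the codon table is the standard genetic code written in DNA letters (T rather than U), and the sequences are the example primers shown above.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def point_changes(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def closest_codons(original_codon, target_residue):
    """Codons for target_residue (one-letter code), fewest base changes first."""
    candidates = [c for c, aa in CODON_TABLE.items() if aa == target_residue]
    return sorted(candidates, key=lambda c: point_changes(original_codon, c))


def reverse_complement(primer):
    """Complementary strand of primer, written 5'-3'."""
    seq = "".join(primer.upper().split())
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Codon choice for the example: TAC changed to a codon for aspartate ("D").
for codon in closest_codons("TAC", "D"):
    print(codon, point_changes("TAC", codon), "change(s)")

# Second primer, ready for ordering (reads 5'-3').
forward = "GCT GAT GGC AAT GGA ACG ATT GAC TTT CCT GAA TTT CTT ACT ATG"
print(reverse_complement(forward))
```

The first candidate returned for the TAC example is GAC, which differs from TAC at a single base. Taking the reverse complement of the top strand gives the same result as reading the bottom strand of the example backwards, so either route should yield the sequence you submit for the second primer.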
For next time
- Look ahead to Part 4 of our next lab class. Carefully read the two papers and come prepared to discuss them. There is no need to hand in written answers to any of the questions - they are simply meant to guide your reading.
- So long as you're doing a close reading of the paper about inverse pericam, you may as well complete your analysis of the components of IPC. How do the linkers in your annotated IPC sequence document compare to those reported by Nagai et al. in the Results section of their paper?