20.109(S07):Start-up genome engineering

Introduction
How many genetically-encoded creatures exist in every milliliter of sea water? 100? 1000? More? It turns out that bacteria are by far the best represented life form, numbering up to a million cells/ml. If each cell is assumed to harbor the DNA content of pedestrian E. coli MG1655, then that means on the order of 10^12 base pairs of DNA/ml. This thriving gene pool is even more remarkable in light of the fact that each ml of sea water contains approximately 10^10 viruses that infect bacteria, aka bacteriophages or “phages” for short. These destroy half the world's bacterial population every 48 hours. Given the huge number of bacteriophages that exist, it's probably not surprising that most of the earth’s bacteriophages are completely uncharacterized, though massive genome sequencing efforts are underway.
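
The DNA estimate above is simple multiplication, and a minimal sketch makes the inputs explicit. The genome size used here (about 4.6 million base pairs for E. coli MG1655) is an assumption supplied for illustration, as are the function and constant names.

```python
# Back-of-the-envelope estimate of bacterial DNA per milliliter of sea water.
CELLS_PER_ML = 1e6   # upper estimate quoted in the text
GENOME_BP = 4.6e6    # assumed approximate size of the E. coli MG1655 genome


def dna_bp_per_ml(cells_per_ml: float = CELLS_PER_ML,
                  genome_bp: float = GENOME_BP) -> float:
    """Total base pairs per ml if every cell carried one genome of this size."""
    return cells_per_ml * genome_bp


if __name__ == "__main__":
    print(f"{dna_bp_per_ml():.1e} bp/ml")
```

Under these assumptions the result is a few times 10^12 base pairs per ml, which matches the order of magnitude given in the introduction.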

A few bacteriophages are exquisitely well characterized. Indeed, the study of phage laid much of the groundwork for our current understanding of genetics and molecular principles in biology. These principles carry over to the biology of more complex cells (Jacques Monod famously said “What is true for Escherichia coli is true for the elephant” [Francois Jacob (1988)]). M13 is a member of the filamentous phage family. It has a long (~900 nm), narrow (~20 nm) protein coat that encases a small (~6.4 kb) single stranded DNA genome. The genome encodes 11 proteins, five of which are exposed on the phage’s protein coat and six of which are involved in phage maturation inside its E. coli host.

Phage Particles
The phage coat is primarily assembled from a 50 amino acid protein called pVIII (or p8), which is sensibly enough encoded by gene VIII (or g8) in the phage genome. For a wild type M13 particle, it takes about 2700 copies of p8 to make the ~900 nm long coat. The coat's dimensions are flexible, though, and the number of p8 copies adjusts to accommodate the size of the single stranded genome it packages. For example, when the phage genome was mutated to reduce its number of DNA bases (from 6.4 kb to 221 bp), the p8 coat “shrink wrapped” around the reduced genome, decreasing the number of p8 copies to fewer than 100. Electron micrographs of the resulting “microphage” and its wild type parent are shown below (image courtesy of Esther Bullitt, Boston University School of Medicine), where the black bar in each image is 50 nm long. And what about the upper limit to the length of the phage particle? Anecdotally, viable phage seem to top out at approximately twice the natural DNA content. However, deletion of a phage protein (p3) prevents full escape from the host E. coli, and phages that are 10-20X the normal length with several copies of the phage genome can be seen shedding from the E. coli host (look at the image on the cover page to this module).
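
As a rough consistency check on the numbers above, the p8 count can be scaled with genome length. This sketch assumes the number of p8 copies is strictly proportional to the number of packaged bases, which is a simplification; the function name is hypothetical.

```python
WILD_TYPE_NT = 6400   # ~6.4 kb single stranded genome, from the text
WILD_TYPE_P8 = 2700   # ~2700 copies of p8 per wild type particle, from the text


def estimated_p8_copies(genome_nt: int) -> float:
    """Scale the wild type p8 count linearly with genome length (an assumption)."""
    return WILD_TYPE_P8 * genome_nt / WILD_TYPE_NT


if __name__ == "__main__":
    print(round(estimated_p8_copies(221)))  # the 221-base microphage
```

With this linear assumption the 221-base microphage comes out at fewer than 100 copies of p8, in line with the figure quoted above.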

There are four other proteins on the phage surface, two of which have been extensively studied. At one end of the filament are five copies of the surface exposed pIX (p9) and a more buried companion protein, pVII (p7). If p8 forms the shaft of the phage, p9 and p7 form the “blunt” end that’s seen in the micrographs. These proteins are some of the smallest known (only 33 and 32 amino acids), though some additional residues can be added to the N-terminal portion of each which are then presented on the “outside” of the phage coat (much more on this technique later). At the other end of the phage particle are five copies of the surface exposed pIII (p3) and its less exposed accessory protein, pVI (p6). These form the rounded tip of the phage and are the first proteins to interact with the E. coli host during infection. p3 is also the last point of contact with the host as new phages bud from the bacterial surface.

Phage life-cycle
The general stages to a viral life cycle are: infection, replication of the viral genome, assembly of new viral particles and then release of the progeny particles from the host. Filamentous phages use a bacterial structure known as the F pilus to infect E. coli, with the M13 p3 tip contacting the TolA protein on the bacterial pilus. The phage genome is then transferred to the cytoplasm of the bacterial cell where resident proteins convert the single stranded DNA genome to a double stranded replicative form (“RF”). This DNA then serves as a template for expression of the phage genes.

Two phage gene products play critical roles in the next stage of the phage life cycle, namely amplification of the genome. pII (aka p2) nicks the double stranded form of the genome to initiate replication of the + strand. Without p2, no replication of the phage genome can occur. Host enzymes copy the replicated + strand, resulting in more copies of double stranded phage DNA. pV (aka p5) competes with double stranded DNA formation by sequestering copies of the + stranded DNA into a protein/DNA complex destined for packaging into new phage particles. Interestingly, there is one additional phage-encoded protein, pX (p10), that is important for regulating the number of double stranded genomes in the bacterial host. Without p10 no + strands can accumulate. What's particularly interesting about p10 is that it's identical to the C-terminal portion of p2, since the gene for p10 is within the gene for p2 and the protein arises from translation initiation within gene 2. This makes the manipulation of p10 inextricably linked to manipulation of p2 (an engineering headache) but it also makes for a compact and efficient phage in nature.



Phage maturation requires the phage-encoded proteins pIV (p4), pI (p1) and its translational restart product pXI (p11). Multiple copies (on the order of 12 or 14) of p4 assemble in the outer membrane into a stable, i.e. detergent resistant, barrel-shaped structure. Similarly a handful of the p1 and p11 proteins (5 or 6 copies of each) assemble in the bacterial inner membrane, and genetic evidence suggests C-terminal portions of p1 and p11 interact with the N-terminal portion of p4 in the periplasm. Together the p1, p11, p4 complex forms channels through which mature phage are secreted from the bacterial host.

To initiate phage secretion, two of the minor phage coat proteins, p9 and p7, are thought to interact with the p5-single stranded DNA complex at a region of the DNA called the packaging sequence (aka PS). The p5 proteins covering the single stranded DNA are then replaced by p8 proteins that are embedded in the bacterial membrane and the growing phage filament is threaded through the p1, p11, p4 channel. This replacement of p5 by p8 explains the microphage data presented earlier...making very clear how the size of the phage particle is determined by the number of bases the phage packages. Once the phage DNA has been fully coated with p8, the secretion terminates by adding the p3/p6 cap, and the new phage detaches from the bacterial surface. How long does all this take? Amazingly, new M13 phage particles are secreted within 10 minutes from a newly infected host and can arise at a rate of 1000/cell within the first hour of infection. Also amazing is how the bacterial host can continue to grow and divide, allowing this process to continue indefinitely.

Protocols
As you heard about in lecture, you’ll be starting a project to study the M13 genome and in the process you’ll be learning some fundamental tools and techniques of molecular biology. One major goal we have for this module is to establish good habits for documentation of your work, in your lab notebook and on the wiki. By documenting your work according to the exercises done today, you will
 * Be better research students (in 20.109 and in any research lab you may join)
 * Be better writers since a clear record of what you’ve done will improve your data analysis
 * Be better scientists, since you’ll eventually train others to document things this way too

Today’s lab has four parts. First, you and your partner will complete the lab practical we have set up for you. Second, you and your lab partner will annotate the M13K07 genome map, identifying trouble spots suitable for re-engineering. Next you will design a pair of oligonucleotides for adding an epitope tag to one of two proteins encoded by the M13 genome. Finally, you will begin to prepare the M13 backbone needed for epitope tagging. Next time you’ll start the cloning to insert the oligonucleotide tag into the phage genome.

Part 1: lab practical
Good luck!

Part 2: M13K07 renovation
You are about to undertake an ambitious project, namely a complete renovation of the M13 genome. If you’re successful, the renovated genome will be a better substrate for further engineering. It will consist of discrete, insulated elements that might, later on, be re-used, tuned and rationally modified with ease. Refining the genome won’t be easy. Nature has optimized the existing phage in response to evolutionary pressures and it will be a lot easier for you to kill it than to improve it. But we have a few powerful resources to draw from: full sequence data is available for the M13 genome and some of its close relatives; clever genetic experiments have defined the functionally relevant parts; structural data gives us a view of the phage particle and its components. Together these provide detailed, though not complete, understanding of the workings of the phage. And the beauty of this experiment is how your genome renovation, once built and tested, will feed back and add to the existing base of knowledge.

Today you will begin your renovation with a detailed evaluation of the natural genome, identifying parts that are crying out for re-engineering. Annotate the M13K07 genome printout in the following way.
 * 1) change the PstI site (CTGCAG) that starts at position 8079 to an EcoRI site (GAATTC).
 * 2) change the T at position 1372 to an A, making a PstI site.
 * 3) box the unique BamHI site (GGATCC) that starts at position 2220.
 * 4) use the information printed at the start of your M13K07 sequence to box the start and stop codons for genes I through XI.
 * 5) summarize in your notebook any instances where modification of one gene affects the sequence of another.
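
The same kind of annotation can be rehearsed on a computer. The sketch below uses a short made-up sequence, not the M13K07 genome, and hypothetical helper names; it treats the sequence as linear, so a site spanning the origin of a circular genome would be missed.

```python
from typing import List


def find_sites(seq: str, site: str) -> List[int]:
    """Return the 1-based start positions of `site` in a linear sequence."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def substitute(seq: str, position: int, new_base: str) -> str:
    """Replace the base at a 1-based position with `new_base`."""
    return seq[:position - 1] + new_base + seq[position:]


if __name__ == "__main__":
    toy = "AAACTGCTGTTT"                # toy sequence for illustration only
    print(find_sites(toy, "CTGCAG"))    # no PstI site yet
    edited = substitute(toy, 8, "A")    # a single T to A change
    print(edited, find_sites(edited, "CTGCAG"))
```

Because the recognition sites used here are palindromic, searching one strand is enough to find them on both.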

Part 3: Digest M13K07
Restriction endonucleases, also called restriction enzymes, cut (“digest”) DNA at specific sequences of bases. The restriction enzymes are named for the prokaryotic organism from which they were isolated. For example, the restriction endonuclease EcoRI (pronounced “echo-are-one”) was originally isolated from E. coli, giving it the “Eco” part of the name. “R” indicates the particular strain of E. coli (RY13), and “I” indicates that it was the first restriction enzyme isolated from this strain.

The sequence of DNA that is bound and cleaved by an endonuclease is called the recognition sequence or restriction site. These sequences are usually four or six base pairs long and palindromic, that is, they read the same 5’ to 3’ on the top and bottom strand of DNA. For example the recognition sequence for EcoRI is

5’ GAATTC 3’
3’ CTTAAG 5’

EcoRI cuts between the G and the A on each strand, leaving single stranded overhangs called “sticky ends.” Other restriction enzymes, for example HaeIII, cut in the middle of the palindrome leaving no DNA overhang, called a “blunt end.” One of the most useful resources for restriction enzyme information is the New England Biolabs (NEB) website. Use their search engine to retrieve information about the restriction enzymes SmaI and XmaI. Be sure you are clear on how they differ before you move on to the experiment. For example, do the enzymes have the same recognition sites? Do they leave the same overhang? Will they work in the same buffer? At the same temperature? These are some of the preliminary questions you'll have to ask yourself whenever you set up a restriction digest.
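
The palindrome property of recognition sequences described above can be checked by comparing a site with its reverse complement. This is a minimal sketch with hypothetical function names; it handles only the four unambiguous bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for site in ("GAATTC", "CTGCAG", "GGATCC", "GAATTA"):
        print(site, is_palindromic(site))
```

The first three sequences are the EcoRI, PstI and BamHI sites mentioned in this protocol; the last is a non-palindromic control.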

Next, use the NEB website or the paper copy of their catalog to look up the recognition sites for the enzymes PstI and BamHI. What overhangs do they leave? What buffer is recommended? What temperature do they work best at? You will perform a restriction digest with one of these two enzymes today. Your choice will depend on which of the M13 proteins you would like to modify. A PstI digest linearizes the M13K07 genome at the gene for p8 whereas a BamHI digest cuts the gene for p3. You and your partner should decide which protein you would like to modify and choose one of the following protocols to follow:

Assemble your reactions in the following order: water, buffer, DNA and finally enzyme. You can flick the tube to mix the contents and give it a quick spin in the microfuge to bring any droplets down to the bottom of the eppendorf tube (be sure to balance your tube against another in the microfuge). You should also set up a second reaction without enzyme. Label two tubes with your team color, the name of the DNA you'll digest, and the name of the enzyme (if used). Place both tubes in the 37°C incubator while you design the tag that you will clone into this digested plasmid.

Part 4: M13K07 tag
Myc is a proto-oncogene or "cancer gene," i.e. one that exists in normal cells but that can be mutated to behave badly in cancerous ones. Because of its relevance to development and disease, myc has been extensively studied and good antibodies exist that can recognize even very small portions of the myc protein. We will be using such a portion of myc in our studies of M13 to tag one of the phage proteins, either p3 or p8. This will allow us to detect the phage-myc fusion protein with an antibody, telling us if the protein is expressed in the bacteria and on the phage. It will give us new information about the phage's tolerance for manipulation, since these tags have not been applied to these phage proteins before. It may also provide a "hook" onto the phage coat that might be useful for building things. For example, these same tagging techniques were used to add a short glutamate sequence to the M13 phage you'll use in Module 4, allowing the phage to template nanowires.

The sequence of the myc epitope, in one letter amino acid code, is EQKLISEEDL. To design the tag, you will need to:
 * reverse translate this sequence (i.e., detail the DNA sequence that could encode this protein sequence)
 * add an overhang so the tag can fit into the PstI or BamHI sites in the genes for p8 or p3
 * confirm that the reading frame for the additional amino acids is right

As helpful molecular tools you will also:
 * add a restriction site that makes no change to the epitope's protein sequence (i.e. is "silent") but that will be useful as you check your cloned products on Day 4 of this module.
 * modify the overhang so the Pst or Bam sites are not regenerated in the new plasmid. This will be useful on Day 3 of this module to rid your mix of reclosed plasmids containing no inserts.

A few helpful websites for this part of the protocol are:
 * Gene Design
 * the genetic code (table of codons and amino acids)
 * restriction enzyme finder to help with primer design

You may want to open each of them in separate browser windows. Another informative site is this one for general info about epitope tagging. '''You do not have to design tags for both p8 and p3. Just follow the directions for the one you've decided to try.'''

p8+myc
Begin by opening and printing the [[Media:Macintosh_HD-Users-nkuldell-Desktop-g8p_M13K07%2BPst.doc | p8 DNA and protein sequence file]]. Find the PstI site on this sequence. It should fall across two alanines in the protein translation. At the bottom of the printout, write the double stranded sequence for the two residues prior to the double alanine, grouping the sequence into the correct codon triplets. Add the PstI site, indicating the overhanging single stranded sequences you will have once the DNA is digested. Finally, add the double stranded sequence for the two residues after the double alanine, keeping the codon triplets.

Now you're ready to plan your insert. Plan the top strand of the insert first.

Top strand, step 1: Begin by reverse translating the myc epitope (EQKLISEEDL) at Gene Design. Be sure to use the codon bias for the organism you'll be studying. Copy the sequence you get from this program to a new MS Word document, adding spaces between the codons for each amino acid. For each step described here, add a new line of sequence to your MS Word document, with a short mention of the purpose. Ideally this record should be informative enough that someone else could understand how you designed this insert without having to ask you about it.

Top strand, step 2: Next add a PstI tail that will anneal this top strand to the overhang that's left in the plasmid you're cutting. Which end do you need to add this tail to? What sequence will you add? You should design your sequence so it presents a single stranded tail that can anneal to the backbone you are digesting but that will not regenerate the PstI site once the insert attaches to the backbone.
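
For comparison with the output of the web tool, reverse translation can also be sketched in a few lines. The codon chosen for each amino acid below is an assumption for illustration (codons that are common in E. coli); Gene Design may return different synonymous codons, and the function names are hypothetical.

```python
# Codon choices are an assumption for illustration, not the tool's output.
PREFERRED_CODON = {
    "E": "GAA", "Q": "CAG", "K": "AAA", "L": "CTG",
    "I": "ATT", "S": "AGC", "D": "GAT",
}

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_translate(peptide):
    """Return one codon per amino acid, using PREFERRED_CODON."""
    return [PREFERRED_CODON[aa] for aa in peptide.upper()]


def translate(dna):
    """Translate a DNA string (spaces allowed) in its first reading frame."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    codons = reverse_translate("EQKLISEEDL")
    print(" ".join(codons))            # codons separated by spaces
    print(translate("".join(codons)))  # translate back as a check
```

Translating the result back to protein is a quick way to confirm that no codon was mistyped.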

Top strand, step 3: Now a tricky part: are you still in frame for correct protein translation? Recall that there are three reading frames, only one of which is the correct one. If you are out of frame, how can you most innocuously restore the reading frame? You have the options of adding a few bases to the end of the strand, or deleting some without removing sequence information. Don't forget about the wobble position of the codons you're looking at, as this might allow you to shift the reading frame of the insert without changing the protein sequence.
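
The frame question in step 3 comes down to counting bases. The sketch below assumes the insert is a duplex with a four-base tail on each side, ligated into a single cut site whose enzyme leaves a four-base overhang (true of both PstI and BamHI), so that the gene gains the double stranded core plus one copy of the overhang. The function names are hypothetical.

```python
OVERHANG_LEN = 4  # PstI and BamHI both leave four-base overhangs


def net_bases_added(core_len: int, overhang_len: int = OVERHANG_LEN) -> int:
    """Bases gained by the gene: the duplex core plus one extra overhang copy."""
    return core_len + overhang_len


def bases_to_pad(core_len: int, overhang_len: int = OVERHANG_LEN) -> int:
    """Smallest number of extra bases that restores a multiple of three."""
    return (-net_bases_added(core_len, overhang_len)) % 3


if __name__ == "__main__":
    core_len = 30  # ten codons for EQKLISEEDL
    print(net_bases_added(core_len), bases_to_pad(core_len))
```

Under this assumption a 30-base core adds 34 bases, so two more bases (or one fewer) would be needed to stay in frame. Treat this as a check on your own bookkeeping rather than a substitute for writing out the junctions.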

Top strand, step 4: You will need a restriction site in your insert to determine later if your insert is present. One way to do this would be to find or place a restriction site in your insert that is not found in the rest of the M13 genome. There are also ways to change the codons in your insert without changing the amino acids. Note all the degeneracies for the codons and look for any that could introduce restriction sites. Recall that most restriction sites are palindromic so it might be easiest to look for palindromes and then check out NEB Cutter to see if any restriction enzymes cut that sequence. Once you've found a restriction site to add to your insert, check the existing M13K07 genome to see if there are other places that enzyme might cut. The teaching faculty can help you decide between different options if you are having a tough time choosing. The following links may be helpful:
 * M13K07 genome sequence and links that list
   * [[Media:M13KO7 zerocutters.txt | Zero cutters]] (restriction enzymes that cut the M13 genome 0 times)
   * [[Media:M13KO7 singlecutters.txt | Single-site cutters]]
   * [[Media:M13KO7 2cutters.txt | Double-site cutters]]
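
The search for a silent restriction site described in step 4 can be sketched as a scan over synonymous codons. The code below asks, for each position in the coding sequence, whether some choice of synonymous codons would spell out a given site. The HindIII site (AAGCTT) is used purely as an example of a six-base site; whether any enzyme is a zero cutter for M13K07 still has to be checked against the lists above. All function names are hypothetical.

```python
from typing import Dict, List

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by the amino acid they encode.
SYNONYMS: Dict[str, List[str]] = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def codon_fits(codon: str, codon_start: int, site: str, offset: int) -> bool:
    """True if `codon` agrees with `site` at every base where the two overlap."""
    for p in range(3):
        i = codon_start + p - offset  # index into the site
        if 0 <= i < len(site) and codon[p] != site[i]:
            return False
    return True


def silent_site_offsets(peptide: str, site: str) -> List[int]:
    """0-based nucleotide offsets where `site` can be encoded without
    changing the amino acid sequence of `peptide`."""
    peptide, site = peptide.upper(), site.upper()
    hits = []
    for offset in range(3 * len(peptide) - len(site) + 1):
        first = offset // 3
        last = (offset + len(site) - 1) // 3
        if all(
            any(codon_fits(c, 3 * idx, site, offset) for c in SYNONYMS[peptide[idx]])
            for idx in range(first, last + 1)
        ):
            hits.append(offset)
    return hits


if __name__ == "__main__":
    print(silent_site_offsets("EQKLISEEDL", "AAGCTT"))
```

Each codon can be checked independently because the choice of codon for one amino acid does not constrain the choice for its neighbors.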

Top strand, step 5: In your MS Word document, box the sequence of the top strand you've designed, underlining the restriction site you've added and noting the sequence you hope to keep single stranded in lower case letters.

Now you're ready to design the bottom strand.

 Bottom strand, step 1: Base pair the top strand you've designed, leaving off any sequence you want to remain single stranded in the final insert.

 Bottom strand, step 2: Add a single stranded Pst tail so there will be an overhang that can anneal to the M13 genome backbone you are digesting.

 Bottom strand, step 3: Destroy the Pst site that might be regenerated by the annealing of this oligo you are designing into the Pst backbone you are preparing. There are several ways to do this but try to take the most conservative approach possible (e.g. if you have to introduce a new amino acid into the tag, make it a neutral one). Recall that you must maintain the codon reading frame that you have established. Adjust the top strand you've designed to accommodate the design of your bottom strand.

 Bottom strand, step 4: In your MS Word document, box the sequence of the bottom strand you've designed, underlining the restriction site you've added and noting the sequence you hope to keep single stranded in lower case letters. Below the oligonucleotides, write the protein sequence you expect from the resulting DNA, and confirm that the reading frame is correct with the existing gene for p8.
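
One way to double check a finished p8 design is to build both oligos from the double stranded core and then look for a regenerated PstI site at the junctions. This sketch assumes PstI cuts CTGCA^G to leave a four-base 3' overhang (TGCA), which you should confirm in the NEB catalog; the function names and the short test cores are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PSTI_SITE = "CTGCAG"
PSTI_OVERHANG = "TGCA"  # assumed 3' overhang left by PstI (CTGCA^G)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def psti_oligos(core: str):
    """Top and bottom oligos (both written 5' to 3') whose annealed duplex
    carries a TGCA tail at each 3' end."""
    core = core.upper()
    return core + PSTI_OVERHANG, reverse_complement(core) + PSTI_OVERHANG


def ligated_top_strand(core: str) -> str:
    """Top strand across both junctions after ligation into a PstI-cut site."""
    return "CTGCA" + core.upper() + PSTI_OVERHANG + "G"


def regenerates_psti(core: str) -> bool:
    """True if a PstI site is present anywhere in the ligated product."""
    return PSTI_SITE in ligated_top_strand(core)


if __name__ == "__main__":
    for core in ("GAACAG", "AAACAT", "AAAC"):
        print(core, psti_oligos(core), regenerates_psti(core))
```

In this model a core that begins with G or ends with C rebuilds the site at one junction, which is the situation step 3 asks you to avoid.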

The following steps are not essential but are worth doing if you have some time:
 * verify the basepairing of the top and bottom strand
 * confirm that the restriction sites you've included are in fact there (try pasting your sequence into NEB cutter)
 * confirm that the restriction sites you've destroyed are gone. This may involve temporarily filling in the overhangs then using the NEB cutter site.
 * check your design against the ones listed in the Reagents list for the next lab. If the sequence you've designed is significantly different, we can order the ones you've come up with. To do this, you will have to email a copy of your MS Word document to nkuldell.

If you're done, print out three copies of your MS Word document: one for your lab notebook, one for your lab partner, and one to hand in to the teaching faculty.

p3+myc
Begin by opening and printing the [[Media:Macintosh_HD-Users-nkuldell-Desktop-g3p M13KO7translated.doc | translated M13K07 g3p]]. Find the BamHI site on this sequence. It should fall across an E D P sequence about halfway through the protein translation. At the bottom of the printout, write the double stranded sequence for this region, grouping the E D P sequence into the correct codon triplets and indicating the overhanging single stranded sequences you will have once the DNA is digested.

Now you're ready to plan your insert. Plan the top strand of the insert first.

Top strand, step 1: Begin by reverse translating the myc epitope (EQKLISEEDL) at Gene Design. Be sure to use the codon bias for the organism you'll be studying. Copy the sequence you get from this program to a new MS Word document, adding spaces between the codons for each amino acid. For each step described here, add a new line of sequence to your MS Word document, with a short mention of the purpose. Ideally this record should be informative enough that someone else could understand how you designed this insert without having to ask you about it.

Top strand, step 2: Next add a BamHI tail that will anneal this top strand to the overhang that's left in the plasmid you're cutting. Which end do you need to add this tail to? What sequence will you add? You should design your sequence so it presents a single stranded tail that can anneal to the backbone you are digesting but that will not regenerate the BamHI site once the insert attaches to the backbone.

Top strand, step 3: Now a tricky part: are you still in frame for correct protein translation? Recall that there are three reading frames, only one of which is the correct one. If you are out of frame, how can you most innocuously restore the reading frame? You have the options of adding a few bases to the end of the strand, or deleting some without removing sequence information. Don't forget about the wobble position of the codons you're looking at, as this might allow you to shift the reading frame of the insert without changing the protein sequence. Try to take the most conservative approach possible (e.g., if you have to introduce a new amino acid into the tag, make it a neutral one).

Top strand, step 4: You will need a restriction site in your insert to determine later if your insert is present. One way to do this would be to find or place a restriction site in your insert that is not found in the rest of the M13 genome. There are also ways to change the codons in your insert without changing the amino acids. You can try noting all the degeneracies for the codons and look for any that could introduce restriction sites. Recall that most restriction sites are palindromic so it might be easiest to look for palindromes and then check out NEB Cutter to see if any restriction enzymes cut that sequence. Once you've found a restriction site to add to your insert, check the existing M13K07 genome to see if there are other places that enzyme might cut. The teaching faculty can help you decide between different options if you are having a tough time choosing. The following links may be helpful:
 * M13K07 genome sequence and links that list
   * [[Media:M13KO7 zerocutters.txt | Zero cutters]] (restriction enzymes that cut the M13 genome 0 times)
   * [[Media:M13KO7 singlecutters.txt | Single-site cutters]]
   * [[Media:M13KO7 2cutters.txt | Double-site cutters]]

Top strand, step 5: In your MS Word document, box the sequence of the top strand you've designed, underlining the restriction site you've added and noting the sequence you hope to keep single stranded in lower case letters.

Now you're ready to design the bottom strand.

 Bottom strand, step 1: Base pair the top strand you've designed, leaving off any sequence you want to remain single stranded in the final insert.

 Bottom strand, step 2: Add a single stranded BamHI tail so there will be an overhang that can anneal to the M13 genome backbone you are digesting.

 Bottom strand, step 3: Destroy the BamHI site that might be regenerated by the annealing of this oligo you are designing into the BamHI backbone you are preparing. There are several ways to do this but recall that you must maintain the codon reading frame that you have established. Adjust the top strand you've designed to accommodate the design of your bottom strand.

 Bottom strand, step 4: In your MS Word document, box the sequence of the bottom strand you've designed, underlining the restriction site you've added and noting the sequence you hope to keep single stranded in lower case letters. Below the oligonucleotides, write the protein sequence you expect from the resulting DNA, and confirm that the reading frame is correct with the existing gene for p3.
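
For the p3 design the geometry differs because the tails sit at the 5' ends. This sketch assumes BamHI cuts G^GATCC to leave a four-base 5' overhang (GATC), which you should confirm in the NEB catalog. It builds both oligos from a double stranded core, checks that the strands pair once the tails are set aside, and looks for a regenerated BamHI site. The function names and test cores are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

BAMHI_SITE = "GGATCC"
BAMHI_OVERHANG = "GATC"  # assumed 5' overhang left by BamHI (G^GATCC)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def bamhi_oligos(core: str):
    """Top and bottom oligos (both written 5' to 3') with GATC 5' tails."""
    core = core.upper()
    return BAMHI_OVERHANG + core, BAMHI_OVERHANG + reverse_complement(core)


def strands_pair(top: str, bottom: str, tail_len: int = 4) -> bool:
    """True if the oligos are perfectly complementary once each 5' tail
    is set aside."""
    return top[tail_len:].upper() == reverse_complement(bottom[tail_len:])


def regenerates_bamhi(core: str) -> bool:
    """True if a BamHI site is present in the ligated top strand."""
    ligated = "G" + BAMHI_OVERHANG + core.upper() + BAMHI_OVERHANG + "C"
    return BAMHI_SITE in ligated


if __name__ == "__main__":
    for core in ("CAAA", "AAAG", "AAAT"):
        top, bottom = bamhi_oligos(core)
        print(core, top, bottom, strands_pair(top, bottom), regenerates_bamhi(core))
```

In this model a core that begins with C or ends with G rebuilds the site at one junction, so the bases flanking the tails are the ones to adjust.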

The following steps are not essential but are worth doing if you have some time:
 * verify the basepairing of the top and bottom strand
 * confirm that the restriction sites you've included are in fact there (try pasting your sequence into NEB cutter)
 * confirm that the restriction sites you've destroyed are gone. This may involve temporarily filling in the overhangs then using the NEB cutter site.
 * check your design against the ones listed in the Reagents list for the next lab. If the sequence you've designed is significantly different, we can order the ones you've come up with. To do this, you will have to email a copy of your MS Word document to nkuldell.

If you're done, print out three copies of your MS Word document: one for your lab notebook, one for your lab partner, and one to hand in to the teaching faculty.

Also before you leave, give your digested DNA samples to the teaching faculty, who will freeze them away for next time. DONE!

For next time

 * 1) The major writing assignment for this module will be a description of your M13 renovation work. Use the summary in your lab notebook to start a table on your wiki user page to organize your thoughts about the existing genome. Generate a table that lists each gene and any re-engineering ideas you have for it. Print out this table to hand in next time. If you want to start a new wiki page for this part of the assignment, go for it but be sure to follow OWW page naming guidelines and choose something like "username:superM13" as the title for your page, not just "myM13", since there will soon be several people trying to name their page exactly that.
 * 2) A good understanding of the biology of M13 will be essential as you consider any genome re-engineering ideas. Please answer the following questions to raise your awareness of how nature may constrain your design possibilities.
   * Would you expect the phage to tolerate p8 modifications that:
     * make the protein neutral rather than positively charged at the C-terminal region? (hint: is this part of the protein on the inside or the outside of the phage coat?)
     * encode all Leucines with the CTA codon instead of the CTG codon?
     * double the size of the protein? Justify your answers. Please assume that the p8 modifications do not destabilize the protein itself.
   * Would you expect the phage to tolerate these same modifications to p3?
   * Would you expect the phage to tolerate transcriptional terminators that are
     * 2X stronger
     * 100X stronger
     * 2X weaker
     * 100X weaker? Again please justify your answer.
 * 3) Nature often preserves functionally critical genomic elements, and evolutionary cousins can help us identify which genetic elements are disposable, which are interchangeable, and which are essential. Who are M13's closest evolutionary relatives and how do they differ from the phage you're working with?
 * 4) As you heard on the first day of class, the writing you are doing for 20.109 is the subject of an academic study and will eventually become a chapter in a forthcoming book, "The Idea of a Writing Laboratory." The author, Neal Lerner, has requested that you download the following file [[Media:Macintosh HD-Users-nkuldell-Desktop-Student Writing Survey.doc | Student Writing Survey]], fill the information out electronically, then email the completed survey to "nlerner AT mit DOT edu". Please cc "nkuldell AT mit DOT edu" on your message. He will directly follow up with some 109ers. Thanks in advance.

Reagents list

 * 10X NEBuffer 2:
   * 500 mM NaCl
   * 100 mM Tris-HCl
   * 100 mM MgCl2
   * 10 mM DTT

 * 10X NEBuffer 3:
   * 1 M NaCl
   * 500 mM Tris-HCl
   * 100 mM MgCl2
   * 10 mM DTT
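
Since both buffers are supplied as 10X stocks, the working (1X) concentrations and the volume of stock to add follow from a tenfold dilution. The sketch below uses the 10X values listed above; the 25 µl reaction volume in the example is hypothetical and is not taken from the protocol.

```python
# 10X stock concentrations in mM, copied from the reagents list.
NEBUFFER_2_10X_MM = {"NaCl": 500, "Tris-HCl": 100, "MgCl2": 100, "DTT": 10}
NEBUFFER_3_10X_MM = {"NaCl": 1000, "Tris-HCl": 500, "MgCl2": 100, "DTT": 10}


def working_concentration(stock_mm: dict, fold: float = 10) -> dict:
    """Concentration of each component after diluting the stock `fold` times."""
    return {name: conc / fold for name, conc in stock_mm.items()}


def stock_volume_ul(total_ul: float, fold: float = 10) -> float:
    """Volume of concentrated buffer needed for a reaction of `total_ul`."""
    return total_ul / fold


if __name__ == "__main__":
    print(working_concentration(NEBUFFER_2_10X_MM))
    print(stock_volume_ul(25))  # hypothetical 25 microliter reaction
```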