Generating parts encoding circularly permuted proteins
What's a Circularly Permuted protein?
A circularly permuted protein is generated by (conceptually) linking together the N and C termini of a protein into a circular molecule, and then cutting it back open at a different site. In the DNA, what this boils down to is doing something like this:
Note that in this example, I've shown how you would permute a periplasmically expressed protein, hence the prepro sequence targeting it to the periplasm. When permuting, regulatory features need to stay in the same spots; you only want to "spin around" the mature peptide. If this protein weren't periplasmic, it would be even easier to permute: you'd just swap the N- and C-terminal regions (joined by a linker) and that's it.
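At the protein level, the rearrangement is just string surgery. The sketch below is a minimal illustration, not a design tool. `circular_permute` is a hypothetical helper, residues are one-letter strings, `cut_after` counts residues of the mature chain from 1, and start-Met handling is ignored.

```python
def circular_permute(mature: str, cut_after: int, linker: str, prepro: str = "") -> str:
    """Hypothetical helper: permute a mature chain, keeping any prepro in front.

    The residue numbered `cut_after` (1-based) becomes the new C terminus and
    the residue after it becomes the new N terminus. The old C and N termini
    are joined through `linker`.
    """
    if not 0 < cut_after < len(mature):
        raise ValueError("cut site must fall inside the mature chain")
    new_n_region = mature[cut_after:]  # old C-terminal region moves to the front
    new_c_region = mature[:cut_after]  # old N-terminal region moves to the back
    return prepro + new_n_region + linker + new_c_region


# Toy example: cut after residue 3, join the old termini with a "gs" linker.
print(circular_permute("ABCDEFGH", 3, "gs"))               # DEFGHgsABC
print(circular_permute("ABCDEFGH", 3, "gs", prepro="pp"))  # ppDEFGHgsABC
```

The prepro is kept outside the permutation on purpose, mirroring the point above that targeting sequences stay where they are.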
To illustrate how this is done, let's try making one! Let's design a circularly permuted T4 Lysozyme. I have no idea whether the product of this construction file is a functional protein or not, but you'll get the idea. First, grab a crystal structure. T4 Lysozyme (T4L) is heavily studied structurally, so there are tons of structures available in the PDB. Let's look at PDB ID: 3DN1.
The first question to ask is where the N and C termini are. Are they reasonably close to each other? If not, you're probably not going to be able to make this work. They don't have to be right on top of each other, since you can make up for some distance with a flexible linker. In T4L they look pretty good. You also want to look for a place to cut the protein back open. The ideal spots are large, disordered loops. T4L doesn't really have one, so we'll just go with one of its loops; Gly51 looks like a reasonable spot. There are more sophisticated modeling tools that would probably be wise to use for this sort of design, but I won't get into them here.
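If you'd rather put a number on "reasonably close", the CA-CA distance between the termini can be read straight from the coordinate file. This is a sketch under a few assumptions. It expects a standard fixed-column PDB file and uses the first and last CA atoms listed for one chain, which may not be the true termini if disordered ends are missing from the model. The ~3.5 Å spanned per extended linker residue is a rough assumed figure, and all function names are hypothetical.

```python
import math


def ca_coords(pdb_text: str, chain: str = "A"):
    """Collect (residue number, (x, y, z)) for each CA atom of one chain, in file order.

    Relies on the fixed-column layout of PDB ATOM records.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21:22] == chain:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.append((int(line[22:26]), xyz))
    return coords


def termini_distance(pdb_text: str, chain: str = "A") -> float:
    """CA-CA distance (angstroms) between the first and last modeled residues."""
    coords = ca_coords(pdb_text, chain)
    if len(coords) < 2:
        raise ValueError("need at least two CA atoms in the chain")
    return math.dist(coords[0][1], coords[-1][1])


def min_linker_residues(distance: float, span_per_residue: float = 3.5) -> int:
    """Rough lower bound on linker length; 3.5 A per extended residue is an assumption."""
    return math.ceil(distance / span_per_residue)


# Usage (file name and chain ID are illustrative):
# with open("3DN1.pdb") as fh:
#     d = termini_distance(fh.read(), chain="A")
# print(f"{d:.1f} A apart; linker of at least {min_linker_residues(d)} residues")
```

A fully extended linker is the best case, so in practice you'd want some slack on top of this minimum.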
To make this thing, we need a template DNA. Berkeley iGEM 2008 has already cloned T4L and removed its internal restriction sites, so we'll start with pBca1256-K112012. The overall strategy is to first clone the individual regions of the target sequence into BglBrick plasmids, then use an assembly strategy to put the thing together. In this tutorial, we'll use SOEing to assemble the pieces, but you could use BglBrick standard assembly instead. You'd just necessarily end up with at least one Gly-Ser dipeptide (the BglBrick scar) within the linker region. If you designed things to be assembled as BglBricks, you could either make a distinct linker part, or incorporate all or half of the linker into the starting pieces.
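The Gly-Ser comes from the scar left when an upstream BamHI end (G^GATCC) is ligated to a compatible downstream BglII end (A^GATCT). The junction reads GGATCT, which translates in frame as Gly-Ser. A quick check, with a standard codon table built in; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}

BAMHI = "GGATCC"  # cut G^GATCC, leaves a 5' GATC overhang
BGLII = "AGATCT"  # cut A^GATCT, leaves the same GATC overhang


def translate(dna: str) -> str:
    """Translate in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def bglbrick_scar(upstream: str = BAMHI, downstream: str = BGLII) -> str:
    """Junction left when an upstream BamHI end is ligated to a downstream BglII end."""
    # Upstream keeps its G plus the GATC overhang; downstream contributes its final T.
    return upstream[:5] + downstream[5:]


scar = bglbrick_scar()
print(scar, translate(scar))  # GGATCT GS
```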
Make BglBricks out of each region
The first step is to break up the sequence into its constituent regions, then clone and sequence those. If we had a periplasmically targeted protein, we'd split it into 3 parts: the prepro sequence, the N-terminal region, and the C-terminal region of the mature protein.
In designing the component parts, you need to make sure you are grabbing the right portions of the protein. For prepro sequences, the sequence is often given as an annotation in the GenBank file. If not, you can use a prediction tool such as SignalP: http://www.cbs.dtu.dk/services/SignalP/. Be sure you include the entire prepro sequence, including the dipeptide that gets cleaved during processing.
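Once you know where the prepro ends and where you want to cut, splitting the ORF is just slicing on codon boundaries. A minimal sketch, assuming the ORF string starts at the start codon and that both positions are counted in codons from that start codon (not from the crystal-structure numbering). `split_regions` is a hypothetical helper.

```python
def split_regions(orf: str, prepro_codons: int, cut_after_codon: int):
    """Split an ORF into (prepro, N-terminal region, C-terminal region).

    prepro_codons: codons in the prepro, counted from the start codon
                   (0 for a cytoplasmic protein).
    cut_after_codon: last codon of the N-terminal region, counted from the start codon.
    """
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of 3")
    if not prepro_codons < cut_after_codon < len(orf) // 3:
        raise ValueError("cut site must fall inside the mature region")
    p, c = 3 * prepro_codons, 3 * cut_after_codon
    return orf[:p], orf[p:c], orf[c:]


# Toy ORF: 2-codon prepro, 2-codon N-terminal region, then the rest (incl. stop).
prepro, n_region, c_region = split_regions("ATGAAAGGGCCCTTTTAA", 2, 4)
print(prepro, n_region, c_region)  # ATGAAA GGGCCC TTTTAA
```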
For our purposes here, T4L is a cytoplasmic protein, so we don't have to deal with a prepro. We just need to break it into N- and C-terminal parts. We need to make sure we're cutting at the right spot, though, so let's look at the crystal structure. Often the numbering of the amino acids in a crystal structure is not the same as the numbering from the start codon! So don't go into autopilot mode when finding your cut site within the DNA.
First, find the amino acid (Gly51) in the structure and note the short peptide that starts with it. In this case, it's GRNCNG. Now, using ApE, we'll translate the T4L CDS. Put your cursor at the BglII site in the ApE file of your source plasmid sequence and then go to ORFs > Find next. That should highlight your open reading frame. Keep in mind that some genes start with GTG or even TTG, so check the annotation in your source to be sure you are really starting at the right spot. With the ORF highlighted, select ORFs > translate, make sure the "DNA: Above" button is clicked, and say OK. Now look for your peptide in the window that ApE pops up. Highlight the DNA above the GRNCNG peptide and copy that sequence. You can now close the translate window. Search for that DNA (encoding GRNCNG) within your sequence file and highlight it. Now translate it again and make sure it translates as GRNCNG. If it doesn't, you probably grabbed the DNA 1 or 2 bp out of frame. Go back and redo it until you have the sequence corresponding to GRNCNG highlighted.
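The ApE routine above (translate from the real start codon, find the peptide, grab its DNA in frame) boils down to the following sketch. It assumes the ORF string begins at the annotated start codon and reads that first codon as Met even when it is GTG or TTG, since bacterial initiator tRNA decodes alternative start codons as (f)Met. `find_peptide_dna` is a hypothetical helper, and the toy ORF stands in for the real CDS.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}


def translate_orf(orf: str) -> str:
    """Translate from the start codon; the first codon is read as Met (covers GTG/TTG)."""
    orf = orf.upper()
    protein = "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))
    return "M" + protein[1:] if protein else protein


def find_peptide_dna(orf: str, peptide: str):
    """Return (residue number from the start codon, in-frame DNA) for the first hit."""
    idx = translate_orf(orf).find(peptide)
    if idx < 0:
        raise ValueError(f"{peptide} not found in frame")
    start = 3 * idx
    return idx + 1, orf[start:start + 3 * len(peptide)]


# Toy ORF with a GTG start; the peptide is located in frame, never 1-2 bp off.
residue, dna = find_peptide_dna("GTGGGACGTAATTGCAATGGTTAA", "GRNCNG")
print(residue, dna)  # 2 GGACGTAATTGCAATGGT
```

Because the search happens on the translation and the DNA is then sliced at 3 x index, an out-of-frame hit can't happen, which is exactly the mistake the manual procedure guards against.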
OK, now we're ready to break this into two DNA sequences. First, copy the T4L ORF into a new window. Next, let's break it directly 3' of the Gly51 codon. So, I'm going to start my mouse at the start codon and highlight up to the last base of the Gly51 codon, then press Ctrl+X to cut and paste it into a new window. Now we have two windows corresponding to the N- and C-terminal parts. The two sequences I have are:
Now would be a good time to repeat the translation procedure on these sequences and make sure that each one is still in frame and starts and ends with the right amino acids.
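That re-check can also be scripted: each piece should be a whole number of codons, and its translation should begin and end with the residues you expect. A sketch with hypothetical names; `expect_first` and `expect_last` are whatever residues you intend at each end.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}


def check_piece(dna: str, expect_first: str, expect_last: str) -> list:
    """Return a list of problems with a split piece (empty list = looks right)."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3:
        problems.append(f"length {len(dna)} is not a multiple of 3")
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
    if not protein.startswith(expect_first):
        problems.append(f"starts {protein[:len(expect_first)]!r}, expected {expect_first!r}")
    if not protein.endswith(expect_last):
        problems.append(f"ends {protein[-len(expect_last):]!r}, expected {expect_last!r}")
    return problems


print(check_piece("ATGAACATTGGT", "MNI", "G"))  # []
print(check_piece("ATGAACATTGG", "MNI", "G"))   # reports the frame problem
```

Judging by the oligos in the construction files, the T4L N-terminal piece should translate as MNIFEML...KAIG, and the C-terminal piece should begin RNCNG and end AYKNL followed by the stop.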
Alright, now add your BglBrick polylinker ends to these sequences and design some oligos. You should also add or remove start and stop codons where appropriate at this step. You'll be reusing these oligos in the second part of the construction, and the termini of the final products will be set by these oligos. Here are my two construction files:
PCR Oca9393/Oca9394 on pBca1256-K112012 (187 bp, pcrpdt)
Digest pcrpdt (EcoRI/BamHI, L, pcrdig)
Digest pBca9145-Bca1144#5 (EcoRI/BamHI, 2057+910, L, vectdig)
Ligate pcrdig + vectdig (pBca9145-Bca9393)
----
>Oca9393 Construction of cpT4L N term part
ctctgGAATTCATGAGATCTatgaatatatttgaaatgttac
>Oca9394 Construction of cpT4L N term part
catgtGGATCCttacccaatagctttatctaattcag
PCR Oca9395/Oca9396 on pBca1256-K112012 (376 bp, pcrpdt)
Digest pcrpdt (EcoRI/BamHI, L, pcrdig)
Digest pBca9145-Bca1144#5 (EcoRI/BamHI, 2057+910, L, vectdig)
Ligate pcrdig + vectdig (pBca9145-Bca9395)
----
>Oca9395 Construction of cpT4L C term part
ctctgGAATTCATGAGATCTatgcgtaattgcaatggtgtaattac
>Oca9396 Construction of cpT4L C term part
catgtGGATCCttatagatttttatacgcg
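The product sizes in construction files like these can be sanity-checked with a simple in-silico PCR. The product is the whole forward oligo, then the template between the two annealing sites, then the reverse complement of the whole reverse oligo. This sketch assumes exact matches over the last `anneal` bases of each oligo (a hypothetical parameter, default 18) and ignores mismatches, multiple sites, and circular templates. Since the pBca1256-K112012 sequence isn't reproduced here, the example uses a synthetic 153 bp stand-in for codons 1-51 in which only the oligo-binding ends are real sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product(template: str, fwd: str, rev: str, anneal: int = 18) -> str:
    """Predict a linear PCR product, assuming exact 3'-end matches of `anneal` bases."""
    t, f, r = template.upper(), fwd.upper(), rev.upper()
    f_site = t.find(f[-anneal:])
    if f_site < 0:
        raise ValueError("forward oligo does not anneal")
    f_end = f_site + anneal
    r_site = t.find(revcomp(r[-anneal:]), f_end)
    if r_site < 0:
        raise ValueError("reverse oligo does not anneal downstream")
    return f + t[f_end:r_site] + revcomp(r)


OCA9393 = "ctctgGAATTCATGAGATCTatgaatatatttgaaatgttac"
OCA9394 = "catgtGGATCCttacccaatagctttatctaattcag"

# Synthetic stand-in for codons 1-51 of T4L: only the primer-binding ends are real.
cds_start = "atgaatatatttgaaatgttac"           # codons 1-7 plus one base of codon 8
cds_end = revcomp("cccaatagctttatctaattcag")   # ends on the Gly51 codon
template = cds_start + "n" * (153 - len(cds_start) - len(cds_end)) + cds_end

print(len(pcr_product(template, OCA9393, OCA9394)))  # 187
```

The 187 bp comes out as 42 (Oca9393) + 108 (template between sites) + 37 (Oca9394), matching the first construction file; the added stop codon rides in on the reverse oligo.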
Assemble the complete part by SOEing
Alright, now that we have our individual parts made and sequenced, let's assemble them by SOEing to generate the circularly permuted gene. First, let's design the junction between the two parts. We'll want to put a linker between them. The linker sequence will definitely matter, but the right choice really has to be determined empirically. For today, let's just use the peptide GGQSGQ. A DNA sequence for that is:
Linker GGAGGGcagtctgggcag
Now, let's grab the last 20 bp (or so; the usual PCR design rules apply) of the new N-terminal part, which is the original C-terminal region cloned in Bca9395. We'll want to remove the stop codon, so that gives us:
N junction gggacgcgtataaaaatcta
Let's grab the first 20 bp (or so) of the new C-terminal part, which is the original N-terminal region cloned in Bca9393. Whether to include the start codon is up for debate, but usually you would want to remove it:
C junction aatatatttgaaatgttacg
Our forward oligo for amplifying Bca9393 will then be
Linker.C junction GGAGGGcagtctgggcagaatatatttgaaatgttacg
Our reverse oligo for amplifying Bca9395 will be the reverse complement of
N junction.Linker gggacgcgtataaaaatctaGGAGGGcagtctgggcag
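The junction design is just string operations on the two parts. Drop the stop from the new N-terminal part and the start codon from the new C-terminal part, take about 20 bp from each, and splice in the linker. The sketch below uses hypothetical helper names, and 20 bp is the rough length used above, not a rule. Fed the junction sequences from the text (with the stop and start codons still attached), it reproduces Oca9397 and Oca9398.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def soe_oligos(new_n_part: str, new_c_part: str, linker: str, n: int = 20):
    """Return (forward oligo on the new C part, reverse oligo on the new N part)."""
    n_part, c_part = new_n_part.upper(), new_c_part.upper()
    if n_part[-3:] in STOPS:      # no stop codon before the linker
        n_part = n_part[:-3]
    if c_part.startswith("ATG"):  # usually drop the internal start codon
        c_part = c_part[3:]
    n_junction, c_junction = n_part[-n:], c_part[:n]
    forward = linker.upper() + c_junction
    reverse = revcomp(n_junction + linker.upper())
    return forward, reverse


LINKER = "ggagggcagtctgggcag"           # encodes GGQSGQ
n_end = "gggacgcgtataaaaatctataa"      # ...D A Y K N L + stop (leading gg ends codon 158)
c_start = "atgaatatatttgaaatgttacg"    # M N I F E M L + start of the next codon
fwd, rev = soe_oligos(n_end, c_start, LINKER)
print(translate(LINKER))  # GGQSGQ
print(fwd)  # GGAGGGCAGTCTGGGCAGAATATATTTGAAATGTTACG
print(rev)  # CTGCCCAGACTGCCCTCCTAGATTTTTATACGCGTCCC
```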
Alright, now we can write up the construction file:
PCR Oca9397/Oca9394 on pBca9145-Bca9393 (182 bp, gp, =fragA)
PCR Oca9395/Oca9398 on pBca9145-Bca9395 (380 bp, gp, =fragB)
PCR Oca9395/Oca9394 on fragA + fragB (544 bp, pcrpdt)
Digest pcrpdt (EcoRI/BamHI, L, pcrdig)
Digest pBca9145-Bca1144#5 (EcoRI/BamHI, 2057+910, L, vectdig)
Ligate pcrdig + vectdig (pBca9145-Bca9398)
----
>Oca9397 Forward SOEing oligo for cpT4L
GGAGGGcagtctgggcagaatatatttgaaatgttacg
>Oca9398 Reverse SOEing oligo for cpT4L
ctgcccagactgCCCTCCtagatttttatacgcgtccc
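In the final PCR, fragB and fragA anneal through the shared 18 bp linker, so the expected product is fragB followed by fragA minus the overlap: 380 + 182 - 18 = 544 bp, matching the construction file. A sketch of that joining step. `soe_join` is a hypothetical helper; it assumes the overlap is the longest exact suffix/prefix match of at least `min_overlap` bases and that the outer oligos amplify the full joined molecule.

```python
LINKER = "GGAGGGCAGTCTGGGCAG"  # the 18 bp shared by both fragments


def soe_join(upstream: str, downstream: str, min_overlap: int = 15) -> str:
    """Join two fragments through the longest exact overlap between
    the end of `upstream` and the start of `downstream`."""
    up, down = upstream.upper(), downstream.upper()
    for k in range(min(len(up), len(down)), min_overlap - 1, -1):
        if up[-k:] == down[:k]:
            return up + down[k:]
    raise ValueError("no overlap long enough to SOE")


# Length check with placeholder bodies; only the linker ends are real sequence.
frag_b = "N" * (380 - len(LINKER)) + LINKER  # new N-terminal part, ends in linker
frag_a = LINKER + "N" * (182 - len(LINKER))  # new C-terminal part, starts with linker
print(len(soe_join(frag_b, frag_a)))         # 544
```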
For things like this you really really REALLY want to go through the construction file very carefully and inspect your final model sequence for correctness.
- Did all the PCRs "work"?
- Is your product sequence the correct frame, and are starts, stops, and linkers in the right places and correct frames?
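Part of that inspection can be scripted. The sketch below uses hypothetical names and assumes you paste in the model coding sequence from its ATG through the stop codon. It checks the questions in the list: whole codons, a start codon, a single stop at the end, and the linker peptide present in frame. It doesn't replace reading the model yourself.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}


def check_model(cds: str, linker_peptide: str) -> list:
    """Return a list of problems found in a model coding sequence (empty = passed)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    if not protein.endswith("*"):
        problems.append("does not end with a stop codon")
    if "*" in protein[:-1]:
        problems.append("premature stop codon")
    if linker_peptide not in protein:
        problems.append(f"linker {linker_peptide} not found in frame")
    return problems


print(check_model("ATG" + "GGAGGGCAGTCTGGGCAG" + "TAA", "GGQSGQ"))  # []
```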