Introduction To Oligo Design
This tutorial takes you through the basics of how to design oligos to PCR amplify a gene and insert it into a plasmid. At the end of the process, you will have a construction file that describes how all the bits and pieces will be put together, and the sequences of the oligonucleotides you need to order to do your experiment. To do this, you'll need the sequence of the gene you want to clone, the sequence of the plasmid you want to put it in, and an editing program such as ApE to manipulate the sequences. The tutorial assumes you have a basic knowledge about DNA and already understand how PCR works. If you are a little rusty, check out this explanation of DNA and this PCR primer.
The Construction File
To get started, let's look at a complete construction file:
Construction of KanR Basic Part Bca9128
This is an example of cloning the kanR gene, which encodes kanamycin resistance, from a plasmid and inserting it into the Biobrick plasmid pSB1A2-I13521. The product of the experiment is plasmid pSB1A2-Bca9128. The template for the PCR is pSB1AK3-b0015. You can view annotated sequences for these plasmids by right clicking on the links and saving the files to your computer.
If you click on one of the links, you'll see the text of the file. The format you're seeing is the common GenBank format. Many programs including ApE (A plasmid Editor) can interpret this format and perform the operations described in this tutorial (locating restriction sites and reverse complementing sequences). If you are unable to download a suitable program, these functions can also be performed with the tools at http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html
The construction file has two sections to it. The top portion is a list of manipulations you will perform, and the bottom portion is a list of molecules that you'll use for those manipulations. What this construction file is saying to do is set up a PCR reaction with oligonucleotides ca1067F and ca1067R using pSB1AK3-b0015 plasmid DNA as template for the reaction. The product of that PCR reaction is 1054 bp long, and you should digest it with EcoRI and SpeI restriction enzymes. Digesting with EcoRI and SpeI generates the cohesive ends you will use to ligate this DNA into your vector backbone. Similarly, you would set up a digest of plasmid pSB1A2-I13521 with EcoRI and SpeI, which will cut the plasmid into two fragments of sizes 2062 and 946 bp. The "0" means you want to gel purify the fragment size listed at index 0 (numbered as 0,1,2,...), which in this case is 2062. For "10+1038+6, 1", it is position 1, the 1038 bp fragment. Upon ligating this fragment to your PCR digest, you will transform bacteria, and the product of these manipulations is plasmid pSB1A2-Bca9128.
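If the "sizes, index" notation is new to you, here is a minimal Python sketch of how to read it. The function name pick_fragment is made up for this illustration, and writing the vector digest as "2062+946, 0" is an assumption based on the fragment sizes quoted above.

```python
def pick_fragment(spec: str) -> int:
    """Return the fragment size selected by a 'sizes, index' digest entry."""
    sizes_part, index_part = spec.split(",")
    sizes = [int(size) for size in sizes_part.split("+")]
    # Fragments are numbered from 0, left to right.
    return sizes[int(index_part)]


# Vector digest (assumed notation): keep the 2062 bp fragment.
print(pick_fragment("2062+946, 0"))   # 2062
# PCR product digest: keep the 1038 bp fragment.
print(pick_fragment("10+1038+6, 1"))  # 1038
```

Note that 10 + 1038 + 6 adds up to 1054, the length of the undigested PCR product, which is a handy sanity check on any digest line you write.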
To understand this further, let's focus on the oligos and template and how they work.
Interpreting the Construction File
Let's look at the oligos and see what's in them:
>ca1067F Forward Biobricking of KanR of pSB1AK3 ccagtGAATTCgtccTCTAGAgagctgatccttcaactc
>ca1067R Reverse Biobricking of KanR of pSB1AK3 gcagtACTAGTtccgtcaagtcagcgtaatg
The first part is the name of the oligo, the second part is a description of what it's for, and the third part is the sequence of the oligo in 5' to 3' format. Keep in mind that oligonucleotides are single stranded DNAs, and the direction of the oligo is very important. In general, the restriction sites will always appear on the 5' end of the oligo (the left-hand side). Here, the EcoRI (GAATTC), XbaI (TCTAGA), and SpeI (ACTAGT) restriction sites are shown in upper case letters. When ordering oligos from most suppliers (IDT and Genosys included) the case of the letters is irrelevant. It is useful to change the case to highlight special features in your sequences such as restriction sites.
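Since the upper-case letters carry the highlighting, you can pull the features back out of an oligo with a few lines of code. This is only a sketch: the function name highlighted_features and the KNOWN_SITES table are invented for the example and cover just the three sites mentioned above.

```python
import re

# Recognition sequences named in the text above.
KNOWN_SITES = {"GAATTC": "EcoRI", "TCTAGA": "XbaI", "ACTAGT": "SpeI"}


def highlighted_features(oligo: str):
    """List (position, sequence, enzyme) for every upper-case run in an oligo."""
    return [
        (match.start(), match.group(), KNOWN_SITES.get(match.group(), "unknown"))
        for match in re.finditer(r"[A-Z]+", oligo)
    ]


ca1067F = "ccagtGAATTCgtccTCTAGAgagctgatccttcaactc"
ca1067R = "gcagtACTAGTtccgtcaagtcagcgtaatg"

print(highlighted_features(ca1067F))  # [(5, 'GAATTC', 'EcoRI'), (15, 'TCTAGA', 'XbaI')]
print(highlighted_features(ca1067R))  # [(5, 'ACTAGT', 'SpeI')]
```

Positions are counted from 0 at the 5' end, so both oligos carry a 5 base tail before their first site.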
When you run a PCR, you are annealing the oligos to the template sequence and initiating polymerization. Polymerization always goes in the 5' to 3' direction. What this means is that the chain initiates from the 3' end of the oligo (the righthand side). The polymerases have a fairly strict requirement that the last 6 bases of the oligo need to base pair with the template for the reaction to occur. In the case of ca1067F, that would be the sequence caactc. However, the melting temperature, or Tm, of this sequence alone is only -10.5 degrees Celsius. Your PCR will be operating at no temperature lower than 45 degrees, so a sequence this short would be dissociated from its template under all conditions experienced during the reaction. Therefore, the oligo must have significantly more homology to the template than this. Usually the "annealing region" of the oligo is a bare minimum of 15 bp, but typically more like 20 bp. For oligo ca1067F, the annealing region is the sequence gagctgatccttcaactc.
To see how this works, open up pSB1AK3-b0015 in an editor and search for gagctgatccttcaactc. You should see the sequence light up just upstream of the KanR gene.
Now try to search for the entire oligo sequence ccagtGAATTCgtccTCTAGAgagctgatccttcaactc in pSB1AK3-b0015. It fails! Why? Only the annealing region of this oligo matches the template and here lies an important principle about oligo design. Only the 3' annealing region of the oligo has to match the template for PCR to work. You can pin almost any sequence to the 5' end of the oligo, and that sequence will be incorporated into the final PCR product. Here, we are using this to incorporate restriction sites into the PCR product. Note, though, that additional sequence lies in between the two restriction sites, and 5 bp of arbitrary sequence lies on the 5' end of the oligo. The reason for adding these bases is that restriction enzymes don't like to cut their sequence when there is nothing upstream of the sequence. Each enzyme is different, but in general pinning 5 bases on the end of an oligo is sufficient for most enzymes. For a more detailed description of this issue on a case-by-case basis, you can check out this NEB page
Alright, so now let's predict the product of PCR with oligos ca1067F and ca1067R on pSB1AK3-b0015.
Predicting the PCR Product
Select the annealing sequence of ca1067F (gagctgatccttcaactc) and search for it again. Now copy the entire sequence of the oligo (ccagtGAATTCgtccTCTAGAgagctgatccttcaactc). While the annealing region is still highlighted within the file, paste the copied text to replace the annealing sequence with the oligo sequence. Now select all the sequence upstream (to the left) of your oligo sequence and delete it. You have now "fixed" the 5' end of your PCR product.
Now, let's look at the 3' end of your PCR product. First of all, grab the annealing region of ca1067R (ccgtcaagtcagcgtaatg). Do a find for this sequence in your ApE file. It should fail. Why? The reverse oligo anneals to the other strand of the template DNA. This will always be the case. One of your two oligos will match the template exactly, the other will only match as the reverse complement. So, you need to first reverse complement the sequence of ca1067R. Use your program to reverse complement the sequence and copy it to the clipboard (ctrl-shift-c). Paste the sequence into a new window. It should be cattacgctgacttgacggaACTAGTactgc. Note that now the SpeI restriction site is on the right-hand side of the sequence, and the annealing region is on the left-most side.
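If you'd rather script this step than use an editor, reverse complementing takes only a few lines. Here is a minimal sketch in Python; the function name reverse_complement is my own choice, and it preserves upper and lower case so the highlighted restriction site stays visible.

```python
# Map each base to its partner, keeping the case of the original letter.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string so it reads 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


ca1067R = "gcagtACTAGTtccgtcaagtcagcgtaatg"
print(reverse_complement(ca1067R))  # cattacgctgacttgacggaACTAGTactgc
```

Reverse complementing twice gives back the original sequence, which is a quick way to convince yourself the function is behaving.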
Now grab the annealing region of this sequence (cattacgctgacttgacgg) and search for it in the pSB1AK3-b0015 file. Note there is an extra "a" in between the annealing region and the SpeI site here. The exact annealing region can sometimes be a little hard to pinpoint in this business, but you can always start with the terminus of the oligo and work your way over. Say, start with the first 10 bp (cattacgctg) and search for that, and if that's not a unique string in your template, search for the first 12 or 15 bases. Again, replace the annealing region with the complete sequence of the reverse-complemented oligo. Now, find the right-most end of the oligo and delete everything downstream of it. You should now have the sequence:
This is the sequence of your PCR product. If it doesn't match this, go back and try this again. You need to be confident in your ability to predict the PCR product given template and oligo sequences. It's critical to be able to design PCR reactions that will produce the product you need for your experiment. Eventually, this tutorial will get you to design the oligos for a new reaction, but the last step of any design procedure is to write a construction file and check it. To check it, you go through the steps of the cloning experiment in silico to simulate what would happen in the test tube. If it doesn't work in the computer, it's not going to work in the lab, and you can waste a lot of time and energy with a flawed design. So, you'll want to always always ALWAYS go through this in the computer before ordering oligos.
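The find, paste, and delete procedure for predicting a PCR product can also be written as a short function. The sketch below uses the real ca1067F and ca1067R oligos, but toy_template is a made-up stand-in (the real pSB1AK3-b0015 sequence is not reproduced here), and the function and argument names are my own. It assumes each annealing region occurs exactly once in the template.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def predict_pcr_product(template, fwd_oligo, fwd_anneal, rev_oligo, rev_anneal):
    """Swap each annealing region for its full oligo and trim what lies outside."""
    t = template.lower()
    fwd_pos = t.find(fwd_anneal.lower())
    # The reverse oligo matches the template only as its reverse complement.
    rev_pos = t.find(reverse_complement(rev_anneal).lower())
    if fwd_pos < 0 or rev_pos < fwd_pos + len(fwd_anneal):
        raise ValueError("oligos do not anneal in a productive orientation")
    middle = template[fwd_pos + len(fwd_anneal):rev_pos]
    return fwd_oligo + middle + reverse_complement(rev_oligo)


# Hypothetical template: flanking junk, both annealing sites, 15 bp in between.
toy_template = ("ttgacagcta" + "gagctgatccttcaactc" + "atggctaaaggtgaa"
                + "cattacgctgacttgacgg" + "aacgtt")

product = predict_pcr_product(
    toy_template,
    "ccagtGAATTCgtccTCTAGAgagctgatccttcaactc", "gagctgatccttcaactc",
    "gcagtACTAGTtccgtcaagtcagcgtaatg", "ccgtcaagtcagcgtaatg",
)
print(product)
print(len(product))  # 85
```

The product begins with the whole forward oligo and ends with the reverse complement of the whole reverse oligo, so the 5' tails and restriction sites are carried into it even though they never matched the template.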
Once you've mastered this, let's simulate the cloning experiment.
Simulating the Construction File
Alright, you have the sequence of the PCR product. Let's digest it. Our construction file said to digest with EcoRI and SpeI. The EcoRI and SpeI enzymes will digest any of their respective restriction sites present in your sequence. With the PCR product sequence in your editor, find all the EcoRI and SpeI sites and hit "highlight". The restriction sites should light up in red. Why not also search for DpnI and why are we using it at all? If you're curious, click here.
Look at the file now and make sure that additional sites did NOT light up. If there were internal restriction sites, you will not end up cloning the fragment you intended. The shorter fragments almost always preferentially get incorporated into your final plasmid. Now, let's simulate digestion of the PCR product. EcoRI digests the GAATTC sequence leaving an AATT sticky end. So, remove all the sequence upstream of AATT. Similarly, SpeI digests ACTAGT leaving a CTAG sticky end, so delete all the sequence downstream of CTAG. You should be left with:
Now let's shift gears and look at the vector sequence. Open pSB1A2-I13521 in your editor. Search for EcoRI and SpeI sites. Make sure again that the sites are unique, and then use your mouse to select all the sequence from AATT to CTAG of the restriction sites. Leaving that highlighted, go back to the file with your PCR product digest, select the entire sequence, and copy it to the clipboard. Go back to pSB1A2-I13521 and paste in your PCR product. You have now simulated the ligation reaction and predicted the sequence you would expect from this cloning experiment.
You can open up pSB1A2-Bca9128 in your editor and compare it to your sequence. Do they match? They should. If they don't you did something wrong, and you should start over. Again, you need to be confident in this procedure, so repeat it until you get it right consistently.
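For reference, the digest and ligation steps can be sketched in code the same way you did them by hand: keep everything from the AATT of the EcoRI site through the CTAG of the SpeI site, then drop that fragment into the same window of the vector. The sequences below are short made-up stand-ins, not the real plasmids, and the function names are my own. The sketch also assumes the EcoRI site sits to the left of the SpeI site in the file.

```python
def cut_points(seq: str):
    """Return (start of AATT, end of CTAG) after checking both sites are unique."""
    s = seq.upper()
    if s.count("GAATTC") != 1 or s.count("ACTAGT") != 1:
        raise ValueError("EcoRI and SpeI must each cut exactly once")
    left = s.index("GAATTC") + 1    # first base of the AATT sticky end
    right = s.index("ACTAGT") + 5   # one past the last base of CTAG
    if left >= right:
        raise ValueError("expected the EcoRI site to the left of the SpeI site")
    return left, right


def digest(seq: str) -> str:
    """Keep only the sequence between the two cuts, sticky ends included."""
    left, right = cut_points(seq)
    return seq[left:right]


def ligate(vector: str, insert_fragment: str) -> str:
    """Replace the AATT...CTAG window of the vector with the digested insert."""
    left, right = cut_points(vector)
    return vector[:left] + insert_fragment + vector[right:]


toy_pcr = "ccagtGAATTCgtccTCTAGAatgaaaACTAGTactgc"   # hypothetical PCR product
toy_vector = "ttttGAATTCccccccACTAGTgggg"            # hypothetical vector

fragment = digest(toy_pcr)
print(fragment)                      # AATTCgtccTCTAGAatgaaaACTAG
print(ligate(toy_vector, fragment))  # ttttGAATTCgtccTCTAGAatgaaaACTAGTgggg
```

The uniqueness check mirrors the warning above: if either enzyme cuts more than once, the function refuses to continue rather than hand you a misleading product.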
After you've mastered this, you can move on.
Basic Oligo Design
In general, the way you design a cloning experiment is more complicated than predicting the product of a cloning experiment. For a properly-designed cloning experiment, there is only one potential product of the experiment. The procedures we've gone through thus far allow you to predict this product given sequences and a construction file. In contrast, there are a zillion ways of making any particular product and each instance you take on a case-by-case basis.
The questions you need to ask yourself before starting are things like
- Do I have a plasmid or genomic DNA template for my sequence that could be used for cloning by PCR?
- How big is the sequence I want to make?
- What am I going to do with this sequence? Should I put it in Biobricks format or something else?
...and it can get very complicated. We'll only consider for now the usual scenario where you do have a plasmid or genomic DNA template, the sequence is over 100 bp in length, and you are going to put it into a Biobrick-type format.
Step 1: Make a sequence file of your template
If the source of your sequence is a plasmid, make an ApE file of your plasmid sequence. Check that the thing you want to clone is going in the direction you want it in your final plasmid. If it isn't, replace the sequence with its reverse complement.
If the source of your sequence is genomic DNA, find a sequence of the thing you want somewhere, usually from NCBI (https://www.ncbi.nlm.nih.gov/). So, let's pick an example. Let's make an open reading frame basic part for the papC gene of Streptomyces venezuelae. We'll assume you found some genomic DNA from this organism somewhere, and you're going to PCR the papC gene out of it. In case you're curious, papC encodes an enzyme involved in the biosynthesis of p-aminophenylalanine en route to chloramphenicol. If you want a challenge, try searching for the gene sequence within the nucleotide mode of NCBI (https://www.ncbi.nlm.nih.gov/). The terms "venezuelae papC" should pull it up. If that isn't working for you, the number "AB116234" will pull it up. Look at the file entry now. You'll see a description of what's in this fragment of the genome from the organism, and you'll see that papC is in there amongst other genes. Let's convert this file.
In the browser portion of the page, click "Send to" and select "file". Save the "sequence.gb" file to your computer, then open it from within your editor. The file will be in the GenBank format. If your chosen editor preserves the annotations, you should be able to easily find the papC open reading frame. It is 969 bp in length. Notice that the sequence starts with ATG (a start codon) and ends with TGA (a stop codon). This tells you that the sequence is going in the direction you want it. If instead you saw that it started TCA and ended CAT, you'd know you were looking at the reverse complement and you need to reverse complement the whole file. If you didn't successfully do all that, you can cheat and download the file here. Finding and processing database data is kind of tricky and takes some practice to master. Just keep in mind that you need to get your template DNA sequence into an annotated format before proceeding with oligo design.
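The orientation check described above is easy to automate. Here is a small sketch; the function name orf_orientation is invented for the example, the test sequences are toy ORFs rather than papC itself, and the check only looks at the first and last three bases.

```python
STOP_CODONS = {"taa", "tag", "tga"}
# What the three stop codons look like after reverse complementing.
REVERSED_STOPS = {"tta", "cta", "tca"}


def orf_orientation(seq: str) -> str:
    """Classify an ORF by inspecting only its first and last codons."""
    s = seq.lower()
    if s.startswith("atg") and s[-3:] in STOP_CODONS:
        return "forward"
    if s[:3] in REVERSED_STOPS and s.endswith("cat"):
        return "reverse complement"
    return "unclear"


print(orf_orientation("atgaaatga"))  # forward
print(orf_orientation("tcatttcat"))  # reverse complement
```

An answer of "unclear" usually means you copied a few bases too many or too few, so go back and check the annotation boundaries.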
Copy this open reading frame sequence and paste it into a new editor window. Now we have our template for designing oligos.
Step 2: Inspect the sequence for restriction sites
We're going to make a BglBricks basic part out of this. This entails dropping the sequence in between BglII and BamHI restriction sites within the basic parts plasmid pBca9145. Before we worry about the oligos, let's make sure there aren't any restriction sites present in this sequence. Locate any BamHI, BglII, XhoI, or EcoRI sites present in this sequence. Hopefully nothing lit up. That's good. If there were some of these sites present, you'd have to remove them and that's a whole different can of worms.
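If you want to double check your editor, a site scan is simple to write. This sketch searches one strand only, which is enough here because all four recognition sequences are palindromes (each reads the same on the opposite strand). The function name find_sites is my own.

```python
SITES = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
}


def find_sites(seq: str, sites=SITES):
    """Return {enzyme: [positions]} for every site present, ignoring case."""
    s = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        i = s.find(site)
        while i != -1:
            positions.append(i)
            i = s.find(site, i + 1)
        if positions:
            hits[name] = positions
    return hits


# The first 20 bases of papC contain none of the four sites.
print(find_sites("atgagcggcttcccccgcag"))  # {}
# A sequence with a BglII site starting at position 5.
print(find_sites("ccaaaAGATCTatgagc"))     # {'BglII': [5]}
```

An empty result over the whole open reading frame is what you are hoping for before moving on.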
Step 3: Design the annealing sequences
If you just take the first 20 bp and the last 20 bp of this sequence, you are probably OK. Usually that's good enough...but not always. What else might you want to consider?
- In general, the 3' base of your oligos should be a G or C
- The overall G/C content of your annealing region should be between 50 and 65%
- The overall base composition of the sequences should be balanced (no missing bases, no excesses of one particular base)
- The length of your sequence can be adjusted to anywhere between 18 and 25 bp
- The sequence should appear random. There shouldn't be long stretches of a single base, or large regions of G/C rich sequence and all A/T in other regions
- There should be little secondary structure. Ideally the Tm of any secondary structure in the oligo should be under 40 degrees. Anything under that is harmless, but occasionally more can cause problems. By this, I am not referring to the 'Tm' shown in ApE; that is the Tm of the duplex. For an elaboration on this point, go here.
And why does any of that matter? You can think of the probability of failure for a PCR reaction as a function of how many deviations you make from the "ideal" PCR. The ideal PCR product is between 500 and 1000 bp in length, uses a high-quality plasmid prep as template, has no strange secondary structure in the oligos or template, isn't abnormal in G/C content, etc. Your oligos ideally would have all the above-listed properties and have no added sequence on the 5' end.
You're already deviating a little because you will be pinning on 5' sequences, and you'll be cloning from genomic DNA. With respect to the length and G/C content of your template, you're somewhat stuck with whatever nature gave you. Since you are bound to cloning this sequence with the start codon at a specific place in your vector, your oligo has to anneal to the first few codons of your gene. So, all you can really do is play with how long the annealing stretch is. You have more flexibility on the 3' end. You could make your oligo anneal further downstream of the stop codon if the sequence were particularly bad around there.
In the case of papC, the 20 bp on the 5' end and 21 bp on the 3' end are OK, so let's just go with:
5' annealing region atgagcggcttcccccgcag
3' annealing region gaaggcgagaaggaccgatga
Looking at these sequences, the stretch of 5 C's in the 5' annealing region is a little ugly, but there is nothing you can do about it. The 3' one has only one T in it, and that's not very balanced. These aren't great, but they'd probably work.
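You can put numbers on these judgments with a quick report of length, G/C content, 3' base, longest single-base run, and base composition. This is a rough sketch with an invented function name (annealing_report); it does not look at secondary structure at all. The second sequence is the reverse complement of the 3' annealing region, since that is the strand the reverse oligo is actually made from.

```python
from itertools import groupby


def annealing_report(seq: str) -> dict:
    """Summarize the simple composition rules of thumb for an annealing region."""
    s = seq.lower()
    gc_count = s.count("g") + s.count("c")
    return {
        "length": len(s),
        "gc_percent": round(100 * gc_count / len(s), 1),
        "ends_in_g_or_c": s[-1] in "gc",
        "longest_run": max(len(list(run)) for _, run in groupby(s)),
        "base_counts": {base: s.count(base) for base in "acgt"},
    }


forward = annealing_report("atgagcggcttcccccgcag")
reverse = annealing_report("tcatcggtccttctcgccttc")  # as ordered, 5' to 3'

print(forward["gc_percent"], forward["longest_run"])  # 70.0 5
print(reverse["gc_percent"], reverse["base_counts"])  # 57.1 {'a': 1, 'c': 9, 'g': 3, 't': 8}
```

The numbers agree with the eyeball assessment: the forward region sits above the 65% G/C guideline and contains the run of 5 C's, while the reverse oligo's annealing region has a single A (the partner of the lone T in the template strand).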
Step 4: Add restriction sites and tails
Let's start with the 5' annealing region. We want to put a BglII site right next to the start codon, so that would be AGATCTatgagcggcttcccccgcag. We're going to clone this into the BglII and XhoI sites of our plasmid, so we only need the BglII site pinned on here. We do need a tail, though. So, a final sequence would be ccaaaAGATCTatgagcggcttcccccgcag.
Now let's do the 3' oligo. First of all, let's reverse complement it. So, gaaggcgagaaggaccgatga becomes tcatcggtccttctcgccttc. Notice that the stop codon (TGA) is now reverse complemented as TCA and is on the left-hand side instead of the right. You'll want to put the BamHI site directly to the left of that TCA in the oligo, which places it just downstream of the stop codon in the final product, so that gives you GGATCCtcatcggtccttctcgccttc. You'll also need to put the XhoI site, a spacer, and a tail on there. So, you want something like gctagCTCGAGttaGGATCCtcatcggtccttctcgccttc. Let's call these oligos papC-F and papC-R.
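Assembling the two oligos is just string concatenation, 5' to 3': tail, restriction site(s), then the annealing region. The sketch below rebuilds both papC oligos from their pieces. The variable names are my own; the tails and the spacer are the arbitrary bases chosen in the text.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


BGLII, BAMHI, XHOI = "AGATCT", "GGATCC", "CTCGAG"

fwd_anneal = "atgagcggcttcccccgcag"    # first 20 bp of the ORF
rev_anneal = "gaaggcgagaaggaccgatga"   # last 21 bp of the ORF, top strand

# Forward oligo: tail + BglII + annealing region.
papC_F = "ccaaa" + BGLII + fwd_anneal

# Reverse oligo: tail + XhoI + spacer + BamHI + reverse-complemented annealing region.
papC_R = "gctag" + XHOI + "tta" + BAMHI + reverse_complement(rev_anneal)

print(papC_F)  # ccaaaAGATCTatgagcggcttcccccgcag
print(papC_R)  # gctagCTCGAGttaGGATCCtcatcggtccttctcgccttc
```

Building the oligos from named pieces like this makes it harder to forget the reverse complement step, which is the most common way to ruin a reverse oligo.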
Why do you need the XhoI site there? You may be thinking why don't we just do BamHI and BglII. Well, there's a problem with that. BamHI and BglII are specifically used for BglBricks because they generate "compatible cohesive ends". They both generate GATC sticky ends. If you tried digesting a vector with BamHI and BglII, the vector would just ligate back on itself giving rise to a high parent background. In practice, this can be circumvented by treating the vector with alkaline phosphatase (CIP or SAP), which will remove the 5' phosphates on the ends of the fragment, preventing ligation. But, there's another problem. The insert can ligate in two orientations. So, you'd have to screen many more clones to find the ones that had the correct orientation. That's the main reason for putting in the XhoI site.
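To see the compatibility problem concretely, you can compute the sticky end each enzyme leaves. The sketch below assumes the standard cut positions (each of these three enzymes cuts after the first base of its site, leaving a 4 base 5' overhang). The table and function names are made up for the example, and comparing overhangs for equality only works because these overhangs are palindromic.

```python
# Recognition site and the number of bases before the top-strand cut.
CUT_SITES = {
    "BglII": ("AGATCT", 1),   # A^GATCT
    "BamHI": ("GGATCC", 1),   # G^GATCC
    "XhoI": ("CTCGAG", 1),    # C^TCGAG
}


def overhang(enzyme: str) -> str:
    """Return the single-stranded sticky end left by a palindromic cutter."""
    site, cut = CUT_SITES[enzyme]
    return site[cut:len(site) - cut]


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Two palindromic overhangs can anneal if they are identical."""
    return overhang(enzyme_a) == overhang(enzyme_b)


print(overhang("BglII"), overhang("BamHI"), overhang("XhoI"))  # GATC GATC TCGA
print(compatible("BglII", "BamHI"))  # True
print(compatible("BglII", "XhoI"))   # False
```

Because the BglII and XhoI ends cannot pair with each other, a BglII/XhoI-cut vector cannot close on itself and the insert can only go in one way around.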
Step 5: Make a construction file
I'll get you started here. You can download the plasmid sequence pBca9145-Bca1089 here.
Biobricking of papC
...and you can fill in the ????'s and then compare your answer to pBca9145-papC
Step 6: Check it!
- Predict the PCR product. Is there any unusual sequence bias or secondary structure present in your oligos? Are all your oligos annealing in the correct orientations? What is the size of your product?
- Simulate PCR product digestion. Are your restriction sites unique? Are they cutting at the right places?
- Simulate plasmid digestion. Are your restriction sites unique? Are the sites blocked by dam or dcm methylation? Are they cutting at the right places?
- Simulate ligation. Is the product what you intended to make?
Step 7: Some extra considerations
In Step 6, you determined whether your construction file "works" -- by this I mean you could go into the lab, go through the pcr, digestion, and ligation steps, and you'd get the product that you intended to make. Are you sure that this product is what you really want?
This isn't really a technical design question, it's a composition question. Therefore, it doesn't have a simple technical answer. However, there are a few clear-cut things you should ask yourself while examining your product.
- Do I have a complete open reading frame (ORF) in my part? That is, is there a complete DNA sequence beginning with a start codon and ending in a stop codon within the part (between the BglII and BamHI sites)? In a later tutorial, you'll consider making parts that aren't complete ORFs. For now, you should have a complete ORF in your part.
- Does this ORF encode my protein? Are you sure you cloned the right thing? One way to determine that is to use ApE's find orf tool, then the translate tool to predict the protein. You could compare that sequence to the sequence in NCBI for your protein.
- Is my part oriented correctly? For most purposes, the ORF in a part is on the sense strand. So, the start codon should be 5' of the stop codon. Another way of putting it: the ORF is pointing to the right.
- Is my part biobricked properly? Are all the restriction sites I'll need for composite part assembly present in my part? Are the EcoRI, BglII, BamHI, and XhoI sites visible and unique in the product plasmid?
- Is the spacing of my part correct? Coding sequences need to be the correct distance from the ribosome binding site. In the standard I've taught you, the start codon (ATG) should be directly 3' of the BglII site. If you've added extra bases, or omitted bases, the spacing won't be right when you hook your part up to the ribosome binding site parts. For the stop codon, there are no special spacing requirements.
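Two of these questions, spacing and ORF completeness, are mechanical enough to check in code. Here is a minimal sketch; the function name check_part is invented, the test sequences are toy examples rather than real parts, and the code assumes the part contains no internal BglII or BamHI sites (which Step 2 should already have ruled out). It does not tell you whether the ORF encodes the right protein.

```python
STOPS = {"TAA", "TAG", "TGA"}


def check_part(product: str) -> list:
    """Return a list of problems found between the BglII and BamHI sites."""
    s = product.upper()
    bgl = s.find("AGATCT")
    bam = s.find("GGATCC", bgl + 6) if bgl >= 0 else -1
    if bgl < 0 or bam < 0:
        return ["BglII and BamHI sites not found in that order"]
    part = s[bgl + 6:bam]
    problems = []
    if not part.startswith("ATG"):
        problems.append("start codon is not directly 3' of the BglII site")
    # Read codons in frame with the first base after the BglII site.
    codons = [part[i:i + 3] for i in range(0, len(part) - 2, 3)]
    if not any(codon in STOPS for codon in codons):
        problems.append("no in-frame stop codon before the BamHI site")
    return problems


print(check_part("ccaaaAGATCTatgagcggctgaGGATCCtaa"))   # []
print(check_part("ccaaaAGATCTaatgagcggctgaGGATCCtaa"))  # both problems reported
```

The second example has a single extra "a" after the BglII site, which both breaks the spacing and shifts the stop codon out of frame.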
Next, try it on your own with the quiz.
Basic Design Quiz
Try making a BglBricks basic part for the following part in plasmid pBca9145-Bca1089 for the colE2 colicin protein ceaB open reading frame. You'll have to find the sequence for ceaB in NCBI. When designing this, keep in mind you only want the open reading frame of ceaB in your basic part, and make sure you Biobrick it properly. Draw up a construction file and the oligos.
If you have any comments or want to report a potential error in the tutorial, please email me (Chris Anderson) at JCAnderson2167-at-gmail.com