Build-a-Gene Session 5
Because PCA produces many DNA molecules, not all of which are the correct size, we want to make sure that the DNA that we work with from here on contains the full-length emGFP gene. Remember that each bacterial cell originally picked up one DNA molecule. As that cell grew into a colony, it was copied, so all of the cells in that colony contain the same DNA molecule. Other bacterial colonies may contain DNA molecules of a different size or sequence. We can therefore screen the bacterial colonies by colony PCR to determine which ones contain a plasmid insert that is the correct size for the emGFP+promoter.
1. Dispense 50 ul of water per tube into 12 different tubes.
2. Use a sterile toothpick to pick a bacterial colony and resuspend it in a tube with water. Repeat for 11 more colonies.
3. Prepare a master mix for all PCRs by combining all reagents listed below in one tube.
|Reagent|Volume|
|10 uM primer|8.4 ul|
|10 uM primer|8.4 ul|
|2.5 mM dNTPs|14.0 ul|
4. Add 9 ul of the master mix into 12 different PCR tubes.
5. Add 1 ul of resuspended bacterial cells from a different colony into each PCR tube. (IT IS VERY IMPORTANT TO STORE THE REMAINDER!) Start the PCR reactions in the PCR machine.
95°C, 6 minutes
95°C, 30 seconds; 55°C, 30 seconds; 72°C, 1 minute
72°C, 10 minutes
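As a rough check on the master mix, the final concentration of each listed reagent in a 10 ul reaction can be estimated with C1V1 = C2V2. The sketch below assumes the master mix is scaled for 14 reactions of 9 ul each (12 colonies plus extra for pipetting loss). That scaling is an assumption, since the protocol does not state the total master mix volume; the names used are illustrative only.

```python
# Assumption (not stated in the protocol): the master mix is scaled for
# 14 reactions, each receiving 9 ul of mix plus 1 ul of resuspended cells.
N_REACTIONS_ASSUMED = 14
MIX_PER_REACTION_UL = 9.0
REACTION_VOLUME_UL = 10.0

# Reagents listed in the protocol: (stock concentration, unit, volume in mix)
reagents = {
    "primer (each)": (10.0, "uM", 8.4),
    "dNTPs": (2.5, "mM", 14.0),
}


def final_concentration(stock, volume_in_mix_ul):
    """Concentration in the 10 ul reaction, by two successive dilutions."""
    total_mix_ul = N_REACTIONS_ASSUMED * MIX_PER_REACTION_UL
    conc_in_mix = stock * volume_in_mix_ul / total_mix_ul
    return conc_in_mix * MIX_PER_REACTION_UL / REACTION_VOLUME_UL


for name, (stock, unit, volume) in reagents.items():
    print(f"{name}: {final_concentration(stock, volume):.2f} {unit} final")
```

Under that assumption each primer ends up at about 0.6 uM and the dNTPs at about 0.25 mM in the final reaction.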
Now, we need to check how well our colony PCR worked by running our PCR products on an agarose gel to determine which colonies contain a plasmid carrying the full-length emGFP gene.
Pouring a Gel:
1. Weigh out 0.35 g of agarose on a piece of weigh paper. Transfer to an Erlenmeyer flask. Add 50 ml of 1x TAE.
2. Place the flask in the microwave and heat until the agarose is completely transparent and colorless.
3. Remove the flask of clear agarose and allow it to cool. This will take about 10 min.
4. When the agarose is cool, add 5 ul of GelRed to the melted agarose.
5. Swirl the agarose to incorporate the GelRed and pour the agarose into the gel tray.
6. Allow at least 20 minutes for the gel to solidify. Once solid, carefully remove the comb and place the solidified gel (still on the tray) into the gel box so that the wells are oriented on the same side as the black electrode.
7. Add enough 1x TAE buffer to completely cover the gel by about 1 cm.
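The gel recipe above can be checked with a short calculation. This is a minimal sketch using only the quantities given in the steps; the variable names are illustrative.

```python
# Quantities taken from the gel-pouring steps.
agarose_g = 0.35
buffer_ml = 50.0
stain_ul = 5.0

# Percent weight/volume is grams of agarose per 100 ml of buffer.
gel_percent = agarose_g / buffer_ml * 100

# Fold dilution of the stain in the gel (1 ml = 1000 ul).
stain_fold_dilution = buffer_ml * 1000 / stain_ul

print(f"Gel: {gel_percent:.1f}% agarose")
print(f"Stain diluted 1:{stain_fold_dilution:.0f}")
```

This corresponds to a 0.7% gel with the stain diluted 1:10,000.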
Preparing your samples:
1. On a piece of parafilm, spot out 2 ul of 6x DNA loading dye for each colony PCR reaction.
2. Add 5 ul of water to each spot of dye.
3. Add 5 ul of PCR product to each spot of dye.
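The volumes in the steps above bring the 6x loading dye to its working concentration. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Volumes per sample, taken from the sample-preparation steps.
dye_ul = 2.0
water_ul = 5.0
pcr_product_ul = 5.0
dye_stock_x = 6.0

total_ul = dye_ul + water_ul + pcr_product_ul

# Dilution of the dye: stock strength times its share of the total volume.
dye_final_x = dye_stock_x * dye_ul / total_ul

print(f"Total volume per sample: {total_ul:.0f} ul")
print(f"Loading dye: {dye_final_x:.0f}x final")
```

Each spot holds 12 ul at 1x dye, which leaves a small excess over the 10 ul that is loaded on the gel.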
Running a Gel:
1. Into the first lane of the gel load 10 ul of the DNA ladder.
2. Then load 10 ul of each of your PCR products (mixed with water and dye).
3. Place gel lid with electrodes on gel box, and set voltage to 100V.
4. Run gel approximately 30 minutes or until the dye is 2/3 of the way down the gel, then take picture.
DNA SEQUENCE ANALYSIS
Once we’ve screened our clones by colony-screening PCR to verify that they contain an insert of the correct size, we need to sequence the inserts to verify that they contain an emGFP gene and promoter without any sequence errors.
When sequencing data is sent to us, we receive not only a text file containing the sequence of the DNA insert but also the data from the sequencing machine in the form of a color-coded electropherogram. The electropherogram represents the data obtained from the sequencing detector, with the height of each peak representing the strength of the signal. We can therefore see the quality of the sequencing data that was obtained as well as investigate any ambiguities in the sequence. A sample electropherogram is here: You will notice that the signal at the end of the electropherogram is not as strong as at the beginning; the peaks are much shorter and broader and become difficult to distinguish from one another. This is because it is difficult to discriminate between relatively long DNA fragments at single-nucleotide resolution.
Our emGFP gene is about 750 bp long. However, DNA sequencing reactions (called sequencing “reads”) are only 700 nucleotides long. We therefore sequence each clone twice, once from the beginning to the end of the emGFP gene and once from the end to the beginning. We call these the “forward” and “reverse” sequencing reads. This ensures that we will get good sequencing data across the entire gene.
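The coverage argument can be made concrete with the numbers above. This sketch makes a simplifying assumption that is not in the protocol: each read starts exactly at one end of the gene, whereas real reads begin in the flanking vector sequence.

```python
GENE_LENGTH = 750   # approximate gene length from the text
READ_LENGTH = 700   # read length from the text

# Positions (1-based) covered by each read, assuming each read starts
# exactly at one end of the gene (a simplification).
forward = set(range(1, READ_LENGTH + 1))
reverse = set(range(GENE_LENGTH - READ_LENGTH + 1, GENE_LENGTH + 1))

covered_by_both = len(forward & reverse)
covered_by_either = len(forward | reverse)

print(f"Covered by at least one read: {covered_by_either} of {GENE_LENGTH}")
print(f"Covered by both reads: {covered_by_both}")
```

Under this assumption every position is covered at least once and the middle 650 positions are covered by both reads, which is what makes the later forward/reverse comparison possible.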
Comparing the forward sequencing read to the desired emGFP sequence:
Now we need to determine if our clones contain a sequence that perfectly matches the emGFP gene and promoter or if they have DNA sequence errors. To accomplish this, we use a bioinformatics tool called Clustal W.
1. Under "Step 1" of Clustal, change the settings from Protein to DNA.
2. Input the sequence of the emGFP gene. The line before the emGFP sequence must contain >emGFP (no spaces).
ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC CCGTGCCCTGGCCCACCCTCGTGACCACCTTGACCTACGGCGTGCAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC CCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAAGGTCTATATCACCGCCGACAAGCAGAAGAACGGCATCAAG GTGAACTTCAAGACCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATC ACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA
3. Skip a line and input the forward sequencing read:
tagcgagctaggattttttttatctgaattctgcctcgtgatacgcctatttttataggttaatgtcatga taataatggtttcttagacgtcaggtggcactcgagttgatcgggcacgtaagaggttccaactttcaccataatgaaaATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC CCGTGCCCTGGCCCACCCTCGTGACCACCTTGACCTACGGCGTGCAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC CCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAAGGTCTATATCACCGCCGACAAGCAGAAGAACGGCATCAAG GTGAACTTCAAGACCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATC ACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAagaatccaagcctcgagctgtcagaccaagtttactcata
4. Click Align.
Clustal W gives you a scores table indicating the pairwise alignment similarity score (out of 100). More importantly, it provides a DNA alignment. Residues that are identical in the two sequences are marked with a *. The sequencing read extends past the ends of the emGFP gene into the vector, so the alignment will also show vector sequence that has no match in the emGFP gene.
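To illustrate how the * line of an alignment is read, here is a minimal sketch that marks identical positions in two sequences that are already aligned (equal length, with - for gaps). It does not perform the alignment itself and does not reproduce how Clustal W computes its score; the function names and the short example sequences are illustrative.

```python
def match_line(ref, read):
    """Return '*' where two aligned sequences agree and ' ' elsewhere."""
    if len(ref) != len(read):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(ref.upper(), read.upper())
    )


def percent_identical_columns(ref, read):
    """Share of alignment columns marked '*', as a percentage."""
    line = match_line(ref, read)
    return 100.0 * line.count("*") / len(line)


# Short excerpt: the start of the gene, with three vector bases in the read.
ref_aligned = "---ATGGTGAGCAAG"
read_aligned = "aaaATGGTGAGCAAG"

print(ref_aligned)
print(read_aligned)
print(match_line(ref_aligned, read_aligned))
```

The three leading columns carry no * because the read contains vector sequence there, which is expected and is not a mutation.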
Analyzing the reverse sequencing read:
The reverse sequencing read is the reverse complement of the emGFP sequence because it was obtained by sequencing the complementary DNA strand of the double helix. To line it up with the emGFP gene, we must first convert the read to its reverse complement.
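For reference, the reverse-complement operation itself is simple enough to sketch in a few lines of Python. This is an illustrative helper, not the code used by any particular web tool; it handles only the bases A, C, G and T and strips whitespace such as the spaces found in pasted reads.

```python
# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, ignoring whitespace."""
    cleaned = "".join(seq.split())
    return cleaned.translate(COMPLEMENT)[::-1]


# The emGFP portion of the reverse read ends in ...GCTCACCAT,
# which is the start of the gene (ATGGTGAGC...) on the other strand.
print(reverse_complement("GCTCACCAT"))
```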
1. Go to the Sequence Manipulation Suite.
2. Make sure that you are on the Reverse Complement page and input your reverse sequencing read.
agtaatcttttcggttttaaagaaaaagggcaggTTACTTGTACAGCTCGTCCATGCCGAGAGTGATCCCGGCGGCGGTCACGAACTCCAGCAG GACCATGTGATCGCGCTTCTCGTTGGGGTCTTTGCTCAGGGCGGACTGGGTGCTCAGGTA GTGGTTGTCGGGCAGCAGCACGGGGCCGTCGCCGATGGGGGTGTTCTGCTGGTAGTGGTC GGCGAGCTGCACGCTGCCGTCCTCGATGTTGTGGCGGGTCTTGAAGTTCACCTTGATGCC GTTCTTCTGCTTGTCGGCGGTGATATAGACCTTGTGGCTGTTGTAGTTGTACTCCAGCTT GTGCCCCAGGATGTTGCCGTCCTCCTTGAAGTCGATGCCCTTCAGCTCGATGCGGTTCAC CAGGGTGTCGCCCTCGAACTTCACCTCGGCGCGGGTCTTGTAGTTGCCGTCGTCCTTGAA GAAGATGGTGCGCTCCTGGACGTAGCCTTCGGGCATGGCGGACTTGAAGAAGTCGTGCTG CTTCATGTGGTCGGGGTAGCGGGCGAAGCACTGCACGCCGTAGGTCAAGGTGGTCACGAG GGTGGGCCAGGGCACGGGCAGCTTGCCGGTGGTGCAGATGAACTTCAGGGTCAGCTTGCC GTAGGTGGCATCGCCCTCGCCCTCGCCGGACACGCTGAACTTGTGGCCGTTTACGTCGCC GTCCAGCTCGACCAGGATGGGCACCACCCCGGTGAACAGCTCCTCGCCCTTGCTCACCATtttaatttaaaaggatctaggtgaagatccttt
3. Click Submit.
4. Copy and paste the resulting sequence into Clustal W, below the emGFP sequence and the forward sequencing read, and align all three sequences.
The result should show the emGFP gene aligned with both the forward and reverse sequencing reads. At any nucleotide position where your forward and reverse reads do not agree, one of the reads is probably of higher quality than the other at that base (bases near the beginning of a sequencing read are more likely to be reliable than bases near the end). A mutation is only recorded if the forward and reverse reads agree with each other and disagree with the desired emGFP sequence.
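The rule for recording a mutation can be written out as a short sketch. It assumes the three sequences have already been aligned to equal length (with - for gaps) and that the reverse read has been reverse-complemented; the function name and the toy sequences are illustrative only.

```python
def call_mutations(reference, forward, reverse_rc):
    """Apply the rule: both reads agree with each other but not the reference.

    Returns (mutations, discrepancies) as lists of 1-based positions with
    the bases involved. Discrepancies are positions where the two reads
    disagree and the electropherograms should be inspected.
    """
    if not (len(reference) == len(forward) == len(reverse_rc)):
        raise ValueError("sequences must be aligned to the same length")
    mutations, discrepancies = [], []
    columns = zip(reference.upper(), forward.upper(), reverse_rc.upper())
    for pos, (ref, fwd, rev) in enumerate(columns, start=1):
        if fwd != rev:
            discrepancies.append((pos, fwd, rev))
        elif fwd != ref:
            mutations.append((pos, ref, fwd))
    return mutations, discrepancies


# Toy example: position 6 differs in both reads, position 9 in one read only.
mutations, discrepancies = call_mutations("ATGGTGAGC", "ATGGTCAGC", "ATGGTCAGT")
print("Mutations:", mutations)
print("Read disagreements:", discrepancies)
```

Only position 6 would be recorded as a mutation here; position 9 is a disagreement between the reads and says nothing reliable about the clone.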
Analyzing sequencing reads that contain a mutation:
tagcgagctaggattttttttatctgaattctgcctcgtgatacgcctatttttataggttaatgtcatga taataatggtttcttagacgtcaggtggcactcgagttgatcgggcacgtaagaggttccaactttcaccataatgaaaATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGC CCGTGCCCTGGCCCACCCTCGTGACCACCTTGACCTACGGCGTGCAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGAC CCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAAGGTCTATATCACCGCCGACAAGCAGAAGAACGGCATCAAG GTGAACTTCAAGACCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATC ACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAagaatccaagcctcgagctgtcagaccaagtttactcata
aaaggatcttcacctagatccttttaaattaaaATGGTGAGCAAGGGCGAGGAGCTGTTC ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC GTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC ACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCTTGACCTACGGCGTG CAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATG CCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACC CGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATC GACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCAC AAGGTCTATATCACCGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGACCCGC CACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATC GGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGC AAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGG ATCACTCTCGGCATGGACGAGCTGTACAAGTAAcctgccctttttctttaaaaccgaaaa gattact
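One way to locate the difference without reading the full alignment by eye is to compare a short excerpt of the reference with the same region of the read. The sketch below uses Python's standard difflib module on a 27-nucleotide excerpt copied from near the 3' end of the emGFP sequence and the matching region of the reads above; the excerpt boundaries and the helper name are choices made for illustration.

```python
from difflib import SequenceMatcher

# Excerpts copied from the emGFP sequence and from the example reads
# (the forward read and the reverse-complemented reverse read are
# identical across this region).
reference_excerpt = "GTGACCGCCGCCGGGATCACTCTCGGC"
read_excerpt = "GTGACCGCCGCCGGATCACTCTCGGC"


def differences(ref, read):
    """List the non-matching segments as (operation, ref bases, read bases)."""
    matcher = SequenceMatcher(None, ref, read, autojunk=False)
    return [
        (tag, ref[i1:i2], read[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(differences(reference_excerpt, read_excerpt))
```

In this excerpt the reads are one G short in a run of three Gs. Because both reads show the same change, it meets the rule for recording a mutation, and a single-nucleotide deletion shifts the reading frame of everything downstream.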
Calculating the error rate:
If you would like to know the overall error rate for creation of our emGFP gene (we call this value α), the error rate can be calculated as follows:
α = (Total # mutations found in all sequenced clones)/(Total # nucleotides sequenced that are not vector sequence). For example: if you found 13 mutations in 6 clones of a 750 bp gene, then α = 13/(6*750) ≈ 0.0029
The probability of a clone containing the desired emGFP sequence without mutations (pc) is pc = e^(-αL), where L is the length of the gene in nucleotides.
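Both formulas can be worked through with the example numbers from the text. This is a minimal sketch; the function names are illustrative, and it assumes the full 750 bp gene was sequenced in each of the 6 clones.

```python
import math


def error_rate(total_mutations, clones_sequenced, gene_length):
    """alpha: mutations per non-vector nucleotide sequenced."""
    return total_mutations / (clones_sequenced * gene_length)


def prob_correct_clone(alpha, gene_length):
    """pc = e^(-alpha * L): chance that a clone carries no mutations."""
    return math.exp(-alpha * gene_length)


alpha = error_rate(total_mutations=13, clones_sequenced=6, gene_length=750)
pc = prob_correct_clone(alpha, gene_length=750)

print(f"alpha = {alpha:.4f} mutations per nucleotide")
print(f"pc = {pc:.3f}")
```

With these numbers α is about 0.0029 and pc is about 0.11, so roughly one clone in nine would be expected to carry an error-free gene.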