BISC 219: Assignment Help- Sequence Analysis

From OpenWetWare
Wellesley College BISC 219 Genetics

Practice using the DNA sequencing software Sequencher™

You will be provided with sample sequencing data from previous experiments so that you can practice using this software and learn how to analyze your data. The practice data can be found on the computers in the back of the lab. Use the instructions below to familiarize yourself with Sequencher software. DO NOT transfer or remove any files from the lab folder on the desktop! Instead, please copy and paste the files you want to use into a new folder that you will create, as described in the beginning of the directions. PLEASE FOLLOW THE DIRECTIONS CAREFULLY, so that the students who come after you will find the files they need in a useable, unchanged form.

GOAL: Assemble and edit sequenced parts of a plasmid and compare the result to the sequence of a wild type gene of interest. Although it is preferable to confirm the sequence of a gene in both directions using forward and reverse primers for each template, these sequencing reactions used two forward primers: EcoRI clockwise 15-mer and EcoRI clockwise 30-mer. The only difference is that one primer (the 30-mer) has more bases than the other. Both primers were expected to anneal slightly upstream of the cI gene in a pBR322-based plasmid and then sequence in a clockwise direction. The software program, Sequencher 4.5, will be used to fit the products of the sequencing reactions (the separated fragments of different sizes created in the thermal cycler) together and assemble the total gene sequence. You will scan for disagreements between the fragments and make decisions about the accuracy of the sequence by observing the peaks in the chromatogram of each of the sequenced fragments. Once you have obtained the most accurate sequence information possible for the mutagenized cI gene, you will compare it to wild type cI and hypothesize the source of a functional defect in the cI gene product (lambda repressor protein).

Importing files: NOTE: IF you don’t follow the directions in the first two paragraphs below for creating your own folder and for copying the files in the master practice folder (rather than working with the master files), you will RUIN the course master files for all other students coming after you. Please be considerate of others and avoid overwriting the master files by following these directions carefully.

Create a folder for your team (you and your partner) on the desktop of one of the 4 Mac computers in the lab. Name this folder your team color & lab day (Ex: orange-Wed). Drag and drop this new team folder into the main folder for your lab section day that you will see on the desktop.

Open another folder on the desktop called "Practice Sequence Information”. Copy all of it (Select All, Copy, Paste) to your team folder that you just created. Close the “Practice Sequence Info” folder as well as any other open files. DO NOT work with the files in the Practice Sequence Info Folder other than to copy them! ONLY WORK WITH THE COPY OF THE PRACTICE FILES THAT’S IN YOUR TEAM FOLDER.

Open Sequencher by clicking on the icon showing colored peaks, found on the dock at the bottom of the main computer screen. A new project (or file) called “Untitled Project” will automatically open.

In this practice exercise you will import four pairs of practice sequences (black, blue, orange, and red) and the wild type cI gene sequence into your “Untitled Project” in Sequencher. Under File, Import, select the 4 pairs of color-coded .abi files that you want (black, blue, orange, red) by clicking on each file. Make sure that all of those files are in your project before proceeding. Note that there are two files with the same color. Each is the DNA sequence of the cI gene from the same mutant; the difference is that each sequence was generated from a different primer (15-mer or 30-mer). After importing all 8 of the .abi files (4 pairs) into your “Untitled Project”, deselect those files by clicking on the white background. If you have imported a file into your project that you don’t want, highlight the unwanted file by clicking on it and then click on the trash bucket at the top right.

You will also need to get the .txt file (NOT the .doc file) of the wild type cI sequence into your project. This text file (.txt extension) can be found in the Practice Sequence folder. Unlike the .abi files, which you import through the File menu, the .txt file can simply be dragged into your project. DO NOT import the Word file (.doc extension) for the wild type cI gene if it is also in the practice master folder!

Analyzing Real (not practice) Data: IF you are practicing, skip this paragraph. If you are analyzing real data, you must find your team’s .abi files in the DATA folder for your lab day. Look for a sub-folder within the desktop folder for your lab section called “Sequencing Data- Your Lab Day- 219”. Select, Copy, & Paste (into the folder for your team that you created when practicing) only the .abi files that correspond to your team’s sequencing reactions. DO NOT save or try to use the files with the .seq extension! Once you have your raw data copied into your team folder on the computer desk top you can import the .abi files that you want to use into a new "untitled project" that opens automatically when you open Sequencher. Drag and drop the .txt file of the wild type gene sequence into your project from the practice folder you created for your team.

Assembly and Editing: Once you have the files that you want to work with in your Untitled Project, you will assemble and edit the matching reactions. Before you start, please name your project. To do this, click somewhere in the blank space of Untitled Project to bring up the Sequencher tool bar. Then click File, Save Project As (not Save). Rename your project with a unique name that will clearly identify it as yours and only yours (Ex. Bert_Ernie SR). In the “Where” box of the save dialog, find your team folder under “recent places” in the drop-down menu so that you save your project to your team folder in your lab section folder (NOT TO THE COMPUTER DESKTOP!).

Now you are ready to begin to analyze the data. Double click on one of the fragments (.abi files) in your project. A window with the sequence should appear. This is the moment of truth: did the sequencing work? How can you decide?

Check the number of bases in each of the fragments. Sequencing reactions are good for up to about a thousand bases before the reagents run out or the assembly time ends; therefore, if both fragments of a pair of sequencing reactions have only a few bases, the sequencing didn’t work and you can stop the data analysis. We hope this won’t happen to your real data, but that’s the conclusion you should reach about one of the practice sets in the practice files. Which color? You should have other pairs of practice sequences that are extensive enough to allow you to continue.

In the other pairs, is one of the sequenced fragments significantly longer than the other? In these sequencing reactions, two versions of the same primer on the same plasmid were used to create the fragments, so should the sequences be similar or identical? Why? If a sequence pair shows only one long sequence while the other is very short, you can still do a comparison with the wild type cI sequence, but you won't be able to create a combined sequence (called a contig) of the two paired sequences. Is there a color of practice files for which that is the case?

Work with a set (color) of practice files where both fragments have at least 500 bases. Now check and see how "clean" the sequencing was. Double click on one of the .abi files. Click Show Chromatogram from the tool bar at the top of the window with the sequence. If the sequence worked well, you should see hundreds of nice tall clean peaks of different colors. If the peaks are not tall, you can increase the scale using the blue dot and sliding vertical bar at the left, but this also increases background, so it may not help.

You can improve your confidence in the accuracy of any sequence by comparing paired sequences to see how well they match. This involves creating a contig (comparison), editing it, and then making a new sequence from the edit.

Close any Chromatograms and everything BUT your project window.

Making a Contig (comparison of two sequences): Select two matching .abi files by clicking on one to highlight it and then holding down the shift key while clicking on the other. They should both be highlighted while everything else in the project is not. If you need to deselect files, go to the Select menu to do that.

Before Assembling a Contig, modify the criteria for aligning the sequences by clicking on the “Assembly Parameters” bar at the top of the project window. Change the required minimum match percentage and minimum overlap. The Minimum match should be 60%, and the Minimum overlap 50 bases. Click OK.

Make sure both fragments are still highlighted and then click Assemble Automatically. This will create the “contig”, which compares your two fragments and highlights the gaps and mismatches in order to show you what they have in common. When you see the window “Assembly completed”, look at the number of contigs created. If it says “0”, it means the two fragments didn’t match well enough to align them.
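To make the two Assembly Parameters more concrete, here is a minimal Python sketch of what a "minimum match percentage" and a "minimum overlap" test could look like. It is not Sequencher's algorithm, which is not described here; the function names `percent_match` and `meets_assembly_criteria` are hypothetical, and the sketch assumes the two overlapping regions have already been aligned to the same length.

```python
def percent_match(overlap_a, overlap_b):
    """Percentage of aligned columns in which the two reads carry the same base."""
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlapping regions must be aligned to the same length")
    if not overlap_a:
        return 0.0
    same = sum(a == b for a, b in zip(overlap_a.upper(), overlap_b.upper()))
    return 100.0 * same / len(overlap_a)


def meets_assembly_criteria(overlap_a, overlap_b, min_match=60.0, min_overlap=50):
    """True when the overlap is long enough AND similar enough to be joined.

    The defaults mirror the values used in this exercise (60% and 50 bases).
    """
    return (
        len(overlap_a) >= min_overlap
        and percent_match(overlap_a, overlap_b) >= min_match
    )


if __name__ == "__main__":
    read = "ACGT" * 13  # a made-up 52-base overlap
    print(meets_assembly_criteria(read, read))            # long and identical
    print(meets_assembly_criteria(read[:49], read[:49]))  # too short
    print(meets_assembly_criteria(read, "A" * 52))        # too dissimilar
```

Under these assumptions, a result of "0" contigs corresponds to a pair of reads that fails one of the two tests.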

The contig of the two sequences should now have replaced the two fragment .abi files and it should be highlighted in blue in your project window. Double click it. A window called Contig000# should appear with a summary of how well the two reactions matched. Now you are ready to edit this contig with the goal of improving the accuracy of your sequence.

Click on the bases bar. Look at the line of bases at the very bottom. You will see plus signs and black dots. Can you figure out what they mean? Click on one of the two fragments (Ex: Orange-30) in the box on the far left. Read what it tells you in the box. Click on the other fragment. If you have ambiguities, you should try editing them.

EDITing the contig: Position the cursor on the first base in the contig sequence on the bottom. It should make a black square around the letter and then highlight it in blue. Hold down both the Apple and D keys at the same time. This will move the display over to where the first disagreement between the two fragments is found. There will be a black dot there instead of the + sign. Because you want to determine if this is a real discrepancy, a sequencing error or an ambiguity, click the Show Chromatograms button and look for the base that is highlighted in blue. At that position decide if the height of the peaks and overall quality of the sequence in that region is clean enough for you to decide on the base name. If you have any doubt, the base in question should be N. If you can see a distinct peak under a lot of background, you can override the computer’s choice of “N” (meaning it didn’t know what to put there). Replace the N with the letter A, C, T, or G, if you think it’s clear what the base should be. You do not have to delete the letter that is there before typing the letter key of your choice. If there is a true discrepancy between the two sequences, the two peaks will both be clean, showing that the computer has named them appropriately. In that case, do not edit. Leave the bases called as they are. Move on to the next ambiguity or discrepancy. Hold down the Apple and D keys again to move to the next disagreement and repeat the editing decision. After a few disagreements, a window will appear that directs you to use the space bar instead of the Apple-D keys.

Continue through the entire “Chromatograph from Contig” by hitting the space bar or Apple-D and either change the base to the appropriate letter code or leave it as N or whatever the computer has named that base. It is crucial to do the editing correctly. If you leave too many ambiguities when you could make an appropriate base call, it will be unclear whether or not the ambiguities hide a functionally significant mutation. Conversely, if you edit inappropriately, you may override and remove the discrepancy that reveals the functionally significant mutation.
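The scan for disagreements can be pictured with a short Python sketch. It assumes the two reads have already been aligned to the same length (Sequencher does this when it builds the contig), and it only sorts each disagreement into a rough category; it cannot replace looking at the chromatogram peaks. The names `find_disagreements` and `classify` are hypothetical, and the two reads are made up.

```python
GAP_CHARS = {":", "-"}  # Sequencher displays gaps as colons; "-" is accepted too


def classify(base_a, base_b):
    """Rough category for one column where the two reads disagree."""
    if base_a in GAP_CHARS or base_b in GAP_CHARS:
        return "gap"
    if base_a == "N" or base_b == "N":
        return "ambiguity"    # one read could not be called at this position
    return "discrepancy"      # both reads were called, but differently


def find_disagreements(read_a, read_b):
    """Return (position, base_a, base_b, category) for each disagreement.

    Positions are 1-based to match how bases are numbered on screen.
    """
    if len(read_a) != len(read_b):
        raise ValueError("reads must be aligned to the same length")
    pairs = zip(read_a.upper(), read_b.upper())
    return [
        (i, a, b, classify(a, b))
        for i, (a, b) in enumerate(pairs, start=1)
        if a != b
    ]


if __name__ == "__main__":
    read_15mer = "ATGAGCACNAAA"  # made-up example reads
    read_30mer = "ATGAGCACAAAG"
    for hit in find_disagreements(read_15mer, read_30mer):
        print(hit)
```

An "ambiguity" is a candidate for editing if the chromatogram shows a clear peak; a "discrepancy" with clean peaks in both reads should be left as called.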

Once you have finished (you will get a message that no more discrepancies exist), go to the Contig menu at the top of the Sequencher tool bar and choose Create New Seq from Consensus. Be sure to use the defaults: Remove Gaps, Include Features, Append Time stamp. Click ok.

Close the “Chromatograph from Contig” window and the Contig 000# window. You will see the original project file with the first contig, probably #0001, and a new contig that you just created with “…@ date” as a name extension. Rename this newly edited contig (it will have the date as its name) and also rename the original contig with unique names. The first contig is the computer’s assessment of the sequences of a mutagenized plasmid’s cI gene. The second contig is your edited version of that data; presumably a more accurate plasmid sequence since you had two versions to compare, edit and combine. Use “mutcI” followed by the color code and the word contig as part of the new names.

Compare the wild type cI and the sequenced mutagenized plasmid gene again by creating another contig: Now you are ready to compare the mutated cI gene sequence with wild type's. Highlight (by clicking on it once) the best consensus sequence of the edited two fragments. If you don’t have an edited consensus sequence because one primer didn’t work well, see what to do in the next paragraph.

If you are using a file that has only one long, clean sequence from one of the two primers, you will highlight and use that one .abi file (rather than using the new sequence created as the consensus of both fragments). In the EDIT menu, you can "Remove from Project" any unusable .abi files, if you wish.

Holding down the Apple or shift key, highlight the wild type cI gene .txt file that should be in your project. Make sure you didn't copy and try to use the .doc (Word) version of this file.

The edited consensus sequence (or the original .abi sequence file if you were unable to make a consensus sequence from both primer reactions) and the .txt file of the wild type cI sequence should be highlighted in blue and no other files should be highlighted. You are ready to assemble a new contig.

First check the Assembly Parameters by clicking on the bar. Set the Minimum Match Percentage to 60 and the Minimum Overlap to 50.

Form a new contig (comparison) of the mutant sequence with the wild type by clicking Assemble Automatically. Your new contig will appear highlighted in your project file, IF a contig could be created. If your window says “0” contigs created and there are a lot of bases in your fragment, what does that mean? What might it mean that there are no areas of homology between the wild type cI gene and the part of the mutagenized plasmid that was sequenced?

If a contig was formed, Double click on it. Click on the green button on the right of the window to maximize your view. Spend some time analyzing these green lines of the mutated gene’s sequence compared to the wild type. The Diagram Key should help you form some initial impressions of the similarities and differences. First of all, is your fragment longer or shorter than wild type? How might you account for your fragment being longer than the wild type gene? If your fragment is shorter than cI and has gaps, what might that mean?

If a mutant's contig is longer or shorter than wild type cI, you can delete the extra bases using the keyboard. It is not necessary to do this, but it will “clean up” things a bit and make comparing the amino acid translations easier. It is fine to delete any extra named bases from a mutant’s cI gene that fall outside the wild type cI gene sequence (at both the beginning and the end).

Follow the previous directions for editing a contig. Place the cursor on the first base and use the space bar to see where the disagreements are. If there is an N, that does not necessarily indicate a mutation. Why not? A colon where a letter should be indicates a possible deletion, but it is also possible that there is a sequencing error here and no true deletion. A long string of colons is more indicative of a significant deletion in the gene than a single missing base; a single missing base is more likely to be an inaccuracy in the sequencing reaction than a true deletion.
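The distinction between a single colon and a long string of colons can be sketched in Python as follows. The function names and the cutoff of 3 are assumptions chosen only for illustration; the directions above give no numeric threshold, and the aligned sequence is made up.

```python
import re


def gap_runs(aligned_seq, gap=":"):
    """Return (start_position, length) for every run of gap characters.

    Positions are 1-based.
    """
    pattern = re.escape(gap) + "+"
    return [
        (m.start() + 1, len(m.group()))
        for m in re.finditer(pattern, aligned_seq)
    ]


def candidate_deletions(aligned_seq, min_run=3, gap=":"):
    """Keep only the runs long enough to suggest a real deletion.

    min_run=3 is an arbitrary illustrative cutoff, not a rule from the lab.
    Shorter runs are more likely sequencing inaccuracies.
    """
    return [run for run in gap_runs(aligned_seq, gap) if run[1] >= min_run]


if __name__ == "__main__":
    mutant = "ATG:CCA::::::GGT"  # made-up aligned mutant sequence
    print(gap_runs(mutant))
    print(candidate_deletions(mutant))
```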

Click on the summary bar if you like that view better. Try the Ruler, Options, and Find views if you want to.

Rename this contig with something unique that lets you know what it is: a cleaned and edited sequence of a mutated cI gene compared to the wild-type sequence.

Did you find one or more possible point mutations (substitutions) that could account for a functionally significant change in the encoded protein, lambda repressor? To decide, you should scroll through the new contig sequence again counting and examining the base comparisons at the black dots. Assess each for its probable importance. If you find a long series of black dots or plus signs, that's more clearly significant, but what about one or more substitutions (point mutations)? Remember that one base change can lead to a functionally significant change in the protein or it might make no difference at all. To assess importance of point mutations, the first thing to determine is whether or not a particular substitution leads to an amino acid change. If so, the changes that are more likely to be important are those that create stop codons or encode a change in chemical properties of the residue affected.
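The first question for any substitution, whether it changes the amino acid at all, can be answered by comparing the wild type codon with the mutant codon. The Python sketch below uses the standard genetic code; the function name `classify_substitution` and the labels it returns are hypothetical, and codons containing N or a gap are simply reported as ambiguous.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(wt_codon, mut_codon):
    """Label a codon change as silent, missense, nonsense, or ambiguous."""
    wt_codon, mut_codon = wt_codon.upper(), mut_codon.upper()
    if wt_codon not in CODON_TABLE or mut_codon not in CODON_TABLE:
        return "ambiguous"   # contains N, a gap, or is not 3 bases long
    wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
    if wt_aa == mut_aa:
        return "silent"      # same amino acid, no change in the protein
    if mut_aa == "*":
        return "nonsense"    # the change creates a stop codon
    return "missense"        # a different amino acid is encoded


if __name__ == "__main__":
    for wt, mut in [("GAA", "GAG"), ("GAA", "AAA"), ("GAA", "TAA"), ("GAA", "GNA")]:
        print(wt, mut, classify_substitution(wt, mut))
```

Only missense and nonsense changes need the further assessment described next.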

You should familiarize yourself with the chemical properties of the different R groups on the 20 main amino acids. Changes of: 1) a basic to an acidic amino acid residue or, 2) a small to a bulky or, 3) flexible to inflexible amino acid or, 4) hydrophobic to hydrophilic (or vice versa) are more likely to be significant than an interchange of like-kind R groups. You can read about those characteristics of the various amino acids in textbooks or many other places, but a quick guide to the biochemical characteristics of various amino acids can be found at the web site: http://chemed.chem.purdue.edu/genchem/topicreview/bp/1biochem/amino2.html

You will also need a key to the letter codes for the amino acids at: http://www.biochem.ucl.ac.uk/bsm/dbbrowser/c32/aacode.html
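As a rough aid for judging whether a substitution swaps like-kind R groups, the Python sketch below sorts the 20 amino acids (single-letter codes) into four broad classes. This grouping is one common simplification and textbooks differ on several residues (for example histidine, cysteine, tyrosine and glycine), so treat it as an assumption, not a reference. It also says nothing about size or flexibility, which you still need to judge yourself. The names `PROPERTY_CLASS` and `describe_substitution` are hypothetical.

```python
# One simplified grouping of R-group chemistry (an assumption; sources vary).
GROUPS = {
    "nonpolar": "GAVLIMPFW",
    "polar uncharged": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
PROPERTY_CLASS = {aa: label for label, residues in GROUPS.items() for aa in residues}


def describe_substitution(wt_aa, mut_aa):
    """Say whether a substitution stays within a class or crosses classes."""
    wt_aa, mut_aa = wt_aa.upper(), mut_aa.upper()
    wt_class = PROPERTY_CLASS[wt_aa]
    mut_class = PROPERTY_CLASS[mut_aa]
    if wt_class == mut_class:
        return f"{wt_aa}->{mut_aa}: like-kind ({wt_class})"
    return f"{wt_aa}->{mut_aa}: {wt_class} to {mut_class}"


if __name__ == "__main__":
    print(describe_substitution("E", "K"))  # charge reversal
    print(describe_substitution("L", "I"))  # like-kind swap
```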

Additionally, it is important to know whether a point mutation creates a stop codon and, if so, where that stop codon falls in the folded protein. Remember that the amino terminus is the beginning of the protein and the carboxy terminus is the end.

The lambda repressor protein is 236 amino acids long and folds into two globular domains that are connected by a flexible hinge or tether. The amino domain, from amino acids 1 to 92, is responsible for binding the DNA. The lambda repressor protein must dimerize in order to bind to DNA. The carboxyl domain of the protein, from amino acids 132 to 236, is the region responsible for the dimerization function.
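The domain boundaries above can be turned into a small lookup, sketched here in Python. The residue ranges for the two domains come from the paragraph above; treating residues 93 to 131 as the hinge is an inference from those boundaries, not something stated explicitly, and the function names are hypothetical.

```python
PROTEIN_LENGTH = 236

# (first residue, last residue, description)
DOMAINS = [
    (1, 92, "amino domain: DNA binding"),
    (93, 131, "hinge between the two domains (inferred range)"),
    (132, 236, "carboxyl domain: dimerization"),
]


def domain_for_residue(position):
    """Return the region of lambda repressor that contains a residue number."""
    for first, last, label in DOMAINS:
        if first <= position <= last:
            return label
    raise ValueError(f"residue {position} is outside 1..{PROTEIN_LENGTH}")


def residues_lost_to_stop(stop_position):
    """Number of residues missing if a stop codon replaces this residue."""
    domain_for_residue(stop_position)  # validates the position
    return PROTEIN_LENGTH - stop_position + 1


if __name__ == "__main__":
    for pos in (45, 100, 200):
        print(pos, domain_for_residue(pos))
    print(residues_lost_to_stop(100))
```

A stop codon early in the sequence removes more of the protein, which is why its position matters when you assess functional significance.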

You will need the protein translations of the mutant and wild type cI gene sequences to compare. Before you try to get the translation of your mutagenized gene, make sure you are starting in the right place. Delete all bases that you sequenced that are part of the primer sequence or upstream of cI. Also, remember that every coding sequence begins with a start codon (ATG); if translated as written, that would mean that every polypeptide begins with Met, but many mature proteins don't. The initiator methionine is often removed from the polypeptide during or after translation, so you should either delete the start codon or not count it when you are trying to correlate your amino acid translation with the numbers you find for residues of this protein in the literature. Also, if you have true insertions or deletions (as opposed to sequencing errors in the reactions that appear as deletions or insertions) in the mutant's nucleotide sequence, the translation may have a frameshift. A frameshift causes all the amino acids after the insertion or deletion point to be incorrect. It is possible that a mutant's cI gene has a frameshift problem that is the cause of the protein defect seen phenotypically, but it is much more likely that an apparent insertion or deletion is only an experimental artifact, usually a sequencing error. Frameshifts are so devastating that mechanisms have evolved to “fix” them post-transcriptionally, meaning that the translated polypeptide may not reflect an apparent frameshift mutation in the gene; therefore, that “frameshift” mutation is not responsible for the functional change seen in the protein. Look farther than an apparent frameshift for the cause of a mutant’s protein dysfunction. A point mutation or more significant gene deletion is more likely to be the cause.
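To see why a single missing base scrambles everything downstream, here is a minimal Python sketch that translates a short sequence in the first frame before and after one base is removed. The DNA sequence is made up for illustration and is not the cI gene; the function names are hypothetical. Codons containing N or a gap are shown as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate in the first frame; an incomplete final codon is ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def delete_base(dna, index):
    """Return the sequence with the base at a 0-based index removed."""
    return dna[:index] + dna[index + 1:]


def drop_initiator_met(protein):
    """Remove a leading Met so residue numbers match the mature protein."""
    return protein[1:] if protein.startswith("M") else protein


if __name__ == "__main__":
    original = "ATGGCTGAACGTAAACTGTAA"    # made-up example, not cI
    shifted = delete_base(original, 4)    # lose one base in the second codon
    print(translate(original))
    print(translate(shifted))
    print(drop_initiator_met(translate(original)))
```

Every residue after the deletion point differs between the two translations, and the original stop codon is no longer read in frame.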

To get the amino acid translation for the mutant's edited contig, go to the View menu in the Sequencher tool bar and choose Translation, Protein 1st frame. You will see the single letter amino acid code at the bottom of the contig window.

You can find the wild type's protein translation by using the translate function in Sequencher or, easier, you can find it in a Word (.doc) file called “wild type cI gene sequence.doc”. You have a copy of this .doc file in your folder that you copied from the Practice Files. It should contain both the DNA sequence and protein translation.

Alternatively, the following link to the NCBI web site, a national data base of all kinds of information on genes and proteins gives a lot of helpful information, including the translation of wildtype cI. The link to this site and the pertinent page is: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=15056


When you are finished: Save the project with your contigs to your team folder on the desktop. If you want to send any part of the data to yourself over First Class, you will have to EXPORT (under the File menu) the sequence into a Word or other text file. You will not be able to open .abi or .seq files except in Sequencher, so be sure to change the file format in the window that appears.

Remember to FILE QUIT Sequencher! If you do not quit the program, others, elsewhere on campus, can’t use it. We only have a site license for 5 copies to run at the same time, so be considerate and quit the program when you are finished.

Data Analysis Assignment due in lab next time: 10 points

Please write a summary assessing each of the mutagenized plasmids (each color indicates a different mutant) showing functional defects in the cI gene product. Make sure you include the id color code (black, red, orange, or blue), found in the original desktop folder, for the sequence you are analyzing.

1) Is there evidence for a point mutation or for a significant or complete cI gene deletion? Explain. Hint: “Can’t tell” may be the correct answer here for at least one of the “colors”.

2) If the mutation appears to be one or more point mutations, use the Sequencher software to determine if there are changes in the protein. If so, give the amino acid substitution(s).

3) If there is an amino acid change evaluate how likely that change is to be functionally significant. Hint: Does that change alter the chemical characteristics of an amino acid residue in the protein? If so, how so? Is there a new stop codon? If so, is it early or late in the protein translation and therefore, more or less likely to be functionally significant?

4) Note that lambda repressor protein is 236 amino acids long and folds into two globular domains that are connected by a flexible hinge or tether. The amino domain, from amino acids 1 to 92, is responsible for binding the DNA. The lambda repressor protein must dimerize in order to bind to DNA. The carboxyl domain of the protein, from amino acids 132 to 236, is the region responsible for the dimerization and tetramerization functions. A point mutation that affects function is likely to affect the function associated with the region of the protein in which it falls. For each of the mutations that you locate in the practice files, decide whether the mutation is more likely to affect the dimerization/tetramerization function or the DNA binding function of the protein.