BE.109:Bio-material engineering/Sequence analysis: Difference between revisions

Latest revision as of 14:23, 14 May 2006

BE.109 Laboratory Fundamentals of Biological Engineering


Introduction

The invention of automated sequencing machines has made sequence determination a fast and inexpensive endeavor. The method for sequencing DNA is not new, but automation of the process is recent, developed in conjunction with the massive genome sequencing efforts of the 1990s. At the heart of sequencing reactions is chemistry worked out by Fred Sanger in the 1970s, which uses dideoxynucleotides.

**Normal bases versus chain-terminating bases**

These chain-terminating bases can be added to a growing chain of DNA but, because they lack the 3′ hydroxyl group needed to attach the next nucleotide, cannot be further extended. Performing four reactions, each with a different chain-terminating base, generates fragments of different lengths ending at G, A, T, or C. The fragments, once separated by size, reflect the DNA’s sequence. In the “old days” (all of 10 years ago!) radioactive material was incorporated into the elongating DNA fragments so they could be visualized on X-ray film (image on left). More recently, fluorescent dyes, one color linked to each dideoxy-base, have been used instead. The fragments, labeled in four colors, are passed through capillaries, and a computer reads the output and traces the color intensities detected (image on right). Your sample was sequenced in this way on an ABI 3730 DNA Analyzer.
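To see why size-sorted fragments spell out the sequence, here is a minimal Python sketch of an idealized set of four dideoxy reactions. It assumes every possible termination product is present and perfectly resolved; the function names and the example strand are made up for illustration and are not part of the lab materials.

```python
def terminated_fragment_lengths(strand):
    """Idealized dideoxy reactions for a newly synthesized strand.

    Returns a dict mapping each terminating base to the lengths of the
    fragments that end with that base (one fragment per position).
    """
    lengths = {base: [] for base in "GATC"}
    for position, base in enumerate(strand, start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths):
    """Sort all fragments by size and read off the terminating base of each."""
    labelled = [(length, base)
                for base, base_lengths in lengths.items()
                for length in base_lengths]
    return "".join(base for _, base in sorted(labelled))


# Hypothetical example strand, not a sequence from this lab.
fragments = terminated_fragment_lengths("GATTACA")
print(fragments)                 # fragment lengths per reaction
print(read_sequence(fragments))  # sequence recovered from sizes alone
```

Each list corresponds to one lane of the old X-ray film, or to one dye color in the capillary trace. Reading from the shortest fragment to the longest gives the bases in order.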

Analysis of sequence data is no small task. “Sequence gazing” can swallow hours of time with little or no results. There are also many web-based programs to decipher patterns. Either the nucleotide sequence or its translated protein sequence can be examined in this way. Thanks to the genome sequence information that is now available, a new verb, “to BLAST,” has been coined to describe the comparison of your own sequence to sequences from other organisms. BLAST is an acronym for Basic Local Alignment Search Tool, and can be accessed through the National Center for Biotechnology Information (NCBI) home page at http://www.ncbi.nlm.nih.gov/

Protocol

The data from the Biopolymers Facility have been loaded onto your laptop. If you would like to retrieve the data yourself, go to http://web.mit.edu/biopolymers/www/ where you can follow the link to DNA SEQUENCING SERVICES to read about the sequencing that was done, or go directly to INSTRUCTIONS FOR DOWNLOADING DATA - ACCESSING THE FTP SERVER. To download your data, you will need to know:

Host = biopolymers.mit.edu
User ID = KULDELL
Password = NATECOLE (all caps!)
Directory = /pub

Scroll to find the “Kuldell” folder, which is the only data folder you’ll be able to open. There should be two outputs for each sequencing sample your group provided. One ends with “.abi” and is a trace of the fluorescent output from the sequencing machine. This can be viewed with “EditSeq” if you are using a Mac or with “Chromas” if you are on a PC. The other output file ends with “.seq” and lists the nucleotide sequence; it opens in Excel. The data from this file can be imported into any web-based sequence analysis program you’d like to use by pasting it wherever the program asks for “FASTA” format.
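If a program insists on a FASTA header, the pasted sequence can be wrapped by hand or with a few lines of Python. The sketch below assumes the text copied from the “.seq” file is just nucleotide letters, possibly broken up by spaces or line breaks. The function name `to_fasta` and the sample name are hypothetical.

```python
def to_fasta(name, raw_sequence, width=60):
    """Format a pasted sequence as a FASTA record.

    A FASTA record is a header line starting with '>' followed by the
    sequence, conventionally wrapped to a fixed line width.
    """
    # Drop spaces, tabs and line breaks picked up while copying.
    cleaned = "".join(raw_sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical pasted text, not real data from this lab.
print(to_fasta("my_sample", "atggcc aaa\nttt ggg"))
```

The output can be pasted directly into a web form that asks for FASTA input.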

A good place to start your sequence analysis might be the translation program freely available at http://www.ebi.ac.uk/emboss/transeq/. The table below may help orient you to the salient parts of your data. The translated sequence is presented in single-letter code, where X indicates ambiguity in the sequence data and * marks a stop codon. The four library sequences do not necessarily bind gold.
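What a translation program does can be sketched in a few lines of Python. This is a simplified illustration using the standard genetic code, not the actual transeq implementation: here any codon containing a letter other than A, C, G, or T is reported as X, whereas real tools may still resolve some ambiguous codons. The function names and the example DNA are made up.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate DNA in one forward frame (0, 1 or 2) into single-letter code."""
    dna = "".join(dna.split()).upper()
    protein = []
    for start in range(frame, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        # Codons with ambiguous bases (for example N) become X.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return {frame + 1: translate(dna, frame) for frame in range(3)}


# Hypothetical input, not a sequence from this lab.
for frame, protein in three_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Only one of the three frames will usually give a long stretch without stop codons, and that is the one to compare against the table.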

pCT-CON	YALQA SGGGG SGGGG SGGGG SASCG GGGTS KISHF LKMES LNFIR AHTPY INIYN CEPAN PSEKN SPSTQ YCYSI QSSQV DCGGG SEQKL ISEED LLEI QQ
pAu1	YALQA SGGGG SGGGG SGGGG SASQV QLQQS GPGLV KPSQT LSLTC AISGD SVSGN TAAWN WIRQS PSRGL EWLGR TYYRS KWHYD MRHL* KVE*
Library seq1	YALQA SGGGG SGGGG SGGGG SASQG GGGSG PPRRR SNVWA PVLA RPVAW GRIRT KAYF
Library seq2	YXXXA SGGGG SGGGG SGGGG SASQG GGGSG VYGLS GTARS RGLA RPVAW GRIRT KAYF
Library seq3	YXXXA SXXGG SGGGG SGGGG SASQG GGGSG KRGCS RALWW IALA RPVAW GRIRT KAYF
Library seq4	YXLQA SGGGG SGGGG SGGGG SASQG GGGSG WKMFI GGTWL GCLA RPVAW GRIRT KAYF
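One way to find the salient part of the library sequences is to strip away whatever they all share at both ends and look at what is left. The Python sketch below does this for the four library rows in the table, treating X as “unknown, could match anything.” Treating the leftover stretch as the randomized part of the library is an assumption made for illustration, and the function names are hypothetical.

```python
# The four library sequences from the table, as given (spaces are ignored).
LIBRARY = {
    "seq1": "YALQA SGGGG SGGGG SGGGG SASQG GGGSG PPRRR SNVWA PVLA RPVAW GRIRT KAYF",
    "seq2": "YXXXA SGGGG SGGGG SGGGG SASQG GGGSG VYGLS GTARS RGLA RPVAW GRIRT KAYF",
    "seq3": "YXXXA SXXGG SGGGG SGGGG SASQG GGGSG KRGCS RALWW IALA RPVAW GRIRT KAYF",
    "seq4": "YXLQA SGGGG SGGGG SGGGG SASQG GGGSG WKMFI GGTWL GCLA RPVAW GRIRT KAYF",
}


def column_agrees(column):
    """True if all unambiguous residues in this alignment column are identical."""
    known = {residue for residue in column if residue != "X"}
    return len(known) <= 1


def shared_length(sequences):
    """Number of leading positions at which all sequences agree."""
    length = 0
    for column in zip(*sequences):
        if not column_agrees(column):
            break
        length += 1
    return length


def differing_regions(named_sequences):
    """Return, per sequence, the stretch between the shared start and shared end."""
    cleaned = {name: seq.replace(" ", "") for name, seq in named_sequences.items()}
    sequences = list(cleaned.values())
    prefix = shared_length(sequences)
    # Measure the shared end on what remains after the prefix, read backwards.
    suffix = shared_length([seq[prefix:][::-1] for seq in sequences])
    return {name: seq[prefix:len(seq) - suffix] for name, seq in cleaned.items()}


for name, region in differing_regions(LIBRARY).items():
    print(name, region, len(region))
```

For the table above, each sequence is left with a 12-residue stretch that differs between clones. These are the residues to compare when thinking about which amino acids might matter for binding.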

As you consider your data, you should also explore what is known about amino acid interaction with metals, using search engines such as PubMed, MIT’s homepage or even Google, and also consider the data from your classmates. Collaborating in this way may support any developing theory you have. Before you leave, please post your data (sequence, relative strength of gold binding, and so on) to the discussion page associated with this lab and write a few comments about the results.

REALLY DONE!

@@ Line 17: / Line 17: @@
 ==Protocol==
-Practice sequence analysis using the pCT-CON and pAu1 files on the website and the following link [[http://www2.ebi.ac.uk/translate/]]. The flow should be relatively intuitive but here are some brief instructions in case you are stuck. Open a sequence file (.seq) which is an Excel worksheet. Select all. Copy. Go to link. Paste. Translate sequence by clicking “Generate Protein” in each reading frame (no need to generate complements). You can color (“colour”) the protein if you want. Use the attached page to remind you of the structure and chemistry of the amino acids as well as the single- and three-letter amino acid abbreviations. <br> Translate in all three reading frames and paste each output into a word document.
+The data from the Biopolymers Facility has been loaded to your laptop. If you would like to retrieve it yourself go to http://web.mit.edu/biopolymers/www/  You can follow the link to DNA SEQUENCING SERVICES to read about the sequencing that was done or go directly to INSTRUCTIONS FOR DOWNLOADING DATA - ACCESSING THE FTP SERVER. To download your data, know:
-Once you are confident with these translation steps, begin similar analysis of your sequence information. Download that data from the Biopolymers Facility to your laptop. Go to http://web.mit.edu/biopolymers/www/  You can follow the link to DNA SEQUENCING SERVICES to read about the sequencing that was done or go directly to INSTRUCTIONS FOR DOWNLOADING DATA - ACCESSING THE FTP SERVER. To download your data, know:
 Host = biopolymers.mit.edu
 User ID = KULDELL
 Password = NATECOLE (all caps!)
 Directory = /pub
-Scroll to find the “Kuldell” folder, which is only data folder you’ll be able to open. There should be two outputs for each sequencing sample your group provided. One ends with “.abi” and is a trace of the fluorescent output from the sequencing machine. This can be viewed with “EditSeq” if you are using a Mac or with “Chromas” if you are on a PC. The other output file ends with “.seq” and lists the nucleotide sequence in Excel. The data from this file can be imported into any web-based sequence analysis program you’d like to use, pasting it wherever the program asks for “FASTA” format. A good place to start your sequence analysis might with the translation program you tried last time
+Scroll to find the “Kuldell” folder, which is only data folder you’ll be able to open. There should be two outputs for each sequencing sample your group provided. One ends with “.abi” and is a trace of the fluorescent output from the sequencing machine. This can be viewed with “EditSeq” if you are using a Mac or with “Chromas” if you are on a PC. The other output file ends with “.seq” and lists the nucleotide sequence in Excel. The data from this file can be imported into any web-based sequence analysis program you’d like to use, pasting it wherever the program asks for “FASTA” format.
-http://www2.ebi.ac.uk/translate/ . The table below may help orient you to the salient parts of your data. The translated sequence is presented in single letter code, where X indicated ambiguity in the sequence data. The four library sequences do not necessarily bind gold.
+A good place to start your sequence analysis might with the translation program freely available at
+http://www.ebi.ac.uk/emboss/transeq/. The table below may help orient you to the salient parts of your data. The translated sequence is presented in single letter code, where X indicated ambiguity in the sequence data. The four library sequences do not necessarily bind gold.
 {| border="1"
@@ Line 47: / Line 47: @@
 |}
-As you consider your data, you should also explore what is known about amino acid interaction with metals, using search engines such as PubMed, MIT’s homepage or even Google, and also consider the data from your classmates. Collaborating in this way may support any developing theory you have. Before you leave, please post your data on the class website and write a few comments about the results.
+As you consider your data, you should also explore what is known about amino acid interaction with metals, using search engines such as PubMed, MIT’s homepage or even Google, and also consider the data from your classmates. Collaborating in this way may support any developing theory you have. Before you leave, please post your data (sequence, relative strength of gold binding, and so on) to the discussion page associated with this lab and write a few comments about the results.
 REALLY DONE!
-==For next time==
-#Finish writing your lab report on the experiments performed in Module 4. You will describe ''either'' the variable you tested to optimize the gold-binding protocol ''or'' the library screen for new gold binders. The report is due next time.
