Tour of Structural Biology
This tutorial is designed to introduce you to protein structure. We’ll go through the basics, which hopefully you’ve already seen, and then look at some diverse structures of interest and see how they work. Interspersed throughout are numbered questions, which make up your quiz. Collect all the images and text answers to the questions, number them, and embed them in a single Word file to send in.
This tutorial uses a freeware program called DeepView. If you skipped the "DeepView Basics" tutorial, you can download DeepView here: http://spdbv.vital-it.ch/disclaim.html
This tutorial refers to several PDB files, which can be downloaded here. In some places, you'll have to download the PDB files directly from the Protein Data Bank.
The Basics: alpha helices, beta sheets, and hydrophobic cores
There are 4 levels of protein structure: primary (the amino acid sequence of the polypeptide), secondary (alpha helices and beta strands), tertiary (the 3D fold of a single polypeptide chain), and quaternary (the assembly of multiple polypeptide chains). To see how these work, let’s open up “alpha_helix.pdb” in DeepView. It’s a single alpha helix taken from a crystal structure. DeepView will pop up some messages when you open files; just close them.
Recap of DeepView controls
You should see a structure on the screen. If you do not see the control panel, go to Wind > Control Panel to open it. The control panel lists all the residues in the structure. The first column (here all A’s) is the chain. If there were two polypeptides, or, say, a protein and a DNA, you’d see more letters. The next column shows whether secondary structure is present. You’ll see a bunch of h’s here; those mean helices (alpha helices). The next column lists the residues and their positions, numbered from the N-terminus. You’ll notice that the numbering starts at 3, which is fairly common in protein structures: the N- and C-termini of a protein are often unstructured and do not give clear electron density. In this instance, I chopped an alpha helix out of a larger protein, which is why you see only 15 amino acids. The next column (“show”) is a yes/no toggle for whether DeepView displays the atoms of that residue. Try clicking on some of them. You’ll notice that clicking one of the v’s makes various sticks disappear; click the same spot again to bring them back. The next column (“side”) determines whether the side chain is displayed. Try switching one off. The side chain disappears, but the backbone stays lit up. The next column holds labels. Try clicking in the white area next to GLN8; you’ll see “Gln8” show up on the screen near the residue. The next column (“::v”) is the space-filling view of the residue. The little triangle above it lets you toggle between different types of space-filling views. The next column (“ribn”) is the ribbon view, which gives the traditional Richardson ribbon diagram of the structure. Finally, you have the color column (“col”) and a triangle to the right of it. To color the side chain, the ribbon, the backbone, and so on, choose the target from the triangle menu, then click the square under “col” and pick a color.
If you just hit “cancel” from the color window that pops up, it will use CPK coloring which means red for oxygen, blue for nitrogen, yellow for sulfur, and white for carbon. You can play with the control panel to familiarize yourself with turning things on and off.
Now let’s play with the navigation controls. At the top of the screen you’ll see some icons:
The first one centers the currently selected residues in 3 dimensions and scales them to fit your window. The next one is the hand symbol: click it to highlight it, then click and drag on the structure to move the view around. The next one lets you zoom by clicking and moving the mouse up and down. You can achieve the same result by holding down both the left and right mouse buttons and dragging up and down. The next one, when highlighted, rotates the view as you left-click and drag; it rotates around the current center point. You can re-center the whole structure with the first (center and scale) button. So, practice moving the molecule around on your screen until you’re comfortable looking at it.
You can also put the view in stereo mode via Display > Stereo View. This gives me a headache, so I never use it, but if you want to ruin your eyes, turn it on and click the rotate button. Look at the screen cross-eyed, and a third copy of the structure will appear in the middle (an illusion, of course). Staring at the middle structure, move the mouse around, and it should pop out at you in 3D. More sophisticated structure-viewing tools use special goggles for 3D viewing, but if you just keep wiggling the rotation back and forth, you’ll get used to perceiving depth on a 2D screen. Alright, those are the basic viewing tools in DeepView. We’ll use more of them later, but now let’s learn about structure.
Examination of alpha helix secondary structure
Go to Wind > Ramachandran Plot to pull up the window. If you aren’t familiar with this plot, it shows the two backbone dihedral angles of every residue: phi (rotation about the N-Calpha bond) and psi (rotation about the Calpha-C bond). Because backbone bond lengths and bond angles are nearly fixed and the peptide bond itself is planar, the phi and psi angles of each residue essentially define the backbone conformation of the structure. You’ll notice several distinct colored zones in the plot. These are the “allowed” angle combinations for polypeptides; outside these zones, atoms would clash sterically. Go into the control panel. You’ll notice some residue names are black and some are red. Click on “GLN8”: it turns red and everything else turns black. Now press Ctrl-A to select everything; they all turn red again. Now do the same thing while looking at the Ramachandran plot. You’ll see little white dots appearing and disappearing as you click. Each dot represents one residue in the protein, plotted at its phi and psi angles. You’ll notice that all these residues have very similar angles, within a little yellow zone. This is because our current structure is all alpha helix. Secondary structure is defined by stretches of residues with similar phi and psi angles, so it’s no coincidence that all these residues have similar angles. You can close the Ramachandran plot now. Let’s look a little closer at the structure. Click around in the control panel to make it look like the image below. If you hold Shift while clicking, you’ll apply the change to the entire column rather than a single cell, which speeds things up. You can also click and drag to change multiple cells at once.
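DeepView does this calculation for you, but it can help to see what each white dot actually is. Here's a minimal Python sketch, assuming you already have the backbone atom coordinates (in angstroms) for a residue and its neighbors: phi is the torsion angle C(i-1)-N-CA-C and psi is N-CA-C-N(i+1). The `rough_region` boxes are hypothetical, hand-picked boundaries for illustration only; they are not the zones DeepView draws.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)

def rough_region(phi, psi):
    """Hypothetical, very coarse boxes for illustration; NOT DeepView's zones."""
    if -100 <= phi <= -30 and -80 <= psi <= -10:
        return "alpha"
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta"
    return "other"
```

For example, the textbook values for a right-handed alpha helix, (-57, -47), land in the "alpha" box, and those for a parallel beta strand, (-119, 113), land in the "beta" box.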
Click on Tools > Compute H-bonds. Little green lines should pop up. These represent calculated hydrogen bonds: the program draws one wherever a suitable hydrogen bond donor and acceptor are within a certain distance of each other and in an acceptable orientation for H-bonding. Rotate and center the view to orient yourself, and then shift-click on the ribn column to turn off the ribbon. Since the side chains are off, you are looking at just the backbone. Notice how the backbone carbonyl oxygen of each residue (i) hydrogen bonds to the backbone amide N-H of the residue four positions further along the chain (i+4). Now orient your view so that you are looking down the barrel of the alpha helix. Click on the word “side” in the control panel. This will light up the side chains for the selected residues (8 through 15). You’ll notice that they all point outward from the helix at various angles. So, that’s the alpha helix: a spiral formed by the backbone, with side chains projecting outward from it. Now, let’s look at the whole protein.
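To make the i to i+4 pattern concrete, here's a minimal Python sketch of a distance-only hydrogen bond search between backbone carbonyl oxygens and amide nitrogens. The 3.5 angstrom cutoff and the rule of skipping residues fewer than 3 apart in sequence are assumptions for illustration; as noted above, DeepView also considers orientation, and its thresholds may differ.

```python
import math

def backbone_hbonds(carbonyl_o, amide_n, cutoff=3.5):
    """Find candidate backbone H-bonds by distance alone (a simplification).

    carbonyl_o: dict of residue number -> (x, y, z) of the backbone O
    amide_n:    dict of residue number -> (x, y, z) of the backbone N
    Returns sorted (i, j) pairs: C=O of residue i near the N-H of residue j.
    """
    pairs = []
    for i, o_xyz in carbonyl_o.items():
        for j, n_xyz in amide_n.items():
            # Assumed rule: skip residues too close in sequence to form a helical H-bond
            if abs(j - i) < 3:
                continue
            if math.dist(o_xyz, n_xyz) <= cutoff:
                pairs.append((i, j))
    return sorted(pairs)

def sequence_offsets(pairs):
    """Sequence separation j - i for each pair."""
    return [j - i for i, j in pairs]
```

Fed backbone coordinates from alpha_helix.pdb, you'd expect `sequence_offsets` to come back mostly as 4s. For two neighboring beta strands, the offsets would instead be large and variable, because the partners sit on different strands.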
Examining the tertiary structure of an all-alpha helix protein
Open up 2JHO.pdb, turn off “show” and “side” for everything, and turn on “ribn” for everything. This is a myoglobin from sperm whale, an all-alpha-helix protein. Open up the Ramachandran Plot. Select all the residues in the control panel and notice where they show up: a lot of alpha helix, huh? Now select residues 10-34 and turn on show, side, and ribn. Center your view. Look at your Ramachandran Plot; you’ll notice that only one residue isn’t in the alpha-helix angle zone. It’s Asp20. Now, turn off all side chains except Asp20. Turn on its label if you can’t find it; you’ll see it’s right at the junction between the two alpha helices. So, even in protein domains that are “all alpha helix”, you’ll get loops with phi and psi angles outside the alpha helix region.
Let’s look at the whole thing again. Select everything and turn on ribn only. What you see is a set of alpha helices (eight in myoglobin, conventionally labeled A through H) packed against each other. Let’s look at folding now. Select all and turn on “show”, “side”, and “::v”. Click on the color triangle and select “backbone and side”. Click on the word “col”, select one of the orange colors, and click OK. You should now see a big mess of orange dots (and no other colors). Alright, now select all the hydrophobic residues with Select > Group Property > Non Polar. With those selected, click on “col” and choose one of the greens. Now the screen looks really complicated.
Examining the hydrophobic core
Let’s put it in “slab mode” to simplify it: Display > Slab. You’re now looking at a cross-section of the structure. You can use your rotate tool to spin things around in 3D. Take a look at it from multiple views. Notice a pattern? In general, you’ll see almost exclusively green dots in the middle and orange dots on the outside. That’s essentially what drives proteins to fold into tertiary structures: hydrophobic residues are pushed into a core in the middle, and mostly polar residues end up on the surface. You’ll notice, though, that there are still a variety of hydrophobic patches on the surface. That’s pretty typical. The important thing for folding is a tightly packed, all-hydrophobic core. I recommend you repeat this procedure of lighting up the hydrophobic core for all the proteins in this tutorial to see how well this holds. Keeping the slab view, also turn on ribn and rotate the structure around. You’ll see how the alpha helices interact with each other. Each alpha helix extends side chains outward, and the hydrophobic ones orient toward the core while the polar ones orient toward the solvent.
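The slab-view pattern can also be put into numbers. Here's a minimal Python sketch, assuming you've extracted one representative point per residue (say, a side-chain centroid) together with its three-letter name; it compares how far nonpolar and polar residues sit, on average, from the protein's center. The NONPOLAR set is an assumption for illustration and may not match DeepView's "Non Polar" group exactly.

```python
import math

# Assumed nonpolar set for illustration only; DeepView's "Non Polar" group
# may classify some residues (e.g. Gly, Cys, Tyr) differently.
NONPOLAR = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}

def centroid(points):
    """Average (x, y, z) of a list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def core_vs_surface(residues):
    """Mean distance from the protein centroid for nonpolar vs. polar residues.

    residues: list of (three_letter_name, (x, y, z)), one point per residue.
    Distance from the center is only a rough proxy for burial and works best
    for compact, roughly globular proteins.
    """
    center = centroid([xyz for _, xyz in residues])
    distances = {"nonpolar": [], "polar": []}
    for name, xyz in residues:
        key = "nonpolar" if name.upper() in NONPOLAR else "polar"
        distances[key].append(math.dist(xyz, center))
    return {k: sum(v) / len(v) for k, v in distances.items() if v}
```

If the slab-view picture holds for a protein, the "nonpolar" average should come out smaller than the "polar" one.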
Examining the tertiary structure of an all-beta sheet protein
Let’s now turn to beta strands. Open up “2IZJ.pdb”. Turn on “show” and “ribn” for residues 26-35 and center your view. Open up the Ramachandran Plot. You’ll notice that all the phi and psi angles again fall in a similar region, but a different region from the alpha helices. The backbone of a beta strand is nearly fully extended, making a long, stretched-out peptide. Go to Tools > Compute H-Bonds. You’ll notice that nothing pops up: a single beta strand makes no backbone hydrogen bonds with itself! Let’s expand the view: select 17-45. Turn on “show” and “ribn”. Now we have 3 beta strands in view, and you should see some green hydrogen bonds. So, in beta sheets, the hydrogen bonds form between the backbone carbonyls and amide groups of adjacent strands. Now, let’s also turn on the side chains (“side”). Select all. Using the color triangle, make “backbone + side” orange. Wiggle the rotation around, and you’ll see that consecutive side chains alternate, pointing above and below the beta sheet.
Now, put everything into view and turn on show, side, ::v, and ribn. Color all the backbones and side chains orange, select the hydrophobic residues, and color them green. Now wiggle the structure around in slab view. The hydrophobic core of this protein (streptavidin) is a little less apparent than in the myoglobin structure, but check out this view looking down the barrel:
So, one face of each beta sheet presents hydrophobic residues toward the core, and the other face presents polar residues toward the solvent. You’ll notice there are some polar residues inside the core of this protein. That seems a little strange. The reason is that this protein binds a small molecule (biotin) by stuffing it inside the barrel. The polar amino acids in the barrel interact with the polar groups of the ligand.
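Because consecutive side chains on a strand alternate faces, residues 1, 3, 5, ... of a strand share one face and residues 2, 4, 6, ... share the other. Here's a minimal Python sketch of that bookkeeping, assuming you have a strand's sequence as one-letter codes; the nonpolar set and the example sequence are made up for illustration.

```python
# Assumed one-letter nonpolar set, for illustration only
NONPOLAR = set("AVLIMFWP")

def face_hydrophobicity(strand_seq):
    """Fraction of nonpolar residues on each face of a beta strand.

    Python indices are 0-based, so "even" covers strand positions 1, 3, 5, ...
    and "odd" covers positions 2, 4, 6, ... (side chains alternate faces).
    """
    faces = {"even": strand_seq[0::2], "odd": strand_seq[1::2]}
    return {face: sum(aa in NONPOLAR for aa in seq) / len(seq)
            for face, seq in faces.items() if seq}
```

For a made-up strand like "VKVEVKV", every residue on one face is nonpolar and every residue on the other is charged, which is the kind of pattern you'd expect for a strand with one face in the core and the other facing solvent.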
Inverting the hydrophobic core
Let's take a look at the integral membrane protein bacteriorhodopsin and see how it folds. In the membrane, the protein has an aqueous environment "above" and "below" it, but the hydrophobic lipid tails of the bilayer all around it. So, how does this play out in terms of folding? Open up 2I20.pdb. I've already colored all the hydrophobic residues green, the lipid molecules from the membrane dark green, and everything else yellow:
Notice how the outer rim of the alpha helices is hydrophobic and the internal core is mostly hydrophilic. Spin the structure around on the screen and look at the top and bottom of the barrel. Notice that the residues capping the barrel are more hydrophilic. Make sense? You are still getting folding by driving hydrophobic residues to a hydrophobic environment and hydrophilic residues to an aqueous environment.
Motifs, Folds, and Domains
Before we move beyond folding to look at active sites, let's take a look at some of the basics of protein topology. I'm going to give you just the highlights, but the Branden and Tooze text goes through this in great detail.
First of all, the minimal folding units of proteins are called "domains". Assuming it's a soluble protein (i.e., not a transmembrane protein), a domain is composed of alpha helices, beta sheets, and loops, all oriented so that they produce a nicely packed hydrophobic core. All the structures we've looked at so far were single-domain proteins, so the distinction between a protein and a domain wasn't necessary. In discussing folds, however, it is important to distinguish the two. The way the alpha helices and beta strands of a domain are connected is referred to as the topology of the domain, its supersecondary structure, or its motif. These matter because 1) they provide a descriptive means of classifying different domain folds, 2) they can often be treated as modular functional/folding units in protein engineering, and 3) they tend to be evolutionarily and functionally conserved. By this last point, I mean that similar folds tend to be duplicated during evolution and used for similar types of functions. So, let's look at the 3 categories: all beta, all alpha, and alpha/beta.
All Beta Domains
The all-beta family of folds has some very important members, including antibodies, many lectins, and many adhesion domains. Three common folds are shown above. They all look pretty similar: they are all made up of beta sheets and differ mainly in which strands hydrogen bond to which other strands.
The up-and-down beta barrels are a single beta sheet wrapped into a circle. These show up in many multi-pass transmembrane proteins (such as OmpA and OmpG). The space in the middle of the barrel can serve a purpose, such as forming the pore of a channel. Often they show up as soluble proteins with something plugging the middle of the barrel. That can be a ligand that the barrel binds, such as biotin in streptavidin. In GFP, the fluorophore is formed from three residues (Ser65, Tyr66, and Gly67) on a helix that runs through the middle of the barrel.
The jellyroll motif has its loops wrapped over the ends of the barrel. Like the beta barrels, a single circular beta sheet wraps around the protein. However, the "inside" of the barrel tends to be blocked by the loops and is usually a solid hydrophobic core without voids. So, you don't tend to see things stuffed into domains like these. Instead, they can carry long loops, and the binding sites are in between the loops. The large loops enabled by this structure allow these domains to bind fairly large substrates, such as polysaccharides. As such, many lectins (carbohydrate-binding proteins) adopt this fold.
Though it also shows up in various prokaryotic proteins, the immunoglobulin domain is a recurring fold in mammalian systems, used for cell-cell recognition and in the immune system. Unlike the previous two, it isn't a barrel structure. Instead, it adopts a sandwich arrangement in which the two pieces of bread are beta sheets, with a hydrophobic core in between. As in the jellyroll, there aren't voids within the hydrophobic core, so these domains interact with other molecules via their loops.
Alpha/Beta Domains
The alpha/beta folds can be distinguished by whether the beta sheet is splayed out or forms a barrel. In the splayed-out versions, alpha helices pack against the sheet above and/or below it. The beta sheet itself is always twisted, and sometimes forms a very pronounced spiral. In the barrel versions, you often get an all-parallel beta sheet wrapped into a circle, with alpha helices on the outside of the barrel connecting the end of one strand to the start of the next. These are referred to as "a/b barrels".
Alpha/Beta domains are very common in enzymes. I've highlighted some specific folds in each class in the image above. The Rossmann fold is very often seen when an enzyme works with nucleotides like ATP or NADH. There is a specific spot, in a cleft between two alpha helices at the ends of two beta strands, where the nucleotide binds. Because of its ubiquitous role in binding nucleotides, it is also called the "nucleotide binding motif."
The TIM barrel is a specific type of a/b barrel, typified by TIM (triose phosphate isomerase). Although it is a barrel like the all-beta barrels we saw before, its interior isn't hollow; it is usually filled by a packed core, so substrates don't stuff themselves into the barrel as you might expect. They do end up in a pocket at one end of the barrel, inset just a little. The interactions are therefore mostly with the loops at that end of the barrel, rather than with the alpha helices or beta strands.
All Alpha Domains
The all-alpha domains are typified by globular helix bundles (such as the globin fold of myoglobin) and coiled coils. Unlike beta sheets, where the backbone hydrogen bonds are between strands, all the backbone hydrogen bonds in all-alpha domains are within the alpha helices themselves. This allows the tertiary structure of all-alpha domains to be a little more free-form. The glue holding it all together is the hydrophobic residues packing between the helices. So, you can't classify the folding of these domains as neatly.
In globular proteins such as myoglobin, you get many alpha helices packed together into a ball, with pockets in the hydrophobic core that bind various ligands (such as the heme in myoglobin). In coiled coils, you get long alpha helices wrapping around each other, with a periodic pattern of hydrophobic residues holding them together. They can be parallel or antiparallel, and can contain 2, 3, 4, or more alpha helices bundled together. Because of their long, stretched-out configuration, they often form rod-like structural proteins, such as keratin, or act as spacer elements. They also show up as dimerization domains that hold two polypeptides together.
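The "periodic pattern" in coiled coils is usually described as a heptad repeat: seven positions labeled a through g, with hydrophobic residues mostly at a and d, so that they line up along one side of the helix and pack against the partner helix. Here's a minimal Python sketch that finds the register placing the most hydrophobic residues at a and d. The hydrophobic set and the example sequence are assumptions for illustration; real coiled-coil predictors use much richer scoring.

```python
# Assumed one-letter hydrophobic set, for illustration only
HYDROPHOBIC = set("AVLIMF")

def heptad_score(seq, register):
    """Fraction of 'a' and 'd' positions that are hydrophobic when the first
    residue sits at heptad position `register` (0 = a, 1 = b, ..., 6 = g)."""
    core = [aa for i, aa in enumerate(seq) if (i + register) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)

def best_register(seq):
    """Register (0-6) that puts the most hydrophobic residues at a and d."""
    return max(range(7), key=lambda r: heptad_score(seq, r))
```

For a made-up sequence such as "LEELKKE" repeated three times, register 0 puts every leucine at an a or d position, so `best_register` returns 0 and `heptad_score` for that register is 1.0.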
There are several really common, neat and tidy folds that show up in structural biology, and I've highlighted the most famous ones here. However, if you just pick a random protein from the Protein Data Bank and examine its fold, you'll find that it most likely doesn't fit into one of these categories. As long as a domain follows the general principles of having a packed hydrophobic core and a lot of hydrogen bonds, it is fair game. Some proteins don't even obey this rule! Some are tied together by a bunch of disulfide bonds and are therefore covalently locked into their tertiary structure. Knottins are an example of this that have been used for engineering applications. These proteins tend to be really stable structures. So, there are all sorts of topologies possible.
Multiple domains and quaternary structure
Most importantly, many proteins have multiple domains. For example, you rarely see an immunoglobulin fold all by itself. Usually, you get 2 or more immunoglobulin domains linked one after another in a chain. An IgG molecule has 4 polypeptides linked together in a quaternary structure: 2 light chains with 2 immunoglobulin domains each and 2 heavy chains with 4 immunoglobulin domains each. With the immunoglobulin fold, the individual domains truly are modular. How do we know? Well, you can make a synthetic gene encoding just one of the immunoglobulin domains, and it can often be expressed as a properly folded protein. Another way you might anticipate that domains are modular is by looking for evidence of flexibility in the overall protein, or for a lack of contacts between adjacent domains in the crystal structure. The domains are readily identifiable in an antibody: they are linked together like little balls separated by fairly long linkers. You get lots of interactions between the different polypeptides in an IgG, but relatively few between the domains of a single chain. There are very few structures of complete IgG molecules, because the linkers between the domains are somewhat flexible, making the whole thing hard to crystallize. So, you typically get structures of just the Fab fragment or of scFvs.
Let's take a look at one that is full-sized. Download 1HZH.pdb. Open it up and color by chain to light up the different polypeptides present. See which are the heavy chains and which are the light chains? Try to identify which domains are VH and VL, and where the antigen binding site is.
1) Pick one of the domains in 1HZH. Show only this domain on the screen. Show its ribbon in yellow. Also show the side chains in the domain. Color them all yellow, then select just the "non-polar" residues and color those green. Put the view in "Slab" view and orient the structure so that your view cuts across the beta sheets. Notice the nice hydrophobic core? Take a picture of it for your answer.
In many other cases, there are distinct domains visible in the crystal structure, but they aren't really modular folding units from an engineering perspective. Let's look at one: download 2DEI.pdb, which is galactokinase from Pyrococcus horikoshii. Open it up and show just the ribbon. Color the ribbon from amino acid 1 through 166 white, and the rest of the polypeptide green. Notice the two domains? There are 2 domains here, but they aren't well isolated by any means; there are lots of interactions between them. Let's also show the small molecules present in the structure. Page down in your control panel to GLA401 and MAP402. Turn on "show" and "side" for those groups. These two groups are galactose (the sugar this enzyme phosphorylates) and AMP-PNP, a non-hydrolyzable analog of ATP. Notice how the ATP analog is bound to one domain (this is a Rossmann fold!) and the galactose to the other. The two domains bring the substrates together, holding the ATP's gamma phosphate right up next to the hydroxyl group on galactose that will receive it.
Unlike with the immunoglobulin domains, it is not correct to think of these two domains as modular domains. If you tried to make synthetic genes out of the two domains and express them, they probably wouldn't make folded proteins. You might be thinking you could take this ATP domain, glue it to some other domain that bound another substrate and turn these things into a new kinase. Well, that's like what evolution does, but if you just glued two domains together you wouldn't get the simple behavior you're hoping for. These domains clearly have evolved to interact with each other in such a way that they exactly position the substrates in the right spot. You'd have to engineer the interface between your domains to get a similar communication between the domains, and that is by no means easy.
2) Take a picture of this for your answer. Don't close the file just yet--you'll use it again in the next section.
The Active Site
The thing about proteins that leads us to study them isn't primarily folding; it's what they do after they fold. Folding is necessary to reach the structure that enables them to do whatever they are going to do, but folding alone doesn't accomplish anything within the cell. What proteins and RNAs do is pretty diverse. In the figure below, I've pointed to various interactions that occur within biology. The arrows correspond to physical interactions between biomolecules, and they can be either binding (just sticking together, with no change in either molecule's chemical composition) or catalytic (the interaction changes the covalent bonds of one or both molecules). The one with a dotted line isn't really a means of achieving function in a cell, but all the others are quite common: proteins can interact with other proteins, RNAs, DNAs, or small molecules. RNA has a similar diversity of interactions, though not quite as extensive as protein. Missing from the figure are carbohydrate polymers and lipid membranes. I'd lump them into the same category as metabolites: from a functional perspective they are (as far as we know now!) passive endpoints in biology. Things certainly interact with and manipulate them, but you don't get carbohydrate-based catalysts.
So, let's take a tour of some examples of these interactions:
Let's take a closer look at how galactokinase interacts with its substrates. If you closed it, reopen 2DEI.pdb. Turn off the ribbon so that nothing shows up on the screen. Now select MAP402 so that its name is red (and only that single group is red). Now do Select > Neighbors of selected residues... When the dialog pops up, select "Display only groups that are within" and change the number to 5 angstroms. Do Select > Visible groups. That will select all the groups you currently see on the screen. Now, color the backbones blue and the side chains CPK. If you forgot how to do CPK coloring, click "col" and then just say OK instead of picking a specific color. Go under Tools and select "Compute H-Bonds". Alright, it's a little messy, but let's take a look at some of the interactions present here.
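Under the hood, a "neighbors within 5 angstroms" selection is a distance search. Here's a minimal Python sketch, assuming an any-atom-to-any-atom criterion (DeepView's exact rule isn't spelled out here) and that you have the atom coordinates tagged with residue IDs; the function name and input layout are hypothetical.

```python
import math

def neighbors_within(ligand_atoms, protein_atoms, cutoff=5.0):
    """Residue IDs with any atom within `cutoff` angstroms of any ligand atom.

    ligand_atoms:  list of (x, y, z)
    protein_atoms: list of (residue_id, (x, y, z)), one entry per atom
    Brute force over all atom pairs, which is fine for a few thousand atoms.
    """
    hits = set()
    for res_id, xyz in protein_atoms:
        if res_id in hits:
            continue  # this residue already qualifies; skip its other atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add(res_id)
    return sorted(hits)
```

With real data, `ligand_atoms` would hold the MAP402 atom coordinates from 2DEI.pdb and `protein_atoms` every protein atom, tagged with labels like "PHE108".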
3) Take a picture of this for your answer.
First of all, turn on the label of Phe108. Notice how the aromatic side chain is pancaked against the adenine moiety? That is called pi-stacking. It is a pretty common type of interaction whenever substrates contain an aromatic ring. It's also part of what gives DNA its structural stability.
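Pi-stacking is usually judged geometrically: the two ring centroids are close together and the ring planes are roughly parallel. Here's a minimal Python sketch; the 4.5 angstrom and 30 degree cutoffs are assumptions for illustration (published criteria vary), and it doesn't distinguish face-to-face from offset stacking.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def ring_plane(ring):
    """Centroid and unit normal of a planar ring (atoms listed in ring order)."""
    n = len(ring)
    c = tuple(sum(p[k] for p in ring) / n for k in range(3))
    # Vectors from the centroid to two adjacent ring atoms span the ring plane
    normal = _unit(_cross(_sub(ring[0], c), _sub(ring[1], c)))
    return c, normal

def stacking_geometry(ring_a, ring_b):
    """Centroid distance (angstroms) and angle between ring planes (degrees)."""
    ca, na = ring_plane(ring_a)
    cb, nb = ring_plane(ring_b)
    cos_angle = abs(sum(x * y for x, y in zip(na, nb)))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))  # clamp rounding error
    return math.dist(ca, cb), angle

def looks_stacked(ring_a, ring_b, max_dist=4.5, max_angle=30.0):
    """Assumed cutoffs: close centroids and nearly parallel planes."""
    dist, angle = stacking_geometry(ring_a, ring_b)
    return dist <= max_dist and angle <= max_angle
```

With real data, `ring_a` would be the six ring atoms of the Phe108 side chain and `ring_b` the adenine ring atoms of MAP402.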
Turn on the labels for Ser47 and Phe50. Notice how the serine side chain makes a hydrogen bond to the adenine? Notice also that Ser47 is hydrogen bonded to the backbone of Phe50. That second interaction is part of the "hydrogen bonding network" of the active site. When proteins interact with substrates through hydrogen bonds, you often get a highly structured set of hydrogen bonds within the active site that orient the side chains in a geometrically precise way and help satisfy all the hydrogen bonds that can be made in the structure. Sometimes you'll also get "structural waters" in an active site. Normally, you don't see most of the water in a crystal structure, because those molecules are disordered: they sit in different positions in different copies of the protein throughout the crystal. Sometimes, however, waters do show up in the electron density, and these play a specific role in molecular recognition at the active site. You'll notice that nothing is hydrogen bonding to the 2'OH and 3'OH of the ribose moiety. That is because they are solvent exposed and interacting with waters, but not every protein molecule in the crystal does this the same way, so those waters don't show up in the electron density. An important principle is that hydrogen bond donors and acceptors all need to be satisfied. There is a huge energy cost to burying an unsatisfied donor or acceptor, because it gave up its hydrogen bonds to water to get there. So, if a face of the substrate is "inside" the protein, you'll explicitly see interactions with the peptide backbone, polar side chains, or structured waters.
Turn on the label for Arg191. Arginine has a positive charge, and notice how it interacts with the negatively charged gamma phosphate. You'd call that a salt bridge or an ionic interaction. You'll also notice that there is a whole ring of backbone residues surrounding the gamma phosphate. Together, these interactions will stabilize the transition state when the hydroxyl group on galactose attacks the gamma phosphate. That lowers the energy barrier thereby catalyzing the reaction. So, this little pocket of groups around the gamma phosphate is the real catalyst that makes the chemistry happen. The rest of the protein is there just to selectively bind to the substrates and hold everything in the right orientation for the two substrates to interact.
If you'd like to spend a little more time here, do a similar analysis of the galactose binding site. Carbohydrates are chock full of hydroxyl groups that can hydrogen bond. So, you'll see lots of hydrogen bonds and also a complicated hydrogen bonding network with backbone residues. An important thing to note is that active sites aren't one-size-fits-all. It is pretty common to see these hydrogen bonding networks when a substrate needs lots of hydrogen bonds. In other types of active sites, such as those that bind entirely hydrophobic substrates, you may not see backbone involvement at all. There are plenty of active sites that involve only amino acid side chains.
The interaction between biotin (a vitamin) and streptavidin (a bacterially derived protein) is one of the strongest non-covalent protein-small molecule interactions known. There is also a protein called avidin, from egg whites, that binds biotin and is very similar structurally and functionally to streptavidin, but I digress. The structural basis for this interaction has been investigated, and you can read about it at PMID: 2911722. There are 2 PDB files you can download: 2IZA and 1STP. The first is the "apo" structure, which means that biotin is not bound in the binding site. To see what's going on, do the same sort of active-site analysis you did with galactokinase. Start with the 1STP file, light up all the residues within 5 angstroms of biotin, and turn them on. Notice what interacts directly with the ligand. Now, light up the same residues in the apo file. What is similar between them, and what is different?
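One way to organize that comparison is to list the binding-site residues from the biotin-bound (1STP) model, then check how each one sits in the apo (2IZA) model. Here's a minimal Python sketch, assuming you've picked one representative atom per residue (say, CA) and that the two structures are already superimposed in the same coordinate frame; the superposition step itself isn't shown, and the function name is hypothetical.

```python
import math

def compare_sites(holo, apo):
    """Compare binding-site residues between a ligand-bound and an apo model.

    holo, apo: dicts mapping residue ID -> (x, y, z) of one representative atom.
    ASSUMES both models are already superimposed in the same frame.
    Returns (residues absent from the apo model, {residue: shift in angstroms}).
    """
    missing = sorted(set(holo) - set(apo))
    shifts = {res: math.dist(holo[res], apo[res]) for res in holo if res in apo}
    return missing, shifts
```

A residue that's absent from the apo model may simply be disordered there (no clear electron density, as with the chain termini you saw earlier), which is itself a clue about how the site changes when biotin binds.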
Well, that gives a glimpse of how proteins work. You now have the basic tools and resources to start learning more on your own. Some other worthwhile things to look at are other enzymes, such as triose phosphate isomerase (the TIM barrel). Another interesting one is barnase/barstar, a classic example of a tight protein-protein interaction. For the interactions between DNAs and proteins, take a look at a zinc finger complex with its DNA target. You might also look at some RNA structures. Take a look at 1O15.pdb, the theophylline-binding aptamer. RNA folding is quite different from protein folding, but you can check out the role of pi-stacking here. One of my personal favorites is 1AKL, alkaline protease from Pseudomonas aeruginosa. It has an interesting motif called the beta roll; check out how it interacts with calcium ions. GFP is really interesting, as is 2VH3, ranasmurfin.
4) Find a protein that you find interesting. What's interesting about it? (describe in under 100 words). Take a picture of it that illustrates what you learned from looking at this structure.