This journal entry is due on Wednesday, October 22 at midnight PDT (Tuesday night/Wednesday morning). NOTE that the server records the time as Eastern Daylight Time (EDT). Therefore, midnight will register as 03:00.

## Individual Journal Assignment

### Defining Your HIV Structure Research Project

For this project, you can work with the same sequences you used for the HIV Evolution Project, or you may choose different sequences. You will reframe your question from the HIV Evolution Project to make it a structure→function question. Instead of looking at how the evolution of variation in the viral DNA sequence affects the different patient groups, you will look at how variations in the viral sequence affect the structure and, therefore, the function of the virus.

For this week's journal assignment, your electronic lab notebook entry should contain the answers to the following:

3. Which subjects, visits, and clones will you use to answer your question?
• You should choose a combination of subjects, visits, and clones that adds up to approximately 50 sequences. You will need about that many sequences to answer a reasonably complex question, but you cannot use more because the multiple sequence alignment tool cannot handle more than about 50 sequences.
• Justify why you chose the subjects, visits, and clones you did.
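One way to check that a selection stays near the 50-sequence limit is to tally the clones per subject and visit before downloading anything. The sketch below is only an illustration: the subject labels, visit numbers, and clone counts are hypothetical placeholders, not data from the study, and `count_sequences` is a helper name made up for this example.

```python
# Hypothetical selection: subject -> {visit number: number of clones}.
# These labels and counts are placeholders, not data from the study.
selection = {
    "Subject A": {1: 6, 4: 7},
    "Subject B": {1: 5, 3: 8},
    "Subject C": {2: 6, 5: 6},
    "Subject D": {1: 6, 4: 6},
}

MAX_SEQUENCES = 50  # approximate limit stated in the assignment


def count_sequences(selection):
    """Return the number of clones per subject and the overall total."""
    per_subject = {
        subject: sum(visits.values()) for subject, visits in selection.items()
    }
    return per_subject, sum(per_subject.values())


per_subject, total = count_sequences(selection)
for subject, n in per_subject.items():
    print(f"{subject}: {n} sequences")
print(f"Total: {total} (limit is about {MAX_SEQUENCES})")
if total > MAX_SEQUENCES:
    print("Too many sequences for the alignment tool; drop some clones.")
```

Keeping the selection in a small table like this also gives you a starting point for the written justification, since it shows at a glance how many clones each subject and visit contributes.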

### Working with Protein Sequences In-class Activity

• This week we will begin to learn how to analyze protein structures. For today, we will be using the Bioinformatics for Dummies book extensively, so be sure to bring it to class. We will be using some bioinformatics tools to analyze the structure of the gp120 envelope protein.
• Chapter 4: Reading a SWISS-PROT entry (pp. 110-123 in the second edition). The example worked through in the book is the epidermal growth factor receptor. Work through this example using the HIV gp120 envelope protein instead.
• Swiss-Prot is now part of what is known as the UniProt Knowledgebase (UniProt KB). UniProt KB has two parts: Swiss-Prot, which contains entries for proteins that have been manually reviewed, and TrEMBL (which stands for "Translated EMBL"), which contains automated translations of all DNA sequences in the EMBL/GenBank/DDBJ databases. The user interface to this protein database has undergone many revisions since this book was published, but all of the same information can still be found.
• If you search on the keywords "HIV" and "gp120", how many results do you get?
• Use the entry with accession number "Q75760" which corresponds to the HIV gp120 sequence that was used for the crystal structure for the Huang et al. (2005) paper.
• Chapter 5: ORFing your DNA sequence (pp. 146-147 in second edition). In the previous section of the course, we were working with DNA sequences from the HIV gp120 envelope protein. Take one of your DNA sequences and follow the instructions to find the open reading frames in the sequence. Since you were working with just a portion of the entire envelope protein, you may get some strange results. Compare your results with the UniProt entry for the protein above to decipher what the output means.
• Chapter 6: Working with a single protein sequence (pp. 159-195 in second edition). Apply the following tools to the entire HIV gp120 envelope protein sequence that you obtained from UniProt above. We will then compare the results of these analyses with the actual structure of the gp120 protein obtained by X-ray crystallography.
• The ExPASy tools page, which lists many available tools for protein analysis
• ProtParam
• ScanProsite (NOTE: uncheck the box for "Exclude motifs with a high probability of occurrence from the scan")
• Chapter 11: Predicting the secondary structure of a protein sequence and additional structural features (pp. 330-336 in second edition). Predict the secondary structures and other structural features that occur in your HIV gp120 sequence and compare them to the published crystal structure from Huang et al. (2005).
• To compare your analyses with the actual crystal structure of gp120, download the structure file for the paper we read in journal club from the NCBI Structure Database.
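To see what the ORF search in the Chapter 5 step is doing, it can help to sketch it in a few lines of Python. This is only an illustration, not the tool used in the book: the function names and the toy DNA sequence are made up for this example, and an ORF is taken here to be any stretch between stop codons, with no start codon required. That is an assumption chosen because a fragment of a gene usually does not include the original ATG.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_length=10):
    """List (strand, frame, protein) for stop-free stretches in all six frames.

    Assumption: no start codon is required, so partial genes are reported too.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Stop codons translate to '*', so splitting on '*' yields the ORFs.
            for segment in protein.split("*"):
                if len(segment) >= min_length:
                    orfs.append((strand, frame + 1, segment))
    # Longest candidates first.
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)


# Toy sequence invented for this example; it is not a real HIV sequence.
toy_dna = "ATGAAAGTGCTGTGCACCAACGGCAGCTGGGAATAA"
for strand, frame, protein in find_orfs(toy_dna):
    print(f"strand {strand}, frame {frame}: {protein} ({len(protein)} aa)")
```

Running this on one of your own partial sequences will typically report candidates in several frames, which is one reason the output can look strange. Comparing the candidates with the UniProt entry shows which frame matches the real protein.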

## Shared Journal Assignment

### Reflection

1. After working with the protein tools in today's assignment, compare your experience of working with the nucleic acid tools to your experience of working with the protein tools. Which do you like better, and why?