# Chris Rhodes Week 9

## Methods

• To work with the amino acid sequences of the subjects from the Markham paper, each subject's amino acid data was uploaded to Workbench.
• Our understanding of the functional significance of the gp120 amino acid residues comes primarily from the Kwong and Stanfield papers, so our results can only be interpreted where they coincide with the data gathered from those papers. We therefore used the protein sequences from the Kwong and Stanfield papers to find the regions within the Markham amino acid sequences whose results can be related back to the Kwong and Stanfield data.
• Individual ClustalW alignments were performed between all the amino acid sequences of a particular subject and the amino acid sequences of the proteins used in the Kwong and Stanfield papers, in order to identify regions of possible functional significance in the subjects' amino acid sequences. The Kwong and Stanfield sequences can be found here: Kwong, Stanfield 1999 Aib, Stanfield 1999 His/Ser Loop, Stanfield 2003
• Regions of functional significance were defined as stretches of consecutive, highly conserved residues in the alignments between the subject sequences and the Kwong and Stanfield sequences. Examples of how these regions were determined for Subject 13 are shown below, where the regions of functional significance are marked by the red and green boxes.

• For Subject 13, the regions of possible functional significance were found to be residues 6-51 and 71-95 of Subject 13's amino acid sequences. Any significant residue changes in the Subject 13 amino acid sequences that occur within these regions will be considered functionally relevant.
• For each subject, both the number of changes and the exact amino acid substitution for each change will be recorded. Based on the type of amino acid change at each residue, the change will be hypothesized to be either functionally significant or not functionally significant.
• Phylogenetic trees of the amino acid sequences, generated through ClustalW alignments, will also be used to help decide which amino acid sequences to compare for functional differences.
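
The region-finding and change-recording steps above can be sketched in Python. This is only an illustration, not the procedure actually run in Workbench: it treats "highly conserved" as strict identity between two already-aligned sequences, and the function names, the `min_length` cutoff, and the toy sequences in the usage note are all assumptions. Positions are alignment columns (1-based), which match residue numbers only when the alignment has no gaps before them.

```python
def conserved_runs(aligned_a, aligned_b, min_length=5):
    """Return 1-based, inclusive (start, end) alignment columns where the two
    aligned sequences are identical for at least min_length columns in a row.
    Gap characters ('-') never count as a match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    run_start = None  # 0-based column where the current run began
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        match = a == b and a != "-"
        if match and run_start is None:
            run_start = i
        elif not match and run_start is not None:
            if i - run_start >= min_length:
                runs.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned_a) - run_start >= min_length:
        runs.append((run_start + 1, len(aligned_a)))
    return runs


def substitutions_in_regions(reference, variant, regions):
    """List (column, 'X->Y') for every substitution between two aligned
    sequences that falls inside one of the given 1-based inclusive regions."""
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    for start, end in regions:
        for pos in range(start, end + 1):
            ref, var = reference[pos - 1], variant[pos - 1]
            if ref != var and "-" not in (ref, var):
                changes.append((pos, f"{ref}->{var}"))
    return changes
```

For example, `conserved_runs("ACDEFGHIK", "ACDEFXHIK", min_length=3)` returns `[(1, 5), (7, 9)]`, and `substitutions_in_regions("ACDEFGHIK", "ACDKFGHIR", [(1, 5)])` returns `[(4, "E->K")]`; the change at column 9 is ignored because it lies outside the region.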

## Subject 13

Rooted Tree

ClustalW alignment of Subject 13 with conserved changes highlighted in yellow

Table of Amino Acid Residue Changes for Visits 4 and 5

Sequence      1st Residue Change    2nd    3rd    4th
V4-1           N->H                 R->K   E->G   R->G
V4-2           R->K                 Q->K   R->G    -
V4-3           N->H                 R->K   R->G    -
V4-4           R->K                 Q->K   R->G    -
V4-5           R->G                  -      -      -
V4-6           N->H                 E->K   R->G    -
V4-7           E->K                 R->G    -      -
V5-1           R->G                  -      -      -
V5-2           R->G                  -      -      -
V5-3           S->F                 R->G    -      -
V5-4           G->R                  -      -      -
V5-5           R->G                 R->K    -      -
V5-6           I->T                 G->R    -      -
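
One way to make the "type of amino acid change" judgment from the Methods concrete is to compare the side-chain classes of the old and new residues. The sketch below assumes a common four-class textbook grouping (nonpolar, polar, basic, acidic). Other groupings exist, and neither the grouping nor the `classify_change` helper is claimed to be the rule actually used for the table above.

```python
# Assumed side-chain grouping; a different grouping would change the results.
SIDE_CHAIN_CLASS = {}
for residues, label in [
    ("AVLIMFWPG", "nonpolar"),
    ("STNQCY", "polar"),
    ("KRH", "basic"),
    ("DE", "acidic"),
]:
    for residue in residues:
        SIDE_CHAIN_CLASS[residue] = label


def classify_change(change):
    """Classify a substitution written as 'X->Y' (one-letter codes).

    Returns 'conservative' when both residues fall in the same assumed
    side-chain class, otherwise 'non-conservative'."""
    old, new = change.split("->")
    same_class = SIDE_CHAIN_CLASS[old] == SIDE_CHAIN_CLASS[new]
    return "conservative" if same_class else "non-conservative"


if __name__ == "__main__":
    # Substitution types taken from the Subject 13 table above.
    for change in ["N->H", "R->K", "E->G", "R->G", "Q->K", "E->K", "S->F", "G->R", "I->T"]:
        print(change, classify_change(change))
```

Under this grouping, R->K stays within the basic class and is labeled conservative, while R->G, G->R, and E->K cross classes and are labeled non-conservative. A non-conservative label is only a hypothesis of functional significance, not evidence of it.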


Interpretations

The sequences chosen for this analysis were all of the Visit 4 and all of the Visit 5 sequences. The phylogenetic tree shows that Visit 4 represents a large divergence away from the less diverse root sequences, which suggests a possible functional difference. The Visit 5 sequences are well dispersed through the root of the tree, so they were chosen as a kind of control to represent the root sequences. Because there is little diversity or divergence within Subject 13's proteins, the observed conserved changes, though there are only two, most likely represent a shift in protein function between Visits 4 and 5. If so, this would support the previous hypothesis that Subject 13 follows a Broad immune response pattern, in which the immune system is highly selective against functional change away from the root sequences. Since the Visit 4 sequences appear to differ in function from the root sequences, they would likely have been wiped out by the highly selective immune system, which would explain why we see no descendants of the Visit 4 branch in the phylogenetic tree.

Another point of interest for Subject 13 is the first conserved change that occurs in Visit 4. In the alignment, the highlighted region shows the conserved G->R change in all the Visit 4 sequences, but these are not the only sequences in which it occurs: V5-4 and V5-6 share the same residue change. Since the G->R change present in the Visit 4 sequences still exists in the following Visit 5 generation, whatever structural shift it causes seems to have no negative effect on the survival of the strain. This could indicate that the G->R change does not affect the final function of the protein. If so, the functional difference between Visits 4 and 5 can be attributed solely to the second conserved change, R->G, which is present only in the Visit 4 sequences.

This suggests that this particular region of the amino acid sequences is closely tied to protein function and could be a useful area for future study. By swapping different amino acid residues into this region and observing any functional changes, we could better understand whether and how this region affects function.

## Subject 7

Rooted Tree

ClustalW alignment of Subject 7 sequences with conserved changes highlighted in yellow

Table of Conserved Residue Changes Between Subject 7 Visits

Visit #    1st Conserved Change       2nd     3rd     4th     5th      6th     7th     8th     9th
4             L->P                    A->T    K->N    S->P    N->D     N->T    I->V    K->Q    E->G
3             T->S                    A->T     -       -       -        -       -       -       -
5             V->I                    S->P     -       -       -        -       -       -       -

• Based on the phylogenetic tree, the clones we decided to study for Subject 7 are: all of visit 3, V4-1, V4-2, V4-3, V4-4, V4-6, V5-1, V5-3, V5-6, V5-7, V5-8, and V5-9. Because of the large number of amino acid sequences studied for Subject 7, it would have been unreasonable to list every individual change in every sequence. Instead, this table lists only the residue changes that were conserved across every amino acid sequence of each visit. These conserved residue changes are also more likely to indicate functional changes between the visits.
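
The idea of a change that is "conserved throughout every amino acid sequence of a visit" can be sketched as follows. This is a minimal illustration, assuming all sequences come from one gap-aligned ClustalW alignment and that changes are measured against a single reference sequence. The choice of reference, the function name, and the toy sequences in the usage note are assumptions, not details taken from the analysis above.

```python
def conserved_changes(reference, visit_sequences):
    """Return (column, 'X->Y') for every alignment column where all sequences
    from one visit share the same residue Y and Y differs from the reference
    residue X. Columns are 1-based; gap columns are skipped."""
    for seq in visit_sequences:
        if len(seq) != len(reference):
            raise ValueError("all sequences must come from the same alignment")
    changes = []
    for pos, ref in enumerate(reference, start=1):
        column = {seq[pos - 1] for seq in visit_sequences}
        if len(column) != 1:
            continue  # the visit's sequences disagree at this column
        (residue,) = column
        if residue != ref and "-" not in (ref, residue):
            changes.append((pos, f"{ref}->{residue}"))
    return changes
```

For example, `conserved_changes("ACDEF", ["ACNEF", "ACNEY"])` returns `[(3, "D->N")]`: column 3 changed in both sequences, while column 5 changed in only one of them and is therefore not reported.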

Interpretations

Based on these results, the visit 5 and visit 3 sequences appear to be quite similar to each other. Between the two visits there are only 4 conserved changes, and only 2 of these could be considered functionally relevant. In contrast, comparing the visit 4 sequences with either the visit 5 or the visit 3 sequences gives 11 conserved changes, many of them functionally relevant. These results indicate that the visit 4 sequences will show major functional differences when compared to the chosen sequences from visits 5 or 3. They also suggest that the visit 5 and visit 3 sequences are closely related, though they may differ slightly in function.

From previous research it was hypothesized that the visit 3 strains were wiped out by the immune system, never to be seen again. It is possible, however, that instead of disappearing entirely, a small number of the visit 3 strains survived, mutated, and re-emerged as the visit 5 sequences, which would explain the high residue conservation between the two. If so, the previous hypothesis that Subject 7 follows a best fit model of viral progression can be considered accurate. Given the severe differences between visit 4 and visits 3 and 5, the similarity between visits 3 and 5, and the pattern of the phylogenetic tree, visit 4 appears to constitute the best fit viral sequence. Since the functions of visits 3 and 5 are so divergent from visit 4, we would most likely see the visit 5 sequences disappear if a hypothetical visit 6 sample were taken from Subject 7, just as we saw the visit 3 sequences disappear.

## Structure Data

The structure data for the Kwong and Stanfield proteins can be found by following these links:

Secondary Structure Predictions of gp120 and V3 Sequences

The secondary structure predictions were made by entering the amino acid sequences of the proteins into PSIPRED. The prediction outputs are shown below:

Kwong gp120

Stanfield 1999 V3 Aib

Stanfield 2003 V3

3-D Structures of gp120 and V3 Sequences

The 3-D structures of the proteins were visualized by loading the structure files into Cn3D. Using the tools in Cn3D, the secondary structures of the proteins were compared to the PSIPRED predictions shown above as a measure of the accuracy of those predictions. The C and N termini are highlighted in yellow, with the C terminus on the right and the N terminus on the left. The side chains are hidden to make the termini easier to see, since cysteine side chains are also colored yellow.

Kwong gp120

After comparing the secondary structures of the 3D renderings to the PSIPRED secondary structure predictions, we found that PSIPRED was fairly accurate for Kwong. In some areas, however, the predicted sheet was shifted 1 or 2 amino acids from the 3D data, and some predicted sheets were one to four amino acids too short, though none were too long.

Stanfield 1999 V3 Aib

After comparing the secondary structures of the 3D renderings to the PSIPRED secondary structure predictions, we found that PSIPRED was somewhat accurate for the V3 loop of Stanfield's 1999 study. The predicted coils match up perfectly, but there are no sheets present in the 3D data.

Stanfield 2003 V3

After comparing the secondary structures of the 3D renderings to the PSIPRED secondary structure predictions, we found that PSIPRED was also somewhat accurate for the V3 loop of Stanfield's 2003 study. Again, the predicted coils match up perfectly, but there are no sheets present in the 3D data.
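
The comparisons above were made by eye in Cn3D. A per-residue version of the same comparison could be sketched as below, assuming both the prediction and the structure-derived assignment are written as equal-length strings of three-state codes (H for helix, E for strand/sheet, C for coil). The function names and the toy strings in the usage note are illustrative assumptions, not data from the Kwong or Stanfield structures.

```python
from itertools import groupby


def per_residue_agreement(predicted, observed):
    """Fraction of positions where the predicted and observed secondary
    structure states are identical."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def segments(states, state="E"):
    """Return 1-based, inclusive (start, end) for every run of the given
    state, e.g. every predicted sheet segment."""
    result = []
    pos = 1
    for key, group in groupby(states):
        length = len(list(group))
        if key == state:
            result.append((pos, pos + length - 1))
        pos += length
    return result
```

With the toy strings `predicted = "CCEEEECCC"` and `observed = "CCCEEEECC"`, `per_residue_agreement` gives 7/9, and `segments` reports the sheet at `(3, 6)` in the prediction versus `(4, 7)` in the observed string. Comparing the segment boundaries makes a one-residue shift like this explicit, and the same comparison would show a predicted sheet that is shorter than the observed one.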

The paper we researched is titled Clinical resistance to vicriviroc through adaptive V3 loop mutations in HIV-1 subtype D gp120 that alter interactions with the N-terminus and ECL2 of CCR5, by Ogert et al. (2009). Ogert's group studied mutations of the V3 loop of HIV-1 in order to better understand its function. Their findings indicated that six specific changes in the amino acid sequence of the V3 loop could cause a functional change in the viral coat, making it unable to properly infect CD4 cells. The function of the V3 loop is altered only when all six site-specific mutations occur.

For our experiment this raises an interesting point. Throughout the class we have been making assumptions about the immune response of each subject based on changes in the DNA and amino acid sequences of the viral strains. All of our previous conclusions rest on the assumption that every amino acid sequence observed represents a functional protein capable of infecting cells and passing on its sequence. However, it is possible that some of the sequences we see represent non-functional proteins that cannot carry out certain functions essential for infection. If so, the disappearance of certain sequences from the phylogenetic tree may actually be independent of the immune system, which could drastically affect how the trees are interpreted. Unfortunately, at our level of bioinformatics we have no way to distinguish functional from non-functional amino acid sequences. It is therefore best to interpret our results on the assumption that all the proteins observed are functional, though their specific functions, mechanisms, or interactions may vary.

Ogert, Robert, Yan Hou, Lei Ba, Lisa Wojcik, Ping Qiu, Nicholas Murgolo, Jose Duca, Lisa Dunkle, Robert Ralston, and John Howe. "Clinical resistance to vicriviroc through adaptive V3 loop mutations in HIV-1 subtype D gp120 that alter interactions with the N-terminus and ECL2 of CCR5." Virology. 400.1 (2009): 145-55. Web. 31 Oct. 2011. <http://0-www.sciencedirect.com.linus.lmu.edu/science/article/pii/S0042682210000863>.

• Unfortunately there is no direct link to the paper's source journal. Only LMU students and faculty will be able to follow the link provided in the citation.