KP Ramirez Week 8

Group
KP Ramirez & Janelle Ruiz

Former Question: Are there differences in HIV-1 diversity or divergence between participants with high CD4 T cell variability within the study (between visits) and participants with linear ‘progression’ (defined as CD4 T cell counts that fall rapidly, or linearly, over time; slope ~ -1)?

Former Prediction: We predict that participants with high variability in T cell count between visits will show lower HIV-1 diversity and divergence than participants with linear progression. This is predicted under two assumptions:
 * (1) High diversity and divergence of HIV-1 variants indicate a more rapidly progressing virus (and thus a steadily falling CD4 T cell count).
 * (2) High variability in T cell count indicates that a participant’s immune system was able to manage the virus better than that of a participant with steadily falling CD4 counts. If we do see high diversity in participants with high variability in T cell count between visits, we predict that it will consist predominantly of synonymous mutations rather than non-synonymous mutations (which we would expect to see in linear progressors).

Subjects Chosen:
 * Linear Progressors (slope: -1): Subjects 4, 10
 * High Variability between visits: Subjects 12, 8
 * Low Variability between visits: Subject 5
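
The grouping above rests on two quantities per subject: the slope of CD4 count over time and the scatter of the counts around that trend. The following is a minimal sketch of how both could be computed with an ordinary least-squares fit. The function name and the example numbers are hypothetical, not the study data, and the time units are an assumption since the notes only state "slope ~ -1".

```python
from statistics import mean, pstdev


def slope_and_residual_sd(times, counts):
    """Least-squares slope of counts vs. times, plus the standard deviation
    of the residuals (a simple measure of between-visit variability)."""
    t_bar, c_bar = mean(times), mean(counts)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (c - c_bar) for t, c in zip(times, counts))
    slope = sxy / sxx
    intercept = c_bar - slope * t_bar
    residuals = [c - (intercept + slope * t) for t, c in zip(times, counts)]
    return slope, pstdev(residuals)


# Hypothetical visit times and CD4 counts, for illustration only.
steady = slope_and_residual_sd([0, 1, 2, 3], [900, 899, 898, 897])
noisy = slope_and_residual_sd([0, 1, 2, 3], [900, 700, 950, 650])
print(steady, noisy)
```

Under this sketch, a "linear progressor" would show a negative slope with a small residual spread, while a "high variability" subject would show a large residual spread whatever the slope.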

Working with Protein Sequences In-class Activity

 * This week we will begin to learn how to analyze protein structures. For today, we will be using the Bioinformatics for Dummies book extensively, so be sure to bring it to class. We will be using some bioinformatics tools to analyze the structure of the gp120 envelope protein.
 * Completed

Chapter 2
Retrieving Protein Sequences/Retrieving a list of Related protein sequences (pp. 42-51 in second edition). The example worked through in the book uses the sequence of an enzyme called dUTPase. Follow the book example yourself and then work through the example again, this time using the HIV gp120 envelope protein instead. MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA YDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISLWDQSLKPCV KLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFNISTSIRGKVQKEYAFFYKLD IIPIDNDTTSYKLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCT NVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSVNFTDNAKTIIVQLNTSVEINCTRPN NNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNISRAKWNNTLKQIASKLREQFGNNKTII FKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFNSTWFNSTWSTEGSNNTEGSDTITLPCRI KQIINMWQKVGKAMYAPPISGQIRCSSNITGLLLTRDGGNSNNESEIFRPGGGDMRDNWR SELYKYKVVKIEPLGVAPTKAKRRVVQREKRAVGIGALFLGFLGAAGSTMGAASMTLTVQ ARQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKDQQLLGIWGCSG KLICTTAVPWNASWSNKSLEQIWNHTTWMEWDREINNYTSLIHSLIEESQNQQEKNEQEL LELDKWASLWNWFNITNWLWYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTH LPTPRGPDRPEGIEEEGGERDRDRSIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLIVTR IVELLGRRGWEALKYWWNLLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEVVQGACRAI RHIPRRIRQGLERILL'''
 * Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) (HIV-1)
 * P04578 Accession
 * Presented 4 pages of results
 * '''>sp|P04578|ENV_HV1H2 Envelope glycoprotein gp160 OS=Human immunodeficiency virus type 1 (isolate HXB2 group M subtype B) GN=env PE=1 SV=2
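
The FASTA header above packs the database, accession number, and entry name into one pipe-delimited token. As a small sketch (the function name is hypothetical), these fields can be pulled apart as follows, assuming the header follows the `>db|accession|entry_name description` layout seen here.

```python
def parse_uniprot_header(header):
    """Split a UniProt-style FASTA header into its identifier fields."""
    line = header.lstrip(">").strip()
    ids, _, description = line.partition(" ")
    db, accession, entry_name = ids.split("|")
    return {
        "db": db,                  # "sp" marks a Swiss-Prot entry
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


header = (">sp|P04578|ENV_HV1H2 Envelope glycoprotein gp160 "
          "OS=Human immunodeficiency virus type 1 "
          "(isolate HXB2 group M subtype B) GN=env PE=1 SV=2")
print(parse_uniprot_header(header)["accession"])
```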

Chapter 4
Reading a SWISS-PROT entry (pp. 110-123 in the second edition). The example worked through in the book is the epidermal growth factor receptor. Work through this example and then do it again with the HIV gp120 envelope protein instead. Sections to examine:
 * Comments
 * Cross References: Sequence databases, 3D structure databases, Genome annotation databases, Enzyme and pathway databases, Family and domain databases
 * The Features: Molecule processing, Regions, Sites, Amino acid modifications, Experimental info, Secondary structure
 * The current entry page looks quite different from the one shown in the Bioinformatics for Dummies book. The entry name has been retained; however, the organism is now spelled out in full, e.g. Homo sapiens (Human), rather than appearing only as part of an entry name such as EGFR_HUMAN, as in the book.
 * Primary (citable) accession number: P04578
 * Secondary accession number(s): O09779
 * These are now located at the bottom of the page
 * Integrated into UniProtKB/Swiss-Prot: August 13, 1987
 * Last sequence update: July 15, 1999
 * Last modified: March 2, 2010
 * Protein name: Recommended name: Envelope glycoprotein gp160; Alternative name(s): Env polyprotein
 * Gene names: Name: env
 * Virus host: Homo sapiens (Human); organism TaxID: 11706 [NCBI]
 * Taxonomic lineage: Viruses › Retro-transcribing viruses › Retroviridae › Orthoretrovirinae › Lentivirus › Primate lentivirus group
 * The comments section has been completely reworked compared with the Dummies book. The book presented a simple table-like format; the newer version uses a paragraph format and now has only Function, Subunit structure, Subcellular location, Domain, Post-translational modifications, and Miscellaneous.
 * The cross-references section is similar to the Dummies book; however, it has been further separated into
 * The features section has been renamed "Sequence annotation".



Chapter 5
ORFing your DNA sequence (pp. 146-147 in second edition). In the previous section of the course, we were working with DNA sequences from the HIV gp120 envelope protein. Take one of your DNA sequences and follow the instructions to find the open reading frames in the sequence. Since you were working with just a portion of the entire envelope protein, you may get some strange results. Compare your results with the SWISS-PROT entry you found for the protein above to decipher what the output means. Besides the NCBI Open Reading Frame Finder described in the book, ExPASy also has a translation tool you can use, found here.


 * Chose Subject 4, a Linear progressor that we examined during our former project.
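
Because our sequences cover only part of the env gene, an ORF finder may not see a start codon at all. The sketch below illustrates the underlying idea rather than reproducing the NCBI ORF Finder or the ExPASy tool: it translates the three forward reading frames with the standard genetic code and reports the longest stretch without a stop codon. The function names and the toy DNA string are hypothetical, and the reverse-complement frames are left out for brevity.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward frame; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(dna):
    """Return (frame, peptide) for the longest stop-free stretch
    across the three forward frames."""
    best = (0, "")
    for frame in range(3):
        for piece in translate(dna, frame).split("*"):
            if len(piece) > len(best[1]):
                best = (frame, piece)
    return best


toy = "ATGAAATAGGGG"  # toy sequence, not one of the subject sequences
for frame in range(3):
    print(frame, translate(toy, frame))
print(longest_open_stretch(toy))
```

For a partial gene fragment, the frame with the longest stop-free stretch is usually the one worth comparing against the SWISS-PROT protein sequence.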




 * This was then compared against the SWISS-PROT entry. The ORF sequence at first appeared to be identical, but there were a couple of differences between the two sequences.
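
Differences like these are easy to miss by eye. As a rough sketch, Python's standard `difflib` module can list where two peptide strings disagree. This is not a biological alignment (no substitution matrix or gap penalties), and the function name and toy peptides are hypothetical.

```python
from difflib import SequenceMatcher


def list_differences(translated, reference):
    """Report every non-matching block between two peptide strings as
    (operation, 0-based start in translated, translated part, reference part)."""
    matcher = SequenceMatcher(None, translated, reference, autojunk=False)
    return [
        (tag, i1, translated[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Toy peptides for illustration, not the Subject 4 data.
print(list_differences("MKVLA", "MKILA"))
```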

Chapter 6
Working with a single protein sequence (pp. 159-195 in second edition). Work through the following examples in this chapter using the entire HIV gp120 envelope protein sequence that you obtained from SWISS-PROT. We will then compare the results of these analyses with the actual structure of the gp120 protein obtained by X-ray crystallography.
 * ProtParam
 * Began by using the ProtParam tool for P04578 (ENV_HV1H2).
 * The sequence ENV_HV1H2 consists of 856 amino acids.




 * Used the ExPASy tools page to carry out a primary structure analysis.
 * Entered the Swiss-Prot accession number P04578 into ProtParam.
 * ProtParam generated the parameters for the entire gp120 sequence that we selected.
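
Two of the simpler parameters ProtParam reports are the amino acid composition and the numbers of negatively (Asp + Glu) and positively (Arg + Lys) charged residues. The sketch below shows how those counts arise. The function names are hypothetical, and the input is only the first 60 residues of the sequence listed under Chapter 2, so the output will not match the full-length ProtParam report.

```python
from collections import Counter


def composition(seq):
    """Map each residue to (count, percent of sequence length)."""
    seq = "".join(seq.split()).upper()
    counts = Counter(seq)
    n = len(seq)
    return {aa: (counts[aa], 100.0 * counts[aa] / n) for aa in sorted(counts)}


def charged_residue_counts(seq):
    """Return (Asp + Glu, Arg + Lys) counts."""
    seq = seq.upper()
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


# First 60 residues of the sequence listed under Chapter 2.
fragment = "MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPVWKEATTTLFCASDAKA"
print(len(fragment))
print(composition(fragment))
print(charged_residue_counts(fragment))
```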

Looking for transmembrane segments

We used the accession number P04578 again, entered it into the ExPASy ProtScale site, and conducted a full-range analysis. The image was retrieved in GIF format.
 * Then pasted the gp120 sequence into the ExPASy website again in order to cut the protein.



Interpreting ProtScale results

A piece of paper was used to help us locate the strongest peaks on the graph. We determined that there were four important transmembrane regions.
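
The peaks on a ProtScale plot are sliding-window averages of a per-residue scale. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6 for flagging candidate transmembrane windows. These settings are common for this kind of analysis but are not recorded in our notes, so treat them as assumptions. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean meets the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


# One 60-residue block of the sequence listed under Chapter 2.
block = "LELDKWASLWNWFNITNWLWYIKLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTH"
print(candidate_windows(block))
```

Runs of consecutive start positions correspond to a single broad peak on the plot, which is what we located by hand with the piece of paper.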

TMHMM
I generated the TMHMM results by using a FASTA sequence of the gp120 protein.

Looking for PROSITE patterns

 * Entered the accession number P04578 to specify the protein to be scanned, and then started the scan.
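
A PROSITE pattern is essentially a regular expression over amino acids. As an illustration, the N-glycosylation pattern N-{P}-[ST]-{P} (PROSITE PS00001) can be written as a Python regex. This pattern was picked as an example and is not taken from our scan results. The function name is hypothetical, and the lookahead is there so that overlapping sites are all reported.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(seq):
    """Return (1-based position, matched residues) for each pattern hit."""
    return [(m.start() + 1, m.group(1))
            for m in N_GLYCOSYLATION.finditer(seq.upper())]


# Short stretch taken from the sequence listed under Chapter 2.
print(find_n_glycosylation_sites("VLVNVTENFNMW"))
```

A match only marks a potential site. Whether it is actually glycosylated has to come from experimental annotation such as the Swiss-Prot entry.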

InterProScan

 * For this section I used the FASTA sequence outlined in the Dummies book.


 * This involved pasting in the gp120 sequence given in the book and deselecting some of the larger databases to speed up the search.

Finding Domains With the CD Server
 * This conserved domain (CD) search used the NCBI Conserved Domains search site. I pasted in the FASTA sequence that I had downloaded; the results were as follows.


 * This was interpreted using the Bioinformatics for Dummies book.