Yaniv Maddahi Journal Week 6

From OpenWetWare
Jump to navigationJump to search

Purpose

  • The purpose of this lab is to develop our skills as researchers and to create a presentation regarding our scientific question. We will be creating a presentation and conducting our own further research to see if we can answer our research question and even consider areas for future development. We will be creating a sequence alignment, phylogenetic tree, and model of ACE2 and spike protein. We will also be identifying key binding sites for the interactions.

Research Question

  • How do differences in ACE2 receptors for host organisms affect binding strength with SARS-CoV?
    • We will use the UniProt database and possibly GenBank for obtaining sequences. We will remain within the mammals category and possibly look at the sequences for ACE2 of Humans, Bats, Civet Cats, Squirrels, Rats, Birds, Dogs, Cats, Lions, Tigers, Ferrets, and Minks.


Methods/Results

  1. Based on our scientific question, we began by searching GenBank for all of our desired sequences.
  2. I chose the following GenBank records
  3. I then downloaded the nucleotide sequence in FASTA format to my local hard drive.
  4. I clicked the "Send to link" in the upper right of the page, selected Complete Record, file as the Destination, and FASTA as the format, and created the file.
  5. I opened the file that I saved with a word processor to confirm that I had the sequence and that it is in the FASTA format. In the FASTA format each sequence is preceded by a label which begins with the greater than sign (>). I searched for the GenBank record associated with that sequence and located the spike protein accession number in the GenBank record. (Note that the spike protein is sometimes called the "S" protein.)
  6. I then downloaded my assigned protein sequence in FASTA format, just like I did for the whole genome sequence.
  7. I began every line with a space character as it will be interpreted as a fixed width font and the sequences will line up nicely on the page.
  8. In order to analyze sequence data I used www.phylogeny.fr, a free, simple to use web service dedicated to reconstructing and analysing phylogenetic relationships between molecular sequences.
  9. In my browser, I went to the website www.phylogeny.fr. I scrolled down on the page to the section labeled ‘Phylogeny analysis’, and clicked on the text ‘One Click’.
  10. I then clicked in the large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format' and copied the list of sequences from the talk page and used Ctrl-V (or command-V) to paste my sequences here. I then clicked the “Submit” button.
  11. After my alignment was complete,I found the numbered tabs located just beneath the text One Click Mode, and clicked on the tab labeled 3. Alignment. Within the alignment, individual positions are color-coded to indicate their conservation, or how similar the sequences are to each other at that position. Blue highlighting indicates high conservation (i.e., the sequences are identical or at least very similar), while gray highlighting indicates lower conservation and white highlighting indicates little if any conservation.

Spike Protein Sequences

>NP_001358344.1 [Homo sapiens]
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS
AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQE
CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN
GVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWD
LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRL
GKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGD
KAYEWNDNEMYLFRSSVAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV
EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKK
KNKARSGENPYASIDISKGENNPGFQNTDDVQTSF


>AAW78017.1 [Rattus norvegicus]
MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQKMNEAAAKWS
AFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTILNTMSTIYSTGKVCNSMNPQE
CFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYEDYGDYWRGDYEAE
GVEGYNYNRNQLIEDVENTFKEIKPLYEQLHAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYP
LTTPFLQKPNIDVTDAMVNQSWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWD
LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEMKREIVGVVEP
LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLHKCDISNSTEAGQKLLNMLSL
GNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNRNSTVGWSTDWSPYADQSIKVRISLKSALGK
NAYEWTDNEMYLFRSSVAYAMREYFSREKNQTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEV
EEAIRMSRGRINDIFGLNDNSLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKK
KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF


>AAX63775.1 [Paguma larvata]
MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAKNMNEAGAKWS
AYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WTGGYNYSRNQLIQDVEDTFEQIKPLYQHLHAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDAMVNQNWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEMKRNIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSTEAGKKLLEMLSL
GRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNRNSFVGWDTDWRPYSDQSIKVRISLKSALGE
KAYEWNDNEMYLFRSSIAYAMREYFSKVKNQTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEV
EDAIRMSRSRINDAFRLDDNSLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRK
NDQAGSEENPYASVDLNKGENNPGFQHADDVQTSF


>U6DXQ3-1 [Neovison vison]
GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKMCTKVTMDDFLTAHHEMGH
IQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNIGLLPPDFSEDSETDINF
LLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEPLPHDETYC
DPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSREAGQKLHEML
SLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSI
KVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKKQTIPFVDKDVRVSDLKPR
ISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPP
VTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENPYASVDLSKG


>BAE53380.1 [Mustela putorius furo]
MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQKMNIAGAKWS
AFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WADGYSYSRNQLIEDVEHTFTQIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LMVPFRQKPNIDVTDAMVNQSWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWD
LGKRDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEP
LPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSSEAGQKLHEMLSL
GRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGE
KAYEWNDNEMYFFQSSIAYAMREYFSKVKNQTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADV
EEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRK
NNQARSEENPYASVDLSKGENNPGFQNVDDVQTSF


>AGZ48803.1 [Rhinolophus sinicus]
MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQKMDEAGAKWS
AFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSILNAMSTIYSTGKVCKPNKPQE
CLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARGYHYEDYGDYWRRDYETE
ESPGPGYSRDQLMKDVERIFTEIKPLYEHLHAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDEMLKQGWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT
MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEMKRKIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLHKCDISNSTDAGKKLHQMLSV
GKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNRKSYVGWNTDWSPYSDQSIKVRISLKSALGE
NAYEWNDNEMYLFRSSVAYAMREYFLKEKHQTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEV
EGAIRMSRSRINDAFRLDDNSLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRK
TDQARSEENPYSSVDLSKGENNPGFQNGDDVQTSF


>NP_001123985.1 [Mus musculus]
MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQKMSEAAAKWS
AFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTILNTMSTIYSTGKVCNPKNPQE
CLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYNDYGDYWRGDYEAE
GADGYNYNRNQLIEDVERTFAEIKPLYEHLHAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFAQKPNIDVTDAMMNQGWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWD
LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEMKREIVGVVEP
LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLHKCDISNSTEAGQKLLKMLSL
GNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNRNSFVGWNTEWSPYADQSIKVRISLKSALGA
NAYEWTNNEMFLFRSSVAYAMRKYFSIIKNQTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEV
EDAIRMSRGRINDVFGLNDNSLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKK
KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF


>XP_007090142.1 [Panthera tigris altaica]
LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWSAFYEEQSK
LAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQECLLLEPGL
DDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYS
RSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQK
PNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRI
KMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKTIGLLPPGF
SEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEPVPHDETYC
DPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTLGKSKPWTL
ALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDN
EMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSR
SRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEE
NPYASVDLSKGENNPGFQHADDVQTSF


>QLH93383.1 [Manis pentadactyla]
MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQKMNVAGAKWS
TFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTILNTMSTIYSTGKVCNPGNPQE
CSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYETE
GANGYNYSRDHLIEDVEHIFTQIKPLYEHLHAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFRQKPNIDVTDAMVNQTWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWD
LGKHDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN
IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLHKCDISNSTEAGQKLLQMLSL
GKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNKNSFVGWNTDWSPYAAQSIKVRISLKSALGE
KAYEWNDSEMYLFRSSVAYAMREYFSKFKKQTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEV
EEAIRMSRSRINDVFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK
KNQARSEQNPYASVDLSKGENNPGFQNVDDVQTSF


>NP_001034545.1 [Felis catus]
MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWS
AFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WTDGYNYSRSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTL
GKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGD
EAYEWNDNEMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEV
EEAIRMSRSRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRK
NNQARSEENPYASVDLSKGENNPGFQHADDVQTSF


>NP_001158732.1 [Canis lupus familiaris]
MSGSSWLLLSLAALTAAQSTEDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQKMNNAGAKWSA
FYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTILNSMSTVYSTGKACNPSNPQEC
LLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEW
ENGYNYSRNQLIDDVELTFTQIMPLYQHLHAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPL
TVPFGQKPNIDVTNAMVNQSWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDL
GKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNI
GLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEMKRNIVGVVEPV
PHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSSEAGQKLLEMLKLG
KSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGEK
AYEWNNNEMYLFRSSIAYAMRQYFSEVKNQTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVE
EAIRMYRSRINDVFRLDDNSLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKN
DQARGEENPYASVDLSKGENNPGFQSGDDVQTSF 


>AAY57872.1 [Chlorocebus aethiops]
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGEKWS
AFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIHSTGKVCNPNNPQE
CLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYKDYGDYWRGDYEVN
GVDGYDYNRDQLIEDVERTFEEIKPLYEHLHAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVNQAWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWD
LGKGDFRIIMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLKL
GKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGA
NAYKWNDNEMYLFRSSVAYAMRQYFLENKHQTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEV
EEAIRFSRSRINDAFQLNDNSLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKK
KNQARSEENPYASIDISKGENNPGFQNTDDVQTSF


>NP_001116542.1 [Sus scrofa]
MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQKMNDARAKWS
AFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTILNTMSTIYSSGKVLDPNNPQE
CLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLYEEYVVLENEMARANNYEDYGDYWRGDYEVT
GTGDYDYSRNQLMEDVERTFAEIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGEKPSIDVTEAMVNQSWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA
LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP
LPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLYKCDISNSTEAGQKLLQMLSL
GKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNGNSSVGWNTDWTPYADQSIKVRISLKSALGE
DAYEWNDNEMYLFRSSIAYAMRNYFSSAKNETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDV
EKAISMSRSRINDAFRLDDNTLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRK
KKQASSEENPYGSMDLSKGESNSGFQNGDDIQTSF


>Q5RFN1.1 [Pongo abelii] 
MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS
AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPNNPQE
CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN
GVDSYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWD
LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLRL
GKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGN
KAYEWNDNEIYLFRSSVAYAMRKYFLEVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV
EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK
KNKARNEENPYASIDISKGENNPGFQNTDDVQTSF

Multiple Sequence Allignment

AAW78017.1      MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQ
NP_0011239      MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ
AGZ48803.1      MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQ
NP_0011165      MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQ
AAY57872.1      MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
NP_0013583      MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
Q5RFN1.1_P      MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
QLH93383.1      MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQ
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQ
NP_0011587      MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ
AAX63775.1      MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAK
XP_0070901      --------LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
NP_0010345      MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
                                                                           
AAW78017.1      KMNEAAAKWSAFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTIL
NP_0011239      KMSEAAAKWSAFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTIL
AGZ48803.1      KMDEAGAKWSAFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSIL
NP_0011165      KMNDARAKWSAFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTIL
AAY57872.1      NMNNAGEKWSAFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
NP_0013583      NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
Q5RFN1.1_P      NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
QLH93383.1      KMNVAGAKWSTFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTIL
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      KMNIAGAKWSAFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTIL
NP_0011587      KMNNAGAKWSAFYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTIL
AAX63775.1      NMNEAGAKWSAYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTIL
XP_0070901      KMNEAGAKWSAFYEEQSKLAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
NP_0010345      KMNEAGAKWSAFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
                                                                           
AAW78017.1      NTMSTIYSTGKVCNSMNPQECFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLY
NP_0011239      NTMSTIYSTGKVCNPKNPQECLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLY
AGZ48803.1      NAMSTIYSTGKVCKPNKPQECLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLY
NP_0011165      NTMSTIYSSGKVLDPNNPQECLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLY
AAY57872.1      NTMSTIHSTGKVCNPNNPQECLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLY
NP_0013583      NTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY
Q5RFN1.1_P      NTMSTIYSTGKVCNPNNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY
QLH93383.1      NTMSTIYSTGKVCNPGNPQECSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLY
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY
NP_0011587      NSMSTVYSTGKACNPSNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY
AAX63775.1      NAMSTIYSTGKACNPNNPQECLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLY
XP_0070901      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
NP_0010345      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
                                                                           
AAW78017.1      EEYVVLKNEMARANNYEDYGDYWRGDYEAEGVEGYNYNRNQLIEDVENTFKEIKPLYEQL
NP_0011239      EEYVVLKNEMARANNYNDYGDYWRGDYEAEGADGYNYNRNQLIEDVERTFAEIKPLYEHL
AGZ48803.1      EEYVVLKNEMARGYHYEDYGDYWRRDYETEESPGPGYSRDQLMKDVERIFTEIKPLYEHL
NP_0011165      EEYVVLENEMARANNYEDYGDYWRGDYEVTGTGDYDYSRNQLMEDVERTFAEIKPLYEHL
AAY57872.1      EEYVVLKNEMARANHYKDYGDYWRGDYEVNGVDGYDYNRDQLIEDVERTFEEIKPLYEHL
NP_0013583      EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHL
Q5RFN1.1_P      EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDSYDYSRGQLIEDVEHTFEEIKPLYEHL
QLH93383.1      EEYVVLKNEMARANHYEDYGDYWRGDYETEGANGYNYSRDHLIEDVEHIFTQIKPLYEHL
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      EEYVALKNEMARANNYEDYGDYWRGDYEEEWADGYSYSRNQLIEDVEHTFTQIKPLYEHL
NP_0011587      EEYVALKNEMARANNYEDYGDYWRGDYEEEWENGYNYSRNQLIDDVELTFTQIMPLYQHL
AAX63775.1      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTGGYNYSRNQLIQDVEDTFEQIKPLYQHL
XP_0070901      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
NP_0010345      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
                                                                           
AAW78017.1      HAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTTPFLQKPNIDVTDAMVNQ
NP_0011239      HAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFAQKPNIDVTDAMMNQ
AGZ48803.1      HAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDEMLKQ
NP_0011165      HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGEKPSIDVTEAMVNQ
AAY57872.1      HAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVNQ
NP_0013583      HAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ
Q5RFN1.1_P      HAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ
QLH93383.1      HAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFRQKPNIDVTDAMVNQ
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLMVPFRQKPNIDVTDAMVNQ
NP_0011587      HAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTNAMVNQ
AAX63775.1      HAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
XP_0070901      HAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
NP_0010345      HAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
                                                                           
AAW78017.1      SWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWDLGHGDFRIKM
NP_0011239      GWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWDLGHGDFRIKM
AGZ48803.1      GWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
NP_0011165      SWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
AAY57872.1      AWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRIIM
NP_0013583      AWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILM
Q5RFN1.1_P      AWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRILM
QLH93383.1      TWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWDLGKHDFRIKM
U6DXQ3-1_N      ------------------GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKM
BAE53380.1      SWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKRDFRIKM
NP_0011587      SWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDLGKGDFRIKM
AAX63775.1      NWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
XP_0070901      SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRIKM
NP_0010345      SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWDLGKGDFRIKM
                                  ***:**  ** *****:*.: .*.**********: **** *
AAW78017.1      CTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
NP_0011239      CTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
AGZ48803.1      CTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT
NP_0011165      CTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA
AAY57872.1      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
NP_0013583      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
Q5RFN1.1_P      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
QLH93383.1      CTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN
U6DXQ3-1_N      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
BAE53380.1      CTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
NP_0011587      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
AAX63775.1      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
XP_0070901      CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
NP_0010345      CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
                ******::****************** **:****************:****.***::** 
AAW78017.1      IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEM
NP_0011239      IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEM
AGZ48803.1      MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEM
NP_0011165      LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
AAY57872.1      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
NP_0013583      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
Q5RFN1.1_P      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
QLH93383.1      IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEM
U6DXQ3-1_N      IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
BAE53380.1      IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
NP_0011587      IGLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEM
AAX63775.1      IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEM
XP_0070901      IGLLPPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
NP_0010345      IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
                :***.. * **.**:*********.******************* . **.::* :.****
AAW78017.1      KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLH
NP_0011239      KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLH
AGZ48803.1      KRKIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLH
NP_0011165      KREIVGVVEPLPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLY
AAY57872.1      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
NP_0013583      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
Q5RFN1.1_P      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
QLH93383.1      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLH
U6DXQ3-1_N      KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY
BAE53380.1      KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY
NP_0011587      KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH
AAX63775.1      KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH
XP_0070901      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH
NP_0010345      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH
                **.*******:**********.****::***********:::***:****. *:::*.*:
AAW78017.1      KCDISNSTEAGQKLLNMLSLGNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNR
NP_0011239      KCDISNSTEAGQKLLKMLSLGNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNR
AGZ48803.1      KCDISNSTDAGKKLHQMLSVGKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNR
NP_0011165      KCDISNSTEAGQKLLQMLSLGKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNG
AAY57872.1      KCDISNSTEAGQKLLNMLKLGKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNK
NP_0013583      KCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNK
Q5RFN1.1_P      KCDISNSTEAGQKLLNMLRLGKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNK
QLH93383.1      KCDISNSTEAGQKLLQMLSLGKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNK
U6DXQ3-1_N      KCDISNSREAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR
BAE53380.1      KCDISNSSEAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR
NP_0011587      KCDISNSSEAGQKLLEMLKLGKSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNR
AAX63775.1      KCDISNSTEAGKKLLEMLSLGRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNR
XP_0070901      KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNR
NP_0010345      KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNR
                ******* :**:** :** :*.* .** :** :*. ..*.* ***.**:**  **: ** 
AAW78017.1      NSTVGWSTDWSPYADQSIKVRISLKSALGKNAYEWTDNEMYLFRSSVAYAMREYFSREKN
NP_0011239      NSFVGWNTEWSPYADQSIKVRISLKSALGANAYEWTNNEMFLFRSSVAYAMRKYFSIIKN
AGZ48803.1      KSYVGWNTDWSPYSDQSIKVRISLKSALGENAYEWNDNEMYLFRSSVAYAMREYFLKEKH
NP_0011165      NSSVGWNTDWTPYADQSIKVRISLKSALGEDAYEWNDNEMYLFRSSIAYAMRNYFSSAKN
AAY57872.1      NSFVGWSTDWSPYADQSIKVRISLKSALGANAYKWNDNEMYLFRSSVAYAMRQYFLENKH
NP_0013583      NSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMRQYFLKVKN
Q5RFN1.1_P      NSFVGWSTDWSPYADQSIKVRISLKSALGNKAYEWNDNEIYLFRSSVAYAMRKYFLEVKN
QLH93383.1      NSFVGWNTDWSPYAAQSIKVRISLKSALGEKAYEWNDSEMYLFRSSVAYAMREYFSKFKK
U6DXQ3-1_N      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKK
BAE53380.1      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKN
NP_0011587      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNNNEMYLFRSSIAYAMRQYFSEVKN
AAX63775.1      NSFVGWDTDWRPYSDQSIKVRISLKSALGEKAYEWNDNEMYLFRSSIAYAMREYFSKVKN
XP_0070901      NSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMREYFSKVKN
NP_0010345      NSFVGWNTDWRPYADQSIKVRISLKSALGDEAYEWNDNEMYLFRSSVAYAMREYFSKVKN
                :* ***.*:* **: ************** .**:*.:.*:::*.**:*****:**   *:
AAW78017.1      QTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEVEEAIRMSRGRINDIFGLNDN
NP_0011239      QTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEVEDAIRMSRGRINDVFGLNDN
AGZ48803.1      QTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEVEGAIRMSRSRINDAFRLDDN
NP_0011165      ETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDVEKAISMSRSRINDAFRLDDN
AAY57872.1      QTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEVEEAIRFSRSRINDAFQLNDN
NP_0013583      QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN
Q5RFN1.1_P      QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN
QLH93383.1      QTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEVEEAIRMSRSRINDVFRLDDN
U6DXQ3-1_N      QTIPFVDKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN
BAE53380.1      QTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN
NP_0011587      QTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVEEAIRMYRSRINDVFRLDDN
AAX63775.1      QTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEVEDAIRMSRSRINDAFRLDDN
XP_0070901      QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSRSRINDAFRLDDN
NP_0010345      QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEVEEAIRMSRSRINDAFRLDDN
                : : *   .*.*::****:** * ** . *:* :*** :** **   *.**** * *:**
AAW78017.1      SLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKKKNETKREENP
NP_0011239      SLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKKKNETKREENP
AGZ48803.1      SLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRKTDQARSEENP
NP_0011165      TLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRKKKQASSEENP
AAY57872.1      SLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKKKNQARSEENP
NP_0013583      SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKKKNKARSGENP
Q5RFN1.1_P      SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNKARNEENP
QLH93383.1      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNQARSEQNP
U6DXQ3-1_N      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP
BAE53380.1      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP
NP_0011587      SLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKNDQARGEENP
AAX63775.1      SLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRKNDQAGSEENP
XP_0070901      SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP
NP_0010345      SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP
                :****** .*  **  .*:: ***:*****. :*.**.:**.:**..*.*..::   :**
AAW78017.1      YDSMDIGKGESNAGFQNSDDAQTSF
NP_0011239      YDSMDIGKGESNAGFQNSDDAQTSF
AGZ48803.1      YSSVDLSKGENNPGFQNGDDVQTSF
NP_0011165      YGSMDLSKGESNSGFQNGDDIQTSF
AAY57872.1      YASIDISKGENNPGFQNTDDVQTSF
NP_0013583      YASIDISKGENNPGFQNTDDVQTSF
Q5RFN1.1_P      YASIDISKGENNPGFQNTDDVQTSF
QLH93383.1      YASVDLSKGENNPGFQNVDDVQTSF
U6DXQ3-1_N      YASVDLSKG----------------
BAE53380.1      YASVDLSKGENNPGFQNVDDVQTSF
NP_0011587      YASVDLSKGENNPGFQSGDDVQTSF
AAX63775.1      YASVDLNKGENNPGFQHADDVQTSF
XP_0070901      YASVDLSKGENNPGFQHADDVQTSF
NP_0010345      YASVDLSKGENNPGFQHADDVQTSF
                * *:*:.**                


Phylogenetic Tree

MadCor pres1tree.png

Model

To create the model we used the Figure 1C SARS-CoV RBD (optimized for human ACE2 recognition) and human ACE2: 3SCI. Follow the link to start the modelling process.

  • Click on the Windows menu to “View Sequences & Annotations”
  • In the new window that appears to the right, click on the “Details” tab to show the actual amino acid sequences
  • There are 2 sets of ACE2-spike proteins because of the way the proteins crystallized.
  • Focus on the pink and tan chains and orient them like is shown in Figure 4B
  • We are going to make the amino acid side chains shown in the figure visible.
  • In the sequence window go to sequence “Protein 3SCK_A” (in pink) and select the following amino acids. Use the overall
    • K31
    • E35
    • D38
    • M82
    • K353
  • The part of the ribbon that represents these amino acids should be highlighted in yellow in the structure
  • Go to the Styles menu and select Proteins > Ball and Stick
  • Go to the Color menu and select Atom
  • You should see the side chains shown in the figure.
  • The labels were then added using Microsoft Paint.

Bioinformatics.model.png

Chart

MaddahiJTChart.png

Presentation

File:Maddahi CorreyBioinformatics Presentation 1 slides.pdf

Research Conclusion

  • Upon analysis of the model and Human ACE2-Sprike Protein RBD structure, we were able to identify key regions at which the amino acids interact. The Spike Protein RBD contains two main amino acids, Aspartate and Glutamate. Both of these amino acids are acidic and thus we anticipate basic amino acids on host organisms to maximize bond strength. Upon analyzing the model, we see the two most direct interactions from ACE2 are among positions 353 and 3, which in Human ACE2 are both Lysine. Upon comparison we see that monkeys, house cats, orangutans, pangolins, ferrets, common dogs, chinese horseshoe bats, wild boars, and the siberian tiger all share a similarity in that all also contain Lysine in positions 353 and 31. However, the palm civet did show to have a lysine in position 353 and a threonine in position 31. Given that Threonine is not basic but simply polar uncharged, we would naturally expect it’s interactions with Spike RBD to be somewhat weaker than expected. It is interesting, however some research has been produced to show that perhaps the palm civet was in fact a host organism at one point (Wan et. al). The house mouse shows to have an Asparagine in position 31 while containing a Histidine in Position 353. The asparagine we expect would result in the interaction working similarly to the Threonine in the palm civet given that both are polar uncharged amino acids although we do not know if there are any practical differences. Lastly, the genome used for the mink seems to have been missing the amino acids we were looking for except for position 353, which contains lysine.

Journal Conclusion

  • The purpose of the lab was met as we were able to create our project, look further into our research question, and create a powerpoint presentation. We were able to analyze the different amino acid sites and come to our conclusions regarding the potential differences in interaction strengths among the organisms. Please read the research conclusion for those results. We were able to conduct sequence alignments, create a phylogenetic tree, and conduct our comparisons.

Acknowledgements

  • I acknowledge my professor, Kam D. Dahlquist, Ph.D., with whom I met several times via zoom to discuss questions regarding the purpose and format.
  • I acknowledge my TA, Annika Dinulos, whom I contacted regarding a question for one of my assignments.
  • I acknowledge my lab partner, JT Correy, whom I worked with and consulted while creating my sequence alignments and tree
  • I copied and modified the protocol shown on the Week 1 page.
  • I copied and modified the protocol shown on the Week 4 page.
  • I copied and modified sequence data and a phylogenetic tree from Phylogeny.fr
  • I copied and referred to data in the article by Wan et al (2020)


"Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Yaniv Maddahi (talk) 08:30, 13 October 2020 (PDT)

References

  1. Yushun Wan, Jian Shang, Rachel Graham, Ralph S. Baric, Fang Li Journal of Virology Mar 2020, 94 (7) e00127-20; DOI: 10.1128/JVI.00127-20
  2. OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
  3. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
  4. Phylogeny.fr: Home. (2020). Retrieved September 30, 2020, from https://www.phylogeny.fr/
  5. NCBI GenBank. (2020). Bat SARS coronavirus Rp3, complete genome - Nucleotide. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/nuccore/DQ071615
  6. NCBI GenBank. (2020). spike protein [Bat SARS CoV Rp3/2004] - Protein. Retrieved 1 October 2020, https://www.ncbi.nlm.nih.gov/protein/72256271
  7. iCn3D: Web-based 3D Structure Viewer 2AJF. (2020). Retrieved 6 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?&mmdbid=35213&bu=1&showanno=1
  8. iCn3D: Web-based 3D Structure Viewer 3SCK. (2020). Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?pdbid=%203SCK
  9. Uniprot. (2020). S - Spike glycoprotein precursor - Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) - S gene & protein. Retrieved 1 October 2020, from https://www.uniprot.org/uniprot/P0DTC2
  10. Andersen, K.G., Rambaut, A., Lipkin, W.I. et al. The proximal origin of SARS-CoV-2. Nat Med 26, 450–452 (2020). https://doi.org/10.1038/s41591-020-0820-9

Template

BIOL368/F20

Yaniv Maddahi

Yaniv Maddahi Template

Assignment Week

Individual Journal Pages

Class Journal Pages