Yaniv Maddahi Journal Week 6
Purpose
- The purpose of this lab is to develop our skills as researchers by conducting further research to answer our scientific question, presenting the results, and considering areas for future development. We will create a sequence alignment, a phylogenetic tree, and a model of ACE2 and the spike protein, and we will identify key binding sites for their interaction.
Research Question
- How do differences in ACE2 receptors for host organisms affect binding strength with SARS-CoV?
- We will use the UniProt database and possibly GenBank for obtaining sequences. We will remain within the mammals category and possibly look at the sequences for ACE2 of Humans, Bats, Civet Cats, Squirrels, Rats, Birds, Dogs, Cats, Lions, Tigers, Ferrets, and Minks.
Methods/Results
- Based on our scientific question, we began by searching GenBank for all of our desired sequences.
- I chose the following GenBank records
- Human: NP_001358344.1 Homo sapiens
- Rat: AAW78017.1 Rattus norvegicus
- Civet: AAX63775.1 Paguma larvata
- Mink: U6DXQ3-1 Neovison vison
- Ferret: BAE53380.1 Mustela putorius furo
- Bat: AGZ48803.1 Rhinolophus sinicus
- Mouse: NP_001123985.1 Mus musculus
- Tiger: XP_007090142.1 Panthera tigris altaica
- Pangolin: QLH93383.1 Manis pentadactyla
- Cat: NP_001034545.1 Felis catus
- Dog: NP_001158732.1 Canis lupus familiaris
- Monkey: AAY57872.1 Chlorocebus aethiops
- Pig: NP_001116542.1 Sus scrofa
- Orangutan: Q5RFN1.1 Pongo abelii
- GenBank records contain information regarding the genomic sequence of a virus, its locus, publications, source, organism and classifications, references, information regarding publishing of the virus, codons within the virus, translation sequence, and more.
- I then downloaded the nucleotide sequence in FASTA format to my local hard drive.
- I clicked the "Send to" link in the upper right of the page, selected Complete Record, File as the destination, and FASTA as the format, and created the file.
- I opened the file that I saved with a word processor to confirm that I had the sequence and that it is in the FASTA format. In the FASTA format each sequence is preceded by a label which begins with the greater than sign (>). I searched for the GenBank record associated with that sequence and located the spike protein accession number in the GenBank record. (Note that the spike protein is sometimes called the "S" protein.)
- I then downloaded my assigned protein sequence in FASTA format, just like I did for the whole genome sequence.
- I began every line with a space character so that the wiki displays the text in a fixed-width font and the sequences line up nicely on the page.
- In order to analyze sequence data I used www.phylogeny.fr, a free, simple to use web service dedicated to reconstructing and analysing phylogenetic relationships between molecular sequences.
- In my browser, I went to the website www.phylogeny.fr. I scrolled down on the page to the section labeled ‘Phylogeny analysis’, and clicked on the text ‘One Click’.
- I then clicked in the large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’, copied the list of sequences from the talk page, and used Ctrl-V (or Command-V) to paste my sequences there. I then clicked the “Submit” button.
- After my alignment was complete, I found the numbered tabs located just beneath the text One Click Mode, and clicked on the tab labeled 3. Alignment. Within the alignment, individual positions are color-coded to indicate their conservation, or how similar the sequences are to each other at that position. Blue highlighting indicates high conservation (i.e., the sequences are identical or at least very similar), while gray highlighting indicates lower conservation and white highlighting indicates little if any conservation.
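The color coding described above reflects how similar the residues are in each column of the alignment. As a rough illustration only, the Python sketch below scores each column by the fraction of sequences that share the most common residue. This identity fraction is an assumption chosen for simplicity and is not necessarily the scoring scheme that Phylogeny.fr uses; the function name `column_identity` is hypothetical. The toy input is the first ten columns of three rows from the alignment on this page.

```python
from collections import Counter


def column_identity(aligned_seqs):
    """For each alignment column, return the fraction of all sequences
    that carry the most common non-gap residue in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    scores = []
    for col in range(length):
        # Gaps ("-") are ignored when picking the most common residue,
        # but they still count in the denominator, so gappy columns score low.
        residues = [seq[col] for seq in aligned_seqs if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(aligned_seqs))
    return scores


# First ten alignment columns for AAW78017.1, NP_0010345, and XP_0070901.
toy_alignment = ["MSSSCWLLLS", "MSGSFWLLLS", "--------LS"]
scores = column_identity(toy_alignment)
for col, score in enumerate(scores, start=1):
    print(col, round(score, 2))
```

Columns where every row agrees score 1.0, which would correspond to the most strongly highlighted positions; columns with gaps or mixed residues score lower.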
ACE2 Protein Sequences
>NP_001358344.1 [Homo sapiens] MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQE CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN GVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWD LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRL GKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGD KAYEWNDNEMYLFRSSVAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKK KNKARSGENPYASIDISKGENNPGFQNTDDVQTSF
>AAW78017.1 [Rattus norvegicus] MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQKMNEAAAKWS AFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTILNTMSTIYSTGKVCNSMNPQE CFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYEDYGDYWRGDYEAE GVEGYNYNRNQLIEDVENTFKEIKPLYEQLHAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYP LTTPFLQKPNIDVTDAMVNQSWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWD LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEMKREIVGVVEP LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLHKCDISNSTEAGQKLLNMLSL GNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNRNSTVGWSTDWSPYADQSIKVRISLKSALGK NAYEWTDNEMYLFRSSVAYAMREYFSREKNQTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEV EEAIRMSRGRINDIFGLNDNSLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKK KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF
>AAX63775.1 [Paguma larvata] MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAKNMNEAGAKWS AYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WTGGYNYSRNQLIQDVEDTFEQIKPLYQHLHAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDAMVNQNWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEMKRNIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSTEAGKKLLEMLSL GRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNRNSFVGWDTDWRPYSDQSIKVRISLKSALGE KAYEWNDNEMYLFRSSIAYAMREYFSKVKNQTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEV EDAIRMSRSRINDAFRLDDNSLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRK NDQAGSEENPYASVDLNKGENNPGFQHADDVQTSF
>U6DXQ3-1 [Neovison vison] GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKMCTKVTMDDFLTAHHEMGH IQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNIGLLPPDFSEDSETDINF LLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEPLPHDETYC DPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSREAGQKLHEML SLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSI KVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKKQTIPFVDKDVRVSDLKPR ISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPP VTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENPYASVDLSKG
>BAE53380.1 [Mustela putorius furo] MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQKMNIAGAKWS AFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WADGYSYSRNQLIEDVEHTFTQIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP LMVPFRQKPNIDVTDAMVNQSWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWD LGKRDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEP LPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSSEAGQKLHEMLSL GRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGE KAYEWNDNEMYFFQSSIAYAMREYFSKVKNQTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADV EEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRK NNQARSEENPYASVDLSKGENNPGFQNVDDVQTSF
>AGZ48803.1 [Rhinolophus sinicus] MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQKMDEAGAKWS AFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSILNAMSTIYSTGKVCKPNKPQE CLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARGYHYEDYGDYWRRDYETE ESPGPGYSRDQLMKDVERIFTEIKPLYEHLHAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDEMLKQGWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEMKRKIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLHKCDISNSTDAGKKLHQMLSV GKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNRKSYVGWNTDWSPYSDQSIKVRISLKSALGE NAYEWNDNEMYLFRSSVAYAMREYFLKEKHQTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEV EGAIRMSRSRINDAFRLDDNSLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRK TDQARSEENPYSSVDLSKGENNPGFQNGDDVQTSF
>NP_001123985.1 [Mus musculus] MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQKMSEAAAKWS AFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTILNTMSTIYSTGKVCNPKNPQE CLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYNDYGDYWRGDYEAE GADGYNYNRNQLIEDVERTFAEIKPLYEHLHAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYP LTVPFAQKPNIDVTDAMMNQGWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWD LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEMKREIVGVVEP LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLHKCDISNSTEAGQKLLKMLSL GNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNRNSFVGWNTEWSPYADQSIKVRISLKSALGA NAYEWTNNEMFLFRSSVAYAMRKYFSIIKNQTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEV EDAIRMSRGRINDVFGLNDNSLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKK KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF
>XP_007090142.1 [Panthera tigris altaica] LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWSAFYEEQSK LAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQECLLLEPGL DDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYS RSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQK PNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRI KMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKTIGLLPPGF SEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEPVPHDETYC DPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTLGKSKPWTL ALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDN EMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSR SRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEE NPYASVDLSKGENNPGFQHADDVQTSF
>QLH93383.1 [Manis pentadactyla] MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQKMNVAGAKWS TFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTILNTMSTIYSTGKVCNPGNPQE CSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYETE GANGYNYSRDHLIEDVEHIFTQIKPLYEHLHAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYP LTVPFRQKPNIDVTDAMVNQTWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWD LGKHDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLHKCDISNSTEAGQKLLQMLSL GKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNKNSFVGWNTDWSPYAAQSIKVRISLKSALGE KAYEWNDSEMYLFRSSVAYAMREYFSKFKKQTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEV EEAIRMSRSRINDVFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK KNQARSEQNPYASVDLSKGENNPGFQNVDDVQTSF
>NP_001034545.1 [Felis catus] MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWS AFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WTDGYNYSRSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTL GKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGD EAYEWNDNEMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEV EEAIRMSRSRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRK NNQARSEENPYASVDLSKGENNPGFQHADDVQTSF
>NP_001158732.1 [Canis lupus familiaris] MSGSSWLLLSLAALTAAQSTEDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQKMNNAGAKWSA FYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTILNSMSTVYSTGKACNPSNPQEC LLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEW ENGYNYSRNQLIDDVELTFTQIMPLYQHLHAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPL TVPFGQKPNIDVTNAMVNQSWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDL GKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNI GLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEMKRNIVGVVEPV PHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSSEAGQKLLEMLKLG KSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGEK AYEWNNNEMYLFRSSIAYAMRQYFSEVKNQTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVE EAIRMYRSRINDVFRLDDNSLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKN DQARGEENPYASVDLSKGENNPGFQSGDDVQTSF
>AAY57872.1 [Chlorocebus aethiops] MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGEKWS AFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIHSTGKVCNPNNPQE CLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYKDYGDYWRGDYEVN GVDGYDYNRDQLIEDVERTFEEIKPLYEHLHAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVNQAWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWD LGKGDFRIIMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLKL GKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGA NAYKWNDNEMYLFRSSVAYAMRQYFLENKHQTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEV EEAIRFSRSRINDAFQLNDNSLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKK KNQARSEENPYASIDISKGENNPGFQNTDDVQTSF
>NP_001116542.1 [Sus scrofa] MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQKMNDARAKWS AFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTILNTMSTIYSSGKVLDPNNPQE CLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLYEEYVVLENEMARANNYEDYGDYWRGDYEVT GTGDYDYSRNQLMEDVERTFAEIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGEKPSIDVTEAMVNQSWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP LPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLYKCDISNSTEAGQKLLQMLSL GKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNGNSSVGWNTDWTPYADQSIKVRISLKSALGE DAYEWNDNEMYLFRSSIAYAMRNYFSSAKNETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDV EKAISMSRSRINDAFRLDDNTLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRK KKQASSEENPYGSMDLSKGESNSGFQNGDDIQTSF
>Q5RFN1.1 [Pongo abelii] MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPNNPQE CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN GVDSYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWD LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLRL GKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGN KAYEWNDNEIYLFRSSVAYAMRKYFLEVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK KNKARNEENPYASIDISKGENNPGFQNTDDVQTSF
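The records above follow the FASTA convention described in the methods: a header line beginning with ">" followed by the sequence. A minimal Python sketch of reading such records is shown here, assuming the usual layout with the header on its own line. The function name `parse_fasta` is hypothetical, and the example text contains only the first lines of two records from this page, so the sequences in it are deliberately truncated.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Remove any internal whitespace from sequence lines.
            chunks.append("".join(line.split()))
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Truncated excerpts of two records from this page (not the full proteins).
example = """>NP_001358344.1 [Homo sapiens]
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS
AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQE
>AAX63775.1 [Paguma larvata]
MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAKNMNEAGAKWS
"""

records = parse_fasta(example)
for name, seq in records.items():
    # Residue 31 is at index 30 because Python counts from zero.
    print(name, len(seq), seq[30])
```

Joining the wrapped sequence lines into one string makes it straightforward to look up a residue by its position number.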
Multiple Sequence Alignment
AAW78017.1 MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQ NP_0011239 MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ AGZ48803.1 MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQ NP_0011165 MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQ AAY57872.1 MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ NP_0013583 MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ Q5RFN1.1_P MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ QLH93383.1 MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQ U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQ NP_0011587 MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ AAX63775.1 MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAK XP_0070901 --------LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ NP_0010345 MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
AAW78017.1 KMNEAAAKWSAFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTIL NP_0011239 KMSEAAAKWSAFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTIL AGZ48803.1 KMDEAGAKWSAFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSIL NP_0011165 KMNDARAKWSAFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTIL AAY57872.1 NMNNAGEKWSAFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL NP_0013583 NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL Q5RFN1.1_P NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL QLH93383.1 KMNVAGAKWSTFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTIL U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 KMNIAGAKWSAFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTIL NP_0011587 KMNNAGAKWSAFYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTIL AAX63775.1 NMNEAGAKWSAYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTIL XP_0070901 KMNEAGAKWSAFYEEQSKLAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL NP_0010345 KMNEAGAKWSAFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
AAW78017.1 NTMSTIYSTGKVCNSMNPQECFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLY NP_0011239 NTMSTIYSTGKVCNPKNPQECLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLY AGZ48803.1 NAMSTIYSTGKVCKPNKPQECLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLY NP_0011165 NTMSTIYSSGKVLDPNNPQECLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLY AAY57872.1 NTMSTIHSTGKVCNPNNPQECLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLY NP_0013583 NTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY Q5RFN1.1_P NTMSTIYSTGKVCNPNNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY QLH93383.1 NTMSTIYSTGKVCNPGNPQECSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLY U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY NP_0011587 NSMSTVYSTGKACNPSNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY AAX63775.1 NAMSTIYSTGKACNPNNPQECLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLY XP_0070901 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY NP_0010345 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
AAW78017.1 EEYVVLKNEMARANNYEDYGDYWRGDYEAEGVEGYNYNRNQLIEDVENTFKEIKPLYEQL NP_0011239 EEYVVLKNEMARANNYNDYGDYWRGDYEAEGADGYNYNRNQLIEDVERTFAEIKPLYEHL AGZ48803.1 EEYVVLKNEMARGYHYEDYGDYWRRDYETEESPGPGYSRDQLMKDVERIFTEIKPLYEHL NP_0011165 EEYVVLENEMARANNYEDYGDYWRGDYEVTGTGDYDYSRNQLMEDVERTFAEIKPLYEHL AAY57872.1 EEYVVLKNEMARANHYKDYGDYWRGDYEVNGVDGYDYNRDQLIEDVERTFEEIKPLYEHL NP_0013583 EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHL Q5RFN1.1_P EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDSYDYSRGQLIEDVEHTFEEIKPLYEHL QLH93383.1 EEYVVLKNEMARANHYEDYGDYWRGDYETEGANGYNYSRDHLIEDVEHIFTQIKPLYEHL U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 EEYVALKNEMARANNYEDYGDYWRGDYEEEWADGYSYSRNQLIEDVEHTFTQIKPLYEHL NP_0011587 EEYVALKNEMARANNYEDYGDYWRGDYEEEWENGYNYSRNQLIDDVELTFTQIMPLYQHL AAX63775.1 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTGGYNYSRNQLIQDVEDTFEQIKPLYQHL XP_0070901 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL NP_0010345 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
AAW78017.1 HAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTTPFLQKPNIDVTDAMVNQ NP_0011239 HAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFAQKPNIDVTDAMMNQ AGZ48803.1 HAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDEMLKQ NP_0011165 HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGEKPSIDVTEAMVNQ AAY57872.1 HAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVNQ NP_0013583 HAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ Q5RFN1.1_P HAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ QLH93383.1 HAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFRQKPNIDVTDAMVNQ U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLMVPFRQKPNIDVTDAMVNQ NP_0011587 HAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTNAMVNQ AAX63775.1 HAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ XP_0070901 HAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ NP_0010345 HAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
AAW78017.1 SWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWDLGHGDFRIKM NP_0011239 GWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWDLGHGDFRIKM AGZ48803.1 GWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM NP_0011165 SWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM AAY57872.1 AWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRIIM NP_0013583 AWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILM Q5RFN1.1_P AWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRILM QLH93383.1 TWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWDLGKHDFRIKM U6DXQ3-1_N ------------------GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKM BAE53380.1 SWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKRDFRIKM NP_0011587 SWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDLGKGDFRIKM AAX63775.1 NWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM XP_0070901 SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRIKM NP_0010345 SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWDLGKGDFRIKM ***:** ** *****:*.: .*.**********: **** *
AAW78017.1 CTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS NP_0011239 CTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS AGZ48803.1 CTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT NP_0011165 CTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA AAY57872.1 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS NP_0013583 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS Q5RFN1.1_P CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS QLH93383.1 CTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN U6DXQ3-1_N CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN BAE53380.1 CTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN NP_0011587 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN AAX63775.1 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT XP_0070901 CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT NP_0010345 CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT ******::****************** **:****************:****.***::**
AAW78017.1 IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEM NP_0011239 IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEM AGZ48803.1 MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEM NP_0011165 LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM AAY57872.1 IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM NP_0013583 IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM Q5RFN1.1_P IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM QLH93383.1 IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEM U6DXQ3-1_N IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM BAE53380.1 IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM NP_0011587 IGLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEM AAX63775.1 IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEM XP_0070901 IGLLPPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM NP_0010345 IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM :***.. * **.**:*********.******************* . **.::* :.****
AAW78017.1 KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLH NP_0011239 KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLH AGZ48803.1 KRKIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLH NP_0011165 KREIVGVVEPLPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLY AAY57872.1 KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH NP_0013583 KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH Q5RFN1.1_P KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH QLH93383.1 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLH U6DXQ3-1_N KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY BAE53380.1 KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY NP_0011587 KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH AAX63775.1 KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH XP_0070901 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH NP_0010345 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH **.*******:**********.****::***********:::***:****. *:::*.*:
AAW78017.1 KCDISNSTEAGQKLLNMLSLGNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNR NP_0011239 KCDISNSTEAGQKLLKMLSLGNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNR AGZ48803.1 KCDISNSTDAGKKLHQMLSVGKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNR NP_0011165 KCDISNSTEAGQKLLQMLSLGKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNG AAY57872.1 KCDISNSTEAGQKLLNMLKLGKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNK NP_0013583 KCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNK Q5RFN1.1_P KCDISNSTEAGQKLLNMLRLGKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNK QLH93383.1 KCDISNSTEAGQKLLQMLSLGKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNK U6DXQ3-1_N KCDISNSREAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR BAE53380.1 KCDISNSSEAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR NP_0011587 KCDISNSSEAGQKLLEMLKLGKSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNR AAX63775.1 KCDISNSTEAGKKLLEMLSLGRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNR XP_0070901 KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNR NP_0010345 KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNR ******* :**:** :** :*.* .** :** :*. ..*.* ***.**:** **: **
AAW78017.1 NSTVGWSTDWSPYADQSIKVRISLKSALGKNAYEWTDNEMYLFRSSVAYAMREYFSREKN NP_0011239 NSFVGWNTEWSPYADQSIKVRISLKSALGANAYEWTNNEMFLFRSSVAYAMRKYFSIIKN AGZ48803.1 KSYVGWNTDWSPYSDQSIKVRISLKSALGENAYEWNDNEMYLFRSSVAYAMREYFLKEKH NP_0011165 NSSVGWNTDWTPYADQSIKVRISLKSALGEDAYEWNDNEMYLFRSSIAYAMRNYFSSAKN AAY57872.1 NSFVGWSTDWSPYADQSIKVRISLKSALGANAYKWNDNEMYLFRSSVAYAMRQYFLENKH NP_0013583 NSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMRQYFLKVKN Q5RFN1.1_P NSFVGWSTDWSPYADQSIKVRISLKSALGNKAYEWNDNEIYLFRSSVAYAMRKYFLEVKN QLH93383.1 NSFVGWNTDWSPYAAQSIKVRISLKSALGEKAYEWNDSEMYLFRSSVAYAMREYFSKFKK U6DXQ3-1_N NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKK BAE53380.1 NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKN NP_0011587 NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNNNEMYLFRSSIAYAMRQYFSEVKN AAX63775.1 NSFVGWDTDWRPYSDQSIKVRISLKSALGEKAYEWNDNEMYLFRSSIAYAMREYFSKVKN XP_0070901 NSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMREYFSKVKN NP_0010345 NSFVGWNTDWRPYADQSIKVRISLKSALGDEAYEWNDNEMYLFRSSVAYAMREYFSKVKN :* ***.*:* **: ************** .**:*.:.*:::*.**:*****:** *:
AAW78017.1 QTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEVEEAIRMSRGRINDIFGLNDN NP_0011239 QTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEVEDAIRMSRGRINDVFGLNDN AGZ48803.1 QTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEVEGAIRMSRSRINDAFRLDDN NP_0011165 ETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDVEKAISMSRSRINDAFRLDDN AAY57872.1 QTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEVEEAIRFSRSRINDAFQLNDN NP_0013583 QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN Q5RFN1.1_P QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN QLH93383.1 QTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEVEEAIRMSRSRINDVFRLDDN U6DXQ3-1_N QTIPFVDKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN BAE53380.1 QTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN NP_0011587 QTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVEEAIRMYRSRINDVFRLDDN AAX63775.1 QTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEVEDAIRMSRSRINDAFRLDDN XP_0070901 QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSRSRINDAFRLDDN NP_0010345 QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEVEEAIRMSRSRINDAFRLDDN : : * .*.*::****:** * ** . *:* :*** :** ** *.**** * *:**
AAW78017.1 SLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKKKNETKREENP NP_0011239 SLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKKKNETKREENP AGZ48803.1 SLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRKTDQARSEENP NP_0011165 TLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRKKKQASSEENP AAY57872.1 SLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKKKNQARSEENP NP_0013583 SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKKKNKARSGENP Q5RFN1.1_P SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNKARNEENP QLH93383.1 SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNQARSEQNP U6DXQ3-1_N SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP BAE53380.1 SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP NP_0011587 SLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKNDQARGEENP AAX63775.1 SLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRKNDQAGSEENP XP_0070901 SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP NP_0010345 SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP :****** .* ** .*:: ***:*****. :*.**.:**.:**..*.*..:: :**
AAW78017.1 YDSMDIGKGESNAGFQNSDDAQTSF NP_0011239 YDSMDIGKGESNAGFQNSDDAQTSF AGZ48803.1 YSSVDLSKGENNPGFQNGDDVQTSF NP_0011165 YGSMDLSKGESNSGFQNGDDIQTSF AAY57872.1 YASIDISKGENNPGFQNTDDVQTSF NP_0013583 YASIDISKGENNPGFQNTDDVQTSF Q5RFN1.1_P YASIDISKGENNPGFQNTDDVQTSF QLH93383.1 YASVDLSKGENNPGFQNVDDVQTSF U6DXQ3-1_N YASVDLSKG---------------- BAE53380.1 YASVDLSKGENNPGFQNVDDVQTSF NP_0011587 YASVDLSKGENNPGFQSGDDVQTSF AAX63775.1 YASVDLNKGENNPGFQHADDVQTSF XP_0070901 YASVDLSKGENNPGFQHADDVQTSF NP_0010345 YASVDLSKGENNPGFQHADDVQTSF * *:*:.**
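Because the alignment contains gaps, a residue number in one sequence does not always fall at the same residue number in another. The sketch below, written in Python under the assumption that human ACE2 (NP_0013583) is used as the numbering reference, finds the alignment column for a given human position and reads off the residue in every other row. The function names are hypothetical, and only the first 60 columns of four rows from the alignment above are included.

```python
def column_for_position(aligned_ref, position):
    """Return the 0-based alignment column that holds residue number
    `position` (1-based, gaps not counted) of the reference row."""
    count = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            count += 1
            if count == position:
                return col
    raise ValueError("position lies beyond the end of the reference row")


def residues_at(alignment, ref_name, position):
    """Return {row name: residue} for the column matching `position`
    in the reference row."""
    col = column_for_position(alignment[ref_name], position)
    return {name: row[col] for name, row in alignment.items()}


# First 60 alignment columns for human, mouse, dog, and civet.
alignment = {
    "NP_0013583": "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ",
    "NP_0011239": "MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ",
    "NP_0011587": "MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ",
    "AAX63775.1": "MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAK",
}

position_31 = residues_at(alignment, "NP_0013583", 31)
print(position_31)
```

Note that the dog row has a gap at column 21, so the lysine that lines up with human position 31 is residue 30 in the dog sequence's own numbering. Working from alignment columns rather than raw sequence positions avoids this off-by-one problem.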
Phylogenetic Tree
Model
To create the model, we used the structure from Figure 1C, the SARS-CoV RBD (optimized for human ACE2 recognition) bound to human ACE2: 3SCI. Follow the link to start the modelling process.
- Click on the Windows menu to “View Sequences & Annotations”
- In the new window that appears to the right, click on the “Details” tab to show the actual amino acid sequences
- There are 2 sets of ACE2-spike proteins because of the way the proteins crystallized.
- Focus on the pink and tan chains and orient them as shown in Figure 4B
- We are going to make the amino acid side chains shown in the figure visible.
- In the sequence window go to sequence “Protein 3SCK_A” (in pink) and select the following amino acids. Use the overall
- K31
- E35
- D38
- M82
- K353
- The part of the ribbon that represents these amino acids should be highlighted in yellow in the structure
- Go to the Styles menu and select Proteins > Ball and Stick
- Go to the Color menu and select Atom
- You should see the side chains shown in the figure.
- The labels were then added using Microsoft Paint.
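Before selecting residues in the structure viewer, it can help to confirm that labels such as K31 agree with the sequence. The Python sketch below checks each label against the first 90 residues of the human ACE2 record (NP_001358344.1) listed on this page. It assumes that the structure uses the same residue numbering as that sequence record; the function name `check_label` is hypothetical, and K353 lies beyond the 90-residue excerpt, so it is reported as not checkable rather than confirmed.

```python
import re

# First 90 residues of human ACE2 (NP_001358344.1), copied from this page.
HUMAN_ACE2_PREFIX = (
    "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS"
    "AFLKEQSTLAQMYPLQEIQN"
)


def check_label(label, sequence):
    """Check a residue label such as 'K31' against a sequence.

    Returns True or False when the position is covered by `sequence`,
    and None when the position lies outside it."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    letter = match.group(1)
    number = int(match.group(2))
    if number < 1 or number > len(sequence):
        return None
    # Residue numbers are 1-based; Python indices are 0-based.
    return sequence[number - 1] == letter


for label in ["K31", "E35", "D38", "M82", "K353"]:
    print(label, check_label(label, HUMAN_ACE2_PREFIX))
```

To check K353 as well, the full sequence from the record would need to be supplied in place of the excerpt.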
Chart
Presentation
File:Maddahi CorreyBioinformatics Presentation 1 slides.pdf
Research Conclusion
- Upon analysis of the model and the human ACE2-spike protein RBD structure, we were able to identify key regions at which the amino acids interact. The spike protein RBD contains two main amino acids, aspartate and glutamate. Both of these amino acids are acidic, so we anticipate that basic amino acids in the host organism's ACE2 would maximize bond strength. In the model, the two most direct interactions from ACE2 are at positions 353 and 31, which are both lysine in human ACE2. Upon comparison, we see that the monkey, house cat, orangutan, pangolin, ferret, dog, Chinese horseshoe bat, wild boar, and Siberian tiger sequences also contain lysine at positions 353 and 31. The palm civet, however, has a lysine at position 353 and a threonine at position 31. Given that threonine is polar and uncharged rather than basic, we would expect its interactions with the spike RBD to be somewhat weaker than those of lysine. This is interesting because some research suggests that the palm civet was in fact a host organism at one point (Wan et al., 2020). The house mouse has an asparagine at position 31 and a histidine at position 353. We expect the asparagine to behave similarly to the threonine in the palm civet, given that both are polar uncharged amino acids, although we do not know if there are any practical differences. Lastly, the sequence used for the mink is incomplete and is missing the amino acids we were looking for except for position 353, which contains lysine.
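The reasoning above compares residues by side-chain chemistry. A minimal Python sketch of that comparison follows, using one common textbook grouping of the amino acids (groupings differ slightly between sources, so the table is an assumption, and histidine in particular is only weakly basic near neutral pH). The residues listed are those reported in the conclusion for positions 31 and 353; the names `SIDE_CHAIN_CLASS`, `observed`, and `classify` are hypothetical.

```python
# One common grouping of the 20 standard amino acids by side chain.
SIDE_CHAIN_CLASS = {}
for letters, label in [
    ("KRH", "basic"),
    ("DE", "acidic"),
    ("STNQYC", "polar uncharged"),
    ("AVLIMFWPG", "nonpolar"),
]:
    for letter in letters:
        SIDE_CHAIN_CLASS[letter] = label

# Residues at human ACE2 positions 31 and 353, as stated in the conclusion.
observed = {
    "human": {31: "K", 353: "K"},
    "palm civet": {31: "T", 353: "K"},
    "house mouse": {31: "N", 353: "H"},
}


def classify(residue):
    """Return the side-chain class for a one-letter residue code."""
    return SIDE_CHAIN_CLASS.get(residue, "unknown")


summary = {
    species: {pos: classify(res) for pos, res in positions.items()}
    for species, positions in observed.items()
}

for species, classes in summary.items():
    print(species, classes)
```

This only labels the residues by class; it does not measure binding strength, which would require structural or experimental data.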
Journal Conclusion
- The purpose of the lab was met, as we were able to create our project, look further into our research question, and create a PowerPoint presentation. We were able to analyze the different amino acid sites and come to our conclusions regarding the potential differences in interaction strengths among the organisms. Please read the research conclusion for those results. We were able to conduct sequence alignments, create a phylogenetic tree, and conduct our comparisons.
Acknowledgements
- I acknowledge my professor, Kam D. Dahlquist, Ph.D., with whom I met several times via Zoom to discuss questions regarding the purpose and format.
- I acknowledge my TA, Annika Dinulos, whom I contacted regarding a question for one of my assignments.
- I acknowledge my lab partner, JT Correy, with whom I worked and consulted while creating my sequence alignments and tree.
- I copied and modified the protocol shown on the Week 1 page.
- I copied and modified the protocol shown on the Week 4 page.
- I copied and modified sequence data and a phylogenetic tree from Phylogeny.fr
- I copied and referred to data in the article by Wan et al. (2020).
"Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Yaniv Maddahi (talk) 08:30, 13 October 2020 (PDT)
References
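- Wan, Y., Shang, J., Graham, R., Baric, R. S., & Li, F. (2020). Receptor recognition by the novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS coronavirus. Journal of Virology, 94(7), e00127-20. DOI: 10.1128/JVI.00127-20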
- OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
- OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
- Phylogeny.fr: Home. (2020). Retrieved September 30, 2020, from https://www.phylogeny.fr/
- NCBI GenBank. (2020). Bat SARS coronavirus Rp3, complete genome - Nucleotide. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/nuccore/DQ071615
- NCBI GenBank. (2020). spike protein [Bat SARS CoV Rp3/2004] - Protein. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/protein/72256271
- iCn3D: Web-based 3D Structure Viewer 2AJF. (2020). Retrieved 6 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?&mmdbid=35213&bu=1&showanno=1
- iCn3D: Web-based 3D Structure Viewer 3SCK. (2020). Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?pdbid=%203SCK
- Uniprot. (2020). S - Spike glycoprotein precursor - Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) - S gene & protein. Retrieved 1 October 2020, from https://www.uniprot.org/uniprot/P0DTC2
- Andersen, K.G., Rambaut, A., Lipkin, W.I. et al. The proximal origin of SARS-CoV-2. Nat Med 26, 450–452 (2020). https://doi.org/10.1038/s41591-020-0820-9