JT Correy Journal Week 6
Purpose
The purpose of this assignment was to develop our skills as researchers and advance our understanding of bioinformatics by analyzing a complex biological system. The binding of the SARS-CoV-2 spike protein to the ACE2 proteins of various species is a complex process, and to understand it we had to use several techniques introduced in class, such as structural modeling and building a phylogenetic tree.
Methods
Background research
Background research was done by reading the following articles:
- Daly, N. (2020, April 22). Seven more big cats test positive for coronavirus at Bronx Zoo. Retrieved October 15, 2020, from https://www.nationalgeographic.com/animals/2020/04/tiger-coronavirus-covid19-positive-test-bronx-zoo/
- Zhang, T., Wu, Q., & Zhang, Z. (2020). Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak. Current Biology, 30(7). doi:10.1016/j.cub.2020.03.022 from https://www.sciencedirect.com/science/article/pii/S0960982220303602
- Kim, A. (2020, October 13). More than 1 million mink will be killed to help contain a series of Covid-19 outbreaks on Danish farms. Retrieved October 15, 2020, from https://www.cnn.com/2020/10/13/world/denmark-mink-farms-covid-trnd/index.html
- Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., . . . Pöhlmann, S. (2020). SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181(2). doi:10.1016/j.cell.2020.02.052 from https://www.sciencedirect.com/science/article/pii/S0092867420302294
- Rees, V. (2020, May 06). Activation sequence of COVID-19 S protein cleaved by furin protease. Retrieved October 15, 2020, from https://www.drugtargetreview.com/news/61264/study-finds-activation-sequence-of-covid-19-s-protein-is-cleaved-by-furin-protease/
- Johns Hopkins University, C. (2020, October 15). COVID-19 Map. Retrieved October 15, 2020, from https://coronavirus.jhu.edu/map.html
Sequences
Based on the background research, we selected the following species for analysis:
- Human: NP_001358344.1 Homo sapiens
- Rat: AAW78017.1 Rattus norvegicus
- Civet: AAX63775.1 Paguma larvata
- Mink: U6DXQ3-1 Neovison vison
- Ferret: BAE53380.1 Mustela putorius furo
- Bat: AGZ48803.1 Rhinolophus sinicus
- Mouse: NP_001123985.1 Mus musculus
- Tiger: XP_007090142.1 Panthera tigris altaica
- Pangolin: QLH93383.1 Manis pentadactyla
- Cat: NP_001034545.1 Felis catus
- Dog: NP_001158732.1 Canis lupus familiaris
- Monkey: AAY57872.1 Chlorocebus aethiops
- Pig: NP_001116542.1 Sus scrofa
- Orangutan: Q5RFN1.1 Pongo abelii
We started by searching for the ACE2 sequences of the chosen animals on UniProt.
- We searched for "ACE2" and then filtered by taxonomy, going to mammals and then to each of our chosen species. We also found some sequences by searching "ACE2 animal", where "animal" was the name of the species we were looking for.
- We also had to use GenBank because some of the sequences could not be found on UniProt.
- All of the sequences were the ACE2 protein sequence of the respective species. They were downloaded in FASTA format.
- The FASTA sequences were then compiled into one document.
- The sequences were then analyzed with Phylogeny.fr to make a phylogenetic tree.
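The step of compiling the downloaded FASTA records into one document can also be scripted. The following is a minimal sketch in Python, not the procedure we actually used (we compiled the file by hand). The function names parse_fasta and merge_fasta are hypothetical, and the two input records are truncated to their first 20 residues for illustration.

```python
from typing import Iterable, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_fasta(texts: Iterable[str], width: int = 60) -> str:
    """Combine several FASTA texts into one multi-FASTA string."""
    out: List[str] = []
    for text in texts:
        for header, seq in parse_fasta(text):
            out.append(">" + header)
            # Re-wrap every sequence to a fixed line width.
            for i in range(0, len(seq), width):
                out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


# Toy inputs: only the first 20 residues of each record (illustrative).
human = ">NP_001358344.1 [Homo sapiens]\nMSSSSWLLLS\nLVAVTAAQST\n"
rat = ">AAW78017.1 [Rattus norvegicus]\nMSSSCWLLLSLVAVATAQSL\n"

combined = merge_fasta([human, rat])
print(combined)
```

Keeping each header on its own line, separate from the sequence, is what lets tools such as Phylogeny.fr tell the records apart.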
These were the selected mammals and their respective sequences
FASTA format
>NP_001358344.1 [Homo sapiens] MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQE CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN GVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWD LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRL GKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGD KAYEWNDNEMYLFRSSVAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKK KNKARSGENPYASIDISKGENNPGFQNTDDVQTSF
>AAW78017.1 [Rattus norvegicus] MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQKMNEAAAKWS AFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTILNTMSTIYSTGKVCNSMNPQE CFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYEDYGDYWRGDYEAE GVEGYNYNRNQLIEDVENTFKEIKPLYEQLHAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYP LTTPFLQKPNIDVTDAMVNQSWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWD LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEMKREIVGVVEP LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLHKCDISNSTEAGQKLLNMLSL GNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNRNSTVGWSTDWSPYADQSIKVRISLKSALGK NAYEWTDNEMYLFRSSVAYAMREYFSREKNQTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEV EEAIRMSRGRINDIFGLNDNSLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKK KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF
>AAX63775.1 [Paguma larvata] MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAKNMNEAGAKWS AYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WTGGYNYSRNQLIQDVEDTFEQIKPLYQHLHAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDAMVNQNWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEMKRNIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSTEAGKKLLEMLSL GRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNRNSFVGWDTDWRPYSDQSIKVRISLKSALGE KAYEWNDNEMYLFRSSIAYAMREYFSKVKNQTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEV EDAIRMSRSRINDAFRLDDNSLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRK NDQAGSEENPYASVDLNKGENNPGFQHADDVQTSF
>U6DXQ3-1 [Neovison vison] GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKMCTKVTMDDFLTAHHEMGH IQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNIGLLPPDFSEDSETDINF LLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEPLPHDETYC DPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSREAGQKLHEML SLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSI KVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKKQTIPFVDKDVRVSDLKPR ISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPP VTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENPYASVDLSKG
>BAE53380.1 [Mustela putorius furo] MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQKMNIAGAKWS AFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WADGYSYSRNQLIEDVEHTFTQIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP LMVPFRQKPNIDVTDAMVNQSWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWD LGKRDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEP LPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSSEAGQKLHEMLSL GRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGE KAYEWNDNEMYFFQSSIAYAMREYFSKVKNQTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADV EEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRK NNQARSEENPYASVDLSKGENNPGFQNVDDVQTSF
>AGZ48803.1 [Rhinolophus sinicus] MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQKMDEAGAKWS AFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSILNAMSTIYSTGKVCKPNKPQE CLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARGYHYEDYGDYWRRDYETE ESPGPGYSRDQLMKDVERIFTEIKPLYEHLHAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDEMLKQGWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEMKRKIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLHKCDISNSTDAGKKLHQMLSV GKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNRKSYVGWNTDWSPYSDQSIKVRISLKSALGE NAYEWNDNEMYLFRSSVAYAMREYFLKEKHQTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEV EGAIRMSRSRINDAFRLDDNSLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRK TDQARSEENPYSSVDLSKGENNPGFQNGDDVQTSF
>NP_001123985.1 [Mus musculus] MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQKMSEAAAKWS AFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTILNTMSTIYSTGKVCNPKNPQE CLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYNDYGDYWRGDYEAE GADGYNYNRNQLIEDVERTFAEIKPLYEHLHAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYP LTVPFAQKPNIDVTDAMMNQGWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWD LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEMKREIVGVVEP LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLHKCDISNSTEAGQKLLKMLSL GNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNRNSFVGWNTEWSPYADQSIKVRISLKSALGA NAYEWTNNEMFLFRSSVAYAMRKYFSIIKNQTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEV EDAIRMSRGRINDVFGLNDNSLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKK KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF
>XP_007090142.1 [Panthera tigris altaica] LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWSAFYEEQSK LAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQECLLLEPGL DDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYS RSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQK PNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRI KMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKTIGLLPPGF SEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEPVPHDETYC DPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTLGKSKPWTL ALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDN EMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSR SRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEE NPYASVDLSKGENNPGFQHADDVQTSF
>QLH93383.1 [Manis pentadactyla] MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQKMNVAGAKWS TFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTILNTMSTIYSTGKVCNPGNPQE CSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYETE GANGYNYSRDHLIEDVEHIFTQIKPLYEHLHAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYP LTVPFRQKPNIDVTDAMVNQTWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWD LGKHDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLHKCDISNSTEAGQKLLQMLSL GKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNKNSFVGWNTDWSPYAAQSIKVRISLKSALGE KAYEWNDSEMYLFRSSVAYAMREYFSKFKKQTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEV EEAIRMSRSRINDVFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK KNQARSEQNPYASVDLSKGENNPGFQNVDDVQTSF
>NP_001034545.1 [Felis catus] MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWS AFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE CLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE WTDGYNYSRSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGQKPNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTL GKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGD EAYEWNDNEMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEV EEAIRMSRSRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRK NNQARSEENPYASVDLSKGENNPGFQHADDVQTSF
>NP_001158732.1 [Canis lupus familiaris] MSGSSWLLLSLAALTAAQSTEDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQKMNNAGAKWSA FYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTILNSMSTVYSTGKACNPSNPQEC LLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEW ENGYNYSRNQLIDDVELTFTQIMPLYQHLHAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPL TVPFGQKPNIDVTNAMVNQSWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDL GKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNI GLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEMKRNIVGVVEPV PHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSSEAGQKLLEMLKLG KSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGEK AYEWNNNEMYLFRSSIAYAMRQYFSEVKNQTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVE EAIRMYRSRINDVFRLDDNSLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKN DQARGEENPYASVDLSKGENNPGFQSGDDVQTSF
>AAY57872.1 [Chlorocebus aethiops] MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGEKWS AFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIHSTGKVCNPNNPQE CLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYKDYGDYWRGDYEVN GVDGYDYNRDQLIEDVERTFEEIKPLYEHLHAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVNQAWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWD LGKGDFRIIMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLKL GKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGA NAYKWNDNEMYLFRSSVAYAMRQYFLENKHQTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEV EEAIRFSRSRINDAFQLNDNSLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKK KNQARSEENPYASIDISKGENNPGFQNTDDVQTSF
>NP_001116542.1 [Sus scrofa] MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQKMNDARAKWS AFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTILNTMSTIYSSGKVLDPNNPQE CLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLYEEYVVLENEMARANNYEDYGDYWRGDYEVT GTGDYDYSRNQLMEDVERTFAEIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP LTVPFGEKPSIDVTEAMVNQSWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWD LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP LPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLYKCDISNSTEAGQKLLQMLSL GKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNGNSSVGWNTDWTPYADQSIKVRISLKSALGE DAYEWNDNEMYLFRSSIAYAMRNYFSSAKNETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDV EKAISMSRSRINDAFRLDDNTLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRK KKQASSEENPYGSMDLSKGESNSGFQNGDDIQTSF
>Q5RFN1.1 [Pongo abelii] MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPNNPQE CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN GVDSYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYS LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWD LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLRL GKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGN KAYEWNDNEIYLFRSSVAYAMRKYFLEVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK KNKARNEENPYASIDISKGENNPGFQNTDDVQTSF
Clustal Format
The sequences were then aligned and displayed in Clustal format for easier viewing. This was done with Phylogeny.fr.
AAW78017.1 MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQ NP_0011239 MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ AGZ48803.1 MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQ NP_0011165 MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQ AAY57872.1 MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ NP_0013583 MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ Q5RFN1.1_P MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ QLH93383.1 MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQ U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQ NP_0011587 MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ AAX63775.1 MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAK XP_0070901 --------LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ NP_0010345 MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
AAW78017.1 KMNEAAAKWSAFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTIL NP_0011239 KMSEAAAKWSAFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTIL AGZ48803.1 KMDEAGAKWSAFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSIL NP_0011165 KMNDARAKWSAFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTIL AAY57872.1 NMNNAGEKWSAFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL NP_0013583 NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL Q5RFN1.1_P NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL QLH93383.1 KMNVAGAKWSTFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTIL U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 KMNIAGAKWSAFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTIL NP_0011587 KMNNAGAKWSAFYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTIL AAX63775.1 NMNEAGAKWSAYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTIL XP_0070901 KMNEAGAKWSAFYEEQSKLAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL NP_0010345 KMNEAGAKWSAFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
AAW78017.1 NTMSTIYSTGKVCNSMNPQECFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLY NP_0011239 NTMSTIYSTGKVCNPKNPQECLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLY AGZ48803.1 NAMSTIYSTGKVCKPNKPQECLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLY NP_0011165 NTMSTIYSSGKVLDPNNPQECLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLY AAY57872.1 NTMSTIHSTGKVCNPNNPQECLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLY NP_0013583 NTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY Q5RFN1.1_P NTMSTIYSTGKVCNPNNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY QLH93383.1 NTMSTIYSTGKVCNPGNPQECSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLY U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY NP_0011587 NSMSTVYSTGKACNPSNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY AAX63775.1 NAMSTIYSTGKACNPNNPQECLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLY XP_0070901 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY NP_0010345 NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
AAW78017.1 EEYVVLKNEMARANNYEDYGDYWRGDYEAEGVEGYNYNRNQLIEDVENTFKEIKPLYEQL NP_0011239 EEYVVLKNEMARANNYNDYGDYWRGDYEAEGADGYNYNRNQLIEDVERTFAEIKPLYEHL AGZ48803.1 EEYVVLKNEMARGYHYEDYGDYWRRDYETEESPGPGYSRDQLMKDVERIFTEIKPLYEHL NP_0011165 EEYVVLENEMARANNYEDYGDYWRGDYEVTGTGDYDYSRNQLMEDVERTFAEIKPLYEHL AAY57872.1 EEYVVLKNEMARANHYKDYGDYWRGDYEVNGVDGYDYNRDQLIEDVERTFEEIKPLYEHL NP_0013583 EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHL Q5RFN1.1_P EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDSYDYSRGQLIEDVEHTFEEIKPLYEHL QLH93383.1 EEYVVLKNEMARANHYEDYGDYWRGDYETEGANGYNYSRDHLIEDVEHIFTQIKPLYEHL U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 EEYVALKNEMARANNYEDYGDYWRGDYEEEWADGYSYSRNQLIEDVEHTFTQIKPLYEHL NP_0011587 EEYVALKNEMARANNYEDYGDYWRGDYEEEWENGYNYSRNQLIDDVELTFTQIMPLYQHL AAX63775.1 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTGGYNYSRNQLIQDVEDTFEQIKPLYQHL XP_0070901 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL NP_0010345 EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
AAW78017.1 HAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTTPFLQKPNIDVTDAMVNQ NP_0011239 HAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFAQKPNIDVTDAMMNQ AGZ48803.1 HAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDEMLKQ NP_0011165 HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGEKPSIDVTEAMVNQ AAY57872.1 HAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVNQ NP_0013583 HAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ Q5RFN1.1_P HAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ QLH93383.1 HAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFRQKPNIDVTDAMVNQ U6DXQ3-1_N ------------------------------------------------------------ BAE53380.1 HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLMVPFRQKPNIDVTDAMVNQ NP_0011587 HAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTNAMVNQ AAX63775.1 HAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ XP_0070901 HAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ NP_0010345 HAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
AAW78017.1 SWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWDLGHGDFRIKM NP_0011239 GWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWDLGHGDFRIKM AGZ48803.1 GWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM NP_0011165 SWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM AAY57872.1 AWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRIIM NP_0013583 AWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILM Q5RFN1.1_P AWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRILM QLH93383.1 TWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWDLGKHDFRIKM U6DXQ3-1_N ------------------GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKM BAE53380.1 SWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKRDFRIKM NP_0011587 SWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDLGKGDFRIKM AAX63775.1 NWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM XP_0070901 SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRIKM NP_0010345 SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWDLGKGDFRIKM ***:** ** *****:*.: .*.**********: **** *
AAW78017.1 CTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS NP_0011239 CTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS AGZ48803.1 CTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT NP_0011165 CTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA AAY57872.1 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS NP_0013583 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS Q5RFN1.1_P CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS QLH93383.1 CTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN U6DXQ3-1_N CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN BAE53380.1 CTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN NP_0011587 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN AAX63775.1 CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT XP_0070901 CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT NP_0010345 CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT ******::****************** **:****************:****.***::**
AAW78017.1 IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEM NP_0011239 IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEM AGZ48803.1 MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEM NP_0011165 LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM AAY57872.1 IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM NP_0013583 IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM Q5RFN1.1_P IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM QLH93383.1 IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEM U6DXQ3-1_N IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM BAE53380.1 IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM NP_0011587 IGLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEM AAX63775.1 IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEM XP_0070901 IGLLPPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM NP_0010345 IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM :***.. * **.**:*********.******************* . **.::* :.****
AAW78017.1 KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLH NP_0011239 KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLH AGZ48803.1 KRKIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLH NP_0011165 KREIVGVVEPLPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLY AAY57872.1 KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH NP_0013583 KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH Q5RFN1.1_P KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH QLH93383.1 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLH U6DXQ3-1_N KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY BAE53380.1 KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY NP_0011587 KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH AAX63775.1 KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH XP_0070901 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH NP_0010345 KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH **.*******:**********.****::***********:::***:****. *:::*.*:
AAW78017.1 KCDISNSTEAGQKLLNMLSLGNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNR NP_0011239 KCDISNSTEAGQKLLKMLSLGNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNR AGZ48803.1 KCDISNSTDAGKKLHQMLSVGKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNR NP_0011165 KCDISNSTEAGQKLLQMLSLGKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNG AAY57872.1 KCDISNSTEAGQKLLNMLKLGKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNK NP_0013583 KCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNK Q5RFN1.1_P KCDISNSTEAGQKLLNMLRLGKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNK QLH93383.1 KCDISNSTEAGQKLLQMLSLGKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNK U6DXQ3-1_N KCDISNSREAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR BAE53380.1 KCDISNSSEAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR NP_0011587 KCDISNSSEAGQKLLEMLKLGKSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNR AAX63775.1 KCDISNSTEAGKKLLEMLSLGRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNR XP_0070901 KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNR NP_0010345 KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNR ******* :**:** :** :*.* .** :** :*. ..*.* ***.**:** **: **
AAW78017.1 NSTVGWSTDWSPYADQSIKVRISLKSALGKNAYEWTDNEMYLFRSSVAYAMREYFSREKN NP_0011239 NSFVGWNTEWSPYADQSIKVRISLKSALGANAYEWTNNEMFLFRSSVAYAMRKYFSIIKN AGZ48803.1 KSYVGWNTDWSPYSDQSIKVRISLKSALGENAYEWNDNEMYLFRSSVAYAMREYFLKEKH NP_0011165 NSSVGWNTDWTPYADQSIKVRISLKSALGEDAYEWNDNEMYLFRSSIAYAMRNYFSSAKN AAY57872.1 NSFVGWSTDWSPYADQSIKVRISLKSALGANAYKWNDNEMYLFRSSVAYAMRQYFLENKH NP_0013583 NSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMRQYFLKVKN Q5RFN1.1_P NSFVGWSTDWSPYADQSIKVRISLKSALGNKAYEWNDNEIYLFRSSVAYAMRKYFLEVKN QLH93383.1 NSFVGWNTDWSPYAAQSIKVRISLKSALGEKAYEWNDSEMYLFRSSVAYAMREYFSKFKK U6DXQ3-1_N NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKK BAE53380.1 NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKN NP_0011587 NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNNNEMYLFRSSIAYAMRQYFSEVKN AAX63775.1 NSFVGWDTDWRPYSDQSIKVRISLKSALGEKAYEWNDNEMYLFRSSIAYAMREYFSKVKN XP_0070901 NSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMREYFSKVKN NP_0010345 NSFVGWNTDWRPYADQSIKVRISLKSALGDEAYEWNDNEMYLFRSSVAYAMREYFSKVKN :* ***.*:* **: ************** .**:*.:.*:::*.**:*****:** *:
AAW78017.1 QTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEVEEAIRMSRGRINDIFGLNDN NP_0011239 QTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEVEDAIRMSRGRINDVFGLNDN AGZ48803.1 QTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEVEGAIRMSRSRINDAFRLDDN NP_0011165 ETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDVEKAISMSRSRINDAFRLDDN AAY57872.1 QTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEVEEAIRFSRSRINDAFQLNDN NP_0013583 QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN Q5RFN1.1_P QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN QLH93383.1 QTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEVEEAIRMSRSRINDVFRLDDN U6DXQ3-1_N QTIPFVDKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN BAE53380.1 QTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN NP_0011587 QTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVEEAIRMYRSRINDVFRLDDN AAX63775.1 QTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEVEDAIRMSRSRINDAFRLDDN XP_0070901 QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSRSRINDAFRLDDN NP_0010345 QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEVEEAIRMSRSRINDAFRLDDN : : * .*.*::****:** * ** . *:* :*** :** ** *.**** * *:**
AAW78017.1 SLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKKKNETKREENP NP_0011239 SLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKKKNETKREENP AGZ48803.1 SLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRKTDQARSEENP NP_0011165 TLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRKKKQASSEENP AAY57872.1 SLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKKKNQARSEENP NP_0013583 SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKKKNKARSGENP Q5RFN1.1_P SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNKARNEENP QLH93383.1 SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNQARSEQNP U6DXQ3-1_N SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP BAE53380.1 SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP NP_0011587 SLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKNDQARGEENP AAX63775.1 SLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRKNDQAGSEENP XP_0070901 SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP NP_0010345 SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP :****** .* ** .*:: ***:*****. :*.**.:**.:**..*.*..:: :**
AAW78017.1 YDSMDIGKGESNAGFQNSDDAQTSF NP_0011239 YDSMDIGKGESNAGFQNSDDAQTSF AGZ48803.1 YSSVDLSKGENNPGFQNGDDVQTSF NP_0011165 YGSMDLSKGESNSGFQNGDDIQTSF AAY57872.1 YASIDISKGENNPGFQNTDDVQTSF NP_0013583 YASIDISKGENNPGFQNTDDVQTSF Q5RFN1.1_P YASIDISKGENNPGFQNTDDVQTSF QLH93383.1 YASVDLSKGENNPGFQNVDDVQTSF U6DXQ3-1_N YASVDLSKG---------------- BAE53380.1 YASVDLSKGENNPGFQNVDDVQTSF NP_0011587 YASVDLSKGENNPGFQSGDDVQTSF AAX63775.1 YASVDLNKGENNPGFQHADDVQTSF XP_0070901 YASVDLSKGENNPGFQHADDVQTSF NP_0010345 YASVDLSKGENNPGFQHADDVQTSF * *:*:.**
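One simple way to summarize an alignment like this is the percent identity between two aligned rows. The sketch below is an illustration only and is not how Phylogeny.fr builds its tree. The function name percent_identity is hypothetical, columns with a gap in either row are skipped (an assumption, since other conventions exist), and the three rows are just the first 20 columns of the human, tiger, and cat rows from the alignment above.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# First 20 alignment columns only (illustrative excerpt).
human = "MSSSSWLLLSLVAVTAAQST"  # NP_001358344.1
tiger = "--------LSFAALTAAQST"  # XP_007090142.1
cat = "MSGSFWLLLSFAALTAAQST"    # NP_001034545.1

print(percent_identity(tiger, cat))  # 100.0 over the 12 ungapped columns
print(percent_identity(human, cat))  # 75.0 (15 of 20 columns match)
```

Because gapped columns are skipped, partial sequences such as the tiger and mink entries are compared only over the region they actually cover.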
Model
To create the model, we used the structure shown in Figure 1C, the SARS-CoV RBD (optimized for human ACE2 recognition) and human ACE2 (PDB ID 3SCI). Follow the link to start the modelling process.
- Click on the Windows menu to “View Sequences & Annotations”
- In the new window that appears to the right, click on the “Details” tab to show the actual amino acid sequences
- There are 2 sets of ACE2-spike proteins because of the way the proteins crystallized.
- Focus on the pink and tan chains and orient them as shown in Figure 4B.
- We are going to make the amino acid side chains shown in the figure visible.
- In the sequence window go to sequence “Protein 3SCK_A” (in pink) and select the following amino acids. Use the overall
- K31
- E35
- D38
- M82
- K353
- The part of the ribbon that represents these amino acids should be highlighted in yellow in the structure
- Go to the Styles menu and select Proteins > Ball and Stick
- Go to the Color menu and select Atom
- You should see the side chains shown in the figure.
- The labels were then added using Microsoft Paint.
Phylogenetic tree
- My partner (Yaniv) created the phylogenetic tree from the FASTA sequences using Phylogeny.fr.
Species table
Using the Clustal-format alignment, we constructed the following table in Excel to better visualize the differences in the critical residues.
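Reading the critical residues out of the alignment can be sketched in code as well. We built the table by hand in Excel, so the following Python is only an illustration under stated assumptions: the helper names column_for_position and residues_at are hypothetical, the rows are the first 40 alignment columns of the human, mouse, and dog sequences, and positions are numbered along the human sequence. Only positions 31, 35, and 38 fall inside this excerpt.

```python
from typing import Dict, List


def column_for_position(aligned_ref: str, position: int) -> int:
    """Return the 0-based alignment column holding the given 1-based
    residue position of the (gapped) reference row."""
    seen = 0
    for col, residue in enumerate(aligned_ref):
        if residue != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"reference has fewer than {position} residues")


def residues_at(alignment: Dict[str, str], ref_id: str,
                positions: List[int]) -> Dict[str, List[str]]:
    """For each row, list the residues in the columns that correspond
    to the requested positions of the reference sequence."""
    ref = alignment[ref_id]
    cols = [column_for_position(ref, p) for p in positions]
    return {name: [seq[c] for c in cols] for name, seq in alignment.items()}


# First 40 alignment columns only (illustrative excerpt).
alignment = {
    "human": "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLF",
    "mouse": "MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLS",
    "dog":   "MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELS",
}

table = residues_at(alignment, "human", [31, 35, 38])
for name, residues in table.items():
    print(name, " ".join(residues))
```

Working by alignment column rather than by each species' own numbering matters here: the dog row has a gap, so the residue aligned with human position 31 is the dog's 30th residue.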
Presentation
File:Maddahi CorreyBioinformatics Presentation 1 slides.pdf
Conclusion
Overall, this assignment taught me many things. I feel much more confident in my ability to do research and to create models that assist with the research I am doing. We came to several conclusions about the binding of the ACE2 protein of different mammals to the SARS-CoV-2 spike protein. The main conclusion we arrived at is that the binding depends on the polarity/charge of the amino acids at positions 31 and 353 of the ACE2 protein.
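The polarity/charge comparison behind this conclusion can be written down as a small lookup. This is a rough sketch, not part of our original analysis. The grouping of amino acids is one common textbook classification (sources differ, especially for histidine, cysteine, and tyrosine), the names SIDE_CHAIN_CLASS and classify are hypothetical, and the human and mouse residues are the ones that appear at positions 31 and 353 in the alignment in this journal.

```python
# One common side-chain classification near neutral pH (an assumption;
# textbooks differ on some residues).
GROUPS = [
    ("KR", "positive"),
    ("H", "positive (pH-dependent)"),
    ("DE", "negative"),
    ("STNQCY", "polar uncharged"),
    ("AVLIMFWPG", "nonpolar"),
]

SIDE_CHAIN_CLASS = {}
for residues, label in GROUPS:
    for residue in residues:
        SIDE_CHAIN_CLASS[residue] = label


def classify(residue: str) -> str:
    """Return the side-chain class of a one-letter amino acid code."""
    return SIDE_CHAIN_CLASS.get(residue.upper(), "unknown")


# Human vs. mouse residues at the two positions, read from the alignment.
comparison = {31: ("K", "N"), 353: ("K", "H")}

for position, (human_res, mouse_res) in comparison.items():
    print(position,
          human_res, classify(human_res), "|",
          mouse_res, classify(mouse_res))
```

A lookup like this only flags where the side-chain class changes between species. It says nothing by itself about binding strength, which is why we looked at the residues in the structural model.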
Acknowledgements
- Yaniv Maddahi
- Yaniv and I worked as homework partners for this week. We communicated and worked together both at the end of the week 6 lab and throughout the week to create our research project and assignment pages.
- Dr. Dahlquist
- Dr. Dahlquist served as a coach for how to begin our pages. She also instructed the class and provided us with the guiding homework document.
- I copied and modified the protocol shown on the Week 4 page of our class OpenWetWare.
- I copied and modified the protocol shown on the Week 5 page of our class OpenWetWare.
- I copied and modified sequence data and a phylogenetic tree from Phylogeny.fr
- I copied a model that was made using iCn3D web protein modelling.
Except for what is noted above, this individual journal entry was completed by me and not copied from another source. Jcorrey (talk) 23:02, 14 October 2020 (PDT)
References
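- Wan, Y., Shang, J., Graham, R., Baric, R. S., & Li, F. (2020). Receptor recognition by the novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS coronavirus. Journal of Virology, 94(7), e00127-20. doi:10.1128/JVI.00127-20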
- OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
- OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
- Phylogeny.fr: Home. (2020). Retrieved September 30, 2020, from https://www.phylogeny.fr/
- NCBI GenBank. (2020). Bat SARS coronavirus Rp3, complete genome - Nucleotide. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/nuccore/DQ071615
- NCBI GenBank. (2020). spike protein [Bat SARS CoV Rp3/2004] - Protein. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/protein/72256271
- iCn3D: Web-based 3D Structure Viewer 2AJF. (2020). Retrieved 6 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?&mmdbid=35213&bu=1&showanno=1
- iCn3D: Web-based 3D Structure Viewer 3SCK. (2020). Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?pdbid=%203SCK
- Uniprot. (2020). S - Spike glycoprotein precursor - Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) - S gene & protein. Retrieved 1 October 2020, from https://www.uniprot.org/uniprot/P0DTC2
- Andersen, K.G., Rambaut, A., Lipkin, W.I. et al. The proximal origin of SARS-CoV-2. Nat Med 26, 450–452 (2020). https://doi.org/10.1038/s41591-020-0820-9