JT Correy Journal Week 6

Purpose

The purpose of this assignment was to develop our skills as researchers and deepen our understanding of bioinformatics by analyzing a complex biological system. The binding of SARS-CoV-2 to the ACE2 proteins of various species is a complex process, and to understand it we used several techniques introduced in class, such as structural modeling and building a phylogenetic tree.

Methods

Background research

Background research was done by reading the following articles:


Sequences

Based on the background research, we selected the following species to analyze:

We started by searching for the ACE2 sequences of the chosen animals on UniProt.

    • We also had to use GenBank because some of the sequences could not be found on UniProt.
      • We searched for "ACE2" and then filtered by taxon, first to mammals and then to each of our chosen species. We also found some sequences by searching "ACE2 animal", replacing "animal" with the name of the species we were looking for.
  1. All of the sequences were the ACE2 protein sequences of the respective species. They were downloaded in FASTA format.
  2. The FASTA sequences were then compiled into one document.
  3. The sequences were then used to build a phylogenetic tree with Phylogeny.fr.
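
The compiling step above was done by hand, but it can also be sketched in code. The following is a minimal Python sketch, not the procedure we actually used: the function names parse_fasta and combine_fasta are hypothetical, and the two example records are shortened to the first 32 residues of the human and rat sequences listed below.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_fasta(texts, width=70):
    """Merge several FASTA texts into one, re-wrapping sequences to `width`."""
    out = []
    for text in texts:
        for header, seq in parse_fasta(text):
            out.append(">" + header)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Shortened example records (first 32 residues only), for illustration.
    human = ">NP_001358344.1 [Homo sapiens]\nMSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKF\n"
    rat = ">AAW78017.1 [Rattus norvegicus]\nMSSSCWLLLSLVAVATAQSLIEEKAESFLNKF\n"
    print(combine_fasta([human, rat]), end="")
```

The combined text can then be pasted into a tool such as Phylogeny.fr in the same way as a hand-compiled document.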

These were the selected mammals and their respective sequences

FASTA format

>NP_001358344.1 [Homo sapiens]
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS
AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPDNPQE
CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN
GVDGYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWD
LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLFNMLRL
GKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGD
KAYEWNDNEMYLFRSSVAYAMRQYFLKVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV
EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKK
KNKARSGENPYASIDISKGENNPGFQNTDDVQTSF


>AAW78017.1 [Rattus norvegicus]
MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQKMNEAAAKWS
AFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTILNTMSTIYSTGKVCNSMNPQE
CFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYEDYGDYWRGDYEAE
GVEGYNYNRNQLIEDVENTFKEIKPLYEQLHAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYP
LTTPFLQKPNIDVTDAMVNQSWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWD
LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEMKREIVGVVEP
LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLHKCDISNSTEAGQKLLNMLSL
GNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNRNSTVGWSTDWSPYADQSIKVRISLKSALGK
NAYEWTDNEMYLFRSSVAYAMREYFSREKNQTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEV
EEAIRMSRGRINDIFGLNDNSLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKK
KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF


>AAX63775.1 [Paguma larvata]
MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAKNMNEAGAKWS
AYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WTGGYNYSRNQLIQDVEDTFEQIKPLYQHLHAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDAMVNQNWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEMKRNIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSTEAGKKLLEMLSL
GRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNRNSFVGWDTDWRPYSDQSIKVRISLKSALGE
KAYEWNDNEMYLFRSSIAYAMREYFSKVKNQTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEV
EDAIRMSRSRINDAFRLDDNSLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRK
NDQAGSEENPYASVDLNKGENNPGFQHADDVQTSF


>U6DXQ3-1 [Neovison vison]
GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKMCTKVTMDDFLTAHHEMGH
IQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNIGLLPPDFSEDSETDINF
LLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEPLPHDETYC
DPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSREAGQKLHEML
SLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSI
KVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKKQTIPFVDKDVRVSDLKPR
ISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPP
VTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENPYASVDLSKG


>BAE53380.1 [Mustela putorius furo]
MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQKMNIAGAKWS
AFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WADGYSYSRNQLIEDVEHTFTQIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LMVPFRQKPNIDVTDAMVNQSWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWD
LGKRDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKRDIVGVVEP
LPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLYKCDISNSSEAGQKLHEMLSL
GRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGE
KAYEWNDNEMYFFQSSIAYAMREYFSKVKNQTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADV
EEAIRKSRGRINDAFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRK
NNQARSEENPYASVDLSKGENNPGFQNVDDVQTSF


>AGZ48803.1 [Rhinolophus sinicus]
MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQKMDEAGAKWS
AFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSILNAMSTIYSTGKVCKPNKPQE
CLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARGYHYEDYGDYWRRDYETE
ESPGPGYSRDQLMKDVERIFTEIKPLYEHLHAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDEMLKQGWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT
MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEMKRKIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLHKCDISNSTDAGKKLHQMLSV
GKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNRKSYVGWNTDWSPYSDQSIKVRISLKSALGE
NAYEWNDNEMYLFRSSVAYAMREYFLKEKHQTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEV
EGAIRMSRSRINDAFRLDDNSLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRK
TDQARSEENPYSSVDLSKGENNPGFQNGDDVQTSF


>NP_001123985.1 [Mus musculus]
MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQKMSEAAAKWS
AFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTILNTMSTIYSTGKVCNPKNPQE
CLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLYEEYVVLKNEMARANNYNDYGDYWRGDYEAE
GADGYNYNRNQLIEDVERTFAEIKPLYEHLHAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFAQKPNIDVTDAMMNQGWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWD
LGHGDFRIKMCTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEMKREIVGVVEP
LPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLHKCDISNSTEAGQKLLKMLSL
GNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNRNSFVGWNTEWSPYADQSIKVRISLKSALGA
NAYEWTNNEMFLFRSSVAYAMRKYFSIIKNQTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEV
EDAIRMSRGRINDVFGLNDNSLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKK
KNETKREENPYDSMDIGKGESNAGFQNSDDAQTSF


>XP_007090142.1 [Panthera tigris altaica]
LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWSAFYEEQSK
LAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQECLLLEPGL
DDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYS
RSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQK
PNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRI
KMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKTIGLLPPGF
SEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEPVPHDETYC
DPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTLGKSKPWTL
ALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDN
EMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSR
SRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEE
NPYASVDLSKGENNPGFQHADDVQTSF


>QLH93383.1 [Manis pentadactyla]
MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQKMNVAGAKWS
TFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTILNTMSTIYSTGKVCNPGNPQE
CSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYETE
GANGYNYSRDHLIEDVEHIFTQIKPLYEHLHAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFRQKPNIDVTDAMVNQTWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWD
LGKHDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN
IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLHKCDISNSTEAGQKLLQMLSL
GKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNKNSFVGWNTDWSPYAAQSIKVRISLKSALGE
KAYEWNDSEMYLFRSSVAYAMREYFSKFKKQTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEV
EEAIRMSRSRINDVFRLDDNSLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK
KNQARSEQNPYASVDLSKGENNPGFQNVDDVQTSF


>NP_001034545.1 [Felis catus]
MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQKMNEAGAKWS
AFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTILNAMSTIYSTGKACNPNNPQE
CLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEE
WTDGYNYSRSQLIKDVEHTFTQIKPLYQHLHAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGQKPNIDVTDAMVNQSWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP
VPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLHKCDISNSSEAGKKLLQMLTL
GKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNRNSFVGWNTDWRPYADQSIKVRISLKSALGD
EAYEWNDNEMYLFRSSVAYAMREYFSKVKNQTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEV
EEAIRMSRSRINDAFRLDDNSLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRK
NNQARSEENPYASVDLSKGENNPGFQHADDVQTSF


>NP_001158732.1 [Canis lupus familiaris]
MSGSSWLLLSLAALTAAQSTEDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQKMNNAGAKWSA
FYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTILNSMSTVYSTGKACNPSNPQEC
LLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLYEEYVALKNEMARANNYEDYGDYWRGDYEEEW
ENGYNYSRNQLIDDVELTFTQIMPLYQHLHAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPL
TVPFGQKPNIDVTNAMVNQSWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDL
GKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKNI
GLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEMKRNIVGVVEPV
PHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLHKCDISNSSEAGQKLLEMLKLG
KSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNRNSFVGWNTDWSPYADQSIKVRISLKSALGEK
AYEWNNNEMYLFRSSIAYAMRQYFSEVKNQTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVE
EAIRMYRSRINDVFRLDDNSLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKN
DQARGEENPYASVDLSKGENNPGFQSGDDVQTSF 


>AAY57872.1 [Chlorocebus aethiops]
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGEKWS
AFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIHSTGKVCNPNNPQE
CLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLYEEYVVLKNEMARANHYKDYGDYWRGDYEVN
GVDGYDYNRDQLIEDVERTFEEIKPLYEHLHAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVNQAWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWD
LGKGDFRIIMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLKL
GKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGA
NAYKWNDNEMYLFRSSVAYAMRQYFLENKHQTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEV
EEAIRFSRSRINDAFQLNDNSLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKK
KNQARSEENPYASIDISKGENNPGFQNTDDVQTSF


>NP_001116542.1 [Sus scrofa]
MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQKMNDARAKWS
AFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTILNTMSTIYSSGKVLDPNNPQE
CLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLYEEYVVLENEMARANNYEDYGDYWRGDYEVT
GTGDYDYSRNQLMEDVERTFAEIKPLYEHLHAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYP
LTVPFGEKPSIDVTEAMVNQSWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWD
LGKGDFRIKMCTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA
LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEMKREIVGVVEP
LPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLYKCDISNSTEAGQKLLQMLSL
GKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNGNSSVGWNTDWTPYADQSIKVRISLKSALGE
DAYEWNDNEMYLFRSSIAYAMRNYFSSAKNETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDV
EKAISMSRSRINDAFRLDDNTLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRK
KKQASSEENPYGSMDLSKGESNSGFQNGDDIQTSF


>Q5RFN1.1 [Pongo abelii] 
MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQNMNNAGDKWS
AFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTILNTMSTIYSTGKVCNPNNPQE
CLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLYEEYVVLKNEMARANHYEDYGDYWRGDYEVN
GVDSYDYSRGQLIEDVEHTFEEIKPLYEHLHAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYS
LTVPFGQKPNIDVTDAMVDQAWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWD
LGKGDFRILMCTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEMKREIVGVVEP
VPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLHKCDISNSTEAGQKLLNMLRL
GKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNKNSFVGWSTDWSPYADQSIKVRISLKSALGN
KAYEWNDNEIYLFRSSVAYAMRKYFLEVKNQMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEV
EKAIRMSRSRINDAFRLNDNSLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKK
KNKARNEENPYASIDISKGENNPGFQNTDDVQTSF

Clustal Format

The sequences were then aligned and displayed in Clustal format for easier viewing. This was done with Phylogeny.fr.

AAW78017.1      MSSSCWLLLSLVAVATAQSLIEEKAESFLNKFNQEAEDLSYQSSLASWNYNTNITEENAQ
NP_0011239      MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ
AGZ48803.1      MSGSSWLLLSLVAVTTAQSTTEDEAKMFLDKFNTKAEDLSHQSSLASWDYNTNINDENVQ
NP_0011165      MSGSFWLLLSLIPVTAAQSTTEELAKTFLEKFNLEAEDLAYQSSLASWTINTNITDENIQ
AAY57872.1      MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
NP_0013583      MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
Q5RFN1.1_P      MSGSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
QLH93383.1      MSGSSWLLLSLVAVTAAQSTSDEEAKTFLEKFNSEAEELSYQSSLASWNYNTNITDENVQ
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      MLGSSWLLLSLAALTAAQSTTEDLAKTFLEKFNYEAEELSYQNSLASWNYNTNITDENIQ
NP_0011587      MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ
AAX63775.1      MSGSFWLLLSFAALTAAQSTTEELAKTFLETFNYEAQELSYQSSVASWNYNTNITDENAK
XP_0070901      --------LSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
NP_0010345      MSGSFWLLLSFAALTAAQSTTEELAKTFLEKFNHEAEELSYQSSLASWNYNTNITDENVQ
                                                                           
AAW78017.1      KMNEAAAKWSAFYEEQSKIAQNFSLQEIQNATIKRQLKALQQSGSSALSPDKNKQLNTIL
NP_0011239      KMSEAAAKWSAFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTIL
AGZ48803.1      KMDEAGAKWSAFYEEQSKLAKNYSLEQIQNVTVKLQLQILQQSGSPVLSEDKSKRLNSIL
NP_0011165      KMNDARAKWSAFYEEQSRIAKTYPLDEIQTLILKRQLQALQQSGTSGLSADKSKRLNTIL
AAY57872.1      NMNNAGEKWSAFLKEQSTLAQMYPLQAIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
NP_0013583      NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
Q5RFN1.1_P      NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
QLH93383.1      KMNVAGAKWSTFYEEQSKIAKNYQLQNIQNDTIKRQLQALQLSGSSALSADKNQRLNTIL
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      KMNIAGAKWSAFYEEESQHAKTYPLEEIQDPIIKRQLRALQQSGSSVLSADKRERLNTIL
NP_0011587      KMNNAGAKWSAFYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTIL
AAX63775.1      NMNEAGAKWSAYYEEQSKLAQTYPLAEIQDAKIKRQLQALQQSGSSVLSADKSQRLNTIL
XP_0070901      KMNEAGAKWSAFYEEQSKLAETYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
NP_0010345      KMNEAGAKWSAFYEEQSKLAKTYPLAEIHNTTVKRQLQALQQSGSSVLSADKSQRLNTIL
                                                                           
AAW78017.1      NTMSTIYSTGKVCNSMNPQECFLLEPGLDEIMATSTDYNRRLWAWEGWRAEVGKQLRPLY
NP_0011239      NTMSTIYSTGKVCNPKNPQECLLLEPGLDEIMATSTDYNSRLWAWEGWRAEVGKQLRPLY
AGZ48803.1      NAMSTIYSTGKVCKPNKPQECLLLEPGLDNIMGTSKDYNERLWAWEGWRAEVGKQLRPLY
NP_0011165      NTMSTIYSSGKVLDPNNPQECLVLEPGLDEIMENSKDYSRRLWAWESWRAEVGKQLRPLY
AAY57872.1      NTMSTIHSTGKVCNPNNPQECLLLDPGLNEIMEKSLDYNERLWAWEGWRSEVGKQLRPLY
NP_0013583      NTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY
Q5RFN1.1_P      NTMSTIYSTGKVCNPNNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY
QLH93383.1      NTMSTIYSTGKVCNPGNPQECSLLEPGLDNIMESSKDYNERLWAWEGWRSEVGKQLRPLY
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY
NP_0011587      NSMSTVYSTGKACNPSNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRSEVGKQLRPLY
AAX63775.1      NAMSTIYSTGKACNPNNPQECLLLEPGLDNIMENSKDYNERLWAWEGWRAEVGKQLRPLY
XP_0070901      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
NP_0010345      NAMSTIYSTGKACNPNNPQECLLLEPGLDDIMENSKDYNERLWAWEGWRAEVGKQLRPLY
                                                                           
AAW78017.1      EEYVVLKNEMARANNYEDYGDYWRGDYEAEGVEGYNYNRNQLIEDVENTFKEIKPLYEQL
NP_0011239      EEYVVLKNEMARANNYNDYGDYWRGDYEAEGADGYNYNRNQLIEDVERTFAEIKPLYEHL
AGZ48803.1      EEYVVLKNEMARGYHYEDYGDYWRRDYETEESPGPGYSRDQLMKDVERIFTEIKPLYEHL
NP_0011165      EEYVVLENEMARANNYEDYGDYWRGDYEVTGTGDYDYSRNQLMEDVERTFAEIKPLYEHL
AAY57872.1      EEYVVLKNEMARANHYKDYGDYWRGDYEVNGVDGYDYNRDQLIEDVERTFEEIKPLYEHL
NP_0013583      EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHL
Q5RFN1.1_P      EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDSYDYSRGQLIEDVEHTFEEIKPLYEHL
QLH93383.1      EEYVVLKNEMARANHYEDYGDYWRGDYETEGANGYNYSRDHLIEDVEHIFTQIKPLYEHL
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      EEYVALKNEMARANNYEDYGDYWRGDYEEEWADGYSYSRNQLIEDVEHTFTQIKPLYEHL
NP_0011587      EEYVALKNEMARANNYEDYGDYWRGDYEEEWENGYNYSRNQLIDDVELTFTQIMPLYQHL
AAX63775.1      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTGGYNYSRNQLIQDVEDTFEQIKPLYQHL
XP_0070901      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
NP_0010345      EEYVALKNEMARANNYEDYGDYWRGDYEEEWTDGYNYSRSQLIKDVEHTFTQIKPLYQHL
                                                                           
AAW78017.1      HAYVRTKLMEVYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTTPFLQKPNIDVTDAMVNQ
NP_0011239      HAYVRRKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFAQKPNIDVTDAMMNQ
AGZ48803.1      HAYVRAKLMDTYPFHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDEMLKQ
NP_0011165      HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGEKPSIDVTEAMVNQ
AAY57872.1      HAYVRAKLMNAYPSYISPTGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVNQ
NP_0013583      HAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ
Q5RFN1.1_P      HAYVRAKLINAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ
QLH93383.1      HAYVRAKLMDNYPSHISPTGCLPAHLLGDMWGRFWTNLYPLTVPFRQKPNIDVTDAMVNQ
U6DXQ3-1_N      ------------------------------------------------------------
BAE53380.1      HAYVRAKLMDAYPSRISPTGCLPAHLLGDMWGRFWTNLYPLMVPFRQKPNIDVTDAMVNQ
NP_0011587      HAYVRTKLMDTYPSYISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTNAMVNQ
AAX63775.1      HAYVRAKLMDTYPSRISRTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
XP_0070901      HAYVRAKLMDSYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
NP_0010345      HAYVRAKLMDTYPSRISPTGCLPAHLLGDMWGRFWTNLYPLTVPFGQKPNIDVTDAMVNQ
                                                                           
AAW78017.1      SWDAERIFKEAEKFFVSVGLPQMTPGFWTNSMLTEPGDDRKVVCHPTAWDLGHGDFRIKM
NP_0011239      GWDAERIFQEAEKFFVSVGLPHMTQGFWANSMLTEPADGRKVVCHPTAWDLGHGDFRIKM
AGZ48803.1      GWDADRIFKEAEKFFVSVGLPNMTEGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
NP_0011165      SWDAIRIFEEAEKFFVSIGLPNMTQGFWNNSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
AAY57872.1      AWNAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRIIM
NP_0013583      AWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILM
Q5RFN1.1_P      AWDAQRIFKEAEKFFVSVGLPNMTQRFWENSMLTDPGNVQKVVCHPTAWDLGKGDFRILM
QLH93383.1      TWDANRIFKEAEKFFVSVGLPKMTQTFWENSMLTEPGDGRKVVCHPTAWDLGKHDFRIKM
U6DXQ3-1_N      ------------------GLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKHDFRIKM
BAE53380.1      SWDARRIFEEAETFFVSVGLPNMTEGFWQNSMLTEPGDNRKVVCHPTAWDLGKRDFRIKM
NP_0011587      SWDARKIFKEAEKFFVSVGLPNMTQEFWGNSMLTEPSDSRKVVCHPTAWDLGKGDFRIKM
AAX63775.1      NWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDGRKVVCHPTAWDLGKGDFRIKM
XP_0070901      SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGNSQKVVCHPTAWDLGKGDFRIKM
NP_0010345      SWDARRIFKEAEKFFVSVGLPNMTQGFWENSMLTEPGDSRKVVCHPTAWDLGKGDFRIKM
                                  ***:**  ** *****:*.: .*.**********: **** *
AAW78017.1      CTKVTMDNFLTAHHEMGHIQYDMAYAKQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
NP_0011239      CTKVTMDNFLTAHHEMGHIQYDMAYARQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
AGZ48803.1      CTKVTMEDFLTAHHEMGHIQYDMAYASQPYLLRNGANEGFHEAVGEVMSLSVATPKHLKT
NP_0011165      CTKVTMDDFLTAHHEMGHIQYDMAYAIQPYLLRNGANEGFHEAVGEIMSLSAATPHYLKA
AAY57872.1      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
NP_0013583      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
Q5RFN1.1_P      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
QLH93383.1      CTKVTMDDFLTAHHEMGHIQYDMAYAMQPYLLRNGANEGFHEAVGEIMSLSAATPKHLKN
U6DXQ3-1_N      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
BAE53380.1      CTKVTMDDFLTAHHEMGHIQYDMAYAEQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
NP_0011587      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKN
AAX63775.1      CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
XP_0070901      CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
NP_0010345      CTKVTMDDFLTAHHEMGHIQYDMAYAVQPFLLRNGANEGFHEAVGEIMSLSAATPNHLKT
                ******::****************** **:****************:****.***::** 
AAW78017.1      IGLLPSNFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFQDKIPREQWTKKWWEM
NP_0011239      IGLLPSDFQEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFRGEIPKEQWMKKWWEM
AGZ48803.1      MGLLSPDFREDNETEINFLLKQALNIVGTLPFTYMLEKWRWMVFKGEIPKEEWMKKWWEM
NP_0011165      LGLLPPDFYEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
AAY57872.1      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
NP_0013583      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
Q5RFN1.1_P      IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
QLH93383.1      IGLLPPDFYEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFSGQIPKEQWMKKWWEM
U6DXQ3-1_N      IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
BAE53380.1      IGLLPPDFSEDSETDINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
NP_0011587      IGLLPPSFFEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKTWWEM
AAX63775.1      IGLLSPAFSEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGAIPKEQWMQKWWEM
XP_0070901      IGLLPPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
NP_0010345      IGLLSPGFSEDSETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKEQWMQKWWEM
                :***.. * **.**:*********.******************* . **.::* :.****
AAW78017.1      KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKHDGPLH
NP_0011239      KREIVGVVEPLPHDETYCDPASLFHVSNDYSFIRYYTRTIYQFQFQEALCQAAKYNGSLH
AGZ48803.1      KRKIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIFEFQFHEALCRIAQHDGPLH
NP_0011165      KREIVGVVEPLPHDETYCDPACLFHVAEDYSFIRYYTRTIYQFQFHEALCRTAKHEGPLY
AAY57872.1      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
NP_0013583      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
Q5RFN1.1_P      KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
QLH93383.1      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQTAKHEGPLH
U6DXQ3-1_N      KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY
BAE53380.1      KRDIVGVVEPLPHDETYCDPAALFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLY
NP_0011587      KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH
AAX63775.1      KRNIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCQIAKHEGPLH
XP_0070901      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH
NP_0010345      KREIVGVVEPVPHDETYCDPASLFHVANDYSFIRYYTRTIYQFQFQEALCRIAKHEGPLH
                **.*******:**********.****::***********:::***:****. *:::*.*:
AAW78017.1      KCDISNSTEAGQKLLNMLSLGNSGPWTLALENVVGSRNMDVKPLLNYFQPLFVWLKEQNR
NP_0011239      KCDISNSTEAGQKLLKMLSLGNSEPWTKALENVVGARNMDVKPLLNYFQPLFDWLKEQNR
AGZ48803.1      KCDISNSTDAGKKLHQMLSVGKSQAWTKTLEDIVDSRNMDVGPLLKYFEPLYTWLQEQNR
NP_0011165      KCDISNSTEAGQKLLQMLSLGKSEPWTLALENIVGVKTMDVKPLLSYFEPLLTWLKAQNG
AAY57872.1      KCDISNSTEAGQKLLNMLKLGKSEPWTLALENVVGAKNMSVRPLLNYFEPLFTWLKDQNK
NP_0013583      KCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNK
Q5RFN1.1_P      KCDISNSTEAGQKLLNMLRLGKSEPWTLALENVVGAKNMNVRPLLDYFEPLFTWLKDQNK
QLH93383.1      KCDISNSTEAGQKLLQMLSLGKSKPWTLALERVVGTKNMDVRPLLNYFEPLLTWLKEQNK
U6DXQ3-1_N      KCDISNSREAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR
BAE53380.1      KCDISNSSEAGQKLHEMLSLGRSKPWTFALERVVGAKTMDVRPLLNYFEPLFTWLKEQNR
NP_0011587      KCDISNSSEAGQKLLEMLKLGKSKPWTYALEIVVGAKNMDVRPLLNYFEPLFTWLKEQNR
AAX63775.1      KCDISNSTEAGKKLLEMLSLGRSEPWTLALERVVGAKNMNVTPLLNYFEPLFTWLKEQNR
XP_0070901      KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKNMNVTPLLKYFEPLFTWLKEQNR
NP_0010345      KCDISNSSEAGKKLLQMLTLGKSKPWTLALEHVVGEKKMNVTPLLKYFEPLFTWLKEQNR
                ******* :**:** :** :*.* .** :** :*. ..*.* ***.**:**  **: ** 
AAW78017.1      NSTVGWSTDWSPYADQSIKVRISLKSALGKNAYEWTDNEMYLFRSSVAYAMREYFSREKN
NP_0011239      NSFVGWNTEWSPYADQSIKVRISLKSALGANAYEWTNNEMFLFRSSVAYAMRKYFSIIKN
AGZ48803.1      KSYVGWNTDWSPYSDQSIKVRISLKSALGENAYEWNDNEMYLFRSSVAYAMREYFLKEKH
NP_0011165      NSSVGWNTDWTPYADQSIKVRISLKSALGEDAYEWNDNEMYLFRSSIAYAMRNYFSSAKN
AAY57872.1      NSFVGWSTDWSPYADQSIKVRISLKSALGANAYKWNDNEMYLFRSSVAYAMRQYFLENKH
NP_0013583      NSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMRQYFLKVKN
Q5RFN1.1_P      NSFVGWSTDWSPYADQSIKVRISLKSALGNKAYEWNDNEIYLFRSSVAYAMRKYFLEVKN
QLH93383.1      NSFVGWNTDWSPYAAQSIKVRISLKSALGEKAYEWNDSEMYLFRSSVAYAMREYFSKFKK
U6DXQ3-1_N      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKK
BAE53380.1      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNDNEMYFFQSSIAYAMREYFSKVKN
NP_0011587      NSFVGWNTDWSPYADQSIKVRISLKSALGEKAYEWNNNEMYLFRSSIAYAMRQYFSEVKN
AAX63775.1      NSFVGWDTDWRPYSDQSIKVRISLKSALGEKAYEWNDNEMYLFRSSIAYAMREYFSKVKN
XP_0070901      NSFVGWNTDWRPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMREYFSKVKN
NP_0010345      NSFVGWNTDWRPYADQSIKVRISLKSALGDEAYEWNDNEMYLFRSSVAYAMREYFSKVKN
                :* ***.*:* **: ************** .**:*.:.*:::*.**:*****:**   *:
AAW78017.1      QTVPFGEADVWVSDLKPRVSFNFFVTSPKNVSDIIPRSEVEEAIRMSRGRINDIFGLNDN
NP_0011239      QTVPFLEEDVRVSDLKPRVSFYFFVTSPQNVSDVIPRSEVEDAIRMSRGRINDVFGLNDN
AGZ48803.1      QTILFGAENVWVSNLKPRISFNFHVTSPGNLSDIIPRPEVEGAIRMSRSRINDAFRLDDN
NP_0011165      ETIPFGAVDVWVSDLKPRISFNFFVTSPANMSDIIPRSDVEKAISMSRSRINDAFRLDDN
AAY57872.1      QTILFGEEDVRVADLKPRISFNFYVTAPKNVSDIIPRTEVEEAIRFSRSRINDAFQLNDN
NP_0013583      QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN
Q5RFN1.1_P      QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN
QLH93383.1      QTIPFEEESVRVSDLKPRVSFIFFVTLPKNVSAVIPRAEVEEAIRMSRSRINDVFRLDDN
U6DXQ3-1_N      QTIPFVDKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN
BAE53380.1      QTIPFVGKDVRVSDLKPRISFNFIVTSPENMSDIIPRADVEEAIRKSRGRINDAFRLDDN
NP_0011587      QTIPFVEDNVWVSDLKPRISFNFSVTSPGNVSDIIPRTEVEEAIRMYRSRINDVFRLDDN
AAX63775.1      QTIPFVEDNVWVSDLKPRISFNFFVTFSNNVSDVIPRSEVEDAIRMSRSRINDAFRLDDN
XP_0070901      QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRREVEEAIRMSRSRINDAFRLDDN
NP_0010345      QTIPFVEDNVWVSNLKPRISFNFFVTASKNVSDVIPRSEVEEAIRMSRSRINDAFRLDDN
                : : *   .*.*::****:** * ** . *:* :*** :** **   *.**** * *:**
AAW78017.1      SLEFLGIYPTLKPPYEPPVTIWLIIFGVVMGTVVVGIVILIVTGIKGRKKKNETKREENP
NP_0011239      SLEFLGIHPTLEPPYQPPVTIWLIIFGVVMALVVVGIIILIVTGIKGRKKKNETKREENP
AGZ48803.1      SLEFLGIQPTLGPPYQPPVTIWLIVFGVVMAVVVVGIVVLIITGIRDRRKTDQARSEENP
NP_0011165      TLEFLGIQPTLGPPDEPPVTVWLIIFGVVMGLVVVGIVVLIFTGIRDRRKKKQASSEENP
AAY57872.1      SLEFLGIQSTLVPPYQSPITTWLIVFGVVMAVIVAGIVVLIFTGIRDRKKKNQARSEENP
NP_0013583      SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKKKNKARSGENP
Q5RFN1.1_P      SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNKARNEENP
QLH93383.1      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVIVVGIVVLIFTGIRDRKKKNQARSEQNP
U6DXQ3-1_N      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP
BAE53380.1      SLEFLGIQPTLEPPYQPPVTIWLIVFGVVMGVVVVGIFLLIFSGIRNRRKNNQARSEENP
NP_0011587      SLEFLGIQPTPGPPYEPPVTIWLIVFGVVMGVVVVGIVLLIFSGIRNRRKNDQARGEENP
AAX63775.1      SLEFLGIEPTLSPPYRPPVTIWLIVFGVVMGAIVVGIVLLIVSGIRNRRKNDQAGSEENP
XP_0070901      SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP
NP_0010345      SLEFLGIQPTLSPPYQPPVTIWLIVFGVVMGVVVVGIVLLIVSGIRNRRKNNQARSEENP
                :****** .*  **  .*:: ***:*****. :*.**.:**.:**..*.*..::   :**
AAW78017.1      YDSMDIGKGESNAGFQNSDDAQTSF
NP_0011239      YDSMDIGKGESNAGFQNSDDAQTSF
AGZ48803.1      YSSVDLSKGENNPGFQNGDDVQTSF
NP_0011165      YGSMDLSKGESNSGFQNGDDIQTSF
AAY57872.1      YASIDISKGENNPGFQNTDDVQTSF
NP_0013583      YASIDISKGENNPGFQNTDDVQTSF
Q5RFN1.1_P      YASIDISKGENNPGFQNTDDVQTSF
QLH93383.1      YASVDLSKGENNPGFQNVDDVQTSF
U6DXQ3-1_N      YASVDLSKG----------------
BAE53380.1      YASVDLSKGENNPGFQNVDDVQTSF
NP_0011587      YASVDLSKGENNPGFQSGDDVQTSF
AAX63775.1      YASVDLNKGENNPGFQHADDVQTSF
XP_0070901      YASVDLSKGENNPGFQHADDVQTSF
NP_0010345      YASVDLSKGENNPGFQHADDVQTSF
                * *:*:.**
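
An alignment in this layout can also be read programmatically. The sketch below is only an illustration under stated assumptions: parse_clustal and percent_identity are hypothetical helper names, the parser assumes each sequence line is "name, whitespace, residues" and that conservation lines start with whitespace (as in the block above), and the identity measure ignores any column where either sequence has a gap. The excerpt is the final block of the alignment for three of the sequences.

```python
def parse_clustal(text):
    """Join the per-block segments of a Clustal-style alignment by sequence name."""
    aligned = {}
    for line in text.splitlines():
        # Skip blank lines and conservation lines (which start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        if line.startswith("CLUSTAL"):
            continue  # optional header line in Clustal files
        parts = line.split()
        if len(parts) < 2:
            continue
        name, segment = parts[0], parts[1]
        aligned[name] = aligned.get(name, "") + segment
    return aligned


def percent_identity(a, b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Final alignment block only, copied from the alignment above.
    excerpt = (
        "NP_0013583      YASIDISKGENNPGFQNTDDVQTSF\n"
        "U6DXQ3-1_N      YASVDLSKG----------------\n"
        "NP_0010345      YASVDLSKGENNPGFQHADDVQTSF\n"
        "                * *:*:.**\n"
    )
    aln = parse_clustal(excerpt)
    for name in ("U6DXQ3-1_N", "NP_0010345"):
        pid = percent_identity(aln["NP_0013583"], aln[name])
        print(f"human vs {name}: {pid:.1f}% identical")
```

Because the mink sequence (U6DXQ3-1) is a fragment, skipping gapped columns keeps its missing regions from being counted as mismatches.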

Model

To create the model we used the Figure 1C SARS-CoV RBD (optimized for human ACE2 recognition) and human ACE2: 3SCI. Follow the link to start the modelling process.

  • Click on the Windows menu to “View Sequences & Annotations”
  • In the new window that appears to the right, click on the “Details” tab to show the actual amino acid sequences
  • There are 2 sets of ACE2-spike proteins because of the way the proteins crystallized.
  • Focus on the pink and tan chains and orient them as shown in Figure 4B
  • We are going to make the amino acid side chains shown in the figure visible.
  • In the sequence window go to sequence “Protein 3SCK_A” (in pink) and select the following amino acids. Use the overall
    • K31
    • E35
    • D38
    • M82
    • K353
  • The part of the ribbon that represents these amino acids should be highlighted in yellow in the structure
  • Go to the Styles menu and select Proteins > Ball and Stick
  • Go to the Color menu and select Atom
  • You should see the side chains shown in the figure.
  • The labels were then added using Microsoft Paint.

Bioinformatics.model.png

Phylogenetic tree

  • My partner (Yaniv) created the phylogenetic tree from the FASTA sequences using Phylogeny.fr.
  • MadCor pres1tree.png

Species table

Using the Clustal-format alignment, we constructed the following table in Excel to better visualize the differences in the critical residues.

MaddahiJTChart.png
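
The table itself was built by hand in Excel. As a rough sketch of the same lookup in Python, the code below maps a human ACE2 residue number to its alignment column (skipping gaps in the human row) and reads the residue every other species has in that column. The names column_for_position and residues_at are hypothetical, and the demo data is only the first 120 alignment columns for human, dog, and mouse, so it covers positions 31, 35, 38, and 82 but not 353.

```python
def column_for_position(ref_aligned, position):
    """Return the 0-based alignment column of the 1-based residue `position`
    in the reference sequence, counting only non-gap characters."""
    seen = 0
    for col, char in enumerate(ref_aligned):
        if char != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"reference has only {seen} residues; asked for {position}")


def residues_at(aligned, reference, positions):
    """For each sequence, list the residues aligned to the given reference positions."""
    cols = [column_for_position(aligned[reference], p) for p in positions]
    return {name: [seq[c] for c in cols] for name, seq in aligned.items()}


if __name__ == "__main__":
    # First two alignment blocks (120 columns), copied from the alignment above.
    aligned = {
        "NP_0013583": (  # Homo sapiens
            "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ"
            "NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL"
        ),
        "NP_0011587": (  # Canis lupus familiaris (note the gap in column 21)
            "MSGSSWLLLSLAALTAAQST-EDLVKTFLEKFNYEAEELSYQSSLASWNYNINITDENVQ"
            "KMNNAGAKWSAFYEEQSKLAKTYPLEEIQDSTVKRQLRALQHSGSSVLSADKNQRLNTIL"
        ),
        "NP_0011239": (  # Mus musculus
            "MSSSSWLLLSLVAVTTAQSLTEENAKTFLNNFNQEAEDLSYQSSLASWNYNTNITEENAQ"
            "KMSEAAAKWSAFYEEQSKTAQSFSLQEIQTPIIKRQLQALQQSGSSALSADKNKQLNTIL"
        ),
    }
    positions = [31, 35, 38, 82]
    table = residues_at(aligned, "NP_0013583", positions)
    print("sequence    " + "  ".join(f"{p:>3}" for p in positions))
    for name, residues in table.items():
        print(f"{name:<12}" + "  ".join(f"{r:>3}" for r in residues))
```

Counting positions along the human row rather than along each species' own row is a design choice: it keeps the numbering consistent with the human residue labels (K31, E35, D38, M82, K353) even when another sequence has gaps or is a fragment.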

Presentation

File:Maddahi CorreyBioinformatics Presentation 1 slides.pdf

Conclusion

Overall, this assignment taught me many things. I feel much more confident in my ability to do research and to create models that support it. We came to several conclusions about the binding of the ACE2 protein of different mammals to the SARS-CoV-2 spike protein. The main conclusion we arrived at is that binding depends on the polarity and charge of the amino acids at positions 31 and 353 of the ACE2 protein.
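
To make the polarity/charge comparison concrete, here is a small Python sketch that labels residues by side-chain class. It is an illustration, not part of our analysis: the grouping is one common textbook scheme, and it assumes histidine counts as positively charged and cysteine and tyrosine as polar, which other sources treat differently. The names SIDE_CHAIN_CLASS, classify, and same_class are hypothetical. The example residues are read from the alignment above (human K31 and K353, mouse N31 and H353).

```python
# One common textbook grouping of amino acid side chains near physiological pH.
# Assumption: His is treated as positive and Cys/Tyr as polar uncharged;
# other classification schemes place these residues differently.
SIDE_CHAIN_CLASS = {}
for residues, label in [
    ("KRH", "positive"),
    ("DE", "negative"),
    ("STNQCY", "polar uncharged"),
    ("AVLIMFWPG", "nonpolar"),
]:
    for residue in residues:
        SIDE_CHAIN_CLASS[residue] = label


def classify(residue):
    """Return the side-chain class for a one-letter amino acid code."""
    return SIDE_CHAIN_CLASS.get(residue.upper(), "unknown")


def same_class(a, b):
    """True when both residues are known and fall in the same class."""
    return classify(a) != "unknown" and classify(a) == classify(b)


if __name__ == "__main__":
    # (position, human residue, mouse residue), read from the alignment above.
    for position, human, mouse in [(31, "K", "N"), (353, "K", "H")]:
        print(
            f"position {position}: human {human} ({classify(human)}), "
            f"mouse {mouse} ({classify(mouse)}), same class: {same_class(human, mouse)}"
        )
```

A class match under this scheme is only a coarse indicator; it does not capture side-chain size or geometry at the binding interface.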

Acknowledgements

  • Yaniv Maddahi
    • Yaniv and I worked as homework partners for this week. We communicated and worked together both at the end of the week 6 lab and throughout the week to create our research project and assignment pages.
  • Dr. Dahlquist
    • Dr. Dahlquist served as a coach for how to begin our pages. She also instructed the class and provided us with the guiding homework document.
  • I copied and modified the protocol shown on the Week 4 page of our class OpenWetWare.
  • I copied and modified the protocol shown on the Week 5 page of our class OpenWetWare.
  • I copied and modified sequence data and a phylogenetic tree from Phylogeny.fr
  • I copied a model that was made using iCn3D web protein modelling.

Except for what is noted above, this individual journal entry was completed by me and not copied from another source. Jcorrey (talk) 23:02, 14 October 2020 (PDT)

References

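  1. Wan, Y., Shang, J., Graham, R., Baric, R. S., & Li, F. (2020). Receptor recognition by the novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS coronavirus. Journal of Virology, 94(7), e00127-20. https://doi.org/10.1128/JVI.00127-20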
  2. OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
  3. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved September 30, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
  4. Phylogeny.fr: Home. (2020). Retrieved September 30, 2020, from https://www.phylogeny.fr/
  5. NCBI GenBank. (2020). Bat SARS coronavirus Rp3, complete genome - Nucleotide. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/nuccore/DQ071615
  6. NCBI GenBank. (2020). spike protein [Bat SARS CoV Rp3/2004] - Protein. Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/protein/72256271
  7. iCn3D: Web-based 3D Structure Viewer 2AJF. (2020). Retrieved 6 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?&mmdbid=35213&bu=1&showanno=1
  8. iCn3D: Web-based 3D Structure Viewer 3SCK. (2020). Retrieved 1 October 2020, from https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?pdbid=%203SCK
  9. Uniprot. (2020). S - Spike glycoprotein precursor - Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) - S gene & protein. Retrieved 1 October 2020, from https://www.uniprot.org/uniprot/P0DTC2
  10. Andersen, K.G., Rambaut, A., Lipkin, W.I. et al. The proximal origin of SARS-CoV-2. Nat Med 26, 450–452 (2020). https://doi.org/10.1038/s41591-020-0820-9
