Anna Horvath Week 4

From OpenWetWare

Purpose

Understand how to find genetic sequences using GenBank and build phylogenetic trees for the purpose of inferring relationships. These skills will allow us to work on independent projects, incorporating our understanding of working with sequences.

Methods/Results

Part 1: GenBank

  • I went to GenBank in order to analyze the Bat SARS coronavirus Rf1 from the .
  • I found the full FASTA sequence and viewed the full record.
    • The accession number was DQ412042.1
    • The GenBank record provides the complete sequence of the Bat SARS coronavirus, the accession number, and the original authors who published the sequence.
  • I downloaded the nucleotide sequence in FASTA format to my local hard drive.
    • I did this by clicking Send to in the upper right of the page. Then, I chose Complete Record, File as the Destination, and FASTA as the format. I then clicked the Create File button.
  • Spike Protein [Bat SARS CoV Rf1/2004]
MKILIFAFLVTLVKAQEGCGVINLRTQPKLTQVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFHSNLTQYF
SLNIESDKIVYFDNPILKFGDGVYFAATEKSNVIRGWVFGSTFDNTTQSAIIVNNSTHIIIRVCYFNLCK
DPMYTVSAGTQKSSWVYQSAFNCTYDRVEKSFQLDTSPKTGNFTDLREFVFKNRDGFFTAYQTYTPVNLL
RGLPSGLSVLKPILKLPFGINITSFRVVMAMFSKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVDAV
DCSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRFPSVYAWERTKIS
DCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQVAPGQTGVIADYNYKLPDDF
TGCVIAWNTAKQDVGSYFYRSHRSSKLKPFERDLSSEENGVRTLSTYDFNQNVPLEYQATRVVVLSFELL
NAPATVCGPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQFGRDASDFTDSVRDPQTLRILDIS
PCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTIQADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVNA
SYQCDIPIGAGICASYHTASHLRSTGQKSIVAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVMPVSMA
KTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQMYKTPTIRDFGGFNF
SQILPDPLKPTKRSFIEDLLYNKVTLADAGFMKQYADCLGGINARDLICAQKFNGLTVLPPLLTDDMIAA
YTAALISGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTS
TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQL
IRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPAIC
HEGKAYFPREGVFVSNGSSWFITQRNFYSPQIITTDNTFVAGSCDVVIGIINNTVYDPLQPELDSFKQEL
DKYFKNHTSPDVDLGDISGINASVVDIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIA
GLVGLFMAIILLCYFTSCCSCCKGMCSCGSCCRFDEDDSEPVLKGVKLHYT
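
The same FASTA download could also be scripted rather than done through the Send to menu. The sketch below is not what I did for this assignment; it assumes the NCBI E-utilities `efetch` endpoint with the `nuccore` database, and the helper names (`efetch_fasta_url`, `fetch_fasta`, `parse_fasta`) are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for retrieving records (assumed here; not part of the wiki protocol).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build the efetch URL that returns one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore"):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are wrapped in FASTA files, so collect and join them.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

For the record above, `parse_fasta(fetch_fasta("DQ412042.1"))` would be expected to return a single entry holding the complete genome sequence. Passing `db="protein"` with a protein accession such as ABD75323.1 should work the same way for the spike protein.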

Part 2: Creating a phylogenetic tree with Phylogeny.fr

  • I went to the website www.phylogeny.fr. Then, I clicked 'Phylogeny analysis' and then clicked on the text 'One Click'.
  • Then, I clicked on 'Upload your set of sequences in FASTA, EMBL, or NEXUS format'. I copied the protein sequences from the Week 4 Talk Page.
  • I used Command-V to paste my sequences into the field and clicked 'Submit'.
    • In order to properly align the sequences, I first pasted them into a Word document.
  • I found the numbered tabs located just beneath the text One Click Mode, and clicked on the tab labeled 3. Alignment. Prior to this, I saw the pages named Alignment results, Phylogeny results, and Tree rendering results.
  • Positions are color-coded to indicate their conservation. Blue highlighting means high conservation (the sequences are identical or very similar), gray highlighting means lower conservation, and white highlighting means little conservation.
  • Under Outputs, I clicked on Alignment in Clustal Format.
    • This showed my sequences with the amount of conservation indicated below them. The amount of conservation corresponded to the color-coded highlights shown above.
    • Key:
      • “*” for invariant
      • “:” for highly conserved
      • “.” for weakly conserved
      • Space for not conserved
    • Below are the class' alignments
QDF43825.1      ---------MKLLVLV-----FATLVSSYTIEKCTDFD------DRTPPSNTQFLSSHRG
AGZ48818.1      ---------MKLLVLV-----FATLVSSYTIEKCLDFD------DRTPPANTQFLSSHRG
ALK02457.1      ----------MFIFLF-----FLTLTSGSDLESCTTFD------DVQAPNYPQHSSSRRG
AAS10463.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13441.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13567.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
QHD43416.1      ----------MFVFLV-----LLPLVSSQ----CVNLT------TRTQLPPAYTNSFTRG
AVP78031.1      -----------MLFFL-----FLQFALVN--SQCVNLT------GRTPLNPNYTNSSQRG
ABD75323.1      --------MKILIFAF-----LVTLVKAQ--EGCGVIN------LRTQPKLTQVSSSRRG
QDF43835.1      --------MKVLIVLL-----CLGLVTAQ--DGCGHIS------TKPQPLLDKFSSSRRG
ABD75332.1      --------MKVLIFAL-----LFSLAKAQ--EGCGIIS------RKPQPKMEKVSSSRRG
QDF43820.1      --------MKILIFAF-----LVTLVEAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AAZ67052.1      --------MKILILAF-----LASLAKAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AFS88936.1      ----MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKA-DG
YP_0010399      MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDG
                           ::.          .        *                     .   *
QDF43825.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
AGZ48818.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
ALK02457.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHR-------------FDN---PVIPF
AAS10463.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDD---PVIPF
AAP13441.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FGN---PVIPF
AAP13567.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDN---PVIPF
QHD43416.1      VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVS------GTNGTKRFDN---PVLPF
AVP78031.1      VYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLTTN-------NAATKRTDN---PILDF
ABD75323.1      VYYNDDIFRSDVLHLTQDYFLPFHSNLTQYFSLNIE-------SDKIVYFDN---PILKF
QDF43835.1      VYYNDDIFRSDVLHLTQDYFLPFDTNLTRYLSFNMD-------SATKVYFDN---PTLPF
ABD75332.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNID-------SNKYTYFDN---PILDF
QDF43820.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRYTYFDN---PILDF
AAZ67052.1      VYYNDDIFRSNVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRFTYFDN---PILDF
AFS88936.1      IIYPQGRTYSNITITYQGLF-PYQGDHGDMYVYSAG--HATGTTPQKLFVANYSQDVKQF
YP_0010399      IIYPLGRTYSNITLAYTGLF-PLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDF
                : *      *.      . * *   :                         :       *
QDF43825.1      RDGVYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
AGZ48818.1      KDGIYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
ALK02457.1      KDGVYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAS10463.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13441.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13567.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
QHD43416.1      NDGVYF----ASTEKSNIIRG-------------WIFGTTLDSKTQ---------SLLIV
AVP78031.1      KDGIYF----AATEHSNIIRG-------------WIFGTTLDNTSQ---------SLLIV
ABD75323.1      GDGVYF----AATEKSNVIRG-------------WVFGSTFDNTTQ---------SAIIV
QDF43835.1      GDGIYF----AATEKSNVVRG-------------WIFGSTMDNTTQ---------SAIIV
ABD75332.1      GDGVYF----AATEKSNVIRG-------------WIFGSSFDNTTQ---------SAIIV
QDF43820.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AAZ67052.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AFS88936.1      ANGFVVRIGAAANSTGTVIISPSTSATIRKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLL
YP_0010399      DNGFVVRIGAAANSTGTIVISPSVNTKIKKAYPAFILGSSLTNTSAGQ-PLYANYSLTII
                :*. .    *:.. ..:: .             :::*::. . :          :  ::
QDF43825.1      NNSTNLVIRACNFELCDNPFFVVLRSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
AGZ48818.1      NNSTNLVIRACNFELCDNPFFVVLKSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
ALK02457.1      NNSTNVVIRACNFELCDNPFFAVSKPTGTQTHTM------IFDNAFN-CTFEYISDSFSL
AAS10463.1      NNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTM------IFDNAFN-CTFEYISDAFSL
AAP13441.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
AAP13567.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
QHD43416.1      NNATNVVIKVCEFQFCNDPFLGVYY--HKNNKSWMESEFRVYSSANN-CTFEYVSQPFLM
AVP78031.1      NNATNVIIKVCNFDFCYDP-YLSGY--YHNNKTWSIREFAVYSSYAN-CTFEYVSKSFML
ABD75323.1      NNSTHIIIRVCYFNLCKDPMYTVSA--GTQKSSW------VYQSAFN-CTYDRVEKSFQL
QDF43835.1      NNSTHIIIRVCYFNLCKEPMYAISN--EQHYKSW------VYQNAYN-CTYDRVEQSFQL
ABD75332.1      NNSTHIIIRVCNFNLCKEPMYTVSK--GTQQSSW------VYQSAFN-CTYDRVEKSFQL
QDF43820.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GTQQSSW------VYQSAFN-CTYDRVERSFQL
AAZ67052.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GAQQSSW------VYQSAFN-CTYDRVEKSFQL
AFS88936.1      PDGCGTLLRAFYCIL--EPRSGNHCPAGNSYTSF-----ATYHTPATDCSDGNYNRNASL
YP_0010399      PDGCGTVLHAFYCIL--KPRTVNRCPSGTGYVSY-----FIYETVHNDCQ-STINRNASL
                :.   ::..    :  .*             :        : .  . *     .    :
QDF43825.1      DIGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
AGZ48818.1      DLGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
ALK02457.1      DVAEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NILKPIF
AAS10463.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13441.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13567.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
QHD43416.1      DLEGKQGNFKNLREFVFKNIDG--------YFKIYSKHTPINLVRDLPQGF--SALEPLV
AVP78031.1      NISGNGGLFNTLREFVFRNVDG--------HFKIYSKFTPVNLNRGLPTGL--SVLQPLV
ABD75323.1      DTSPKTGNFTDLREFVFKNRDG--------FFTAYQTYTPVNLLRGLPSGL--SVLKPIL
QDF43835.1      DTAPQTGNFKDLREYVFKNKDG--------FLSVYNAYSPIDIPRGLPVGF--SVLKPIL
ABD75332.1      DTAPKTGNFKDLREYVFKNKGG--------FLRVYQTYTAVNLPRGFPAGF--SVLRPIL
QDF43820.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AAZ67052.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AFS88936.1      NSFKE---YFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQFA
YP_0010399      NSFK---SFFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYGGNMFRFA
                :       :  * : .*    .         :   .    :    .   .:  . :  : 
QDF43825.1      KLPLGINITNFRTLLTAF------PPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AGZ48818.1      KLPLGINITNFRTLLTAF------PPRPDYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
ALK02457.1      KLPLGINITNFRAILTAF------LPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITD
AAS10463.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13441.1      KLPLGINITNFRAILTAF------SPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13567.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
QHD43416.1      DLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITD
AVP78031.1      ELPVSINITKFRTLLTIHRGD---PMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGTITD
ABD75323.1      KLPFGINITSFRVVMAMF------SKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVD
QDF43835.1      KLPIGINITSFKVVMSMF------SRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITD
ABD75332.1      KLPFGINITSYRVVMTMF------SQFNSNFLPESAAYYVGNLKYTTFMLSFNENGTITD
QDF43820.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITD
AAZ67052.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITN
AFS88936.1      TLPVYDTIKYYSIIPHSIRSI---QSDRKAW----AAFYVYKLQPLTFLLDFSVDGYIRR
YP_0010399      TLPVYEGIKYYTVIPRSFRSK---ANKREAW----AAFYVYKLHQLTYLLDFSVDGYIRR
                 **.   *. :  :                :    **::*  *:  *::* :. :* *  
QDF43825.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AGZ48818.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
ALK02457.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AAS10463.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13441.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13567.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
QHD43416.1      AVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRF
AVP78031.1      AVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRF
ABD75323.1      AVDCSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRF
QDF43835.1      AIDCAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRF
ABD75332.1      AVDCSQNPLAELKCTIKNFNVSKGIYQTSNFRVTPTQEVVRFPNITNRCPFDKVFNASRF
QDF43820.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRF
AAZ67052.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRF
AFS88936.1      AIDCGFNDLSQLHCSYESFDVESGVYSVSSFEAKPSGSVVEQAEGVE-CDFSPLLSGTP-
YP_0010399      AIDCGHDDLSQLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATE-CDFSPMLTGVA-
                *:**. : *:: :*:  .: :..*:*..*.: . .   .:  .: .: * *  ::..   
QDF43825.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AGZ48818.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
ALK02457.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAS10463.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13441.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13567.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
QHD43416.1      ASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEV
AVP78031.1      PSVYAWERTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEV
ABD75323.1      PSVYAWERTKISDCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEV
QDF43835.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
ABD75332.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
QDF43820.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AAZ67052.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AFS88936.1      PQVYNFKRLVFTNCNYNLTKLLSLFSVNDFTCSQISPAAIASNCYSSLILDYFSYPLSMK
YP_0010399      PQVYNFKRLVFSNCNYNLTKLLSLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMK
                ..** ::*  :::*  : : : .   .. *.*  :*.  :   *::.:  * *    .  
QDF43825.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER
AGZ48818.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
ALK02457.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
AAS10463.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13441.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13567.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
QHD43416.1      RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFER
AVP78031.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER
ABD75323.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YRSHRSSKLKPFER
QDF43835.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSSRKTKLKPFER
ABD75332.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YRSYRKEKLKPFER
QDF43820.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YRSHRKTKLKPFER
AAZ67052.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSHRKTKLKPFER
AFS88936.1      SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399      SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
                  :  ..:* *. :*** .     * ::     :            *       *     
QDF43825.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPYRVVVLS
AGZ48818.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
ALK02457.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
AAS10463.1      DISNVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPYRVVVLS
AAP13441.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
AAP13567.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
QHD43416.1      DISTEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPYRVVVLS
AVP78031.1      DLSSDE---------------NGVR-----------TLSTYDFNPNVPLEYQATRVVVLS
ABD75323.1      DLSSEE---------------NGVR-----------TLSTYDFNQNVPLEYQATRVVVLS
QDF43835.1      DLTSDE---------------NGVR-----------TLSTYDFYPNVPIEYQATRVVVLS
ABD75332.1      DLSSDE---------------NGVY-----------TLSTYDFYPSIPVEYQATRVVVLS
QDF43820.1      DLSSDDG--------------NGVY-----------TLSTYDFNPNVPVAYQATRVVVLS
AAZ67052.1      DLSSDE---------------NGVR-----------TLSTYDFYPSVPVAYQATRVVVLS
AFS88936.1      EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMG
YP_0010399      DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMS
                ::                   .               :.  .              : :.
QDF43825.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AGZ48818.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
ALK02457.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAS10463.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13441.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13567.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
QHD43416.1      FELL----HAPATVC-----GPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQF
AVP78031.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
ABD75323.1      FELL----NAPATVC-----GPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQF
QDF43835.1      FELL----NAPATVC-----GPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQF
ABD75332.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLRGTGVLTTSSKRFQSFQQF
QDF43820.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
AAZ67052.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQF
AFS88936.1      FGITVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRF
YP_0010399      FIISVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGVKQQRF
                * :       . :**     . . .     .:**::.: *. * **:  ..      *.*
QDF43825.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AGZ48818.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
ALK02457.1      GRDVLD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AAS10463.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLI
AAP13441.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
AAP13567.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
QHD43416.1      GRDIAD-TTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI
AVP78031.1      GKDASD-FIDSVRDPQTLEILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTI
ABD75323.1      GRDASD-FTDSVRDPQTLRILDISPCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTI
QDF43835.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
ABD75332.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTSI
QDF43820.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
AAZ67052.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAI
AFS88936.1      VYDAYQNLVGYYSDDGNYYCLR--ACVSVPVSVIY--DKETKTHATLFGSVACEHISSTM
YP_0010399      VYDSFDNLVGYYSDDGNYYCVR--PCVSVPVSVIY--DKSTNLHATLFGSVACEHVTTMM
                  *  :   .   *  .   :   .*    ****    : :   *.*: .* *  :.  :
QDF43825.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AGZ48818.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
ALK02457.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAS10463.1      --HAEQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAP13441.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
AAP13567.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
QHD43416.1      --HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY---ECDIPIGAGICASYQTQTN
AVP78031.1      --HADQLTPAWRIYATGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASI
ABD75323.1      --QADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVNASY---QCDIPIGAGICASYHTASH
QDF43835.1      --RADQLTPAWRVYSTGINVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
ABD75332.1      --HADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASV
QDF43820.1      --RADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AAZ67052.1      --HADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AFS88936.1      SQYSRSTRSMLKRRDSTYGPLQTPVGCVLGL--VNSSLFVEDCKLPLGQSLCALPDTPST
YP_0010399      S-QFSRLTQSNLRRRDSNIPLQTAVGCVIGLS--NNSLVVSDCKLPLGQSLCAV-PPVST
                                    :** .**::*    : *    :*.:*:* .:**   . : 
QDF43825.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AGZ48818.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
ALK02457.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAS10463.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13441.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13567.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
QHD43416.1      SPRRARSVA----SQSI--------IAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEIL
AVP78031.1      ----LRSTS----QKAI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75323.1      ----LRSTG----QKSI--------VAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVM
QDF43835.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75332.1      ----LRSTG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
QDF43820.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AAZ67052.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AFS88936.1      ----LTPRS----VRSVPGEMRLASIAFNHPIQVDQ-LNSSYFKLSIPTNFSFGVTQEYI
YP_0010399      ----FRSYSASQFQLAV--------LNYTSPIVV-TPINSSGFTAAIPTNFSFSVTQEYI
                      . .      ::        : :. .: .   :  :  . :*****::.:* * :
QDF43825.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AGZ48818.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
ALK02457.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AAS10463.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQ
AAP13441.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
AAP13567.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
QHD43416.1      PVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQ
AVP78031.1      PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
ABD75323.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQ
QDF43835.1      PVSMSKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
ABD75332.1      PVSIAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
QDF43820.1      PVSMAKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQ
AAZ67052.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
AFS88936.1      QTTIQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANLRQDDSVRNLFASVKS
YP_0010399      ETSIQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLRQDESVYSLYSNIKT
                 .:: *.:***. *:*..   * :** :**.** ::*.** *    ** .. .:: .:* 

QDF43825.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AGZ48818.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ALK02457.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAS10463.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13441.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13567.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QHD43416.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
AVP78031.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
ABD75323.1      MYKTPTIRD-FGG-FNFSQILPDPLKPTKRSF---IEDLLYNKVTLADAGFMKQYADCL-
QDF43835.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ABD75332.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QDF43820.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAZ67052.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AFS88936.1      SQSSPIIPG-FGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYDDCMQ
YP_0010399      T-STQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDDCMK
                  .:  :   :.* **:: :        . *    *****::***:**.*::: * :*: 
QDF43825.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AGZ48818.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ALK02457.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAS10463.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13441.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13567.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
QHD43416.1      -GDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM
AVP78031.1      -GGISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALISGTATAGWTFGAGAALQIPFAM
ABD75323.1      -GGINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALISGTATAGWTFGAGAALQIPFAM
QDF43835.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ABD75332.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
QDF43820.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAZ67052.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
AFS88936.1      QGPASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQ
YP_0010399      QGPQSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQ
                 *   ******** . * .*****   :* * **::*:..    *** * .:   **** 
QDF43825.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AGZ48818.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ALK02457.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAS10463.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13441.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13567.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QHD43416.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
AVP78031.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQESLTSTASALGKLQDVVNQNAQALN
ABD75323.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43835.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ABD75332.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43820.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAZ67052.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AFS88936.1      SIFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALS
YP_0010399      SMFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALS
                .: **:**:*:**:** **** ***:**.*:  :* .::::  *: *:**.** *****.
QDF43825.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AGZ48818.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ALK02457.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAS10463.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13441.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13567.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QHD43416.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AVP78031.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75323.1      TLVKQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43835.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75332.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43820.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAZ67052.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AFS88936.1      KLASELSNTFGAISASIGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAAL
YP_0010399      KLASELSNTFGAISSSISDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAAR
                .*..:**..*****: :.**: *** :* :.******.*** :*:::*:***:*:     
QDF43825.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AGZ48818.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
ALK02457.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAS10463.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13441.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13567.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QHD43416.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA
AVP78031.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTTAPA
ABD75323.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPA
QDF43835.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
ABD75332.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QDF43820.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
AAZ67052.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AFS88936.1      SAQLAKDKVNECVKAQSKRSGFCGQGTHIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYG
YP_0010399      SAQLASDKVNECVKSQSKRNGFCGSGTHIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYG
                **:**  *:.*** .**** .*** * *::**   **:*. *:** * *::  :..:* .
QDF43825.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AGZ48818.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
ALK02457.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AAS10463.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13441.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13567.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
QHD43416.1      ICHDGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPQIITTDNT-FVSGNCDVV
AVP78031.1      ICHEGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPKIITTDNT-FVSGNCDVV
ABD75323.1      ICHEGK---AYFPREGVFVSNGSS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
QDF43835.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
ABD75332.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
QDF43820.1      ICHEGK---AYFPREGVFVSNGTF-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
AAZ67052.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
AFS88936.1      LCDAANPTNCIAPVNGYFIKTNNT--RIVDEWSYTGSSFYAPEPITSLNTKYVA--PQVT
YP_0010399      LCNNNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGSSFYKPEPITQANSRYVS--SDVK
                :*   :   .  * :* *: . .        *  *  .*: *: **  *: :*:   :* 
QDF43825.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AGZ48818.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEINRL
ALK02457.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAS10463.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRL
AAP13441.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAP13567.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QHD43416.1      IGIVNNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AVP78031.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRL
ABD75323.1      IGIINNTVYDPL---QPELDSFKQELDKYFKNHTSPDVDLGDISGINASVVDIQKEIDRL
QDF43835.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
ABD75332.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QDF43820.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAZ67052.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AFS88936.1      YQNISTNLPPPLLGNSTGID-FQDELDEFFKNVSTSIPNFGSLTQINTTLLDLTYEMLSL
YP_0010399      FDKLENNLPPPLLENSTDVD-FKDELEEFFKNVTSHGPNFAEISKINTTLLDLSDEMAML
                   :...:  **   .. :* *::**:::*** ::   ::..:: **::::::  *:  *
QDF43825.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AGZ48818.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ALK02457.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAS10463.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13441.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13567.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QHD43416.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKG
AVP78031.1      NEVARNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ABD75323.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLVGLFMAIILLCYFTSCCSCCKG
QDF43835.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
ABD75332.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QDF43820.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
AAZ67052.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AFS88936.1      QQVVKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMG
YP_0010399      QEVVKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLG
                ::*.. **:* ***:***:*  * *****:********:.: :  :::   *.* :   *
QDF43825.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AGZ48818.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ALK02457.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAS10463.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13441.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13567.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QHD43416.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
AVP78031.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75323.1      MCSCGSCC-RFDEDDSEPVLKGVKLHYT
QDF43835.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75332.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QDF43820.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAZ67052.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AFS88936.1      KLKCNRCCDRYEEYDLEP----HKVHVH
YP_0010399      KMKCKNCCDSYEEYDVE------KIHVH
                  .*  **  ::* * *      *:*
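The line under each alignment block is the Clustal-style consensus line: "*" marks a column where every sequence has the same residue, ":" marks strongly similar residues, "." marks weakly similar residues, and a space marks no conservation. As a minimal sketch (not part of the Phylogeny.fr workflow), the symbols in the final block could be tallied as below. The function name tally_consensus and the width of 28 columns, taken from the length of the sequence rows in that block, are assumptions made for illustration.

```python
from collections import Counter

# Consensus line of the final alignment block, with the 16-character
# name column removed. Trailing spaces were stripped in the output,
# so the line is shorter than the 28-residue rows above it.
CONSENSUS = "  .*  **  ::* * *      *:*"


def tally_consensus(line, width=None):
    """Count Clustal consensus symbols in one consensus line.

    width (hypothetical helper argument) pads the line with spaces so
    that stripped trailing columns are counted as not conserved.
    """
    if width is not None:
        line = line.ljust(width)
    counts = Counter(line)
    return {
        "identical (*)": counts["*"],
        "strongly similar (:)": counts[":"],
        "weakly similar (.)": counts["."],
        "not conserved (space)": counts[" "],
    }


if __name__ == "__main__":
    for label, n in tally_consensus(CONSENSUS, width=28).items():
        print(f"{label}: {n}")
```

The same function could be applied to the consensus line of any other block, provided the name column is removed first and the block width is supplied.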
  • I went back to "6. Tree rendering" to view the phylogenetic tree of the sequences.
    • Horizontal lines represent individual evolutionary lines.
    • Vertical lines mark the branching points where lineages diverge. Their length has no biological meaning.
    • The left-most split is called the root of the tree, which represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within your tree.
    • The length of each horizontal branch represents the percentage change in the amino acid sequence occurring along that branch, measured against the scale bar.
      • The scale bar was 0.5 (50%).
  • I saved the image to a file and uploaded it to the wiki.

Horvath Phylogenetic Tree.png

  • Comparison of generated tree to multiple sequence alignment
    • The generated tree is consistent with the multiple sequence alignment of the class sequences. For example, the two outgroups, Human betacoronavirus 2c (AFS88936.1) and Tylonycteris bat coronavirus (YP_0010399), are very similar to one another in sequence. Because they are both outgroups and are depicted as sister taxa on the phylogenetic tree, this similarity is expected. Outside of the outgroups, the remaining sequences fall into two primary groups, and every sequence descends from one of these two larger nodes. For example, the spike proteins of bat SARS CoV Rm1 (ABD75332.1) and bat SARS CoV Rp3 (AAZ67052.1) are similar to one another across most of the alignment, and they are indeed sister taxa. However, these two sequences differ from the spike protein of SARS-like coronavirus (AGZ48818.1), which has a more complete sequence with fewer missing amino acid residues. These results are expected because the phylogenetic tree was built directly from these sequences, so it separates them according to their observable similarities and differences.
  • Alignment compared to Figure 3 of the Wan et al. (2020) paper.
    • The amino acid sequences highlighted in the paper lie between positions 435 and about 480. In our alignment, the consensus line contains many spaces, meaning that those positions were not conserved. The article aligns far fewer sequences, which likely contributes to the relatively greater conservation seen there. However, certain regions of our alignment contain many colons, which mark positions where only similar amino acids are substituted, so these regions are still well conserved across all of the sequences observed. The paper shows many stars, which mark invariant positions where the residue is unchanged in every sequence. These were not as prevalent in our class sequences. Overall, the class sequences were not as similar to the sequences in the Wan et al. paper as initially thought.
  • Phylogenetic tree compared to Figure 2 of the Wan et al. (2020) paper.
    • The tree published by Wan et al. differs from the phylogenetic tree created by the class in several ways. For example, their outgroup could not be found from the sequence identifier they provided. Instead, the outgroups of the class tree are Human betacoronavirus 2c (AFS88936.1) and Tylonycteris bat coronavirus (YP_0010399). Several sequences published in the paper were not included in the class tree (fewer than half of the paper's sequences were used), which makes an exact comparison difficult. For these reasons, the trees do not appear to be very similar. The paper's figure separates into two major branches: one whose sequences are very similar and diverge little, and another that diverges many times. The class tree also has two major branches, but neither diverges much after the split, as the sequences within each branch are similar.
  • Is enough information provided in the Wan et al. (2020) paper to reproduce their analysis?
    • The paper did not provide enough information to reproduce the analysis. As mentioned previously, the accession number given in the paper for the outgroup ("BtSCoV PDF2386") was not correct, so right from the beginning we could not recreate the phylogenetic tree presented in their results. It was also very difficult to find the exact amino acid sequences that they highlighted, and connecting the full-length spike protein to its receptor-binding domain (RBD) was very hard. Overall, the analysis in this paper is not easily reproducible.
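The comparisons described above (for example, that the two outgroup sequences resemble each other more than they resemble the SARS-like sequences) could also be checked numerically. The sketch below computes pairwise percent identity for four rows copied from the final 28-column block of the alignment. It is only an illustration: the function names are hypothetical, the choice to skip any column containing a gap is an assumption (other definitions of identity exist), and a real comparison would use the full-length alignment rather than one block.

```python
# Rows copied from the final 28-column block of the alignment.
ROWS = {
    "QDF43825.1": "ACSCGSCC-KFDEDDSEPVLKGVKLHYT",
    "QHD43416.1": "CCSCGSCC-KFDEDDSEPVLKGVKLHYT",
    "AFS88936.1": "KLKCNRCCDRYEEYDLEP----HKVHVH",
    "YP_0010399": "KMKCKNCCDSYEEYDVE------KIHVH",
}


def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored entirely
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def identity_table(rows):
    """Percent identity for every unordered pair of rows."""
    names = list(rows)
    return {
        (p, q): percent_identity(rows[p], rows[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    for (p, q), pct in identity_table(ROWS).items():
        print(f"{p} vs {q}: {pct:.1f}% identical")
```

Within this short block, the two outgroup rows are more similar to each other than either is to QDF43825.1, which agrees with their placement as sister taxa on the tree.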

Conclusion

This week's lab was very helpful for learning how to obtain sequences from GenBank, as well as how to build a phylogenetic tree from them. It also helped clarify the methods used in the previously read article. Through this exercise, we were able to build skills for our future independent projects.

Acknowledgements

  • I consulted with my partner Aiden Burnett in class and over text to discuss how to properly find the sequences and create the phylogenetic tree.
  • I contacted my TA, Annika Dinulos, to ask about downloading the image from the phylogeny.fr website.
  • I copied and modified procedures from the Week 4 assignment page.
  • I used the Wan et al. (2020) paper, "Receptor Recognition by the Novel Coronavirus from Wuhan," as a reference for Figures 2 and 3.
  • I obtained sequences from GenBank.
  • I built the phylogenetic tree and created sequence alignments using Phylogeny.fr.
  • I gathered sequences from the Week 4 Talk page.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Anna Horvath (talk) 18:03, 30 September 2020 (PDT)

References

  1. NCBI GenBank. (2020). Bat SARS-like coronavirus Rs3367, complete genome - Nucleotide. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/nuccore/556015127/
  2. NCBI GenBank. (2020). Spike protein [Bat SARS CoV Rf1/2004] - Protein. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/protein/ABD75323.1?report=fasta
  3. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4#Data_.26_Tools
  4. OpenWetWare. (2020). Talk:BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/Talk:BIOL368/F20:Week_4
  5. Phylogeny.fr: "One Click" Mode. (2020). Retrieved 30 September 2020, from http://www.phylogeny.fr/simple_phylogeny.cgi?workflow_id=b9c0813cbbe9695d63cf7e31da5f026d&tab_index=1
  6. Wan, Y., Shang, J., Graham, R., Baric, R., & Li, F. (2020). Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. Journal of Virology, 94(7). doi: 10.1128/JVI.00127-20
