JT Correy Journal Week 4

From OpenWetWare

Journal Week 4

Purpose

The purpose of this assignment is to analyze the phylogenetic relationships among different SARS- and COVID-type viruses based on their spike proteins.

Tasks

Part 1: GenBank

In this section you will take a closer look at a GenBank record and the type of data that is stored there. Once you reach the sequence data associated with the Wan et al. (2020) paper you will see that there are a variety of different ways to view the data.

Choose one of the GenBank records from the Data & Resources section above and view both the full record and the FASTA formatted sequence.

  • What was the accession number of the sequence you chose?
  • What information is provided in the GenBank record?
  • Download the nucleotide sequence in FASTA format to your local hard drive.
    • Click the Send to link in the upper right of the page. Select Complete Record, File as the Destination, and FASTA as the format. Click the Create File button. Be careful to remember where you put the file and what you name it so that you can find it later.
  • Open the file that you saved with a word processor to confirm that you have the sequence and that it is in the FASTA format. In the FASTA format, each sequence is preceded by a label that begins with the greater-than sign (>). For example, the first 10 lines of the SARS-CoV-2 sequence are:
>MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC
TAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTG
TTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTC
CCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTAC
GTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGAT
GCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTC
GTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCT
TCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTA
...continued
  • While we could create a phylogenetic tree with the entire genome sequence of the viruses, in this analysis we are mainly interested in the spike protein. Links have been provided to the individual spike protein sequences corresponding to each of the viral genome records listed in the Data & Tools section. We are going to "crowdsource" gathering the sequence data from 12 other viral strains that are listed in Figure 2 of Wan et al. (2020).
    • Each student will be assigned a nucleotide sequence accession number from Figure 2 in class.
      • The accession number I was assigned was AY278554. It corresponds to the SARS coronavirus CUHK-W1 complete genome.
    • Search for the GenBank record associated with that sequence. Add a hyperlink to the GenBank record to the list of sequences in the Data & Tools section.
    • Locate the spike protein accession number in the GenBank record. (Note that the spike protein is sometimes called the "S" protein.)
      • The spike protein accession number is: AAP13567.1
    • Add a hyperlink to the spike protein record to the list of sequences in the Data & Tools section. Be sure to format the list in the same way as it is already formatted.
    • Download your assigned protein sequence in FASTA format, just like you did for the whole genome sequence.
    • Add the protein sequence to your individual journal page.
      • Sequence can be found below.
      • Note that if you begin a line with a space character, it will be rendered in a fixed-width font and the sequences will line up nicely on the page.
    • Also add the protein sequence to the talk page for this assignment. We will be creating a list of sequences for everyone in the class to use.
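
The FASTA check described above (a label line beginning with ">" followed by sequence lines) can also be done programmatically. The following is a minimal Python sketch, not part of the assignment: the function name `read_fasta` is hypothetical, and the example text is just the label and first two sequence lines of the MN908947.3 record shown earlier.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new label line closes the previous record, if there was one.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        elif label is None:
            raise ValueError("sequence data found before any '>' label line")
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


# Label plus the first two sequence lines of the record shown above.
example = """>MN908947.3 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAA
CGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAAC
"""

records = read_fasta(example)
label, sequence = records[0]
accession = label.split()[0]  # first word of the label is the accession number
print(accession, len(sequence))
```

If the saved file is not valid FASTA (for example, it starts with sequence data instead of a ">" line), this sketch raises a `ValueError` rather than silently returning a partial record.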

Sequences

CLUSTAL FORMAT: MUSCLE (3.8) multiple sequence alignment


QDF43825.1      ---------MKLLVLV-----FATLVSSYTIEKCTDFD------DRTPPSNTQFLSSHRG
AGZ48818.1      ---------MKLLVLV-----FATLVSSYTIEKCLDFD------DRTPPANTQFLSSHRG
ALK02457.1      ----------MFIFLF-----FLTLTSGSDLESCTTFD------DVQAPNYPQHSSSRRG
AAS10463.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13441.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13567.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
QHD43416.1      ----------MFVFLV-----LLPLVSSQ----CVNLT------TRTQLPPAYTNSFTRG
AVP78031.1      -----------MLFFL-----FLQFALVN--SQCVNLT------GRTPLNPNYTNSSQRG
ABD75323.1      --------MKILIFAF-----LVTLVKAQ--EGCGVIN------LRTQPKLTQVSSSRRG
QDF43835.1      --------MKVLIVLL-----CLGLVTAQ--DGCGHIS------TKPQPLLDKFSSSRRG
ABD75332.1      --------MKVLIFAL-----LFSLAKAQ--EGCGIIS------RKPQPKMEKVSSSRRG
QDF43820.1      --------MKILIFAF-----LVTLVEAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AAZ67052.1      --------MKILILAF-----LASLAKAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AFS88936.1      ----MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKA-DG
YP_0010399      MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDG
                           ::.          .        *                     .   *
QDF43825.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
AGZ48818.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
ALK02457.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHR-------------FDN---PVIPF
AAS10463.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDD---PVIPF
AAP13441.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FGN---PVIPF
AAP13567.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDN---PVIPF
QHD43416.1      VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVS------GTNGTKRFDN---PVLPF
AVP78031.1      VYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLTTN-------NAATKRTDN---PILDF
ABD75323.1      VYYNDDIFRSDVLHLTQDYFLPFHSNLTQYFSLNIE-------SDKIVYFDN---PILKF
QDF43835.1      VYYNDDIFRSDVLHLTQDYFLPFDTNLTRYLSFNMD-------SATKVYFDN---PTLPF
ABD75332.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNID-------SNKYTYFDN---PILDF
QDF43820.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRYTYFDN---PILDF
AAZ67052.1      VYYNDDIFRSNVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRFTYFDN---PILDF
AFS88936.1      IIYPQGRTYSNITITYQGLF-PYQGDHGDMYVYSAG--HATGTTPQKLFVANYSQDVKQF
YP_0010399      IIYPLGRTYSNITLAYTGLF-PLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDF
                : *      *.      . * *   :                         :       *
QDF43825.1      RDGVYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
AGZ48818.1      KDGIYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
ALK02457.1      KDGVYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAS10463.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13441.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13567.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
QHD43416.1      NDGVYF----ASTEKSNIIRG-------------WIFGTTLDSKTQ---------SLLIV
AVP78031.1      KDGIYF----AATEHSNIIRG-------------WIFGTTLDNTSQ---------SLLIV
ABD75323.1      GDGVYF----AATEKSNVIRG-------------WVFGSTFDNTTQ---------SAIIV
QDF43835.1      GDGIYF----AATEKSNVVRG-------------WIFGSTMDNTTQ---------SAIIV
ABD75332.1      GDGVYF----AATEKSNVIRG-------------WIFGSSFDNTTQ---------SAIIV
QDF43820.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AAZ67052.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AFS88936.1      ANGFVVRIGAAANSTGTVIISPSTSATIRKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLL
YP_0010399      DNGFVVRIGAAANSTGTIVISPSVNTKIKKAYPAFILGSSLTNTSAGQ-PLYANYSLTII
                 :*. .    *:.. ..:: .             :::*::. . :          :  ::
QDF43825.1      NNSTNLVIRACNFELCDNPFFVVLRSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
AGZ48818.1      NNSTNLVIRACNFELCDNPFFVVLKSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
ALK02457.1      NNSTNVVIRACNFELCDNPFFAVSKPTGTQTHTM------IFDNAFN-CTFEYISDSFSL
AAS10463.1      NNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTM------IFDNAFN-CTFEYISDAFSL
AAP13441.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
AAP13567.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
QHD43416.1      NNATNVVIKVCEFQFCNDPFLGVYY--HKNNKSWMESEFRVYSSANN-CTFEYVSQPFLM
AVP78031.1      NNATNVIIKVCNFDFCYDP-YLSGY--YHNNKTWSIREFAVYSSYAN-CTFEYVSKSFML
ABD75323.1      NNSTHIIIRVCYFNLCKDPMYTVSA--GTQKSSW------VYQSAFN-CTYDRVEKSFQL
QDF43835.1      NNSTHIIIRVCYFNLCKEPMYAISN--EQHYKSW------VYQNAYN-CTYDRVEQSFQL
ABD75332.1      NNSTHIIIRVCNFNLCKEPMYTVSK--GTQQSSW------VYQSAFN-CTYDRVEKSFQL
QDF43820.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GTQQSSW------VYQSAFN-CTYDRVERSFQL
AAZ67052.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GAQQSSW------VYQSAFN-CTYDRVEKSFQL
AFS88936.1      PDGCGTLLRAFYCIL--EPRSGNHCPAGNSYTSF-----ATYHTPATDCSDGNYNRNASL
YP_0010399      PDGCGTVLHAFYCIL--KPRTVNRCPSGTGYVSY-----FIYETVHNDCQ-STINRNASL
                 :.   ::..    :  .*             :        : .  . *     .    :
QDF43825.1      DIGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
AGZ48818.1      DLGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
ALK02457.1      DVAEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NILKPIF
AAS10463.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13441.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13567.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
QHD43416.1      DLEGKQGNFKNLREFVFKNIDG--------YFKIYSKHTPINLVRDLPQGF--SALEPLV
AVP78031.1      NISGNGGLFNTLREFVFRNVDG--------HFKIYSKFTPVNLNRGLPTGL--SVLQPLV
ABD75323.1      DTSPKTGNFTDLREFVFKNRDG--------FFTAYQTYTPVNLLRGLPSGL--SVLKPIL
QDF43835.1      DTAPQTGNFKDLREYVFKNKDG--------FLSVYNAYSPIDIPRGLPVGF--SVLKPIL
ABD75332.1      DTAPKTGNFKDLREYVFKNKGG--------FLRVYQTYTAVNLPRGFPAGF--SVLRPIL
QDF43820.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AAZ67052.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AFS88936.1      NSFKE---YFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQFA
YP_0010399      NSFK---SFFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYGGNMFRFA
                :       :  * : .*    .         :   .    :    .   .:  . :  : 
QDF43825.1      KLPLGINITNFRTLLTAF------PPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AGZ48818.1      KLPLGINITNFRTLLTAF------PPRPDYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
ALK02457.1      KLPLGINITNFRAILTAF------LPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITD
AAS10463.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13441.1      KLPLGINITNFRAILTAF------SPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13567.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
QHD43416.1      DLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITD
AVP78031.1      ELPVSINITKFRTLLTIHRGD---PMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGTITD
ABD75323.1      KLPFGINITSFRVVMAMF------SKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVD
QDF43835.1      KLPIGINITSFKVVMSMF------SRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITD
ABD75332.1      KLPFGINITSYRVVMTMF------SQFNSNFLPESAAYYVGNLKYTTFMLSFNENGTITD
QDF43820.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITD
AAZ67052.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITN
AFS88936.1      TLPVYDTIKYYSIIPHSIRSI---QSDRKAW----AAFYVYKLQPLTFLLDFSVDGYIRR
YP_0010399      TLPVYEGIKYYTVIPRSFRSK---ANKREAW----AAFYVYKLHQLTYLLDFSVDGYIRR
                 **.   *. :  :                :    **::*  *:  *::* :. :* *  
QDF43825.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AGZ48818.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
ALK02457.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AAS10463.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13441.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13567.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
QHD43416.1      AVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRF
AVP78031.1      AVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRF
ABD75323.1      AVDCSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRF
QDF43835.1      AIDCAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRF
ABD75332.1      AVDCSQNPLAELKCTIKNFNVSKGIYQTSNFRVTPTQEVVRFPNITNRCPFDKVFNASRF
QDF43820.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRF
AAZ67052.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRF
AFS88936.1      AIDCGFNDLSQLHCSYESFDVESGVYSVSSFEAKPSGSVVEQAEGVE-CDFSPLLSGTP-
YP_0010399      AIDCGHDDLSQLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATE-CDFSPMLTGVA-
                *:**. : *:: :*:  .: :..*:*..*.: . .   .:  .: .: * *  ::..   
QDF43825.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AGZ48818.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
ALK02457.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAS10463.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13441.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13567.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
QHD43416.1      ASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEV
AVP78031.1      PSVYAWERTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEV
ABD75323.1      PSVYAWERTKISDCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEV
QDF43835.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
ABD75332.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
QDF43820.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AAZ67052.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AFS88936.1      PQVYNFKRLVFTNCNYNLTKLLSLFSVNDFTCSQISPAAIASNCYSSLILDYFSYPLSMK
YP_0010399      PQVYNFKRLVFSNCNYNLTKLLSLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMK
                ..** ::*  :::*  : : : .   .. *.*  :*.  :   *::.:  * *    .  
QDF43825.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER
AGZ48818.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
ALK02457.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
AAS10463.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13441.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13567.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
QHD43416.1      RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFER
AVP78031.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER
ABD75323.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YRSHRSSKLKPFER
QDF43835.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSSRKTKLKPFER
ABD75332.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YRSYRKEKLKPFER
QDF43820.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YRSHRKTKLKPFER
AAZ67052.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSHRKTKLKPFER
AFS88936.1      SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399      SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
                  :  ..:* *. :*** .     * ::     :            *       *     
QDF43825.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPYRVVVLS
AGZ48818.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
ALK02457.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
AAS10463.1      DISNVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPYRVVVLS
AAP13441.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
AAP13567.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
QHD43416.1      DISTEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPYRVVVLS
AVP78031.1      DLSSDE---------------NGVR-----------TLSTYDFNPNVPLEYQATRVVVLS
ABD75323.1      DLSSEE---------------NGVR-----------TLSTYDFNQNVPLEYQATRVVVLS
QDF43835.1      DLTSDE---------------NGVR-----------TLSTYDFYPNVPIEYQATRVVVLS
ABD75332.1      DLSSDE---------------NGVY-----------TLSTYDFYPSIPVEYQATRVVVLS
QDF43820.1      DLSSDDG--------------NGVY-----------TLSTYDFNPNVPVAYQATRVVVLS
AAZ67052.1      DLSSDE---------------NGVR-----------TLSTYDFYPSVPVAYQATRVVVLS
AFS88936.1      EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMG
YP_0010399      DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMS
                ::                   .               :.  .              : :.
QDF43825.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AGZ48818.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
ALK02457.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAS10463.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13441.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13567.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
QHD43416.1      FELL----HAPATVC-----GPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQF
AVP78031.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
ABD75323.1      FELL----NAPATVC-----GPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQF
QDF43835.1      FELL----NAPATVC-----GPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQF
ABD75332.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLRGTGVLTTSSKRFQSFQQF
QDF43820.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
AAZ67052.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQF
AFS88936.1      FGITVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRF
YP_0010399      FIISVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGVKQQRF
                * :       . :**     . . .     .:**::.: *. * **:  ..      *.*
QDF43825.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AGZ48818.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
ALK02457.1      GRDVLD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AAS10463.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLI
AAP13441.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
AAP13567.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
QHD43416.1      GRDIAD-TTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI
AVP78031.1      GKDASD-FIDSVRDPQTLEILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTI
ABD75323.1      GRDASD-FTDSVRDPQTLRILDISPCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTI
QDF43835.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
ABD75332.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTSI
QDF43820.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
AAZ67052.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAI
AFS88936.1      VYDAYQNLVGYYSDDGNYYCLR--ACVSVPVSVIY--DKETKTHATLFGSVACEHISSTM
YP_0010399      VYDSFDNLVGYYSDDGNYYCVR--PCVSVPVSVIY--DKSTNLHATLFGSVACEHVTTMM
                  *  :   .   *  .   :   .*    ****    : :   *.*: .* *  :.  :
QDF43825.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AGZ48818.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
ALK02457.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAS10463.1      --HAEQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAP13441.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
AAP13567.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
QHD43416.1      --HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY---ECDIPIGAGICASYQTQTN
AVP78031.1      --HADQLTPAWRIYATGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASI
ABD75323.1      --QADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVNASY---QCDIPIGAGICASYHTASH
QDF43835.1      --RADQLTPAWRVYSTGINVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
ABD75332.1      --HADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASV
QDF43820.1      --RADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AAZ67052.1      --HADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AFS88936.1      SQYSRSTRSMLKRRDSTYGPLQTPVGCVLGL--VNSSLFVEDCKLPLGQSLCALPDTPST
YP_0010399      S-QFSRLTQSNLRRRDSNIPLQTAVGCVIGLS--NNSLVVSDCKLPLGQSLCAV-PPVST
                                    :** .**::*    : *    :*.:*:* .:**   . : 
QDF43825.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AGZ48818.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
ALK02457.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAS10463.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13441.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13567.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
QHD43416.1      SPRRARSVA----SQSI--------IAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEIL
AVP78031.1      ----LRSTS----QKAI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75323.1      ----LRSTG----QKSI--------VAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVM
QDF43835.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75332.1      ----LRSTG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
QDF43820.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AAZ67052.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AFS88936.1      ----LTPRS----VRSVPGEMRLASIAFNHPIQVDQ-LNSSYFKLSIPTNFSFGVTQEYI
YP_0010399      ----FRSYSASQFQLAV--------LNYTSPIVV-TPINSSGFTAAIPTNFSFSVTQEYI
                      . .      ::        : :. .: .   :  :  . :*****::.:* * :
QDF43825.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AGZ48818.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
ALK02457.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AAS10463.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQ
AAP13441.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
AAP13567.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
QHD43416.1      PVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQ
AVP78031.1      PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
ABD75323.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQ
QDF43835.1      PVSMSKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
ABD75332.1      PVSIAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
QDF43820.1      PVSMAKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQ
AAZ67052.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
AFS88936.1      QTTIQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANLRQDDSVRNLFASVKS
YP_0010399      ETSIQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLRQDESVYSLYSNIKT
                 .:: *.:***. *:*..   * :** :**.** ::*.** *    ** .. .:: .:* 
QDF43825.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AGZ48818.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ALK02457.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAS10463.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13441.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13567.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QHD43416.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
AVP78031.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
ABD75323.1      MYKTPTIRD-FGG-FNFSQILPDPLKPTKRSF---IEDLLYNKVTLADAGFMKQYADCL-
QDF43835.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ABD75332.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QDF43820.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAZ67052.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AFS88936.1      SQSSPIIPG-FGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYDDCMQ
YP_0010399      T-STQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDDCMK
                  .:  :   :.* **:: :        . *    *****::***:**.*::: * :*: 
QDF43825.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AGZ48818.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ALK02457.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAS10463.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13441.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13567.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
QHD43416.1      -GDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM
AVP78031.1      -GGISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALISGTATAGWTFGAGAALQIPFAM
ABD75323.1      -GGINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALISGTATAGWTFGAGAALQIPFAM
QDF43835.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ABD75332.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
QDF43820.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAZ67052.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
AFS88936.1      QGPASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQ
YP_0010399      QGPQSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQ
                 *   ******** . * .*****   :* * **::*:..    *** * .:   **** 
QDF43825.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AGZ48818.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ALK02457.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAS10463.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13441.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13567.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QHD43416.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
AVP78031.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQESLTSTASALGKLQDVVNQNAQALN
ABD75323.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43835.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ABD75332.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43820.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAZ67052.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AFS88936.1      SIFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALS
YP_0010399      SMFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALS
                .: **:**:*:**:** **** ***:**.*:  :* .::::  *: *:**.** *****.
QDF43825.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AGZ48818.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ALK02457.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAS10463.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13441.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13567.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QHD43416.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AVP78031.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75323.1      TLVKQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43835.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75332.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43820.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAZ67052.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AFS88936.1      KLASELSNTFGAISASIGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAAL
YP_0010399      KLASELSNTFGAISSSISDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAAR
                .*..:**..*****: :.**: *** :* :.******.*** :*:::*:***:*:     
QDF43825.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AGZ48818.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
ALK02457.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAS10463.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13441.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13567.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QHD43416.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA
AVP78031.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTTAPA
ABD75323.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPA
QDF43835.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
ABD75332.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QDF43820.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
AAZ67052.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AFS88936.1      SAQLAKDKVNECVKAQSKRSGFCGQGTHIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYG
YP_0010399      SAQLASDKVNECVKSQSKRNGFCGSGTHIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYG
                **:**  *:.*** .**** .*** * *::**   **:*. *:** * *::  :..:* .
QDF43825.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AGZ48818.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
ALK02457.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AAS10463.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13441.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13567.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
QHD43416.1      ICHDGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPQIITTDNT-FVSGNCDVV
AVP78031.1      ICHEGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPKIITTDNT-FVSGNCDVV
ABD75323.1      ICHEGK---AYFPREGVFVSNGSS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
QDF43835.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
ABD75332.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
QDF43820.1      ICHEGK---AYFPREGVFVSNGTF-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
AAZ67052.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
AFS88936.1      LCDAANPTNCIAPVNGYFIKTNNT--RIVDEWSYTGSSFYAPEPITSLNTKYVA--PQVT
YP_0010399      LCNNNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGSSFYKPEPITQANSRYVS--SDVK
                :*   :   .  * :* *: . .        *  *  .*: *: **  *: :*:   :* 
QDF43825.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AGZ48818.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEINRL
ALK02457.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAS10463.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRL
AAP13441.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAP13567.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QHD43416.1      IGIVNNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AVP78031.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRL
ABD75323.1      IGIINNTVYDPL---QPELDSFKQELDKYFKNHTSPDVDLGDISGINASVVDIQKEIDRL
QDF43835.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
ABD75332.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QDF43820.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAZ67052.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AFS88936.1      YQNISTNLPPPLLGNSTGID-FQDELDEFFKNVSTSIPNFGSLTQINTTLLDLTYEMLSL
YP_0010399      FDKLENNLPPPLLENSTDVD-FKDELEEFFKNVTSHGPNFAEISKINTTLLDLSDEMAML
                   :...:  **   .. :* *::**:::*** ::   ::..:: **::::::  *:  *
QDF43825.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AGZ48818.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ALK02457.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAS10463.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13441.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13567.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QHD43416.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKG
AVP78031.1      NEVARNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ABD75323.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLVGLFMAIILLCYFTSCCSCCKG
QDF43835.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
ABD75332.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QDF43820.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
AAZ67052.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AFS88936.1      QQVVKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMG
YP_0010399      QEVVKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLG
                ::*.. **:* ***:***:*  * *****:********:.: :  :::   *.* :   *
QDF43825.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AGZ48818.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ALK02457.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAS10463.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13441.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13567.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QHD43416.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
AVP78031.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75323.1      MCSCGSCC-RFDEDDSEPVLKGVKLHYT
QDF43835.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75332.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QDF43820.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAZ67052.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AFS88936.1      KLKCNRCCDRYEEYDLEP----HKVHVH
YP_0010399      KMKCKNCCDSYEEYDVE------KIHVH
                  .*  **  ::* * *      *:*  
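
The alignment above can also be checked by hand with a few lines of code. The following is a minimal sketch, not part of the Phylogeny.fr workflow: it takes three rows copied from the final alignment block, marks the columns where all three rows carry the same residue, and computes a simple pairwise percent identity. The function names invariant_mask and percent_identity are hypothetical helpers, and skipping any column that has a gap in either row is an assumption made for this illustration.

```python
# Three rows copied from the final alignment block (accession -> aligned residues).
block = {
    "QHD43416.1": "CCSCGSCC-KFDEDDSEPVLKGVKLHYT",
    "AAP13441.1": "ACSCGSCC-KFDEDDSEPVLKGVKLHYT",
    "AFS88936.1": "KLKCNRCCDRYEEYDLEP----HKVHVH",
}


def invariant_mask(rows):
    """Return a string with '*' under every column where all rows share one residue."""
    mask = []
    for column in zip(*rows):
        same = len(set(column)) == 1 and column[0] != "-"
        mask.append("*" if same else " ")
    return "".join(mask)


def percent_identity(a, b):
    """Percent of identical columns, skipping columns with a gap in either row (assumption)."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    for name, seq in block.items():
        print(f"{name:<16}{seq}")
    print(" " * 16 + invariant_mask(block.values()))
    pid_close = percent_identity(block["QHD43416.1"], block["AAP13441.1"])
    pid_far = percent_identity(block["QHD43416.1"], block["AFS88936.1"])
    print(f"QHD43416.1 vs AAP13441.1: {pid_close:.1f}% identical")
    print(f"QHD43416.1 vs AFS88936.1: {pid_far:.1f}% identical")
```

Because only three rows are used here, the mask will not match the full 15-sequence conservation line exactly, and it marks only invariant columns rather than the ":" and "." similarity classes that Clustal reports.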

Part 2: Creating a phylogenetic tree with Phylogeny.fr

To analyze the sequence data we will use Phylogeny.fr, a free, simple-to-use web service dedicated to reconstructing and analyzing phylogenetic relationships between molecular sequences.

  1. In your browser, go to the website www.phylogeny.fr. Scroll down on the page to the section labeled ‘Phylogeny analysis’, and click on the text ‘One Click’.
  2. Click in the large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’. Copy the list of sequences from the talk page and use Ctrl-V (or command-V) to paste your sequences here, then click the “Submit” button.
  3. You will see a page named Alignment results. After your alignment is complete, you will see a new page named Phylogeny results. Finally, you will see a page named Tree rendering results. You will come back to these pages later. For now, find the numbered tabs located just beneath the text One Click Mode, and click on the tab labeled 3. Alignment.
    • Within the alignment, individual positions are color-coded to indicate their conservation, or how similar the sequences are to each other at that position. Blue highlighting indicates high conservation (i.e., the sequences are identical or at least very similar), while gray highlighting indicates lower conservation and white highlighting indicates little if any conservation.
  4. Near the bottom of the page, under Outputs, click on Alignment in Clustal format. This will display your alignment in a text-only format in which each position's conservation is indicated by a symbol underneath the alignment block (“*” for invariant, “:” for highly conserved, “.” for weakly conserved, and a space for not conserved). Copy and paste this entire alignment into your individual journal entry. Use the space character at the beginning of each line so that the sequence lines up properly on your page.
  5. Now go back and click on the tab 6. Tree Rendering, and you will see a phylogenetic tree of your sequences.
    • On this tree, horizontal lines (branches) represent individual evolutionary lineages. By contrast, vertical lines (splits) represent mutation events, and the vertical length of each split is drawn purely for visual clarity with no biological meaning. The left-most split is called the root of the tree, and represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within your tree.
      • In Figure 2 of Wan et al. (2020), an outgroup called BtSCoV PDF2386 is used. However, I was unable to find this sequence in GenBank for us to use. Instead, the sequences from Figure 3C, MERS-CoV and HKU4, are provided, which essentially create two outgroups.
    • The length of each branch represents the percentage change in amino acid sequence occurring along that branch, relative to the scale bar shown at the bottom of the tree. The scale bar will be a number between 0 and 1 and can be reinterpreted as a percent. For example, 0.05 would be 5%. The tree may also contain support values for each clade, shown in red on the branches and also expressed as a number between 0 and 1 (for example, 0.95 would be 95%). In general, a higher support value indicates a higher statistical confidence in a particular clade.
    • Save the image to a file, upload it to the wiki, and display it on your individual journal page.
      • This is the phylogenetic tree that I developed:

CorreyPhyloTree.JPG
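
The scale-bar and support-value convention described in step 5 comes down to simple arithmetic, sketched below. This is only an illustration under stated assumptions: the helper names as_percent and expected_changes are hypothetical, the 1200-column alignment length and the 0.92 support value are made-up examples, and treating a branch length as a plain fraction of changed positions is the simplification used in the assignment text rather than a description of how the tree-building program computes it.

```python
def as_percent(value):
    """Convert a 0-1 value (branch length or clade support) to a percentage."""
    return value * 100


def expected_changes(branch_length, n_sites):
    """Approximate number of changed positions along a branch.

    branch_length is read off the tree using the scale bar;
    n_sites is the number of alignment columns (hypothetical here).
    """
    return branch_length * n_sites


if __name__ == "__main__":
    scale_bar = 0.05         # example value from the text: 0.05 corresponds to 5%
    support = 0.92           # hypothetical clade support value
    alignment_length = 1200  # hypothetical number of alignment columns

    print(f"Scale bar {scale_bar} = {as_percent(scale_bar):.0f}% change")
    print(f"Support {support} = {as_percent(support):.0f}% confidence")
    changes = expected_changes(scale_bar, alignment_length)
    print(f"About {changes:.0f} changed positions over {alignment_length} columns")
```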

  1. Compare the tree to the multiple sequence alignment. See if you can relate the differences in the sequences to the topology of the tree diagram. Describe the relationship in your individual journal page.
  2. Relate your alignment to Figure 3 of the Wan et al. (2020) paper.
    • For reference, the article can be found here: Wan, et al. (2020). Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus.
    • What are the similarities and differences between your alignment and the one shown in Figure 3?
      • The spike sequence that I analyzed contains the region NTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPY, which mirrors the highlighted region of the figure. This protein sequence is very similar to both human-SARS-2002 and civet-SARS-2002, and much less similar to 2019-nCoV. There are sections of the sequence that are conserved across all of the spike proteins, but the putative E2 glycoprotein precursor [SARS coronavirus CUHK-W1] that I analyzed had significant differences from the 2019-nCoV that is causing the pandemic we see today.
  3. Compare your tree to Figure 2 of the Wan et al. (2020) paper.
    • What are the similarities and differences between your tree and the one shown in Figure 2?
      • The journal article analyzed more proteins, so its figure had more branches. One major similarity is that both trees clearly show that the viruses that infected a single species were all very similar. Both trees have a branch with viruses from bats and a large group with viruses that infected humans. One major difference is the lengths of the branches. In both the article and the tree I constructed, the branches represent mutations/changes in the protein sequence. There is much more variety in the lengths of the branches in the journal article than in the tree I constructed. This could be because there are simply more samples in the article, or it could be attributed to differences between the programs. They used an alignment program called Clustal Omega instead of the free online software that I used. Overall, the trees look fairly similar and both display the major findings of the protein sequencing well, but the article had more data and thus a more detailed phylogenetic tree.
  4. Is enough information provided by Wan et al (2020) in their paper for us to reproduce their analysis? Explain your answer.
    • There was not enough information in the article to produce an accurate replicate. The methods section was almost non-existent; all it did was list the programs they used. They left out a significant amount of crucial information, such as where they got the protein sequences. The article alluded on multiple occasions to the decade of research that this group put into studying SARS-type viruses, but gave no indication of how they conducted this research. The article draws firm conclusions about the origins of the novel coronavirus, but the lack of methods describing their research calls all of their findings into question.

Conclusion

This week's assignment was very useful. Through the journal club we were able to fully understand the Wan et al. article, and this week I was able to use that knowledge to contrast the article with my own findings. By building my own phylogenetic tree I was able to draw conclusions about the accuracy of the article. The tree that I built supported the main finding of the article: that the spike protein in the 2019-nCoV virus originated in some type of SARS virus that was around in the early 2000s.

Acknowledgements

  • Yaniv Maddahi
    • Yaniv and I worked as homework partners for this week. We communicated and worked together both at the end of the Week 4 lab and throughout the week to create our phylogenetic trees and assignment pages.
  • Dr. Dahlquist
    • Dr. Dahlquist served as a coach for how to begin our pages. She also instructed the class and provided us with the guiding homework document.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Jcorrey (talk) 21:49, 30 September 2020 (PDT)
