Kam Taghizadeh Week 4

From OpenWetWare

user:Kam Taghizadeh

Template: Kam Taghizadeh

Links to Weekly Assignments

Links to Individual Journal Assignments

Links to Shared Journal Assignments

Purpose

  • This week's assignment teaches how to access particular genomic sequences and compare them to other sequences using phylogenetic trees in order to infer a common ancestor. By learning these skills, we can better analyze viral strains through their similarities and differences.

Methods and Results

Part 1: Access GenBank Records

  1. I chose the coronavirus 2 isolate Wuhan-Hu-1 from the GenBank Record of the Data & Resources section from the BIOL/F20 Week 4 page and viewed the full record and the FASTA formatted sequence.
    • The accession number was MN908947.
    • I interpreted the information provided on GenBank regarding this particular genome sequence:
      1. Definition: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
      2. Organism: Severe acute respiratory syndrome coronavirus 2
      3. Title: A new coronavirus associated with human respiratory disease in China
      4. Source: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
    • Full sequence of the genome
  2. I downloaded the nucleotide sequence in FASTA format to my local hard drive.
  3. I clicked the Send to link in the upper right side of the page, then selected Complete Record, File as the destination, and FASTA as the format. I clicked the File button, chose where to save the file, and gave it a name that would be easy to find later.
  4. I opened the saved file with a word processor to confirm that the sequence was there and in FASTA format. In this format, each sequence begins with a header line that starts with a greater-than sign (>).
  5. After gaining a good understanding of how to navigate GenBank with the chosen sequence from the Data & Resources section of the BIOL/F20 Week 4 page, I searched for my assigned viral sequence, Bat SARS-like coronavirus isolate bat-SL-CoVZC45.
  6. I then searched for the spike protein of the bat-SL-CoVZC45 sequence in the GenBank record.
    • I then added a hyperlink to it in the list of sequences in the Data & Tool section of the Week 4 Assignment.
      • I downloaded the spike protein sequence in the FASTA format.
    • Spike Protein [bat-SL-CoVZC45] Sequence:
  7. Spike protein sequence accessed from GenBank.
>AVP78031.1 spike protein [Bat SARS-like coronavirus]
MLFFLFLQFALVNSQCVNLTGRTPLNPNYTNSSQRGVYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLT
TNNAATKRTDNPILDFKDGIYFAATEHSNIIRGWIFGTTLDNTSQSLLIVNNATNVIIKVCNFDFCYDPY
LSGYYHNNKTWSIREFAVYSSYANCTFEYVSKSFMLNISGNGGLFNTLREFVFRNVDGHFKIYSKFTPVN
LNRGLPTGLSVLQPLVELPVSINITKFRTLLTIHRGDPMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGT
ITDAVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRFPSVYAWE
RTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQVAPGQTGVIADYNYK
LPDDFTGCVIAWNTAKQDVGNYFYRSHRSTKLKPFERDLSSDENGVRTLSTYDFNPNVPLEYQATRVVVL
SFELLNAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGKDASDFIDSVRDPQTLE
ILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTIHADQLTPAWRIYATGTNVFQTQAGCLIGA
EHVNASYECDIPIGAGICASYHTASILRSTSQKAIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQIYKTPPIKDF
GGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGGISARDLICAQKFNGLTVLPPLLTD
EMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQES
LTSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTY
VTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTT
APAICHEGKAHFPREGVFVSNGTHWFVTQRNFYEPKIITTDNTFVSGNCDVVIGIINNTVYDPLQPELDS
FKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRLNEVARNLNESLIDLQELGKYEQYIKWPWYVW
LGFIAGLIAIVMVTILLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT
  • This protein sequence can also be found in the Week 4 Talk Page for this assignment.
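
The FASTA layout described in Part 1 (a header line starting with a greater-than sign, followed by sequence lines) can be read with a few lines of Python. This is only a minimal sketch and not part of the assignment; the function name `parse_fasta` is an illustrative assumption, and the example record is deliberately shortened to the first and last lines of the AVP78031.1 sequence shown above.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts at every '>' header line.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines belong to the most recent header.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Shortened example: only the first and last sequence lines of the record.
example = """>AVP78031.1 spike protein [Bat SARS-like coronavirus]
MLFFLFLQFALVNSQCVNLTGRTPLNPNYTNSSQRGVYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLT
LGFIAGLIAIVMVTILLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT
"""

for name, seq in parse_fasta(example).items():
    accession = name.split()[0]  # first word of the header is the accession
    print(accession, len(seq))
```

Running the same function on the full downloaded file would give the complete sequence length instead of the shortened one.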

Part 2: Creating a Phylogenetic Tree using Phylogeny.fr

  1. I went on www.phylogeny.fr, scrolled down the page to the section called ‘Phylogeny analysis’, and clicked on the text ‘One Click’.
  2. I clicked in the text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’.
  3. To create a phylogenetic tree, I copied the list of sequences from the BIOL/F20 Week 4 talk page, pasted them into the text field using Ctrl-V, and clicked Submit.
  4. Once the alignment was completed, I saw a page with alignment results, a page with phylogeny results, and a page with tree rendering results.
  5. I found the numbered tabs located below the text One Click Mode and clicked the tab labeled 3. Alignment.
  6. In alignment, the individual positions are color coded to display their conservation as such:
    • Blue highlighting=high conservation
    • Gray highlighting=lower conservation
    • White highlighting=little/no conservation
  7. I made an initial observation regarding color coding.
    • The beginning of the alignment showed very little conservation compared to the rest of the alignment.
    • The end of the alignment showed much more conservation compared to the rest of the alignment.
  8. Towards the bottom of the page, I clicked Alignment in Clustal format under Outputs. This displayed the alignment in text-only format, where conservation is displayed as a symbol underneath the alignment block as such:
    • “*” for invariant
    • “:” for highly conserved
    • “.” for weakly conserved
    • "space" for not conserved
  9. The entire alignment is copied and pasted below; a space character at the beginning of each line keeps it formatted properly.

Table 1: Class sequence alignment. CLUSTAL FORMAT: MUSCLE (3.8) multiple sequence alignment

QDF43825.1      ---------MKLLVLV-----FATLVSSYTIEKCTDFD------DRTPPSNTQFLSSHRG
ALK02457.1      ----------MFIFLF-----FLTLTSGSDLESCTTFD------DVQAPNYPQHSSSRRG
AAS10463.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13441.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG 
AAP13567.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
QHD43416.1      ----------MFVFLV-----LLPLVS----SQCVNLT------TRTQLPPAYTNSFTRG
AVP78031.1      -----------MLFFL-----FLQFALVN--SQCVNLT------GRTPLNPNYTNSSQRG
ABD75323.1      --------MKILIFAF-----LVTLVKAQ--EGCGVIN------LRTQPKLTQVSSSRRG
QDF43835.1      --------MKVLIVLL-----CLGLVTAQ--DGCGHIS------TKPQPLLDKFSSSRRG
QDF43820.1      --------MKILIFAF-----LVTLVEAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AAZ67052.1      --------MKILILAF-----LASLAKAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AFS88936.1      ----MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKA-DG
YP_0010399      MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDG
                          ::.          .      . *                     .   *
QDF43825.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVT--RFITFGLN--------------FDNPIIPF
ALK02457.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHR--------------FDNPVIPF
AAS10463.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FDDPVIPF
AAP13441.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FGNPVIPF
AAP13567.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FDNPVIPF
QHD43416.1      VYYPDKVFRSSVLHSTQDLFLPFFSNVT--WFHAIHVSGTNGTK-------RFDNPVLPF
AVP78031.1      VYYPDTIYRSDTLVLSQGYFLPFYSNVS--WYYSLTTNNAATKR--------TDNPILDF
ABD75323.1      VYYNDDIFRSDVLHLTQDYFLPFHSNLT--QYFSLNIESDKIVY--------FDNPILKF
QDF43835.1      VYYNDDIFRSDVLHLTQDYFLPFDTNLT--RYLSFNMDSATKVY--------FDNPTLPF
QDF43820.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLT--QYFSLNVDSDRYTY--------FDNPILDF
AAZ67052.1      VYYNDDIFRSNVLHLTQDYFLPFDSNLT--QYFSLNVDSDRFTY--------FDNPILDF
AFS88936.1      IIYPQGRTYSNITITYQGLF-PYQGDHG--DMYVYSAGHATGTTPQKLFVANYSQDVKQF
YP_0010399      IIYPLGRTYSNITLAYTGLF-PLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDF
                : *      *.      . * *   :                           .     *
QDF43825.1      RDGVYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
ALK02457.1      KDGVYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAS10463.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13441.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13567.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
QHD43416.1      NDGVYF----ASTEKSNIIRG-------------WIFGTTLDSKTQ---------SLLIV
AVP78031.1      KDGIYF----AATEHSNIIRG-------------WIFGTTLDNTSQ---------SLLIV
ABD75323.1      GDGVYF----AATEKSNVIRG-------------WVFGSTFDNTTQ---------SAIIV
QDF43835.1      GDGIYF----AATEKSNVVRG-------------WIFGSTMDNTTQ---------SAIIV
QDF43820.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AAZ67052.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AFS88936.1      ANGFVVRIGAAANSTGTVIISPSTSATIRKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLL
YP_0010399      DNGFVVRIGAAANSTGTIVISPSVNTKIKKAYPAFILGSSLTNTSAGQ-PLYANYSLTII
                 :*. .    *:.. ..:: .             :::*::. . :          :  ::
QDF43825.1      NNSTNLVIRACNFELCDNPFFVVLRSNNTQIPSY----IFNNAFNCTFEYVSKDFNLDIG
ALK02457.1      NNSTNVVIRACNFELCDNPFFAVSKPTGTQTHTM----IFDNAFNCTFEYISDSFSLDVA
AAS10463.1      NNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTM----IFDNAFNCTFEYISDAFSLDVS
AAP13441.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM----IFDNAFNCTFEYISDAFSLDVS
AAP13567.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM----IFDNAFNCTFEYISDAFSLDVS
QHD43416.1      NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
AVP78031.1      NNATNVIIKVCNFDFCYDP-YLSGYYHNNKTWSIREFAVYSSYANCTFEYVSKSFMLNIS
ABD75323.1      NNSTHIIIRVCYFNLCKDPMYTVSAGTQKSSW------VYQSAFNCTYDRVEKSFQLDTS
QDF43835.1      NNSTHIIIRVCYFNLCKEPMYAISNEQHYKSW------VYQNAYNCTYDRVEQSFQLDTA
QDF43820.1      NNSTHIIIRVCNFNLCKEPMYTVSRGTQQSSW------VYQSAFNCTYDRVERSFQLDTA
AAZ67052.1      NNSTHIIIRVCNFNLCKEPMYTVSRGAQQSSW------VYQSAFNCTYDRVEKSFQLDTA
AFS88936.1      PDGCGTLLRAFYCILEPRSGNHCPAGNSYTSFAT----YHTPATDCSDGNYNRNASLNSF
YP_0010399      PDGCGTVLHAFYCILKPRTVNRCPSGTGYVSYFI----YETVHNDC-QSTINRNASLNSF
                 :.   ::..    :   .                         :*     .    ::  
QDF43825.1      EKPGNFKDLREFVFRN--------KDGFLHVYSGYQPISAASGLPTGF--NALKPIFKLP
ALK02457.1      EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NILKPIFKLP
AAS10463.1      EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP
AAP13441.1      EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP
AAP13567.1      EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP
QHD43416.1      GKQGNFKNLREFVFKN--------IDGYFKIYSKHTPINLVRDLPQGF--SALEPLVDLP
AVP78031.1      GNGGLFNTLREFVFRN--------VDGHFKIYSKFTPVNLNRGLPTGL--SVLQPLVELP
ABD75323.1      PKTGNFTDLREFVFKN--------RDGFFTAYQTYTPVNLLRGLPSGL--SVLKPILKLP
QDF43835.1      PQTGNFKDLREYVFKN--------KDGFLSVYNAYSPIDIPRGLPVGF--SVLKPILKLP
QDF43820.1      PKTGNFKDLREYVFKN--------RDGFLSVYQTYTAVNLPRGLPIGF--SVLRPILKLP
AAZ67052.1      PKTGNFKDLREYVFKN--------RDGFLSVYQTYTAVNLPRGLPIGF--SVLRPILKLP
AFS88936.1      KE---YFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQFATLP
YP_0010399      KS---FFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYGGNMFRFATLP
                 .   :  * : .*              :   .    :    .   .:  . :  :  **
QDF43825.1      LGINITNFRTLLTAF------PPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD
ALK02457.1      LGINITNFRAILTAF------LPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITDAVD
AAS10463.1      LGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD
AAP13441.1      LGINITNFRAILTAF------SPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD
AAP13567.1      LGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD
QHD43416.1      IGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVD
AVP78031.1      VSINITKFRTLLTIHRGD---PMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGTITDAVD
ABD75323.1      FGINITSFRVVMAMF------SKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVDAVD
QDF43835.1      IGINITSFKVVMSMF------SRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITDAID
QDF43820.1      FGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITDAID
AAZ67052.1      FGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITNAID
AFS88936.1      VYDTIKYYSIIPHSIRSI---QSDRKAW----AAFYVYKLQPLTFLLDFSVDGYIRRAID
YP_0010399      VYEGIKYYTVIPRSFRSK---ANKREAW----AAFYVYKLHQLTYLLDFSVDGYIRRAID
                .   *. :  :                :    **::*  *:  *::* :. :* *  *:*
QDF43825.1      CSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSV
ALK02457.1      CSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSV
AAS10463.1      CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV
AAP13441.1      CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV
AAP13567.1      CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV
QHD43416.1      CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASV
AVP78031.1      CALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRFPSV
ABD75323.1      CSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRFPSV
QDF43835.1      CAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRFPNV
QDF43820.1      CAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRFPNV
AAZ67052.1      CAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRFPNV
AFS88936.1      CGFNDLSQLHCSYESFDVESGVYSVSSFEAKPSGSVVEQAEGVE-CDFSPLLSGTP-PQV
YP_0010399      CGHDDLSQLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATE-CDFSPMLTGVA-PQV
                *. : *:: :*:  .: :..*:*..*.: . .   .:  .: .: * *  ::..   ..*
QDF43825.1      YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI
ALK02457.1      YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI
AAS10463.1      YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI
AAP13441.1      YAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI
AAP13567.1      YAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI
QHD43416.1      YAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQI
AVP78031.1      YAWERTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQV
ABD75323.1      YAWERTKISDCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQV
QDF43835.1      YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV
QDF43820.1      YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV
AAZ67052.1      YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV
AFS88936.1      YNFKRLVFTNCNYNLTKLLSLFSVNDFTCSQISPAAIASNCYSSLILDYFSYPLSMKSDL
YP_0010399      YNFKRLVFSNCNYNLTKLLSLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMKSYI
                * ::*  :::*  : : : .   .. *.*  :*.  :   *::.:  * *    .    :
QDF43825.1      APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFERDIS
ALK02457.1      APGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFERDIS
AAS10463.1      APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS
AAP13441.1      APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS
AAP13567.1      APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS
QHD43416.1      APGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS
AVP78031.1      APGQTGVIADYNYKLPDDFTGC-VIAWNTAKQDV---GNYF--YRSHRSTKLKPFERDLS
ABD75323.1      APGQTGVIADYNYKLPDDFTGC-VIAWNTAKQDV---GSYF--YRSHRSSKLKPFERDLS
QDF43835.1      APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDQ---GQYY--YRSSRKTKLKPFERDLT
QDF43820.1      APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDT---GHYY--YRSHRKTKLKPFERDLS
AAZ67052.1      APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDQ---GQYY--YRSHRKTKLKPFERDLS
AFS88936.1      SVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRTEVP
YP_0010399      RPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQDVE
                  ..:* *. :*** .     * ::     :            *       *     :: 
QDF43825.1      NVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPYRVVVLSFEL
ALK02457.1      NVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLSFEL
AAS10463.1      NVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPYRVVVLSFEL
AAP13441.1      NVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLSFEL
AAP13567.1      NVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLSFEL
QHD43416.1      TEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPYRVVVLSFEL
AVP78031.1      SDE---------------NGVR-----------TLSTYDFNPNVPLEYQATRVVVLSFEL
ABD75323.1      SEE---------------NGVR-----------TLSTYDFNQNVPLEYQATRVVVLSFEL
QDF43835.1      SDE---------------NGVR-----------TLSTYDFYPNVPIEYQATRVVVLSFEL
QDF43820.1      SDDG--------------NGVY-----------TLSTYDFNPNVPVAYQATRVVVLSFEL
AAZ67052.1      SDE---------------NGVR-----------TLSTYDFYPSVPVAYQATRVVVLSFEL
AFS88936.1      QLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMGFGI
YP_0010399      TPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMSFII
                                  .               :.  .              : :.* :
QDF43825.1      L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD
ALK02457.1      L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD
AAS10463.1      L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD
AAP13441.1      L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD
AAP13567.1      L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD
QHD43416.1      L----HAPATVC-----GPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRD
AVP78031.1      L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGKD
ABD75323.1      L----NAPATVC-----GPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQFGRD
QDF43835.1      L----NAPATVC-----GPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQFGRD
QDF43820.1      L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGRD
AAZ67052.1      L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQFGRD
AFS88936.1      TVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRFVYD
YP_0010399      SVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGVKQQRFVYD
                       . :**     . . .     .:**::.: *. * **:  ..      *.*  *
QDF43825.1      VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI---
ALK02457.1      VLD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI---
AAS10463.1      VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLI---
AAP13441.1      VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI---
AAP13567.1      VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI---
QHD43416.1      IAD-TTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI---
AVP78031.1      ASD-FIDSVRDPQTLEILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTI---
ABD75323.1      ASD-FTDSVRDPQTLRILDISPCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTI---
QDF43835.1      TSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI---
QDF43820.1      TSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI---
AAZ67052.1      TSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAI---
AFS88936.1      AYQNLVGYYSDDGNYYCLR--ACVSVPVSVIY--DKETKTHATLFGSVACEHISSTMSQY
YP_0010399      SFDNLVGYYSDDGNYYCVR--PCVSVPVSVIY--DKSTNLHATLFGSVACEHVTTMM---
                  :   .   *  .   :   .*    ****    : :   *.*: .* *  :.  :   
QDF43825.1      -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS---
ALK02457.1      -HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS---
AAS10463.1      -HAEQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS---
AAP13441.1      -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSL---
AAP13567.1      -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSL---
QHD43416.1      -HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVN-NSYECDIPIGAGICASYQTQTNSPR
AVP78031.1      -HADQLTPAWRIYATGTNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTASI---
ABD75323.1      -QADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVN-ASYQCDIPIGAGICASYHTASH---
QDF43835.1      -RADQLTPAWRVYSTGINVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST---
QDF43820.1      -RADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST---
AAZ67052.1      -HADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST---
AFS88936.1      SRSTRSMLKRRDSTYGP--LQTPVGCVLGLVNSSLFVEDCKLPLGQSLCALPDTPST---
YP_0010399      SQFSRLTQSNLRRRDSNIPLQTAVGCVIGLSNNSLVVSDCKLPLGQSLCAVPPV-ST---
                 .  .    .     .   :** .**::*  : .    :*.:*:* .:**   . :    
QDF43825.1      -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS
ALK02457.1      -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS
AAS10463.1      -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS
AAP13441.1      -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS
AAP13567.1      -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS
QHD43416.1      RARSVA----SQSI--------IAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVS
AVP78031.1      -LRSTS----QKAI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS
ABD75323.1      -LRSTG----QKSI--------VAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVMPVS
QDF43835.1      -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS
QDF43820.1      -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS
AAZ67052.1      -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS
AFS88936.1      -LTPRS----VRSVPGEMRLASIAFNHPIQVDQ-LNSSYFKLSIPTNFSFGVTQEYIQTT
YP_0010399      -FRSYSASQFQLAV--------LNYTSPIVV-TPINSSGFTAAIPTNFSFSVTQEYIETS
                   . .      ::        : :. .: .   :  :  . :*****::.:* * : .:
QDF43825.1      MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQMYK
ALK02457.1      MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQMYK
AAS10463.1      MAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQMYK
AAP13441.1      MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYK
AAP13567.1      MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYK
QHD43416.1      MTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYK
AVP78031.1      MAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQIYK
ABD75323.1      MAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQMYK
QDF43835.1      MSKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQMYK
QDF43820.1      MAKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQMYK
AAZ67052.1      MAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQMYK
AFS88936.1      IQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANLRQDDSVRNLFASVKSSQS
YP_0010399      IQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLRQDESVYSLYSNIKTT-S
                : *.:***. *:*..   * :** :**.** ::*.** *    ** .. .:: .:*   .
QDF43825.1      TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
ALK02457.1      TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
AAS10463.1      TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
AAP13441.1      TPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
AAP13567.1      TPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
QHD43416.1      TPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL--GD
AVP78031.1      TPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL--GG
ABD75323.1      TPTIRD-FGG-FNFSQILPDPLKPTKRSF---IEDLLYNKVTLADAGFMKQYADCL--GG
QDF43835.1      TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
QDF43820.1      TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
AAZ67052.1      TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD
AFS88936.1      SPIIPG-FGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYDDCMQQGP
YP_0010399      TQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDDCMKQGP
                :  :   :.* **:: :        . *    *****::***:**.*::: * :*:  * 
QDF43825.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
ALK02457.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
AAS10463.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
AAP13441.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
AAP13567.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
QHD43416.1      IAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMA
AVP78031.1      ISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMA
ABD75323.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMA
QDF43835.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
QDF43820.1      INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA
AAZ67052.1      ISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAMQMA
AFS88936.1      ASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQSIF
YP_0010399      QSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQSMF
                  ******** . * .*****   :* * **::*:..    *** * .:   **** .: 
QDF43825.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
ALK02457.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
AAS10463.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
AAP13441.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
AAP13567.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
QHD43416.1      YRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLV
AVP78031.1      YRFNGIGVTQNVLYENQKLIANQFNSAIGKIQESLTSTASALGKLQDVVNQNAQALNTLV
ABD75323.1      YRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTSTALGKLQDVVNQNAQALNTLV
QDF43835.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
QDF43820.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV 
AAZ67052.1      YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV
AFS88936.1      YRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALSKLA
YP_0010399      YRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALSKLA
                **:**:*:**:** **** ***:**.*:  :* .::::  *: *:**.** *****..*.
QDF43825.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
ALK02457.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AAS10463.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AAP13441.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AAP13567.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
QHD43416.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AVP78031.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
ABD75323.1      KQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
QDF43835.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
QDF43820.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AAZ67052.1      KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN
AFS88936.1      SELSNTFGAISASIGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAALSAQ
YP_0010399      SELSNTFGAISSSISDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAARSAQ
                .:**..*****: :.**: *** :* :.******.*** :*:::*:***:*:     **:
QDF43825.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
ALK02457.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
AAS10463.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
AAP13441.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
AAP13567.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
QHD43416.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICH
AVP78031.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTTAPAICH
ABD75323.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPAICH
QDF43835.1      LAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAICH
QDF43820.1      LAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAICH
AAZ67052.1      LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH
AFS88936.1      LAKDKVNECVKAQSKRSGFCGQGTHIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYGLCD
YP_0010399      LASDKVNECVKSQSKRNGFCGSGTHIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYGLCN
                **  *:.*** .**** .*** * *::**   **:*. *:** * *::  :..:* .:* 
QDF43825.1      EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVVIGI
ALK02457.1      EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVVIGI
AAS10463.1      EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI
AAP13441.1      EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI
AAP13567.1      EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI
QHD43416.1      DGKAHF---PREGVFVSNGTH-------WFVTQRNFYEPQIITTDNT-FVSGNCDVVIGI
AVP78031.1      EGKAHF---PREGVFVSNGTH-------WFVTQRNFYEPKIITTDNT-FVSGNCDVVIGI
ABD75323.1      EGKAYF---PREGVFVSNGSS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI
QDF43835.1      EGKAYF---PREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI
QDF43820.1      EGKAYF---PREGVFVSNGTF-------WFITQRNFYSPQIITTDNT-FVAGNCDVVIGI
AAZ67052.1      EGKAYF---PREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI
AFS88936.1      AANPTNCIAPVNGYFIKTNNT--RIVDEWSYTGSSFYAPEPITSLNTKYVA--PQVTYQN
YP_0010399      NNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGSSFYKPEPITQANSRYVS--SDVKFDK
                  :.     * :* *: . .        *  *  .*: *: **  *: :*:   :*    
QDF43825.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
ALK02457.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
AAS10463.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRLNEV
AAP13441.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
AAP13567.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
QHD43416.1      VNNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
AVP78031.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRLNEV
ABD75323.1      INNTVYDPL---QPELDSFKQELDKYFKNHTSPDVDLGDISGINASVVDIQKEIDRLNEV
QDF43835.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
QDF43820.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
AAZ67052.1      INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV
AFS88936.1      ISTNLPPPLLGNSTGID-FQDELDEFFKNVSTSIPNFGSLTQINTTLLDLTYEMLSLQQV
YP_0010399      LENNLPPPLLENSTDVD-FKDELEEFFKNVTSHGPNFAEISKINTTLLDLSDEMAMLQEV
                :...:  **   .. :* *::**:::*** ::   ::..:: **::::::  *:  *::*
QDF43825.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
ALK02457.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
AAS10463.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
AAP13441.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
AAP13567.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
QHD43416.1      AKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCS
AVP78031.1      ARNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGCCS
ABD75323.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLVGLFMAIILLCYFTSCCSCCKGMCS
QDF43835.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKGACS
QDF43820.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKGACS
AAZ67052.1      AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS
AFS88936.1      VKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMGKLK
YP_0010399      VKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLGKMK
                .. **:* ***:***:*  * *****:********:.: :  :::   *.* :   *  .
QDF43825.1      CGSCC-KFDEDDSEPVLKGVKLHYT
ALK02457.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AAS10463.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AAP13441.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AAP13567.1      CGSCC-KFDEDDSEPVLKGVKLHYT
QHD43416.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AVP78031.1      CGSCC-KFDEDDSEPVLKGVKLHYT
ABD75323.1      CGSCC-RFDEDDSEPVLKGVKLHYT
QDF43835.1      CGSCC-KFDEDDSEPVLKGVKLHYT
QDF43820.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AAZ67052.1      CGSCC-KFDEDDSEPVLKGVKLHYT
AFS88936.1      CNRCCDRYEEYDLEP----HKVHVH
YP_0010399      CKNCCDSYEEYDVE------KIHVH
                *  **  ::* * *      *:*
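
The conservation symbols under each block of the alignment can be tallied to put numbers on the visual impression of how conserved a region is. The sketch below is an illustration, not something Phylogeny.fr provides; the names `SYMBOL_MEANING` and `summarize_conservation` are assumptions, and the caller is expected to pass only the symbol portion of the line (without the 16-character name column used in this alignment).

```python
from collections import Counter

# Meaning of each Clustal conservation symbol, as listed in Part 2.
SYMBOL_MEANING = {
    "*": "invariant",
    ":": "highly conserved",
    ".": "weakly conserved",
    " ": "not conserved",
}


def summarize_conservation(symbol_line, width=None):
    """Count the conservation symbols in one Clustal symbol line.

    `width` is the number of alignment columns in the block. Pasted Clustal
    output may have lost trailing spaces, so the line is padded to `width`.
    """
    if width is not None:
        symbol_line = symbol_line.ljust(width)
    counts = Counter(symbol_line)
    return {meaning: counts.get(symbol, 0)
            for symbol, meaning in SYMBOL_MEANING.items()}


# Symbol line of the final alignment block, which is 25 columns wide.
final_block_symbols = "*  **  ::* * *      *:*"
print(summarize_conservation(final_block_symbols, width=25))
```

Applying the same function to the first and last blocks gives a simple way to compare conservation at the two ends of the alignment.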
  1. Next, I clicked the tab labeled 6.Tree Rendering, and I observed a phylogenetic tree with five sequences. This tree has:
    • Horizontal lines (branches) that represent individual evolutionary lineages
    • Vertical lines (splits) that represent mutation events
      • The vertical length of each split is solely there for visual clarity with no biological meaning
      • The left-most split is known as the root of the tree, and represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within the tree.
    • The length of each branch depicts the percentage change in residue sequence occurring along that branch, relative to the scale bar shown at the bottom of the tree.
      • The scale bar will be a number between 0 and 1 and can be reinterpreted as a percent.
        • For example, 0.05 would be 5%.
    • The tree may also contain support values for each clade, shown in red on the branches and also expressed as a number between 0 and 1.
      • For example, 0.05 would be 5%.
    • In general, a higher support value indicates a higher statistical confidence in a particular clade.
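
The scale-bar and support-value arithmetic described above can be sketched in Python. This is only an illustration and not part of the Phylogeny.fr workflow; the function names and the alignment length of 1300 columns are hypothetical and were chosen for the example.

```python
def fraction_to_percent(value):
    """Convert a branch length or support value on a 0-1 scale to a percent."""
    if not 0 <= value <= 1:
        raise ValueError("expected a value between 0 and 1")
    return value * 100


def approx_changed_sites(branch_length, alignment_length):
    """Roughly how many alignment columns changed along a branch.

    Assumes the branch length is the fraction of sites that changed,
    following the reading of the scale bar described above.
    """
    return branch_length * alignment_length


# A value of 0.05 on the scale bar is read as a 5% change.
print(fraction_to_percent(0.05))

# Hypothetical alignment of 1300 columns (not a value taken from this journal).
print(approx_changed_sites(0.05, 1300))
```

The same conversion applies to support values, since both are reported on a 0 to 1 scale.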

Phylogenetic Tree: Phylogenetic Tree Taghizadeh.png

Comparison of the phylogenetic tree to the multiple sequence alignment:

  • The class sequence alignment agrees with the phylogenetic tree: the branch points and the distances between sequences reflect how conserved the sequences are in the alignment. For instance, the two outgroups, AFS88936.1 [Human betacoronavirus 2c EMC/2012] and YP_001039953.1 spike glycoprotein [Tylonycteris bat coronavirus HKU4], branch from the same point, and both share few conserved residues with the other spike protein sequences. The similarity between their two sequences is to be expected, as they are sister taxa on the phylogenetic tree.
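
One way to put a number on this comparison is a pairwise percent identity between aligned rows. The sketch below uses only the final 25-column block of the class alignment shown above, so it says nothing about the full spike protein. The function name and the choice to skip any column where either row has a gap are assumptions made for this example, not something Phylogenetic.fr or the assignment specifies.

```python
# Final 25-column block copied from the class alignment (four rows only).
ALIGNED = {
    "QDF43825.1": "CGSCC-KFDEDDSEPVLKGVKLHYT",
    "QHD43416.1": "CGSCC-KFDEDDSEPVLKGVKLHYT",
    "AFS88936.1": "CNRCCDRYEEYDLEP----HKVHVH",
    "YP_0010399": "CKNCCDSYEEYDVE------KIHVH",
}


def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both rows contain a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# The two outgroups compared with each other.
outgroup_pair = percent_identity(ALIGNED["AFS88936.1"], ALIGNED["YP_0010399"])
# One outgroup compared with one of the other spike sequences.
cross_pair = percent_identity(ALIGNED["QDF43825.1"], ALIGNED["AFS88936.1"])

print(f"AFS88936.1 vs YP_0010399: {outgroup_pair:.1f}%")
print(f"QDF43825.1 vs AFS88936.1: {cross_pair:.1f}%")
```

Within this short block, the two outgroups are more similar to each other than AFS88936.1 is to QDF43825.1, which is consistent with their placement as sister taxa.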

Comparison of the alignment to Figure 3 of the Wan et al. (2020) paper:

  • The amino acids that were discussed in the Wan et al. (2020) paper:
QDF43825.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER
AGZ48818.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
ALK02457.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
AAS10463.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13441.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13567.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
QHD43416.1      RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFER
AVP78031.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER
ABD75323.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YRSHRSSKLKPFER
QDF43835.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSSRKTKLKPFER
ABD75332.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YRSYRKEKLKPFER
QDF43820.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YRSHRKTKLKPFER
AAZ67052.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSHRKTKLKPFER
AFS88936.1      SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399      SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
                  :  ..:* *. :*** .     * ::     :            *       *     

QDF43825.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPY
AGZ48818.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPY
ALK02457.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPY
AAS10463.1      DISNVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPY
AAP13441.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPY
AAP13567.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPY
QHD43416.1      DISTEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPY
AVP78031.1      DLSSDE---------------NGVR-----------TLSTYDFNPNVPLEYQAT
ABD75323.1      DLSSEE---------------NGVR-----------TLSTYDFNQNVPLEYQAT
QDF43835.1      DLTSDE---------------NGVR-----------TLSTYDFYPNVPIEYQAT
ABD75332.1      DLSSDE---------------NGVY-----------TLSTYDFYPSIPVEYQAT
QDF43820.1      DLSSDDG--------------NGVY-----------TLSTYDFNPNVPVAYQAT
AAZ67052.1      DLSSDE---------------NGVR-----------TLSTYDFYPSVPVAYQAT
AFS88936.1      EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMT
YP_0010399      DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMT
                ::                   .               :.  .    
  • The amino acid residues in the bottom sequences differ from those in the top sequences, and the five critical amino acids also vary among the sequences. The class alignment shows different amino acids than the alignment in the paper and contains many more sequences. As a result, the alignment from the article appears more conserved than the class alignment, which contains many gaps. Furthermore, the article alignment shows many stars (asterisks), which mark invariant positions. These were not seen as often in the class sequence alignment.
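
The stars under an alignment can be recomputed directly from the rows. The sketch below marks a column with `*` only when every row has the same residue. It uses four rows from the final 25-column block of the class alignment as sample input, and the function name is hypothetical. The `:` and `.` marks in the consensus line (conservative substitutions) are not reproduced here, because they depend on residue similarity groups.

```python
# Four rows from the final 25-column block of the class alignment.
ROWS = [
    "CGSCC-KFDEDDSEPVLKGVKLHYT",  # QDF43825.1
    "CGSCC-KFDEDDSEPVLKGVKLHYT",  # QHD43416.1
    "CNRCCDRYEEYDLEP----HKVHVH",  # AFS88936.1
    "CKNCCDSYEEYDVE------KIHVH",  # YP_0010399
]


def identity_line(rows):
    """Return a string with '*' under each column where all rows are identical."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        # A column counts as invariant only if it has one residue and no gap.
        invariant = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if invariant else " ")
    return "".join(marks)


line = identity_line(ROWS)
for row in ROWS:
    print(row)
print(line)
print(f"{line.count('*')} of {len(line)} columns are invariant")
```

Adding more sequences can only keep or reduce the number of starred columns, which is one reason a larger alignment tends to look less conserved than a smaller one.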

Comparison of the alignment to Figure 2 of the Wan et al. (2020) paper:

  • They are similar in that they both have two branch points. However, the tree in Figure 2 shows one branch whose sequences are very similar, with little divergence, while the other branch shows a lot of divergence. In the class tree, the branches diverge less and the sequences are more similar to one another. Furthermore, many sequences in the Figure 2 tree are not included in the class tree, making it difficult to compare the two.

Is enough information provided by Wan et al. (2020) in their paper for us to reproduce their analysis? Explain your answer.

  • The analysis would be reproducible only by someone who knows which software was used and can already generate phylogenetic trees and sequence alignments on their own. Otherwise, one would need a thorough explanation of the process used to generate the phylogeny.

Scientific Conclusion

This week's assignment taught us how to generate phylogenetic trees and sequence alignments. It also taught us how to work with GenBank records. Furthermore, we were able to analyze sequences by observing their highlighted portions to determine whether they were conserved. This assignment furthers our understanding of viral strains and of how similar or different infectious strains can be.

Acknowledgements

  • I consulted with my partner Fatima Alghanem over Zoom to discuss the figures and how to compare them to our tree.
  • I referenced the Wan et al. (2020) paper, Receptor Recognition by the Novel Coronavirus from Wuhan, for Figures 2 and 3.
  • I created the phylogenetic tree and created sequence alignments using Phylogeny.fr.
  • I obtained sequences from GenBank.
  • I crowdsourced sequences from the Week 4 Talk page.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Kam Taghizadeh (talk) 23:15, 1 October 2020 (PDT)

References

  1. Phylogeny.fr: "One Click" Mode. (2020). Retrieved 30 September 2020, from http://www.phylogeny.fr/simple_phylogeny.cgi?workflow_id=b9c0813cbbe9695d63cf7e31da5f026d&tab_index=1
  2. NCBI GenBank. (2020). Spike protein [Bat SARS CoV Rf1/2004] - Protein. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/protein/ABD75323.1?report=fasta
  3. NCBI GenBank. (2020). Bat SARS-like coronavirus Rs3367, complete genome - Nucleotide. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/nuccore/556015127/
  4. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4#Data_.26_Tools
  5. OpenWetWare. (2020). Talk:BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/Talk:BIOL368/F20:Week_4
  6. Wan, Y., Shang, J., Graham, R., Baric, R., & Li, F. (2020). Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. Journal of Virology, 94(7). doi: 10.1128/jvi.00127-20