Nathan R Beshai Week 4

From OpenWetWare


Nathan R. Beshai User Page

Nathan R. Beshai Template Page

Nathan R. Beshai

Course assignments

  1. Week 1
  2. Week 2
  3. Week 3
  4. Week 4
  5. Week 5
  6. Week 6
  7. Week 7
  8. Week 8
  9. Week 9
  10. Week 10
  11. Week 11
  12. Week 12
  13. Week 14

Individual journal assignments

  1. Nathan R Beshai Week 2
  2. Nathan R Beshai Week 3
  3. Nathan R Beshai Week 4
  4. Nathan R Beshai Week 5
  5. Nathan R Beshai Week 6
  6. Nathan R Beshai Week 7
  7. Nathan R Beshai Week 8
  8. Nathan R Beshai Week 9
  9. Nathan R Beshai Week 10
  10. Nathan R Beshai Week 11
  11. The D614G Research Group Week 12
  12. The D614G Research Group Week 14

Class Journals

  1. Class Journal 1
  2. Class Journal 2
  3. Class Journal 3
  4. Class Journal 4
  5. Class Journal 5
  6. Class Journal 6
  7. Class Journal 7
  8. Class Journal 8
  9. Class Journal 9
  10. Class Journal 10
  11. Class Journal 11
  12. Class Journal 12
  13. Class Journal 14

Link to Brightspace and LMU's Homepage

  1. Link to Brightspace
  2. Link to LMU's Homepage

Purpose

  • To learn how to access gene and protein sequences and compare them with other sequences in order to find relationships among them or a common ancestor.

Methods and Results

Methods

Accessed GenBank Records

  1. Chose one of the GenBank records from the Data & Resources section from the BIOL/F20 Week 4 page and viewed both the full record and the FASTA formatted sequence.
    • Copied and pasted the accession number in the results.
    • Listed all of the information provided in the GenBank record.
  2. Downloaded the nucleotide sequence in FASTA format to the local hard drive.
  3. Clicked the Send to link in the upper right of the page. Selected Complete Record, File as the Destination, and FASTA as the format. Clicked the Create File button. Noted where the file was saved and what it was named so that it could be found later.
    • Opened the downloaded file with a word processor to confirm that the sequence was present and in FASTA format. In FASTA format, each sequence is preceded by a label line that begins with the greater-than sign (>).
  4. Was assigned the sequence BtCoV MK211375.1 from the Wan et al. (2020) paper.
    • Searched for the GenBank record associated with that sequence. Added a hyperlink to the GenBank record to the list of sequences in the Data & Tools section.
    • Located the spike protein accession number in the GenBank record. (Note that the spike protein is sometimes called the "S" protein.)
    • Added a hyperlink to the spike protein record to the list of sequences in the Data & Tools section of the BIOL/F20 Week 4 page, formatted in the same way as the existing entries.
    • Downloaded the assigned protein sequence in FASTA format, just like the whole genome sequence.
    • Added the protein sequence to the results.
      • Formatted so that there was a space before each line.
      • Also added the protein sequence to the talk page for BIOL/F20 Week 4 page.
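
The FASTA check described above (each sequence preceded by a label line starting with ">") can also be done programmatically. The following is a minimal sketch, not part of the original protocol: the function name `read_fasta` is a hypothetical helper, and the example text reuses only the first two lines of the spike protein record shown in the results.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict.

    Hypothetical helper for illustration; it only assumes that each
    record starts with a '>' label line followed by sequence lines.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before any '>' label line")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string
    return {h: "".join(parts) for h, parts in records.items()}


# Truncated example: only the first two sequence lines of the record
example = """>QDF43820.1 spike glycoprotein [Coronavirus BtRs-BetaCoV/YN2018A]
MKILIFAFLVTLVEAQEGCGIISRKPQPKMAQVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFDSNLTQYF
SLNVDSDRYTYFDNPILDFGDGVYFAATEKSNVIRGWIFGSTFDNTTQSAVIVNNSTHIIIRVCNFNLCK
"""

records = read_fasta(example)
for header, seq in records.items():
    # Print the accession (first word of the label) and the residue count
    print(header.split()[0], len(seq))
```

Running the same function on the full downloaded file would confirm that the label line and the complete sequence are both present before pasting the record into the wiki.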

Creating the Phylogenetic tree and sequence alignment

  1. Went to the website www.phylogeny.fr. Scrolled down on the page to the section labeled ‘Phylogeny analysis’, and clicked on the text ‘One Click’.
  2. Clicked in the large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’. Copied the list of sequences from the BIOL/F20 Week 4 talk page and used Ctrl-V (or Command-V) to paste the sequences there, then clicked the “Submit” button.
  3. Found the numbered tabs located just beneath the text One-Click Mode and clicked on the tab labeled 3. Alignment.
    • Within the alignment, individual positions are color-coded to indicate their conservation, or how similar the sequences are to each other at that position. Blue highlighting indicates high conservation (i.e., the sequences are identical or at least very similar), while gray highlighting indicates lower conservation and white highlighting indicates little if any conservation.
  4. Near the bottom of the page, under Outputs, clicked on Alignment in Clustal format. This displayed the alignment in a text-only format in which each position's conservation is indicated by a symbol underneath the alignment block (“*” for invariant, “:” for highly conserved, “.” for weakly conserved, and a space for not conserved). Copied and pasted this entire alignment into the results. Used a space character at the beginning of each line so that the sequences line up properly.
  5. Went back and clicked on the tab 6. Tree Rendering, and saw the phylogenetic tree of the sequences. On this tree, horizontal lines (branches) represent individual evolutionary lineages. By contrast, vertical lines (splits) represent divergence events in which one lineage splits into two or more, and the vertical length of each split is drawn purely for visual clarity with no biological meaning. The left-most split is called the root of the tree, and represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within the tree.
    • The outgroups used were HKU-4 and MERS-CoV sequences.
    • The length of each branch represents the proportion of amino acid change occurring along that branch, relative to the scale bar shown at the bottom of the tree. The scale bar is a number between 0 and 1 that can be read as a percentage; for example, 0.05 corresponds to 5%. The tree may also contain support values for each clade, shown in red on the branches and likewise expressed as a number between 0 and 1 (0.05 would be 5%). In general, a higher support value indicates higher statistical confidence in a particular clade.
    • Saved the image to a file, uploaded it to the wiki, and displayed it in the results.
  6. Compared the tree to the multiple sequence alignment. Noted how the differences among the sequences relate to the topology of the tree and described the relationship.
  7. Related the alignment to the alignment in Figure 3 of the Wan et al. (2020) paper.
    • Found the amino acid sequences that are highlighted in the figure and mentioned them in the results. Copied the highlighted sequences from the sequence alignment and pasted them in the results.
    • Noted the similarities and differences between this alignment and the one shown in Figure 3.
  8. Compared the rendered tree to the one in Figure 2 of the Wan et al. (2020) paper.
    • Noted the similarities and differences between the tree and the one shown in Figure 2.
  9. Answered the question: Was enough information provided by Wan et al. (2020) in their paper for us to reproduce their analysis? Explained the answer below.
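
The conservation symbols in the Clustal-format output (“*”, “:”, “.”, and a space) can be tallied to summarize how conserved an alignment block is. The sketch below is illustrative only: `tally_conservation` is a hypothetical helper, and `toy_block` is made-up data, not taken from the class alignment. It assumes the conservation row sits directly beneath the sequence rows and lines up with the alignment columns.

```python
from collections import Counter

# Meaning of each symbol in the Clustal conservation row
SYMBOL_MEANING = {
    "*": "invariant",
    ":": "highly conserved",
    ".": "weakly conserved",
    " ": "not conserved",
}


def tally_conservation(block_lines):
    """Count conservation symbols in one Clustal alignment block.

    block_lines: sequence rows ("name   ALIGNED...") followed by one
    conservation row. Hypothetical helper for illustration.
    """
    seq_rows, marks = block_lines[:-1], block_lines[-1]
    name, aligned = seq_rows[0].split()[:2]
    # Column at which the aligned residues start in the sequence rows
    start = seq_rows[0].index(aligned, len(name))
    width = len(aligned)
    # Pad in case trailing spaces were stripped, then cut to the alignment
    marks = marks.ljust(start + width)[start:start + width]
    return Counter(SYMBOL_MEANING[m] for m in marks)


# Toy data (assumption): three short made-up sequences and their symbols
toy_block = [
    "seqA      MKILIFGS",
    "seqB      MKVLIFWA",
    "seqC      MKLLVFDG",
    "          **:*:* .",
]

counts = tally_conservation(toy_block)
for meaning, n in counts.most_common():
    print(f"{meaning}: {n}")
```

Applied to each block of a real alignment, the same tally gives a rough count of invariant versus variable positions, which can help when relating the alignment to the branch lengths of the tree.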

Results

  1. Chose the sequence SARS Coronavirus Urbani from GenBank
    • The information provided in the GenBank record for this sequence includes:
      1. The classification of the virus.
      2. References to the source of the virus.
      3. The source and features of the virus.
        1. 5' UTR CDS Ribosomal slippage (ORF 1ab- product: nonstructural polyprotein pp1ab)
        2. CDS(ORF 1ab- product: nonstructural polyprotein pp1ab)
        3. CDS(ORF 1ab; expressed via predicted -1 ribosomal frameshift- product: nonstructural polyprotein pp1ab)
        4. CDS(Surface spike glycoprotein- product: s-protein)
        5. CDS(potential product, C-terminal similarity to porin- product: protein X1)
        6. CDS(potential product- product: protein X2)
        7. CDS(potential product- product: protein X3)
        8. CDS(potential product- product: protein X4)
        9. CDS(potential product- product: protein X5)
        10. CDS(Envelope protein- product: E-Protein)
        11. CDS(Membrane protein- product: M-Protein)
        12. CDS(Nucleocapsid protein- product: N-Protein)
    • Full sequence of SARS-Coronavirus Urbani
  2. Spike protein sequence for the BtCoV MK211375.1 genome, accessed from GenBank.
>QDF43820.1 spike glycoprotein [Coronavirus BtRs-BetaCoV/YN2018A]
MKILIFAFLVTLVEAQEGCGIISRKPQPKMAQVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFDSNLTQYF
SLNVDSDRYTYFDNPILDFGDGVYFAATEKSNVIRGWIFGSTFDNTTQSAVIVNNSTHIIIRVCNFNLCK
EPMYTVSRGTQQSSWVYQSAFNCTYDRVERSFQLDTAPKTGNFKDLREYVFKNRDGFLSVYQTYTAVNLP
RGLPIGFSVLRPILKLPFGINITSYRVVMAMFSQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITDAI
DCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRFPNVYAWERTKIS
DCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQVAPGETGVIADYNYKLPDDF
TGCVIAWNTAKQDTGHYYYRSHRKTKLKPFERDLSSDDGNGVYTLSTYDFNPNVPVAYQATRVVVLSFEL
LNAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGRDTSDFTDSVRDPQTLEILDI
TPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAIRADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVN
ASYECDIPIGAGICASYHTASTLRSVGQKSIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVSM
AKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQMYKTPAIKDFGGFN
FSQILPDPSKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDMIA
AYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTT
STALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQ
LIRAAEIRASANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAI
CHEGKAYFPREGVFVSNGTFWFITQRNFYSPQIITTDNTFVAGNCDVVIGIINNTVYDPLQPELDSFKEE
LDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFI
AGLIAIVMATILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT


  3. Table 1: Class sequence alignment. CLUSTAL FORMAT: MUSCLE (3.8) multiple sequence alignment

Sequence number
QDF43825.1      ---------MKLLVLV-----FATLVSSYTIEKCTDFD------DRTPPSNTQFLSSHRG
AGZ48818.1      ---------MKLLVLV-----FATLVSSYTIEKCLDFD------DRTPPANTQFLSSHRG
ALK02457.1      ----------MFIFLF-----FLTLTSGSDLESCTTFD------DVQAPNYPQHSSSRRG
AAS10463.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13441.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
AAP13567.1      ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG
QHD43416.1      ----------MFVFLV-----LLPLVSSQ----CVNLT------TRTQLPPAYTNSFTRG
AVP78031.1      -----------MLFFL-----FLQFALVN--SQCVNLT------GRTPLNPNYTNSSQRG
ABD75323.1      --------MKILIFAF-----LVTLVKAQ--EGCGVIN------LRTQPKLTQVSSSRRG
QDF43835.1      --------MKVLIVLL-----CLGLVTAQ--DGCGHIS------TKPQPLLDKFSSSRRG
ABD75332.1      --------MKVLIFAL-----LFSLAKAQ--EGCGIIS------RKPQPKMEKVSSSRRG
QDF43820.1      --------MKILIFAF-----LVTLVEAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AAZ67052.1      --------MKILILAF-----LASLAKAQ--EGCGIIS------RKPQPKMAQVSSSRRG
AFS88936.1      ----MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKA-DG
YP_0010399      MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDG
                           ::.          .        *                     .   *
QDF43825.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
AGZ48818.1      VYYPDDIFRSNVLHLVQDHFLPFDSNVTRFITFGLN-------------FDN---PIIPF
ALK02457.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHR-------------FDN---PVIPF
AAS10463.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDD---PVIPF
AAP13441.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FGN---PVIPF
AAP13567.1      VYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHT-------------FDN---PVIPF
QHD43416.1      VYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVS------GTNGTKRFDN---PVLPF
AVP78031.1      VYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLTTN-------NAATKRTDN---PILDF
ABD75323.1      VYYNDDIFRSDVLHLTQDYFLPFHSNLTQYFSLNIE-------SDKIVYFDN---PILKF
QDF43835.1      VYYNDDIFRSDVLHLTQDYFLPFDTNLTRYLSFNMD-------SATKVYFDN---PTLPF
ABD75332.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNID-------SNKYTYFDN---PILDF
QDF43820.1      VYYNDDIFRSDVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRYTYFDN---PILDF
AAZ67052.1      VYYNDDIFRSNVLHLTQDYFLPFDSNLTQYFSLNVD-------SDRFTYFDN---PILDF
AFS88936.1      IIYPQGRTYSNITITYQGLF-PYQGDHGDMYVYSAG--HATGTTPQKLFVANYSQDVKQF
YP_0010399      IIYPLGRTYSNITLAYTGLF-PLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDF
               : *      *.      . * *   :                         :       *
QDF43825.1      RDGVYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
AGZ48818.1      KDGIYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM
ALK02457.1      KDGVYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAS10463.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13441.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
AAP13567.1      KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII
QHD43416.1      NDGVYF----ASTEKSNIIRG-------------WIFGTTLDSKTQ---------SLLIV
AVP78031.1      KDGIYF----AATEHSNIIRG-------------WIFGTTLDNTSQ---------SLLIV
ABD75323.1      GDGVYF----AATEKSNVIRG-------------WVFGSTFDNTTQ---------SAIIV
QDF43835.1      GDGIYF----AATEKSNVVRG-------------WIFGSTMDNTTQ---------SAIIV
ABD75332.1      GDGVYF----AATEKSNVIRG-------------WIFGSSFDNTTQ---------SAIIV
QDF43820.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AAZ67052.1      GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV
AFS88936.1      ANGFVVRIGAAANSTGTVIISPSTSATIRKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLL
YP_0010399      DNGFVVRIGAAANSTGTIVISPSVNTKIKKAYPAFILGSSLTNTSAGQ-PLYANYSLTII
                  :*. .    *:.. ..:: .             :::*::. . :          :  ::
QDF43825.1      NNSTNLVIRACNFELCDNPFFVVLRSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
AGZ48818.1      NNSTNLVIRACNFELCDNPFFVVLKSNNTQIPSY------IFNNAFN-CTFEYVSKDFNL
ALK02457.1      NNSTNVVIRACNFELCDNPFFAVSKPTGTQTHTM------IFDNAFN-CTFEYISDSFSL
AAS10463.1      NNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTM------IFDNAFN-CTFEYISDAFSL
AAP13441.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
AAP13567.1      NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM------IFDNAFN-CTFEYISDAFSL
QHD43416.1      NNATNVVIKVCEFQFCNDPFLGVYY--HKNNKSWMESEFRVYSSANN-CTFEYVSQPFLM
AVP78031.1      NNATNVIIKVCNFDFCYDP-YLSGY--YHNNKTWSIREFAVYSSYAN-CTFEYVSKSFML
ABD75323.1      NNSTHIIIRVCYFNLCKDPMYTVSA--GTQKSSW------VYQSAFN-CTYDRVEKSFQL
QDF43835.1      NNSTHIIIRVCYFNLCKEPMYAISN--EQHYKSW------VYQNAYN-CTYDRVEQSFQL
ABD75332.1      NNSTHIIIRVCNFNLCKEPMYTVSK--GTQQSSW------VYQSAFN-CTYDRVEKSFQL
QDF43820.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GTQQSSW------VYQSAFN-CTYDRVERSFQL
AAZ67052.1      NNSTHIIIRVCNFNLCKEPMYTVSR--GAQQSSW------VYQSAFN-CTYDRVEKSFQL
AFS88936.1      PDGCGTLLRAFYCIL--EPRSGNHCPAGNSYTSF-----ATYHTPATDCSDGNYNRNASL
YP_0010399      PDGCGTVLHAFYCIL--KPRTVNRCPSGTGYVSY-----FIYETVHNDCQ-STINRNASL
                 :.   ::..    :  .*             :        : .  . *     .    :
QDF43825.1      DIGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
AGZ48818.1      DLGEKPGNFKDLREFVFRNKDG--------FLHVYSGYQPISAASGLPTGF--NALKPIF
ALK02457.1      DVAEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NILKPIF
AAS10463.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13441.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
AAP13567.1      DVSEKSGNFKHLREFVFKNKDG--------FLYVYKGYQPIDVVRDLPSGF--NTLKPIF
QHD43416.1      DLEGKQGNFKNLREFVFKNIDG--------YFKIYSKHTPINLVRDLPQGF--SALEPLV
AVP78031.1      NISGNGGLFNTLREFVFRNVDG--------HFKIYSKFTPVNLNRGLPTGL--SVLQPLV
ABD75323.1      DTSPKTGNFTDLREFVFKNRDG--------FFTAYQTYTPVNLLRGLPSGL--SVLKPIL
QDF43835.1      DTAPQTGNFKDLREYVFKNKDG--------FLSVYNAYSPIDIPRGLPVGF--SVLKPIL
ABD75332.1      DTAPKTGNFKDLREYVFKNKGG--------FLRVYQTYTAVNLPRGFPAGF--SVLRPIL
QDF43820.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AAZ67052.1      DTAPKTGNFKDLREYVFKNRDG--------FLSVYQTYTAVNLPRGLPIGF--SVLRPIL
AFS88936.1      NSFKE---YFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQFA
YP_0010399      NSFK---SFFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYGGNMFRFA
                :       :  * : .*    .         :   .    :    .   .:  . :  : 
QDF43825.1      KLPLGINITNFRTLLTAF------PPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AGZ48818.1      KLPLGINITNFRTLLTAF------PPRPDYWGTSAAAYFVGYLKPTTFMLKYDENGTITD
ALK02457.1      KLPLGINITNFRAILTAF------LPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITD
AAS10463.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13441.1      KLPLGINITNFRAILTAF------SPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITD
AAP13567.1      KLPLGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITD
QHD43416.1      DLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITD
AVP78031.1      ELPVSINITKFRTLLTIHRGD---PMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGTITD
ABD75323.1      KLPFGINITSFRVVMAMF------SKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVD
QDF43835.1      KLPIGINITSFKVVMSMF------SRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITD
ABD75332.1      KLPFGINITSYRVVMTMF------SQFNSNFLPESAAYYVGNLKYTTFMLSFNENGTITD
QDF43820.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITD
AAZ67052.1      KLPFGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITN
AFS88936.1      TLPVYDTIKYYSIIPHSIRSI---QSDRKAW----AAFYVYKLQPLTFLLDFSVDGYIRR
YP_0010399      TLPVYEGIKYYTVIPRSFRSK---ANKREAW----AAFYVYKLHQLTYLLDFSVDGYIRR
                 **.   *. :  :                :    **::*  *:  *::* :. :* *  
QDF43825.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AGZ48818.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
ALK02457.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTF
AAS10463.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13441.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
AAP13567.1      AVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKF
QHD43416.1      AVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRF
AVP78031.1      AVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRF
ABD75323.1      AVDCSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRF
QDF43835.1      AIDCAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRF
ABD75332.1      AVDCSQNPLAELKCTIKNFNVSKGIYQTSNFRVTPTQEVVRFPNITNRCPFDKVFNASRF
QDF43820.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRF
AAZ67052.1      AIDCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRF
AFS88936.1      AIDCGFNDLSQLHCSYESFDVESGVYSVSSFEAKPSGSVVEQAEGVE-CDFSPLLSGTP-
YP_0010399      AIDCGHDDLSQLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATE-CDFSPMLTGVA-
                *:**. : *:: :*:  .: :..*:*..*.: . .   .:  .: .: * *  ::..   
QDF43825.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AGZ48818.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
ALK02457.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAS10463.1      PSVYAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13441.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
AAP13567.1      PSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDV
QHD43416.1      ASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEV
AVP78031.1      PSVYAWERTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEV
ABD75323.1      PSVYAWERTKISDCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEV
QDF43835.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
ABD75332.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
QDF43820.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AAZ67052.1      PNVYAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEV
AFS88936.1      PQVYNFKRLVFTNCNYNLTKLLSLFSVNDFTCSQISPAAIASNCYSSLILDYFSYPLSMK
YP_0010399      PQVYNFKRLVFSNCNYNLTKLLSLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMK
                ..** ::*  :::*  : : : .   .. *.*  :*.  :   *::.:  * *    .  
QDF43825.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER
AGZ48818.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
ALK02457.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
AAS10463.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13441.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13567.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
QHD43416.1      RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFER
AVP78031.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER
ABD75323.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YRSHRSSKLKPFER
QDF43835.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSSRKTKLKPFER
ABD75332.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YRSYRKEKLKPFER
QDF43820.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YRSHRKTKLKPFER
AAZ67052.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSHRKTKLKPFER
AFS88936.1      SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399      SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
                  :  ..:* *. :*** .     * ::     :            *       *     
QDF43825.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPYRVVVLS
AGZ48818.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
ALK02457.1      DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLS
AAS10463.1      DISNVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPYRVVVLS
AAP13441.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
AAP13567.1      DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLS
QHD43416.1      DISTEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPYRVVVLS
AVP78031.1      DLSSDE---------------NGVR-----------TLSTYDFNPNVPLEYQATRVVVLS
ABD75323.1      DLSSEE---------------NGVR-----------TLSTYDFNQNVPLEYQATRVVVLS
QDF43835.1      DLTSDE---------------NGVR-----------TLSTYDFYPNVPIEYQATRVVVLS
ABD75332.1      DLSSDE---------------NGVY-----------TLSTYDFYPSIPVEYQATRVVVLS
QDF43820.1      DLSSDDG--------------NGVY-----------TLSTYDFNPNVPVAYQATRVVVLS
AAZ67052.1      DLSSDE---------------NGVR-----------TLSTYDFYPSVPVAYQATRVVVLS
AFS88936.1      EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMG
YP_0010399      DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMS
                ::                   .               :.  .              : :.
QDF43825.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AGZ48818.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
ALK02457.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAS10463.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13441.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
AAP13567.1      FELL----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQF
QHD43416.1      FELL----HAPATVC-----GPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQF
AVP78031.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
ABD75323.1      FELL----NAPATVC-----GPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQF
QDF43835.1      FELL----NAPATVC-----GPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQF
ABD75332.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLRGTGVLTTSSKRFQSFQQF
QDF43820.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQF
AAZ67052.1      FELL----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQF
AFS88936.1      FGITVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRF
YP_0010399      FIISVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGVKQQRF
                * :       . :**     . . .     .:**::.: *. * **:  ..      *.*
QDF43825.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AGZ48818.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
ALK02457.1      GRDVLD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI
AAS10463.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLI
AAP13441.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
AAP13567.1      GRDVSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI
QHD43416.1      GRDIAD-TTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI
AVP78031.1      GKDASD-FIDSVRDPQTLEILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTI
ABD75323.1      GRDASD-FTDSVRDPQTLRILDISPCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTI
QDF43835.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
ABD75332.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTSI
QDF43820.1      GRDTSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI
AAZ67052.1      GRDTSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAI
AFS88936.1      VYDAYQNLVGYYSDDGNYYCLR--ACVSVPVSVIY--DKETKTHATLFGSVACEHISSTM
YP_0010399      VYDSFDNLVGYYSDDGNYYCVR--PCVSVPVSVIY--DKSTNLHATLFGSVACEHVTTMM
                 *  :   .   *  .   :   .*    ****    : :   *.*: .* *  :.  :
QDF43825.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AGZ48818.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
ALK02457.1      --HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAS10463.1      --HAEQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSS
AAP13441.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
AAP13567.1      --HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSY---ECDIPIGAGICASYHTVSL
QHD43416.1      --HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY---ECDIPIGAGICASYQTQTN
AVP78031.1      --HADQLTPAWRIYATGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASI
ABD75323.1      --QADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVNASY---QCDIPIGAGICASYHTASH
QDF43835.1      --RADQLTPAWRVYSTGINVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
ABD75332.1      --HADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTASV
QDF43820.1      --RADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AAZ67052.1      --HADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVNASY---ECDIPIGAGICASYHTAST
AFS88936.1      SQYSRSTRSMLKRRDSTYGPLQTPVGCVLGL--VNSSLFVEDCKLPLGQSLCALPDTPST
YP_0010399      S-QFSRLTQSNLRRRDSNIPLQTAVGCVIGLS--NNSLVVSDCKLPLGQSLCAV-PPVST
                                    :** .**::*    : *    :*.:*:* .:**   . : 
QDF43825.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AGZ48818.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
ALK02457.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAS10463.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13441.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
AAP13567.1      ----LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVM
QHD43416.1      SPRRARSVA----SQSI--------IAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEIL
AVP78031.1      ----LRSTS----QKAI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75323.1      ----LRSTG----QKSI--------VAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVM
QDF43835.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
ABD75332.1      ----LRSTG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
QDF43820.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AAZ67052.1      ----LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
AFS88936.1      ----LTPRS----VRSVPGEMRLASIAFNHPIQVDQ-LNSSYFKLSIPTNFSFGVTQEYI
YP_0010399      ----FRSYSASQFQLAV--------LNYTSPIVV-TPINSSGFTAAIPTNFSFSVTQEYI
                      . .      ::        : :. .: .   :  :  . :*****::.:* * :
QDF43825.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AGZ48818.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
ALK02457.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
AAS10463.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQ
AAP13441.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
AAP13567.1      PVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQ
QHD43416.1      PVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQ
AVP78031.1      PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
ABD75323.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQ
QDF43835.1      PVSMSKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
ABD75332.1      PVSIAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQ
QDF43820.1      PVSMAKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQ
AAZ67052.1      PVSMAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQ
AFS88936.1      QTTIQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANLRQDDSVRNLFASVKS
YP_0010399      ETSIQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLRQDESVYSLYSNIKT
                 .:: *.:***. *:*..   * :** :**.** ::*.** *    ** .. .:: .:* 
QDF43825.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AGZ48818.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ALK02457.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAS10463.1      MYKTPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13441.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAP13567.1      MYKTPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QHD43416.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
AVP78031.1      IYKTPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL-
ABD75323.1      MYKTPTIRD-FGG-FNFSQILPDPLKPTKRSF---IEDLLYNKVTLADAGFMKQYADCL-
QDF43835.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
ABD75332.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
QDF43820.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AAZ67052.1      MYKTPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL-
AFS88936.1      SQSSPIIPG-FGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYDDCMQ
YP_0010399      T-STQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDDCMK
                  .:  :   :.* **:: :        . *    *****::***:**.*::: * :*: 
QDF43825.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AGZ48818.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ALK02457.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAS10463.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13441.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAP13567.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
QHD43416.1      -GDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM
AVP78031.1      -GGISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALISGTATAGWTFGAGAALQIPFAM
ABD75323.1      -GGINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALISGTATAGWTFGAGAALQIPFAM
QDF43835.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
ABD75332.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
QDF43820.1      -GDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAM
AAZ67052.1      -GDISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAM
AFS88936.1      QGPASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQ
YP_0010399      QGPQSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQ
                 *   ******** . * .*****   :* * **::*:..    *** * .:   **** 
QDF43825.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AGZ48818.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ALK02457.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAS10463.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13441.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAP13567.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QHD43416.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
AVP78031.1      QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQESLTSTASALGKLQDVVNQNAQALN
ABD75323.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43835.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
ABD75332.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
QDF43820.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AAZ67052.1      QMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALN
AFS88936.1      SIFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALS
YP_0010399      SMFYRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALS
                .: **:**:*:**:** **** ***:**.*:  :* .::::  *: *:**.** *****. 
QDF43825.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AGZ48818.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ALK02457.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAS10463.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13441.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAP13567.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QHD43416.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AVP78031.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75323.1      TLVKQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43835.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
ABD75332.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
QDF43820.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AAZ67052.1      TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
AFS88936.1      KLASELSNTFGAISASIGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAAL
YP_0010399      KLASELSNTFGAISSSISDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAAR
                .*..:**..*****: :.**: *** :* :.******.*** :*:::*:***:*:     
QDF43825.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AGZ48818.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
ALK02457.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAS10463.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13441.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AAP13567.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QHD43416.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA
AVP78031.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTTAPA
ABD75323.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPA
QDF43835.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
ABD75332.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
QDF43820.1      SANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPA
AAZ67052.1      SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPA
AFS88936.1      SAQLAKDKVNECVKAQSKRSGFCGQGTHIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYG
YP_0010399      SAQLASDKVNECVKSQSKRNGFCGSGTHIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYG
                **:**  *:.*** .**** .*** * *::**   **:*. *:** * *::  :..:* .
QDF43825.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AGZ48818.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
ALK02457.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVV
AAS10463.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13441.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
AAP13567.1      ICHEGK---AYFPREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVV
QHD43416.1      ICHDGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPQIITTDNT-FVSGNCDVV
AVP78031.1      ICHEGK---AHFPREGVFVSNGTH-------WFVTQRNFYEPKIITTDNT-FVSGNCDVV
ABD75323.1      ICHEGK---AYFPREGVFVSNGSS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
QDF43835.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
ABD75332.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
QDF43820.1      ICHEGK---AYFPREGVFVSNGTF-------WFITQRNFYSPQIITTDNT-FVAGNCDVV
AAZ67052.1      ICHEGK---AYFPREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVV
AFS88936.1      LCDAANPTNCIAPVNGYFIKTNNT--RIVDEWSYTGSSFYAPEPITSLNTKYVA--PQVT
YP_0010399      LCNNNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGSSFYKPEPITQANSRYVS--SDVK
                :*   :   .  * :* *: . .        *  *  .*: *: **  *: :*:   :* 
QDF43825.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AGZ48818.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEINRL
ALK02457.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAS10463.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRL
AAP13441.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAP13567.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QHD43416.1      IGIVNNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AVP78031.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRL
ABD75323.1      IGIINNTVYDPL---QPELDSFKQELDKYFKNHTSPDVDLGDISGINASVVDIQKEIDRL
QDF43835.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
ABD75332.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
QDF43820.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AAZ67052.1      IGIINNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL
AFS88936.1      YQNISTNLPPPLLGNSTGID-FQDELDEFFKNVSTSIPNFGSLTQINTTLLDLTYEMLSL
YP_0010399      FDKLENNLPPPLLENSTDVD-FKDELEEFFKNVTSHGPNFAEISKINTTLLDLSDEMAML
                   :...:  **   .. :* *::**:::*** ::   ::..:: **::::::  *:  * 
QDF43825.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AGZ48818.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ALK02457.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAS10463.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13441.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AAP13567.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QHD43416.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKG
AVP78031.1      NEVARNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
ABD75323.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLVGLFMAIILLCYFTSCCSCCKG
QDF43835.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
ABD75332.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
QDF43820.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKG
AAZ67052.1      NEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKG
AFS88936.1      QQVVKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMG
YP_0010399      QEVVKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLG
                ::*.. **:* ***:***:*  * *****:********:.: :  :::   *.* :   *
QDF43825.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AGZ48818.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ALK02457.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAS10463.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13441.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAP13567.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QHD43416.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
AVP78031.1      CCSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75323.1      MCSCGSCC-RFDEDDSEPVLKGVKLHYT
QDF43835.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
ABD75332.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
QDF43820.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AAZ67052.1      ACSCGSCC-KFDEDDSEPVLKGVKLHYT
AFS88936.1      KLKCNRCCDRYEEYDLEP----HKVHVH
YP_0010399      KMKCKNCCDSYEEYDVE------KIHVH
                  .*  **  ::* * *      *:*

Table 2: Highlighted amino acids for binding. The RBD spike residues are bolded, and the 5 critical residues are in parentheses.

QDF43825.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYR(S)LRHGKLRPFER
AGZ48818.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYR(S)LRHGKLRPFER
ALK02457.1      RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYR(S)LRHGKLRPFER
AAS10463.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYR(Y)LRHGKLRPFER
AAP13441.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYR(Y)LRHGKLRPFER
AAP13567.1      RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYR(Y)LRHGKLRPFER
QHD43416.1      RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYR(L)FRKSNLKPFER
AVP78031.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YR(S)HRSTKLKPFER
ABD75323.1      RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YR(S)HRSSKLKPFER
QDF43835.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YR(S)SRKTKLKPFER
ABD75332.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YR(S)YRKEKLKPFER
QDF43820.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YR(S)HRKTKLKPFER
AAZ67052.1      RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YR(S)HRKTKLKPFER
AFS88936.1      SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399      SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
                  :  ..:* *. :*** .     * ::     :            *       *     
QDF43825.1      DISNVPFSPDGKPCTPP-A(F)-NCYW-----------PL(N)(D)YGFFTT(N)GIGYQPYRVVVLS
AGZ48818.1      DISNVPFSPDGKPCTPP-A(F)-NCYW-----------PL(N)(D)YGFYIT(N)GIGYQPYRVVVLS
ALK02457.1      DISNVPFSPDGKPCTPP-A(F)-NCYW-----------PL(N)(D)YGFYIT(N)GIGYQPYRVVVLS
AAS10463.1      DISNVPFSPDGKPCTPP-A(P)-NCYW-----------PL(N)(G)YGFYTT(S)GIGYQPYRVVVLS
AAP13441.1      DISNVPFSPDGKPCTPP-A(L)-NCYW-----------PL(N)(D)YGFYTT(T)GIGYQPYRVVVLS
AAP13567.1      DISNVPFSPDGKPCTPP-A(L)-NCYW-----------PL(N)(D)YGFYTT(T)GIGYQPYRVVVLS
QHD43416.1      DISTEIYQAGSTPCNGVEG(F)-NCYF-----------PL(Q)(S)YGFQPT(N)GVGYQPYRVVVLS
AVP78031.1      DLSSDE---------------NGVR-----------TL(S)(T)YDFNPN(V)PLEYQATRVVVLS
ABD75323.1      DLSSEE---------------NGVR-----------TL(S)(T)YDFNQN(V)PLEYQATRVVVLS
QDF43835.1      DLTSDE---------------NGVR-----------TL(S)(T)YDFYPN(V)PIEYQATRVVVLS
ABD75332.1      DLSSDE---------------NGVY-----------TL(S)(T)YDFYPS(I)PVEYQATRVVVLS
QDF43820.1      DLSSDDG--------------NGVY-----------TL(S)(T)YDFNPN(V)PVAYQATRVVVLS
AAZ67052.1      DLSSDE---------------NGVR-----------TL(S)(T)YDFYPS(V)PVAYQATRVVVLS
AFS88936.1      EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMG
YP_0010399      DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMS
                    ::                   .               :.  .              : :.


4. Figure 3: Class phylogenetic tree results for coronavirus S proteins, with Human betacoronavirus 2c and Tylonycteris bat coronavirus as outgroups.
Nathan Week4Phylogenetictree.png

5. Comparison of the class data sequences (Table 1) and the class phylogenetic tree (Fig. 1).

  • The class phylogenetic tree matches the class sequence alignment. This can first be seen with the outgroups, Human betacoronavirus 2c (seq. AFS88936.1) and Tylonycteris bat coronavirus (seq. YP_0010399). All of the sequences share some similarity with one another. However, the outgroups differ the most: they have gaps at positions where the other sequences align with one another, and they have residues at positions where the other sequences have gaps. In the phylogenetic tree, apart from the outgroups, the sequences fall into two major branches. In the alignment, these two branches differ from each other in some regions that are conserved within each branch. As with the outgroups, the two major branches have some aligned regions with gaps where the other branch does not have a gap. Another big difference is that the binding amino acids are similar within the top branch and differ from the ones found in the bottom branch. For example, in the top branch the 3rd and 4th critical amino acids were N and D, N and G, or Q and S, which are similar to the ones in the Wan et al. 2020 paper. In the bottom branch, however, the 3rd and 4th critical amino acids were S and T (Tables 1 and 2 and Figure 1).
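
The gap pattern described above can be checked with a short script. The following is a minimal Python sketch, not part of the original analysis, that uses only the last alignment block for one ingroup sequence and the two outgroups. The helper names `gap_columns` and `gaps_only_in` are made up for illustration, and the column numbers are 0-based positions within this one block.

```python
# Last alignment block for one ingroup sequence and the two outgroups,
# copied from the class alignment above.
ALIGNED = {
    "QDF43825.1": "ACSCGSCC-KFDEDDSEPVLKGVKLHYT",
    "AFS88936.1": "KLKCNRCCDRYEEYDLEP----HKVHVH",
    "YP_0010399": "KMKCKNCCDSYEEYDVE------KIHVH",
}
INGROUP = ["QDF43825.1"]
OUTGROUP = ["AFS88936.1", "YP_0010399"]


def gap_columns(sequence):
    """Return the set of 0-based alignment columns that hold a gap."""
    return {i for i, residue in enumerate(sequence) if residue == "-"}


def gaps_only_in(group, other):
    """Columns where every sequence in `group` has a gap and none in `other` does."""
    shared = set.intersection(*(gap_columns(ALIGNED[name]) for name in group))
    elsewhere = set.union(*(gap_columns(ALIGNED[name]) for name in other))
    return sorted(shared - elsewhere)


print("gap in ingroup only:", gaps_only_in(INGROUP, OUTGROUP))
print("gap in both outgroups only:", gaps_only_in(OUTGROUP, INGROUP))
```

In this block the ingroup sequence has one gap column where both outgroups have a residue, and the outgroups share four gap columns where the ingroup has residues. The same check could be run on the other blocks by adding their rows to `ALIGNED`.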

6. Comparison of the Table 1 sequence alignment to the Figure 3 alignment from the Wan et al. (2020) paper.

  • The two alignments gave very similar results for the RBD residues. The top branch was more similar to the Wan et al. RBD sequences and critical amino acids than the bottom branch was. None of the sequences was identical to a spike protein sequence in the Wan et al. alignment. The 5 critical amino acids were exactly the same for some sequences and very different for others. At the first critical residue, all sequences had Y, S, or L, the same as in the Wan et al. 2020 sequences. All the bottom branch sequences had S, whereas the top branch sequences had S in some sequences and Y or L in the rest. At the second critical residue, the top branch had F, L, or P, while the Wan et al. sequences had either F or L. The one sequence with P may simply carry a single mutation at that position. The bottom branch, however, did not show a second critical residue at all and had a gap instead. This means either that these sequences have the amino acid somewhere else or that they have only 4 critical binding residues. In the top branch, the third critical residue was N or Q and the fourth was D, G, or S. The third is consistent with the Wan et al. paper, where it was also N or Q. The fourth, on the other hand, was G in one sequence. This was the same sequence that had P at the second critical residue, showing that it carries mutations at more than one of these positions. The bottom branch was more consistent, with S at the third critical residue and T at the fourth. This combination was not found in any of the Wan et al. 2020 sequences. Finally, at the fifth critical residue, both the top branch sequences and the Wan et al. sequences had T, S, or N. The bottom branch, on the other hand, had V or I. The five critical residues and the RBD sequences show that the top branch is more closely related to the Wan et al. SARS sequences than the bottom branch is (Table 2 and Wan et al. 2020).
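
The residue-by-residue comparison above can be tabulated with a few lines of Python. This is only a sketch under two assumptions: the five letters per sequence were read by hand from the parentheses in Table 2, and the "top" and "bottom" groups are the first seven and last six rows of that table, which is how the branches are described in this journal. The names `CRITICAL` and `residues_by_position` are made up for illustration.

```python
# Five critical residues per sequence, read off the parentheses in Table 2.
# "-" marks a position where the sequence shows a gap instead of a residue.
CRITICAL = {
    "QDF43825.1": "SFNDN",
    "AGZ48818.1": "SFNDN",
    "ALK02457.1": "SFNDN",
    "AAS10463.1": "YPNGS",
    "AAP13441.1": "YLNDT",
    "AAP13567.1": "YLNDT",
    "QHD43416.1": "LFQSN",
    "AVP78031.1": "S-STV",
    "ABD75323.1": "S-STV",
    "QDF43835.1": "S-STV",
    "ABD75332.1": "S-STI",
    "QDF43820.1": "S-STV",
    "AAZ67052.1": "S-STV",
}
# Assumed grouping: first seven rows of Table 2 = top branch, last six = bottom.
TOP_BRANCH = list(CRITICAL)[:7]
BOTTOM_BRANCH = list(CRITICAL)[7:]


def residues_by_position(accessions):
    """For each of the 5 positions, list the distinct residues seen in the group."""
    return [sorted({CRITICAL[acc][i] for acc in accessions}) for i in range(5)]


for label, group in (("top", TOP_BRANCH), ("bottom", BOTTOM_BRANCH)):
    print(label, residues_by_position(group))
```

Each printed list holds the distinct residues found at one critical position within a branch, which makes it easy to see which positions vary within the top branch and which are fixed in the bottom branch.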

7. Comparison of the Figure 1 phylogenetic tree with the Figure 2 phylogenetic tree from the Wan et al. (2020) paper.

  • The two trees are similar in that both have two branches: the spike protein RBDs on the top branch of the paper's tree were also on the top branch of the recreated tree, and the spike protein RBD sequences on the bottom branch did not change either. One difference was that the outgroup was changed from "BtSCoV PDF2386" to the HKU4 bat spike protein and MERS-CoV. Another difference was that only about half of the sequences from the Wan et al. 2020 figure four were used, and 3 others were added. The added ones were the spike proteins with the GenBank accession numbers QHD43416.1, AAP13441.1, and QDF43825.1. These changes altered the splits that were present in the paper. Because the half that was chosen was every other sequence, every split changed, as the mutation events separating the remaining sequences were not the same (Wan et al. 2020).
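
One rough way to see why sequences land on the same or different branches is to count identical residues between pairs of aligned sequences. The sketch below is an illustration only: it uses just the last 28-column block of the class alignment, and the function name `identity` is made up. The actual tree came from Phylogeny.fr using the full alignment and its own methods, so these counts are not expected to reproduce it.

```python
# Last alignment block for three sequences, copied from the class alignment.
QDF43825 = "ACSCGSCC-KFDEDDSEPVLKGVKLHYT"
QHD43416 = "CCSCGSCC-KFDEDDSEPVLKGVKLHYT"
AFS88936 = "KLKCNRCCDRYEEYDLEP----HKVHVH"  # outgroup


def identity(a, b):
    """Return (matches, compared) over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return matches, compared


for name, seq in (("QHD43416.1", QHD43416), ("AFS88936.1", AFS88936)):
    matches, compared = identity(QDF43825, seq)
    print(f"QDF43825.1 vs {name}: {matches}/{compared} identical")
```

In this block, QDF43825.1 matches QHD43416.1 at 26 of 27 compared columns but matches the outgroup AFS88936.1 at only 9 of 23, which is consistent with the outgroup sitting far from the other sequences in the tree.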

8. Were the results reproducible?

  • The results were not easily reproducible. The first reason is that their outgroup, "BtSCov PDF 2386", was not found in GenBank. This means that the sequence is either not publicly accessible or is listed under a different accession number. The other reason is that they did not state how they identified the amino acids that bind ACE-2; they only highlighted them. This made it difficult to find the same amino acids in other viruses with other spike proteins. Likewise, they did not show how to find the RBD sequences and only highlighted them.
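
One possible way to carry a highlighted residue from one sequence to another is to go through the alignment column. This is an assumption about how it could be done, not the method used in the paper. The sketch below uses the first block of Table 2 for two sequences with the parentheses removed. The function names are made up, and the residue numbers are counted from the start of this block only, not from the start of the full spike protein.

```python
# First block of Table 2 with the parentheses stripped.
REFERENCE = "RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER"  # QDF43825.1
OTHER = "RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER"  # AVP78031.1


def column_of(aligned, residue_number):
    """0-based alignment column of the residue_number-th (1-based) non-gap residue."""
    seen = 0
    for column, residue in enumerate(aligned):
        if residue != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue_number exceeds sequence length")


def residue_number_at(aligned, column):
    """1-based count of non-gap residues up to and including `column`."""
    if aligned[column] == "-":
        raise ValueError("column holds a gap")
    return len(aligned[: column + 1].replace("-", ""))


# The first critical residue of QDF43825.1 is the 48th residue of this block.
column = column_of(REFERENCE, 48)
print(column, REFERENCE[column], OTHER[column], residue_number_at(OTHER, column))
```

Here the 48th residue of the QDF43825.1 block (an S) falls in the same column as the 43rd residue of the AVP78031.1 block (also an S). The numbers differ because AVP78031.1 has more gaps earlier in the block.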

Scientific Conclusion

  • Understanding how to gather DNA and RNA sequences and create sequence alignments and phylogenetic trees is crucial for all biologists. Comparing sequences allows the biologist to better understand mechanisms of binding, common lineage, protein folding, and more. This allows us to learn not only about other organisms but also about humans. In this lab, the specific RBD critical residues that were studied show where a virus's spike protein binds to the human ACE-2 receptor. Without learning how to find sequences and align them to other similar sequences, it would be harder to find these relationships. Viruses mutate and form new viruses with similar sequences. Through studying phylogenetic trees, we can see that many viruses of the same genus have common ancestors and share amino acid sequences. This was also seen by comparing the amino acid sequences and finding similarities. This lab shows that, with an understanding of databases and tree-building tools, anyone can study DNA, RNA, or amino acid similarities from anywhere.

Acknowledgments

  1. Referenced and copied OpenWetWare syntax from the BIOL368/F20 week 1 page.
  2. Referenced and copied questions, spike protein sequences, and methods from the BIOL368/F20 week 4 page.
  3. Referenced MediaWiki for image formatting syntax.
  4. Asked Dr. Dahlquist about sequence alignment interpretation and phylogenetic tree interpretation during office hours.
  5. Asked Dr. Dahlquist about locating critical amino acid residue through email.
  6. Referenced the article "Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus" for Figures 2 and 3.
  7. Referenced GenBank for the SARS Coronavirus Urbani DNA sequences and information regarding Coronavirus BtRs-BetaCoV/YN2018A.
  8. Created a sequence alignment and phylogenetic tree using https://www.phylogeny.fr.
  9. Worked with my partner Macie Duran on how to create the sequence alignment and phylogenetic tree during class.

References

  1. OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved September 22, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
  2. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved September 29, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
  3. Wan, Y., et al. (2020). Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. Journal of Virology, 94(7). Retrieved from https://doi.org/10.1128/JVI.00127-20
  4. GenBank. (2019). Coronavirus BtRs-BetaCoV/YN2018A, complete genome. Retrieved September 24, 2020, from https://www.ncbi.nlm.nih.gov/nuccore/MK211375.
  5. GenBank. (2016). SARS coronavirus Urbani, complete genome. Retrieved September 24, 2020, from https://www.ncbi.nlm.nih.gov/nuccore/AY278741.1
  6. Phylogeny.fr. (2020). Alignment results. Méthodes et Algorithmes pour la Bio-informatique, LIRMM. Retrieved September 24, 2020, from http://www.phylogeny.fr/simple_phylogeny.cgi?workflow_id=c52fa7aed876bb95fafa812ccb1c8f9a&tab_index=3
  7. Phylogeny.fr. (2020). Tree Rendering. Méthodes et Algorithmes pour la Bio-informatique, LIRMM. Retrieved September 24, 2020, from http://www.phylogeny.fr/simple_phylogeny.cgi?workflow_id=c52fa7aed876bb95fafa812ccb1c8f9a&tab_index=6

"Except for what is noted above, this individual journal entry was completed by me and not copied from another source" Nathan R. Beshai (talk) 16:39, 29 September 2020 (PDT)