Kam Taghizadeh Week 4
Purpose
- This week's assignment teaches how to access particular genomic sequences and compare them to other sequences using phylogenetic trees, in order to infer their common ancestry. By learning these skills, we can better analyze viral strains through their similarities and differences.
Methods and Results
Part 1: Access GenBank Records
- I chose the severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 from the GenBank records listed in the Data & Resources section of the BIOL/F20 Week 4 page and viewed both the full record and the FASTA-formatted sequence.
- The accession number was MN908947.
- I interpreted the information provided on GenBank regarding this particular genome sequence:
- Definition: Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome
- Organism: Severe acute respiratory syndrome coronavirus 2
- Title: A new coronavirus associated with human respiratory disease in China
- Source: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
- Full sequence of the genome
- I downloaded the nucleotide sequence in FASTA format to my local hard drive.
- I clicked the send-to link in the upper right side of the page, then selected Complete Record, File as the destination, and FASTA as the format. I clicked the File button, noted where the file was saved, and gave it a name that would make it easy to find later.
- I opened the saved file with a word processor to confirm that the sequence was there and was in FASTA format. In this format, each sequence begins with a header line that starts with a greater-than sign (>).
- After gaining a good understanding of how to navigate GenBank with the sequence chosen from the Data & Resources section of the BIOL/F20 Week 4 page, I searched for my assigned viral sequence, Bat SARS-like coronavirus isolate bat-SL-CoVZC45.
- I added a hyperlink to the viral genome sequence in the Data & Tools section of the Week 4 Assignment.
- This record contains the entire genome sequence of the Bat SARS-like coronavirus isolate, listed under ORIGIN.
- I then searched for the spike protein of the bat-SL-CoVZC45 sequence in the GenBank record.
- I then added a hyperlink to it in the list of sequences in the Data & Tools section of the Week 4 Assignment.
- I downloaded the spike protein sequence in the FASTA format.
- Spike Protein [bat-SL-CoVZC45] Sequence:
- Spike protein sequence accessed from GenBank.
>AVP78031.1 spike protein [Bat SARS-like coronavirus]
MLFFLFLQFALVNSQCVNLTGRTPLNPNYTNSSQRGVYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLT
TNNAATKRTDNPILDFKDGIYFAATEHSNIIRGWIFGTTLDNTSQSLLIVNNATNVIIKVCNFDFCYDPY
LSGYYHNNKTWSIREFAVYSSYANCTFEYVSKSFMLNISGNGGLFNTLREFVFRNVDGHFKIYSKFTPVN
LNRGLPTGLSVLQPLVELPVSINITKFRTLLTIHRGDPMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGT
ITDAVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRFPSVYAWE
RTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQVAPGQTGVIADYNYK
LPDDFTGCVIAWNTAKQDVGNYFYRSHRSTKLKPFERDLSSDENGVRTLSTYDFNPNVPLEYQATRVVVL
SFELLNAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGKDASDFIDSVRDPQTLE
ILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTIHADQLTPAWRIYATGTNVFQTQAGCLIGA
EHVNASYECDIPIGAGICASYHTASILRSTSQKAIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQIYKTPPIKDF
GGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGGISARDLICAQKFNGLTVLPPLLTD
EMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQES
LTSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTY
VTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTT
APAICHEGKAHFPREGVFVSNGTHWFVTQRNFYEPKIITTDNTFVSGNCDVVIGIINNTVYDPLQPELDS
FKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRLNEVARNLNESLIDLQELGKYEQYIKWPWYVW
LGFIAGLIAIVMVTILLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT
- This protein sequence can also be found in the Week 4 Talk Page for this assignment.
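As a quick check on a downloaded file, the FASTA layout described above (a header line starting with ">" followed by one or more sequence lines) can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_fasta` and the two toy records are hypothetical, made up for illustration, and are not part of the assignment or of GenBank.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined so line wrapping does not matter.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical toy input, not real GenBank data.
example = """>toy_record_1 hypothetical example
MLFFLF
LQFALV
>toy_record_2 hypothetical example
MFVFLV
"""

records = parse_fasta(example)
for header, sequence in records:
    # Print the identifier (first word of the header) and the sequence length.
    print(header.split()[0], len(sequence))
```

Running the same function on the text of a saved FASTA file would show at a glance whether the header and the full sequence were downloaded as expected.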
Part 2: Creating a Phylogenetic Tree using Phylogeny.fr
- I went to www.phylogeny.fr, scrolled down the page to the section called ‘Phylogeny analysis’, and clicked on the text ‘One Click’.
- I clicked in the text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’.
- In order to create a phylogenetic tree, I copied the list of sequences from the BIOL/F20 Week 4 talk page and pasted them in the text field using Ctrl-V, then clicked submit.
- Once alignment was completed, I saw a page with Alignment results, a page with phylogeny results, and a page with Tree rendering results.
- I found the numbered tabs located below the text One Click Mode, and clicked the tab labeled 3. Alignment.
- In the alignment, the individual positions are color coded to display their conservation as follows:
- Blue highlighting = high conservation
- Gray highlighting = lower conservation
- White highlighting = little/no conservation
- I made an initial observation regarding the color coding.
- The beginning of the alignment showed very little conservation compared to the rest of the alignment.
- The end of the alignment showed much more conservation compared to the rest of the alignment.
- Towards the bottom of the page, I clicked Alignment in Clustal format under Outputs. This displayed the alignment in text-only format, where conservation is displayed as a symbol underneath the alignment block as such:
- “*” for invariant
- “:” for highly conserved
- “.” for weakly conserved
- "space" for not conserved
- This entire alignment is copied and pasted below, and was formatted properly by using the space character at the beginning of each line.
Table 1: Class sequence alignment. CLUSTAL FORMAT: MUSCLE (3.8) multiple sequence alignment
QDF43825.1 ---------MKLLVLV-----FATLVSSYTIEKCTDFD------DRTPPSNTQFLSSHRG ALK02457.1 ----------MFIFLF-----FLTLTSGSDLESCTTFD------DVQAPNYPQHSSSRRG AAS10463.1 ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG AAP13441.1 ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG AAP13567.1 ----------MFIFLL-----FLTLTSGSDLDRCTTFD------DVQAPNYTQHTSSMRG QHD43416.1 ----------MFVFLV-----LLPLVS----SQCVNLT------TRTQLPPAYTNSFTRG AVP78031.1 -----------MLFFL-----FLQFALVN--SQCVNLT------GRTPLNPNYTNSSQRG ABD75323.1 --------MKILIFAF-----LVTLVKAQ--EGCGVIN------LRTQPKLTQVSSSRRG QDF43835.1 --------MKVLIVLL-----CLGLVTAQ--DGCGHIS------TKPQPLLDKFSSSRRG QDF43820.1 --------MKILIFAF-----LVTLVEAQ--EGCGIIS------RKPQPKMAQVSSSRRG AAZ67052.1 --------MKILILAF-----LASLAKAQ--EGCGIIS------RKPQPKMAQVSSSRRG AFS88936.1 ----MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKA-DG YP_0010399 MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDG ::. . . * . *
QDF43825.1 VYYPDDIFRSNVLHLVQDHFLPFDSNVT--RFITFGLN--------------FDNPIIPF ALK02457.1 VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHR--------------FDNPVIPF AAS10463.1 VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FDDPVIPF AAP13441.1 VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FGNPVIPF AAP13567.1 VYYPDEIFRSDTLYLTQDLFLPFYSNVT--GFHTINHT--------------FDNPVIPF QHD43416.1 VYYPDKVFRSSVLHSTQDLFLPFFSNVT--WFHAIHVSGTNGTK-------RFDNPVLPF AVP78031.1 VYYPDTIYRSDTLVLSQGYFLPFYSNVS--WYYSLTTNNAATKR--------TDNPILDF ABD75323.1 VYYNDDIFRSDVLHLTQDYFLPFHSNLT--QYFSLNIESDKIVY--------FDNPILKF QDF43835.1 VYYNDDIFRSDVLHLTQDYFLPFDTNLT--RYLSFNMDSATKVY--------FDNPTLPF QDF43820.1 VYYNDDIFRSDVLHLTQDYFLPFDSNLT--QYFSLNVDSDRYTY--------FDNPILDF AAZ67052.1 VYYNDDIFRSNVLHLTQDYFLPFDSNLT--QYFSLNVDSDRFTY--------FDNPILDF AFS88936.1 IIYPQGRTYSNITITYQGLF-PYQGDHG--DMYVYSAGHATGTTPQKLFVANYSQDVKQF YP_0010399 IIYPLGRTYSNITLAYTGLF-PLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDF : * *. . * * : . *
QDF43825.1 RDGVYF----AATEKSNVIRG-------------WVFGSTMNNKSQ---------SVIIM ALK02457.1 KDGVYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII AAS10463.1 KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII AAP13441.1 KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII AAP13567.1 KDGIYF----AATEKSNVVRG-------------WVFGSTMNNKSQ---------SVIII QHD43416.1 NDGVYF----ASTEKSNIIRG-------------WIFGTTLDSKTQ---------SLLIV AVP78031.1 KDGIYF----AATEHSNIIRG-------------WIFGTTLDNTSQ---------SLLIV ABD75323.1 GDGVYF----AATEKSNVIRG-------------WVFGSTFDNTTQ---------SAIIV QDF43835.1 GDGIYF----AATEKSNVVRG-------------WIFGSTMDNTTQ---------SAIIV QDF43820.1 GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV AAZ67052.1 GDGVYF----AATEKSNVIRG-------------WIFGSTFDNTTQ---------SAVIV AFS88936.1 ANGFVVRIGAAANSTGTVIISPSTSATIRKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLL YP_0010399 DNGFVVRIGAAANSTGTIVISPSVNTKIKKAYPAFILGSSLTNTSAGQ-PLYANYSLTII :*. . *:.. ..:: . :::*::. . : : ::
QDF43825.1 NNSTNLVIRACNFELCDNPFFVVLRSNNTQIPSY----IFNNAFNCTFEYVSKDFNLDIG ALK02457.1 NNSTNVVIRACNFELCDNPFFAVSKPTGTQTHTM----IFDNAFNCTFEYISDSFSLDVA AAS10463.1 NNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTM----IFDNAFNCTFEYISDAFSLDVS AAP13441.1 NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM----IFDNAFNCTFEYISDAFSLDVS AAP13567.1 NNSTNVVIRACNFELCDNPFFAVSKPMGTQTHTM----IFDNAFNCTFEYISDAFSLDVS QHD43416.1 NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE AVP78031.1 NNATNVIIKVCNFDFCYDP-YLSGYYHNNKTWSIREFAVYSSYANCTFEYVSKSFMLNIS ABD75323.1 NNSTHIIIRVCYFNLCKDPMYTVSAGTQKSSW------VYQSAFNCTYDRVEKSFQLDTS QDF43835.1 NNSTHIIIRVCYFNLCKEPMYAISNEQHYKSW------VYQNAYNCTYDRVEQSFQLDTA QDF43820.1 NNSTHIIIRVCNFNLCKEPMYTVSRGTQQSSW------VYQSAFNCTYDRVERSFQLDTA AAZ67052.1 NNSTHIIIRVCNFNLCKEPMYTVSRGAQQSSW------VYQSAFNCTYDRVEKSFQLDTA AFS88936.1 PDGCGTLLRAFYCILEPRSGNHCPAGNSYTSFAT----YHTPATDCSDGNYNRNASLNSF YP_0010399 PDGCGTVLHAFYCILKPRTVNRCPSGTGYVSYFI----YETVHNDC-QSTINRNASLNSF :. ::.. : . :* . ::
QDF43825.1 EKPGNFKDLREFVFRN--------KDGFLHVYSGYQPISAASGLPTGF--NALKPIFKLP ALK02457.1 EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NILKPIFKLP AAS10463.1 EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP AAP13441.1 EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP AAP13567.1 EKSGNFKHLREFVFKN--------KDGFLYVYKGYQPIDVVRDLPSGF--NTLKPIFKLP QHD43416.1 GKQGNFKNLREFVFKN--------IDGYFKIYSKHTPINLVRDLPQGF--SALEPLVDLP AVP78031.1 GNGGLFNTLREFVFRN--------VDGHFKIYSKFTPVNLNRGLPTGL--SVLQPLVELP ABD75323.1 PKTGNFTDLREFVFKN--------RDGFFTAYQTYTPVNLLRGLPSGL--SVLKPILKLP QDF43835.1 PQTGNFKDLREYVFKN--------KDGFLSVYNAYSPIDIPRGLPVGF--SVLKPILKLP QDF43820.1 PKTGNFKDLREYVFKN--------RDGFLSVYQTYTAVNLPRGLPIGF--SVLRPILKLP AAZ67052.1 PKTGNFKDLREYVFKN--------RDGFLSVYQTYTAVNLPRGLPIGF--SVLRPILKLP AFS88936.1 KE---YFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQFATLP YP_0010399 KS---FFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYGGNMFRFATLP . : * : .* : . : . .: . : : **
QDF43825.1 LGINITNFRTLLTAF------PPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD ALK02457.1 LGINITNFRAILTAF------LPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITDAVD AAS10463.1 LGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD AAP13441.1 LGINITNFRAILTAF------SPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD AAP13567.1 LGINITNFRAILTAF------SPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVD QHD43416.1 IGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVD AVP78031.1 VSINITKFRTLLTIHRGD---PMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGTITDAVD ABD75323.1 FGINITSFRVVMAMF------SKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVDAVD QDF43835.1 IGINITSFKVVMSMF------SRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITDAID QDF43820.1 FGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITDAID AAZ67052.1 FGINITSYRVVMAMF------SQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITNAID AFS88936.1 VYDTIKYYSIIPHSIRSI---QSDRKAW----AAFYVYKLQPLTFLLDFSVDGYIRRAID YP_0010399 VYEGIKYYTVIPRSFRSK---ANKREAW----AAFYVYKLHQLTYLLDFSVDGYIRRAID . *. : : : **::* *: *::* :. :* * *:*
QDF43825.1 CSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSV ALK02457.1 CSQNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSV AAS10463.1 CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV AAP13441.1 CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV AAP13567.1 CSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSV QHD43416.1 CALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASV AVP78031.1 CALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRFPSV ABD75323.1 CSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRFPSV QDF43835.1 CAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRFPNV QDF43820.1 CAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRFPNV AAZ67052.1 CAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRFPNV AFS88936.1 CGFNDLSQLHCSYESFDVESGVYSVSSFEAKPSGSVVEQAEGVE-CDFSPLLSGTP-PQV YP_0010399 CGHDDLSQLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATE-CDFSPMLTGVA-PQV *. : *:: :*: .: :..*:*..*.: . . .: .: .: * * ::.. ..*
QDF43825.1 YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI ALK02457.1 YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI AAS10463.1 YAWERKRISNCVADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI AAP13441.1 YAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI AAP13567.1 YAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQI QHD43416.1 YAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQI AVP78031.1 YAWERTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQV ABD75323.1 YAWERTKISDCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQV QDF43835.1 YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV QDF43820.1 YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV AAZ67052.1 YAWERTKISDCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQV AFS88936.1 YNFKRLVFTNCNYNLTKLLSLFSVNDFTCSQISPAAIASNCYSSLILDYFSYPLSMKSDL YP_0010399 YNFKRLVFSNCNYNLTKLLSLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMKSYI * ::* :::* : : : . .. *.* :*. : *::.: * * . :
QDF43825.1 APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFERDIS ALK02457.1 APGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFERDIS AAS10463.1 APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS AAP13441.1 APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS AAP13567.1 APGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDIS QHD43416.1 APGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS AVP78031.1 APGQTGVIADYNYKLPDDFTGC-VIAWNTAKQDV---GNYF--YRSHRSTKLKPFERDLS ABD75323.1 APGQTGVIADYNYKLPDDFTGC-VIAWNTAKQDV---GSYF--YRSHRSSKLKPFERDLS QDF43835.1 APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDQ---GQYY--YRSSRKTKLKPFERDLT QDF43820.1 APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDT---GHYY--YRSHRKTKLKPFERDLS AAZ67052.1 APGETGVIADYNYKLPDDFTGC-VIAWNTAKQDQ---GQYY--YRSHRKTKLKPFERDLS AFS88936.1 SVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRTEVP YP_0010399 RPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQDVE ..:* *. :*** . * :: : * * ::
QDF43825.1 NVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPYRVVVLSFEL ALK02457.1 NVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPYRVVVLSFEL AAS10463.1 NVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPYRVVVLSFEL AAP13441.1 NVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLSFEL AAP13567.1 NVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPYRVVVLSFEL QHD43416.1 TEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPYRVVVLSFEL AVP78031.1 SDE---------------NGVR-----------TLSTYDFNPNVPLEYQATRVVVLSFEL ABD75323.1 SEE---------------NGVR-----------TLSTYDFNQNVPLEYQATRVVVLSFEL QDF43835.1 SDE---------------NGVR-----------TLSTYDFYPNVPIEYQATRVVVLSFEL QDF43820.1 SDDG--------------NGVY-----------TLSTYDFNPNVPVAYQATRVVVLSFEL AAZ67052.1 SDE---------------NGVR-----------TLSTYDFYPSVPVAYQATRVVVLSFEL AFS88936.1 QLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMTEQLQMGFGI YP_0010399 TPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMTDNLQMSFII . :. . : :.* :
QDF43825.1 L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD ALK02457.1 L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD AAS10463.1 L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD AAP13441.1 L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD AAP13567.1 L----NAPATVC-----GPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRD QHD43416.1 L----HAPATVC-----GPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRD AVP78031.1 L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGKD ABD75323.1 L----NAPATVC-----GPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQFGRD QDF43835.1 L----NAPATVC-----GPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQFGRD QDF43820.1 L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGRD AAZ67052.1 L----NAPATVC-----GPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQFGRD AFS88936.1 TVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRFVYD YP_0010399 SVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGVKQQRFVYD . :** . . . .:**::.: *. * **: .. *.* *
QDF43825.1 VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI--- ALK02457.1 VLD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAI--- AAS10463.1 VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLI--- AAP13441.1 VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI--- AAP13567.1 VSD-FTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAI--- QHD43416.1 IAD-TTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI--- AVP78031.1 ASD-FIDSVRDPQTLEILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTI--- ABD75323.1 ASD-FTDSVRDPQTLRILDISPCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTI--- QDF43835.1 TSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI--- QDF43820.1 TSD-FTDSVRDPQTLEILDITPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAI--- AAZ67052.1 TSD-FTDSVRDPQTLEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAI--- AFS88936.1 AYQNLVGYYSDDGNYYCLR--ACVSVPVSVIY--DKETKTHATLFGSVACEHISSTMSQY YP_0010399 SFDNLVGYYSDDGNYYCVR--PCVSVPVSVIY--DKSTNLHATLFGSVACEHVTTMM--- : . * . : .* **** : : *.*: .* * :. :
QDF43825.1 -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS--- ALK02457.1 -HADQLTPSWRVYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS--- AAS10463.1 -HAEQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSS--- AAP13441.1 -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSL--- AAP13567.1 -HADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVD-TSYECDIPIGAGICASYHTVSL--- QHD43416.1 -HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVN-NSYECDIPIGAGICASYQTQTNSPR AVP78031.1 -HADQLTPAWRIYATGTNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTASI--- ABD75323.1 -QADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVN-ASYQCDIPIGAGICASYHTASH--- QDF43835.1 -RADQLTPAWRVYSTGINVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST--- QDF43820.1 -RADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST--- AAZ67052.1 -HADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVN-ASYECDIPIGAGICASYHTAST--- AFS88936.1 SRSTRSMLKRRDSTYGP--LQTPVGCVLGLVNSSLFVEDCKLPLGQSLCALPDTPST--- YP_0010399 SQFSRLTQSNLRRRDSNIPLQTAVGCVIGLSNNSLVVSDCKLPLGQSLCAVPPV-ST--- . . . . :** .**::* : . :*.:*:* .:** . :
QDF43825.1 -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS ALK02457.1 -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS AAS10463.1 -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS AAP13441.1 -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS AAP13567.1 -LRSTS----QKSI--------VAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVS QHD43416.1 RARSVA----SQSI--------IAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVS AVP78031.1 -LRSTS----QKAI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS ABD75323.1 -LRSTG----QKSI--------VAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVMPVS QDF43835.1 -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS QDF43820.1 -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS AAZ67052.1 -LRSVG----QKSI--------VAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVS AFS88936.1 -LTPRS----VRSVPGEMRLASIAFNHPIQVDQ-LNSSYFKLSIPTNFSFGVTQEYIQTT YP_0010399 -FRSYSASQFQLAV--------LNYTSPIVV-TPINSSGFTAAIPTNFSFSVTQEYIETS . . :: : :. .: . : : . :*****::.:* * : .:
QDF43825.1 MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQMYK ALK02457.1 MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQMYK AAS10463.1 MAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQMYK AAP13441.1 MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYK AAP13567.1 MAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYK QHD43416.1 MTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYK AVP78031.1 MAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQIYK ABD75323.1 MAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQMYK QDF43835.1 MSKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQMYK QDF43820.1 MAKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQMYK AAZ67052.1 MAKTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQMYK AFS88936.1 IQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANLRQDDSVRNLFASVKSSQS YP_0010399 IQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLRQDESVYSLYSNIKTT-S : *.:***. *:*.. * :** :**.** ::*.** * ** .. .:: .:* .
QDF43825.1 TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD ALK02457.1 TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD AAS10463.1 TPTLKD-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD AAP13441.1 TPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD AAP13567.1 TPTLKY-FGG-FNFSQILPDPLKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD QHD43416.1 TPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL--GD AVP78031.1 TPPIKD-FGG-FNFSQILPDPSKPSKRSF---IEDLLFNKVTLADAGFIKQYGDCL--GG ABD75323.1 TPTIRD-FGG-FNFSQILPDPLKPTKRSF---IEDLLYNKVTLADAGFMKQYADCL--GG QDF43835.1 TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD QDF43820.1 TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD AAZ67052.1 TPAIKD-FGG-FNFSQILPDPSKPTKRSF---IEDLLFNKVTLADAGFMKQYGECL--GD AFS88936.1 SPIIPG-FGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYDDCMQQGP YP_0010399 TQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDDCMKQGP : : :.* **:: : . * *****::***:**.*::: * :*: *
QDF43825.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA ALK02457.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA AAS10463.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA AAP13441.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA AAP13567.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA QHD43416.1 IAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMA AVP78031.1 ISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMA ABD75323.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMA QDF43835.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA QDF43820.1 INARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMA AAZ67052.1 ISARDLICAQKFNGLTVLPPLLTDEMIAAYTAALVSGTATAGWTFGAGSALQIPFAMQMA AFS88936.1 ASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQSIF YP_0010399 QSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQSMF ******** . * .***** :* * **::*:.. *** * .: **** .:
QDF43825.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV ALK02457.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV AAS10463.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV AAP13441.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV AAP13567.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV QHD43416.1 YRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLV AVP78031.1 YRFNGIGVTQNVLYENQKLIANQFNSAIGKIQESLTSTASALGKLQDVVNQNAQALNTLV ABD75323.1 YRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTSTALGKLQDVVNQNAQALNTLV QDF43835.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV QDF43820.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV AAZ67052.1 YRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLV AFS88936.1 YRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALSKLA YP_0010399 YRLNGVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALSKLA **:**:*:**:** **** ***:**.*: :* .:::: *: *:**.** *****..*.
QDF43825.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN ALK02457.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AAS10463.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AAP13441.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AAP13567.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN QHD43416.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AVP78031.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN ABD75323.1 KQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN QDF43835.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN QDF43820.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AAZ67052.1 KQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASAN AFS88936.1 SELSNTFGAISASIGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAALSAQ YP_0010399 SELSNTFGAISSSISDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAARSAQ .:**..*****: :.**: *** :* :.******.*** :*:::*:***:*: **:
QDF43825.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH ALK02457.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH AAS10463.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH AAP13441.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH AAP13567.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH QHD43416.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICH AVP78031.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTTAPAICH ABD75323.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPAICH QDF43835.1 LAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAICH QDF43820.1 LAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAICH AAZ67052.1 LAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICH AFS88936.1 LAKDKVNECVKAQSKRSGFCGQGTHIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYGLCD YP_0010399 LASDKVNECVKSQSKRNGFCGSGTHIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYGLCN ** *:.*** .**** .*** * *::** **:*. *:** * *:: :..:* .:*
QDF43825.1 EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVVIGI ALK02457.1 EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGSCDVVIGI AAS10463.1 EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI AAP13441.1 EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI AAP13567.1 EGKAYF---PREGVFVFNGTS-------WFITQRNFFSPQIITTDNT-FVSGNCDVVIGI QHD43416.1 DGKAHF---PREGVFVSNGTH-------WFVTQRNFYEPQIITTDNT-FVSGNCDVVIGI AVP78031.1 EGKAHF---PREGVFVSNGTH-------WFVTQRNFYEPKIITTDNT-FVSGNCDVVIGI ABD75323.1 EGKAYF---PREGVFVSNGSS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI QDF43835.1 EGKAYF---PREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI QDF43820.1 EGKAYF---PREGVFVSNGTF-------WFITQRNFYSPQIITTDNT-FVAGNCDVVIGI AAZ67052.1 EGKAYF---PREGVFVSNGTS-------WFITQRNFYSPQIITTDNT-FVAGSCDVVIGI AFS88936.1 AANPTNCIAPVNGYFIKTNNT--RIVDEWSYTGSSFYAPEPITSLNTKYVA--PQVTYQN YP_0010399 NNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGSSFYKPEPITQANSRYVS--SDVKFDK :. * :* *: . . * * .*: *: ** *: :*: :*
QDF43825.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV ALK02457.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV AAS10463.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRLNEV AAP13441.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV AAP13567.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV QHD43416.1 VNNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV AVP78031.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRLNEV ABD75323.1 INNTVYDPL---QPELDSFKQELDKYFKNHTSPDVDLGDISGINASVVDIQKEIDRLNEV QDF43835.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV QDF43820.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV AAZ67052.1 INNTVYDPL---QPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEV AFS88936.1 ISTNLPPPLLGNSTGID-FQDELDEFFKNVSTSIPNFGSLTQINTTLLDLTYEMLSLQQV YP_0010399 LENNLPPPLLENSTDVD-FKDELEEFFKNVTSHGPNFAEISKINTTLLDLSDEMAMLQEV :...: ** .. :* *::**:::*** :: ::..:: **:::::: *: *::*
QDF43825.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS ALK02457.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS AAS10463.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS AAP13441.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS AAP13567.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS QHD43416.1 AKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCS AVP78031.1 ARNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGCCS ABD75323.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLVGLFMAIILLCYFTSCCSCCKGMCS QDF43835.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKGACS QDF43820.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMATILLCCMTSCCSCLKGACS AAZ67052.1 AKNLNESLIDLQELGKYEQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACS AFS88936.1 VKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMGKLK YP_0010399 VKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLGKMK .. **:* ***:***:* * *****:********:.: : ::: *.* : * .
QDF43825.1 CGSCC-KFDEDDSEPVLKGVKLHYT ALK02457.1 CGSCC-KFDEDDSEPVLKGVKLHYT AAS10463.1 CGSCC-KFDEDDSEPVLKGVKLHYT AAP13441.1 CGSCC-KFDEDDSEPVLKGVKLHYT AAP13567.1 CGSCC-KFDEDDSEPVLKGVKLHYT QHD43416.1 CGSCC-KFDEDDSEPVLKGVKLHYT AVP78031.1 CGSCC-KFDEDDSEPVLKGVKLHYT ABD75323.1 CGSCC-RFDEDDSEPVLKGVKLHYT QDF43835.1 CGSCC-KFDEDDSEPVLKGVKLHYT QDF43820.1 CGSCC-KFDEDDSEPVLKGVKLHYT AAZ67052.1 CGSCC-KFDEDDSEPVLKGVKLHYT AFS88936.1 CNRCCDRYEEYDLEP----HKVHVH YP_0010399 CKNCCDSYEEYDVE------KIHVH * ** ::* * * *:*
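To illustrate how the “*” symbol in the conservation line relates to the aligned rows, the sketch below marks the columns that are identical in every sequence. It is a simplified illustration only: the function name `invariant_line` and the toy rows are hypothetical, and the “:” and “.” symbols are deliberately left out because they depend on the groups of similar residues that the alignment software uses.

```python
def invariant_line(rows):
    """Mark fully invariant alignment columns with '*', all others with ' '.

    rows: aligned sequences of equal length, with '-' marking gaps.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must be non-empty in number and equal in length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # A column is invariant when every row has the same residue and no gap.
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Hypothetical toy alignment, not taken from the class data.
toy_rows = ["MKV-", "MRV-", "MKVA"]
line = invariant_line(toy_rows)
print(repr(line))
print("invariant columns:", line.count("*"), "of", len(line))
```

Counting the “*” marks per block in this way gives a rough number to attach to the observation that some regions of the alignment are more conserved than others.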
- Next, I clicked the tab labeled 6. Tree Rendering, and I observed a phylogenetic tree of the submitted sequences. This tree has:
- Horizontal lines (branches) that represent individual evolutionary lineages
- Vertical lines (splits) that represent the divergence events at which one lineage split into two
- The vertical length of each split is there solely for visual clarity and has no biological meaning
- The left-most split is known as the root of the tree, and represents a hypothesis about the most recent common ancestor (MRCA) of the sequences within the tree.
- The length of each branch depicts the amount of change in the residue sequence occurring along that branch, read against the scale bar shown at the bottom of the tree.
- The scale bar will be a number between 0 and 1 and can be reinterpreted as a percent.
- For example, 0.05 would be 5%.
- The tree may also contain support values for each clade, shown in red on the branches and also expressed as a number between 0 and 1.
- In general, a higher support value indicates a higher statistical confidence in a particular clade.
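The arithmetic behind reading the scale bar can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the alignment length of 1200 positions is a made-up example value, and treating a branch length as the average change per aligned position is the usual convention rather than something confirmed for this particular tree.

```python
def as_percent(fraction):
    """Convert a value between 0 and 1 (scale bar or support value) to a percent."""
    if not 0 <= fraction <= 1:
        raise ValueError("expected a value between 0 and 1")
    return fraction * 100


def approximate_changes(branch_length, aligned_positions):
    """Rough number of changed positions implied by a branch length.

    Assumes the branch length is the average change per aligned position.
    """
    return branch_length * aligned_positions


scale_bar = 0.05          # example value from the text above
toy_positions = 1200      # hypothetical alignment length, for illustration only

print(round(as_percent(scale_bar), 2), "percent")
print(round(approximate_changes(scale_bar, toy_positions)), "changed positions (approx.)")
```

A branch twice as long as the scale bar would therefore correspond to roughly twice as much change along that lineage.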
Phylogenetic Tree: Comparison of the phylogenetic tree to the multiple sequence alignment:
- The class sequence alignment agrees with the phylogenetic tree shown, as the branch points and the distances between the sequences reflect how conserved or not conserved they are in the sequence alignment. For instance, the two outgroups, AFS88936.1 [Human betacoronavirus 2c EMC/2012] and YP_001039953.1 spike glycoprotein [Tylonycteris bat coronavirus HKU4], branch from the same point, and both show little conservation relative to the other spike protein sequences. The similarity between their sequences is to be expected, as they are sister taxa on the phylogenetic tree.
Comparison of the alignment to Figure 3 of the Wan et al. (2020) paper:
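One way to put a number on how similar two aligned sequences are, such as two sister taxa, is a pairwise percent identity. The sketch below is a minimal illustration: the function name `percent_identity`, the choice to skip columns where either sequence has a gap, and the toy rows are all assumptions made for this example and are not how Phylogeny.fr computes the tree.

```python
def percent_identity(row_a, row_b):
    """Percent of compared columns where two aligned rows have the same residue.

    Columns where either row has a gap ('-') are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no columns without gaps to compare")
    return 100 * identical / compared


# Hypothetical toy rows, not taken from the class alignment.
toy_a = "MKV-A"
toy_b = "MRVLA"
print(percent_identity(toy_a, toy_b))
```

Applied to pairs of rows from the class alignment, sequences that sit close together on the tree would be expected to score higher than pairs that include one of the outgroups.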
- The amino acids that were discussed in the Wan et al. (2020) paper:
QDF43825.1 RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFER
AGZ48818.1 RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
ALK02457.1 RQIAPGQTGVIADYNYKLPDDFTGC-VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFER
AAS10463.1 RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13441.1 RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
AAP13567.1 RQIAPGQTGVIADYNYKLPDDFMGC-VLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER
QHD43416.1 RQIAPGQTGKIADYNYKLPDDFTGC-VIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFER
AVP78031.1 RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGNYF--YRSHRSTKLKPFER
ABD75323.1 RQVAPGQTGVIADYNYKLPDDFTGC-VIAWNTAKQD---VGSYF--YRSHRSSKLKPFER
QDF43835.1 RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSSRKTKLKPFER
ABD75332.1 RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAQQD---QGQYY--YRSYRKEKLKPFER
QDF43820.1 RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---TGHYY--YRSHRKTKLKPFER
AAZ67052.1 RQVAPGETGVIADYNYKLPDDFTGC-VIAWNTAKQD---QGQYY--YRSHRKTKLKPFER
AFS88936.1 SDLSVSSAGPISQFNYKQSFSNPTC-LILATVPHNLTTITKPLKYSYINKCSRLLSDDRT
YP_0010399 SYIRPGSAGNIPLYNYKQSFANPTCRVMASVLANVTITKPHAYG--YIS-KCSRLTGANQ
: ..:* *. :*** . * :: : * *
QDF43825.1 DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFFTTNGIGYQPY
AGZ48818.1 DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPY
ALK02457.1 DISNVPFSPDGKPCTPP-AF-NCYW-----------PLNDYGFYITNGIGYQPY
AAS10463.1 DISNVPFSPDGKPCTPP-AP-NCYW-----------PLNGYGFYTTSGIGYQPY
AAP13441.1 DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPY
AAP13567.1 DISNVPFSPDGKPCTPP-AL-NCYW-----------PLNDYGFYTTTGIGYQPY
QHD43416.1 DISTEIYQAGSTPCNGVEGF-NCYF-----------PLQSYGFQPTNGVGYQPY
AVP78031.1 DLSSDE---------------NGVR-----------TLSTYDFNPNVPLEYQAT
ABD75323.1 DLSSEE---------------NGVR-----------TLSTYDFNQNVPLEYQAT
QDF43835.1 DLTSDE---------------NGVR-----------TLSTYDFYPNVPIEYQAT
ABD75332.1 DLSSDE---------------NGVY-----------TLSTYDFYPSIPVEYQAT
QDF43820.1 DLSSDDG--------------NGVY-----------TLSTYDFNPNVPVAYQAT
AAZ67052.1 DLSSDE---------------NGVR-----------TLSTYDFYPSVPVAYQAT
AFS88936.1 EVPQLVNANQYSPCVSI-VP-STVWEDGDYYRKQLSPLEGGGWLVASGSTVAMT
YP_0010399 DVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLIGVGTRVPMT
:: . :. .
- The amino acid residues differ in the bottom sequences of the alignment, indicating mutations relative to the sequences above them. The five critical amino acids also show mutations. The two alignments show different amino acids at some positions, and the class alignment contains many more sequences than the one in the paper. As a result, the alignment from the article appears more conserved than the class alignment, which contains many gaps. Furthermore, the article alignment shows many asterisks, which indicate invariant positions. These are less frequent in the class sequence alignment.
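The asterisks and gaps discussed above can be counted directly from aligned rows. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only fully invariant columns are marked (Clustal-style output also uses ":" and "." for similar residues, which this sketch ignores), and the rows are just the first 10 columns of three sequences from the second block of the class alignment.

```python
def conservation_line(rows: list) -> str:
    """Return '*' under every column where all rows share one residue.

    Columns containing a gap, or more than one residue, get a space.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        invariant = "-" not in column and len(set(column)) == 1
        marks.append("*" if invariant else " ")
    return "".join(marks)


def count_gap_columns(rows: list) -> int:
    """Number of alignment columns in which at least one row has a gap."""
    return sum(1 for column in zip(*rows) if "-" in column)


# First 10 columns of the second alignment block, three sequences only.
ROWS = [
    "DISNVPFSPD",  # QDF43825.1
    "DISTEIYQAG",  # QHD43416.1
    "DLSSDE----",  # AVP78031.1
]

if __name__ == "__main__":
    for row in ROWS:
        print(row)
    print(conservation_line(ROWS))
    print("columns with gaps:", count_gap_columns(ROWS))
```

Adding more divergent sequences to an alignment can only keep or reduce the number of asterisks, which is one reason a larger class alignment would be expected to look less conserved than a smaller one.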
Comparison of the phylogenetic tree to Figure 2 of the Wan et al. (2020) paper:
- The two trees are similar in that both have two main branch points. However, in the tree in Figure 2, one branch contains very similar sequences with little divergence, while the other branch shows a lot of divergence. In the class tree, the branches diverge less and the sequences are more similar to one another. Furthermore, many sequences in the Figure 2 tree are not included in the class tree, which makes the two difficult to compare.
Is enough information provided by Wan et al. (2020) in their paper for us to reproduce their analysis? Explain your answer.
- The analysis would be reproducible only by someone who knows which software was used and who can already produce sequence alignments and phylogenetic trees on their own. Otherwise, one would need a thorough explanation of the process used to generate the phylogeny.
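One way to make an analysis like this easier to repeat is to record the exact accession numbers together with how each sequence was retrieved. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint with its `db`, `id`, `rettype`, and `retmode` parameters; the function name is hypothetical, the accession list is only a sample of the sequences in this entry, and the code builds URLs without downloading anything.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str, database: str = "protein") -> str:
    """Build an efetch URL that requests one record as FASTA text."""
    query = urlencode(
        {
            "db": database,
            "id": accession,
            "rettype": "fasta",
            "retmode": "text",
        }
    )
    return f"{EFETCH_URL}?{query}"


# A few of the spike protein accessions used in the class alignment.
ACCESSIONS = ["QHD43416.1", "ABD75323.1", "AFS88936.1"]

if __name__ == "__main__":
    for accession in ACCESSIONS:
        print(accession, fasta_url(accession))
```

Listing the accession numbers and retrieval URLs alongside the name and settings of the alignment and tree-building software would give a reader most of what is needed to repeat the workflow.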
Scientific Conclusion
This week's assignment taught us how to generate phylogenetic trees and sequence alignments. It also taught us how to work with GenBank records. Furthermore, we were able to analyze sequences by observing their highlighted portions to determine whether they were conserved. This assignment furthers our understanding of viral strains and of how similar or different infectious strains can be.
Acknowledgements
- I consulted with my partner Fatima Alghanem over Zoom to discuss the figures and how to compare them to our tree.
- I referenced the Wan et al. (2020) paper, Receptor Recognition by the Novel Coronavirus from Wuhan, for Figures 2 and 3.
- I created the phylogenetic tree and created sequence alignments using Phylogeny.fr.
- I obtained sequences from GenBank.
- I crowdsourced sequences from the Week 4 Talk page.
- Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
Kam Taghizadeh (talk) 23:15, 1 October 2020 (PDT)
References
- Phylogeny.fr: "One Click" Mode. (2020). Retrieved 30 September 2020, from http://www.phylogeny.fr/simple_phylogeny.cgi?workflow_id=b9c0813cbbe9695d63cf7e31da5f026d&tab_index=1
- NCBI GenBank. (2020). Spike protein [Bat SARS CoV Rf1/2004] - Protein. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/protein/ABD75323.1?report=fasta
- NCBI GenBank. (2020). Bat SARS-like coronavirus Rs3367, complete genome - Nucleotide. Retrieved 30 September 2020, from https://www.ncbi.nlm.nih.gov/nuccore/556015127/
- OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4#Data_.26_Tools
- OpenWetWare. (2020). Talk:BIOL368/F20:Week 4. Retrieved 30 September 2020, from https://openwetware.org/wiki/Talk:BIOL368/F20:Week_4
- Wan, Y., Shang, J., Graham, R., Baric, R., & Li, F. (2020). Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. Journal of Virology, 94(7). doi: 10.1128/jvi.00127-20