IGEM:IMPERIAL/2006/project/parts/aiiasequence
Abstract
I am comparing the sequencing data for aiiA plus immunotag (i.e. after PCR) with the expected sequence using bioinformatics tools.
The sequencing data turned out to be unreliable, so no conclusions can be drawn from it.
Method
Using ClustalW, I compared the sequencing data with the expected sequence. I set the gap penalties to their maximum, which effectively makes the program slide the sequences against each other and pick the position with the most bases aligned.
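The "slide and count" idea can be illustrated with a short Python sketch. This is not how ClustalW works internally; it is only a minimal stand-in for what a gap-free alignment amounts to. The function name `best_ungapped_offset` and the toy sequences are made up for illustration, and treating N as never matching is an assumption of this sketch.

```python
def best_ungapped_offset(ref, read):
    """Slide `ref` along `read` without gaps.

    Returns (offset, matches) for the position with the most identical
    bases, where `offset` is the index in `read` that lines up with
    ref[0]. An uncalled base (N) is never counted as a match.
    """
    best_offset, best_matches = 0, -1
    # Try every placement in which the two sequences overlap by at least one base.
    for offset in range(-(len(ref) - 1), len(read)):
        matches = 0
        for i, base in enumerate(ref):
            j = offset + i
            if 0 <= j < len(read) and base != "N" and read[j] == base:
                matches += 1
        # Strict comparison keeps the first of several equally good offsets.
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


# Toy example (hypothetical sequences, not the real data):
offset, matches = best_ungapped_offset("GATT", "NNACGATTACA")
print(offset, matches)
```

With a clean read, the best offset would show nearly every base of the expected sequence matching. A best placement where only a small fraction match is what "no similarity" looks like in this picture.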
Results
Alignment run with maximum penalties for gap opening and extension. This tells the program to slide the sequences against each other and pick the alignment with the highest number of matching bases.
Score = 29, which effectively says there is no similarity between the sequences.
CLUSTAL W (1.83) multiple sequence alignment
aiia ------------------------------------------------------------
Sequenceing NNANANNAACNATAGAGTGGAAANGGGGNNATCTTGAGNCNTGGGNAAAACGGNAAAAAA 60
aiia ------------------------------------------------------------
Sequenceing GGNGTGNANGTNATNTNCNGGAATTTCCTGNAGANTGNNAAANAGAGAGNANNTGAATAG 120
aiia ------------------------------------------------------------
Sequenceing AAAACNGNCNNCAGNNGNNCNCTTTTACNGNGGTANCAANANGGCAAAGNANGCTTTNCG 180
aiia ------------------------------------------------------------
Sequenceing NGCTNANTNTTGNNGNNGANAGAANNGGTANTNGGANGNTNGNAGGACNCCNNATNACNC 240
aiia ------------------------------------------------------------
Sequenceing GGGATAANGATNTATNGNNTTAANAGGACCTNANAATAGTTNNACNATAGNCNTNNCTNG 300
aiia ------------------------------------------------------------
Sequenceing ACNAGNNNTTCNANTNGCGNGNNAANTNATNCACNTNAAGCNGGAAANTGTTGNCNANAC 360
aiia ------------------------------------------GATTATAAAGATGATGAT 18
Sequenceing NNGNTCTGCNTACGGCANNNGAGANNTNCAGCANGCTTNTATNNNNACGAGAGANTCGNT 420
* * * *
aiia GATAAAGGTATGACAGTAAAGAAGCTTTATTTCGTCCCAGCAGGTCGTTGTATGTTGGAT 78
Sequenceing NNNCNNGNANNGNNANTNGNGCCGNCGTTAATGATANNNCGGCTTNANTNCGAAGNGTCN 480
* * * * * * * * * * * *
aiia CATTCGTCTGTTAATAGTACATTAACACCAGGAGAATTATTAGACTTACCGGTTTGGTGT 138
Sequenceing NGCCTNCNANANGAACNNANCCTGGANAGAGAANCANNGGATANGNGGGCNACTCCGNAN 540
* * * ** * * * * *
aiia TATCTTTTGGAGACTGAAGAAGGACCTATTTTAGTAGATACAGGTATGCCAGAAAGTGCA 198
Sequenceing NGACNATCAATNCNNNNNGGCNAAANNNNNGGACCGNNNCCGGTCTNNNNNCACCACGCN 600
* * * * * * * * **
aiia GTTAATAATGAAGGTCTTTTTAACGGTACATTTGTCGAAGGGCAGGTTTTACCGAAAATG 258
Sequenceing GNANCCGNCCNGTAAACCCACNNTGCCGGGCTANNNGGGANGNANNNNCTCTNACNNTNG 660
* * * * * * * *
aiia ACTGAAGAAGATAGAATCGTGAATATTTTAAAACGGGTTGGTTATGAGCCGGAAGACCTT 318
Sequenceing CAATTGNCNNNACGANGANGNNCCTGNATACNCTNCTGNACCTNGNNANNGGCNGTGGCT 720
** ** * ** * *
aiia CTTTATATTATTAGTTCTCACTTGCATTTTGATCATGCAGGAGGAAATGGCGCTTTTATA 378
Sequenceing NTNACANCNNCNNNNTGGTNGGNGNNNTNNNTCTNNNGNAATNNNAANNNNCTAATNTNN 780
* * * * ** *
aiia AATACACCAATCATTGTACAGCGTGCTGAATATGAGGCGGCGCAGCATAGCGAAGAATAT 438
Sequenceing GNCANCNNNAGNGNGGNANGNGANTCNTANNTNTTNNNNNNAAANNNNNGNNNTGNTGNN 840
* * * * * * * * *
aiia TTGAAAGAATGTATATTGCCGAATTTAAACTACAAAATCATTGAAGGTGATTATGAAGTC 498
Sequenceing CNGNNTNNNCANGNGANNNCTTNNAAGNGNNGCGTNCACNNNTNANGAANNNCATNANNN 900
* * * * * * *
aiia GTACCAGGAGTTCAATTATTGCATACACCAGGCCATACTCCAGGGCATCAATCGCTATTA 558
Sequenceing NNANCTAGNANCNANNNNGNTAGNGNNNTNTNNCGNCTTANNNNNAANNTNGAAGNNAGG 960
* * * * * * *
aiia ATTGAGACAGAAAAATCCGGTCCTGTATTATTAACGATTGATGCATCGTATACGAAAGAG 618
Sequenceing GAANAAANGNGGNCGTNNTNTTANGAGAGNNNNNNNNGCNNNGTNNAAGNNGNGNNNNNC 1020
* * * * * * *
aiia AATTTTGAAAATGAAGTGCCATTTGCGGGATTTGATTCAGAATTAGCTTTATCTTCAATT 678
Sequenceing ANGNNGNNGAGTGCGNGGCGNANTGGNGGANNNANNANNNGNNGCGCCANNNNNTGGNAT 1080
* * ** ** ** *** ** * *
aiia AAACGTTTAAAAGAAGTGGTGATGAAAGAGAAGCCGATTGTTTTCTTTGGACATGATATA 738
Sequenceing AAGTNGANNATTNNGGCAGNAANANGGNAAGNTANNNANGAGNANCNTNACNNANANNNC 1140
** * * * * * * * *
aiia GAGCAGGAAAGGGGATGTAAAGTGTTCCCTGAATATATAGCTGCAAACGACGAAAACTAC 798
Sequenceing ATNNCANNGTNTGACAGANCTTNNCNNCNTCNCCNAGNNNACTGAGANCTNGAAANNNTN 1200
* * * * * * ****
aiia GCTTTAGTAGCTTAATAA---------------------------------- 816
Sequenceing NGAAANGAANNNNNANNATNTAANTNNNNNNNTNNTNNCGGNNACCCNCTNG 1252
* * * *
First alignment run, with the default penalties for gap opening and extension. This is useful in evolutionary analysis but not for our purposes.
CLUSTAL W (1.83) multiple sequence alignment
aiia ------------------------------------------------------------
Sequenceing NNANANNAACNATAGAGTGGAAANGGGGNNATCTTGAGNCNTGGGNAAAACGGNAAAAAA 60
aiia ------------------------------------------------------------
Sequenceing GGNGTGNANGTNATNTNCNGGAATTTCCTGNAGANTGNNAAANAGAGAGNANNTGAATAG 120
aiia ------------------------------------------------------------
Sequenceing AAAACNGNCNNCAGNNGNNCNCTTTTACNGNGGTANCAANANGGCAAAGNANGCTTTNCG 180
aiia ------------------------------------------------------------
Sequenceing NGCTNANTNTTGNNGNNGANAGAANNGGTANTNGGANGNTNGNAGGACNCCNNATNACNC 240
aiia ------------------------------------------------------------
Sequenceing GGGATAANGATNTATNGNNTTAANAGGACCTNANAATAGTTNNACNATAGNCNTNNCTNG 300
aiia ----------------------------------------------------GATTATAA 8
Sequenceing ACNAGNNNTTCNANTNGCGNGNNAANTNATNCACNTNAAGCNGGAAANTGTTGNCNANAC 360
* * *
aiia AGATGATGATGATAA-AGGTATGACAGTAAAGAAGCTTTATTTCGTCCCAGCAGGTCGTT 67
Sequenceing NNGNTCTGCNTACGGCANNNGAGANNTNCAGCANGCTTNTATNNNNACGAGAGANTCGNT 420
** * * ** * * **** * * ** *** *
aiia GTA--TGTTGGATCATTCGT----CTGTTAATAGTACATTAACACCAGGA-GAATTATTA 120
Sequenceing NNNCNNGNANNGNNANTNGNGCCGNCGTTAATGATANNNCGGCTTNANTNCGAAGNGTCN 480
* * * * ****** ** * * *** *
aiia GACTTACCGGTTTGGTGTTATCTTT-TGGAGA----CTGAAGAAGGACCTATTTTAGTAG 175
Sequenceing NGCCTNCNANANGAACNNANCCTGGANAGAGAANCANNGGATANGNGGGCNACTCCGNAN 540
* * * ** **** * * * * * * *
aiia ATACAGGTA-TGCCAGAAAGTGCAGTTAATA-ATGAAGGTCTTTTTAACGGTAC-ATTTG 232
Sequenceing NGACNATCAATNCNNNNNGGCNAAANNNNNGGACCGNNNCCGGTCTNNNNNCACCACGCN 600
** * * * * * * * * * ** *
aiia TCGAAGGGCAGGT---TTTACCGAAAATGACTGAAGAAGATAGAATCG---TGAATATTT 286
Sequenceing GNANCCGNCCNGTAAACCCACNNTGCCGGGCTANNNGGGANGNANNNNCTCTNACNNTNG 660
* * ** ** * ** ** * * * *
aiia TAA--AACGGGTTGGTTATGAGCCGGAAGACCTTCTTTA----TATTATTAGTTCT--CA 338
Sequenceing CAATTGNCNNNACGANGANGNNCCTGNATACNCTNCTGNACCTNGNNANNGGCNGTGGCT 720
** * * * * ** * * ** * * * * * *
aiia CTTGCATTTTGATCATGCAGGAGGAAATGGCGCTTTTATAAATACACCAATC-ATTGTAC 397
Sequenceing NTNACANCNNCNNNNTGGTNGGNGNNNTNNNTCTNNNGNAATNNNAANNNNCTAATNTNN 780
* ** ** * * * ** ** * * * * *
aiia AGCGTGCTGAATATGAGGCGGCGCAGCATAGCGAAGAATATTTGAAAGAATGTATATTGC 457
Sequenceing GNCANCNNNAGNGNG-GNANGNGANTCNTANNTNTTNNNNNNAAANNNNNGNNNTGNTGN 839
* * * * * * * ** * * **
aiia CGAATTTAAACTACAAAATCATTGAAGGTGAT---------TATGAAGTCGTACCAGGAG 508
Sequenceing NCNGNNTNNNCANGNGANNNCTTNNAAGNGNNGCGTNCACNNNTNANGAANNNCATNANN 899
* * * ** * * * * * * *
aiia TTCAATTATTGCATACACCAG----GCCATACTCCAGGGCATCAATCGCTATTAATTGAG 564
Sequenceing NNNANCTAGNANCNANNNNGNTAGNGNNNTNTNNCGNCTTANNNNNAANNTNGAAGNNAG 959
* ** * * * * * ** **
aiia ACAGAAAAATCCGGTCCTGTATTATTAA--CGATTGATGCATCGTATACGAAAGAGAATT 622
Sequenceing GGAANAAANGNGGNCGTNNTNTTANGAGAGNNNNNNNNGCNNNGTNNAAGNNGNGNNNNN 1019
* *** * * *** * ** ** * *
aiia TTGAAAATGAAGTGCCATTTGCGGGATTTGATTCAGAATTAGCTTTATCTTCAATTAAAC 682
Sequenceing CANGNNGNNGAGTGCGNGGCGNANTGGNGGANNNANNANNNGNNGCGCCANNNNNTGGNA 1079
***** * ** * * * * *
aiia ---GTTTAAAAGAAGTGGTGATGAAAGAGAAGCCGATTGT---TTTCTTTGGACATGATA 736
Sequenceing TAAGTNGANNATTNNGGCAGNAANANGGNAAGNTANNNANGAGNANCNTNACNNANANNN 1139
** * * * * * * *** * * *
aiia TAGAGCAGGAAAGG---GGATGTAAAGTGTTCCCTGAATATAT----AGCTGCAAACGAC 789
Sequenceing CATNNCANNGTNTGACAGANCTTNNCNNCNTCNCCNAGNNNACTGAGANCTNGAAANNNT 1199
* ** * * * ** * * * * ** ***
aiia GA-AAACTACGCTTTAGTAGCTTAATAA------------------------- 816
Sequenceing NNGAAANGAANNNNNANNATNTAANTNNNNNNNTNNTNNCGGNNACCCNCTNG 1252
*** * * * * * *
Conclusion
There is no homology (similarity) between the sequencing data and the desired sequence. There are far more errors than you would expect from PCR alone. The sequencing could identify only 62.7% of the bases, whereas you would usually expect above 95% accuracy, so I conclude that the sequencing data is unreliable. No useful information can be gained from this data.
The most likely cause is an error in the primer design: for sequencing, the primers must match the template exactly. It could also be that the DNA is impure and the sample contains more than one plasmid.
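As a rough check on the quality of a read, the fraction of identified bases can be computed directly from the sequence. The sketch below assumes that "identified" means any base not reported as N, which may not be exactly how the 62.7% figure was obtained. The function name `called_fraction` is made up for illustration.

```python
def called_fraction(read):
    """Return the fraction of bases in `read` that are not N (uncalled)."""
    bases = read.upper().replace(" ", "").replace("\n", "")
    if not bases:
        raise ValueError("empty sequence")
    called = sum(1 for base in bases if base != "N")
    return called / len(bases)


# First ten bases of the read shown in the alignments above:
print(called_fraction("NNANANNAAC"))
```

Pasting the whole read into `called_fraction` gives a single number to compare against the expected 95%.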