IGEM:IMPERIAL/2006/project/parts/aiiasequence

Abstract
I compared the sequencing data from aiiA plus immunotag (i.e. after PCR) with the expected sequence using bioinformatics tools.

My conclusion from this work is that the sequencing data is unreliable, so no further conclusions can be drawn.

Method
Using ClustalW, I aligned the sequencing data against the expected sequence. I set the gap penalties to their maximum, which effectively makes the program slide the two sequences against each other and pick the alignment with the most bases matching.
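The sliding comparison described above can be sketched in a few lines of Python. This is only an illustration of what an alignment with no gaps amounts to, not ClustalW's actual algorithm; the function name `best_ungapped_offset` and the choice never to count an `N` as a match are assumptions made for the example.

```python
def best_ungapped_offset(ref, read):
    """Slide `read` along `ref` without gaps and return (offset, matches).

    `offset` is the position of read[0] relative to ref[0] and may be negative.
    Positions where the base is 'N' are never counted as matches (an assumption).
    """
    best_offset, best_matches = 0, -1
    # Try every placement in which the two sequences overlap by at least one base.
    for offset in range(-(len(read) - 1), len(ref)):
        matches = 0
        for i, base in enumerate(ref):
            j = i - offset  # index in `read` that lines up with ref[i]
            if 0 <= j < len(read) and base != "N" and base == read[j]:
                matches += 1
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


if __name__ == "__main__":
    # Toy sequences, not the real aiiA data.
    print(best_ungapped_offset("CCGATTATAAAG", "TTATANN"))
```

For a good read, the best offset should give a match count close to the length of the overlap; a best count that is only a small fraction of the overlap suggests the read does not correspond to the expected sequence.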

Results
Alignment with maximum penalties for gap opening and extension, so that the program slides the sequences against each other and picks the alignment with the highest number of matching bases.

Score = 29. This effectively says there is no similarity between the sequences.
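To put the score of 29 in context, the sketch below computes a percent identity from a pair of aligned sequences. It assumes the ClustalW pairwise score is a percent identity (identical positions divided by positions compared, with gap columns excluded); that reading is an assumption here, as are the function name and the rule that `N` never counts as identical. Under the same assumption, two unrelated DNA sequences with roughly equal base frequencies would agree at about one position in four by chance, so a score in the high twenties is close to what random sequences would give.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are not compared
        compared += 1
        if a == b and a != "N":  # an uncalled base is not treated as a match
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned pair, not the real aiiA data.
    print(percent_identity("GATTA", "GACTA"))
```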

CLUSTAL W (1.83) multiple sequence alignment aiia Sequenceing     NNANANNAACNATAGAGTGGAAANGGGGNNATCTTGAGNCNTGGGNAAAACGGNAAAAAA 60 aiia Sequenceing     GGNGTGNANGTNATNTNCNGGAATTTCCTGNAGANTGNNAAANAGAGAGNANNTGAATAG 120 aiia Sequenceing     AAAACNGNCNNCAGNNGNNCNCTTTTACNGNGGTANCAANANGGCAAAGNANGCTTTNCG 180 aiia Sequenceing     NGCTNANTNTTGNNGNNGANAGAANNGGTANTNGGANGNTNGNAGGACNCCNNATNACNC 240 aiia Sequenceing     GGGATAANGATNTATNGNNTTAANAGGACCTNANAATAGTTNNACNATAGNCNTNNCTNG 300 aiia Sequenceing     ACNAGNNNTTCNANTNGCGNGNNAANTNATNCACNTNAAGCNGGAAANTGTTGNCNANAC 360 aiia            --GATTATAAAGATGATGAT 18 Sequenceing     NNGNTCTGCNTACGGCANNNGAGANNTNCAGCANGCTTNTATNNNNACGAGAGANTCGNT 420 * *       * * aiia             GATAAAGGTATGACAGTAAAGAAGCTTTATTTCGTCCCAGCAGGTCGTTGTATGTTGGAT 78 Sequenceing     NNNCNNGNANNGNNANTNGNGCCGNCGTTAATGATANNNCGGCTTNANTNCGAAGNGTCN 480 *   *  * *   *  *   *   *  *         *   *       *     aiia             CATTCGTCTGTTAATAGTACATTAACACCAGGAGAATTATTAGACTTACCGGTTTGGTGT 138 Sequenceing     NGCCTNCNANANGAACNNANCCTGGANAGAGAANCANNGGATANGNGGGCNACTCCGNAN 540 *   *   *      ** *  *             *   *  *     aiia             TATCTTTTGGAGACTGAAGAAGGACCTATTTTAGTAGATACAGGTATGCCAGAAAGTGCA 198 Sequenceing     NGACNATCAATNCNNNNNGGCNAAANNNNNGGACCGNNNCCGGTCTNNNNNCACCACGCN 600 * *           *    *        *       * *         *    **   aiia             GTTAATAATGAAGGTCTTTTTAACGGTACATTTGTCGAAGGGCAGGTTTTACCGAAAATG 258 Sequenceing     GNANCCGNCCNGTAAACCCACNNTGCCGGGCTANNNGGGANGNANNNNCTCTNACNNTNG 660 *                      *      *    *    * *     *         * aiia             ACTGAAGAAGATAGAATCGTGAATATTTTAAAACGGGTTGGTTATGAGCCGGAAGACCTT 318 Sequenceing     CAATTGNCNNNACGANGANGNNCCTGNATACNCTNCTGNACCTNGNNANNGGCNGTGGCT 720 **            **            *       **  *    * aiia             CTTTATATTATTAGTTCTCACTTGCATTTTGATCATGCAGGAGGAAATGGCGCTTTTATA 378 Sequenceing     NTNACANCNNCNNNNTGGTNGGNGNNNTNNNTCTNNNGNAATNNNAANNNNCTAATNTNN 780 *            *       *   *                 **        *     
aiia             AATACACCAATCATTGTACAGCGTGCTGAATATGAGGCGGCGCAGCATAGCGAAGAATAT 438 Sequenceing     GNCANCNNNAGNGNGGNANGNGANTCNTANNTNTTNNNNNNAAANNNNNGNNNTGNTGNN 840 *    *     * *       *  *              *     *    *       aiia             TTGAAAGAATGTATATTGCCGAATTTAAACTACAAAATCATTGAAGGTGATTATGAAGTC 498 Sequenceing     CNGNNTNNNCANGNGANNNCTTNNAAGNGNNGCGTNCACNNNTNANGAANNNCATNANNN 900 *               *            *     *     * *         *    aiia             GTACCAGGAGTTCAATTATTGCATACACCAGGCCATACTCCAGGGCATCAATCGCTATTA 558 Sequenceing     NNANCTAGNANCNANNNNGNTAGNGNNNTNTNNCGNCTTANNNNNAANNTNGAAGNNAGG 960 * * *     *                   *    *       *              aiia             ATTGAGACAGAAAAATCCGGTCCTGTATTATTAACGATTGATGCATCGTATACGAAAGAG 618 Sequenceing     GAANAAANGNGGNCGTNNTNTTANGAGAGNNNNNNNNGCNNNGTNNAAGNNGNGNNNNNC 1020 * *       *    *   *                 *          *       aiia             AATTTTGAAAATGAAGTGCCATTTGCGGGATTTGATTCAGAATTAGCTTTATCTTCAATT 678 Sequenceing     ANGNNGNNGAGTGCGNGGCGNANTGGNGGANNNANNANNNGNNGCGCCANNNNNTGGNAT 1080 *       * **    **    **  ***               **       *    * aiia             AAACGTTTAAAAGAAGTGGTGATGAAAGAGAAGCCGATTGTTTTCTTTGGACATGATATA 738 Sequenceing     AAGTNGANNATTNNGGCAGNAANANGGNAAGNTANNNANGAGNANCNTNACNNANANNNC 1140 **      *     *  *  *      *          *       *       *     aiia             GAGCAGGAAAGGGGATGTAAAGTGTTCCCTGAATATATAGCTGCAAACGACGAAAACTAC 798 Sequenceing     ATNNCANNGTNTGACAGANCTTNNCNNCNTCNCCNAGNNNACTGAGANCTNGAAANNNTN 1200 *  *          * *              * *    ****      aiia             GCTTTAGTAGCTTAATAA-- 816 Sequenceing     NGAAANGAANNNNNANNATNTAANTNNNNNNNTNNTNNCGGNNACCCNCTNG 1252 * *    *  *                                    PLEASE NOTE: Showing colors on large alignments is slow.

Alignment with default penalties for gap opening and extension. This is useful in evolutionary analysis but not for our purposes.

 CLUSTAL W (1.83) multiple sequence alignment aiia Sequenceing     NNANANNAACNATAGAGTGGAAANGGGGNNATCTTGAGNCNTGGGNAAAACGGNAAAAAA 60 aiia Sequenceing     GGNGTGNANGTNATNTNCNGGAATTTCCTGNAGANTGNNAAANAGAGAGNANNTGAATAG 120 aiia Sequenceing     AAAACNGNCNNCAGNNGNNCNCTTTTACNGNGGTANCAANANGGCAAAGNANGCTTTNCG 180 aiia Sequenceing     NGCTNANTNTTGNNGNNGANAGAANNGGTANTNGGANGNTNGNAGGACNCCNNATNACNC 240 aiia Sequenceing     GGGATAANGATNTATNGNNTTAANAGGACCTNANAATAGTTNNACNATAGNCNTNNCTNG 300 aiia            GATTATAA 8 Sequenceing     ACNAGNNNTTCNANTNGCGNGNNAANTNATNCACNTNAAGCNGGAAANTGTTGNCNANAC 360 *  * *  aiia             AGATGATGATGATAA-AGGTATGACAGTAAAGAAGCTTTATTTCGTCCCAGCAGGTCGTT 67 Sequenceing     NNGNTCTGCNTACGGCANNNGAGANNTNCAGCANGCTTNTATNNNNACGAGAGANTCGNT 420 **  *    *     **     *  * ****   *     * **    *** * aiia             GTA--TGTTGGATCATTCGTCTGTTAATAGTACATTAACACCAGGA-GAATTATTA 120 Sequenceing     NNNCNNGNANNGNNANTNGNGCCGNCGTTAATGATANNNCGGCTTNANTNCGAAGNGTCN 480 *      * * *       ******  **      *   *    ***   *   aiia             GACTTACCGGTTTGGTGTTATCTTT-TGGAGACTGAAGAAGGACCTATTTTAGTAG 175 Sequenceing     NGCCTNCNANANGAACNNANCCTGGANAGAGAANCANNGGATANGNGGGCNACTCCGNAN 540 * * *             **     ****      * * * *        *  * *  aiia             ATACAGGTA-TGCCAGAAAGTGCAGTTAATA-ATGAAGGTCTTTTTAACGGTAC-ATTTG 232 Sequenceing     NGACNATCAATNCNNNNNGGCNAAANNNNNGGACCGNNNCCGGTCTNNNNNCACCACGCN 600 **   * * *      *   *        *       *  * *      ** *     aiia             TCGAAGGGCAGGT---TTTACCGAAAATGACTGAAGAAGATAGAATCG---TGAATATTT 286 Sequenceing     GNANCCGNCCNGTAAACCCACNNTGCCGGGCTANNNGGGANGNANNNNCTCTNACNNTNG 660 * * **      **       * **      **   *       * *   *   aiia             TAA--AACGGGTTGGTTATGAGCCGGAAGACCTTCTTTATATTATTAGTTCT--CA 338 Sequenceing     CAATTGNCNNNACGANGANGNNCCTGNATACNCTNCTGNACCTNGNNANNGGCNGTGGCT 720 **   *     *   * *  ** * * **  *  *          *   *   *  *  aiia             CTTGCATTTTGATCATGCAGGAGGAAATGGCGCTTTTATAAATACACCAATC-ATTGTAC 397 Sequenceing     
NTNACANCNNCNNNNTGGTNGGNGNNNTNNNTCTNNNGNAATNNNAANNNNCTAATNTNN 780 * **         **   *  *   *    **     **    *     * * * *   aiia             AGCGTGCTGAATATGAGGCGGCGCAGCATAGCGAAGAATATTTGAAAGAATGTATATTGC 457 Sequenceing     GNCANCNNNAGNGNG-GNANGNGANTCNTANNTNTTNNNNNNAAANNNNNGNNNTGNTGN 839 *     *    * *   * *   * **              *         *  **  aiia             CGAATTTAAACTACAAAATCATTGAAGGTGAT-TATGAAGTCGTACCAGGAG 508 Sequenceing     NCNGNNTNNNCANGNGANNNCTTNNAAGNGNNGCGTNCACNNNTNANGAANNNCATNANN 899 *  *     *    **  * * *             * * *     *       aiia             TTCAATTATTGCATACACCAGGCCATACTCCAGGGCATCAATCGCTATTAATTGAG 564 Sequenceing     NNNANCTAGNANCNANNNNGNTAGNGNNNTNTNNCGNCTTANNNNNAANNTNGAAGNNAG 959 * **      *          *   *    *     *            **   ** aiia             ACAGAAAAATCCGGTCCTGTATTATTAA--CGATTGATGCATCGTATACGAAAGAGAATT 622 Sequenceing     GGAANAAANGNGGNCGTNNTNTTANGAGAGNNNNNNNNGCNNNGTNNAAGNNGNGNNNNN 1019 * ***    *      * ***  *           **   **  * *           aiia             TTGAAAATGAAGTGCCATTTGCGGGATTTGATTCAGAATTAGCTTTATCTTCAATTAAAC 682 Sequenceing     CANGNNGNNGAGTGCGNGGCGNANTGGNGGANNNANNANNNGNNGCGCCANNNNNTGGNA 1079 *****    *        **   *  *   *      *      *      aiia             ---GTTTAAAAGAAGTGGTGATGAAAGAGAAGCCGATTGT---TTTCTTTGGACATGATA 736 Sequenceing     TAAGTNGANNATTNNGGCAGNAANANGGNAAGNTANNNANGAGNANCNTNACNNANANNN 1139 ** *  *     *  *    * *  ***              * *     *      aiia             TAGAGCAGGAAAGG---GGATGTAAAGTGTTCCCTGAATATATAGCTGCAAACGAC 789 Sequenceing     CATNNCANNGTNTGACAGANCTTNNCNNCNTCNCCNAGNNNACTGAGANCTNGAAANNNT 1199 *  **      *   *    *       ** *  *    *     * **  ***     aiia             GA-AAACTACGCTTTAGTAGCTTAATAA- 816 Sequenceing     NNGAAANGAANNNNNANNATNTAANTNNNNNNNTNNTNNCGGNNACCCNCTNG 1252 *** *      *  *  * * *                            PLEASE NOTE: Showing colors on large alignments is slow.

Conclusion
There is no homology (similarity) between the sequencing data and the desired sequence. There are far more errors than would be expected from PCR alone. The sequencing could identify only 62.7% of the bases, whereas you would usually expect above 95% accuracy, so I conclude that the sequencing data is unreliable. No useful information can be gained from this data.

The most likely cause is an error in the primer design; for sequencing, the primers must be perfect. It could also be that the DNA is impure and the stock contains more than one plasmid.
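The proportion of identified bases can be checked directly from the read by counting the positions called as A, C, G or T rather than N. The following is a minimal sketch; the function name `called_fraction` is hypothetical, and it treats any character other than A, C, G or T as uncalled.

```python
def called_fraction(read):
    """Fraction of positions in `read` called as A, C, G or T (anything else counts as uncalled)."""
    if not read:
        return 0.0
    called = sum(1 for base in read.upper() if base in "ACGT")
    return called / len(read)


if __name__ == "__main__":
    # First ten bases of the read shown in the alignments above.
    print(called_fraction("NNANANNAAC"))
```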