Difference between revisions of "Harvard:Biophysics 101/2007/Notebook:Michael Wang/2007-2-13"

From OpenWetWare
For anyone still having trouble getting clustalw working on a PC after reading the instructions [http://openwetware.org/wiki/Talk:Harvard:Biophysics_101/2007/02/13/install-clustalw#Windows here], the key seems to be making sure that clustalw runs from the command line first.  Even if the setup is correct, any problem in the actual call gives the same error you would get from a broken setup, because the only thing Python checks is whether or not the output file was created.
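As a rough illustration of that check, here is a minimal sketch in Python 3 (not the Python 2 used elsewhere on this page). The helper names <code>find_clustalw</code> and <code>run_clustalw</code> are made up for this example, the executable names are guesses at what an install might provide, and the <code>-INFILE=</code> / <code>-OUTFILE=</code> options are assumed to be spelled that way on your build (some Windows builds may expect <code>/</code> instead of <code>-</code>).

```python
import os
import shutil
import subprocess


def find_clustalw(candidates=("clustalw", "clustalw2")):
    """Return the full path of the first candidate found on PATH, or None.

    The candidate names are assumptions; adjust them to match your install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def run_clustalw(exe, infile, outfile):
    """Run clustalw directly and report whether the output file exists.

    Returns (output_file_exists, captured_text) so that the program's own
    messages can be read instead of a generic "not set up" error.
    """
    cmd = [exe, "-INFILE=" + infile, "-OUTFILE=" + outfile]
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return os.path.exists(outfile), result.stdout


# Example use (hypothetical file names):
#   exe = find_clustalw()
#   if exe is None:
#       print("clustalw is not on PATH")
#   else:
#       ok, log = run_clustalw(exe, "all.fasta", "test.aln")
#       print(ok)
#       print(log)
```

Printing the captured text is the point here: it separates "clustalw is not installed" from "clustalw ran but rejected the call".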
 
  
The current version of my code does not do much on the analysis side.  It reads every fasta file in the ./input folder of the current directory and merges them into a single file, all.fasta.  This file is then passed to clustalw for alignment.
 
 
<pre>
#!/usr/bin/env python

import os
from Bio import Clustalw

# This first section merges all fasta files located in the input folder
# of the current directory into a single file called all.fasta
input_dir = os.path.join(os.curdir, 'input')
merged_path = os.path.join(os.curdir, 'all.fasta')

input_list = os.listdir(input_dir)
print input_list

merged_file = open(merged_path, "w")
print merged_path

for i in input_list:
        print "loading ", os.path.join(input_dir, i)
        current_file = open(os.path.join(input_dir, i), "r")
        all_lines = current_file.readlines()
        merged_file.writelines(all_lines)
        current_file.close()
        # blank lines keep the records of consecutive files apart
        merged_file.write("\n\n")

print "done making file"
merged_file.close()

# Once the merged file has been created, it is passed into the alignment program
cline = Clustalw.MultipleAlignCL(merged_path)
cline.set_output('test.aln')
alignment = Clustalw.do_alignment(cline)
all_records = alignment.get_all_seqs()

print alignment
</pre>
 
 
I have yet to write code that counts things such as how many frameshift mutations there are.  For now it just prints the raw alignment.
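One possible starting point for such a count, sketched in Python 3 under some loud assumptions: each aligned row is available as a plain string, a run of <code>-</code> characters marks an insertion or deletion, and a run whose length is not a multiple of three is treated as a frameshift candidate. That is only a heuristic, since it ignores where the coding region actually starts. The function names are hypothetical and not part of Biopython.

```python
import re


def gap_runs(aligned_seq):
    """Return a (start, length) pair for each run of '-' in one aligned row."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def count_frameshift_candidates(aligned_seq):
    """Count gap runs whose length is not a multiple of three."""
    return sum(1 for _, length in gap_runs(aligned_seq) if length % 3 != 0)


# One row taken from the alignment output shown further down this page.
row = "ACCAT-------CCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC"
print(gap_runs(row))
print(count_frameshift_candidates(row))
```

An insertion in one sequence shows up as gaps in the other rows, so a fuller version would look at every row of the alignment rather than just one.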
 
 
Using the uploaded test files [[Media:apoemod.fasta]] and [[Media:Copy of apoe.fasta]], the following output is generated.
 
 
<pre>
 
loading  .\input\apoe.fasta
 
loading  .\input\Copy of apoe.fasta
 
done making file
 
CLUSTAL X (1.81) multiple sequence alignment
 
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
gi|189350|gb|K10296.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
gi|178850|gb|K00396.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
gi|178843|gb|K06396.1|HUMAPOE3      CGCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGACTGGCCAATCAC
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
gi|189350|gb|K10296.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
gi|178850|gb|K00396.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
gi|178843|gb|K06396.1|HUMAPOE3      AGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
gi|189350|gb|K10296.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAG--GGCGGTGGAGACAGAGCCGGAGCCC
 
gi|178850|gb|K00396.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
gi|178843|gb|K06396.1|HUMAPOE3      CAGGATGCCAGGCCAAGGTGGAGCAAGCGGTGGAGACAGAGCCGGAGCCC
 
                                    ***********************  ************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
gi|189350|gb|K10296.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
gi|178850|gb|K00396.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
gi|178843|gb|K06396.1|HUMAPOE3      GAGCTGCGCCAGCAGACCGAGTGGCAGAGCGGCCAGCGCTGGGAACTGGC
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
 
gi|189350|gb|K10296.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTAATCCTGCGCTGGGTGCAGACACTGTCTGA
 
gi|178850|gb|K00396.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
 
gi|178843|gb|K06396.1|HUMAPOE3      ACTGGGTCGCTTTTGGGATTA--CCTGCGCTGGGTGCAGACACTGTCTGA
 
                                    *********************  ***************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
gi|189350|gb|K10296.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
gi|178850|gb|K00396.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
gi|178843|gb|K06396.1|HUMAPOE3      GCAGGTGCAGGAGGAGCTGCTCAGCTCCCAGGTCACCCAGGAACTGAGGG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
gi|189350|gb|K10296.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
gi|178850|gb|K00396.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
gi|178843|gb|K06396.1|HUMAPOE3      CGCTGATGGACGAGACCATGAAGGAGTTGAAGGCCTACAAATCGGAACTG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
gi|189350|gb|K10296.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
gi|178850|gb|K00396.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
gi|178843|gb|K06396.1|HUMAPOE3      GAGGAACAACTGACCCCGGTGGCGGAGGAGACGCGGGCACGGCTGTCCAA
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
gi|189350|gb|K10296.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
gi|178850|gb|K00396.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
gi|178843|gb|K06396.1|HUMAPOE3      GGAGCTGCAGGCGGCGCAGGCCCGGCTGGGCGCGGACATGGAGGACGTGT
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
gi|189350|gb|K10296.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
gi|178850|gb|K00396.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
gi|178843|gb|K06396.1|HUMAPOE3      GCGGCCGCCTGGTGCAGTACCGCGGCGAGGTGCAGGCCATGCTCGGCCAG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
gi|189350|gb|K10296.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
gi|178850|gb|K00396.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
gi|178843|gb|K06396.1|HUMAPOE3      AGCACCGAGGAGCTGCGGGTGCGCCTCGCCTCCCACCTGCGCAAGCTGCG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
gi|189350|gb|K10296.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
gi|178850|gb|K00396.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
gi|178843|gb|K06396.1|HUMAPOE3      TAAGCGGCTCCTCCGCGATGCCGATGACCTGCAGAAGCGCCTGGCAGTGT
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
gi|189350|gb|K10296.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
gi|178850|gb|K00396.1|HUMAPOE3      ACCAGGCCGGGGCCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
gi|178843|gb|K06396.1|HUMAPOE3      ACCAT-------CCCGCGAGGGCGCCGAGCGCGGCCTCAGCGCCATCCGC
 
                                    ****        **************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
gi|189350|gb|K10296.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
gi|178850|gb|K00396.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
gi|178843|gb|K06396.1|HUMAPOE3      GAGCGCCTGGGGCCCCTGGTGGAACAGGGCCGCGTGCGGGCCGCCACTGT
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
gi|189350|gb|K10296.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
gi|178850|gb|K00396.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
gi|178843|gb|K06396.1|HUMAPOE3      GGGCTCCCTGGCCGGCCAGCCGCTACAGGAGCGGGCCCAGGCCTGGGGCG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
gi|189350|gb|K10296.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
gi|178850|gb|K00396.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
gi|178843|gb|K06396.1|HUMAPOE3      AGCGGCTGCGCGCGCGGATGGAGGAGATGGGCAGCCGGACCCGCGACCGC
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
gi|189350|gb|K10296.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
gi|178850|gb|K00396.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
gi|178843|gb|K06396.1|HUMAPOE3      CTGGACGAGGTGAAGGAGCAGGTGGCGGAGGTGCGCGCCAAGCTGGAGGA
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
gi|189350|gb|K10296.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
gi|178850|gb|K00396.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
gi|178843|gb|K06396.1|HUMAPOE3      GCAGGCCCAGCAGATACGCCTGCAGGCCGAGGCCTTCCAGGCCCGCCTCA
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
gi|189350|gb|K10296.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
gi|178850|gb|K00396.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
gi|178843|gb|K06396.1|HUMAPOE3      AGAGCTGGTTCGAGCCCCTGGTGGAAGACATGCAGCGCCAGTGGGCCGGG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
gi|189350|gb|K10296.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
gi|178850|gb|K00396.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
gi|178843|gb|K06396.1|HUMAPOE3      CTGGTGGAGAAGGTGCAGGCTGCCGTGGGCACCAGCGCCGCCCCTGTGCC
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
gi|189350|gb|K10296.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
gi|178850|gb|K00396.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
gi|178843|gb|K06396.1|HUMAPOE3      CAGCGACAATCACTGAACGCCGAAGCCTGCAGCCATGCGACCCCACGCCA
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
gi|189350|gb|K10296.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
gi|178850|gb|K00396.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
gi|178843|gb|K06396.1|HUMAPOE3      CCCCGTGCCTCCTGCCTCCGCGCAGCCTGCAGCGGGAGACCCTGTCCCCG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
gi|189350|gb|K10296.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
gi|178850|gb|K00396.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
gi|178843|gb|K06396.1|HUMAPOE3      CCCCAGCCGTCCTCCTGGGGTGGACCCTAGTTTAATAAAGATTCACCAAG
 
                                    **************************************************
 
 
gi|178350|gb|K00296.1|HUMAPOE3      TTTCACGT
 
gi|189350|gb|K10296.1|HUMAPOE3      TTTCACGT
 
gi|178850|gb|K00396.1|HUMAPOE3      TTTCACGC
 
gi|178843|gb|K06396.1|HUMAPOE3      TTTCACGC
 
                                    *******
 
</pre>
 
Each of the two files contains two sequences (I made fake changes to each).
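To locate those changes without reading the whole alignment by eye, one option is to list the columns where the rows disagree, which is roughly what the missing <code>*</code> marks in the consensus line show. The sketch below is Python 3, assumes the aligned rows have already been collected as equal-length strings, and uses a hypothetical helper name.

```python
def variable_columns(rows):
    """Return the 0-based column indices where the aligned rows differ."""
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) > 1]


# The final block of the alignment above, one string per sequence.
last_block = ["TTTCACGT", "TTTCACGT", "TTTCACGC", "TTTCACGC"]
print(variable_columns(last_block))  # prints [7]
```

Gap characters count as differences here, so the deleted stretches are reported alongside the single-base substitutions.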
 

Latest revision as of 21:21, 19 February 2007