Open writing projects/Sage and cython a brief introduction
From OpenWetWare
Revision as of 16:51, 1 May 2008
Work in progress
Please check back later for the final version...
Abstract
This is a quick introduction to Sage, a powerful new computational platform that builds on the strengths of Python. This article was directly inspired by Julius B. Lucks' "Python: All A Scientist Needs"; I recommend reading it first as it explains some of the attractions of Python and biopython.
Sage is a free and open-source project for computation of all sorts that uses Python as its primary language and "glue". One of the goals of Sage is to provide a viable free and open-source alternative to Matlab, Maple, and Mathematica. Sage unifies a great deal of open-source mathematical and statistical software; it includes biopython as an optional package and the statistics language R by default.
Sage notebook interface
(TODO: notebook interface screenshots, different computers, good 2-d graphics)
A key feature of Sage is its notebook web-browser interface.
Jose Unpingco has made a good short introductory video on the notebook interface that may help give a sense of what it's like.
The @interact command in the Sage notebook provides an easy way to make simple GUIs for exploring data. In the example below, a user can enter the URL of a FASTA-formatted protein file and a PROSITE-style pattern. Using Biopython and Python's "re" module, we can search for matches to the pattern and display them. For this screenshot, I used proteins from the malaria-causing Plasmodium falciparum and a fragment of the transthyretin pattern (Prosite PS00768).
<syntax type="python">
def PStoRE(PrositePattern):
    """
    Converts a PROSITE regular expression to a python r.e.
    """
    rePattern = PrositePattern
    rePattern = rePattern.replace('-','')    # drop the element separators
    rePattern = rePattern.replace(' ','')    # drop whitespace
    rePattern = rePattern.replace('x','.')   # x matches any residue
    rePattern = rePattern.replace('{','[^')  # {ABC} excludes residues A, B, C
    rePattern = rePattern.replace('}',']')
    rePattern = rePattern.replace('(','{')   # x(2) becomes a repeat count
    rePattern = rePattern.replace(')','}')
    return rePattern

from Bio import Fasta   # older Biopython interface; newer releases use Bio.SeqIO
import re
import urllib2 as U     # Python 2 module

@interact
def re_scan(fasta_file_url = 'http://www.d.umn.edu/~mhampton/PlasProtsRef.fa', pat = input_box('G - x - P - [AG] - x(2) - [LIVM] - x - [IV] ', type = str, width = 60)):
    re_pat = re.compile(PStoRE(pat))
    parser = Fasta.RecordParser()
    prot_file = U.urlopen(fasta_file_url)
    fasta_iterator = Fasta.Iterator(prot_file, parser = parser)
    # Show the title and the matching fragments of every record that matches
    for record in fasta_iterator:
        matches = re_pat.findall(record.sequence)
        if len(matches) != 0:
            html(record.title)
            html(matches)
            print
</syntax>
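The pattern conversion done by PStoRE can also be tried outside the notebook, without Biopython or a network connection. The following is a minimal sketch in plain Python: the helper name prosite_to_regex and the toy sequence are made up for illustration and are not part of the original example or of any library.

```python
import re


def prosite_to_regex(prosite_pattern):
    """Convert a PROSITE-style pattern to a Python regular expression.

    Hypothetical helper that mirrors the replacements made by PStoRE.
    """
    pattern = prosite_pattern.replace('-', '').replace(' ', '')
    pattern = pattern.replace('x', '.')                      # x: any residue
    pattern = pattern.replace('{', '[^').replace('}', ']')   # {..}: excluded residues
    pattern = pattern.replace('(', '{').replace(')', '}')    # (n): repeat count
    return pattern


# The transthyretin fragment used as the default in the notebook example
regex = prosite_to_regex('G - x - P - [AG] - x(2) - [LIVM] - x - [IV] ')

# Toy sequence, invented for this sketch (not a real protein)
toy_sequence = 'MKTGAPAQWLKIAAA'
matches = re.findall(regex, toy_sequence)

print(regex)
print(matches)
```

Because the conversion is plain string replacement, it assumes the pattern uses only the PROSITE features handled above (separators, x, brackets, braces, and repeat counts).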
Cython
Sage initially used Pyrex, an alternative to SWIG (described in Julius's article), to compile Python code to C when performance concerns demanded it. Because the Sage developers needed to extend Pyrex in various ways, they created a friendly fork of Pyrex called "Cython". I believe it is fair to say that Cython is the easiest way to generate C code from Python.
(TODO: example of Cython usage)



