Open writing projects/Sage and cython a brief introduction: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
No edit summary
No edit summary
Line 79: Line 79:
== Cython ==
== Cython ==
Sage initially used an alternative to SWIG (described in Julius's article) called Pyrex to compile Python code to C when performance concerns demanded it.  Because they needed to extend Pyrex in various ways, they created a friendly fork of Pyrex called "Cython".  I believe it is fair to say that Cython is the easiest way to create C code in Python.
Sage initially used an alternative to SWIG (described in Julius's article) called Pyrex to compile Python code to C when performance concerns demanded it.  Because they needed to extend Pyrex in various ways, they created a friendly fork of Pyrex called "Cython".  I believe it is fair to say that Cython is the easiest way to create C code in Python.
Lets look at a toy example of Cython usage.  Consider the following python function, which we can imagine is the inner loop of some more complex operation:
<syntax type="python">
def slow_logistic(n,a,lamb = 3.82):
    q = a
    for i in range(n):
        q = lamb*q*(1-q)
    return q
time a = slow_function(10^6,.5)
</syntax> 
<syntax type="python">
</syntax> 
<syntax type="python">
</syntax> 


(TODO: example of Cython usage)
(TODO: example of Cython usage)

Revision as of 10:06, 2 May 2008

Work in progress

Please check back later for the final version...

Abstract

This is a quick introduction to Sage, a powerful new computational platform that builds on the strengths of Python. This article was directly inspired by Julius B. Lucks' "Python: All A Scientist Needs"; I recommend reading it first as it explains some of the attractions of Python and biopython.

Sage is a free and open-source project for computation of all sorts that uses Python as its primary language and "glue". One of the goals of Sage is to provide a viable free and open-source alternative to Matlab, Maple, and Mathematica. Sage unifies a great deal of open-source mathematical and statistical software; it includes biopython as an optional package and the statistics language R by default.

Sage notebook interface

A key feature of Sage is its notebook web-browser interface.

Jose Unpingco has made a good short introductory video on the notebook interface that may help get a sense of what its like.

The notebook starts by default in native Sage mode, which is almost identical to python. However, other environments are available. The screenshot below shows a Sage notebook in R mode, with some simple R commands executed.

The @interact command in the Sage notebook provides an easy way to make simple GUIs to explore data. In the example below, a user can enter the URL of a fasta-formatted protein file and a PROSITE-style regular expression. Using biopython and the "re" module of python we can search for and display matches to the pattern. For this screenshot, I used proteins from the malaria-causing Plasmodium falciparum and a fragment of the transthyretin pattern (Prosite PS00768). This is just for illustrative purposes - I do not claim any significance for these matches.

<syntax type="python"> def PStoRE(PrositePattern):

   """
   Converts a PROSITE regular expression to a python r.e.
   """
   rePattern = PrositePattern
   rePattern = rePattern.replace('-',)
   rePattern = rePattern.replace(' ',)
   rePattern = rePattern.replace('x','.')
   rePattern = rePattern.replace('{','[^')
   rePattern = rePattern.replace('}',']')
   rePattern = rePattern.replace('(','{')
   rePattern = rePattern.replace(')','}')
   return rePattern

from Bio import Fasta import re import urllib2 as U @interact def re_scan(fasta_file_url = 'http://www.d.umn.edu/~mhampton/PlasProtsRef.fa', pat = input_box('G - x - P - [AG] - x(2) - [LIVM] - x - [IV] ', type = str, width = 60)):

   re_pat = re.compile(PStoRE(pat))
   parser = Fasta.RecordParser()
   prot_file = U.urlopen(fasta_file_url)
   fasta_iterator = Fasta.Iterator(prot_file, parser = parser)
   for record in fasta_iterator:
       matches = re_pat.findall(record.sequence)
       if len(matches) != 0:
           html(record.title)
           html(matches)
           print 

</syntax>


Here is another example of the interact command, along with some of Sage's 2D plotting. We take a human mitochondrial DNA sequence and plot the fraction of CG dinucleotides in a window of variable size. This fraction is divided by 16, to normalize it against the expected value in the case of independently, uniformly distributed nucleotides (obviously a poor model in this case).

<syntax type = "python"> from Bio import SeqIO import urllib2 as U import scipy.stats as Stat f = U.urlopen('http://www.d.umn.edu/~mhampton/HomoSmito.fa') p = SeqIO.parse(f, 'fasta') hsmito = p.next().seq.tostring() f.close() display_f = RealField(16) @interact def cgplot(window_length = slider([2^i for i in range(4,12)],default = 2^9)):

   avg = [16.0*hsmito[x-window_length: x+window_length].count('CG')/(2*window_length-1) for x in range(window_length, len(hsmito) - window_length)]
   mean = display_f(Stat.mean(avg))
   std = display_f(Stat.std(avg))
   html('Ratio of CG dinucleotides in a window of size ' + str(2*window_length) + ' to the expected fraction 1/16 
in the human mitochondrion.
Mean value: ' + str(mean) + '; standard deviation: ' + str(std)) show(list_plot(avg, plotjoined=True), ymin = 0, ymax = max(avg))

</syntax>

For more (mostly mathematical) examples of the @interact command, see the corresponding Sage interact wiki page.

Cython

Sage initially used an alternative to SWIG (described in Julius's article) called Pyrex to compile Python code to C when performance concerns demanded it. Because they needed to extend Pyrex in various ways, they created a friendly fork of Pyrex called "Cython". I believe it is fair to say that Cython is the easiest way to create C code in Python.

Lets look at a toy example of Cython usage. Consider the following python function, which we can imagine is the inner loop of some more complex operation: <syntax type="python"> def slow_logistic(n,a,lamb = 3.82):

   q = a
   for i in range(n):
       q = lamb*q*(1-q)
   return q

time a = slow_function(10^6,.5) </syntax>

<syntax type="python"> </syntax>

<syntax type="python"> </syntax>

(TODO: example of Cython usage)