Talk:Open writing projects/Python all a scientist needs

From OpenWetWare

Revision as of 10:58, 19 June 2008 by Dan Bolser (Talk | contribs)

Please Leave A Comment

Adrian Del Maestro 15:04, 24 February 2008 (EST): Very nice article on the scientific uses of Python. Using Python to produce publication-quality plots via matplotlib has saved me hours of time as the scope and results of a project evolve. I also enjoyed your comments on data provenance, which is a very important topic that many scientists doing numerics are somewhat cavalier about.

My usual approach to large numerical projects is to use Python as scripting glue for the analysis, provenance and plotting of data produced by large-scale C++ codes. After reading the article, I think that for my next numerical project I will attempt to do all my prototyping, coding and profiling completely in Python, then use SWIG where appropriate.

One thing that might be helpful for scientists who are new to Python would be a more elaborate discussion of the confusion surrounding the various array packages (e.g. numarray vs. numpy).

Great work, keep the articles coming!



Joao Xavier: I read your article quickly and liked it very much. Here are a few comments. It's great that you wrote up your experience with the genomic project. I certainly relate to the scientist who struggles with a number of programming tools for each project (Java, MATLAB and Python scripts). Although this works well for me, it's a nightmare when I have to pass it on to somebody else. Not to mention that it is embarrassing trying to explain all that "just do it" type of code that I use to glue together the many steps for processing large data sets. As you say, Python could be the solution for this. My main problem with Python is that, being completely open, it has many tools available to perform the same task. This can be overwhelming for someone starting out, like me. It would be great if you could say in your paper how you coped with this. For example, I'm sure you tried other libraries before deciding to use matplotlib or the numeric packages you use. What do you suggest people do to avoid having to try many packages themselves? Are there web pages or discussion groups where Python scientists can go for advice?

Also, did you try "sage", the free software for Python that does a lot of maths, including symbolic math? A colleague told me it's great and that it's growing among scientists.


The most complicated part of this discussion is the SWIG typemap stuff. The whole point of SWIG is to avoid having to write these types of functions yourself. I think you could just have passed in a numpy array (of type "float64", I think) and it would just have worked without any need for a typemap. Alternatively, SWIG provides some convenience functions for just this purpose. Just add the following to your interface file, and it provides the user with a double_array function which does the Python list --> C array conversion for them.

%include "carrays.i"
%array_class(double, doubleArray)
%pythoncode %{
def double_array(mylist):
    """Create a C array of doubles from a list."""
    c = doubleArray(len(mylist))
    for i,v in enumerate(mylist):
        c[i] = v
    return c
%}
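
For readers who want to see the same list --> C array idea without building a SWIG module, here is a minimal sketch using the standard-library ctypes module instead. This is a swapped-in technique, not what the SWIG interface above generates: the double_array name is reused only for illustration, and the object returned is a ctypes array rather than a SWIG doubleArray.

```python
import ctypes


def double_array(values):
    """Build a C array of doubles from a Python sequence.

    Illustrative stand-in that uses ctypes; it is NOT the SWIG-generated
    double_array function from the interface file above.
    """
    # Multiplying a ctypes type by an integer creates a fixed-length array type.
    array_type = ctypes.c_double * len(values)
    # Calling the array type with the values copies them into C memory.
    return array_type(*values)


c = double_array([1.0, 2.5, 4.0])
print(len(c), c[1])
```

An array built this way can be handed to a function loaded with ctypes that expects a double* argument, which is roughly the role the SWIG helper plays for a SWIG-wrapped function.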

Comment from Peter Cock

Hello Julius,

I've just stumbled across your page: http://openwetware.org/wiki/Julius_B._Lucks/Projects/Python_All_A_Scientist_Needs

I just thought I'd point out a slight improvement to the Biopython issue raised in Note (5),

gb_parsed_record = SeqIO.parse(gb_file,"genbank").next() # (5)
...
(5) The Bio.SeqIO.parse method can parse a variety of formats. Here we
use it to parse the GenBank files on our local disk using the "genbank"
format parameter. The method returns a generator, who's next() method
is used to retrieve an object representing the parsed file.

I see you were using Biopython 1.44, but I just wanted to let you know that Biopython 1.45 introduced another function, Bio.SeqIO.read() for use in exactly this situation (when there is one and only one record in the sequence file).

i.e.

gb_parsed_record = SeqIO.read(gb_file,"genbank")

If the file contained no records, or more than one, then an exception would be raised. This prevents the possible problem of silently ignoring an unexpected second record which could happen with the original code using parse(...).next().
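
The difference can be sketched in plain Python, without Biopython installed. The helper below, read_exactly_one, is a hypothetical name and is not part of Biopython; it only mimics the "one and only one record" behaviour described above, and the choice of ValueError is an assumption made for this sketch.

```python
def read_exactly_one(records):
    """Return the single item of an iterable, or raise ValueError.

    Hypothetical helper for illustration; it mimics the behaviour
    described for Bio.SeqIO.read() but is not Biopython code.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found")
    try:
        next(iterator)
    except StopIteration:
        # The iterator is exhausted after one item: the expected case.
        return first
    raise ValueError("More than one record found")


# Taking only the first item silently ignores an unexpected second record.
two_records = ["record_1", "record_2"]
first_only = next(iter(two_records))

# The strict helper reports the same situation as an error instead.
try:
    read_exactly_one(two_records)
    strict_result = "no error"
except ValueError as err:
    strict_result = str(err)

print(first_only, "|", strict_result)
```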

Peter

Software Carpentry

Not sure if you have seen this, but it's based on Python:

http://www.osl.iu.edu/~lums/swc/
