Open writing projects/Scientific Programming with Python and Subversion/Outline: Difference between revisions

From OpenWetWare

Revision as of 00:58, 31 March 2008

Outline

0 Introduction

  • Why this book?
    • Motivation/Aim - There is plenty of information about what you can do with computers in biology, chemistry, and physics, but little training in how to do it in a scientifically rigorous way. By combining programming practices used by professional software engineers with modern programming tools, this book teaches scientists a computational workflow that at its core promotes data integrity and reproducibility.
    • How To Read - This book tells the story of a typical scientific investigation from start to finish: generating raw data, processing that data according to hypotheses, creating data visualizations, and modifying the processing code in light of new hypotheses. Along the way we introduce the pieces of our scientific workflow and the specific tools used to carry them out:
      • project file management with version control software to promote data integrity and provenance
      • modular code writing to promote code re-use and bug isolation


    • Assumes no prior knowledge of Python; introduces computing tools as they are needed in the context of a typical scientific investigation. This makes it useful to both beginners and more experienced users
    • goal - to make managing projects easier, but more importantly to promote good scientific practice using computing methods
  • Introduce scientific themes throughout the book
    • Covers themes from biology, informatics, and physics? - for informatics, maybe use examples from one of the NCBI coffee breaks

Part I: Intro to scientific programming using python

1 Why use python for scientific programming?

  • What is python?
    • computer language that offers easy access to high-level functions, and has a large and growing community of scientific users
  • Why build scientific applications in python?
    • python code looks clean - it is easy to understand your own or your collaborators' code a week later
    • everything from data generation to analysis to plots can be done in python, making every aspect of your project consistent. Together, these promote good scientific practices (data integrity, data reproducibility)

2 Source Control Management with Subversion

  • What is source control?
    • Similar to Word 'track changes' or wiki 'history' but for all the files in a project.
    • A way to keep a history of every step in a process.
    • Not only for computer code, but for data, plots, paper manuscripts, etc.
  • An introduction to Subversion
    • What is a repository?
    • How to create a repository
    • How to make basic commits
    • Seeing differences between versions
    • Retrieving past versions
    • Collaboration using subversion
  • Advanced Topics
    • Branching and Merging
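
The basic Subversion steps listed above (create a repository, commit, see differences, retrieve a past version) might map onto command lines roughly as in the sketch below. It only builds and prints the commands and does not run them. The directory names, the file name `analysis.py`, and the helper `svn_commands` are hypothetical, and the `svn` subcommands after `checkout` are meant to be run from inside the working copy.

```python
import pathlib


def svn_commands(repo_dir, work_dir, message):
    """Return command lines for a minimal create/commit/inspect cycle.

    All paths and the file name 'analysis.py' are illustrative assumptions.
    """
    # A local repository is addressed with a file:// URL.
    repo_url = pathlib.Path(repo_dir).resolve().as_uri()
    return [
        ["svnadmin", "create", str(repo_dir)],         # create a repository
        ["svn", "checkout", repo_url, str(work_dir)],  # get a working copy
        ["svn", "add", "analysis.py"],                 # put a file under version control
        ["svn", "commit", "-m", message],              # record a new revision
        ["svn", "diff"],                               # local edits vs. the last commit
        ["svn", "update", "-r", "1"],                  # return the working copy to revision 1
    ]


commands = svn_commands("project_repo", "project_wc", "Add first analysis script")
for command in commands:
    print(" ".join(command))
```

To execute a step rather than print it, each list could be passed to `subprocess.run` on a machine where Subversion is installed.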


3 A brief introduction to python

  • What the scientist needs to know to get started
    • variable assignment
    • basic control structures
    • functions
    • package structure and import
    • objects (just like packages)
    • References to Programming Python for more detail, and A Byte of Python and Dive Into Python for more intro material
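
As a rough illustration of the topics listed above, the following sketch touches variable assignment, a loop and a conditional, two functions, and an import. The measurement values and the function names are made up for the example.

```python
import math  # import: make a standard-library module available


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def std_dev(values):
    """Return the sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    squared_diffs = [(v - m) ** 2 for v in values]
    # Functions inside a module are reached with a dot, as with object attributes.
    return math.sqrt(sum(squared_diffs) / (len(values) - 1))


# Variable assignment: hypothetical replicate measurements.
readings = [0.52, 0.48, 0.50, 0.55, 0.45]

# Basic control structures: a for loop with an if test inside it.
outliers = []
for r in readings:
    if abs(r - mean(readings)) > 2 * std_dev(readings):
        outliers.append(r)

print("mean:", round(mean(readings), 3))
print("standard deviation:", round(std_dev(readings), 3))
print("outliers:", outliers)
```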

4 Making scientific plots with python

  • An introduction to matplotlib
    • basic functionality - simple line, bar, histogram plots
    • more sophisticated graphics - insets, labeling with text, drawing arrows
    • interactive graphics - adjusting parameters for real-time fitting
  • An example project use of matplotlib
    • bioinformatics
    • physics
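
A minimal sketch of the basic functionality mentioned above: a line plot with a text label and arrow, next to a histogram. It assumes matplotlib is installed. The data are synthetic, and the output file name is a placeholder.

```python
import math
import os
import random
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to a file, so no display is needed
import matplotlib.pyplot as plt

random.seed(0)  # make the synthetic data repeatable

# Synthetic data: a sine curve and 500 normally distributed samples.
x = [i * 0.1 for i in range(100)]
y = [math.sin(v) for v in x]
samples = [random.gauss(0.0, 1.0) for _ in range(500)]

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Simple line plot with axis labels, a legend, and an arrow annotation.
ax_line.plot(x, y, label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()
ax_line.annotate("first peak", xy=(math.pi / 2, 1.0), xytext=(3.0, 0.6),
                 arrowprops={"arrowstyle": "->"})

# Histogram of the samples using 20 bins.
ax_hist.hist(samples, bins=20)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

fig.tight_layout()
output_path = os.path.join(tempfile.gettempdir(), "example_plots.png")
fig.savefig(output_path)
```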


5 Crunching numbers with python

  • Python community modules
    • using numpy for matrix manipulations
    • using the scipy project tools
    • interacting with the GNU Scientific Library
  • An example project
    • bioinformatics
    • physics
    • others?
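
To give a flavor of matrix manipulation with numpy, the sketch below solves a small linear system and checks the answer. It assumes numpy is installed, and the matrix and right-hand side are arbitrary example values.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system A x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve the linear system directly rather than forming the inverse.
x = np.linalg.solve(A, b)

# Verify the solution with a matrix-vector product.
residual = A @ x - b

print("solution:", x)
print("transpose of A:")
print(A.T)
print("largest residual:", np.abs(residual).max())
```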


6 Unit testing for scientists

  • What is unit testing?
    • A way to generate automated tests of small units of code
  • Why do unit testing?
    • example: switching a sorting algorithm - how do you know the code still works the same way?
      • typically checked by eye: running the code manually and looking at the output
      • with unit tests you can see whether the code failed and, if it did, exactly where
  • Using python and nose to write unit tests
    • example of test code, and how to run the tests
      • bioinformatics
      • physics
  • How do I know which tests to write?
    • (This one is hard)
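
For the sorting example above, a set of unit tests could look like the sketch below. The function `insertion_sort` stands in for a hypothetical replacement algorithm. The tests are plain functions whose names start with `test`, which is the naming convention that test runners such as nose use to collect them.

```python
def insertion_sort(values):
    """Return a new sorted list (the hypothetical replacement algorithm)."""
    result = []
    for v in values:
        # Walk left from the end until the correct slot for v is found.
        i = len(result)
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result


def test_sorts_unordered_input():
    assert insertion_sort([3, 1, 2]) == [1, 2, 3]


def test_handles_empty_and_duplicates():
    assert insertion_sort([]) == []
    assert insertion_sort([2, 1, 2]) == [1, 2, 2]


def test_matches_reference_implementation():
    # The built-in sorted() serves as the trusted reference.
    data = [5, -1, 0, 5, 3.5]
    assert insertion_sort(data) == sorted(data)


def test_does_not_modify_input():
    data = [2, 1]
    insertion_sort(data)
    assert data == [2, 1]


if __name__ == "__main__":
    # Without a test runner, the tests can still be called directly.
    test_sorts_unordered_input()
    test_handles_empty_and_duplicates()
    test_matches_reference_implementation()
    test_does_not_modify_input()
    print("all tests passed")
```

If this were saved as a file such as `test_sorting.py` (a hypothetical name), running `nosetests` in that directory should collect and run the four test functions and report which one failed, if any.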


7 Advanced topics - using SWIG and psyco to speed up python code

  • (this section could be omitted initially)
  • What if python is not fast enough for my project?
    • Several options:
      • Use psyco to 'compile' the python code
      • Identify the slow parts and write them in C/C++ and bind them to python using SWIG
  • Using psyco
  • Using C with SWIG
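
Before rewriting anything in C/C++, the slow parts have to be identified. One way to do that, sketched below, is the `cProfile` module from the Python standard library. The outline does not name a profiling tool, so this choice is an assumption, and the two functions are deliberately naive stand-ins for real analysis code.

```python
import cProfile
import io
import pstats


def slow_pairwise_sum(values):
    """Sum |a - b| over all pairs with a deliberately naive double loop."""
    total = 0.0
    for a in values:
        for b in values:
            total += abs(a - b)
    return total


def analysis(n=200):
    """Hypothetical analysis step that calls the slow function."""
    data = [float(i % 17) for i in range(n)]
    return slow_pairwise_sum(data)


# Profile a single call to the analysis.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Collect the five entries with the largest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The functions that dominate the cumulative time column are the candidates for speeding up or rewriting.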


Part II: Examples

  • Ideally we could have an svn repo set up for people to pull from to look at the code examples at each step of the way
  • A complete case study of [blah] from start to finish
    • Creating a code repository
    • Approaching the scientific problem with code
      • deconstruct the problem into manageable parts
        • bioinformatics - write the downloading and saving data files code
        • physics - write the basic parts of the simulation code
    • Writing your first tests
      • write unit tests for these basic codes
    • Getting more sophisticated
      • separating your code into modules
      • using objects to encapsulate the code cleanly
    • Rinse, Lather, Repeat
      • a general methodology for approaching the scientific problems
        • start with the simplest possible task and write a script for it
        • move this code into a module and write unit tests for it
        • objectify the code when appropriate
        • identify speed bottlenecks if needed, and speed up those parts