# Open writing projects/Beginning Python for Bioinformatics

This page is part of the Open Writing Project Beginning Python for Bioinformatics.

## Beginning at the Beginning

This site is based on the book Beginning Perl for Bioinformatics by James Tisdall, which was published in 2001. My idea here is to follow the structure of the book, analysing each chapter and converting the Perl scripts into Python. The original book is very well written and an excellent starting point for any aspiring bioinformatician. It is helpful whether you are a biologist who does not understand programming or a computer scientist who does not know much biology.

This site in no way tries to plagiarize the book; it is used only as a starting point (a very good one indeed) for this journey into Python. Here you will find neither explanations of biological concepts nor criticisms of Perl. With that made clear, I will start from the beginning.

## Why Python (and not Perl)?

According to the official Python website:

Python and Perl come from a similar background (Unix scripting, which both have long outgrown) [to learn more about that check this tutorial], and sport many similar features, but have a different philosophy. Perl emphasizes support for common application-oriented tasks, e.g. by having built-in regular expressions, file scanning and report generating features. Python emphasizes support for common programming methodologies such as data structure design and object-oriented programming, and encourages programmers to write readable (and thus maintainable) code by providing an elegant but not overly cryptic notation. As a consequence, Python comes close to Perl but rarely beats it in its original application domain; however Python has an applicability well beyond Perl's niche.

I couldn't explain it better than that. Still, I have to give my own take on why I prefer Python over Perl and why I decided to use it in my day-to-day programming. Python code is extremely readable; in no time you can grasp it completely. OK, I admit that it has at least one odd feature for those who are not computer-savvy: the mandatory indentation. In Python you have to indent the bodies of loops, if clauses, function definitions, etc. Maybe this is the first and only hard step to get past, but after a couple of hours of coding you will be pleased with how pretty your code looks.
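To make the indentation rule concrete, here is a minimal sketch. The sequence and variable names are made up for illustration; the point is only that the indented lines belong to the `for` and `if` statements above them, and that returning to the left margin ends the block.

```python
# Count how many times each base occurs in a short, made-up DNA string.
dna = "ACGTTGCA"  # hypothetical example sequence

counts = {}
for base in dna:
    # Everything indented under the for statement runs once per base.
    if base in counts:
        # Indented one level further: runs only when the if test is true.
        counts[base] += 1
    else:
        counts[base] = 1

# Back at the left margin: this line runs once, after the loop has finished.
print(counts)
```

Python does not use braces or `end` keywords to close a block; the indentation itself is the structure, which is why inconsistent indentation is reported as an error.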

## Setting Up Python on Your Computer

Python is freely available for all types of computers (Windows, Macintosh, Linux, etc.). Most Linux machines and many Macs have a version of Python installed as part of the standard operating system. Windows users will have to install Python as an application. Python is frequently updated, and the update to version 3.0 brought many significant changes. All users are encouraged to install a current version of Python from python.org [1].
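If you are not sure which version of Python your computer is running, a short check like the following may help. It is only a sketch: it uses the standard `sys` module, and the wording of the messages is my own.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

if major < 3:
    # The examples on this page assume Python 3.
    print("Consider installing a current version from python.org")
```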

There are three basic ways to work with Python on your computer.

1) You can open a terminal window and start up Python as an interactive command line application. On most systems the command to launch Python is python3. Working in interactive mode has the advantage that commands are executed as soon as you type them (and press the enter/return key). However, it is more difficult to make changes and debug mistakes. It is very difficult to develop programs that are more than a few lines long interactively.

2) You can type a set of Python commands in a text file, then save that text file and convert it to an executable script (see below).

3) You can work in an Integrated Development Environment, or IDE (an application that works as an editor for Python code). All of the downloadable packages from python.org contain the IDE called "IDLE".
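As a rough illustration of the second approach, the text file might look like the sketch below. The file name `hello_dna.py`, the function `describe`, and the sequence are all hypothetical, chosen only for this example.

```python
#!/usr/bin/env python3
# hello_dna.py (hypothetical file name): a tiny script saved in a text file.


def describe(dna):
    """Return a one-line summary of a DNA string."""
    return f"Sequence {dna} has {len(dna)} bases"


if __name__ == "__main__":
    # This part runs only when the file is executed as a script,
    # not when it is imported from another Python file.
    print(describe("ACGGGAGGACGGG"))  # made-up sequence
```

Assuming the file is saved under that name, typing `python3 hello_dna.py` in a terminal should run it. The same lines can also be typed one at a time in interactive mode or pasted into an IDLE editor window.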

## Hands on code: Sequences and strings - part I

As pointed out in Beginning Perl for Bioinformatics, a large percentage of bioinformatics methods deal with data as strings of text, especially DNA and amino acid sequence data. DNA is composed of four different nucleotide bases: A, C, T and G; proteins contain 20 amino acids. Each of these elements has one letter of the alphabet assigned to it. In the case of DNA, an additional set of letters is used as ambiguity codes to represent positions that may be occupied by more than one nucleotide [2].
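As a first taste of what this looks like in Python, the sketch below stores a sequence as an ordinary string and keeps the standard IUPAC nucleotide ambiguity codes in a dictionary. The names `IUPAC_CODES`, `AMINO_ACIDS`, `possible_bases` and `has_ambiguity`, as well as the example sequence, are my own choices for illustration.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# IUPAC nucleotide codes: each letter maps to the bases it may stand for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def possible_bases(symbol):
    """Return the bases a single IUPAC letter may represent."""
    return IUPAC_CODES[symbol.upper()]


def has_ambiguity(dna):
    """Return True if any position may be more than one base."""
    return any(len(possible_bases(symbol)) > 1 for symbol in dna)


dna = "ACGTRACN"  # made-up sequence with two ambiguous positions
print(len(dna), has_ambiguity(dna), possible_bases("R"))
```

Because a sequence is just a string, everything Python offers for text (length, slicing, searching) is available for sequence data right away.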