Harvard:Biophysics 101/2007/Notebook:Christopher Nabel/2007-2-8

From OpenWetWare

Assignment Due Feb 8

Proposed Program Construction

The goal of this program is to generate 10,000 10-letter strings, consisting solely of the letters H and T, and to count the overlapping stretches of H or T of each length within those strings.

My ideal program would work in three parts. First, I would write a loop that builds a string of 10 random letters. Next, as each string is generated, I would add it to a master list before generating the next one. Once the master list is complete, I would screen it for homogeneous stretches of letters (all H or all T) of successively greater length.

Technical Difficulties

I was unable to carry out this design because I ran into serious problems in the first phase of writing the program. I experimented with while and for loops, trying to add 10 randomly chosen letters to my initial string, and failed each time. I do not know how to use the join function that other members of the class seem to use. Once this barrier is overcome, I think I know enough to use the append function to assemble the master list and then analyze it with the code written for us for the assignment due the 6th. This is where I currently stand with this assignment; more explanation of Python and string manipulation would be greatly appreciated.
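
For reference, here is a minimal sketch of how join can turn a list of randomly chosen letters into a single string. The variable names (letters, ten_mer, ten_mer_short) are only illustrative, and this is one possible way to do it, not necessarily the approach used by others in the class.

```python
import random

# join is called on the separator string and glues a list of
# strings together with that separator between the items.
dashed = '-'.join(['H', 'T', 'H'])   # 'H-T-H'

# Collect 10 random letters in a list, one per pass through the loop...
letters = []
for n in range(10):
    letters.append(random.choice('HT'))

# ...then join them with an empty separator to get one 10-letter string.
ten_mer = ''.join(letters)

# The same thing in one line, building the list inside the call.
ten_mer_short = ''.join([random.choice('HT') for n in range(10)])
```

Strings in Python cannot be changed in place, which is why the letters are gathered in a list first and joined at the end.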

Updated Assignment

After helpful advice from Shawn, I was able to understand some of the programming basics. Here is my completed assignment:


#!/usr/bin/env python

# Load random operations for generation of random 10-mers
import random 

# Create an empty list to store the 10-mers. The variables that hold
# an individual 10-mer and the reference string used when we sample
# through the data set start out as empty strings.
data = []
string = ''
refstring = ''

# Generate 10,000 10-mers using a for loop and add each one to the list
for i in range(10000):
    string = ''.join([random.choice('HT') for n in range(10)])
    data.append(string)

# Iterate through the list and count the overlapping stretches of H's and T's.
# To automate the iterations, an outer loop scans for each letter in turn.
possibilities = ['H','T']
print("Using method 2 from the Feb. 1 Assignment...")
for s in possibilities:
    for i in range(10):
        tally = 0 # Running total of matches of substr across all 10-mers
        substr = s * (i+1) # A run of i+1 copies of the letter s
        for refstring in data:
            pos = refstring.find(substr,0)
            while not pos == -1:
                tally = tally + 1
                # Resume one position later so overlapping matches count
                pos = refstring.find(substr,pos+1)
        print("%s %d" % (substr, tally))
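
The find loop above restarts the search one position after each match, which is what makes the count overlapping. The short sketch below isolates that step and compares it with the built-in count method, which skips past each match and so only counts non-overlapping occurrences. The helper name count_overlapping and the example string are made up for illustration.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    pos = text.find(sub, 0)
    while pos != -1:
        count += 1
        # Move forward by one character only, so a match that shares
        # letters with the previous one is still found.
        pos = text.find(sub, pos + 1)
    return count

example = 'HHHHT'
overlapping = count_overlapping(example, 'HH')   # matches start at 0, 1 and 2
non_overlapping = example.count('HH')            # matches start at 0 and 2 only
```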

Program Output

>>> ================================ RESTART ================================
Using method 2 from the Feb. 1 Assignment...
H 49909
HH 22455
HHH 9985
HHHH 4349
HHHHH 1893
T 50091
TT 22645
TTT 10122
TTTT 4515
TTTTT 1950
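
As a rough sanity check on these counts: a run of k identical letters can start at any of 10 - k + 1 positions in a 10-mer, and each such window matches with probability (1/2)^k if every letter is independent and H and T are equally likely. That independence is an assumption about random.choice, not something measured here, and the function name expected_overlapping_runs is invented for this sketch.

```python
def expected_overlapping_runs(k, n_strings=10000, length=10, p=0.5):
    """Expected total number of overlapping length-k runs of one letter."""
    windows = length - k + 1          # possible start positions per string
    return n_strings * windows * p ** k

# Expected totals for runs of length 1 through 5
expected = [expected_overlapping_runs(k) for k in range(1, 6)]
for k, value in zip(range(1, 6), expected):
    print("%d %.1f" % (k, value))
```

Under these assumptions the expected totals are 50000, 22500, 10000, 4375 and 1875 for lengths 1 through 5, which are close to the observed counts for both H and T.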
