Harvard:Biophysics 101/2007/Notebook:Mdwang/2007-4-13

Deliverable by April 19: Given a list of rs IDs from the dbSNP search, return a list of gene loci and, for each locus, the cumulative percentage of the population carrying a mutation across all disease-related SNPs in that locus.
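A minimal sketch of the shape this deliverable could take, assuming the dbSNP lookup has already been reduced to (rs ID, locus, allele frequency) tuples. The function name, the tuple layout, and the example records are all hypothetical placeholders, not real dbSNP output, and the plain per-locus sum is only a first pass that treats every SNP as independent.

```python
from collections import defaultdict

def cumulative_percent_by_locus(snps):
    """Group SNP records by locus and sum their allele frequencies.

    snps: iterable of (rs_id, locus, allele_frequency), frequency in [0, 1].
    Returns {locus: summed frequency as a percentage}.
    Assumption: frequencies are simply added, as if no two SNPs
    ever occurred on the same chromosome.
    """
    totals = defaultdict(float)
    for rs_id, locus, freq in snps:
        totals[locus] += freq
    return {locus: round(100 * total, 2) for locus, total in totals.items()}

# Made-up records for illustration only (not real rs IDs or frequencies).
example = [
    ("rs_example_1", "GENE_A", 0.02),
    ("rs_example_2", "GENE_A", 0.03),
    ("rs_example_3", "GENE_B", 0.10),
]
result = cumulative_percent_by_locus(example)
```

Keeping the input as plain tuples means the dbSNP parsing step can change without touching the aggregation.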

Current Issues

Unfortunately, the deliverable will not be met today. I'm still working out exactly what kind of frequency data I should be accumulating. GeneCards seems to be the best source of allele frequencies, but I can't simply sum them, because many of the alleles lie on the same haplotype and a plain sum would count the same chromosomes more than once. [Link here]
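To make the double-counting problem concrete, here is a small sketch that compares a plain sum of allele frequencies with the fraction of chromosomes carrying at least one flagged allele. It assumes haplotype frequencies are available as (set of flagged alleles, frequency) pairs. That input format, the function names, and the numbers are made up for illustration and do not come from GeneCards.

```python
def naive_allele_sum(haplotypes):
    """Sum of per-allele frequencies; a haplotype with k flagged
    alleles is counted k times."""
    return sum(freq * len(alleles) for alleles, freq in haplotypes)

def carrier_chromosome_fraction(haplotypes):
    """Fraction of chromosomes with at least one flagged allele;
    each haplotype is counted once."""
    return sum(freq for alleles, freq in haplotypes if alleles)

# Hypothetical haplotypes at one locus; frequencies sum to 1.
haplotypes = [
    (frozenset(), 0.90),            # no flagged alleles
    (frozenset({"A", "B"}), 0.08),  # A and B travel together
    (frozenset({"A"}), 0.02),       # A alone
]

naive = naive_allele_sum(haplotypes)
carriers = carrier_chromosome_fraction(haplotypes)
```

With these made-up numbers the plain sum is 0.18, while only 0.10 of chromosomes carry a flagged allele. Note that this is a per-chromosome figure, not a per-person one.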

Since BRCA1 is a well-studied gene, I looked at the HapMap data for it, but the allele frequency output is hard to interpret. [Link here]

The overall goal I'm envisioning is a metric that corresponds to the total frequency of all deleterious alleles in a particular gene, though that may be too ambitious. Ideally, the "harmfulness" of each allele would also be weighted by its relative risk or something similar, but I'm fairly sure that goes further than is feasible.
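One way such a weighted metric might look, purely as a sketch: each allele contributes its frequency times (relative risk - 1), so an allele with no added risk contributes nothing. The choice of weight, the function name, and the example numbers are all assumptions for illustration. The frequencies are also assumed to have been adjusted already so that alleles on a shared haplotype are not counted twice.

```python
def weighted_burden(alleles):
    """Frequency-weighted burden score for one gene.

    alleles: iterable of (frequency, relative_risk) pairs.
    Assumption: weight = relative_risk - 1, so a relative risk of
    1.0 (no added risk) contributes zero to the score.
    """
    return sum(freq * (rr - 1.0) for freq, rr in alleles)

# Made-up alleles: a common low-risk one and a rare high-risk one.
example_alleles = [
    (0.08, 2.0),
    (0.02, 5.0),
]
score = weighted_burden(example_alleles)
```

In this example both alleles contribute 0.08 to the score, so the rare high-risk allele counts as much as the common low-risk one.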