Harvard:Biophysics 101/Notebook:ZS/2007-4-12

What I plan to do in the near future
We've established that there is demand to classify rs#s by clinical importance. One relatively painless way to do this is to determine, for a given disease name, how many people are diagnosed with that disorder. With that in mind, I found an Excel file from California that links disease names to the CDC's ICD-9 codes, along with the number of people diagnosed with each. If I can mine this file given a disease ID, or better yet just an rs#, we will have a basis for judging clinical importance.
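A minimal Python sketch of that lookup, assuming the California spreadsheet has first been exported to CSV. The column names (`icd9_code`, `description`, `num_diagnosed`), the function names, and the rs#-to-ICD-9 mapping passed to `count_for_rs` are all hypothetical; no such rs# mapping exists yet, so it would have to be built separately. The sample rows are placeholders, not real diagnosis counts.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple

# Hypothetical layout: CSV export with columns icd9_code, description, num_diagnosed.
FrequencyTable = Dict[str, Tuple[str, int]]


def load_frequencies(fh: TextIO) -> FrequencyTable:
    """Map each ICD-9 code to (disease description, number diagnosed)."""
    table: FrequencyTable = {}
    for row in csv.DictReader(fh):
        code = row["icd9_code"].strip()
        table[code] = (row["description"].strip(), int(row["num_diagnosed"]))
    return table


def count_for_code(table: FrequencyTable, code: str) -> Optional[int]:
    """Number diagnosed for an ICD-9 code, or None if the code is absent."""
    entry = table.get(code.strip())
    return entry[1] if entry else None


def count_for_rs(rs_id: str, rs_to_code: Dict[str, str],
                 table: FrequencyTable) -> Optional[int]:
    """Look up an rs# through a (hypothetical) rs#-to-ICD-9 mapping."""
    code = rs_to_code.get(rs_id)
    return count_for_code(table, code) if code else None


def rank_by_frequency(table: FrequencyTable) -> List[str]:
    """ICD-9 codes sorted from most to least frequently diagnosed."""
    return sorted(table, key=lambda c: table[c][1], reverse=True)


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real OSHPD data.
    sample = io.StringIO(
        "icd9_code,description,num_diagnosed\n"
        "AAA.0,Example disorder A,120\n"
        "BBB.1,Example disorder B,45\n"
    )
    freq = load_frequencies(sample)
    print(count_for_code(freq, "AAA.0"))                      # 120
    print(count_for_rs("rs0000", {"rs0000": "BBB.1"}, freq))  # 45
    print(rank_by_frequency(freq))                            # ['AAA.0', 'BBB.1']
```

Ranking by raw count is the simplest proxy for clinical importance; it ignores severity and population size, which could be layered on later.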

Note: I'm really sorry, guys, but I won't be able to work on this until after the 16th. I have the MCAT that morning, but I'll get to it afterwards. Hopefully it won't be too complicated.

Resources for disease frequency
--Zsun
 * CDC morbidity database (ICD-9): http://www.cdc.gov/nchs/icd9.htm, available as text files that can be parsed
 * Diagnosis frequency data from the State of California (OSHPD): http://www.oshpd.ca.gov/hqad/PatientLevel/ICD9_Codes/index.htm
 * CDC data on leading causes of death (PDF): http://www.cdc.gov/nchs/fastats/lcod.htm
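The CDC ICD-9 files are described above as parseable text. The sketch below assumes, without having checked the actual file layout, that each relevant line starts with an ICD-9 code (numeric, V-code or E-code, with an optional decimal part) followed by whitespace and a description. `parse_icd9_lines` and `codes_for_disease` are hypothetical helpers that build a code-to-description map and search it by disease name; the regex would need adjusting to the real format.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape: "<code> <description>", e.g. "NNN.N", "VNN.N" or "ENNN.N".
ICD9_LINE = re.compile(r"^((?:\d{3}|V\d{2}|E\d{3})(?:\.\d{1,2})?)\s+(\S.*)$")


def parse_icd9_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map ICD-9 code -> description, skipping lines that don't match."""
    codes: Dict[str, str] = {}
    for line in lines:
        match = ICD9_LINE.match(line.strip())
        if match:
            codes[match.group(1)] = match.group(2).strip()
    return codes


def codes_for_disease(codes: Dict[str, str], name: str) -> List[str]:
    """Codes whose description contains `name` (case-insensitive), sorted."""
    needle = name.lower()
    return sorted(c for c, desc in codes.items() if needle in desc.lower())


if __name__ == "__main__":
    # Placeholder lines for illustration only; not taken from the CDC files.
    sample = [
        "Some header line",
        "000.1  Placeholder disease alpha",
        "V00.0 Another placeholder",
    ]
    table = parse_icd9_lines(sample)
    print(codes_for_disease(table, "alpha"))  # ['000.1']
```

Matching on a lowercase substring is crude, but it is enough to go from a disease name typed by a user to candidate codes that can then be looked up in a frequency table.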