Difference between revisions of "Harvard:Biophysics 101/2009/Project"

From OpenWetWare

Revision as of 05:41, 9 November 2009

Biophysics 101: Genomics, Computing, and Economics



Infrastructure Background

Coming Soon!

Request from Thu Nov 5 Class: Paper on a human quantitative trait example, on height (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687076/?tool=pubmed).


Epistasis article and class notes // ZMF

Here is the article on epistasis mentioned in class today. It provides a terrific overview of epistasis. Also here are notes from today's class if anyone wants to review for inspiration.

Nov 2. Meeting Report // Alex L

Last night, Hossein, Ben, Zach and I met in Quincy to discuss the project. In today's class, we will present the project plan we came up with, and will propose the adoption of a common language in which to discuss the project. Provided that the group agrees with the framework, coding could start as soon as today!

Here's that overview of the structure of the resources we have from trait-o-matic: File:Trait-o-matic overview.pdf. Keep in mind that this is a work in progress; I will update it as I gain a better understanding of how things work. Feel free to ask specific questions, and think about any additional data we might want to use for this project. - FZ

Hey! We created a user page for discussion and ideas: Biology Aspect of Project(AR,AT,JN,RT)

Diagram of project/groups // ZMF

Here is the diagram from today's class - Zach

Zach Frankel

Jobs I think need to be done:

  1. High level work on finding ways to organize data on polygenic inheritance
  2. Organizing Database of multifactorial inheritance
  3. Describing analytic method to evaluate inhe
  4. Learning trait-o-matic code and structure
  5. Integrating code to include analysis of polygenic traits

Possible additional work/applications

  1. SNPCupid type analytic tool add-on
  2. Pharmacogenetic tool add-on

The role I would like to take is at the interface between designing an algorithm to understand polygenic inheritance and integrating it into the program. However, I would like to see how others imagine working, and to work within a cooperative framework.

  • Development process:
  • Step 1: make sure my combination is useful. To do this, I'll need to learn about the nature of uncertainty in gene combination expression. I'd love to find a book or paper on this or something. This will help me determine three things: 1) whether correlative predictions will be valuable to gene combination research at all; 2) if it is useful, what data is needed; and 3) whether the PGP does/is able to collect this data from its participants. Ideally I'd do this in the next couple weeks. Checkpoint: At this point, I'll come back to the class and get feedback about whether the rest of my project is worth pursuing.
  • Step 2: Identify prototypical examples of gene combinations with <100% certainty that could be tested.
  • Step 3: Create new variation of Trait-O-Matic that scans PGP participants and outputs "X percent of PGP participants have gene A"
  • Step 4: Extend this tool to link with the observational data, if possible. If not, figure out what else to produce as a final product.
  • Why I want to do it: I am motivated much more by long term impact than any of the other drivers we've talked about. High level observation: I am entering this project with the goal of: "Assume in 2012 the PGP has 100,000 genomes. What can I do now to make this tool as robust as possible?" I am slightly disappointed that we aren't taking this view with the project at large, but that is the question I am trying to answer with my contribution. If anybody has further suggestions for me, please let me know!
  • I think that two tools will be crucial for the PGP in the long run: 1) a researcher-facing API; and 2) a mechanism for continuous, dynamic data collection from participants. I think my project could go a long way toward framing how these tools will work. If nothing else, my final project could be a report of "here is what I've learned in our class project about how an API and data collection need to be improved in the PGP." In my experience with software development, this kind of experience can be really valuable.
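Step 3 of the development process above (reporting "X percent of PGP participants have gene A") can be sketched in a few lines of Python. This is only a minimal illustration, not Trait-O-Matic code: the participant IDs, the variant names, and the functions `carrier_percentage` and `report` are all made up for the example, and real PGP data would first have to be parsed into this shape.

```python
def carrier_percentage(participants, variant):
    """Return the percentage of participants whose variant set contains `variant`.

    `participants` maps a participant ID to the set of variants found in
    that participant's genome (a hypothetical, simplified data layout).
    """
    if not participants:
        return 0.0
    carriers = sum(1 for variants in participants.values() if variant in variants)
    return 100.0 * carriers / len(participants)


def report(participants, variant):
    """Format the result in the style described in Step 3."""
    pct = carrier_percentage(participants, variant)
    return f"{pct:.0f} percent of PGP participants have {variant}"


# Made-up example data: participant ID -> set of variant identifiers.
participants = {
    "P1": {"variantA", "variantB"},
    "P2": {"variantB"},
    "P3": {"variantA"},
    "P4": set(),
}

print(report(participants, "variantA"))
```

Keeping the counting separate from the formatting means the same function could later feed Step 4, where the percentage would be compared against observational data rather than printed.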

Following up on Tues Class // George Church 10-28-09.

Notes From the Other Day // Zach Frankel 10-26-09

Check out the semi-organized notes from last Thursday. Sorry for not getting these up earlier.

Further comments // Alex L 10-24-09

After the discussion many of us had on Thursday in Quincy, it seems like we're definitely leaning towards the SNPCupid project, provided that we can find a better name for it :) Joe already pointed out its advantages, so I won't go over them again.

In addition, there is no reason why we wouldn't integrate some of the other ideas (concerning metabolism, for instance) as extra features of the software: it seems like once we have the basic framework, it would be fairly doable, and indeed preferable, to add as much functionality as possible.

And so this leads me to my main point: the semester is already halfway done and we really need to start working on the project!!! I think it's best to develop it leisurely and in small increments, allowing us to reflect on the project and improve it as we actually create it, rather than rush to finish it in a week. Again, the sooner we start, the more features we could implement in it; and a richer feature set is what's needed to take the project to the next level.

Tonight's Discussion // JT 10-22-09

Thanks again to Zach for booking the room in Quincy - the meeting was fun and productive, and I feel like a lot of ground was covered. Zach has said he'll post notes from the meeting - until then, just a recap on my own conclusions and thoughts going forward:

  • The consensus seems to be that we should move forward on SNPCupid, which I'm happy to do. It's attractive because it: (1) is do-able, (2) is not obviously being done already, (3) would afford opportunities for each of us to use our unique talents (biological, computational, etc.), and (4) is something "real people," rather than just researchers, could use and find valuable.
  • That said, I'm eager to hear alternative project proposals and debate their merits. Like I said, SNPCupid has some desirable traits, but it's certainly less "sexy" than other projects we debated (metabolic or other networks, genealogy-type analyses, etc.)
  • In any case, and to avoid this dragging on indefinitely, I think we ought to put all possible projects to a vote at some point in the coming week or two. As someone pointed out, we need to start working on some project or we'll run out of time!

Today was fun, let's hope there are more like it-

Other thought about metabolism project // ZMF 10-21-09

One thing I want us to consider is whether we are being ambitious enough to avoid repeating work that has already been done. In particular, I have been trying to review some of the literature on the application of pharmacogenomics to drug dosing, etc. I think there are a couple of paths we might want to consider:

  1. Effects of multiple genes: One of the reviews I found concludes: "Inherited difference in a single gene has such a profound effect on the pharmacokinetics or pharmacodynamics of a drug that interindividual difference in one gene has a clinically important effect on drug response. These are the “low-hanging fruit” of pharmacogenetics. However, the effects of most drugs are determined by many proteins, and composite genetic polymorphisms in multiple genes coupled with nongenetic factors will be found to determine drug response." Finding a novel way to statistically estimate the polygenic factors might be interesting. We might even consider developing the groundwork for a method that we could implement once we have more genomes sequenced.
  2. Aggregating information that might already exist and making it into a usable tool. This is not quite as ambitious, but if other research has already compiled the "low-hanging fruit," perhaps we could apply it. This seems to be the direction we have largely talked about, but I think making the distinction is important.

Class Project Topics:

Response to JT // ZMF 10-21-09

Joe, I think you raise a valid point about the limits of a metabolism-based project. While you are certainly right that it would be hard to predict how arbitrary alleles influence certain pathways, there are certainly some pathways we understand quite well. Though your example of glycolysis emphasizes that even very well studied pathways elude complete understanding, I think there is a threshold of knowing enough. In particular, if we know enough to even suggest genes for further research, the project has done something useful. I.e., if we identify an SNP as affecting a pathway which we think influences codeine metabolism, and then we find that the allele in actual humans correlates with being an ultra-rapid codeine metaboliser, we are onto something. I think it would be a bit over-ambitious to assume any project could be completely diagnostic in nature - that is to say, without looking at the statistical effect of an allele in humans, we can only make suggestions. Nonetheless, I think these suggestions a) are useful in and of themselves and b) could make the project a very useful tool once more data is available on people, their phenotypes (in particular their metabolic phenotypes), and their genomes. Admittedly, this is thinking ahead, but I think that's what we should be doing. Also, I just went through this paper and thought it was both interesting and relevant to the project. It presents a case study of a way to look for the role of SNPs in pharmacogenetic pathways - I'll post more on this soon. Cheers,


Do We Understand Glycolysis? // JT 10-20-09
While in principle I like the idea of a metabolism-based project, I'd like to inject some skepticism. Here's a nice paper on yeast glycolysis: File:Glycolysis.pdf. Even if the methods are a bit foreign to some of you, I think the intro and conclusions can suggest some of the problems in a metabolism-based project. So a few concerns:

  • In the absence of high-quality systems-level models of all the metabolic pathways in humans (which we do not have), how can we say something meaningful about the role of individual human alleles in such a pathway?
  • Ought we to assume such a model will exist soon (my guess is the $1000 genome will happen first!) and be useful, or is it easier to simply connect known human alleles to their observed phenotypes, without resorting to complex metabolic models?

There seems to be broad agreement that whatever project we work on should be forward-thinking, but still plausible and potentially useful. My concerns are not meant to be criticisms - but I do hope they'll generate some debate over which potential projects are plausible and which are perhaps *too* forward-thinking.

'til Thursday-

"SNPCupid" // Anna Turetsky & Joe Torella

  • We decided to build on Anna's project idea for a genetic testing service which would allow couples to assess, before having children, what phenotypes those children might inherit (and with what probability). These phenotypes range from the medically relevant (i-cell disease, diabetes) to the cosmetic (male pattern baldness, eye color), to the beneficial (intelligence, athletic ability).
  • A starting point for this kind of work is to cross-reference SNPedia, which catalogs easily/cheaply-testable human SNPs associated with some phenotype, and OMIM, which contains a wealth of information on the heritability of genes associated with those SNPs. GeneTests[1] can also be used to gather up-to-date information about the medically relevant genetic tests currently available. The program would focus mostly on recessive inheritance patterns, with X-linked recessive traits providing an additional layer of complexity. In addition, due to the non-standard formatting of heritability information in OMIM, parsing the data in a systematic way would provide an interesting (and I think "do-able") challenge.
  • Check out our talk page for an example of this idea using male-pattern baldness, and for a discussion of how something like this might be implemented in a systematic way.
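The core calculation behind the recessive-inheritance case described above is a Punnett square. The following is a minimal sketch for a single autosomal locus with two alleles. It assumes genotypes written as two-character strings such as "Aa", with the lowercase letter as the recessive allele; the function names are invented for illustration. X-linked traits and anything parsed from SNPedia or OMIM are outside its scope.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one autosomal locus.

    Each parent is a two-character genotype such as "Aa" and passes on
    either allele with probability 1/2 (a simple Mendelian assumption).
    """
    # Enumerate the four equally likely allele pairings (the Punnett square).
    # Sorting puts the uppercase allele first, so "aA" and "Aa" are merged.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def recessive_trait_probability(parent1, parent2, recessive="a"):
    """Probability that a child is homozygous for the recessive allele."""
    return offspring_genotypes(parent1, parent2).get(recessive * 2, Fraction(0))


# Two carriers: the classic 1/4 chance of an affected child.
print(recessive_trait_probability("Aa", "Aa"))
```

Using `Fraction` keeps the probabilities exact, which would matter if results for several independent loci were later multiplied together.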

In Response to SNPCupid //Ben Leibowicz

Your idea of a genetically oriented "dating service" sparked my interest in our last class meeting and I think it's a good jumping-off point. It also seems like something we could do given what we have, which is really just computers and access to some existing genomes. I wrote a bit more about this idea in my talk page and I'm curious whether you think such an idea is a significant enough departure from our current situation to be considered Human 2.0, as what it would be doing is simply to allow a group of individuals to manipulate human evolution in a seemingly positive way through advanced information collection and processing. Doesn't this beg the question: should we instead be focusing on technologies that will allow us to manipulate the genome of an offspring in a way that prevents inheritance of genetic disorders where it would naturally occur? This seems to be realistic in the not-so-distant future and might make such a genetic dating service obsolete. What do you think?

JT: I should say first that I'm not really behind the idea of this as some sort of "dating service" (although, admittedly, the project name implies it pretty strongly). I think it's more appropriate to think of this as a tool for inferring F1 phenotypes from very complex parental genotypes, in human populations. At any rate, there's a lot here, so I'll just focus on two questions: (1) is this enough of a departure for "human 2.0," and (2) shouldn't we be focused on interventions rather than informatics? Here are my answers:

  1. If we consider the cloud of ideas that has gone around regarding human 2.0, they are primarily concepts in which some human ability is enhanced by virtue of greater information. Personalized medicine is something we think of as obviously "human 2.0," but it is not fundamentally different from present-day medicine; it simply updates modern medical treatment with personal (genetic) information. Similarly, we find romantic partners largely through instinct, and it seems logical that part of "human 2.0" would be to incorporate the new information we have (again, genetic) into partner-finding decisions - and that's the idea here. In conclusion, since we think of personalized medicine as "human 2.0," I think it is fair to consider this "SNPCupid" idea similarly "human 2.0" in character.
  2. Your second question is whether we should focus on interventions, rather than informatics. But I feel that question sets up a false dichotomy; I'm not sure the boundary between informatics and intervention is so great. For instance, some countries are beginning to approve preimplantation genetic screening for in vitro fertilization, to avoid undesirable medical problems: Spain allows preimplantation genetic screening for cancer. Basically, they used in vitro fertilization to produce fertilized embryos, and implanted only those not carrying a gene greatly increasing the risk of breast and ovarian cancer. Since reliable genetic manipulation of human embryos is (seemingly) a long way off, such in vitro selection methods are the best way of avoiding or encouraging the inheritance of certain traits. In order to do this, however, we require knowledge of what traits the child is likely to inherit, and ways of testing for it, before we can rationally select which embryos are "best" (and while I'm aware this drifts into that dark and stormy 'eugenics' category, I think most parents would jump at the chance to prevent their child from inheriting an 80% probability of breast cancer!).

Medicine 2.0 // Filip Zembowicz, Zach Frankel, Alex Ratner, Alex Lupsasca, Ben Leibowicz

More on Medicine 2.0 // Jackie Nkeube and Brett Thomas

We also really liked the idea of a drug metabolism tool. We had some additional ideas we wanted to add:

  • We made a high level design of the research implications of such a tool here. We identified differences in CYP450 expression across populations as an area that data from Filip's tool could really advance. Factors to consider include race, gender, and age; are there any others? (Brett wonders if diet and activity are relevant too.)
  • In general, we think that any project like this should be designed with research implications in mind, as the success of a personalized drug recommendation engine depends on the underlying research.
  • On that note, we think that the drug metabolism tool should strive to be self-learning, which would have major implications for the underlying architecture. It could be self-learning by tracking user observations about their responses to drugs. This would require some sort of feedback mechanism for users to optionally tell us how the dosage worked.
  • There are some websites that provide drug interaction services - maybe we can experiment with an addon to one of them instead of reinventing the wheel.

Some Data to play around with // Filip Zembowicz 12:01, 13 October 2009 (EDT):

  • I've taken the metabolic pathway data from the COMPOUND and REACTION sections of the KEGG database ([2]) and scraped it into a MySQL database. The following two Excel files hold the majority of the compounds involved in biosynthesis and metabolism, along with a listing of the particular biochemical reactions that the compounds take part in. In addition, there is a file that lists all of the reactions, enumerating the compounds that are substrates and products.
  • Right now the schemata for these databases are quite primitive. In the next few days I hope to combine the REACTION data with the ENZYME database, so that we can see which enzymes are active in a particular pathway; from there we can actually start looking for mutations in those enzymes' genetic information for a particular genome. I might consider using some of the other approaches to accessing the KEGG data as well. I'll also put up a visualization of this data once I get my database internet-accessible again, since right now I am just hosting it on my own computer.
  • There is also a KEGG DRUG database that has information about drugs approved in Japan, the US, and Europe, including interesting things such as which enzymes are the targets of particular drugs.
  • Do explore KEGG -- it has a lot of interesting data!
  • File:Compounds.xls
  • File:Reaction.xls
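The planned combination of REACTION and ENZYME data could look roughly like the sketch below. It is not the actual schema of the scraped database: the table names, column names, and reaction IDs (`RXN1`, `RXN2`) are placeholders, the rows are hand-typed for illustration rather than taken from the files above, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reaction_compound (reaction_id TEXT, compound TEXT, role TEXT);
        CREATE TABLE reaction_enzyme (reaction_id TEXT, ec_number TEXT);
    """)
    # Hand-typed illustrative rows; the reaction IDs are placeholders.
    conn.executemany("INSERT INTO reaction_compound VALUES (?, ?, ?)", [
        ("RXN1", "glucose", "substrate"),
        ("RXN1", "ATP", "substrate"),
        ("RXN1", "glucose 6-phosphate", "product"),
        ("RXN1", "ADP", "product"),
        ("RXN2", "glucose 6-phosphate", "substrate"),
        ("RXN2", "fructose 6-phosphate", "product"),
    ])
    conn.executemany("INSERT INTO reaction_enzyme VALUES (?, ?)", [
        ("RXN1", "2.7.1.1"),  # hexokinase
        ("RXN2", "5.3.1.9"),  # glucose-6-phosphate isomerase
    ])
    return conn


def enzymes_for_compound(conn, compound):
    """List EC numbers of enzymes whose reactions involve `compound`."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.ec_number
        FROM reaction_compound AS c
        JOIN reaction_enzyme AS e ON e.reaction_id = c.reaction_id
        WHERE c.compound = ?
        ORDER BY e.ec_number
        """,
        (compound,),
    ).fetchall()
    return [ec for (ec,) in rows]


conn = build_demo_db()
print(enzymes_for_compound(conn, "glucose 6-phosphate"))
```

Once enzymes are linked to reactions this way, the EC numbers returned for a compound of interest would be the starting list of genes to check for mutations in a particular genome.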

Interesting Links: