User talk:Alex Lupsasca
Conclusion: my final report on the project
Throughout this semester, I have discovered many topics whose existence I had not even suspected: genomics is indeed an exciting field that's picking up a lot of speed (it's all exponential!) and a lot of amazing advances have been made; many more are just around the corner...
This was also quite challenging for me, mainly because my biology background coming in was quite weak. At the end of the course, I wouldn't say I'm a biology pro, but I've picked up the basics while perusing the literature and there are some areas, notably in modeling, which I feel I've got a good grasp on (enough to work on my own models!).
I have worked extensively as part of the Modeling Group to produce our logistic regression model and the accompanying software implementation. Together with Ben, Daniel and Zach, I produced the following code:
This file contains the code for our software, together with some fake data to check that it works. Working on this code allowed me to (re)discover Python, which I hadn't touched in a long while (I usually code in C++). This code should (hopefully) learn how SNPs interact and, on that basis, predict traits from a genotype.
Finally, I wrote up PDF documentation of our group's work, which includes most of the math in our model. This constitutes my synthesis of the work we did on the project:
DOCUMENTATION & Final Report (~13 pages so far)
This should prove very useful to anyone trying to run our program, and invaluable to anyone trying to find out why/how it works. Zach and I are planning to keep working on the model & software; nonetheless, someone will hopefully continue with this project in the future (the next generation of Biophysics 101 students?). In this case, I hope they find this report useful and that it allows them to start where we left off...
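For anyone who wants a feel for the kind of model described above before opening the report, here is a minimal sketch of a logistic regression with pairwise SNP interaction terms. To be clear, this is not our group's actual code: the function names, the 0/1/2 genotype coding, the plain gradient-descent fit and the fake data are all assumptions made for illustration.

```python
import numpy as np

def add_pairwise_interactions(G):
    """Append one product column per pair of SNP columns in G."""
    n_snps = G.shape[1]
    pairs = [G[:, i] * G[:, j]
             for i in range(n_snps) for j in range(i + 1, n_snps)]
    return np.column_stack([G] + pairs) if pairs else G

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_steps=5000):
    """Fit weights by plain gradient descent on the mean log-loss."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(n_steps):
        w -= lr * X1.T @ (sigmoid(X1 @ w) - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of the trait for each row of X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X1 @ w)

# Fake data (assumption): 200 people, 3 SNPs coded as 0/1/2 copies of an allele.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 3)).astype(float)
# Fake trait (assumption): present only when SNP 0 and SNP 1 are both non-zero.
y = ((G[:, 0] > 0) & (G[:, 1] > 0)).astype(float)

X = add_pairwise_interactions(G)
w = fit_logistic(X, y)
accuracy = np.mean((predict_proba(w, X) > 0.5) == y)
print("training accuracy:", accuracy)
```

The product columns are what let a linear model pick up an interaction between two SNPs; without them, the fit can only add up the separate effect of each SNP.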
See also project page for class-wide comments.
12/9 Update Yesterday, Zach, Alex L., Daniel, and I met for a full day of coding; we wrote a basic software implementation of our current model, which we'll present in our next class. We'd like to discuss possible issues with the model, related to sample size, for instance...
12/5 Update Ben, Zach and I met to discuss the direction of the project; we currently have a working model which we are in the process of coding.
11/20 Update The modeling group met to discuss SNP interaction. We perused the literature looking for existing models and started work on our own...
After the discussion many of us had on Thursday in Quincy, it seems like we're definitely leaning towards the SNPCupid project, provided that we can find a better name for it :) This project seems more attractive because it's at the right level of difficulty (not so easy, but not impossible either) and we should be able to complete it by the end of the semester, provided we start now. Of course, one of its advantages is that we can make it as complex as we want: there's no obvious cut-off point in terms of possibilities, and the sooner we start, the more complete it will be.
On that note, there is no reason why we wouldn't integrate some of the other ideas (concerning metabolism, for instance) as extra features of the software: it seems like once we have the basic framework, it would be fairly doable, and indeed preferable, to add as much functionality as possible. I'll try to bring that up in our next discussion, as well as push for the vote on the choice of project... We need to start working very soon!
- Python installation: I am running Ubuntu, a Linux distribution which comes pre-packaged with the latest version of Python, which I therefore did not need to install. In order to test the code provided by George Church in class, however, I had to install additional libraries for Python; on a Debian-based system like Ubuntu, the apt-get command renders this task trivial:
- I ran sudo apt-get install python-numpy to install the numpy library (and its associated packages), which provides a myriad of useful mathematical functions.
- I also ran sudo apt-get install python-matplotlib to obtain access to a library which allows for the creation of simple mathematical plots.
- Finally, upon executing Prof. Church's Python code, the interpreter complained about a missing library. This is highly unusual since apt-get usually manages dependencies automatically, but the problem was resolved by executing one last install command: sudo apt-get install python-tk
- That's all it took - ah, the pleasures of linux ;)
- Python coding: I then set out to code Python programs that would plot the three iterative processes presented by Prof. Church, with the following inputs: a seed (the initial value in the iterative process), a parameter k, and a number of iterations (which determines the number of plot points, and allows the x-axis to show the iteration number). The source code will not be made publicly available on this page, but will be provided upon request.
- The first difference equation, y[n+1]=k*y[n], turns out to be a discrete version of the exponential function k^x, whose behaviour is simple and familiar: y[n]=seed*k^n. For a positive seed, if 0<k<1, the iterative process produces an exponential decay to 0; if k=1, the resulting curve is the constant line y=seed; and if k>1, it becomes an unchecked exponential growth curve.
- The next difference equation, y[n+1]=k*y[n]*(1-y[n]), is a discrete version of the logistic equation (the logistic map). If one takes the continuum limit of the equation, one obtains a differential equation whose analytic solution is often used in biology to model population dynamics (this requires a seed between 0 and 1). The logistic map is known for its chaotic behaviour: for larger values of k (roughly above 3.57), the curve it produces changes completely under even small variations of the seed or of the parameter k. In fact, this is often studied with bifurcation plots of the logistic map, or even by drawing a Lyapunov fractal.
- The last equation is the same as the previous one, except that it does not allow negative values in the iterative process - this is therefore a more accurate model of population dynamics; in particular, if the process ever hits 0, it will forever stay there (this is population extinction).
- Spreadsheet plotting: Finally, I used a spreadsheet editor to simultaneously plot the above curves with different values of k but the same seed. Again, the spreadsheet is available on demand.
- The first difference equation can clearly be seen to trace the exponential function seed*k^n, and different values of k make the process evolve as described above: 0<k<1 and k>1 lead to exponential decay and growth, respectively, while k=1 creates a constant curve.
- The chaotic nature of the logistic equation is best revealed on a plot of curves with different values of k but the same seed. The curves are very different: some converge to 0, others to 1, and yet others go to other values. Some curves do not converge but rather oscillate between 2 points (cyclic behavior) or even evolve seemingly at random.
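The three iterative processes described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code I wrote for the assignment: the function names are made up, and clamping negative values to 0 is my assumption about how the third process "does not allow negative values".

```python
def iterate(step, seed, k, n_iter):
    """Return [y[0], ..., y[n_iter]] for the update rule y[n+1] = step(y[n], k)."""
    ys = [seed]
    for _ in range(n_iter):
        ys.append(step(ys[-1], k))
    return ys

def exponential(y, k):
    # y[n+1] = k*y[n]
    return k * y

def logistic(y, k):
    # y[n+1] = k*y[n]*(1 - y[n])
    return k * y * (1 - y)

def clamped_logistic(y, k):
    # Same as logistic, but negative values are replaced by 0 (assumption),
    # so a population that hits 0 stays extinct.
    return max(0.0, k * y * (1 - y))

def plot_curves(step, seed, ks, n_iter):
    """Plot one curve per value of k, all starting from the same seed."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    for k in ks:
        plt.plot(iterate(step, seed, k, n_iter), label=f"k = {k}")
    plt.xlabel("iteration n")
    plt.ylabel("y[n]")
    plt.legend()
    plt.show()
```

For instance, `plot_curves(logistic, 0.1, [0.5, 2.0, 3.2, 3.9], 50)` draws several logistic curves from the same seed on one plot, which is the comparison I made in the spreadsheet.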
My Python code for Assignment 3:
In my mind, the idea of Human 2.0 is that of humanity freed from the influence of arbitrary mutations. All life on Earth so far has been subject to evolutionary forces beyond its control, and for the first time, humans are about to escape their grasp.
Human 2.0 represents the notion of man (re)writing his DNA to his liking. But such a vision is still a ways into the future: today, we are far from the level of understanding and technology required to write a human's DNA from scratch.
Thus, it seems like Human 1.1 is the intermediary step we need to reach before getting to Human 2.0, that is, we must first learn to manipulate specific traits (i.e. genes). In doing so, we can create a modular vision of man as a collection of parts: in short, we should adopt a reductionist approach to Human 2.0.
In practice, I propose that we take a small step in the direction of Human 2.0 by first apprehending Human 1.1 - maybe this approach is not ambitious enough (and certainly some of my friends' ideas are much more ambitious and engaging) but at least I think this goal is attainable. My idea, then, is to make a program which takes as inputs: 1) a large collection of DNA genomes of the members of a specific population; and 2) the DNA genome of a specific member of the population who possesses a trait whose gene we wish to isolate. Then after some statistical analysis, the program would output a list of genes which are likely to be responsible for the traits we wish to isolate.
For instance, suppose we get the genome of a savant with incredible mathematical abilities, and we wish to find the gene responsible for this desirable trait. Then, supposing this savant is a white German male, we could take the genomes of a large sample of other white German males (maybe from the same family!), and through some analysis, determine which genes are likely to be responsible for the trait. Hence, we could create a list of desirable traits and their corresponding genes - in the long run, these "modules" could be written into a DNA genome to create an improved human being.
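To make the proposal a bit more concrete, here is a rough sketch of the simplest version of that statistical analysis: flag the positions where the individual's allele is rare in the reference population. Everything here is an assumption for illustration: the genomes are taken to be pre-aligned strings of equal length, the function name and the `max_freq` cut-off are made up, and the output is a list of positions rather than genes (mapping positions to genes would be a separate step).

```python
from collections import Counter

def rank_candidate_positions(population, individual, max_freq=0.05):
    """Return (position, allele, frequency) for every position where the
    individual's allele occurs in at most max_freq of the population,
    rarest first."""
    n = len(population)
    hits = []
    for pos, allele in enumerate(individual):
        counts = Counter(genome[pos] for genome in population)
        freq = counts[allele] / n
        if freq <= max_freq:
            hits.append((pos, allele, freq))
    return sorted(hits, key=lambda hit: hit[2])

# Toy input (made up): four reference sequences and one individual.
population = ["ACGT", "ACGT", "ACGA", "ACGT"]
individual = "ATGA"
print(rank_candidate_positions(population, individual, max_freq=0.25))
```

With a single individual this can only produce a list of candidates to look at more closely; several carriers of the same trait would be needed before the rare variants they share start to mean something.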
Assignment 5 & 6
He explains the idea in a diagram here.
Basically, in a given population there exist variations of single nucleotides, called single nucleotide polymorphisms (SNPs). These may affect, among other things, the efficacy of the enzymes responsible for breaking down certain drugs in the body; thus, people will react more strongly or more weakly to a certain class of drugs depending on which variant of the corresponding SNP they carry.
As such, by taking in a person's genome as input, a program could query SNPedia to find what specific SNPs a person carries. Once these variants are found, the software would interrogate Drug Bank to find out what drugs these variants correspond to; finally, it would output a list of these drugs and their associated effect on the person's metabolism (strong or weak, etc). This would allow for recommendations of drug dosage to be issued for patients whose genome has been sequenced.
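The flow of such a program can be sketched as follows. This is only a sketch under stated assumptions: two small in-memory dictionaries stand in for SNPedia and Drug Bank (no real queries are made), and every identifier in the toy data (`snp_1`, `enzyme_A`, `drug_X`, ...) is a placeholder, not a real SNP, enzyme or drug.

```python
def drug_report(genotype, snp_effects, enzyme_drugs):
    """List (drug, enzyme, activity) for every drug whose metabolising
    enzyme is affected by a variant found in the genotype."""
    report = []
    for snp_id, allele in genotype.items():
        effect = snp_effects.get((snp_id, allele))
        if effect is None:
            continue  # this variant has no known effect in the lookup table
        enzyme, activity = effect
        for drug in enzyme_drugs.get(enzyme, []):
            report.append((drug, enzyme, activity))
    return sorted(report)

# Placeholder stand-in for SNPedia: (SNP, allele) -> (enzyme, activity).
snp_effects = {
    ("snp_1", "A"): ("enzyme_A", "reduced"),
    ("snp_2", "G"): ("enzyme_B", "increased"),
}
# Placeholder stand-in for Drug Bank: enzyme -> drugs it breaks down.
enzyme_drugs = {
    "enzyme_A": ["drug_X", "drug_Y"],
    "enzyme_B": ["drug_Z"],
}
# Placeholder genotype: SNP -> allele carried by the person.
genotype = {"snp_1": "A", "snp_2": "T", "snp_3": "C"}

for drug, enzyme, activity in drug_report(genotype, snp_effects, enzyme_drugs):
    print(f"{drug}: {enzyme} activity {activity}")
```

In a real tool the two dictionaries would be filled from the databases themselves, and the "activity" label would have to be turned into an actual dosage recommendation by someone with the medical knowledge to do so.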
Note: in my opinion, this is not really Human 2.0 but rather Medicine 2.0, i.e. personalized medicine. For my thoughts on Human 2.0 please visit the talk page of the People section of the wiki. I think Fil's idea would make for a useful mini-project, as it shouldn't be too difficult to implement and would immediately produce a useful medical tool.
Update: I familiarized myself with the databases. It seems to me that some match information types A and B, and others information types B and C, i.e. that they are complementary. As such, it would be useful to scrape their information into one single database that would combine all the information types together. I think Fil has already collected data from two of them, so it's definitely feasible.
Also, I read the Wikipedia entries, which was very useful for me since my background in biology is somewhat lacking. It also opened my eyes to the complexity of the projects we've proposed!
Alex, Ben and I talked over lunch about the direction of the project. Ben and I are quite taken with the idea of SNPcupid but now I am a bit concerned about the feasibility of the project (re: the above comment). In other words, I don't think that we could really come up with complex predictions about two people's offspring by examining their DNA, beyond basic indications about possible diseases, which can already be obtained through individual tests which are much cheaper. (?)
Finally, I asked Harris to tell us more about Bioweather map, which I've been reading up on and seems very interesting. As such, Harris decided to have a presentation of the project on Thursday, after which we'll be able to see whether it matches our idea of Human 2.0...