Our (Anugraha's [], Kelly's [], and Ridhi's []) idea for the class project was to build a computational analogue of an array experiment that would let us discover new genotype-phenotype connections and check links that have already been documented. To do this, we would need access to the genomic data from the Personal Genome Project (PGP), as well as each participant's recorded phenotypes. We would then scan all of the PGP data for genes that turn up in multiple individuals sharing a certain phenotype. We could expand this to test all possible ORFs and all possible genotypes, in an approach similar to running a DNA microarray experiment in wet-lab biology. The result would be a list of genes that are overrepresented in individuals expressing a certain phenotype (the analogue of overexpressed genes on a microarray). We would then use OMIM [], GeneTests [], and SNPedia [] to see whether any of these genes were already documented. All three databases might have an API that would make this part of the program easier to run; otherwise, we could look into searching the online databases with some sort of web crawler. Automating this check would let us focus on generating novel hypotheses about which genes might be linked to which phenotypes.
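The search step could be sketched roughly as follows. This is only a minimal illustration, not a worked-out design: it assumes the PGP data has already been reduced to a set of variant or gene identifiers per participant, and it swaps in a one-sided Fisher's exact test as the way to score "found in multiple cases of a certain phenotype". The names `hypergeom_tail` and `rank_variants`, the participant IDs, and the variant IDs are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, n_carriers, n_affected, n_total):
    """One-sided Fisher's exact test: probability of seeing at least k
    affected carriers if carriers were spread at random over participants."""
    upper = min(n_carriers, n_affected)
    total = comb(n_total, n_affected)
    hits = sum(
        comb(n_carriers, i) * comb(n_total - n_carriers, n_affected - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def rank_variants(genotypes, affected):
    """genotypes: participant id -> set of variant ids (hypothetical format).
    affected: set of participant ids showing the phenotype.
    Returns (p_value, variant, affected_carriers, carriers), smallest p first."""
    participants = set(genotypes)
    cases = participants & affected
    variants = set().union(*genotypes.values())
    results = []
    for variant in variants:
        carriers = {p for p in participants if variant in genotypes[p]}
        k = len(carriers & cases)
        p_value = hypergeom_tail(k, len(carriers), len(cases), len(participants))
        results.append((p_value, variant, k, len(carriers)))
    return sorted(results)


# Toy data, invented purely for illustration.
toy_genotypes = {
    "p1": {"varA", "varB"},
    "p2": {"varA"},
    "p3": {"varA", "varC"},
    "p4": {"varB"},
    "p5": {"varC"},
    "p6": set(),
}
toy_affected = {"p1", "p2", "p3"}

for p_value, variant, k, n in rank_variants(toy_genotypes, toy_affected):
    print(f"{variant}: {k}/{n} carriers affected, p = {p_value:.3f}")
```

With real data, testing every variant against every phenotype would produce many chance hits, so some multiple-testing correction would be needed before treating the top of the list as a hypothesis.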
To show that our method works, we would first take PGP participants with known phenotypes and genotypes, and pick a gene that is widely documented in OMIM and in the primary research literature as producing a certain phenotype. We would then run a sequence alignment between the gene sequence from OMIM and the genotype of a PGP participant known to have that phenotype, to see whether the sequence is present. As a double check, we would also look the sequence up in SNPedia to see whether it returns the known phenotype. If both checks succeed for known associations, we would have evidence that the method can also be used to find new phenotype-genotype associations. The program would then be developed for beta release.
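As a rough picture of the "is this sequence present" check, here is a deliberately simplified sketch. It is not a real sequence alignment: it only looks for exact matches on either strand, whereas an actual implementation would use an alignment tool that tolerates mismatches and gaps. The function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_exact(participant_seq, known_seq):
    """Return (strand, 0-based position) for every exact occurrence of
    known_seq in participant_seq, checking both strands.
    Exact matching is a stand-in for a proper alignment."""
    participant_seq = participant_seq.upper()
    known_seq = known_seq.upper()
    hits = []
    for strand, query in (("+", known_seq), ("-", reverse_complement(known_seq))):
        start = participant_seq.find(query)
        while start != -1:
            hits.append((strand, start))
            start = participant_seq.find(query, start + 1)
    return hits


# Toy sequences, invented for illustration only.
print(find_exact("AACGTTGCA", "CGTT"))
```

An empty result for a participant who is known to have the phenotype would be a sign that either the documented sequence or the matching step needs a closer look.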
As a side note, such a program could take on a variety of tangential functions. For example, we could expand it to look for polygenic traits. In theory, our algorithm would already have identified any genes that were overrepresented among individuals with a phenotype. We could then add code to find cases in which the phenotype appears without all of the expected genes being present. This could help identify the true contributors, expanding the program's potential and driving it towards the frontier of systems biology: data-driven research.
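The polygenic check could look something like the sketch below. It assumes the same kind of input as the main search (a set of variant or gene identifiers per participant) plus a set of "expected" genes for the phenotype; `missing_expected` and all of the identifiers are hypothetical names used only for illustration.

```python
def missing_expected(genotypes, affected, expected):
    """For each affected participant, list the expected genes they do NOT carry.
    Participants carrying the full expected set are left out of the result."""
    gaps = {}
    for pid in sorted(affected):
        missing = expected - genotypes.get(pid, set())
        if missing:
            gaps[pid] = missing
    return gaps


# Toy data, invented for illustration only.
toy_genotypes = {
    "p1": {"geneA", "geneB"},
    "p2": {"geneA"},
    "p3": {"geneB", "geneC"},
}
print(missing_expected(toy_genotypes, {"p1", "p2", "p3"}, {"geneA", "geneB"}))
```

Genes that are rarely missing among affected individuals would be candidates for true contributors, while genes that are often missing may only be along for the ride.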
To infinity - and Human 2.0 - and beyond!
Kelly and Anugraha
Assignment 4. / Ridhi, another idea
I wanted to mention one more idea. It is based on the premise that personalized medicine on an individual level may never be economically feasible or practical. However, it would be useful to start identifying which allele mutations are more probable within some sort of “sub-population”, where the sub-populations are groups with low allele frequency differentiation among their members. This grouping may or may not align nicely with geographical, religious, cultural, and other traditional means of categorization.
I also don’t know whether this is already being done on some level, but we could go through OMIM, parse out the autosomal recessive diseases, and categorize them by which of the “sub-groups” show a higher probability of carrying alleles associated with these diseases. In the end, I am envisioning a SNP sequencing company that provides basically nothing beyond minimum sequencing on a cost plus small margin basis, but tells you what “sub-population” you belong to. The individual takes this information, looks up the associated diseases and risks on SNPedia, and orders a custom-made tray that tests only for their relevant SNPs. Could this also lower the cost of testing?
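To make the sub-population idea a little more concrete, here is a minimal sketch of the bookkeeping involved. It assumes each sample is already labelled with a sub-population and has a count of 0, 1, or 2 copies for each disease-associated allele; how the sub-populations themselves are defined is left open. The names `allele_frequencies` and `panel_for`, the threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def allele_frequencies(samples):
    """samples: iterable of (subpopulation, {allele_id: copies}), copies in 0..2.
    Returns subpopulation -> {allele_id: frequency}, assuming diploid samples."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for subpop, calls in samples:
        sizes[subpop] += 1
        for allele, copies in calls.items():
            counts[subpop][allele] += copies
    return {
        subpop: {a: c / (2 * sizes[subpop]) for a, c in counts[subpop].items()}
        for subpop in sizes
    }


def panel_for(freqs, subpop, threshold=0.01):
    """Alleles worth putting on a custom tray for this sub-population:
    those whose frequency is at or above the (arbitrary) threshold."""
    return sorted(a for a, f in freqs.get(subpop, {}).items() if f >= threshold)


# Toy data, invented for illustration only.
toy_samples = [
    ("groupA", {"alleleX": 1, "alleleY": 0}),
    ("groupA", {"alleleX": 1}),
    ("groupB", {"alleleX": 0, "alleleY": 2}),
    ("groupB", {"alleleY": 0}),
]
freqs = allele_frequencies(toy_samples)
print(panel_for(freqs, "groupA", threshold=0.1))
```

The smaller the panel that comes out for each sub-population, the stronger the case that a custom tray would be cheaper than testing everything.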
GeneTests/SNPedia/OMIM assignment, Daniel Jordan & Jason Zhang
This is not necessarily explicitly related to the idea of Human 2.0, but it is an interesting idea related to what we called "Medicine 2.0" in class. Biologists are starting to realize that many diseases (particularly chronic diseases) previously thought to be unrelated are actually closely related to each other. When we design next-generation treatments for chronic diseases (and especially for diseases of aging), it will be more useful to group diseases by genetic etiology rather than by symptoms, as we do now. We tried an approach to finding genetically linked groups of diseases using OMIM. The short version is that we generated a list of diseases genetically related to type 2 diabetes. As expected, many are very different from type 2 diabetes, though many of these are also already known to be related to diabetes. See Daniel's talk page for the list.
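One simple way to picture "genetically related" is shared genes. The sketch below is an illustration of that idea only and is not necessarily the procedure we used: it assumes a disease-to-gene table has already been extracted from OMIM, and the function name `related_diseases`, the disease labels, and the gene labels are placeholders rather than real OMIM entries.

```python
def related_diseases(disease_genes, query):
    """disease_genes: disease name -> set of associated gene ids.
    Returns (disease, shared_genes) for every other disease sharing at least
    one gene with the query, most shared genes first, ties broken by name."""
    target = disease_genes[query]
    shared = {
        disease: target & genes
        for disease, genes in disease_genes.items()
        if disease != query and target & genes
    }
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


# Placeholder table, invented for illustration only.
toy_table = {
    "query disease": {"GENE1", "GENE2", "GENE3"},
    "disease B": {"GENE2", "GENE3"},
    "disease C": {"GENE3", "GENE9"},
    "disease D": {"GENE7"},
}
for disease, genes in related_diseases(toy_table, "query disease"):
    print(disease, sorted(genes))
```

Repeating the lookup from each related disease in turn would grow the list into a linked group rather than a single ring of neighbors.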