User:The Biology Group

From OpenWetWare

Hey! To make it easier to find, we have moved all discussion of the biology behind our potential project to this page.


Contact Information

  • Ridhi Tariyal (
  • Jackie Nkuebe (
  • Anugraha Raman (

Group Description

1. Coming up with concrete examples of epistatic interactions in polygenic traits.

2. Developing a method (in conjunction with the math modeling group) to accurately predict the correlation between disease risk and SNP "hierarchy" in human disease.

3. Determining the feasibility of given models.

Final Progress

Overall Ideology

Overall ideology.png
Continued ideology.png

For now, we envision our target users to be:

  • High school students (with mathematical skills, discretionary time and a keen sense of curiosity)
  • Biologists (specific high-end needs)
  • Experimental geneticists
  • Clinical geneticists

Relevant Literature

Eye color and the Prediction of Complex Phenotypes from Genotypes

To try to recreate their results with our own model, we first looked through the supplementary information. We could not find a dataset linking the actual genotypes of individuals in this study to their corresponding phenotypes, so we contacted the authors of the paper. (See "Rotterdam Proposal and Other Outreach Efforts".)

Pigmentation Paper and accompanying SNP Spreadsheet

This paper looks at the pigmentation underlying variation in skin, hair and eye color in human populations. Pigmentation is the underlying factor that determines eye color. What is unique about this study is that the authors have also categorized the SNPs for pigmentation in hair/coat color by ethnicity. Having data separated by ethnicity will give us more future applications (see the trait-o-matic add-ons section). In the spreadsheet below we have narrowed down the SNPs from this paper that are important for eye color determination. These SNPs are primarily found in the OCA2 and HERC2 genes.
SNPs important for Eye Color determination
Approximately 34 SNPs are listed here, but studies have narrowed these down to 6 important ones. We hope that the mathematical models can do the same. The rsids, chromosome locations, genes and alleles are all listed in this spreadsheet.

Aggregate GWAS Studies Spreadsheet (see the tool section for more information). This spreadsheet aggregates GWAS studies up to December 2009. It includes the publication date, author, PubMed ID, link to the journal, the trait each study looked at, sample and replication sample sizes, information about the SNP (gene, position on chromosome, rsid, strongest risk allele, risk allele frequency), and some statistical information about each GWAS study in the compilation. For more data on the types of GWAS studies included in this compilation, see [this]. To make this information easier to navigate, a tool was developed (see the tool section).

LDL and Cholesterol GWAS study and accompanying SNP Spreadsheet

We thought it would be interesting to look into LDL cholesterol, since high levels of LDL can lead to cardiovascular diseases. Above is a GWAS study and the SNPs identified from this study. See Background and Methods under the first link for more information on how these GWAS studies were done.


Final Presentation

Intermediate Eye-Color Presentation

Tool for finding SNPs and Relevant Literature

This tool uses the spreadsheet from the cumulative GWAS study to:

  1. Identify all traits studied by these GWAS studies and group them together
  2. List rsids linked to information from dbSNP about these SNPs, in particular population diversity data (some of which is linked to HapMap information)
  3. Link rsids to corresponding primary literature

Useful Tool Script

Important Files: GWAS text file, Frame style sheet

Formulated Data Set

Overall Process:


We attempted to obtain the genomes of actual people accompanied by phenotypic data, but we were unable to get them. The author of the eye color study told us that if we wrote a proposal we would most likely be able to obtain this data (see the Rotterdam Proposal section).

To aid the modeling group we created data sets. The data set shown below is one for eye color. Taking 20 individuals and twelve different SNPs important to eye color, we created a genotypic matrix. After meeting with the modeling group, we learned that they wanted to work with a binary system of zeroes and ones that would turn even continuous traits into seemingly "binary" traits, since this would be easier to model. For example, eye color would be a continuous trait, since in our model we use SNPs with genotypes that yield blue, intermediate, or brown eyes. For each SNP, an individual can be homozygous dominant, heterozygous, or homozygous recessive. To accommodate all three cases we listed two categories for each SNP: homozygous dominant and heterozygous. We marked 1 if the individual had that genotype and 0 if they did not. A one in either column identifies the first two cases, and zeroes in both columns indicate homozygous recessive by process of elimination, so a binary system can accommodate all three possibilities.
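A minimal sketch of the two-column encoding described above, assuming hypothetical genotype labels (the actual data sets may code genotypes differently):

```python
# Hypothetical genotype labels; the original data sets may use other codes.
HOM_DOM = "homozygous dominant"
HET = "heterozygous"
HOM_REC = "homozygous recessive"


def encode_genotype(genotype):
    """Map one genotype to the pair (is_homozygous_dominant, is_heterozygous)."""
    if genotype == HOM_DOM:
        return (1, 0)
    if genotype == HET:
        return (0, 1)
    if genotype == HOM_REC:
        return (0, 0)  # implied by elimination: neither column is set
    raise ValueError("unknown genotype: " + repr(genotype))


def encode_individual(genotypes):
    """Flatten a list of per-SNP genotypes into one binary row, two columns per SNP."""
    row = []
    for genotype in genotypes:
        row.extend(encode_genotype(genotype))
    return row


# One made-up individual with three SNPs.
example_row = encode_individual([HOM_DOM, HET, HOM_REC])
print(example_row)
```

With twelve SNPs, each individual becomes a row of 24 zeroes and ones, and the 20 individuals together form the binary genotypic matrix.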

We then looked at the phenotype yielded by each row: in this case blue, brown, or intermediate. We then devised simple mathematical rules using all SNPs in question to come up with values. Ranges of values were then correlated with different phenotypes. The goal of the modeling team is to see whether they can generate these rules using their model.

Below are 3 separately generated data sets (3 different randomly generated matrices) that each use two rules as models:

Values in the following ranges correspond to the following phenotypes: 0-4 = Blue, 5-12 = Intermediate, 13-19 = Brown
  1. (0.5 * homozygous recessive SNP1 + 2 * homozygous recessive SNP3 + 3 * heterozygous SNP6 + 12 * heterozygous SNP10)
  2. (0.67 * heterozygous SNP2 + 1.5 * homozygous recessive SNP4 + 5 * homozygous recessive SNP7 + 4 * heterozygous SNP9 + 0.4 * homozygous recessive SNP11)
Data Set 1
Data Set 2
Data Set 3
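As an illustration of how the two rules and the value ranges above turn genotypes into a phenotype, here is a short Python sketch. The genotype labels and the dictionary layout are hypothetical. The sketch also makes one assumption about binning: the listed ranges leave gaps for non-integer scores (for example 12.5 under rule 1), so it treats 5 and 13 as lower cutoffs.

```python
# Genotype labels used in this sketch (hypothetical codes).
HOM_DOM, HET, HOM_REC = "hom_dom", "het", "hom_rec"

# Each rule maps (SNP number, genotype) -> weight, copied from the two rules above.
RULE_1 = {(1, HOM_REC): 0.5, (3, HOM_REC): 2, (6, HET): 3, (10, HET): 12}
RULE_2 = {(2, HET): 0.67, (4, HOM_REC): 1.5, (7, HOM_REC): 5,
          (9, HET): 4, (11, HOM_REC): 0.4}


def score(individual, rule):
    """Sum the weights of every rule term whose genotype the individual carries.

    `individual` maps SNP number -> genotype label; SNPs missing from the
    dict contribute nothing.
    """
    return sum(weight for (snp, genotype), weight in rule.items()
               if individual.get(snp) == genotype)


def phenotype(value):
    """Bin a score into a phenotype.

    Assumption: scores that fall between the listed ranges (0-4, 5-12,
    13-19) are assigned to the lower bin, using 5 and 13 as cutoffs.
    """
    if value < 5:
        return "Blue"
    if value < 13:
        return "Intermediate"
    return "Brown"


# Made-up individuals for illustration.
all_dominant = {snp: HOM_DOM for snp in range(1, 13)}
person_a = {1: HOM_REC, 3: HOM_REC, 6: HET, 10: HET}
person_b = {7: HOM_REC, 9: HET}

print(phenotype(score(all_dominant, RULE_1)))
print(phenotype(score(person_a, RULE_1)))
print(phenotype(score(person_b, RULE_2)))
```

Note that under rule 2 the largest possible score is 11.57, so that rule on its own can never produce the Brown phenotype with these ranges.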

Rotterdam Proposal and Other Outreach Efforts

Ridhi contacted Drs. Kayser and Liu, authors of a paper on eye color and the prediction of complex phenotypes from genotypes. When asked how to obtain an anonymous yet real data set with genotypic and corresponding phenotypic data for testing statistical models, they told us to write a proposal to the management team of the Rotterdam Study. However, they indicated that researchers requesting the data usually have to meet certain expectations before it is given out, so there is a possibility that we would not be given this data set. Dr. Liu stressed that creating dummy data would not be straightforward due to [linkage-disequilibrium], and he suggested downloading [HapMap] data and creating phenotypes based on genotypes at specific loci.

Here is the proposal that was sent to the management team.

Ridhi also contacted Dr. Shriver to ask for a real data set, but we have yet to receive one.

Jacqui contacted Amy Carmargo at the Broad Institute, who works on the genotyping, sequencing and haplotype determination of [candidate genes]. Her paper "Association of genetic variants in KCNH2 with QT interval duration in the Framingham Heart Study" was of particular interest to us because the study had good documentation of the [SNP Genotypes and Echocardiographic Phenotypes]. We wanted to see if we could get a real data set from this study to test our model with.

Once it became clear that the next best thing to having the data sets from these studies would be to download HapMap data, Jacqui downloaded HaploView and was able to view data on SNPs for eye color.


In class, Professor Church mentioned the problem of chromosome location standardization. Since documentation has not been standardized, different locations reported in different studies that correlate SNPs with phenotypes could actually refer to the same chromosomal location. To address this, we contacted Bruce Birren, who works on genome-wide mapping and sequencing programs in humans and directed sequencing projects for microbes at the Broad Institute.

Trait-o-matic add-ons

  • We thought that it would be very useful if one could type in a particular SNP location and get a listing of all of the genotypes for that location for everyone in the trait-o-matic database. This would act as a first step for building a model that tested the association between SNPs and phenotypic expressions based on research. This tool was then [implemented] into trait-o-matic by the infrastructure group.
  • We also thought it would be interesting if trait-o-matic allowed us to search for SNPs that show a high minor allele frequency, and then to look for which ethnicities have the greatest variation for that SNP. To further this idea, it would be useful to be able to pick a region in the genome (based on characteristics that this region is generally known to modulate) and see a matrix of allele frequencies by ethnicity. The tool above expands on the latter idea, but both ideas have yet to be implemented in trait-o-matic.
  • A tutorial that walks a user through the new add-ons to Trait-o-Matic would increase the appeal of the tool.
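The allele-frequency-by-ethnicity matrix proposed in the list above could be computed along the following lines. This is a sketch under stated assumptions, not a trait-o-matic feature: the function names, the SNP identifier `snp_1`, the population labels `POP_A` and `POP_B`, and the genotypes are all made up for illustration.

```python
from collections import defaultdict


def allele_frequencies(samples, allele):
    """Return {population: frequency of `allele`} for one SNP.

    `samples` is a list of (population, genotype) pairs, where genotype is a
    two-letter string such as "AG" giving both alleles of one individual.
    """
    counts = defaultdict(int)  # copies of `allele` seen per population
    totals = defaultdict(int)  # total allele copies per population
    for population, genotype in samples:
        counts[population] += genotype.count(allele)
        totals[population] += len(genotype)
    return {pop: counts[pop] / totals[pop] for pop in totals}


def frequency_matrix(genotypes_by_snp, alleles):
    """Build {snp: {population: frequency}} for several SNPs."""
    return {
        snp: allele_frequencies(samples, alleles[snp])
        for snp, samples in genotypes_by_snp.items()
    }


# Made-up genotypes: two individuals in each of two hypothetical populations.
genotypes_by_snp = {
    "snp_1": [("POP_A", "AA"), ("POP_A", "AG"), ("POP_B", "GG"), ("POP_B", "AG")],
}
matrix = frequency_matrix(genotypes_by_snp, {"snp_1": "G"})
print(matrix)
```

Comparing the rows of such a matrix across populations would show which SNPs vary most between groups, which is the search described in the first idea.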

Future Directions and Cool Applications (Wish List)

Future Modeling

  • Once we obtain real genotypic data with corresponding phenotypic data, in addition to the 10 individuals in the PGP, we will be able to train and test on both our data sets and a much more representative group. We will also be able to make more realistic models.

Future Functional Tools

  • To better predict phenotype from genotype, we should take environmental factors into account as well. Such factors could include family history, diet, and exposure to carcinogens. A functional tool that combined the human phenome project, environmental knowledge, genotypic data and Trait-o-Matic in a consistent and easily usable way would be of great value to users.
Environmental factors.jpg
  • Protein-protein interactions would also allow us to increase the accuracy of our predictions. This would include not only the interactions of proteins produced by the genes where the SNPs are located, but potentially also surrounding proteins or proteins that the gene corresponding to the SNP needs in order to function correctly. For example, in ABO blood typing, depending on the form of the H antigen inherited, a child could have a phenotype corresponding to an allele that neither parent carries, since the H antigen is a precursor for the formation of the A and B antigens in blood.
  • We began this project heading in the pharmacogenetics direction. It would be interesting to look at traits corresponding to reactions to certain drugs, and traits corresponding to various diseases. A dosage-based tool could potentially be developed with SNP data for these traits.

Future User Interface Ideas

  • It would be cool to expand the trait-o-matic user interface to include a 3D visualization of some kind, where the user could click on a part of the human body to look at traits associated with that area and then view more detailed SNP data for those traits, accompanied by population-specific information.
  • A user interface that allows the user to choose a variety of traits from drop down menus and then creates a potential DNA sequence corresponding to the desired traits. It could also create a potential image of this person. This would allow for potential forensics applications.