Alyssa N Gomes Week 12 Journal

This week we are working with YEASTRACT. Our experiment analyzes the 6000 genes in yeast (targets) and looks more closely at the 250 that code for transcription factors (regulators). We'll be looking at three kinds of evidence: binding studies (Lee et al. 2002), expression studies, and the location of DNA binding sites of transcription factors. All three have the potential for false positives, but binding studies also have a chance of false negatives. What we need is a list of transcription factors that we can use to generate a gene regulatory network.

Methods

Turn on the file extensions
Download your selected gene profile from LionShare and open it in Excel
On the YEASTRACT website, click "Rank by TF" on the left-hand side
Paste your list of Gene Symbols from your selected gene profile into the Target Genes window
Check the box "Check for all TFs", make sure all options are left at their defaults (DNA binding plus expression evidence, TF acting as activator or inhibitor), and click Search
The p values colored green are considered "significant", the ones colored yellow are considered "borderline significant", and the ones colored pink are considered "not significant"
- How many transcription factors are green or "significant"? 24

List the "significant" transcription factors on your wiki page, along with the corresponding "% in user set", "% in YEASTRACT", and "p value". Are CIN5, GLN3, HMO1, and ZAP1 on the list? The significant factors are:

There are 24 transcription factors that are green or "significant"

"Significant" transcription factors:
- Sfp1p
  - % in user set: 79.53%
  - % in Yeastract: 9.41%
  - p-value: 0
- Fkh2p
  - % in user set: 21.44%
  - % in Yeastract: 15.76%
  - p-value: 0
- Yhp1p
  - % in user set: 38.60%
  - % in Yeastract: 15.38%
  - p-value: 0
- Yox1p
  - % in user set: 41.13%
  - % in Yeastract: 14.78%
  - p-value: 0
- Cyc8p
  - % in user set: 0.39%
  - % in Yeastract: 100.00%
  - p-value: 0
- YLR278C
  - % in user set: 14.62%
  - % in Yeastract: 17.65%
  - p-value: 2.9E-14
- Ace2p
  - % in user set: 81.29%
  - % in Yeastract: 8.73%
  - p-value: 6.4E-14
- Rif1p
  - % in user set: 12.87%
  - % in Yeastract: 18.44%
  - p-value: 1.25E-13
- Msn2p
  - % in user set: 63.35%
  - % in Yeastract: 9.53%
  - p-value: 1.69E-13
- Cse2p
  - % in user set: 21.25%
  - % in Yeastract: 14.05%
  - p-value: 4.67E-13
- Stb5p
  - % in user set: 27.88%
  - % in Yeastract: 12.31%
  - p-value: 2.672E-12
- Ndt80p
  - % in user set: 15.59%
  - % in Yeastract: 14.08%
  - p-value: 7.8902E-10
- Asg1p
  - % in user set: 8.77%
  - % in Yeastract: 17.58%
  - p-value: 4.5755E-09
- Msn4p
  - % in user set: 47.95%
  - % in Yeastract: 9.59%
  - p-value: 4.6451E-09
- Mig2p
  - % in user set: 9.75%
  - % in Yeastract: 16.29%
  - p-value: 1.0326E-08
- Snf2p
  - % in user set: 40.35%
  - % in Yeastract: 9.95%
  - p-value: 1.0656E-08
- Swi5p
  - % in user set: 38.21%
  - % in Yeastract: 10.08%
  - p-value: 1.1467E-08
- Spt20p
  - % in user set: 38.01%
  - % in Yeastract: 10.07%
  - p-value: 1.4665E-08
- Snf6p
  - % in user set: 46.98%
  - % in Yeastract: 9.13%
  - p-value: 9.9913E-07
- Pdr1p
  - % in user set: 28.46%
  - % in Yeastract: 10.15%
  - p-value: 2.4577E-06
- Gcr2p
  - % in user set: 25.73%
  - % in Yeastract: 10.09%
  - p-value: 7.7693E-06
- Gat3p
  - % in user set: 10.92%
  - % in Yeastract: 12.56%
  - p-value: 1.1840E-05
- Mcm1p
  - % in user set: 31.19%
  - % in Yeastract: 9.58%
  - p-value: 1.4349E-05
- Pop2p
  - % in user set: 5.46%
  - % in Yeastract: 15.64%
  - p-value: 2.8483E-05

CIN5, GLN3, HMO1, and ZAP1 are not on the significant list; they appear among the "not significant" transcription factors.
22 transcription factors appeared on both the wild type and GLN3 lists for profile 45. They were:
- Ace2p
- Asg1p
- Cse2p
- Cyc8p
- Fkh2p
- Gcr2p
- Mcm1p
- Mig2p
- Msn2p
- Msn4p
- Ndt80p
- Pdr1p
- Rif1p
- Sfp1p
- Snf2p
- Snf6p
- Spt20p
- Stb5p
- Swi5p
- Yhp1p
- YLR278C
- Yox1p
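As a quick check on the comparison, the overlap can be worked out with Python sets. This is only a sketch: the two lists are copied from the significant list and the shared list above, and the variable names are my own.

```python
# The 24 transcription factors marked significant (green) in my results.
significant = [
    "Sfp1p", "Fkh2p", "Yhp1p", "Yox1p", "Cyc8p", "YLR278C", "Ace2p", "Rif1p",
    "Msn2p", "Cse2p", "Stb5p", "Ndt80p", "Asg1p", "Msn4p", "Mig2p", "Snf2p",
    "Swi5p", "Spt20p", "Snf6p", "Pdr1p", "Gcr2p", "Gat3p", "Mcm1p", "Pop2p",
]

# The 22 transcription factors that appeared on both lists.
shared = [
    "Ace2p", "Asg1p", "Cse2p", "Cyc8p", "Fkh2p", "Gcr2p", "Mcm1p", "Mig2p",
    "Msn2p", "Msn4p", "Ndt80p", "Pdr1p", "Rif1p", "Sfp1p", "Snf2p", "Snf6p",
    "Spt20p", "Stb5p", "Swi5p", "Yhp1p", "YLR278C", "Yox1p",
]

# Factors that are significant in my results but not shared.
only_mine = sorted(set(significant) - set(shared))

print(len(significant), len(shared), only_mine)
# prints: 24 22 ['Gat3p', 'Pop2p']
```

So Gat3p and Pop2p are the two significant factors that were not on both lists.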
Both Tessa and I had between 15 and 30 transcription factors, so we did not need to go through the procedure of adding more.
The Excel sheet with DNA binding plus expression had 153 edges.
The Excel sheet with DNA binding had 33 edges.
The DNA Binding PLUS Expression had 10 edges.
The DNA binding AND expression had 8 edges.
PPT: Tessa Morris and Alyssa Gomes PPT
Write a paragraph discussing and explaining the results of each aspect of today's work.
- Determining candidate transcription factors that regulate a cluster of genes from your dataset: To determine the candidate transcription factors that regulate the cluster of genes from my dataset, we entered specific parameters into YEASTRACT, which ranked the transcription factors by significance (smallest p value first). Comparing GLN3 and wild type for profile 45, 22 of the 24 wild type transcription factors were shared. This suggests a link between GLN3 and wild type transcription for the selected cluster of genes. In the wild type results, none of CIN5, GLN3, HMO1, and ZAP1, which we expected to be significant, showed up as significant. This makes me curious to see how significant these factors are in the other gene profiles.
- Creating three candidate gene regulatory networks: We built three gene regulatory networks: DNA binding PLUS expression, only DNA binding, and DNA binding AND expression. To be honest, I'm not quite sure how these three networks differ in terms of how the genes are sorted, but in the order listed, the number of transcription factors went down significantly from one network to the next.
- Determining the total number of edges and degree distribution of your three gene regulatory networks: As the number of factors went down from PLUS expression to only DNA binding to DNA binding AND expression, the number of edges decreased as well. Too many edges make the model dense because there are too many parameters, so the PLUS expression network, with 133 edges, may be too dense. The only DNA binding network had 33 edges, the closest to the target of 40-50, and its GRNsight graph was very clear. The DNA binding AND expression network had fewer than the expected 15-30 transcription factors, so it had only 8 edges. This may make it unusable for research purposes, since it may not carry enough information.

- Visualizing the networks: Looking at the graphs, we see varying frequencies and differences between the in, out, and total degree frequencies across the board. Refer to the PPT for further analysis of this. Looking at the GRNsight graphs, it is hard to say what they mean, but the networks become simpler as fewer transcription factors are included.
- Choosing a particular gene regulatory network to pursue for the modeling: I have assumed that the preferred model will be the only DNA binding one, as stated above. I am not sure what comes next, but its 33 edges seem close enough to the preferred 40-50 edges to move on. This network has enough factors to give some information, yet few enough that it won't over-clutter the model and confuse the assumptions we make.
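The edge totals and degree distributions discussed above could be computed from a network's adjacency matrix roughly as follows. This is a minimal sketch on a made-up 4-gene network, not our actual data. The gene names are placeholders, and I am assuming a layout where columns are regulators and rows are targets.

```python
from collections import Counter

# Hypothetical network (placeholder names, NOT data from this journal).
# Assumed layout: columns are regulators, rows are targets.
# A 1 in row i, column j means "gene j regulates gene i".
genes = ["TF_A", "TF_B", "TF_C", "TF_D"]
adjacency = [
    [0, 0, 1, 0],  # TF_A is regulated by TF_C
    [0, 1, 0, 0],  # TF_B regulates itself
    [1, 1, 0, 0],  # TF_C is regulated by TF_A and TF_B
    [0, 0, 1, 0],  # TF_D is regulated by TF_C
]

# Total number of edges = number of 1s in the matrix.
total_edges = sum(sum(row) for row in adjacency)

# In-degree: how many regulators act on each gene (row sums).
in_degree = {gene: sum(row) for gene, row in zip(genes, adjacency)}

# Out-degree: how many targets each gene regulates (column sums).
out_degree = {gene: sum(col) for gene, col in zip(genes, zip(*adjacency))}

# Degree distribution: how many genes have each degree value.
in_distribution = Counter(in_degree.values())
out_distribution = Counter(out_degree.values())

print(total_edges)                     # prints: 5
print(sorted(in_distribution.items()))   # prints: [(1, 3), (2, 1)]
print(sorted(out_distribution.items()))  # prints: [(0, 1), (1, 1), (2, 2)]
```

In this toy network, a gene with out-degree 0 (TF_D) is only a target, which is the kind of pattern the in/out frequency graphs are meant to show.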
