James C. Clements: Week 12

Further analysis of the yeast data. The PowerPoint containing the data is here: [[Media:Clements-Yeast_Cold_Shock_Gene_Expression_Analysis.ppt|Clements_PPT]]

Analyzing and Interpreting STEM Results

 * 1) Select one of the profiles you saved in the previous step for further interpretation of the data. We suggest that you choose one that has a pattern of up- or down-regulated genes at the early (first three) timepoints. Answer the following:
 * 2) * Why did you select this profile? In other words, why was it interesting to you?
 * 3) ** I selected this profile (profile 28 for the Schade data) because it had a significant p value and its genes were clearly upregulated during the early timepoints. The profile is interesting to me because it is upregulated at first and then shows regulation similar to its control at the later timepoints.
 * 4) * How many genes belong to this profile?
 * 5) ** 62 genes were assigned to this profile.
 * 6) * How many genes were expected to belong to this profile?
 * 7) ** 187 genes were expected to belong to this profile.
 * 8) * What is the p value for the enrichment of genes in this profile? Bear in mind that in  last week's assignment, you computed p values to determine whether each individual gene had a significant change in gene expression at each time point.  This p value determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
 * 9) ** The p value of this profile was 2.6 × 10^-38 (significant).
 * 10) * Open the GO list file you saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile.  Select the third row and then choose from the menu Data > Filter > Autofilter.  Filter on the "p-value" column to show only GO terms that have a p value of < 0.05.  How many GO terms are associated with this profile at p < 0.05?  The GO list also has a column called "Corrected p-value".  This correction is needed because the software has performed thousands of significance tests.  Filter on the "Corrected p-value" column to show only GO terms that have a corrected p value of < 0.05.  How many GO terms are associated with this profile with a corrected p value < 0.05?
 * 11) ** GO terms with uncorrected p < 0.05: 63 terms
 * 12) ** GO terms with corrected p < 0.05: 11 terms
 * 13) * Select 10 Gene Ontology terms from your filtered list (either p < 0.05 or corrected p < 0.05). Look up the definitions for each of the terms at http://geneontology.org.  Write a paragraph that describes the biological interpretation of these GO terms.  In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms?
 * 14) ** I used the list filtered on corrected p value < 0.05.
 * 15) ribosome biogenesis: A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of a ribosomal subunit; includes transport to the sites of protein synthesis.
 * 16) ribosomal subunit export from nucleus: The directed movement of a ribosomal subunit from the nucleus into the cytoplasm.
 * 17) ribonucleoprotein complex export from nucleus: The directed movement of a ribonucleoprotein complex from the nucleus to the cytoplasm.
 * 18) rRNA-containing ribonucleoprotein complex export from nucleus: The directed movement of a ribonucleoprotein complex that contains ribosomal RNA from the nucleus to the cytoplasm.
 * 19) ribonucleoprotein complex localization: Any process in which a ribonucleoprotein complex is transported to, or maintained in, a specific location within a cell.
 * 20) ribosome localization: A process in which a ribosome is transported to, and/or maintained in, a specific location.
 * 21) establishment of ribosome localization: The directed movement of the ribosome to a specific location.
 * 22) ribonucleoprotein complex biogenesis: A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of a complex containing RNA and proteins. Includes the biosynthesis of the constituent RNA and protein molecules, and those macromolecular modifications that are involved in synthesis or assembly of the ribonucleoprotein complex.
 * 23) rRNA processing: Any process involved in the conversion of a primary ribosomal RNA (rRNA) transcript into one or more mature rRNA molecules.
 * 24) nucleolar part: Any constituent part of a nucleolus, a small, dense body one or more of which are present in the nucleus of eukaryotic cells. It is rich in RNA and protein, is not bounded by a limiting membrane, and is not seen during mitosis.
 * 25) rRNA metabolic process: The chemical reactions and pathways involving rRNA, ribosomal RNA, a structural constituent of ribosomes.
 * 26) * The vast majority of these terms relate to building and positioning the cell's protein-manufacturing machinery. The genes involved assemble ribosomes, export ribosomal subunits from the nucleus, localize ribosomes within the cell, and process rRNA. This could be evidence that the cell is compensating for cold-induced changes in the stability of proteins or of the mRNAs that encode the protein-manufacturing machinery. Another possibility is that the cell is up-regulating its protein-manufacturing centers in order to create the proteins necessary to survive the cold.
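
The profile p value discussed above asks whether more genes fit a profile than chance would assign to it. A minimal sketch of that kind of test follows, assuming an upper-tail binomial model in which each of the genes that passed filtering lands in the profile with probability (expected count / total genes). The function name `binomial_upper_tail` and the numbers in the example call are hypothetical, and STEM's own calculation may differ in detail.

```python
import math

def binomial_upper_tail(n_genes, expected, observed):
    """Return P(X >= observed) for X ~ Binomial(n_genes, expected / n_genes).

    Assumes 0 < expected < n_genes. Terms are summed in log space so that
    very small p values do not overflow or underflow along the way.
    """
    p = expected / n_genes
    log_terms = []
    for k in range(observed, n_genes + 1):
        # log of the binomial coefficient C(n_genes, k)
        log_comb = (math.lgamma(n_genes + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_genes - k + 1))
        log_terms.append(log_comb + k * math.log(p)
                         + (n_genes - k) * math.log1p(-p))
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)

# Hypothetical numbers: 1000 genes in total, 10 expected in the profile
# by chance, 30 actually assigned to it.
p_value = binomial_upper_tail(n_genes=1000, expected=10, observed=30)
print(p_value)
```

A small p value here means the profile holds more genes than the chance model predicts. It says nothing about whether any single gene changed significantly, which is what last week's per-gene p values tested.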

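The Excel autofilter step on the GO list can also be scripted. The sketch below assumes a tab-delimited file with "p-value" and "Corrected p-value" columns, as named in the instructions above. The sample rows are made up for illustration, and the handling of values written as "<0.001" is an assumption about how the export formats very small numbers.

```python
import csv
import io

# Made-up stand-in for a GO list export; a real file has more columns and rows.
SAMPLE = (
    "Category Name\tp-value\tCorrected p-value\n"
    "term A\t1.0E-6\t<0.001\n"
    "term B\t0.002\t0.04\n"
    "term C\t0.03\t0.62\n"
    "term D\t0.2\t1.0\n"
)

def parse_p(text):
    # Assumption: a value such as "<0.001" is treated as 0.001.
    return float(text.strip().lstrip("<"))

def count_significant(rows, column, alpha=0.05):
    """Count the rows whose value in `column` is below `alpha`."""
    return sum(1 for row in rows if parse_p(row[column]) < alpha)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
n_raw = count_significant(rows, "p-value")
n_corrected = count_significant(rows, "Corrected p-value")
print(n_raw, n_corrected)
```

To run it on a real export, replace `io.StringIO(SAMPLE)` with an open file handle and skip any title rows above the header line. The corrected count is the more conservative one because the software has performed thousands of tests.
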
Using YEASTRACT
 * 1) Answer the following questions:
 * 2) * What are the top 10 transcription factors in your results? List them on your wiki page with the percent of the genes in your cluster that they each regulate.
 * 3) ** Ste12: 27.6%
 * 4) ** Rap1: 19.5%
 * 5) ** Ino4: 12.4%
 * 6) ** Sok2: 10.8%
 * 7) ** Yap1: 10.3%
 * 8) ** Gcn4: 9.7%
 * 9) ** Fhl1: 9.7%
 * 10) ** Phd1: 9.7%
 * 11) ** Yap6: 9.2%
 * 12) ** Sko1: 8.1%
 * 13) * Is Gln3 on the list? What percentage of the genes in the cluster does it regulate?  How many genes does it regulate?  What are the names of the genes?
 * 14) ** Gln3 is on the list. It regulates 3.2% of the genes in the cluster (6 genes). The genes are ENA1, ZRT1, NMD3, AAH1, MCH4, and ASN1.
 * 15) For the mathematical model that we will build in class, we need to define a gene regulatory network of transcription factors that regulate other transcription factors.  We can use YEASTRACT to assist us with creating the network.  The model that we will start with has the following transcription factors in it:
CIN5 CUP9 FHL1 GTS1 HSF1 MSN1 MSN4 NRG1 RAP1 RCS1 REB1 ROX1 RPH1 YAP1 YAP6

 * We will also include GLN3 because it is known to regulate the genes that code for enzymes in nitrogen metabolism. Based on your previous analysis of the transcription factors that regulate your chosen cluster above, select up to five additional transcription factors to add to the network.  Which transcription factors do you want to add to the model and why?
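 * I want to add Ste12, Ino4, Sok2, Gcn4, and Phd1 because each regulates at least 9% of the genes in the cluster and none of them is already in the starting network (Rap1, Yap1, Fhl1, and Yap6 from my top 10 already are).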
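
The selection above can be written out as a short script. This is one reading of the rule: keep transcription factors from the top 10 that regulate at least 9% of the cluster and are not already in the starting network, up to five. The percentages and the starting network are copied from this page; the function name `pick_additions` and the 9% cutoff parameter are illustrative choices, not part of YEASTRACT.

```python
# Percent of cluster genes regulated, copied from the top 10 list on this page.
top_ten = [
    ("Ste12", 27.6), ("Rap1", 19.5), ("Ino4", 12.4), ("Sok2", 10.8),
    ("Yap1", 10.3), ("Gcn4", 9.7), ("Fhl1", 9.7), ("Phd1", 9.7),
    ("Yap6", 9.2), ("Sko1", 8.1),
]

# Transcription factors in the starting model, as listed on this page.
starting_network = (
    "CIN5 CUP9 FHL1 GTS1 HSF1 MSN1 MSN4 NRG1 RAP1 "
    "RCS1 REB1 ROX1 RPH1 YAP1 YAP6"
).split()

def pick_additions(ranked, network, min_percent=9.0, limit=5):
    """Return up to `limit` factors at or above the cutoff that are not yet in the network."""
    present = {name.upper() for name in network}
    picks = [name for name, pct in ranked
             if pct >= min_percent and name.upper() not in present]
    return picks[:limit]

additions = pick_additions(top_ten, starting_network)
print(additions)  # ['Ste12', 'Ino4', 'Sok2', 'Gcn4', 'Phd1']
```

Names are compared in upper case because the page writes the same factor both ways (for example, Rap1 in the results and RAP1 in the network).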