Alyssa N Gomes Week 11 Journal
For this project, Tessa and I will be studying the difference between the Wild Type and dGLN3 DNA microarray data. Tessa will focus on the dGLN3 data and I will study the Wild Type. In the end, we will come together and compare our results in order to gain a wider understanding of the changes in gene transcription.
Statistical Analysis Part 1: ANOVA
- Download Professor Dahlquist's microarray data from Lionshare and save to desktop
- Looking at the assigned strain, record number of replicates done for each time period
- Create a new worksheet in Excel, labeling it 'stats'
- Copy the first two columns from sheet 1 and name it 'data'
- Row 1:
- Columns C-G label: wt_xbar_(TIME)
- Columns H and I label: wt_xbar_grand and wt_ss_HO.
- Columns J-N label: wt_ss_(TIME)
- Columns O, P, and Q label: wt_SS_full, Fstat and p-value.
- For C2 type =AVERAGE() and select all of the associated data in that row
- Repeat for all of the time periods
- For H2 labeled wt_xbar_grand take the average of C2-G2 (all time periods) and copy this formula down the column
- For I2 type =SUMSQ( then in the "data" sheet, highlight the data in Row 2 that is associated with the wt and t15, continue through the remaining time periods so that all of the wt data in the row is included, close the parenthesis, and copy this formula down the column
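- J2 type =SUMSQ(data!C2:F2)-n*stats!C2^2 and copy this formula down the column, where n is the number of replicates for that time period (4 for t15). Repeat in columns K-N for each remaining time period, changing the data range, n, and the xbar column to match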
- O2 type =sum(j2:n2) and copy down the column
- P2 type =((n-5)/5)*(i2-o2)/o2, where n is the total number of data points across all time periods (23 for the Wild Type), and copy down the column
- Q2 type =FDIST(P2,5,n-5), using the same n, and copy down the column
- Label R2 "wt_Bonferroni_p-value"
- R2 type =q2*6189 and copy down the column, where 6189 is the total number of genes tested
- To see how many of the p-values are less than 0.05, use Sort & Filter >> Filter >> click the drop-down arrow for Q1 (p-value) >> Number Filter >> less than >> 0.05 >> OK
- To replace any Bonferroni p-value that is greater than 1 with the number 1, in S2 type =IF(r2>1,1,r2) and copy down the column
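The spreadsheet formulas above can be sketched in Python for a single gene. This is only a minimal sketch under a few assumptions: the function names `f_statistic` and `bonferroni` are made up for illustration, each time period has at least one value, and no values are missing. The FDIST step is not reproduced because the Python standard library has no F distribution.

```python
def f_statistic(groups):
    """F statistic for one gene, mirroring the stats-sheet columns.

    groups: one list of log fold changes per time period
            (for the Wild Type: 4, 5, 4, 5, and 5 replicates).
    """
    n = sum(len(g) for g in groups)   # total number of data points
    k = len(groups)                   # number of time periods (5 in the sheet)

    # wt_ss_HO: sum of squares of every data point in the row
    ss_h0 = sum(x * x for g in groups for x in g)

    # wt_SS_full: sum over time periods of SUMSQ(range) - n_t * xbar_t^2
    ss_full = 0.0
    for g in groups:
        xbar = sum(g) / len(g)
        ss_full += sum(x * x for x in g) - len(g) * xbar ** 2

    # Fstat: ((n-5)/5)*(ss_HO - SS_full)/SS_full when k is 5.
    # Raises ZeroDivisionError if every replicate equals its time-period mean.
    return ((n - k) / k) * (ss_h0 - ss_full) / ss_full


def bonferroni(p_value, n_tests):
    """Bonferroni correction, capped at 1 like the =IF(r2>1,1,r2) step."""
    return min(p_value * n_tests, 1.0)


# Small made-up example with two time periods of two replicates each
example_f = f_statistic([[1.0, 2.0], [3.0, 5.0]])
```

The resulting F statistic would then be passed to an F distribution with 5 and n-5 degrees of freedom, as FDIST does in the spreadsheet.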
Data & Observations Methods:
- Create a new sheet named "B&H".
- Create an index column by typing "Index" in A1. Then type 1, 2, 3 into the following rows and drag down to 6189 to fill the index column.
- Copy and paste the column of IDs and the column of unadjusted p-values from your previous worksheet into columns B and C
- Sort columns A through C in ascending order on column C (the p-values)
- Title D1 'Rank'. Fill in the series of numbers in ascending order from 1 to 6189 in order to rank the p-values
- Type "wt_B-H_p-value" in E1. Type the following formula in E2: =(C2*6189)/D2 and copy it down the entire column
- Type "wt_B-H_p-value" into F1
- In F2 type =IF(E2>1,1,E2) and copy it down the entire column
- Select all of columns A through F and sort in ascending order on the Index column (A) so that the rows return to their original order
- Paste column F into the stats sheet
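The B&H steps above can be sketched in Python as follows. This is a minimal sketch that mirrors the spreadsheet steps only; the function name `bh_spreadsheet` is made up for illustration. The standard Benjamini-Hochberg procedure additionally forces the adjusted values never to decrease as rank increases, which neither the spreadsheet steps nor this sketch do.

```python
def bh_spreadsheet(p_values):
    """Adjust p-values as in the B&H sheet: p * m / rank, capped at 1.

    Results are returned in the original gene order, which corresponds to
    sorting back on the Index column before pasting into the stats sheet.
    """
    m = len(p_values)  # number of genes (6189 in the journal)

    # Sort positions by ascending p-value (the sort on column C)
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):       # column D: Rank
        raw = p_values[i] * m / rank                # column E: =(C2*m)/D2
        adjusted[i] = min(raw, 1.0)                 # column F: =IF(E2>1,1,E2)
    return adjusted


# Made-up p-values for four genes, not taken from the microarray data
example = bh_spreadsheet([0.01, 0.04, 0.03, 0.5])
```

Keeping the results in the original order avoids the final re-sorting step in the spreadsheet, where a wrong sort would attach adjusted p-values to the wrong genes.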
'Sanity Check: Number of genes significantly changed'
By counting the number of genes that fall below each p-value cutoff, we can confirm that we did the data analysis correctly. A p-value cutoff of 0.05 means that a change in gene expression this far from 0 would be seen less than 5% of the time by chance alone; the other cutoffs correspond to 1%, 0.1%, and 0.01%.
- Go to the "stats" worksheet, select row 1, and choose the menu item Data > Filter > Autofilter (the funnel icon on the Data tab). Set a custom filter on the p-value column for less than 0.05, 0.01, 0.001, and 0.0001 in turn to see which genes satisfy each cutoff
- Now we can refine the analysis by applying the same p-value cutoffs to the Bonferroni and B&H values.
- Make a PowerPoint slide comparing the p-values for dGLN3 vs Wild Type.
- Delete all but the SPOT (Index) and Gene Symbol (ID) and the time value average columns. Using the filter, make sure that only the p-values of <0.05 remain on the list and all VALUE rows have been deleted. Save as txt file
- After unzipping the STEM software, select the file for part 1, choose Saccharomyces cerevisiae for part 2, and choose No cross references and No gene locations for part 3. Select no normalization and check the Spot IDs included box at the top, and then click Execute.
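The filtering and saving step for the STEM input file can be sketched in Python. This is a rough sketch under several assumptions: the function name `build_stem_table`, the column keys "SPOT", "Gene Symbol", and "p-value", and the example rows are all made up for illustration, and the exact layout STEM expects should be checked against its own documentation.

```python
import csv
import io


def build_stem_table(rows, time_columns, cutoff=0.05):
    """Return tab-delimited text holding only rows with p-value < cutoff.

    rows: list of dicts, one per gene (hypothetical column keys).
    time_columns: names of the time value average columns to keep.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SPOT", "Gene Symbol"] + time_columns)
    for row in rows:
        try:
            p = float(row["p-value"])
            averages = [float(row[c]) for c in time_columns]
        except (TypeError, ValueError):
            continue  # skip rows holding spreadsheet errors such as #VALUE!
        if p < cutoff:
            writer.writerow([row["SPOT"], row["Gene Symbol"]] + averages)
    return out.getvalue()


# Made-up rows: one significant gene, one non-significant, one with an error
rows = [
    {"SPOT": 1, "Gene Symbol": "GENE_A", "p-value": 0.001, "t15": 3.0, "t30": 3.4},
    {"SPOT": 2, "Gene Symbol": "GENE_B", "p-value": 0.5, "t15": 0.1, "t30": 0.2},
    {"SPOT": 3, "Gene Symbol": "GENE_C", "p-value": "#VALUE!", "t15": 0.1, "t30": 0.2},
]
text = build_stem_table(rows, ["t15", "t30"])
```

The returned text could then be written to a .txt file, which is the equivalent of deleting the unwanted rows and columns by hand and saving from Excel.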
- For Wild Type:
- t15: 4 replicates
- t30: 5 replicates
- t60: 4 replicates
- t90: 5 replicates
- t120: 5 replicates
- I had a difficult time with some of the Excel worksheet formatting due to missed asterisks or incomplete highlighting. This was fixed with the help of Dr. Fitzpatrick and Dr. Dahlquist as well as Tessa.
- From the initial sanity test:
- 2378/6189 genes have p-values of less than 0.05. This is a percentage of: 38.42%
- 1527/6189 genes have p-values of less than 0.01. This is a percentage of: 24.67%
- 860/6189 genes have p-values of less than 0.001. This is a percentage of: 13.90%
- 460/6189 genes have p-values of less than 0.0001. This is a percentage of: 7.43%
- From this, we see that 2378/6189 (38.42%) genes had p-values below 0.05, 1527/6189 (24.67%) had p-values below 0.01, 860/6189 (13.90%) had p-values below 0.001, and 460/6189 (7.43%) had p-values below 0.0001. If no genes had truly changed, we would expect only about 5%, 1%, 0.1%, and 0.01% of genes to fall below these cutoffs by chance.
- For the corrected Bonferroni p-value, 228/6189 genes have p-values less than 0.05. This is a percentage of: 3.68%
- For the corrected B&H p-value, 1656/6189 genes have p-values less than 0.05. This is a percentage of: 26.76%
- Looking at the gene NSR1 (ID#YGR159C), a gene known to be induced by cold shock, we see the uncorrected p-value of: 1.43E-8
- Looking at the gene NSR1 (ID#YGR159C), a gene known to be induced by cold shock, we see the Bonferroni p-value of 8.86E-5
- Looking at the gene NSR1 (ID#YGR159C), a gene known to be induced by cold shock, we see the B&H p-value of 3.85E-6
- Looking at the gene NSR1 (ID#YGR159C), a gene known to be induced by cold shock, we see log fold changes of: 3.067775 at T of 15, 3.3937 at T of 30, 3.413875 at T of 60, -1.41454 at T of 90, -0.57006 at T of 120
- I had a hard time using the Execute button on the STEM program and had to change to several computers. After using a library computer, I was able to execute the program.
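The sanity test percentages and the NSR1 Bonferroni value reported above can be re-checked with a few lines of arithmetic. This is a small sketch that uses only the counts and p-values written in this journal; the variable names are made up for illustration, and the "expected by chance" figure assumes that no gene truly changed.

```python
TOTAL_GENES = 6189

# Counts reported above for the uncorrected p-values
reported_counts = {0.05: 2378, 0.01: 1527, 0.001: 860, 0.0001: 460}


def percent(count, total=TOTAL_GENES):
    """Percentage of genes represented by count."""
    return 100.0 * count / total


for cutoff, count in reported_counts.items():
    expected_by_chance = cutoff * TOTAL_GENES
    print(f"p < {cutoff}: {count} genes ({percent(count):.2f}%), "
          f"about {expected_by_chance:.1f} expected by chance alone")

# NSR1: Bonferroni p-value from the rounded uncorrected p-value
nsr1_p = 1.43e-8
nsr1_bonferroni = min(nsr1_p * TOTAL_GENES, 1.0)
print(f"NSR1 Bonferroni p-value: {nsr1_bonferroni:.2e}")
```

The NSR1 result from this sketch differs slightly in the last digit from the 8.86E-5 in the spreadsheet, presumably because the spreadsheet multiplied the unrounded p-value.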
- Analyzing and Interpreting STEM Results
- Why did you select this profile? In other words, why was it interesting to you? I selected Profile 45 because the STEM profiles were ordered by significance and Profile 45 was the first one shown, so it was the most significant in detailing changes in gene expression. At 0 minutes, expression was at 0 for all genes but rose quickly by 15 minutes. Most expression levels decreased after 15 minutes but did not cross into down-expression until after 60 minutes.
- How many genes belong to this profile? 515 genes were assigned to this profile.
- How many genes were expected to belong to this profile? 43 genes were expected.
- What is the p value for the enrichment of genes in this profile? The p-value reported for this profile was 0, meaning that the enrichment was significant.
- How many GO terms are associated with this profile at p < 0.05? 211/786 GO terms had a p-value of less than 0.05 (26.84%).
- How many GO terms are associated with this profile with a corrected p value < 0.05? 31/786 GO terms had a corrected p-value of less than 0.05 (3.94%).
- 10 Ontology Terms
- GO:0005730: Nucleolus: A small, dense body one or more of which are present in the nucleus of eukaryotic cells. It is rich in RNA and protein, is not bounded by a limiting membrane, and is not seen during mitosis.
- GO:0022613: Ribonucleoprotein complex biogenesis: A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of a complex containing RNA and proteins.
- GO:0042254: ribosome biogenesis: A cellular process that results in the biosynthesis of constituent macromolecules, assembly, and arrangement of constituent parts of ribosome subunits; includes transport to the sites of protein synthesis.
- GO:0016072: rRNA metabolic process: The chemical reactions and pathways involving rRNA, ribosomal RNA, a structural constituent of ribosomes.
- GO:0006364: rRNA processing: Any process involved in the conversion of a primary ribosomal RNA (rRNA) transcript into one or more mature rRNA molecules
- GO:0034660: ncRNA metabolic process: The chemical reactions and pathways involving non-coding RNA transcripts (ncRNAs).
- GO:0034470: ncRNA processing: Any process that results in the conversion of one or more primary non-coding RNA (ncRNA) transcripts into one or more mature ncRNA molecules.
- GO:0031981: nuclear lumen: The volume enclosed by the nuclear inner membrane.
- GO:0030684: preribosome: Any complex of pre-rRNAs, ribosomal proteins, and associated proteins formed during ribosome biogenesis.
- GO:0043233: organelle lumen: The internal volume enclosed by the membranes of a particular organelle; includes the volume enclosed by a single organelle membrane, e.g. endoplasmic reticulum lumen, or the volume enclosed by the innermost of the two lipid bilayers of an organelle envelope, e.g. nuclear lumen
- In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms?
- The cell reacts to cold shock by changing the expression of genes associated with these GO terms because all of these terms relate to rRNA and other RNA. RNA carries the information encoded in DNA, and rRNA associates with a set of proteins to form ribosomes, which catalyze protein synthesis. The proteins and ribosomes help regulate the cell and maintain its function. In response to cold shock, changing the expression of these genes helps the cell adjust protein synthesis and stay regulated under stress. Proteins help maintain cell equilibrium and respond to stress, which is what the cell needs. Without rRNA, these proteins could not be made. The nuclear lumen and organelle lumen enclose cellular contents in areas that may be under stress during cold shock.
- PPT slides comparing the two: TMorrisAGomesPPT
- Note: The profiles that were significant for both dGLN3 and Wild Type were: 45, 22, 9, 48
- The Wild Type genes seemed to follow more similar up/down expression patterns, because the lines within each profile were darker and more tightly clumped.