BIOL368/F14:Isabel Gonzaga Week 15


Sanity Check Redux

Sanity Check: Number of genes significantly changed

Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results.

  • Open your spreadsheet and go to the "forGenMAPP" tab.
  • Click on cell A1 and select the menu item Data > Filter > Autofilter. Little drop-down arrows should appear at the top of each column. This will enable us to filter the data according to criteria we set.
  • Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
    • How many genes have p value < 0.05? NO: 806 HYP: 417
    • What about p < 0.01? NO: 348 HYP: 107
    • What about p < 0.001? NO: 116 HYP: 7
    • What about p < 0.0001? NO: 44 HYP: 2
  • When we use a p value cut-off of p < 0.05, what we are saying is that, if the gene's expression had not really changed, we would have seen a change that deviates this far from zero less than 5% of the time by chance.
  • We have just performed 5480 T tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see this magnitude of a gene expression change by chance in about 5% of our T tests, or 274 times. If we have more than 274 genes that pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones.
  • The "Avg_LogFC_all" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.
    • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
      • NO: 62
      • HYP: 164
    • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
      • NO: 86
      • HYP: 253
    • What about an average log fold change of > 0.25 and p < 0.05?
      • NO: 315
      • HYP: 160
    • Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because they represent about a 20% change in expression, which is about the level of detection of this technology.)
      • NO: 475
      • HYP: 244
  • In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, we use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use a fold change cut-off of greater than 0.25 or less than -0.25 and a p value cut-off of p < 0.05 because we want to include several hundred genes in our analysis.
  • What criteria did your paper use to determine a significant gene expression change? How does it compare to our method?
    • The paper normalized the Cy3 and Cy5 intensities using all spots except those with induction ratios in the top or bottom 5%. A noise value was determined by averaging the intensities of the 20% lowest-intensity spots, and any values below this average were raised to it. The paper does not mention log-based calculations or the statistical analyses used.
    • Our method does not treat the top and bottom 5% of spots differently when normalizing the log ratios.
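The filtering steps above can also be sketched outside the spreadsheet. The following is a minimal Python sketch, not the procedure actually used for this page: the gene rows are made-up placeholders, the function names are hypothetical, and the conversion to a percent change assumes the log fold changes are base-2 logs.

```python
# Hypothetical rows: (gene ID, average log fold change, p value).
# These values are invented for illustration; they are not from the dataset.
genes = [
    ("geneA", 0.90, 0.0004),
    ("geneB", -0.40, 0.0300),
    ("geneC", 0.10, 0.0200),
    ("geneD", -0.10, 0.6000),
    ("geneE", 0.30, 0.0700),
]


def count_below(rows, p_cutoff):
    """Count genes whose p value is below the cut-off."""
    return sum(1 for _, _, p in rows if p < p_cutoff)


def count_changed(rows, p_cutoff, fc_cutoff):
    """Count (increased, decreased) genes that pass both cut-offs."""
    up = sum(1 for _, fc, p in rows if p < p_cutoff and fc > fc_cutoff)
    down = sum(1 for _, fc, p in rows if p < p_cutoff and fc < -fc_cutoff)
    return up, down


def expected_by_chance(n_tests, p_cutoff):
    """Number of tests expected to pass the cut-off if no gene truly changed."""
    return n_tests * p_cutoff


def percent_change(log2_fc):
    """Convert a log fold change to a percent change (assumes base-2 logs)."""
    return (2 ** log2_fc - 1) * 100


print(count_below(genes, 0.05))          # genes with p < 0.05
print(count_changed(genes, 0.05, 0.25))  # (increased, decreased)
print(expected_by_chance(5480, 0.05))    # 5% of 5480 T tests
print(round(percent_change(0.25), 1))    # roughly a 20% change
```

Under the base-2 assumption, a log fold change of 0.25 corresponds to a change of about 19%, which is consistent with the "about 20%" figure quoted above.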

Sanity Check: Compare individual genes with known data

  • Look in your paper for genes that are specifically mentioned. What are their fold changes and p values in the paper? Are they significantly changed in your analysis?
    • Only 38 of the 48 genes in the dormancy regulon were found in the data.
    • The paper provides SD values instead of p values (see Supplemental Table S1). All 48 genes in the regulon were considered significantly repressed in the paper.
| Gene | NO fold value in paper | NO fold value in analysis | NO significant in analysis? | HYP fold value in paper | HYP fold value in analysis | HYP significant in analysis? |
|---|---|---|---|---|---|---|
| RV0079 | 15 | ---- | ---- | 13.0 | ---- | ---- |
| RV0080 | 6.4 | 3.72 | yes | 8.2 | 5.2154 | yes |
| RV0081 | 2.8 | 2.84 | yes | 3.8 | 3.1783 | yes |
| RV0569 | 24.2 | ---- | ---- | 17.1 | ---- | ---- |
| RV0570 | 3.0 | 3.04 | yes | 3.0 | 3.4478 | yes |
| RV0571C | 4.3 | 3.91 | yes | 1.8 | 4.0624 | yes |
| RV0572C | 16.6 | ---- | ---- | 9.4 | ---- | ---- |
| RV0573C | 1.9 | 3.30 | yes | 1.3 | 3.1616 | yes |
| RV0574C | 4.9 | 4.19 | yes | 2.9 | 5.1561 | yes |
| RV1733C | 21.0 | 7.27 | yes | 15.5 | 7.82 | yes |
| RV1734C | 5.7 | 5.4 | yes | 5.1 | 3.7585 | no |
| RV1735C | 1.9 | 2.57 | yes | 2.0 | 1.7887 | no |
| RV1736C | 4.0 | 3.80 | yes | 3.3 | 3.5576 | yes |
| RV1737C | 15.0 | ---- | ---- | 12.5 | ---- | ---- |
| RV1738 | 26.6 | 8.42 | yes | 50.4 | 10.127 | yes |
| RV1812C | 2.4 | 3.93 | yes | 2.0 | 1.4468 | no |
| RV1813C | 17.8 | ---- | ---- | 12.6 | ---- | ---- |
| RV1996 | 14.9 | 8.36 | yes | 13.7 | 8.3218 | yes |
| RV1997 | 6.8 | ---- | ---- | 4.4 | ---- | ---- |
| RV1998C | 15.9 | 7.4 | yes | 8.6 | 6.5477 | yes |
| RV2003C | 13.8 | ---- | ---- | 12.3 | ---- | ---- |
| RV2004C | 2.1 | 2.26 | yes | 2.1 | 1.7185 | yes |
| RV2005C | 7.3 | 7.80 | yes | 9.2 | 6.1906 | yes |
| RV2006 | 4.1 | 5.42 | yes | 4.0 | 4.3638 | yes |
| RV2007C | 15.6 | 6.73 | yes | 24.1 | 6.7604 | yes |
| RV2028C | 4.8 | 4.15 | yes | 3.5 | 3.465 | yes |
| RV2029C | 15.8 | 6.88 | yes | 12.2 | 7.7055 | yes |
| RV2030C | 19.0 | 6.52 | yes | 10.6 | 6.9765 | yes |
| RV2031C | 22.6 | 4.98 | yes | 14.6 | 6.5413 | no |
| RV2032 | 31.3 | 8.15 | yes | 45.2 | 8.6766 | yes |
| RV2623 | 5.5 | 4.96 | yes | 7.3 | 4.6019 | yes |
| RV2624C | 16.9 | 7.60 | yes | 19.7 | 8.3347 | yes |
| RV2625C | 5.6 | 6.58 | yes | 6.9 | 3.9833 | yes |
| RV2626C | 14.6 | 7.57 | yes | 40.6 | 7.4947 | yes |
| RV2627C | 10.6 | 6.48 | yes | 11.9 | 6.3545 | yes |
| RV2628 | 7.9 | 6.25 | yes | 5.2 | 5.3776 | yes |
| RV2629 | 7.2 | 6.15 | yes | 7.4 | 5.6296 | yes |
| RV2630 | 5.1 | 4.84 | yes | 4.2 | 4.0339 | yes |
| RV2631 | 2.0 | 1.20 | yes | 1.6 | 0.1759 | no |
| RV3126C | 21.5 | 7.13 | yes | 22.7 | 8.0638 | yes |
| RV3127 | 24.5 | 7.69 | yes | 36.0 | 8.4612 | yes |
| RV3128C | 11.6 | ---- | ---- | 17.5 | ---- | ---- |
| RV3129 | 25.9 | ---- | ---- | 24.5 | ---- | ---- |
| RV3130C | 21.0 | ---- | ---- | 14.0 | ---- | ---- |
| RV3131 | 5.5 | 5.68 | yes | 4.6 | 3.9427 | yes |
| RV3132C | 12.1 | 6.66 | yes | 9.8 | 5.9421 | yes |
| RV3133C | 14.4 | 7.98 | yes | 11.9 | 9.9405 | yes |
| RV3134C | 9.1 | 6.70 | yes | 11.5 | 6.1710 | yes |


total: 3269 genes

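| P value cut-off | # Genes NO | % Genes NO | # Genes HYP | % Genes HYP |
|---|---|---|---|---|
| p < 0.05 | 806 | 24.7% | 417 | 12.8% |
| p < 0.01 | 348 | 10.6% | 107 | 3.3% |
| p < 0.001 | 116 | 3.5% | 7 | 0.2% |
| p < 0.0001 | 44 | 1.3% | 2 | 0.06% |
| Bonferroni p < 0.05 | 22 | 0.67% | 0 | 0% |
| BH p < 0.05 | 200 | 6.1% | 0 | 0% |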
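
The Bonferroni and Benjamini-Hochberg (BH) rows correct the p values for the number of tests performed. As a rough illustration of how the two corrections differ, here is a minimal Python sketch using the standard textbook definitions. It may differ in detail from the spreadsheet formulas used to produce the numbers above, and the p values in it are invented.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so that each adjusted value
    # never exceeds the adjusted value of the next-larger p value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values, invented for illustration only.
pvals = [0.001, 0.008, 0.02, 0.041, 0.60]
bon = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print(sum(1 for p in bon if p < 0.05))  # genes passing Bonferroni p < 0.05
print(sum(1 for p in bh if p < 0.05))   # genes passing BH p < 0.05
```

In this toy example fewer genes pass the Bonferroni cut-off than the BH cut-off, which matches the pattern in the NO columns, where Bonferroni is the stricter of the two corrections.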

Complete Microarray Data Analysis

  • Files used for GenMAPP Analysis Media:IsabelGonzaga_GenMAPP.zip
    • Contains Hypoxia and NO results for MAPPFinder Analysis
      • The HYP results contain only the standard p value analysis, not the BH p values, because none of the BH p values for the scaled and centered data were below the 0.05 cut-off
  • The M. tuberculosis file was downloaded and loaded into the GenMAPP software
  • The MASTER DATA spreadsheet was converted to a .txt file and loaded into GenMAPP
    • GenMAPP found 13 errors in the raw data
  • The Expression Dataset Manager was used to color code the treatment conditions. Color sets were made for the NO and Hypoxia conditions, with pink indicating increased expression and blue indicating decreased expression. Four labels were set as follows for both conditions:
    • BH Increased: fold expression > 0.25 AND BH p-value < 0.05
    • BH Decreased: fold expression < -0.25 AND BH p-value < 0.05
    • Increased: fold expression > 0.25 AND p-value < 0.05
    • Decreased: fold expression < -0.25 AND p-value < 0.05
  • MAPPFinder was run for the two treatments, for all 4 labels. The generated results were saved as .txt documents (see: zip file)
  • The results for regular p-value changes were used for further analysis. These files were opened and saved as Excel spreadsheets. The following filters were applied to all four data sets analyzed:
    • Z score >2
    • Number changed between 4 and 100
    • PermuteP less than 0.05
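
The label criteria and the three result filters above can be written as simple predicates. This is a minimal Python sketch under stated assumptions: the dictionary field names (`z_score`, `number_changed`, `permute_p`) and the example rows are hypothetical and are not MAPPFinder's actual column names or output, and "between 4 and 100" is assumed to include both endpoints.

```python
def label_gene(log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Return 'Increased', 'Decreased', or None for a single gene."""
    if p_value < p_cutoff and log_fc > fc_cutoff:
        return "Increased"
    if p_value < p_cutoff and log_fc < -fc_cutoff:
        return "Decreased"
    return None


def passes_filters(term):
    """Apply the three filters listed above to one result row."""
    return (
        term["z_score"] > 2
        and 4 <= term["number_changed"] <= 100  # endpoints assumed inclusive
        and term["permute_p"] < 0.05
    )


# Hypothetical result rows; field names and values are invented.
terms = [
    {"go_term": "term A", "z_score": 3.1, "number_changed": 12, "permute_p": 0.001},
    {"go_term": "term B", "z_score": 2.5, "number_changed": 3, "permute_p": 0.010},
    {"go_term": "term C", "z_score": 1.4, "number_changed": 20, "permute_p": 0.200},
]

kept = [t["go_term"] for t in terms if passes_filters(t)]
print(kept)
print(label_gene(0.40, 0.01), label_gene(-0.30, 0.20))
```

Passing BH-corrected p values as `p_value` would give the "BH Increased" and "BH Decreased" labels with the same cut-offs.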

Table 1. MAPPFinder results for increased expression in Nitric Oxide

| GO term | # Changed | Percent Changed | P-value |
|---|---|---|---|
| Nitrate reductase complex | 4 | 100 | 0 |
| Cellular response to nitrosative stress | 12 | 100 | 0 |
| Oxidoreductase activity, acting on other nitrogenous compounds as donors | 4 | 80 | 0 |
| Cellular response to iron ion starvation | 5 | 45.45 | 0.001 |
| Folic acid and derivative metabolic process | 4 | 44.44 | 0.013 |
| Response to hypoxia | 13 | 37.14 | 0 |
| Acyl carrier activity | 7 | 35 | 0.002 |
| Amine binding | 6 | 31.5789 | 0.01 |
| Cellular response to stress | 27 | 30.681 | 0 |
| Cellular response to stimulus | 27 | 28.421 | 0 |


Table 2. MAPPFinder results for decreased expression in Nitric Oxide

| GO term | # Changed | Percent Changed | P-value |
|---|---|---|---|
| rRNA binding | 14 | 45.16 | 0 |
| Structural constituent of ribosome | 18 | 39.13 | 0 |
| Negative regulation of growth | 14 | 41.17 | 0 |
| Nonmembrane bounded organelle | 20 | 31.333 | 0 |
| Cellular protein metabolic process | 35 | 27.727 | 0 |
| Translation | 24 | 28.91 | 0.001 |
| Large ribosomal subunit | 4 | 80 | 0.002 |
| Protein metabolic process | 42 | 20.4878 | 0.005 |
| Regulation of growth | 19 | 25.676 | 0.011 |
| Evasion or tolerance of host immune responses | 7 | 30.43478 | 0.037 |

Final Presentation

Media:IsabelGonzagaFinalPresentation.ppt


