BIOL368/F14:Isabel Gonzaga Week 15
From OpenWetWare
Sanity Check Redux
Sanity Check: Number of genes significantly changed
Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results.
- Open your spreadsheet and go to the "forGenMAPP" tab.
- Click on cell A1 and select the menu item Data > Filter > Autofilter. Little drop-down arrows should appear at the top of each column. This will enable us to filter the data according to criteria we set.
- Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
- How many genes have p value < 0.05? NO: 806 HYP: 417
- What about p < 0.01? NO: 348 HYP: 107
- What about p < 0.001? NO: 116 HYP: 7
- What about p < 0.0001? NO: 44 HYP: 2
- When we use a p value cut-off of p < 0.05, what we are saying is that, by chance alone, we would see a gene expression change that deviates this far from zero less than 5% of the time.
- We have just performed 5480 T tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see this magnitude of a gene expression change by chance in about 5% of our T tests, or 274 times. If we have more than 274 genes that pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones.
- The "Avg_LogFC_all" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.
- Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
- NO: 62
- HYP: 164
- Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
- NO: 86
- HYP: 253
- What about an average log fold change of > 0.25 and p < 0.05?
- NO: 315
- HYP: 160
- Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because they represent about a 20% change in expression, which is about the level of detection of this technology.)
- NO: 475
- HYP: 244
- In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, we use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use a fold change cut-off of greater than 0.25 or less than -0.25 and a p value cut-off of p < 0.05 because we want to include several hundred genes in our analysis.
- What criteria did your paper use to determine a significant gene expression change? How does it compare to our method?
- The paper normalized the Cy3 and Cy5 intensities using all spots except those with induction ratios in the top or bottom 5%. A noise value was determined by calculating the average intensity of the lowest-intensity 20% of spots, and any values below this average were raised to it. The paper does not mention log-based calculations or the statistical analyses used.
- Our method does not treat the top and bottom 5% of spots differently when normalizing the log ratios.
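The filtering steps above can also be sketched outside the spreadsheet. The following is a minimal Python sketch, assuming the "forGenMAPP" tab has been read into rows of (gene ID, Pvalue, Avg_LogFC_all). The function names and the four toy rows are hypothetical and invented for illustration; they are not values from the actual data. The percent-change helper assumes the log fold changes are base 2, which this page does not state explicitly.

```python
# Toy rows standing in for the "forGenMAPP" tab: (gene ID, Pvalue, Avg_LogFC_all).
# These values are invented for illustration only.
rows = [
    ("geneA", 0.003, 1.20),
    ("geneB", 0.040, -0.10),
    ("geneC", 0.200, 0.80),
    ("geneD", 0.010, -0.60),
]

def count_significant(rows, p_cutoff, min_logfc=None, max_logfc=None):
    """Count rows with p < p_cutoff and, optionally, a log fold change
    above min_logfc and/or below max_logfc."""
    count = 0
    for _gene, p, logfc in rows:
        if p >= p_cutoff:
            continue
        if min_logfc is not None and not logfc > min_logfc:
            continue
        if max_logfc is not None and not logfc < max_logfc:
            continue
        count += 1
    return count

def expected_by_chance(n_tests, p_cutoff):
    """Number of tests expected to pass the cut-off by chance alone."""
    return n_tests * p_cutoff

def percent_change(log2_fc):
    """Percent change corresponding to a log2 fold change (base 2 assumed)."""
    return (2 ** log2_fc - 1) * 100

print(count_significant(rows, 0.05))                    # genes with p < 0.05
print(count_significant(rows, 0.05, min_logfc=0.25))    # increased
print(count_significant(rows, 0.05, max_logfc=-0.25))   # decreased
print(round(expected_by_chance(5480, 0.05)))            # 274, as in the text
print(round(percent_change(0.25), 1))                   # 18.9, i.e. about 20%
```

Running the same counts at several cut-offs (0.05, 0.01, 0.001, 0.0001) reproduces the kind of tally recorded in the answers above, and comparing each count with `expected_by_chance` shows how far the observed number exceeds what chance alone would give.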
Sanity Check: Compare individual genes with known data
- Look in your paper for genes that are specifically mentioned. What are their fold changes and p values in the paper? Are they significantly changed in your analysis?
- Only 38/48 genes in the dormancy regulon were found in the data
- SD values were provided instead of p values, as found in Supplemental Table S1. The paper considered all 48 genes of the regulon to be significantly induced.
Gene | NO Fold Values in paper | NO Fold Value in analysis | NO Significant in analysis? | HYP Fold Values in paper | HYP Fold Value in analysis | HYP Significant in analysis? |
---|---|---|---|---|---|---|
RV0079 | 15 | ---- | ---- | 13.0 | ---- | ---- |
RV0080 | 6.4 | 3.72 | yes | 8.2 | 5.2154 | yes |
RV0081 | 2.8 | 2.84 | yes | 3.8 | 3.1783 | yes |
RV0569 | 24.2 | ---- | ---- | 17.1 | ---- | ---- |
RV0570 | 3.0 | 3.04 | yes | 3.0 | 3.4478 | yes |
RV0571C | 4.3 | 3.91 | yes | 1.8 | 4.0624 | yes |
RV0572C | 16.6 | ---- | ---- | 9.4 | ---- | ---- |
RV0573C | 1.9 | 3.30 | yes | 1.3 | 3.1616 | yes |
RV0574C | 4.9 | 4.19 | yes | 2.9 | 5.1561 | yes |
RV1733C | 21.0 | 7.27 | yes | 15.5 | 7.82 | yes |
RV1734C | 5.7 | 5.4 | yes | 5.1 | 3.7585 | no |
RV1735C | 1.9 | 2.57 | yes | 2.0 | 1.7887 | no |
RV1736C | 4.0 | 3.80 | yes | 3.3 | 3.5576 | yes |
RV1737C | 15.0 | ---- | ---- | 12.5 | ---- | ---- |
RV1738 | 26.6 | 8.42 | yes | 50.4 | 10.127 | yes |
RV1812C | 2.4 | 3.93 | yes | 2.0 | 1.4468 | no |
RV1813C | 17.8 | ---- | ---- | 12.6 | ---- | ---- |
RV1996 | 14.9 | 8.36 | yes | 13.7 | 8.3218 | yes |
RV1997 | 6.8 | ---- | ---- | 4.4 | ---- | ---- |
RV1998C | 15.9 | 7.4 | yes | 8.6 | 6.5477 | yes |
RV2003C | 13.8 | ---- | ---- | 12.3 | ---- | ---- |
RV2004C | 2.1 | 2.26 | yes | 2.1 | 1.7185 | yes |
RV2005C | 7.3 | 7.80 | yes | 9.2 | 6.1906 | yes |
RV2006 | 4.1 | 5.42 | yes | 4.0 | 4.3638 | yes |
RV2007C | 15.6 | 6.73 | yes | 24.1 | 6.7604 | yes |
RV2028C | 4.8 | 4.15 | yes | 3.5 | 3.465 | yes |
RV2029C | 15.8 | 6.88 | yes | 12.2 | 7.7055 | yes |
RV2030C | 19.0 | 6.52 | yes | 10.6 | 6.9765 | yes |
RV2031C | 22.6 | 4.98 | yes | 14.6 | 6.5413 | no |
RV2032 | 31.3 | 8.15 | yes | 45.2 | 8.6766 | yes |
RV2623 | 5.5 | 4.96 | yes | 7.3 | 4.6019 | yes |
RV2624C | 16.9 | 7.60 | yes | 19.7 | 8.3347 | yes |
RV2625C | 5.6 | 6.58 | yes | 6.9 | 3.9833 | yes |
RV2626C | 14.6 | 7.57 | yes | 40.6 | 7.4947 | yes |
RV2627C | 10.6 | 6.48 | yes | 11.9 | 6.3545 | yes |
RV2628 | 7.9 | 6.25 | yes | 5.2 | 5.3776 | yes |
RV2629 | 7.2 | 6.15 | yes | 7.4 | 5.6296 | yes |
RV2630 | 5.1 | 4.84 | yes | 4.2 | 4.0339 | yes |
RV2631 | 2.0 | 1.20 | yes | 1.6 | 0.1759 | no |
RV3126C | 21.5 | 7.13 | yes | 22.7 | 8.0638 | yes |
RV3127 | 24.5 | 7.69 | yes | 36.0 | 8.4612 | yes |
RV3128C | 11.6 | ---- | ---- | 17.5 | ---- | ---- |
RV3129 | 25.9 | ---- | ---- | 24.5 | ---- | ---- |
RV3130C | 21.0 | ---- | ---- | 14.0 | ---- | ---- |
RV3131 | 5.5 | 5.68 | yes | 4.6 | 3.9427 | yes |
RV3132C | 12.1 | 6.66 | yes | 9.8 | 5.9421 | yes |
RV3133C | 14.4 | 7.98 | yes | 11.9 | 9.9405 | yes |
RV3134C | 9.1 | 6.70 | yes | 11.5 | 6.1710 | yes |
total: 3269 genes
P Value Cutoff | # Genes NO | % Genes NO | # Genes HYP | % Genes HYP |
---|---|---|---|---|
p<0.05 | 806 | 24.7% | 417 | 12.8% |
p<0.01 | 348 | 10.6% | 107 | 3.3% |
p<0.001 | 116 | 3.5% | 7 | 0.2% |
p<0.0001 | 44 | 1.3% | 2 | 0.06% |
Bonferroni p<0.05 | 22 | 0.67% | 0 | 0% |
Benjamini-Hochberg (BH) p<0.05 | 200 | 6.1% | 0 | 0% |
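The Bonferroni and BH rows in the table come from multiple-testing corrections of the p values, and the percentages are each count divided by the 3269-gene total. The following is a minimal Python sketch of the standard textbook forms of both corrections, assuming a plain list of p values. This page does not say exactly how the spreadsheet computed them, so the function names and the toy p values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p values: p * n, capped at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (step-up procedure)."""
    n = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def percent_of_total(count, total):
    """Percentage of genes passing a cut-off, rounded to two decimals."""
    return round(100 * count / total, 2)

# Toy p values, invented for illustration only.
toy_p = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(toy_p))
print(benjamini_hochberg(toy_p))

# Percentages from counts reported in the table (total: 3269 genes).
print(percent_of_total(806, 3269))  # 24.66
print(percent_of_total(2, 3269))    # 0.06
```

Bonferroni is the stricter of the two, which is consistent with the table: fewer genes pass the Bonferroni cut-off (22) than the BH cut-off (200) in the NO data.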
Complete Microarray Data Analysis
- Files used for GenMAPP Analysis Media:IsabelGonzaga_GenMAPP.zip
- Contains Hypoxia and NO results for MAPPFinder Analysis
- HYP results only contain standard p-value analysis and not BH p values, as no scaled centered BH values were less than the 0.05 cutoff
- The M. tuberculosis file was downloaded and loaded into the GenMAPP software
- The MASTER DATA spreadsheet was converted to a .txt file and uploaded into GenMAPP
- GenMAPP found 13 errors in the raw data
- Expression Dataset Manager was used to color code the treatment conditions. Color sets were made for the NO and Hypoxia conditions, with pink indicating increased expression and blue indicating decreased expression. Four labels were set as follows for both conditions:
- BH Increased: fold expression > 0.25 AND BH p-value < 0.05
- BH Decreased: fold expression < -0.25 AND BH p-value < 0.05
- Increased: fold expression > 0.25 AND p-value < 0.05
- Decreased: fold expression < -0.25 AND p-value < 0.05
- MAPPFinder was run for the two treatments, for all 4 labels. The generated results were saved as .txt documents (see: zip file)
- The results for regular p-value changes were used for further analysis. These files were opened and saved as an Excel spreadsheet. The following filters were applied to all four data sets analyzed:
- Z score > 2
- Number changed between 4 and 100
- PermuteP < 0.05
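The labeling criteria and the MAPPFinder result filters listed above can be sketched as two small predicates. This is a minimal Python sketch rather than anything GenMAPP or MAPPFinder exposes: the function names, argument names, and toy GO-term rows are hypothetical, and treating "between 4 and 100" as inclusive at both ends is an assumption.

```python
def label_gene(logfc, p, fc_cutoff=0.25, p_cutoff=0.05):
    """Return 'Increased', 'Decreased', or None using the cut-offs above.
    Pass the BH p value as p to get the 'BH' variants of the labels."""
    if p < p_cutoff and logfc > fc_cutoff:
        return "Increased"
    if p < p_cutoff and logfc < -fc_cutoff:
        return "Decreased"
    return None

def passes_mappfinder_filters(z_score, n_changed, permute_p):
    """Apply the three filters used on the MAPPFinder results.
    The 4-100 range is assumed to be inclusive."""
    return z_score > 2 and 4 <= n_changed <= 100 and permute_p < 0.05

# Toy GO-term rows, invented for illustration: (term, z score, # changed, PermuteP).
toy_terms = [
    ("term A", 3.1, 12, 0.001),   # kept
    ("term B", 1.5, 20, 0.010),   # dropped: z score too low
    ("term C", 4.0, 2, 0.000),    # dropped: too few genes changed
    ("term D", 2.5, 150, 0.020),  # dropped: too many genes changed
]

kept = [t[0] for t in toy_terms if passes_mappfinder_filters(t[1], t[2], t[3])]
print(kept)

print(label_gene(0.80, 0.01))   # Increased
print(label_gene(-0.40, 0.03))  # Decreased
print(label_gene(0.10, 0.01))   # None: fold change too small
```

Capping the number changed at 100 removes very broad GO terms, while the lower bound of 4 removes terms supported by only a few genes.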
Table 1. MAPPFinder results for increased expression in Nitric Oxide
GO term | # Changed | Percent Changed | P-value |
---|---|---|---|
Nitrate reductase complex | 4 | 100 | 0 |
Cellular response to nitrosative stress | 12 | 100 | 0 |
Oxidoreductase activity, acting on other nitrogenous compounds as donors | 4 | 80 | 0 |
Cellular response to iron ion starvation | 5 | 45.45 | 0.001 |
Folic acid and derivative metabolic process | 4 | 44.44 | 0.013 |
Response to hypoxia | 13 | 37.14 | 0 |
Acyl carrier activity | 7 | 35 | 0.002 |
Amine binding | 6 | 31.5789 | 0.01 |
Cellular response to stress | 27 | 30.681 | 0 |
Cellular response to stimulus | 27 | 28.421 | 0 |
Table 2. MAPPFinder results for decreased expression in Nitric Oxide
GO term | # Changed | Percent Changed | P-value |
---|---|---|---|
rRNA Binding | 14 | 45.16 | 0 |
Structural constituent of ribosome | 18 | 39.13 | 0 |
Negative regulation of growth | 14 | 41.17 | 0 |
Nonmembrane bounded organelle | 20 | 31.333 | 0 |
Cellular protein metabolic process | 35 | 27.727 | 0 |
Translation | 24 | 28.91 | 0.001 |
Large ribosomal subunit | 4 | 80 | 0.002 |
Protein metabolic process | 42 | 20.4878 | 0.005 |
Regulation of growth | 19 | 25.676 | 0.011 |
Evasion or tolerance of host immune responses | 7 | 30.43478 | 0.037 |
Final Presentation
Media:IsabelGonzagaFinalPresentation.ppt