BIOL368/F14:Chloe Jones Week 15
Redo the Complete Sanity Check
Sanity Check: Number of genes significantly changed
• Professor Dahlquist added her calculations to the dataset that we had already analyzed the week prior. She labeled her worksheets "_norm_KD" and "_notnorm_KD", with "norm" indicating that the data were scaled and centered and "notnorm" indicating that they were not. Since this dataset was different from the one on which I had previously done my sanity check, the check had to be redone to see how my data compared to the data from the paper (Table 1). Next, a gene that displayed a log fold change was picked from the paper, found in my dataset, and used for comparison. Based on the similarity between the dataset and the paper, it was concluded that I would use the "forGenMAPP_norm_KD" dataset.
- Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results.
- Step 1. Go to the "_norm_KD" worksheet. Right-click on cell A1 and choose Filter > Filter by Selected Cell's Value.
- Step 2. Click on the drop-down menu on the "Pvalue" column. Choose Text Filters > Custom Filters. Set criteria for P-value:
* We have just performed 5480 T tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see a gene expression change of this magnitude by chance in about 5% of our T tests, or 274 times. If we have more than 274 genes that pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones.
- The "Avg_LogFC_all" column tells the size of the gene expression change. Positive values correspond to increases relative to the control; negative values correspond to decreases relative to the control.
- Step 3.
- With the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change greater than zero. There are 658.
- With the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change less than zero. There are 527.
- What about an average log fold change of > 0.25 and p < 0.05? There are 634.
- What about an average log fold change of < -0.25 and p < 0.05? There are 504. More realistic.
- For the GenMAPP analysis below, we will use a fold change cut-off of greater than 0.25 or less than -0.25 and a p value cut-off of p < 0.05 because we want to include several hundred genes in our analysis.
Table 1. Repeat Sanity Check.
|P-value cut off||# genes||% genes|
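The counting in the sanity check above can also be sketched in code. This is only a minimal illustration, assuming the worksheet rows have been read into a list of records; the field names `pvalue` and `avg_logfc`, the helper names, and the toy records are all hypothetical and are not values from the dataset.

```python
# Sketch of the sanity-check counts. The records below are invented
# for illustration and are NOT taken from the microarray dataset.

def expected_by_chance(n_tests, p_cutoff):
    """Number of tests expected to pass the cut-off by chance alone."""
    return n_tests * p_cutoff


def count_changed(genes, p_cutoff, direction, min_abs_logfc=0.0):
    """Count genes with p < p_cutoff whose average log fold change is
    above +min_abs_logfc ("up") or below -min_abs_logfc ("down")."""
    count = 0
    for gene in genes:
        if gene["pvalue"] >= p_cutoff:
            continue  # fails the p value filter
        if direction == "up" and gene["avg_logfc"] > min_abs_logfc:
            count += 1
        elif direction == "down" and gene["avg_logfc"] < -min_abs_logfc:
            count += 1
    return count


def percent_of(count, total):
    """Express a gene count as a percentage of all genes tested."""
    return 100.0 * count / total


# Hypothetical toy records standing in for worksheet rows.
toy_genes = [
    {"id": "gene_a", "pvalue": 0.001, "avg_logfc": 1.20},
    {"id": "gene_b", "pvalue": 0.030, "avg_logfc": 0.10},
    {"id": "gene_c", "pvalue": 0.040, "avg_logfc": -0.60},
    {"id": "gene_d", "pvalue": 0.200, "avg_logfc": 0.90},
    {"id": "gene_e", "pvalue": 0.700, "avg_logfc": -0.05},
]

# 5480 T tests at p < 0.05, as in the note above.
print(expected_by_chance(5480, 0.05))

up = count_changed(toy_genes, 0.05, "up", min_abs_logfc=0.25)
down = count_changed(toy_genes, 0.05, "down", min_abs_logfc=0.25)
print(up, down, percent_of(up + down, len(toy_genes)))
```

With the real worksheet, the same two calls with `min_abs_logfc=0.25` would correspond to the > 0.25 and < -0.25 filters used for the GenMAPP criteria.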
- Step 1. From within GenMAPP, select Tools > MAPPFinder. Make sure the correct gene database is loaded. The correct one was chosen.
- Step 2. "Calculate New Results"> "Find File" > choose my expression datasheet
- Step 3. Choose the Color Set and Criteria with which to filter the data. Click on the "Increased" and "Decreased" criteria in the right-hand box. Pinks and blues were chosen.
- Step 4. Check the boxes for "Gene Ontology" and "p value".
- Step 5. Click "Browse" and create a meaningful file name, then click "Run MAPPFinder". The analysis takes several minutes.
- Step 6. Gene ontology window opens showing results. Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 will be highlighted yellow. A term with a p value less than 0.05 is considered a "significant" result.
- To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
- Step 7. In the MAPPFinder window you can collapse the tree and browse the genes that were mentioned in the paper (e.g., SAR____).
- Step 8. Type the identifier for one of these genes into the MAPPFinder browser gene ID search field. Choose "OrderedLocusNames" from the drop-down menu to the right of the search field. Click on the GeneID Search button. The GO term(s) that are associated with that gene will be highlighted in blue. The genes on the MAPP will be color-coded with the gene expression data from the microarray experiment.
- Step 9. Launch Microsoft Excel. Open the copies of the .txt files in Excel (you will need to select "Show all files" and click "Finish" in the wizard that opens your file). This will show you the same data that you saw in the MAPPFinder Browser, but in tabular form.
- Step 10. You will filter this list to show the top GO terms represented in your data for both the "Increased" and "Decreased" criteria. Click on the drop-down arrow for the column you wish to filter and choose "(Custom…)". A window will open giving you choices on how you want to filter.
- Z score (in column N) greater than 2
- PermuteP (in column O) less than 0.05
- Number Changed (in column I) greater than or equal to 4 or 5 AND less than 100
- Percent Changed (in column L) greater than or equal to 25-50
- Step 11. Save your changes as an Excel spreadsheet.
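The filtering in Step 10 could also be done outside Excel. The sketch below is an illustration under assumptions, not the procedure used in class: it reads a tab-delimited excerpt, the "GOName" column label and all rows are invented, and the thresholds of 4 for Number Changed and 25 for Percent Changed are just one choice from the ranges given above.

```python
import csv
import io

# Hypothetical tab-delimited excerpt standing in for an exported
# results file. The rows are invented for illustration only.
SAMPLE = (
    "GOName\tNumber Changed\tPercent Changed\tZ Score\tPermuteP\n"
    "term one\t5\t83.3\t4.1\t0.01\n"
    "term two\t2\t66.7\t3.0\t0.02\n"
    "term three\t120\t30.0\t2.5\t0.03\n"
    "term four\t10\t12.0\t2.2\t0.04\n"
    "term five\t8\t40.0\t1.5\t0.20\n"
)


def filter_go_terms(rows, min_z=2.0, max_permute_p=0.05,
                    min_changed=4, max_changed=100, min_percent=25.0):
    """Keep GO terms that pass all four Step 10 criteria."""
    kept = []
    for row in rows:
        if (float(row["Z Score"]) > min_z
                and float(row["PermuteP"]) < max_permute_p
                and min_changed <= int(row["Number Changed"]) < max_changed
                and float(row["Percent Changed"]) >= min_percent):
            kept.append(row["GOName"])
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print(filter_go_terms(rows))
```

In this toy excerpt only "term one" passes; each of the other rows fails exactly one criterion, which makes it easy to see what each filter removes.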
Table 2. MAPPFinder results for increased “RanaUp” gene expression of 10 GO terms.
|GO term||# Changed||% Changed||P-value|
|Threonine Biosynthetic Process||5||83.33||0.05|
|Ion Transmembrane Transport||22||34.38||0.059|
|Metal Ion Transport||13||35.14||0.55|
|Sodium Ion Transport||7||53.85||0.39|
|Serine-type Endopeptidase Activity||5||41.67||1|
Table 3. MAPPFinder results for decreased “RanaDown” gene expression of 10 GO terms.
|GO term||# Changed||% Changed||P-value|
|Nucleotide Metabolic Process||36||18.56||0.999|
|’de novo’ IMP Biosynthetic Process||5||45.45||0.99|
|Cellular Aromatic Compound Metabolic Process||91||15.29||1|
|Purine Nucleotide Biosynthetic Process||9||25.71||1|
|Phosphorelay Signal Transduction System||9||25.71||1|
|Purine Nucleotide Metabolic Process||27||18.62||1|
|tRNA Aminoacylation for protein translation||6||30||1|
|Aminoacyl-tRNA Ligase Activity||6||30||1|
Table 4. 10 "most significant" genes
|SAR2523||Putative membrane protein|
|SAR2725||Putative surface anchored protein|
|SAR1398||Phosphate-specific transport system accessory protein PhoU|
|SAR0645||Putative membrane protein|
|SAR0464||N-acetylmuramoyl-L-alanine amidase sle1|
- In Overton_MicroarrayData_20141119_CJ_downloaded_20141202_editedKD.xlsx, the P-value column was sorted in ascending order. The values towards the top have a p-value of 0.00, showing the most significant genes. The IDs were then looked up in UniProt to find out the function of the genes.
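Ranking genes by p-value, as described above, can be sketched in a few lines. This is a minimal illustration that assumes the rows have been loaded as records; the field names and the toy IDs and p-values are hypothetical and do not come from the spreadsheet.

```python
def most_significant(genes, n=10):
    """Return the IDs of the n genes with the smallest p values."""
    ranked = sorted(genes, key=lambda gene: gene["pvalue"])
    return [gene["id"] for gene in ranked[:n]]


# Hypothetical toy records; real IDs would come from the ID column.
toy_genes = [
    {"id": "gene_a", "pvalue": 0.040},
    {"id": "gene_b", "pvalue": 0.000},
    {"id": "gene_c", "pvalue": 0.700},
    {"id": "gene_d", "pvalue": 0.002},
]

print(most_significant(toy_genes, n=2))
```

Because Python's `sorted` is stable, genes that tie (for example, several with a p-value displayed as 0.00) keep their original worksheet order.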
The PowerPoint for my final presentation on the analysis of MRSA can be found here.
Chloe Jones 03:46, 15 October 2014 (EDT)