BIOL368/F14:Chloe Jones Week 15

From OpenWetWare

Data Analysis

Redo the Complete Sanity Check

Sanity Check: Number of genes significantly changed

• Professor Dahlquist added her calculations to the dataset that we had already analyzed the week prior. She labeled her worksheets "_norm_KD" and "_notnorm_KD", where "norm" means that the data were scaled and centered and "notnorm" means that they were not. Since this dataset was different from the one on which I had previously done my sanity check, the check had to be redone to see how my data compared to the data from the paper (Table 1). Next, a gene that displayed a log fold change in the paper was picked, found in my dataset, and used for comparison. Based on the similarity between the dataset and the paper, I decided to use the "forGenMAPP_norm_KD" dataset.

  • Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results.
  • Step 1. Go to the "_norm_KD" worksheet. Right-click cell A1 and choose Filter > Filter by Selected Cell's Value.
  • Step 2. Click on the drop-down menu of the "Pvalue" column and choose Text Filters > Custom Filter. Set the criterion for the p value cut-off being tested.

* We have just performed 5480 t tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see this magnitude of gene expression change by chance in about 5% of our t tests, or 274 times. If we have more than 274 genes that pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones.
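
The arithmetic behind the 274 figure can be checked in a few lines. This is only a sketch: the variable names are ours, and the observed count of 1185 is taken from the p < 0.05 row of Table 1.

```python
# Back-of-the-envelope check of the chance expectation described above.
n_tests = 5480        # number of t tests, as stated in the text
alpha = 0.05          # p value cut-off
observed = 1185       # genes with p < 0.05, from Table 1

expected_by_chance = round(n_tests * alpha)
excess = observed - expected_by_chance

print(expected_by_chance)  # 274
print(excess)              # 911
```

Because the observed count is well above the chance expectation, at least some of the genes are likely to be truly changed, although this comparison cannot say which ones.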

  • The "Avg_LogFC_all" column gives the size of the gene expression change. Positive values correspond to increases relative to the control, and negative values correspond to decreases relative to the control.
  • Step 3.
    • Keeping the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change greater than zero. There are 658.
    • Keeping the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change less than zero. There are 527.
    • What about an average log fold change of > 0.25 and p < 0.05? There are 634.
    • What about an average log fold change of < -0.25 and p < 0.05? There are 504. More realistic.
    • For the GenMAPP analysis below, we will use the fold change cut-off of greater than 0.25 or less than -0.25 and the p value cut off of p < 0.05 for our analysis because we want to include several hundred genes in our analysis.
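
The same filtering could be reproduced outside of Excel. The following is a minimal sketch, assuming the worksheet has been read into a list of dictionaries keyed by the "Pvalue" and "Avg_LogFC_all" column names; the function name `count_changed` and the four example rows are hypothetical and are not values from the real dataset.

```python
def count_changed(rows, p_cutoff=0.05, fc_cutoff=0.25):
    """Return (n_increased, n_decreased) among rows that pass the p value cut-off."""
    significant = [r for r in rows if r["Pvalue"] < p_cutoff]
    increased = sum(1 for r in significant if r["Avg_LogFC_all"] > fc_cutoff)
    decreased = sum(1 for r in significant if r["Avg_LogFC_all"] < -fc_cutoff)
    return increased, decreased


# Hypothetical rows for illustration only.
rows = [
    {"ID": "geneA", "Pvalue": 0.001, "Avg_LogFC_all": 1.20},
    {"ID": "geneB", "Pvalue": 0.030, "Avg_LogFC_all": 0.10},
    {"ID": "geneC", "Pvalue": 0.040, "Avg_LogFC_all": -0.80},
    {"ID": "geneD", "Pvalue": 0.200, "Avg_LogFC_all": 2.00},
]

print(count_changed(rows))             # (1, 1)
print(count_changed(rows, 0.05, 0.0))  # (2, 1)
```

Setting `fc_cutoff` to 0 corresponds to the "greater than zero" and "less than zero" filters, and 0.25 corresponds to the cut-off chosen for the GenMAPP analysis.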

Table 1. Repeat Sanity Check.

P-value cut-off    # genes    % genes
p < 0.05           1185       1185/5481 = 21.6%
p < 0.01           382        382/5481 = 7.0%
p < 0.001          62         62/5481 = 1.13%
B-H                6          6/5481 = 0.11%
Bonferroni         2          2/5481 = 0.036%
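
For reference, the two multiple-testing corrections in the last rows of the table can be sketched as follows. This assumes the standard textbook definitions of the Bonferroni and Benjamini-Hochberg (B-H) adjustments; the exact spreadsheet formulas used for the class dataset are not shown on this page, and the five p values below are made up for illustration.

```python
def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Scale each p value by n/rank, then enforce monotonicity from the largest down."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for illustration only.
pvals = [0.001, 0.008, 0.02, 0.041, 0.20]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print(sum(p < 0.05 for p in bonf))  # 2
print(sum(p < 0.05 for p in bh))    # 3
```

Bonferroni is the stricter of the two, which is consistent with it leaving fewer genes than B-H in the table.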



GenMAPP/Mappfinder procedure

  • Step 1. From within GenMAPP, select Tools > MAPPFinder. Make sure the correct gene database is loaded. The correct one was chosen.
  • Step 2. Click "Calculate New Results" > "Find File" and choose my expression dataset file.
  • Step 3. Choose the Color Set and Criteria with which to filter the data. Click on either the "Increased" or the "Decreased" criterion in the right-hand box. Pink and blue were chosen.
  • Step 4. Check the boxes for "Gene Ontology" and "p value".
  • Step 5. Click Browse, create a meaningful file name, and click "Run MAPPFinder". The analysis takes several minutes.
  • Step 6. A Gene Ontology window opens showing the results. Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 will be highlighted yellow. A term with a p value less than 0.05 is considered a "significant" result.
  • To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
  • Step 7. In the MAPPFinder window you can collapse the tree and browse for the genes that were mentioned in the paper (i.e., SAR____).
  • Step 8. Type the identifier for one of these genes into the MAPPFinder browser gene ID search field. Choose "OrderedLocusNames" from the drop-down menu to the right of the search field. Click on the GeneID Search button. The GO term(s) that are associated with that gene will be highlighted in blue. The genes on the MAPP will be color-coded with the gene expression data from the microarray experiment.
  • Step 9. Launch Microsoft Excel. Open the copies of the .txt files in Excel (you will need to select "Show all files" and click "Finish" in the wizard that opens your file). This will show you the same data that you saw in the MAPPFinder Browser, but in tabular form.
  • Step 10. Filter this list to show the top GO terms represented in your data for both the "Increased" and "Decreased" criteria. Click on the drop-down arrow for the column you wish to filter and choose "(Custom…)". A window will open giving you choices on how you want to filter.
    • Z score (in column N) greater than 2
    • PermuteP (in column O) less than 0.05
    • Number Changed (in column I) greater than or equal to 4 or 5 AND less than 100
    • Percent Changed (in column L) greater than or equal to 25-50
  • Step 11. Save your changes as an Excel spreadsheet.
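
The Step 10 filter can also be written as a short script. This is a sketch under assumptions: the dictionary keys ("Z Score", "PermuteP", "Number Changed", "Percent Changed") are assumed to match the headers of the MAPPFinder results file, the function name `filter_go_terms` is ours, and the example rows are invented. Since the protocol gives ranges for two of the cut-offs ("4 or 5" and "25-50"), they are left as adjustable parameters, with the lower ends as defaults.

```python
def filter_go_terms(rows, min_z=2.0, max_permute_p=0.05,
                    min_changed=4, max_changed=100, min_percent=25.0):
    """Keep GO terms that pass all four Step 10 criteria."""
    return [
        r for r in rows
        if r["Z Score"] > min_z
        and r["PermuteP"] < max_permute_p
        and min_changed <= r["Number Changed"] < max_changed
        and r["Percent Changed"] >= min_percent
    ]


# Invented rows for illustration only.
rows = [
    {"GO Name": "term A", "Z Score": 3.1, "PermuteP": 0.010, "Number Changed": 12, "Percent Changed": 40.0},
    {"GO Name": "term B", "Z Score": 1.5, "PermuteP": 0.010, "Number Changed": 12, "Percent Changed": 40.0},
    {"GO Name": "term C", "Z Score": 2.5, "PermuteP": 0.200, "Number Changed": 8, "Percent Changed": 30.0},
    {"GO Name": "term D", "Z Score": 4.0, "PermuteP": 0.001, "Number Changed": 3, "Percent Changed": 75.0},
    {"GO Name": "term E", "Z Score": 2.2, "PermuteP": 0.040, "Number Changed": 150, "Percent Changed": 26.0},
]

kept = [r["GO Name"] for r in filter_go_terms(rows)]
print(kept)  # ['term A']
```

Raising `min_changed` to 5 or `min_percent` to 50 gives the stricter end of the ranges listed in Step 10.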


Table 2. MAPPFinder results for increased “RanaUp” gene expression of 10 GO terms.

GO term                               # Changed   % Changed   P-value
Ion Transport                         34          30.63       0.012
Threonine Biosynthetic Process        5           83.33       0.05
Ion Transmembrane Transport           22          34.38       0.059
Oxidative Phosphorylation             6           66.67       0.072
Metal Ion Transport                   13          35.14       0.55
Sodium Ion Transport                  7           53.85       0.39
Cation Transport                      18          27.69       0.99
Serine-type Endopeptidase Activity    5           41.67       1
DNA Recombination                     12          26.67       1
Endopeptidase Activity                7           31.82       1



Table 3. MAPPFinder results for decreased “RanaDown” gene expression of 10 GO terms.

GO term                                         # Changed   % Changed   P-value
Pathogenesis                                    23          30.26       0.044
Cell Wall                                       7           46.67       0.39
Nucleotide Metabolic Process                    36          18.56       0.999
’de novo’ IMP Biosynthetic Process              5           45.45       0.99
Cellular Aromatic Compound Metabolic Process    91          15.29       1
Purine Nucleotide Biosynthetic Process          9           25.71       1
Phosphorelay Signal Transduction System         9           25.71       1
Purine Nucleotide Metabolic Process             27          18.62       1
tRNA Aminoacylation for protein translation     6           30          1
Aminoacyl-tRNA Ligase Activity                  6           30          1




Table 4. 10 "most significant" genes

Gene       Function
SAR2523    Putative membrane protein
SAR2774    Collagen adhesin
SAR2725    Putative surface anchored protein
SAR0384    Uncharacterized protein
SAR1398    Phosphate-specific transport system accessory protein PhoU
SAR0645    Putative membrane protein
SAR1408    4-hydroxy-tetrahydrodipicolinate reductase
SAR0932    Putative transposase
SAR2774    Collagen adhesin
SAR0464    N-acetylmuramoyl-L-alanine amidase sle1



Final Presentation

The PowerPoint for my final presentation on the analysis of MRSA can be found here.

Electronic Lab Notebook

Weekly Assignments

Class Journals


Chloe Jones 03:46, 15 October 2014 (EDT)