BIOL368/F14:Chloe Jones Week 15

Data Analysis

Redo the Complete Sanity Check

Sanity Check: Number of genes significantly changed

• Professor Dahlquist added her own calculations to the dataset that we had analyzed the week before. She labeled her worksheets "_norm_KD" and "_notnorm_KD": "norm" indicates that the data were scaled and centered, and "notnorm" that they were not. Because this dataset differed from the one on which I had previously performed my sanity check, the check had to be redone to see how my data compared to the data from the paper (Table 1). Next, a gene that displayed a log fold change in the paper was picked, found in my dataset, and used for comparison. Because the values in this dataset were the most similar to those in the paper, I decided to use the "forGenMAPP_norm_KD" dataset.

Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results.
Step 1. Go to the "_norm_KD" worksheet. Right-click on cell A1 and choose Filter > Filter by Selected Cell's Value.
Step 2. Click on the drop-down menu of the "Pvalue" column and choose Text Filters > Custom Filter. Set the criteria for the p value cut-off.

* We have just performed 5480 t tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see a gene expression change of this magnitude by chance in about 5% of our t tests, or 274 times. If more than 274 genes pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones.
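
The arithmetic behind the 274 figure can be sketched in Python. This is only an illustration: the function name `expected_by_chance` is made up for this example, and the number of tests is the 5480 quoted above.

```python
def expected_by_chance(n_tests: int, alpha: float) -> float:
    """Number of tests expected to pass a p value cut-off by chance alone."""
    return n_tests * alpha


n_tests = 5480  # number of t tests quoted in the notebook entry

for alpha in (0.05, 0.01, 0.001):
    expected = expected_by_chance(n_tests, alpha)
    print(f"p < {alpha}: about {expected:.1f} genes expected by chance")
```

Any count well above the expected value at a given cut-off suggests that at least some of the genes passing it are truly changed.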

The "Avg_LogFC_all" column gives the size of the gene expression change. Positive values correspond to increases relative to the control, and negative values correspond to decreases relative to the control.
Step 3.
- With the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change greater than zero. There are 658.
- With the "Pvalue" filter at p < 0.05, filter the "AverageLogFC" column to show all genes with an average log fold change less than zero. There are 527.
- What about an average log fold change of > 0.25 and p < 0.05? There are 634.
- What about an average log fold change of < -0.25 and p < 0.05? There are 504. These cut-offs are more realistic.
- For the GenMAPP analysis below, we will use the fold change cut-off of greater than 0.25 or less than -0.25 and the p value cut off of p < 0.05 for our analysis because we want to include several hundred genes in our analysis.
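
The Step 3 filters can also be expressed outside of Excel. The sketch below is a minimal illustration, assuming the worksheet has been read into a list of rows. The gene IDs and values are invented, the helper name `classify` is hypothetical, and only the column names "Pvalue" and "Avg_LogFC_all" come from the worksheet.

```python
# Toy rows standing in for the "_norm_KD" worksheet; all values are made up.
rows = [
    {"ID": "gene_a", "Avg_LogFC_all": 0.80, "Pvalue": 0.003},
    {"ID": "gene_b", "Avg_LogFC_all": 0.10, "Pvalue": 0.040},
    {"ID": "gene_c", "Avg_LogFC_all": -0.60, "Pvalue": 0.020},
    {"ID": "gene_d", "Avg_LogFC_all": -0.30, "Pvalue": 0.200},
    {"ID": "gene_e", "Avg_LogFC_all": 1.20, "Pvalue": 0.700},
]


def classify(row, fc_cutoff=0.25, p_cutoff=0.05):
    """Label one gene as 'increased', 'decreased', or 'unchanged'."""
    if row["Pvalue"] < p_cutoff:
        if row["Avg_LogFC_all"] > fc_cutoff:
            return "increased"
        if row["Avg_LogFC_all"] < -fc_cutoff:
            return "decreased"
    return "unchanged"


# Tally how many genes fall into each category.
counts = {"increased": 0, "decreased": 0, "unchanged": 0}
for row in rows:
    counts[classify(row)] += 1

print(counts)
```

Setting `fc_cutoff=0` reproduces the first two filters in Step 3, while the default of 0.25 matches the cut-off chosen for the GenMAPP analysis.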

Table 1. Repeat Sanity Check.

P-value cut-off	# genes	% genes
p<0.05	1185	1185/5481=21.6%
p<0.01	382	382/5481=6.97%
p<0.001	62	62/5481=1.13%
B-H	6	6/5481=0.11%
Bonferroni	2	2/5481=0.036%
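
For reference, the two multiple-testing corrections named in the sanity-check table above can be sketched as follows. This assumes the standard definitions of the Bonferroni and Benjamini-Hochberg (B-H) adjusted p values, which may differ in detail from how the worksheet computed them. The p values in the example are made up.

```python
def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return B-H adjusted p values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that each adjusted value
    # never exceeds the adjusted value of the next larger p value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # invented example values
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

print("Bonferroni:", bonf)
print("B-H:       ", bh)
print("Bonferroni p < 0.05:", sum(p < 0.05 for p in bonf))
print("B-H p < 0.05:       ", sum(p < 0.05 for p in bh))
```

Bonferroni is the stricter of the two, which is consistent with it passing fewer genes than B-H in the table.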

GenMAPP/MAPPFinder procedure

Step 1. From within GenMAPP, select Tools > MAPPFinder. Make sure the correct gene database is loaded. The correct one was chosen.
Step 2. Click "Calculate New Results" > "Find File" and choose my expression dataset.
Step 3. Choose the Color Set and Criteria with which to filter the data. Click on either the "Increased" or the "Decreased" criterion in the right-hand box. Pink and blue were chosen.
Step 4. Check the boxes for "Gene Ontology" and "p value".
Step 5. Click "Browse" and create a meaningful file name, then click "Run MAPPFinder". The run takes several minutes.
Step 6. The Gene Ontology window opens showing the results. Gene Ontology terms that have at least 3 genes measured and a p value of less than 0.05 are highlighted yellow. A term with a p value less than 0.05 is considered a "significant" result.
To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
Step 7. In the MAPPFinder window you can collapse the tree and browse for the genes that were mentioned in the paper (i.e., SAR____).
Step 8. Type the identifier for one of these genes into the MAPPFinder browser gene ID search field. Choose "OrderedLocusNames" from the drop-down menu to the right of the search field. Click on the GeneID Search button. The GO term(s) that are associated with that gene will be highlighted in blue. The genes on the MAPP will be color-coded with the gene expression data from the microarray experiment.
Step 9. Launch Microsoft Excel. Open the copies of the .txt files in Excel (you will need to select "Show all files" and click "Finish" in the wizard that opens your file). This will show you the same data that you saw in the MAPPFinder Browser, but in tabular form.
Step 10. Filter this list to show the top GO terms represented in your data for both the "Increased" and "Decreased" criteria. Click on the drop-down arrow for the column you wish to filter and choose "(Custom…)". A window will open giving you choices on how you want to filter.
- Z score (in column N) greater than 2
- Number Changed (in column I) greater than or equal to 4 or 5 AND less than 100
- Percent Changed (in column L) greater than or equal to 25-50
Step 11. Save your changes as an Excel spreadsheet.
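
The Step 10 filter could be written as a short Python function. This is a sketch under assumptions: the field names are hypothetical stand-ins for the MAPPFinder columns, the example terms and numbers are invented, and the defaults pick the lower ends of the ranges given above (4 for Number Changed, 25 for Percent Changed).

```python
def passes_filter(term, min_z=2.0, min_changed=4, max_changed=100,
                  min_percent=25.0):
    """Apply the three Step 10 criteria to one GO term row."""
    return (
        term["z_score"] > min_z
        and min_changed <= term["number_changed"] < max_changed
        and term["percent_changed"] >= min_percent
    )


# Invented rows; real values would come from the MAPPFinder .txt output.
terms = [
    {"name": "term_a", "z_score": 3.1, "number_changed": 12, "percent_changed": 40.0},
    {"name": "term_b", "z_score": 1.5, "number_changed": 20, "percent_changed": 35.0},
    {"name": "term_c", "z_score": 4.0, "number_changed": 3, "percent_changed": 75.0},
    {"name": "term_d", "z_score": 2.5, "number_changed": 150, "percent_changed": 30.0},
    {"name": "term_e", "z_score": 2.2, "number_changed": 8, "percent_changed": 10.0},
]

kept = [t["name"] for t in terms if passes_filter(t)]
print(kept)
```

Raising `min_changed` to 5 or `min_percent` toward 50 tightens the filter when too many terms pass.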

Table 2. MAPPFinder results for increased “RanaUp” gene expression of 10 GO terms.

GO term	# Changed	% Changed	P-value
Ion Transport	34	30.63	0.012
Threonine Biosynthetic Process	5	83.33	0.05
Ion Transmembrane Transport	22	34.38	0.059
Oxidative Phosphorylation	6	66.67	0.072
Metal Ion Transport	13	35.14	0.55
Sodium Ion Transport	7	53.85	0.39
Cation Transport	18	27.69	0.99
Serine-type Endopeptidase Activity	5	41.67	1
DNA Recombination	12	26.67	1
Endopeptidase Activity	7	31.82	1

Table 3. MAPPFinder results for decreased “RanaDown” gene expression of 10 GO terms.

GO term	# Changed	% Changed	P-value
Pathogenesis	23	30.26	0.044
Cell Wall	7	46.67	0.39
Nucleotide Metabolic Process	36	18.56	0.999
’de novo’ IMP Biosynthetic Process	5	45.45	0.99
Cellular Aromatic Compound Metabolic Process	91	15.29	1
Purine Nucleotide Biosynthetic Process	9	25.71	1
Phosphorelay Signal Transduction System	9	25.71	1
Purine Nucleotide Metabolic Process	27	18.62	1
tRNA Aminoacylation for protein translation	6	30	1
Aminoacyl-tRNA Ligase Activity	6	30	1

Table 4. 10 "most significant" genes

Gene	Function
SAR2523	Putative membrane protein
SAR2774	Collagen adhesin
SAR2725	Putative surface anchored protein
SAR0384	Uncharacterized protein
SAR1398	Phosphate-specific transport system accessory protein PhoU
SAR0645	Putative membrane protein
SAR1408	4-hydroxy-tetrahydrodipicolinate reductase
SAR0932	Putative transposase
SAR2774	Collagen adhesin
SAR0464	N-acetylmuramoyl-L-alanine amidase sle1

In the Overton_MicroarrayData_20141119_CJ_downloaded_20141202_editedKD.xlsx file, the P-value column was sorted in ascending order. The genes at the top of the list have a p value of 0.00, which makes them the most significant genes. The IDs were then looked up in UniProt to find the function of each gene.
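
Sorting by p value and taking the top of the list can be sketched in a few lines of Python. The gene IDs and p values below are placeholders rather than values from the spreadsheet, and `top_genes` is a hypothetical helper name.

```python
def top_genes(rows, n=10):
    """Return the n rows with the smallest p values (ascending order)."""
    return sorted(rows, key=lambda row: row["Pvalue"])[:n]


# Placeholder rows; the real IDs and p values live in the spreadsheet.
rows = [
    {"ID": "gene_1", "Pvalue": 0.0400},
    {"ID": "gene_2", "Pvalue": 0.0001},
    {"ID": "gene_3", "Pvalue": 0.2000},
    {"ID": "gene_4", "Pvalue": 0.0030},
]

for row in top_genes(rows, n=3):
    print(row["ID"], row["Pvalue"])
```

Because Python's sort is stable, genes whose p values tie (for example, several that display as 0.00) keep their original worksheet order.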
