User:Matthew Whiteside/Notebook/Malaria Microarray/2009/07/15
Task 4.3: MAID meta-analysis of Human Malaria Microarray Datasets
Meta-analysis: increases the sensitivity & reliability of gene expression measurements by integrating results from different microarray datasets that address similar questions.
Effect sizes: MAID uses effect sizes. A typical p-value measures how likely it is that an observed relationship (or a stronger one) would arise by chance if there were no real effect. Effect sizes instead measure the strength of the relationship (the direction & magnitude of an effect). The effect size used here is the standardized difference in means between treatment and control (see MAID paper).
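As a rough illustration of a standardized mean difference for one gene, here is a minimal Python sketch. It assumes the bias-corrected version (Hedges' g) with its usual large-sample variance, a common choice in microarray meta-analysis; I have not checked that maid computes exactly this quantity, and the expression values are made up.

```python
import math
from statistics import mean, variance


def hedges_g(treatment, control):
    """Bias-corrected standardized mean difference (Hedges' g) and its approximate variance."""
    n_t, n_c = len(treatment), len(control)
    df = n_t + n_c - 2
    # Pooled standard deviation from the two sample variances (n - 1 denominators)
    s_pooled = math.sqrt(((n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)) / df)
    d = (mean(treatment) - mean(control)) / s_pooled
    # Small-sample bias correction factor
    g = d * (1 - 3 / (4 * df - 1))
    # Large-sample approximation to the sampling variance of g
    var_g = 1 / n_t + 1 / n_c + g ** 2 / (2 * (n_t + n_c))
    return g, var_g


# Hypothetical log2 expression values for one gene (not data from this notebook)
trt = [8.1, 8.4, 8.9, 8.6]
ctl = [7.2, 7.5, 7.1, 7.4]
g, v = hedges_g(trt, ctl)
print(f"g = {g:.2f}, var(g) = {v:.3f}")
```

The sign of g gives the direction of the effect and its size the magnitude in pooled-SD units, which is what makes studies on different platforms comparable.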
The benefit of MAID over GeneMeta is that it accommodates direct microarray designs (2-color arrays).
Performing MAID meta-analysis
After preparing the microarray data as an array-level expression matrix, I used the R script ~/.../meta_analysis/maid_de_workflow.R.
I used the maid 1.0.1 Bioconductor package provided by the authors. It relies on the old exprSet object class, so I had to modify the code.
Arrays from contrasts 1.1, 2.3, 3.1, 4.1 and 5.1 were used in the first meta-analysis. These microarray datasets come from very different tissues.
The MAID meta-analysis identified 916 genes as DE (FDR < 0.05): 150 up-regulated and 766 down-regulated. All results are in dir: .../meta_analysis/de_genes/maid/.
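In MAID, either a FEM (fixed effects model) with no between-study variation or a REM (random effects model) with between-study variation is used (see MAID paper). To assess which model is appropriate, the hypothesis H0: $\tau^2 = 0$ is tested, where $\tau^2$ is the between-study variance.
The Q statistic (eqn 10 in the paper) is computed under the fixed-effects model, i.e. it ignores between-study variability. If H0 is true, Q follows a $\chi^2$ distribution with $k - 1$ degrees of freedom, where $k$ is the number of studies:
$$Q = \sum_{j=1}^{k} w_j \, (\gamma_j - \mu_{FEM})^2$$
where $\gamma_j$ is the observed effect size in study $j$, $w_j$ is its weight (typically the inverse of the within-study variance), and $\mu_{FEM}$ is the weighted least-squares mean:
$$\mu_{FEM} = \frac{\sum_{j=1}^{k} w_j \, \gamma_j}{\sum_{j=1}^{k} w_j}$$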
Based on the results of this test, I went with the REM.
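The FEM/REM decision for a single gene can be sketched in Python as below. Assumptions: inverse-variance weights w_j = 1/s_j^2, and the DerSimonian-Laird moment estimator for the between-study variance (a standard REM estimator; maid may use a different one). The chi-square tail probability is computed from closed-form series so the block needs only the standard library; in R, `pchisq(Q, df = k - 1, lower.tail = FALSE)` gives the same value. The effect sizes are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson-type sum
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: the 1-df tail (erfc) plus a finite series
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def fem_rem(effects, variances):
    """FEM mean, Q test of H0: tau^2 = 0, and a DerSimonian-Laird REM z-score for one gene."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # ASSUMPTION: inverse-variance weights
    sw = sum(w)
    mu_fem = sum(wj * gj for wj, gj in zip(w, effects)) / sw
    q = sum(wj * (gj - mu_fem) ** 2 for wj, gj in zip(w, effects))
    p_q = chi2_sf(q, k - 1)  # small p suggests between-study variation (tau^2 > 0)
    # ASSUMPTION: DerSimonian-Laird moment estimator, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wj ** 2 for wj in w) / sw))
    w_rem = [1.0 / (v + tau2) for v in variances]
    mu_rem = sum(wj * gj for wj, gj in zip(w_rem, effects)) / sum(w_rem)
    z_rem = mu_rem / math.sqrt(1.0 / sum(w_rem))  # combined z-score under the REM
    return {"mu_fem": mu_fem, "Q": q, "p_Q": p_q, "tau2": tau2, "mu_rem": mu_rem, "z_rem": z_rem}


# Hypothetical effect sizes and variances for one gene in 5 studies (not real data)
res = fem_rem([1.2, 0.4, 2.1, -0.3, 0.9], [0.30, 0.25, 0.40, 0.35, 0.28])
print(res)
```

When the estimated tau^2 is zero the REM weights equal the FEM weights, so the two models give the same combined estimate; the REM only widens the variance when the studies disagree.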
One of the assumptions is that the transformed data (the effect-size z-scores) are approximately normal. To examine this, I created a normal Q-Q plot of the REM z-scores. Here is that figure. Are the data normal?
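The Q-Q plot itself is simple to reproduce: sort the z-scores and pair them with standard-normal quantiles. A minimal Python sketch follows; the correlation of the Q-Q points is added as an assumed single-number summary (values near 1 mean the points lie close to a straight line). The simulated z-scores are placeholders for the real REM output. In R, `qqnorm()` and `qqline()` draw the same plot.

```python
import math
from statistics import NormalDist, mean


def qq_points(z_scores):
    """Return (theoretical, observed) points for a normal Q-Q plot."""
    n = len(z_scores)
    observed = sorted(z_scores)
    # Standard-normal quantiles at plotting positions (i - 0.5) / n
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def qq_correlation(theoretical, observed):
    """Pearson correlation of the Q-Q points; close to 1 when they lie near a straight line."""
    mt, mo = mean(theoretical), mean(observed)
    cov = sum((t - mt) * (o - mo) for t, o in zip(theoretical, observed))
    st = math.sqrt(sum((t - mt) ** 2 for t in theoretical))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (st * so)


# Hypothetical REM z-scores: a simulated normal sample stands in for the real ones
z = NormalDist(0, 1.5).samples(500, seed=1)
th, ob = qq_points(z)
print(f"Q-Q correlation: {qq_correlation(th, ob):.4f}")
```

Curvature at the ends of the plot (heavy or light tails) lowers this correlation even when the centre of the distribution looks fine, which is where mixing very different tissues would be expected to show up.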
MAID / GeneMeta also suggest plots to show the improvement in sensitivity & recall provided by the meta-analysis.
The IDR (integration-driven discovery rate) plot shows, for a given z-score threshold, the proportion of genes passing the threshold in the combined analysis that do not pass it in any individual study (i.e. what proportion of the meta-analysis predictions are unique to the meta-analysis and would not be found by analysing each study individually). There is one line for positive effect sizes and one for negative effect sizes. Here is the IDR plot:
Here is a figure showing the combined and individual z-scores for contrast 2. Some genes change (their effect sizes reverse sign), but the combined z-score does appear to improve for a select few genes; these should be the genes that are reinforced (highly expressed) in the other datasets.
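Both the IDR and the per-contrast comparison reduce to simple counts over genes. A minimal Python sketch, assuming z-scores are stored as plain lists with the same gene order in every study; the function names and example values are hypothetical, and GeneMeta's own IDR plot may differ in detail (for example, in how thresholds are chosen).

```python
def idr(combined_z, study_z, threshold):
    """Integration-driven discovery rate for positive and negative effects.

    combined_z: combined z-score per gene.
    study_z: one list of z-scores per study, in the same gene order as combined_z.
    Returns the fraction of genes past the threshold in the combined analysis
    that are past it in no individual study (NaN if no gene passes).
    """
    rates = {}
    for label, sign in (("pos", 1), ("neg", -1)):
        hits = [g for g, z in enumerate(combined_z) if sign * z > threshold]
        new = [g for g in hits if not any(sign * s[g] > threshold for s in study_z)]
        rates[label] = len(new) / len(hits) if hits else float("nan")
    return rates


def compare_to_study(combined_z, one_study_z):
    """Count sign reversals and genes whose |z| grows after combining, versus one study."""
    reversals = sum(1 for c, s in zip(combined_z, one_study_z) if c * s < 0)
    reinforced = sum(
        1 for c, s in zip(combined_z, one_study_z) if c * s > 0 and abs(c) > abs(s)
    )
    return reversals, reinforced


# Hypothetical z-scores for 5 genes in 2 studies (not data from this notebook)
combined = [3.0, 2.5, -2.8, 0.5, -3.1]
studies = [[2.9, 1.0, -1.5, 0.2, -3.5], [1.1, 1.2, 1.5, 0.4, -2.0]]
print(idr(combined, studies, threshold=2.0))
print(compare_to_study(combined, studies[1]))
```

Evaluating `idr()` over a range of thresholds and plotting the two values gives the positive and negative lines of an IDR plot.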
TAKE HOME: The IDR is low, and the number of genes identified is also low relative to some of the individual datasets. The only benefit of the meta-analysis here is generalization: common malaria genes that will be active in many tissues. There may also be problems with including such diverse datasets, as the normality assumption may be violated.
I ran GO term and pathway over-representation analysis (ORA) with InnateDB using the 916 genes identified by the meta-analysis. Results are in .../malaria_data/meta_analysis/ora/maid/.
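Over-representation of a pathway in a gene list is usually scored with a one-sided hypergeometric test. InnateDB does this internally; the Python sketch below only shows what is being computed, and I am assuming the hypergeometric form rather than quoting InnateDB's documentation. The counts in the example are made up. In R, the same tail is `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for pathway over-representation.

    N: genes in the background (universe)
    K: background genes annotated to the pathway
    n: genes in the query list (e.g. the DE genes)
    k: query genes annotated to the pathway
    """
    # Sum the probabilities of seeing k or more pathway genes in the query list
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Made-up counts: 916 query genes, a 40-gene pathway, 15 overlaps, 18000-gene background
print(ora_pvalue(k=15, n=916, K=40, N=18000))
```

One p-value is produced per pathway (or GO term), which is why the number of annotations tested feeds directly into the multiple-testing correction.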
764 pathways were associated with the genes. None were statistically significant after Benjamini-Hochberg multiple hypothesis correction (FDR < 0.05). One reason for this: the genes identified by the meta-analysis were core, downstream genes such as TNF, IFN-gamma and STAT. These genes appear in many pathways, so they generate a large number of predictions. The BH correction probably assumes that 764 independent predictions were made; this is not true, since many of the predictions were dependent.
The results were similar for the 1761 GO terms.
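The Benjamini-Hochberg step-up adjustment can be sketched in a few lines of Python (R's `p.adjust(p, method = "BH")` returns the same values). The p-value ranked r among m tests is scaled by m/r before a running minimum is taken, so the smallest raw p-value is multiplied by m (764 for the pathways here). For reference, BH's FDR guarantee was first proven for independent tests and later extended to certain positively dependent ones. The p-values below are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four pathways
print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```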
Another note: many of the pathways and GO terms associated with down-regulated genes are core processes such as transcription and translation. This is not surprising, since infected cells will go into a state of cellular senescence.