User:Matthew Whiteside/Notebook/Malaria Microarray/2009/01/29

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Malaria Task 3.4
Searching PubMed in depth for publicly available human malaria datasets, specifically cerebral malaria.

DEFINITIVE LIST of papers to use in meta-analysis
//Dataset: Transcription profiling of human endothelials treated with Plasmodium falciparum infected RBC (PRBCs) and/or TNF-alpha. (AE ID: E-SGRP-3)
 * 1) chakra2007 pmid=17383656

Study: Plasmodium falciparum infected RBCs (PRBCs) co-cultured with human umbilical vein endothelial cells (HUVEC). Simulates PRBC sequestration at brain microvascular sites. Studies the role of the endothelium.

Chip: (GPL570) Affymetrix GeneChip Human Genome U133 Plus 2.0

Samples: 28 samples; combinations of HUVEC endothelials, TNF-alpha, and infected/uninfected RBCs.

Normalized: normalized data available. Protocol: Bioconductor affy package RMA normalization.

//Dataset: Transcription profiling of human childrens host response to malaria (AE ID: E-SMDB-2669)
 * 1) griffiths2005 pmid=15838786

Study: Whole blood from Kenyan children with malaria and possibly other infections. Compares gene expression before and after treatment and confirmed resolution of infection.

Chip: (GPL2614) SMD Homo sapiens Lymphochip Array LC-36

Samples: 28 samples; bacterial and viral infections only, acute malaria, combinations of malaria and other infections, convalescent malaria (baseline).

Normalized: normalized data available. Protocol: Array features with a signal:background ratio of <2.5 (in either sample or reference channel) and a regression correlation coefficient between sample and reference signal of <0.6 were excluded. Fluorescence signals from each array were scaled on the basis of the geometric mean of the sample:reference signal ratio from all array features after local background subtraction. Features representing the same GenBank accession number were collapsed to an arithmetic mean. Gene features with consistent signal quality across the 28 arrays were identified by selecting features that were present on 25 of the arrays. Signal intensity for identical gene features replicated across the arrays were median centered. This refined data set comprised 9869 gene features.
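The protocol above can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the authors' code: the function name `preprocess_ratios`, the input matrices, the reading that a spot failing either quality criterion is masked, and the log2 transform before median centering are all my assumptions.

```r
# Sketch of the quoted preprocessing steps on features x arrays matrices.
# All object names here are hypothetical.
preprocess_ratios <- function(ratio, sb_sample, sb_ref, reg_cor, accession,
                              min_sb = 2.5, min_cor = 0.6, min_present = 25) {
  # 1. Mask spots with weak signal in either channel or a poor regression fit
  #    (assumption: failing either criterion is enough to exclude a spot)
  bad <- sb_sample < min_sb | sb_ref < min_sb | reg_cor < min_cor
  ratio[bad] <- NA

  # 2. Scale each array by the geometric mean of its sample:reference ratios
  geo_mean <- exp(colMeans(log(ratio), na.rm = TRUE))
  ratio <- sweep(ratio, 2, geo_mean, "/")

  # 3. Collapse features sharing an accession to their arithmetic mean
  collapsed <- apply(ratio, 2, function(col) {
    tapply(col, accession, mean, na.rm = TRUE)
  })
  collapsed[is.nan(collapsed)] <- NA  # accession with no usable spot on an array

  # 4. Keep accessions measured on at least `min_present` arrays
  keep <- rowSums(!is.na(collapsed)) >= min_present
  collapsed <- collapsed[keep, , drop = FALSE]

  # 5. Median-center each accession across arrays (log2 scale is an assumption)
  log_ratio <- log2(collapsed)
  sweep(log_ratio, 1, apply(log_ratio, 1, median, na.rm = TRUE))
}

# Toy input: 6 features mapping to 3 accessions, 28 arrays
set.seed(1)
n_arrays <- 28
accession <- c("A", "A", "B", "B", "C", "C")
ratio     <- matrix(rlnorm(6 * n_arrays), nrow = 6)
sb_sample <- matrix(5,   nrow = 6, ncol = n_arrays)
sb_ref    <- matrix(5,   nrow = 6, ncol = n_arrays)
reg_cor   <- matrix(0.9, nrow = 6, ncol = n_arrays)
sb_sample[1, 1:10] <- 1    # feature 1 fails on 10 arrays; feature 2 still covers A
reg_cor[5:6, 1:4]  <- 0.3  # accession C fails on 4 arrays, so it is present on 24

result <- preprocess_ratios(ratio, sb_sample, sb_ref, reg_cor, accession)
print(dim(result))
```

In this toy run accession C is present on fewer than 25 arrays and is dropped, which mirrors how the 9869-feature set was obtained from the full Lymphochip.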

//Dataset: Transcription profiling of whole blood cells from healthy African children and those with uncomplicated malaria or severe malarial anemia (AE ID: E-GEOD-1124, GEO: GSE1124, GDS1971).
 * 1) boldt2006 pmid=16738667

Study: Compares gene expression between uncomplicated malaria and severe malarial anemia.

Chip: (GPL96) Affymetrix GeneChip Human Genome HG-U133A

Samples: 15 microarrays with pooled RNA. 4 Control (C), 20 with severe (S) and 20 with uncomplicated (U).

Normalized: normalized data is available. Protocol: We standardized for sample loading and variations in staining by scaling the signals on all arrays to constant target intensity (TGT 150).
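Scaling to a constant target intensity can be sketched as follows. The 2% trimmed mean follows the usual MAS 5.0 convention and is an assumption here, since the protocol only states the target (TGT 150); `scale_to_target` and the simulated `signal` matrix are hypothetical.

```r
# Scale every array (column) so that its trimmed mean signal equals `target`.
scale_to_target <- function(signal, target = 150, trim = 0.02) {
  # One trimmed mean per array; `trim` drops that fraction from each tail
  array_means <- apply(signal, 2, mean, trim = trim, na.rm = TRUE)
  # Multiply each column by its own scale factor
  sweep(signal, 2, target / array_means, "*")
}

# Simulated probe-set signals for 15 arrays, one of them twice as bright
set.seed(1)
signal <- matrix(rlnorm(1000 * 15, meanlog = 5), nrow = 1000)
signal[, 3] <- signal[, 3] * 2

scaled <- scale_to_target(signal)
print(round(apply(scaled, 2, mean, trim = 0.02)))
```

After scaling, every array has the same trimmed mean, so differences in sample loading or staining no longer shift whole arrays up or down.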

//Dataset: Presymptomatic and symptomatic malaria: peripheral blood mononuclear cells (GEO: GDS2362)
 * 1) ocken2006 pmid=16988231

Study: Comparison of peripheral blood mononuclear cells of subjects with early, presymptomatic, experimentally acquired malaria to those with acute, uncomplicated, naturally acquired malaria. Results provide insight into the immune response to malaria in these two stages of infection.

Chip: (GPL96) Affymetrix GeneChip Human Genome U133 Array Set HG-U133A

Samples: 71 samples. RNA extracted from Peripheral Blood Mononuclear Cells (PBMCs). Group 1: 22 malaria-naive US volunteers (used as baseline), who were then experimentally infected with malaria and sampled again for comparison. Group 2: 15 Cameroonians with acute malaria infection, plus samples from 12 of them after treatment with chloroquine.

Normalization: The scanned images were analyzed using Affymetrix MAS 5.0 to generate CEL files (fluorescence intensity files), which were normalized at the probe level using the robust multichip average method (19), with the average fluorescence intensity of each probe expressed as log2. The data sets from all groups (22 data sets from experimentally infected U.S. volunteers, 22 data sets from healthy U.S. volunteers, and 15 data sets from naturally infected Cameroonian volunteers) were normalized together in order to permit direct comparisons of gene expression patterns in the two groups relative to the same baseline.

//Dataset: Placental malaria (GEO: GDS2822)
 * 1) muehl2007 pmid=17579077

Study: Analysis of inflamed placentas from patients with chronic placental malaria (PM).

Chip: (GPL570) Affymetrix GeneChip Human Genome U133 Plus 2.0 Array

Samples: Compare gene expression of RNA extracted from placentas from first-time Tanzanian mothers. 10 with active placental malaria, 10 PM negative (however, some with evidence of past PM).

Normalized: Transcription profiles were defined by GeneChip operating system (GCOS) absolute expression analysis. Data were normalized by the GeneChip robust multiarray analysis (GC-RMA) algorithm and then analyzed by t test and hierarchical clustering with Acuity 4.0 (Axon).

//Dataset: Effect of Plasmodium falciparum infected erythrocytes on primary human brain microvascular endothelial cell (GEO: GSE9861)
 * 1) tripath2006 pmid=16714553

Study: Investigates the global transcriptional response of primary human brain endothelial cells after incubation with high numbers of infected erythrocytes.

Chip: (GPL571) Affymetrix GeneChip Human Genome U133A 2.0 Array

Samples: A total of 8 samples (4 control and 4 treated) were analyzed. The 4 control samples comprised two normal RBC controls and two medium controls. The 4 treated samples comprised 2 exposed to low-binding Pf-IRBC and 2 exposed to high-binding Pf-IRBC (Pf-IRBC-P). The medium and RBC controls were used together as four control replicates, and all four Pf-IRBC or Pf-IRBC-P exposed endothelial cell samples were used as four treated replicates.

Normalized: Signal intensity was calculated by GCOS 1.4 software - but affy values are NOT NORMALIZED!!

Karsten performed normalization - email:

I have placed files with normalized values into /tmp on koch:

khokamp@koch:/tmp> l GSE9861_*
-rw-r--r-- 1 khokamp wg-users 3257047 2009-05-13 13:56 GSE9861_clean_gcrma.txt
-rw-r--r-- 1 khokamp wg-users 3256320 2009-05-13 13:56 GSE9861_gcrma.txt

But if you are going to analyse them with limma, you could easily re-do this yourself in R:

1. Change into the directory containing the (compressed) CEL files, e.g. cd /tmp/GSE9861_cleanCEL

2. start R

3. read data and normalize:

library(gcrma)
data = ReadAffy()
norm_data = gcrma(data)

norm_data is an ExpressionSet that should be readily usable with limma

The last step that I applied for extracting the data was write.exprs(norm_data, file='GSE9861_clean_gcrma.txt')

It might be best to run both the original and the clean data set through limma to see if the resulting gene lists differ greatly.

Explanation of 'clean' version: this experiment has been loaded into ArrayWiki (http://arraywiki.bme.gatech.edu/index.php/GSE9861). As you can see, the first slide has some fairly strong artifacts and gets the lowest quality score (88.63%). They provide a 'clean' version of the original data, where the spots with high variance have been replaced with the median value of this probe from other chips in the dataset (see here for more details: http://arraywiki.bme.gatech.edu/index.php/BioPNG_format).
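Karsten's suggestion to run both the original and the clean data set through limma and compare the gene lists could look roughly like the sketch below. It uses simulated matrices so that it runs without the files on koch; the helper `top_probes`, the 4 control + 4 treated column order, and the top-100 cutoff are assumptions for illustration. With the real data, `orig` and `clean` would instead be read from the two gcrma text files (for example with `as.matrix(read.delim(file, row.names = 1))`), and the sample order would need to be checked against GEO.

```r
library(limma)

# Return the IDs of the top `n` probes for treated vs. control.
top_probes <- function(expr, group, n = 100) {
  design <- model.matrix(~ group)        # intercept + treated-vs-control effect
  fit <- eBayes(lmFit(expr, design))     # per-probe linear model, moderated t
  rownames(topTable(fit, coef = 2, number = n))
}

# Simulated stand-ins for the two normalized data sets (8 chips, 1000 probes)
set.seed(1)
group <- factor(rep(c("control", "treated"), each = 4))
orig <- matrix(rnorm(1000 * 8), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("chip", 1:8)))
orig[1:50, 5:8] <- orig[1:50, 5:8] + 3   # 50 probes truly up in treated

# Mimic the 'clean' version: on chip 1, replace some probes with the
# median of the same probe on the other chips
clean <- orig
idx <- sample(1000, 100)
clean[idx, 1] <- apply(orig[idx, -1], 1, median)

top_orig  <- top_probes(orig, group)
top_clean <- top_probes(clean, group)

# Fraction of the top-100 list shared between the two versions
overlap <- length(intersect(top_orig, top_clean)) / 100
print(overlap)
```

A high overlap would suggest the artifacts on the first slide have little influence on the gene list; a low one would argue for using the clean version.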

Others
//Stretch - microarray of ~300 immunity genes in 3 different human populations - data is available in GEO. Focuses on Treg response and protective alleles/phenotype in the Fulani population.
 * 1) torcia2008 pmid=18174328

//Commentary on current knowledge of CM using microarrays. No references to human microarray datasets, strictly murine.
 * 1) chandy2007 pmid=17991710

//Microarray of liver cells expressing a Plasmodium protein.
 * 1) singh2007 pmid=17981117

Are monkey model studies interesting/useful?


 * }