Dahlquist:Notebook/Microarray Data Analysis/2008/10/14

{| width="800"
|-
|style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Microarray Data Analysis
|style="background-color: #F2F2F2" align="center"|Main project page
|-
|colspan="2"|

Today's Workflow
 First compared dCIN5 and wt data, following the same procedure as outlined on 10/02/2008:
 * Files in Desktop "Data analysis 2008-10-02"
 * Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
 * Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
 * Load both into an Edge session.
 * Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button.  The results are:
 * Percent of genes missing data: 7.63%
 * Percent of arrays missing data: 95.35%
 * Overall percent of missing data: 3.15%
 * For KNN Parameters, set:
 * Percent of missing values to tolerate in a gene: 100 (so all genes included)
 * Number of nearest neighbors to use (maximum of 15): 15
 * Clicked GO to impute missing data.
 * Selected "Identify Differentially Expressed Genes"
 * Note: this is to compare between the wt and dCIN5 strains. Different parameters and gene/covariate files will need to be used to analyze individual strains.
 * Class Variable is: Strain
 * Differential Expression Type is: Time Course
 * Number of null iterations, set to 1000
 * Choose a seed for reproducible results, set to 47
 * Choose Time Course Settings
 * Covariate giving time points is: Timepoint
 * Covariate corresponding to individuals is: Flask
 * Choose spline type, accepted default of Natural Cubic Spline, dimension 4
 * Click "Apply" and then click "Go"
 * 1000 permutations looks like it will take about 10 minutes.
 * Results: (Saved in 2008-10-14 Results)
 * No significant genes under these settings.
 * Choose Q-Value cutoff as 1, recalculate
 * Saved total list of genes as: "GeneList_20081014_wt-vs-dCIN5"
 * To save the plots, run the following command in the R console window:
 * savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = c("png"), device = dev.cur())
 * This will save the active plot window under a file name you choose. Saves in folder "edge_1.1.290"
 * Saved Q-Plot as "QPlot_20081014_wt-vs-dCIN5"
 * Saved Histograms as "PvalHistogram_20081014_wt-vs-dCIN5"
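
The three missing-data percentages reported by Edge can look inconsistent at first (most arrays flagged, yet only a few percent of values missing). The sketch below shows one plausible way such figures relate, using a small made-up genes-by-arrays matrix. The matrix `expr`, the use of NA for missing spots, and the exact definitions (a gene or array counts as "missing data" if it has at least one missing value) are assumptions for illustration, not taken from the Edge source.

```r
# Toy genes-by-arrays matrix (hypothetical values; NA marks a missing spot).
expr <- matrix(
  c(1.2,  NA, 0.4,
    0.8, 0.1, 0.3,
     NA, 0.5, 0.9,
    0.2, 0.7, 1.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("array", 1:3))
)

missing <- is.na(expr)

# Percent of genes (rows) with at least one missing value.
pct_genes <- 100 * mean(rowSums(missing) > 0)
# Percent of arrays (columns) with at least one missing value.
pct_arrays <- 100 * mean(colSums(missing) > 0)
# Percent of all matrix entries that are missing.
pct_overall <- 100 * mean(missing)

cat(sprintf("genes: %.2f%%  arrays: %.2f%%  overall: %.2f%%\n",
            pct_genes, pct_arrays, pct_overall))
```

Under these definitions a single missing spot is enough to count a whole array, so a high array percentage alongside a low overall percentage is expected.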

Then the dCIN5 dataset was run on its own:
 * Files in "Edge_data_20080710"
 * Used gene file "dCIN5-only_Edge_genes-indexonly_20080715.txt"
 * Used covariate file "dCIN5-only_Edge_covariates_20080710.txt"
 * Load both into an Edge session.
 * Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button.  The results are:
 * Percent of genes missing data: 1.32%
 * Percent of arrays missing data: 90%
 * Overall percent of missing data: 0.09%
 * For KNN Parameters, set:
 * Percent of missing values to tolerate in a gene: 100 (so all genes included)
 * Number of nearest neighbors to use (maximum of 15): 15
 * Clicked GO to impute missing data.
 * Selected "Identify Differentially Expressed Genes"
 * Class Variable is: None (within class differential expression)
 * Differential Expression Type is: Time Course
 * Number of null iterations, set to 1000
 * Choose a seed for reproducible results, set to 47
 * Choose Time Course Settings
 * Covariate giving time points is: Timepoint
 * Covariate corresponding to individuals is: Flask
 * Choose spline type, accepted default of Natural Cubic Spline, dimension 4
 * Click "Apply" and then click "Go"
 * 1000 permutations looks like it will take about 2 minutes.
 * Results: (Saved in 2008-10-14 Results)
 * 157 Genes Called Significant (Cutoff Q Value 0.1)
 * Saved total list of genes as "GeneList_20081014_dCIN5-only"
 * Saved Q-Plot as "QPlot_20081014_dCIN5-only"
 * Saved Histograms as "PvalHistogram_20081014_dCIN5-only"
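
For reference, the time course settings above (natural cubic spline, dimension 4, no class variable) can be pictured with a small R sketch. It builds a natural cubic spline basis with `splines::ns` and compares a flat model against a spline curve for one toy gene. The time points and expression values are made up, and treating Edge's "dimension 4" as `df = 4` is an assumption. Edge also derives significance from null iterations rather than from the F-test shown here, so this only illustrates the shape of the comparison.

```r
library(splines)  # recommended package shipped with R

# Hypothetical time points (minutes), three replicate flasks each.
timepoint <- rep(c(0, 15, 30, 60, 90, 120), times = 3)

# Natural cubic spline basis with 4 columns.
# Assumption: Edge's "dimension 4" corresponds to df = 4 here.
basis <- ns(timepoint, df = 4)

# Toy expression values for one gene: a mild trend over time plus noise.
set.seed(47)
y <- 0.01 * timepoint + rnorm(length(timepoint), sd = 0.3)

# Within-class question: does a curve over time fit better than a flat line?
fit_null <- lm(y ~ 1)      # no change over time
fit_alt  <- lm(y ~ basis)  # smooth change over time

print(dim(basis))
print(anova(fit_null, fit_alt))
```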

Then the wildtype dataset was run on its own:
 * Files in "Edge_data_20080710"
 * Used gene file "wt-only_Edge_genes-indexonly_20080715.txt"
 * Used covariate file "wt-only_Edge_covariates_20080710.txt"
 * Load both into an Edge session.
 * Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button.  The results are:
 * Percent of genes missing data: 6.79%
 * Percent of arrays missing data: 91.3%
 * Overall percent of missing data: 2.5%
 * For KNN Parameters, set:
 * Percent of missing values to tolerate in a gene: 100 (so all genes included)
 * Number of nearest neighbors to use (maximum of 15): 15
 * Clicked GO to impute missing data.
 * Selected "Identify Differentially Expressed Genes"
 * Class Variable is: None (within class differential expression)
 * Differential Expression Type is: Time Course
 * Number of null iterations, set to 1000
 * Choose a seed for reproducible results, set to 47
 * Choose Time Course Settings
 * Covariate giving time points is: Timepoint
 * Covariate corresponding to individuals is: Flask
 * Choose spline type, accepted default of Natural Cubic Spline, dimension 4
 * Click "Apply" and then click "Go"
 * 1000 permutations looks like it will take about 2 minutes.
 * Results: (Saved in 2008-10-14 Results)
 * 1000 Genes Called Significant (Cutoff Q Value 0.0326)
 * Saved total list of genes as "GeneList_20081014_wt-only"
 * Saved Q-Plot as "QPlot_20081014_wt-only"
 * Saved Histograms as "PvalHistogram_20081014_wt-only"
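
The Q-value cutoffs in the results act as a threshold on a per-gene value: a gene is called significant when its value is at or below the cutoff. The sketch below illustrates the thresholding only. The p-values are made up, and Benjamini-Hochberg adjusted p-values from base R's `p.adjust` are substituted as a stand-in, so the numbers will not match Edge's own q-value estimates.

```r
# Hypothetical p-values for six genes (not taken from the Edge output).
pvals <- c(g1 = 0.001, g2 = 0.004, g3 = 0.03, g4 = 0.20, g5 = 0.45, g6 = 0.90)

# Benjamini-Hochberg adjustment, used here as a stand-in for q-values.
adj <- p.adjust(pvals, method = "BH")

# Names of genes whose adjusted value is at or below the cutoff.
called <- function(cutoff) names(adj)[adj <= cutoff]

print(called(0.1))  # genes passing a 0.1 cutoff
print(called(1))    # a cutoff of 1 keeps every gene
```

This is why setting the cutoff to 1 and recalculating yields the full gene list even when no genes pass the default cutoff.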

Just in case we want the data, the total data set was run without strain classification:
 * Files in Desktop "Data analysis 2008-10-02"
 * Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
 * Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
 * Load both into an Edge session.
 * Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button.  The results are:
 * Percent of genes missing data: 7.63%
 * Percent of arrays missing data: 95.35%
 * Overall percent of missing data: 3.15%
 * For KNN Parameters, set:
 * Percent of missing values to tolerate in a gene: 100 (so all genes included)
 * Number of nearest neighbors to use (maximum of 15): 15
 * Clicked GO to impute missing data.
 * Selected "Identify Differentially Expressed Genes"
 * Class Variable is: None (within class differential expression)
 * Differential Expression Type is: Time Course
 * Number of null iterations, set to 1000
 * Choose a seed for reproducible results, set to 47
 * Choose Time Course Settings
 * Covariate giving time points is: Timepoint
 * Covariate corresponding to individuals is: Flask
 * Choose spline type, accepted default of Natural Cubic Spline, dimension 4
 * Click "Apply" and then click "Go"
 * 1000 permutations looks like it will take about 9 minutes.
 * Results: (Saved in 2008-10-14 Results)
 * 998 Genes Called Significant (Cutoff Q Value 0.00475)
 * Saved total list of genes as "GeneList_20081014_wt-and-dCIN5-together"
 * Saved Q-Plot as "QPlot_20081014_wt-and-dCIN5-together"
 * Saved Histograms as "PvalHistogram_20081014_wt-and-dCIN5-together"
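
The saved outputs above all follow the pattern kind_YYYYMMDD_label. A small helper along these lines could generate the names for a session so they stay consistent; `result_name` is a hypothetical function written for this sketch, not part of Edge or R.

```r
# Build names following the pattern used above: <kind>_<YYYYMMDD>_<label>.
# result_name is a hypothetical helper, not an Edge or base R function.
result_name <- function(kind, label, date = Sys.Date()) {
  sprintf("%s_%s_%s", kind, format(date, "%Y%m%d"), label)
}

run_date <- as.Date("2008-10-14")
labels <- c("wt-vs-dCIN5", "dCIN5-only", "wt-only", "wt-and-dCIN5-together")
kinds  <- c("GeneList", "QPlot", "PvalHistogram")

# Every kind/label combination for this session (kind varies fastest).
grid  <- expand.grid(kind = kinds, label = labels, stringsAsFactors = FALSE)
files <- result_name(grid$kind, grid$label, run_date)

print(files)
```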

--Kevin C. Entzminger 16:54, 14 October 2008 (EDT)


|}