Dahlquist:Notebook/Microarray Data Analysis/2008/10/14

From OpenWetWare

Today's Workflow

First compared dCIN5 and wt data following the same procedure as outlined on 10/02/2008:

  • Files in Desktop "Data analysis 2008-10-02"
    • Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 7.63%
    • Percent of arrays missing data: 95.35%
    • Overall percent of missing data: 3.15%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Note: this is to compare between the wt and dCIN5 strains. Different parameters and gene/covariate files will need to be used to analyze individual strains.
    • Class Variable is: Strain
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 10 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • No significant genes under these settings.
    • Set the Q-Value cutoff to 1 and recalculate, so that every gene is included in the list
      • Saved total list of genes as: "GeneList_20081014_wt-vs-dCIN5"
    • To save the plots, run the following command in the R console window:
      savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = "png", device = dev.cur())
      • This saves the active plot window under the file name you choose. The file is written to the folder "edge_1.1.290".
    • Saved Q-Plot as "QPlot_20081014_wt-vs-dCIN5"
    • Saved Histograms as "PvalHistogram_20081014_wt-vs-dCIN5"
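
The three missing-data percentages reported above can look inconsistent at first (95.35% of arrays but only 3.15% overall). A minimal sketch of one plausible definition follows, assuming a gene or an array counts as "missing data" if it has at least one missing value, while the overall figure counts individual missing spots. This definition is an assumption, not taken from the Edge documentation, and the toy matrix and function name are hypothetical.

```python
# Toy expression matrix: rows are genes, columns are arrays.
# None marks a missing value. All values here are made up for illustration.
def missing_data_summary(matrix):
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    # A gene counts as missing data if any of its values is missing.
    genes_missing = sum(1 for row in matrix if any(v is None for v in row))
    # An array counts as missing data if any gene is missing on it.
    arrays_missing = sum(
        1 for j in range(n_arrays) if any(row[j] is None for row in matrix)
    )
    # The overall figure counts individual missing cells.
    cells_missing = sum(1 for row in matrix for v in row if v is None)
    return {
        "pct_genes": 100.0 * genes_missing / n_genes,
        "pct_arrays": 100.0 * arrays_missing / n_arrays,
        "pct_overall": 100.0 * cells_missing / (n_genes * n_arrays),
    }


toy = [
    [1.0, 2.0, None, 0.5],
    [0.1, 0.2, 0.3, 0.4],
    [None, 1.1, 1.2, 1.3],
    [0.9, 0.8, 0.7, 0.6],
]
summary = missing_data_summary(toy)
print(summary)
```

Under this definition a single missing spot is enough to flag a whole array, so nearly every array can be flagged even when only a small fraction of all spots is missing.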

Then the dCIN5 dataset was run on its own:

  • Files in "Edge_data_20080710"
    • Used gene file "dCIN5-only_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "dCIN5-only_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 1.32%
    • Percent of arrays missing data: 90%
    • Overall percent of missing data: 0.09%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 2 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 157 Genes Called Significant (Cutoff Q Value 0.1)
    • Saved total list of genes as "GeneList_20081014_dCIN5-only"
    • Saved Q-Plot as "QPlot_20081014_dCIN5-only"
    • Saved Histograms as "PvalHistogram_20081014_dCIN5-only"
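
The number of "Genes Called Significant" depends on the Q-value cutoff. The sketch below shows the general idea with q-values computed from p-values under the simplifying assumption that the proportion of true nulls is fixed at 1, which reduces to Benjamini-Hochberg adjusted p-values. Edge's own q-value estimate may differ, and the p-values and function names here are hypothetical.

```python
def bh_qvalues(pvalues):
    """Step-up q-values with the null proportion fixed at 1 (BH adjustment)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def count_significant(qvalues, cutoff):
    """Number of genes whose q-value is at or below the cutoff."""
    return sum(1 for q in qvalues if q <= cutoff)


toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up p-values
toy_q = bh_qvalues(toy_p)
n_at_0_1 = count_significant(toy_q, 0.1)
n_at_1 = count_significant(toy_q, 1.0)
print(n_at_0_1, n_at_1)
```

With a cutoff of 1 every gene passes, which is why setting the cutoff to 1 is a convenient way to export the full gene list.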

Then the wildtype dataset was run on its own:

  • Files in "Edge_data_20080710"
    • Used gene file "wt-only_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-only_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 6.79%
    • Percent of arrays missing data: 91.3%
    • Overall percent of missing data: 2.5%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 2 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 1000 Genes Called Significant (Cutoff Q Value 0.0326)
    • Saved total list of genes as "GeneList_20081014_wt-only"
    • Saved Q-Plot as "QPlot_20081014_wt-only"
    • Saved Histograms as "PvalHistogram_20081014_wt-only"
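
For reference, the KNN imputation step used in these runs can be sketched roughly as follows. This is a simplified illustration, not Edge's implementation: it assumes neighbors are drawn only from genes with complete data and that distance is Euclidean over the columns the target gene actually has. The function name and the toy matrix are hypothetical.

```python
import math


def knn_impute(matrix, k):
    """Fill None entries with the mean of the k nearest complete rows."""
    complete = [row for row in matrix if all(v is not None for v in row)]
    if not complete:
        raise ValueError("need at least one gene with no missing values")
    imputed = []
    for row in matrix:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        # Euclidean distance over the columns this gene actually has.
        def distance(other, row=row, observed=observed):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=distance)[:k]
        filled = list(row)
        for j in missing:
            filled[j] = sum(n[j] for n in neighbors) / len(neighbors)
        imputed.append(filled)
    return imputed


toy = [
    [1.0, 2.0, None],  # gene with one missing value
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [9.0, 9.0, 9.0],   # distant gene, should not be chosen as a neighbor
]
result = knn_impute(toy, k=2)
print(result[0])
```

In the runs above, k was set to 15 (the maximum Edge allows), so each missing value is averaged over more neighbors than in this toy example.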

Just in case we want the data, the total data set was also run without strain classification:

  • Files in Desktop "Data analysis 2008-10-02"
    • Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 7.63%
    • Percent of arrays missing data: 95.35%
    • Overall percent of missing data: 3.15%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 9 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 998 Genes Called Significant (Cutoff Q Value 0.00475)
    • Saved total list of genes as "GeneList_20081014_wt-and-dCIN5-together"
    • Saved Q-Plot as "QPlot_20081014_wt-and-dCIN5-together"
    • Saved Histograms as "PvalHistogram_20081014_wt-and-dCIN5-together"
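
All four runs used 1000 null iterations and a seed of 47. The sketch below illustrates what those two settings do in a generic resampling test: the iteration count sets the resolution of the p-value, and the seed makes the result repeatable. It uses a simple two-group permutation test on a difference of means, which is a stand-in and not the null procedure Edge applies to time-course data. The data and function name are hypothetical.

```python
import random


def permutation_pvalue(group_a, group_b, n_iterations, seed):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed gives the same shuffles every run
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_iterations):
        rng.shuffle(pooled)
        null_stat = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if null_stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iterations + 1)


a = [2.1, 2.4, 2.3, 2.6]  # made-up values
b = [1.0, 1.2, 0.9, 1.1]
p_first = permutation_pvalue(a, b, n_iterations=1000, seed=47)
p_second = permutation_pvalue(a, b, n_iterations=1000, seed=47)
print(p_first == p_second)
```

With 1000 iterations the smallest p-value this sketch can return is 1/1001, so more iterations would be needed to resolve smaller p-values, at the cost of longer run times.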

--Kevin C. Entzminger 16:54, 14 October 2008 (EDT)