Dahlquist:Notebook/Microarray Data Analysis/2008/10/21

From OpenWetWare

Today's Workflow

The results generated on 10/14/2008 were downloaded and placed in the "Edge Analysis" folder on the Desktop in Kevin's profile. Significant gene results were saved as tab-delimited files, and the P-value histograms and Q-plots were saved into a PowerPoint file and printed.

  • Only the wt-only results from that run should be used. The other results are not usable because of the covariate file errors explained below.

The previous run (10/14/2008) on the dCIN5-only dataset gave interesting results. While the wt-only dataset produced about 1000 significant genes, the dCIN5-only dataset gave only about 150 significant genes. To verify this result:

  • First, the covariates and gene list files were uploaded to lion share, then opened in Excel and checked for errors.
    • IMPORTANT: The flask numbers were found to be wrong in the covariates files for dCIN5-only and wt-vs-dCIN5. They were corrected and new runs were completed.
    • The new files were saved on the desktop in the Edge Analysis folder as:
      • dCIN5-only_Edge_covariates_20081021.txt and
      • wt-vs-dCIN5_Edge_covariates_20081021.txt
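
One check that could catch mislabeled flask numbers in covariate files like these is sketched below in Python. It assumes a hypothetical layout (one tab-delimited row per covariate, with the covariate name in the first column and one value per array after it) and covariate rows named Strain and Flask. The actual file layout, and the actual error found on this date, may differ. The helper names and the example values are illustrative only and are not taken from the real files.

```python
import csv
import io
from collections import defaultdict


def read_covariates(text):
    """Parse tab-delimited text: each row is a covariate name, then one value per array.

    This layout is an assumption made for the sketch.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1:] for row in rows if row}


def flasks_with_multiple_strains(cov):
    """Return flask labels that are attached to more than one strain."""
    strains_by_flask = defaultdict(set)
    for flask, strain in zip(cov["Flask"], cov["Strain"]):
        strains_by_flask[flask].add(strain)
    return {f: sorted(s) for f, s in strains_by_flask.items() if len(s) > 1}


# Made-up example: flask "1" is used for both a wt and a dCIN5 array.
example = (
    "Array\tA1\tA2\tA3\tA4\n"
    "Strain\twt\twt\tdCIN5\tdCIN5\n"
    "Timepoint\t15\t30\t15\t30\n"
    "Flask\t1\t1\t1\t2\n"
)
cov = read_covariates(example)
print(flasks_with_multiple_strains(cov))  # {'1': ['dCIN5', 'wt']}
```

If flask labels are meant to identify independent cultures, a label shared between strains would cause two different cultures to be treated as the same individual, so a non-empty result is worth a manual look.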

Reran the wt-vs-dCIN5 data with the updated covariate file:

  • Gene file is in "Data analysis 2008-10-02" on the Desktop; covariate file is on the Desktop
    • Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20081021.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 7.63%
    • Percent of arrays missing data: 95.35%
    • Overall percent of missing data: 3.15%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: Strain
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 9 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 2 significant genes under these settings (IDs 1068 and 1798) with a Q-value cutoff of 0.1
    • Choose show all
      • Saved total list of genes as: "GeneList_20081021_wt-vs-dCIN5"
    • To save the plots, run the following command in the R console window:
savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = c("png"), device = dev.cur())
      • This saves the active plot window under a file name you choose. The file is saved in the folder "edge_1.1.290".
    • Saved Q-Plot as "QPlot_20081021_wt-vs-dCIN5"
    • Saved Histograms as "PvalHistogram_20081021_wt-vs-dCIN5"
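
The three missing-data percentages reported in the imputation step above can be reproduced in outline with the Python sketch below. It assumes that a gene or an array counts as "missing data" when at least one of its values is missing, and that the overall figure is the fraction of missing cells in the whole matrix. This is an interpretation of the report, not a documented definition. The function name and the toy matrix are illustrative.

```python
def percent_missing(matrix):
    """Return (% genes with any missing value, % arrays with any missing value, % missing cells).

    matrix is a list of rows (genes); each row has one value per array,
    with None marking a missing value.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    genes_missing = sum(1 for row in matrix if any(v is None for v in row))
    arrays_missing = sum(
        1 for j in range(n_arrays) if any(row[j] is None for row in matrix)
    )
    cells_missing = sum(1 for row in matrix for v in row if v is None)
    return (
        100 * genes_missing / n_genes,
        100 * arrays_missing / n_arrays,
        100 * cells_missing / (n_genes * n_arrays),
    )


# Toy matrix: 4 genes x 5 arrays with 2 missing cells.
toy = [
    [1.0, None, 0.3, 0.2, 0.1],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.5, 0.4, None, 0.2, 0.1],
    [0.5, 0.4, 0.3, 0.2, 0.1],
]
print(percent_missing(toy))  # (50.0, 40.0, 10.0)
```

Under this interpretation, a single missing spot is enough to flag a whole array, which is how a very high percent of arrays with missing data can sit alongside a low overall percent.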

Then the dCIN5 dataset was run on its own:

  • Gene file is in "Edge_data_20080710"; covariate file is on the Desktop
    • Used gene file "dCIN5-only_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "dCIN5-only_Edge_covariates_20081021.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 1.32%
    • Percent of arrays missing data: 90%
    • Overall percent of missing data: 0.09%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • 1000 permutations looks like it will take about 2 minutes.
  • Results: (Saved on Desktop)
    • 1000 genes called significant (Q-value cutoff 0.0114)!
    • Saved total list of genes as "GeneList_20081021_dCIN5-only"
    • Saved Q-Plot as "QPlot_20081021_dCIN5-only"
    • Saved Histograms as "PvalHistogram_20081021_dCIN5-only"
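
For readers unfamiliar with the KNN step used in both runs above, the Python sketch below shows the general idea of k-nearest-neighbor imputation: a missing value in a gene is filled with the average of that column across the k most similar genes. It is a simplified illustration, not Edge's implementation. The distance measure (mean squared difference over columns observed in both genes), the function name, and the toy values are all assumptions.

```python
import math


def knn_impute(matrix, k=15):
    """Return a copy of matrix with None entries filled from the k nearest rows."""

    def distance(a, b):
        # Mean squared difference over columns observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return sum(shared) / len(shared) if shared else math.inf

    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        missing_cols = [j for j, v in enumerate(row) if v is None]
        if not missing_cols:
            continue
        # Rank all other rows from most to least similar.
        ranked = sorted(
            (distance(row, other), idx)
            for idx, other in enumerate(matrix)
            if idx != i
        )
        for j in missing_cols:
            # Take the k closest rows that have a value in this column.
            donors = [
                matrix[idx][j]
                for d, idx in ranked
                if d != math.inf and matrix[idx][j] is not None
            ][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled


# Toy data: the first gene is missing its third value.
genes = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 4.0],
    [9.0, 9.0, 9.0],
]
result = knn_impute(genes, k=2)
print(result[0])  # [1.0, 2.0, 3.5]
```

With k=2 the two closest genes supply 3.0 and 4.0, so the gap is filled with 3.5, and the distant fourth gene does not contribute.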

All Q-plots and P-value histograms were combined into a PowerPoint file. All significant gene lists were exported and saved as text files in the Edge Analysis folder on the Desktop in Dahlquist's Lab.
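
As a cross-check on the exported gene lists, the number of significant genes at a given Q-value cutoff could be recounted outside Edge. The Python sketch below assumes a tab-delimited export with hypothetical column names GeneID and QValue; the real export may use different headers. The toy rows are made up and are not results from these runs.

```python
import csv
import io


def significant_genes(text, cutoff):
    """Return the gene IDs whose Q-value is at or below the cutoff."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row["GeneID"] for row in reader if float(row["QValue"]) <= cutoff]


# Made-up gene list in the assumed format.
toy_list = (
    "GeneID\tPValue\tQValue\n"
    "g1\t0.0001\t0.004\n"
    "g2\t0.02\t0.09\n"
    "g3\t0.4\t0.6\n"
)
print(significant_genes(toy_list, 0.1))     # ['g1', 'g2']
print(significant_genes(toy_list, 0.0114))  # ['g1']
```

Recounting at the same cutoffs reported in the results would confirm that the saved text files match what Edge displayed.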