Dahlquist:Notebook/Microarray Data Analysis/2008/10/14

From OpenWetWare

Revision as of 13:19, 14 October 2008

Microarray Data Analysis

Today's Workflow

First, the dCIN5 and wt data were compared using the same procedure outlined on 10/02/2008:

  • Files in the Desktop folder "Data analysis 2008-10-02"
    • Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 7.63%
    • Percent of arrays missing data: 95.35%
    • Overall percent of missing data: 3.15%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Click "GO" to impute missing data.
  • Select "Identify Differentially Expressed Genes"
    • Note: this step compares the wt and dCIN5 strains. Analyzing each strain on its own requires different parameters and gene/covariate files (see the runs below).
    • Class Variable is: Strain
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • The 1000 permutations appear to take about 10 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • No genes were called significant under these settings.
    • Set the Q-value cutoff to 1 and recalculated, so that every gene is listed.
      • Saved total list of genes as "GeneList_20081014_wt-vs-dCIN5"
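    • To save the plots, run the following command in the R console window:
        savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = "png", device = dev.cur())
    • This saves the active plot window under the file name you choose, in the current R working directory (here the folder "edge_1.1.290"). savePlot() works with the Windows graphics device (on other platforms, only with cairo-based X11 windows).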
    • Saved Q-Plot as "QPlot_20081014_wt-vs-dCIN5"
    • Saved Histograms as "PvalHistogram_20081014_wt-vs-dCIN5"
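Each run starts with the same two steps: "Calculate Percent Missing Data" and KNN imputation with 15 neighbours. Below is a minimal R sketch of both (R being the environment behind the Edge session, as the savePlot() command shows). It is not Edge's code: `missing_summary()` and `knn_impute()` are hypothetical helpers, and the sketch assumes that "genes missing data" counts genes with at least one missing value, "arrays missing data" counts arrays with at least one missing value, and "overall" is the share of all expression values that are missing. That reading is consistent with the reported figures (for example 7.63% of genes but 95.35% of arrays), since a few scattered gaps touch almost every array.

```r
# Hypothetical helpers, not Edge's code. Input: genes x arrays matrix with NAs.

# Percent missing, ASSUMING Edge's three figures mean: genes (rows) with >= 1 NA,
# arrays (columns) with >= 1 NA, and NA cells out of all cells.
missing_summary <- function(x) {
  c(genes   = 100 * mean(rowSums(is.na(x)) > 0),
    arrays  = 100 * mean(colSums(is.na(x)) > 0),
    overall = 100 * mean(is.na(x)))
}

# KNN imputation sketch: a missing value x[i, j] is replaced by the mean of
# array j over the k genes nearest to gene i. Distance is the root mean squared
# difference over arrays observed in both genes. A gene with no observed values
# has no finite distances and is left as NA in this sketch.
knn_impute <- function(x, k = 15) {
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    d <- apply(x, 1, function(g) {
      ok <- !is.na(g) & !is.na(x[i, ])
      if (any(ok)) sqrt(mean((g[ok] - x[i, ok])^2)) else Inf
    })
    d[i] <- Inf  # a gene is never its own neighbour
    for (j in which(is.na(x[i, ]))) {
      cand <- which(is.finite(d) & !is.na(x[, j]))  # neighbours observed in array j
      if (length(cand) == 0) next
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      out[i, j] <- mean(x[nn, j])
    }
  }
  out
}

# Example: 10 genes x 4 arrays with one missing value.
set.seed(1)
m <- matrix(rnorm(40), nrow = 10)
m[2, 3] <- NA
missing_summary(m)  # genes 10, arrays 25, overall 2.5 (percent)
imputed <- knn_impute(m, k = 3)
```

The per-gene loop is slow on a full array but keeps the logic readable; for real data an established implementation such as `impute.knn()` from the Bioconductor impute package is the usual choice.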

Then the dCIN5 dataset was run on its own:

  • Files in "Edge_data_20080710"
    • Used gene file "dCIN5-only_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "dCIN5-only_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 1.32%
    • Percent of arrays missing data: 90%
    • Overall percent of missing data: 0.09%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Click "GO" to impute missing data.
  • Select "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • The 1000 permutations appear to take about 2 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 157 Genes Called Significant (Cutoff Q Value 0.1)
    • Saved total list of genes as "GeneList_20081014_dCIN5-only"
    • Saved Q-Plot as "QPlot_20081014_dCIN5-only"
    • Saved Histograms as "PvalHistogram_20081014_dCIN5-only"
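The within-class time-course settings used for the dCIN5-only run (Class Variable None, Timepoint as the time covariate, Flask as the individual, natural cubic spline of dimension 4, 1000 null iterations, seed 47) can be illustrated with a simplified R sketch. For each gene it compares a flask-only model with a flask-plus-spline model and scores the relative drop in residual sum of squares; null statistics come from permuting the time labels and are pooled across genes. The permutation scheme and the names `spline_stat()` and `time_course_pvalues()` are assumptions made for illustration; Edge's own statistic and null-generation procedure may differ.

```r
library(splines)  # ns(): natural cubic spline basis, shipped with R

# Hypothetical sketch, not Edge's code. y: one gene's values across arrays;
# time: time point of each array; flask: factor identifying the individual.
# Null model: flask effect only. Alternative: flask effect + natural cubic
# spline in time with df = 4. Statistic: (RSS0 - RSS1) / RSS1.
spline_stat <- function(y, time, flask, df = 4) {
  basis <- ns(time, df = df)
  rss0 <- sum(resid(lm(y ~ flask))^2)
  rss1 <- sum(resid(lm(y ~ flask + basis))^2)
  (rss0 - rss1) / rss1
}

# ASSUMED null scheme: B random permutations of the time labels, with null
# statistics pooled across genes. The seed makes the permutations, and
# therefore the p-values, reproducible.
time_course_pvalues <- function(x, time, flask, B = 1000, seed = 47, df = 4) {
  set.seed(seed)
  obs <- apply(x, 1, spline_stat, time = time, flask = flask, df = df)
  null <- replicate(B, {
    tp <- sample(time)
    apply(x, 1, spline_stat, time = tp, flask = flask, df = df)
  })
  # Share of pooled null statistics at least as large as the observed one
  # (+1 in numerator and denominator so no p-value is exactly zero).
  vapply(obs, function(s) (sum(null >= s) + 1) / (length(null) + 1), numeric(1))
}

# Example: 3 flasks x 6 time points; gene 1 follows a time trend, genes 2-6 are noise.
set.seed(1)
time <- rep(c(0, 15, 30, 60, 90, 120), times = 3)
flask <- factor(rep(1:3, each = 6))
expr_mat <- rbind(3 * sin(time / 40) + rnorm(18, sd = 0.2),
                  matrix(rnorm(5 * 18), nrow = 5))
p <- time_course_pvalues(expr_mat, time, flask, B = 50)
```

Refitting `lm()` for every gene in every iteration is far too slow for a genome-wide run with 1000 iterations; the sketch is meant to show the model comparison behind the settings, not to replace Edge.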

Then the wildtype dataset was run on its own:

  • Files in "Edge_data_20080710"
    • Used gene file "wt-only_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-only_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 6.79%
    • Percent of arrays missing data: 91.3%
    • Overall percent of missing data: 2.5%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Click "GO" to impute missing data.
  • Select "Identify Differentially Expressed Genes"
    • Class Variable is: None (within class differential expression)
    • Differential Expression Type is: Time Course
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • Choose Time Course Settings
    • Covariate giving time points is: Timepoint
    • Covariate corresponding to individuals is: Flask
    • Choose spline type, accepted default of Natural Cubic Spline, dimension 4
    • Click "Apply" and then click "Go"
    • The 1000 permutations appear to take about 2 minutes.
  • Results: (Saved in 2008-10-14 Results)
    • 1000 Genes Called Significant (Cutoff Q Value 0.0326)
    • Saved total list of genes as "GeneList_20081014_wt-only"
    • Saved Q-Plot as "QPlot_20081014_wt-only"
    • Saved Histograms as "PvalHistogram_20081014_wt-only"
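The gene counts in these results depend on the Q-value cutoff (0.1 for dCIN5 alone, 0.0326 for wt alone, and 1 for the full wt-vs-dCIN5 gene list). A minimal R sketch of Storey-style q-values shows how such a count is obtained and why a cutoff of 1 returns every gene: q-values are capped at 1. The single-λ estimate of π0 (λ = 0.5) and the helper names `qvalues()` and `n_significant()` are assumptions; Edge estimates π0 with its own procedure.

```r
# Hypothetical sketch, not Edge's code: Storey-style q-values from p-values.
# pi0, the estimated proportion of truly null genes, uses a single
# lambda = 0.5 (an ASSUMPTION; Edge may estimate pi0 differently).
qvalues <- function(p, lambda = 0.5) {
  m <- length(p)
  pi0 <- min(1, mean(p > lambda) / (1 - lambda))
  o <- order(p, decreasing = TRUE)       # largest p-value first
  q <- pi0 * m * p[o] / rev(seq_len(m))  # pi0 * m * p_(i) / i
  q <- pmin(1, cummin(q))                # monotone in p and capped at 1
  q[order(o)]                            # back to the original gene order
}

# Number of genes called significant at a q-value cutoff.
n_significant <- function(q, cutoff) sum(q <= cutoff)

# Example with 8 made-up p-values.
p <- c(0.001, 0.01, 0.02, 0.2, 0.6, 0.9, 0.95, 0.7)
q <- qvalues(p)
n_significant(q, 0.05)  # 2
n_significant(q, 1)     # 8: every q-value is <= 1
```

Because π0 = 1 for these example p-values, the q-values coincide with Benjamini-Hochberg adjusted p-values, which makes `p.adjust(p, method = "BH")` a convenient cross-check.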