Dahlquist:Notebook/Microarray Data Analysis/2008/10/21

Revision as of 13:41, 21 October 2008



Microarray Data Analysis

Today's Workflow

The results generated on 10/14/2008 were downloaded and placed on the Desktop in "Edge Analysis" in Kevin's profile. Significant gene results were saved as tab-delimited files, and the p-value histograms and Q-plots were saved into a PowerPoint file and printed.

  • Only the wt-only results should be used; the other results are useless (see below for explanation).

The previous run (10/14/2008) on the dCIN5-only dataset gave interesting results. While the wt-only dataset produced about 1000 significant genes, the dCIN5-only dataset gave only about 150. To verify this result:

  • First, the covariates and genelist files were uploaded to lion share. They will be opened with Excel and checked for errors.
    • IMPORTANT: It was found that the flask numbers were wrong for covariates files for dCIN5-only and wt-vs-dCIN5. They were changed and new runs were completed.
    • The new files were saved on the desktop in the Edge Analysis folder as:
      • dCIN5-only_Edge_covariates_20081021.txt and
      • wt-vs-dCIN5_Edge_covariates_20081021.txt
  • Then for an additional test, the difference between dCIN5 and wt at an individual timepoint was tested:
    • Files in Desktop "Data analysis 2008-10-02"
    • Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
    • Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20080710.txt"
  • Load both into an Edge session.
  • Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
    • Percent of genes missing data: 7.63%
    • Percent of arrays missing data: 95.35%
    • Overall percent of missing data: 3.15%
  • For KNN Parameters, set:
    • Percent of missing values to tolerate in a gene: 100 (so all genes included)
    • Number of nearest neighbors to use (maximum of 15): 15
    • Clicked GO to impute the missing data.
  • Selected "Identify Differentially Expressed Genes"
    • Class Variable is: Strain
    • Differential Expression Type is: Static (standard, non-time course sampling)
    • Number of null iterations, set to 1000
    • Choose a seed for reproducible results, set to 47
    • click "Go"
    • 1000 permutations looked like it would take about 1 h 35 min. Because it was taking so long and might not have produced the results we wanted, I aborted the run and did the analysis described below instead.
    • This computation will identify genes with a significant difference in expression between wt and dCIN5 without respect to time. To determine the difference between individual timepoints, the genes-indexonly files will have to be changed to show only the timepoint of interest.
  • Results: (Saved in )
    • # significant genes under these settings.
    • Choose Q-Value cutoff as 1, recalculate
      • Saved total list of genes as: ""
    • To save the plots, run the following command in the R console window:
savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = c("png"), device = dev.cur())
  • This saves the active plot window under a file name you choose. The file is written to the folder "edge_1.1.290".
    • Saved Q-Plot as ""
    • Saved Histograms as ""
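
As a cross-check outside the Edge GUI, the missing-data percentages and the single-timepoint subsetting mentioned above could be sketched in R roughly as follows. This is only a sketch under assumptions, not how Edge itself works: the helper names `percent_missing` and `subset_timepoint`, the `Time` column name, the definition of each percentage (a gene or array counts as "missing data" if it has at least one NA), and the toy data are all made up for illustration and do not come from the actual gene or covariate files.

```r
# Hypothetical helper: summarize missing values in an expression matrix
# (rows = genes, columns = arrays). Assumes a gene or array counts as
# "missing data" when it contains at least one NA; Edge may define this
# differently.
percent_missing <- function(expr) {
  na <- is.na(expr)
  c(
    genes   = 100 * mean(rowSums(na) > 0),  # genes with at least one NA
    arrays  = 100 * mean(colSums(na) > 0),  # arrays with at least one NA
    overall = 100 * mean(na)                # NA cells out of all cells
  )
}

# Hypothetical helper: keep only the arrays from one timepoint.
# Assumes one covariate row per array, in the same order as the columns
# of the expression matrix, and a time column named by `time_col`
# (an assumed name, not taken from the real covariates file).
subset_timepoint <- function(expr, covariates, timepoint, time_col = "Time") {
  stopifnot(ncol(expr) == nrow(covariates))
  keep <- covariates[[time_col]] == timepoint
  list(
    expr       = expr[, keep, drop = FALSE],
    covariates = covariates[keep, , drop = FALSE]
  )
}

# Toy data, made up for illustration only: 4 genes x 4 arrays.
expr <- matrix(c( 1, NA,  3,  4,
                  5,  6,  7,  8,
                  9, 10, 11, 12,
                 13, 14, 15, 16), nrow = 4, byrow = TRUE)
covariates <- data.frame(Strain = c("wt", "wt", "dCIN5", "dCIN5"),
                         Time   = c(15, 30, 15, 30))

print(percent_missing(expr))

# Restrict to a single timepoint so a static wt-vs-dCIN5 comparison
# only sees arrays from that timepoint.
t15 <- subset_timepoint(expr, covariates, timepoint = 15)
print(dim(t15$expr))
```

The subsetted matrix and covariates would still need to be written back out in whatever layout Edge expects before loading them into a session.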