Dahlquist:Clustering Microarray Data with STEM

From OpenWetWare


This page contains the protocol used during SURP 2019.

Clustering and GO Term Enrichment with STEM

  1. Go to the Dahlquist Lab repository on GitHub to obtain the microarray data.
  2. Prepare your microarray data file for loading into STEM.
  3. Now download and extract the STEM software.
    • Click here to download an archived version of the stem.zip file to your Desktop.
    • Unzip the file: right-click on the file icon and select the menu item 7-Zip > Extract Here.
    • This will create a folder called stem.
    • Inside the folder, double-click on the stem.jar to launch the STEM program.
  4. Running STEM
    1. In section 1 (Expression Data Info) of the main STEM interface window, click on the Browse... button to navigate to and select your file.
      • Click on the radio button No normalization/add 0.
      • Check the box next to Spot IDs included in the data file.
    2. In section 2 (Gene Info) of the main STEM interface window, leave the three drop-down menus (Gene Annotation Source, Cross Reference Source, and Gene Location Source) at their default setting, "User provided".
    3. Click the "Browse..." button to the right of the "Gene Annotation File" item. Browse to your "stem" folder and select the file "gene_association.sgd.gz" and click Open.
    4. In section 3 (Options) of the main STEM interface window, make sure that the Clustering Method says "STEM Clustering Method" and do not change the defaults for Maximum Number of Model Profiles or Maximum Unit Change in Model Profiles between Time Points.
    5. In section 4 (Execute) click on the yellow Execute button to run STEM.
  5. Viewing and Saving STEM Results
    1. A new window will open called "All STEM Profiles (1)". Each box corresponds to a model expression profile. Colored profiles have a statistically significant number of genes assigned; they are arranged in order from most to least significant p value. Profiles with the same color belong to the same cluster of profiles. The number in each box is simply an ID number for the profile.
      • Click on the button that says "Interface Options...". At the bottom of the Interface Options window that appears, below where it says "X-axis scale should be:", click on the radio button that says "Based on real time". Then close the Interface Options window.
      • Take a screenshot of this window (on a PC, simultaneously press the Alt and PrintScreen buttons to save the view in the active window to the clipboard) and paste it into a PowerPoint presentation to save your figures.
    2. Click on each of the SIGNIFICANT profiles (the colored ones) to open a window showing a more detailed plot containing all of the genes in that profile.
  6. Analyzing and Interpreting STEM Results
    1. Each person in the class will select one profile for further analysis. Answer the following:
      • Why did you select this profile? In other words, why was it interesting to you?
      • How many genes belong to this profile?
      • How many genes were expected to belong to this profile?
      • What is the p value for the enrichment of genes in this profile for each of the strains? Bear in mind that we just finished computing p values to determine whether each individual gene had a significant change in gene expression at each time point. The p value reported by STEM determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
      • Open the GO list file you saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile.
        • Select the third row and then choose from the menu Data > Filter > Autofilter.
        • Filter on the "p-value" column to show only GO terms that have a p value < 0.05. How many GO terms are associated with this profile at p < 0.05?
        • The GO list also has a column called "Corrected p-value". This correction is needed because the software has performed thousands of significance tests, so some uncorrected p values will fall below 0.05 by chance alone. Filter on the "Corrected p-value" column to show only GO terms that have a corrected p value < 0.05. How many GO terms are associated with this profile with a corrected p value < 0.05?
      • Select 10 Gene Ontology terms from your filtered list (either p < 0.05 or corrected p < 0.05) that you will present and analyze in your final report.
        • Create a table for your final report with just those 10 terms. Your table should include the following data from the GO list file:
          • Category ID
          • Category Name
          • #Genes Category
          • #Genes Assigned
          • #Genes Expected
          • #Genes Enriched
          • p-value
          • Corrected p-value
          • Fold
        • Look up the definitions for each of the terms at http://geneontology.org. For your final lab report, you will supply the definition and discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms?
        • To look up a definition quickly, go to http://geneontology.org.
        • Copy and paste the GO ID (e.g. GO:0044848) into the search field on the left of the page.
        • On the results page, click on the button that says "Link to detailed information about <term>" (in this case, "biological phase").
        • The definition will be on the next results page, e.g. here.
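
The filtering steps for the GO list can also be cross-checked outside of Excel. The following is a minimal Python sketch, not part of the original protocol. It assumes that the GO list file is tab-delimited and that the column names sit on the third row (the row selected for the Autofilter above); check both against your own file. The function names and the sample rows are made up for illustration, and the sample p values are not real results.

```python
import csv
import io


def parse_p(text):
    """Convert a p-value cell to a float.

    Assumption: a cell may be written with a leading '<' (e.g. '<0.001');
    the '<' is dropped so the number can still be compared to the cutoff.
    """
    return float(text.strip().lstrip("<"))


def read_go_rows(handle, header_row=3):
    """Read a tab-delimited GO list whose column names are on `header_row` (1-based)."""
    for _ in range(header_row - 1):
        next(handle)  # skip the lines above the header row
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_go_rows(rows, column, cutoff=0.05):
    """Keep only the rows whose value in `column` is below `cutoff`."""
    return [row for row in rows if parse_p(row[column]) < cutoff]


# Made-up sample in the assumed layout: two lines above the header, then data.
SAMPLE = (
    "made-up title line\n"
    "\n"
    "Category ID\tCategory Name\tp-value\tCorrected p-value\n"
    "GO:placeholder1\thypothetical term A\t0.001\t0.020\n"
    "GO:placeholder2\thypothetical term B\t0.030\t0.400\n"
    "GO:placeholder3\thypothetical term C\t0.200\t1.000\n"
)

rows = read_go_rows(io.StringIO(SAMPLE))
by_p = filter_go_rows(rows, "p-value")
by_corrected = filter_go_rows(rows, "Corrected p-value")
print(len(by_p), len(by_corrected))  # prints: 2 1
```

To run this on a real file, replace io.StringIO(SAMPLE) with an open file handle, e.g. open("your_go_list.txt", newline=""), where the file name is a placeholder for the GO list you saved. The two counts answer the two "How many GO terms..." questions above, and the filtered rows can be used to pick the 10 terms for your table.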