Kara M Dismuke Week 11 Journal

From OpenWetWare

Electronic Journal Notebook

Summary of what you need to turn in for the individual Week 11 assignment:

  • Upload your updated Excel spreadsheet to LionShare that has today's calculations in it. Use the same filename as before so that the download link that you already provided to Drs. Dahlquist and Fitzpatrick will still work.
  • Create, upload to OpenWetWare, and link to a PowerPoint presentation that contains the p-value table and the screenshots of your STEM results. Each slide in the presentation should have a meaningful title that describes the main message of the slide. These slides will form the basis of your final presentation in the class.
  • Zip together all of the tab-delimited text files that you created for and from STEM and upload them to LionShare:
    • the file that was saved from your original spreadsheet that you used to run STEM
    • each of the genelist and GOlist files for each of your significant profiles

Methods

Background

To analyze microarray data:

  • Using GenePix Pro Software...
    • In each spot, quantitate the fluorescence signal
    • Calculate ratio of red/green fluorescence
    • Log transform ratios
    • Normalize ratios on each microarray slide
  • Using a script in R...
    • Normalize ratios for set of slides
  • Using Microsoft Excel...
    • Perform statistical analysis on ratios
    • Compare genes with known data
  • Using STEM software...
    • Map to biological pathways
  • Using MATLAB...
    • Create mathematical model of transcriptional network
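
The first few steps of this workflow can be sketched in Python. This is only an illustration of the idea, not what GenePix Pro or the R script actually does: the function names and the spot intensities are made up, the log base is assumed to be 2, and centering each slide on its median log ratio is an assumed stand-in for the normalization step.

```python
import math
from statistics import median


def log2_ratio(red, green):
    """Log-transformed red/green fluorescence ratio for one spot (base 2 assumed)."""
    return math.log2(red / green)


def normalize_slide(log_ratios):
    """Center one slide's log ratios on their median (an assumed normalization)."""
    center = median(log_ratios)
    return [r - center for r in log_ratios]


# Hypothetical (red, green) intensities for three spots on one slide.
spots = [(400.0, 100.0), (150.0, 150.0), (50.0, 100.0)]
log_ratios = [log2_ratio(red, green) for red, green in spots]
normalized = normalize_slide(log_ratios)
print(log_ratios)
print(normalized)
```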


Project Partner: Kristen Horstmann

Dahlquist lab microarray data set: Wild type vs. dZAP1

  • Note: Kristen will be analyzing the Wild type strain data and I (Kara) will be analyzing the dZAP1 strain data.

Statistical Analysis: ANOVA

  1. In the Excel data file, we created a new sheet and named it "stats"
  2. Copied the first two columns (ID, Standard name) into the stats sheet, then labeled the remaining columns in the first row
    1. Labeled columns C-G using the format (STRAIN)_xbar_(TIME)
      • In the dZAP1 case: "dZAP1_xbar_t15", "dZAP1_xbar_t30", "dZAP1_xbar_t60", "dZAP1_xbar_t90", "dZAP1_xbar_t120"
    2. Labeled columns H and I using the formats (STRAIN)_xbar_grand and (STRAIN)_ss_HO
      • In the dZAP1 case: "dZAP1_xbar_grand", "dZAP1_ss_HO"
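    3. Labeled columns J-N using the format (STRAIN)_ss_(TIME), and the following three columns using the formats (STRAIN)_SS_full, (STRAIN)_Fstat, and (STRAIN)_p-value
      • In the dZAP1 case: "dZAP1_ss_t15", "dZAP1_ss_t30", "dZAP1_ss_t60", "dZAP1_ss_t90", "dZAP1_ss_t120", "dZAP1_SS_full", "dZAP1_Fstat", "dZAP1_p-value"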
  3. Performed computations
    1. In C2, typed =AVERAGE(
      • Clicked the tab containing the data, highlighted all of the cells in row 2 associated with dZAP1 and t15, closed the parenthesis with ), and pressed "Enter"
        • Then, we did this for t30, t60, t90, and t120 (to compute the average for each time point)
    2. Clicked cell C2, positioned the cursor at the bottom right corner and, once we saw a plus sign, double-clicked so that the formula was copied down the column for all the other genes
      • Performed a similar operation for D2 (with t30 data), E2 (with t60 data), F2 (with t90 data), and G2 (with t120 data)
      • Performed a similar operation for the grand average, using all of the dZAP1 data in row 2 instead of a specific time point


    • After downloading the data, we used Excel to compute the average of the replicates at each time point for each gene.
    • Then we computed the "grand" average of all the data for a particular gene. For dZAP1, this can be computed either by averaging all of the data (across every time point) or by averaging the previously computed time-point averages; because there were 4 replicates at each time point, both approaches give the same value.
      • column heading: "dZAP1_xbar_grand"
    • We used Excel to compute the sum of squares of all the data for each individual gene.
      • column heading: "dZAP1_ss_HO"
    • For each time point, we calculated the sum of squares by summing the squares of the original data for that time point and then subtracting the number of replicates times the square of that time point's average.
      • column headings: "dZAP1_ss_t15", "dZAP1_ss_t30", "dZAP1_ss_t60", "dZAP1_ss_t90", "dZAP1_ss_t120"
    • Then, we took these calculated values for each time point and summed them.
      • column heading: "dZAP1_SS_full"
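
The spreadsheet columns described above can be mirrored for a single gene in a short Python sketch. The replicate values are made up, the function and key names are hypothetical, and two points are assumptions rather than anything stated in these notes: SS_full is taken as the total of the per-time-point sums of squares, and the F statistic uses the form ((N - T) / T) * (ss_HO - SS_full) / SS_full, where N is the number of data points and T the number of time points (testing whether the average log ratio is zero at every time point). The p-value would then come from the F distribution with T and N - T degrees of freedom, which is left out here.

```python
from statistics import mean


def anova_columns(replicates_by_time):
    """Per-gene ANOVA columns from a dict of time label -> list of log ratios."""
    all_values = [x for reps in replicates_by_time.values() for x in reps]

    # Average of the replicates at each time point, and the grand average.
    xbar = {t: mean(reps) for t, reps in replicates_by_time.items()}
    xbar_grand = mean(all_values)

    # Sum of squares of all the data for this gene.
    ss_h0 = sum(x * x for x in all_values)

    # Per time point: sum of squared data minus n * (time-point average)^2.
    ss_t = {
        t: sum(x * x for x in reps) - len(reps) * xbar[t] ** 2
        for t, reps in replicates_by_time.items()
    }

    # Assumption: SS_full is the total of the per-time-point sums of squares.
    ss_full = sum(ss_t.values())

    # Assumption: F statistic for "all time-point averages are zero".
    n, k = len(all_values), len(replicates_by_time)
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full

    return {"xbar": xbar, "xbar_grand": xbar_grand, "ss_HO": ss_h0,
            "ss_t": ss_t, "SS_full": ss_full, "Fstat": f_stat}


# Made-up log ratios: 5 time points with 4 replicates each.
gene = {
    "t15": [1.0, 2.0, 3.0, 2.0],
    "t30": [0.0, 1.0, -1.0, 0.0],
    "t60": [0.5, 0.5, 1.5, 1.5],
    "t90": [-1.0, -2.0, -1.0, -2.0],
    "t120": [0.0, 0.0, 1.0, 1.0],
}
result = anova_columns(gene)
print(result["xbar_grand"], result["SS_full"], result["Fstat"])
```

Because every time point has the same number of replicates, the grand average in this sketch equals the average of the five time-point averages, matching the note above.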


Results

unadjusted p-value < .05

  • wild-type: 38.42% (2378/6189)
  • dZAP1: 36.58% (2264/6189)

unadjusted p-value < .01

  • wild-type: 24.67% (1527/6189)
  • dZAP1: 23.35% (1445/6189)

unadjusted p-value < .001

  • wild-type: 13.90% (860/6189)
  • dZAP1: 12.80% (792/6189)

unadjusted p-value < .0001

  • wild-type: 7.43% (460/6189)
  • dZAP1: 6.69% (414/6189)

Benjamini & Hochberg-adjusted p-value < .05

  • wild-type: 26.76% (1656/6189)
  • dZAP1: 24.85% (1538/6189)

Bonferroni-adjusted p-value < .05

  • wild-type: 3.68% (228/6189)
  • dZAP1: 3.10% (192/6189)
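
For reference, the two corrections and the counting step can be sketched in Python. The function names and the example p-values are made up, and the Benjamini & Hochberg function below is the standard step-up form, which may differ in detail from the spreadsheet formulas used for the results above.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Standard step-up B-H adjustment: p * m / rank, made monotone from the top."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_below(pvalues, threshold):
    """Number of p-values strictly below the threshold."""
    return sum(p < threshold for p in pvalues)


def percent(count, total):
    """Percentage rounded to two decimals, as reported in the lists above."""
    return round(100 * count / total, 2)


# Made-up unadjusted p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.005]
print(count_below(pvals, 0.05))
print(count_below(bonferroni(pvals), 0.05))
print(count_below(benjamini_hochberg(pvals), 0.05))
print(percent(2264, 6189))
```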



Did this step first: Prepare your microarray data file for loading into STEM.

  • Used the B-H adjusted p-values to find the genes that are not significant (> .05)
  • Deleted columns H through T (kept the columns with xbar, i.e. our averages)
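
The same preparation could be sketched in Python. This assumes that the non-significant genes are dropped before the file is saved, and the B-H column name "dZAP1_B-H_p-value", the function names, and the two example rows are all hypothetical.

```python
import csv
import io


def prepare_for_stem(rows, p_column, threshold=0.05):
    """Keep significant genes and only the ID, name, and time-point average columns."""
    kept = []
    for row in rows:
        if float(row[p_column]) < threshold:
            kept.append({
                key: value for key, value in row.items()
                if key in ("ID", "Standard name") or "_xbar_t" in key
            })
    return kept


def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two made-up genes; only the first passes the B-H threshold.
stats_rows = [
    {"ID": "GENE_A", "Standard name": "geneA", "dZAP1_xbar_t15": "0.8",
     "dZAP1_xbar_t30": "1.1", "dZAP1_xbar_grand": "0.95",
     "dZAP1_B-H_p-value": "0.01"},
    {"ID": "GENE_B", "Standard name": "geneB", "dZAP1_xbar_t15": "0.1",
     "dZAP1_xbar_t30": "-0.1", "dZAP1_xbar_grand": "0.0",
     "dZAP1_B-H_p-value": "0.60"},
]
stem_rows = prepare_for_stem(stats_rows, "dZAP1_B-H_p-value")
stem_text = to_tab_delimited(stem_rows)
print(stem_text)
```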

Biological Interpretation of STEM Results

Conclusions

Answers to Questions