Evan Montz Week 8

From OpenWetWare
Jump to navigationJump to search

Excel File Spreadsheet

Evan Montz Excel Spreadsheet

Text File Spreadsheet

Evan Montz Text Spreadsheet

Electronic Notebook

  • I first downloaded the file, opened it, and re-saved it to make it unique.
  • I inserted the "scaled_centered" worksheet into the file.
  • Then I copied all the raw data into the new worksheet and added the rows for Average and Standard Deviation.
  • Using the "=Average()" and "STDEV()" functions, I was able to calculate the values for the log ratios for each chip.
    • It was noticed that there were some cells within the raw data that had no data in them. This yielded a minor error saying that some cells had no data. This error was ignored.
  • Next I added in the "scaled_centered" columns for each run of each patient.
  • Now using the "=(B4-$B$2)/$B$3" equation along with the drag down feature, this value was calculated for each cell for every run.
    • NOTE: It is vital to use the "$" symbol while writing this equation because without it, when we used the drag down feature, excel would continue to reference different cells and that was not what we desired.
  • Next, I created the new worksheet titled "statistics"
  • In this worksheet, I copied the first column along with all the columns that were scaled and centered.
  • Then I created the three average columns for each patient and using the =AVERAGE command along with the drag down feature again, I computed the values for each cell for all three patients.
  • Next, I created the column that was to display the average of all three averages. The averages of these values were taken using the same method as the previous step, except I took the average of the three averages rather than the scaled and centered data.
  • Next, I created the Tstat column and computed the results using the function: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) where the number of replicates was 3 for our case. The drag down feature was again used to obtain the values for all cells.
  • The last column that was created was the Pvalue column. The values were calculated using the function: =TDIST(ABS(R2),degrees of freedom,2) where the degrees of freedom was 2 for our case. The values were again dragged down to get the rest of the numbers.
  • Finally, I created the "forGenMAPP" worksheet.
  • Here, the data was manipulated to get the correct number of decimal places for the different values. The statistical columns were also cut and shifted to the beginning of the spreadsheet. I also added the "system code" column after the ID column and added an "N" to each cell using the drag down method.
  • Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
  • Both were then uploaded to this page at the top.

Sanity Check: Number of genes significantly changed

  • I clicked on the A1 cell and applied the Autofilter.
  • The autofilter was applied to the pvalue column according to the following criterion:
p<0.05 = 948 genes fit into this criteria
p<0.01 = 235 genes fit into this criteria
p<0.001 = 24 genes fit into this criteria
p<0.0001 = 2 genes fit into this criteria
  • The pvalue criteria was returned to p<0.05
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.
p<0.05 and "Avg_LogFC_all">0 = 352 genes fit into this criteria
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.
p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit into this criteria
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (This is more realistic).
p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit into this criteria
p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit into this criteria
  • From what I can tell, our experiment was very similar to that of Merrell et al. (2002). The main difference I see is that they used Statistical Analysis for Microarrays (SAM) program and we used Excel. Assuming that both programs do the same thing, both experiments were very similar.

Sanity Check: Compare individual genes with known data

  • VC0028 (in the table there were two genes with this title)
    • fold change = 1.65 and 1.27 respectively
    • p value = 0.0474 and 0.0629 respectively
    • significantly changed? NO p<0.05 and YES p>0.05 respectively
  • VC0941 (in the table there were two genes with this title)
    • fold change = 0.09 and -.28 respectively
    • p value = 0.6759 and 0.1636 respectively
    • significantly changed? YES p>0.05 and YES p>0.05 respectively
  • VC0869 (in the table there were five genes with this title)
    • fold change = 1.59, 1.95, 2.20, 1.50 and 2.12 respectively
    • p value = 0.0463, 0.0227, 0.0020, 0.0174, and 0.0200 respectively
    • significantly changed? NO p<0.05, NO p<0.05, NO p<0.05, NO p<0.05 and NO p<0.05 respectively
  • VC0051 (in the table there were two genes with this title)
    • fold change = 1.92 and 1.89 respectively
    • p value = 0.0139 and 0.0160 respectively
    • significantly changed? NO p<0.05 and NO p<0.05 respectively
  • VC0647 (in the table there were three genes with this title)
    • fold change = -1.11, -0.94 and -1.05 respectively
    • p value = 0.0003, 0.0125, 0.0051 respectively
    • significantly changed? NO p<0.05, NO p<0.05 and NO p<0.05 respectively
  • VC0468
    • fold change = -0.17
    • p value = 0.3350
    • significantly changed? YES p>0.05
  • VC2350
    • fold change = -2.40
    • p value = 0.0130
    • significantly changed? NO p<0.05
  • VCA0583
    • fold change = 1.06
    • p value = 0.1011
    • significantly changed? YES p>0.05

Evan Montz 22:55, 24 October 2010 (EDT)