Evan Montz Week 8
From OpenWetWare
Jump to navigationJump to search
Excel File Spreadsheet
Text File Spreadsheet
Electronic Notebook
- I first downloaded the file, opened it, and re-saved it to make it unique.
- I inserted the "scaled_centered" worksheet into the file.
- Then I copied all the raw data into the new worksheet and added the rows for Average and Standard Deviation.
- Using the "=Average()" and "STDEV()" functions, I was able to calculate the values for the log ratios for each chip.
- It was noticed that there were some cells within the raw data that had no data in them. This yielded a minor error saying that some cells had no data. This error was ignored.
- Next I added in the "scaled_centered" columns for each run of each patient.
- Now using the "=(B4-$B$2)/$B$3" equation along with the drag down feature, this value was calculated for each cell for every run.
- NOTE: It is vital to use the "$" symbol while writing this equation because without it, when we used the drag down feature, excel would continue to reference different cells and that was not what we desired.
- Next, I created the new worksheet titled "statistics"
- In this worksheet, I copied the first column along with all the columns that were scaled and centered.
- Then I created the three average columns for each patient and using the =AVERAGE command along with the drag down feature again, I computed the values for each cell for all three patients.
- Next, I created the column that was to display the average of all three averages. The averages of these values were taken using the same method as the previous step, except I took the average of the three averages rather than the scaled and centered data.
- Next, I created the Tstat column and computed the results using the function: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) where the number of replicates was 3 for our case. The drag down feature was again used to obtain the values for all cells.
- The last column that was created was the Pvalue column. The values were calculated using the function: =TDIST(ABS(R2),degrees of freedom,2) where the degrees of freedom was 2 for our case. The values were again dragged down to get the rest of the numbers.
- Finally, I created the "forGenMAPP" worksheet.
- Here, the data was manipulated to get the correct number of decimal places for the different values. The statistical columns were also cut and shifted to the beginning of the spreadsheet. I also added the "system code" column after the ID column and added an "N" to each cell using the drag down method.
- Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
- Both were then uploaded to this page at the top.
Sanity Check: Number of genes significantly changed
- I clicked on the A1 cell and applied the Autofilter.
- The autofilter was applied to the pvalue column according to the following criterion:
p<0.05 = 948 genes fit into this criteria p<0.01 = 235 genes fit into this criteria p<0.001 = 24 genes fit into this criteria p<0.0001 = 2 genes fit into this criteria
- The pvalue criteria was returned to p<0.05
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.
p<0.05 and "Avg_LogFC_all">0 = 352 genes fit into this criteria
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.
p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit into this criteria
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (This is more realistic).
p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit into this criteria p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit into this criteria
- From what I can tell, our experiment was very similar to that of Merrell et al. (2002). The main difference I see is that they used Statistical Analysis for Microarrays (SAM) program and we used Excel. Assuming that both programs do the same thing, both experiments were very similar.
Sanity Check: Compare individual genes with known data
- VC0028 (in the table there were two genes with this title)
- fold change = 1.65 and 1.27 respectively
- p value = 0.0474 and 0.0629 respectively
- significantly changed? NO p<0.05 and YES p>0.05 respectively
- VC0941 (in the table there were two genes with this title)
- fold change = 0.09 and -.28 respectively
- p value = 0.6759 and 0.1636 respectively
- significantly changed? YES p>0.05 and YES p>0.05 respectively
- VC0869 (in the table there were five genes with this title)
- fold change = 1.59, 1.95, 2.20, 1.50 and 2.12 respectively
- p value = 0.0463, 0.0227, 0.0020, 0.0174, and 0.0200 respectively
- significantly changed? NO p<0.05, NO p<0.05, NO p<0.05, NO p<0.05 and NO p<0.05 respectively
- VC0051 (in the table there were two genes with this title)
- fold change = 1.92 and 1.89 respectively
- p value = 0.0139 and 0.0160 respectively
- significantly changed? NO p<0.05 and NO p<0.05 respectively
- VC0647 (in the table there were three genes with this title)
- fold change = -1.11, -0.94 and -1.05 respectively
- p value = 0.0003, 0.0125, 0.0051 respectively
- significantly changed? NO p<0.05, NO p<0.05 and NO p<0.05 respectively
- VC0468
- fold change = -0.17
- p value = 0.3350
- significantly changed? YES p>0.05
- VC2350
- fold change = -2.40
- p value = 0.0130
- significantly changed? NO p<0.05
- VCA0583
- fold change = 1.06
- p value = 0.1011
- significantly changed? YES p>0.05
Evan Montz 22:55, 24 October 2010 (EDT)