Evan Montz Week 8

Excel File Spreadsheet

Text File Spreadsheet

Electronic Notebook

I first downloaded the file, opened it, and re-saved it to make it unique.
I inserted the "scaled_centered" worksheet into the file.
Then I copied all the raw data into the new worksheet and added the rows for Average and Standard Deviation.
Using the "=Average()" and "STDEV()" functions, I was able to calculate the values for the log ratios for each chip.
- It was noticed that there were some cells within the raw data that had no data in them. This yielded a minor error saying that some cells had no data. This error was ignored.
Next I added in the "scaled_centered" columns for each run of each patient.
Now using the "=(B4-$B$2)/$B$3" equation along with the drag down feature, this value was calculated for each cell for every run.
- NOTE: It is vital to use the "$" symbol while writing this equation because without it, when we used the drag down feature, excel would continue to reference different cells and that was not what we desired.
Next, I created the new worksheet titled "statistics"
In this worksheet, I copied the first column along with all the columns that were scaled and centered.
Then I created the three average columns for each patient and using the =AVERAGE command along with the drag down feature again, I computed the values for each cell for all three patients.
Next, I created the column that was to display the average of all three averages. The averages of these values were taken using the same method as the previous step, except I took the average of the three averages rather than the scaled and centered data.
Next, I created the Tstat column and computed the results using the function: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) where the number of replicates was 3 for our case. The drag down feature was again used to obtain the values for all cells.
The last column that was created was the Pvalue column. The values were calculated using the function: =TDIST(ABS(R2),degrees of freedom,2) where the degrees of freedom was 2 for our case. The values were again dragged down to get the rest of the numbers.
Finally, I created the "forGenMAPP" worksheet.
Here, the data was manipulated to get the correct number of decimal places for the different values. The statistical columns were also cut and shifted to the beginning of the spreadsheet. I also added the "system code" column after the ID column and added an "N" to each cell using the drag down method.
Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
Both were then uploaded to this page at the top.

Sanity Check: Number of genes significantly changed

I clicked on the A1 cell and applied the Autofilter.
The autofilter was applied to the pvalue column according to the following criterion:

p<0.05 = 948 genes fit into this criteria
p<0.01 = 235 genes fit into this criteria
p<0.001 = 24 genes fit into this criteria
p<0.0001 = 2 genes fit into this criteria

The pvalue criteria was returned to p<0.05
The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.

p<0.05 and "Avg_LogFC_all">0 = 352 genes fit into this criteria

The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.

p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit into this criteria

The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (This is more realistic).

p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit into this criteria
p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit into this criteria

From what I can tell, our experiment was very similar to that of Merrell et al. (2002). The main difference I see is that they used Statistical Analysis for Microarrays (SAM) program and we used Excel. Assuming that both programs do the same thing, both experiments were very similar.

Sanity Check: Compare individual genes with known data

VC0028 (in the table there were two genes with this title)
- fold change = 1.65 and 1.27 respectively
- p value = 0.0474 and 0.0629 respectively
- significantly changed? NO p<0.05 and YES p>0.05 respectively
VC0941 (in the table there were two genes with this title)
- fold change = 0.09 and -.28 respectively
- p value = 0.6759 and 0.1636 respectively
- significantly changed? YES p>0.05 and YES p>0.05 respectively
VC0869 (in the table there were five genes with this title)
- fold change = 1.59, 1.95, 2.20, 1.50 and 2.12 respectively
- p value = 0.0463, 0.0227, 0.0020, 0.0174, and 0.0200 respectively
- significantly changed? NO p<0.05, NO p<0.05, NO p<0.05, NO p<0.05 and NO p<0.05 respectively
VC0051 (in the table there were two genes with this title)
- fold change = 1.92 and 1.89 respectively
- p value = 0.0139 and 0.0160 respectively
- significantly changed? NO p<0.05 and NO p<0.05 respectively
VC0647 (in the table there were three genes with this title)
- fold change = -1.11, -0.94 and -1.05 respectively
- p value = 0.0003, 0.0125, 0.0051 respectively
- significantly changed? NO p<0.05, NO p<0.05 and NO p<0.05 respectively
VC0468
- fold change = -0.17
- p value = 0.3350
- significantly changed? YES p>0.05
VC2350
- fold change = -2.40
- p value = 0.0130
- significantly changed? NO p<0.05
VCA0583
- fold change = 1.06
- p value = 0.1011
- significantly changed? YES p>0.05

Evan Montz 22:55, 24 October 2010 (EDT)

Evan Montz Week 8

Contents

Excel File Spreadsheet

Text File Spreadsheet

Electronic Notebook

Sanity Check: Number of genes significantly changed

Sanity Check: Compare individual genes with known data

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

research

Tools