# Katrina Sherbina: Week 4

## 06/08/2011

Using the multtest package for R (a Bioconductor package), we began calculating p values to assess whether the differential expression observed for each gene could be due to chance. Only the between-array normalized data from the wildtype strain at time point t15 were used. These data were extracted from the master file, written to a new .csv file, and imported into R. In multtest, the multiple testing procedure was the step-down minP and the test was the one-sample t test. When the code was first run, few numerical p values were generated; most of the output was a series of NaNs. After standardize=FALSE was added to the call, multtest produced numerical p values.

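    library(multtest)

    # Read the normalized wildtype t15 data; columns 2-4 hold the replicates
    data1<-read.csv("wt_t15_normalized.csv",header=TRUE,sep=",")
    data2<-as.matrix(data1[,2:4])
    # Y placeholder, one entry per replicate column
    data3<-rep(0,length(data1[,2:4]))
    seed<-99
    # One-sample t test with step-down minP adjustment (B=100 resamples)
    exp<-MTP(X=data2,Y=data3,test="t.onesamp",B=100,method="sd.minP",seed=seed,standardize=FALSE)
    write.table(as.matrix(exp@rawp),"wt_t15_pvalues.csv",sep=",")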

The next step is to compare the raw p-values to p-values generated by hand with the same data in Excel using Dr. Dahlquist's method.

Katrina Sherbina 20:07, 8 June 2011 (EDT)

## 06/09/2011

The t statistic and p value for the wildtype data across all replicates at t15 were calculated in Excel using the formulas AVERAGE(range of cells)/(STDEV(range of cells)/SQRT(number of replicates)) and TDIST(ABS(cell containing t statistic),degrees of freedom,2), respectively, where the degrees of freedom are the number of replicates minus 1. The Bonferroni correction was also calculated for each p value.
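As a cross-check on the Excel formulas, the same calculation can be sketched in base R. This is only an illustration: the matrix `log_ratios` and its values are invented, and the Bonferroni step assumes the usual form (raw p value times the number of tests, capped at 1).

```r
# Hypothetical log ratios: rows are genes, columns are three replicates.
# These numbers are invented for illustration, not taken from the experiment.
log_ratios <- matrix(
  c( 1.20,  0.90,  1.50,
    -0.30,  0.10, -0.20,
     2.10,  1.80,  2.40,
     0.05, -0.40,  0.30),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("rep", 1:3))
)

n <- ncol(log_ratios)  # number of replicates

# Counterpart of AVERAGE(range)/(STDEV(range)/SQRT(n)) in Excel.
t_stat <- apply(log_ratios, 1, function(x) mean(x) / (sd(x) / sqrt(n)))

# Counterpart of TDIST(ABS(t), n - 1, 2): two-tailed p value.
p_raw <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Bonferroni correction: multiply by the number of tests and cap at 1.
p_bonf <- pmin(1, p_raw * length(p_raw))

print(data.frame(t_stat, p_raw, p_bonf))
```

For a single gene, `t.test(log_ratios[1, ])$p.value` should agree with `p_raw[1]`, which gives a quick way to confirm that the spreadsheet and R are computing the same quantity.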

Using the multtest package in R, the raw and adjusted p values were collected for the same data using several multiple testing procedures: single-step maxT (ss.maxT), single-step minP (ss.minP), step-down maxT (sd.maxT), and step-down minP (sd.minP). Raw and adjusted p values were also calculated by controlling the false discovery rate (FDR).

    library(multtest)

    data1<-read.csv("wt_t15_normalized.csv",header=TRUE,sep=",")
    data2<-as.matrix(data1[,2:4])
    data3<-rep(0,length(data1[,2:4]))
    seed<-99
    # FDR control (conservative method)
    exp<-MTP(X=data2,Y=data3,test="t.onesamp",standardize=FALSE,typeone="fdr",fdr.method="conservative",B=3000,seed=seed)
    # Step-down and single-step minP/maxT procedures
    exp1<-MTP(X=data2,Y=data3,test="t.onesamp",standardize=FALSE,method="sd.minP",B=3000,seed=seed)
    exp2<-MTP(X=data2,Y=data3,test="t.onesamp",standardize=FALSE,method="sd.maxT",B=3000,seed=seed)
    exp3<-MTP(X=data2,Y=data3,test="t.onesamp",standardize=FALSE,method="ss.minP",B=3000,seed=seed)
    exp4<-MTP(X=data2,Y=data3,test="t.onesamp",standardize=FALSE,method="ss.maxT",B=3000,seed=seed)

In addition, four graphical summaries were created for the ss.minP results: number of rejected hypotheses vs. Type I error rate, sorted adjusted p values vs. number of rejected hypotheses, adjusted p values vs. test statistics, and adjusted p values vs. index.

The next step will be to compare the raw and adjusted p values calculated by the R multiple testing procedures with the p values and Bonferroni corrections calculated in Excel, respectively.

Katrina Sherbina 19:43, 9 June 2011 (EDT)

## 06/10/2011

For the wildtype t15 data, a scatter plot was created comparing the t-statistic derived raw p values with the raw p values obtained from multtest using FDR. The t-statistic derived raw p values were more conservative than the FDR raw p values. Also, for the dCIN5 data across all time points, F-statistic derived raw p values were calculated. These raw p values were compared to the raw p values from multtest using FDR, ss.minP, and sd.minP in three separate scatter plots. Each of the three scatter plots showed no correlation between the F-statistic derived raw p values and the p values calculated by any of the multtest procedures. A Benjamini-Hochberg correction was also applied to the F-statistic derived p values. These were then compared to the adjusted p values calculated using the aforementioned multtest procedures. Again, it was not possible to discern any relationship between the values.
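The Benjamini-Hochberg correction mentioned above can be sketched in base R as follows. The raw p values here are invented for illustration, and the manual version is included only to make the steps visible; it is not necessarily how the correction was carried out in Excel.

```r
# Hypothetical raw p values (invented for illustration).
p_raw <- c(0.041, 0.001, 0.600, 0.008, 0.270, 0.039)
m <- length(p_raw)

# Manual Benjamini-Hochberg: order from largest to smallest, scale each
# p value by m / rank, then take a running minimum so that the adjusted
# values never increase as the raw p value decreases.
ord <- order(p_raw, decreasing = TRUE)
adjusted_sorted <- pmin(1, cummin(p_raw[ord] * m / (m:1)))

# Put the adjusted values back in the original gene order.
p_bh_manual <- numeric(m)
p_bh_manual[ord] <- adjusted_sorted

# The same correction using base R's p.adjust().
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_bh_manual, p_bh))
```

When two sets of p values are compared in a scatter plot, a rank correlation such as `cor(x, y, method = "spearman")` is one way to put a number on how weak or strong the relationship looks.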

In addition, Dr. Fitzpatrick suggested comparing t-statistic derived p values for the dCIN5 data to the F-statistic derived p values for the same data. However, if the t-statistic calculations performed on the wildtype t15 data are applied to the dCIN5 data, this comparison cannot be made directly: the F-statistic derived p values, at least as calculated today, take all time points into account, whereas the t-statistic derived p values are calculated for each individual time point.

Katrina Sherbina 19:48, 10 June 2011 (EDT)