# J'aime C. Moehlman's Week 12

### Vibrio cholerae Data Analysis

#### Normalize the log ratios for the set of slides in the experiment

• We inserted a new worksheet into our Excel file.
• We pasted all of the compiled raw data into the scaled_centered worksheet.
• We inserted two rows at the top of the worksheet (below the titles and above the data).
• In cell A2, we typed "Average" and in cell A3, we typed "StdDev".
• We then computed the average log ratio for each chip (each column of data). For column B we used the Excel formula "=AVERAGE(B4:B5224)".
• Following that example, we computed the average for the rest of the columns.
• We followed the same steps to compute the standard deviation of the log ratios, using the formula "=STDEV(B4:B5224)" for column B, and then found the standard deviations for all of the other columns.
• We inserted a new column to the right of each patient sample (i.e. A1-A4, B1-B4, C1-C4) and labelled each one with the sample name followed by _scaled_centered (A1_scaled_centered through C4_scaled_centered).
• In cell C4, we entered the formula "=(B4-$B$2)/$B$3" and filled it down to every cell in the column. We then did the same for each of the other empty columns, using that sample's own average and standard deviation.
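
The scaling and centering steps above can also be sketched outside of Excel. The snippet below is a minimal illustration rather than the procedure we actually ran: the function name `scale_and_center` and the sample values in `chip` are made up for the example. It uses Python's `statistics.stdev`, which computes the sample standard deviation, the same quantity Excel's STDEV returns.

```python
from statistics import mean, stdev


def scale_and_center(log_ratios):
    """Subtract the column average and divide by the column standard deviation.

    Mirrors the worksheet formula =(B4-$B$2)/$B$3 applied down one column.
    """
    avg = mean(log_ratios)   # Excel: =AVERAGE(B4:B5224)
    sd = stdev(log_ratios)   # Excel: =STDEV(B4:B5224), sample standard deviation
    return [(x - avg) / sd for x in log_ratios]


# Hypothetical log ratios for a single chip (not our real data)
chip = [0.5, -1.0, 0.25, 1.25, -0.5]
scaled = scale_and_center(chip)

# After scaling and centering, the column has average 0 and standard deviation 1
print(round(mean(scaled), 6), round(stdev(scaled), 6))
```

Running the same function on each chip's column would reproduce the twelve _scaled_centered columns, assuming there are no blank cells in the data.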

#### Perform Statistical Analysis on the Ratios

• We created a new worksheet called "statistics".
• We copied all of the gene IDs into column A of this new worksheet.
• We copied the values from the A1_scaled_centered worksheet.
• We created 3 new columns to show the log values.
• We created a new sheet designed specifically for GenMAPP.

GenMAPP worksheets

#### Sanity Check

• p values less than 0.05: 5
• p values less than 0.01: 0
• p values less than 0.001: 0
• p values less than 0.0001: 0
• Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
• 4
• Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
• 1
• There were 1617 log fold changes between -0.25 and 0.25.
• Merrell et al. used the p value as the criterion to determine significant gene expression change.
• VC0028 has a p value of 0.325668
• VC0941 has a p value of about 0.73
• VC0869 has a p value of about 0.46
• VC0051 has a p value of about 0.28
• VC0647 has a p value of about 0.45
• VC0468 has a p value of about 0.83
• VC2350 has a p value of about 0.18
• VCA0583 has a p value of about 0.29
• These values are very different from one another (from about 0.18 to about 0.83), but all of them are above 0.05, so none of these genes shows a significant change at the p < 0.05 cutoff.
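
The threshold counts in this sanity check were done with Excel filters. As a rough sketch of the same idea, the snippet below counts how many p values fall under each cutoff, using only the eight gene p values listed above (rounded as in the notes). The names `p_values` and `count_below` are hypothetical helpers for illustration, not part of our worksheet.

```python
# p values for the eight genes as reported in the list above
p_values = {
    "VC0028": 0.325668,
    "VC0941": 0.73,
    "VC0869": 0.46,
    "VC0051": 0.28,
    "VC0647": 0.45,
    "VC0468": 0.83,
    "VC2350": 0.18,
    "VCA0583": 0.29,
}


def count_below(pvals, cutoff):
    """Count how many p values are strictly less than the cutoff."""
    return sum(1 for p in pvals if p < cutoff)


# Same cutoffs as the Excel filters in the sanity check
for cutoff in (0.05, 0.01, 0.001, 0.0001):
    print(f"p < {cutoff}: {count_below(p_values.values(), cutoff)}")

# Genes from this list that pass the p < 0.05 filter
significant = [gene for gene, p in p_values.items() if p < 0.05]
print("significant genes:", significant)
```

Applied to the full Pvalue column instead of these eight genes, the same count would correspond to the numbers reported at the top of this section.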