Jennifer Okonta Week 8
Files
File:Merrell Compiled Raw Data Vibrio JO 20101910.xls
File:Merrell Compiled Raw Data Vibrio JO 20101910-1.txt
Electronic Journal
Jennifer Okonta Electronic Journal
- First I opened up excel after downloading the new raw data.
- Next I saved and named the document with my initials and date included
- I created a new worksheet and named it scaled_centered
- I then copied the raw data and pasted onto the scale_centered worksheet
- I then found the average for each column by first typing in =AVERAGE(B4:B5224)and dragging it down the column.
- I continued to do this at the top of the column to fill in the average row
- Next I found the standard deviation for each column by typing in STDEV(B4:B5224)
- I continued to do this for each column until the top deviation row was filled.
- Next I inserted the columns into the worksheet. They were placed in between each column of numbers starting after the “B” column. I named each column the name of the previous one plus scaled_centered. (example: A1 is the first the new column would be A1 scaled _centered)
- In the C4 cell I typed the equation =(B4-$B$2)/$B$3 so that the average of the B column would be subtracted from the B cell and then divided by the standard deviation.
- Next, I clicked and dragged the equation to the bottom of the cell and excel automatically changed the equation for every cell and listed the values.
- I did that for ever scaled_centered column. I edited the letters to fit the eqation as I went along
- New I created a new worksheet and named it statistics
- I then went back to my Scaled-centered worksheet and coped all of my newly created scaled_centered columns and pasted them individually on to the statistics worksheet. I made sure to go back and copy the ID row and column too
- I made sure to paste special so that only the values would come up as I pasted in the new information
- To the right at the very next empty column after the values I typed in the headers with the names of Avg_LogFC_A, Avg_LogFC_B, and Avg_LogFC_C.
- I computed the averages for the all the A values in each row but putting in this equation =AVERAGE(B2:E2)
- Then I dragged the plus sign down to the bottom of the column and excel automatically computed those values for me also.
- I continued to do this for the other log values to for B and C values. I made sure to change my column letters accordingly to fit the equation. I then clicked the plus sign and dragged it down the column to the end
- Next I created a new column and named it Avg_LogFC_all
- I then created this equation to compute all the numbers =AVERAGE(N2:P2)
- I then clicked the plus sign and dragged it to the bottom
- Next I created a new column and name it Tstats in the cell under it I put in this equation =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT 3))
- I then clicked and dragged it down to the end of the column and let excel do the calculations for the numbers.
- Next, I created a P value column and into the cell under it I typed =TDIST(ABS(R2),2,2)
- I then clicked and dragged the values until I reached the end of the column
- Now after computing all the information, I now created a new worksheet for GenMapp. I went to the statistics page and copied and pasted all of the values by using Paste special onto the GennMapp page
- I then changed all of the B-Q column values to 2 decimal places by going to format then to cells then to number and typing in 2. It changed to two decimal places
- Next I went to columns R through S and went to format and cell and numbers and typed in 4 to make it four decimal places.
- Next I cut and pasted column N-S and placed them right after the ID column
- Next to the right of the ID column I create da new column named System Code and filled it cells with the letter “N”
- Then I saved it at a txt file and as an excel file
- This completed my work
Sanity Check
1. I clicked on A1 then went up to the data and clicked autofilter
2. Next I clicked on the up and down arrow next to pvalue and found custom
3. It to the dropped down box I chose less than and filled in 0.05
4. Next I look at the bottom and recorded my results
5. I did the same for the <0.01
6. And I did the same for < .0001
945 have <0.05
245 have<0.01
0 have<.01
4 have <0.0001
7. Next I clicked show all to bring back all my values with the pvalue still at <0.05
8. Next I customed it to show all the values > 0 in the Log All row
9. Next I costumed it to show all the values <0 in the same row
10. And then changed it to be <0.25 and >-0.25
354 have >0
591 have <0
916 have >.25 or <-.25
The link was broken so I could not view the criteria that they needed to determine a significant gene expression change. So I am not able to answer this question. VC0028,VC0941, VC0869, VC0051, VC0647,VC0468, VC2350 and VCA0583 don’t have values on my excel worksheet. So I don’t know if I did it wrong or correctly but I did not see values for these numbers but I guess that it would affect their data because these numbers will have an effect on their averages and totals because it is eight new pieces of data that need to be included into the worksheet.
Jennifer Okonta 19:51, 24 October 2010 (EDT)