Etchevers:Notebook/Genomics of hNCC/2009/11/06

{| width="800"
 * style="background-color: #EEE"|[[Image:C14.jpg|128px]] Genomics of human neural crest cells
 * style="background-color: #F2F2F2" align="center"|  |Main project page
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|
 * colspan="2"|

Cilia genes and other things - Excel filtering
I wanted to generate different clustering and heatmap views of cilia-related genes in the neural tube. This brought me to download the old, but still useful, Cluster and Treeview programs from this article. I know there is much more up-to-date stuff and, notably, fuzzy clustering which sounds ultimately better, but for playing around with the SAGE data, this will do.

This brought me to the desire to look at the whole set of four banks and how they cluster, but I realized that with a range from 0 to a few hundred, the intensities of the colors attributed by TreeView were all rather weak. Maybe it will be the same thing after this operation; I'm regretting not multiplying all the columns by 100 instead of 1000. Anyhow, I have some 35K rows and 6 columns (gene names/descriptions and the four bank tag counts) in an Excel worksheet.

To multiply by 1000, just type = (then click on the top of one column)*1000 and return. Select the top of the column, and Ctrl-D will fill in all the way down.

Then I had to copy the values only, of the four new columns on top of the previous ones. TO do this, select, then Paste Special or use the arrow on the right of the Paste button, and select "Values". I hope this will prevent me searching again in future years, as I do this about once a year.

Finally, I think it might be interesting as an overview to just consider one example of highest value for tag expression, rather than all the tags, for the purposes of clustering (which I am not sure are really applicable to the whole bank, but that is another problem.) To do this, I found the following workaround after a long search:


 * "In a blank column put the formula = if(A1&B1=A2&B2,"",1) where A1, B1 & C1 refer to your three categories. Filter to remove blanks."

IF(logical_test, [value if true], [value if false]) or = IF(A1&B1=A2&B2,"",1)

This puts a 1 at each last row where it transitions into the next gene. So you should sort first by description, then by ROS_C12 (ascending) then by ROS_C13 (ascending), and only then do the logical test.


 * Heather 08:39, 6 November 2009 (EST):


 * }