User:Matthew Whiteside/Notebook/Fumigatus Microarray/2009/05/04

{| width="800"
 * style="background-color: #EEE"|[[Image:owwnotebook_icon.png|128px]] Project name
 * style="background-color: #F2F2F2" align="center"|  |Main project page


 * colspan="2"|

Dataset

 * SreA study.

Tool

 * Matisse

Data

 * Data is in data folder matisse_network_clustering

Expression Data

 * Samples were clustered in Matisse using Spearman correlation (Pearson correlation and Euclidean distance showed no correlation between samples). Samples consist of time points (0, 10, 30, 60, 120 and 240 minutes) after addition of Fe following Fe starvation. This was done for both the SreA knockout (ko) and wild type (wt). Samples appear to cluster mainly by time since addition of Fe. Some ordering by the SreA ko / wt variable is also achieved, especially at the 120 and 240 minute time points.
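
As an illustration of why Spearman can show structure that Pearson misses, here is a minimal Python sketch of a rank-based correlation between two samples. It is not Matisse's implementation (no source is available); the `1 - rho` distance and the toy expression values are assumptions made for the example.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_distance(x, y):
    """Assumed distance definition for clustering: 1 - rho."""
    return 1.0 - spearman(x, y)

# Made-up expression values for two samples over the same five genes
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [1.0, 4.0, 9.0, 16.0, 100.0]
```

For these toy samples the gene ordering is identical, so the Spearman correlation is 1 even though the relationship is not linear and the Pearson correlation is lower.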

Networks

 * 1) KEGG. KEGG metabolic networks were constructed from the output of the KEGG network provider on the NeAT website.

To complement the A. nidulans metabolic network (which has a limited number of reactions and covers only 470 Afum genes), I built a KEGG metabolic graph. KEGG provides RPAIRs, which define the "before & after" compound pairs acted upon by the enzymes. These are classified as main, trans, leave etc. I could not find a good definition of the RPAIR types. The network was constructed with the script build_kegg_network.pl. Files are in .../kegg_network in the data dir.

Details on network construction:
 * Connected Afum genes to RPAIRs through the following KEGG DB links: Gene -> KO -> RXN -> RPAIR
 * Connected genes by edges when they link to the same RPAIR compounds (only 'main' RPAIRs were used; other RPAIR types were dropped because they did not seem to support a traditional pathway flow). The RPAIR edges were obtained from the KEGG network provider on the NeAT website, so I did not do the 'connecting the dots' myself.
 * Converted the network into an 'enzyme-only' graph (rather than the bipartite compound-enzyme graph) by having edges 'pass through' compounds to other enzymes. Did not include 'parallel' edges (i.e. edges between enzymes that belong to the same RPAIR and would therefore connect the same compounds).
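
The construction steps above can be sketched in Python as follows. This is not the build_kegg_network.pl script, which is not shown here. All IDs and link tables are hypothetical toy data, and the handling of 'parallel' edges (skipping two genes that reach a compound through the same RPAIR) is one reading of the description.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy link tables standing in for the KEGG DB links
gene_to_ko = {"geneA": ["K1"], "geneB": ["K2"], "geneC": ["K3"], "geneD": ["K1"]}
ko_to_rxn = {"K1": ["R1"], "K2": ["R2"], "K3": ["R3"]}
# reaction -> list of (rpair_id, rpair_type, compound1, compound2)
rxn_to_rpair = {
    "R1": [("RP1", "main", "C1", "C2")],
    "R2": [("RP2", "main", "C2", "C3"), ("RP9", "leave", "C2", "C9")],
    "R3": [("RP3", "trans", "C3", "C4")],
}

def main_rpairs_per_gene(gene_to_ko, ko_to_rxn, rxn_to_rpair):
    """Follow Gene -> KO -> RXN -> RPAIR and keep only 'main' RPAIRs."""
    result = defaultdict(set)
    for gene, kos in gene_to_ko.items():
        for ko in kos:
            for rxn in ko_to_rxn.get(ko, []):
                for rpair_id, rpair_type, c1, c2 in rxn_to_rpair.get(rxn, []):
                    if rpair_type == "main":
                        result[gene].add((rpair_id, c1, c2))
    return result

def enzyme_only_edges(gene_rpairs):
    """Project the compound-enzyme graph onto genes.

    Two genes are linked when they touch the same compound through
    different RPAIRs; genes sharing the same RPAIR are treated as
    'parallel' and are not linked (an assumed interpretation).
    """
    compound_to_hits = defaultdict(set)  # compound -> {(gene, rpair_id)}
    for gene, rpairs in gene_rpairs.items():
        for rpair_id, c1, c2 in rpairs:
            compound_to_hits[c1].add((gene, rpair_id))
            compound_to_hits[c2].add((gene, rpair_id))
    edges = set()
    for hits in compound_to_hits.values():
        # sorted() makes each edge a canonical (smaller, larger) gene pair
        for (g1, rp1), (g2, rp2) in combinations(sorted(hits), 2):
            if g1 != g2 and rp1 != rp2:
                edges.add((g1, g2))
    return edges

gene_rpairs = main_rpairs_per_gene(gene_to_ko, ko_to_rxn, rxn_to_rpair)
edges = enzyme_only_edges(gene_rpairs)
```

In this toy example geneA and geneD share RP1 and are not linked to each other, geneC drops out because its only RPAIR is of type 'trans', and geneB is linked to both geneA and geneD through compound C2.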

Clustering

 * Clustering with Matisse FAILED. This may be due to limitations with the data (see below) or problems with Matisse operation. No source code is available for Matisse, so it was difficult to debug (and difficult to work with in "Other Species" mode).


 * Limitations with data:
 * Very few of the graph nodes (~2500) were found in the sreA expression set (only 193 of 1146 DE genes, about 17%, were in the interaction graph). With so few "front" nodes and so many "back" nodes, Matisse may be unable to operate.
 * Possible fix: reduce the number of graph nodes by removing compounds (bridging the edges that pass through compounds).
 * The possible fix worked. Using Spearman distances, Matisse found modules with the non-bipartite graph (edges link nodes connected to the same compound; see the Networks section for details).
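
The overlap check behind the limitation above can be sketched as a set intersection. The gene IDs and the `coverage` helper are hypothetical; only the 193 of 1146 figure comes from the notes.

```python
def coverage(de_genes, graph_nodes):
    """Return (shared count, fraction of DE genes in graph, fraction of graph nodes with data)."""
    de = set(de_genes)
    nodes = set(graph_nodes)
    shared = de & nodes
    return len(shared), len(shared) / len(de), len(shared) / len(nodes)

# Hypothetical IDs: four DE genes, five graph nodes, two in common
de_genes = ["g1", "g2", "g3", "g4"]
graph_nodes = ["g3", "g4", "g5", "g6", "g7"]
shared, frac_de, frac_nodes = coverage(de_genes, graph_nodes)

# Fraction reported in the notes: 193 of 1146 DE genes in the graph
reported_fraction = 193 / 1146
```

Running the same check before and after removing compound nodes would show how much the share of nodes with expression data changes.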


 * }