GRNmap Testing Report: Strain Run Comparisons 2015-05-27

From OpenWetWare


  • The purpose of this test is to analyze how the model behaves when running the same network with several different combinations of strain data.
  • Issue #10 on GitHub: [1]

Test Conditions

Undefined function 'max' for input arguments of type ''.
Error in GRNmodel (line 15)
nfig        = max(figHandles);
    • For the first ten categories, we will upload data obtained in September 2014. To see whether we can run the last two categories on MATLAB 2014b without any glaring differences, we will first test wt alone on both MATLAB 2014a (with Fitzpatrick's fall version of the code) and MATLAB 2014b (with the code from the class BIOL398-04/S15). If the differences in estimated parameters are negligible, we can move on to run the last two categories (wt+dCIN5+dZAP1 and wt+dCIN5+dZAP1+dGLN3).
      • Note: for these older versions of the code, the input file must be in the same folder as the code itself.

To get the LSE & the penalty term, type the following:

Code for LSE:

Code for Penalty:

Excluding the wt, the individual deletion strains were run through MATLAB with iestimate set to 0.00E00. With that value, no parameters are estimated.

  • Because of this observation, we first had to compare the wt runs from MATLAB versions 2014a and 2014b.
  • Next, we had to analyze the threshold values and the optimized weights to see whether the differences between the outputs were negligible.
    • If they were negligible, we would proceed to run estimations of the individual strains.
  • The comparisons of the individual strains were run with estimation, so those do not have to be re-run in MATLAB.
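
The negligibility check described above can be sketched as follows. This is a minimal Python illustration rather than part of GRNmap (which runs in MATLAB). The function names, the tolerance, and the example values are all hypothetical, since this page does not define a numerical cutoff for "negligible".

```python
def max_abs_difference(run_a, run_b):
    """Return the largest absolute difference between two runs' parameters.

    Each run is a dict mapping a parameter label (for example a weight or
    a threshold b for one gene) to its estimated value. Both runs must
    contain the same labels.
    """
    if run_a.keys() != run_b.keys():
        raise ValueError("runs must report the same parameters")
    return max(abs(run_a[name] - run_b[name]) for name in run_a)


def is_negligible(run_a, run_b, tolerance=1e-3):
    """True if no parameter differs by more than the (assumed) tolerance."""
    return max_abs_difference(run_a, run_b) <= tolerance


# Hypothetical values for illustration only; not results from this report.
wt_2014a = {"w_CIN5->PHD1": 0.5012, "b_PHD1": -0.2500}
wt_2014b = {"w_CIN5->PHD1": 0.5010, "b_PHD1": -0.2503}

print(is_negligible(wt_2014a, wt_2014b))
```

Comparing on the largest single difference, rather than an average, keeps one badly shifted weight or threshold from being hidden by many parameters that agree closely.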

We have decided to standardize everything on the code from BIOL398-04/S15 with the 2014b version of MATLAB. All data below will be run on this model (excluding the wt alone, 2014a). We are standardizing because, although the difference between the two versions was negligible, it could still confound our results when the differences in estimated parameters between runs are themselves small.

  • Note: when using this version, ensure that "fix_b" is set to 0 (i.e. estimate b) and create a simtime row on the optimization_parameters worksheet.
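
A quick pre-run check of the two settings in the note above might look like the following. This is a hedged Python sketch, not GRNmap code: the worksheet is represented as a hypothetical dict of parameter names to values, and the helper name is made up for illustration.

```python
def check_optimization_parameters(params):
    """Return a list of problems found in the optimization parameters.

    `params` is a hypothetical dict standing in for the rows of the
    optimization_parameters worksheet (parameter name -> value).
    """
    problems = []
    if params.get("fix_b") != 0:
        problems.append("fix_b must be 0 so that b is estimated")
    if "simtime" not in params:
        problems.append("simtime row is missing")
    return problems


# Hypothetical worksheet contents for illustration only.
sheet = {"fix_b": 1, "iestimate": 1}
for problem in check_optimization_parameters(sheet):
    print(problem)
```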

Results, Individual Strains

wt alone, 2014a

wt alone, 2014b

dCIN5 alone

dCIN5 visualized, normalized GRNSight network

dGLN3 alone

dGLN3 visualized, normalized GRNSight network

dHMO1 alone

dHMO1 visualized, normalized GRNSight network

dZAP1 alone

dZAP1 visualized, normalized GRNSight network

Results, Multiple Strains

All Strains

wt vs. dCIN5

Visualized Network with wt and dCIN5 data

wt vs. dGLN3

Visualized network with wt and dGLN3 data

wt vs. dHMO1

Visualized network with wt and dHMO1 data

wt vs. dZAP1

Weighted visualized network with wt and dZAP1 data

wt + dCIN5 + dZAP1

Visualized network with wt, dCIN5, and dZAP1 data

wt + dCIN5 + dZAP1 + dGLN3


  • Excel sheet comparing output weights for CIN5, FHL1, PHD1 and SKN7 regulators, estimated b values, and estimated production rates for all the above strain combinations: Media:GJ Estimated weight output comparison all combinations.xlsx
  • Examine the graphs that were output by each of the runs. Which genes in the model have the closest fit between the model data and actual data? Which genes have the worst fit between the model and actual data? Why do you think that is? (Hint: how many inputs do these genes have?) How does this help you to interpret the microarray data?
  • Which genes showed the largest dynamics over the timecourse? In other words, which genes had a log fold change that is different from zero at one or more timepoints? The p values from the Week 11 ANOVA analysis are informative here. Does this seem to have an effect on the goodness of fit (see question above)?
  • Which genes showed differences in dynamics between the wild type and the other strain your group is using? Does the model adequately capture these differences? Given the connections in your network (see the visualization in GRNsight), does this make sense? Why or why not?
  • Examine the bar charts comparing the weights and production rates between the two runs. Were there any major differences between the two runs? Why do you think that was? Given the connections in your network (see the visualization in GRNsight), does this make sense? Why or why not?
  • What other questions should be answered to help us further analyze the data?
    • Production rate vs. degradation rate. How do these combine?
    • ANOVA p-value for within strain
      1. Magnitude (large dynamics)?
      2. Variance (spread of the data points)?
      3. Some combination of the two?
    • Fit of the model vs. parameter value stability
  • PowerPoint analyzing genes with no inputs and genes that only regulate themselves: Media:GRNmap Testing Analysis.pptx
  • PowerPoint containing some genes that have poor T60 fits to the provided data: Media:Poor Fitting T60 Model to Data Points.pptx
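
One way to approach the goodness-of-fit questions above is to rank genes by the mean squared difference between the model output and the measured data at the sampled timepoints. The sketch below is a Python illustration under that assumption. It is not GRNmap's own LSE calculation, the function names are hypothetical, and the gene values shown are made up.

```python
def mean_squared_error(model, observed):
    """Mean squared difference between model and observed values."""
    if len(model) != len(observed):
        raise ValueError("model and observed must have the same length")
    return sum((m - o) ** 2 for m, o in zip(model, observed)) / len(model)


def rank_genes_by_fit(model_by_gene, observed_by_gene):
    """Return gene names sorted from best fit (lowest error) to worst."""
    errors = {
        gene: mean_squared_error(model_by_gene[gene], observed_by_gene[gene])
        for gene in model_by_gene
    }
    return sorted(errors, key=errors.get)


# Made-up log fold changes at three timepoints, for illustration only.
model = {"CIN5": [0.1, 0.4, 0.6], "ZAP1": [0.0, 0.2, 0.1]}
observed = {"CIN5": [0.1, 0.5, 0.6], "ZAP1": [0.5, -0.3, 0.9]}

print(rank_genes_by_fit(model, observed))
```

A ranking like this could then be set against the number of inputs each gene has in the network, which is the comparison the hint in the first question points toward.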