My Computational Journal Summer 2014: Difference between revisions

Revision as of 15:25, 3 July 2014

May 19, 2014

Began putting together MATLAB code to create a stochastic model for a dummy network consisting of 4 genes. In this network, one gene regulates itself and all genes are regulated but at least one other gene.

Created an Excel workbook that will be the input to the MATLAB code. Has similarities to the input workbook for the sigmoidal and Michaelis-Menten model simulations.
1. "data" : Contains dummy data for each of the four genes for four replicates for each time point (t15, t30, and t60). The data was generated with the following random number generator code where the min (-4) and max (4) values were determined based upon actual microarray data: =RAND()*(4-(-4))+(-4). Data must then be copied and pasted as values.
2. "network" : Adjacency matrix for gene network.
3. "network_weights" : Copy of adjacency matrix for gene network.
4. "run_control_parameters" : Parameters that control the simulation run.
5. "simulation_times" : Time points for which to find the model.
Started to put together the script with some comments about the sequence of commands.

Katrina Sherbina 18:56, 19 May 2014 (EDT)

May 20, 2014

Began looking over at the MATLAB scripts Nick had on his backup disk for the stochastic model from the Spring of 2013. As a result, I made some changes to the input workbook I began yesterday (Input_4_Gene_Dummy_Network_Stochastic_Model.xls) in order to match his input workbook so that I can use his code:

Changed the name of the "data" worksheet to "wt".
Copied the following variables from Nick's "optimization_parameters" sheet to my "run_control_parameters" sheet: fix_P, fix_b, Strain, Sheet, and Deletion.
Added a "degradation_rates" worksheet with randomly computed degradation rates for each gene using the following formula: (0.001)+(0.3-0.001)*rand(), where the min 0.001 and max 0.3 was determined from looking at the min and max degradation rates in Nick's worksheet.
Added a "production_rates" worksheet with listing the production rate for each gene as twice its degradation rate.

*NOTE: In Nick's corresponding worksheet, the production rate is not twice the degradation rate, but for most genes, with the exception of two, the production rate is a bit bigger than the degradation rate.

Changed the "network_weights" worksheet to estimates of the weight of each edge computed with the following formula: (-1)+(1.4-(-1))*rand(), where the min -1 and max 1 was determined from looking at the min and max weight in Nick's corresponding worksheet.

The new order of the workbook is as follows:

"wt"
"run_control_parameters"
"degradation_rates"
"production_rates"
"simulation_times"
"network"
"network_weights"

To test out the simplified way Fitzpatrick has suggested to compute the state probabilities for each gene, I began putting together some MATLAB code to run a forward simulation using some of Nick's code as a foundation from the Spring of 2013. The first two scripts are the driver that runs the simulation (Stochastic_Driver_20140520.m) and the script that imports all of the data from the Excel workbook (Stochastic_Parameters_20140520.m). In adapting the second script from Nick's corresponding script, I noticed that the imported production rates and degradation rates are divided by log(2) where the log as it is used in MATLAB is the natural log. I need to ask Dr. Fitzpatrick why this is because this this transformation is not used in the deterministic model code. I created another script called Stochastic_Forward_Simulation_20140520.m that takes the code from Nick's Stochastic_LSE.m that corresponds to just the forward simulation. However, I have a few questions about this script from Nick that may or may not result in changes having to be made to my forward simulation script:

While the variable "nmc" is set to 10 in the script that imports all the Excel data, nmc is reset in the Stochastic_LSE.m script. First of all, what is nmc (the number of iterations?) and why is the value of it reset?
In Nick's Stochastic_LSE.m, there are 3 different forward simulation runs between which there do not seem to be any differences other than nmc is set to 100 before the last of these 3 runs. Did Nick construct these runs for troubleshooting purposes?

After the Stochastic_Forward_Simulation_20140520, I will need to modify the simple_stochastic_sim.m and network_rates.m which seem to have a similar purpose to ode45 and the general network dynamics scripts for the deterministic model, respectively. I still don't quite understand how simple_stochastic_sim.m actually works so that is what I will be working on tomorrow. As for network_rates.m, I believe this script can be very easily altered removing many of the variables according to how Dr. Fitzpatrick suggested to modify the state probabilities computations. Namely, I think this will involve expressing the production of a gene (designated by the variable pro) using the equation Dr. Fitzpatrick gave for the probability that the state of a gene will be 1 (when the activators are on) in the next time step given some initial state. Then, the degradation of a gene (designated by the variable deg) can be expressed using the equation Dr. Fitzpatrick gave for the probability that the state of a gene will be -1 (when the repressors are on) in the next time step given some initial state.

Katrina Sherbina 18:25, 20 May 2014 (EDT)

May 21, 2014

Today, I was working on better understanding and modifying network_rates.m. In commenting the code, a couple of questions arose.

In the case that a target gene is not controlled by any of the transcription factors in the network, the number of positive steps (designated by the variable upj) is 0. However, why is it not the production rate of the target gene?
In finding the product of the weights and states of the transcription factors that control a particular target gene, the state of a transcription factor in the product is given by 2^state. I am not sure why this is.

In the case that the target gene is controlled by at least one transcription factor in the network, I changed the network_rates code to reflect the state probabilities that I talked about in the journal entry for May 20, 2014. However, at this point, I am a bit stuck and have some more questions:

The probabilities for the state of a target gene to be 1 or -1 have a weighting constant that is unique to each gene. I am wondering if this constant should also be dependent upon the number of activators and repressors that control the gene. In other words, there should be a specific weighting constant for the probability of a state of a target gene being 1 when the activators are on and another weighting constant specific to the probability of a state of a target gene to be -1 when the repressors are on.
Are both these probabilities used to compute the production of a target gene? How should this computation look?
Should the aforementioned computation be weighted by the production rate constant of the target gene?
How do you represent the degradation of a target gene?
What do you do with the probability of the state of a target gene being 0?

Katrina Sherbina 22:26, 21 May 2014 (EDT)

May 27, 2014

I have been working on altering the network rates function based upon an email Dr. Fitzpatrick sent me May 22nd. This email contained a Word attachment with state probabilities computed differently than Fitzpatrick and I originally discussed, which I have described in previous entries. In these the probabilities that the state of a gene is 1 and the state of the gene is -1, the denominator is only the summation of the weights of all the edges in the network. For the probability that the state of a gene is 0, the alpha parameter in the equation is multiplied by the sum of all of the weights of all transcription factors that control a target gene divided by the summation of the weights of all edges in the network. To find the summation of the weights of all edges, I actually added the following lines of code to the Stochastic_Forward_Simulation script (includes comments designated by %):

%Considering the edges in the network, targ is the index of a gene
%controlled by a transcription factor in the network and tfact is the
%index of the transcription factor that controls the target gene
[targ,tfact] = find(wtmat ~= 0); 
edgeindex = sortrows([targ,tfact],1); %Sort based on the index of the target gene
for ii = 1:nedges
   allweights(ii,1) = wtmat(edgeindex(ii,1),edgeindex(ii,2)); %Put all of the network weights into a vector
end
sumall = sum(allweights); %Find the sum of all the weights

After some discussion with Dr. Fitzpatrick today, I will actually try the forward simulation with these altered state probabilities as well as the state probabilities that we initially talked about. In modifying the network rates function for both of these sets of state probabilities, I changed the output structure to a number of genes x 3 dimensional array where the first column corresponds to the probability that the state of a gene will be 1, the second column corresponds to the probability that the state of a gene will be -1, and the third column corresponds to the probability that the state of a gene will be 0. For both network_rates functions, neither the production rates or degradation rates are used.

For both network_rates functions, there is also still some work to be done in computing the state probabilities for the no input genes. Dr. Fitzpatrick explained today that the state probabilities for these genes should be as follows.

p(1|y) = (1/2)*alpha_i
p(-1|y) = (1/2)*alpha_i
p(0|y) = 1-alpha_i

where y is the state of the transcription factor at time t=0. However, as alpha_i is the proportion of transcription factors in the network that regulate the target gene for all of the genes with inputs, I am not sure what alpha_i should be for the no input genes.

I don't believe that that the dummy network has any no input genes so I will need to redo the dummy network to gauge how these state probabilities will work.

In addition, there are some questions that will need to be answered further down the road:

Is the degradation rate of mRNA slower than that of protein? This is something to ask Dr. Dahlquist.
What should a state of 1, -1, or 0 mean once we incorporate the log fold change data in simulations that estimate the parameters?

Katrina Sherbina 19:49, 27 May 2014 (EDT)

May 29, 2014

I have been working on two network rates functions network_rates_20140527.m and network_rates_20140529.m, the former which contains the altered state probabilities for every gene controlled by at least one transcription factor in the network as Dr. Fitzpatrick described via email (described in the entry for May 27, 2014) with the parameter alpha set as 0.9 as suggested by Dr. Fitzpatrick. The latter script contains the state probabilities as initially discussed on the first day of summer work. Both functions however give the state probabilities for the no input genes as described in the entry for May 27, 2014. After some consultation with Dr. Fitzpatrick, I set the alpha parameter for the no input genes to be 0.05. According to Dr. Fitzpatrick, the alpha parameter should be lower for the no input genes than the other genes. If the alpha parameter is in fact dependent on the transcription factors in the network that control a gene, then naturally the genes controlled by at least one transcription factor in the network should have a higher alpha parameter than the no input genes. I think we will probably need to experiment with what the value should be for the alpha parameter depending on how the models fit the data.

Katrina Sherbina 22:09, 29 May 2014 (EDT)

May 30, 2014

In order to test out the if statement in the network rates function that determines the state probabilities for the no input genes, I had to alter the network contained in the input file (Input_4_Gene_Dummy_Network_Stochastic_Model.xls) to include a no input gene. I chose GeneA to be the no input gene by setting the regulatory weight of all the transcription factors in the network to 0 for GeneA in the sheet "network_weights". In so doing, I also needed to zero out the row for GeneA in the "network" sheet.

To be able to run the forward simulation, I had to make several changes to the simple_stochastic_sim function of which I created a new copy called simple_stochastic_sim_20140530.m:

To find the variable nevents, I changed the computation to 3*n_genes as the output of the network_rates function consists of three columns rather than two columns as it initially was.
I created a new varible rn to take into account the third column of the network_rates function output, which involved adding the following two lines of code:

rn    = zeros(n_genes,1); %Probability of no change

rn    = rr(:,3); %Probability of no change

In addition, several function calls in some scripts had to be changed to ensure that the network_rates_20140529 and simple_stochastic_sim_20140530 functions were being called on.

There is still more troubleshooting to go as I got the following error message after trying to run the simulation:

??? Attempted to access Nlist(14,5); index out of bounds because
size(Nlist)=[14,4].

Error in ==> simple_stochastic_sim_20140530 at 73
           Nlist(icount,ii-n_genes) = max(0,Nlist(icount,ii-n_genes)-1/4);

Error in ==> Stochastic_Forward_Simulation_20140520 at 20
   [simtime,model]   =
   simple_stochastic_sim_20140530(@network_rates_20140529,simtime,x0);

Error in ==> Stochastic_Driver_20140520 at 8
Stochastic_Forward_Simulation_20140520

Once I have finished debugging the simple_stochastic_sim_20140530 function, I will run the forward simulation with the other network_rates function.

Katrina Sherbina 23:24, 30 May 2014 (EDT)

June 2, 2014

I spent some time today debugging the Michaelis-Menten simulation scripts that Juan was using. Each time he tried to run the simulation, he got a matrix dimension mismatch error in the general_network_dynamics_mm script. In order to debug the code, I had to comment out the "clc" and "clear all" calls that he added to the beginning of the estimation_driver script. In debugging the general_network_dynamics_mm script, I noticed that the dimensions of the array D is 1x21 while the dimensions of the array zz is 21x1. As a result, it was not possible to do the following computation: D.*zz. The problem ended up being in how the variable D is specified. In Juan's script, he set D to be degrate. However, the correct line of code is D = degrate(:). This corrected line ensures that D is a column vector of dimensions 21x1. To be consistent with the script that I used when I ran simulations for my thesis, I also changed the way the variable P is computed to the following line of code: P = prorate(:). In making these changes, the simulation was able to run.

After helping Juan, I went back to the stochastic model. As I forgot to copy the files I was working on at home to my flash drive, I spent some time recreating the files I wrote about in my entries for May 29, 2014 and May 30, 2014. I tried to run a forward simulation for the stochastic model using the network_rates_function_20140529 and got the same error message that I received on May 30, 2014. At this point, I began troubleshooting the simple_stochastic_sim_20140530 script. In order to do so, I had to rerun the simulation outputting icount, Nlist, ii, and ii-n_genes at each iteration for the for loop in the simple_stochastic_sim_20140530 script. In so doing, I got almost the same error message as I received May 30, 2014 with the exception that the simulation was trying to access Nlist(8,5). At this point, since ii is greater than the number of genes in the network, the simulation is told to access Nlist(icount,ii-n_genes). I believe that the element accessed is given by (icount,ii-n_genes) in order to take into account that the arrays r2 and Ti in the same script have the dimensions n*3 x 1 where n is the number of columns in Nlist. However, the computation ii-n_genes becomes problematic when ii is not a multiple of n_genes. In the error message that I described, the problem is that ii is 9 which is not a multiple of n_genes which is 4. So, the simulation cannot access the correct element in Nlist. To fix this, I replaced Nlist(icount,ii-n_genes) with Nlist(icount,mod(ii,n_genes)). However, after trying to run the simulation with this fix, I got the following error message:

??? Attempted to access Nlist(14,0); index must be a positive integer or logical.

Error in ==> simple_stochastic_sim_20140530 at 76
           Nlist(icount,mod(ii,n_genes)) = max(0,Nlist(icount,mod(ii,n_genes))-1/4);

Error in ==> Stochastic_Forward_Simulation_20140520 at 20
   [simtime,model]   = simple_stochastic_sim_20140530(@network_rates_20140529,simtime,x0);

Error in ==> Stochastic_Driver_20140520 at 8
Stochastic_Forward_Simulation_20140520

From the error message, it looks like there is a problem when ii is greater than the number of genes but is divisible by the number of genes. To fix this problem, I rewrote the for loop in simple_stochastic_sim_20140530.m governed by the condition ii<=n_genes as follows

if ii<=n_genes
    Nlist(icount,ii) = min(2,Nlist(icount,ii)+1/4);
    Nlist1 = Nlist  %%% For troubleshooting purposes
else
    if mod(ii,n_genes)==0
        Nlist(icount,n_genes) = max(0,Nlist(icount,n_genes)-1/4);
        Nlist3 = Nlist  %%% For troubleshooting purposes
    else
        Nlist(icount,mod(ii,n_genes)) = max(0,Nlist(icount,mod(ii,n_genes))-1/4);
        Nlist2 = Nlist  %%% For troubleshooting purposes
    end
end

In making the above alterations to the code, I reran the forward simulation. This time around I did not get any error messages. However, the model outputted does not seem correct. In looking at the array log2FC.model, I noticed that most of the elements of the array were -Inf.

I decided to run the forward simulation with the script network_rates_function_20140527.m instead of network_rates_function_20140529.m to see if the model would be any better. In order to do so, I had to change any call to the network_rates_function in any of the scripts in the forward simulation to network_rates_function_20140527.m from network_rates_function_20140529.m. In addition, I had to also add a globals statement to Stochastic_Forward_Simulation_20140520 to add the variable sumall to the globals. The variable sumall denotes the sum of the weights of all edges in the network. Also, I also added the variable sumall to the globals statement in the scripts network_rates_20140527.m and simple_stochastic_sim_20140530.m. However, the model simulated with the network_rates_function_20140527 script was not any better than that simulated with the network_rates_function_20140529 function. The array log2FC.model had many -Inf elements. I will be emailing Dr. Fitzpatrick to see if he has any idea what the problem could be.

Katrina Sherbina 18:59, 2 June 2014 (EDT)

June 10, 2014

Dr. Fitzpatrick suggested that I simplify the forward simulation that I have been working with because it had too many moving parts as I adopted Nick's code. As I have not done as much work with the stochastic model as Nick did, I spent a couple of days going over some old journal club presentations and reading some online material on Markov Chain Monte Carlo simulations. I also ended up reading some of Nick's journal entries on his work on the stochastic model and came across an entry at the very beginning of the work referencing early simple stochastic simulation and network rates scripts. I dug up these scripts from my email and started to make modifications to it in order to use the network_rates_20140527.m script.

I saved a copy of this early simple stochastic simulation script as simple_stochastic_sim_20140610.m in order to make some changes. Many of these changes were changes I made to the simple stochastic sim and network rates scripts I have already described in past entries. I removed all scripts having to do with production rates because the production rates are not incorporated into the model yet. In using the simple_stochastic_sim_20140610.m, I removed all input sheets from the input Excel workbooks except network, network_rates, and simulation_times and renamed the input Excel workbook as Simplified_Input_4_Gene_Dummy_Network_Stochastic_Model.xls.

The forward simulation now just involves two scripts: simple_stochastic_sim_20140610.m and one of the two network_rates scripts that I have already been working on. The simple_stochastic_sim_20140610.m incorporates a simplified version of Stochastic_Parameters_20140520.m. I tried to run the script but ran into the following error:

??? Index exceeds matrix dimensions.

Error in ==> network_rates_20140527 at 14
   wtii = wtmat(ii,jj); % Weight of the regulatory effect of each of the
   transcription factors

Error in ==> simple_stochastic_sim_20140610 at 92
   [rr]  = network_rates_20140527(N0);

I am in the process of debugging this error. I am surprised with this error because I the code I used closely resembles the simple_stochastic_sim I have used before. I have also not made any changes to the network_rates scripts in this forward simulation.

Katrina Sherbina 01:14, 11 June 2014 (EDT)

June 12, 2014

I managed to fix the error that I was getting after making a few changes that I made to the following scripts: simple_stochastic_sim_20140610.m and network_rates_20140527.m:

Changed the vector x0 from a n_genes x 1 vector of zeros to ones.
Changed jump_rates in the network_rates script from a 2 column array to a 3 column array.
Altered the globals statement in simple_stochastic_sim to match the globals statement in the network_rates script.
Had to modify the for loop that updates the Nlist in the simple_stochastic_sim as discussed in the entry for June 2, 2014.
Added line of code that takes the log base 2 of the model.
Add script to save a .mat file of the simulation.

June 16, 2014

Today, I began creating another network_rates script (called network_rates_201406106.m) with different state probabilities that Dr. Fitzpatrick and I discussed in today's morning meeting. Before I talk about the actual modified network_rates script, I should note that I changed two parameter values at the onset. I made the parameter alpha much smaller (i.e. 0.01 for the genes controlled by at least one transcription factor in the network and 0.005 for the no input genes) and, as a result, made nmc greater (i.e. 1000). I also removed a line of code in the simple_stochastic_sim_20140610 script that found the log base 2 of the model and added a loop to graph the model for each gene.

As for the modifications to the network_rates script, the following new state probabilities were coded for the genes that are controlled by at least one transcription factor in the network:

The probability that the state of a target gene is 1 in the next time step is given by

(alpha_i/2)+sum_j(w_{ij}*(y_j > 0)) where j is an element of the activators and alpha_i is a parameter dependent on the target gene i

The probability that the state of a target gene is -1 in the next time step is given by

(alpha_i/2)-sum_j(w_{ij}*(y_j > 0)) where j is an element of the repressors and alpha_i is a parameter dependent on the target gene i

The probability that the state of a target gene is 0 in the next time step is given by

1-(alpha_i)-sum_j(abs(w_{ij})*(y_j > 0)) where j is an element of the set of transcription factors and alpha_i is a parameter dependent on the target gene i

After running the forward simulation using the scripts simple_stochastic_sim_20140610.m and network_rates_20140616.m with the network that I have been working with without any errors popping up in the process, I ran the forward simulation for a few other specific networks:

Each gene is regulated by one transcription factor in the network and the regulatory weight is positive (Simplified_Input_4_Gene_Dummy_Network_All_Positive_Weights_Stochastic_Model.xls).
Each gene is regulated by one transcription factor in the network and the regulatory weight is negative (Simplified_Input_4_Gene_Dummy_Network_All_Negative_Weights_Stochastic_Model.xls).
Each gene is regulated by at least one transcription factor in the network and there is a mixture of positive and negative regulatory weights in the network (Simplified_Input_4_Gene_Dummy_Network_Mixed_Weights_Stochastic_Model.xls).

However, I obtained strange expression profile plots for each gene in the network for the first two aforementioned networks (in bulleted list). For both of these networks, the expression plots for each gene began at 1 and then sharply decreased to 0. I would expect this of the network with all negative weights but not of the network with all positive weights. For the simulation using the network with all positive weights, the model for each gene zeros out by 38 time points. For the simulation using the network with all negative weights, the model for each gene zeros out by 42 time points. The models for each of the networks are not identical but do follow the same trend. In the case of the third of the aforementioned networks (in bulleted list), the simulation could not even finish outputting the line Tnow = 0 many times before getting stuck. Taking a look at the model once I stopped the simulation n_model_times x n_genes array of 0's.

I experimented with two more values of alpha for the genes that are controlled by at least one transcription factor: 0.1 and 0.9. I thought that maybe the expression profiles for the genes in the all positive weights network could be decreasing to 0 because maybe alpha was too small and perturbations to the network would result in very small probabilities (close to 0) that the state of a gene would be 1. However, I observed the same trends as described when I originally ran the simulation with the alpha set to 0.01. I tried to run the simulation with the mixed weights network with an alpha of 0.9 and the simulation again would not stop outputting several lines of Tnow = 0 before getting stuck.

In looking at the scripts simple_stochastic_sim_20140610.m and network_rates_20140616.m I noticed what is most likely screwing up the simulation. In the simple_stochastic_sim_20140610 script, I forgot to call network_rates_20140616 and not another network_rates script. In addition, in the function description in network_rates_20140616.m, I called network_rates and not network_rates_20140616. In looking at other versions of the network rates, I noticed the same problem (i.e. that the name in the function description line did not match the name of the actual script). I remedied this issue in all of the network_rates scripts that I have.

I reran the simulation with the simple_stochastic_sim_20140610 and network_rates_20140616 scripts reverting the value of alpha for all genes controlled by at least one transcription factor in the network to 0.01 and got much better results. For the simulation with the all positive weights network, the expression profiles outputted for each gene oscillated a lot but the oscillations began to decrease with time. For the simulation with the all negative weights network, the expression profiles outputted for each gene decreased sharply to 0. The results are to be expected for each of the networks considering the sign of the regulatory weights in each network. For the simulation with the mixed weights network, the expression profiles outputted were similar between GeneA and GeneB and between GeneC and GeneD. For GeneA and GeneB, the plot increased plateuing a bit above 1.6. For GeneC and GeneD, the plot decreased to 0. These results are expected since the transcription factors regulating GeneA and GeneB have positive regulatory weights while the transcription factors regulating GeneC and GeneD have negative regulatory weights.

Katrina Sherbina 17:27, 16 June 2014 (EDT)

June 23, 2014

For the past few days, I spent some time browsing the resources on stochastic models and gene regulatory networks. Dr. Fitzpatrick and I had briefly discussed last week the notion of noise in the model being indicative of a pattern of expression of gene. I looked up a few articles by a professor I met at UC Irvine as I remembered that one of his students talked about his research into that topic. However, I am not sure if we can use the work presented in a few of those articles that I read.

Today, Dr. Fitzpatrick clarified what he meant by simplifying the code. In talking with him, I realized that there were many extraneous elements to the "simplified" code that I have been working with that remained from trying to base my work off of Nick's. These extraneous elements are in part as a result of the fact that Nick's simulation used time steps of different sizes whereas the simulation we are trying to do now uses a fixed time step. However, with Dr. Fitzpatrick's suggestion, I stripped away many of these elements and created a much simpler simple_stochastic_sim script, which I have saved as simple_stochastic_sim_20140623.m.

In this new, simplified script, I cleaned up the loop that calls the network_rates function (I am still using network_rates_20140616.m). First of all, I changed it from a for loop to a while loop that executes a set of commands given that a counter (which is initialized prior to the loop) is less than or equal to the number of increments created given the time range imported from the input workbook containing simulation times. I increased the number of these increments from 101 to 1001. In removing many unnecessary variables, I created a new for loop in this while loop that assigns a state to each gene by comparing a random number generated for each gene with the probability that the state of a gene is 1 and the probability that the state of a gene is -1 in the next time step. The loop looks as follows

for g = 1:n_genes
       if u(g) <= up(g)
           x1(g) = 1;           % State of a gene goes to 1
           model(icount,g) = 1; % Store state to output matrix
       elseif (u(g) > up(g)) && (u(g) <= up(g)+dwn(g))
           x1(g) = -1;           % State of a gene goes to -1
           model(icount,g) = -1; % Store state to output matrix
       elseif u(g) > up(g)+dwn(g)
           x1(g) = 0;           % State of a gene goes to 0
           model(icount,g) = 0; % Store state to output matrix
       end
end

where up(g) is the probability that the state of a gene g goes to 1, dwn(g) is the probability that the state of a gene g goes to -1, and icount is a counter.

I ran this new script using the network with all positive weights a few times to see if the genes in the network had a similar profile (by this I mean the plot of the state of each gene for each time step) between the runs. For two genes, specifically GeneA and GeneC, I saw that the genes went to a state of 1 more than a state of -1. However, the profile for GeneB and GeneD changed between the runs.

There are still some things left to explore/implement with this script as it is now:

If there is some consistency in the profile of each gene given more runs are performed with the simulation.
The profile of the genes in the network using the all negative weights network and the mixed weights network.
Incorporating a no input gene into the network.
Looping the entire script so as to automate the runs.
Look at different values of the gene specific parameter alpha.

Katrina Sherbina 19:17, 23 June 2014 (EDT)

June 24, 2014

Today, I spent some time modifying the simple_stochastic_sim_20140623 script to being able to loop several runs of the simulation and save the output from each run automatically. This involved creating a new script called driver_20140624.m:

global A wtmat prorate degrate wts log2FC alpha i_forced no_inputs n_genes counter nmc tspan sumall nrun TX1

input_file_name = 'Simplified_Input_4_Gene_Dummy_Network_All_Negative_Weights_Stochastic_Model';
igraph          = 1;

for nrun = 1:5
   simple_stochastic_sim_20140623 
end

input_file_name needs to be changed by the user to whatever workbook contains the input data (network and simulation times) the user wishes to work with. The for loop runs the simple stochastic simulation as many times as the user specifies.

I modifying the code in simple_stochastic_sim_20140623 that creates names for the graphs of the change of a gene's state over time to include the number of the run that is being executed. To ensure that the appropriate graphs would be saved for each run, I added the statement "close all" to the end of the script. I also added the following for loop to store some data to a mat file that describes the frequency of a specific state for each gene:

for j = 1:n_genes % Ouput some information about the states of the genes
   info.run = nrun;                                               % Number of run
   info.gene{j,1} = TX1{1+(j),1};                                 % Name of gene
   info.nup(j,1) = length(find(model(:,j)==1));                   % At how many time points does the state of a gene goes to 1?
   info.ndwn(j,1) = length(find(model(:,j)==-1));                 % At how many time points does the state of a gene goes to -1?
   info.compare(j,1) = info.nup(j,1)>info.ndwn(j,1);              % Does the state of a gene go to 1 more so than -1?
end

I performed five runs of the simulation for the all positive weights network and the all negative weights network. I did notice that the state of GeneA, GeneB, and GeneC did go to 1 more so than -1 when performing the simulation for the all positive weights network. Eyeballing the plots generated from performing the simulation for the all negative weights network, it looks the state of the genes went to -1 more so than 1. These results make sense considering the networks.

It will be interesting to see the kinds of results I get when I run the simulation with a mixed weights network. I am also interested in seeing how the value of the gene specific parameter alpha will affect the results. I am also wondering whether I should try to perform more runs to get a better, more reliable interpretation of the genes' profiles.

Katrina Sherbina 00:10, 25 June 2014 (EDT)

June 25, 2014

Today, I ran five runs of a simulation with the code I was working on yesterday (driver_20140624.m, simple_stochastic_sim_20140623.m, and network_rates_20140616.m) with the mixed weights network (Simplified_Input_4_Gene_Dummy_Network_Mixed_Weights_Stochastic_Model.xls). After completing all the runs, I looked at the frequency with which the state of each gene went to 1 and to -1. GeneA, GeneB, and GeneD showed the clearest difference between these two frequencies. The state of GeneA and GeneB went to 1 more so than -1. The state of GeneD went to -1 more so than 1. It would seem that, at least in this mixed weights network, GeneA and GeneB are up-regulated while GeneD is downregulated. These observations are also evident from the plots of the change in state of a gene over time. Whether or not the state of GeneC goes to 1 more so than -1 or vice versa is not as clear. For two runs, the frequency the state of GeneC went to 1 was equal to the frequency that the state went to -1. For another two runs, he frequency the state of GeneC went to -1 was greater than the frequency that the state went to 1. From looking at the input sheet, since the only regulator that controls GeneC (GeneD) has a negative regulatory weight, GeneC should be down-regulated. I think this implies that the frequency that the state of GeneC goes to -1 should be greater than the frequency that the state goes to 1.

I tried experimenting with the number of time steps. I increased it to 1,000,000 but then it was not possible to make sense of the plots. I might try setting the number of time steps to 10,000 or 100,000 (I have been specifying 1,000 time steps in the simulations for which I have described the results), but I am not sure that that will have much of an effect.

On a different note, Dr. Dahlquist sent me an email about a script that determines if a gene has a significant change in expression between two different strains. In looking into her question, I found that the script I had one my computer had several lines of code to create plots to see the expression profile for each gene between two different strains. I tried to quickly run the script but I ran into an error with one of the variables. I will try to figure out the source of this problem and see if I can re-integrate the lines of code to create the plots.

Katrina Sherbina 22:23, 25 June 2014 (EDT)

June 27, 2014

I've taken a break from the stochastic model and went back to a MATLAB script that determines if a gene has a significant change in expression between two different strains that is described in a protocol called Comparing Significant Changes in Expression Between Two Strains. I was able to troubleshoot the script per Dr. Dahlquist's request to add a few old lines of code that create plots to see the expression profile for each gene between two different strains to the code in the protocol. This involved adding some new variables that serve as independent variables necessary to create the plots. At the moment, plots are generated for all genes that have an unadjusted p value of less than 0.05. Since hundreds of genes satisfy this p value requirement, I will ask Dr. Dahlquist if she wants to create a more stringent filter (i.e. Benjamini & Hochberg adjusted p value of less than 0.05) to generate these plots.

I am currently working on adding more comments to the protocol to allow the user to understand what each part of the code means.

June 30, 2014

Today, I went back to the stochastic model. The latest iteration of it didn't seem to be doing what Dr. Fitzpatrick and I would expect given what we know about transcription.

When we took another look at the gene profile plots that I sent last week during today's morning meeting, in particular the plots from the all positive network simulation, the gene was more likely to stay at a state of 0 rather than a state of 1. We plotted the probability of a state of 1 at each time step and the probability of a state of 0 at each time step and found that the latter probability was much higher than the former. Whereas, we would think that if a gene is activated in our current network (which is a 4-gene cyclic network), then the gene would stay on.
We tried to deal with the above issue by gradually increasing the gene-dependent parameter alpha. In so doing, the probability that the state of a gene is 1 did become greater than the probability that the state of a gene is 0. However, in so doing, this random chance parameter alpha has a greater influence on the model dynamics than the regulatory weights. We would hope that the converse is the case.
In the gene profile plots, the time a gene spent at a state of 1 was relatively short. However, biologically, we think that the transcription factor that first activates a gene stays one allowing many mRNAs to be produced in a relatively short time. According to the chapter "How Cells Read the Genome: From DNA to Protein" from the book Molecular Biology of the Cell 4th Edition by Alberts et al., it is the case that mRNAs are produced quickly because the synthesis of a new mRNA is begun before an RNA polymerase even finishes synthesizing the prior mRNA.

With the aforementioned problems, Dr. Fitzpatrick and I discussed a different way to express the state probabilities (in the following, X_i is the state of a gene i and y is the state of the gene at the current time step):

P(X_i(t+h) = 1 | X(t) = y) =
- (alpha_i/2) + sum_ j (w_{ij}*(y_ j > 0) when y_i = 0 where j is an element of the activators that control the target gene i
- beta_i when y_i > 0
- 0 when y_i < 0

P(X_i(t+h) = -1 | X(t) = y) =
- (alpha_i/2) - sum_ j (w_{ij}*(y_ j > 0) when y_i = 0 where j is an element of the repressors that control the target gene i
- beta_i when y_i < 0
- 0 when y_i > 0

P(X_i(t+h) = 0 | X(t) = y) = 1 - P(X_i(t+h) = 1) - P(X_i(t+h) = -1)

For both probabilities when the current state y_ j > 0, the gene is up- or down-regulated based on an OR probability (either due to the random chance alpha_i/2 or by the sum of the regulatory weights). The parameter alpha_i should be relatively small. On the other hand, the parameter beta_i should be relatively high since the high affinity of the transcription factor to its binding site should ensure that a gene stays up- or down-regulated (depending on the type of transcription factor) until the gene has a higher chance of being in another state.

I modified the existing network_rates script renaming it network_rates_20140630.m to reflect the revised state probabilities above. I added a few lines of code to simple_stochastic_sim_20140623.m, for troubleshooting purposes, to create plots of the change in the probability that the state of a genes goes to 1, the change in the probability that the state of a genes goes to -1, and the change in the probability that the state of a genes is 0.

I ran a simulation using driver_20140624.m, simple_stochastic_sim_20140623.m, and network_rates_20140630.m using the all positive weights network (Simplified_Input_4_Gene_Dummy_Network_All_Positive_Weights_Stochastic_Model.xls). I set alpha to 0.01 and beta to 0.5. However, I observed what I spoke about previously as a problem. The probability of no change was higher than the probability that the state of a gene goes to 1 at most time steps. I then reran the simulation decreasing alpha to 0.001 but keeping beta at 0.5 to see if increasing the difference in magnitude between the two parameters would force the state of a gene to stay at 1 once it reached 1. However, with these changes, the frequency with which the state of a gene went to 1 was much less than with the initial value of alpha. I think I need to experiment some more with the alpha and beta parameters.

In addition, with this new network_rates script, when the state of a gene went to 1, the gene stayed at this state longer than when the simulation was run with network_rates_20140616.m. While we were thinking that maybe the time for which the state of a gene stayed at 1 should be longer than what it was when running the simulation with network_rates_20140616.m, I am not sure how long we really expect the gene to stay at a state of 1 given an all positive weight network. The synthesis of mRNA occurs quickly but the x-axis of the gene profile plots goes from 0 minutes to 60 minutes. Therefore, maybe a genes stays at a state for such a short time that it is difficult to tell on the scale of minutes.

Katrina Sherbina 18:32, 30 June 2014 (EDT)

July 2, 2014

I have continued working with the most current stochastic model simulation (driver_20140624.m, simple_stochastic_sim_20140623.m, and network_rates_20140630.m) experimenting with different values of the regulatory weights in the all positive weight regulatory network (4 gene cyclic network) and the parameters alpha and beta. I began by doing a little bit of scratch work to see what the state probabilities of GeneD in the network would be given that the initial state of all genes is 0 and that GeneA turned on (the state of GeneA went to 1) by chance. In doing this scratch work, I decided to increase the magnitude of the weights in the all positive weights gene regulatory network from 0.5 to 0.7 in order to see how the probability that the state of GeneD will go to 1 with GeneA at a state of 1 would change the gene profile plots. In so doing, I found that all of the genes went to a state of 1 more frequently.

I also decided to see what would happen if I increased the value of beta keeping the value of alpha the same. By increasing beta from 0.5 to 0.8, the maximum probability that the state of a gene would go to 1 went from 0.5 to 0.8, which is expected since beta is the probability that the state of a gene stays at 1 given that its current state is greater than 0. I thought that with higher regulatory weights and a greater value of beta, there would not be any time step during which the state of a gene went to -1. Since this was not the case, I went back to doing some more scratch work and decided to see if I could decrease the chances that the state of a gene would go to -1 by decreasing the value of alpha but keeping the value of beta at 0.8. In so doing, there were less spikes to a state of 1 for all genes.

Before I send Dr. Fitzpatrick the plots that I have been compiling of the gene profiles and the changes in the state probabilities, I want to experiment with increasing the value of alpha a bit but keeping beta at 0.8 and the regulatory weights at 0.7.

Katrina Sherbina 17:40, 2 July 2014 (EDT)

July 3, 2014

I ran a simulation setting alpha to 0.03, beta to 0.8, and all of the regulatory weights to 0.7. Looking at the plots of the change in a gene's state over time, I didn't really notice anything glaringly different or any improvement from the other simulations I have run this week. I do not want to increase alpha too much because, as I have said before, the model should be more influenced by the regulatory weights than the parameter alpha. It would be worthwhile I think to experiment with different values of alpha and beta again when we start to fit the model to the microarray data.

Other than that, I finished compiling two different sets of plots into a PowerPoints that I sent off to Dr. Fitzpatrick: plots of the change in a gene's state over time and plots of the probabilities of a state for one of the genes in the network.

Katrina Sherbina 18:25, 3 July 2014 (EDT)

@@ Line 335: / Line 335: @@
 ==July 3, 2014==
+I ran a simulation setting alpha to 0.03, beta to 0.8, and all of the regulatory weights to 0.7. Looking at the plots of the change in a gene's state over time, I didn't really notice anything glaringly different or any improvement from the other simulations I have run this week. I do not want to increase alpha too much because, as I have said before, the model should be more influenced by the regulatory weights than the parameter alpha. It would be worthwhile I think to experiment with different values of alpha and beta again when we start to fit the model to the microarray data.
+Other than that, I finished compiling two different sets of plots into a PowerPoints that I sent off to Dr. Fitzpatrick: plots of the change in a gene's state over time and plots of the probabilities of a state for one of the genes in the network.
+[[User:Katrina Sherbina|Katrina Sherbina]] 18:25, 3 July 2014 (EDT)