User:Nuri Purswani/Network/Introduction/Literature

Algorithms for Biological Network Reconstruction from data

Literature Review

This section reviews a few methods that were not implemented in this project but are interesting candidates for future comparisons. We summarise each method and give the relevant references. For detailed descriptions of the methods that were implemented (i.e. Beal et al. 2005; Stan, Gonçalves et al. 2008-2010), see the Methods section.

=Other interesting network inference methods=

Description
This method makes the same assumptions about the data as Beal's method. It was implemented in the initial stages of the project, but the variational Bayesian approach was chosen over it because it is less prone to overfitting (Beal et al. 2007). The algorithm assumes that gene expression data can be modelled as a linear dynamical system, with Gaussian noise perturbing the hidden states and the observations at every time point. The implementation steps of the algorithm are the following:
 * Cross-validation
 ** Estimates the optimal dimension of the hidden states, "K".
 ** Estimates the parameters A, B, C, D and the noise covariances of the state space model for gene expression.
 * Bootstrap
 ** Increases the confidence of the estimate. Typically outputs 100 candidate networks, from which a mean result is approximated.
 ** Compensates for situations in which there are not enough experimental repeats of the data.
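The bootstrap step can be illustrated with a minimal sketch. It assumes, for simplicity, a fully observed linear model $$x_{t+1}=Ax_t + w_t$$ fitted by least squares, which is only a stand-in for the hidden-state estimator of the method described above. The function names (`simulate_series`, `fit_transition`, `bootstrap_networks`) and all numerical values are hypothetical.

```python
import numpy as np


def simulate_series(A, n_steps, noise_std, rng):
    """Simulate x_{t+1} = A x_t + w_t with Gaussian noise w_t."""
    k = A.shape[0]
    x = np.zeros((n_steps, k))
    x[0] = rng.standard_normal(k)
    for t in range(n_steps - 1):
        x[t + 1] = A @ x[t] + noise_std * rng.standard_normal(k)
    return x


def fit_transition(replicates):
    """Least-squares estimate of A (stand-in for the real estimator)."""
    past = np.vstack([r[:-1] for r in replicates])
    future = np.vstack([r[1:] for r in replicates])
    # Solve past @ A.T ~= future in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_T.T


def bootstrap_networks(replicates, n_boot=100, seed=0):
    """Resample replicates with replacement and refit n_boot times."""
    rng = np.random.default_rng(seed)
    n = len(replicates)
    candidates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample replicate indices
        candidates.append(fit_transition([replicates[i] for i in idx]))
    candidates = np.stack(candidates)
    # Mean network and element-wise spread across the candidates.
    return candidates.mean(axis=0), candidates.std(axis=0)


rng = np.random.default_rng(1)
A_true = np.array([[0.8, 0.1], [-0.2, 0.7]])  # hypothetical 2-gene network
replicates = [simulate_series(A_true, 30, 0.1, rng) for _ in range(8)]
A_mean, A_std = bootstrap_networks(replicates)
```

The element-wise standard deviation `A_std` gives a rough indication of how stable each inferred interaction is across the resampled datasets.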

Description
This method is based on ODEs and makes different assumptions about the input data from the methods of Beal et al., Rangel et al. and Stan et al. Biological systems are modelled as non-linear state space models based on Hill kinetics and mass action kinetics. The method estimates the parameters of the non-linear model provided that the structure has already been set, and thus requires knowledge of the boolean structure of the network before the estimation starts. The implementation steps are summarised as follows:
 * The key point of this method is that it uses an unscented Kalman filter to estimate the non-linear evolution of a variable. In this way, the authors adapt the problem to conventional Bayesian estimation of parameters.
 * The examples they provide online infer parameters in the repressilator and the JAK-STAT pathway. The repressilator is an example that neither of the methods implemented in this project can cope with.
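The core of an unscented Kalman filter is the unscented transform, which propagates a mean and covariance through a non-linear function using a small set of sigma points. The following is a minimal sketch of that transform only, not the authors' implementation. The scaling parameters `alpha`, `beta`, `kappa`, the function `hill_step` and its rate constants are assumptions chosen for illustration.

```python
import numpy as np


def unscented_transform(mean, cov, f, alpha=1.0, beta=0.0, kappa=1.0):
    """Propagate (mean, cov) through f using 2n+1 sigma points."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # Matrix square root: root @ root.T == (n + lam) * cov.
    root = np.linalg.cholesky((n + lam) * cov)
    # Sigma points: the mean, plus and minus each column of root.
    sigma = np.vstack([mean, mean + root.T, mean - root.T])
    w_mean = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w_cov = w_mean.copy()
    w_mean[0] = lam / (n + lam)
    w_cov[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    # Push every sigma point through the non-linear function.
    y = np.array([f(s) for s in sigma])
    y_mean = w_mean @ y
    d = y - y_mean
    y_cov = (w_cov[:, None] * d).T @ d
    return y_mean, y_cov


def hill_step(x, dt=0.1, vmax=2.0, K=1.0, h=2.0):
    """One Euler step of a toy ring of repressors with Hill kinetics.

    Gene i is repressed by gene i-1; all constants are hypothetical.
    """
    production = vmax / (1.0 + (np.roll(x, 1) / K) ** h)
    return x + dt * (production - x)


mean = np.array([1.0, 2.0, 0.5])
cov = 0.01 * np.eye(3)
pred_mean, pred_cov = unscented_transform(mean, cov, hill_step)
```

For a linear function the transform reproduces the exact mean and covariance, which makes a convenient sanity check before applying it to a non-linear model.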

Description
The state space models introduced above, and the ones used in this project, do not take time delays in regulatory networks into account. When an input perturbation is applied to the system, delays due to transcription, translation and transport modify its effect on the system of interest. This network inference algorithm therefore replaces the conventional state space representation and assumes that gene expression can be modelled as follows:

$$x_{t+1}=Ax_t + Bu_{t-T} + w_t$$

$$y_{t}=Cx_t + v_t$$

where $$A$$, $$B$$, $$C$$, $$x_t$$ and $$y_t$$ have the same meaning as the variables described in Synthetic Datasets and $$T$$ is the delay in the input perturbation caused by the aforementioned processes. Interestingly, the authors also use the Akaike Information Criterion to rank network structures. This method is worth looking into, as it goes one step further than existing methods and applies these ideas to ChIP-chip data.
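To make the effect of the delay $$T$$ concrete, the following sketch simulates the model above for a single gene. It is an illustration only, not the inference algorithm itself. The function name `simulate_delayed_lds`, the zero initial state, the treatment of inputs before time zero as zero, and the numerical values are assumptions.

```python
import numpy as np


def simulate_delayed_lds(A, B, C, u, delay, noise_std=0.0, seed=0):
    """Simulate x_{t+1} = A x_t + B u_{t-T} + w_t and y_t = C x_t + v_t.

    u has shape (n_steps, n_inputs); inputs before t = 0 are taken as zero.
    """
    rng = np.random.default_rng(seed)
    n_steps = u.shape[0]
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((n_steps + 1, k))  # assumed zero initial state
    y = np.zeros((n_steps, p))
    for t in range(n_steps):
        y[t] = C @ x[t] + noise_std * rng.standard_normal(p)
        # The input only reaches the state after `delay` time steps.
        u_delayed = u[t - delay] if t >= delay else np.zeros(u.shape[1])
        x[t + 1] = A @ x[t] + B @ u_delayed + noise_std * rng.standard_normal(k)
    return x, y


A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[2.0]])
u = np.zeros((6, 1))
u[0, 0] = 1.0  # pulse perturbation at t = 0
x, y = simulate_delayed_lds(A, B, C, u, delay=2)
```

In this noise-free run the observation stays at zero for the first three time points and only then responds to the pulse, which is the behaviour a model without the delay term cannot reproduce.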

Description
This approach supports Beal's idea of estimating a posterior distribution over the parameters instead of a point estimate. Point estimates can be misleading and can lead to the development of "sloppy" models in systems biology. This method cannot infer parameters without knowledge of the boolean structure of the network, so it cannot cope with hidden variables. However, it takes an interesting perspective on parameter estimation and applies the idea to the MAPK signalling pathway as a biological example. What is interesting about this method is that it can cope with both linear and non-linear examples, such as the classical Lotka-Volterra predator-prey interactions. The variational treatment is similar to Beal's, although this framework uses sequential Monte Carlo simulations to optimise the parameters instead of the expectation maximisation algorithm, which is more prone to getting stuck in local optima. The implementation steps can be summarised as:
 * 1) Initialise the parameters
 * 2) Propose an estimate of the parameter at the next time point, according to a prior distribution
 * 3) Simulate a dataset from that estimate
 * 4) Compute the Euclidean distance $$d(\text{estimated dataset}, \text{observed dataset})$$ between the simulated and the observed datasets
 * 5) If the distance is not small enough, set the next estimate of the parameter with a probability given by the ratio of the likelihoods of that estimate and the previous estimate (analogous to simulated annealing)
 * 6) Continue iterating steps 2-5 until the distance is minimised.

=Other Interesting Papers=

Reference

 * Zak DE, Gonye GE, Schwaber JS, Doyle FJ 3rd. Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identifiability analysis of an in silico network. Genome Res (2003);13:2396-2405

Description
This paper performs an identifiability analysis of an in silico gene regulatory network that takes the stochastic effects of gene expression into account. The authors determine the accuracy with which network parameters can be estimated as a function of the input perturbation, and show that the network is only identifiable given prior knowledge of the mRNA degradation constants. In addition, they mention that complex perturbations (such as a step) are more favourable for identifying network parameters than simpler ones (such as a pulse). What is most thought-provoking about this paper is that they stress the necessity of the perturbation and state that "reconstruction is otherwise not possible" without extra information. This can be related to the method from Stan et al. The results of the simulations from this in silico model were also used as inputs to Beal's variational Bayesian method (Beal et al. 2005). An interesting point of comparison would have been to run the robust control algorithm on this in silico network, but this was not possible in the time available.

Description
While systems biology aims to study and understand biological systems, synthetic biology takes a different approach and builds them "de novo". In this paper, the authors created a synthetic network in yeast and measured time series and steady state expression after multiple perturbations of the system. They then tested several in silico methods for their ability to reverse engineer the underlying network structure. The algorithms they tested included BANJO (Bayesian network inference), ARACNE and an ODE-based method. A possible future extension of the robust control method could be applied to the construction of synthetic gene networks, and it has the potential to become a debugging tool for synthetic biology.