Natalie Williams Week 10: Difference between revisions

From OpenWetWare

Revision as of 13:27, 22 March 2015

Outline of Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae

Introduction

  • Gene expression converts the genetic information in DNA sequences into proteins and/or functional RNAs.
    • Promoter regions must be recognized by transcriptional regulatory proteins, which recruit RNA polymerase to the DNA strand.
  • Advances in microarrays have made it easier to follow changes in a cell's gene expression over time.
    • By analyzing these microarray data, we can better understand the relationships between genes and their transcription factor regulators.
    • Because these relationships collectively form a network among the genes, it should be possible to construct networks by studying the results of microarray data.
  • Budding yeast, Saccharomyces cerevisiae, has been studied extensively in the lab.
    • There is a lot of knowledge about its genome.
    • Expression data were collected and analyzed to determine which genes were being used at a specific stage of the cell cycle.
    • Genes were grouped based on where their regulators bound to promoter regions.
  • Methods previously used to produce networks:
    • A generalized linear model was created to describe regulators and to predict the patterns of regulators and their target genes.
    • A kinetic model with Bayesian networks was used to predict gene regulatory networks as well as the proteins that regulate gene expression.
    • Another method predicted networks by combining information from the genome with gene expression data.
      • Other researchers extended this method by using promoter regions or the sigma factor.
  • An alternative method is used in this paper:
    • A model based on a nonlinear differential equation was used.
      • It requires a pool of all potential regulators.
      • Genes from the group of potential regulators are picked, and the model is applied to try to fit the gene expression results of the target genes.
      • This is done for all potential regulators.
  • In this model:
    • There were 40 target genes;
    • 184 possible regulators were identified;
    • The data were also analyzed using a linear model; and,
    • Results from the linear model were compared to those of the nonlinear differential equation system to see how well each predicted the target genes' profiles.

Results

Dynamic model of transcriptional control

  • The model assumes that regulators and target genes interact repeatedly over time.
    • The model also assumes there is combinatorial control among the regulators for target genes.

Equation 1 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-1.gif)

  • yj: expression levels of the regulators
  • wj: regulatory weights
  • g: regulatory effect on a specific gene
  • j = 1, 2, ..., m, where m is the number of regulators controlling a gene
  • b: parameter for transcription initiation delay/nonspecific bias caused by regulator effects associated with gene expression

The rate of expression of the target gene (dz/dt) is given by the regulatory effect of the other genes, ρ, and the effect of degradation, x.
Equation 2 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-2.gif)

  • Degradation is modeled as a first-order chemical reaction: x = k*z
  • ρ: the regulatory effect g of the regulators, transformed by a sigmoidal transfer function

The complete model for the control of target gene expression z:
Equation 3 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-3.gif)

  • k2: rate of degradation of target gene product
  • k1: rate of expression
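
The equation images themselves are not reproduced in these notes. Reading the definitions above literally, Equations 1 to 3 would take roughly the following form. This is a reconstruction from the listed symbols, not a copy of the paper's typesetting; the logistic form of ρ and the minus sign on the degradation term are assumptions (the notes only say "sigmoidal" and "the effect of degradation"), and the degradation constant k is written as k2 to match the list above.

```latex
\begin{align}
g &= \sum_{j=1}^{m} w_j y_j + b \\
\frac{dz}{dt} &= \rho(g) - x, \qquad x = k_2 z \\
\frac{dz}{dt} &= \frac{k_1}{1 + \exp\!\left(-\left(\sum_{j=1}^{m} w_j y_j + b\right)\right)} - k_2 z
\end{align}
```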

Equation 3 can be simplified to Equation 4:
Equation 4 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-4.gif)

  • y is approximated with a polynomial of degree n
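
As a rough illustration of how a simplified model of this kind can be solved numerically, the sketch below integrates dz/dt = k1 / (1 + exp(-(w*y + b))) - k2*z for a single regulator. The single-regulator logistic form is an assumption based on the symbol definitions in these notes, and the fixed-step Runge-Kutta scheme is a stand-in chosen to keep the example dependency-free (the notes say the authors used ode45 in MATLAB). The names `dz_dt` and `simulate_target` are hypothetical.

```python
import math


def dz_dt(z, y, w, b, k1, k2):
    """Right-hand side of the assumed single-regulator model."""
    return k1 / (1.0 + math.exp(-(w * y + b))) - k2 * z


def simulate_target(y_of_t, z0, w, b, k1, k2, t_end, steps):
    """Integrate dz/dt with a fixed-step fourth-order Runge-Kutta scheme.

    y_of_t is a function returning the regulator expression at time t.
    Returns the z values at the steps + 1 grid points from 0 to t_end.
    """
    h = t_end / steps
    z = z0
    profile = [z]
    for i in range(steps):
        t = i * h
        s1 = dz_dt(z, y_of_t(t), w, b, k1, k2)
        s2 = dz_dt(z + 0.5 * h * s1, y_of_t(t + 0.5 * h), w, b, k1, k2)
        s3 = dz_dt(z + 0.5 * h * s2, y_of_t(t + 0.5 * h), w, b, k1, k2)
        s4 = dz_dt(z + h * s3, y_of_t(t + h), w, b, k1, k2)
        z += h * (s1 + 2 * s2 + 2 * s3 + s4) / 6.0
        profile.append(z)
    return profile
```

With a constant regulator the production term is constant, so z relaxes toward production / k2, which gives a quick sanity check on any parameter set.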

Approximation of y (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-5.gif)

  • Coefficients were estimated from experimental gene expression data using a least squares approximation.
  • The error weights were assumed to be the same for all points.
  • The simplified version, Equation 4, was used to identify regulators of the target genes.
  • n has to be chosen so that the polynomial can represent the large changes in gene expression in each individual experiment.
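
An equally weighted least squares polynomial fit of the kind described above can be sketched as follows. This is a minimal pure-Python version that solves the normal equations by Gaussian elimination; it is not the authors' code, and the function names `fit_polynomial` and `eval_polynomial` are hypothetical.

```python
def fit_polynomial(ts, ys, degree):
    """Least squares polynomial fit with equal weights on all points.

    Solves the normal equations (V^T V) c = V^T y and returns the
    coefficients c[0] + c[1]*t + ... + c[degree]*t**degree.
    """
    n = degree + 1
    # Normal-equation matrix and right-hand side.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (rhs[i] - tail) / a[i][i]
    return coeffs


def eval_polynomial(coeffs, t):
    """Evaluate the fitted polynomial at time t using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result
```

The fitted polynomial gives a smooth y(t) that can be evaluated between measured time points, which is what an ODE solver needs from the regulator profile. Normal equations become ill-conditioned for high degrees, so this sketch is only suitable for small n.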

The expression profiles Z = {z(t)} for the target and Y = {y(t)} for the regulating genes, measured at time points t = 1, 2, ..., Q, were used to analyze the gene profiles by minimizing the average squared error.
Equation 6 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-6.gif)

  • {z^c(t)}: reconstructed profile of z(t), calculated from Equation 4 at time points t = 1, 2, ..., Q
  • Q: number of data points
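
Taking "average square error" at face value, the quantity being minimized would be E = (1/Q) * sum over t of (z^c(t) - z(t))^2. The helper below is a small sketch of that reading; the name `mean_squared_error` is hypothetical.

```python
def mean_squared_error(z_computed, z_measured):
    """Average squared difference between a computed and a measured profile."""
    if len(z_computed) != len(z_measured):
        raise ValueError("profiles must have the same number of time points")
    q = len(z_measured)
    return sum((zc - z) ** 2 for zc, z in zip(z_computed, z_measured)) / q
```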

The problem then became how to obtain the best results with the smallest error.
The linear model (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-7.gif) was then compared to the nonlinear model.

  • The parameters (d) came from the minimization of the error in Equation 6.

Computational algorithm

  • Regulators are chosen from the pool of 184 potential regulators according to how well they predict the profiles of the target genes.
    • Equations 4 and 6 were used.
  • Potentially missing experimental data are filled in by the polynomial of degree n, where n reflects the number of data points and the level of expression change.

The algorithm used is as follows:

  1. Fit the regulator profiles with Equation 5.
  2. Select a target gene.
  3. Choose a potential regulator gene.
  4. Apply the least squares minimization to the target and regulator genes.
  5. Repeat from Step 3 for all potential regulators.
  6. Pick out the regulators that best fit the target.
  7. Repeat Step 2 and all following steps until the method has been applied to all target genes.
  • The above algorithm was run 100 times for each regulator-target pair.
  • Optimization was based on the Levenberg-Marquardt method, and Equation 4 was solved with ode45 in MATLAB.
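
The loop structure of the steps above can be sketched in Python as follows. This is not the authors' implementation: it swaps in Euler integration for ode45 and a random-restart coordinate search for Levenberg-Marquardt so that it runs with the standard library alone, it assumes the logistic single-regulator form of Equation 4, and it takes already-smoothed regulator profiles as input (Step 1). All function names, parameter ranges, and the restart count are hypothetical.

```python
import math
import random


def simulate(y, z0, w, b, k1, k2, dt):
    """Euler integration of dz/dt = k1 / (1 + exp(-(w*y + b))) - k2*z."""
    z, out = z0, [z0]
    for y_t in y[:-1]:
        arg = max(-50.0, min(50.0, w * y_t + b))  # clamp to avoid overflow
        z += dt * (k1 / (1.0 + math.exp(-arg)) - k2 * z)
        out.append(z)
    return out


def error(params, y, z, dt):
    """Mean squared error between the simulated and measured target."""
    w, b, k1, k2 = params
    zc = simulate(y, z[0], w, b, k1, k2, dt)
    return sum((a - c) ** 2 for a, c in zip(zc, z)) / len(z)


def fit_pair(y, z, dt, restarts, rng, max_sweeps=200):
    """Best (error, params) over random restarts of a coordinate search."""
    best_err, best_params = float("inf"), None
    for _ in range(restarts):
        params = [rng.uniform(-2, 2), rng.uniform(-2, 2),
                  rng.uniform(0, 2), rng.uniform(0, 2)]
        err = error(params, y, z, dt)
        step = 0.5
        for _sweep in range(max_sweeps):  # cap keeps the runtime bounded
            improved = False
            for i in range(4):
                for delta in (step, -step):
                    trial = list(params)
                    trial[i] += delta
                    trial_err = error(trial, y, z, dt)
                    if trial_err < err:
                        params, err, improved = trial, trial_err, True
            if not improved:
                step /= 2
                if step < 1e-3:
                    break
        if err < best_err:
            best_err, best_params = err, params
    return best_err, best_params


def rank_regulators(targets, regulators, dt=1.0, restarts=5, seed=0):
    """For each target, rank all candidate regulators by fit error."""
    rng = random.Random(seed)
    ranking = {}
    for t_name, z in targets.items():            # Steps 2 and 7
        scores = []
        for r_name, y in regulators.items():     # Steps 3 and 5
            err, params = fit_pair(y, z, dt, restarts, rng)  # Step 4
            scores.append((err, r_name, params))
        scores.sort(key=lambda s: s[0])          # Step 6: best fit first
        ranking[t_name] = scores
    return ranking
```

Each target-regulator pair is fitted independently of every other pair, so the two nested loops are the natural place to distribute work if the pool of genes grows.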

Dataset selection

To validate their model, Vu and Vohradsky compared their results to microarray data from Spellman.

  • 6178 open reading frames were on the chip.
  • Only a smaller number of regulators influence the cell cycle.

The 184 possible regulators were extracted from YEASTRACT and other published papers.
The 40 target genes were selected from Chen et al.'s work.

Inference of regulators

The data were log base 2 ratios of the actual mRNA value to the value of a standard. Before the data were analyzed, each expression value was squared and then put through the least squares minimization procedure for every target-regulator relationship.
Equation 8 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-8.gif) is used to approximate the unknown real expression pattern of each gene, which is obscured by potential error in the execution and results of the experiment as well as by the nature of the biological processes involved.

  • The overall error can be predicted by a statistical model based on multiple measurements or on a polynomial fit.
    • Here, the polynomial fit was the approach used.

Equation 9 (http://nar.oxfordjournals.org/content/35/1/279/embed/mml-math-9.gif) gives the deviation of the model from the data. Minimizing the error (Eq 6) and computing the target's expression as a response to its regulator's profile (Eq 4) help identify the best regulator-target pairs.

  • Pairs for target and regulator were chosen based on their error being less than the deviation obtained from Eq 9.
  • In the plots, the smaller E’s (squared errors) signify that the regulator fits the profile of the target gene better than other regulators.
  • The selected pairs with the best profiles were compared with the YEASTRACT database.
  • If there was a match, then the pair was labeled as correct.
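
The selection and labeling rules in the list above can be sketched as a simple filter. The function name `select_pairs`, the gene names, and the numbers in the usage note are hypothetical; the threshold plays the role of the deviation obtained from Eq 9, and the set of known pairs stands in for a database lookup.

```python
def select_pairs(errors, threshold, known_pairs):
    """Keep pairs whose fit error is below the threshold and flag known ones.

    errors: dict mapping (regulator, target) -> least squares error E
    threshold: deviation of the model from the data
    known_pairs: set of (regulator, target) pairs documented in a database
    Returns a dict mapping each selected pair to True if it is documented.
    """
    selected = {}
    for pair, err in errors.items():
        if err < threshold:
            selected[pair] = pair in known_pairs  # True means "correct"
    return selected
```

For example, with errors of 0.01, 0.5, and 0.02 for three made-up regulators of one target and a threshold of 0.05, the first and third pairs are selected, and only those found in the known set are labeled correct.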

Table 1 shows the target genes and which regulator fit an individual target best as well as the errors associated with the pair.

  • In Table 1, 35% of the regulator-target relationships were described as correct and best-fitted.

The average false positive rate was low.

  • ’min(m)’ shows that most targets’ regulators were identified within the first 5 tests done from the regulator pool; 90% of targets had their regulators identified within the first 10 tests from the pool.
  • The false positive rate was calculated as the ratio of the number of regulators identified as false positives to the total number of potential regulators (184).
  • The false positives in this paper are defined as those regulator-target pairs that were not found in the YEASTRACT database.
  • The false positives were concentrated in a select few target genes because their profiles were easily influenced by many, almost all, regulators.

The specificity of prediction is Sp = (N - FP)/N, where N is the number of potential regulators and FP is the number of false positives.

  • This value indicates how many experiments would need to be done to verify the results from the algorithm.
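
The two ratios described above are simple enough to state directly in code. The function names are hypothetical, and the count of 46 false positives in the usage note is made up for illustration; only N = 184 comes from the notes.

```python
def false_positive_rate(num_false_positives, num_regulators):
    """False positives as a fraction of all potential regulators (FP / N)."""
    return num_false_positives / num_regulators


def specificity(num_false_positives, num_regulators):
    """Specificity of prediction, Sp = (N - FP) / N."""
    return (num_regulators - num_false_positives) / num_regulators
```

For instance, 46 false positives out of 184 potential regulators would give a false positive rate of 0.25 and a specificity of 0.75.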

Regulators can either activate or repress the target gene depending on the sign of the weight (+ activates while - represses).

  • In comparing the algorithm’s predictions to YEASTRACT, around 75% of the time it correctly identified a regulator as an activator or repressor.
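
The sign rule and the agreement check can be sketched as below. The function names and the "no effect" label for a weight of exactly zero are assumptions added for completeness; the notes only cover positive and negative weights.

```python
def classify_regulator(weight):
    """Map the sign of a fitted weight to a regulatory role."""
    if weight > 0:
        return "activator"
    if weight < 0:
        return "repressor"
    return "no effect"  # assumption: the notes do not cover a zero weight


def sign_agreement(predicted_weights, documented_roles):
    """Fraction of regulators whose predicted role matches the documented one.

    predicted_weights: dict mapping regulator name -> fitted weight w
    documented_roles: dict mapping regulator name -> "activator"/"repressor"
    Only regulators present in both dicts are compared.
    """
    shared = [name for name in predicted_weights if name in documented_roles]
    if not shared:
        return 0.0
    matches = sum(
        classify_regulator(predicted_weights[name]) == documented_roles[name]
        for name in shared
    )
    return matches / len(shared)
```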

There were many sources of error, including:

  • YEASTRACT not having all of the information regarding genes and their relationship to other genes
  • Experimental noise
  • The least squares procedure, which affects how well the parameters converge together on the optimized value
    • To address this, the procedure was run 100 times with arbitrary initial values, and the parameters that gave the best profile were singled out.

Comparison with linear model

The linear model is shown in Eq 7.

  • Table 1 shows results from the linear model.
  • Figure 2 shows histograms comparing, for the nonlinear and linear models, the minimum number of regulators tested before the correct profile was given.
  • In Table 1, the error function shows that the fit is one degree better for the nonlinear model than for the linear model.
  • The best fit from the nonlinear algorithm was also compared to YEASTRACT as well as to another paper by Chen et al.
    • No matches occurred with the predictions of Chen et al., and only 5/40 were correct for the nonlinear model.

Discussion

  • An algorithm (nonlinear model) that assumed that the target gene's profile is a result of a specific regulator was used to help model the cell cycle.
  • The observed, experimental data was compared to the output of the model.
    • This difference was minimized with the least squares function (Eq 6).
  • The pairs whose modeled profiles best matched the data were chosen; the algorithm correctly identified around 40% of the pairs, as verified with YEASTRACT.
  • This model takes into account all the possible connections between target genes and their potential regulators.
    • To run at a larger scale than 40 targets and 184 potential regulators, the algorithm would need to be parallelized.
  • The nonlinear model predicts and fits the data better than the linear model.
  • The three approaches (Chen et al., nonlinear, and linear) each produced different connections and data sets.
  • This study focused on a simplified case of gene regulatory networks.
    • Because it is simplified, the data used to build this network came from previous studies; however, because not all information about particular target and regulatory genes has been gathered, some false inferences could have been drawn from what was observed.
    • The study also did not include the target genes themselves in the regulatory pool, which could have skewed the results. Some target genes may regulate other target genes or themselves.
  • Models cannot be applied to a whole organism under all conditions, only to specific environments or to tests of isolated responses.
  • The algorithm constructed for this paper can be used to figure out the transcription network in other organisms and their gene regulation.

Conclusions

  • The focus of this study was to describe and understand the relationship between the target genes and their regulators and to understand the basic transcriptional regulation of those genes.
    • The model can correctly identify whether regulators are activators or repressors.
    • It also helps determine the strength of the effect regulators have on their target genes.

Vocabulary

  1. Co-ordinately controlled genes:
  2. Putative:
  3. Nonlinear differential equation:
  4. Recursive:
  5. Reconstructed profile:
  6. Levenberg-Marquardt procedure:
  7. Specificity of prediction:
  8. Parallelized:
  9. Proteomic:

Back to User Page: User:Natalie Williams
To view the Course and Assignments:BIOL398-04/S15