Kara M Dismuke Week 10 Journal

From OpenWetWare

Journal Club 2 Group Presentation PPT: JC2 PPT




Regulation of gene expression

  • important process in cell
  • takes static information (in DNA) and transmits it into protein molecules (that serve various functions)
  • requires recognition of specific promoter sequences
  • the effects of transcription change as the cell changes/develops


  • document changes in gene expression over time
    • analysis of these changes can enable one to see a relationship between genes and their regulators
  • use microarray data to track the interaction between genes and their regulators

Saccharomyces cerevisiae

  • gene-expression data gathered from genome-wide microarrays
  • data analyzed using clustering methods
  • data modeled using singular value decomposition
  • genes were grouped according to their transcriptional regulatory networks (i.e. relationship between the genes and their respective regulators/promoters)

Previous Studies

  • use differential equations to try to develop a linear model that reflects the transcription pattern of each of the genes being studied
  • Woolf and Wang: used "fuzzy logic" to try to do this
    • Nachman: used kinetic model and Bayesian networks
    • Bar-Joseph: used genomic information and analysis of gene expression data
      • Wang and Makita: building on the Bar-Joseph approach, they looked at the analysis of the promoter sequences and the sigma factor binding sequence motif

This Paper

  • alternative method because it uses a nonlinear differential equation model
  • Procedure
    • choose set of all potential regulators (chose pool of 184)
    • choose set of target genes of S. cerevisiae (chose 40)
    • pick a gene from the pool of possible regulators, apply the model, and compare the result to the information known about the target gene
      • repeated to exhaust all possibilities
      • determine which regulators correctly model the target gene's expression
  • compare results and draw conclusions using results from other studies and a comparison with the linear model
  • result: this method can correctly identify a target gene's specific regulator and can say whether that regulator is an activator or a repressor


Dynamic model of transcription control

Model's Assumptions

  • recursive action of regulators on target gene (over time)
  • regulatory effect on gene can be expressed with a combination of its regulators

Image: Mml-math-1.gif

  • b: parameter that represents the initial delay or unspecific bias from regulatory effects associated with gene expression
  • g: regulatory effect for particular gene
  • wj: regulatory weights
  • yj: expression level of regulators
  • j=1,2,...m
    • m: the number of regulators controlling the gene
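
The definitions above suggest that g is a weighted sum of the regulator expression levels plus the bias b. The equation itself survives only as an image, so the form below is an assumption, and the function name and example numbers are hypothetical. A minimal sketch:

```python
def regulatory_effect(weights, levels, bias):
    """Combined regulatory effect g = sum_j(w_j * y_j) + b for one target gene.

    Assumed form: the outline only says that g combines the regulators
    y_j through weights w_j and a bias b.
    """
    if len(weights) != len(levels):
        raise ValueError("need exactly one weight per regulator")
    return sum(w * y for w, y in zip(weights, levels)) + bias


# Hypothetical example with m = 2 regulators.
g = regulatory_effect(weights=[0.5, -1.0], levels=[2.0, 1.0], bias=0.25)
```

In this sketch a positive weight pushes g up as the regulator level rises, and a negative weight pushes it down.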

Image: Mml-math-2.gif

  • ρ: regulatory effects of other genes
  • x: effect of degradation
  • degradation: x = k*z where k is a constant in this kinetic equation
  • ρ and x make up the rate of expression of a target gene (dz/dt)

Image: Mml-math-3.gif

    • z: target expression level
    • complete model for control of target gene expression z

Image: Mml-math-4.gif

  • k1: maximal rate of expression
  • k2: rate of degradation of target gene product
  • simplification of Equation 3
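
To make the roles of k1 and k2 concrete, here is a minimal simulation sketch for a single regulator. It assumes the regulatory effect passes through a sigmoid, so that synthesis saturates at k1, and that degradation is k2*z. Because the equation images are not reproduced here, treat that exact form as an assumption. A fixed-step classical Runge-Kutta integrator stands in for an adaptive solver, and all function names and parameter values are hypothetical.

```python
import math


def dz_dt(z, y, w, b, k1, k2):
    """Assumed model: sigmoid-limited synthesis minus first-order degradation."""
    g = w * y + b                      # regulatory effect of one regulator
    rho = 1.0 / (1.0 + math.exp(-g))   # squashes g into (0, 1)
    return k1 * rho - k2 * z


def simulate(y_of_t, z0, w, b, k1, k2, t_end, steps):
    """Integrate z(t) from 0 to t_end with fixed-step RK4; returns [(t, z), ...]."""
    h = t_end / steps

    def f(t, z):
        return dz_dt(z, y_of_t(t), w, b, k1, k2)

    t, z = 0.0, z0
    trace = [(t, z)]
    for _ in range(steps):
        s1 = f(t, z)
        s2 = f(t + h / 2, z + h / 2 * s1)
        s3 = f(t + h / 2, z + h / 2 * s2)
        s4 = f(t + h, z + h * s3)
        z += h / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        t += h
        trace.append((t, z))
    return trace


# Hypothetical oscillating regulator driving a target as an activator (w > 0).
trace = simulate(lambda t: math.sin(t), z0=0.0, w=2.0, b=0.0,
                 k1=1.0, k2=0.5, t_end=10.0, steps=100)
```

With a constant regulator level the sketch settles at z = k1*rho/k2, which is a quick sanity check on any fitted parameters.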

Image: Mml-math-5.gif

  • y: approximated with polynomial of degree n
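
One way to get a polynomial approximation of a sampled regulator profile is an ordinary least-squares fit. The sketch below solves the normal equations directly in plain Python. This is an illustrative choice rather than the paper's stated procedure, and the helper names are hypothetical. Normal equations become ill-conditioned at high degree, so this is only suitable for small n.

```python
def polyfit(ts, ys, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... + c[degree]*t**degree."""
    n = degree + 1
    # Normal equations A c = r, with A[i][j] = sum(t**(i+j)), r[i] = sum(y * t**i).
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    r = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            r[row] -= factor * r[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (r[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs, t):
    """Evaluate the polynomial at t using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


# Hypothetical regulator profile sampled at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
levels = [1.0, 6.0, 17.0, 34.0, 57.0]
coeffs = polyfit(times, levels, degree=2)
```

The fitted polynomial gives a smooth y(t) that can be evaluated at any time, which is what a differential equation solver needs between the measured time points.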

Image: Mml-math-6.gif

  • Once we have the expression profiles Z {z(t)} of the target and Y {y(t)} of the regulator genes, we search for gene profiles that minimize the mean square error function.
  • t: 1,2,...Q
    • Q: number of data points
  • {zc(t)}: reconstructed profile of z(t) in Z at all time points, computed using Equation 4
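
As a small illustration of the error function, the sketch below computes a mean square error between a measured profile and a reconstructed one. It assumes the sum of squared differences is divided by the number of points Q. The function name and the sample numbers are hypothetical.

```python
def mean_square_error(measured, reconstructed):
    """E = (1/Q) * sum over t of (z(t) - zc(t))**2, assuming normalisation by Q."""
    if len(measured) != len(reconstructed):
        raise ValueError("profiles must have the same number of time points")
    q = len(measured)
    return sum((z - zc) ** 2 for z, zc in zip(measured, reconstructed)) / q


# Hypothetical three-point profiles.
error = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A perfect reconstruction gives E = 0, and the regulator whose reconstructed profile gives the smallest E is the best candidate for that target.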

Image: Mml-math-7.gif

  • Linear form of the model
  • parameters di (i=0,1,2): computed by minimizing the error function (Equation 6)

Computational Algorithm

  • estimate expression profile of target gene in order to choose a set of potential regulators for a particular target gene
    • search for potential regulators uses Equations 4 and 6
  • approximate regulator gene profile by polynomial of degree n
  • algorithm
    1. fit regulators using Equation 5
    2. choose target gene
    3. choose a regulatory gene from pool of possible regulators
    4. use least squares minimization on the target and regulator genes
    5. repeat for all possible regulators (step 3)
    6. choose regulators that best satisfy criterion
    7. repeat for all target genes (step 2)
  • procedure in algorithm was done 100 times for each pair of regulator and target gene
  • optimization done using Levenberg-Marquardt procedure
    • uses Runge-Kutta procedure (MATLAB's ode45 function)
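
The search in steps 2 through 7 can be sketched as two nested loops. The sketch below leaves out the polynomial fit of step 1 and abstracts the model fit behind a `fit_error` callback. In the real procedure that callback would run the least squares optimization, but here `toy_error` is a hypothetical stand-in (a plain mean square difference), and the gene names are made up.

```python
def rank_regulators(target_profile, regulator_profiles, fit_error):
    """Steps 3-6: score every candidate regulator, sorted by ascending error."""
    scored = [(name, fit_error(target_profile, profile))
              for name, profile in regulator_profiles.items()]
    return sorted(scored, key=lambda pair: pair[1])


def infer_regulators(target_profiles, regulator_profiles, fit_error):
    """Steps 2 and 7: repeat the ranking for every target gene."""
    return {target: rank_regulators(profile, regulator_profiles, fit_error)
            for target, profile in target_profiles.items()}


def toy_error(target, regulator):
    """Hypothetical stand-in for the model fit: mean square difference."""
    return sum((z - y) ** 2 for z, y in zip(target, regulator)) / len(target)


# Made-up profiles for one target and two candidate regulators.
targets = {"TARGET_1": [0.0, 1.0, 2.0]}
regulators = {"REG_A": [2.0, 1.0, 0.0], "REG_B": [0.0, 1.0, 2.5]}
ranking = infer_regulators(targets, regulators, toy_error)
```

Keeping the full sorted list, rather than only the single best regulator, is what makes it possible to relax the selection criterion later.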

Dataset selection

  • evaluated model by using Spellman's dataset
    • changes in gene expression: 18 time points over 2 cell cycle periods
    • chip had 6178 open reading frames
    • Spellman identified 800 genes associated with the cell cycle, but in reality, there are far fewer regulators controlling the cell cycle
  • this paper:
    • 184 possible regulators (chosen based on YEASTRACT data and other papers' data)
    • chose 40 target genes (ones from Chen's paper)

Inference of Regulators

  • data: log base 2 of the ratio of RNA amount to a standard
  • prior to analysis, data was squared
  • least squares minimization on each target gene for all possible regulators

Image: Mml-math-8.gif

  • approximation of the unknown real profile of a target gene
    • contains error, but this error can be estimated from the polynomial fit and/or a statistical model

Image: Mml-math-9.gif

  • deviation from experimental data
  • find "best" regulator for given target gene by finding regulator profile
    • regulator profile- based on using model (equation 4) and minimizing E (equation 6)

Table 1: summary of 'correct identification' of regulators for all targets

  • correct identification= gene identified as regulator for given target was also the regulator according to YEASTRACT
    • note: YEASTRACT is a good resource, but still a work in progress
  • correct identification: only 35% of the time, but the false positive (FP) rate was very low
    • FP rate: ratio between regulators identified as FPs and total number of potential regulators
  • as the criterion is softened, the percentage found increases
    • most increases can be attributed to a few targets (YOR323C, YJL155C, YDR285W, and YAL018C)
    • specificity of the prediction: SP= (N-FP)/N
      • N: number of potential regulators
      • FP: number of false positives
  • regulators either activate or repress the target gene
    • classification was made based on the sign of the weight (w)
    • algorithm correctly identified regulator as activator or repressor 75% of the time (based on YEASTRACT data)
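
The specificity formula and the sign-based classification above can be written as two small helpers. The mapping of a positive weight to "activator" is an assumption that follows from a model in which a larger weighted input raises expression. The function names and example numbers are hypothetical.

```python
def specificity(n_potential, n_false_positive):
    """SP = (N - FP) / N, with N potential regulators and FP false positives."""
    if n_potential <= 0:
        raise ValueError("N must be positive")
    return (n_potential - n_false_positive) / n_potential


def classify_regulator(weight):
    """Classify by the sign of the fitted weight w (assumed: w > 0 activates)."""
    if weight > 0:
        return "activator"
    if weight < 0:
        return "repressor"
    return "undetermined"


# Hypothetical counts: a pool of 184 regulators with 46 false positives.
sp = specificity(184, 46)
```

With the pool of 184 regulators, every false positive lowers SP by 1/184, so a handful of false positives still leaves the specificity close to 1.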

Sources of Error

  • YEASTRACT being incomplete
  • experimental noise
  • risk that the least squares minimization procedure does not yield the optimal solution (although the authors tried to avoid this by changing parameters and repeating the procedure 100 times for each target/regulator pair)

Comparison with linear model

  • Linear model= Equation 7
  • Figure 2: compares the nonlinear and linear models by min(m), the number of regulators that had to be tested before reaching a correct one
  • Table 1: nonlinear model fits better (by one order of magnitude) than linear model
  • Regulators identified as "best fit" were compared against YEASTRACT and Chen's paper
    • No matches with results in Chen's paper
    • Nonlinear model prediction was the same as the linear model prediction in only 5 out of the 40 cases


  • create non-linear model that generates target gene expression profile from a specific regulator to help model the cell cycle
  • minimize difference between measured target gene profile and profile computed from the regulator
  • model can correctly identify regulators of target genes and determine their function as either an activator or repressor
  • algorithm models all possible combinations of target/regulator and chooses from the pool of regulators to get the best predictions
  • get complete information on the ability of each regulator to model each target gene profile, so as to then determine the "best" regulator
  • in comparison with the linear model, the nonlinear model gives much better results as it correctly identifies more regulators and gives a better fit of the computed target gene expression profile
  • comparing the results from Chen's model, the nonlinear model, and the linear model, we get different sets of genes
  • focused on modeling simple case
    • relies on outside knowledge (and if a regulator is not identified in outside sources, then it escapes being modeled)
    • interactions between regulators and interactions between genes may skew results (not accounted for)
  • models are designed for particular cases, and this model was successful in capturing the behavior of transcriptional regulation with a fair amount of accuracy, as well as in identifying the function of the regulators
  • algorithm can be extended for use with a different organism's data set
  • in future, model will be able to handle more complexity in the transcriptional regulatory interactions


  • focus of this study: understand relationship between target genes and their regulators and understand the basic transcriptional regulation of the genes
    • also, identify function of regulator as either an activator or repressor
  • improvements to the algorithm can be made in the future to better account for the number of computations the algorithm requires

Figures and Tables

Table 1: "Summary of identification of regulators for 40 selected yeast cell cycle regulated genes"

  • "best" column..."best" are regulators with smallest E
  • as constraints on E are loosened, the number of regulators found/identified increases (in one case E1 is multiplied by 1.1 and in the other, E1 is multiplied by 1.2)
  • min(m)...position of first correctly found regulator in list of regulators for given target
    • two columns compare linear and non-linear model results
  • E...E values
    • two columns list lowest E values obtained from linear model and non-linear model
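
The relaxed criteria and the min(m) column can be illustrated with two helpers that operate on a list of regulators sorted by ascending E. The sketch assumes E1 is the smallest error in that list. The names and numbers are hypothetical.

```python
def within_tolerance(ranked, factor):
    """Names of regulators whose error satisfies E <= factor * E1.

    ranked: list of (name, E) pairs sorted by ascending E, so E1 = ranked[0][1].
    """
    e1 = ranked[0][1]
    return [name for name, e in ranked if e <= factor * e1]


def min_m(ranked, known_regulators):
    """1-based position of the first known regulator in the list, or None."""
    for position, (name, _) in enumerate(ranked, start=1):
        if name in known_regulators:
            return position
    return None


# Made-up ranked list for one target gene.
ranked = [("REG_A", 1.0), ("REG_B", 1.05), ("REG_C", 1.15), ("REG_D", 2.0)]
best = within_tolerance(ranked, 1.0)
relaxed = within_tolerance(ranked, 1.2)
first_hit = min_m(ranked, {"REG_C"})
```

Here the strict criterion keeps only the top entry, while the factor of 1.2 also admits the second and third entries, which is how loosening the constraint on E raises the number of regulators found.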

Figure 1

  • 2 sections: A (graphs for regulators found that are repressors) and B (graphs for regulators found that are activators)
  • x-axis: 18 time points used in model's simulation (to obtain data)
  • y-axis: expression relative to time point zero
  • names of target/regulator pair
  • dotted line: reconstructed target (outputs from simulation)
  • solid line: regulator that best fits the data for the specific target gene
  • A: repressors have "opposite" curves to the target gene and its reconstructed profile
    • more repression, less gene expression (and less repression, more gene expression)
  • B: activators have a similar curve when compared to that of the target gene and its reconstructed profile
    • more activation, more gene expression (and less activation, less gene expression)

Figure 2: Histogram of the distribution of the order of correctly identified regulators in the sorted list of potential regulators (the two min(m) columns from Table 1)

  • A: results from nonlinear model (equation 4)
  • B: results from linear model (equation 7)
  • in comparing the two, one sees that the nonlinear model found the correct regulators within a smaller pool of candidates (highest value for A was approximately between 19 and 21; highest value for B was approximately between 33 and 35)

Further Questions

What is the main result presented in this paper?

  • The main result presented in this paper was the discovery of a nonlinear method that can accurately pair regulators to their respective target genes, while also determining the function of the regulator (as a repressor or activator).

What is the importance or significance of this work?

  • After reading this paper, it is clear there is no "end all be all" model that describes the relationships between regulators and their target genes. However, the nonlinear model proved more effective at accurately identifying and describing these relationships than a linear model. In addition, the authors noted the specificity of the case presented in the article and do not want their approach to eliminate all others; rather, they want the approaches to work in conjunction with each other in order to develop a fuller overall picture of the regulatory processes within yeast cells.

The methods are described at points in the outline.

  • 40 target genes
  • pool of 184 potential regulators
  • least squares minimization used to help fit the data
  • for each target/regulator pair, the procedure was done 100 times with different initial conditions to obtain optimum fit


  1. transcription
    • Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript. As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement. Also unlike DNA replication, where DNA is synthesized, transcription does not involve an RNA primer to initiate RNA synthesis.
    • http://www.biology-online.org/dictionary/Transcription
  2. RNA polymerase
    • An enzyme that is responsible for making RNA from a DNA template. In all cells RNAP is needed for constructing RNA chains from a DNA template, a process termed transcription. In scientific terms, RNAP is a nucleotidyl transferase that polymerizes ribonucleotides at the 3' end of an RNA transcript. RNA polymerase enzymes are essential and are found in all organisms, cells, and many viruses.
    • http://www.biology-online.org/dictionary/RNA_polymerase
  3. promoter
  4. activator
  5. repressor
  6. regulator
  7. mRNA
  8. gene expression
  9. putative
  10. combinatorial
    • Any system using a random assortment of components at any positions in the linear arrangement of atoms, i.e., a combinatorial library of mutations could contain positions where all four bases have been randomly inserted.
    • http://www.biology-online.org/dictionary/Combinatorial