Kara M Dismuke Week 10 Journal

From OpenWetWare
Revision as of 19:15, 23 March 2015 by Kara M Dismuke (talk | contribs) (→‎Discussion: all bullets for discussion section)

Outline

Introduction

Regulation of gene expression

  • important process in cell
  • takes static information (in DNA) and transmits it into protein molecules (that serve various functions)
  • requires recognition of specific promoter sequences
  • the effects of transcription change as the cell changes/develops

Microarrays

  • document changes in gene expression over time
    • analysis of these changes can enable one to see a relationship between genes and their regulators
  • use microarray data to track the interaction between genes and their regulators

Saccharomyces cerevisiae

  • gene-expression data gathered from genome-wide microarrays
  • data analyzed using clustering methods
  • data modeled using singular value decomposition
  • genes were grouped according to their transcriptional regulatory networks (i.e. relationship between the genes and their respective regulators/promoters)

Previous Studies

  • use differential equations to try to develop a linear model that reflects the transcription pattern of each of the genes being studied
  • Woolf and Wang: used "fuzzy logic" to try to do this
  • Nachman: used kinetic model and Bayesian networks
  • Bar-Joseph: used genomic information and analysis of gene expression data
    • Wang and Makita: building on the Bar-Joseph approach, they analyzed the promoter sequences and the sigma factor binding sequence motif

This Paper

  • alternative method b/c uses a nonlinear differential equation model
  • Procedure
    • choose set of all potential regulators (chose pool of 184)
    • choose set of target genes of S. cerevisiae (chose 40)
    • picks genes from possible regulators and applies model to then compare results to information known about the target gene
      • repeated to exhaust all possibilities
      • determine which regulators correctly model the target gene's expression
  • compare results and make conclusions using results from other studies & also a comparison of the linear model
  • result: this method can correctly identify a target gene's specific regulator and can say whether or not that regulator is an activator or repressor

Results

Dynamic model of transcription control

Model's Assumptions

  • recursive action of regulators on target gene (over time)
  • regulatory effect on gene can be expressed with a combination of its regulators

Equations

EQUATION 1

  • b: parameter that represents the initial delay or unspecific bias from regulatory effects associated with gene expression
  • g: regulatory effect for particular gene
  • wj: regulatory weights
  • yj: expression level of regulators
  • j=1,2,...m
    • m: the number of regulators controlling the gene
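
One form of Equation 1 that is consistent with the symbol definitions above is a weighted sum of the regulator expression levels plus the bias term. This is a reconstruction from the listed symbols, not a copy of the paper's typesetting:

```latex
g = \sum_{j=1}^{m} w_j \, y_j + b
```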

EQUATION 2

  • ρ: regulatory effects of other genes
  • x: effect of degradation
  • degradation: x = k*z where k is a constant in this kinetic equation
  • ρ and x make up the rate of expression of a target gene (dz/dt)

EQUATION 3

  • z: target expression level
  • complete model for control of target gene expression z

EQUATION 4

  • k1: maximal rate of expression
  • k2: rate of degradation of target gene product
  • simplification of Equation 3
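
A minimal sketch of how a single-regulator model of this kind could be integrated numerically. It assumes Equation 4 has the form dz/dt = k1 / (1 + exp(-(w*y + b))) - k2*z. The sigmoid shape and the sign convention on b are assumptions and are not stated in these notes. The function names are hypothetical, and a fixed-step classical Runge-Kutta scheme stands in for MATLAB's adaptive ode45.

```python
import math


def dz_dt(z, y, w, b, k1, k2):
    """Assumed form of Equation 4: sigmoid-limited production minus degradation."""
    g = w * y + b  # regulatory effect of the single regulator
    return k1 / (1.0 + math.exp(-g)) - k2 * z


def simulate_target(y_of_t, z0, w, b, k1, k2, t_end, steps):
    """Integrate z(t) from 0 to t_end with fixed-step classical RK4.

    y_of_t is a callable giving the regulator expression level at time t.
    Returns a list of (t, z) pairs with steps + 1 entries.
    """
    h = t_end / steps
    z = z0
    trajectory = [(0.0, z)]
    for i in range(steps):
        t = i * h
        s1 = dz_dt(z, y_of_t(t), w, b, k1, k2)
        s2 = dz_dt(z + 0.5 * h * s1, y_of_t(t + 0.5 * h), w, b, k1, k2)
        s3 = dz_dt(z + 0.5 * h * s2, y_of_t(t + 0.5 * h), w, b, k1, k2)
        s4 = dz_dt(z + h * s3, y_of_t(t + h), w, b, k1, k2)
        z += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
        trajectory.append((t + h, z))
    return trajectory
```

With a constant regulator level, the simulated target settles at the value where production and degradation balance, k1 / (1 + exp(-(w*y + b))) / k2, which is a quick sanity check on the parameters.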

EQUATION 5

  • y: approximated with polynomial of degree n

EQUATION 6

  • Once we have the expression profiles Z {z(t)} of the target and Y {y(t)} of the regulator genes, we search for gene profiles that minimize the mean square error function.
  • t: 1,2,...Q
    • Q: number of data points computed using Equation 4
  • {zc(t)}: reconstructed profile of z(t) in Z at all time points
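
The error function described above can be sketched as follows. The normalization by Q is an assumption based on the phrase "mean square error", and the function name is hypothetical:

```python
def mean_square_error(z_measured, z_computed):
    """Mean of squared differences between measured z(t) and reconstructed zc(t)."""
    if len(z_measured) != len(z_computed):
        raise ValueError("profiles must have the same number of time points")
    q = len(z_measured)  # Q, the number of time points
    return sum((zm - zc) ** 2 for zm, zc in zip(z_measured, z_computed)) / q
```

A regulator whose reconstructed profile matches the measured target profile exactly gives an error of 0, and larger values indicate a poorer fit.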

EQUATION 7

  • Linear form of the model
  • parameters di (i=0,1,2): computed by minimizing the error function in Equation 6

Computational Algorithm

  • estimate expression profile of target gene in order to choose a set of potential regulators for a particular target gene
    • search for potential regulators uses Equations 4 and 6
  • approximate regulator gene profile by polynomial of degree n
  • algorithm
    1. fit regulators using Equation 5
    2. choose target gene
    3. choose a regulatory gene from pool of possible regulators
    4. use least squares minimization on the target and regulator genes
    5. repeat for all possible regulators (step 3)
    6. choose regulators that best satisfy criterion
    7. repeat for all target genes (step 2)
  • procedure in algorithm was done 100 times for each pair of regulator and target gene
  • optimization done using Levenberg-Marquardt procedure
    • uses Runge-Kutta procedure (MATLAB's ode45 function)
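
The looping structure of the steps above (score every candidate regulator for every target, then keep the best) can be sketched as below. The `fit_error` argument is a placeholder for the actual fitting step, which in the paper is a Levenberg-Marquardt fit of Equation 4 followed by evaluating Equation 6. All names here are hypothetical:

```python
def rank_regulators(targets, regulators, fit_error, keep=1):
    """Exhaustively score every regulator for every target gene.

    targets and regulators map gene names to expression profiles (lists).
    fit_error(target_profile, regulator_profile) returns a number where
    lower means a better fit. Returns, for each target, the names of the
    `keep` regulators with the lowest error.
    """
    best = {}
    for target_name, target_profile in targets.items():
        scored = sorted(
            (fit_error(target_profile, reg_profile), reg_name)
            for reg_name, reg_profile in regulators.items()
        )
        best[target_name] = [reg_name for _, reg_name in scored[:keep]]
    return best
```

Passing the scoring step in as a function keeps the search loop independent of the model, so the same loop could be run with the nonlinear or the linear fit.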

Dataset selection

  • evaluated model by using Spellman's dataset
    • changes in gene expression: 18 time points over 2 cell cycle periods
    • chip had 6178 open reading frames
    • Spellman identified 800 genes associated with the cell cycle, but in reality far fewer regulators control the cell cycle
  • this paper:
    • 184 possible regulators (chosen based on YEASTRACT data and other papers' data)
    • chose 40 target genes (ones from Chen's paper)

Inference of Regulators

  • data: log base 2 of the ratio of the RNA amount to a standard
  • prior to analysis, data was squared
  • least squares minimization on each target gene for all possible regulators

EQUATION 8

  • approximation of unknown real profile of a target gene
    • contains error, but this error can be estimated by this polynomial fit and/or a statistical model

EQUATION 9

  • deviation from experimental data
  • find "best" regulator for given target gene by finding regulator profile
    • regulator profile- based on using model (equation 4) and minimizing E (equation 6)

Table 1: summary of 'correct identification' of regulators for all targets

  • correct identification= gene identified as regulator for given target was also the regulator according to YEASTRACT
    • note: YEASTRACT is a good resource, but still a work in progress
  • correct identification- only 35% of the time, but false positive rate (FP) was very low
    • FP rate: ratio between regulators identified as FPs and total number of potential regulators
  • as the criterion is relaxed, the percentage found increased
    • most increases can be attributed to a few targets (YOR323C, YJL155C, YDR285W, and YAL018C)
    • specificity of the prediction: SP= (N-FP)/N
      • N: number of potential regulators
      • FP: number of false positives
  • regulators either activate or repress the target gene
    • classification was made based on the sign of the weight (w)
    • algorithm correctly identified regulator as activator or repressor 75% of the time (based on YEASTRACT data)
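
As a worked illustration of the specificity formula SP = (N-FP)/N and the sign-based classification above, here is a small sketch. The false-positive count in the usage note is made up for illustration, and treating a positive weight as activation is an assumption not spelled out in these notes:

```python
def specificity(n_regulators, false_positives):
    """SP = (N - FP) / N for N potential regulators and FP false positives."""
    return (n_regulators - false_positives) / n_regulators


def classify_regulator(w):
    """Classify by the sign of the weight w (assumed: positive = activator)."""
    return "activator" if w > 0 else "repressor"
```

For example, with the pool of N = 184 potential regulators and a hypothetical FP = 2, `specificity(184, 2)` gives 182/184, roughly 0.989.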

Sources of Error

  • YEASTRACT being incomplete
  • experimental noise
  • risk that the least squares minimization procedure does not yield optimal solution (although, there was attempt to avoid this by changing parameters and repeating the procedure 100 times for each target/regulator)

Comparison with linear model

  • Linear model= Equation 7
  • Figure 2: compares the nonlinear and linear models by the minimum number of regulators that had to be tested before the correct profile was obtained
  • Table 1: nonlinear model fits better (by one order of magnitude) than linear model
  • Regulators identified as "best fit" were compared against YEASTRACT and Chen's paper
    • No matches with results in Chen's paper
    • Nonlinear model prediction was the same as the linear model prediction in only 5 out of the 40 cases

Discussion

  • create non-linear model that generates target gene expression profile from a specific regulator to help model the cell cycle
  • minimize difference between measured target gene profile and profile computed from the regulator
  • model can correctly identify regulators of target genes and determine their function as either an activator or repressor
  • algorithm models all possible combinations of target/regulator and chooses from the pool of regulators to get the best predictions
  • get complete information on the ability of each regulator to model each target gene profile, so as to then determine the "best" regulator
  • in comparison with the linear model, the nonlinear model gives much better results as it correctly identifies more regulators and gives a better fit of the computed target gene expression profile
  • comparing the results from Chen's model, the nonlinear model, and the linear model gives different sets of genes
  • focused on modeling simple case
    • relies on outside knowledge (and if a regulator is not identified in outside sources, then it escapes being modeled)
    • interactions between regulators and interactions between genes may skew results (not accounted for)
  • models are designed for particular cases, and this model was successful in capturing the behavior of transcriptional regulation with a fair amount of accuracy, as well as identifying the regulators' function
  • algorithm can be extended for use with a different organism's data set
  • in future, model will be able to handle more complexity in the transcriptional regulatory interactions

Conclusions

Definitions

  1. transcription
    • Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript. As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement. Also unlike DNA replication where DNA is synthesized, transcription does not involve an RNA primer to initiate RNA synthesis.
    • http://www.biology-online.org/dictionary/Transcription
  2. RNA polymerase
    • An enzyme that is responsible for making RNA from a DNA template. In all cells RNAP is needed for constructing RNA chains from a DNA template, a process termed transcription. In scientific terms, RNAP is a nucleotidyl transferase that polymerizes ribonucleotides at the 3' end of an RNA transcript. RNA polymerase enzymes are essential and are found in all organisms, cells, and many viruses.
    • http://www.biology-online.org/dictionary/RNA_polymerase
  3. promoter
  4. activator
  5. repressor
  6. regulator
  7. mRNA
  8. gene expression
  9. putative
  10. combinatorial
    • Any system using a random assortment of components at any positions in the linear arrangement of atoms, i.e., a combinatorial library of mutations could contain positions where all four bases have been randomly inserted.
    • http://www.biology-online.org/dictionary/Combinatorial