Tessa A. Morris Week 10

From OpenWetWare

Article

Transcriptional Regulatory Networks in Saccharomyces cerevisiae

Presentation

Partners: Kristen M. Horstmann and Lucia I. Ramirez
Presentation

10 Biological Terms

  1. Nucleate: To form a nucleus; to act as a nucleus (for). source
  2. Chromatin: A complex of nucleic acids (e.g. DNA or RNA) and proteins (histones), which condenses to form a chromosome during cell division. In eukaryotic cells, it is found within the cell nucleus whereas in prokaryotic cells, it is found within the nucleoid. Its functions are to package DNA into a smaller volume to fit in the cell, strengthen the DNA to allow mitosis and meiosis, and to serve as a mechanism to control expression. source
  3. Motifs: The smallest group of atoms in a polymer that, when under the influence of a rotation-translation operator, will assemble the rest of the atoms in the chain. source
  4. Genome-wide location analysis: a tool for identifying protein–DNA interaction sites on a genomic scale source
  5. myc epitope tag: Epitope tagging is a technique in which a known epitope is fused to a recombinant protein by means of genetic engineering. By choosing an epitope for which an antibody is available, the technique makes it possible to detect proteins for which no antibody is available. This is especially useful for the characterization of newly discovered proteins and proteins of low immunogenicity. By selection of the appropriate epitope and antibody pair, it is possible to find a combination with properties that are suitable for the desired experimental application, such as Western blot analysis, immunoprecipitation, immunochemistry, and affinity purification. source
  6. Immunoblot analysis (western blotting) is a rapid and sensitive assay for the detection and characterization of proteins that works by exploiting the specificity inherent in antigen-antibody recognition. It involves the solubilization and electrophoretic separation of proteins, glycoproteins, or lipopolysaccharides by gel electrophoresis, followed by quantitative transfer and irreversible binding to nitrocellulose, PVDF, or nylon. source
  7. Peptone: The soluble and diffusible substance or substances into which albuminous portions of the food are transformed by the action of the gastric and pancreatic juices. Peptones are also formed from albuminous matter by the action of boiling water and boiling dilute acids. Collectively, in a broader sense, all the products resulting from the solution of albuminous matter in either gastric or pancreatic juice. In this case, however, intermediate products (albumose bodies), such as antialbumose, hemialbumose, etc., are mixed with the true peptones. Also termed albuminose. Pure peptones are of three kinds, amphopeptone, antipeptone, and hemipeptone, and, unlike the albumose bodies, are not precipitated by saturating their solutions with ammonium sulphate. source
  8. Dextrose: a sirupy, or white crystalline, variety of sugar, C6H12O6 (so called from turning the plane of polarization to the right), occurring in many ripe fruits. Dextrose and levulose are obtained by the inversion of cane sugar or sucrose, and hence called invert sugar. Dextrose is chiefly obtained by the action of heat and acids on starch, and hence called also starch sugar. It is also formed from starchy food by the action of the amylolytic ferments of saliva and pancreatic juice. The solid products are known to the trade as grape sugar; the sirupy products as glucose, or mixing sirup. These are harmless, but are only about half as sweet as cane or sucrose. source
  9. Chromatin immunoprecipitation: detecting interactions between a protein and a DNA sequence in vivo source
  10. Thiamine: Chemical name: thiazolium, 3-((4-amino-2-methyl-5-pyrimidinyl)methyl)-5-(2-hydroxyethyl)-4-methyl-, chloride. A B vitamin that prevents beriberi and maintains appetite and growth. More commonly known as vitamin B1 and found commonly in cereals, thiamine acts as a coenzyme used to break down sugars. source
  11. Fhl1 function: Regulator of ribosomal protein (RP) transcription source
  12. Locus: The location of a gene (or of a significant sequence) on a chromosome, as in genetic locus. source
  13. Epitope: That part of an antigenic molecule to which the T-cell receptor responds; a site on a large molecule against which an antibody will be produced and to which it will bind. source
  14. Abf1: functions in transcription, replication, gene silencing, and NER (nucleotide excision repair) in yeast source

Outline

Abstract

  • The authors determined how most of the transcriptional regulators encoded in Saccharomyces cerevisiae associate with genes across the genome in living cells; this information can be used to describe potential pathways that yeast cells use to regulate global gene expression programs
  • In this experiment, the scientists used this information to identify network motifs (simplest units of network architecture) and show that an automated process can use motifs to assemble a transcription regulatory network structure
  • Results: eukaryotic cellular functions are highly connected through networks of transcriptional regulators that regulate other transcriptional regulators

Introduction

  • Aim of paper is to understand how cells control global gene expression programs
  • Each cell is the product of specific gene expression programs involving regulated transcription of thousands of genes
  • Transcriptional programs are modified as cells progress through the cell cycle, in response to changes in environment, and during organismal development
  • Gene expression programs are dependent on the recognition of specific promoter sequences by transcriptional regulatory proteins
  • Regulatory proteins recruit and regulate chromatin-modifying complexes and components of the transcriptional apparatus
    • Knowledge of the sites bound by all the transcriptional regulators encoded in a genome can provide the information necessary to nucleate models for transcriptional regulatory networks
  • With the availability of complete genome sequences and development of a method for genome-wide binding analysis (genome-wide location analysis), investigators can identify the set of target genes bound in vivo by each of the transcriptional regulators that are encoded in a cell’s genome.

Experimental Design

  • Used genome-wide location analysis to investigate how yeast transcriptional regulators bind to promoter sequences across the genome
    • Figure 1-A:
      • Yeast transcriptional regulators were tagged by introducing the coding sequence for a c-myc epitope tag into the normal genomic locus for each regulator.
      • 106 of the yeast strains contained a single epitope-tagged regulator whose expression could be detected in rich growth conditions.
      • Chromatin immunoprecipitation (ChIP) was performed on each of these 106 strains.
      • Promoter regions enriched through the ChIP procedure were identified by hybridization to microarrays containing a genome-wide set of yeast promoter regions.
  • Studied all 141 transcription factors listed in the Yeast Proteome Database and reported to have DNA binding and transcriptional activity
  • Yeast strains were constructed so that each of the transcription factors contained a myc epitope tag.
  • Epitope tag coding sequences were introduced into the genomic sequences encoding the COOH terminus of each regulator to increase the likelihood that tagged factors were expressed at physiologic levels
  • Appropriate insertion of the tag and expression of the tagged protein were confirmed by polymerase chain reaction and immunoblot analysis.
  • Introduction of an epitope tag might have affected the function of some transcriptional regulators
    • For 17 of the 141 factors, they were not able to obtain viable tagged cells, despite three attempts to tag each regulator.
  • Not all the transcriptional regulators were expected to be expressed at detectable levels when yeast cells were grown in rich medium, but immunoblot analysis showed that 106 of the 124 tagged regulator proteins could be detected under these conditions.
  • Performed genome-wide location analysis experiment for the 106 yeast strains that expressed epitope-tagged regulators.
  • Each tagged strain was grown in three independent cultures in rich medium (yeast extract, peptone, and dextrose).
  • Genome-wide location data were subjected to quality control filters and normalized, then the ratio of immunoprecipitated to control DNA was determined for each array spot.
  • Confidence value (P-value) for each spot from each array was calculated using an error model.
    • Data for each of the three samples in an experiment were combined by a weighted average method
    • Each ratio was weighted by P-value and then averaged.
    • Final P values for these combined ratios were then calculated.
  • Error models were used to obtain a probabilistic assessment of regulator location data because of the properties of the biological system of study (cell populations, DNA binding factors capable of binding to both specific and nonspecific sequences) and the expectation of noise in microarray-based data
    • Figure 1-B: The total number of protein-DNA interactions in the location analysis data set, using a range of P value thresholds
      • Effect of P-value threshold.
      • The sum of all regulator-promoter region interactions is displayed as a function of varying P value thresholds applied to the entire location data set for the 106 regulators.
      • More stringent P values reduce the number of interactions reported but decrease the likelihood of false-positive results.
  • Specific P value thresholds were selected to facilitate discussion of a subset of the data at a high confidence level, but this artificially imposes a “bound or not bound” binary decision for each protein-DNA interaction.
  • The results were described at a P value threshold of 0.001 because the analysis indicated that this threshold maximizes inclusion of legitimate regulator-DNA interactions and minimizes false positives.
  • Various experimental and analytical methods indicate that the frequency of false positives in the genome-wide location data at the 0.001 threshold is 6% to 10%
    • Conventional, gene-specific chromatin immunoprecipitation experiments have confirmed 93 of 99 binding interactions (involving 29 different regulators) that were identified by location analysis data at a threshold P-value of 0.001.
  • Use of a high-confidence threshold should underestimate the regulator-DNA interactions that actually occur in these cells.
  • Estimated that about one-third of the actual regulator-DNA interactions in cells are not reported at the 0.001 threshold.
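
The replicate-combination and thresholding steps above can be illustrated with a small Python sketch. It is not the paper's error model, which these notes do not reproduce. It assumes each replicate yields a log ratio with a known standard error, combines them with inverse-variance weights, and converts the result to a one-sided P value under a normal approximation. The spot names, numbers, and the functions `combine_replicates` and `count_interactions` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def combine_replicates(log_ratios, std_errors):
    """Combine replicate log ratios for one array spot.

    Assumption (not from the paper): each replicate's log ratio is roughly
    normal around 0 for an unbound region, with a known standard error.
    """
    weights = [1.0 / (se * se) for se in std_errors]  # inverse-variance weights
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, log_ratios)) / total
    combined_se = sqrt(1.0 / total)
    z = mean / combined_se
    p_value = 1.0 - NormalDist().cdf(z)  # one-sided: enrichment only
    return mean, p_value


def count_interactions(p_values, threshold):
    """Number of spots called bound at a given P value threshold."""
    return sum(1 for p in p_values if p <= threshold)


# Hypothetical spots: (three replicate log ratios, their standard errors)
spots = {
    "spot_bound": ([1.2, 1.0, 1.4], [0.3, 0.3, 0.3]),
    "spot_weak": ([0.4, 0.2, 0.5], [0.3, 0.3, 0.3]),
    "spot_unbound": ([0.0, -0.1, 0.1], [0.3, 0.3, 0.3]),
}
combined = {name: combine_replicates(r, s) for name, (r, s) in spots.items()}
p_values = [p for _, p in combined.values()]

# Stricter thresholds report fewer interactions (the trend shown in Figure 1-B)
counts = {t: count_interactions(p_values, t) for t in (0.05, 0.01, 0.001)}
print(counts)

# Arithmetic from the outline: 93 of 99 tested interactions were confirmed
unconfirmed_fraction = (99 - 93) / 99
print(round(unconfirmed_fraction, 3))
```

In this toy data the weakly enriched spot passes at 0.05 but not at 0.001, which mirrors the trade-off described above: a stringent threshold lowers false positives at the cost of missing some real interactions. The unconfirmed fraction, about 6%, sits at the low end of the 6% to 10% false-positive range quoted in the outline.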

Regulator Density

  • There were nearly 4000 interactions observed between regulators and promoter regions at a P value threshold of 0.001.
  • The promoter regions of 2343 of 6270 yeast genes (37%) were bound by one or more of the 106 transcriptional regulators in yeast cells grown in rich medium.
  • Many yeast promoters were bound by multiple transcriptional regulators (Fig. 2A), a feature previously associated with gene regulation in higher eukaryotes, suggesting that yeast genes are also frequently regulated through combinations of regulators.
    • Figure 2-A:
      • Plot of the number of regulators bound per promoter region.
      • The distribution for the actual location data (red circles) is shown alongside the distribution expected from the same set of P values randomly assigned among regulators and intergenic regions (white circles).
      • At a P value threshold of 0.001, significantly more intergenic regions bind four or more regulators than expected by chance.
  • More than one-third of the promoter regions that are bound by regulators were bound by two or more regulators (P value threshold = 0.001), and, relative to the expected distribution from randomized data, a disproportionately high number of promoter regions were bound by four or more regulators.
  • Because of the stringency of the P value threshold, this represents an underestimate of regulator density.
    • Figure 2-B: The number of different promoter regions bound by each regulator in cells grown in rich medium ranged from 0 to 181 (P value threshold = 0.001), with an average of 38 promoter regions per regulator
      • Distribution of the number of promoter regions bound per regulator.
  • The regulator Abf1 bound the largest number (181) of promoter regions.
  • Regulators that should be active under growth conditions other than yeast extract, peptone, and dextrose were typically found, as expected, to bind the smallest number of promoter regions.
    • Thi2 (which activates transcription of thiamine biosynthesis genes under conditions of thiamine starvation) was among the regulators that bound the smallest number (3) of promoters.
  • Identification of a set of promoter regions that are bound by specific regulators allowed the authors to predict sequence motifs that are bound by these regulators
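
The density statistics in this section (regulators per promoter, promoters per regulator, fraction of promoters bound) can be computed from a table of binding calls. The sketch below uses a hypothetical toy table, not the paper's data; only the last line uses figures reported in the outline.

```python
from collections import Counter

# Hypothetical binding calls: regulator -> promoter regions bound at P <= 0.001
bound = {
    "RegA": {"p1", "p2", "p3"},
    "RegB": {"p2", "p3"},
    "RegC": {"p3"},
    "RegD": set(),  # a regulator that binds nothing in this condition
}
all_promoters = ["p1", "p2", "p3", "p4", "p5"]

# Regulators bound per promoter (the quantity plotted in Figure 2-A)
regulators_per_promoter = Counter(p for targets in bound.values() for p in targets)

# Promoters bound per regulator (the quantity plotted in Figure 2-B)
promoters_per_regulator = {reg: len(targets) for reg, targets in bound.items()}

bound_promoters = [p for p in all_promoters if regulators_per_promoter[p] > 0]
multi_bound = [p for p in bound_promoters if regulators_per_promoter[p] >= 2]

fraction_bound = len(bound_promoters) / len(all_promoters)
fraction_multi = len(multi_bound) / len(bound_promoters)
mean_targets = sum(promoters_per_regulator.values()) / len(bound)

# The same arithmetic on the figures reported above: 2343 of 6270 genes
reported_fraction = 2343 / 6270
print(round(reported_fraction * 100))  # 37
```

Comparing `regulators_per_promoter` against the same counts from shuffled P values would give the chance distribution that Figure 2-A uses as its baseline; that randomization step is omitted here.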

Network Motifs

  • Network motifs (simplest units of commonly used transcriptional regulatory network architecture) provide specific regulatory capacities such as positive and negative feedback loops.
  • Genome-wide location data was used to identify six regulatory network motifs: autoregulation, multicomponent loops, feedforward loops, single-input, multi-input, and regulator chain (Fig. 3).
    • Figure 3: Examples of network motifs in the yeast regulatory network
  • The motifs suggest models for regulatory mechanisms that can be tested.
  • An autoregulation motif consists of a regulator that binds to the promoter region of its own gene.
  • They identified 10 autoregulation motifs with genome-wide location data for the 106 regulators (p-value = 0.001), suggesting ~10% of yeast genes encoding regulators are autoregulated (does not change substantially at less stringent p-value thresholds).
  • Studies of Escherichia coli genetic regulatory networks indicate that 52% to 74% of prokaryotic genes encoding transcriptional regulators are autoregulated
  • Autoregulation is thought to provide several selective growth advantages, including reduced response time to environmental stimuli, decreased biosynthetic cost of regulation, and increased stability of gene expression
    • Upon exposure to mating pheromone, the concentrations of the pheromone-responsive Ste12 transcriptional regulator rapidly increase because Ste12 binds to and up-regulates its own gene (Fig. 3).
    • The increase in Ste12 protein leads to the binding of other genes required for the mating process.
  • A multicomponent loop motif consists of a regulatory circuit whose closure involves two or more factors (Fig. 3).
  • They observed three multicomponent loop motifs in the location data for 106 regulators (p-value = 0.001).
  • The closed-loop structure provides the capacity for feedback control and offers the potential to produce bistable systems that can switch between two alternative states.
  • The multicomponent loop motif has yet to be identified in bacterial genetic networks.
  • Feedforward loop motifs contain a regulator that controls a second regulator and have the additional feature that both regulators bind a common target gene (Fig. 3).
  • The regulator location data reveal that feedforward loop architecture has been highly favored during the evolution of transcriptional regulatory networks in yeast.
  • They found that 39 regulators are involved in 49 feedforward loops potentially controlling 240 genes in the yeast network (about 10% of genes that are bound in the genome-wide location data set).
  • A feedforward loop can provide several features to a regulatory circuit.
  • The feedforward loop may act as a switch that is designed to be sensitive to sustained rather than transient inputs.
  • Feedforward loops have the potential to provide temporal control of a process, because expression of the ultimate target gene may depend on the accumulation of adequate levels of the master and secondary regulators.
  • Feedforward loops may provide a form of multistep ultrasensitivity, as small changes in the level or activity of the master regulator at the top of the loop might be amplified at the ultimate target gene because of the combined action of the master regulator and a second regulator that is under the control of the master regulator.
  • Single-input motifs contain a single regulator that binds a set of genes under a specific condition.
  • Single-input motifs are potentially useful for coordinating a discrete unit of biological function, such as a set of genes that code for the subunits of a biosynthetic apparatus or enzymes of a metabolic pathway.
    • Several genes of the leucine biosynthetic pathway are controlled by the Leu3 transcriptional regulator (Fig. 3).
  • Multi-input motifs consist of a set of regulators that bind together to a set of genes.
  • They found 295 combinations of two or more regulators that could bind to a common set of promoter regions.
  • This motif offers the potential for coordinating gene expression across a wide variety of growth conditions.
    • Each of the regulators bound to a set of genes can be responsible for regulating those genes in response to a unique input.
  • Two different regulators responding to two different inputs would allow coordinate expression of the set of genes under these two different conditions.
  • Regulator chain motifs consist of chains of three or more regulators in which one regulator binds the promoter for a second regulator, the second binds the promoter for a third regulator, and so forth (Fig. 3).
  • This network motif is observed frequently in the location data for yeast regulators; 188 regulator chain motifs varied in size from 3 to 10 regulators.
  • The chain represents the simplest circuit logic for ordering transcriptional events in a temporal sequence.
  • The most straightforward form of this appears in the regulatory circuit of the cell cycle, where regulators functioning at one stage of the cell cycle regulate the expression of factors required for entry into the next stage of the cell cycle.
  • The regulatory motifs described above suggest models for gene regulatory mechanisms whose predictions can be tested with experimental data.
  • Transcriptional regulation of ribosomal protein genes is not well understood.
  • Fhl1 forms a single-input regulatory motif consisting of essentially all ribosomal protein genes, but little else (unknown before experiment)
    • No other regulator studied here exhibited this behavior.
    • Loss of Fhl1 function should have a profound effect on ribosome biosynthesis if no other regulators are capable of taking its place.
    • Mutation in Fhl1 causes severe defects in ribosome biosynthesis (an observation that the genome-wide location data can now explain)
  • Many ribosomal protein genes are also components of a multi-input motif involving Fhl1 and additional regulators (Fig. 3), which suggests that expression of these genes may be coordinated by multiple regulators under various growth conditions.
  • This model and others suggested by regulatory motifs can be addressed with future experiments.
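
Three of the motif definitions above translate directly into searches over a directed graph in which an edge from a regulator to a name means "binds the promoter of that gene". The following is a minimal sketch on a toy network, not the authors' software. Only the Ste12 self-edge comes from the outline; `RegA` to `RegD`, `geneM`, and `geneT` are hypothetical, and the loop search is limited to two-factor loops for brevity.

```python
def find_autoregulation(network):
    """Regulators that bind the promoter of their own gene."""
    return sorted(reg for reg, targets in network.items() if reg in targets)


def find_feedforward_loops(network):
    """Triples (master, second, gene): master binds second, and both bind gene."""
    loops = []
    for master, targets in network.items():
        for second in targets:
            if second == master or second not in network:
                continue  # the second node must itself be a regulator
            for gene in network[second]:
                if gene in targets and gene not in (master, second):
                    loops.append((master, second, gene))
    return sorted(loops)


def find_two_component_loops(network):
    """Pairs of regulators that bind each other's genes (smallest multicomponent loop)."""
    loops = set()
    for a, targets in network.items():
        for b in targets:
            if a != b and a in network.get(b, set()):
                loops.add(tuple(sorted((a, b))))
    return sorted(loops)


# Toy network: regulator -> set of genes whose promoters it binds
network = {
    "Ste12": {"Ste12", "geneM"},  # autoregulation, as described for Ste12
    "RegA": {"RegB", "geneT"},    # hypothetical master regulator
    "RegB": {"geneT"},            # hypothetical second regulator
    "RegC": {"RegD"},             # hypothetical two-factor loop
    "RegD": {"RegC"},
}

print(find_autoregulation(network))       # ['Ste12']
print(find_feedforward_loops(network))    # [('RegA', 'RegB', 'geneT')]
print(find_two_component_loops(network))  # [('RegC', 'RegD')]
```

Single-input and multi-input motifs would be found by grouping genes by the exact set of regulators bound to them, and regulator chains by following regulator-to-regulator edges; both are left out to keep the sketch short.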

Assembling Motifs into Network Structures

  • Assumption: regulatory network motifs form building blocks that can be combined into larger network structures.
  • An algorithm was developed that explores all the genome-wide location data together with the expression data from over 500 expression experiments to identify groups of genes that are both coordinately bound and coordinately expressed.
    • Begins by defining a set of genes, G, that are bound by a set of regulators, S, with a p-value of 0.001.
    • Found: a large subset of genes in G that are similarly expressed over the entire set of expression data, and those genes were used to establish a core expression profile.
    • Genes are then dropped from G if their expression profile is significantly different from this core profile.
    • The remainder of the genome is scanned for genes with expression profiles that are similar to the core profile.
    • Genes with a significant match in expression profiles are then examined to see if the set of regulators S are bound.
    • The probability of a gene being bound by the set of regulators is used instead of the individual probabilities of that gene being bound by each of the individual regulators.
  • The p-value can be relaxed for individual binding events and thus recapture information that is lost because of the use of an arbitrary p-value threshold (done because they are assaying the combined probability of the set of regulators being bound and are relying on similarity of expression patterns)
    • The process is repeated until all combinations of genes bound by regulators have been considered.
  • The resulting sets of regulators and genes are essentially multi-input motifs refined for common expression (MIM-CE).
  • These are expected to be robust examples of coordinate binding and expression and therefore useful for nucleating network models.
  • The refined motifs were used to construct a network structure for the yeast cell cycle by an automatic process that requires no prior knowledge of the regulators that control transcription during the cell cycle.
  • The cell cycle regulatory network was selected because of the importance of this biological process, the availability of extensive genome-wide expression data for the cell cycle, and the extensive literature that can be used to explore features of a network model.
  • The goal was to determine whether the computational approach would construct the regulatory logic of the cell cycle from the location and expression data without previous knowledge of the regulators involved.
  • They reasoned that MIM-CEs that are significantly enriched in genes whose expression oscillates through the cell cycle would identify the regulators that control these genes. (11 regulators were identified)
  • To construct the cell cycle network, they generated a new set of MIM-CEs by using only the 11 regulators and the cell cycle expression data.
  • To produce a cell cycle transcriptional regulatory network model, they aligned the MIM-CEs around the cell cycle on the basis of peak expression of the genes in the group (Fig. 4).
    • Figure 4: Model for the yeast cell cycle transcriptional regulatory network
  • Three features of the resulting network model:
  1. The computational approach correctly assigned all the regulators to stages of the cell cycle, where they were shown to function in previous studies.
  2. Two regulators that have been implicated in cell cycle control but whose functions were ill-defined could be assigned within the network on the basis of direct binding data.
  3. Reconstruction of the regulatory architecture was automatic and required no prior knowledge of the regulators that control transcription during the cell cycle.
  • This approach should represent a general method for constructing other regulatory networks.
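
The refinement steps that produce a MIM-CE can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the core profile is taken as the mean of the seed genes, similarity is Pearson correlation with a fixed cutoff, and the relaxed binding test applies a looser threshold to each regulator separately instead of computing a combined probability for the set. The function names, thresholds, and all data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def mean_profile(profiles):
    """Column-wise mean of a list of expression profiles."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def refine_module(regulators, binding_p, expression,
                  strict=0.001, relaxed=0.01, min_corr=0.8):
    """Return genes that are both co-bound by `regulators` and co-expressed."""
    def bound(gene, threshold):
        return all(binding_p[r].get(gene, 1.0) <= threshold for r in regulators)

    # 1. Seed set G: genes bound by every regulator in S at the strict threshold
    seed = [g for g in expression if bound(g, strict)]
    if not seed:
        return []
    # 2. Core expression profile (assumption: mean of the seed genes)
    core = mean_profile([expression[g] for g in seed])
    # 3. Drop seed genes whose expression differs from the core profile
    kept = [g for g in seed if pearson(expression[g], core) >= min_corr]
    if not kept:
        return []
    core = mean_profile([expression[g] for g in kept])
    # 4. Scan remaining genes: co-expressed and bound at the relaxed threshold
    for g in expression:
        if g not in kept and pearson(expression[g], core) >= min_corr and bound(g, relaxed):
            kept.append(g)
    return sorted(kept)


expression = {
    "g1": [1.0, 2.0, 3.0, 2.0],
    "g2": [1.1, 2.1, 2.9, 2.2],
    "g3": [3.0, 2.5, 1.0, 1.5],  # bound, but expressed differently
    "g4": [0.9, 1.9, 3.1, 1.8],  # co-expressed, bound only at the relaxed threshold
    "g5": [1.0, 2.0, 3.0, 2.1],  # co-expressed, but not bound
}
binding_p = {
    "RegA": {"g1": 1e-4, "g2": 5e-4, "g3": 2e-4, "g4": 5e-3, "g5": 0.4},
    "RegB": {"g1": 3e-4, "g2": 1e-4, "g3": 8e-4, "g4": 4e-4, "g5": 0.2},
}

module = refine_module(["RegA", "RegB"], binding_p, expression)
print(module)  # ['g1', 'g2', 'g4']
```

Here `g3` is dropped despite strong binding because its expression does not follow the core profile, and `g4` is recaptured even though one of its binding P values misses the 0.001 cutoff. That second case is the point made above about recovering information lost to an arbitrary threshold.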

Coordination of Cellular Processes

  • Transcriptional regulators were often bound to genes encoding other transcriptional regulators (Fig. 5).
    • Figure 5: Network of transcriptional regulators binding to genes encoding other transcriptional regulators.
  • There were many instances in which transcriptional regulators within a functional category (ex: cell cycle) bound to genes encoding regulators within the same category.
  • Noted: cell cycle regulators bound to other cell cycle regulators, which was also apparent among transcriptional regulators that fall into the metabolism and environmental response categories.
    • Ex: the metabolic regulator Gcn4 bound to promoters for PUT3 and UGA3, genes that encode transcriptional regulators for amino acid and other metabolic functions.
  • The stress response activator Yap6 bound to the gene encoding the Rox1 repressor, and vice versa, which suggests positive and negative feedback loops.
  • Multiple transcriptional regulators within each category were able to bind to genes encoding regulators that are responsible for control of other cellular processes.
    • Ex: the cell cycle activators bind to genes for transcriptional regulators that play key roles in metabolism (GAT1, GAT3, NRG1, and SFL1); environmental responses (ROX1, YAP1, and ZMS1); development (ASH1, SOK2, and MOT3); and DNA, RNA, and protein biosynthesis (ABF1).
  • Partially explains how cells coordinate transcriptional regulation of the cell cycle with other cellular processes.
  • These connections are generally consistent with previous experimental information about the relationships between cellular processes.
    • Ex: the developmental regulator Phd1 has been shown to regulate genes involved in pseudohyphal growth during certain nutrient stress conditions; they found that Phd1 also binds to genes that are key to regulation of general stress responses (MSN4, CUP9, and ZMS1) and metabolism (HAP4).
  • Implications:
    • Control of most (if not all) cellular processes is characterized by networks of transcriptional regulators that regulate other regulators.
    • The effects of transcriptional regulator mutations on global gene expression, as measured by expression profiling, are as likely to reflect the effects of the network of regulators as they are to identify the direct targets of a single regulator.
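
The within-category and cross-category connections described above can be tallied from a list of regulator-to-regulator binding events. The sketch below uses only the examples named in this section (Gcn4, Yap6 and Rox1, Phd1). The category labels are an assumption made for illustration: "general stress response" targets are grouped under environmental response, and this small edge list is not the full Figure 5 network.

```python
# Functional category of each regulator (labels assumed for illustration)
category = {
    "Gcn4": "metabolism",
    "Put3": "metabolism",
    "Uga3": "metabolism",
    "Hap4": "metabolism",
    "Yap6": "environmental response",
    "Rox1": "environmental response",
    "Msn4": "environmental response",
    "Cup9": "environmental response",
    "Zms1": "environmental response",
    "Phd1": "development",
}

# (regulator, regulator whose gene it binds), taken from the examples above
edges = [
    ("Gcn4", "Put3"), ("Gcn4", "Uga3"),
    ("Yap6", "Rox1"), ("Rox1", "Yap6"),
    ("Phd1", "Msn4"), ("Phd1", "Cup9"), ("Phd1", "Zms1"), ("Phd1", "Hap4"),
]

within = [(a, b) for a, b in edges if category[a] == category[b]]
across = [(a, b) for a, b in edges if category[a] != category[b]]

# Reciprocal binding, such as Yap6 and Rox1, marks a possible feedback loop
edge_set = set(edges)
reciprocal = sorted({tuple(sorted((a, b))) for a, b in edges if (b, a) in edge_set})

print(len(within), len(across))  # 4 4
print(reciprocal)                # [('Rox1', 'Yap6')]
```

In this small sample the Gcn4 and Yap6/Rox1 edges stay within a category, while every Phd1 edge crosses from development into another process, which is the kind of link the section credits with coordinating cellular processes.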

Significance of Regulatory Network Information

  • This study identified network motifs that provide specific regulatory capacities for yeast, revealing the regulatory strategies that were selected during evolution for this eukaryote.
  • These motifs can be used as building blocks to construct large network structures through an automated approach that combines genome-wide location and expression data in the absence of prior knowledge of regulator functions.
  • The network of transcriptional regulators that control other transcriptional regulators is highly connected, which suggests that the network substructures for cellular functions (such as cell cycle and development) are themselves coordinated at a transcriptional level.
  • It is possible to envision mapping the regulatory networks that control gene expression programs in considerable depth in yeast and in other living cells.
  • More complete understanding of transcriptional regulatory networks in yeast will require knowledge of regulator binding sites under various growth conditions and experimental testing of models that emerge from computational analysis of regulator binding, gene expression, and other information.
  • Future research & effects:
    • The approach described here can also be used to discover transcriptional regulatory networks in higher eukaryotes
    • Knowledge of these networks will be important for understanding human health and designing new strategies to combat disease.

Questions

  1. What is the main result presented in this paper?
    • Created a model of the cell cycle transcriptional regulatory network by aligning groups of co-bound, co-expressed genes according to their peak expression
    • The computational approach correctly assigned all the regulators to stages of the cell cycle, where they were shown to function in previous studies
    • Two regulators that have been implicated in cell cycle control but whose functions were ill-defined could be assigned within the network on the basis of direct binding data.
    • Reconstruction of the regulatory architecture was automatic and required no prior knowledge of the regulators that control transcription during the cell cycle
  2. What is the importance or significance of this work?
    • This represents a general method for constructing other regulatory networks
  3. Briefly describe their methods, including the following information. A flow chart may be helpful here.
    1. How did they treat the cells (what experiment were they doing?)
      • Tagged one transcriptional regulator with a myc epitope in each of 106 strains, used chromatin IP to enrich the promoters bound by the regulator in vivo, then used microarrays to identify those promoters.
    2. What strain(s) of yeast did they use? Was the strain haploid or diploid?
      • They used 106 strains of yeast
    3. What media did they grow them in? Under what conditions and temperatures?
      • Yeast extract, peptone, and dextrose
    4. What controls did they use?
      • Cell cycle
    5. How many replicates did they perform per condition?
      • Three independent cultures
    6. What mathematical/statistical method did they use to analyze the data?
      • Statistical methods: a p-value for each spot was calculated using an error model; the ratios from the three replicates were weighted by p-value and averaged, and final p-values were then calculated for the combined ratios
    7. What transcription factors did they talk about?
      • Abf1 and Thi2



Biomathematical Modeling Navigation

User Page: Tessa A. Morris
Course Page: Biomathematical Modeling