Courtney L. Merriam Week 14

From OpenWetWare


The purpose of this assignment was to explore and develop knowledge about the many biology-oriented databases available to students and researchers on the internet. I then read an article about a specific database in the Nucleic Acids Research journal and explored that database. Finally, I developed a short talk to brief the rest of the class on the database I explored.

Methods and Results

Part I

  • Nucleic Acids Research Database Issue Table of Contents 2016
  • Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. When you answer the questions below, provide a hyperlink to the page that you got the information from (there must be at least one link per answer).
    1. What database did you access? (link to the home page of the database)
      • BindingDB (Binding Database)
    2. What is the purpose of the database?
      • It is the first publicly accessible database of measured protein–ligand affinity data. It is designed to provide focused data sets, mainly on the interactions of proteins considered to be drug targets with small, drug-like molecules. Introduction
    3. What biological information does it contain?
      • 1,279,670 binding measurements for 6,612 protein targets and 565,136 small molecules. Info
    4. What species are covered in the database?
      • ~450 species, e.g., Influenza A virus, human immunodeficiency virus, hamster, and guinea pig. Source Organism
    5. What biological questions can it be used to answer?
      • It can serve researchers who want machine-readable compound and affinity data across a set of related targets, to test and parameterize algorithms for computer-aided drug design, as well as researchers who want affinity data across as many human proteins as possible, to help screen a drug candidate for side effects or to develop hypotheses about the mechanism of action of a new bioactive compound. Introduction
    6. What type of database is it (sequence, structure, model organism, or specialty [what?]; primary or “meta”; curated electronically, manually [in-house], or manually [community])?
      • Specialty: interactions of proteins considered to be drug targets with small, drug-like molecules
      • Both primary and meta: it accepts data contributed by the community and also imports data from other databases. Contribute Data
      • Curated in-house, using a combination of electronic and manual curation. Home
    7. What individual or organization maintains the database?
    8. What is their funding source(s)?
      • It is funded by NIH grant R01GM070064 and was previously supported by NSF grant 9808318 and by NIST. About Us: Support
    9. Is there a license agreement or any restrictions on access to the database?
      • 1) Data curated by BindingDB are made available subject to a Creative Commons Attribution License, so long as BindingDB is cited. 2) Data from ChEMBL is provided under a Creative Commons Attribution-Share Alike 3.0 Unported License. Info
    10. How often is the database updated? When was the last update?
      • The download files are updated when new data are added, usually weekly. The last full BindingDB database update was September 28, 2016. The site also gives detailed information on when each individual data set was updated. Download Files
    11. Are there links to other databases?
      • Yes
        • PDB IDs match BindingDB data, based upon 85% sequence identity and exact ligand match [1]
        • PDB IDs match BindingDB data, based upon 100% sequence identity and exact ligand match [2]
        • UniProtKB/Swiss-Prot [3]
        • UniProtKB/TrEMBL [4]
        • PubMed [5]
    12. Can the information be downloaded? And in what file formats?
    13. Evaluate the “user-friendliness” of the database.
      • Is the Web site well-organized?
        • Yes. The home page links directly to the specific information one would look for, such as Download Files, Citations, Source Organisms, and a list of FDA-approved drugs in BindingDB. Home
      • Does it have a help section or tutorial?
      • Run a sample query. Do the results make sense?
        • The database can be searched by nucleic acid sequence, protein sequence, name, chemical structure, and pH. Results give all the information about the protein, including its ligands, structure, and the citations in which the data were reported. Example Query Envelope polyprotein GP160

Part II


  1. Trehalose: A sugar of the disaccharide class produced by some fungi, yeasts, and similar organisms. [6]
  2. Nucleolin: A protein found abundantly in the nucleoli of cells, associated with the transcription of ribosomal RNA and the assembly of ribosomes. [7]
  3. Diauxic: Growth in two separate phases due to the preferential use of one carbon source over another; a temporary lag occurs between the phases. [8]
  4. Transient: Lasting only for a short time; impermanent [9]
  5. Glycolysis: The breakdown of glucose by enzymes, releasing energy and pyruvic acid. [10]
  6. Oxidative: Relating to the process or result of oxidizing or being oxidized [11]
  7. Dendrogram: a branching diagram representing a hierarchy of categories based on degree of similarity or number of shared characteristics especially in biological taxonomy [12]
  8. Phosphorylation: The introduction of a phosphate group into a molecule or compound. [13]
  9. Biogenesis: The synthesis of substances by living organisms. [14]
  10. Dimorphic: Occurring in or representing two distinct forms [15]
  11. Proteasome: A complex of proteinases involved in breaking down selected intracellular proteins. [16]
  12. Ubiquitin: A compound found in living cells which plays a role in the degradation of defective and superfluous proteins. It is a single-chain polypeptide. [17]
  13. Glutathione: A compound involved as a coenzyme in oxidation–reduction reactions in cells. It is a tripeptide derived from glutamic acid, cysteine, and glycine. [18]
  14. Glutaredoxin: A family of thioltransferases that contain two active-site cysteine residues, which form either a disulfide (oxidized form) or a dithiol (reduced form). [19]
  15. Posttranslational level: Occurring after the translation of an mRNA sequence into the amino-acid sequence it encodes. [20]
  16. Molecular Chaperones: A protein required for the proper folding and/or assembly of another protein or protein complex. [21]
  17. Locus Control Region (LCR): A regulatory region first identified in the human beta-globin locus but subsequently found in other loci.[22]
  18. Msn2p/Msn4p: Stress-responsive transcriptional activators; activated in stochastic pulses of nuclear localization in response to various stress conditions; bind DNA at stress response elements of responsive genes; their relative distribution to the nucleus increases upon DNA replication stress [23]
  19. Dimorphic: Having two distinct forms of individuals within the same species, or two distinct forms of parts within the same organism [24]
  20. Glycogen: A branched polymer of glucose that is mainly produced in liver and muscle cells and functions as secondary long-term energy storage in animal cells. [25]
  21. Peroxidation: Conversion into a peroxide. [26]
  22. Saccharomyces cerevisiae: Baker's yeast; any of various single-celled fungi that reproduce asexually by budding or division [27]
  23. Escherichia coli: A species of bacterium normally present in the intestinal tract of humans and other animals; sometimes pathogenic; can be a threat to food safety [28]
  24. Osmosis/Osmotic: A process by which molecules of a solvent tend to pass through a semipermeable membrane from a less concentrated solution into a more concentrated one. [29]
  25. Chaperone: A protein required for the proper folding and/or assembly of another protein or protein complex. [30]
  26. Kinase: A subclass of the transferases, comprising the enzymes that catalyze the transfer of a high-energy group from a donor (usually ATP) to an acceptor, and named, according to the acceptor, as creatine kinase, fructokinase, etc. [31]
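  27. Open Reading Frame (ORF): A continuous stretch of codons that begins with a start codon and ends with a stop codon, with no in-frame stop codon in between, so that it can potentially be translated into a protein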
  28. UFD1: Substrate-recruiting cofactor of the Cdc48p-Npl4p-Ufd1p segregase; polyubiquitin-binding protein that assists in the dislocation of misfolded, polyubiquitinated ERAD substrates [32]
  29. MGA2: ER membrane protein involved in regulation of OLE1 transcription; inactive ER form dimerizes and one subunit is then activated by ubiquitin/proteasome-dependent processing followed by nuclear targeting [33]
  30. NPL4: Substrate-recruiting cofactor of the Cdc48p-Npl4p-Ufd1p segregase; assists Cdc48p in the dislocation of misfolded, polyubiquitinated ERAD substrates that are subsequently delivered to the proteasome for degradation; also involved in the regulated destruction of resident ER membrane proteins [34]


Introduction
  • Unicellular organisms are programmed to respond to stress
  • Saccharomyces cerevisiae (baker's yeast) is studied under different stresses to determine its responses at the molecular level
    • Many studies have examined heat-induced stress
    • Little is known about the stress response to cold
  • Studies have shown that the expression of ~10% of the genome changes in response to stress
    • genes are either induced or repressed
  • Environmental stress response (ESR) genes are involved in many cellular functions
    • their regulation is controlled by the transcription factors Msn2p and Msn4p
  • Cellular adaptation depends on different regulatory mechanisms, which vary between organisms
  • In S. cerevisiae, the genes TIP1, TIR1, TIR2, and NSR1 have been shown to be involved in the stress response
    • In this study, S. cerevisiae wild-type and msn2 msn4 cells were analyzed to compare the cold response with responses to other stresses
Materials and Methods
  • Strains used were the wild-type strain BY4743 and the mutant strain BSY25, which was obtained by crossing two single-mutant strains. A second wild-type strain, W303, was also used
  • Cultures were inoculated and grown at 30°C in YPD medium
    • After being harvested during log phase and transferred to fresh medium, the cultures were cooled at a rate of 4°C per minute
  • RNA was isolated using the hot-phenol method and purified using the Oligotex spin-column protocol
  • mRNA was used to generate labeled cDNA, which was hybridized onto microarrays
  • Slides were scanned with a ScanArray Lite scanner and analyzed with QuantArray software
    • Spot signals were normalized before being included in the analysis
  • All cultures used in the experiment were in the same physiological state
    • Two independent biological repeats were used for the 0, 2, and 12 hour time points
    • Three independent biological repeats were used for the 10, 30, and 60 minute time points
    • Cy dyes were swapped between sample and reference for each experiment, and a control microarray of cultures grown at 30°C was used to estimate background variability
      • Only 14 genes in the control showed variability
  • A p-value of <0.03 was used for the experimental analysis
  • A total of 43 microarrays were performed for this study
  • Glycogen and trehalose content was determined using a glucose assay kit
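The dye-swap and normalization steps above can be illustrated with a short Python sketch. It assumes log2 ratios of Cy5/Cy3 intensities and a simple median-centering normalization; the summary does not say which normalization the authors actually used, and all intensities below are made up.

```python
import math
from statistics import median
from typing import List

def log2_ratios(cy5: List[float], cy3: List[float]) -> List[float]:
    """Per-spot log2(Cy5 / Cy3) from background-subtracted intensities."""
    return [math.log2(a / b) for a, b in zip(cy5, cy3)]

def median_center(ratios: List[float]) -> List[float]:
    """Assumed normalization: shift log2 ratios so that their median is 0."""
    m = median(ratios)
    return [x - m for x in ratios]

def combine_dye_swap(forward: List[float], swapped: List[float]) -> List[float]:
    """Average an array with its dye-swapped replicate.

    In the swapped array the sample is labeled Cy3, so its log2 ratio
    measures reference/sample; the sign is flipped before averaging.
    """
    return [(f - s) / 2 for f, s in zip(forward, swapped)]

# Hypothetical intensities for three spots (cold sample vs. 30 °C reference).
forward = median_center(log2_ratios(cy5=[800, 400, 100], cy3=[200, 400, 400]))  # sample in Cy5
swapped = median_center(log2_ratios(cy5=[200, 400, 400], cy3=[800, 400, 100]))  # sample in Cy3
combined = combine_dye_swap(forward, swapped)
print(combined)  # [2.0, 0.0, -2.0]
```

Averaging the two orientations cancels any gene-specific bias that one dye shows over the other, which is the usual reason for a dye-swap design.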
Cold Response of S. cerevisiae
  • S. cerevisiae showed a reduced growth rate but a normal growth curve at low temperature
  • Lowering the temperature from 30 to 10°C increased the doubling time from ~90 minutes to 20.7 hours
  • Cultures reached stationary phase after ~120 hours
  • A rapid temperature shift from 30 to 10°C caused transient changes in gene expression in S. cerevisiae
  • Two-dimensional hierarchical clustering grouped the cold-responsive genes into five clusters: three of induced genes and two of repressed genes
    • One subset of the cold-induced genes was induced mainly during the first 2 hours, while the other subset was induced mainly after 12 and 60 hours
    • The subsets were labeled early cold response (ECR) and late cold response (LCR)
  • A control test confirmed that the cold treatment effectively induced a cold response
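The change in doubling time can also be expressed as a growth rate. The sketch below assumes simple exponential growth during log phase and uses only the two doubling times quoted above.

```python
import math

def growth_rate(doubling_time_h: float) -> float:
    """Specific growth rate (per hour), assuming N(t) = N0 * 2 ** (t / Td)."""
    return math.log(2) / doubling_time_h

td_30c = 90 / 60  # ~90 min doubling time at 30 °C, in hours
td_10c = 20.7     # doubling time at 10 °C, in hours

slowdown = td_10c / td_30c  # how many times longer each doubling takes
mu_30c = growth_rate(td_30c)
mu_10c = growth_rate(td_10c)
print(f"{slowdown:.1f}x slower; {mu_30c:.3f}/h vs {mu_10c:.3f}/h")
# 13.8x slower; 0.462/h vs 0.033/h
```

In other words, each doubling at 10°C takes roughly 14 times as long as at 30°C.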
Early Cold Response
  • ECR genes were defined as genes reproducibly induced 2-fold or more at one or more of the three early time points examined
  • 130 open reading frames (ORFs) met this definition
  • These genes are associated with transport, lipid and amino acid metabolism, and transcription; some ORFs have unknown functions
  • A subset of ECR genes involved in transcription was also defined
  • UFD1 was significantly induced during the ECR
  • MGA2 and NPL4 showed reproducible increases in transcript abundance during the ECR, but of less than twofold
  • S. cerevisiae cells also responded to the temperature downshift by repressing genes
    • Expression levels of some genes decreased rapidly
    • Expression of 32 genes was reduced twofold, including genes encoding heat shock proteins
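The ECR criterion above (reproducibly induced at least 2-fold at one or more of the three early time points) can be written as a simple filter. This is a minimal sketch that assumes "reproducibly" means every biological replicate at a given time point passes the cutoff, and it treats a fold change of 0.5 or less as repression. The gene names and values are hypothetical.

```python
from typing import Dict, List

# gene -> one list per early time point, each holding the fold change
# (cold / 30 °C) measured in every biological replicate
FoldChanges = Dict[str, List[List[float]]]

def classify_ecr(data: FoldChanges, threshold: float = 2.0) -> Dict[str, str]:
    """Label each gene 'induced', 'repressed', or 'unchanged'.

    Assumption: 'reproducibly' means every replicate at a time point passes
    the cutoff; a single qualifying time point is enough. Induction is
    checked first, so a gene passing both tests is labeled 'induced'.
    """
    labels = {}
    for gene, timepoints in data.items():
        if any(all(fc >= threshold for fc in reps) for reps in timepoints):
            labels[gene] = "induced"
        elif any(all(fc <= 1 / threshold for fc in reps) for reps in timepoints):
            labels[gene] = "repressed"
        else:
            labels[gene] = "unchanged"
    return labels

# Hypothetical genes: three early time points, three replicates each
example = {
    "geneA": [[2.4, 2.1, 2.6], [1.3, 1.2, 1.5], [1.0, 1.1, 0.9]],  # >= 2 in all replicates once
    "geneB": [[1.7, 1.9, 1.6], [1.8, 2.2, 1.5], [1.2, 1.1, 1.3]],  # up, but not reproducibly 2-fold
    "geneC": [[0.4, 0.5, 0.3], [0.6, 0.7, 0.5], [0.9, 0.8, 1.0]],  # <= 0.5 in all replicates once
}
print(classify_ecr(example))
# {'geneA': 'induced', 'geneB': 'unchanged', 'geneC': 'repressed'}
```

A gene like geneB, with increases that never reach twofold in all replicates, falls outside the ECR set. This is the same situation described for MGA2 and NPL4.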
Late Cold Response
  • 280 cold-induced LCR genes were identified
    • These include genes encoding metabolic enzymes involved in various processes
  • Some genes required for the regulation of carbohydrate metabolism were coordinately induced
  • Genes encoding heat shock proteins involved in the stress response were also among the LCR-induced genes
  • Additional genes previously shown to be induced by oxidative stress and implicated in detoxification were also induced in the LCR
  • 256 cold-repressed LCR genes were identified
    • 36% of these genes are involved in protein synthesis
    • Others are associated with nucleotide biosynthesis, protein modification, and vesicle transport
      • These results indicate that ribosomal genes and other genes involved in protein synthesis contribute to cold adaptation
Cold Response Compared with Other Environmental Stress Responses
  • The ECR was similar to the transcriptional response produced by a temperature decrease from 37 to 25°C
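Calling two transcriptional responses "similar" usually means that their expression changes are correlated across the same genes. The summary does not state which similarity measure was used. The sketch below assumes a Pearson correlation of log2 ratios and uses made-up values for five genes.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles over the same genes."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log2 ratios for the same five genes in two experiments
ecr_2h = [1.8, 1.2, -1.1, 0.3, -0.6]
downshift_37_to_25 = [1.5, 1.0, -0.9, 0.1, -0.4]
print(pearson(ecr_2h, downshift_37_to_25))
```

A value near 1 means both conditions induce and repress the same genes. A value near 0 corresponds to the kind of "weak correlation" described for continuously cold-grown cultures.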
Figure 1: Transcriptional response to cold

A) Two-dimensional hierarchical cluster analysis of microarray data obtained from a time-course experiment with S. cerevisiae wild-type diploid cells

B) Classification of ECR genes

C) Classification of LCR genes

Figure 2

The temperature downshift produces a response similar to the early cold shock response. 47% of the genes induced in early cold shock were also induced by the temperature downshift (a). Many of the genes repressed in early cold shock were also repressed by the temperature downshift (b).

Figure 3
  • ECR Genes Showed Reciprocal Transcriptional Behavior in Comparison to Other Stress Stimuli. Comparison of ECR genes (CS 2 h) to LCR genes (CS 12 h) and to the responses to other stimuli. Half of repressed ECR genes were induced in heat shock. 40% of induced ECR genes were repressed after 0.5 h of heat shock. 18% of induced ECR genes showed no heat shock response.
  • LCR Genes Showed Similar Transcriptional Responses In All Cases.
  • LCR Involves the ESR, and ECR Indicates a Cold-Specific Transcriptional Response. Induced and repressed LCR and ECR genes were compared with the identified environmental stress response (ESR) genes (Gasch et al., 2000). Induced and repressed LCR genes showed significant overlaps of 87 and 111 genes with induced and repressed ESR genes, respectively. ECR and ESR genes did not overlap significantly.
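Whether an overlap between two gene lists is "significant" is commonly judged with a one-sided hypergeometric test. The test asks how likely it is that at least the observed number of genes would be shared by chance, given the background and the two list sizes. The summary does not say which test the authors used, so this is a sketch under that assumption.

```python
from math import comb

def overlap_p_value(total: int, size_a: int, size_b: int, overlap: int) -> float:
    """One-sided hypergeometric p-value: P(shared genes >= overlap).

    total   -- genes that could appear in either list (the background)
    size_a  -- number of genes in the first list
    size_b  -- number of genes in the second list
    overlap -- number of genes observed in both lists
    """
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    # exact integer counts; a single division at the end keeps precision
    return tail / comb(total, size_b)

# Toy check: 10 genes, two lists of 5, all 5 shared -> 1 / C(10, 5) = 1/252
print(overlap_p_value(10, 5, 5, 5))
```

For the 87-gene LCR/ESR overlap, the call would be `overlap_p_value(N, 280, K, 87)`. Here N (the number of genes measured) and K (the size of the induced ESR set) would have to come from the original studies, because they are not given here.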
Figure 4

Msn2p and Msn4p play a major role in the late cold shock response and the environmental stress response. These transcription factors are required for 99 late cold shock response genes, but other factors also control the late cold shock response. Msn2p and Msn4p have little effect on the early cold shock response.

Figure 5

Genes Involved in Carbohydrate Metabolism Are Induced at 12 h. An increase in glycogen and trehalose content was observed after 12 h (LCR). Induction of these carbohydrate metabolism genes depends on stress response elements (STREs) in their promoters. Mutant strains lacking Msn2p and Msn4p lose induction of these genes during cold treatment.

General Trends in Figures 2-5
  • The majority of repressed ECR genes were also repressed during a temperature downshift
  • When the transcription profiles of cultures grown continuously (20 h) at low temperatures (15, 17, or 21°C) were compared with the ECR and LCR profiles, only weak correlations were seen
  • Unexpected correlations were seen for OS after 15 min, for MD and XS after 0.5 h, and for DTT after 2 h
  • Under these conditions, induced ECR genes were repressed, whereas repressed ECR genes were induced
  • ECR and heat shock: almost half of the repressed ECR genes were induced during heat shock, including HSP genes and genes involved in amino acid and carbohydrate metabolism
  • These observations strongly suggest that the LCR involves the ESR, whereas the ECR indicates a “cold-specific” transcriptional response
  • Together with the comparison of the ECR expression profile to profiles seen under various stress conditions, these results indicate a cold-specific response of S. cerevisiae during the ECR
  • A physiological consequence of the general stress response in S. cerevisiae is the accumulation of the two major reserve carbohydrates, glycogen and trehalose
  • There is no accumulation in response to cold during the first 2 h, but a reproducible increase in glycogen and trehalose content was observed after 12 h of cold treatment
  • The induction of genes involved in reserve carbohydrate metabolism in response to stress depends on the presence of STREs in the promoters of these genes
  • Two distinct responses to cold shock: an early cold shock response (<2 h) and a late cold shock response (>12 h)
  • Early cold shock response: genes that play a role in RNA and lipid metabolism
  • Late cold shock response: genes that protect cells
  • Low temperatures slow mRNA translation
    • In bacterial cold adaptation, ATP-dependent RNA helicases remove secondary structures from mRNA
    • In yeast cold adaptation, genes encoding RNA helicases, RNA-binding proteins, and RNA-processing proteins are induced
    • Mutations in some of these genes lead to cold-sensitive phenotypes
  • A cold-inducible RNA helicase has also been found in plants
    • This supports a role for RNA helicases in cold adaptation
  • Membrane fluidity is another cold adaptation
    • Low temp decreases fluidity
      • Countered by higher production of unsaturated fatty acids
      • Involves fatty acid desaturase activity
      • Activity found in bacteria, fish, and yeast
  • LCR gene expression program involves metabolic and stress genes
    • Compensate for reduction in enzyme activity, synthesize protective molecules
  • Trehalose in plants offers cold protection
    • Protects against autolysis, increases freezing tolerance, stabilizes membranes
  • Trehalose and glycogen accumulation observed during LCR
    • It has been suggested that stress stimulates recycling of both carbohydrates
  • Trehalose has been observed to aid survival of yeast and E. coli near freezing temperatures
    • May play an important role in cold adaptation
    • However, growth rate and viability appeared unaffected in strains lacking trehalose
  • HSP genes were found to be cold induced
    • Suggests that their help is needed to maintain proteins in the cold
  • Global stress-induced transcription in S. cerevisiae was compared with gene expression in the cold response
    • Data from different labs may have inconsistencies
    • No standard method for inducing stress response
  • Found similarities and differences with different studies
    • Sahara et al. (2002): similar gene groupings were found, but some genes behaved differently
    • Differences may be due to strain background or experimental design
    • Gasch et al. (2000): overlap was found between LCR and ESR genes, indicating that the LCR activates the stress response; the ECR did not overlap consistently
  • Msn2p and Msn4p are important for glycogen and trehalose synthesis in the LCR but are not the only mechanism
  • S. cerevisiae’s transcriptional cold response comprises two patterns, early and late
    • The early phase changes membrane fluidity and destabilizes RNA secondary structures
    • The late phase resembles the environmental stress response and may be caused by an altered physiological state of the cell
  • Transcriptional response includes general stress and cold-specific mechanisms.
  • Many other changes likely affect cold survivability of organisms
Figure 6
  • Comparison of the Schade and Sahara et al. (2002) data yields both conflicting and supporting results
  • Comparison of 634 cold-responsive genes
  • Contradiction between the Schade and Sahara et al. (2002) data: ribosomal genes differed in whether they were induced or repressed
  • Consistency between the Schade and Sahara et al. (2002) data: environmental stress response genes were upregulated at exposure times longer than 2 hours
  • Are the data publicly available for download? From which web site? Yes [35]


Schade Journal Club Part I


The database I explored covered a small area of biology: measured binding affinities. What was most helpful was not necessarily the information contained in the database and its article, but the questions I had to answer. Those questions taught me what to look for when evaluating a new database or researching for an experiment. Before this exercise I had no idea of the scope or the specificity some databases offer, but now I know there is likely a niche database for almost anything I could hope to learn more about.


I collaborated with Avery Vernon-Moore, Zachary T. Goldstein and Jordan T. Detamore in class on this assignment. While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source.


  • Gilson, M. K., Liu, T., Baitaluk, M., Nicola, G., Hwang, L., & Chong, J. (2016). BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research, 44(D1), D1045-D1053. doi:10.1093/nar/gkv1072
  • Gilson, M., Liu, T., Hwang, L., & Chong, J. September 28, 2016, Binding Database. Retrieved from on 22 November 2016

Useful Links

Courtney L. Merriam

Class Page: Bioinformatics Laboratory
