Alice Finton Online Lab Notebook

From OpenWetWare

Summer 2019

Week 2: May 28 - May 30

  • Goals:
  1. Perform an exploratory search for software that can be used for comparing gene ontologies of the wild-type, dCIN5, dGLN3, dHAP4, dHMO1, and dZAP1 strains of Saccharomyces cerevisiae.
  2. Organize STEM profile results for all strains so that the profile types can be compared.
  3. Present a Journal Club on "Parameter Estimation for Gene Regulatory Networks from Microarray Data: Cold Shock Response in Saccharomyces cerevisiae"
  • Progress
  1. Alice and Mihir May 28 Journal Club Powerpoint
  2. Comparative gene ontology web search
    1. Software reviewed:
      • Revigo- Allows for the creation of scatterplots from a single list of gene ontology IDs, with the option of including p-values. The resulting graphs colorize the functional groups according to the p-value.
        • Does not allow for a comparison between GO lists. Therefore, it can be used only for a single strain.
      • WEGO- Creates comparative plots, but accepts only a narrow range of input file formats.
        • Need to determine how to change file format of GO list from STEM software to work on WEGO.
      • Panther Classification System- Creates bar graphs or pie charts of the gene ontology terms from the gene list data from STEM.
        • There is no option to restrict the GO terms based on p-value significance.
        • You have to use the gene list, not the GO list from STEM.
      • ClueGO- Creates networks from the gene ontology terms and allows for a comparison between two lists of data.
        • Requires Cytoscape, as it is a plug-in for the software.
          • On May 30, Dr. Dahlquist requested access to the plug-in. (On June 2, the license was given to Dr. Dahlquist)
      • CompGO
        • Requires R software to run. Need to figure out how to use CompGO in the R software.
      • Comparative GO- "A webserver for comparative gene ontology, gene ontology network and gene ontology based gene selection"
        • Limited species inclusion: Bacteria, Virus, Zebrafish, Human, Rice. Unsure if it supports S. cerevisiae genes.
      • BiNGO- Cytoscape plug-in that creates networks of the gene ontologies.
        • Determines which gene ontology categories are overrepresented in the gene list.
        • Supports a wide range of organisms.
      • GOTaxExplorer- Allows for the comparison between gene sets.
        • "It is possible to compare arbitrarily selected organisms or groups of organisms from the taxonomic tree on the basis of the functionality of their genes" (GO Tools: Visualization)
        • Unable to download the software. May need to request access to use the software.
      • Panther Compare Lists- Allows for a comparison between gene ID lists. Statistical overrepresentation test.
        • Produces p-values from the analysis. (Gives the option of using Bonferroni corrected p-values)
        • Allows for the visualization of the gene ontology groups with a bar chart, multiple pie diagram, overlaid area chart of difference, and bar chart of difference.
    2. Simple comparison of GO lists
      • Venn Diagrams
        • Pangloss- Creates a Venn diagram from only two lists of data, but states which terms are overlapping and which are unique.
        • InteractiVenn- Allows for the creation of a venn diagram for up to six sets of data.
          • Does not indicate which terms are overlapping or unique.
      • Microsoft Excel to Compare IDs
        • Allows for the comparison of gene ontology terms to determine which are overlapping and which are unique between the strains.
      • Compare Two Lists
        • Allows you to input sets of data and compare them. It offers information about which inputs are unique to either list and those that are overlapping.
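The simple list comparisons above (which terms overlap and which are unique) can also be done with set operations. The following is a minimal sketch, not the method used in this notebook; the function name `compare_go_lists` and the GO IDs are hypothetical placeholders, not results from STEM.

```python
def compare_go_lists(list_a, list_b):
    """Split two lists of GO IDs into shared terms and terms unique to each list."""
    set_a, set_b = set(list_a), set(list_b)
    return {
        "shared": sorted(set_a & set_b),   # terms present in both lists
        "only_a": sorted(set_a - set_b),   # terms unique to the first list
        "only_b": sorted(set_b - set_a),   # terms unique to the second list
    }


# Hypothetical GO IDs for illustration only; not taken from the STEM output.
wt_terms = ["GO:0006412", "GO:0042254", "GO:0006364"]
dcin5_terms = ["GO:0042254", "GO:0006364", "GO:0055085"]

result = compare_go_lists(wt_terms, dcin5_terms)
for label, terms in result.items():
    print(label, terms)
```

Unlike a Venn diagram tool limited to two lists, the same set operations can be repeated pairwise for each deletion strain against wild-type.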

Week 3: June 3 - 6

  • Goals:
  1. Run ClueGO with Cytoscape and create a PowerPoint of the various functions it can perform
  2. Try to figure out CompGO and how to use the R software
  3. Present a Journal Club on "Physiological and Transcriptional Responses of Anaerobic Chemostat Cultures of Saccharomyces cerevisiae Subjected to Diurnal Temperature Cycles"
  • Progress:
  1. Alice June 3 Journal Club Powerpoint
  2. ClueGO
    1. ClueGO does not allow for unique gene ontology groups to be included in the network. Only the categories that are common to all sets of genes are included in the network.
    2. There is the option of using gene list IDs or gene ontology terms for the analysis. Gene IDs produce a network, whereas gene ontology terms produce only separate nodes with no edges.
    3. There is a ClueGO plugin that gives the option to make the nodes of the network pie charts, giving information about the percentage of genes in each cluster that are part of the specific functional group. I downloaded the plugin and used it to determine what percentage of wt and dCIN5 genes were part of each functional category in the network, but every pie chart showed the same percentages. This suggests that the number of genes belonging to each functional category could be the same for every node.
    4. When I created the network comparing wild-type profile 45 and dCIN5 profile 45 gene IDs, every functional category was more highly overrepresented in the wild-type strain than the dCIN5 strain. Therefore, each node in the network was colorized red (indicating wild-type) based on the color settings in ClueGO.
    5. I am currently working on creating a PowerPoint with the various functions of ClueGO.
      • Run ClueGO analysis on the clusters, comparing different strains in each cluster that would be useful for analysis.
    6. When using ClueGO to analyze the GO terms produced by STEM, the analysis is sensitive to the column headers used for the p-value and GO ID columns.
      Figure 1: ClueGO Headers for GO Term Analysis
      • In order to run an analysis with GO terms, you need to select "Preselected Functions" instead of "Functional Analysis" at the top of the window.
      • Initially, when I pasted the columns into the box, I wrote "GOID" for the gene ontology ID column and "p value" for the column listing the respective p-values. When I ran the test, the result was not a network, but rather just a grid of functional categories.
      • I ran another test without putting a header on the columns. At first, the results were networks, with nodes and edges. But after restarting Cytoscape, running the software with no headers resulted in grids of functional categories rather than networks.
      • I looked back through the ClueGO documentation to see if there was any information about what to include in the headers. I found that when running GO terms, the header of the column containing the GO IDs should be labelled "GOID:PathwayID" and the column containing the p-values should be labelled "p value" (Fig. 1). After correcting the format, the results of the tests were networks.
    7. When I tried to save the networks that I had created on June 5, Cytoscape crashed because too much memory was being used. Therefore, it is important to save your work as you go.
      • Additionally, do not save only as a Cytoscape file. Save as a ClueGO file as well; otherwise only the networks can be seen, not the analysis tables.
      • You should save your work as a ClueGO file and also as a Cytoscape session. It takes a long time to load a large ClueGO file. (I tried loading the file from the work I had done on June 6, and it got stuck and crashed Cytoscape. I will attempt to open that session later.)
      • On June 5, I ran an analysis on all of the profiles and the gene list IDs and took screenshots of the resulting networks. The actual ClueGO files were lost because Cytoscape crashed. (I will have to redo the analysis on the gene lists.)
  3. CompGO
    1. CompGO Vignettes, CompGO paper, and CompGO user manual
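The header fix described under ClueGO item 6 can be applied before pasting the STEM GO terms into ClueGO. This is a minimal sketch: the header strings "GOID:PathwayID" and "p value" come from the notes above, while the tab-separated layout, the function name `format_cluego_input`, and the example rows are assumptions for illustration.

```python
import csv
import io


def format_cluego_input(rows):
    """Return tab-separated text with the column headers that produced networks."""
    buffer = io.StringIO()
    # Assumption: tab-separated columns paste cleanly into the ClueGO input box.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["GOID:PathwayID", "p value"])
    for go_id, p_value in rows:
        writer.writerow([go_id, p_value])
    return buffer.getvalue()


# Hypothetical GO IDs and p-values for illustration only.
rows = [("GO:0042254", 0.001), ("GO:0006364", 0.02)]
text = format_cluego_input(rows)
print(text)
```

Generating the header row in one place avoids retyping it for each profile and strain.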

Week 4: June 10 - 13

  • Goals:
  1. Present a journal club on the ClueGO paper and the progress that I have made on ClueGO so far.
  2. Begin modeling experiments
    • Variable inclusion of strain data for db5
      • First make a list of experiments in Excel to keep track.
      1. wt-only
      2. wt + each strain individually
      3. wt + two strains
      4. wt + three strains
      5. wt + 4 of the five deletion strains (in other words, leaving one deletion strain out)
    • looking at production rates
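The experiment list above can be enumerated programmatically instead of by hand in Excel. This is a minimal sketch, not the workflow actually used; the function name `enumerate_models` and the " - " naming format are illustrative assumptions. It also includes the model with all five deletion strains, which the notebook elsewhere calls "all-strain".

```python
from itertools import combinations

DELETION_STRAINS = ["dCIN5", "dGLN3", "dHAP4", "dHMO1", "dZAP1"]


def enumerate_models(deletion_strains):
    """List every model: wt alone, then wt plus each subset of deletion strains."""
    models = []
    for size in range(len(deletion_strains) + 1):
        for subset in combinations(deletion_strains, size):
            models.append(" - ".join(("wt",) + subset))
    return models


models = enumerate_models(DELETION_STRAINS)
print(len(models))  # 1 + 5 + 10 + 10 + 5 + 1 = 32
for name in models:
    print(name)
```

Five deletion strains, each either included or left out, give 2^5 = 32 models in total.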
  • Progress:
  1. Week 4 ClueGO Journal Club
    • Includes all of the networks that have been created with ClueGO so far.
  2. ClueGO
    • Run a ClueGO analysis on the rest of the strains for the GO terms. Run a comparison analysis between wild-type and each deletion strain (not including dHMO1) for profiles 45, 9, 22, and 48.
  3. Modeling experiments:
    • Models will be run on the strains (wt, dCIN5, dGLN3, dHAP4, dHMO1, and dZAP1) by leaving out the data for one or more strains entirely. For example, one model will run with all but one strain (e.g. wt-dCIN5-dGLN3-dHAP4-dHMO1). I have created an Excel sheet that shows all of the models that will be run.
    • On June 11, I started running 26 models on GRNsight MATLAB version 1.10.
      • There was an issue with the CPU affinity selections for the trials. I would select an affinity for the MATLAB run, but when I checked the CPU again, all of the processors were selected. After that, I did not know which CPU corresponded to which deletion run. For instance, CPU 0 was chosen for the all-strain model, but when I went back through the affinities, I could not find a MATLAB.exe with CPU 0 selected.
        • To restrict a run to one processor, the CPU has to be chosen after the model has started to run, not when the MATLAB command window pops up. Once the model ("Figure 1") window pops up, the affinity resets to all CPUs checked, so the CPU must be selected after the input file has been chosen and the "Figure 1" window has appeared.
    • On June 12, I started to run the rest of the models. In total, there are 32 models.
Cerevisiae Computer, June 11, 2019
Time             Strains Included
1:20 - 4:43      all-strain
1:27 - 7:27      wt - dCIN5 - dGLN3 - dHAP4 - dHMO1
1:33 - 7:31      wt - dCIN5 - dGLN3 - dHAP4 - dZAP1
1:37 - 8:42      wt - dCIN5 - dGLN3 - dHMO1 - dZAP1
1:40 - 6:16      wt - dCIN5 - dHAP4 - dHMO1 - dZAP1
1:41 - 3:27      wt - dGLN3 - dHAP4 - dHMO1 - dZAP1
1:47 - 3:14      wt - dCIN5 - dGLN3 - dHAP4
1:49 - 6:33      wt - dCIN5 - dGLN3 - dHMO1
1:53 - 5:15      wt - dCIN5 - dGLN3 - dZAP1
1:55 - 4:12      wt - dCIN5 - dHAP4 - dHMO1
2:01 - 3:36      wt - dCIN5 - dHAP4 - dZAP1

Paradoxus Computer, June 11, 2019
Time             Strains Included
2:38 - 6:30      wt - dCIN5 - dHMO1 - dZAP1
2:41 - 7:08      wt - dGLN3 - dHAP4 - dHMO1
2:44 - 8:07      wt - dGLN3 - dHAP4 - dZAP1
2:47 - 7:00      wt - dGLN3 - dHMO1 - dZAP1
2:52 - 6:17      wt - dHAP4 - dHMO1 - dZAP1
2:55 - 6:42      wt - dCIN5 - dGLN3
2:57 - 6:11      wt - dCIN5 - dHAP4
3:00 - 7:01      wt - dCIN5 - dHMO1
3:02 - 7:11      wt - dCIN5 - dZAP1
3:04 - 6:55      wt - dGLN3 - dHAP4
3:07 - 7:01      wt - dGLN3 - dHMO1
3:09 - 8:57      wt - dGLN3 - dZAP1
3:11 - 5:59      wt - dHAP4 - dHMO1
3:14 - 7:23      wt - dHAP4 - dZAP1
3:16 - 7:25      wt - dHMO1 - dZAP1

Paradoxus Computer, June 12, 2019
CPU    Time              Strains Included
0      11:02 - 12:15     wt-only
1      11:03 - 12:14     wt - dCIN5
2      11:06 - 11:51     wt - dGLN3
3      11:09 - 12:30     wt - dHAP4
4      11:12 - 2:22      wt - dHMO1
5      11:14 - 1:47      wt - dZAP1
    • After all of the models have run, they create an output file that includes optimized parameters.
      • I have created an Excel workbook for the optimized production rates, thresholds (b), and weights for each of the strain deletions. In addition, I have compared the LSE, minLSE, and LSE:minLSE ratios for each of the strain deletions and have created bar graphs for each (Excel workbooks).
    • The output files were run through GRNsight, and the weighted SIF files were downloaded and used in the creation of heat maps. Using previous data by Lauren Kelly, I was able to normalize the data and create the heat maps for visualization of the activation and repression of the genes.
    • Heat maps were created for the strain deletions. They were sorted four ways: in the order in which I listed the models, and by increasing LSE:minLSE ratio, minLSE, and LSE.
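The ordering by increasing LSE:minLSE ratio can be sketched as follows. This is only an illustration of the sorting step, not the Excel workflow used here; the function name `rank_by_ratio`, the model labels, and all numeric values are made-up placeholders, not results from these runs.

```python
def rank_by_ratio(fits):
    """Return (model, LSE:minLSE ratio) pairs sorted by increasing ratio."""
    ranked = []
    for name, (lse, min_lse) in fits.items():
        ranked.append((name, lse / min_lse))
    return sorted(ranked, key=lambda item: item[1])


# Placeholder (LSE, minLSE) pairs for illustration only; not real output values.
fits = {
    "model A": (0.90, 0.60),
    "model B": (0.80, 0.64),
    "model C": (1.00, 0.50),
}

ranked = rank_by_ratio(fits)
for name, ratio in ranked:
    print(f"{name}: {ratio:.2f}")
```

Sorting by minLSE or LSE alone works the same way, with the sort key changed to the corresponding value.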

Week 5: June 17 - 20

  • Goals:
  1. Upload the output analysis files to GitHub under an analysis folder.
  2. Present on the progress made throughout the week.
  • Progress: