Natalie Williams: Electronic Notebook: Difference between revisions

From OpenWetWare
(Added Tidbit about Spring 2016 and Beginnings of Fall 2016)
 
(55 intermediate revisions by the same user not shown)
Line 9: Line 9:
*[[Dahlquist:GRNmap]]


=== Electronic Notebook ===
===[[Natalie Williams Fall Electronic Notebook 2014 |Fall 2014]]===
This contains all the procedures and tasks that I completed and the trials that I ran in Fall 2014.
Line 19: Line 18:
This link has all the information for what occurred over the summer. A lot of it was testing the code by changing the initial weights and the threshold b values of the input sheets.


===Fall 2015===
===[[Natalie Williams Fall Electronic Notebook 2015 |Fall  2015]]===
====September 2015====
This month consisted of meetings and getting to know new members of the research team.
<br>
We decided that we would be going through all the data analysis & number crunching of the raw data. In doing so, we hope to consolidate all of the information and data to obtain one master data set.
*The protocol for this can be found [[ |here]]
<br>
Grace and I also looked over the literature that we selected last time to find data sets with production and degradation rates for mRNA.


For the sanity check:
===[[Natalie Williams Fall Electronic Notebook 2016 | Fall 2016]]===
* <b> WT statistical results </b>
::{| border="1"
|+ Sanity Check
|-
! P-value criteria
! Number of Genes
! % out of 6189
|-
| p<0.05
| 2600
| 42.0%
|-
| p<0.01
| 1727
| 27.9%
|-
| p<0.001
| 1015
| 16.4%
|-
| p<0.0001
| 574
| 9.3%
|-
| Bonferroni p<0.05
| 302
| 4.9%
|-
| B&H p<0.05
| 1936
| 31.3%
|}


* <b> S. paradoxus statistical results </b>
==Spring 2017==
::{| border="1"
|+ Sanity Check
|-
! P-value criteria
! Number of Genes
! % out of 6189
|-
| p<0.05
| 2513
| 40.6%
|-
| p<0.01
| 1637
| 26.5%
|-
| p<0.001
| 852
| 13.8%
|-
| p<0.0001
| 433
| 7.0%
|-
| Bonferroni p<0.05
| 232
| 3.7%
|-
| B&H p<0.05
| 1791
| 28.9%
|}
*<b> dHMO1 statistical results </b>
::{| border="1"
|+ Sanity Check
|-
! P-value criteria
! Number of Genes
! % out of 6189
|-
| p<0.05
| 1019
| 16.5%
|-
| p<0.01
| 432
| 7.0%
|-
| p<0.001
| 119
| 1.9%
|-
| p<0.0001
| 46
| 0.7%
|-
| Bonferroni p<0.05
| 25
| 0.4%
|-
| B&H p<0.05
| 114
| 1.8%
|}
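The counts in the sanity-check tables can be reproduced from a column of unadjusted p-values. The following is a minimal sketch, assuming the Bonferroni and B&H columns were computed with the standard corrections; the function names and the five example p-values are hypothetical and are not lab data.

```python
from typing import Dict, List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sanity_check(pvals: List[float]) -> Dict[str, int]:
    """Count genes passing each criterion used in the tables above."""
    counts = {f"p<{t}": sum(p < t for p in pvals)
              for t in (0.05, 0.01, 0.001, 0.0001)}
    counts["Bonferroni p<0.05"] = sum(p < 0.05 for p in bonferroni(pvals))
    counts["B&H p<0.05"] = sum(p < 0.05 for p in benjamini_hochberg(pvals))
    return counts


# Made-up p-values for five hypothetical genes (not lab data).
example = [0.00001, 0.004, 0.02, 0.045, 0.6]
result = sanity_check(example)
for criterion, n in result.items():
    print(f"{criterion}: {n} genes ({100 * n / len(example):.1f}%)")
```

The percentage column in the tables is the count divided by the total number of genes (6189), which the last line mirrors for the toy list.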


====October 2015====
=== January 2017 ===
=====October 7, 2015=====
====Week of January 12, 2017====
*I took the significant p-values that were less than 0.05 from the B-H column. Those genes that came up with significant p-values were then put into YEASTRACT to determine which transcription factors are enriched.
Monday & Thursday: Worked on collecting sources for my thesis project. The annotated bibliography is due 20/01. I will be in Boston at that time, but I will still submit my annotated bibliography in time. We had our first lab meeting of the semester on Thursday.
*This was done for both the dHMO1 and the S. paradoxus strains
*In trying to pare down the network, I came up with five ways to generate different networks:
**All genes that were not connected were removed from the network. Also, genes with fewer than 1 regulator and controlling 1 other gene were removed
*# Top 25 genes with the highest or best p-values
*# Top 25 genes that have the highest regulation control
*# Genes with the highest number of regulators
*# 10 Genes with the highest number of regulators and 10 genes with the lowest regulation control
*# Genes with low regulators and low regulation control, with these values falling between 2 and 10.
=====October 14, 2015=====
*In coming back from this week's meeting, ways to pare down the network were discussed.
*The method was to:
*# Delete genes that were not connected
*# Delete the gene with the highest (least significant) p-value among those that were significant in YEASTRACT, regardless of whether it was one of the deletion strains
*#* See how many networks you get from deleting this
*#* If more than one gene was affected, resulting in 0 connections with the network, those genes were to be deleted as well
*# Next, regard the deletion strains and make sure that they remain in the network
*#* The same steps as above were followed; however, genes must be kept as needed so that the deletion strains remain in the network
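The paring-down method above can be sketched in code. This is only an illustration under stated assumptions: the network is a list of (regulator, target) pairs, a self-loop alone does not count as a connection, and a deletion is skipped when it would disconnect a protected (deletion-strain) gene. The function names, the toy network, and its p-values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def connected(edges: List[Edge]) -> Set[str]:
    """Genes with at least one edge to or from a different gene."""
    return {gene for reg, tgt in edges if reg != tgt for gene in (reg, tgt)}


def delete_gene(edges: List[Edge], gene: str) -> Tuple[List[Edge], List[str]]:
    """Remove one gene, then any genes left with zero connections."""
    before = connected(edges)
    kept = [(r, t) for r, t in edges if gene not in (r, t)]
    orphans = sorted(before - {gene} - connected(kept))
    kept = [(r, t) for r, t in kept if r not in orphans and t not in orphans]
    return kept, orphans


def prune_step(edges: List[Edge], pvalues: Dict[str, float],
               protected: Set[str] = frozenset()) -> Tuple[List[Edge], List[str]]:
    """Delete the least significant unprotected gene; return (edges, removed)."""
    candidates = sorted(connected(edges) - set(protected),
                        key=lambda g: pvalues[g], reverse=True)
    for gene in candidates:
        kept, orphans = delete_gene(edges, gene)
        # Skip a deletion that would disconnect a protected gene.
        if not set(orphans) & set(protected):
            return kept, [gene] + orphans
    return edges, []


# Toy network with made-up p-values (not the YEASTRACT results).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("D", "E")]
toy_p = {"A": 0.001, "B": 0.04, "C": 0.01, "D": 0.002, "E": 0.03}

edges1, removed1 = prune_step(toy_edges, toy_p)
print(removed1, len(edges1))

# With E protected, deleting D would strand E, so A is deleted instead.
edges2, removed2 = prune_step([("A", "D"), ("D", "E")], toy_p, protected={"E"})
print(removed2, edges2)
```

Calling `prune_step` repeatedly and recording the node and edge counts after each call gives a log in the same shape as the deletion lists in this notebook.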
=====October 21, 2015=====
The first list contains the initial genes found in the network with no regard for the genes that have microarray data for them from the Dahlquist Lab.
Here are the genes that made it to the initial network containing 32 nodes and 71 edges:
*ACE2
*MSN2
*SFP1
*YHP1
*YOX1
*ASH1
*ASF1
*CSE2
*SNF2
*CYC8
*MGA2
*STB5
*SWI5
*YLR278C
*CST6
*RPN4
*SNF6
*MSN4
*ABF1
*SNF5
*ZAP1
*GCN4
*TAF14
*PHO2
*MCM1
*AFT2
*HSF1
*SKO1
*SWI3
*GCR2
*SOK2
*CIN5
With these remaining nodes, the network was pared down based on the least significant p-value within this network. There were a subsequent 14 deletions. The list that follows contains which genes were deleted and how many nodes and edges remained after the specific gene's deletion.
*Deletion 1: SOK2
**31 nodes
**58 edges
*Deletion 2: GCR2
**30 nodes
**56 edges
*Deletion 3: SWI3
**29 nodes
**54 edges
*Deletion 4: SKO1
**28 nodes
**49 edges
*Deletion 5: HSF1
**26 nodes
**46 edges
**CSE2 deleted as a result of deletion of HSF1
*Deletion 6: MCM1
**24 nodes
**39 edges
**SNF6 deleted sequentially
*Deletion 7: PHO2
**23 nodes
**38 edges
*Deletion 8: TAF14
**22 nodes
**37 edges
*Deletion 9: GCN4
**21 nodes
**36 edges
*Deletion 10: SNF5
**20 nodes
**35 edges
*Deletion 11: CIN5
**19 nodes
**29 edges
*Deletion 12: AFT2
**18 nodes
**28 edges
*Deletion 13: ZAP1
**17 nodes
**27 edges
*Deletion 14: ABF1
**13 nodes
**17 edges
**CST6, MGA2, and SNF2 deleted as a result of ABF1 deletion


When considering the TFs studied in this lab, different deletions were made to ensure that the TFs of interest would remain in the network. The initial network had the following 35 nodes and 88 edges:
====Week of January 19, 2017 ====
*ACE2
Monday: Worked on writing the abstract for the SCSBC at UC Irvine on Saturday, 28/01. The abstract can be found on the Dahlquist Lab repository on github.
*MSN2
*SFP1
*YHP1
*YOX1
*ASH1
*ASF1
*CSE2
*SNF2
*CYC8
*MGA2
*STB5
*SWI5
*YLR278C
*CST6
*RPN4
*SNF6
*MSN4
*ABF1
*SNF5
*ZAP1
*GCN4
*TAF14
*PHO2
*MCM1
*AFT2
*HSF1
*SKO1
*SWI3
*GCR2
*SWI4
*CIN5
*GLN3
*HMO1
*HAP4
Starting from this original network, 19 subsequent deletions were made to pare the network down to 15 nodes.
*Deletion 1: GCR2
**34 nodes
**86 edges
*Deletion 2: SWI3
**33 nodes
**84 edges
*Deletion 3: SKO1
**32 nodes
**78 edges
*Deletion 4: HSF1
**31 nodes
**74 edges
*Deletion 5: MCM1
**30 nodes
**65 edges
**CSE2 deleted as a result of deletion of MCM1
*Deletion 6: PHO2
**28 nodes
**64 edges
*Deletion 7: TAF14
**27 nodes
**63 edges
*Deletion 8: SNF5
**26 nodes
**62 edges
*Deletion 9: MSN4
**25 nodes
**57 edges
*Deletion 10: SNF6
**24 nodes
**56 edges
*Deletion 11: RPN4
**23 nodes
**54 edges
*Deletion 12: CST6
**22 nodes
**53 edges
*Deletion 13: YLR278C
**21 nodes
**50 edges
*Deletion 14: SWI5
**20 nodes
**48 edges
*Deletion 15: STB5
**19 nodes
**45 edges
*Deletion 16: MGA2
**18 nodes
**43 edges
*Deletion 17: CYC8
**17 nodes
**37 edges
*Deletion 18: SNF2
**16 nodes
**36 edges
*Deletion 19: ASH1
**15 nodes
**32 edges


=====October 28, 2015=====
Thursday: Not present. Interview at Harvard Medical School.
Took a look at the Degradation rates (half-lives) of the mRNA from literature.


The two papers that we decided to draw data from were: Wang 2002 and Shalem 2008. The following weeks will be spent using Access to pick out the specific half-lives of the wanted transcription factors.
====Week of January 26, 2017 ====
Monday: Finished most of the poster that will be presented this upcoming Saturday at the conference. I wrote much of the content and analysis, and Brandon worked on the formatting. Much of the analysis focused on the optimized production and threshold b values and on a motif: Hmo1 --> Msn2 --> Cin5 --> Yhp1.


===November 2015===
Thursday: Went over the poster during lab meeting. With Dr. Dahlquist's corrections, I updated the poster and uploaded it to the GitHub repository to be edited and reviewed by Dr. Dahlquist before printing.
====November 4, 2015====
We worked on pulling out the data from the respective data sets we chose to work on --> Grace w/ Wang & myself w/ Shalem
*Access was used to extract the data for the transcription factors of interest (TFoI)
*Both the Access protocol and the list of TFoI can be found on Grace's page
<b>Results</b>
<br>
Only one TF did not have a half life associated with it: YPL248C.
<br>
Most of the TFs also had large differences between the two measured half-lives in minutes. For example, YER045C had one half-life equal to 146.871 with its second value at 12.751. Their average is 79.811 with standard deviation set at 94.837.
<br>
Many of these values, half-lives with large discrepancies between the measurements, were seen (at least for the TFoI).
<br>
I created a column where it lists TFs that had standard deviations less than 10. If those values were less than 10, the numerical value was kept; however, if the numbers were greater than 10, the value was sent to 0 for easier analysis.
<br>
The values that were sent to 0 were then counted. Out of the 202 TFoI, 136 were sent to 0.
<br>
To me, this suggests that we may not be able to use the measured half-life values from this source - just based on the raw numbers.
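The standard-deviation filter described above can be checked with a few lines of code. This is a minimal sketch, assuming the reported standard deviation is the sample standard deviation of the two measured half-lives; `summarize` and `SD_CUTOFF` are hypothetical names, and only the YER045C values quoted above are used.

```python
from statistics import mean, stdev

SD_CUTOFF = 10.0  # cutoff described above, in minutes


def summarize(half_lives):
    """Return (mean, sample SD, filtered SD) for replicate half-lives."""
    avg = mean(half_lives)
    sd = stdev(half_lives)
    # Keep the SD when it is under the cutoff; otherwise send it to 0.
    kept = sd if sd < SD_CUTOFF else 0.0
    return avg, sd, kept


avg, sd, kept = summarize([146.871, 12.751])  # YER045C values quoted above
print(f"mean={avg:.3f} sd={sd:.3f} kept={kept}")
```

For YER045C this reproduces the quoted mean (79.811) and standard deviation (94.837), and the entry is sent to 0 because the standard deviation exceeds 10.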


The data file can be seen here: [[Media:NW_DegRates_SpecificTFs.xlsx‎]]
=== February 2017 ===
====November 11, 2015====
====January 31, 2017 & February 2, 2017====
This week Input sheets were created for the networks that we created based on the criteria of p-values.
Monday: Reran the networks derived from dgln3, dhap4, and dzap1 on boulardii 2 for consistency so that there aren't any discrepancies from running these networks on a different computer.
<br>
Input sheet creation protocol can be found on Github's wiki. Some of the pages are blank, such as the degradation and production rates sheets. Some of the parameters also are blank because I did not know what values to input for them.
*I may just use the values used in previous network sheets for these values (i.e. MaxIter, TolMax, etc.)


====November 18, 2015====
Thursday:
We wanted to finish creating the Input sheets before running them in MATLAB. We would try to accomplish this before our next meeting (after Thanksgiving Break)
*Compiled the optimized parameters into one file, along with the MSE values for individual genes in each of the networks. Each of the networks was visualized again on GRNsight to ensure that the visualizations match the optimized output weights for each network.
*Received feedback from Dr. Dahlquist on my annotated bibliography as well as additional sources to use for my thesis.


===December 2015===
====Week of February 6, 2017====
====December 2, 2015====
Monday: Edited the 10 random output sheets that K. Grace Johnson ran last year to make them into input sheets to re-run on boulardii 2.
I finished creating my last two input sheets for the wt_added genes. We tried running our input sheets and were not
*I deleted all the output sheets: the sigmas, optimized_network_weights, optimized_expression, and the optimized production and threshold_b
*I copied the production and degradation rates from Brandon's dhap4 network into all the corresponding sheets in the random network input sheets
Worked on creating the working abstract for my talk during LMU's Undergraduate Research Symposium.
*The adjacency matrices from the random network files were then copied and pasted into the adjacency matrix of Brandon's file so that all parameters and information would be the same. The only difference was the network and the network weight sheets.


We will create a test sheet without missing values.
Thursday: I was not here due to an interview at UCSF's medical school.


==Fall 2016==
====Week of February 13, 2017====
I was abroad last spring semester, Spring 2016, so there are no notes or records of my experience in the lab at that time.
Monday: I generated some random networks with Brandon's R script to be run on the model. A folder was created to hold all the input and output sheets for the random networks that are run with GRNmap [https://github.com/kdahlquist/DahlquistLab/tree/master/data/bouldardii2_GRNmap_outputs/Random_network_intput_output]. For further analysis, I will also look at the distribution of the in and out degrees of all the random networks compared to the network derived from the dhap4 data.
*Distribution of weights (positive vs. negative) and the overall network
*Are any motifs/connections conserved?
*Any self or auto-regulators?
*Visualization will also be seen via GRNsight


===September 2016===
Throughout the next couple of weeks, I will be running the random networks generated on GRNmap.
====September 14, 2016====
====Week of February 20, 2017====
Monday: Began to look at the MSE values of the db networks 1 & 5 (derived from wt and dhap4 data) compared to the p values from the ANOVA. For the analysis, I looked at the expression data plots categorized by number of significant p values (B&H p values) at the suggestion of Brandon. Divisions were made as follows:
*Two or more significant p values across all strains
*One significant p value across all strains
*No significant p value across all strains
I then described whether the MSE value that was matched with the p value fit well with the modeled dynamics from the expression plots. I created an excel file with these comparisons and comments about the fit of the model, which can be found in the Dahlquist Lab Repository [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx pvalue_MSE_comparison].


This week's job was to understand and analyze the study of Neymotin et al (2014) to derive degradation rates from the half-life values of the genes annotated.
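Converting a half-life into a degradation rate usually relies on the first-order decay relation, rate = ln(2) / half-life. The sketch below assumes that relation applies here and that half-lives are in minutes; the function name and the 20-minute example are hypothetical and are not values from Neymotin et al.

```python
import math


def degradation_rate(half_life_min: float) -> float:
    """First-order decay constant (per minute) for a half-life in minutes."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


rate = degradation_rate(20.0)  # hypothetical 20-minute half-life
print(f"{rate:.4f} per minute")
```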
Thursday: Continued to analyze the p values and MSE outputs by looking at the expression plots. However, during the meeting, I was told that this was futile and would not generate results because I should be looking at the minMSE for each gene's output. By comparing the MSE:minMSE ratio for each gene, I could see whether genes with p values had a better or worse fit.
====Week of February 27, 2017====
Monday: Continued to run the random networks on boulardii 2 (left off at random network 23). Instead of using the expression plots for analysis, I began to compare the MSE values of db network 5 (derived from hap4) and the random networks with the same number of nodes (15) and edges (28). The last random network included in the file is rand19. On every sheet, I have the MSE value output from the run in GRNmap next to the p values from the ANOVA for the dhap4 strain. Below those comparisons, we see the differences in MSE values of the random network from the db derived network 5.
*If the number is negative, it suggests that GRNmap reduced the mean square error of that individual gene in the random networks;
*however, if the number is larger, then the db derived network's individual gene saw better modeling/mean square error.
This file can be found in the pvalue_MSE_comparison excel file [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx].


Personally, I received feedback on my HNRS thesis abstract that is to be submitted on Sept. 30. I made the changes and sent them to Dr. Dahlquist.
Thursday: Continued to run the random networks on boulardii 2 (random network 27 currently running). There are only three remaining random networks (28-30) that need to be run on GRNmap. I carried on with my compilation of the random network MSE value comparisons to db network 5. The last LSE:minLSE comparison made was between db 5 and random network 26. Again, the file can be found [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx here] on the Dahlquist Lab Repository under the file name pvalue_MSE_comparison.xlsx.  


I have worked more of the R Tutorial that Dr. Dahlquist has issued to both Brandon and me. While Brandon has already coded a script to generate random matrices, our next task will be to come up with code to then generate the distribution of in-degree and out-degree via a bar graph.
Further, on the sheet labeled dhap4, a bar graph comparing the LSE:minLSE ratios for all the GRNs run on MATLAB thus far can be found. I've begun to look at the regulatory relationships identified in the three lowest and three highest ratio random networks.
*The smallest LSE:minLSE ratios
**random networks 15, 16, and 24
*The largest LSE:minLSE ratios
**random networks 5, 7, and 12


====Week of March 4, 2017====
Spring Break this week. I was in Mammoth for the week.
Sunday: Kristen noticed that random networks 4 & 5 were identical, so I created a new random network (rand 31) to be run in GRNmap. After it was run in the model, the optimization diagnostics showed that random network 31 had a larger LSE:minLSE ratio than random network 5. Therefore, analysis will now be conducted on the following with the highest LSE:minLSE ratios:
*Random network 7, 12, and 31
**Rand7: 1.5202
**Rand12: 1.5080
**Rand31: 1.5001
Thursday: I computed the minMSE values for the DB5 network so that I could use the information for my Symposium presentation. The following protocol was done.
#Using the log2 expression data for the specific strain in an input sheet, average the values for the same timepoints
#* i.e. for wt strain, there are four 15 time point measurements, five 30 time point measurements, and four 60 time point measurements. Therefore, the first gene's average log fold expression change is averaged across four timepoints for 15, five for 30, and four for 60.
#** ABF1 averages: 15 = -1.1878; 30 = -1.1819; 60 = -1.9142
#Next, the difference between each individual log2 expression at a given time point and the average for that given time point was found.
#* i.e. for wt's ABF1 gene, we see the following formula: = t15.1 - avg15, where t15.1 is the individual log2 expression at the first 15 time point and avg15 is the average of the four observed expression changes at the 15 time point replicates
#** ABF1's first 15 time point: = B2 - P2 = t15.1 - avg15 = -2.1071 - (-1.1878) = -0.9193
#Then, the differences are squared so that no negative numbers result and so that deviations above and below the average are treated alike
#* i.e. for wt's ABF1 gene, we see the following formula in the cell: = B20^2
#The squared differences were then summed up for all the time points and divided by the total number of time points.
#* The formula used was as follows: =SUM(B38:N38)/13 (based off wt's ABF1 gene)
#** i.e. for wt, there are 13 time points (four 15 time points + five 30 time points + four 60 time points = 13)
#** Note that the sum for all these time points differs for each individual strain, such that db4 (dGLN3) has 12 overall time points
#To ensure that these calculations were correct, I first used this procedure to calculate the MSE observed via the model. After I received the same output values, I proceeded to calculate the actual minMSEs.
====Week of March 12, 2017====
Monday: I worked on completing the analysis of my results. I used Brandon's regulatory relationship workbook to compare the regulatory relationships for DB5 and the three best (15, 16, 24) and worst (7, 12, 31) random networks.
*Process for isolating regulatory relationships
*#Using GRNsight, I visualized the weighted networks of interest and exported the network as a .siv file to isolate the regulatory relationships between regulator and target gene
*#Next, I opened the SIV file in Excel. In a new Excel workbook, I wrote down the relationship between the transcription factor and its target as Regulator --> Target Gene in one cell with the weight of the transcription factor's influence in the column right of it.
*#After I saved all these relationships for the seven networks (DB5, Rand7, Rand12, Rand15, Rand16, Rand24, and Rand31), I compiled all of their regulatory relationships together in a list.
*#Next, I pasted the values that corresponded to a specific node/relationship for each network into the correct cell.
*#*Reading R->L (DB5, Rand7, ..., Rand31)
*#Because Brandon's Excel file already highlighted cells based on the weights within them, stronger activators were colored red, stronger repressors were colored blue, and grey was used for the weak influencers.
Thursday: Presented a first draft of my presentation for LMU's URS. That was the focus of lab this day.
*Finished up the first draft of my presentation
*For further analysis, I included:
**The sum of weights to identify if the network was 'overall repressive (-) or activating (+)'
**The shared nodes between DB5 and the 3 best and 3 worst networks & found that it shared more nodes with the better networks
====Week of March 19, 2017====
Monday: Worked on completing my powerpoint presentation for the LMU's Undergraduate Research Symposium. I sat down with Dr. Dahlquist to discuss my presentation and re-work some of the analysis that I did.
Thursday: I practiced my presentation in Sea120 before I rehearsed my presentation in front of my lab. Later, I presented my powerpoint for the symposium to my fellow researchers. I received feedback (overall positive, with minor changes to make). I listened to Kristen's presentation, too, before the end of lab.
====Week of March 26, 2017====
Monday: Worked on a lot of my thesis, writing my discussion
Thursday: Continued to work and write my thesis before the holiday (Cesar Chavez). During the lab meeting, we discussed future directions/what we should work on for the remainder of our time in the lab.


==Documents==

Latest revision as of 14:00, 11 April 2017

Natalie Williams: Electronic Notebook

Protocol for MATLAB

This page will help you input and run data sets from your document into an output.

Fall 2014

This contains all the procedures and tasks that I completed and the trials that I ran in Fall 2014.

Spring 2015

This contains all the procedures and tasks that I completed and the trials that I ran in Spring 2015. Most of the activities/notes for this semester focused on creating a poster for the various conferences that we attended in the Spring.

Summer 2015

This link has all the information for what occurred over the summer. A lot of it was testing the code by changing the initial weights and the threshold b values of the input sheets.

Fall 2015

Fall 2016

Spring 2017

January 2017

Week of January 12, 2017

Monday & Thursday: Worked on collecting sources for my thesis project. The annotated bibliography is due 20/01. I will be in Boston at that time, but I will still submit my annotated bibliography in time. We had our first lab meeting of the semester on Thursday.

Week of January 19, 2017

Monday: Worked on writing the abstract for the SCSBC at UC Irvine on Saturday, 28/01. The abstract can be found on the Dahlquist Lab repository on github.

Thursday: Not present. Interview at Harvard Medical School.

Week of January 26, 2017

Monday: Finished most of the poster that will be presented this upcoming Saturday at the conference. I wrote much of the content and analysis, and Brandon worked on the formatting. Much of the analysis focused on the optimized production and threshold b values and on a motif: Hmo1 --> Msn2 --> Cin5 --> Yhp1.

Thursday: Went over the poster during lab meeting. With Dr. Dahlquist's corrections, I updated the poster and uploaded it to the GitHub repository to be edited and reviewed by Dr. Dahlquist before printing.

February 2017

January 31, 2017 & February 2, 2017

Monday: Reran the networks derived from dgln3, dhap4, and dzap1 on boulardii 2 for consistency so that there aren't any discrepancies from running these networks on a different computer.

Thursday:

  • Compiled the optimized parameters into one file, along with the MSE values for individual genes in each of the networks. Each of the networks was visualized again on GRNsight to ensure that the visualizations match the optimized output weights for each network.
  • Received feedback from Dr. Dahlquist on my annotated bibliography as well as additional sources to use for my thesis.

Week of February 6, 2017

Monday: Edited the 10 random output sheets that K. Grace Johnson ran last year to make them into input sheets to re-run on boulardii 2.

  • I deleted all the output sheets: the sigmas, optimized_network_weights, optimized_expression, and the optimized production and threshold_b
  • I copied the production and degradation rates from Brandon's dhap4 network into all the corresponding sheets in the random network input sheets

Worked on creating the working abstract for my talk during LMU's Undergraduate Research Symposium.

  • The adjacency matrices from the random network files were then copied and pasted into the adjacency matrix of Brandon's file so that all parameters and information would be the same. The only difference was the network and the network weight sheets.

Thursday: I was not here due to an interview at UCSF's medical school.

Week of February 13, 2017

Monday: I generated some random networks with Brandon's R script to be run on the model. A folder was created to hold all the input and output sheets for the random networks that are run with GRNmap [1]. For further analysis, I will also look at the distribution of the in and out degrees of all the random networks compared to the network derived from the dhap4 data.

  • Distribution of weights (positive vs. negative) and the overall network
  • Are any motifs/connections conserved?
  • Any self or auto-regulators?
  • Visualization will also be seen via GRNsight
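The in-degree and out-degree comparison mentioned above can be computed directly from an adjacency matrix. This is a minimal sketch, assuming columns are regulators and rows are targets (an assumption about the input sheet layout); the three-gene matrix and the function name are made up for illustration.

```python
from collections import Counter


def degree_summary(genes, adjacency):
    """adjacency[i][j] == 1 means gene j (column) regulates gene i (row)."""
    n = len(genes)
    in_deg = {genes[i]: sum(adjacency[i]) for i in range(n)}
    out_deg = {genes[j]: sum(adjacency[i][j] for i in range(n))
               for j in range(n)}
    # A 1 on the diagonal marks a self-regulator (auto-regulation).
    self_regs = [genes[i] for i in range(n) if adjacency[i][i]]
    return in_deg, out_deg, self_regs


# Hypothetical three-gene network, not one of the random networks.
genes = ["A", "B", "C"]
adjacency = [
    [0, 0, 0],  # A has no regulators
    [1, 0, 0],  # B is regulated by A
    [1, 1, 1],  # C is regulated by A, B, and itself
]
in_deg, out_deg, self_regs = degree_summary(genes, adjacency)
print("in-degree distribution:", Counter(in_deg.values()))
print("out-degree distribution:", Counter(out_deg.values()))
print("self-regulators:", self_regs)
```

The two `Counter` objects hold the number of genes at each degree, which is the data a bar graph of the distributions would plot.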

Throughout the next couple of weeks, I will be running the random networks generated on GRNmap.

Week of February 20, 2017

Monday: Began to look at the MSE values of the db networks 1 & 5 (derived from wt and dhap4 data) compared to the p values from the ANOVA. For the analysis, I looked at the expression data plots categorized by number of significant p values (B&H p values) at the suggestion of Brandon. Divisions were made as follows:

  • Two or more significant p values across all strains
  • One significant p value across all strains
  • No significant p value across all strains

I then described whether the MSE value that was matched with the p value fit well with the modeled dynamics from the expression plots. I created an excel file with these comparisons and comments about the fit of the model, which can be found in the Dahlquist Lab Repository pvalue_MSE_comparison.

Thursday: Continued to analyze the p values and MSE outputs by looking at the expression plots. However, during the meeting, I was told that this was futile and would not generate results because I should be looking at the minMSE for each gene's output. By comparing the MSE:minMSE ratio for each gene, I could see whether genes with p values had a better or worse fit.

Week of February 27, 2017

Monday: Continued to run the random networks on boulardii 2 (left off at random network 23). Instead of using the expression plots for analysis, I began to compare the MSE values of db network 5 (derived from hap4) and the random networks with the same number of nodes (15) and edges (28). The last random network included in the file is rand19. On every sheet, I have the MSE value output from the run in GRNmap next to the p values from the ANOVA for the dhap4 strain. Below those comparisons, we see the differences in MSE values of the random network from the db derived network 5.

  • If the number is negative, it suggests that GRNmap reduced the mean square error of that individual gene in the random networks;
  • however, if the number is larger, then the db derived network's individual gene saw better modeling/mean square error.
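The sign convention in the two bullets above can be made concrete with a small example. This sketch assumes the difference is taken as random-network MSE minus db-derived MSE, so a negative value means the random network fit that gene better; the gene values are made up and are not GRNmap outputs.

```python
def mse_differences(random_mse, db_mse):
    """Per-gene difference: random-network MSE minus db-derived MSE."""
    return {gene: random_mse[gene] - db_mse[gene] for gene in db_mse}


def better_fit(diff):
    """Name the network with the lower MSE for one gene."""
    if diff < 0:
        return "random network"
    if diff > 0:
        return "db-derived network"
    return "tie"


# Made-up MSE values for two genes (not GRNmap outputs).
db5 = {"ABF1": 0.50, "CIN5": 0.25}
rand = {"ABF1": 0.25, "CIN5": 0.75}
diffs = mse_differences(rand, db5)
for gene, diff in diffs.items():
    print(f"{gene}: {diff:+.2f} (lower MSE: {better_fit(diff)})")
```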

This file can be found in the pvalue_MSE_comparison excel file [2].

Thursday: Continued to run the random networks on boulardii 2 (random network 27 currently running). There are only three remaining random networks (28-30) that need to be run on GRNmap. I carried on with my compilation of the random network MSE value comparisons to db network 5. The last LSE:minLSE comparison made was between db 5 and random network 26. Again, the file can be found here on the Dahlquist Lab Repository under the file name pvalue_MSE_comparison.xlsx.

Further, on the sheet labeled dhap4, a bar graph comparing the LSE:minLSE ratios for all the GRNs run on MATLAB thus far can be found. I've begun to look at the regulatory relationships identified in the three lowest and three highest ratio random networks.

  • The smallest LSE:minLSE ratios
    • random networks 15, 16, and 24
  • The largest LSE:minLSE ratios
    • random networks 5, 7, and 12

Week of March 4, 2017

Spring Break this week. I was in Mammoth for the week.

Sunday: Kristen noticed that random networks 4 & 5 were identical, so I created a new random network (rand 31) to be run in GRNmap. After it was run in the model, the optimization diagnostics showed that random network 31 had a larger LSE:minLSE ratio than random network 5. Therefore, analysis will now be conducted on the following with the highest LSE:minLSE ratios:

  • Random network 7, 12, and 31
    • Rand7: 1.5202
    • Rand12: 1.5080
    • Rand31: 1.5001

Thursday: I computed the minMSE values for the DB5 network so that I could use the information for my Symposium presentation. The following protocol was done.

  1. Using the log2 expression data for the specific strain in an input sheet, average the values for the same timepoints
    • i.e. for wt strain, there are four 15 time point measurements, five 30 time point measurements, and four 60 time point measurements. Therefore, the first gene's average log fold expression change is averaged across four timepoints for 15, five for 30, and four for 60.
      • ABF1 averages: 15 = -1.1878; 30 = -1.1819; 60 = -1.9142
  2. Next, the difference between each individual log2 expression at a given time point and the average for that given time point was found.
    • i.e. for wt's ABF1 gene, we see the following formula: = t15.1 - avg15, where t15.1 is the individual log2 expression at the first 15 time point and avg15 is the average of the four observed expression changes at the 15 time point replicates
      • ABF1's first 15 time point: = B2 - P2 = t15.1 - avg15 = -2.1071 - (-1.1878) = -0.9193
  3. Then, the differences are squared so that no negative numbers result and so that deviations above and below the average are treated alike
    • i.e. for wt's ABF1 gene, we see the following formula in the cell: = B20^2
  4. The squared differences were then summed up for all the time points and divided by the total number of time points.
    • The formula used was as follows: =SUM(B38:N38)/13 (based off wt's ABF1 gene)
      • i.e. for wt, there are 13 time points (four 15 time points + five 30 time points + four 60 time points = 13)
      • Note that the sum for all these time points differs for each individual strain, such that db4 (dGLN3) has 12 overall time points
  5. To ensure that these calculations were correct, I first used this procedure to calculate the MSE observed via the model. After I received the same output values, I proceeded to calculate the actual minMSEs.
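The five steps above amount to the mean squared deviation of each replicate from its time-point average. Here is a minimal sketch of that calculation for one gene; the function name, the dictionary layout, and the example values are hypothetical, and only the single ABF1 subtraction is taken from the text.

```python
from statistics import mean


def min_mse(replicates_by_time):
    """Mean squared deviation of replicates from their time-point average.

    replicates_by_time maps a time point to its list of log2 fold changes.
    """
    squared = []
    for values in replicates_by_time.values():
        avg = mean(values)                               # step 1
        squared.extend((v - avg) ** 2 for v in values)   # steps 2 and 3
    return sum(squared) / len(squared)                   # step 4


# Made-up replicates for one gene (not the ABF1 measurements).
toy_gene = {15: [-2.0, -1.0], 30: [0.5, 1.5, 1.0], 60: [2.0]}
toy_min_mse = min_mse(toy_gene)
print(f"minMSE = {toy_min_mse:.4f}")

# The single worked step quoted above for ABF1's first 15 time point.
abf1_first_diff = -2.1071 - (-1.1878)
print(f"t15.1 - avg15 = {abf1_first_diff:.4f}")
```

Because the divisor is the number of measurements actually present, the same function handles strains with 13 or 12 total time points without changing the formula.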

Week of March 12, 2017

Monday: I worked on completing the analysis of my results. I used Brandon's regulatory relationship workbook to compare the regulatory relationships for DB5 and the three best (15, 16, 24) and worst (7, 12, 31) random networks.

  • Process for isolating regulatory relationships
    1. Using GRNsight, I visualized the weighted networks of interest and exported the network as a .siv file to isolate the regulatory relationships between regulator and target gene
    2. Next, I opened the SIV file in Excel. In a new Excel workbook, I wrote down the relationship between the transcription factor and its target as Regulator --> Target Gene in one cell with the weight of the transcription factor's influence in the column right of it.
    3. After I saved all these relationships for the seven networks (DB5, Rand7, Rand12, Rand15, Rand16, Rand24, and Rand31), I compiled all of their regulatory relationships together in a list.
    4. Next, I pasted the values that corresponded to a specific node/relationship for each network into the correct cell.
      • Reading R->L (DB5, Rand7, ..., Rand31)
    5. Because Brandon's Excel file already highlighted cells based on the weights within them, stronger activators were colored red, stronger repressors were colored blue, and grey was used for the weak influencers.
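The manual copy step from the exported file into "Regulator --> Target Gene" cells could also be scripted. This sketch assumes each exported line holds three whitespace-separated fields (regulator, weight, target); that layout is an assumption about the export, and the weights below are invented for illustration using the motif genes named earlier.

```python
def parse_weighted_sif(text):
    """Map 'Regulator --> Target' labels to weights from SIF-like text."""
    relationships = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        regulator, weight, target = fields
        relationships[f"{regulator} --> {target}"] = float(weight)
    return relationships


# Invented weights; the real values come from the GRNmap output.
sample = "HMO1\t0.8\tMSN2\nMSN2\t-0.4\tCIN5\nCIN5\t0.1\tYHP1\n"
table = parse_weighted_sif(sample)
for label, weight in table.items():
    print(label, weight)
```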

Thursday: Presented a first draft of my presentation for LMU's URS. That was the focus of lab this day.

  • Finished up the first draft of my presentation
  • For further analysis, I included:
    • The sum of weights to identify if the network was 'overall repressive (-) or activating (+)'
    • The shared nodes between DB5 and the 3 best and 3 worst networks & found that it shared more nodes with the better networks
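Both follow-up analyses reduce to simple operations on a table of weighted relationships. The sketch below assumes each network is stored as a dictionary from "Regulator --> Target" labels to weights, and it reads "shared nodes" as shared regulator-target pairs, which is an interpretation rather than something the notes confirm. All weights are invented.

```python
def overall_effect(weights):
    """Classify a network by the sign of its summed weights."""
    total = sum(weights.values())
    if total > 0:
        return "activating (+)"
    if total < 0:
        return "repressive (-)"
    return "balanced (0)"


def shared_relationships(reference, other):
    """Regulator-target pairs present in both networks."""
    return sorted(set(reference) & set(other))


# Invented example networks (not DB5 or any random network).
net_a = {"HMO1 --> MSN2": 0.5, "MSN2 --> CIN5": -0.25, "CIN5 --> YHP1": 0.25}
net_b = {"HMO1 --> MSN2": -1.0, "YHP1 --> CIN5": 0.5}

print(overall_effect(net_a), overall_effect(net_b))
print(shared_relationships(net_a, net_b))
```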

Week of March 19, 2017

Monday: Worked on completing my powerpoint presentation for the LMU's Undergraduate Research Symposium. I sat down with Dr. Dahlquist to discuss my presentation and re-work some of the analysis that I did.

Thursday: I practiced my presentation in Sea120 before I rehearsed my presentation in front of my lab. Later, I presented my powerpoint for the symposium to my fellow researchers. I received feedback (overall positive, with minor changes to make). I listened to Kristen's presentation, too, before the end of lab.

Week of March 26, 2017

Monday: Worked on a lot of my thesis, writing my discussion

Thursday: Continued to work and write my thesis before the holiday (Cesar Chavez). During the lab meeting, we discussed future directions/what we should work on for the remainder of our time in the lab.

Documents

Summer 2015

To view the most updated powerpoint click here
To see the input sheet that was run for the fixed b trial, please click this link
To view the output file from this fixed b trial, click here
To see the input sheet that was run from the estimated b, please click this
To view the output file from the estimated b, click here
The powerpoint that reviews and analyzes the outputs can be viewed here

GRNmap Testings

This is the template for future reports: GRNmap Testing Report
GRNmap Testing Report: Strain Run Comparisons 2015-05-27
GRNmap Testing Report: Non-1 Initial Weight Guesses 2015-05-28

Other Links

Back to User:Natalie Williams
To visit the Dahlquist Lab: click here
To see K. Grace J's Notebook: click here