Natalie Williams: Electronic Notebook: Difference between revisions

From OpenWetWare
(→‎April 2015: Edited heading as well as what we did that week)
 
(149 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{TOC right| TOC limit|4}}
==Natalie Williams: Electronic Notebook==


Line 7: Line 9:
*[[Dahlquist:GRNmap]]


=== Electronic Notebook ===
===[[Natalie Williams Fall Electronic Notebook 2014 |Fall 2014]]===
===Fall 2014===
This contains all the procedures and tasks that I completed and the trials that I ran in Fall 2014.
====September 2014====
 
=====September 18, 2014=====
===[[Natalie Williams Spring Electronic Notebook 2015 |Spring 2015]]===
Data Set Up<br>
This contains all the procedures and tasks that I completed and the trials that I ran in Spring 2015. Most of the activities/notes for this semester focused on creating a poster for the various conferences that we attended in the Spring.
OpenWetWare familiarization: I became familiar with OpenWetWare code and programming <br>
 
MATLAB procedure: the written MATLAB procedure contains instructions for running the model to obtain the output with the optimized network weights of the system. <br>
===[[Natalie Williams Summer Electronic Notebook 2015 |Summer 2015]]===
<br>
This link has all the information for what occurred over the summer. A lot of it was testing the code by changing the initial weights and the threshold b values of the input sheets.
Network<br>
Ten random networks were made from the original network.
*The original network Excel file was used, and each cell on the network sheet had the following formula in it:
** =IF(RAND()<0.1134,1,0)
*This procedure was done ten times to get these ten random networks
*Each network was saved as rand# (1 - 10)
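The Excel formula above draws each cell independently, setting it to 1 with probability 0.1134. A minimal Python sketch of the same idea follows. The matrix size of 21 is an assumption taken from the B2:V22 ranges used in later entries, and the function name and seeds are hypothetical, added only so the example is repeatable.

```python
import random

def random_network(n_genes, edge_probability, seed=None):
    """Return an n_genes x n_genes 0/1 adjacency matrix as a list of rows.

    Each cell is 1 with probability edge_probability, mirroring
    =IF(RAND()<0.1134,1,0) applied to every cell of the network sheet.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < edge_probability else 0 for _ in range(n_genes)]
        for _ in range(n_genes)
    ]

# Ten networks named rand1 ... rand10 (seeds are illustrative only).
networks = {f"rand{i}": random_network(21, 0.1134, seed=i) for i in range(1, 11)}
```

Unlike the spreadsheet, which recalculates RAND() on every edit, a fixed seed reproduces the same network each time.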


=====September 25, 2014=====
===[[Natalie Williams Fall Electronic Notebook 2015 |Fall  2015]]===
The random network Excel files were put into MATLAB to be run to get the optimized weights of the network
*The file is saved as the final name with _output.xls
After opening the file, I found the weights of these networks on the optimized_network_weights sheet.<br>
<br>
Visualization of the Networks<br>
These files had to be re-saved as .xlsx in order to upload them to GRNsight. GRNsight visualizes the networks, and because the weights indicate how strongly one gene controls another, the edges in the resulting output have different colors. After each individual random network was visualized, it was compared to the original network. For better analysis, the proteins were arranged in the same order so that the differences in connectivity could be seen.


====October 2014====
===[[Natalie Williams Fall Electronic Notebook 2016 | Fall 2016]]===
=====October 2, 2014=====
I gathered the following information by comparing the original network to the 10 random ones:
Nodes: the positive and the negative <br>
Frequencies: the In and Out degrees. These were how often one gene controlled other genes. It was found through the following equations:
*=COUNTIF(B2:B22,"<>"&0)
*=COUNTIF(B3:V3,"<>"&0)
*From this, the frequencies were found by looking at how often 0 appeared or 1 appeared, etc.
**For example, =COUNTIF(B23:V23,"=0") for In Degree to see how often 0 occurred
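The COUNTIF formulas above count the non-zero weights in one column or one row, and then count how often each degree occurs. A rough Python equivalent is sketched below. The small matrix is made-up illustration data, not output from any run, and which count is the in degree and which is the out degree depends on the sheet's row and column convention, so the names here stay neutral.

```python
from collections import Counter

def nonzero_counts(matrix):
    """Count non-zero entries per column and per row of a weight matrix."""
    n_cols = len(matrix[0])
    col_counts = [sum(1 for row in matrix if row[j] != 0) for j in range(n_cols)]
    row_counts = [sum(1 for value in row if value != 0) for row in matrix]
    return col_counts, row_counts

# Hypothetical 3-gene weight matrix, for illustration only.
weights = [
    [0.0, 0.8, 0.0],
    [-0.5, 0.0, 0.0],
    [0.3, 0.0, 0.0],
]

col_counts, row_counts = nonzero_counts(weights)

# How often each degree value (0, 1, 2, ...) occurs, like =COUNTIF(range,"=0").
col_frequency = Counter(col_counts)
row_frequency = Counter(row_counts)
```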
Next, bar graphs were used to compare the weighted networks between a random network and the original network.<br>
After that, the minimum and maximum values from each random network were found.
*The minimum was found using: =MIN(B2:V22)
*The maximum was found using: =MAX(B2:V22)
The sum was found of the entire worksheet of the optimized_network_weights.
*=SUM(B2:V22)
The average of the worksheet was also taken for the entire matrix.
*=AVERAGE(B2:V22)
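The four worksheet formulas above (MIN, MAX, SUM, AVERAGE over B2:V22) can be sketched in Python as one helper over the whole weight matrix. The function name and the tiny example matrix are hypothetical and serve only to show the calculation.

```python
def summarize(matrix):
    """Return the minimum, maximum, sum, and average of all matrix entries."""
    values = [value for row in matrix for value in row]
    total = sum(values)
    return {
        "min": min(values),
        "max": max(values),
        "sum": total,
        "average": total / len(values),
    }

# Hypothetical 2 x 2 weight matrix, for illustration only.
example = [
    [0.0, 1.5],
    [-0.5, 0.0],
]
summary = summarize(example)
```

As in Excel's AVERAGE over a fully numeric range, zero weights count toward the average.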
<br>
We used this information to see if there were any key factors to what made the original network the one that we accepted. We hoped that it would shed some light on what key differences were between the random networks and the original one.


=====October 9, 2014=====
==Spring 2017==
I was out of town, so there was nothing I needed to do specifically for this week.


=====October 16, 2014=====
=== January 2017 ===
I began the process of the forward simulations of the networks. I had to isolate the deletion strains and see if there was any resemblance between the wild type strain and the four deletion strains.
====Week of January 12, 2017====
Monday & Thursday: Worked on collecting sources for my thesis project. The annotated bibliography is due 20/01. I will be in Boston at that time, but I will still submit my annotated bibliography in time. We had our first lab meeting of the semester on Thursday.


=====October 23, 2014=====
====Week of January 19, 2017 ====
All the bugs in the system were noted and written down to be fixed.<br>
Monday: Worked on writing the abstract for the SCSBC at UC Irvine on Saturday, 28/01. The abstract can be found on the Dahlquist Lab repository on github.
The forward simulations were rerun. The production and degradation rates from the output were inserted into each of the individual strains. For the network weights of the individual strains, the output from the general workbook sheet, optimized_network_weights, was used.<br>
The deletion strains needed hard 0's across their row on their worksheet. On the optimization_parameters sheet, the following things needed to be altered:
*iestimate = 0.00+E0
*fixed_b = 0
*strains:
**wt/3/0
**dcin5/4/3, where the first number is the sheet, and the second number is the row of the gene within the sheet
**This controlled which strains would be shown after the workbook was run through MATLAB
For the Network_b sheet, the optimized_network_b values from the general workbook output were used for each individual strain.


=====October 30, 2014=====
Thursday: Not present. Interview at Harvard Medical School.
The Real WT individual strain was compared to the forward simulation WT and deletion strains.<br>
I made a list of transcription factors of the individual strains that did not compare well with the real WT individual. Those transcription factors were going to be looked at more closely and might have been taken away. The parts that I compared were the data points and the fit of the line.  


====November 2014====
====Week of January 26, 2017 ====
=====November 6, 2014=====
Monday: Finished most of the poster that will be presented this upcoming Saturday at the conference. I wrote much of the content and analysis and Brandon worked on the formatting. Much of the analysis focused on the optimized production and threshold b values and on a motif: Hmo1 --> Msn2 --> Cin5 --> Yhp1.
16 transcription factors were taken and run through YEASTRACT. However, the results have to be formatted so that GRNsight can visualize the resulting network.
  The network I used was created with the following transcription factors:
  ARG80
  CIN5
  GLN3
  HAP4
  HMO1
  NRG2
  RSF2
  RTG3
  STB4
  SWI4
  TBF1
  TOS8
  TYE7
  YHP1
  YOX1
  ZAP1
#Navigate to Generate Regulation Matrix [http://www.yeastract.com/formgenerateregulationmatrix.php] on the YEASTRACT website.
#Select the appropriate check boxes for the filters.
#Paste the list of transcription factors into the appropriate field.
#Paste a list of targets into the Target ORF/Genes field, or check the box to consider all ORF/Genes.
#Click the Generate button.
#In the results window that appears, click the link to download the Regulation matrix results file as a Semicolon Separated Values (CSV) file.
#Once you have downloaded the file, launch Microsoft Excel.
#Select the menu item, File > Open and select the file that you downloaded.
#Select Column A.
#Select the menu item, Data > Text to Columns...
#In the first window of the wizard that appears, select the radio button for "Delimited" and click Next.
#In the second window of the wizard that appears, check the box for "Other" under "Delimiters" and type a semicolon in the field to the right and click Finish.
#Select the menu item, File > Save As.  Save the file as an Excel Workbook (.xlsx).
#The orientation of the matrix has to be flipped. A new worksheet must be created by clicking on the new worksheet icon at the bottom of the screen. Name this new worksheet "network".
#Select the adjacency matrix from the first worksheet and copy it to the clipboard. Go to the "network" worksheet and click on cell A1. Select the menu item Edit > Paste Special. In the window that appears, check the box "Transpose" and click OK.
#The labels for the genes in the columns and rows need to match. The "p" of the gene names in the columns must be deleted.
#Paste the following text into cell A1 "rows genes affected/cols genes controlling".
#Save your work, which is now ready for loading into GRNsight. The original sheet can be deleted if you want.
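The Excel steps above (split on semicolons, transpose, delete the trailing "p", label cell A1) can also be sketched as a short script. This is only an illustration under assumptions: it assumes the downloaded file has transcription factors as row labels ending in "p" and target genes as column labels, and the sample text is hypothetical, not real YEASTRACT output.

```python
import csv
import io

def yeastract_to_grnsight(text):
    """Convert semicolon-separated regulation matrix text to a transposed table."""
    # Split every line on semicolons (the Text to Columns step).
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]
    # Flip the orientation of the matrix (the Paste Special > Transpose step).
    transposed = [list(column) for column in zip(*rows)]
    header = transposed[0]
    # Label cell A1 as in the protocol.
    header[0] = "rows genes affected/cols genes controlling"
    # Delete the trailing "p" so column labels match the row labels.
    for j in range(1, len(header)):
        if header[j].endswith("p"):
            header[j] = header[j][:-1]
    return transposed

# Hypothetical two-gene sample, for illustration only.
sample = ";CIN5;HAP4\nCIN5p;0;1\nHAP4p;1;0\n"
network_sheet = yeastract_to_grnsight(sample)
```

The result is a list of rows that would still need to be written to a "network" worksheet in an .xlsx file before loading into GRNsight.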


Results<br>
Thursday: Went over poster during lab meeting. With Dahlquist's correction, I updated the poster and uploaded it to the github repository to be edited and reviewed by Dahlquist before printing.
GRNsight v1.8 had to be used to visualize the networks. Only four of the input selection choices gave network connections among the listed transcription factors.<br>
Documented DNA Binding Evidence
*15 genes
*58 edges
Documented DNA Expression
*15 genes
*31 edges
DNA Expression plus Binding
*15 genes
*58 edges
DNA Expression and Binding
*15 genes
*4 edges
Potential with Motifs
*0 genes
*0 edges
Potential without Motifs
*0 genes
*0 edges
Documented plus Potential
*0 genes
*0 edges
Documented and Potential
*0 genes
*0 edges


=====November 13, 2014=====
=== February 2017 ===
I reran MATLAB to see if I got the same results as Dr. Fitzpatrick, and I did. When each deletion strain was compared to the WT strain, the targeted genes that were supposed to be affected were indeed affected.
====January 31, 2017 & February 2, 2017====
Monday: Reran the networks derived from dgln3, dhap4, and dzap1 on boulardii 2 for consistency so that there aren't any discrepancies from running these networks on a different computer.


===Spring 2015===
Thursday:
====January 2015====
*Compiled the optimized parameters into one file as well as the MSE values for individual genes in each of the networks. Each of the networks was visualized again on GRNsight just to ensure that the visualizations match the output optimized weights for each network.
I met the other people that are working on this project - Juan, Trixie, and Grace. For this month, we discussed where the project was heading and what parts of the code need to be changed. During these meetings, Profs. Dahlquist and Fitzpatrick gave overviews of the research project and all the computational functions that the model requires.
*Received feedback from Dr. Dahlquist on my annotated bibliography as well as additional sources to use for my thesis.


====February 2015====
====Week of February 6, 2017====
=====February 6, 2015=====
Monday: Edited the 10 random output sheets that K. Grace Johnson ran last year to make them into input sheets to re-run on boulardii 2.
I reran the protocol for microarray data that I received from Dr. Dahlquist. The protocol can be found [[Dahlquist:Microarray Data Processing in R|here]].
*I deleted all the output sheets: the sigmas, optimized_network_weights, optimized_expression, and the optimized production and threshold_b
*First, I created files on my desktop to host the Ontario and GCAT data
*I copied the production and degradation rates from Brandon's dhap4 network into all the corresponding sheets in the random network input sheets
**Each folder contained the following:
Worked on creating the working abstract for my talk during LMU's Undergraduate Research Symposium.
**#The script for either Ontario or GCAT
*The adjacency matrices from the random network files were then copied and pasted into the adjacency matrix of Brandon's file so that all parameters and information would be the same. The only difference was the network and the network weight sheets.
**#The target files for those scripts --> Ontario_Targets and GCAT_Targets
**#The .gpr files from the microarrays were also located in the individual files
*I downloaded and unzipped the files that were listed under the protocol
Note that Ontario was saved under Ontario and GCAT with GCAT
*The R used was the 32-bit
The directory had to be changed to the folder where the files that would be corrected were extracted to
*The Ontario script was run first and then followed by GCAT
*There will be two different outputs from running GCAT. We want the Final_Normalized_Data


My file was then sent to Grace J., who then began to compare the results that we got.
Thursday: I was not here due to an interview at UCSF's medical school.


=====February 12/13, 2015=====
====Week of February 13, 2017====
I spent this day searching literature for data sets of transcription rates with Grace J. We wrote an abstract to submit to the Undergraduate Research Symposium to present what was done last semester. The abstract was submitted the following Friday.
Monday: I generated some random networks with Brandon's R script to be run on the model. A folder was created to hold all the input and output sheets for the random networks that are run with GRNmap [https://github.com/kdahlquist/DahlquistLab/tree/master/data/bouldardii2_GRNmap_outputs/Random_network_intput_output]. For further analysis, I will also look at the distribution of the in and out degrees of all the random networks compared to the network derived from the dhap4 data.
*Distribution of weights (positive vs. negative) and the overall network
*Are any motifs/connections conserved?
*Any self or auto-regulators?
*Visualization will also be seen via GRNsight


=====February 19/20, 2015=====
Throughout the next couple of weeks, I will be running the random networks generated on GRNmap.
Spring Break
====Week of February 20, 2017====
Monday: Began to look at the MSE values of the db networks 1 & 5 (derived from wt and dhap4 data) compared to the p values from the ANOVA. For the analysis, I looked at the expression data plots categorized by number of significant p values (B&H p values) at the suggestion of Brandon. Divisions were made as follows:
*Two or more significant p values across all strains
*One significant p value across all strains
*No significant p value across all strains
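The three divisions above amount to counting, for each gene, how many strains have a significant p value. A minimal sketch follows. The 0.05 cutoff, the function name, and the sample p values are all assumptions made for illustration; the notebook does not state the threshold that was used.

```python
def categorize(p_values_by_strain, alpha=0.05):
    """Place a gene into one of the three divisions by its count of significant p values.

    alpha is an assumed significance cutoff, not a value taken from the notebook.
    """
    n_significant = sum(1 for p in p_values_by_strain.values() if p < alpha)
    if n_significant >= 2:
        return "two or more significant p values"
    if n_significant == 1:
        return "one significant p value"
    return "no significant p value"

# Hypothetical B&H-corrected p values for three placeholder genes.
example = {
    "GENE_A": {"wt": 0.01, "dhap4": 0.03, "dgln3": 0.40},
    "GENE_B": {"wt": 0.20, "dhap4": 0.04, "dgln3": 0.60},
    "GENE_C": {"wt": 0.30, "dhap4": 0.50, "dgln3": 0.70},
}
divisions = {gene: categorize(p_values) for gene, p_values in example.items()}
```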
I then described whether the MSE value that was matched with the p value fit well with the modeled dynamics from the expression plots. I created an excel file with these comparisons and comments about the fit of the model, which can be found in the Dahlquist Lab Repository [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx pvalue_MSE_comparison].


=====February 26/27, 2015=====
Thursday: Continued to do analysis of p values and MSE outputs by looking at the expression plots. However, during the lab meeting, I was told that this was futile and would not generate results because I should be looking at the minMSE for each gene's output. By comparing the MSE:minMSE ratio for each gene, I could see if genes with p values had better or worse fit due to the ratio.
====Week of February 27, 2017====
Monday: Continued to run the random networks on boulardii 2 (left off at random network 23). Instead of using the expression plots for analysis, I began to compare the MSE values of db network 5 (derived from hap4) and the random networks with the same number of nodes (15) and edges (28). The last random network included in the file is rand19. On every sheet, I have the MSE value output from the run in GRNmap next to the p values from the ANOVA for the dhap4 strain. Below those comparisons, we see the differences in MSE values of the random network from the db derived network 5.
*If the number is negative, it suggests that GRNmap achieved a lower mean square error for that individual gene in the random network;
*however, if the number is positive, then the db derived network's individual gene saw better modeling (a lower mean square error).
This file can be found in the pvalue_MSE_comparison excel file [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx].
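The comparison described in this entry is a per-gene subtraction: the random network's MSE minus the MSE of db derived network 5. A short sketch, using hypothetical gene names and MSE values rather than numbers from the workbook:

```python
def mse_differences(random_mse, db_mse):
    """Per-gene difference: random network MSE minus db derived network MSE."""
    return {gene: random_mse[gene] - db_mse[gene] for gene in db_mse}

# Hypothetical per-gene MSE values, for illustration only.
db5_mse = {"GENE_A": 0.50, "GENE_B": 0.25}
rand_mse = {"GENE_A": 0.25, "GENE_B": 0.75}

differences = mse_differences(rand_mse, db5_mse)

# Negative: the random network fit that gene better; positive: db5 fit it better.
better_in_random = [gene for gene, diff in differences.items() if diff < 0]
```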


We worked on the poster for our presentation at the 7th Annual Undergraduate Research Symposium. We compiled the data that we were going to use and present on our poster. For the random networks, we gathered the LSEs to compare to the literature-derived network. We chose Random Networks 1 and 4 because they had the lowest and the highest LSE output.
Thursday: Continued to run the random networks on boulardii 2 (random network 27 currently running). There are only three remaining random networks (28-30) that need to be run on GRNmap. I carried on with my compilation of the random network MSE value comparisons to db network 5. The last LSE:minLSE comparison made was between db 5 and random network 26. Again, the file can be found [https://github.com/kdahlquist/DahlquistLab/blob/master/data/15-gene_networks_analysis/pvalue_MSE_comparison.xlsx here] on the Dahlquist Lab Repository under the file name pvalue_MSE_comparison.xlsx.  


====March 2015====
Further, on the sheet labeled dhap4, a bar graph comparing the LSE:minLSE ratios for all the GRNs run on MATLAB thus far can be found. I've begun to look at the regulatory relationships identified in the three lowest and three highest ratio random networks.
=====March 5/6, 2015=====
*The smallest LSE:minLSE ratios
We added more information to our poster.
**random networks 15, 16, and 24
*The largest LSE:minLSE ratios
**random networks 5, 7, and 12


=====March 12/13, 2015=====
====Week of March 4, 2017====
We continued to edit our poster for the 7th Annual Research Symposium. We edited the results section and changed the layout of the poster. We also edited the abstract that we submitted for the symposium to enter ourselves in the WCBSURC (West Coast Biological Sciences Undergraduate Research Conference). We found out the next day that we were accepted to present on April 25, 2015.
Spring Break this week. I was in Mammoth for the week.
Sunday: Kristen noticed that random networks 4 & 5 were identical, so I created a new random network (rand 31) to be run in GRNmap. After it was run in the model, the optimization diagnostics showed that random network 31 had a larger LSE:minLSE ratio than random network 5. Therefore, analysis will now be conducted on the following with the highest LSE:minLSE ratios:
*Random network 7, 12, and 31
**Rand7: 1.5202
**Rand12: 1.5080
**Rand31: 1.5001
Thursday: I computed the minMSE values for the DB5 network so that I could use the information for my Symposium presentation. The following protocol was done.
#Using the log2 expression data for the specific strain in an input sheet, average the values for the same timepoints
#* i.e. for wt strain, there are four 15 time point measurements, five 30 time point measurements, and four 60 time point measurements. Therefore, the first gene's average log fold expression change is averaged across four timepoints for 15, five for 30, and four for 60.
#** ABF1 averages: 15 = -1.1878; 30 = -1.1819; 60 = -1.9142
#Next, the difference between each individual log2 expression at a given time point and the average for that given time point was found.
#* i.e for wt's ABF1 gene, we see the following formula: = t15.1 - avg15, where t15.1 is the individual log2 expression at the first 15 time point and avg15 is the average of the four observed expression changes at 15 time point replicates
#** ABF1's first 15 time point: = B2 - P2 = t15.1 - avg15 = -2.1071 - (-1.1878) = -0.9193
#Then, the differences are squared so that no negative numbers result and so that deviations above and below the average are treated alike
#* i.e. for wt's ABF1 gene, we see the following formula in the cell: = B20^2
#The squared differences were then summed up for all the time points and divided by the total number of time points.
#* The formula used was as follows: =SUM(B38:N38)/13 (based off wt's ABF1 gene)
#** i.e. for wt, there are 13 time points (four 15 time points + five 30 time points + four 60 time points = 13)
#** Note that the sum for all these time points differs for each individual strain, such that db4 (dGLN3) has 12 overall time points
#To ensure that these calculations were correct, I first used this procedure to calculate the MSE observed via the model. After I received the same output values, I proceeded to calculate the actual minMSEs.
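The protocol above can be summarized as: average the replicates at each time point, subtract that average from each replicate, square the differences, sum them, and divide by the total number of measurements. A minimal Python sketch of that calculation follows. The function name and the example measurements are hypothetical; only the single ABF1 subtraction at the end uses numbers quoted in the protocol.

```python
from collections import defaultdict

def min_mse(time_points, expression):
    """Mean squared deviation of each replicate from its time-point average."""
    # Group the replicate measurements by time point.
    groups = defaultdict(list)
    for t, x in zip(time_points, expression):
        groups[t].append(x)
    # Average the replicates at each time point.
    averages = {t: sum(values) / len(values) for t, values in groups.items()}
    # Square each replicate's difference from its time-point average.
    squared = [(x - averages[t]) ** 2 for t, x in zip(time_points, expression)]
    # Sum and divide by the total number of measurements (13 for wt).
    return sum(squared) / len(squared)

# Hypothetical data: two replicates at t = 15 and two at t = 30.
example_value = min_mse([15, 15, 30, 30], [1.0, 3.0, 2.0, 2.0])

# The one worked subtraction quoted above: t15.1 - avg15 for ABF1.
abf1_difference = -2.1071 - (-1.1878)
```

Because the number of measurements is taken from the data, the same function covers strains with a different total, such as the 12 time points noted for db4.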


For Friday, we continued to look for production and degradation rates in the literature.
====Week of March 12, 2017====
Monday: I worked on completing the analysis of my results. I used Brandon's regulatory relationship workbook to compare the regulatory relationships for DB5 and the three best (15, 16, 24) and worst (7, 12, 31) random networks.
*Process for isolating regulatory relationships
*#Using GRNsight, I visualized the weighted networks of interest and exported each network as a .sif file to isolate the regulatory relationships between regulator and target gene
*#Next, I opened the SIF file in Excel. In a new Excel workbook, I wrote down each relationship between a transcription factor and its target as Regulator --> Target Gene in one cell, with the weight of the transcription factor's influence in the column to the right of it.
*#After I saved all these relationships for the seven networks (DB5, Rand7, Rand12, Rand15, Rand16, Rand24, and Rand31), I compiled all of their regulatory relationships together in a list.
*#Next, I pasted the values that corresponded to a specific node/relationship for each network into the correct cell.
*#*Reading R->L (DB5, Rand7, ..., Rand31)
*#Because Brandon's Excel file already highlighted cells based on the weights within them, stronger activators were colored red, stronger repressors were colored blue, and grey was used for the weak influencers.
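The manual steps above could be sketched in code along these lines. Several things here are assumptions rather than facts from the notebook: that the exported file is a plain-text SIF file with three whitespace-separated columns (regulator, weight, target), that a cutoff of 0.5 separates strong from weak influence, and the placeholder gene names in the sample.

```python
def parse_weighted_sif(text):
    """Turn 'regulator weight target' lines into ('Regulator --> Target', weight) pairs.

    The three-column layout is an assumption about the exported file.
    """
    relationships = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        regulator, weight, target = parts
        relationships.append((f"{regulator} --> {target}", float(weight)))
    return relationships

def color_for(weight, strong=0.5):
    """Red for strong activation, blue for strong repression, grey otherwise.

    The cutoff is hypothetical; the notebook does not give the rule used.
    """
    if weight >= strong:
        return "red"
    if weight <= -strong:
        return "blue"
    return "grey"

# Hypothetical three-edge network, for illustration only.
sample = "GENE_A\t0.9\tGENE_B\nGENE_B\t-0.7\tGENE_C\nGENE_C\t0.1\tGENE_A\n"
table = [(label, w, color_for(w)) for label, w in parse_weighted_sif(sample)]
```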
Thursday: Presented a first draft of my presentation for LMU's URS. That was the focus of lab this day.
*Finished up the first draft of my presentation
*For further analysis, I included:
**The sum of weights to identify if the network was 'overall repressive (-) or activating (+)'
**The shared nodes between DB5 and the 3 best and 3 worst networks & found that it shared more nodes with the better networks


=====March 19/20, 2015=====
====Week of March 19, 2017====
For this week, we finalized our poster for presenting at LMU's symposium. The Results section as well as some of the background information were edited. Images, graphs, and the layout of the poster were formatted to be clearer and easier to follow. The graph titles as well as the scales were altered so that each graph had the same axes. Furthermore, some of the section headings were edited to summarize the main finding for each result. We printed out our posters to put them up on Friday morning.
Monday: Worked on completing my powerpoint presentation for the LMU's Undergraduate Research Symposium. I sat down with Dr. Dahlquist to discuss my presentation and re-work some of the analysis that I did.  


On Friday, we continued to look for the various degradation and production rates of mRNA.
Thursday: I practiced my presentation in Sea120 before I rehearsed my presentation in front of my lab. Later, I presented my powerpoint for the symposium to my fellow researchers. I received feedback (overall positive, with minor changes to make). I listened to Kristen's presentation, too, before the end of lab.


=====March 26/27, 2015=====
====Week of March 26, 2017====
I was not in the lab this week. From Thursday to Sunday, I attended NSBE's (National Society of Black Engineers) National Convention in Anaheim.
Monday: Worked on a lot of my thesis, writing my discussion


====April 2015====
Thursday: Continued to work and write my thesis before the holiday (Cesar Chavez). During the lab meeting, we discussed future directions/what we should work on for the remainder of our time in the lab.


In April, not much was done on mine and Grace J.'s part. For this month, we both completed the homework assignments for my upper division biomathematics course [[BIOL398-04/S15]]. For the following weeks in April, we worked on Assignments from Week 11 - Week 14. We no longer met on Fridays for this month due to a lack of having assignments.
==Documents==
===Summer 2015===
To view the most updated powerpoint click [[Media:Williams wtANOVA Ttest 2.pptx| here]]
<br>
To see the input sheet that was run for the fixed b trial, please click this [[Media:Williams_Input_Scer_Spar_point01_PROF45.xlsx |link]]
<br>
To view the output file from this fixed b trial, click [[Media:Williams Input Scer Spar point01 PROF45 fixedb output.xlsx| here]]
<br>
To see the input sheet that was run from the estimated b, please click [[Media:Williams Input Scer Spar point01 PROF45 estimatedb.xlsx|this]]
<br>
To view the output file from the estimated b, click [[Media:Williams Input Scer Spar point01 PROF45 estimatedb output.xlsx|here]]
<br>
The powerpoint that reviews and analyzes the outputs can be viewed [[Media:Williams Running GRNmap Results.pptx|here]]


=====April 9, 2015=====
==GRNmap Testings==
Last week was Easter break. This week we met up to discuss our progress with the mRNA production and degradation rates. Grace J. and I combined our findings from the literature into one document and sent it to our mentors and professors - Drs. Dahlquist and Fitzpatrick.
This is the template for future reports: [[GRNmap Testing Report]]
<br>
[[GRNmap Testing Report: Strain Run Comparisons 2015-05-27]]
<br>
[[GRNmap Testing Report: Non-1 Initial Weight Guesses 2015-05-28]]


=====April 16, 2015=====
==Other Links==
We discussed the results that Grace J. got from completing the Assignment due that week. My feedback is seen on my electronic notebook from that course.
Back to [[User:Natalie Williams]]
<br>
To visit the Dahlquist Lab: [[Dahlquist| click here]]
<br>
To see K. Grace J's Notebook: [[Katherine Grace Johnson Electronic Lab Notebook| click here]]


=====April 23, 2015=====
We talked about WCBSURC, the conference that we would be attending on Saturday. Again, we discussed Grace's results from completing that week's assignments.


=====April 30, 2015=====
We discussed the plan for summer research.
*Journal Clubs
**12-2; Lunch time meeting
*Weekly meetings where we discuss what occurred during the week
*First week training process of going through all the data
*Focusing on sigmoidal model to get results for publications


<br>
[[Category:Dahlquist Lab]]
Back to [[User:Natalie Williams]]
[[Category:GRNmap]]

Latest revision as of 14:00, 11 April 2017

Natalie Williams: Electronic Notebook

Protocol for MATLAB

This page will help you input and run data sets from your document into an output.

Fall 2014

This contains all the procedures and tasks that I completed and the trials that I ran in Fall 2014.

Spring 2015

This contains all the procedures and tasks that I completed and the trials that I ran in Spring 2015. Most of the activities/notes for this semester focused on creating a poster for the various conferences that we attended in the Spring.

Summer 2015

This link has all the information for what occurred over the summer. A lot of it was testing the code by changing the initial weights and the threshold b values of the input sheets.

Fall 2015

Fall 2016

Spring 2017

January 2017

Week of January 12, 2017

Monday & Thursday: Worked on collecting sources for my thesis project. The annotated bibliography is due 20/01. I will be in Boston at that time, but I will still submit my annotated bibliography in time. We had our first lab meeting of the semester on Thursday.

Week of January 19, 2017

Monday: Worked on writing the abstract for the SCSBC at UC Irvine on Saturday, 28/01. The abstract can be found on the Dahlquist Lab repository on github.

Thursday: Not present. Interview at Harvard Medical School.

Week of January 26, 2017

Monday: Finished most of the poster that will be presented this upcoming Saturday at the conference. I wrote much of the content and analysis and Brandon worked on the formatting. Much of the analysis focused on the optimized production and threshold b values and on a motif: Hmo1 --> Msn2 --> Cin5 --> Yhp1.

Thursday: Went over poster during lab meeting. With Dahlquist's correction, I updated the poster and uploaded it to the github repository to be edited and reviewed by Dahlquist before printing.

February 2017

January 31, 2017 & February 2, 2017

Monday: Reran the networks derived from dgln3, dhap4, and dzap1 on boulardii 2 for consistency so that there aren't any discrepancies from running these networks on a different computer.

Thursday:

  • Compiled the optimized parameters into one file as well as the MSE values for individual genes in each of the networks. Each of the networks were visualized again on GRNsight just to ensure that the visualizations match with the output optimized weights for each network.
  • Received feedback from Dr. Dahlquist on my annotated bibliography as well as additional sources to use for my thesis.

Week of February 6, 2017

Monday: Edited the 10 random output sheets that K. Grace Johnson ran last year to make them into input sheets to re-run on boulardii 2.

  • I deleted all the output sheets: the sigmas, optimized_network_weights, optimized_expression, and the optimized production and threshold_b
  • I copied the production and degradation rates from Brandon's dhap4 network into all the corresponding sheets in the random network input sheets

Worked on creating the working abstract for my talk during LMU's Undergraduate Research Symposium.

  • The adjacency matrices from the random network files were then copied and pasted into the adjacency matrix of Brandon's file so that all parameters and information would be the same. The only difference was the network and the network weight sheets.

Thursday: I was not here due to an interview at UCSF's medical school.

Week of February 13, 2017

Monday: I generated some random networks with Brandon's R script to be run on the model. A folder was created to hold all the input and output sheets for the random networks that are run with GRNmap [1]. For further analysis, I will also look at the distribution of the in and out degrees of all the random networks compared to the network derived from the dhap4 data.

  • Distribution of weights (positive vs. negative) and the overall network
  • Are any motifs/connections conserved?
  • Any self or auto-regulators?
  • Visualization will also be seen via GRNsight

Throughout the next couple of weeks, I will be running the random networks generated on GRNmap.

Week of February 20, 2017

Monday: Began to look at the MSE values of the db networks 1 & 5 (derived from wt and dhap4 data) compared to the p values from the ANOVA. For the analysis, I looked at the expression data plots categorized by number of significant p values (B&H p values) at the suggestion of Brandon. Divisions were made as follows:

  • Two or more significant p values across all strains
  • One significant p value across all strains
  • No significant p value across all strains

I then described whether the MSE value that was matched with the p value fit well with the modeled dynamics from the expression plots. I created an excel file with these comparisons and comments about the fit of the model, which can be found in the Dahlquist Lab Repository pvalue_MSE_comparison.

Thursday: Continued to do analysis of p values and MSE outputs by looking at the expression plots. However, during the lab meeting, I was told that this was futile and would not generate results because I should be looking at the minMSE for each gene's output. By comparing the MSE:minMSE ratio for each gene, I could see if genes with p values had better or worse fit due to the ratio.

Week of February 27, 2017

Monday: Continued to run the random networks on boulardii 2 (left off at random network 23). Instead of using the expression plots for analysis, I began to compare the MSE values of db network 5 (derived from hap4) and the random networks with the same number of nodes (15) and edges (28). The last random network included in the file is rand19. On every sheet, I have the MSE value output from the run in GRNmap next to the p values from the ANOVA for the dhap4 strain. Below those comparisons, we see the differences in MSE values of the random network from the db derived network 5.

  • If the difference is negative, GRNmap achieved a lower mean square error for that individual gene in the random network;
  • however, if the difference is positive, the db-derived network modeled that individual gene better (lower mean square error).

These comparisons can be found in the pvalue_MSE_comparison Excel file [2].
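
As a sketch of the sign convention used in these comparisons, assuming each difference is the random network's MSE minus the db-derived network's MSE for the same gene. The function names and numbers are illustrative only.

```python
def mse_differences(random_mse, db_mse):
    """Per-gene difference: random-network MSE minus db-derived-network MSE."""
    return {
        gene: random_mse[gene] - db_mse[gene]
        for gene in db_mse
        if gene in random_mse
    }


def better_network(difference):
    """Interpret the sign of one difference."""
    if difference < 0:
        return "random"  # random network had the lower MSE for this gene
    if difference > 0:
        return "db"      # db-derived network had the lower MSE for this gene
    return "tie"


# Hypothetical MSE values (not lab results).
db = {"GENE_A": 0.25, "GENE_B": 0.5}
rand = {"GENE_A": 0.5, "GENE_B": 0.25}
for gene, diff in mse_differences(rand, db).items():
    print(gene, diff, better_network(diff))
```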

Thursday: Continued to run the random networks on boulardii 2 (random network 27 currently running). Only three random networks (28-30) remain to be run in GRNmap. I carried on with my compilation of the random network MSE value comparisons to db network 5. The last LSE:minLSE comparison made was between db 5 and random network 26. Again, the file can be found in the Dahlquist Lab Repository under the file name pvalue_MSE_comparison.xlsx.

Further, the sheet labeled dhap4 contains a bar graph comparing the LSE:minLSE ratios for all the GRNs run in MATLAB thus far. I've begun to look at the regulatory relationships identified in the three random networks with the lowest ratios and the three with the highest ratios.

  • The smallest LSE:minLSE ratios
    • random networks 15, 16, and 24
  • The largest LSE:minLSE ratios
    • random networks 5, 7, and 12

Week of March 4, 2017

Spring Break this week. I was in Mammoth for the week.

Sunday: Kristen noticed that random networks 4 & 5 were identical, so I created a new random network (rand 31) to be run in GRNmap. After it was run in the model, the optimization diagnostics showed that random network 31 had a larger LSE:minLSE ratio than random network 5. Therefore, analysis will now be conducted on the following with the highest LSE:minLSE ratios:

  • Random networks 7, 12, and 31
    • Rand7: 1.5202
    • Rand12: 1.5080
    • Rand31: 1.5001
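
Picking the networks with the smallest and largest ratios can be sketched as below. Only the three ratios reported above are used; the function name is hypothetical, and with the full set of random networks k would be 3.

```python
def extremes(ratios, k=3):
    """Return (k lowest, k highest) network names, each list in ascending ratio order."""
    ranked = sorted(ratios, key=ratios.get)
    return ranked[:k], ranked[-k:]


# LSE:minLSE ratios reported in this notebook entry.
ratios = {"Rand7": 1.5202, "Rand12": 1.5080, "Rand31": 1.5001}

lowest, highest = extremes(ratios, k=1)
print(lowest, highest)  # ['Rand31'] ['Rand7']
```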

Thursday: I computed the minMSE values for the DB5 network so that I could use the information for my Symposium presentation. I followed this protocol:

  1. Using the log2 expression data for the specific strain in an input sheet, average the replicate values at each time point.
    • i.e. for the wt strain, there are four replicate measurements at the 15 time point, five at the 30 time point, and four at the 60 time point. Therefore, the first gene's log2 fold changes are averaged over four values for 15, five for 30, and four for 60.
      • ABF1 averages: 15 = -1.1878; 30 = -1.1819; 60 = -1.9142
  2. Next, the difference between each individual log2 expression at a given time point and the average for that given time point was found.
    • i.e. for wt's ABF1 gene, the formula is: = t15.1 - avg15, where t15.1 is the individual log2 expression for the first replicate at the 15 time point and avg15 is the average of the four replicate measurements at the 15 time point
      • ABF1's first 15 time point: = B2 - P2 = t15.1 - avg15 = -2.1071 - (-1.1878) = -0.9193
  3. Then, the differences were squared so that no negative numbers result and deviations above and below the average count equally.
    • i.e. for wt's ABF1 gene, we see the following formula in the cell: = B20^2
  4. The squared differences were then summed over all measurements and divided by the total number of measurements.
    • The formula used was as follows: =SUM(B38:N38)/13 (based off wt's ABF1 gene)
      • i.e. for wt, there are 13 measurements (four at the 15 time point + five at 30 + four at 60 = 13)
      • Note that the total number of measurements differs between strains; for example, db4 (dGLN3) has 12 overall
  5. To ensure that these calculations were correct, I first used this procedure to reproduce the MSE reported by the model. After I obtained the same output values, I proceeded to calculate the actual minMSEs.
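
The protocol above can be sketched in Python as follows. This is an illustration under assumptions, not the spreadsheet itself: the function name and the dictionary layout (time point mapped to replicate log2 values) are made up for the example, and the replicate values are placeholders apart from the one ABF1 deviation quoted in step 2.

```python
def min_mse(replicates_by_timepoint):
    """Mean squared deviation of each replicate from its own time point's average.

    replicates_by_timepoint maps a time point (e.g. 15, 30, 60) to the list of
    replicate log2 expression values measured at that time point.
    """
    squared_deviations = []
    for values in replicates_by_timepoint.values():
        average = sum(values) / len(values)                               # step 1
        squared_deviations.extend((v - average) ** 2 for v in values)     # steps 2-3
    return sum(squared_deviations) / len(squared_deviations)              # step 4


# Placeholder data (not real measurements): two replicates at 15, three at 30.
toy = {15: [1.0, 3.0], 30: [2.0, 2.0, 2.0]}
print(min_mse(toy))  # squared deviations 1, 1, 0, 0, 0 sum to 2; 2 / 5 = 0.4

# The single deviation worked out in step 2 for ABF1's first 15 time point:
print(round(-2.1071 - (-1.1878), 4))
```

The divisor is the number of measurements actually present, so a strain with 12 measurements instead of 13 needs no change to the function.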

Week of March 12, 2017

Monday: I worked on completing the analysis of my results. I used Brandon's regulatory relationship workbook to compare the regulatory relationships for DB5 and the three best (15, 16, 24) and worst (7, 12, 31) random networks.

  • Process for isolating regulatory relationships
    1. Using GRNsight, I visualized the weighted networks of interest and exported each network as a .sif file to isolate the regulatory relationships between regulator and target gene.
    2. Next, I opened the SIF file in Excel. In a new Excel workbook, I wrote down each relationship between a transcription factor and its target as Regulator --> Target Gene in one cell, with the weight of the transcription factor's influence in the column to the right of it.
    3. After I saved all these relationships for the seven networks (DB5, Rand7, Rand12, Rand15, Rand16, Rand24, and Rand31), I compiled all of their regulatory relationships together in a list.
    4. Next, I pasted the values that corresponded to a specific node/relationship for each network into the correct cell.
      • Reading R->L (DB5, Rand7, ..., Rand31)
    5. Because Brandon's Excel file already highlighted cells based on the weights within them, stronger activators were colored red, stronger repressors were colored blue, and weak influences were colored grey.
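
A rough Python sketch of steps 1-4, done in code instead of by hand in Excel. It assumes each line of the exported file is tab-separated as regulator, weight, target(s); that layout, the function names, and the gene names in the example are assumptions, so check an actual export before relying on it.

```python
def parse_relationships(text):
    """Map 'Regulator --> Target' to its weight for one exported network.

    Assumes tab-separated lines of the form: regulator<TAB>weight<TAB>target[...].
    """
    relationships = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank lines and nodes listed without edges
        regulator, weight, targets = parts[0], float(parts[1]), parts[2:]
        for target in targets:
            relationships[f"{regulator} --> {target}"] = weight
    return relationships


def compile_table(networks):
    """One row per relationship, one column per network (None where absent)."""
    all_relationships = sorted({rel for rels in networks.values() for rel in rels})
    return {
        rel: [networks[name].get(rel) for name in networks]
        for rel in all_relationships
    }


# Hypothetical exports (placeholder genes and weights, not lab results).
networks = {
    "DB5": parse_relationships("GENE_A\t0.5\tGENE_B\nGENE_B\t-1.25\tGENE_C"),
    "Rand7": parse_relationships("GENE_A\t0.75\tGENE_B"),
}
for relationship, weights in compile_table(networks).items():
    print(relationship, weights)
```

Columns follow the order in which the networks are added to the dictionary, so the same order should be used for every comparison.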

Thursday: Presented a first draft of my presentation for LMU's URS. That was the focus of lab this day.

  • Finished up the first draft of my presentation
  • For further analysis, I included:
    • The sum of weights to identify if the network was 'overall repressive (-) or activating (+)'
    • The nodes shared between DB5 and the 3 best and 3 worst networks; DB5 shared more nodes with the better networks
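
Both of these extra analyses are simple to express in code. The sketch below assumes the weights and node names have already been pulled out of each network; the function names and values are placeholders.

```python
def overall_sign(weights):
    """Classify a network by the sign of the sum of its edge weights."""
    total = sum(weights)
    if total > 0:
        return "activating (+)"
    if total < 0:
        return "repressive (-)"
    return "neutral"


def shared_items(network_a, network_b):
    """Items (e.g. node names) present in both networks."""
    return set(network_a) & set(network_b)


# Hypothetical weights and node names (not lab results).
print(overall_sign([0.5, -1.25]))
print(shared_items(["GENE_A", "GENE_B"], ["GENE_B", "GENE_C"]))
```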

Week of March 19, 2017

Monday: Worked on completing my PowerPoint presentation for LMU's Undergraduate Research Symposium. I sat down with Dr. Dahlquist to discuss my presentation and re-work some of the analysis that I did.

Thursday: I practiced my presentation in Sea120 before I rehearsed it in front of my lab. Later, I presented my PowerPoint for the symposium to my fellow researchers. I received feedback (overall positive, with minor changes to make). I listened to Kristen's presentation, too, before the end of lab.

Week of March 26, 2017

Monday: Worked on my thesis, mainly writing the discussion.

Thursday: Continued to work and write my thesis before the holiday (Cesar Chavez). During the lab meeting, we discussed future directions/what we should work on for the remainder of our time in the lab.

Documents

Summer 2015

To view the most updated PowerPoint click here
To see the input sheet that was run for the fixed b trial, please click this link
To view the output file from this fixed b trial, click here
To see the input sheet that was run from the estimated b, please click this
To view the output file from the estimated b, click here
The powerpoint that reviews and analyzes the outputs can be viewed here

GRNmap Testings

This is the template for future reports: GRNmap Testing Report
GRNmap Testing Report: Strain Run Comparisons 2015-05-27
GRNmap Testing Report: Non-1 Initial Weight Guesses 2015-05-28

Other Links

Back to User:Natalie Williams
To visit the Dahlquist Lab: click here
To see K. Grace J's Notebook: click here