Natalie Williams Fall Electronic Notebook 2016

Fall 2016

I was abroad last spring semester, Spring 2016, so there are no notes or records of my experience in the lab at that time.

September 2016

September 14, 2016

This week's task was to read and analyze the study by Neymotin et al. (2014) in order to derive degradation rates from the half-life values reported for the annotated genes.

Personally, I received feedback on my HNRS thesis abstract that is to be submitted on Sept. 30. I made the changes and sent them to Dr. Dahlquist.

I have worked through more of the R tutorial that Dr. Dahlquist assigned to both Brandon and me. Brandon has already coded a script to generate random matrices; our next task is to write code that plots the in-degree and out-degree distributions as bar graphs.
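As a rough illustration of that next task, here is a minimal sketch in Python (not the R code from the tutorial, and not Brandon's script). It assumes the random matrix is a square 0/1 adjacency matrix in which rows are target genes and columns are regulators; that orientation and the function names `random_adjacency`, `degree_counts`, and `print_bar_chart` are assumptions made only for this example.

```python
import random
from collections import Counter


def random_adjacency(n, edge_prob, seed=None):
    """Return an n x n 0/1 matrix; matrix[target][regulator] == 1 marks an edge."""
    rng = random.Random(seed)
    return [[1 if rng.random() < edge_prob else 0 for _ in range(n)] for _ in range(n)]


def degree_counts(matrix):
    """Return (in_degrees, out_degrees), one entry per node."""
    n = len(matrix)
    # Row sum: how many regulators point at node i (in-degree).
    in_deg = [sum(matrix[i]) for i in range(n)]
    # Column sum: how many targets node j regulates (out-degree).
    out_deg = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return in_deg, out_deg


def print_bar_chart(degrees, title):
    """Print a text bar chart: one row per degree value, one '#' per node."""
    counts = Counter(degrees)
    print(title)
    for degree in range(max(counts) + 1):
        print(f"{degree:>3} | {'#' * counts.get(degree, 0)}")


# Hypothetical network size and edge probability, for illustration only.
matrix = random_adjacency(15, 0.15, seed=1)
in_deg, out_deg = degree_counts(matrix)
print_bar_chart(in_deg, "In-degree distribution")
print_bar_chart(out_deg, "Out-degree distribution")
```

The text bars stand in for a plotted bar graph; the same counts could be passed to any plotting tool.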

September 21, 2016

Today, Wednesday, September 21, 2016, I completed my task of computing the degradation rates from Neymotin et al.'s article. I uploaded the file to the DahlquistLab repository, where it awaits review by Dr. Dahlquist.

For the completion of my task with the degradation rates, the following was done:

I downloaded the supplemental data (Table S5) from Neymotin et al.
I edited Neymotin's data as follows:
- Alphabetized: Gene names were used for the alphabetization
  - For alphabetization, I selected the entire sheet
  - Next, I clicked the Sort button that looks like a funnel, and selected "Custom sort"
  - For custom sort, I selected the column with the gene names, for me, Column 1
  - I then sorted in ascending order, from A to Z
- Isolated Half Lives: Created a separate sheet with only the Systematic and Gene names and the half-life (t half)
  - On this new sheet, I copied the Gene names and the half-lives corresponding to those genes
  - I calculated the median half-life, which will be used to calculate the degradation rate of any gene with missing data
  - The following Excel equation was used
  - =MEDIAN("Column containing half-lives")
- Degradation Rates: Created an additional sheet for calculating the degradation rate from the half-lives
  - Again, the Gene names and the half-lives were pasted into this new sheet so that the calculations could be carried out on a single page without interfering with other information or formats
  - The following equation was used to calculate the degradation rate
  - = (ln (0.5)/ half life of specific gene)
  - For genes with missing data, the equation would be the following
  - = (ln (0.5)/ median half life)
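The spreadsheet calculation above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene names and half-life values are made-up placeholders, missing data are represented as `None`, and the function name `degradation_rates` is hypothetical. The formula is kept exactly as recorded, ln(0.5) divided by the half-life, so the resulting values are negative, with magnitude ln(2) divided by the half-life.

```python
import math
from statistics import median


def degradation_rates(half_lives):
    """Map gene -> ln(0.5) / half-life.

    Genes whose half-life is missing (None) use the median of the
    available half-lives instead, mirroring the spreadsheet steps.
    """
    known = [t for t in half_lives.values() if t is not None]
    fallback = median(known)
    return {
        gene: math.log(0.5) / (t if t is not None else fallback)
        for gene, t in half_lives.items()
    }


# Placeholder genes and half-lives, for illustration only (not real data).
example = {"GENE_A": 5.0, "GENE_B": 10.0, "GENE_C": None, "GENE_D": 20.0}
rates = degradation_rates(example)

for gene, rate in rates.items():
    print(f"{gene}: {rate:.4f}")
```

Here GENE_C has no half-life, so it receives the rate computed from the median of the other three values.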
I used a file previously shared with me by Dr. Dahlquist to compare this work (Neymotin) with Harbison's list of 203 TFs
I used Microsoft Access to pair the two data sets together using the systematic names in order to identify whether data were missing for any of the genes
1. First, I opened a new blank database.
2. I imported the two Excel files that contained my data
  - This can be done by selecting the External Data tab and clicking the Excel icon
  - I then went through a series of instructions
    1. I browsed my computer for the file that I needed and selected it
    2. I chose the sheet to import; for me, these were Harbison's list of 203 TFs and the sheet with the degradation rates calculated from Neymotin's data
    3. Depending on your sheet's format, the first row may either include headings or go directly into your data; select the box if your first row contains column headings
    4. I skipped the next question, which asks about field names and indexing, by clicking Next
    5. I then chose my own primary key - setting it to the first column with Systematic Names (not all genes have universal Gene names)
    6. I then clicked finish and import.
  - Now your data should be seen as a table in Access
3. To pair the data sets together, I selected the Create tab and hit Query Design
  - When you select Query Design, a pop-up window appears showing all the tables in your current database. Choose the tables whose data you wish to pair. Close the pop-up window, and you should see your tables with their headings listed under them.
  - Select the heading that has the information you want to pair with the other file. For me it was the Systematic Names from Neymotin's data with the Systematic Names from Harbison's data
  - Drag the heading and match it to the heading for the other data. Right click on the link that forms between the two headings
  - Because I only wanted the records from Neymotin's data that match Harbison's list, I selected the option that states: "Include ALL records from 'Harbison 203' and only those records from 'Neymotin degradation rates' where the joined fields are equal."
  - Press OK, and you should now see an arrowhead pointing toward the Neymotin degradation rate heading
  - Now you can drag and drop the headings with the data that you want into the field below. For my query, I selected the names of Harbison's 203 TFs and then dragged down Neymotin's Systematic names as well as the calculated degradation rate to see if any genes were missing.
  - Now that the field is full, click Run to run your query.
4. A table should appear now with the data you wanted beside the heading - for me, I have the Systematic names paired together and their corresponding degradation rates in the column beside them.

- I also included the calculated degradation rates from Neymotin's data in that query
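The join option chosen in Access ("Include ALL records from 'Harbison 203' and only those records from 'Neymotin degradation rates' where the joined fields are equal") behaves like a left join. A minimal Python sketch of the same idea is below; it is not what Access runs internally. The field names `systematic_name` and `degradation_rate`, the `SYS_000x` identifiers, the rate values, and the helper `left_join` are all placeholders invented for this example.

```python
def left_join(left_rows, right_rows, key):
    """Keep every row of left_rows; add the matching right row's fields, or None."""
    right_by_key = {row[key]: row for row in right_rows}
    right_fields = sorted({f for row in right_rows for f in row if f != key})
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            # None marks a gene that has no record in the right-hand table.
            merged[field] = match.get(field)
        joined.append(merged)
    return joined


# Placeholder tables, for illustration only (not real data).
tf_list = [{"systematic_name": "SYS_0001"}, {"systematic_name": "SYS_0002"}]
rate_table = [
    {"systematic_name": "SYS_0001", "degradation_rate": -0.05},
    {"systematic_name": "SYS_0003", "degradation_rate": -0.02},
]

result = left_join(tf_list, rate_table, "systematic_name")
missing = [r["systematic_name"] for r in result if r["degradation_rate"] is None]
print("Genes with no degradation rate:", missing)
```

Every gene on the left-hand list survives the join, so a gene with missing data shows up with an empty rate rather than silently dropping out.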

I also revised my abstract and sent that to Dr. Dahlquist for review.

September 28, 2016

Today, I worked on some of the TRACE documentation (Numbers 5 & 6). For No. 5, Implementation Verification, I noted that it requires descriptions of the software design and of how the code was implemented and tested. I will have Eddie from the Coding team help me with those portions.

I finished edits of my Honors Thesis abstract and submitted it.

After talking with Dr. Dahlquist, Brandon and I will formulate the standard for the input sheets needed for the lab. The process includes:

Using the genes from last year's GRNs
Plugging those genes into YEASTRACT to get the most up-to-date connections with those genes
Uploading the matrix into GRNSight to make sure that all the genes in the GRN are connected to each other
Creating the input sheets from scratch with the new degradation rates I computed, estimated production rates, and the expression rates from the microarray data
- For any missing data points, it was decided that the average of expression levels from all available time points will be used
- To keep track of missing time points, the filled-in cells will be highlighted in different colors so that, once GRNmap can run with missing values, the filled-in data can be removed easily
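A minimal Python sketch of the fill-in rule above, assuming that "the average of expression levels from all available time points" means the mean of each gene's own available values. The gene names and numbers are placeholders, and the function name `fill_missing` is hypothetical. The returned list of filled positions plays the role of the highlighted cells, so the filled-in values can be found and removed later.

```python
def fill_missing(expression):
    """Replace missing values (None) with the gene's mean over available values.

    Returns (filled, filled_cells), where filled_cells lists the
    (gene, column_index) positions that were filled in.
    """
    filled, filled_cells = {}, []
    for gene, values in expression.items():
        available = [v for v in values if v is not None]
        # If a gene has no data at all, there is nothing to average; leave None.
        mean = sum(available) / len(available) if available else None
        filled[gene] = [mean if v is None else v for v in values]
        filled_cells += [(gene, i) for i, v in enumerate(values) if v is None]
    return filled, filled_cells


# Placeholder expression values, for illustration only (not real data).
expression = {
    "GENE_A": [0.5, None, 1.5],
    "GENE_B": [0.25, 0.75, -1.0],
}
filled, filled_cells = fill_missing(expression)
print(filled)
print("Filled-in cells:", filled_cells)
```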

October 2016

October 5, 2016

Today, I created the input sheets for the two strains that I have - wild-type & dCIN5 from Kayla Jackson's file. The protocol can be found on the Dahlquist Github repository.

To obtain the degradation rates and the log expression data for each strain, the Access protocol above was used. The data from one workbook were paired with the existing log expression data in the other workbook so that expression was recorded only for genes in the network.

October 12, 2016

Today, I worked on documentation and cleaned up the various files that I have shared.

The first thing I updated was the protocol for obtaining the file with the degradation rates and the calculations that I did from the half-lives.
I updated the wiki (GitHub) with the newer protocol, which still has to be reviewed by Dr. D.
I tweaked a few files that I've uploaded to the Dahlquist Repository on GitHub

The files that I edited are the following:

wt_NEW_Input_16_Node; I changed the optimization parameters sheet to add the headings 'optimization_parameter' and 'value'
- I then updated the values for the optimization parameters, i.e. alpha and MaxIter. These values can be found under Step 11: GRNmap on Dr. Dahlquist's Microarray Data Workflow.
dCIN5_NEW_KJ_15_Node; again, I changed the optimization parameters sheet to add the headings 'optimization_parameter' and 'value' and updated the parameters' values according to the workflow mentioned above
Neymotin_Williams_TF_Comparison; I added an additional sheet for the rounded values that will be used for the degradation rates of the input sheets.

I reviewed Brandon's input sheets while he reviewed mine. Because I did not know how thoroughly I should review his data, I started by checking how the log expression data had been compiled. After that, I will average the numbers for the missing data to confirm that our calculations agree.

I had to recalculate and re-upload the degradation rates. Instead of taking the median of the TFs from Harbison's list, I took the median of all the genes on Neymotin's list. The median calculated from all of the genes in Neymotin's data was 10.2, compared to the median from the TFs in Harbison's list, which was 7.

October 19, 2016

I had to re-format and re-upload my old input sheets. The files on the Dahlquist Repository did not have the data from all the strains, which had been requested. I also updated the degradation rates and the production rates in the files. I continued formatting the cells in 11-point Arial.

I then tried to track down the sources for the degradation rates that I had found earlier this semester. The sites were bookmarked on my computer; however, an issue with my laptop wiped my most recently bookmarked websites. Unfortunately, I wasn't able to find the specific sources I had earlier.

October 26, 2016

Updated wild-type input sheet with dHMO1 log fold expression and also changed the format such that the labels weren't capitalized for the wt input sheet. Because the dCIN5 network does not contain HMO1 in its GRN, there was no need to include the dHMO1 expression data in the workbook for the input sheet.

dCIN5_log2_expression --> dcin5_log2_expression

I tested GRNmap with the wt data and the Testing Report can be found here.

The issue on GitHub can be found here: #265. Please note that to download version 1.4.4 of the GRNmap code, you have to go to the GRNmap website and click the 'Downloads' link.
From the 'Downloads' link, you will choose to Download GRNmap Source Code and click the latest version of the code. For me on 2016/10/26, that version was 1.4.4.

I tested GRNmap with the dCIN5 data and the Testing Report can be found here.
The same steps as above were taken to download the code.

Changes had to be made to the dCIN5 network input sheet for multiple reasons. The main change was to the time headings of the dcin5_log2_expression sheet: instead of including the other time points (30 & 60), all of the columns read 15.

October 31, 2016

Research Meeting: Went over Kristen Horstmann & Maggie O'Neal's powerpoint and the powerpoint that Brandon Klein and I prepared.

Kristen & Maggie's powerpoint, from Kristen's Lab Notebook page: here
Brandon's and my powerpoint: here

For this upcoming week, we are going to work on the following:

Showing the ratio of LSE:minLSE to four decimal places on all future powerpoints
Adding the number of genes and edges to each network that we describe
Including two slides showing the unweighted and weighted visualizations of all 5 networks
Showing the degree distributions using Brandon's R script
We will also revisit the dCIN5 network provided by KJackson last year and the wt network

November 2016

November 2, 2016

Updated the powerpoint to reflect the comments made at Monday's meeting. These changes include:

Rounding the values of each parameter to four decimal places
Including the number of nodes and edges for each network on the summary page of the respective strain
Including two slides with the unweighted and weighted networks visualized for all five strains

To ensure that the values for wt agreed with each other, I ran the wt network through GRNmap again. The results will be posted on a different GRNmap Testing page, here

When I ran the model today, I got all the figures. However, none of the output files were generated or saved into the folder with the original data. I will create an issue for this on GitHub. I will link it here.

November 9, 2016

I missed Monday's meeting because I had a medical school interview at Washington University School of Medicine in St. Louis.

Today, I finished my analysis (where applicable) of the first wt run (10/30/2016). I commented on the fit of the genes and on which genes had bigger dynamics. I have yet to comment on the latter two analyses. I also created a file that compares the two runs of GRNmap with the wild-type GRN. There were no differences found between the two files for the following computations:

Penalty term
LSE
min LSE
iteration count
optimized network weights
optimized production rates
optimized threshold b, and
min LSE for each of the genes for the specific run

The file can be found here: Media:NEW_Comparison_wt_runs_20161103.xlsx

I am now running GRNmap on the paradoxus computer with the wild-type data to compare the results to the earlier results. The GRNmap testing page can be found here.

November 16, 2016

I reran the dCIN5 network that Dr. Dahlquist generated. I created the input sheets and had Brandon look over them before running them. Two networks were created. The first consists of 17 genes in the regulatory network with 32 edges. The second network has 14 genes and 25 edges. The second network does not contain ZAP1 because ZAP1 and ACE2 were connected to the network by one edge to MCM1. Once MCM1 was removed from the network, ZAP1 also disappeared.
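The effect described here, where removing one gene also drops any gene whose only connection ran through it, can be sketched in Python as follows. This is a toy illustration only: the edge list uses placeholder node names (`HUB`, `LEAF`, `X`, `Y`), not the actual dCIN5 network, and the helper `remove_gene` is hypothetical.

```python
def remove_gene(edges, gene):
    """Drop every edge touching `gene`.

    Returns (remaining_edges, orphaned), where orphaned lists the other
    genes that are left without any edge after the removal.
    """
    before = {g for edge in edges for g in edge}
    remaining = [(src, dst) for src, dst in edges if gene not in (src, dst)]
    after = {g for edge in remaining for g in edge}
    orphaned = sorted(before - after - {gene})
    return remaining, orphaned


# Toy network with placeholder names: LEAF is attached only through HUB.
edges = [("HUB", "LEAF"), ("X", "HUB"), ("X", "Y"), ("Y", "X")]
remaining, orphaned = remove_gene(edges, "HUB")
print("Remaining edges:", remaining)
print("Genes that drop out with HUB:", orphaned)
```

In the toy example, LEAF plays the role of a gene attached to the network by a single edge, so it drops out when HUB is removed.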

November 21, 2016

During the lab meeting, I presented the results from the dCIN5 model runs.

The powerpoint can be found here.

Also note that the degree distribution charts were not on the powerpoint because I forgot how to use Brandon's script for R. I tried for about an hour, and finally decided that I would wait and ask Brandon during our lab meeting.

November 23, 2016

Home for the Thanksgiving Holiday!!

November 28, 2016

I missed the lab meeting because I had a medical school interview at UNC Chapel Hill.

November 30, 2016

Discussed with Dr. Dahlquist what the focus for the remainder of my time in the lab should be

Discussion of the model runs thus far
Read GRNmap and GRNSight discussions to think about the biology of our investigations and compose our own conclusions for this semester
From these discussions and analyses, I should also consider which network (not trying to model all 6 networks) should be the basis for our random network runs next semester
Think about next semester's thesis project and how the discussion(s) will aid in next year's work and conclusion to my time in this lab

More to come