William A. C. Gendron Week 13

From OpenWetWare

Create the Input Excel Workbook for the Model

  1. I used a file provided by my professor as the basis for my model: "Input_4_gene_forward_correct_params.xlsx". I filled in my own data and renamed the file to avoid confusion and to label it properly for uploading. (Here it is!)
  2. This assignment builds on the previous Week 12 Assignment. Using the network I selected in that work, I copied the transposed matrix. Before copying it, I made sure it was in alphabetical order. This was difficult using LibreOffice, but I have heard it is simpler in Excel.
    • I pasted the matrix into the sections "network" and "network weighted".
    • I also made sure that all of the labels had the "p"s removed and were capitalized. Otherwise the program may have issues with them.
  3. I then modified the "degradation" tab.
    • I pasted my list of transcription factors under the column named "StandardName". I then looked up the systematic names of my genes, which go under "SystematicName". I matched the names on Yeastract by pasting the transcription factors into the proper field. The site where this can be done is linked here.
    • I then matched the genes to degradation rates using data compiled in a previous study.
    • When the data did not have a value for a transcription factor, "0.027182242" was input as the degradation rate.
  4. I then modified the tab "production rates".
    • I filled in the columns "SystematicName" and "StandardName" just as I did for "degradation".
    • The professor told us to assume that the "production rate" was double the "degradation rate", so that is what was input. I imagine this can be fine-tuned later.
  5. The data for the mutant (dCIN5 in my case) and the wild type were then input into my worksheet.
    • The wild type data was put into the tab "wt", with each row matched to its gene. The same was done for the mutant in a tab labeled "dCIN5".
    • The StandardName and SystematicName columns were filled in as on the previous sheets.
    • I filled in the log fold changes used in the previous assignment: Week 11 Assignment. I only used the cold shock data, so I used time points "15", "30", and "60". The time point number is repeated for each replicate of that time point.
    • I made sure each row of data was aligned with the gene it belongs to. Using the find feature and copying and pasting each line makes this faster.
  6. The second to last sheet that I modified was the "optimization_parameters" worksheet.
    • For the parameter "time" (Cell A13), I replaced the previous parameters with "15", "30", and "60", to match the time points in the data.
    • I was already working with dCIN5, so I did not have to replace that cell, but if you use a different mutant, change it.
    • For the parameter "Deletion", I left the zero in place because the wild type has no deletion. In cell C15, I entered 5, the position of CIN5 in my gene list, to mark the deletion.
    • For the parameter "simtime", I entered "0", "5", "10", ..., "60" (in steps of 5 up to 60).
  7. Finally, I modified "network_b".
    • As with the other sheets, I filled in the StandardNames, but this sheet does not have the systematic names.
    • Column B should have 0 for all of the values.
  8. When I was done, I uploaded the workbook to LionShare and sent it to my professors for review.
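
The bookkeeping in the steps above can also be scripted instead of done by hand. The following is only a minimal Python sketch, not part of the professor's workflow. The gene list and the two known degradation rates are made-up placeholders, the helper names are hypothetical, and the sketch assumes that the trailing "p" in a protein-style label always follows a digit and that the "Deletion" index counts from 1.

```python
# Fallback degradation rate quoted in the steps above.
DEFAULT_DEGRADATION = 0.027182242


def clean_label(label):
    """Turn a protein-style label such as 'Cin5p' into 'CIN5'."""
    name = label.strip()
    # Assumption: the trailing "p" always follows a digit (e.g. 'Cin5p').
    if len(name) > 1 and name[-1] in "pP" and name[-2].isdigit():
        name = name[:-1]
    return name.upper()


def build_rates(genes, known_degradation):
    """Return (gene, degradation rate, production rate) rows.

    Genes without a known degradation rate get the default value,
    and the production rate is assumed to be double the degradation rate.
    """
    rows = []
    for gene in genes:
        degradation = known_degradation.get(gene, DEFAULT_DEGRADATION)
        rows.append((gene, degradation, 2 * degradation))
    return rows


def deletion_index(genes, deleted_gene):
    """Position of the deleted gene in the gene list, counting from 1."""
    return genes.index(deleted_gene) + 1


# Placeholder labels, NOT the network used on this page.
raw_labels = ["Gln3p", "Cin5p", "Ace2p", "Aft1p"]
genes = sorted(clean_label(label) for label in raw_labels)

# Placeholder degradation rates, NOT values from the study.
known = {"ACE2": 0.05, "CIN5": 0.10}
rates = build_rates(genes, known)

# Simulation times 0, 5, 10, ..., 60.
simtime = list(range(0, 61, 5))

for gene, degradation, production in rates:
    print(gene, degradation, production)
print("Deletion index for CIN5:", deletion_index(genes, "CIN5"))
print("simtime:", simtime)
```

With real data, the printed rows could be pasted into the "degradation" and "production rates" tabs, and the deletion index into the "Deletion" row of "optimization_parameters".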

The Professor's Appendix: If you are curious about the terms which were used

  • alpha: Penalty term weighting (from an L-curve analysis)
  • kk_max: Number of times to re-run the optimization loop: in some cases re-starting the optimization loop can improve performance of the estimation.
  • MaxIter: Number of times MATLAB iterates through the optimization scheme. If this is set too low, MATLAB will stop before the parameters are optimized.
  • TolFun: How close two successive least squares cost evaluations should be before MATLAB determines that it is not making any improvement.
  • MaxFunEval: Maximum number of times MATLAB will evaluate the least squares cost.
  • TolX: How close successive parameter estimates should be before MATLAB determines that it is not making any improvement.
  • Sigmoid: =1 if sigmoidal model, =0 if Michaelis-Menten model
  • iestimate: =1 if the user wants to estimate parameters; =0 if the user wants to do just one forward run
  • iGraphs: =1 to output graphs; =0 to not output graphs
  • fix_P: =1 if the user does not want to estimate the production rate parameter, P (the initial guess is used and never changed); =0 to estimate it
  • fix_b: =1 if the user does not want to estimate the b parameter (the initial guess is used and never changed); =0 to estimate it
  • time: A row containing a list of the time points when the data was collected experimentally. Should correspond to the timepoint column headers in the expression sheets.
  • Strain: A row containing a list of all of the strains for which there is expression data in the workbook. Should correspond to the names of the sheets for each strain.
  • Sheet: A row where each entry is the order number of the sheet (left to right) that corresponds to the list of strains above.
  • Deletion: Gives the index of the gene in the network sheet that has been deleted in each strain listed above. For example, if data has been provided for the CIN5 deletion strain, then give the index number from the network sheet corresponding to CIN5.
  • simtime: A list of times for which the forward simulation should be evaluated.
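
Several of the terms above constrain each other, so a quick consistency check before running the model may save a failed run. The following Python sketch is one reading of the appendix, not something the MATLAB code is known to do. The function name, the "Sheet" positions, the flag values, and the gene count of 10 are hypothetical placeholders. The time points, strain names, and deletion entries follow the steps described on this page.

```python
# Parameters that the appendix describes as 0/1 switches.
FLAGS = ("Sigmoid", "iestimate", "iGraphs", "fix_P", "fix_b")


def check_parameters(params, n_genes):
    """Return a list of problems found in an optimization_parameters dict."""
    problems = []

    # Each switch should be exactly 0 or 1.
    for flag in FLAGS:
        if params.get(flag) not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")

    # Sheet and Deletion are rows with one entry per strain.
    n_strains = len(params["Strain"])
    for row in ("Sheet", "Deletion"):
        if len(params[row]) != n_strains:
            problems.append(f"{row} needs one entry per strain")

    # 0 means no deletion; otherwise the index must point at a gene.
    for index in params["Deletion"]:
        if not 0 <= index <= n_genes:
            problems.append(f"Deletion index {index} is outside 0..{n_genes}")

    return problems


params = {
    "Sigmoid": 1,        # placeholder flag values
    "iestimate": 1,
    "iGraphs": 1,
    "fix_P": 0,
    "fix_b": 0,
    "time": [15, 30, 60],
    "Strain": ["wt", "dCIN5"],
    "Sheet": [5, 6],     # placeholder sheet positions, not from this page
    "Deletion": [0, 5],  # wild type has no deletion; CIN5 is gene 5
    "simtime": list(range(0, 61, 5)),
}

# The gene count of 10 is a placeholder; use the size of the network sheet.
print(check_parameters(params, n_genes=10))
```

An empty list means none of these checks failed. It does not guarantee that the workbook is otherwise correct.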