Tessa A. Morris Week 13


Electronic Lab Notebook






Alyssa N Gomes


The purpose of this week's experiment is to create the input Excel workbook for the model of our network.


Create the Input Excel Workbook for the Model

  1. Download Input_4_gene_forward_correct_params.xlsx and rename it (ours is TM_AG_expression_data_params).
  2. To determine the transcription factors we're including in our network, use the "transposed" Regulation Matrix that you generated from YEASTRACT in the Week 12 Assignment. We are using "Only DNA Binding" because it was determined in Week 12 to be the most useful model.
    • Put the transcription factors in alphabetical order using the sort feature in Excel. Whether you keep the order from the YEASTRACT assignment or sort alphabetically, make sure the order is the same in all of the worksheets.
      • You can only sort by column, so sort, transpose, sort again, and transpose back.
  3. The next worksheet to edit is the one called "degradation_rates".
    • Paste your list of transcription factors from your "network" sheet into the column named "StandardName"
    • Look up the "SystematicName" using YEASTRACT.
    • Look up the degradation rates for your list of transcription factors in this file and include them in your "degradation_rates" worksheet. These rates have been calculated from protein half-life data from a paper by Belle et al. (2006).
    • If a transcription factor does not appear in the file above, use the value "0.027182242" for the degradation rate.
  4. The next worksheet to edit is the one called "production_rates".
    • Paste the "SystematicName" and "StandardName" columns from your "degradation_rates" sheet into the "production_rates" sheet.
    • The initial guesses for the production rates we are using for the model are two times the degradation rate. Compute these values from your degradation rates and paste the values into the column titled "ProductionRate".
  5. Next you will input the expression data for the wild type strain and the other strain your partner is using (dcin5, dgln3, dhmo1, dzap1, spar). You need to include only the data for the genes in your network, in the same order as they appear in the other worksheets.
    • Put the wild type data in the sheet called "wt".
    • The sample spreadsheet has a worksheet named "dcin5". Change this to "dgln3".
    • Paste the SystematicName and StandardName columns from one of your previous sheets into this one.
    • The data in this sheet are the Log Fold Changes for each replicate and each timepoint from your Week 11 Assignment. We are only going to use the cold shock timepoints for the modeling, so your column headings for the data should be "15", "30", and "60". There will be multiple columns for each timepoint (typically 4) to represent the replicate data, and they will all have the same name. For example, you may have four columns with the header "15".
    • Copy and paste the data from your Week 11 spreadsheet into this one. You need to include only the data for the genes in your network. Make sure that the genes are in the same order as in the other sheets.
  6. We will only be editing parts of the "optimization_parameters" worksheet.
    • For the parameter "time" (Cell A13), replace what is in the sample file with "15", "30", and "60", since these are the timepoints we have in our data.
    • For the parameter "Strain" (Cell A14), replace "dcin5" with "dgln3".
    • For the parameter "Deletion", leave the zero in cell B15. In cell C15, put the position of the deleted gene in the list of gene names. In the sample file, CIN5 is number 3 in the list of 4 genes.
    • For the parameter "simtime", the forward simulation of the expression is performed in five-minute increments from 0 to 60 minutes. Thus, this row should read: "simtime", "0", "5", "10", ..., "60".
  7. The last sheet you will need to modify is called "network_b".
    • Paste in the list of standard names for your transcription factors from one of your previous sheets. Note that this sheet does not have a column for the systematic name.
    • Cell A1 in the sample files has the text "rows genes affected/cols genes controlling". I believe cell A1 can contain either this text or "StandardName".
    • The "threshold" value for each gene should be "0".
  8. When you have completed the modifications to your file, upload it to LionShare and send Dr. Dahlquist and Dr. Fitzpatrick an e-mail with a link to the file. Your assignment will not be considered complete until we have successfully downloaded the correct file from you. If you need assistance with LionShare, please ask well ahead of the assignment deadline.
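
The worksheet preparation in the steps above (ordering the genes, filling in degradation rates, and deriving production rates) can be sketched in Python as a cross-check on the hand-edited workbook. This is only a minimal sketch, not part of the assignment: the function names are hypothetical, the regulation matrix is assumed to be square with the same genes on rows and columns, and the example matrix and the CIN5 rate are made-up values. Only the default rate 0.027182242 and the factor of two come from the instructions.

```python
# Hypothetical helpers for checking the workbook by script.
# Only the default rate and the "two times" rule come from the instructions.
DEFAULT_DEGRADATION_RATE = 0.027182242  # used when a gene is not in the rates file


def sort_square_matrix(labels, matrix):
    """Reorder rows AND columns of a square matrix so the labels are alphabetical.

    This mirrors the sort / transpose / sort / transpose routine in Excel.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    sorted_labels = [labels[i] for i in order]
    sorted_matrix = [[matrix[r][c] for c in order] for r in order]
    return sorted_labels, sorted_matrix


def build_rate_rows(genes, known_rates):
    """Return (gene, degradation_rate, production_rate) tuples in gene order."""
    rows = []
    for gene in genes:
        # Fall back to the default when the gene is missing from the rates file.
        degradation = known_rates.get(gene, DEFAULT_DEGRADATION_RATE)
        production = 2 * degradation  # initial guess: twice the degradation rate
        rows.append((gene, degradation, production))
    return rows


if __name__ == "__main__":
    labels = ["GLN3", "CIN5", "HMO1"]
    matrix = [[0, 1, 0],   # made-up 0/1 entries, for illustration only
              [1, 0, 0],
              [0, 1, 1]]
    genes, network = sort_square_matrix(labels, matrix)
    print(genes)
    for row in network:
        print(row)
    # The CIN5 rate here is invented; HMO1 is absent, so it gets the default.
    for row in build_rate_rows(genes, {"CIN5": 0.05}):
        print(row)
```

Reordering rows and columns with the same index list keeps each matrix entry attached to the same pair of genes, which is the property the repeated sort and transpose in Excel is meant to preserve.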

Appendix: Full explanation of the "optimization_parameters" sheet

  • alpha: Penalty term weighting (from an L-curve analysis)
  • kk_max: Number of times to re-run the optimization loop: in some cases re-starting the optimization loop can improve performance of the estimation.
  • MaxIter: Number of times MATLAB iterates through the optimization scheme. If this is set too low, MATLAB will stop before the parameters are optimized.
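  • TolFun: How small the change between two successive least squares cost evaluations must be before MATLAB determines that it is not making any improvement.
  • MaxFunEval: Maximum number of times MATLAB will evaluate the least squares cost.
  • TolX: How close successive parameter estimates should be before MATLAB determines that it is not making any improvement (in MATLAB's optimization options, TolX is the termination tolerance on the parameter values, while TolFun is the tolerance on the cost).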
  • Sigmoid: =1 if sigmoidal model, =0 if Michaelis-Menten model
  • iestimate: =1 to estimate parameters; =0 to do just one forward run
  • iGraphs: =1 to output graphs; =0 to not output graphs
  • fix_P: =1 to keep the production rate parameter, P, fixed at its initial guess (not estimated); =0 to estimate it
  • fix_b: =1 to keep the b parameter fixed at its initial guess (not estimated); =0 to estimate it
  • time: A row containing a list of the time points when the data was collected experimentally. Should correspond to the timepoint column headers in the expression sheets.
  • Strain: A row containing a list of all of the strains for which there is expression data in the workbook. Should correspond to the names of the sheets for each strain.
  • Sheet: A row where each entry is the order number of the sheet (left to right) that corresponds to the list of strains above.
  • Deletion: Gives the index of the gene in the network sheet that has been deleted in each strain listed above. For example, if data has been provided for the CIN5 deletion strain, then give the index number from the network sheet corresponding to CIN5.
  • simtime: A list of times for which the forward simulation should be evaluated.
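
The "simtime", "time", and "Deletion" rows described above lend themselves to a quick scripted check. The following is a minimal sketch under stated assumptions: the function names are hypothetical, strain names are assumed to follow the "d" plus gene name pattern (for example "dgln3"), and the gene list in the usage example uses placeholder names apart from CIN5, whose position (3 of 4) is taken from the sample file description.

```python
def simtime_row(start=0, stop=60, step=5):
    """Times for the forward simulation: 0, 5, 10, ..., 60 by default."""
    return list(range(start, stop + 1, step))


def deletion_index(genes, strain):
    """Return 0 for wild type, else the 1-based position of the deleted gene.

    Assumes deletion strains are named "d" + gene name, e.g. "dcin5" -> "CIN5".
    """
    if strain == "wt":
        return 0
    gene = strain[1:].upper()
    return genes.index(gene) + 1  # raises ValueError if the gene is not listed


def headers_match_time(headers, time_points):
    """Check that replicate column headers cover exactly the listed time points."""
    return sorted({int(h) for h in headers}) == sorted(time_points)


if __name__ == "__main__":
    # Placeholder gene names; only CIN5 at position 3 reflects the sample file.
    genes = ["GENE_A", "GENE_B", "CIN5", "GENE_D"]
    print(simtime_row())
    print([deletion_index(genes, s) for s in ["wt", "dcin5"]])
    print(headers_match_time(["15"] * 4 + ["30"] * 4 + ["60"] * 4, [15, 30, 60]))
```

Using a 1-based index matches how positions are counted in the worksheet, where the first gene in the list is number 1.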

Data & Observations:

  • When looking up the degradation rates, a few of the transcription factors were not listed in the Excel document (ASG1, CYC8, HMO1, RIF1, SNF5, SNF6).
  • On 4/21/2015 Dr. Dahlquist pointed out an error in the optimization parameters. For "Sheet", dgln3 was the third out of four sheets, so the columns next to "Sheet" should say 3 then 4. For "Deletion", GLN3 is the 7th gene in the list, so the columns next to "Deletion" should say 0 then 7.
    • These changes were made and the updated file was uploaded to LionShare.
  • This model was run in MATLAB on 4/22/2015. More detailed information here.

Biomathematical Modeling Navigation

User Page: Tessa A. Morris
Course Page: Biomathematical Modeling