Parameter Estimation Discussion Feb 05
Below is a pdf where I give a trivial, but probably useful, connection between two big problems that were mentioned at the retreat:
- What's up with estimating parameters from time courses of observations of concentrations (say from quantitative Western blot)?
- What's up with estimating parameters by applying time-dependent stimuli (doses) to the yeast and measuring their responses? (As Ty did in simulation.)
This is not rocket science; I just realized it over morning coffee, so have no math-phobic qualms. I hope it proves useful in thinking about these problems.
- What you summarized in the pdf is what's in E-Cell's toolbox. Numerically this works nicely in many cases, and we should give it a try. But let me add one more thing to the three points you wrote for me at the end of your report: if the number of unknowns (unknown rate coefficients plus unknown initial concentrations) exceeds, say, 3-5, it becomes extremely hard to make well-grounded scientific arguments from the result. It is practically impossible to provide enough data to constrain the system so that it converges to a single solution, and at this stage of the project we cannot be 100% sure that the structure of the simulated model is identical to the real system. As a result, it is highly likely that the fitting error will be distributed over all the parameters being estimated. This is fine for many engineering problems, in which the purpose is to make the system behave as we want. But the point here is that we need to be extremely careful when we feed the result of parameter estimation back into our model. It may be safer to think of it as a means of analysis, from which we could get some indirect info on how the system works, rather than as a means of determining parameters in our model.
- Another class of approach you pointed out, and one I used to pursue in the E-Cell project, is time series analysis, which makes use of statistical means rather than error minimization: linear filters, auto-regressive models, and Kalman filters, for example. As you pointed out, what we can do with linear filters is limited when the system is nonlinear. But I think (a) we can get some info with linear analysis, and (b) there are some non-linear analysis methods that might work, though my experience with them is limited.
- This kind of method works best when we have highly fluctuating signals. Still, I think it might be worth trying to apply these methods to Ty's data IF the cell responds in the same frequency range as the time-varying stimuli given, or if we can give the stimuli in the frequency range at which the cell responds. Otherwise, most of the time-series analysis techniques that I know of are unlikely to be useful.
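A toy sketch of the "no single solution" worry raised above, under assumptions that are not from the discussion: a hypothetical first-order decay of one species, observed (as in a Western blot) only up to an unknown scale factor. The names `model`, `sse`, and all numbers are made up for illustration. The unknown initial concentration and the unknown scale enter only as a product, so very different parameter sets fit the time course equally well, while the rate coefficient itself is still pinned down.

```python
import math

def model(t, k, a0, scale):
    """Observed signal for first-order decay of A, measured up to a scale factor.

    k     : rate coefficient (hypothetical value below)
    a0    : initial concentration (hypothetical)
    scale : unknown detector/blot scale factor (hypothetical)
    """
    return scale * a0 * math.exp(-k * t)

def sse(params, times, data):
    """Sum of squared errors between the model and an observed time course."""
    k, a0, scale = params
    return sum((model(t, k, a0, scale) - y) ** 2 for t, y in zip(times, data))

# Noise-free synthetic "observations" generated from made-up true parameters.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
true_params = (0.5, 2.0, 3.0)          # (k, a0, scale)
data = [model(t, *true_params) for t in times]

fit_a = (0.5, 2.0, 3.0)   # the true parameters
fit_b = (0.5, 6.0, 1.0)   # different a0 and scale, same product a0 * scale
fit_c = (0.4, 2.0, 3.0)   # wrong rate coefficient

for name, p in [("fit_a", fit_a), ("fit_b", fit_b), ("fit_c", fit_c)]:
    print(name, p, "SSE =", sse(p, times, data))
```

Here `fit_a` and `fit_b` are indistinguishable by fitting error, so an optimizer could return either; only `fit_c`, with the wrong rate, is penalized. With more unknowns and real noise the same effect tends to show up more diffusely, as error spread across parameters.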
I had been intending, before suggesting that, to take a stab at pulling together some of the interesting threads that have been started here, in addition to whatever Ty has been able to do.
- First, we clearly have several distinct "easy mutual intelligibility groups" here. Larry, or Larry and Joyce, comprise one such group. Rich and Andrew, _I believe_, comprise another. Koichi may comprise a third. Ty may represent a fourth.
- Things that one group thinks are obvious may not be understood at all by the other groups.
- For example, neither Koichi nor Nathan (to pick on only two people) gave evidence of being aware of the nature of Ty's simulated data: randomly varying spike pulses. Nathan (to pick on Nathan) was unaware that one might get from a model that did not explicitly incorporate space. Some people have told me (to pick on me) that the issue of estimating rates of reactions from time-dependent output is no different from the general optimization problems associated with "parameter" estimation, while in my own mind it's not clear that the fact that we have starting concentrations of monomeric protein species, and "system architecture", might not somehow constrain or simplify the quest for rates. Some discussions of parameter estimation being well known and doable seem to hold only for linear systems, while I (to pick on me again) remain unclear on which approaches might hold for non-linear systems, which might apply only to linear systems, whether we might not be able to "linearize" subsets of our living systems to make them more tractable, etc.
- So at this point I still think we have probably the hardest cross-cultural problem we have ever addressed (at least, the one with the largest number of different "ethnic groups" to try to communicate across) and are far from where we want to be.
- One suggestion is to focus on a question or two, and one of those would have to be estimation, constraint, delimitation, whatever, of rates of intracellular biochemical reactions from output in response to time variant input. Not "parameters" but rates. Not "periodic time variant input" but rather "time variant input". If you want to add to rates "numbers of individual heteroligomeric molecular species" you can. But we can't be saying "system identification" or "system architecture" either. Each of the words in the Summer 2004 challenge problem matters.
- This discussion, as Drew pointed out, might be appropriate for testing wikis. I don't want to get bogged down in format, though. In the meantime I hope people will continue to communicate pre-meeting.
Comment on Roger's email by Koichi
Koichi's questions to Ty
By executive order, the meeting will be at 2:00pm (PST) on Monday February 6th.
Thanks to all those who tried something new with the wiki. Maybe next time.
Below are the proposed meeting times. Please indicate which times you would be able to attend.