Drummond:Coupling: Difference between revisions

Revision as of 19:38, 13 July 2008

Prediction of probability of protein folding

Assume that folding is a binary outcome represented by the random variable [math]\displaystyle{ F }[/math]. Given some predictor [math]\displaystyle{ X }[/math] (such as mean pair probability, max % identity, or # of missed pairs), we want to infer [math]\displaystyle{ \Pr(F|X) }[/math]. We assume that there is a sigmoidal relationship between [math]\displaystyle{ X }[/math] and the probability of folding,

[math]\displaystyle{ \Pr(F|X) = p = 1/(1 + e^{aX + b}) }[/math]

where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] quantify the steepness and position of the sigmoid. This formulation is equivalent to assuming a linear relationship between the predictor [math]\displaystyle{ X }[/math] and the log-odds of not folding,

[math]\displaystyle{ aX + b = \ln {1-p \over p} }[/math].
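
The equivalence can be checked by solving the sigmoid for [math]\displaystyle{ e^{aX + b} }[/math]:

[math]\displaystyle{ p = \frac{1}{1 + e^{aX + b}} \;\Rightarrow\; \frac{1}{p} = 1 + e^{aX + b} \;\Rightarrow\; e^{aX + b} = \frac{1-p}{p} }[/math]

Taking the logarithm of both sides gives the linear relation. Because [math]\displaystyle{ (1-p)/p }[/math] is the odds against folding, a negative [math]\displaystyle{ a }[/math] corresponds in this sign convention to a folding probability that increases with [math]\displaystyle{ X }[/math].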

We can write down the likelihood of the observed data, where [math]\displaystyle{ x_i }[/math] is the value of the predictor [math]\displaystyle{ X }[/math] for the [math]\displaystyle{ i }[/math]-th observed protein:

[math]\displaystyle{ L(F|\{x_i\})\! }[/math]	[math]\displaystyle{ = \prod_{i \in \textrm{folded}} \Pr(F|X=x_i) \prod_{j \in \textrm{unfolded}} (1 - \Pr(F|X=x_j)) }[/math]
[math]\displaystyle{ \ln L(F|\{x_i\})\! }[/math]	[math]\displaystyle{ = \sum_{i \in \textrm{folded}} \ln \Pr(F|X=x_i) + \sum_{j \in \textrm{unfolded}} \ln (1 - \Pr(F|X=x_j)) }[/math]
[math]\displaystyle{ \ln L(F|\{x_i\})\! }[/math]	[math]\displaystyle{ = -\sum_{i \in \textrm{all}} \ln (1 + e^{ax_i + b}) + \sum_{j \in \textrm{unfolded}} (a x_j + b) }[/math]
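
The last line follows from substituting the sigmoid into each logarithm:

[math]\displaystyle{ \ln \Pr(F|X=x) = -\ln (1 + e^{ax + b}) }[/math]

[math]\displaystyle{ \ln (1 - \Pr(F|X=x)) = \ln \frac{e^{ax + b}}{1 + e^{ax + b}} = ax + b - \ln (1 + e^{ax + b}) }[/math]

Every protein therefore contributes [math]\displaystyle{ -\ln (1 + e^{ax_i + b}) }[/math], and each unfolded protein contributes an additional [math]\displaystyle{ a x_j + b }[/math].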

The parameters can then be fit by maximizing the log-likelihood function. The whole process is termed logistic regression.
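
As an illustration, the fit can be sketched in plain Python. This is a minimal sketch, not necessarily the procedure used for the analysis here: the choice of Newton's method with step halving, the function names, and the toy data are all assumptions made for the example. The data values are invented and are not taken from Socolich et al. The sign convention matches the formulas above, [math]\displaystyle{ p = 1/(1 + e^{ax + b}) }[/math].

```python
import math


def log1pexp(z):
    """Numerically stable ln(1 + e^z)."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def prob_unfolded(z):
    """1 - p = e^z / (1 + e^z), computed as exp(z - ln(1 + e^z))."""
    return math.exp(z - log1pexp(z))


def log_likelihood(a, b, xs, folded):
    """ln L = -sum_all ln(1 + e^{a x_i + b}) + sum_unfolded (a x_j + b)."""
    total = 0.0
    for x, f in zip(xs, folded):
        z = a * x + b
        total -= log1pexp(z)
        if not f:
            total += z
    return total


def gradient(a, b, xs, folded):
    """Partial derivatives of ln L with respect to a and b."""
    ga = gb = 0.0
    for x, f in zip(xs, folded):
        residual = (0.0 if f else 1.0) - prob_unfolded(a * x + b)
        ga += residual * x
        gb += residual
    return ga, gb


def fit(xs, folded, iterations=50, tol=1e-9):
    """Maximize ln L over (a, b) by Newton's method with step halving.

    Assumes the x values are not all identical and that the folded and
    unfolded groups overlap; otherwise no finite maximum exists.
    """
    a = b = 0.0
    for _ in range(iterations):
        ga, gb = gradient(a, b, xs, folded)
        if math.hypot(ga, gb) < tol:
            break
        # Negative Hessian: sum over proteins of q(1-q) * [[x^2, x], [x, 1]].
        haa = hab = hbb = 0.0
        for x in xs:
            q = prob_unfolded(a * x + b)
            w = q * (1.0 - q)
            haa += w * x * x
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        # Newton direction: solve (negative Hessian) * step = gradient.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        # Halve the step until the log-likelihood does not decrease.
        current = log_likelihood(a, b, xs, folded)
        step = 1.0
        while step > 1e-8 and log_likelihood(a + step * da, b + step * db, xs, folded) < current:
            step /= 2.0
        a += step * da
        b += step * db
    return a, b


# Invented toy data: predictor values and folding outcomes (1 = folded).
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
folded = [0, 0, 1, 0, 1, 0, 1, 1]

a_hat, b_hat = fit(xs, folded)
print("a =", a_hat, "b =", b_hat)
print("ln L =", log_likelihood(a_hat, b_hat, xs, folded))
```

In the toy data folding becomes more common at larger predictor values, so the fitted [math]\displaystyle{ a }[/math] is negative under this convention. In practice a standard logistic-regression routine from a statistics package would give the same estimates up to the sign convention.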

Application to WW domains

Given only the Socolich et al. data, we can estimate the probability of folding given mean pair probability, max % identity, or # of missed pairs. Specifically, we can estimate the curve

[math]\displaystyle{ f(x) = \Pr(F|X=x) = 1/(1 + e^{ax + b})\! }[/math]
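
Once [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are known, the curve is simple to evaluate. The sketch below is illustrative only: the function names and the parameter values are hypothetical and are not fitted to any data. It also computes the predictor value at which the folding probability is one half, which is where [math]\displaystyle{ ax + b = 0 }[/math].

```python
import math


def folding_probability(x, a, b):
    """f(x) = Pr(F | X = x) = 1 / (1 + e^{a x + b})."""
    z = a * x + b
    # Use the algebraically equivalent form e^{-z} / (1 + e^{-z}) when z > 0
    # so that exp() never receives a large positive argument.
    if z > 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def midpoint(a, b):
    """Predictor value where f(x) = 1/2, i.e. where a x + b = 0."""
    return -b / a


# Hypothetical parameter values chosen only for illustration.
a, b = -10.0, 5.0

x_half = midpoint(a, b)
print("midpoint:", x_half)
for x in (0.0, x_half, 1.0):
    print(x, folding_probability(x, a, b))
```

With a negative [math]\displaystyle{ a }[/math], as in these made-up values, the curve rises from near 0 to near 1 as [math]\displaystyle{ x }[/math] increases through the midpoint.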

@@ Line 5: / Line 5: @@
 :<math>\Pr(F|X) = p = 1/(1 + e^{aX + b})</math>
-where <math>a</math> and <math>b</math> quantify the steepness and position of the step function.  If We can write down a likelihood of the observed data, where <math>x_i</math> is the value of the predictor <math>X</math> for an actual protein
+where <math>a</math> and <math>b</math> quantify the steepness and position of the step function.  This formulation is equivalent to assuming a linear relationship between the predictor <math>X</math> and the log-odds,
+:<math>aX + b = \ln {1-p \over p}</math>.
+We can write down a likelihood of the observed data, where <math>x_i</math> is the value of the predictor <math>X</math> for an actual protein
 :{|
@@ Line 22: / Line 26: @@
 ==Application to WW domains==
-Given only the Socolich et al. data, we can estimate the probability of folding given mean pair probability, max % identity, # of missed pairs.  First question: what's the best predictor?
+Given only the Socolich et al. data, we can estimate the probability of folding given mean pair probability, max % identity, # of missed pairs.  Specifically, we can estimate the curve
-:<math>
+:<math>f(x) = Pr(F|X=x)= 1/(1 + e^{aX + b})\!</math>
-Ax + b = \log \frac{1-p}{p}
-\!</math>
-can be fit by maximum likelihood.
