User:Timothee Flutre/Notebook/Postdoc/2011/12/14
From OpenWetWare
(Difference between revisions)
(→Learn about mixture models and the EM algorithm: add ref tuto Tomasi) 
m (→Learn about mixture models and the EM algorithm) 

Line 23:  Line 23:  
<math>l(\theta) = \sum_{i=1}^N log(f(x_i/\theta)) = \sum_{i=1}^N log( \sum_{k=1}^{K} w_k \frac{1}{\sqrt{2\pi} \sigma_k} \exp^{\frac{1}{2}(\frac{x_i  \mu_k}{\sigma_k})^2})</math>  <math>l(\theta) = \sum_{i=1}^N log(f(x_i/\theta)) = \sum_{i=1}^N log( \sum_{k=1}^{K} w_k \frac{1}{\sqrt{2\pi} \sigma_k} \exp^{\frac{1}{2}(\frac{x_i  \mu_k}{\sigma_k})^2})</math>  
  * '''Latent variables''': here it's worth noting that, although everything seems fine, a big information is lacking, we aim at finding the parameters defining the mixture but we don't know from which cluster each observation is coming... That's why we need to introduce the following N latent variables <math>Z_1,...,Z_i,...,Z_N</math>, one for each observation, such that <math>Z_i=k</math> means that <math>x_i</math> belongs to cluster <math>k</math>. Thanks to this, we can reinterpret the mixing probabilities: <math>w_k = P(Z_i=k/\theta)</math>. Moreover, we can now define the membership probabilities, one for each observation: <math>P(Z_i=k/x_i,\theta) = \frac{w_k g(x_i/\mu_k,\sigma_k)}{\sum_{l=1}^K w_l g(x_i/\mu_l,\sigma_l)}</math>. We will note these membership probabilities <math>p(k/i)</math> as they will have a big role in the EM algorithm below. Indeed, we don't know the values taken by the latent variables, so we will have to infer their probabilities from the data.  +  * '''Latent variables''': here it's worth noting that, although everything seems fine, a big information is lacking, we aim at finding the parameters defining the mixture but we don't know from which cluster each observation is coming... That's why we need to introduce the following N latent variables <math>Z_1,...,Z_i,...,Z_N</math>, one for each observation, such that <math>Z_i=k</math> means that <math>x_i</math> belongs to cluster <math>k</math>. Thanks to this, we can reinterpret the mixing probabilities: <math>w_k = P(Z_i=k/\theta)</math>. Moreover, we can now define the membership probabilities, one for each observation: <math>P(Z_i=k/x_i,\theta) = \frac{w_k g(x_i/\mu_k,\sigma_k)}{\sum_{l=1}^K w_l g(x_i/\mu_l,\sigma_l)}</math>. We will note these membership probabilities <math>p(k/i)</math> as they will have a big role in the EM algorithm below. Indeed, we don't know the values taken by the latent variables, so we will have to infer their probabilities from the data. Introducing the latent variables corresponds to what is called the "missing data formulation" of the mixture problem. 
* '''Technical details''': a few important rules are required, but only from a highschool level in maths (see [http://en.wikipedia.org/wiki/Differentiation_%28mathematics%29#Rules_for_finding_the_derivative here]). Let's start by finding the maximumlikelihood estimates of the mean of each cluster:  * '''Technical details''': a few important rules are required, but only from a highschool level in maths (see [http://en.wikipedia.org/wiki/Differentiation_%28mathematics%29#Rules_for_finding_the_derivative here]). Let's start by finding the maximumlikelihood estimates of the mean of each cluster: 
Revision as of 17:42, 2 January 2012
Project name  Main project page Previous entry Next entry 
Learn about mixture models and the EM algorithm(Caution, this is my own quickanddirty tutorial, see the references at the end for presentations by professional statisticians.)
As we derive with respect to μ_{k}, all the others means μ_{l} with are constant, and thus disappear:
And finally:
Once we put all together, we end up with:
By convention, we note the maximumlikelihood estimate of μ_{k}:
Therefore, we finally obtain:
By doing the same kind of algebra, we derive the loglikelihood w.r.t. σ_{k}:
And then we obtain the ML estimates for the standard deviation of each cluster:
The partial derivative of l(θ) w.r.t. w_{k} is tricky. ... <TO DO> ...
Finally, here are the ML estimates for the mixture proportions:
#' Generate univariate observations from a mixture of Normals #' #' @param K number of components #' @param N number of observations GetUnivariateSimulatedData < function(K=2, N=100){ mus < seq(0, 6*(K1), 6) sigmas < runif(n=K, min=0.5, max=1.5) tmp < floor(rnorm(n=K1, mean=floor(N/K), sd=5)) ns < c(tmp, N  sum(tmp)) clusters < as.factor(matrix(unlist(lapply(1:K, function(k){rep(k, ns[k])})), ncol=1)) obs < matrix(unlist(lapply(1:K, function(k){ rnorm(n=ns[k], mean=mus[k], sd=sigmas[k]) }))) new.order < sample(1:N, N) obs < obs[new.order] rownames(obs) < NULL clusters < clusters[new.order] return(list(obs=obs, clusters=clusters, mus=mus, sigmas=sigmas, mix.probas=ns/N)) }
#' Return probas of latent variables given data and parameters from previous iteration #' #' @param data Nx1 vector of observations #' @param params list which components are mus, sigmas and mix.probas Estep < function(data, params){ GetMembershipProbas(data, params$mus, params$sigmas, params$mix.probas) } #' Return the membership probabilities P(zi=k/xi,theta) #' #' @param data Nx1 vector of observations #' @param mus Kx1 vector of means #' @param sigmas Kx1 vector of std deviations #' @param mix.probas Kx1 vector of mixing probas P(zi=k/theta) #' @return NxK matrix of membership probas GetMembershipProbas < function(data, mus, sigmas, mix.probas){ N < length(data) K < length(mus) tmp < matrix(unlist(lapply(1:N, function(i){ x < data[i] norm.const < sum(unlist(Map(function(mu, sigma, mix.proba){ mix.proba * GetUnivariateNormalDensity(x, mu, sigma)}, mus, sigmas, mix.probas))) unlist(Map(function(mu, sigma, mix.proba){ mix.proba * GetUnivariateNormalDensity(x, mu, sigma) / norm.const }, mus[K], sigmas[K], mix.probas[K])) })), ncol=K1, byrow=TRUE) membership.probas < cbind(tmp, apply(tmp, 1, function(x){1  sum(x)})) names(membership.probas) < NULL return(membership.probas) } #' Univariate Normal density GetUnivariateNormalDensity < function(x, mu, sigma){ return( 1/(sigma * sqrt(2*pi)) * exp(1/(2*sigma^2)*(xmu)^2) ) }
