User:Timothee Flutre/Notebook/Postdoc/2011/12/14
From OpenWetWare
Main project page Previous entry Next entry
| |
Learn about mixture models and the EM algorithm(Caution, this is my own quick-and-dirty tutorial, see the references at the end for presentations by professional statisticians.)
The constraints are:
As usual, it's easier to deal with the log-likelihood:
Let's take the derivative with respect to one parameter, eg. θl:
This shows that maximizing the likelihood of a mixture model is like doing a weighted likelihood maximization. However, these weights depend on the parameters we want to estimate! That's why we now switch to the missing-data formulation of the mixture model.
The observed-data likelihood (also called sometimes "incomplete" or "marginal", even though these appellations are misnomers) is still written the same way:
But now we can also write the augmented-data likelihood, assuming all observations are independent conditionally on their membership:
And here is the augmented-data log-likelihood (useful in the M step of the EM algorithm, see below):
In terms of graphical model, the Gaussian mixture model described here can be represented like this.
Here is the observed-data log-likelihood:
First we introduce the hidden variables by integrating them out:
Then, we use any probability distribution q on these hidden variables (in fact, we use a distinct distribution
And here is the great trick, as explained by Beal: "any probability distribution over the hidden variables gives rise to a lower bound on lobs". This is due to to the Jensen inequality (the logarithm is concave):
At each iteration, the E step maximizes the lower bound (
The E-step amounts to inferring the posterior distribution of the hidden variables
Indeed, the
Then, at the M step, we use these statistics to maximize the new lower bound
As a result, the E step may not always lead to a tight bound.
As usual, to find the maximum, we derive and equal to zero:
Now, to find the multiplier, we go back to the constraint:
Finally:
Finally:
Finally:
#' Generate univariate observations from a mixture of Normals
#'
#' @param K number of components
#' @param N number of observations
#' @param gap difference between all component means
GetUnivariateSimulatedData <- function(K=2, N=100, gap=6){
mus <- seq(0, gap*(K-1), gap)
sigmas <- runif(n=K, min=0.5, max=1.5)
tmp <- floor(rnorm(n=K-1, mean=floor(N/K), sd=5))
ns <- c(tmp, N - sum(tmp))
clusters <- as.factor(matrix(unlist(lapply(1:K, function(k){rep(k, ns[k])})),
ncol=1))
obs <- matrix(unlist(lapply(1:K, function(k){
rnorm(n=ns[k], mean=mus[k], sd=sigmas[k])
})))
new.order <- sample(1:N, N)
obs <- obs[new.order]
rownames(obs) <- NULL
clusters <- clusters[new.order]
return(list(obs=obs, clusters=clusters, mus=mus, sigmas=sigmas,
mix.weights=ns/N))
}
#' Return probas of latent variables given data and parameters from previous iteration
#'
#' @param data Nx1 vector of observations
#' @param params list which components are mus, sigmas and mix.weights
Estep <- function(data, params){
GetMembershipProbas(data, params$mus, params$sigmas, params$mix.weights)
}
#' Return the membership probabilities P(zi=k/xi,theta)
#'
#' @param data Nx1 vector of observations
#' @param mus Kx1 vector of means
#' @param sigmas Kx1 vector of std deviations
#' @param mix.weights Kx1 vector of mixture weights w_k=P(zi=k/theta)
#' @return NxK matrix of membership probas
GetMembershipProbas <- function(data, mus, sigmas, mix.weights){
N <- length(data)
K <- length(mus)
tmp <- matrix(unlist(lapply(1:N, function(i){
x <- data[i]
norm.const <- sum(unlist(Map(function(mu, sigma, mix.weight){
mix.weight * GetUnivariateNormalDensity(x, mu, sigma)}, mus, sigmas, mix.weights)))
unlist(Map(function(mu, sigma, mix.weight){
mix.weight * GetUnivariateNormalDensity(x, mu, sigma) / norm.const
}, mus[-K], sigmas[-K], mix.weights[-K]))
})), ncol=K-1, byrow=TRUE)
membership.probas <- cbind(tmp, apply(tmp, 1, function(x){1 - sum(x)}))
names(membership.probas) <- NULL
return(membership.probas)
}
#' Univariate Normal density
GetUnivariateNormalDensity <- function(x, mu, sigma){
return( 1/(sigma * sqrt(2*pi)) * exp(-1/(2*sigma^2)*(x-mu)^2) )
}
#' Return ML estimates of parameters
#'
#' @param data Nx1 vector of observations
#' @param params list which components are mus, sigmas and mix.weights
#' @param membership.probas NxK matrix with entry i,k being P(zi=k/xi,theta)
Mstep <- function(data, params, membership.probas){
params.new <- list()
sum.membership.probas <- apply(membership.probas, 2, sum)
params.new$mus <- GetMlEstimMeans(data, membership.probas,
sum.membership.probas)
params.new$sigmas <- GetMlEstimStdDevs(data, params.new$mus,
membership.probas,
sum.membership.probas)
params.new$mix.weights <- GetMlEstimMixWeights(data, membership.probas,
sum.membership.probas)
return(params.new)
}
#' Return ML estimates of the means (1 per cluster)
#'
#' @param data Nx1 vector of observations
#' @param membership.probas NxK matrix with entry i,k being P(zi=k/xi,theta)
#' @param sum.membership.probas Kx1 vector of sum per column of matrix above
#' @return Kx1 vector of means
GetMlEstimMeans <- function(data, membership.probas, sum.membership.probas){
K <- ncol(membership.probas)
sapply(1:K, function(k){
sum(unlist(Map("*", membership.probas[,k], data))) /
sum.membership.probas[k]
})
}
#' Return ML estimates of the std deviations (1 per cluster)
#'
#' @param data Nx1 vector of observations
#' @param membership.probas NxK matrix with entry i,k being P(zi=k/xi,theta)
#' @param sum.membership.probas Kx1 vector of sum per column of matrix above
#' @return Kx1 vector of std deviations
GetMlEstimStdDevs <- function(data, means, membership.probas,
sum.membership.probas){
K <- ncol(membership.probas)
sapply(1:K, function(k){
sqrt(sum(unlist(Map(function(p.ki, x.i){
p.ki * (x.i - means[k])^2
}, membership.probas[,k], data))) /
sum.membership.probas[k])
})
}
#' Return ML estimates of the mixture weights
#'
#' @param data Nx1 vector of observations
#' @param membership.probas NxK matrix with entry i,k being P(zi=k/xi,theta)
#' @param sum.membership.probas Kx1 vector of sum per column of matrix above
#' @return Kx1 vector of mixture weights
GetMlEstimMixWeights <- function(data, membership.probas,
sum.membership.probas){
K <- ncol(membership.probas)
sapply(1:K, function(k){
1/length(data) * sum.membership.probas[k]
})
}
GetLogLikelihood <- function(data, mus, sigmas, mix.weights){
loglik <- sum(sapply(data, function(x){
log(sum(unlist(Map(function(mu, sigma, mix.weight){
mix.weight * GetUnivariateNormalDensity(x, mu, sigma)
}, mus, sigmas, mix.weights))))
}))
return(loglik)
}
EMalgo <- function(data, params, threshold.convergence=10^(-2), nb.iter=10,
verbose=1){
logliks <- vector()
i <- 1
if(verbose > 0) cat(paste("iter ", i, "\n", sep=""))
membership.probas <- Estep(data, params)
params <- Mstep(data, params, membership.probas)
loglik <- GetLogLikelihood(data, params$mus, params$sigmas,
params$mix.weights)
logliks <- append(logliks, loglik)
while(i < nb.iter){
i <- i + 1
if(verbose > 0) cat(paste("iter ", i, "\n", sep=""))
membership.probas <- Estep(data, params)
params <- Mstep(data, params, membership.probas)
loglik <- GetLogLikelihood(data, params$mus, params$sigmas, params$mix.weights)
if(loglik < logliks[length(logliks)]){
msg <- paste("the log-likelihood is decreasing:", loglik, "<", logliks[length(logliks)])
stop(msg, call.=FALSE)
}
logliks <- append(logliks, loglik)
if(abs(logliks[i] - logliks[i-1]) <= threshold.convergence)
break
}
return(list(params=params, membership.probas=membership.probas, logliks=logliks, nb.iters=i))
}
## simulate data
K <- 3
N <- 300
simul <- GetUnivariateSimulatedData(K, N)
data <- simul$obs
## run the EM algorithm
params0 <- list(mus=runif(n=K, min=min(data), max=max(data)),
sigmas=rep(1, K),
mix.weights=rep(1/K, K))
res <- EMalgo(data, params0, 10^(-3), 1000, 1)
## check its convergence
plot(res$logliks, xlab="iterations", ylab="log-likelihood",
main="Convergence of the EM algorithm", type="b")
## plot the data along with the inferred densities
png("mixture_univar_em.png")
hist(data, breaks=30, freq=FALSE, col="grey", border="white", ylim=c(0,0.15),
main="Histogram of data overlaid with densities inferred by EM")
rx <- seq(from=min(data), to=max(data), by=0.1)
ds <- lapply(1:K, function(k){dnorm(x=rx, mean=res$params$mus[k], sd=res$params$sigmas[k])})
f <- sapply(1:length(rx), function(i){
res$params$mix.weights[1] * ds[[1]][i] + res$params$mix.weights[2] * ds[[2]][i] + res$params$mix.weights[3] * ds[[3]][i]
})
lines(rx, f, col="red", lwd=2)
dev.off()
It seems to work well, which was expected as the clusters are well separated from each other... The classification of each observation can be obtained via the following command:
## get the classification of the observations
memberships <- apply(res$membership.probas, 1, function(x){which(x > 0.5)})
table(memberships)
| |

and
. Moreover, we can now define the membership probabilities, one for each observation:
.
for each observation):
) with respect to the
:
given the current parameter
make the bound tight (the inequality becomes an equality):


