# User:Timothee Flutre/Notebook/Postdoc/2011/12/14

• Data: we have $\displaystyle N$ observations, denoted $\displaystyle X = (x_1, x_2, \ldots, x_N)$ . For the moment, we suppose that each observation $\displaystyle x_i$ is univariate, i.e. each corresponds to a single number.
• Hypotheses and aim: let's assume that the data are heterogeneous and that they can be partitioned into $\displaystyle K$ clusters (see examples above). This means that one subset of the observations comes from cluster $\displaystyle k=1$ , another subset comes from cluster $\displaystyle k=2$ , and so on.
• Model: technically, we say that the observations were generated by a family of density functions. The density of all the observations is thus a mixture of densities, one per cluster. In our case, we will assume that each cluster $\displaystyle k$ corresponds to a Normal distribution of mean $\displaystyle \mu_k$ and standard deviation $\displaystyle \sigma_k$ . Moreover, as we don't know for sure which cluster a given observation comes from, we define the mixture probability $\displaystyle w_k$ to be the probability that any given observation comes from cluster $\displaystyle k$ . Being probabilities over the $\displaystyle K$ clusters, these weights satisfy $\displaystyle w_k \ge 0$ and $\displaystyle \sum_{k=1}^{K} w_k = 1$ . As a result, we have the following list of parameters: $\displaystyle \theta=(w_1,\ldots,w_K,\mu_1,\ldots,\mu_K,\sigma_1,\ldots,\sigma_K)$ . Finally, for a given observation $\displaystyle x_i$ , we can write the model $\displaystyle f(x_i/\theta) = \sum_{k=1}^{K} w_k g(x_i/\mu_k,\sigma_k)$ , with $\displaystyle g$ being the Normal density $\displaystyle g(x_i/\mu_k,\sigma_k) = \frac{1}{\sqrt{2\pi} \sigma_k} \exp\left(-\frac{1}{2}\left(\frac{x_i - \mu_k}{\sigma_k}\right)^2\right)$ .
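
To make the model concrete, the mixture density above can be evaluated numerically. The following is only a minimal sketch in Python, not code from the notebook itself: the function names `normal_pdf` and `mixture_pdf` and the parameter values for $\displaystyle K=2$ are hypothetical, chosen purely for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density g(x / mu, sigma) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_pdf(x, weights, mus, sigmas):
    """Mixture density f(x / theta) = sum_k w_k * g(x / mu_k, sigma_k)."""
    # The mixture probabilities must sum to 1 for f to be a proper density.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


# Hypothetical parameters for K = 2 clusters (illustrative values only).
weights = [0.3, 0.7]   # w_1, w_2
mus = [0.0, 5.0]       # mu_1, mu_2
sigmas = [1.0, 2.0]    # sigma_1, sigma_2

# Density of a single observation x_i = 1.0 under the mixture.
density_at_1 = mixture_pdf(1.0, weights, mus, sigmas)
```

With a single cluster ($\displaystyle K=1$ , $\displaystyle w_1=1$ ), `mixture_pdf` reduces to `normal_pdf`, which is a quick sanity check on the formula.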