
I am given an example:

Suppose 1000 observations are drawn from a mixture of $N(0,1)$ and $N(5,2)$ with mixing parameters $\pi_{1}=0.2$ and $\pi_{2}=0.8$, respectively. Suppose we know only $\sigma$ and want to estimate $\mu$ and $\pi$. How does one go about using Gaussian mixture models to estimate these parameters? I know I have to use the EM algorithm, but I do not know where to start. I want to use this simple example to get a better understanding of how it works.

What are $\mu$ and $\sigma$? – Did Apr 30 at 21:49
They are the population mean and standard deviation of the normal distributions from which the points were generated. Once the points are generated, the idea is to estimate $\mu$ and $\pi$ given that you know $\sigma$. – phil12 Apr 30 at 22:00
But the distributions $N(0,1)$ and $N(5,2)$ and the mixing distribution $\pi$ fully determine these, don't they? So, in the end, what exactly are you supposed to know? – Did Apr 30 at 22:03
The point is to generate the points using those parameters, and then estimate them from the data alone to see how well they match the true parameters. For example, using MATLAB or Python, you can generate these data, and the idea is to write an algorithm that estimates these parameters given that you only know $\sigma$. – phil12 Apr 30 at 22:04
OK. But somehow you know $\sigma$, right? Which seems strange... – Did Apr 30 at 22:06

1 Answer


The canonical thing to do would be to add latent variables $C_i$, $i = 1, 2, ..., n$. Letting $X_1, X_2, ..., X_n$, with $n = 1000$, be your data, you would set $X_i | C_i = 1 \sim N(0, 1)$ and $X_i | C_i = 0 \sim N(5, 2)$, with $C_i = 1$ with probability $\pi = 0.2$ and $C_i = 0$ with probability $1 - \pi$.
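
For concreteness, here is a minimal Python/NumPy sketch of simulating data from this augmented model. The variable names are my own, and I am reading the second parameter of $N(5,2)$ as the variance, so the standard deviation is $\sqrt{2}$:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    pi_true = 0.2  # P(C_i = 1), the weight on the N(0, 1) component

    # Draw the latent labels first, then each observation from its component.
    C = rng.binomial(1, pi_true, size=n)
    x = np.where(C == 1,
                 rng.normal(0.0, 1.0, size=n),           # N(0, 1)
                 rng.normal(5.0, np.sqrt(2.0), size=n))  # N(5, 2), 2 read as variance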

The likelihood after augmenting becomes $$L = \prod_{i=1}^n [\pi N(x_i | \mu_1, 1)]^{C_i} [(1 - \pi) N(x_i | \mu_2, 2)]^{1 - C_i},$$ and the log-likelihood becomes $$\ell = \sum_{i = 1}^n C_i[\log(\pi) + \log(N(x_i | \mu_1, 1))] + (1 - C_i) [\log(1 - \pi) + \log(N(x_i | \mu_2, 2))].$$
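
This complete-data log-likelihood translates directly into code. A sketch, assuming SciPy is available (the function name is illustrative):

    import numpy as np
    from scipy.stats import norm

    def complete_log_likelihood(x, C, mu1, mu2, pi):
        # Sum over i of C_i [log(pi) + log N(x_i | mu1, 1)]
        #             + (1 - C_i) [log(1 - pi) + log N(x_i | mu2, 2)]
        term1 = C * (np.log(pi) + norm.logpdf(x, loc=mu1, scale=1.0))
        term2 = (1 - C) * (np.log(1 - pi) + norm.logpdf(x, loc=mu2, scale=np.sqrt(2.0)))
        return np.sum(term1 + term2)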

We don't know the values of the $C_i$, so we can't maximize this directly; EM works by replacing this maximization problem with maximizing an expected log-likelihood. To apply EM, we first initialize $\mu_1, \mu_2$, and $\pi$ to $\mu_1^{(0)}, \mu_2^{(0)}, \pi^{(0)}$. Next, we take the expectation of $\ell$ with respect to the conditional distribution of $[C_1, ..., C_n | X_1, ..., X_n]$ evaluated at the values of the parameters we just initialized. With a little effort it can be shown that $E[C_i | X_1, ..., X_n]$ under this distribution is $$\rho_i^{(0)} = \frac{\pi^{(0)} N(x_i | \mu_1^{(0)}, 1)}{\pi^{(0)} N(x_i |\mu_1^{(0)}, 1) + (1 - \pi^{(0)}) N(x_i | \mu_2^{(0)}, 2)}.$$ Plugging this into $E[\ell | X_1, X_2, ..., X_n]$ gives $$E[\ell | X_1, X_2, ..., X_n] = \sum_{i = 1}^n \rho_i^{(0)}[\log(\pi) + \log(N(x_i | \mu_1, 1))] + (1 - \rho_i^{(0)}) [\log(1 - \pi) + \log(N(x_i | \mu_2, 2))].$$ Now we take the gradient of this expression with respect to $\mu_1, \mu_2, \pi$, set it equal to $0$, and solve for $\mu_1, \mu_2, \pi$ in terms of the $x_i$ and $\rho_i^{(0)}$. The solutions furnish $\mu_1^{(1)}$, $\mu_2^{(1)}$, and $\pi^{(1)}$, and we start back over with these new values. Iterate until convergence.
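
In code, the E-step quantity $\rho_i$ is just a pointwise ratio of weighted densities. A sketch (again with an illustrative function name):

    import numpy as np
    from scipy.stats import norm

    def responsibilities(x, mu1, mu2, pi):
        # rho_i = E[C_i | X], evaluated at the current parameter values
        w1 = pi * norm.pdf(x, loc=mu1, scale=1.0)
        w2 = (1.0 - pi) * norm.pdf(x, loc=mu2, scale=np.sqrt(2.0))
        return w1 / (w1 + w2)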

In the language of the EM algorithm, the preceding steps break into an E-step and an M-step. In the E-step of the $(k+1)$st iteration, we calculate $\rho_i^{(k)}$ for $i = 1, 2, ..., n$ in terms of $\mu_1^{(k)}$, $\mu_2^{(k)}$, and $\pi^{(k)}$. In the M-step we set the derivative of the expected log-likelihood to $0$ and solve, which turns out to give $$\pi^{(k+1)} = \frac{\sum_{i=1}^n \rho_i^{(k)}}{n},$$ $$\mu_1^{(k+1)} = \frac{\sum_{i = 1}^n \rho_i^{(k)} x_i}{\sum_{i = 1}^n \rho_i^{(k)}},$$ $$\mu_2^{(k+1)} = \frac{\sum_{i = 1}^n (1 - \rho_i^{(k)})x_i}{\sum_{i = 1}^n (1 - \rho_i^{(k)})}.$$ Intuitively, $\rho_i$ is the conditional probability that $C_i = 1$ given the data; occasionally you'll hear $\rho_i$ referred to as the responsibility of a mixture component for observation $i$. So we average the responsibilities to get the estimate of $\pi$, take a weighted average of the $X_i$ according to their probability of having $C_i = 1$ to get $\mu_1$, and tell a similar story for $\mu_2$.
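
Putting the two steps together, a bare-bones EM loop might look like the following; the starting values, iteration cap, and stopping rule are illustrative choices, not part of the algorithm itself:

    import numpy as np
    from scipy.stats import norm

    def em_two_gaussians(x, mu1=1.0, mu2=4.0, pi=0.5, max_iter=200, tol=1e-8):
        for _ in range(max_iter):
            # E-step: responsibilities rho_i at the current parameter values
            w1 = pi * norm.pdf(x, loc=mu1, scale=1.0)
            w2 = (1.0 - pi) * norm.pdf(x, loc=mu2, scale=np.sqrt(2.0))
            rho = w1 / (w1 + w2)
            # M-step: the closed-form updates derived above
            pi_new = rho.mean()
            mu1_new = np.sum(rho * x) / np.sum(rho)
            mu2_new = np.sum((1.0 - rho) * x) / np.sum(1.0 - rho)
            done = max(abs(pi_new - pi), abs(mu1_new - mu1), abs(mu2_new - mu2)) < tol
            pi, mu1, mu2 = pi_new, mu1_new, mu2_new
            if done:
                break
        return mu1, mu2, pi

On data simulated as in the first sketch, the estimates should land near $\mu_1 \approx 0$, $\mu_2 \approx 5$, and $\pi \approx 0.2$, up to the two components swapping labels if the starting values are chosen badly.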
