The canonical thing to do would be to add latent variables $C_i, i =
1, 2, ..., n$. Letting $X_1, X_2, ..., X_n$, $n = 1000$ be your data,
you would set $X_i | C_i = 1 \sim N(0, 1)$ and $X_i | C_i = 0 \sim
N(5, 2)$, where $C_i = 1$ with probability $\pi = 0.2$ and $C_i = 0$ with
probability $1 - \pi$.
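As a concrete illustration (not part of the original setup), here is a minimal Python sketch of drawing data from this augmented model. I'm assuming the second argument of $N(\cdot, \cdot)$ is a variance, so the second component has standard deviation $\sqrt{2}$; the seed and variable names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, just for reproducibility

n = 1000
pi_true = 0.2                    # P(C_i = 1)
mu1_true, mu2_true = 0.0, 5.0    # component means from the setup above
sd1, sd2 = 1.0, np.sqrt(2.0)     # assuming N(5, 2) specifies a variance of 2

# Latent labels C_i ~ Bernoulli(pi), then X_i drawn from the component C_i selects
C = rng.binomial(1, pi_true, size=n)
X = np.where(C == 1,
             rng.normal(mu1_true, sd1, size=n),
             rng.normal(mu2_true, sd2, size=n))
```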
The likelihood after augmenting becomes $$L = \prod_{i=1}^n [\pi N(x_i |
\mu_1, 1)]^{C_i} [(1 - \pi) N(x_i | \mu_2, 2)]^{1 - C_i},$$ and the
log-likelihood becomes $$\ell = \sum_{i = 1}^n C_i[\log(\pi) +
\log(N(x_i | \mu_1, 1))] + (1 - C_i) [\log(1 - \pi) + \log(N(x_i |
\mu_2, 2))].$$
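If the $C_i$ were actually observed, you could evaluate this complete-data log-likelihood directly. A minimal sketch of that computation, under the same variance assumption as above (the helper name `complete_loglik` is mine, purely illustrative):

```python
import numpy as np
from scipy.stats import norm

def complete_loglik(X, C, mu1, mu2, pi, sd1=1.0, sd2=np.sqrt(2.0)):
    """Illustrative helper: sum over i of
    C_i [log pi + log N(x_i | mu1)] + (1 - C_i) [log(1 - pi) + log N(x_i | mu2)]."""
    term1 = C * (np.log(pi) + norm.logpdf(X, loc=mu1, scale=sd1))
    term0 = (1 - C) * (np.log(1 - pi) + norm.logpdf(X, loc=mu2, scale=sd2))
    return float(np.sum(term1 + term0))
```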
We don't know the values of the $C_i$ so we can't maximize this directly; EM works by replacing this maximization problem with maximizing an expected log-likelihood. To apply EM, we first initialize $\mu_1, \mu_2$, and $\pi$ to
$\mu_1^{(0)}, \mu_2^{(0)}, \pi^{(0)}$. Next, we take the expectation of
$\ell$ with respect to the conditional distribution of $[C_1, ..., C_n
| X_1, ..., X_n]$ evaluated at the values of the parameters we just
initialized. With a little effort it can be shown that $E[C_i | X_1,
..., X_n]$ under
this distribution is $$\rho_i^{(0)} = \frac{\pi^{(0)} N(x_i |
\mu_1^{(0)}, 1)}{\pi^{(0)} N(x_i |\mu_1^{(0)}, 1) + (1 - \pi^{(0)}) N(x_i |
\mu_2^{(0)}, 2)}.$$ Plugging this into $E[\ell | X_1, X_2, ..., X_n]$
gives $$E[\ell | X_1, X_2, ..., X_n] = \sum_{i = 1} ^ n \rho_i^{(0)}[\log(\pi) +
\log(N(x_i | \mu_1, 1))] + (1 - \rho_i^{(0)}) [\log(1 - \pi) + \log(N(x_i |
\mu_2, 2))].$$
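In code, this step is just the $\rho_i^{(0)}$ calculation applied to every observation. A minimal sketch, again assuming the second component's variance is 2 (the function name `e_step` is my own label for it):

```python
import numpy as np
from scipy.stats import norm

def e_step(X, mu1, mu2, pi, sd1=1.0, sd2=np.sqrt(2.0)):
    """Illustrative helper: rho_i = E[C_i | x_i] = P(C_i = 1 | x_i)
    evaluated at the current parameter values."""
    p1 = pi * norm.pdf(X, loc=mu1, scale=sd1)          # pi * N(x_i | mu1, 1)
    p0 = (1.0 - pi) * norm.pdf(X, loc=mu2, scale=sd2)  # (1 - pi) * N(x_i | mu2, 2)
    return p1 / (p1 + p0)
```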
Now we take the gradient of this expression in $\mu_1, \mu_2, \pi$,
set it equal to $0$, and solve for $\mu_1, \mu_2, \pi$ in terms of
$x_i$ and $\rho_i^{(0)}$. The solutions to these equations furnish
$\mu_1^{(1)}$, $\mu_2^{(1)}$, and $\pi^{(1)}$, and we start back over
with these new values. Iterate until convergence.
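For example, writing $\sigma_1^2$ for the first component's (known) variance, which is $1$ here, the $\mu_1$ component of the gradient works out to $$\frac{\partial}{\partial \mu_1} E[\ell \mid X_1, \dots, X_n] = \sum_{i=1}^n \rho_i^{(0)} \, \frac{x_i - \mu_1}{\sigma_1^2} = 0 \quad \Longrightarrow \quad \mu_1 = \frac{\sum_{i=1}^n \rho_i^{(0)} x_i}{\sum_{i=1}^n \rho_i^{(0)}},$$ and the updates for $\mu_2$ and $\pi$ come from the same kind of calculation.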
In the language of the EM algorithm, the steps above can be grouped
into an E-step and an M-step. In the E-step of the $(k+1)$st iteration, we
calculate $\rho_i^{(k)}$ for $i = 1, 2, ..., n$ in terms of
$\mu_1^{(k)}$, $\mu_2^{(k)}$, and $\pi^{(k)}$. In the M-step we set
the derivative of the expected log-likelihood to $0$ and solve, which
turns out to give $$\pi^{(k+1)} = \frac{\sum_{i = 1}^n \rho_i^{(k)}}{n},$$
$$\mu_1 ^{(k+1)}
= \frac{\sum_{i = 1}^n \rho_i^{(k)} x_i}{\sum_{i = 1} ^ n
\rho_i^{(k)}},$$ $$\mu_2 ^{(k+1)} = \frac{\sum_{i = 1}^n (1 -
\rho_i^{(k)})x_i}{\sum_{i = 1} ^ n (1 - \rho_i^{(k)})}.$$ Intuitively,
what is going on is that $\rho_i$ is the conditional probability that
$C_i = 1$ given the data, evaluated at the current parameter values; occasionally you'll hear $\rho_i$ referred to as the responsibility the first mixture component takes for observation $i$. So we average these together to get the estimate of
$\pi$, take a weighted average of the $X_i$ according to their
probability of having $C_i = 1$ to get $\mu_1$, and a similar story with weights $1 - \rho_i$ gives $\mu_2$.
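Putting the pieces together, here's a minimal sketch of the whole iteration in Python, again assuming the component variances are 1 and 2; the starting values, tolerance, and the name `em_two_gaussians` are arbitrary choices for illustration, not a canonical implementation.

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(X, mu1=1.0, mu2=4.0, pi=0.5,
                     sd1=1.0, sd2=np.sqrt(2.0), tol=1e-8, max_iter=1000):
    """Illustrative EM for the two-component mixture with known variances.
    Returns the final (mu1, mu2, pi)."""
    for _ in range(max_iter):
        # E-step: rho_i = P(C_i = 1 | x_i) at the current parameter values
        p1 = pi * norm.pdf(X, loc=mu1, scale=sd1)
        p0 = (1.0 - pi) * norm.pdf(X, loc=mu2, scale=sd2)
        rho = p1 / (p1 + p0)

        # M-step: the closed-form updates from above
        pi_new = np.mean(rho)
        mu1_new = np.sum(rho * X) / np.sum(rho)
        mu2_new = np.sum((1 - rho) * X) / np.sum(1 - rho)

        converged = max(abs(mu1_new - mu1), abs(mu2_new - mu2),
                        abs(pi_new - pi)) < tol
        mu1, mu2, pi = mu1_new, mu2_new, pi_new
        if converged:
            break
    return mu1, mu2, pi

# With X simulated as in the first snippet, em_two_gaussians(X) should land
# near the values used to generate the data, roughly (0, 5, 0.2).
```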