Can anyone provide a clear list of differences between log-linear regression and logistic regression? I understand the former is a simple linear regression model but I am not clear on when each should be used.
The name is a bit of a misnomer. Log-linear models were traditionally used to analyze data in a contingency-table format. While "count data" need not follow a Poisson distribution, the log-linear model is really just a Poisson regression model, hence the "log" in the name (Poisson regression models use a log link function). A linear regression with a log-transformed outcome variable is not a log-linear model (nor is a model with an exponentiated outcome variable, as "log-linear" might suggest). Both log-linear models and logistic regressions are examples of generalized linear models, in which a transformation of the mean (such as the log-odds or the log-rate) is modeled as a linear predictor, i.e. it is linear in the model variables. They are not "simple linear regression models" (that is, models of the usual form $E[Y|X] = a + bX$).

Despite all that, it's possible to obtain equivalent inference on associations between categorical variables using logistic regression and Poisson regression. It's just that in the Poisson model, the outcome variables are treated like covariates. Interestingly, you can set up some models that borrow information across groups in a way much like a proportional odds model, but this is not well understood and rarely used. Examples of obtaining equivalent inference from logistic and Poisson regression models using R are illustrated below:
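A minimal sketch of the same idea in pure Python (not the original R code), assuming a hypothetical 2x2 table of counts; the numbers and the helper names `solve` and `irls` are made up for illustration. It fits a logistic regression of $y$ on $x$ and a saturated Poisson log-linear model with $x$, $y$ and their interaction as covariates, both by iteratively reweighted least squares (IRLS), and compares the logistic slope with the Poisson interaction coefficient.

```python
import math


def solve(a, b):
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def irls(X, y, family, trials=None, max_iter=100, tol=1e-10):
    """Fit a canonical-link GLM by IRLS (hypothetical helper).

    family="binomial": y holds success counts, trials the group sizes (logit link).
    family="poisson":  y holds cell counts (log link).
    """
    n_obs, p = len(X), len(X[0])
    if family == "binomial":
        mu = [(s + 0.5) / (t + 1.0) for s, t in zip(y, trials)]  # starting proportions
        eta = [math.log(m / (1.0 - m)) for m in mu]
    else:
        mu = [c + 0.5 for c in y]  # small offset avoids log(0) for empty cells
        eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working weights w and working response z for the canonical link.
        if family == "binomial":
            w = [t * m * (1.0 - m) for t, m in zip(trials, mu)]
            z = [e + (s / t - m) / (m * (1.0 - m)) for e, s, t, m in zip(eta, y, trials, mu)]
        else:
            w = mu
            z = [e + (c - m) / m for e, c, m in zip(eta, y, mu)]
        # Weighted least squares step: solve (X'WX) beta = X'Wz.
        xtwx = [[sum(X[i][j] * w[i] * X[i][k] for i in range(n_obs)) for k in range(p)] for j in range(p)]
        xtwz = [sum(X[i][j] * w[i] * z[i] for i in range(n_obs)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        eta = [sum(xj * bj for xj, bj in zip(row, new_beta)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta] if family == "binomial" else [math.exp(e) for e in eta]
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Hypothetical 2x2 table of counts, keyed by (x, y).
table = {(0, 0): 30, (0, 1): 10, (1, 0): 20, (1, 1): 40}

# Logistic regression: one binomial row per level of x, with y = 1 as "success".
X_logit = [[1, 0], [1, 1]]
successes = [table[(0, 1)], table[(1, 1)]]
trials = [table[(0, 0)] + table[(0, 1)], table[(1, 0)] + table[(1, 1)]]
b_logit = irls(X_logit, successes, "binomial", trials=trials)

# Log-linear (Poisson) model: one row per cell; x and y are both covariates.
cells = sorted(table)
X_pois = [[1, x, y, x * y] for x, y in cells]
counts = [table[c] for c in cells]
b_pois = irls(X_pois, counts, "poisson")

log_or = math.log(table[(1, 1)] * table[(0, 0)] / (table[(1, 0)] * table[(0, 1)]))
print(b_logit[1], b_pois[3], log_or)  # the three values agree up to rounding
```

Both coefficients estimate the log odds ratio, and here they reproduce the sample value $\log(40 \cdot 30 / (20 \cdot 10)) = \log 6$ because both models are saturated. IRLS is used because, for canonical links, it coincides with Newton-Raphson; in practice you would call a GLM routine such as R's `glm()` instead.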
Interestingly, a lack of association between $y$ and $x$ means that the odds ratio is 1 in the logistic regression model and, likewise, that the interaction term is 0 in the log-linear model. This gives you an idea of how conditional independence is measured in contingency-table data.
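To see why, consider a 2x2 table with expected counts $\mu_{ij}$ for $x = i$, $y = j$, and write the saturated log-linear model with corner-point constraints ($\lambda^X_0 = \lambda^Y_0 = 0$ and $\lambda^{XY}_{ij} = 0$ unless $i = j = 1$; this parameterization is a choice made here for illustration):

$$
\begin{aligned}
\log \mu_{ij} &= \lambda + \lambda^X_i + \lambda^Y_j + \lambda^{XY}_{ij} \\
\log\frac{\mu_{11}\,\mu_{00}}{\mu_{10}\,\mu_{01}} &= (\lambda + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \lambda - (\lambda + \lambda^X_1) - (\lambda + \lambda^Y_1) = \lambda^{XY}_{11} \\
\operatorname{logit} P(y = 1 \mid x = i) &= \log\frac{\mu_{i1}}{\mu_{i0}} = \lambda^Y_1 + \lambda^{XY}_{i1}
\end{aligned}
$$

So the logistic regression has intercept $\lambda^Y_1$ and slope $\lambda^{XY}_{11}$, which is the log odds ratio; $\lambda^{XY}_{11} = 0$ is the same statement as an odds ratio of 1.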
I don't think I would call either of them a "simple linear regression model". Although the log or logit transformation can be used as the link function for a number of different models, these terms are typically understood to refer to specific models. For example, "logistic regression" is understood to be a generalized linear model (GLiM) for situations where the response variable is binomially distributed. "Log-linear regression", in turn, is usually understood to be a Poisson GLiM applied to multi-way contingency tables. In other words, beyond the fact that both are regression models / GLiMs, I don't see them as particularly similar (there are some connections between them, as @AdamO points out, but their typical uses are fairly distinct). The biggest difference is that logistic regression assumes a binomially distributed response, whereas log-linear regression assumes a Poisson-distributed response. In fact, log-linear regression differs from most regression models in that the response variable isn't really one of your variables at all (in the usual sense), but rather the set of frequency counts associated with the combinations of your variables' levels in the multi-way contingency table.
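A minimal sketch of that last point, assuming three hypothetical binary variables and made-up records: the individual-level data are collapsed into the cells of a 2x2x2 table, the cell counts become the response vector, and all three variables move to the right-hand side as covariates (here with a design for the mutual-independence model, i.e. intercept plus main effects only).

```python
from collections import Counter
from itertools import product

# Hypothetical individual-level records: (smoker, exercise, disease), each coded 0/1.
records = [
    (1, 0, 1), (1, 0, 1), (1, 1, 0), (0, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 1, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0),
]

# Frequency count for each observed combination of levels.
counts = Counter(records)

# One row per cell of the 2x2x2 table, including empty cells (Counter returns 0).
cells = list(product([0, 1], repeat=3))
response = [counts[cell] for cell in cells]

# Mutual-independence log-linear design: intercept plus one main effect per variable.
# None of the three variables is "the outcome"; the counts are.
design = [[1, a, b, c] for a, b, c in cells]

for cell, row, n in zip(cells, design, response):
    print(cell, row, n)
```

The `response` and `design` lists are what a Poisson GLM routine (for example R's `glm(..., family = poisson)`) would be fitted to; adding columns such as `a * b` gives models with interaction terms, which is where associations between the variables are measured. Empty cells, like `(0, 1, 1)` here, still enter with a count of 0.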