Log-linear regression vs. logistic regression

Question

Can anyone provide a clear list of differences between log-linear regression and logistic regression? I understand the former is a simple linear regression model but I am not clear on when each should be used.

AdamO · Accepted Answer · 2014-02-16 01:39:38Z

The name is a bit of a misnomer. Log-linear models were traditionally used for the analysis of data in a contingency table format. While "count data" need not necessarily follow a Poisson distribution, the log-linear model is actually just a Poisson regression model. Hence the "log" name (Poisson regression models contain a "log" link function).

A "log transformed outcome variable" in a linear regression model is not a log-linear model (neither is an exponentiated outcome variable, as "log-linear" would suggest). Both log-linear models and logistic regressions are examples of generalized linear models, in which a transformation of the mean response (such as log-odds or log-rates) is linear in the model variables. They are not "simple linear regression models" (or models using the usual $E[Y|X] = a + bX$ format).
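To make the contrast concrete, here is a sketch in the same $a + bX$ notation. The symbols $\pi$ and $\mu$ are introduced here for illustration and are not in the original post. Ordinary linear regression models the mean directly, whereas the two GLMs put a link function on the mean:

$$
\begin{aligned}
\text{linear regression:} \quad & E[Y|X] = a + bX \\
\text{logistic regression:} \quad & \log\frac{\pi(X)}{1-\pi(X)} = a + bX, \quad \pi(X) = P(Y=1|X) \\
\text{log-linear (Poisson):} \quad & \log \mu(X) = a + bX, \quad \mu(X) = E[Y|X]
\end{aligned}
$$

By contrast, a linear regression with a log-transformed outcome models $E[\log Y|X] = a + bX$, which is not the same thing as $\log E[Y|X]$.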

Despite all that, it's possible to obtain equivalent inference on associations between categorical variables using logistic regression and Poisson regression. The difference is that in the Poisson model, the outcome variables are treated like covariates and the cell counts become the response. Interestingly, you can set up some models that borrow information across groups in a way much like a proportional odds model, but this is not well understood and rarely used.

An example of obtaining equivalent inference in logistic and Poisson regression models using R is illustrated below:

y <- c(0, 1, 0, 1)
x <- c(0, 0, 1, 1)
w <- c(10, 20, 30, 40)

## log odds ratio for the relationship between x and y (coefficient on x) from logistic regression
glm(y ~ x, family=binomial, weights=w)

## the same log odds ratio is the y:x interaction parameter in the Poisson model of the contingency table frequencies
glm(w ~ y * x, family=poisson)

Interestingly, a lack of association between $y$ and $x$ means the odds ratio is 1 in the logistic regression model (a coefficient of 0 on the log-odds scale) and, likewise, the interaction term is 0 in the log-linear model. This gives you an idea of how we measure conditional independence in contingency table data.
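As a rough check of the equivalence, the fitted coefficients can be pulled out and compared with the log odds ratio computed directly from the four cell counts. This is only a minimal sketch; the object names (`fit_logit`, `fit_pois`, `log_or_table`, and so on) are my own and not part of the original answer.

```r
# Same toy data as above: w holds the count for each (y, x) combination
y <- c(0, 1, 0, 1)
x <- c(0, 0, 1, 1)
w <- c(10, 20, 30, 40)

# Logistic regression: y is the response, w are frequency weights
fit_logit <- glm(y ~ x, family = binomial, weights = w)

# Log-linear (Poisson) model: the counts are the response, y and x are both covariates
fit_pois <- glm(w ~ y * x, family = poisson)

# Log odds ratio from each model, and directly from the 2 x 2 cell counts
log_or_logit <- unname(coef(fit_logit)["x"])
log_or_pois  <- unname(coef(fit_pois)["y:x"])
log_or_table <- log((10 * 40) / (20 * 30))

# All three should agree up to numerical tolerance
c(logistic = log_or_logit, poisson = log_or_pois, table = log_or_table)

# Exponentiate to get the odds ratio itself
exp(log_or_table)
```

The coefficient on `x` in the logistic fit and the `y:x` interaction in the Poisson fit are both estimates of the same log odds ratio, which is why either model can be used to assess the association.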

Again, this probably shows my inexperience, but would you be able to provide a definition for contingency tables? It may also help others who come across this question. — user38133, Feb 16 at 1:45
Contingency tables are (usually) two-dimensional tables which enumerate all possible responses of two variables and show the frequency of observations in the cells. For instance, you might have a 2 by 2 contingency table showing smoking status (never vs current) and cancer (lung cancer vs no cancer) which you would use to estimate the association between smoking and cancer risk. — AdamO, Feb 16 at 1:47
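For the toy data in the answer above, the four weights are exactly the cell counts of a 2 by 2 contingency table. A minimal sketch using base R's `xtabs` (the names `d` and `tab` are just for illustration):

```r
# Toy data from the answer: one row per (y, x) combination, w is the cell count
d <- data.frame(y = c(0, 1, 0, 1),
                x = c(0, 0, 1, 1),
                w = c(10, 20, 30, 40))

# Cross-tabulate the counts: rows index y, columns index x
tab <- xtabs(w ~ y + x, data = d)
tab

# Row and column totals, as used when checking for independence
addmargins(tab)
```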

gung · Answer 2 · 2014-02-16 02:45:45Z

I don't think I would call either of them a "simple linear regression model". Although it is possible to use the log or the logit transformations as the link function for a number of different models, these are typically understood to refer to specific models. For example, "logistic regression" is understood to be a generalized linear model (GLiM) for situations where the response variable is distributed as a binomial. In addition, "log-linear regression" is usually understood to be a Poisson GLiM applied to multi-way contingency tables. In other words, beyond the fact that they are both regression models / GLiMs, I don't see them as necessarily being very similar (there are some connections between them, as @AdamO points out, but the typical usages are fairly distinct). The biggest difference would be that logistic regression assumes the response is distributed as a binomial and log-linear regression assumes the response is distributed as Poisson. In fact, log-linear regression is rather different from most regression models in that the response variable isn't really one of your variables at all (in the usual sense), but rather the set of frequency counts associated with the combinations of your variables in the multi-way contingency table.

Thanks! I guess then my natural follow-up question, one that probably shows my lack of experience, is about how to determine what the right distribution to model a given problem is. I think I will need to do a bit more reading to make sure I can always choose correctly. — user38133, Feb 16 at 1:43
The log-linear model is a Poisson regression model that is applied to a multi-way contingency table. E.g., if you had a 2-way contingency table and you wondered whether the rows and columns are independent, you would conduct a chi-squared test; if you had a contingency table with more than 2 ways, you could use the log-linear model. Logistic regression is for situations where you have a response variable and it takes values in $\{0,\ 1\}$ only. — gung, Feb 16 at 2:32
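To illustrate the 2-way case mentioned in the comment, here is a hedged sketch that reuses the counts from the accepted answer's example. Pearson's chi-squared test and a Poisson log-linear model without the interaction term address the same independence hypothesis. The variable names are assumptions for illustration only.

```r
# 2 x 2 table of counts (rows index y, columns index x), filled column-wise
tab <- matrix(c(10, 20, 30, 40), nrow = 2,
              dimnames = list(y = c("0", "1"), x = c("0", "1")))

# Pearson chi-squared test of independence (no continuity correction)
pearson <- chisq.test(tab, correct = FALSE)

# Log-linear independence model: main effects only, no y:x interaction
d <- as.data.frame(as.table(tab))
fit_indep <- glm(Freq ~ y + x, family = poisson, data = d)

# Pearson statistic recomputed from the log-linear fit
x2_glm <- sum(residuals(fit_indep, type = "pearson")^2)

# Likelihood-ratio (deviance) statistic and its p-value on the residual df
g2 <- deviance(fit_indep)
p_g2 <- pchisq(g2, df = df.residual(fit_indep), lower.tail = FALSE)

c(X2_chisq = unname(pearson$statistic), X2_glm = x2_glm, G2 = g2, p_G2 = p_g2)
```

The two Pearson statistics agree because the fitted values of the independence model are the usual expected counts (row total times column total over the grand total). The deviance is the likelihood-ratio analogue and is typically close to, but not identical with, the Pearson statistic. With more than two classifying variables, the same `glm` call extends by adding further main effects and whichever interactions are of interest.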

asked	4 months ago
viewed	1071 times
active	4 months ago
