Carsten Dormann | 2 Oct 08:29

Re: Negative binomial

Dear Joao,

I propose you do the following (and wait for the outcry-responses to 
this email to see if it is a reasonable proposal):

Fit your model with different types of distributions and compare their 
logLik-values:
library(MASS)  # for glm.nb()
logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=gaussian))
logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=poisson))
logLik(glm(y ~ x1+x2+x3+I(x1^2) + x1:x3, family=quasipoisson))  # NA: a quasi-family has no true likelihood
logLik(glm.nb(y ~ x1+x2+x3+I(x1^2) + x1:x3))

The model with the highest log-likelihood indicates the distribution of
choice, and you can defend that choice to reviewers.
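A minimal, self-contained sketch of this comparison, assuming simulated overdispersed counts with a single predictor `x1` (the data, the seed and the names `dat`, `fits` are hypothetical, not Joao's data). It also prints AIC, which penalises the extra parameter estimated by the Gaussian (residual variance) and negative binomial (theta) fits; that is the variant mentioned in note 3 below.

```r
library(MASS)  # provides glm.nb()

# Hypothetical data: negative binomial counts with theta (size) = 2
set.seed(1)
n   <- 200
x1  <- runif(n, -1, 1)
y   <- rnbinom(n, mu = exp(1 + 0.8 * x1), size = 2)
dat <- data.frame(y = y, x1 = x1)

# Same linear predictor, different response distributions
fits <- list(
  gaussian = glm(y ~ x1, family = gaussian, data = dat),
  poisson  = glm(y ~ x1, family = poisson,  data = dat),
  negbin   = glm.nb(y ~ x1, data = dat)
)

# Log-likelihood and AIC for each candidate distribution
ll  <- sapply(fits, function(f) as.numeric(logLik(f)))
aic <- sapply(fits, AIC)
print(round(cbind(logLik = ll, AIC = aic), 1))

# A quasi-Poisson fit has no full likelihood, so logLik() returns NA
qp <- glm(y ~ x1, family = quasipoisson, data = dat)
print(logLik(qp))
```

Because the Poisson is the limiting case of the negative binomial (theta tending to infinity), the negative binomial log-likelihood cannot be meaningfully lower than the Poisson one on the same data; the quasi-Poisson fit simply drops out of a likelihood-based comparison.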

A few notes:
1. You obviously cannot do this when one of the models uses a transformed
response (e.g. log(y)), because its log-likelihood is then computed on a
different scale and is not comparable.
2. When you use a more complex model (say a GLMM), you can approximate
the neg.bin through a two-step procedure: (a) fit a (wrongly structured)
glm.nb and extract the theta value from the summary of the model, say
theta=4.5 (theta is the second parameter of the neg.bin distribution);
(b) fit the GLMM again, giving as its family argument
negative.binomial(theta=4.5) (again from package MASS). The same holds
for GAMs and other models requiring a specification of family.
3. You may want to dig around for books recommending the above
procedure. I think I got this as advice from someone else, but haven't
yet bothered to look it up (the MASS book by Venables and Ripley would
be a good starting place, in its description of the neg.bin). I saw a
paper that does this (using the minimum AIC but otherwise this
approach), but it is not
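A minimal sketch of the two-step procedure in note 2, assuming simulated grouped counts with a random intercept per group. The data, the grouping factor `grp` and the names `theta_hat`, `fit_pql` are hypothetical, and `MASS::glmmPQL` is used only as one example of a GLMM function that accepts a `family` object; the note itself does not name a specific GLMM function.

```r
library(MASS)  # glm.nb(), negative.binomial(), glmmPQL() (uses nlme)

# Hypothetical grouped data: 20 groups of 10, random intercept sd = 0.5
set.seed(2)
n_grp <- 20; n_per <- 10
grp <- factor(rep(seq_len(n_grp), each = n_per))
x1  <- runif(n_grp * n_per, -1, 1)
b   <- rnorm(n_grp, sd = 0.5)
mu  <- exp(1 + 0.8 * x1 + b[as.integer(grp)])
dat <- data.frame(y = rnbinom(length(mu), mu = mu, size = 2),
                  x1 = x1, grp = grp)

# Step (a): a deliberately mis-specified glm.nb that ignores the grouping,
# used only to obtain an estimate of theta
nb0       <- glm.nb(y ~ x1, data = dat)
theta_hat <- nb0$theta
print(theta_hat)

# Step (b): refit the mixed model with theta fixed at that estimate
fit_pql <- glmmPQL(y ~ x1, random = ~ 1 | grp,
                   family = negative.binomial(theta = theta_hat),
                   data = dat, verbose = FALSE)
print(summary(fit_pql)$tTable)
```

Since the first-step model ignores the grouping, its theta absorbs some of the between-group variation; that is the approximation the note accepts in exchange for being able to pass a fixed-theta family to any function that takes `family=`.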

Wilfried Thuiller | 2 Oct 08:45

Re: Negative binomial

Dear Joao,
Carsten is right.
However, if your data are not right-skewed, you have no reason to use
the neg. bin. or quasi-Poisson.
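One common way to check whether a plain Poisson model is adequate is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. It should be close to 1 under the Poisson assumption (variance equal to the mean), and values well above 1 indicate overdispersion. A minimal sketch, assuming simulated data (the columns `y_pois`, `y_nb` and the helper `dispersion()` are hypothetical):

```r
# Hypothetical data: one equidispersed and one overdispersed response
set.seed(3)
n  <- 300
x1 <- runif(n, -1, 1)
mu <- exp(1 + 0.8 * x1)
dat <- data.frame(
  x1     = x1,
  y_pois = rpois(n, mu),                  # variance = mean
  y_nb   = rnbinom(n, mu = mu, size = 2)  # variance = mu + mu^2 / 2
)

# Pearson dispersion: sum of squared Pearson residuals / residual df
dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_pois <- dispersion(glm(y_pois ~ x1, family = poisson, data = dat))
disp_nb   <- dispersion(glm(y_nb   ~ x1, family = poisson, data = dat))
print(c(poisson_data = disp_pois, negbin_data = disp_nb))
```

This is the same quantity that `summary()` reports as the estimated dispersion parameter of a quasipoisson fit.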

Poisson (= log-linear) models are well suited to count data (more so
than Gaussian models, in any case) because they capture the fact that,
as is often the case with count data, the variance tends to increase
with the mean (in a Poisson model the variance equals the mean).
Another advantage of using the log link is that, with count data, the
effects of environmental predictors are often multiplicative rather than
additive. If the effect is in fact proportional to the count, working on
the log scale leads to a much simpler model.
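A minimal sketch of what "multiplicative" means for a log-link Poisson model: exponentiating a coefficient gives the factor by which the expected count changes per unit increase of the predictor. The simulated data, with an assumed true effect of 0.5 per unit of `x1` (a factor of exp(0.5), about 1.65), are hypothetical.

```r
# Hypothetical data: log(mu) = 0.2 + 0.5 * x1
set.seed(4)
n  <- 500
x1 <- runif(n, 0, 4)
y  <- rpois(n, exp(0.2 + 0.5 * x1))

fit <- glm(y ~ x1, family = poisson)

# exp(coefficient) = multiplicative change in the expected count per unit of x1
rate_ratio <- exp(coef(fit)[["x1"]])
print(rate_ratio)

# The fitted mean at x1 = 2 equals the fitted mean at x1 = 1 times that factor
mu_hat <- predict(fit, newdata = data.frame(x1 = c(1, 2)), type = "response")
print(unname(mu_hat[2] / mu_hat[1]))
```

On the log scale the effect is a single additive coefficient; on the count scale the same coefficient acts as a constant ratio, which is the "much simpler model" referred to above.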

Hope it helps,
Wilfried


