| tags:R statistics model selection GLM AIC deviance

# So, you did some GLMs & compared with AIC. Congrats!

Here’s what you need to report in a paper about the model comparison:

- residual deviance
- residual df
- delta AIC
- AIC weight

You should also report the null deviance and degrees of freedom, maybe in a table caption.

Thanks to Emilio Bruna for prompting this post and
suggesting its title. (**Update 2015-12-14**: thanks also to Ben Bolker for pointing out some issues in the first version of this post.)
Below I’ll explain why you should include at least these numbers, do a worked
example, and mention some situations where it’s better to use something other
than AIC.

### What to report

For model selection, a model’s AIC is only meaningful relative to that of other
models, so Akaike and others recommend reporting differences in AIC from the
best model, \(\Delta\)*AIC*, and AIC *weight*. The latter can be viewed as
an estimate of the proportion of the time a model will give the best predictions on new data
(conditional on the models considered and assuming the same process generates
the data; this heuristic view appears justified by simulations, e.g., Burnham
and Anderson 2002 pp. 348).
Also, neither differences in AIC less than 0.1 nor differences in AIC weights
below 0.01 are really meaningful, so round the reported numbers appropriately.

Finally, even if using an information criterion include the residual deviance and degrees of freedom for each model. These provide a (rough) goodness of fit.

### Example: UC Berkeley admissions and gender

Let’s look at the built-in `UCBAdmissions`

data. As R
will tell you, these data are often used to illustrate Simpson’s paradox (see
`?UCBAdmissions`

and Bickel *et al.* 1975 or my PPS below).

```
d <- as.data.frame(UCBAdmissions)
d <- tidyr::spread(d, Admit, Freq) # use Hadley's excellent tidyr to reshape
d[order(d$Dept), ]
```

```
## Gender Dept Admitted Rejected
## 1 Male A 512 313
## 7 Female A 89 19
## 2 Male B 353 207
## 8 Female B 17 8
## 3 Male C 120 205
## 9 Female C 202 391
## 4 Male D 138 279
## 10 Female D 131 244
## 5 Male E 53 138
## 11 Female E 94 299
## 6 Male F 22 351
## 12 Female F 24 317
```

Using logistic regression, encode several models for the effect of applicant gender, department identity, or both on admission.

```
m1 <- glm(cbind(Admitted, Rejected) ~ Gender, d, family='binomial')
m2 <- glm(cbind(Admitted, Rejected) ~ Dept, d, family = 'binomial')
m3 <- glm(cbind(Admitted, Rejected) ~ Dept + Gender, d, family = 'binomial')
model.names <- c("1 Gender", "2 Dept", "3 Gender + Dept")
```

(Note: although we might like to allow for an interaction between gender and department, the data here are insufficient to fit such a model.)

Running `summary`

on any one of the fits yields a bunch of stats: AIC, Residual
and null deviance, as well as coefficients, their standard errors, and
significance.

### How to do it in `R`

We could type by hand the AIC and other stats. No fun!

There are two other options.

First is to use David Robinson’s `broom`

which gives tidy summaries of model objects. The second is to use Ben Bolker’s `bbmle`

which provides methods for generating pretty tables of xIC values.

#### Using broom

`summ.table <- do.call(rbind, lapply(list(m1, m2, m3), broom::glance))`

If we take a look at `summ.table`

, we’ll see it has all the ingredients we might like to report from model selection, whether via AIC, BIC, or just the deviance. These are, in order, null.deviance, df.null, logLik, AIC, BIC, deviance, df.residual.

Creating a table with our own desired columns in an appropriate order is easy.

```
table.cols <- c("df.residual", "deviance", "AIC")
reported.table <- summ.table[table.cols]
names(reported.table) <- c("Resid. Df", "Resid. Dev", "AIC")
```

```
reported.table[['dAIC']] <- with(reported.table, AIC - min(AIC))
reported.table[['weight']] <- with(reported.table, exp(- 0.5 * dAIC) / sum(exp(- 0.5 * dAIC)))
reported.table$AIC <- NULL
reported.table$weight <- round(reported.table$weight, 2)
reported.table$dAIC <- round(reported.table$dAIC, 1)
row.names(reported.table) <- model.names
```

`## Warning: Setting row names on a tibble is deprecated.`

With my advice above in mind, here’s *a minimal table for reporting model
selection on GLMs* using fitting results extracted with `broom::glance`

:

Caption: Model selection for the effect gender (model 1), department (model 2), and both gender and department (model 3) on admission probability fit to 12 observations (i.e., total degrees of freedom) with 877.056 null deviance.

`reported.table`

```
## # A tibble: 3 x 4
## `Resid. Df` `Resid. Dev` dAIC weight
## * <int> <dbl> <dbl> <dbl>
## 1 10 784. 754. 0
## 2 6 21.7 0 0.56
## 3 5 20.2 0.5 0.44
```

#### Using `bbmle`

```
reported.table2 <- bbmle::AICtab(m1, m2, m3, weights = TRUE, sort = FALSE, mnames = model.names)
reported.table2[["Resid. Dev"]] <- summ.table[["deviance"]] # get the deviance from broom'd table
```

Caption: Model selection for the effect gender (model 1), department (model 2), and both gender and department (model 3) on admission probability fit to 12 observations (i.e., total degrees of freedom) with 877.056 null deviance.

`reported.table2`

```
## dAIC df weight Resid. Dev
## 1 Gender 753.9 2 <0.001 783.6
## 2 Dept 0.0 6 0.56 21.7
## 3 Gender + Dept 0.5 7 0.44 20.2
```

Of course, model selection is just the beginning of reporting your results. See the PPS below for some thoughts on reporting results of the best model(s).

And, before you even did model selection, you should have asked yourself…

### Is AIC the right criterion?

- For small data and frequentist inference, you should use AICc – the small sample correction which provides greater penalty for each parameter but approaches AIC as \(n\) becomes large. If it makes a difference, you should use it. (I probably should have used it above.)
- For large data and frequentist inference , consider BIC, which is
asymptotically consistent while AIC is not (see Hastie
*et al.*2009, which is available online). AIC typically favors overly-complex models with large \(n\) relative to BIC. Note, however, that this is not an issue for prediction, only inference of a*true*model (if one exists; Sec. 6.4 McElreath 2015). - For Bayesian models, consider WAIC or LOO (instead of DIC, which has issues with non-Gaussian posteriors McElreath 2015).
- Don’t use information criteria for model selection between GLMs with different link functions

### Further Reading & References

- Ben Bolker on reporting results in the text for
GLMMs this is a
piece of a larger chapter
GLMMs: worked examples
that approaches inference as model checking and improvement (as opposed to
multi-model inference
*sensu*Burnham and Anderson 2004) - Burnham, and Anderson. 2004.
Multimodel Inference: Understanding AIC and BIC in Model Selection.
*Sociological Methods & Research*33(2):261-304 - Burnham, and Anderson. 2002.
*Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach.*Second Edition. Springer. - Elliot and Brook 2007 for a short piece on the philosophical background of multiple working hypotheses
- Hastie, Tibshirani, and Friedman. 2009. Elements of Statistical Learning, Second Edition. Springer, New York, NY, USA.
- McElreath. 2015. Statistical Rethinking. CRC Press.
- Brian O’Meara’s AIC tutorial

### PS: Nested models

Reporting the residual deviance and degrees of freedom as above is relatively similar to R’s output for conducting an ANOVA on a GLM (where you can optionally add a statistical test). For nested models, you may as well just do this and report the table:

```
m3.anova <- anova(m3, test="Chisq")
round(m3.anova, digits = 4)
```

```
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(Admitted, Rejected)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 11 877
## Dept 5 855 6 22 <2e-16 ***
## Gender 1 2 5 20 0.22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

### PPS: Evaluating the best model(s)

For the best model(s), you should then go on to visualize the fit relative to the data (and report model results in the text).

To visualize the admissions data and mean fits from models 2 and 3 (which have
approximately equal AIC weight), we can use `ggplot2`

and `augment`

from broom
(which adds model predictions and statistics to the data).

To provide model-averaged predictions is a bit more work:

```
m3.pred <- broom::augment(m3)
m2.pred <- broom::augment(m2)
m2.weight <- reported.table[2, "weight"]
m3.weight <- reported.table[3, "weight"]
mavg.pred <- m2.weight$weight * m2.pred[ , -(1:2)] + m3.weight$weight * m3.pred[ , -(1:3)]
mavg.pred <- cbind(m3.pred[1:3], mavg.pred)
library(ggplot2)
ggplot(d) +
geom_point(aes(Dept, Admitted / (Admitted + Rejected), color=Gender,
size=Admitted + Rejected),
position = position_dodge(width = 0.5)) +
geom_pointrange(aes(Dept, plogis(.fitted),
ymin = plogis(.fitted - 2 * .se.fit),
ymax = plogis(.fitted + 2 * .se.fit),
shape=Gender),
position=position_dodge(width = 0.5), data =mavg.pred, alpha=0.4) +
theme_minimal() + scale_color_manual(values=c("blue", "orange"))
```

Admissions (colored dots, size indicates total applicants) versus department
from UCB Admissions data (included in `R`

), and averaged predictions (means \(\pm\) 2 SE)
from model 2 (department only) and model 3 (department and gender), with
averaging by AIC weight.

#### Simpson’s paradox

Comparing model 3 with model 1 illustrates Simpson’s paradox. Without accounting for department, the apparent effect of female gender on admission is negative (female odds relative to male 0.543, \(p \approx 0\)), whereas after accounting for department, the (within-department) effect is positive (but weak: female odds relative to male 1.105, \(p \approx\) 0.22).