Coding it up in R

The stats::glm function is a useful workhorse, but it has its limits. All of our polytomous models will require specialty functions from other packages. In the notes below I will use one set of functions, but in your own work you may come to prefer alternatives, especially if they come with methods that work well with your preferred data or graphics framework (such as the tidyverse or data.table).

Multinomial logistic regression

I prefer nnet::multinom for these models, though you can also fit them with the VGAM package, using VGAM::vglm with its multinomial family. In my opinion, the nnet models are more compatible with other modeling functions, and the VGAM package is not as fully or professionally developed.
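
To make the comparison concrete: the model fit below with nnet::multinom(Eye ~ Hair + Sex, eyedata) would be written in the VGAM framework roughly as follows. This is only a sketch for orientation (eyedata is constructed later in this section, and VGAM is not used elsewhere in these notes):

#install.packages('VGAM')
library(VGAM)
eye.model.vgam <- vglm(Eye ~ Hair + Sex, family = multinomial, data = eyedata)
# note: the multinomial family uses the last response level as its reference
# by default, whereas multinom uses the first (see ?multinomial to change this)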

When shaping the data for your multinomial regression, note that:

  • The response variable can be a factor (preferred) or a vector of numeric or character data (which will then be converted to a factor). This means R will automatically typecast integers (like 1, 2, …, 5) into categories.

  • Although rarely used, you can instead express the response as a matrix, where each column is a separate level and the entries are counts of how many observations were recorded at each level for the corresponding set of predictors (a short sketch appears after this list).

  • The first level of the response variable will be used as the reference level by multinom, so you may wish to re-order the factor levels if you prefer to see a different set of level comparisons.

  • Any predictor that would work with an lm or glm object will work fine here.
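
Here is a quick sketch of the matrix-of-counts form mentioned above. The data frame agg is made up purely for illustration and is unrelated to the example that follows:

library(nnet)
# made-up aggregated data: one row per predictor combination, one column of
# counts for each level of the response
agg <- data.frame(trt  = factor(c('a','a','b','b')),
                  sex  = factor(c('M','F','M','F')),
                  low  = c(10, 12, 4, 6),
                  mid  = c( 5,  7, 9, 8),
                  high = c( 2,  1, 6, 5))
# the first column of the response matrix ('low') acts as the reference level
multinom(cbind(low, mid, high) ~ trt + sex, data = agg, trace = FALSE)
# with a factor response, relevel(y, ref = 'high') would change the reference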

Example: Eye color association with sex and hair color

The dataset HairEyeColor comes pre-loaded in R and describes the hair color (Black, Brown, Red, Blond), eye color (Brown, Blue, Hazel, Green), and sex (Male, Female) of 592 statistics students.

When both the predictors and the response variable are categorical, we can study the data from two perspectives. One is a multinomial logistic regression approach, which we will detail below. The other is a log-linear contingency table analysis, which studies associations among all the variables without designating a response or predictors.
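
As a brief aside (a sketch only; we will not pursue this approach in these notes), the log-linear view treats the cell counts themselves as a Poisson response, which the familiar stats::glm can fit directly:

loglin.data <- as.data.frame(HairEyeColor)   # one row per cell, counts in Freq
loglin.fit  <- glm(Freq ~ Hair + Eye + Sex, family = poisson, data = loglin.data)
# adding interaction terms such as Hair:Eye models associations between pairs
# of variables; Hair*Eye*Sex would give the saturated model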

Let’s look at the data:

HairEyeColor
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

This three-dimensional contingency table is not well-suited for regression analysis. We will restructure it:

eyedata <- as.data.frame(HairEyeColor)
# expand the table: repeat each row Freq times, so there is one row per student
eyedata <- eyedata[rep(1:nrow(eyedata), times = eyedata$Freq), 1:3]
rownames(eyedata) <- NULL
dim(eyedata)
[1] 592   3
head(eyedata)
   Hair   Eye  Sex
1 Black Brown Male
2 Black Brown Male
3 Black Brown Male
4 Black Brown Male
5 Black Brown Male
6 Black Brown Male

Now, using the nnet::multinom function, we’ll run a regression trying to explain eye color as a function of hair color and sex.

#install.packages('nnet')
library(nnet)
eye.model.full <- multinom(Eye ~ Hair + Sex, eyedata, trace=FALSE)
summary(eye.model.full)
Call:
multinom(formula = Eye ~ Hair + Sex, data = eyedata, trace = FALSE)

Coefficients:
      (Intercept) HairBrown   HairRed HairBlond  SexFemale
Blue    -1.026823 0.8928067 0.8280827  3.912243 -0.4235428
Hazel   -1.356680 0.7346320 0.9148161  1.937427 -0.3243064
Green   -2.385416 1.2183330 2.0248981  3.542942 -0.4920914

Std. Errors:
      (Intercept) HairBrown   HairRed HairBlond SexFemale
Blue    0.2726421 0.2928267 0.4042358 0.4715526 0.2176325
Hazel   0.3077725 0.3298390 0.4385076 0.5726864 0.2519170
Green   0.4798299 0.5085431 0.5716056 0.6529648 0.2966168

Residual Deviance: 1348.655 
AIC: 1378.655 
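
The summary reports coefficients and standard errors but no p-values. If you would like Wald-style z-statistics and approximate p-values, one option (a quick sketch, not part of the output above) is to compute them from the summary components:

z.stats <- summary(eye.model.full)$coefficients / summary(eye.model.full)$standard.errors
round(2 * pnorm(abs(z.stats), lower.tail = FALSE), 4)   # two-sided normal approximation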

These results describe three separate regressions for predicting: (1) Blue vs. Brown eyes, (2) Hazel vs. Brown eyes, and (3) Green vs. Brown eyes. We see that all of the parameter estimates are at least twice their standard errors (which points toward significance), except the betas for student sex. Perhaps sex is not strongly linked with eye color and should not be in the model. We can run a reduced model to find out:

eye.model.red <- multinom(Eye ~ Hair, eyedata, trace=FALSE)
summary(eye.model.red)
Call:
multinom(formula = Eye ~ Hair, data = eyedata, trace = FALSE)

Coefficients:
      (Intercept) HairBrown   HairRed HairBlond
Blue    -1.223788 0.8754781 0.7989871  3.821239
Hazel   -1.511469 0.7213341 0.8924691  1.868165
Green   -2.610352 1.1984689 1.9913441  3.437060

Std. Errors:
      (Intercept) HairBrown   HairRed HairBlond
Blue    0.2543726 0.2915707 0.4024785 0.4671279
Hazel   0.2852577 0.3290799 0.4373362 0.5694209
Green   0.4634205 0.5075888 0.5697808 0.6481705

Residual Deviance: 1353.657 
AIC: 1377.657 
dev.diff <- eye.model.red$deviance-eye.model.full$deviance
edf.diff <- abs(eye.model.red$edf-eye.model.full$edf)

c(chisq=dev.diff, df=edf.diff, pvalue=1-pchisq(dev.diff,edf.diff))
    chisq        df    pvalue 
5.0024730 3.0000000 0.1716162 
c(AIC.full=AIC(eye.model.full),AIC.red=AIC(eye.model.red))
AIC.full  AIC.red 
1378.655 1377.657 
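
If your installed version of nnet provides an anova method for multinom objects (I believe current versions do), the same likelihood ratio test can be requested more directly:

anova(eye.model.red, eye.model.full)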

A chi-squared deviance test suggests that the three additional parameters for sex do not add significantly more explanatory power. A comparison of the two models' AICs confirms this finding: the reduced model has the (slightly) lower AIC. We will proceed with the smaller model.

We can retrieve our predictions either as a set of classifications (the level of our response with the greatest predicted probability), or as a matrix of fitted probabilities. Here we see that, because green and hazel eyes are relatively rare, they are never the predicted class. The fitted probabilities, however, sum almost exactly to the true totals for each eye color:

head(predict(eye.model.red))
[1] Brown Brown Brown Brown Brown Brown
Levels: Brown Blue Hazel Green
head(predict(eye.model.red,type='prob'))
      Brown     Blue     Hazel      Green
1 0.6296404 0.185186 0.1388896 0.04628404
2 0.6296404 0.185186 0.1388896 0.04628404
3 0.6296404 0.185186 0.1388896 0.04628404
4 0.6296404 0.185186 0.1388896 0.04628404
5 0.6296404 0.185186 0.1388896 0.04628404
6 0.6296404 0.185186 0.1388896 0.04628404
table(eyedata$Eye,predict(eye.model.red))
       
        Brown Blue Hazel Green
  Brown   213    7     0     0
  Blue    121   94     0     0
  Hazel    83   10     0     0
  Green    48   16     0     0
rbind(table(eyedata$Eye),colSums(predict(eye.model.red,type='prob')))
        Brown     Blue    Hazel    Green
[1,] 220.0000 215.0000 93.00000 64.00000
[2,] 220.0006 215.0022 93.00036 63.99684

Our coefficients can be exponentiated (and 1 subtracted) to calculate the percentage shift in the odds of each response level (blue, hazel, or green eyes) vs. the reference level (brown). Here the predictors are also categorical, so each level of a predictor represents a shift away from its reference level (black hair).

exp(coefficients(eye.model.red)) - 1
      (Intercept) HairBrown  HairRed HairBlond
Blue   -0.7058861  1.400022 1.223288 44.660727
Hazel  -0.7794144  1.057176 1.441150  5.476403
Green  -0.9264913  2.315037 6.325373 30.095409

We can say, for example, that having brown hair (instead of black) increases the odds of blue eyes (instead of brown) by 140%, while having red hair (instead of black) increases the odds of hazel eyes (instead of brown) by 144%.
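
Because Eye ~ Hair is saturated for the collapsed Hair-by-Eye table (each hair color gets its own fitted set of eye-color probabilities), these odds ratios can also be recovered directly from the raw counts. A quick check, not part of the original output:

hair.eye <- table(eyedata$Hair, eyedata$Eye)
(hair.eye['Brown','Blue'] / hair.eye['Brown','Brown']) /
  (hair.eye['Black','Blue'] / hair.eye['Black','Brown'])
# = 2.4, i.e. exp(0.8754781), matching the 140% increase reported above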

To compute the regressions between two non-reference levels, we can either pull the information out of the model details, or re-level our response variable and re-run the regression. Let's try to retrieve the regression comparing blue eyes to green eyes.

If we extract the information ‘manually’, we should difference the coefficients between the “blue vs. brown” regression and the “green vs. brown” regression. The standard errors are a little harder to compute, and require us to access the estimated variance-covariance matrix for our betas. Recall that:

\[\mathbb{V}[X - Y] = \mathbb{V}[X] + \mathbb{V}[Y] - 2 \cdot \textrm{Cov}(X,Y)\]

And so,

\[\textrm{s.e.}(\hat{\beta}_\textrm{blue} - \hat{\beta}_\textrm{green}) = \sqrt{\mathbb{V}[\hat{\beta}_\textrm{blue}] + \mathbb{V}[\hat{\beta}_\textrm{green}] - 2 \cdot \textrm{Cov}(\hat{\beta}_\textrm{blue},\hat{\beta}_\textrm{green})}\]

coefficients(eye.model.red)[1,]-coefficients(eye.model.red)[3,]
(Intercept)   HairBrown     HairRed   HairBlond 
  1.3865634  -0.3229908  -1.1923570   0.3841784 
sqrt(diag(vcov(eye.model.red))[1:4]+
     diag(vcov(eye.model.red))[9:12]-
     2*diag(vcov(eye.model.red)[1:4,9:12]))
Blue:(Intercept)   Blue:HairBrown     Blue:HairRed   Blue:HairBlond 
       0.5000527        0.5444648        0.6166864        0.5685001 

We can also retrieve very similar coefficients and standard errors simply by re-leveling the response variable (compare to the first lines of the parameter estimates and standard errors).1

eyedata$Eye2 <- factor(eyedata$Eye,levels=c('Green','Blue','Brown','Hazel'))
summary(multinom(Eye2 ~ Hair, eyedata, trace=FALSE))
Call:
multinom(formula = Eye2 ~ Hair, data = eyedata, trace = FALSE)

Coefficients:
      (Intercept)  HairBrown   HairRed  HairBlond
Blue     1.386313 -0.3227964 -1.192151  0.3843789
Brown    2.610072 -1.1982536 -1.991025 -3.4367264
Hazel    1.098638 -0.4769560 -1.098635 -1.5686448

Std. Errors:
      (Intercept) HairBrown   HairRed HairBlond
Blue    0.5000012 0.5444157 0.6166471 0.5684531
Brown   0.4633656 0.5075368 0.5697350 0.6481199
Hazel   0.5163984 0.5653922 0.6399413 0.6551077

Residual Deviance: 1353.657 
AIC: 1377.657 

Ordinal regression

More coming soon!


  1. Because nnet fits these models using a neural net, which involves algorithmic convergence from a random initialization, the two sets of results may differ very slightly.↩︎