The stats::glm function is a useful workhorse, but it has its limits. All of our polytomous models will require specialty functions from other packages. In the notes below I will use one set of functions, but in your own work you may come to prefer alternatives from other packages, especially if they inherit methods that work well with your preferred data or graphics frameworks (such as tidyverse or data.table).
Multinomial logistic regression
I prefer nnet::multinom for these models, though you can also use the VGAM package, which fits them through VGAM::vglm with its multinomial() family. The nnet models have more compatibility with other modeling functions, while the VGAM package is, in my opinion, not as fully and professionally developed.
When shaping the data for your multinomial regression, note the following (a minimal example call is sketched after this list):
The response variable can be a factor (preferred) or a vector of numeric or character data (which will then be converted to a factor). This means R will automatically coerce integer codes (like 1, 2, …, 5) into categories.
Although rarely used, you can instead express the response as a matrix, where each column is a separate level and the entries are counts of how many observations were recorded at each level for the corresponding set of predictors.
The first level of the response variable will be used as the reference level by multinom, so you may wish to re-order the factor levels if you prefer to see a different set of level comparisons.
Any predictor that would work with an lm or glm object will work fine here.
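Before turning to a real example, here is a minimal sketch of a call. The data frame dat and its variables are invented purely for illustration:

library(nnet)

# Simulated data: a three-level response and two predictors
set.seed(1)
dat <- data.frame(y  = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
                  x1 = rnorm(200),
                  x2 = factor(sample(c("low", "high"), 200, replace = TRUE)))
dat$y <- relevel(dat$y, ref = "a")         # "a" serves as the reference level
fit <- multinom(y ~ x1 + x2, data = dat)   # one row of coefficients per non-reference level
summary(fit)                               # coefficients and standard errors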
Example: Eye color association with sex and hair color
The dataset HairEyeColor comes pre-loaded in R and describes the hair color (Black, Brown, Red, Blond), eye color (Brown, Blue, Hazel, Green), and sex (Male, Female) of 592 statistics students.
When both the predictors and the response variable are categorical, we can study the data through two perspectives. One is a multinomial logistic regression approach, which we will detail below. The other is a log-linear contingency table analysis, which studies associations between all the variables without defining a response or predictors.
Let’s look at the data:
HairEyeColor
, , Sex = Male
Eye
Hair Brown Blue Hazel Green
Black 32 11 10 3
Brown 53 50 25 15
Red 10 10 7 7
Blond 3 30 5 8
, , Sex = Female
Eye
Hair Brown Blue Hazel Green
Black 36 9 5 2
Brown 66 34 29 14
Red 16 7 7 7
Blond 4 64 5 8
This three-dimensional contingency table is not well-suited for regression analysis. We will restructure it:
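One possible restructuring (the object names eye.data, eye.long, and eye.model below are my own, and the original code may have differed) is to flatten the table with as.data.frame and expand it to one row per student:

library(nnet)

eye.data <- as.data.frame(HairEyeColor)               # columns Hair, Eye, Sex, Freq
eye.long <- eye.data[rep(seq_len(nrow(eye.data)), eye.data$Freq),
                     c("Hair", "Eye", "Sex")]         # 592 individual observations

# Full model: eye color as the response, hair color and sex as predictors
eye.model <- multinom(Eye ~ Hair + Sex, data = eye.long)
summary(eye.model)                                    # parameter estimates and standard errors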
These results describe three separate regressions for predicting: (1) Blue vs. Brown eyes, (2) Hazel vs. Brown eyes, and (3) Green vs. Brown eyes. We see that all of the parameter estimates are at least twice their standard errors in absolute value (which points toward significance), except the betas for student sex. Perhaps sex is not strongly linked with eye color and should not be in the model. We can run a reduced model to find out:
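A sketch of that reduced model, dropping sex and keeping the eye.long data frame from the earlier sketch:

eye.model.red <- multinom(Eye ~ Hair, data = eye.long)   # hair color only
summary(eye.model.red)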
A chi-squared deviance test suggests that the three additional parameters for sex do not add significantly more explanatory power. A comparison of the two model AICs would confirm this finding. We will proceed with the smaller model.
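For reference, that comparison could be carried out along these lines, assuming the model objects sketched above:

dev.diff <- deviance(eye.model.red) - deviance(eye.model)   # increase in deviance from dropping sex
pchisq(dev.diff, df = 3, lower.tail = FALSE)                # chi-squared test for the 3 sex parameters
AIC(eye.model, eye.model.red)                               # the lower AIC favors the reduced model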
We can retrieve our predictions either as a set of classifications (the level of our response with the greatest predicted probability), or as a matrix of fitted probabilities. Here we see that, since green and hazel eyes are relatively rare, they are never the predicted class. However, when we sum the fitted probabilities across students, our predictions are extremely close to the true totals:
head(predict(eye.model.red))
[1] Brown Brown Brown Brown Brown Brown
Levels: Brown Blue Hazel Green
Brown Blue Hazel Green
[1,] 220.0000 215.0000 93.00000 64.00000
[2,] 220.0006 215.0022 93.00036 63.99684
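The first row above holds the observed eye-color totals and the second the column sums of the fitted probabilities. A sketch of how that comparison could be assembled, again using the eye.long data frame from the earlier sketch:

probs <- predict(eye.model.red, type = "probs")   # 592 x 4 matrix of fitted probabilities
observed <- table(eye.long$Eye)                   # observed totals per eye color
fitted   <- colSums(probs)                        # summed fitted probabilities
rbind(observed, fitted)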
Our coefficients can be exponentiated (and 1 subtracted) to calculate the proportional shift in the odds of each response level (blue, hazel, or green eyes) vs. the reference level (brown). Here the predictors are also categorical, so each coefficient represents a shift from the predictor's reference level (black hair).
exp(coefficients(eye.model.red)) - 1
(Intercept) HairBrown HairRed HairBlond
Blue -0.7058861 1.400022 1.223288 44.660727
Hazel -0.7794144 1.057176 1.441150 5.476403
Green -0.9264913 2.315037 6.325373 30.095409
We can say, for example, that having brown hair (instead of black) increases the odds of blue eyes (instead of brown) by 140%, while having red hair (instead of black) increases the odds of hazel eyes (instead of brown) by 144%.
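Because the reduced model uses only hair color as a predictor, these figures can be sanity-checked directly against the raw counts, collapsed over sex. For the 140% figure, for example, the empirical odds ratio should come out very close to 2.4:

tab <- margin.table(HairEyeColor, c(1, 2))             # collapse over Sex; rows = Hair, columns = Eye
(tab["Brown", "Blue"] / tab["Brown", "Brown"]) /
  (tab["Black", "Blue"] / tab["Black", "Brown"]) - 1   # approximately 1.40, i.e. +140%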
To compute the regression comparing two non-reference levels, we can either pull the information out of the model details, or re-level our response variable and re-run the regression. Let’s try to retrieve the regression predicting blue eyes vs. green eyes.
If we extract the information ‘manually’, we take the difference between the coefficients of the “blue vs. brown” regression and the “green vs. brown” regression. The standard errors are a little harder to compute, and require us to access the estimated variance-covariance matrix of our betas. Recall that:
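Var(b.Blue − b.Green) = Var(b.Blue) + Var(b.Green) − 2 Cov(b.Blue, b.Green)

A sketch of the manual extraction, assuming the eye.model.red object from the earlier sketches (inspect rownames(vcov(eye.model.red)) to confirm how the entries are labelled before relying on the grep patterns below):

b <- coef(eye.model.red)                      # coefficient matrix: rows Blue, Hazel, Green
V <- vcov(eye.model.red)                      # covariance matrix of all stacked coefficients
b.diff <- b["Blue", ] - b["Green", ]          # blue-vs-green coefficients

blue.idx  <- grep("^Blue:",  rownames(V))     # positions of the blue-equation terms
green.idx <- grep("^Green:", rownames(V))     # positions of the green-equation terms
se.diff <- sqrt(diag(V[blue.idx, blue.idx]) + diag(V[green.idx, green.idx]) -
                2 * diag(V[blue.idx, green.idx]))
rbind(estimate = b.diff, std.error = se.diff)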
We can also retrieve very similar coefficients and standard errors simply by re-leveling the response variable (compare to the first lines of the parameter estimates and standard errors).1
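A sketch of the re-leveling route, again using the eye.long data frame (the names eye.relevel and eye.model.green are mine):

eye.relevel <- eye.long
eye.relevel$Eye <- relevel(eye.relevel$Eye, ref = "Green")   # green becomes the reference level
eye.model.green <- multinom(Eye ~ Hair, data = eye.relevel)
summary(eye.model.green)$coefficients["Blue", ]              # blue vs. green estimates
summary(eye.model.green)$standard.errors["Blue", ]           # and their standard errors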
Because nnet fits these models using a neural net, which involves algorithmic convergence from a random initialization, the two sets of results may differ very slightly.