Building new GLMs

Over the next few sections we will learn more about GLMs, but we will discuss only a few more combinations of distribution and link function in depth. Many other combinations must remain unexplored for now. Still, one of the great advantages of GLMs is their flexibility, their “plug and play” construction, and I hope that readers who become familiar with logistic regression, probit regression, and count models will feel emboldened to try new GLM distributions and link functions on their own.

Choosing a distributional assumption for Y

Our first use case for GLMs involved Bernoulli data: 1s and 0s. Such data can hardly be mistaken for samples from any other distribution. In the broader world of GLM applications, however, we will sometimes meet data of uncertain distribution, especially when the distribution is continuous. Before we begin modeling, how would we know whether the data are conditionally normal, conditionally gamma, conditionally exponential, and so on?

We have two ways forward. One is to rely on prior scholarship and context about the data. Many use cases we might encounter have been studied before. The gamma distribution, for example, is uncommon in introductory textbooks but appears regularly in finance, insurance, medicine, and the environmental sciences to model times and amounts that aggregate multiple exponentially distributed components. If we were modeling rainfall collections in a reservoir, or a reinsurer’s liabilities in the wake of a hurricane, we might naturally start with a gamma assumption, based purely on our understanding of the world and the context of our data.1
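
To make that domain-driven choice concrete, here is a minimal sketch of a gamma GLM fit with Python’s statsmodels. The reservoir data are simulated and the variable names (rainfall_mm, temp_c, inflow_m3) are invented for illustration; a log link is used rather than the gamma’s canonical inverse link so that fitted means stay positive.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated stand-in for reservoir data: daily inflow is positive and right-skewed,
# which is what motivates a gamma assumption for the conditional distribution of Y.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "rainfall_mm": rng.gamma(shape=2.0, scale=5.0, size=n),
    "temp_c": rng.normal(loc=15.0, scale=5.0, size=n),
})
true_mean = np.exp(1.0 + 0.05 * df["rainfall_mm"] - 0.02 * df["temp_c"])
df["inflow_m3"] = rng.gamma(shape=3.0, scale=true_mean / 3.0)

# Gamma GLM with a log link: the conditional mean is exp(x'beta), so fits stay positive.
gamma_model = smf.glm(
    "inflow_m3 ~ rainfall_mm + temp_c",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
)
print(gamma_model.fit().summary())
```

Nothing here is specific to rainfall, of course; the same construction applies to any positive, right-skewed response built up from exponential-like waiting times or amounts.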

The second way forward is simply to use what works. Examine the data closely during the EDA phase and build a set of testable assumptions about their distribution. Notice whether the values are symmetric around the mean or skewed. Notice whether the data show heteroskedasticity. Pick one or more candidate distributions and compare their fits using likelihood-based metrics such as AIC. Recall George Box’s adage: all models are wrong; some are useful.
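
As a sketch of that “use what works” route (again with simulated data and statsmodels, so treat the details as assumptions rather than a recipe), the snippet below fits the same log-link mean model under three candidate families and compares them by AIC:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# A positive, right-skewed response whose conditional distribution is uncertain.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0.0, 10.0, size=1000)})
df["y"] = rng.gamma(shape=3.0, scale=np.exp(0.5 + 0.2 * df["x"]) / 3.0)

# Candidate families, all with a log link so that fitted means stay positive.
candidates = {
    "Gaussian": sm.families.Gaussian(link=sm.families.links.Log()),
    "Gamma": sm.families.Gamma(link=sm.families.links.Log()),
    "Inverse Gaussian": sm.families.InverseGaussian(link=sm.families.links.Log()),
}

# Fit each candidate to the same data and mean model, then compare AIC values.
for name, family in candidates.items():
    fit = smf.glm("y ~ x", data=df, family=family).fit()
    print(f"{name:>16}: AIC = {fit.aic:8.1f}")
```

The family with the lowest AIC becomes the working favorite, but the comparison is only as trustworthy as the fitted likelihoods; it should confirm, not replace, what the EDA and the subject matter suggest.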


  1. At the risk of repeating myself, this is both the weakness and the strength of parametric modeling vis-à-vis many machine learning models: parametric models require (and provide) an understanding of how the world works.↩︎

  2. For example, the canonical link of the exponential distribution has a mean function of \(\mu = -1/\boldsymbol{x \beta}\), which can create impossible estimates of negative means (the exponential is always positive-valued).↩︎