Linear predictors and link functions

This section is short. It is also important. By keeping it short, I hope that you can more easily tell if you grasp the central concept.

Our goal now is to link a linear combination of our predictor variables, \(\boldsymbol{X\beta}\), to the mean of the conditional distribution \(Y|\boldsymbol{X}\). This mean will be our point estimate. Solving for the mean of the distribution will often provide us with the information needed to estimate its parameters.[^1] In these cases, a model for the mean will also allow us to make probabilistic statements about values other than the mean.
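To make the footnote's exponential case concrete: if our model gives an estimated mean \(\hat{\mu}_i\) for observation \(i\), then \(\hat{\lambda}_i = 1/\hat{\mu}_i\), and that single estimate determines the entire conditional distribution. We can immediately compute, say, a tail probability,

\[P(Y > y \,|\, \boldsymbol{x}_i) = e^{-\hat{\lambda}_i y} = e^{-y/\hat{\mu}_i},\]

which is a statement about far more than the mean.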

Even though we are trying to model non-linear relationships between our response variable \(Y\) and the predictors \(\boldsymbol{X}\), the central idea of GLMs is to reject the option of building a truly non-linear model, and instead to linearly model an invertible transformation of the mean response. Once we have modeled this transformation of the mean, we can simply run our model's predictions through the inverse of that transformation, un-transforming them into estimates of the mean of the actual data.
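Here is a minimal sketch of that pipeline in Python, assuming a log link and an already-fitted coefficient vector; the names `link`, `mean_function`, and the numbers in `beta` and `X` are purely illustrative, not from any particular library:

```python
import numpy as np

def link(mu):
    """Link function g: transforms the mean onto the linear scale."""
    return np.log(mu)

def mean_function(eta):
    """Mean function g^{-1}: un-transforms the linear predictor."""
    return np.exp(eta)

beta = np.array([0.5, 1.2])    # hypothetical fitted coefficients
X = np.array([[1.0, 0.3],      # design matrix with an intercept column
              [1.0, 1.7]])

eta = X @ beta                 # linear predictor
mu = mean_function(eta)        # estimated conditional means of Y

# The transformation is invertible: applying the link recovers eta.
assert np.allclose(link(mu), eta)
```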

Let’s start by describing the function which transforms the mean of \(Y\) into a quantity modeled by \(\boldsymbol{X\beta}\). I’ll begin with the more formal definition and then untangle it through examples.

Note

Let \(Y\) be a response variable and \(\boldsymbol{X}\) be a set of predictor variables. Suppose that the values of \(Y\) given \(\boldsymbol{X}\) are conditionally distributed according to a known probability distribution \(F(y;\boldsymbol{\theta})\) dependent on one or more parameters \(\boldsymbol{\theta}\) which themselves are a function of \(\boldsymbol{X}\):

\[P(Y \le y_i | \boldsymbol{X} = \boldsymbol{x}_i )=F_{Y|\boldsymbol{X}} (y_i;\boldsymbol{\theta}_i) \]

For each observation, let \(\mathbb{E}[y_i | \boldsymbol{x}_i ] = \mu_i\).

Define the linear predictor as the linear combination \(\boldsymbol{\eta} = \boldsymbol{X\beta}\), such that for each observation \(\eta_i = \boldsymbol{x}_i \boldsymbol{\beta}\).

Define the link function as an assumed functional form for the relationship between \(\mu_i\) and the linear predictor \(\eta_i\):

\[g(\mu_i) = \eta_i = \boldsymbol{x}_i \boldsymbol{\beta}\]

Define the mean function as the inverse of the link function, which expresses the mean \(\mu_i\) of each observation \(y_i\) as a function of the linear predictor:

\[g^{-1}(\eta_i) = \mu_i \]
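To see these definitions in action, here is a small sketch (my own illustration, not library code) of a few common link / mean-function pairs; each satisfies \(g^{-1}(g(\mu)) = \mu\) on its domain:

```python
import numpy as np

# Common link / mean-function pairs: (g, g^{-1}).
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log":      (np.log, np.exp),
    "logit":    (lambda mu: np.log(mu / (1 - mu)),
                 lambda eta: 1 / (1 + np.exp(-eta))),
}

mu = np.array([0.2, 0.5, 0.8])  # means in (0, 1), valid for all three pairs
for name, (g, g_inv) in links.items():
    # Round trip: the mean function undoes the link function.
    assert np.allclose(g_inv(g(mu)), mu), name
```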

Our goal is to connect (or link) a non-linear transformation of the mean of \(Y\) to a linear combination of the \(X\)s. We call the function which defines the mean of \(Y\) as a function of \(\boldsymbol{X\beta}\) the mean function. We call its inverse, which defines \(\boldsymbol{X\beta}\) as a function of the mean of \(Y\), the link function.
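In practice, statistical software handles the link and its inverse for us. As a sketch of how the whole pipeline looks in Python's statsmodels, here is a Poisson GLM with its default log link; the data is simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a Poisson response whose log-mean is linear in one predictor.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
X = sm.add_constant(x)              # design matrix [1, x]
mu_true = np.exp(0.5 + 1.2 * x)     # g^{-1}(eta) with the true beta
y = rng.poisson(mu_true)

model = sm.GLM(y, X, family=sm.families.Poisson())  # log link by default
result = model.fit()

eta_hat = X @ result.params          # linear predictor, X times beta-hat
mu_hat = result.predict(X)           # predictions on the mean scale

# For the log link, the mean function is exp: g^{-1}(eta) = e^eta.
assert np.allclose(mu_hat, np.exp(eta_hat))
```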

In the next section we will use a concrete example to explore one implementation of this idea, logistic regression.


[^1]: For example, the mean of the exponential distribution is \(1/\lambda\), the inverse of its parameter.