F distribution

The F distribution will not often describe real-world random variables such as the waiting time between buses or the number of phone calls being simultaneously placed through a single cell tower. Instead, the F distribution describes the sampling distribution of several important model statistics under their null hypotheses.

In particular, the F distribution is used to perform joint tests of multiple hypotheses. A common use of an F-test in regression analysis is to test whether several slopes \(\beta_i, \ldots, \beta_j\) might actually all be zero:

\[H_0: \beta_i = \beta_{i+1} = \cdots = \beta_j = 0\]

Assumptions

The F distribution is the ratio of two independent chi-squared random variables, each normalized by its degrees of freedom. That is, if \(X \sim \chi^2_{(df_1)}\) and \(Y \sim \chi^2_{(df_2)}\) are independent, then

\[\frac{X/df_1}{Y/df_2} \sim F_{(df_1,df_2)}\]
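This construction is easy to check by simulation. Below is a minimal sketch (the degrees of freedom and sample size are arbitrary choices) that builds F-distributed draws from two independent chi-squared variables and compares their quantiles against draws from R's built-in rf():

# Build F-distributed draws from two independent chi-squared variables
set.seed(1)
df1 <- 4
df2 <- 20
x <- rchisq(1e5, df = df1)
y <- rchisq(1e5, df = df2)
f_from_ratio <- (x / df1) / (y / df2)

# Compare against draws from R's built-in F generator
f_direct <- rf(1e5, df1 = df1, df2 = df2)
quantile(f_from_ratio, c(0.25, 0.50, 0.75, 0.95))
quantile(f_direct,     c(0.25, 0.50, 0.75, 0.95))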

Degrees of freedom

The two parameters for F-distributed variables are called the “numerator degrees of freedom” and the “denominator degrees of freedom”.

  • In general, for regression analysis, the numerator degrees of freedom will be equal to the number of simultaneously tested hypotheses (e.g., the number of betas being tested).

  • In general, for regression analysis, the denominator degrees of freedom will be the error degrees of freedom, \(n - k - 1\) where \(k\) is the number of non-intercept betas in the model.

  • When the F-test compares two nested models, the numerator df is generally equal to the difference in the number of parameters between the two models (essentially, we are testing whether the extra parameters are all zero), and the denominator df is equal to the error degrees of freedom of the larger model (see the sketch below).
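For example, the nested-model version of the test can be run with base R's anova(). This is a minimal sketch using the built-in mtcars data; the choice of variables is purely illustrative:

# Does adding hp and qsec improve on a model with wt alone?
smaller <- lm(mpg ~ wt, data = mtcars)
larger  <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# anova() reports the F-test comparing the nested models:
# numerator df = 2 (the two extra slopes), denominator df = 32 - 3 - 1 = 28
anova(smaller, larger)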

Definition

\[\begin{array}{ll} \text{Support:} & \mathbb{R}^+ \\ \text{Parameter(s):} & df_1,\text{ the numerator degrees of freedom }(df_1 \in \mathbb{Z}^+) \\ & df_2,\text{ the denominator degrees of freedom }(df_2 \in \mathbb{Z}^+) \\ \text{PDF:} & (complex) \\ \text{CDF:} & (also\; complex) \\ \text{Mean:} & \mathbb{E}[X] = \frac{df_2}{df_2 - 2} \quad (df_2 > 2) \\ \text{Variance:} & \mathbb{V}[X] = \frac{2\,df_2^2 \cdot (df_1 + df_2 - 2)}{df_1 \cdot (df_2 - 2)^2 \cdot (df_2 - 4)} \quad (df_2 > 4) \\ \end{array}\]
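A quick way to sanity-check the mean and variance formulas is simulation. The sketch below uses arbitrary degrees of freedom (df1 = 6, df2 = 12); with \(df_2 > 4\), both moments exist:

# Sanity check of the moment formulas (df1 = 6, df2 = 12 are arbitrary)
set.seed(42)
df1 <- 6
df2 <- 12
draws <- rf(1e6, df1 = df1, df2 = df2)

mean(draws)        # simulated mean
df2 / (df2 - 2)    # theoretical mean: 1.2

var(draws)         # simulated variance
(2 * df2^2 * (df1 + df2 - 2)) /
  (df1 * (df2 - 2)^2 * (df2 - 4))   # theoretical variance: 0.96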

Visualizer

Because the F distribution has two parameters, it can assume a wide variety of distribution “shapes”. Try examining the PDF of an F distribution under four scenarios:

  • When both \(df_1\) and \(df_2\) are small
  • When \(df_1 \gg df_2\)
  • When \(df_2 \gg df_1\)
  • When both \(df_1\) and \(df_2\) are large
#| standalone: true
#| viewerHeight: 650

library(shiny)
library(bslib)

ui <- page_fluid(
  # Prevent horizontal scrollbars inside the embedded app
  tags$head(tags$style(HTML("body {overflow-x: hidden;}"))),
  title = "F distribution PDF",
  fluidRow(plotOutput("distPlot")),
  fluidRow(column(width=6,sliderInput("df1","Numerator df", min=1, max=25, value=5)),
           column(width=6,sliderInput("df2","Denominator df", min=5, max=125, value=5))))

server <- function(input, output) {
  output$distPlot <- renderPlot({
    # F density over a grid of x values for the chosen degrees of freedom
    x <- seq(0,5,0.01)
    y <- df(x,input$df1,input$df2)
    plot(x=x,y=y,main=NULL,xlab='x',ylab='Density',type='l',lwd=2)
    # Dashed vertical line at the theoretical mean, df2/(df2 - 2)
    abline(v=input$df2/(input$df2 - 2),col='#0000ff',lwd=2,lty=2)
    text(input$df2/(input$df2 - 2),0.1,pos=4,labels='mean',col='#0000ff')
  })
}

shinyApp(ui = ui, server = server)

Properties

The construction of an F test resembles the engineering concept of a “signal-to-noise ratio”. Since the expectation of a chi-squared variable equals its own degrees of freedom, each chi-squared divided by its degrees of freedom has expectation 1, so under the null hypothesis we would expect an F statistic to be close to 1.

If the F statistic is instead much greater than 1 (say, 5, or 50), then the normalized “signal” is 5x or 50x stronger than the normalized “noise” in the data, which is usually strong evidence we can use to reject the null.
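Turning an observed F statistic into a p-value only requires the upper-tail probability of the corresponding F distribution, which pf() provides. The numbers below are made up for illustration (an F statistic of 5 from a joint test of 3 slopes with 30 error degrees of freedom):

# Hypothetical observed F statistic and degrees of freedom
f_stat  <- 5
p_value <- pf(f_stat, df1 = 3, df2 = 30, lower.tail = FALSE)
p_value   # well below 0.05, so strong evidence against the null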

Relations to other distributions

  • If \(T \sim t_{df_2}\), then \(T^2 \sim F_{(1,df_2)}\). In other words, an F-distributed variable with one numerator degree of freedom is the square of a t-distributed variable (its square root has the distribution of \(|t_{df_2}|\)).

  • Because of this, any t-test for whether a single beta could be zero (such as those shown in the regression summary table) is equivalent to an F-test of whether “all one” of those betas could be zero. This is why the p-value of a model’s omnibus F test matches the p-value of the slope \(\hat{\beta}_1\) in a simple regression.
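This equivalence is easy to verify with any simple regression. Here is a minimal sketch using the built-in mtcars data; the variable choice is arbitrary:

# Simple regression: one non-intercept slope, so the slope's t-test and the
# model's omnibus F-test are testing the same hypothesis
fit <- lm(mpg ~ wt, data = mtcars)
s   <- summary(fit)

t_stat <- s$coefficients["wt", "t value"]
f_stat <- s$fstatistic["value"]

t_stat^2   # equals the F statistic
f_stat

# ...and the p-values match
s$coefficients["wt", "Pr(>|t|)"]
pf(f_stat, s$fstatistic["numdf"], s$fstatistic["dendf"], lower.tail = FALSE)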