F distribution
The F distribution will not often describe real-world random variables such as the waiting time between buses or the number of phone calls being simultaneously placed through a single cell tower. Instead, the F distribution describes the sampling distribution of several interesting model statistics under their null hypotheses.
In particular, the F distribution is used to perform joint tests of multiple hypotheses. A common use of an F-test in regression analysis is to test whether multiple slopes \(\beta_i, \ldots, \beta_j\) might actually all be zero:
- When the betas in question are all the non-intercept betas in a regression model, the F-test becomes a test of whether a model should be fit at all, that is, whether \(Y\) really can be explained by any of the available predictors.
- When the betas in question are the only terms that differ between two nested models, the F-test becomes a test of whether the larger model explains significantly more of the response variable \(Y\).
- When the betas in question are all the dummy variables which code for the non-reference levels of a categorical predictor, the F-test becomes a test of whether the entire categorical predictor adds significant explanatory power to the model (regardless of whether any individual level mean is statistically different from the reference level mean).
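As a concrete illustration of the first case, here is a minimal sketch of the omnibus F-test using R's built-in `mtcars` data (the choice of predictors here is purely illustrative):

```r
# Omnibus F-test: can mpg be explained by ANY of these predictors?
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
s <- summary(fit)

# The F statistic and its two degrees-of-freedom parameters
s$fstatistic

# p-value of the omnibus test: P(F > observed) under the null
pf(s$fstatistic[["value"]], s$fstatistic[["numdf"]], s$fstatistic[["dendf"]],
   lower.tail = FALSE)
```

The numerator df is 3 (the three tested slopes) and the denominator df is \(32 - 3 - 1 = 28\), since `mtcars` has 32 rows.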
Assumptions
The F distribution is a ratio of two different chi-squared distributions, normalized by their degrees of freedom. That is, if \(X \sim \chi^2_{(df_1)}\) and \(Y \sim \chi^2_{(df_2)}\), then
\[\frac{X/df_1}{Y/df_2} \sim F_{(df_1,df_2)}\]
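A quick simulation check of this construction (the particular degrees of freedom below are arbitrary choices): draw two independent chi-squared variables, form the normalized ratio, and compare its quantiles to the theoretical F quantiles from `qf()`.

```r
set.seed(42)
n <- 1e5
df1 <- 3
df2 <- 10

# Ratio of two chi-squared draws, each divided by its degrees of freedom
x <- rchisq(n, df = df1)
y <- rchisq(n, df = df2)
ratio <- (x / df1) / (y / df2)

# Simulated vs. theoretical quantiles should closely agree
quantile(ratio, c(0.25, 0.5, 0.75))
qf(c(0.25, 0.5, 0.75), df1, df2)
```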
Degrees of freedom
The two parameters for F-distributed variables are called the “numerator degrees of freedom” and the “denominator degrees of freedom”.
In general, for regression analysis, the numerator degrees of freedom will be equal to the number of simultaneously tested hypotheses (e.g. betas).
In general, for regression analysis, the denominator degrees of freedom will be the error degrees of freedom, \(n - k - 1\) where \(k\) is the number of non-intercept betas in the model.
When the F-test compares two nested models, the numerator df is generally equal to the difference in the number of parameters between the two models (essentially, we are testing whether these extra betas are all zero), and the denominator df is equal to the error degrees of freedom of the larger model.
Definition
\[\begin{array}{ll} \text{Support:} & \mathbb{R}^+ \\ \text{Parameter(s):} & df_1,\text{ the numerator degrees of freedom }(df_1 \in \mathbb{Z}^+) \\ & df_2,\text{ the denominator degrees of freedom }(df_2 \in \mathbb{Z}^+) \\ \text{PDF:} & (complex) \\ \text{CDF:} & (also\; complex) \\ \text{Mean:} & \mathbb{E}[X] = \frac{df_2}{df_2 - 2} \text{ for } df_2 > 2 \\ \text{Variance:} & \mathbb{V}[X] = \frac{2\,df_2^2 \cdot (df_1 + df_2 - 2)}{df_1 \cdot (df_2 - 2)^2 \cdot (df_2 - 4)} \text{ for } df_2 > 4 \\ \end{array}\]
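The mean and variance formulas can be sanity-checked by simulation with `rf()`; the degrees of freedom below are arbitrary choices satisfying \(df_2 > 4\):

```r
set.seed(1)
df1 <- 4
df2 <- 20
draws <- rf(1e6, df1, df2)

# Simulated mean vs. theoretical mean df2 / (df2 - 2)
mean(draws)
df2 / (df2 - 2)

# Simulated variance vs. the theoretical variance formula
var(draws)
2 * df2^2 * (df1 + df2 - 2) / (df1 * (df2 - 2)^2 * (df2 - 4))
```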
Visualizer
Because the F distribution has two parameters, it can assume a wide variety of distribution “shapes”. Try examining the PDF of an F distribution under four scenarios:
- When both \(df_1\) and \(df_2\) are small
- When \(df_1 \gg df_2\)
- When \(df_2 \gg df_1\)
- When both \(df_1\) and \(df_2\) are large
```{shinylive-r}
#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 650
library(shiny)
library(bslib)

ui <- page_fluid(
  tags$head(tags$style(HTML("body {overflow-x: hidden;}"))),
  title = "F distribution PDF",
  fluidRow(plotOutput("distPlot")),
  fluidRow(
    column(width = 6, sliderInput("df1", "Numerator df", min = 1, max = 25, value = 5)),
    column(width = 6, sliderInput("df2", "Denominator df", min = 5, max = 125, value = 5))
  )
)

server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- seq(0, 5, 0.01)
    y <- df(x, input$df1, input$df2)
    plot(x = x, y = y, main = NULL, xlab = 'x', ylab = 'Density', type = 'l', lwd = 2)
    # Dashed vertical line at the theoretical mean, df2 / (df2 - 2)
    abline(v = input$df2 / (input$df2 - 2), col = '#0000ff', lwd = 2, lty = 2)
    text(input$df2 / (input$df2 - 2), 0.1, pos = 4, labels = 'mean', col = '#0000ff')
  })
}

shinyApp(ui = ui, server = server)
```
Properties
The construction of an F test resembles the engineering concept of a “signal-to-noise ratio”. Since the expectation of any chi-squared variable is its own degrees of freedom, each chi-squared term divided by its degrees of freedom has expectation near 1, so under the null hypothesis we would expect an F statistic to be close to 1.
If the F statistic is instead much greater than 1 (say, 5, or 50), then the normalized “signal” is 5x or 50x stronger than the normalized “noise” in the data, which is usually strong evidence we can use to reject the null.
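This intuition matches the p-values returned by `pf()`; the degrees of freedom below (3 and 28) are illustrative choices:

```r
# An F statistic near 1 is unremarkable under the null...
pf(1, df1 = 3, df2 = 28, lower.tail = FALSE)   # large p-value

# ...while an F statistic of 5 with these df is strong evidence against it
pf(5, df1 = 3, df2 = 28, lower.tail = FALSE)   # small p-value
```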
Relations to other distributions
If \(T \sim t_{(df_2)}\) (i.e. T is t-distributed), then \(T^2 \sim F_{(1,df_2)}\) (i.e. the square of T is F-distributed with one numerator degree of freedom).
Because of this, any two-sided t-test for whether a single beta could be zero (such as those shown in the regression summary table) is equivalent to an F-test of whether “all one” of those betas could be zero. This is why the p-value of a model’s omnibus F test matches the p-value of the slope \(\hat{\beta}_1\) in a simple regression.
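A quick sketch of this equivalence in a simple regression (again using the built-in `mtcars` data as an arbitrary example):

```r
# In simple regression, the slope's t statistic squared equals the omnibus F
fit <- lm(mpg ~ wt, data = mtcars)
t_val <- summary(fit)$coefficients["wt", "t value"]
f_val <- summary(fit)$fstatistic[["value"]]

c(t_squared = t_val^2, F = f_val)   # the two values agree
```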