Binomial distribution

The Binomial distribution counts the number of successes in a fixed number of independent Bernoulli trials.

Assumptions

Let \(B_1, \ldots, B_n\) be \(n\) independent and identically distributed Bernoulli variables with parameter \(p\). Define their sum as

\[X = \sum_{i=1}^n B_i\]

Then \(X \sim \mathrm{Binomial}(n,p)\).

Two key premises of the Binomial distribution are that (i) \(n\), the number of trials, is fixed beforehand,¹ and (ii) the probability of success \(p\) is the same on every trial and does not depend on the outcomes of previous trials.
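The construction above can be checked by simulation. The following is a minimal sketch, not part of the original text: the seed, the values of `n`, `p` and `reps`, and all variable names are illustrative assumptions. It sums \(n\) Bernoulli(\(p\)) draws many times and compares the observed frequencies with `dbinom()`.

```r
set.seed(1)      # assumed seed, only for reproducibility
n    <- 10       # trials per experiment (illustrative value)
p    <- 0.3      # success probability (illustrative value)
reps <- 100000   # number of simulated experiments

# Each row holds one experiment of n Bernoulli(p) trials
trials <- matrix(rbinom(n * reps, size = 1, prob = p), nrow = reps)

# X is the row sum: the number of successes in each experiment
x <- rowSums(trials)

# Observed relative frequencies of 0..n successes vs. the Binomial PMF
empirical   <- as.numeric(table(factor(x, levels = 0:n))) / reps
theoretical <- dbinom(0:n, size = n, prob = p)
round(cbind(k = 0:n, empirical, theoretical), 4)
```

With this many replications the two columns should agree to roughly two decimal places; the remaining gap is simulation noise.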

Definition

\[\begin{array}{ll} \text{Support:} & \{0,1,2,\ldots,n\} \\ \text{Parameter(s):} & p,\text{ the probability of success }(p \in [0,1]) \\ & n,\text{ the number of trials }(n \in \mathbb{Z}^+) \\ \text{PMF:} & P(X=k) = \binom{n}{k} p^k (1-p)^{n-k} \\ \text{CDF:} & F_X(x) = \left\{\begin{array}{cl} 0, & \quad x \lt 0 \\ \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i}, & \quad 0 \le x \lt n \\ 1, & \quad x \ge n \end{array}\right. \\ \text{Mean:} & \mathbb{E}[X] = np \\ \text{Variance:} & \mathbb{V}[X] = np(1-p) \\ \end{array}\]
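As a rough check of the formulas above, the PMF, CDF, mean and variance can be computed directly and compared with R's built-in `dbinom()` and `pbinom()`. The values of `n` and `p` and the `*_manual` variable names are assumptions chosen for illustration.

```r
n <- 15    # illustrative number of trials
p <- 0.5   # illustrative success probability
k <- 0:n   # the full support

# PMF from the formula: choose(n, k) * p^k * (1 - p)^(n - k)
pmf_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# CDF at integer points is the running sum of the PMF
cdf_manual <- cumsum(pmf_manual)

# Compare with the built-in functions (both should print TRUE)
all.equal(pmf_manual, dbinom(k, size = n, prob = p))
all.equal(cdf_manual, pbinom(k, size = n, prob = p))

# Mean and variance computed from the PMF, next to np and np(1 - p)
mean_manual <- sum(k * pmf_manual)
var_manual  <- sum((k - mean_manual)^2 * pmf_manual)
c(mean = mean_manual, np = n * p,
  variance = var_manual, npq = n * p * (1 - p))
```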

Visualizer

#| standalone: true
#| viewerHeight: 650

library(shiny)
library(bslib)

ui <- page_fluid(
  tags$head(tags$style(HTML("body {overflow-x: hidden;}"))),
  title = "Binomial distribution PMF",
  fluidRow(plotOutput("distPlot")),
  fluidRow(column(width=6,sliderInput("n", "Trials (n)", min=5, max=25, step=1, value=15)),
           column(width=6,sliderInput("p", "Probability (p)", min=0, max=1, step=0.01,value=0.5))))

server <- function(input, output) {
  output$distPlot <- renderPlot({
    plot(x=0:input$n,y=dbinom(x=0:input$n,input$n,input$p),main=NULL,
         xlab='x (Successes)',ylab='Probability',type='h',lwd=3)})
}

shinyApp(ui = ui, server = server)

Relations to other distributions

  • A Binomial(\(n,p\)) variable has the same distribution as the sum of \(n\) i.i.d. Bernoulli(\(p\)) variables.

    • As a corollary, when \(n=1\), a Binomial(\(n,p\)) variable is simply a Bernoulli(\(p\)) variable.
  • For large \(n\), the Binomial distribution is well approximated by the Normal distribution with \(\mu = np\) and \(\sigma^2 = np(1-p)\). The approximation improves as both \(np\) and \(n(1-p)\) grow, that is, when \(n\) is large and \(p\) is not near 0 or 1. Accuracy improves further with a “continuity correction”: to approximate \(P(X \le k)\), evaluate the Normal CDF at \(k + 0.5\) rather than at \(k\) (and at \(k - 0.5\) for \(P(X \lt k)\)). For example, compare:

pbinom(q=155,size=200,prob=0.8)
[1] 0.2112617
pnorm(q=155,mean=200*0.8,sd=sqrt(200*0.8*0.2))
[1] 0.1883796
pnorm(q=155.5,mean=200*0.8,sd=sqrt(200*0.8*0.2))
[1] 0.2131628
  • The Poisson distribution is a limiting case of the Binomial distribution: as \(n\) grows toward infinity and \(p\) shrinks to zero with \(\lambda = np\) held fixed, Binomial(\(n,p\)) converges to Poisson(\(\lambda\)). When the counts typically observed are far below the theoretical upper bound \(n\), the Poisson can often be used in place of the Binomial.

    For example, say that a town has 20,000 homes, and that each home independently has a 0.1% chance of having a termite infestation. The current number of termite infestations is technically a Binomial(20000,0.001) variable, but it is well approximated by a Poisson(20) variable:

dbinom(x=6:10,size=20000,prob=0.001)
[1] 0.0001823448 0.0005213501 0.0013042233 0.0029000148 0.0058032228
dpois(x=6:10,lambda=20)
[1] 0.0001832137 0.0005234676 0.0013086690 0.0029081533 0.0058163065
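The limiting behaviour can also be illustrated numerically. The sketch below is an illustration under assumed values, not part of the original example: it holds \(\lambda = np = 20\) fixed, increases \(n\), and records the largest pointwise gap between the Binomial and Poisson PMFs. The grid `k`, the vector `sizes`, and the name `max_gap` are hypothetical choices.

```r
lambda <- 20                       # fixed expected count, lambda = n * p
k      <- 0:60                     # grid of counts covering most of the mass
sizes  <- c(50, 200, 2000, 20000)  # increasing numbers of trials (illustrative)

# For each n, set p = lambda / n and take the largest absolute difference
# between the Binomial(n, p) and Poisson(lambda) PMFs over the grid.
# dbinom() returns 0 for counts above n, so the grid is safe for n = 50.
max_gap <- sapply(sizes, function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda = lambda)))
})

data.frame(n = sizes, p = lambda / sizes, max_gap = max_gap)
```

The gap shrinks as \(n\) grows and \(p\) falls, which is the pattern the limit statement describes.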

  1. Unlike, say, the Geometric distribution, in which trials continue until the first success occurs.