2 Probability as Degree of Belief

Learning Objectives
  • Distinguish between the frequentist and Bayesian interpretations of probability
  • Understand the three axioms of probability and why they constrain belief
  • Compute conditional probabilities and apply Bayes’ theorem to simple problems
  • Recognize how prior beliefs influence posterior conclusions

2.1 Two Interpretations of a Simple Number

What does it mean to say there is a 30% chance of rain tomorrow?

The frequentist answers: it means that in a large number of days with atmospheric conditions like today’s, roughly 30% of them were followed by rain. Probability is a property of a class of repeated events. You cannot meaningfully assign a probability to a unique, non-repeatable event — like the outcome of this particular election, or whether a specific patient will survive surgery.

The Bayesian answers differently: a 30% chance of rain is a statement about your state of knowledge. It reflects how confident you are, given everything you know about the current conditions. Probability is not a property of the world; it is a property of the relationship between the world and an observer with incomplete information.

Both interpretations produce the same arithmetic. The difference is philosophical — but philosophy, in this case, has practical consequences that will become apparent when we get to Chapter 3.

2.2 The Axioms and What They Buy You

Whether you are a frequentist or a Bayesian, probability obeys three axioms, stated compactly by Andrey Kolmogorov in 1933:

  1. \(P(A) \geq 0\) for any event \(A\): probabilities are non-negative.
  2. \(P(\Omega) = 1\): something in the sample space always happens.
  3. \(P(A \cup B) = P(A) + P(B)\) whenever \(A\) and \(B\) are mutually exclusive: probabilities of disjoint events add. (Kolmogorov's own statement extends this to any countable sequence of pairwise disjoint events.)

These look innocuous. But they have teeth. They imply, for instance, that \(P(A) + P(\neg A) = 1\) — your probability that something happens and your probability that it doesn’t must sum to one. This rules out a certain kind of incoherence: you cannot simultaneously believe there is a 70% chance of rain and a 50% chance of no rain.
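The step from the axioms to this identity is short. Because \(A\) and \(\neg A\) are disjoint and together make up the whole sample space, axiom 3 and then axiom 2 give:

\[
P(A) + P(\neg A) = P(A \cup \neg A) = P(\Omega) = 1.
\]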

A coherent set of beliefs is one that could not be exploited by a clever betting opponent. The Dutch book argument, developed by Frank Ramsey and Bruno de Finetti, makes this precise: if you are willing to buy or sell a bet at the price your probabilities say is fair, and those probabilities violate the axioms, an opponent can offer you a set of bets that guarantees you a net loss regardless of what happens.
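A minimal sketch of the argument, using the 70% rain / 50% no rain beliefs from above. It assumes a simple betting model (an illustration, not a general proof): a ticket pays a fixed stake if its event occurs, the agent treats probability × stake as a fair price, and the agent will buy (positive stake) or sell (negative stake) at that price. The function name `agent_net_payoff` and the `too_low` beliefs are hypothetical.

```python
from fractions import Fraction


def agent_net_payoff(beliefs, outcome, stake=1):
    """Agent's net gain when, for every event e, they trade a ticket that pays
    `stake` if e occurs, at the price beliefs[e] * stake they consider fair.
    stake > 0: the agent buys each ticket; stake < 0: the agent sells it.
    Assumes the events in `beliefs` are mutually exclusive and exhaustive."""
    total = Fraction(0)
    for event, p in beliefs.items():
        payout = stake if event == outcome else 0   # only the realized event's ticket pays
        total += payout - p * stake                 # payout received minus price paid
    return total


# Incoherent beliefs from the text: 0.7 + 0.5 = 1.2 > 1, so the bookie sells both tickets.
too_high = {"rain": Fraction(7, 10), "no rain": Fraction(1, 2)}
# Hypothetical beliefs summing to 0.9 < 1: the bookie buys both tickets instead (stake = -1).
too_low = {"rain": Fraction(6, 10), "no rain": Fraction(3, 10)}
# Coherent beliefs: these tickets cannot produce a sure loss.
coherent = {"rain": Fraction(7, 10), "no rain": Fraction(3, 10)}

for outcome in ("rain", "no rain"):
    print(outcome,
          agent_net_payoff(too_high, outcome, stake=1),
          agent_net_payoff(too_low, outcome, stake=-1),
          agent_net_payoff(coherent, outcome, stake=1))
```

Whatever the weather, the agent holding the 70%/50% beliefs pays 1.2 for tickets that return exactly 1, a sure loss of 0.2. When beliefs sum to less than one, the bookie simply takes the other side of the trade; only the coherent beliefs break even in every outcome.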

2.3 Conditional Probability

Most interesting questions involve conditional probability: given that we know something, how does that change what we expect?

\[P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{2.1}\]

Read this as: the probability of \(A\) given \(B\) is the share of the probability of \(B\) that also falls in \(A\); it is defined only when \(P(B) > 0\). The intuition is simple: conditioning on \(B\) restricts the sample space to the outcomes in which \(B\) is true, then asks how often \(A\) occurs within that restricted space.
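Equation 2.1 can be checked by brute-force enumeration on a small, fully known sample space. This sketch assumes two fair six-sided dice (an illustrative example, not from the text) and uses exact fractions; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform distribution on OMEGA; `event` is a predicate."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), as in Eq. (2.1); undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return prob(lambda w: a(w) and b(w)) / p_b


def sum_at_least_10(w):   # event A
    return w[0] + w[1] >= 10


def first_die_six(w):     # event B
    return w[0] == 6


print(prob(sum_at_least_10))                        # 1/6
print(cond_prob(sum_at_least_10, first_die_six))    # 1/2
```

Learning that the first die shows a six shrinks the universe to 6 outcomes, 3 of which sum to at least 10, so the probability of \(A\) rises from 1/6 to 1/2.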

2.4 Bayes’ Theorem: The Arithmetic of Updating

Bayes’ theorem is just a rearrangement of the definition of conditional probability. It is also one of the most useful results in all of science.

\[P(H \mid D) = \frac{P(D \mid H) \cdot P(H)}{P(D)} \tag{2.2}\]

Here \(H\) is a hypothesis and \(D\) is the observed data. In words:

  • \(P(H)\) is the prior — your belief in \(H\) before seeing \(D\).
  • \(P(D \mid H)\) is the likelihood — how probable the data would be if \(H\) were true.
  • \(P(H \mid D)\) is the posterior — your updated belief after seeing \(D\).
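  • \(P(D)\) is the marginal likelihood: the probability of the data averaged over all competing hypotheses. It acts as a normalizing constant that makes the posterior probabilities of those hypotheses sum to one.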
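The rearrangement takes one line: apply Equation 2.1 in both directions to the same joint probability and divide by \(P(D)\), assuming \(P(D) > 0\). For a set of mutually exclusive, exhaustive hypotheses \(H_1, H_2, \ldots\), the law of total probability then expands the denominator:

\[
\begin{aligned}
P(H \cap D) &= P(H \mid D)\,P(D) = P(D \mid H)\,P(H)
  \;\Rightarrow\; P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}, \\
P(D) &= \sum_{i} P(D \mid H_i)\,P(H_i).
\end{aligned}
\]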

The theorem tells you, precisely, how much a piece of evidence should change your mind.
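A standard illustration is a diagnostic test. The numbers below are hypothetical, chosen only to show the arithmetic: a test with 90% sensitivity, \(P(D \mid H)\), and a 5% false-positive rate, \(P(D \mid \neg H)\), applied under two different priors. The function name `posterior` is our own.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem, Eq. (2.2).

    prior               -- P(H): prevalence of the disease before testing
    sensitivity         -- P(D | H): probability of a positive test if diseased
    false_positive_rate -- P(D | not H): probability of a positive test if healthy
    """
    # Marginal likelihood P(D), by the law of total probability over H and not-H.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Same (hypothetical) test, two different priors.
for prior in (0.01, 0.10):
    print(f"prior={prior:.2f} -> posterior={posterior(prior, 0.90, 0.05):.3f}")
```

With these assumed numbers, the same positive result lifts a 1% prior to only about 15%, but a 10% prior to about 67%. The evidence is identical; the prior decides how far it carries you.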

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

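rng = np.random.default_rng(42)
true_theta = 0.7          # coin is biased toward heads
n_flips = 30
flips = rng.binomial(1, true_theta, n_flips)   # 1 = heads, 0 = tails

theta = np.linspace(0, 1, 500)   # grid of candidate values for the bias θ
alpha, b_param = 1, 1     # start with flat (uniform) Beta(1, 1) prior

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=False)
checkpoints = [5, 15, 30]        # plot the posterior after this many flips

for ax, n in zip(axes, checkpoints):
    heads = flips[:n].sum()
    tails = n - heads
    # Conjugacy: a Beta(alpha, b) prior combined with a binomial likelihood
    # yields a Beta(alpha + heads, b + tails) posterior.
    posterior = beta.pdf(theta, alpha + heads, b_param + tails)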
    ax.plot(theta, posterior, color="#4e79a7", linewidth=2)
    ax.axvline(true_theta, color="red", linestyle="--", linewidth=1, label=f"True θ={true_theta}")
    ax.set_title(f"After {n} flips ({heads}H / {tails}T)")
    ax.set_xlabel("θ")
    ax.set_ylabel("Density")
    ax.legend(fontsize=8)

plt.suptitle("Bayesian updating — the posterior tightens around the truth", y=1.02)
plt.tight_layout()
plt.show()
Figure 2.1: Bayesian updating of a coin’s bias θ from a flat Beta(1, 1) prior, shown after 5, 15, and 30 flips. As flips accumulate, the posterior concentrates around the true bias (dashed line).
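Figure 2.1 shows the posterior only graphically. A numeric summary makes the role of the prior concrete. This sketch assumes hypothetical data (21 heads in 30 flips, not the seeded flips behind the figure) and compares the flat prior used above with an illustrative Beta(20, 20) prior concentrated near a fair coin; both the data and the second prior are assumptions.

```python
from scipy.stats import beta

heads, tails = 21, 9  # hypothetical data: 30 flips, 21 heads

# Two illustrative priors: flat, and one concentrated near a fair coin (θ = 0.5).
priors = {"flat Beta(1, 1)": (1, 1), "fair-leaning Beta(20, 20)": (20, 20)}

for name, (a0, b0) in priors.items():
    a_post, b_post = a0 + heads, b0 + tails        # conjugate Beta-binomial update
    mean = beta.mean(a_post, b_post)               # equals a_post / (a_post + b_post)
    lo, hi = beta.interval(0.95, a_post, b_post)   # central 95% credible interval
    print(f"{name}: mean={mean:.3f}, 95% credible interval=({lo:.3f}, {hi:.3f})")
```

Under the flat prior the posterior mean is 22/32, about 0.69; the fair-leaning prior pulls it toward 0.5, to 41/70, about 0.59. With more flips the data would increasingly dominate either prior.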

2.5 Summary

  • Probability has two main interpretations: frequentist (long-run frequency) and Bayesian (degree of belief). They share the same axioms but answer different questions.
  • Conditional probability \(P(A \mid B)\) restricts the sample space to cases where \(B\) is true.
  • Bayes’ theorem is the rule for updating beliefs in response to evidence: posterior ∝ likelihood × prior.
  • Incoherent beliefs — those that violate the probability axioms — can be exploited by a clever opponent.

2.6 Further Reading

Jaynes (2003), Probability Theory: The Logic of Science, is the definitive case for probability as logic rather than frequency. For a shorter and more accessible treatment, Nate Silver’s The Signal and the Noise (2012) covers Bayesian thinking in the real world without heavy mathematics.