Causal Inference, Bayesian Reasoning, and the Science of Learning

Author

Troy Altus

Published

May 1, 2026

NoteLearning Objectives
  • Understand what we mean by learning in both human and machine contexts
  • Distinguish between the three major frameworks for reasoning under uncertainty
  • Appreciate why causal thinking is fundamentally different from statistical pattern-matching
  • Get a map of the book’s five-part journey

0.1 The Oracle Problem

In the summer of 1854, a physician named John Snow did something that was, at the time, considered deeply strange. There was a cholera epidemic tearing through the Soho district of London, and Snow did not reach for the prevailing theory — that the disease traveled through foul air, the so-called miasma. Instead, he walked the neighborhood with a notebook, marking deaths on a map. The cluster that emerged pointed, with uncomfortable precision, at a water pump on Broad Street.

Snow had not discovered a statistical correlation. He had identified a cause.

The distinction matters enormously. Thousands of people lived near the pump and did not die. Dozens died who lived farther away but happened to work near it. The pattern alone did not tell the story. Snow had to reason about mechanisms — about how cholera spread, about which water was shared and which was not, about what would happen if someone intervened and removed the pump handle.

That last word, intervene, turns out to be the key to everything in this book.

0.2 Three Ways to Reason Under Uncertainty

Modern quantitative reasoning comes in roughly three flavors, and they answer subtly different questions.

The first is frequentist statistics. A frequentist asks: given that this hypothesis is true, how often would I see data like this? It is a powerful framework, responsible for much of the scientific infrastructure of the twentieth century. But it has a notable quirk: it cannot directly tell you how probable a hypothesis is. It can only tell you how improbable the data would be if the hypothesis were false.

The second is Bayesian reasoning. A Bayesian asks: given this data, how should I update my beliefs? This is closer to how humans actually think. You begin with some prior expectation — cholera probably travels in water, because of what you know about other waterborne pathogens — and you revise it as evidence accumulates. The mathematics is straightforward, but the philosophical implications have been argued about for two centuries and show no signs of settling down.

The third is causal inference. A causal reasoner asks: what would happen if I changed something? Not what pattern do I see, but what would the world look like under a different intervention? This framework, formalized largely in the last forty years, is arguably the most powerful of the three — and the least widely understood.

This book covers all three, and more. But the thread connecting them is a single idea: how do you reason well when you don’t know everything?

0.3 A Map of the Book

Part I builds the foundation. Probability is not just arithmetic — it is a way of encoding belief, and Bayesian inference is the machinery that keeps those beliefs honest as evidence arrives. These two chapters are the bedrock on which everything else rests.

Part II takes up causality directly. Why does correlation mislead us so reliably? What is a cause, precisely, and how can you identify one without running a controlled experiment? The tools here — directed graphs, the do-calculus, counterfactual reasoning — were developed by researchers who were frustrated that statistics could describe the world but not explain it.

Part III explores what happens when the world refuses to be crisp. Classical logic deals in true and false. Fuzzy logic deals in somewhat true and more or less false, which turns out to be a much better description of how categories actually work — in both language and in the brain.

Part IV surveys the landscape of learning algorithms. Starting from linear regression and working through neural networks, deep learning, and causal machine learning, this section asks a practical question: what are these systems actually doing, and what are the limits of what they can learn?

Part V turns the lens on human cognition. The Bayesian brain hypothesis. The way people reason about causes and interventions. The fuzzy, prototype-based way that concepts seem to be stored. And finally, the uncomfortable question: what can machines learn that humans cannot, and vice versa?

Code
import numpy as np
import matplotlib.pyplot as plt

# Prior: we believe the coin is fair (Beta(1,1) = uniform)
# We flip 10 times and see 7 heads. How should our belief update?

from scipy.stats import beta

theta = np.linspace(0, 1, 300)
prior = beta.pdf(theta, 1, 1)           # flat prior
posterior_5h = beta.pdf(theta, 6, 2)    # 5 heads, 1 tails (Beta(1+5, 1+1))
posterior_7h = beta.pdf(theta, 8, 3)    # 7 heads, 3 tails (Beta(1+7, 1+3))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(theta, prior, label="Prior (no data)", linestyle="--", color="gray")
ax.plot(theta, posterior_5h, label="After 5 heads in 6 flips", color="#4e79a7")
ax.plot(theta, posterior_7h, label="After 7 heads in 10 flips", color="#f28e2b")
ax.set_xlabel("θ (probability of heads)")
ax.set_ylabel("Probability density")
ax.set_title("Bayesian belief update — coin bias")
ax.legend()
plt.tight_layout()
plt.show()
Figure 1: A coin flip and a belief update — the simplest possible Bayesian story.

The figure above shows the simplest possible Bayesian story: we begin with no opinion about whether a coin is fair, watch it land heads seven times in ten flips, and update our beliefs accordingly. The posterior distribution — our revised opinion after seeing data — has shifted toward higher values of θ. It has not moved to certainty that the coin is biased, because ten flips is not a lot of evidence. It has moved proportionally to the evidence, which is exactly what we would hope rational reasoning to do.

That proportionality is what this book is about.

0.4 Summary

  • Learning, in the broadest sense, is the process of updating beliefs in response to evidence.
  • Frequentist statistics, Bayesian reasoning, and causal inference are three complementary frameworks, each suited to different questions.
  • The key question that separates causal reasoning from statistical reasoning is: what would happen if we intervened?
  • The five parts of this book follow a connected arc from probability and belief, through causality and fuzzy logic, through learning algorithms, to the architecture of human cognition.

0.5 Further Reading

Pearl and Mackenzie (2018) provides an accessible entry into causal reasoning, written for a general audience — an excellent companion to Part II of this book. Jaynes (2003) is the foundational text on probability as logic; dense, but enormously rewarding. For the history of statistical thinking, Salsburg’s The Lady Tasting Tea is a pleasure.