Bayesian Generalization :: Probability & Probabilistic Computing Tutorial

https://josephausterweil.github.io/probintro/intro2/07_generalization/index.html
Sun, 31 May 2026 00:00:00 +0000

How do you learn a concept from a handful of examples? You see three numbers that fit a hidden rule, or a few bentos with a golden sticker, and somehow you know which other things fit too. This chapter shows that the same Bayes’ rule you already know becomes a model of human generalization once you make a single shift: a hypothesis is a set.

Try it yourself: a companion notebook builds the number game and the size principle interactively. 📓 Open in Colab: 07_generalization.ipynb

Setup & the Framework

What you’re bringing with you

This chapter changes exactly one thing about what you already know. It is worth saying up front what stays the same, because almost everything does.

📦 You already have all of these. Everything in this chapter is built out of tools you have used in earlier chapters:

- Bayes’ rule as posterior ∝ likelihood × prior. You have used this in every chapter that involved learning: updating a belief by multiplying a prior by a likelihood and renormalizing. (Review in Chapter 4.)
- The predictive distribution: “given what I’ve seen, what should I expect for the next observation?” You met the posterior predictive in Chapter 4 (“what weight should I expect for the next bento?”). (Review in Chapter 4.)
- Conditioning = restricting to what’s consistent with the data. Observing something throws away every possibility that disagrees with it, and you renormalize over what survives. (Review in the GenJAX tutorial, Chapter 4.)
- Categorization: computing P(category | observation) when an item could belong to one of several groups. You met this with two Gaussians in the Chapter 4 preview and in the Gaussian-clusters work. (Review in Chapter 5.)

The one new idea: in every one of those chapters, the unknown you reasoned about was either a number (a mean μ) or a yes/no fact (is this bento tonkatsu? is the taxi blue?). In this chapter, the unknown becomes a set, a rule about which things share a property. That single shift, from “which value?” to “which set?”, is the whole content of Bayesian generalization. The machinery for reasoning about it is the machinery you already have.

The Number Game & the Size Principle

Generalization is a posterior-weighted vote

Now the payoff. We can state, in one line, how to predict whether a novel item $y$ has the property. The probability is the total posterior belief sitting on the hypotheses that contain $y$:

$$p(y \in C \mid X) = \sum_{h \in \mathcal{H}} \mathbf{1}[y \in h] \cdot p(h \mid X).$$

Read it in plain words first: every hypothesis casts a vote. Each hypothesis’s vote is weighted by how much we now believe it, that is, by its posterior $p(h \mid X)$. A hypothesis votes “yes” for $y$ only if it actually contains $y$, which is what $\mathbf{1}[y \in h]$ encodes. Sum the yes-votes and you have the prediction. This is the two-rule calculation from §2, written for any number of rules.
Every symbol in it was defined in §4.

Continuous Concepts & Shepard's Law

On-ramp 3: continuous concepts and the rectangle game

So far every hypothesis space has been a finite list, namely the seven number-rules of §4–§6. That’s why we could enumerate: score every rule, normalize, done. But many real concepts live on a continuous scale. “Healthy blood-sugar level,” “a comfortable room temperature,” and “roughly lunchtime” are each an interval on some axis, and there are infinitely many candidate intervals. Does the framework still work?

No Free Lunch & Summary

No Free Lunch: why the prior is unavoidable

Every section so far leaned on a hypothesis space $\mathcal{H}$ that we chose: seven sensible number-rules, “multiples of $k$” for the numbers, intervals for the continuous case. We even saw in §4 that the choice of $\mathcal{H}$ is a prior, because anything left out has probability zero. It’s natural to feel uneasy about that. Isn’t choosing $\mathcal{H}$ cheating? Shouldn’t a truly unbiased learner consider every possible rule and let the data sort it out? This last section shows why the answer is a hard no, and why the prior isn’t a wart on the method but the very thing that makes learning possible.
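The posterior-weighted vote $p(y \in C \mid X) = \sum_{h} \mathbf{1}[y \in h]\, p(h \mid X)$ can be sketched in a few lines of Python. The same sketch also previews the No Free Lunch point. It is a minimal illustration under assumptions chosen here, not the chapter’s own code. Those assumptions are:

- a toy universe of the numbers 1 to 10;
- a uniform prior;
- the size-principle likelihood in its standard strong-sampling form, $p(X \mid h) = (1/|h|)^n$ when all $n$ examples lie in $h$ and $0$ otherwise;
- a structured space of “multiples of $k$” for $k = 1, \dots, 5$.

The sketch compares that structured space with an “unbiased” space containing every nonempty subset of the universe. The helper names `posterior` and `generalize` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

UNIVERSE = range(1, 11)  # hypothetical toy universe: the numbers 1..10


def posterior(hypotheses, data):
    """Bayes' rule over set-valued hypotheses, with a uniform prior."""
    prior = Fraction(1, len(hypotheses))
    scores = {}
    for h in hypotheses:
        if all(x in h for x in data):
            # Assumed size-principle likelihood: (1/|h|)^n for consistent h.
            scores[h] = prior * Fraction(1, len(h)) ** len(data)
        else:
            # Conditioning: sets that disagree with the data get zero weight.
            scores[h] = Fraction(0)
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}  # renormalize


def generalize(y, post):
    """p(y in C | X): sum the posterior of every hypothesis containing y."""
    return sum((p for h, p in post.items() if y in h), Fraction(0))


# Structured space: "multiples of k" for k = 1..5 (an illustrative choice).
multiples = [frozenset(n for n in UNIVERSE if n % k == 0) for k in range(1, 6)]

# "Unbiased" space: every nonempty subset of the universe (2**10 - 1 sets).
all_subsets = [
    frozenset(c)
    for r in range(1, len(UNIVERSE) + 1)
    for c in combinations(UNIVERSE, r)
]

X = [4, 8]
post_rules = posterior(multiples, X)
post_all = posterior(all_subsets, X)

# Columns: item, structured-space prediction, all-subsets prediction.
for y in [6, 3, 5]:
    print(y, generalize(y, post_rules), float(generalize(y, post_all)))
```

With $X = \{4, 8\}$, the structured learner puts 5/6 of its posterior on “multiples of 4”. It therefore gives 6 a probability of 1/6 but gives 3 and 5 only 1/30.

The all-subsets learner behaves differently: it assigns the same probability to every unseen number. Its data never tell it that 6 is a better bet than 3. This follows from symmetry: relabeling the unseen numbers leaves its hypothesis space, uniform prior, and size-only likelihood unchanged.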