<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bayesian Generalization :: Probability &amp; Probabilistic Computing Tutorial</title><link>https://josephausterweil.github.io/probintro/intro2/07_generalization/index.html</link><description>Bayesian Generalization
How do you learn a concept from a handful of examples? You see three numbers that fit a hidden rule (or a few bentos with a golden sticker), and somehow you know which other things fit too. This chapter shows that the same Bayes’ rule you already know becomes a model of human generalization once you make a single shift: a hypothesis is a set.
Try it yourself: a companion notebook, 07_generalization.ipynb, builds the number game and the size principle interactively and can be opened in Colab.</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 31 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://josephausterweil.github.io/probintro/intro2/07_generalization/index.xml" rel="self" type="application/rss+xml"/><item><title>Setup &amp; the Framework</title><link>https://josephausterweil.github.io/probintro/intro2/07_generalization/setup-and-framework/index.html</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://josephausterweil.github.io/probintro/intro2/07_generalization/setup-and-framework/index.html</guid><description>What you’re bringing with you
This chapter changes exactly one thing about what you already know, and it is worth saying up front what stays the same, because almost everything does.
📦 You already have all of these
Everything in this chapter is built out of tools you have used in earlier chapters:
Bayes’ rule as posterior ∝ likelihood × prior. You have used this in every chapter that involved learning: updating a belief by multiplying a prior by a likelihood and renormalizing. (Review in Chapter 4.)
The predictive distribution: “given what I’ve seen, what should I expect for the next observation?” You met the posterior-predictive in Chapter 4 (“what weight should I expect for the next bento?”). (Review in Chapter 4.)
Conditioning means restricting to what’s consistent with the data. Observing something throws away every possibility that disagrees with it, and you renormalize over what survives. (Review in the GenJAX tutorial, Chapter 4.)
Categorization: computing P(category | observation) when an item could belong to one of several groups. You met this with two Gaussians in the Chapter 4 preview and the Gaussian-clusters work. (Review in Chapter 5.)
The one new idea: in every one of those chapters, the unknown you reasoned about was either a number (a mean μ) or a yes/no fact (is this bento tonkatsu? is the taxi blue?). In this chapter, the unknown becomes a set: a rule about which things share a property. That single shift, from “which value?” to “which set?”, is the whole content of Bayesian generalization. The machinery for reasoning about it is the machinery you already have.</description></item><item><title>The Number Game &amp; the Size Principle</title><link>https://josephausterweil.github.io/probintro/intro2/07_generalization/number-game-size-principle/index.html</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://josephausterweil.github.io/probintro/intro2/07_generalization/number-game-size-principle/index.html</guid><description>Generalization is a posterior-weighted vote
Now the payoff. We can state, in one line, how to predict whether a novel item $y$ belongs to the concept $C$, given the observed examples $X$. The probability is the total posterior belief sitting on the hypotheses that contain $y$:
$$p(y \in C \mid X) = \sum_{h \in \mathcal{H}} \mathbf{1}[y \in h] \cdot p(h \mid X).$$
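To make the sum concrete, here is a minimal Python sketch of the posterior-weighted vote. It is an illustration under stated assumptions, not the chapter's own code: the five-rule hypothesis space over the numbers 1 to 100 (HYPOTHESES) is hypothetical rather than the chapter's seven rules, the prior is uniform, and the likelihood follows the size principle, assuming each example is drawn uniformly at random from the true set, so $p(X \mid h) = (1/|h|)^n$ when all $n$ examples lie in $h$ and $0$ otherwise. The helper names likelihood, posterior, and p_generalize are illustrative.

```python
# Minimal sketch of the posterior-weighted vote (hypothetical setup, not the chapter's code).
# Assumptions: numbers 1..100, a made-up five-rule hypothesis space,
# a uniform prior, and a size-principle likelihood.

DOMAIN = range(1, 101)

# Hypothetical hypothesis space: each hypothesis h is a set of numbers.
HYPOTHESES = {
    "even": {n for n in DOMAIN if n % 2 == 0},
    "odd": {n for n in DOMAIN if n % 2 == 1},
    "multiples of 4": {n for n in DOMAIN if n % 4 == 0},
    "multiples of 10": {n for n in DOMAIN if n % 10 == 0},
    "powers of 2": {2 ** k for k in range(1, 7)},  # {2, 4, 8, 16, 32, 64}
}


def likelihood(X, h):
    """Size principle (assumed): each example drawn uniformly from h."""
    if not all(x in h for x in X):
        return 0.0  # conditioning: h is ruled out if it misses any example
    return (1.0 / len(h)) ** len(X)


def posterior(X, hypotheses, prior):
    """Bayes' rule over a finite hypothesis space: likelihood x prior, renormalized."""
    scores = {name: prior[name] * likelihood(X, h) for name, h in hypotheses.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {name: s / total for name, s in scores.items()}


def p_generalize(y, X, hypotheses, prior):
    """p(y in C | X): total posterior mass on the hypotheses that contain y."""
    post = posterior(X, hypotheses, prior)
    return sum(post[name] for name, h in hypotheses.items() if y in h)


if __name__ == "__main__":
    uniform = {name: 1 / len(HYPOTHESES) for name in HYPOTHESES}
    X = [16, 8, 2, 64]
    for y in [32, 10, 3]:
        print(y, round(p_generalize(y, X, HYPOTHESES, uniform), 4))
```

With the examples 16, 8, 2, 64, only "even" (50 members) and "powers of 2" (6 members) survive conditioning. Under the uniform prior the smaller set's posterior is $(50/6)^4 \approx 4823$ times larger, so nearly all the belief lands on it: 32 (contained in both) gets a probability near 1, 10 (only in "even") gets a probability near 0, and 3 (in neither surviving rule) gets exactly 0.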
Read it in plain words first: every hypothesis casts a vote. Each one’s vote is weighted by how much we now believe it (its posterior $p(h \mid X)$), and a hypothesis votes “yes” for $y$ only if it actually contains $y$ (that’s the $\mathbf{1}[y \in h]$). Sum the yes-votes and you have the prediction. This is the two-rule calculation from §2, written for any number of rules. Every symbol in it was defined in §4.</description></item><item><title>Continuous Concepts &amp; Shepard's Law</title><link>https://josephausterweil.github.io/probintro/intro2/07_generalization/continuous-and-shepards-law/index.html</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://josephausterweil.github.io/probintro/intro2/07_generalization/continuous-and-shepards-law/index.html</guid><description>On-ramp 3: continuous concepts and the rectangle game
So far every hypothesis space has been a finite list: the seven number-rules of §4–§6. That’s why we could enumerate: score every rule, normalize, done. But many real concepts live on a continuous scale. “Healthy blood-sugar level,” “a comfortable room temperature,” “roughly lunchtime”: each is an interval on some axis, and there are infinitely many candidate intervals. Does the framework still work?</description></item><item><title>No Free Lunch &amp; Summary</title><link>https://josephausterweil.github.io/probintro/intro2/07_generalization/no-free-lunch-and-summary/index.html</link><pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate><guid>https://josephausterweil.github.io/probintro/intro2/07_generalization/no-free-lunch-and-summary/index.html</guid><description>No Free Lunch: why the prior is unavoidable
Every section so far leaned on a hypothesis space $\mathcal{H}$ that we chose: seven sensible number-rules, “multiples of $k$” for the numbers, intervals for the continuous case. We even saw in §4 that the choice of $\mathcal{H}$ is a prior: anything left out has probability zero. It’s natural to feel uneasy about that.
Isn’t choosing $\mathcal{H}$ cheating? Shouldn’t a truly unbiased learner consider every possible rule and let the data sort it out? This last section shows why the answer is a hard no — and why the prior isn’t a wart on the method but the very thing that makes learning possible.</description></item></channel></rss>