The biggest empirical Bayes estimator in history

Categories

bayes
Author

Ryan Giordano

Published

August 25, 2025

In the context of our SBI reading group, Jakob Heiss and Drew Nguyen insightfully suggested reading about TabPFN (e.g. Müller et al. (2021) and several subsequent papers). TabPFN is basically SBI on SBI for univariate supervised learning posterior predictive targets. I find it incredible that this actually works, especially since it flies in the face of my intuition that SBI should get easier when it has to generalize less, e.g. when you have highly targeted proposal distributions that focus on the region of interest in the space of possible model/data pairs, as in SNL (Papamakarios, Sterratt, and Murray (2019)).

As we went back and forth about why it is even possible to do such a thing as TabPFN, Jakob made the following interesting point. The prior for TabPFN consists of randomly generated decision trees, multi-layer perceptrons, and feature transformations. Why these things? Because decades of research in supervised learning have shown empirically that these are the functional forms capable of usefully representing real-world processes. We might then reasonably assume the converse: that randomly generated functions in this class produce datasets with realistic relationships between regressors and responses.
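To make the idea concrete, here is a toy sketch of one draw from such a prior: a randomly initialized MLP plays the role of a random "true" regression function, and we sample inputs and noisy responses from it. This is only an illustration of the concept; the actual TabPFN prior also mixes in random decision trees and feature transformations, and all the function and parameter names below are my own invention.

```python
import numpy as np

def sample_random_mlp_dataset(n_samples=128, n_features=4, hidden=16, noise_sd=0.1, rng=None):
    """Draw one synthetic dataset from a toy TabPFN-style prior.

    A randomly initialized one-hidden-layer MLP defines the 'true'
    regression function; inputs and noisy responses are sampled from it.
    (Illustrative sketch only, not the actual TabPFN prior.)
    """
    rng = np.random.default_rng(rng)
    # Random inputs (the regressors).
    X = rng.normal(size=(n_samples, n_features))
    # Random MLP weights act as one prior draw over functions.
    W1 = rng.normal(size=(n_features, hidden)) / np.sqrt(n_features)
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
    # Forward pass with a tanh nonlinearity, plus observation noise.
    f = np.tanh(X @ W1 + b1) @ W2
    y = f[:, 0] + noise_sd * rng.normal(size=n_samples)
    return X, y

# Each call yields a fresh (X, y) pair, i.e. one synthetic supervised-learning
# problem of the kind TabPFN is trained on.
X, y = sample_random_mlp_dataset(rng=0)
```

A prior-fitted network is then trained on millions of such draws, which is why the choice of this generating family is exactly a choice of prior.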

In this story, starting from all the possible ways of representing functional forms that we can think of, we have collectively used out-of-sample performance on a zillion problems to select the functional forms "most likely" to generate the relationship in an unseen dataset. Each model fit, each ML blog entry, each textbook reading, each homework assignment, and each ML seminar has been performing noisy hill climbing on the TabPFN prior, as represented informally in the universe of collectively agreed-upon ML best practices.

Viewed this way, these decades of supervised learning have taken the form of empirical Bayes selection of the TabPFN prior — probably the biggest empirical Bayes estimator in all of history. And once you see it this way, we might also reasonably ask whether, if we now rely too much on TabPFN, we run the risk of terminating the empirical Bayes learning process too early.

References

Müller, S., N. Hollmann, S. Arango, J. Grabocka, and F. Hutter. 2021. “Transformers Can Do Bayesian Inference.” arXiv Preprint arXiv:2112.10510.
Papamakarios, G., D. Sterratt, and I. Murray. 2019. “Sequential Neural Likelihood: Fast Likelihood-Free Inference with Autoregressive Flows.” In The 22nd International Conference on Artificial Intelligence and Statistics, 837–48. PMLR.