Research
StochTree: BART-based modeling in R and Python
Herren, A., Hahn, P.R., Murray, J. & Carvalho, C. (2025)
arXiv | Paper
stochtree is a C++ library for Bayesian tree ensemble models such as BART and Bayesian Causal Forests (BCF), as well as user-specified variations. Unlike previous BART packages, stochtree provides bindings to both R and Python for full interoperability. stochtree boasts a more comprehensive range of models relative to previous packages, including heteroskedastic forests, random effects, and treed linear models. Additionally, stochtree offers flexible handling of model fits: the ability to save model fits, reinitialize models from existing fits (facilitating improved model initialization heuristics), and pass fits between R and Python. On both platforms, stochtree exposes lower-level functionality, allowing users to specify models incorporating Bayesian tree ensembles without needing to modify C++ code. We illustrate the use of stochtree in three settings: i) straightforward applications of existing models such as BART and BCF, ii) models that include more sophisticated components like heteroskedasticity and leaf-wise regression models, and iii) as a component of custom MCMC routines to fit nonstandard tree ensemble models.
Deep Learning for Causal Inference: A Comparison of Architectures for Heterogeneous Treatment Effect Estimation
Papakostas, D., Herren, A., Hahn, P.R. & Castillo, K. (2024)
arXiv | Paper
This work began as an in-depth comparison of neural network architectures for causal inference for a PhD course taken by three of its authors. Credit for polishing it and adding an in-depth study of a sleep dataset goes to Demetri Papakostas. This work ultimately convinced me of the importance of decision trees for tabular datasets (which underpin many causal inference problems) and motivated my work on BART / BCF.
On true versus estimated propensity scores for treatment effect estimation with discrete controls
Herren, A. & Hahn, P. R. (2023)
arXiv | Paper
The finite sample variance of an inverse propensity weighted estimator is derived in the case of discrete control variables with finite support. The obtained expressions generally corroborate widely-cited asymptotic theory showing that estimated propensity scores are superior to true propensity scores in the context of inverse propensity weighting. However, similar analysis of a modified estimator demonstrates that foreknowledge of the true propensity function can confer a statistical advantage when estimating average treatment effects.
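The contrast between true and estimated propensity scores can be illustrated with a small simulation. The sketch below is not the paper's derivation (which is analytical); it is a pure-Python Monte Carlo comparison under an invented data-generating process with a single binary covariate, where the estimated propensity score is just the treated fraction within each stratum:

```python
import random
import statistics

def ipw_ate(n, use_estimated_ps, rng):
    """One Monte Carlo draw of the IPW ATE estimator with a binary covariate."""
    true_e = {0: 0.3, 1: 0.7}                 # true propensity P(Z=1 | X=x)
    data = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        z = int(rng.random() < true_e[x])
        y = 2.0 * z + x + rng.gauss(0, 1)     # true ATE = 2
        data.append((x, z, y))
    if use_estimated_ps:
        # plug-in propensity: treated fraction within each stratum of X
        e = {x_val: sum(z for x, z, _ in data if x == x_val)
                    / sum(1 for x, _, _ in data if x == x_val)
             for x_val in (0, 1)}
    else:
        e = true_e
    # Horvitz-Thompson style IPW estimate of the ATE
    return sum(z * y / e[x] - (1 - z) * y / (1 - e[x])
               for x, z, y in data) / n

rng = random.Random(0)
true_ps = [ipw_ate(2000, False, rng) for _ in range(200)]
est_ps = [ipw_ate(2000, True, rng) for _ in range(200)]
print("true PS: mean", statistics.mean(true_ps), "var", statistics.variance(true_ps))
print("est. PS: mean", statistics.mean(est_ps), "var", statistics.variance(est_ps))
```

Both versions are unbiased for the ATE here; with this kind of data-generating process the estimated-propensity version typically exhibits smaller sampling variance, consistent with the asymptotic theory the paper revisits.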
Feature selection in stratification estimators of causal effects
Hahn, P. R. & Herren, A. (2022)
arXiv | Paper
What role (if any) can modern, machine-learning-based feature selection techniques play in average treatment effect (ATE) estimation in causal inference? This work addresses the question under several assumptions:
- Discrete covariates
- No post-treatment covariates
- No unmeasured confounding
In this case, a stratification estimator that estimates and re-aggregates the treatment-control contrasts \(\bar{Y}_{Z=1,X=x} - \bar{Y}_{Z=0,X=x}\) for each unique \(x\) in the covariate space will identify the ATE.
The paper fuses three frameworks for doing causal inference (Causal DAGs, Potential Outcomes, and Structural Equations) and uses important concepts from each framework. We establish some theory on minimality and optimality of adjustment sets and then illustrate the problems and pitfalls of feature selection in a series of examples.
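The stratification estimator described above can be sketched in a few lines. This is an illustrative implementation on toy data, not code from the paper; it assumes overlap (every stratum contains both treated and control units), and the function name is invented:

```python
from collections import defaultdict

def stratified_ate(data):
    """ATE via stratification: within-stratum treated-control mean contrasts,
    re-aggregated with weights proportional to stratum frequency.
    Assumes every stratum has at least one treated and one control unit."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, z, y in data:
        strata[x][z].append(y)
    n = len(data)
    ate = 0.0
    for groups in strata.values():
        contrast = (sum(groups[1]) / len(groups[1])
                    - sum(groups[0]) / len(groups[0]))
        ate += contrast * (len(groups[0]) + len(groups[1])) / n
    return ate

# toy data: tuples (x, z, y); within each stratum treatment shifts y by +1
data = [(0, 0, 1.0), (0, 1, 2.0), (0, 0, 1.0), (0, 1, 2.0),
        (1, 0, 3.0), (1, 1, 4.0), (1, 1, 4.0), (1, 0, 3.0)]
print(stratified_ate(data))  # → 1.0
```

Feature selection enters when deciding which covariates define the strata; the paper's examples show how adding or dropping covariates changes whether this estimator identifies the ATE.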
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
Herren, A. & Hahn, P. R. (2022)
SHAP (Lundberg and Lee 2017) is a popular tool for assessing feature importance in machine learning models. This paper examines some of the statistical challenges that arise in estimating SHAP scores:
- How many synthetic samples to generate and pass through the model’s prediction function (and by what sampling scheme)
- How to choose a reference distribution for the averaging that takes place in each synthetic sample
In investigating these questions, the paper discusses several connections with the sensitivity analysis and design of experiments literature, in particular:
- Functional ANOVA and the notion of effective dimensionality (Kucherenko et al. 2009)
- Fractional factorial designs and the hypothesis of factor sparsity (Box and Meyer 1986)
Semi-supervised learning and the question of true versus estimated propensity scores
Herren, A. & Hahn, P. R. (2020)
Suppose we have data:
- \(Y\): an outcome of interest
- \(Z\): a treatment that may (or may not) causally impact the outcome
- \(X\): a set of control variables that may be related to \(Z\), \(Y\), or both
and suppose we’re willing to make all of the assumptions that would allow us to identify and estimate a causal effect of \(Z\) on \(Y\) after adjusting for \(X\).
If we were given a large amount of data from the same distribution, but without \(Y\) observed (“unlabeled data”), can we use that data in our estimate of the causal effect?
The answer, it turns out, is “yes!” but the explanation is somewhat more subtle than “more data is always better.” This paper explores the challenges and opportunities that come with doing causal inference on “unlabeled data.”