Selected Publications
PyMC: a modern, and comprehensive probabilistic programming framework in Python
PyMC is a probabilistic programming library for Python that provides tools for constructing and fitting Bayesian models. It offers an intuitive, readable syntax that is close to the natural syntax statisticians use to describe models. PyMC leverages the symbolic computation library PyTensor, allowing it to be compiled into a variety of computational backends, such as C, JAX, and Numba, which in turn offer access to different computational architectures including CPU, GPU, and TPU. Being a general modeling framework, PyMC supports a variety of models including generalized hierarchical linear regression and classification, time series, ordinary differential equations (ODEs), and non-parametric models such as Gaussian processes (GPs). We demonstrate PyMC’s versatility and ease of use with examples spanning a range of common statistical models. Additionally, we discuss the positive role of PyMC in the development of the open-source ecosystem for probabilistic programming.
PreliZ: A tool-box for prior elicitation
In a Bayesian modeling workflow, a prior distribution can be chosen in different ways as long as it captures the uncertainty about model parameters prior to observing any data. In particular, prior elicitation refers to the process of transforming the knowledge of a particular domain into well-defined probability distributions. Here we introduce PreliZ, a Python package aimed at helping practitioners choose prior distributions.
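As a rough illustration of what "transforming domain knowledge into a well-defined probability distribution" can mean, the sketch below turns a statement such as "I am 90% sure the value lies between 150 and 190" into a normal prior. It uses only the Python standard library and is not the PreliZ API; the helper name normal_from_interval, the choice of a normal family, the symmetric placement of the interval, and the example numbers are all assumptions made for illustration.

```python
from statistics import NormalDist


def normal_from_interval(lower, upper, mass=0.9):
    """Hypothetical helper: normal prior with `mass` probability in [lower, upper].

    Assumes the interval is central (symmetric around the mean).
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if not 0 < mass < 1:
        raise ValueError("mass must be strictly between 0 and 1")
    # Centre the distribution on the midpoint of the stated interval.
    mu = (lower + upper) / 2
    # Standard-normal quantile that leaves (1 - mass) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    # Scale so that mu +/- z * sigma matches the interval bounds.
    sigma = (upper - lower) / (2 * z)
    return NormalDist(mu, sigma)


# Example with made-up numbers: 90% of the prior mass between 150 and 190.
prior = normal_from_interval(150.0, 190.0, mass=0.9)
covered = prior.cdf(190.0) - prior.cdf(150.0)
```

Here covered recovers the requested mass up to floating-point error, which is a quick check that the elicited distribution encodes the stated belief.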
Prior knowledge elicitation: The past, present, and future
Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principle. In practice, however, we are still fairly far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models in academia and industry. We analyze the state of the art by identifying a range of key aspects of prior knowledge elicitation. The existing prior elicitation literature is reviewed and categorized in these terms. This allows recognizing under-studied directions in prior elicitation research, finally leading to a proposal of several new avenues to improve prior elicitation methodology.
Bambi: A simple interface for fitting Bayesian linear models in Python
Bambi (BAyesian Model Building Interface) is an open source Python package that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to those found in the popular R packages lme4, nlme, rstanarm and brms.
A call for changing data analysis practices: from philosophy and comprehensive reporting to modeling approaches and back
Many applied disciplines have recognized problems related to the practice of data analysis within their own communities. Some of them have even declared the existence of a statistical crisis that has raised doubts about findings that were once considered well established. In biological sciences, the recognition of misuse or poor reporting of statistics has only begun to be noticed, and is still far behind other disciplines where reforms are currently being explored. These problems are at least partially related to an unclear understanding of the purpose of the statistical tools or the correct interpretation of statistics themselves (e.g. p-values, confidence intervals, Bayes factors). We consider the ways in which data analysis is taught, performed, and presented in journals to be the main issues. A successful statistical analysis requires both statistical skills and also the ability and willingness to put the statistical results in the context of a particular problem. Here we list some of the issues we think require urgent attention, provide some evidence for misuse and poor reporting practices in the plant-soil sciences, and conclude by offering feasible solutions to both the frequentist and Bayesian data analysis paradigms. We do not advocate for one of these paradigms over the other; instead we provide recommendations for the appropriate use of each to answer scientific questions. We also hope this opinion paper gives plant-soil researchers an entry point into the statistical literature to facilitate self-teaching and to properly apply, report, and draw inferences from either the classic frequentist or Bayesian statistical methods.
Bayesian additive regression trees for probabilistic programming
Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on the sum of many trees where priors are used to regularize inference, mainly by restricting trees' learning capacity so that no individual tree is able to explain the data, but rather the sum of trees. We discuss BART in the context of probabilistic programming languages (PPLs). Specifically, we introduce a BART implementation extending PyMC, a Python library for probabilistic programming. We present a few examples of models that can be built using this probabilistic programming-oriented version of BART, discuss recommendations for sample diagnostics and selection of model hyperparameters, and finally we close with limitations of the current approach and future extensions.
Exploring the quality of protein structural models from a Bayesian perspective
We explore how ideas and practices common in Bayesian modeling can be applied to assess the quality of 3D protein structural models. The basic premise of our approach is that the evaluation of a Bayesian statistical model's fit may reveal aspects of the quality of a structure when the fitted data is related to protein structural properties. Therefore, we fit a Bayesian hierarchical linear regression model to experimental and theoretical 13Cα chemical shifts. Then, we propose two complementary approaches for the evaluation of such fitting: (a) in terms of the expected differences between experimental and posterior predicted values; (b) in terms of the leave-one-out cross-validation point-wise predictive accuracy. Finally, we present visualizations that can help interpret these evaluations.
ArviZ a unified library for exploratory analysis of Bayesian models in Python
ArviZ is a Python library for exploratory analysis of Bayesian models. It includes functions for diagnosing the quality of the inference; model criticism, including evaluations of both model assumptions and model predictions; model selection and model averaging; and preparation of the results for a particular audience. ArviZ supports several probabilistic programming libraries in Python, such as PyMC, PyStan, Pyro, emcee, NumPyro, and TensorFlow Probability. Additionally, it can be used from Julia.