Osvaldo Martin's personal site

Selected Publications

For a full list of publications you can check Google Scholar.

Recommendations for visual predictive checks in Bayesian workflow

A key step in the Bayesian workflow for model building is the graphical assessment of model predictions, whether these are drawn from the prior or posterior predictive distribution. The goal of these assessments is to identify whether the model is a reasonable (and ideally accurate) representation of the domain knowledge and/or observed data. There are many commonly used visual predictive checks which can be misleading if their implicit assumptions do not match the reality. Thus, there is a need for more guidance for selecting, interpreting, and diagnosing appropriate visualizations. As a visual predictive check itself can be viewed as a model fit to data, assessing when this model fails to represent the data is important for drawing well-informed conclusions. We present recommendations and diagnostic tools to mitigate ad-hoc decision-making in visual predictive checks. These contributions aim to improve the robustness and interpretability of Bayesian model criticism practices. We offer recommendations for appropriate visual predictive checks for observations that are: continuous, discrete, or a mixture of the two. We also discuss diagnostics to aid in the selection of visual methods. Specifically, in the detection of an incorrect assumption of continuously-distributed data: identifying when data is likely to be discrete or contain discrete components, detecting and estimating possible bounds in data, and a diagnostic of the goodness-of-fit to data for density plots made through kernel density estimates.
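
As a rough illustration of one diagnostic mentioned above (identifying when data is likely to be discrete), the sketch below computes the fraction of observations whose value is shared with at least one other observation. This is not the diagnostic proposed in the paper: the function name `tie_fraction` and the simulated data are assumptions made for illustration. The only idea it relies on is that exact ties are essentially impossible in continuous data and common in discrete data.

```python
import random
from collections import Counter


def tie_fraction(values):
    """Fraction of observations that share their exact value with another one.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    # Count every observation whose value occurs more than once.
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(values)


rng = random.Random(0)  # fixed seed so the simulated data is reproducible

# Simulated continuous data: exact ties should not occur.
continuous_data = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Simulated count data on 0..10: almost every value is repeated.
discrete_data = [rng.randint(0, 10) for _ in range(500)]

continuous_ties = tie_fraction(continuous_data)
discrete_ties = tie_fraction(discrete_data)

print(f"continuous: {continuous_ties:.3f}, discrete: {discrete_ties:.3f}")
```

A high tie fraction would suggest that a kernel density estimate, which assumes continuously distributed data, may be a misleading choice for the predictive check.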

PyMC: a modern, and comprehensive probabilistic programming framework in Python

PyMC is a probabilistic programming library for Python that provides tools for constructing and fitting Bayesian models. It offers an intuitive, readable syntax that is close to the natural syntax statisticians use to describe models. PyMC leverages the symbolic computation library PyTensor, allowing it to be compiled into a variety of computational backends, such as C, JAX, and Numba, which in turn offer access to different computational architectures including CPU, GPU, and TPU. Being a general modeling framework, PyMC supports a variety of models including generalized hierarchical linear regression and classification, time series, ordinary differential equations (ODEs), and non-parametric models such as Gaussian processes (GPs). We demonstrate PyMC’s versatility and ease of use with examples spanning a range of common statistical models. Additionally, we discuss the positive role of PyMC in the development of the open-source ecosystem for probabilistic programming.

PreliZ: A tool-box for prior elicitation

In a Bayesian modeling workflow, a prior distribution can be chosen in different ways as long as it captures the uncertainty about model parameters prior to observing any data. Particularly, prior elicitation refers to the process of transforming the knowledge of a particular domain into well-defined probability distributions. Here we introduce PreliZ, a Python package aimed at helping practitioners choose prior distributions.
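
To make the idea of turning domain knowledge into a probability distribution concrete, here is a minimal sketch using only the Python standard library. It does not use PreliZ or reproduce its API: the helper `normal_from_interval` and the example numbers are assumptions for illustration. It takes a statement such as "I am 90% sure the parameter lies between -1 and 3" and returns the symmetric normal distribution that matches it.

```python
from statistics import NormalDist


def normal_from_interval(lower, upper, mass=0.9):
    """Normal distribution with `mass` probability inside [lower, upper].

    Hypothetical helper for illustration only; not part of PreliZ.
    The interval is treated as central, so the mean is its midpoint.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if not 0.0 < mass < 1.0:
        raise ValueError("mass must be strictly between 0 and 1")
    mu = (lower + upper) / 2
    # z-score that leaves (1 - mass) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    sigma = (upper - lower) / (2 * z)
    return NormalDist(mu, sigma)


# Assumed domain statement: 90% of the mass lies between -1 and 3.
prior = normal_from_interval(-1.0, 3.0, mass=0.9)
covered = prior.cdf(3.0) - prior.cdf(-1.0)

print(f"mean={prior.mean:.3f}, sd={prior.stdev:.3f}, covered={covered:.3f}")
```

The same matching idea extends to other distribution families, where the parameters usually have to be found numerically rather than in closed form.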

Prior knowledge elicitation: The past, present, and future

Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principle. In practice, however, we are still fairly far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models in academia and industry. We analyze the state of the art by identifying a range of key aspects of prior knowledge elicitation. The existing prior elicitation literature is reviewed and categorized in these terms. This allows recognizing under-studied directions in prior elicitation research, finally leading to a proposal of several new avenues to improve prior elicitation methodology.

Bambi: A simple interface for fitting Bayesian linear models in Python

Bambi (BAyesian Model Building Interface) is an open source Python package that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to those found in the popular R packages lme4, nlme, rstanarm and brms.

A call for changing data analysis practices: from philosophy and comprehensive reporting to modeling approaches and back

Many applied disciplines have recognized problems related to the practice of data analysis within their own communities. Some of them have even declared the existence of a statistical crisis that has raised doubts about findings that were once considered well established. In biological sciences, the recognition of misuse or poor reporting of statistics has only begun to be noticed, and is still far behind other disciplines where reforms are currently being explored. These problems are at least partially related to an unclear understanding of the purpose of the statistical tools or the correct interpretation of statistics themselves (e.g. p-values, confidence intervals, Bayes factors). We consider the ways in which data analysis is taught, performed, and presented in journals to be the main issues. A successful statistical analysis requires both statistical skills and the ability and willingness to put the statistical results in the context of a particular problem. Here we list some of the issues we think require urgent attention, provide some evidence for misuse and poor reporting practices in the plant-soil sciences, and conclude by offering feasible solutions for both the frequentist and Bayesian data analysis paradigms. We do not advocate for one of these paradigms over the other; instead we provide recommendations for the appropriate use of each to answer scientific questions. We also hope this opinion paper gives plant-soil researchers an entry point into the statistical literature to facilitate self-teaching and to properly apply, report, and draw inferences from either the classic frequentist or Bayesian statistical methods.

Bayesian additive regression trees for probabilistic programming

Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on the sum of many trees where priors are used to regularize inference, mainly by restricting trees' learning capacity so that no individual tree is able to explain the data, but rather the sum of trees. We discuss BART in the context of probabilistic programming languages (PPLs); specifically, we introduce a BART implementation extending PyMC, a Python library for probabilistic programming. We present a few examples of models that can be built using this probabilistic programming-oriented version of BART, discuss recommendations for sample diagnostics and selection of model hyperparameters, and finally we close with limitations of the current approach and future extensions.
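
The sum-of-trees idea can be illustrated with a toy example. The sketch below is not BART and involves no inference: the stumps are built by hand, and the target function f(x) = x on [0, 1) and all names are assumptions for illustration. It only shows how many weak trees, each unable to describe the function alone, can approximate it when summed.

```python
# Hand-built toy ensemble (an assumption for illustration, not fitted by BART).
# Each stump is a depth-one tree stored as (threshold, left_value, right_value).
OFFSET = 0.05  # plays the role of a single-leaf tree with a constant output
STUMPS = [(k / 10, 0.0, 0.1) for k in range(1, 10)]


def stump_predict(stump, x):
    """Output of one stump: the right leaf if x >= threshold, else the left."""
    threshold, left, right = stump
    return right if x >= threshold else left


def ensemble_predict(x):
    """Sum of all stump outputs plus the constant offset."""
    return OFFSET + sum(stump_predict(s, x) for s in STUMPS)


# Evaluate on a grid over [0, 1) against the assumed target f(x) = x.
grid = [i / 1000 for i in range(1000)]
ensemble_error = max(abs(ensemble_predict(x) - x) for x in grid)
single_error = max(abs(stump_predict(STUMPS[0], x) - x) for x in grid)

print(f"max error, sum of stumps: {ensemble_error:.3f}")
print(f"max error, one stump:     {single_error:.3f}")
```

In BART the tree structures and leaf values are inferred from data, and the priors keep each tree small so that it contributes only a small part of the total.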

ArviZ a unified library for exploratory analysis of Bayesian models in Python

ArviZ is a Python library for exploratory analysis of Bayesian models. It includes tools for diagnosing the quality of the inference; model criticism, including evaluations of both model assumptions and model predictions; model selection and model averaging; and preparation of the results for a particular audience. ArviZ supports several probabilistic programming libraries in Python, such as PyMC, PyStan, Pyro, emcee, NumPyro, and TensorFlow Probability. Additionally, it can be used from Julia.