All about that Bayes




Tucumán, March 2020

About Me


  • I am a researcher at CONICET
  • Molecular biologist by training; I used to work on problems in biophysics and structural bioinformatics
  • I work on the development and implementation of software tools for Bayesian statistics and probabilistic modeling


https://aloctavodia.github.io/

Bayesian Statistics

Statistics is about computing p-values < 0.05 and following diagrams:

Well, not really! But this is generally what we are taught at school and university, and how statistics is used in daily scientific practice.

Bayesian Statistics


  • Bayesian statistics is about uncertainty --> like every other statistical approach ¯\_(ツ)_/¯
  • Bayesian statistics is a form of modeling --> let's call it probabilistic modeling!
  • Probability models generate data --> we then invert the model to get the parameters from the data


Bayesian Data Analysis



We could call Bayesian data analysis “statistics using conditional probability”, but that wouldn’t put the butts in the seats. Andrew Gelman.



Bayesian statistics*

* AKA Probabilistic modeling, Bayesian modeling, probabilistic machine learning ...

The Posterior


  • The posterior is the central quantity of Bayesian statistics
  • It is the logical consequence of the model and the data (platonic world)
  • By integrating over the posterior we get quantities of interest from it (see the sketch after this list):
    • Means, credible intervals, quantiles
    • Predictions
    • Marginalization of nuisance parameters
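
A minimal sketch of what "integrating over the posterior" looks like in practice: with posterior samples in hand (faked here with NumPy; all names and values are hypothetical), these quantities are just averages and quantiles over the samples.

import numpy as np

# stand-in for MCMC draws of a single parameter
posterior = np.random.normal(0.5, 0.1, size=1000)

post_mean = posterior.mean()                    # posterior mean
ci_94 = np.percentile(posterior, [3, 97])       # 94% credible interval
predictions = np.random.normal(posterior, 1.0)  # posterior predictive draws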



Why do I like the "statistics is a form of modeling" motto?


  • Models are tools to help us understand stuff, not mathemagical decision-making tools
  • Modeling is more about suitability than about Truth --> scientific fictions¹ can be useful
  • It is OK to play with models: change them, adapt them, reuse them, make them simpler or more complex, etc.


¹Science in the Age of Computer Simulation

A Gaussian Model
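
The original slide presumably carried a figure; written out, and borrowing the prior widths from the PyMC3 example later in this talk, a simple Gaussian model is:

$$\mu \sim \mathop{N}(0, 10)$$

$$\sigma \sim \mathop{HN}(25)$$

$$\mathit{Y} \sim \mathop{N}(\mu, \sigma)$$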



Why would we want to model something in a Bayesian way?


  • It is the *correct* way (platonic world, pedantic answer)
  • It is a useful and flexible approach (real world)
    • uncertainty propagation
    • data efficiency
      • a principled way to incorporate domain knowledge
      • it comes from integration instead of optimization
    • a unified and simple conceptual framework $\implies$ practical flexibility to create custom models


Bayesian Deep Learning and a Probabilistic Perspective of Generalization

A unified conceptual framework: Linear models as a common example


The usual way

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_m x_{im}, \qquad \underbrace{\mathbf{y} = \mathbf{X} \; \mathit{\beta}}_{\text{matrix notation}}$$


  • Generally the $\beta$ coefficients are estimated using ordinary least squares (see the sketch after this list)
  • The result is a point estimate, the single best line fitting the data
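
A minimal NumPy sketch of the ordinary least squares point estimate (all names and data here are hypothetical):

import numpy as np

x = np.random.normal(size=100)                  # hypothetical predictor
y = 2.5 + 1.3 * x + np.random.normal(size=100)  # hypothetical outcome

X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept
β_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # the single best-fitting line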


A probabilistic linear model


$$\mu = \mathbf{X} \mathit{\beta}$$

$$\mathit{Y} \sim \mathop{N}(\mu, \sigma)$$


  • As we do not know the values of the $\mathit{\beta}$ vector or $\sigma$, we have to set priors
  • If we go the full Bayesian way we get a distribution of plausible parameters --> a distribution of lines!
  • If the priors are flat, the most probable values of the parameters will be the same as those from ordinary least squares
  • If the priors are Gaussian, this is equivalent to ridge regression (a regularization method --> a way to reduce overfitting)

Generalized linear models


Student-t regression (robust to outliers)

$$\mu = \mathbf{X} \mathit{\beta}$$

$$\mathit{Y} \sim \mathop{T}(\mu, \sigma, \nu)$$

Logistic regression (binary outcomes)

$$\mu = \operatorname{logistic}(\mathbf{X} \mathit{\beta})$$

$$\mathit{Y} \sim \mathop{Bin}(\mu)$$

Poisson regression (count outcomes)

$$\mu = \exp(\mathbf{X} \mathit{\beta})$$

$$\mathit{Y} \sim \mathop{Poisson}(\mu)$$

Whatever regression ¯\_(ツ)_/¯

$$\mu = f(\mathbf{X} \mathit{\beta})$$

$$\mathit{Y} \sim \mathop{\phi}(\mu, \theta)$$
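
As a concrete instance, a minimal PyMC3 sketch of logistic regression, assuming X is a predictor matrix and y an array of 0/1 outcomes (both hypothetical):

import pymc3 as pm

with pm.Model() as model_logistic:
    β = pm.Normal('β', 0, 10, shape=X.shape[1])     # priors on the coefficients
    μ = pm.math.sigmoid(pm.math.dot(X, β))          # inverse-link function
    y_obs = pm.Bernoulli('y_obs', p=μ, observed=y)  # likelihood for binary data
    trace_logistic = pm.sample(1000)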

Generalized additive models



$$\mu = \sum_{j=1}^m g_j(X)$$

$$\mathit{Y} \sim \mathop{\phi}(\mu, \theta)$$


where the $g_j$ can be splines, trees, etc.
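
A minimal PyMC3 sketch of one such model, assuming a precomputed spline basis matrix B (e.g. built with patsy; B and y are hypothetical):

import pymc3 as pm

with pm.Model() as model_gam:
    w = pm.Normal('w', 0, 1, shape=B.shape[1])    # spline coefficients
    σ = pm.HalfNormal('σ', 5)
    μ = pm.math.dot(B, w)                         # the smooth function g(X)
    y_obs = pm.Normal('y_obs', μ, σ, observed=y)  # likelihood
    trace_gam = pm.sample(1000)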


Either this is madness or it is Hell¹


¹Flatland: A Romance of Many Dimensions

Probabilistic Programming



  • A probabilistic programming language is a programming language designed to describe probabilistic models and to perform inference over those models
  • The promise: a clear separation between modeling and inference. Practitioners should focus on modeling, not on computational or mathematical details



The two cultures and the probabilistic programming plane (Work in progress)


  • Do not take this too seriously...

Closing the gap


  • Contribute to redesigning how we teach and use statistical methods
    • In many ways, modern Bayesian statistics is closer to data science than to "classical statistics"
  • Combine ideas and practices from the statistical culture and the machine learning culture
    • Both cultures have important things to contribute; we just need more communication, and tools that make the differences even more diffuse
  • In order to close the gap we need more fancy stuff:
    • Efficient and accurate model-agnostic Bayesian inference algorithms
    • High-quality software
    • An iterative Bayesian workflow, including model building and model verification (numerical and visual tools)



Machine Learning and Statistics: Don't Mind the Gap.

PyMC3: Probabilistic programming with Python


  • Model building

    • A large collection of probability distributions
    • A clear and powerful syntax
    • Integration with the PyData stack
  • Inference

    • Markov Chain Monte Carlo (NUTS, MH)
    • Sequential Monte Carlo (SMC, SMC-ABC)
    • Variational Inference
  • Computational backend

    • Theano --> speed, automatic differentiation, mathematical optimizations, GPU support
    • PyMC4 --> TensorFlow Probability

A Gaussian Model with PyMC3



import pymc3 as pm

# y: observed data (a 1-d NumPy array)
with pm.Model() as model:
    μ = pm.Normal('μ', 0, 10)  # Prior
    σ = pm.HalfNormal('σ', 25)  # Prior
    y_obs = pm.Normal('y_obs', μ, σ, observed=y)  # Likelihood

    trace = pm.sample(1000)   # Inference engine



A Bayesian linear model with PyMC3



import pymc3 as pm

# X: design matrix (n_obs × n), y: observed outcomes, n: number of predictors
with pm.Model() as model_l:
    β = pm.Normal('β', 0, 10, shape=n)  # Prior
    σ = pm.HalfNormal('σ', 25)  # Prior
    μ = pm.math.dot(X, β)  # Linear model
    y_obs = pm.Normal('y_obs', μ, σ, observed=y)  # Likelihood

    trace_l = pm.sample(1000)  # Inference engine



ArviZ: Exploratory analysis of Bayesian models




  • Diagnostics of the quality of the inference
  • Model criticism, including evaluations of both model assumptions and model predictions
  • Comparison of models, including model selection and model averaging
  • Preparation of the results for a particular audience

  • Works with PyMC3, PyStan, Pyro, emcee, TensorFlow Probability...

  • Offers a unified data structure, InferenceData, based on xarray (see the sketch below)
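
A minimal sketch, assuming trace comes from the PyMC3 Gaussian model above:

import arviz as az

idata = az.from_pymc3(trace)  # convert the trace to InferenceData
az.summary(idata)             # means, credible intervals, ess, r_hat
az.plot_trace(idata)          # visual sampling diagnostics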

Bambi: BAyesian Model-Building Interface in Python







  • A high-level Bayesian model-building interface written in Python (see the sketch below)
  • Uses PyMC3 and PyStan as backends
  • Uses ArviZ for diagnostics and plotting
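
A minimal sketch, assuming a pandas DataFrame data with hypothetical columns 'y' and 'x', and the Bambi API as of early 2020:

import bambi as bmb

model = bmb.Model(data)       # attach the data
results = model.fit('y ~ x')  # formula interface; sampling runs on the backend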








Image: Markov Chain Monte Carlo, https://github.com/ColCarroll/imcmc