```
using Turing, AlgebraOfGraphics, CairoMakie, DataFrames
using AlgebraOfGraphics: density
using DataFrames: stack
```

## Bayes in Julia

For the past few weeks I’ve been trying out `Julia` as a tool for Bayesian stats. There’s a lot to like about the language, most of which has been covered elsewhere. It’s intuitive, it’s fast, and it’s *fun*. Granted, it’s also a young ecosystem, so there may be fewer resources available depending on your area of interest. Fortunately, probabilistic programming and Bayesian inference is a space that’s seen plenty of growth recently.

As a mostly-`R` person, I’ve been using `brms` and `rstan` for some time now, so I was curious to try out `Turing.jl` (here) for my Bayesian workflow. Don’t get me wrong, the Stan-based ecosystem is wonderful, and I’d say `brms` is the friendliest tool to get your feet wet with Bayesian modeling. Also, the amount of tutorials, guides, and general content available is vast (special shoutout to Solomon Kurz’s translation of Statistical Rethinking). And Stan (accessed through whichever interface works best for you) still tops the charts in terms of reliability, speed, and documentation.

But `brms` (and every other tool that uses `lm`- or `lme4`-based syntax) also abstracts away details that I feel are important if we want to better understand how Bayesian inference works behind the scenes. Instead, Turing (very much like raw Stan, or McElreath’s `rethinking` package in `R`) makes you spell out every part of your model. But it’s also written entirely in Julia, which makes it incredibly interoperable (and more accessible than raw Stan).

## A model of height

Ok, let’s get started. For this, you’ll need the packages loaded at the top of this post. Note that I’m also importing a couple of functions explicitly because they share names with functions in other packages. This allows me to call `density` without having to specify that I want the `AlgebraOfGraphics` implementation.

Let’s start with a very simple toy model, based on a Statistical Rethinking example. We have a bunch of height measurements and we’re trying to infer the mean (`mu`) and standard deviation (`sigma`) of our population.

As you can see below, models in Turing are just functions preceded by the `@model` macro. Here, `height_mean` is the name of the model.

```
@model function height_mean(; n = 1)
    mu ~ Normal(175, 20)
    sigma ~ Exponential(1)
    y ~ filldist(Normal(mu, sigma), n)
end
```

`height_mean (generic function with 2 methods)`

That’s it. We set our prior for the mean (`mu`) to a normal distribution centered at 175 cm because it’s a reasonable average height, and our prior for the standard deviation (`sigma`) to a fairly harmless exponential distribution (though note there are better priors for scale parameters). I’ve added `n` as a parameter because it will come in handy for prior predictive checks and other things.
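As an aside, one common weakly-informative alternative for scale parameters is a half-Normal, i.e. a Normal truncated at zero. A sketch of what that might look like — this is my own variation, not the model used in the rest of the post, and the scale of 10 is an assumption (`truncated` comes from Distributions, which Turing re-exports):

```julia
using Turing

@model function height_mean_halfnormal(; n = 1)
    mu ~ Normal(175, 20)
    # half-Normal prior: a Normal(0, 10) truncated to the non-negative reals
    sigma ~ truncated(Normal(0, 10); lower = 0)
    y ~ filldist(Normal(mu, sigma), n)
end
```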

A couple of notes about the likelihood. The `filldist` syntax creates an n-length vector of `Normal(mu, sigma)` distributions, which gives us a performance boost over loops. But we can still use loops. The code above is equivalent to this:

```
@model function height_mean(; n = 1)
    mu ~ Normal(175, 20)
    sigma ~ Exponential(1)
    y = Vector{Float64}(undef, n)  # pre-allocate so we can sample element by element
    for i in 1:n
        y[i] ~ Normal(mu, sigma)
    end
end
```
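To see what `filldist` does outside a model, here’s a minimal sketch (the variable names are mine): it wraps n independent copies of a distribution into a single multivariate distribution, so one `rand` call returns a whole vector.

```julia
using Turing  # filldist and Normal are re-exported by Turing

d = filldist(Normal(175, 20), 5)  # 5 iid Normal(175, 20) draws as one distribution
x = rand(d)                       # a length-5 vector of heights
length(x)
```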

One thing I really appreciate about Turing is how easy it is to use for a full Bayesian workflow: prior predictive checks, model iteration, model checking, posterior predictive checks, and so on.

Let’s do some prior predictive checks to see whether our model produces reasonable results before it sees the actual data.

To do that, we can just call the `rand` function (which generates samples from a given generator) on our model to get our `mu`, `sigma`, and `y`, straight from the prior.

In Julia, any distribution (or, as here, any model) can be passed to the `rand` function to sample it. There’s no need for a specific sampling function for every distribution family.

`rand(height_mean())`

`(mu = 165.49691935056015, sigma = 0.3013854035355912, y = [165.35444024123953])`

Let’s do this 1000 times and visualize our results. Note that throughout this blog post we’ll use `Makie.jl` and `AlgebraOfGraphics.jl` (which is built on top of Makie) to plot our analyses and results.

If you want to learn more about `Makie.jl`, head here.

```
y_prior = vcat([rand(height_mean()).y for i in 1:1000]...)
hist(y_prior)
```

Looks pretty reasonable! Very few people are over 220 cm (or under 120 cm) tall, so this makes sense overall. Note that there’s actually another way to do prior predictive checks: we can just use the `sample` function that we’ll use later on, but specify that we want the samples to come from the prior.

```
y_prior_chn = sample(height_mean(), Prior(), 1_000);
df = DataFrame(y_prior_chn)
hist(df[:, "y[1]"])
```

We get very similar results.

## Conditioning and sampling

Next, let’s generate our fake data to see whether our model can recover our parameters. Let’s assume our sample of heights comes from a place where the average person is quite tall (e.g. the Netherlands). Let’s have our real `mu` be 185 cm, and our `sigma` be 15. Once again, we can use the `rand` function to get 100 samples.

```
y_fake = rand(Normal(185, 15), 100)
first(y_fake, 5)
```

```
5-element Vector{Float64}:
179.14497444673466
169.31526276323365
167.08122537276944
179.9675689163041
167.5129143676516
```

Now, let’s condition our model on the fake data we generated. To do this in Turing, we can just use `|`, as shown below. We can then sample from the conditioned model using our sampler of choice.

Here we’re using the No-U-Turn Sampler (NUTS), and we’re running it for 1000 iterations and 2 chains. `MCMCThreads()` allows us to run chains in parallel if that’s needed.

```
cond_model = height_mean(; n = length(y_fake)) | (; y = y_fake);
m1_chn = sample(cond_model, NUTS(), MCMCThreads(), 1_000, 2);
```
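One caveat (this comes from Julia’s threading model, not from anything above): `MCMCThreads()` can only run chains in parallel if Julia itself was started with more than one thread, e.g.:

```shell
# start Julia with 4 threads (either form works)
julia --threads 4
JULIA_NUM_THREADS=4 julia
```

Otherwise the chains simply run sequentially.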

Now we can go ahead and see whether our model recovers the real parameters.

## Model checking

Let’s use the `summarystats` function to take a look at our results.

`summarystats(m1_chn)`

```
Summary Statistics
  parameters       mean      std  naive_se     mcse        ess     rhat   ⋯
      Symbol    Float64  Float64   Float64  Float64    Float64  Float64   ⋯

          mu   183.3720   1.5016    0.0336   0.0344  1880.3006   0.9996   ⋯
       sigma    15.2470   0.9675    0.0216   0.0203  2181.6638   0.9998   ⋯
                                                            1 column omitted
```
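`summarystats` isn’t the only way to inspect a chain: MCMCChains (which Turing loads for us) also provides a `quantile` method for chain objects. A minimal self-contained sketch with a toy model of my own (in the post itself you’d just call `quantile(m1_chn)`):

```julia
using Turing

# hypothetical one-parameter model, just to have a chain to inspect
@model function toy()
    x ~ Normal(0, 1)
end

chn = sample(toy(), Prior(), 1_000)
quantile(chn)  # 2.5%, 25%, 50%, 75%, and 97.5% quantiles per parameter
```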

That’s not bad at all! Let’s plot the parameters using `AlgebraOfGraphics`. Note that `StatsPlots.jl` (based on `Plots.jl`) provides off-the-shelf recipes to work with `chain` objects, but I personally prefer the `Makie` family. It’s straightforward and highly customizable.

```
df_chn = DataFrame(m1_chn)

plt = data(df_chn) * mapping(:iteration, [:mu :sigma])
plt *= mapping(color = :chain => nonnumeric, row = dims(2)) * visual(Lines, alpha = 0.7)
draw(plt)
```

Our chains have mixed well. Let’s plot the actual parameters.

```
plt = data(df_chn) * mapping([:mu, :sigma])
plt *= mapping(col = dims(1)) * histogram(bins = 20, datalimits = extrema)
draw(plt; facet = (; linkxaxes = :none))
```

Note that we could have also reshaped the dataframe into long format. However, `AlgebraOfGraphics` has good support for wide data, so it’ll work either way.

```
chn_long = stack(df_chn, [:mu, :sigma])

plt = data(chn_long) * mapping(:value)
plt *= mapping(col = :variable) * histogram(bins = 20, datalimits = extrema)

# draw the plot
draw(plt, facet = (linkxaxes = :minimal,))
```

## Posterior predictive checks

Finally, some posterior predictive checks. Let’s generate 100 draws from the model. We then stack the dataframe (or `pivot_longer` if you’re in R) to make it easier to work with.

```
# get the predictions
predictions = predict(height_mean(; n = 100), m1_chn)

# stack the dataframe (columns 3:102 are y[1] through y[100], after :iteration and :chain)
df_long = stack(DataFrame(predictions), 3:102)

# keep just 20 draws
df_long = subset(df_long, :iteration => ByRow(x -> x <= 20))
```

First, let’s do a density overlay plot. As a reminder, this plots the original data (shown in dark blue) overlaid with a number of posterior predictive samples (shown in light grey). We can use this to assess whether our generative model is capturing the features of the real data.

```
# plot the posterior predictive samples
plt = data(df_long) * mapping(:value)
plt *= mapping(group = :iteration => nonnumeric)
plt *= visual(Density, color = (:white, 0),
              strokecolor = (:grey, 0.2),
              strokewidth = 3)

# plot the original data
plt2 = data((; y = y_fake)) * mapping(:y) * visual(Density, color = (:white, 0),
                                                   strokecolor = (:navy, 0.9),
                                                   strokewidth = 5)

both = plt + plt2

# draw both layers
draw(both)
```

That’s pretty good, which makes sense: it’s a very simple model, and we generated the data ourselves.

Let’s also do some histograms, comparing the distribution of the original data with 14 draws from the posterior (why 14? it’s a nice number and it’ll fit in a 3x5 grid including the original data).

```
# keep just 14 draws
df_hist = subset(df_long, :iteration => ByRow(x -> x <= 14))

plt = data(df_hist) * mapping(:value, layout = :iteration => nonnumeric)
plt *= histogram(bins = 20) * visual(color = :grey, alpha = 0.4)

plt2 = data((; y = y_fake)) * mapping(:y)
plt2 *= histogram(bins = 20) * visual(color = :navy, alpha = 0.7)
```

`Layer(AlgebraOfGraphics.Visual(Any, {:color = :navy, :alpha = 0.7}) ∘ AlgebraOfGraphics.HistogramAnalysis{MakieCore.Automatic, Int64}(MakieCore.Automatic(), 20, :left, :none), (y = [179.14497444673466, 169.31526276323365 … 180.11454493489586, 186.45320095995854],), Any[:y], {})`

Now let’s see whether we can manage to plot both the original data and the model predictions.

```
ncol = 5
nrow = 3

fg = draw(plt; palettes = (layout = vec([(j, i) for i in 1:ncol, j in 1:nrow])[2:end],),
          facet = (linkxaxes = :none,),
          axis = (width = 550, height = 350))
draw!(fg.figure[1, 1], plt2)
fg
```

And that’s it! Next time we’ll look at some other `Turing` models and dig deeper into how we can use `Makie`/`AlgebraOfGraphics` for our Bayesian workflow.