knitr::opts_chunk$set(echo = TRUE)

library(MASS)
library(brms)
library(ggplot2)

Let’s assume that our data \(y\) is real valued and our predictor is “Age” and a group number.

y <- c(rnorm(15, mean = 1, sd = 2), rnorm(15, mean = 10, sd = 2.5))
dat1 <- data.frame(y)
dat1$age <- rnorm(30, mean = 40, sd = 10)
dat1$grp <- c(rep(1,15),rep(2,15))
head(dat1)

We would like to model this data according to the real-valued predictor age, as well as the categorical predictor grp. What can go wrong?

Encode `as.factor` categorical features

Let’s imagine we want to create a simple model of the form \(y \sim N(\mu, \sigma)\) where \(\mu = a_i + b \cdot Age\), where \(a_i\) is the intercept for i-th group. That can be easily encoded in the brms formula

model_formula<- bf(y ~ 0 +  grp + age)

but it’s crucial that the data is encoded accordingly

dat1$grp <- as.factor(dat1$grp)

and then fit

fit0 <- brm(model_formula,
data = dat1, family = gaussian())

Examine the posterior

Let’s say that we fit our model without caring much about enconding the data

y <- c(rnorm(15, mean = 1, sd = 2), rnorm(15, mean = 10, sd = 2.5))
dat1 <- data.frame(y)
dat1$age <- rnorm(30, mean = 40, sd = 10)
dat1$grp <- c(rep(1,15),rep(2,15))
fit0 <- brm(model_formula,
data = dat1, family = gaussian())

and examine the posterior samples with posterior_summary

posterior_summary(fit0)

The estimate for the effect of group is wrong. We need to refit the model using the encoded data.

Fit model to new data?

How to fit an existing model to new data? Use ` update`.

dat1$grp <- as.factor(dat1$grp)
fit1<- update(fit0,newdata = dat1) # avoids re-compiling

posterior_summary(fit1)

Generate Stan code

This is a great feature that helps understand the model and automate a lot of coding

make_stancode(model_formula,
                   data = dat1, family = gaussian())

How to define the model?

Let’s imagine we want to create a simple model of the form \(y \sim N(\mu, \sigma)\) where \(\mu = a + b \cdot Age\). That model can be defined in two ways

# case 1
model_formula<- bf(y ~ 1 + age)

The number 1 can be ommitted in this case. To confirm it run the following code, does it output TRUE?

# case 1
c1<- make_stancode(bf(y ~  age),
                   data = dat1, family = gaussian())
# case 2
c2<- make_stancode(bf(y ~ 1+ age),
                   data = dat1, family = gaussian())
identical(c1,c2)

If instead we wanted \(\mu = b \cdot Age\) we would write y ~ 0 + age, to indicate that the constant term is 0.

# case 3
c3<- make_stancode(bf(y ~ 0 + age),
                   data = dat1, family = gaussian())
identical(c1,c3)

References

See documentation for update
See documentation for posterior_summary
See documentation for brmsformula

Encode as.factor categorical features