This is part of series about distributed lag non-linear models. Please read the first post for an introduction and a disclaimer.

Since I’ve been working so much with GAMs for this project, I decided to read sections of Simon Wood’s book, Generalized additive models: an introduction with R more thoroughly. In Chapter 5: Smoothers, there is an example **tensor product** smooth (more on what that means later) fitting a distributed lag model. When I saw this, I started to question *everything*. What was the `dlnm`

package even doing if I could fit a DLNM with just `mgcv`

? Was it doing something *wrong*? Was I interpreting it wrong? Am I going to have to change EVERYTHING?

I also found this wonderful paper by Nothdurft and Vospernik that fits a DLNM for yearly tree ring data explained by lagged, non-linear, monthly weather data. They also used only the `mgcv`

package and tensor product smooths to fit this model. So what is a tensor product and what is the `dlnm`

package doing differently?

## Tensor Products

A tensor product smooth is a two (or more) dimensional smooth such that the shape of one dimension varies smoothly over the other dimension. Tensor products are constructed from two (or more) so-called marginal smooths. Imagine a basket weave of wood strips. The strips can be different widths and be more or less flexible in each dimension. The flexibility of the strips roughly corresponds to a smoothing penalty (stiffer = smoother) and the number and width of strips roughly corresponds to number of knots. You can bend the first strip on one dimension, but you can’t really bend the adjacent strip in the completely opposite direction. The shape of the strips in one dimension is forced to vary smoothly across the other dimension.

The tensor product for a DLNM has the environmental predictor on one dimension (SPEI, in our example) and lag time on the other dimension. So SPEI can have a non-linear effect on plant growth, but the shape of that relationship with lag = 0 is constrained to be similar to the shape at lag = 1 (the adjacent strip of wood). The change in the shape of the SPEI effect varies smoothly with lag time. Here’s a pure `mgcv`

implementation of a DLNM.

`library(mgcv)`

`## Loading required package: nlme`

```
##
## Attaching package: 'nlme'
```

```
## The following object is masked from 'package:dplyr':
##
## collapse
```

`## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.`

```
growth_te <-
gam(log_size_next ~
s(log_size) +
te(spei_history, L,
bs = "cr",
k = c(5, 15)),
family = gaussian(link = "identity"),
method = "REML",
select = TRUE,
data = ha)
```

## So what the heck is {dlnm} doing that’s different?

After re-reading Gasparrini’s papers for the billionth time and reading more of Simon Wood’s book, I realized the difference between the pure `mgcv`

approach and the `dlnm`

approach had to do with “identifiability constraints”. Basically, because a GAM is a function that has covariates which themselves are functions, those smooth covariates are usually constrained to sum to 0.
For example, the smooth term for `s(log_size)`

looks essentially centered around 0 on the y-axis when plotted.

```
library(gratia)
draw(growth_te, select = 1)
```

Tensor product smooths in `mgcv`

have this constraint as well, but not for each marginal function. The entire *surface* sums to zero.

`draw(growth_te, select = 2, dist = Inf)`

The `dlnm`

package constructs a “crossbasis” function, but it does this by using **tensor products** from `mgcv`

. So what is the difference? Well, the major difference is in how it does identifiability constraints. For `te(..)`

, the entire surface must sum to zero. For `s(..., bs = "cb")`

, the predictor-response dimension is constrained to sum to zero. That means that every slice for any value of lag must sum to zero. It also removes the the intercept from that dimension, so the resulting smooth ends up having fewer knots and fewer maximum degrees of freedom.

Here’s the `dlnm`

version:

`library(dlnm)`

`## This is dlnm 2.4.2. For details: help(dlnm) and vignette('dlnmOverview').`

```
growth_cb <-
gam(log_size_next ~
s(log_size) +
s(spei_history, L, #crossbasis function
bs = "cb",
k = c(5, 15),
xt = list(bs = "cr")),
family = gaussian(link = "identity"),
method = "REML",
select = TRUE,
data = ha)
```

`draw(growth_cb, select = 2, dist = Inf)`

Notice how it looks much more symmetrical left to right. This is more clear if we plot all the slices through lag time on top of eachother, kind of like holding the surface up with the x-axis at eye-level and looking down the y-axis in the middle.

```
eval_cb <- evaluate_smooth(growth_cb, "spei_history", dist = Inf)
eval_te <- evaluate_smooth(growth_te, "spei_history", dist = Inf)
```

```
eval_cb %>%
#just take beginning and end
# filter(L == min(L) | L == max(L)) %>%
mutate(L = as.factor(L)) %>%
ggplot(aes(x = spei_history, y = est, group = L)) +
geom_line(alpha = 0.1) +
labs(title = "crossbasis from {dlnm}") +
coord_cartesian(xlim = c(-0.5,0.5), ylim = c(-0.02, 0.02)) +
eval_te %>%
#just take beginning and end
# filter(L == min(L) | L == max(L)) %>%
mutate(L = as.factor(L)) %>%
ggplot(aes(x = spei_history, y = est, group = L)) +
geom_line(alpha = 0.1) +
labs(title = "tensor product from {mgcv}") +
coord_cartesian(xlim = c(-0.5,0.5), ylim = c(-0.02, 0.02))
```

The crossbasis function intercepts are more tightly aligned, I think because of the sum-to-zero constraint along the `spei_history`

axis.

Oddly, the slices in the `dlnm`

version still don’t all sum to zero, so maybe I’m still not totally explaining this right.

## What does it all mean?

Ultimately, there is little difference between these approaches for these data. If we use the `cb_margeff()`

function I mentioned earlier on in this series to get fitted values of y (for the whole GAM, not just the smooth), then the two models look nearly identical.

```
pred_cb <- cb_margeff(growth_cb, spei_history, L)
pred_te <- cb_margeff(growth_te, spei_history, L)
```

```
ggplot(pred_cb, aes(x = x, y = lag, fill = fitted)) +
geom_raster() +
geom_contour(aes(z = fitted), binwidth = 0.01, color = "black", alpha = 0.3) +
scale_fill_viridis_c(option = "plasma") +
labs(title = "{dlnm} crossbasis") +
ggplot(pred_te, aes(x = x, y = lag, fill = fitted)) +
geom_raster() +
geom_contour(aes(z = fitted), binwidth = 0.01, color = "black", alpha = 0.3) +
scale_fill_viridis_c(option = "plasma") +
labs(title = "{mgcv} tensor product")
```

And the summary statistics are very similar as well

`anova(growth_cb)`

```
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log_size_next ~ s(log_size) + s(spei_history, L, bs = "cb", k = c(5,
## 15), xt = list(bs = "cr"))
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(log_size) 1.363 9.000 261.921 < 2e-16
## s(spei_history,L) 4.570 23.000 1.077 4.01e-05
```

`anova(growth_te)`

```
##
## Family: gaussian
## Link function: identity
##
## Formula:
## log_size_next ~ s(log_size) + te(spei_history, L, bs = "cr",
## k = c(5, 15))
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(log_size) 1.364 9.000 261.905 < 2e-16
## te(spei_history,L) 4.618 26.000 0.955 4.02e-05
```

The major difference, you’ll notice, is that the reference degrees of freedom are larger for the tensor product version. Again, that is because the crossbasis function constrains the SPEI dimension to sum to zero and the intercept along that dimension is removed.

## Why {dlnm}?

So the advantages of the `dlnm`

package for fitting DLNMs are probably mostly evident when you consider cases besides GAMs. For example, If I wanted to constrain the lag dimension to be a switchpoint function, or if I wanted the SPEI dimension to be strictly a quadratic function, I could do that with `dlnm`

. If you’re interested in interpreting your results in terms of relative risk ratios, then `dlnm`

offers some OK visualiztion options for that. When using smooth functions for both marginal bases, the differences between using a straight tensor product with `te()`

and using a crossbasis function start to fade away. The `dlnm`

version is still a little easier to interpret, I think, because you can more easily compare slices through lag time with the plot of the smooth itself.