Fit each permuations using various regression models

Useful for boiling down the multiple permutations after quake().

Usage

fit_lm(
  df,
  y = y,
  formula_rhs = x_new,
  by = NULL,
  x_pred = cutter_number(df, 30, x_new),
  k = k,
  .keep_mod = FALSE,
  ...
)

fit_gam(
  df,
  y = y,
  formula_rhs = s(x_new, bs = "cs"),
  by = NULL,
  x_pred = cutter_number(df, 30, x_new),
  k = k,
  .keep_mod = FALSE,
  ...
)

fit_loess(
  df,
  y = y,
  formula_rhs = x_new,
  by = NULL,
  x_pred = cutter_number(df, 30, x_new),
  k = k,
  .keep_mod = FALSE,
  ...
)

Arguments

df: tibble() typically returned by quake()
y: colname of the value of interest. Default to y.
formula_rhs: the right hand-side of the stats::formula() to be used. Depending on the function, different defaults are used. See Usage section above.
by: colname for grouping structure (besides k). Default to NULL
x_pred: the sequence of x for which to predict new values after fitting
k: colname for iteration . Default to k.
.keep_mod: logical whether to keep regression models and index of fit quality. See Details
...: additional parameters to be passed to main fitting functions. See Examples

Value

a tibble()

Details

x_pred can be passed directly as a vector of numeric. By default, cutter_number(df, 30, x_new) is used so that 30 groups of approximately equal numbers of observations are made using the new x values, ie likely to cover the range of possible x values.

formula_rhs correspond to the right hand side of the formula. It you want to fit using, say y~x_new, this corresponds only to x_new. This saves typing y~ because it is already provided by the y argument. You can pass any stats::formula() you want. For instance x_new^2 + x_new -1 would correspond to a quadratic model with no intercept. See stats::formula() to stick with this grammar.

Functions

fit_lm: fitting using linear model
fit_gam: fitting using generalised additive models with smoothness estimation
fit_loess: fitting using local polynomial regression

Motivation

These helpers wrap stats::loess(), stats::lm(), mgcv::gam() and make the assumption that you know what you are doing: ie the models are appropriate and their conditions respected. They all :

fit a model on each permutation (and group-wise if by is provided).
predict new y values using stats::predict on this model and for fixed x_pred values.

The main motivation is to generalize on many permutations what you would do on original data: find a model that fits them and try to represent trends using it. In practice, when you're adding ggplot2::geom_smooth to a graph, that's exactly what you do, even if ggplot2 can largely take care of the modelling aspect.

The second motivation is to boil down the randomized x values into fixed x values, that you provide to the function with the argument x_pred, and for which predicted y values will be calculated using stats::predict. The price to pay is to fit models but the benefit is that you cant test differences between groups more easily for each of these fixed x values, that discretize your temporal range.

Besides testing, graphical comparisons are eased too, because desriptors such as extrema, other intermediate quantiles, average, etc. can be calculated for these fixed x-values. This is the purpose of the main spaghetti plot.

The last (and not least) motivation is to assess the quality of such fitting functions. Using .keep_mod=TRUE gives a glimpse to it by providing indices such as the adjusted r^2 (for lm and gam), and the residual standard deviation (for loess) for each permutation. But it also retain each model so that you can access other components of interest such as coefficients, p-values, etc.

Note that you do not need fitting to discretize values, you can also bin() them on requested x slices. That still allows testing minus the model intermediate. Finally, if you only want a graphical approach, you can directly use spaghetti0 directly after quake()

Examples


# Show fitter on a single iteration
df <- animals_q %>% dplyr::filter(k==1, taxa=="cat")
plot(df$x_new, df$value, pch=20, cex=0.2)
x_pred <- seq(-100, 100, 10)

# use fitting functions
lm_fit    <- fit_lm(df, y=value, x_pred=x_pred, span=0.5) # you can specify arguments
#>  * fitting with lm(value ~ x_new, span=0.5)
#> Warning: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
#>  extra argument ‘span’ will be disregarded
loess_fit <- fit_loess(df, y=value, x_pred=x_pred)
#>  * fitting with loess(value ~ x_new)
gam_fit   <- fit_gam(df, y=value, x_pred=x_pred)
#>  * fitting with gam(value ~ s(x_new, bs = "cs"))

# now draw them
lines(lm_fit$x_pred, lm_fit$y_pred, col="firebrick3")
lines(loess_fit$x_pred, loess_fit$y_pred, col="orange")
lines(gam_fit$x_pred, gam_fit$y_pred, col="blue")


# usually, you would use them on a full object returned by quake
# here we show the first 6 permutations along with their loess
cat <- animals_q %>% dplyr::filter(k<=6, taxa=="cat")
cat_lines <- cat %>% fit_loess(y=value, x_pred=x_pred)
#>  * fitting with loess(value ~ x_new)
ggplot2::ggplot() +
 ggplot2::geom_point(mapping=ggplot2::aes(x=x_new, y=value), data=cat, size=0.2) +
 ggplot2::geom_line(mapping=ggplot2::aes(x=x_pred, y=y_pred, group=k), data=cat_lines, col="red") +
 ggplot2::facet_wrap(~k, ncol=3) +
 ggplot2::theme_minimal()