Calculate and format the prepared data for use in modelling. Different parameters are defined for different types of models (see ?bbs_models for a list of models included in bbsBayes2).

Usage

prepare_model(
  prepared_data,
  model,
  model_variant = "hier",
  model_file = NULL,
  use_pois = FALSE,
  heavy_tailed = TRUE,
  n_knots = NULL,
  basis = "mgcv",
  calculate_nu = FALSE,
  calculate_log_lik = FALSE,
  calculate_cv = FALSE,
  cv_k = 10,
  cv_fold_groups = "obs_n",
  cv_omit_singles = TRUE,
  set_seed = NULL,
  quiet = FALSE
)

Arguments

prepared_data

List. Prepared data generated by prepare_data() (if model_variant is not "spatial") or prepare_spatial() (if model_variant is "spatial").

model

Character. Type of model to use, must be one of "first_diff" (First Differences), "gam" (Generalized Additive Model), "gamye" (Generalized Additive Model with Year Effect), or "slope" (Slope model).

model_variant

Character. Model variant to use, must be one of "nonhier" (Non-hierarchical), "hier" (Hierarchical; default), or "spatial" (Spatially explicit).

model_file

Character. Optional location of a custom Stan model file to use.

use_pois

Logical. Whether to use an Over-Dispersed Poisson model (TRUE) or a Negative Binomial model (FALSE; default).

heavy_tailed

Logical. Whether extra-Poisson error distributions should be modelled as a t-distribution, with heavier tails than the standard normal distribution. Default TRUE. Recent results suggest this is best even though it requires much longer convergence times. Can only be set to FALSE with Poisson models (i.e. use_pois = TRUE).

n_knots

Numeric. Number of knots for "gam" and "gamye" models.

basis

Character. Basis function to use for GAM smooth, one of "original" or "mgcv". "original" is the basis used in Smith and Edwards 2020. "mgcv" (the default shown in Usage) is an alternative that uses the "tp" basis from the package mgcv (also used in brms and rstanarm). If using the "mgcv" option, the user may want to consider adjusting the prior distributions for the parameters and their precision.

calculate_nu

Logical. Whether to calculate the nu parameter as a factor of gamma(2, 0.1). Default FALSE.

calculate_log_lik

Logical. Whether to calculate point-wise log-likelihood of the data given the model. Default FALSE.

calculate_cv

Logical. Whether to use bbsBayes2's cross validation. Default FALSE. Note this is experimental. See Details.

cv_k

Numeric. The number of K folds to include (only relevant if calculate_cv = TRUE). Default 10. Note this is experimental.

cv_fold_groups

Character. The data column used to group observations when assigning them to fold groups. Must be one of "obs_n" (default) or "routes". Only relevant if calculate_cv = TRUE. Note this is experimental. See the models article for more details.

cv_omit_singles

Logical. Whether to omit test groups with no replication (only relevant if calculate_cv = TRUE). Default TRUE. See the models article for more details.

set_seed

Numeric. If NULL (default) no seed is set. Otherwise an integer number to be used with withr::with_seed() internally to ensure reproducibility.

quiet

Logical. Suppress progress messages? Default FALSE.

Value

A list of prepared data with the following elements:

  • model_data - list of data formatted for use in Stan modelling

  • init_values - list of initialization parameters

  • folds - a vector of k-fold groups each observation is assigned to (if calculate_cv = TRUE), or NULL

  • meta_data - metadata defining the analysis

  • meta_strata - data frame listing strata metadata

  • raw_data - data frame of summarized counts used to create model_data (just formatted more nicely)
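As a rough illustration of the returned list, the sketch below repeats the preparation steps from the Examples section and then looks at the top-level elements. It relies only on the element names listed above and on base R; the exact contents of each element are not shown here because they depend on the data and model chosen.

```r
library(bbsBayes2)

# Same preparation steps as in the Examples section of this page
s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")

# Top-level elements of the returned list
names(pm)

# One-level overview of the data formatted for Stan
str(pm$model_data, max.level = 1)

# folds is NULL unless calculate_cv = TRUE
is.null(pm$folds)
```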

Details

There are two ways you can customize the model run. The first is to supply a custom model_file created with the copy_model_file() function and then edited by hand.
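A minimal sketch of the custom model file workflow follows. The argument names passed to copy_model_file() (model, model_variant, dir, overwrite) and the assumption that it returns the path of the copied file are not stated on this page, so check ?copy_model_file before relying on them. tempdir() is used only so the sketch does not write into your project.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Copy the packaged Stan file somewhere you can edit it
# (assumed signature and return value; see ?copy_model_file)
new_file <- copy_model_file(
  model = "first_diff",
  model_variant = "hier",
  dir = tempdir(),
  overwrite = TRUE
)

# ... edit the file at `new_file` by hand before continuing ...

# Point prepare_model() at the edited copy
pm <- prepare_model(
  p,
  model = "first_diff",
  model_variant = "hier",
  model_file = new_file
)
```

The model and model_variant values are kept identical in both calls so that the prepared data matches the Stan file being edited.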

Second, you can edit or overwrite the initialization parameters (init_values) in the output of prepare_model() to customize the init supplied to cmdstanr::sample(). You can supply these parameters in any way that cmdstanr::sample() accepts the init argument. See also the init_alternate argument in run_model().
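The sketch below shows one possible way to overwrite init_values. It inspects the generated values first, then replaces them with a single positive number, which is one of the forms cmdstanr documents for its init argument (parameters are then initialized at random within that range on the unconstrained scale). The value 1 is an arbitrary choice for illustration, and whether a scalar suits a particular model is not something this page addresses.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")

# Look at what prepare_model() generated before changing anything
str(pm$init_values, max.level = 1)

# Overwrite with a scalar init (arbitrary illustrative value)
pm$init_values <- 1
```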

To implement bbsBayes2's version of cross validation, set calculate_cv = TRUE. You can set up your own system for cross validation by modifying the folds list-item in the output of prepare_model(). Note this is considered experimental.
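As a hedged sketch of a custom fold scheme, the code below first lets prepare_model() create folds, then replaces them with a purely random assignment into five groups. The random scheme is hypothetical and deliberately ignores cv_fold_groups and cv_omit_singles; it is only meant to show where the folds vector lives and that a replacement should have one entry per observation.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Let bbsBayes2 assign folds first (experimental feature)
pm <- prepare_model(
  p,
  model = "first_diff",
  model_variant = "hier",
  calculate_cv = TRUE,
  cv_k = 5
)

# Fold sizes produced by the package
table(pm$folds, useNA = "ifany")

# Hypothetical custom scheme: random assignment into 5 near-equal folds,
# keeping one fold label per observation
n_obs <- length(pm$folds)
set.seed(111)
pm$folds <- sample(rep_len(seq_len(5), n_obs))
```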

See the models article for more advanced examples and explanations.

See also

Other Data prep functions: prepare_data(), prepare_spatial(), stratify()

Examples

s <- stratify(by = "bbs_cws", sample_data = TRUE)
#> Using 'bbs_cws' (standard) stratification
#> Using sample BBS data...
#> Using species Pacific Wren (sample data)
#> Filtering to species Pacific Wren (7221)
#> Stratifying data...
#>   Combining BCR 7 and NS and PEI...
#>   Renaming routes...
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")
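
Two further calls, sketched here using only arguments from the Usage section, may help show how the options combine. The value n_knots = 9 is an arbitrary illustration rather than a recommendation, and heavy_tailed = FALSE is paired with use_pois = TRUE because the argument description says it can only be set to FALSE for Poisson models.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# GAM with an explicit number of knots (arbitrary illustrative value)
pm_gam <- prepare_model(p, model = "gam", model_variant = "hier", n_knots = 9)

# Over-dispersed Poisson model without the heavy-tailed error distribution
pm_pois <- prepare_model(
  p,
  model = "first_diff",
  model_variant = "hier",
  use_pois = TRUE,
  heavy_tailed = FALSE,
  quiet = TRUE
)
```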