Prepare model parameters — prepare

Calculate and format the prepared data for use in modelling. Different parameters are defined for different types of models (see ?bbs_models for a list of models included in bbsBayes2).

Usage

prepare_model(
  prepared_data,
  model,
  model_variant = "hier",
  model_file = NULL,
  use_pois = FALSE,
  heavy_tailed = TRUE,
  n_knots = NULL,
  basis = "mgcv",
  calculate_nu = FALSE,
  calculate_log_lik = FALSE,
  calculate_cv = FALSE,
  cv_k = 10,
  cv_fold_groups = "obs_n",
  cv_omit_singles = TRUE,
  set_seed = NULL,
  quiet = FALSE
)

Arguments

prepared_data: List. Prepared data generated by prepare_data() (if model_variant is not "spatial") or prepare_spatial() (if model_variant is "spatial").
model: Character. Type of model to use, must be one of "first_diff" (First Differences), "gam" (Generalized Additive Model), "gamye" (Generalized Additive Model with Year Effect), or "slope" (Slope model).
model_variant: Character. Model variant to use, must be one of "nonhier" (Non-hierarchical), "hier" (Hierarchical; default), or "spatial" (Spatially explicit).
model_file: Character. Optional location of a custom Stan model file to use.
use_pois: Logical. Whether to use an Over-Dispersed Poisson model (TRUE) or a Negative Binomial model (FALSE; default).
heavy_tailed: Logical. Whether extra-Poisson error distributions should be modelled as a t-distribution, with heavier tails than the standard normal distribution. Default TRUE. Recent results suggest this is best even though it requires much longer convergence times. Can only be set to FALSE with Poisson models (i.e. use_pois = TRUE).
n_knots: Numeric. Number of knots for "gam" and "gamye" models.
basis: Character. Basis function to use for GAM smooth, one of "original" or "mgcv". The usage above sets "mgcv" as the default. "original" is the basis used in Smith and Edwards 2020; "mgcv" is an alternative that uses the "tp" basis from the package mgcv (also used in brms and rstanarm). If using the "mgcv" option, the user may want to consider adjusting the prior distributions for the parameters and their precision.
calculate_nu: Logical. Whether to calculate the nu parameter as a factor of gamma(2, 0.1). Default FALSE.
calculate_log_lik: Logical. Whether to calculate point-wise log-likelihood of the data given the model. Default FALSE.
calculate_cv: Logical. Whether to use bbsBayes2's cross validation. Default FALSE. Note this is experimental. See Details.
cv_k: Numeric. The number of K folds to include (only relevant if calculate_cv = TRUE). Default 10. Note this is experimental.
cv_fold_groups: Character. The data column used to group observations when assigning them to folds (only relevant if calculate_cv = TRUE). Must be one of "obs_n" (default) or "routes". Note this is experimental. See the models article for more details.
cv_omit_singles: Logical. Whether to omit test groups with no replication (only relevant if calculate_cv = TRUE). Default TRUE. See the models article for more details.
set_seed: Numeric. If NULL (default) no seed is set. Otherwise an integer number to be used with withr::with_seed() internally to ensure reproducibility.
quiet: Logical. Suppress progress messages? Default FALSE.
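
The calls below are a minimal sketch of how some of these arguments combine, using the bundled sample data from the Examples section. The value n_knots = 9 is an arbitrary illustration, not a recommendation.

```r
library(bbsBayes2)

# Stratify and prepare the sample data (Pacific Wren), as in the Examples
s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Hierarchical GAM with an explicit number of knots
# (9 is an arbitrary value chosen for illustration)
pm_gam <- prepare_model(p, model = "gam", model_variant = "hier", n_knots = 9)

# Over-dispersed Poisson slope model without heavy tails;
# heavy_tailed = FALSE is only allowed together with use_pois = TRUE
pm_pois <- prepare_model(p, model = "slope",
                         use_pois = TRUE, heavy_tailed = FALSE)
```

A "spatial" variant would instead need prepared_data from prepare_spatial(), which is not shown here.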

Value

A list of prepared data with the following elements:

model_data - list of data formatted for use in Stan modelling
init_values - list of initialization parameters
folds - a vector of k-fold groups each observation is assigned to (if calculate_cv = TRUE), or NULL
meta_data - meta data defining the analysis
meta_strata - data frame listing strata meta data
raw_data - data frame of summarized counts used to create model_data (just formatted more nicely)
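
As a rough sketch, the returned list can be inspected with base R tools. The element names used below are those listed above. Their internal contents depend on the model and data, so no output is shown.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")

# Top-level elements of the returned list
names(pm)

# One-level overview of the data formatted for Stan
str(pm$model_data, max.level = 1)

# First rows of the summarized counts behind model_data
head(pm$raw_data)

# folds should be NULL here because calculate_cv was left at FALSE
is.null(pm$folds)
```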

Details

There are two ways you can customize the model run. The first is to supply a custom model_file created with the copy_model_file() function and then edited by hand.
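
A minimal sketch of the custom model file workflow follows. It assumes copy_model_file() takes the model, the model variant and a target directory (dir), accepts overwrite, and returns the path to the copy. Check ?copy_model_file for the exact signature before relying on it.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Copy the built-in Stan file to a temporary directory
# (assumed arguments: model, model_variant, dir, overwrite)
model_path <- copy_model_file("first_diff", "hier",
                              dir = tempdir(), overwrite = TRUE)

# ... edit the file at model_path by hand, e.g. to change a prior ...

# Point prepare_model() at the edited copy
pm_custom <- prepare_model(p, model = "first_diff", model_variant = "hier",
                           model_file = model_path)
```

The model and model_variant passed to prepare_model() should match the file that was copied, since the prepared data are formatted for that model type.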

Second, you can edit or overwrite the initialization parameters (init_values) in the output of prepare_model() to customize the init supplied to cmdstanr::sample(). You can supply these parameters in any way that cmdstanr::sample() accepts for its init argument. See also the init_alternate argument in run_model().
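
The sketch below shows one possible way to edit init_values while keeping its structure intact. Halving every numeric starting value is a purely hypothetical adjustment chosen for illustration. Whether the resulting values suit a given model is an assumption that should be checked against the Stan file.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")

# Look at which parameters have initial values and their shapes
str(pm$init_values, max.level = 1)

# Work on a copy so the original output stays untouched
pm_init <- pm

# Hypothetical edit: halve every numeric initial value, leaving
# names and dimensions unchanged
pm_init$init_values <- lapply(pm$init_values, function(x) {
  if (is.numeric(x)) x * 0.5 else x
})
```

The modified list (pm_init) would then be passed to run_model() in place of the original output.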

To implement bbsBayes2's version of cross validation, set calculate_cv = TRUE. You can set up your own system for cross validation by modifying the folds list-item in the output of prepare_model(). Note this is considered experimental.
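
As a hedged sketch, the built-in folds can be requested and then replaced with a custom assignment. The values cv_k = 5 and set_seed = 111 are arbitrary. The random reassignment ignores the observer or route grouping that cv_fold_groups provides, and is shown only to illustrate overwriting folds.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Built-in (experimental) fold assignment with 5 folds
pm_cv <- prepare_model(p, model = "first_diff", model_variant = "hier",
                       calculate_cv = TRUE, cv_k = 5, set_seed = 111)

# Number of observations assigned to each fold
table(pm_cv$folds, useNA = "ifany")

# Custom alternative: balanced random folds, one entry per observation
# (illustration only; this ignores grouping by observer or route)
n_obs <- length(pm_cv$folds)
set.seed(111)
pm_cv$folds <- sample(rep_len(seq_len(5), n_obs))
```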

See the models article for more advanced examples and explanations.

Examples

s <- stratify(by = "bbs_cws", sample_data = TRUE)
#> Using 'bbs_cws' (standard) stratification
#> Using sample BBS data...
#> Using species Pacific Wren (sample data)
#> Filtering to species Pacific Wren (7221)
#> Stratifying data...
#>   Combining BCR 7 and NS and PEI...
#>   Renaming routes...
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")