Calculate and format the prepared data for use in modelling. Different
parameters are defined for different types of models (see ?bbs_models for
a list of models included in bbsBayes2).
Usage
prepare_model(
prepared_data,
model,
model_variant = "hier",
model_file = NULL,
use_pois = FALSE,
heavy_tailed = TRUE,
n_knots = NULL,
basis = "mgcv",
calculate_nu = FALSE,
calculate_log_lik = FALSE,
calculate_cv = FALSE,
cv_k = 10,
cv_fold_groups = "obs_n",
cv_omit_singles = TRUE,
set_seed = NULL,
quiet = FALSE
)
Arguments
- prepared_data
List. Prepared data generated by prepare_data() (if model_variant is not "spatial") or prepare_spatial() (if model_variant is "spatial").
- model
Character. Type of model to use, must be one of "first_diff" (First Differences), "gam" (Generalized Additive Model), "gamye" (Generalized Additive Model with Year Effect), or "slope" (Slope model).
- model_variant
Character. Model variant to use, must be one of "nonhier" (Non-hierarchical), "hier" (Hierarchical; default), or "spatial" (Spatially explicit).
- model_file
Character. Optional location of a custom Stan model file to use.
- use_pois
Logical. Whether to use an Over-Dispersed Poisson model (TRUE) or a Negative Binomial model (FALSE; default).
- heavy_tailed
Logical. Whether extra-Poisson error distributions should be modelled as a t-distribution, with heavier tails than the standard normal distribution. Default TRUE. Recent results suggest this is best even though it requires much longer convergence times. Can only be set to FALSE with Poisson models (i.e. use_pois = TRUE).
- n_knots
Numeric. Number of knots for "gam" and "gamye" models.
- basis
Character. Basis function to use for the GAM smooth, one of "original" or "mgcv" (default, per the usage above). "original" is the basis used in Smith and Edwards 2020. "mgcv" is an alternative that uses the "tp" basis from the package mgcv (also used in brms and rstanarm). If using the "mgcv" option, the user may want to consider adjusting the prior distributions for the parameters and their precision.
- calculate_nu
Logical. Whether to calculate the nu parameter as a factor of gamma(2, 0.1). Default FALSE.
- calculate_log_lik
Logical. Whether to calculate point-wise log-likelihood of the data given the model. Default FALSE.
- calculate_cv
Logical. Whether to use bbsBayes2's cross validation. Default FALSE. Note this is experimental. See Details.
- cv_k
Numeric. The number of K folds to include (only relevant if calculate_cv = TRUE). Default 10. Note this is experimental.
- cv_fold_groups
Character. The data column to use when determining the grouping level of the observations to be assigned to different fold groups. Must be one of "obs_n" (default) or "routes" (only relevant if calculate_cv = TRUE). Note this is experimental. See the models article for more details.
- cv_omit_singles
Logical. Whether to omit test groups with no replication (only relevant if calculate_cv = TRUE). Default TRUE. See the models article for more details.
- set_seed
Numeric. If NULL (default) no seed is set. Otherwise an integer number to be used with withr::with_seed() internally to ensure reproducibility.
- quiet
Logical. Suppress progress messages? Default FALSE.
Value
A list of prepared data.
- model_data - list of data formatted for use in Stan modelling
- init_values - list of initialization parameters
- folds - a vector of k-fold groups each observation is assigned to (if calculate_cv = TRUE), or NULL
- meta_data - metadata defining the analysis
- meta_strata - data frame listing strata metadata
- raw_data - data frame of summarized counts used to create model_data (just formatted more nicely)
Details
There are two ways you can customize the model run. The first is to supply a
custom model_file created with the copy_model_file() function and then
edited by hand.
Second, you can edit or overwrite the initialization parameters
(init_values) in the output of prepare_model() to customize the init
supplied to cmdstanr::sample(). You can supply these parameters in any way
that cmdstanr::sample() accepts the init argument. See also the
init_alternate argument in run_model().
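The two customization routes might look roughly like the sketch below. It assumes that copy_model_file() takes the model, the variant, and a destination directory and returns the path to the copied Stan file; that signature is not stated on this page, so check ?copy_model_file first. The name custom_file is illustrative only.

```r
library(bbsBayes2)

# Prepare the sample data as in the Examples section
s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Assumption: copy_model_file(model, model_variant, dir) returns the path
# to a copy of the built-in Stan file. Verify with ?copy_model_file.
custom_file <- copy_model_file("first_diff", "hier", dir = tempdir())

# ... edit the copied .stan file by hand here ...

# Point prepare_model() at the (edited) copy
pm <- prepare_model(p,
                    model = "first_diff", model_variant = "hier",
                    model_file = custom_file)

# Inspect the default initialization parameters before editing
# or overwriting them
str(pm$init_values, max.level = 1)
```

Inspecting init_values first shows which parameters are present, so any replacement can stay in a form that the init argument accepts.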
To implement bbsBayes2's version of cross validation, set calculate_cv = TRUE.
You can set up your own system for cross validation by modifying the
folds list-item in the output of prepare_model(). Note this is
considered experimental.
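As a minimal sketch of the built-in option, the call below requests fold assignments and then tabulates them. The values cv_k = 5 and set_seed = 111 are arbitrary choices for illustration, not recommendations.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# Ask prepare_model() to assign observations to 5 folds;
# set_seed makes the assignment reproducible
pm_cv <- prepare_model(p,
                       model = "first_diff", model_variant = "hier",
                       calculate_cv = TRUE, cv_k = 5, set_seed = 111)

# Count how many observations fall in each fold
# (useNA shows any observations left without a fold)
table(pm_cv$folds, useNA = "ifany")
```

A custom scheme would presumably replace pm_cv$folds with another vector holding one fold label per observation. That reading follows from the description of folds under Value and is not confirmed elsewhere on this page.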
See the models article for more advanced examples and explanations.
See also
Other Data prep functions: prepare_data(), prepare_spatial(), stratify()
Examples
s <- stratify(by = "bbs_cws", sample_data = TRUE)
#> Using 'bbs_cws' (standard) stratification
#> Using sample BBS data...
#> Using species Pacific Wren (sample data)
#> Filtering to species Pacific Wren (7221)
#> Stratifying data...
#> Combining BCR 7 and NS and PEI...
#> Renaming routes...
p <- prepare_data(s)
pm <- prepare_model(p, model = "first_diff", model_variant = "hier")
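A few further calls, sketched under the constraints given in Arguments, show how some of the other options combine. The value n_knots = 9 is an arbitrary illustration, and the object names are hypothetical.

```r
library(bbsBayes2)

s <- stratify(by = "bbs_cws", sample_data = TRUE)
p <- prepare_data(s)

# GAM with an explicit number of knots
# (n_knots only applies to "gam" and "gamye" models)
pm_gam <- prepare_model(p, model = "gam", model_variant = "hier",
                        n_knots = 9)

# Over-dispersed Poisson model; heavy_tailed may only be FALSE
# when use_pois = TRUE
pm_pois <- prepare_model(p, model = "slope", model_variant = "hier",
                         use_pois = TRUE, heavy_tailed = FALSE)

# List the elements of the returned object (see Value)
names(pm_gam)
```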