Get Started • bbsBayes2

library(bbsBayes2)
library(tidyverse)

Welcome!

bbsBayes2 is the successor to bbsBayes, with a major shift in functionality. The MCMC backend is now Stan instead of JAGS, the workflow has been streamlined, the syntax has changed, and there are new functions. This vignette should help you get started with the package, and there are three others that should help explain some of the new features, choices, and more advanced uses:

Stratification vignette The stratification vignette explains the built-in options for spatial stratifications as well as the workflow required to apply a custom stratification.
Models vignette The models vignette explains the four built-in models that differ in the way the temporal components are structured, and it also covers the built-in options for error distributions and the differences among the model variants (e.g., model_variant = "spatial").
Advanced vignette The advanced vignette is helpful for users wanting to take the bbsBayes2 functionality further, including alternate calculations of population trend, customizing the Stan models, adding covariates, and even some experimental functions that allow for k-fold cross-validations and approximate leave-one-out calculations to compare fit and predictive accuracy among models.

First we’ll make sure we have the right software installed, we’ll fetch the BBS survey data, and then we’ll run through some example workflows.

Install bbsbayes2

If you haven’t already, install bbsBayes2 from the R-Universe.

install.packages("bbsBayes2", repos = c(bbsbayes = "https://bbsbayes.r-universe.dev",
                                        CRAN = getOption("repos")))

Note that due to non-CRAN dependencies, bbsBayes2 is not available on CRAN and must be installed from GitHub or the R-Universe.

Install cmdstanr

Because bbsBayes2 uses Stan to run the Bayesian models, we need to make sure we have cmdstanr and cmdstan both installed.

Run this in a fresh R session.

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/",
                                       getOption("repos")))

Now we should be able to use cmdstanr to install cmdstan

cmdstanr::install_cmdstan()

Let’s check that everything went as planned, and tell cmdstanr to fix any issues.

cmdstanr::check_cmdstan_toolchain(fix = TRUE)
#> The C++ toolchain required for CmdStan is setup properly!

Problems? Check out cmdstanr’s vignette on Getting Started

Special note for Windows users

We strongly recommend that you install Linux on your Windows machine and take advantage of cmdstanr functions that run Stan models in Linux. This will likely cut the MCMC run-times by 30-50% for the bbsBayes2 models. Installing Windows Subsystem for Linux (WSL) is a small hassle, but only needs to be done once. Follow the directions at the above link.

Once the WSL installation is complete, re-install cmdstan using cmdstanr::install_cmdstan(overwrite = TRUE, wsl = TRUE). Now, everytime you run a model using cmdstan (and therefore anytime you run a model using bbsBayes2), it will use the Linux installation to run Stan. It’s seamless and you’ll be very thankful you did it once it’s done.

Download BBS data

Now we’ll fetch the BBS data using the fetch_bbs_data() function.

library(bbsBayes2)
fetch_bbs_data() #

This will save the data to a package-specific directory on your computer. You must agree to the terms and conditions of the data usage before the download will run (type yes at the prompt). The message also includes the key metadata for the dataset, including the citation, acknowledgements, and the release-version (below). You only need run this function once for each annual update of the BBS database (these updates usually occur in the summer, approximately 1-year following the data collection).

Note: Most bbsBayes2 functionality can be explored without downloading BBS data by using the included sample data. Specify sample_data = TRUE in the first stratify() step (see the next section).

There are two types of BBS data that can be downloaded, and annual release-versions:

Two levels state and stop (only state works with bbsBayes2 models, the stop option is provided to facilitate custom projects and models)
Annual releases 2020, 2022, 2023, and ‘2024’ more options will be added as annual releases occur).

The defaults (level state and the most recent release - 2024) is almost certainly what you are looking for, Unless you have a specific reason to need a different version. The most recent release will include all of the data included in earlier releases. However you can download all releases and specify which one you wish to use in the stratify() step.

A note about BBS release names:

The releases are named for the year in which the annual dataset was released after going through QA-QC. So the 2022 release contains all surveys conducted up to and including the 2021 field season. There is no release = 2021, because the BBS was cancelled during the COVID lockdowns of spring 2020 so no data were collected and there was no updated data to release the following year.

fetch_bbs_data()                  # Default - most recent release
fetch_bbs_data(release = "2020")  # Specify a different release

Workflow overview

We can visualize the bbsBayes2 workflow with this flow chart of functions.

The functions are colour coded by category:

BBS Data - Blue
Data Prep - Pink
Modelling - Green
Exploring model trends and indices - Purple
General helper functions - Orange

Functions which are connected by a solid, black arrow, indicate that the output of the first function is required as input to the second. For example, the output of stratify() is required input for prepare_data().

Functions which are connected by a solid, grey arrow, indicate that the output of the first function is optional input to the second. For example, the output of generate_trends() is an option input for plot_geofacet().

Functions which are connected by a dotted arrow indicate that the first function can be used to create input for the second, but not necessarily directly. For example, fetch_bbs_data() downloads BBS data which is used by stratify(), but it isn’t an input. Alternatively, load_map() can load a spatial data file for a specific stratification which can be modified by the user and then used as input to prepare_spatial() or generate_indices().

See the Function Reference for more details on how to use a particular function.

Workflow diagram demonstrating the steps, ordering, and dependencies of the key functions within the bbsBayes2 package

Workflow to fit models

Note: If you would prefer to skip the model-fitting process for now, skip down to Workflow to explore the model outputs.

Now that you have cmdstanr installed and the BBS data downloaded, we’ll walk through a general workflow for modelling species trends with bbsBayes2.

Stratify the data

The first step in any bbsBayes2 analysis is to stratify the data. In this step we choose a stratification type as well as a species to explore.

s <- stratify(by = "bbs_usgs", species = "Scissor-tailed Flycatcher")
#> Using 'bbs_usgs' (standard) stratification
#> Loading BBS data...
#> Filtering to species Scissor-tailed Flycatcher (4430)
#> Stratifying data...
#>   Renaming routes...

We can also play around with the included sample data (Pacific Wrens)

s <- stratify(by = "bbs_cws", sample_data = TRUE) # Only Pacific Wren
#> Using 'bbs_cws' (standard) stratification
#> Using sample BBS data...
#> Using species Pacific Wren (sample data)
#> Filtering to species Pacific Wren (7221)
#> Stratifying data...
#>   Combining BCR 7 and NS and PEI...
#>   Renaming routes...

Stratifications included in bbsBayes2 are bbs_usgs, bbs_cws, bcr, latlong, prov_state. See the article on stratifications for more details and examples.

Species names in bbsBayes2

Let’s first take a brief step out of the workflow to explain important information about species names in bbsBayes2.

All of the models in the package are species-specific. So the species is a fundamental aspect of any analysis. The search_species() function allows the user to search up the species names in the BBS database, using text from the English, Spanish, French, or Latin names. The English names for each species will be retained in the metadata at every step of the workflow.

search_species("Geai bleu")
#> # A tibble: 1 × 8
#>     aou english  french    order         family   genus      species  unid_combined
#>   <dbl> <chr>    <chr>     <chr>         <chr>    <chr>      <chr>    <lgl>        
#> 1  4770 Blue Jay Geai bleu Passeriformes Corvidae Cyanocitta cristata TRUE
search_species("Cyanocitta")
#> # A tibble: 2 × 8
#>     aou english       french          order         family   genus      species  unid_combined
#>   <dbl> <chr>         <chr>           <chr>         <chr>    <chr>      <chr>    <lgl>        
#> 1  4780 Steller's Jay Geai de Steller Passeriformes Corvidae Cyanocitta stelleri TRUE         
#> 2  4770 Blue Jay      Geai bleu       Passeriformes Corvidae Cyanocitta cristata TRUE
search_species("Corvidae")
#> # A tibble: 20 × 8
#>      aou english                                    french order family genus species unid_combined
#>    <dbl> <chr>                                      <chr>  <chr> <chr>  <chr> <chr>   <lgl>        
#>  1  4840 Canada Jay                                 Mésan… Pass… Corvi… Peri… canade… TRUE         
#>  2  4830 Green Jay                                  Geai … Pass… Corvi… Cyan… yncas   TRUE         
#>  3  4920 Pinyon Jay                                 Geai … Pass… Corvi… Gymn… cyanoc… TRUE         
#>  4  4780 Steller's Jay                              Geai … Pass… Corvi… Cyan… stelle… TRUE         
#>  5  4770 Blue Jay                                   Geai … Pass… Corvi… Cyan… crista… TRUE         
#>  6  4790 Florida Scrub-Jay                          Geai … Pass… Corvi… Aphe… coerul… TRUE         
#>  7  4811 Island Scrub-Jay                           Geai … Pass… Corvi… Aphe… insula… TRUE         
#>  8  4812 California Scrub-Jay                       Geai … Pass… Corvi… Aphe… califo… TRUE         
#>  9  4813 Woodhouse's Scrub-Jay                      Geai … Pass… Corvi… Aphe… woodho… TRUE         
#> 10  4810 unid. California Scrub-Jay / Woodhouse's … unid … Pass… Corvi… Aphe… califo… TRUE         
#> 11  4820 Mexican Jay                                Geai … Pass… Corvi… Aphe… wollwe… TRUE         
#> 12  4910 Clark's Nutcracker                         Casse… Pass… Corvi… Nuci… columb… TRUE         
#> 13  4750 Black-billed Magpie                        Pie d… Pass… Corvi… Pica  hudson… TRUE         
#> 14  4760 Yellow-billed Magpie                       Pie à… Pass… Corvi… Pica  nuttal… TRUE         
#> 15  4880 American Crow                              Corne… Pass… Corvi… Corv… brachy… TRUE         
#> 16  4900 Fish Crow                                  Corne… Pass… Corvi… Corv… ossifr… TRUE         
#> 17  4881 unid. American Crow / Fish Crow            unid … Pass… Corvi… Corv… brachy… TRUE         
#> 18  4870 Chihuahuan Raven                           Corbe… Pass… Corvi… Corv… crypto… TRUE         
#> 19  4860 Common Raven                               Grand… Pass… Corvi… Corv… corax   TRUE         
#> 20  4865 unid. Chihuahuan Raven / Common Raven      unid … Pass… Corvi… Corv… crypto… TRUE

Species groupings

There are some taxonomic groupings of species-units in the BBS database that bbsBayes2 by default also groups into combined species forms. These represent groupings that make sense based on changes in taxonomy or potentially inconsistent distinctions among observers, routes, regions, or time.

There are groupings that represent species considered distinct in the early portion of the BBS history that have been lumped into a single species at some point since then. For example, the Northern Flicker observations exist in the BBS database separately as Red-shafted Flicker (4130), Yellow-shafted Flicker (4120), unidentified Red/Yellow-shafted Flicker (4123) or hybrid Red x Yellow-shafted Flicker (4125). To provide an appropriate dataset to represent population trends of Northern Flicker, bbsBayes2 by default sums all of these observations into a new species called Northern Flicker (all forms), which replaces the (4123) unidentified category in the species database. The remaining original separate forms (Red, Yellow, and hybrid) are retained.
Similar combined species are created for taxonomic splits that have occurred since the start of the BBS, such as Clark’s and Western Grebe, which are retained as their own separate species, but are also combined into Western Grebe (Clark’s/Western) (12). You can access a complete list of these combined species groups and the sub groups that make them up.

bbsBayes2::species_forms
#>    aou_unid                               english_original
#> 1      2973              unid. Dusky Grouse / Sooty Grouse
#> 2      5677                   (unid. race) Dark-eyed Junco
#> 3      4123    (unid. Red/Yellow Shafted) Northern Flicker
#> 4      5077      unid. Bullock's Oriole / Baltimore Oriole
#> 5      3370                                Red-tailed Hawk
#> 6      4022                                unid. sapsucker
#> 7      1690                                     Snow Goose
#> 8      6295       unid. Cassin's Vireo / Blue-headed Vireo
#> 9      4665     unid. Alder Flycatcher / Willow Flycatcher
#> 10     4642   unid. Cordilleran / Pacific-slope Flycatcher
#> 11       12            unid. Western Grebe / Clark's Grebe
#> 12     6556 (unid. Myrtle/Audubon's) Yellow-rumped Warbler
#> 13     5275           unid. Common Redpoll / Hoary Redpoll
#> 14     5012                               unid. Meadowlark
#>                                                   english_combined
#> 1                                        Blue Grouse (Dusky/Sooty)
#> 2                                      Dark-eyed Junco (all forms)
#> 3                                     Northern Flicker (all forms)
#> 4                            Northern Oriole (Bullock's/Baltimore)
#> 5                                      Red-tailed Hawk (all forms)
#> 6  Sapsuckers (Yellow-bellied/Red-naped/Red-breasted/Williamson's)
#> 7                                           Snow Goose (all forms)
#> 8                            Solitary Vireo (Blue-headed/Cassin's)
#> 9                               Traill's Flycatcher (Alder/Willow)
#> 10                  Western Flycatcher (Cordilleran/Pacific-slope)
#> 11                                 Western Grebe (Clark's/Western)
#> 12                               Yellow-rumped Warbler (all forms)
#> 13                                          Redpoll (Common/Hoary)
#> 14                         Meadowlark (Eastern/Western/Chihuahuan)
#>                                                french_combined                       aou_id
#> 1                            Tétras sombre (sombre/fuligineux)                   2970, 2971
#> 2                            Junco ardoisé (toutes les formes) 5671, 5670, 5680, 5660, 5690
#> 3                           Pic flamboyant (toutes les formes)             4125, 4120, 4130
#> 4                     Oriole du Nord (de Bullock/de Baltimore)             5080, 5070, 5078
#> 5                      Buse à queue rousse (toutes les formes)                         3380
#> 6  Pics buveur de sève (maculé/à nuque rouge/à poitrine rouge)       4020, 4021, 4031, 4032
#> 7                           Oie des neiges (toutes les formes)                         1691
#> 8                  Viréo à tête bleue (à tête bleue/de Cassin)             6292, 6291, 6290
#> 9               Moucherolle de Traill (des aulnes/ des saules)                   4661, 4660
#> 10                     Moucherolle côtier (des ravins/ côtier)                   4641, 4640
#> 11                      Grèbe élégant (à face blanche/élégant)                       10, 11
#> 12               Paruline à croupion jaune (toutes les formes)                   6550, 6560
#> 13                                 Sizerin (flammé/blanchâtre)                   5270, 5280
#> 14                               Sturnelle (prés/Ouest/Lilian)             5009, 5010, 5011

These splits that have occurred since the start of the BBS require some extra care when considering what years to include in any model fit. For example, if fitting a trend model to the data for Clark’s Grebe, it would not make sense to include all years back to 1966. Prior to 1985, Clark’s Grebe was not a distinct species and so observers would not have recorded observations for this species in the same was as they would have after 1985. The prepare_data() function will generate warnings if the user selects a species and time-period where these species identification-issues may be important. Related concerns with time-span may apply to species that have expanded their range into the surveyed area of the BBS since the beginning of surveys. A list of the species where these time-span concerns may be most relevant can be found by calling the built-in data table.

bbsBayes2::species_notes
#>                  english                  french   aou minimum_year
#> 1       Alder Flycatcher  Moucherolle des aulnes  4661         1978
#> 2      Willow Flycatcher  Moucherolle des saules  4660         1978
#> 3          Clark's Grebe    Grèbe à face blanche    11         1990
#> 4          Western Grebe           Grèbe élégant    10         1990
#> 5 Eurasian Collared-Dove      Tourterelle turque 22860         1990
#> 6           Cave Swallow Hirondelle à front brun  6121         1985
#>                                                                                                                                                                                     warning
#> 1 Alder and Willow Flycatcher were considered a single species until 1973. It is likely that they are not accurately separated by BBS observers until at least some years after that split.
#> 2 Alder and Willow Flycatcher were considered a single species until 1973. It is likely that they are not accurately separated by BBS observers until at least some years after that split.
#> 3   Clark's and Western Grebe were considered a single species until 1985. It is likely that they are not accurately separated by BBS observers until at least some years after that split.
#> 4   Clark's and Western Grebe were considered a single species until 1985. It is likely that they are not accurately separated by BBS observers until at least some years after that split.
#> 5                                     Eurasian Collared Dove was introduced into North America in the 1980s. 1990 is the first year that the species was observed on at least 3 BBS routes.
#> 6                                                      Cave Swallows were relatively rare in the areas surveyed by BBS before 1980. There are only two observations during BBS before 1980.

If you’re looking for a complete list of all species in the BBS database.

all_species_bbs_database <- load_bbs_data()$species

Note: Because models can take hours to days to run, if you’re exploring the package functionality, we recommend using species with relatively small ranges (i.e., relatively few data) such as the Hepatic Tanager, Pacific Wren, or Scissor-tailed Flycatcher.

Prepare the data

Once we have stratified the data, we can now prepare it for use in a model. In this step data will be filtered to omit routes with too few samples, etc. See prepare_data() for more details on how you can customize this step.

p <- prepare_data(s)

Prepare the model

Next we will prepare the model parameters and initialization values. See prepare_model() for more details on how you can customize this step.

md <- prepare_model(p, model = "first_diff")

Run model

Now we can run the model.

The default iter_sampling and iter_warmup are 1000 and the default chains is 4. In the interest of speed for this example, we are using much lower values, but note that this almost certainly will result in problems with our model.

m <- run_model(md, iter_sampling = 100, iter_warmup = 500, chains = 2)

Additional steps for spatial models

For spatial models there are two additional steps. You stratify and prepare the data as in the previous example, but you also prepare the map and the spatial data. An example is below.

s <- stratify(by = "bbs_usgs", species="Scissor-tailed Flycatcher")
#> Using 'bbs_usgs' (standard) stratification
#> Loading BBS data...
#> Filtering to species Scissor-tailed Flycatcher (4430)
#> Stratifying data...
#>   Renaming routes...
p <- prepare_data(s)

And now the additional steps…

# Load a map
map <- load_map(stratify_by = "bbs_usgs")
# Prepare the spatial data
sp <- prepare_spatial(p, map)
#> Preparing spatial data...
#> Identifying neighbours (non-Voronoi method)...
#> Formating neighbourhood matrices...
#> Plotting neighbourhood matrices...

Then the remaining steps are the same but we use model_variant = "spatial" in prepare_model().

# Then prepare the model with the spatial output
mod <- prepare_model(sp, model = "gamye", model_variant = "spatial")

# Then run the model as before
m <- run_model(mod)

# Optionally, save the model output as an .rds file
saveRDS(m, "output/4430_gamye_spatial.rds")

The spatial aspects of the spatial model variants use an intrinsic Conditional Autoregressive structure (iCAR) to share information among neighbouring strata on the population abundance and trend parameter (Besag et al. 1991, ver Hoef et al. 2018, Morris et al. 2019). For more information about the bbsBayes2 models and the spatial models see the models vignette and Smith et al., 2024.

The prepared spatial data object includes a map of the spatial neighbourhood relationships for a given species and stratification.

print(sp$spatial_data$map)

Map showing the strata and neighbourhood relationships among them for Scissor-tailed Flycatcher

Workflow to explore the model outputs

If you would prefer to skip the model fitting steps for now, you can download a fitted model object (the output of the code below) and test out the remaining package features.

library(bbsBayes2)
library(tidyverse)

species <- "Scissor-tailed Flycatcher"

# extract the unique numerical identifier for this species in the BBS database
species_number <- search_species(species) %>%
  select(aou) %>%
  unlist()

mod <- "gamye"
var <- "spatial"

out_name <- paste0("output/",
                      species_number,
                      "_",
                      mod,
                      "_",
                      var)

d <- stratify("bbs_usgs",
              species = species) %>%
  prepare_data() %>%
  prepare_spatial(s, strata_map = load_map("bbs_usgs")) %>%
  prepare_model(model = mod, model_variant = var)

m <- run_model(d,
    output_basename = out_name,
    output_dir = getwd()) # by default saves the model output using output_basename

The outputs of the collection of functions required to fit a model are cumulative: each one retains the metadata from the previous step. As a result, the saved object from the run_model() function is a large list that includes the cmdstanr posterior samples object from the model fitting process, as well as all of the data and metadata necessary to understand and replicate the choices made to fit the model.

m <- readRDS("output/4430_gamye_spatial.rds")
names(m)
#> [1] "model_fit"   "model_data"  "meta_data"   "meta_strata" "raw_data"

Convergence and parameter summaries

First, we will investigate the model convergence and the parameter estimates of the model. There are two key helper functions in bbsBayes2 that provide information on model convergence: get_convergence() calculates convergence diagnostics and get_summary() calculates the convergence diagnostics as well as summary statistics (mean, median, credible intervals) for all parameters in a fitted model.

# Convergence diagnostics for all parameters
converge <- get_convergence(m)

# Convergence diagnostics for all smoothed annual indices
converge_n_smooth <- get_convergence(m, variables = "n_smooth") %>%
  arrange(-rhat)
converge_n_smooth
#> # A tibble: 1,425 × 5
#>    variable_type variable         rhat ess_bulk ess_tail
#>    <chr>         <chr>           <dbl>    <dbl>    <dbl>
#>  1 n_smooth      n_smooth[21,25]  1.00    5881.    2893.
#>  2 n_smooth      n_smooth[20,16]  1.00    4369.    3568.
#>  3 n_smooth      n_smooth[20,15]  1.00    4363.    3713.
#>  4 n_smooth      n_smooth[18,27]  1.00    4850.    3204.
#>  5 n_smooth      n_smooth[24,33]  1.00    5138.    3663.
#>  6 n_smooth      n_smooth[18,29]  1.00    4999.    3353.
#>  7 n_smooth      n_smooth[16,51]  1.00    4498.    3746.
#>  8 n_smooth      n_smooth[18,28]  1.00    5024.    3182.
#>  9 n_smooth      n_smooth[16,50]  1.00    4430.    3747.
#> 10 n_smooth      n_smooth[21,23]  1.00    5757.    3643.
#> # ℹ 1,415 more rows

Here we’ve sorted the convergence diagnostics by rhat values (highest values at the top to highlight any problems). Cut-offs for rhat statistics are somewhat arbitrary and recommendations vary in the literature, but values of rhat > 1.1 indicate a serious problem with the convergence of some of the parameters in the model (i.e., there is more variation among the independent chains than within them) and values of ess_bulk < ~400 suggest an imprecise estimate of the parameter. In some cases, refitting the model with more iterations (both warmup and sampling) may improve convergence. More advice on exploring Bayesian model convergence can be found in Gabry et al., 2019. bbsBayes2 relies on functions in the packages cmdstanr and posterior to calculate rhat values following Vehtari et al. 2021.

m <- run_model(mod,
               iter_warmup = 2000,
               iter_sampling = 2000)

It is possible to thin the MCMC chains by passing arguments from cmdstanr::sample() into the run_model() function. The call below would result in the same number of posterior samples as run_model() defaults, but may improve the efficiency of the sampling (and of course would also increase the time required to fit the model by a factor of approximately 4).

m <- run_model(mod,
               iter_warmup = 4000,
               iter_sampling = 4000,
               thin = 4)

If you want summary statistics of the parameters, as well as convergence diagnostics, the function get_summary() may be more useful.

# Summary statistics and convergence diagnostics for all parameters
summary_stats <- get_summary(m)

# Summary statistics and convergence diagnostics for all smoothed annual indices
summary_stats_n_smooth <- get_summary(m, variables = "n_smooth") %>%
  arrange(-rhat)
summary_stats_n_smooth
#> # A tibble: 1,425 × 10
#>    variable         mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>    <chr>           <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#>  1 n_smooth[21,25] 33.6   33.6  1.33  1.32  31.5  35.8   1.00    5881.    2893.
#>  2 n_smooth[20,16] 13.8   13.8  0.998 1.01  12.2  15.5   1.00    4369.    3568.
#>  3 n_smooth[20,15] 12.8   12.7  0.926 0.935 11.3  14.3   1.00    4363.    3713.
#>  4 n_smooth[18,27]  8.93   8.89 0.743 0.720  7.80 10.2   1.00    4850.    3204.
#>  5 n_smooth[24,33] 36.4   36.3  2.26  2.21  32.7  40.2   1.00    5138.    3663.
#>  6 n_smooth[18,29]  8.70   8.66 0.690 0.668  7.62  9.91  1.00    4999.    3353.
#>  7 n_smooth[16,51] 19.1   19.0  1.60  1.59  16.6  21.9   1.00    4498.    3746.
#>  8 n_smooth[18,28]  8.78   8.74 0.714 0.696  7.68 10.0   1.00    5024.    3182.
#>  9 n_smooth[16,50] 20.6   20.5  1.67  1.65  18.0  23.6   1.00    4430.    3747.
#> 10 n_smooth[21,23] 34.5   34.5  1.40  1.36  32.3  36.9   1.00    5757.    3643.
#> # ℹ 1,415 more rows

Indices - predictions of annual relative abundance

Indices (annual indices of relative abundance) represent mean predicted annual counts of the species in a given region, on an average route, by an average observer. The pattern in the time-series of these annual indices for a given region represent the estimated population trajectory. We generate indices according to different categories of regional summaries. All of the bbsBayes2 models are stratified based on geographic spatial units, and these categories of regions are either these strata (generate_indices(region = "stratum")), or some combination of strata. By default the selected regions for most model summaries are continent and stratum. The continent estimates represent the area- and abundance-weighted means across all strata, other regional summaries are built in for some stratifications that allow it (e.g., region = "country" for the stratifications bbs_usgs, bbs_cws, or prov_state).

i <- generate_indices(model_output = m)
#> Processing region continent
#> Processing region stratum

We can explore or extract these indices for saving as an external file (e.g., export to .csv), by accessing the indices item in the list.

i[["indices"]]
#> # A tibble: 1,482 × 17
#>     year region    region_type strata_included     strata_excluded index index_q_0.025 index_q_0.05
#>    <dbl> <chr>     <chr>       <chr>               <chr>           <dbl>         <dbl>        <dbl>
#>  1  1967 continent continent   US-AR-24 ; US-AR-2… ""              13.1          11.7         11.9 
#>  2  1968 continent continent   US-AR-24 ; US-AR-2… ""              12.9          11.8         12.0 
#>  3  1969 continent continent   US-AR-24 ; US-AR-2… ""              12.9          12.0         12.1 
#>  4  1970 continent continent   US-AR-24 ; US-AR-2… ""              12.9          12.1         12.2 
#>  5  1971 continent continent   US-AR-24 ; US-AR-2… ""              12.9          12.0         12.1 
#>  6  1972 continent continent   US-AR-24 ; US-AR-2… ""              12.8          12.0         12.1 
#>  7  1973 continent continent   US-AR-24 ; US-AR-2… ""              11.7          11.0         11.1 
#>  8  1974 continent continent   US-AR-24 ; US-AR-2… ""              11.0          10.3         10.5 
#>  9  1975 continent continent   US-AR-24 ; US-AR-2… ""              10.3           9.67         9.77
#> 10  1976 continent continent   US-AR-24 ; US-AR-2… ""               9.35          8.76         8.85
#> # ℹ 1,472 more rows
#> # ℹ 9 more variables: index_q_0.25 <dbl>, index_q_0.75 <dbl>, index_q_0.95 <dbl>,
#> #   index_q_0.975 <dbl>, obs_mean <dbl>, n_routes <int>, n_routes_total <int>, n_non_zero <int>,
#> #   backcast_flag <dbl>

Trajectory plots

We can also generate time-series plots of these indices to visualize population trajectories.

# generates a list of ggplot graphs, one for each region
p <- plot_indices(indices = i,
                  add_observed_means = TRUE) # optional argument to show raw observed mean counts

Note that we get one plot for each region and regional category, in this case that means one plot for the continent, and one for each stratum.

names(p)
#>  [1] "continent" "US_AR_24"  "US_AR_25"  "US_AR_26"  "US_KS_18"  "US_KS_19"  "US_KS_22" 
#>  [8] "US_LA_25"  "US_LA_37"  "US_MO_22"  "US_MO_24"  "US_NM_18"  "US_NM_35"  "US_OK_18" 
#> [15] "US_OK_19"  "US_OK_21"  "US_OK_22"  "US_OK_25"  "US_TX_18"  "US_TX_19"  "US_TX_20" 
#> [22] "US_TX_21"  "US_TX_25"  "US_TX_35"  "US_TX_36"  "US_TX_37"

We can plot them individually by pulling a plot out of the list

print(p[["continent"]])

Population trajectory graph, showing the estimated annual relative abundances, their associated 95% credible intervals, and points representing the raw mean observed counts.

Each of these plots is a ggplot2 object that can be modified like any other. For example, you can modify titles or axes.

library(ggplot2)

p1_mod <- p[["continent"]]+
  coord_cartesian(xlim = c(2000,NA))+ #subset of years
  labs(title = "") # remove the title
print(p1_mod)

Population trajectory graph, modified to show only the last 20 years of the time-series and remove the title

Spaghetti plots to show uncertainty in population trajectories

The most common inference to draw from one of these BBS models relates to the estimates of the population trajectory. One particularly useful way to visualise the uncertainty of those population trajectories is to plot many posterior draws of the full trajectory. The default population trajectories plots plot_indices() show a line representing the path of the annual posterior medians of the annual indices and an uncertainty band spanning the outer limits of a credible interval on the annual indices. These are reasonable summaries of the uncertainty in the collection of annual indices. However, the uncertainty of each annual index of abundance includes information about the uncertainty in the estimate of the change in abundance through time (e.g., trend) and uncertainty in the estimate of the mean abundance (e.g., the mean count in any given route or observer). Those two sources of uncertainty can be correlated in the posterior distribution, so that the uncertainty of the annual indices may over-estimate the uncertainty in the trend. To plot a sample of estimated trajectories, set the spaghetti = TRUE argument in the plot_indices() function.

# generates a list of ggplot graphs, one for each region
p <- plot_indices(indices = i,
                  add_observed_means = TRUE,
                  spaghetti = TRUE)
print(p[["continent"]])

Population trajectory graph, showing a posterior sample of estimated trajectories as multiple overlapping lines, and points representing the raw mean observed counts. There are arguments that also allow the user to control the transparency of each plotted line, as well as the number of lines to plot (the default is to draw 100 random samples).

Trends - predictions of mean rates of change over time

Next we can calculate the long-term trends based on these indices. Note all trends from bbsBayes2 models are derived from summaries of indices through time or between two points in time.

t <- generate_trends(i)

We can explore or extract these trends for saving as an external file (e.g., export to .csv), by accessing the trends item in the list.

t[["trends"]]
#> # A tibble: 26 × 27
#>    start_year end_year region    region_type strata_included  strata_excluded   trend trend_q_0.025
#>         <dbl>    <dbl> <chr>     <chr>       <chr>            <chr>             <dbl>         <dbl>
#>  1       1967     2023 continent continent   US-AR-24 ; US-A… ""              -0.874         -1.11 
#>  2       1967     2023 US-AR-24  stratum     US-AR-24         ""               4.41           3.10 
#>  3       1967     2023 US-AR-25  stratum     US-AR-25         ""               0.214         -0.553
#>  4       1967     2023 US-AR-26  stratum     US-AR-26         ""               4.65           2.78 
#>  5       1967     2023 US-KS-18  stratum     US-KS-18         ""               2.81          -1.36 
#>  6       1967     2023 US-KS-19  stratum     US-KS-19         ""              -0.538         -1.31 
#>  7       1967     2023 US-KS-22  stratum     US-KS-22         ""              -0.0272        -0.701
#>  8       1967     2023 US-LA-25  stratum     US-LA-25         ""              -1.25          -2.85 
#>  9       1967     2023 US-LA-37  stratum     US-LA-37         ""               0.320         -1.81 
#> 10       1967     2023 US-MO-22  stratum     US-MO-22         ""              -0.356         -2.43 
#> # ℹ 16 more rows
#> # ℹ 19 more variables: trend_q_0.05 <dbl>, trend_q_0.25 <dbl>, trend_q_0.75 <dbl>,
#> #   trend_q_0.95 <dbl>, trend_q_0.975 <dbl>, percent_change <dbl>, percent_change_q_0.025 <dbl>,
#> #   percent_change_q_0.05 <dbl>, percent_change_q_0.25 <dbl>, percent_change_q_0.75 <dbl>,
#> #   percent_change_q_0.95 <dbl>, percent_change_q_0.975 <dbl>,
#> #   width_of_95_percent_credible_interval <dbl>, rel_abundance <dbl>, obs_rel_abundance <dbl>,
#> #   n_routes <dbl>, mean_n_routes <dbl>, n_strata_included <dbl>, backcast_flag <dbl>

We can generate trends for different periods of time, using any combination of a starting year min_year and ending year max_year.

t_10 <- generate_trends(i,
                        min_year = 2011,
                        max_year = 2021)
t_10
#> $trends
#> # A tibble: 26 × 27
#>    start_year end_year region    region_type strata_included  strata_excluded   trend trend_q_0.025
#>         <dbl>    <dbl> <chr>     <chr>       <chr>            <chr>             <dbl>         <dbl>
#>  1       2011     2021 continent continent   US-AR-24 ; US-A… ""              -2.00           -2.71
#>  2       2011     2021 US-AR-24  stratum     US-AR-24         ""               1.94           -1.19
#>  3       2011     2021 US-AR-25  stratum     US-AR-25         ""              -0.411          -2.85
#>  4       2011     2021 US-AR-26  stratum     US-AR-26         ""               3.18           -2.08
#>  5       2011     2021 US-KS-18  stratum     US-KS-18         ""               4.08           -2.35
#>  6       2011     2021 US-KS-19  stratum     US-KS-19         ""               0.982          -2.07
#>  7       2011     2021 US-KS-22  stratum     US-KS-22         ""              -3.95           -6.37
#>  8       2011     2021 US-LA-25  stratum     US-LA-25         ""              -2.19           -8.99
#>  9       2011     2021 US-LA-37  stratum     US-LA-37         ""               1.43           -4.89
#> 10       2011     2021 US-MO-22  stratum     US-MO-22         ""               0.0656         -6.34
#> # ℹ 16 more rows
#> # ℹ 19 more variables: trend_q_0.05 <dbl>, trend_q_0.25 <dbl>, trend_q_0.75 <dbl>,
#> #   trend_q_0.95 <dbl>, trend_q_0.975 <dbl>, percent_change <dbl>, percent_change_q_0.025 <dbl>,
#> #   percent_change_q_0.05 <dbl>, percent_change_q_0.25 <dbl>, percent_change_q_0.75 <dbl>,
#> #   percent_change_q_0.95 <dbl>, percent_change_q_0.975 <dbl>,
#> #   width_of_95_percent_credible_interval <dbl>, rel_abundance <dbl>, obs_rel_abundance <dbl>,
#> #   n_routes <dbl>, mean_n_routes <dbl>, n_strata_included <dbl>, backcast_flag <dbl>
#> 
#> $meta_data
#> $meta_data$stratify_by
#> [1] "bbs_usgs"
#> 
#> $meta_data$stratify_type
#> [1] "standard"
#> 
#> $meta_data$species
#> [1] "Scissor-tailed Flycatcher"
#> 
#> $meta_data$model
#> [1] "gamye"
#> 
#> $meta_data$model_variant
#> [1] "spatial"
#> 
#> $meta_data$model_file
#> [1] "C:/Users/SmithAC/AppData/Local/Programs/R/R-4.4.1/library/bbsBayes2/models/gamye_spatial_bbs_CV.stan"
#> 
#> $meta_data$run_date
#> [1] "2024-09-28 18:05:30 EDT"
#> 
#> $meta_data$bbsBayes2_version
#> [1] "1.1.2"
#> 
#> $meta_data$cmdstan_path
#> [1] "//wsl$/Ubuntu/home/smithac/.cmdstan/cmdstan-2.35.0"
#> 
#> $meta_data$cmdstan_version
#> [1] "2.35.0"
#> 
#> $meta_data$regions
#> [1] "continent" "stratum"  
#> 
#> $meta_data$start_year
#> [1] 1967
#> 
#> $meta_data$n_years
#> [1] 57
#> 
#> $meta_data$hpdi_indices
#> [1] FALSE
#> 
#> $meta_data$hpdi_trends
#> [1] FALSE
#> 
#> $meta_data$gam_smooth_trends
#> [1] FALSE
#> 
#> 
#> $meta_strata
#> # A tibble: 25 × 5
#>    strata_name strata area_sq_km continent stratum 
#>    <chr>        <dbl>      <dbl> <chr>     <chr>   
#>  1 US-AR-24         1     33232. continent US-AR-24
#>  2 US-AR-25         2     65340. continent US-AR-25
#>  3 US-AR-26         3     39366. continent US-AR-26
#>  4 US-KS-18         4     37179. continent US-KS-18
#>  5 US-KS-19         5    109390. continent US-KS-19
#>  6 US-KS-22         6     66243. continent US-KS-22
#>  7 US-LA-25         7     47075. continent US-LA-25
#>  8 US-LA-37         8     24987. continent US-LA-37
#>  9 US-MO-22         9     83079. continent US-MO-22
#> 10 US-MO-24        10     87034. continent US-MO-24
#> # ℹ 15 more rows
#> 
#> $raw_data
#> # A tibble: 13,250 × 23
#>    country_num state_num state     rpid   bcr  year strata_name route   obs_n count n_routes
#>          <dbl>     <dbl> <chr>    <dbl> <dbl> <dbl> <chr>       <chr>   <dbl> <dbl>    <int>
#>  1         840         7 ARKANSAS   101    24  1967 US-AR-24    7-20  1190020     0       12
#>  2         840         7 ARKANSAS   101    24  1968 US-AR-24    7-20  1190020     0       12
#>  3         840         7 ARKANSAS   101    24  1969 US-AR-24    7-20  1190020     0       12
#>  4         840         7 ARKANSAS   101    24  1970 US-AR-24    7-20  1190020     0       12
#>  5         840         7 ARKANSAS   101    24  1971 US-AR-24    7-20  1190020     0       12
#>  6         840         7 ARKANSAS   101    24  1972 US-AR-24    7-20  1190020     0       12
#>  7         840         7 ARKANSAS   101    24  1973 US-AR-24    7-20  1190020     0       12
#>  8         840         7 ARKANSAS   101    24  1974 US-AR-24    7-20  1190020     0       12
#>  9         840         7 ARKANSAS   101    24  1975 US-AR-24    7-20  1190020     0       12
#> 10         840         7 ARKANSAS   101    24  1976 US-AR-24    7-20  1190020     0       12
#> # ℹ 13,240 more rows
#> # ℹ 12 more variables: non_zero_weight <dbl>, first_year <dbl>, max_n_routes_year <int>,
#> #   n_obs <int>, mean_obs <dbl>, year_num <dbl>, strata <dbl>, observer <int>, site <int>,
#> #   obs_route <chr>, obs_site <int>, n_obs_sites <int>

Trend maps - visualizing the spatial variation in trends

We can plot trend estimates on a map to visualise the spatial patterns in population change.

trend_map <- plot_map(t)
print(trend_map)

Population trend map showing strata with increasing trends in blues and decreasing trends in reds.

Barn Swallow example

A model with a suitable number of iterations takes a long time to run (the Barn Swallow model, a species with many counts, years and strata, took 54 hours).

You can download an example of the saved model output for Barn Swallow here:

An example of the output from applying the spatial gamye model to Barn Swallow data.

In this example we’ve placed it in a sub-directory of our working directory called output.

Let’s take a look at the spatial GAMYE model for the Barn Swallow.

First we load the data

BARS <- readRDS("output/Barn_Swallow_gamye_spatial.rds")

We can investigate the model meta data

BARS$meta_data
#> $stratify_by
#> [1] "bbs_cws"
#> 
#> $stratify_type
#> [1] "standard"
#> 
#> $species
#> [1] "Barn Swallow"
#> 
#> $model
#> [1] "gamye"
#> 
#> $model_variant
#> [1] "spatial"
#> 
#> $model_file
#> [1] "C:/Users/SmithAC/AppData/Local/R/win-library/4.2/bbsBayes2/models/gamye_spatial_bbs_CV.stan"
#> 
#> $run_date
#> [1] "2023-01-20 13:27:49 EST"
#> 
#> $bbsBayes2_version
#> [1] "1.0.0"
#> 
#> $cmdstan_path
#> [1] "C:/Users/SmithAC/Documents/.cmdstan/wsl-cmdstan-2.30.1"
#> 
#> $cmdstan_version
#> [1] "2.30.1"

See the length of the run in seconds or hours

BARS$model_fit$time()
#> $total
#> [1] 197011.7
#> 
#> $chains
#>   chain_id   warmup sampling  total
#> 1        1  82393.1  46100.9 128494
#> 2        2  85907.6 111101.0 197009
#> 3        3 113442.0  54516.1 167958
#> 4        4 113073.0  54597.2 167670
BARS$model_fit$time()$total/3600
#> [1] 54.72546

Now create the indices and trends


BARS_indices <- generate_indices(BARS)
#> Processing region continent
#> Processing region stratum

BARS_index_plots <- plot_indices(BARS_indices,
                                 add_observed_means = TRUE,
                                 add_number_routes = TRUE)

BARS_continent <- BARS_index_plots[["continent"]]
print(BARS_continent)

Population trajectory graph for Barn Swallow, showing the estimated annual relative abundances, their associated 95% credible intervals, and points representing the raw mean observed counts.

Smoothed population trajectories - gamye model

Above, we’ve plotted the full annual indices (i.e., the population trajectory) from the gamye model. In the gamye and slope models, the population trajectory can be decomposed into two components: the smooth component and the annual fluctuations around the smooth. The default estimate of the population trajectory uses the full version of the annual indices (generate_indices(alternate_n = "n")). For the gamye model the alternate population trajectory that represents only the semi-parametric smooth component can be accessed by choosing the generate_indices(alternate_n = "n_smooth") option. For the slope model the alternate population trajectory that represents the linear smooth only can be accessed by choosing the generate_indices(alternate_n = "n_slope") option. These smooth trajectories may be useful for trend estimates that vary less from year-to-year (gamye model) or for an overall linear trend from a slope model applied to a short time-frame (it’s probably unlikely that a linear smooth is reasonable for long periods of time).

BARS_smooth_indices <- generate_indices(BARS,
                                        alternate_n = "n_smooth")
#> Processing region continent
#> Processing region stratum
BARS_trends <- generate_trends(BARS_smooth_indices)
BARS_trend_map <- plot_map(BARS_trends)
BARS_trend_map

Population trend map for Barn Swallow, showing strata with increasing trends in shades of blue and strata with decreasing trends in shades of red ### Modifying base plots

We can also add the smoothed annual indices to the plots of the full annual indices from above, taking advantage of the ggplot2 functions such as geom_line().

library(ggplot2)

smooth_cont_indices <- BARS_smooth_indices$indices %>%
  filter(region == "continent")

BARS_continent_both <- BARS_continent +
  geom_line(data = smooth_cont_indices,
            aes(x = year, y = index),
            alpha = 0.5,
            linewidth = 1)+
  labs(subtitle = "grey line represents the smoothed annual indices")

print(BARS_continent_both)

Population trajectory graph for Barn Swallow, same as above plus the smoothed annual indices are added as a grey line

Check out the other articles to explore more advanced usage or the function reference to see what functions are available and how to use them in greater detail.