
Introduction

This package implements the CBEA approach to set-based enrichment analysis for microbiome relative abundance data. A preprint describing the method can be found on bioRxiv. In summary, CBEA (Competitive Balances for taxonomic Enrichment Analysis) estimates the activity of a set by transforming an input taxa-by-sample data matrix into a corresponding set-by-sample data matrix. The resulting output can be used for downstream analyses such as differential abundance, classification, or clustering, with set-based features in place of the original taxa.

The transformation that CBEA applies is based on the isometric log ratio transformation:
\[ CBEA_{i,\mathbb{S}} = \sqrt{\frac{|\mathbb{S}||\mathbb{S}_C|}{|\mathbb{S}| + |\mathbb{S}_C|}} \ln \frac{g(X_{i,j | j\in \mathbb{S}})}{g(X_{i,j | j \notin \mathbb{S}})} \] where \(\mathbb{S}\) is the set of interest, \(\mathbb{S}_C\) is its complement, \(|\cdot|\) denotes the number of taxa in a set, \(g()\) is the geometric mean operation, and \(X\) is the original data matrix where \(i\) is the index representing samples and \(j\) is the index representing variables (or taxa).
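As a worked illustration of the formula above, the following base R sketch computes the score by hand for a toy matrix. The data values and the helper names geo_mean and cbea_score are hypothetical and are not part of the CBEA package; the package's own implementation may differ in its details.

```r
# Toy sample-by-taxa matrix of strictly positive relative abundances
# (hypothetical values; rows are samples i, columns are taxa j).
X <- matrix(
  c(0.10, 0.20, 0.30, 0.40,
    0.25, 0.25, 0.25, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("t1", "t2", "t3", "t4"))
)

# Geometric mean of a strictly positive numeric vector.
geo_mean <- function(x) exp(mean(log(x)))

# Score of one set for every sample, following the formula term by term.
cbea_score <- function(X, set) {
  in_set <- colnames(X) %in% set
  n_s <- sum(in_set)   # |S|
  n_c <- sum(!in_set)  # |S_C|
  scale <- sqrt(n_s * n_c / (n_s + n_c))
  apply(X, 1, function(x) {
    scale * log(geo_mean(x[in_set]) / geo_mean(x[!in_set]))
  })
}

scores <- cbea_score(X, set = c("t1", "t2"))
scores
```

For sample s1 the score equals -0.5 * log(6), because the scaling factor is 1 and the ratio of geometric means is sqrt(0.02 / 0.12). For sample s2 all taxa are equally abundant, so the score is zero up to floating-point error.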

Inference is performed by estimating the null distribution of the test statistic. This can be done either directly from permutations or by fitting a parametric distribution to the scores computed on permuted data. Users can also adjust for variance inflation due to inter-taxa correlation. Please refer to the main manuscript for additional details.
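The two routes to a null distribution can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it assumes that a permutation shuffles which taxa belong to the set, uses an upper-tail p-value, and fits a single normal distribution by its sample mean and standard deviation. The data and the helper score_set are hypothetical.

```r
set.seed(1)

# Hypothetical data: 50 samples by 20 taxa, strictly positive.
X <- matrix(rlnorm(50 * 20), nrow = 50,
            dimnames = list(NULL, paste0("t", 1:20)))
set <- paste0("t", 1:5)

# Score per sample; the log of a geometric mean is the mean of the logs.
score_set <- function(X, in_set) {
  n_s <- sum(in_set)
  n_c <- sum(!in_set)
  scale <- sqrt(n_s * n_c / (n_s + n_c))
  scale * (rowMeans(log(X[, in_set, drop = FALSE])) -
             rowMeans(log(X[, !in_set, drop = FALSE])))
}

in_set <- colnames(X) %in% set
observed <- score_set(X, in_set)

# Assumption: each permutation shuffles the set membership of taxa.
n_perm <- 100
null_scores <- unlist(lapply(seq_len(n_perm), function(b) {
  score_set(X, sample(in_set))
}))

# Route 1 (non-parametric): empirical upper-tail p-value per sample.
p_empirical <- vapply(observed, function(s) mean(null_scores >= s),
                      numeric(1))

# Route 2 (parametric): fit a normal to the permuted scores.
mu0 <- mean(null_scores)
sd0 <- sd(null_scores)
p_parametric <- pnorm(observed, mean = mu0, sd = sd0, lower.tail = FALSE)
```

The non-parametric route has a resolution limited by the number of permuted scores, which is one reason a parametric fit can be attractive when few permutations are affordable.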

Usage guide

Install CBEA

CBEA is an R package available from the Bioconductor repository. It requires R, the open source statistical programming language, which can be installed on any operating system from CRAN. Once R is installed, you can install CBEA with the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("CBEA")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

If you run into issues with the installation procedure or with package features, the best place to report them is an issue at the GitHub repository. For additional support, you can use the Bioconductor support site with the CBEA tag, after checking older posts. Please note that if you want to receive help you should adhere to the posting guidelines. It is particularly important to provide a small reproducible example and your session information so package developers can track down the source of the error.

Loading sample data

First, we load one of the pre-packaged data sets contained in CBEA. Here we load data from the Human Microbiome Project (HMP), stored in a TreeSummarizedExperiment container from the TreeSummarizedExperiment package. CBEA does not support phyloseq objects from the phyloseq package, but users can use the mia package to convert between the two data types.

Users can also supply raw matrices or data frames, but these require additional arguments. The taxa_are_rows argument specifies whether the data frame or matrix stores taxa abundances as rows (or as columns). The id_col argument (data frames only) takes a vector of names of row metadata that will be excluded from the analysis.

data("hmp_gingival")
abun <- hmp_gingival$data
metab_sets <- hmp_gingival$set
abun # this is a TreeSummarizedExperiment object 
#> class: TreeSummarizedExperiment 
#> dim: 5378 369 
#> metadata(4): experimentData phylogeneticTree experimentData
#>   phylogeneticTree
#> assays(1): 16SrRNA
#> rownames(5378): OTU_97.10005 OTU_97.10006 ... OTU_97.9991 OTU_97.9995
#> rowData names(7): CONSENSUS_LINEAGE SUPERKINGDOM ... FAMILY GENUS
#> colnames(369): 700014427 700014521 ... 700111587 700111758
#> colData names(7): RSID VISITNO ... HMP_BODY_SUBSITE SRS_SAMPLE_ID
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

Input sets

CBEA accepts any type of set, as long as the sets are supplied in the BiocSet format and their elements can be matched to taxa names in the data set. The main function checks whether these names match.
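Before calling the main function, it can be useful to verify the name matching yourself. The base R sketch below shows one way to do so with sets held as a named list of character vectors; the taxa names and set contents are hypothetical, and this is not the check that the package performs internally.

```r
# Hypothetical taxa names, e.g. the row names of an abundance table.
taxa_names <- c("OTU_1", "OTU_2", "OTU_3", "OTU_4")

# Hypothetical sets as a named list of character vectors.
sets <- list(
  Aerobic   = c("OTU_1", "OTU_2"),
  Anaerobic = c("OTU_3", "OTU_9")
)

# Elements of each set that have no matching taxon in the data.
unmatched <- lapply(sets, setdiff, y = taxa_names)

# Number of elements of each set that are present in the data.
n_matched <- vapply(sets, function(s) sum(s %in% taxa_names), integer(1))

unmatched
n_matched
```

Here the Anaerobic set contains one element, OTU_9, that is absent from the data. A named list of this shape should also be convertible to a BiocSet object, since the BiocSet constructor accepts named character vectors.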

metab_sets
#> class: BiocSet
#> 
#> es_element():
#> # A tibble: 4,828 × 1
#>   element     
#>   <chr>       
#> 1 OTU_97.10005
#> 2 OTU_97.10053
#> 3 OTU_97.10090
#> # … with 4,825 more rows
#> 
#> es_set():
#> # A tibble: 3 × 1
#>   set        
#>   <chr>      
#> 1 F Anaerobic
#> 2 Aerobic    
#> 3 Anaerobic  
#> 
#> es_elementset() <active>:
#> # A tibble: 4,828 × 2
#>   element      set        
#>   <chr>        <chr>      
#> 1 OTU_97.10005 F Anaerobic
#> 2 OTU_97.10053 F Anaerobic
#> 3 OTU_97.10090 F Anaerobic
#> # … with 4,825 more rows

For more information on BiocSet, please refer to the BiocSet documentation. Put simply, a BiocSet object behaves much like a list of three data frames and can be used in conjunction with dplyr and tidyr.

Applying CBEA

After specifying the inputs, cbea is the main function to apply the method. If there are zeros in the abundance data, cbea adds a pseudocount to avoid issues with the log-ratio transformation (and throws a warning). If a different zero-handling approach is desired, users should pre-process the abundance data with the appropriate method. For parametric fits, cbea relies on the fitdistrplus and mixtools (https://CRAN.R-project.org/package=mixtools) packages to estimate the parameters of the null distribution. Specific arguments to control this fitting procedure can be provided as a named list in the control argument.
Applying cbea takes a single command:

results <- cbea(abun, set = metab_sets, abund_values = "16SrRNA",
              output = "cdf", distr = "mnorm", adj = TRUE, thresh = 0.05, n_perm = 10)
#> Warning in .local(obj, set, output, distr, adj, n_perm, parametric, thresh, : Taxonomic count table contains zeros,
#>             which would invalidate the log-ratio transform.
#>             Adding a pseudocount of 1e-5...
#> number of iterations= 644 
#> number of iterations= 1253 
#> number of iterations= 468 
#> number of iterations= 847 
#> number of iterations= 612 
#> number of iterations= 105
results
#> CBEA output of class 'CBEAout' with 369 samples and 3 sets 
#>  Fit type: Parametric with 2-component Gaussian Mixture Distribution 
#>  Number of permutations: 10 
#>  Output type: CDF values (cdf)

The following arguments control the behaviour of cbea:

  • output: Controls the type of output returned. By default CBEA estimates a parametric null distribution, and users can specify what is derived from it. For downstream analysis with set-level features, users can return the CDF value or z-score of each raw score computed against that distribution (options cdf or zscore). Alternatively, users can return the raw scores themselves using the raw option, in which case no distribution fitting is performed. Users can also use the null distribution to compute unadjusted p-values (option pval) indicating whether a set is enriched in each sample. These unadjusted p-values can be converted, using the threshold thresh (0.05 by default), into a dummy variable indicating enrichment (option sig). Note: CDF values and z-scores are not available for non-parametric null estimation.
  • parametric: A logical argument specifying whether the null distribution is estimated via a parametric fit or via non-parametric permutation testing. If parametric is TRUE, users need to specify distr and adj. If parametric is FALSE, users need to increase n_perm.
  • distr: The form of the distribution if a parametric fit is desired. Currently only norm and mnorm are supported.
  • adj: Whether the distribution should be adjusted for variance inflation. The adjustment combines the mean estimate from scores computed on the permuted data set with the variance estimate from raw scores (computed on the unpermuted data set).
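The relationship between the output options and the adj argument can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: it uses a single normal null distribution (the norm case only, not the mixture), assumes an upper-tail p-value, and works on simulated scores. The helpers null_params and convert_scores are hypothetical names.

```r
set.seed(42)

# Hypothetical scores: raw scores come from the unpermuted data,
# perm scores from permuted data.
raw_scores  <- rnorm(100, mean = 0.5, sd = 3)
perm_scores <- rnorm(1000, mean = 0, sd = 1)

# Null parameters. The mean always comes from the permuted scores; with
# adj = TRUE the spread comes from the raw scores, as described above.
null_params <- function(raw, perm, adj = TRUE) {
  list(mean = mean(perm), sd = if (adj) sd(raw) else sd(perm))
}

# Convert raw scores to one of the output types.
convert_scores <- function(raw, null,
                           output = c("cdf", "zscore", "pval", "sig", "raw"),
                           thresh = 0.05) {
  output <- match.arg(output)
  cdf <- pnorm(raw, mean = null$mean, sd = null$sd)
  switch(output,
    raw    = raw,
    cdf    = cdf,
    zscore = (raw - null$mean) / null$sd,
    pval   = 1 - cdf,                        # assumption: upper-tail test
    sig    = as.integer((1 - cdf) < thresh)  # 1 = enriched, 0 = not
  )
}

null_adj   <- null_params(raw_scores, perm_scores, adj = TRUE)
null_unadj <- null_params(raw_scores, perm_scores, adj = FALSE)

head(convert_scores(raw_scores, null_adj, "cdf"))
sum(convert_scores(raw_scores, null_adj, "sig"))
sum(convert_scores(raw_scores, null_unadj, "sig"))
```

In this simulation the raw scores are more dispersed than the permuted ones, so the adjusted null is wider and flags fewer samples as enriched than the unadjusted null.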

Model output

The output object is of class CBEAout, which is an S3 object. The underlying data structure is a list of lists, where each element of the outer list represents a different aspect of the output. For example, R holds the final scores while diagnostic holds goodness-of-fit statistics.

names(results)
#> [1] "R"              "parameters"     "diagnostic"     "fit_comparison"
#> [5] "call_param"

Within each aspect, there is a list whose length equals the total number of sets evaluated. For example, results$R has length 3, one element per evaluated set.

str(results$R)
#> List of 3
#>  $ Aerobic    : num [1:369] 0.9694 0.0286 0.9953 0.8842 0.5617 ...
#>  $ Anaerobic  : num [1:369] 0.000461 0.58112 0.203075 0.034167 0.088761 ...
#>  $ F Anaerobic: num [1:369] 0.997 0.89 0.303 0.96 0.863 ...
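Because each element of such a list is a numeric vector with one value per sample, the list can be bound into a sample-by-set matrix with base R alone. The sketch below uses a small mock list in place of results$R; the values and sample identifiers are hypothetical.

```r
# Mock of a per-set score list shaped like results$R (hypothetical values).
R <- list(
  Aerobic   = c(0.97, 0.03, 0.99),
  Anaerobic = c(0.01, 0.58, 0.20)
)

# Bind the per-set vectors column-wise: rows are samples, columns are sets.
score_mat <- do.call(cbind, R)
rownames(score_mat) <- c("s1", "s2", "s3")

# Example summary: mean score per set across samples.
set_means <- colMeans(score_mat)

score_mat
set_means
```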

Users can apply tidy and glance, following the conventions of the broom package, to convert a CBEAout object into tidy tibbles. The tidy function returns a tibble of scores (samples by sets). The glance function returns diagnostics and has two options: fit_comparison compares the L-moments of the data, the permuted data, and the final fitted distribution; fit_diagnostic shows goodness-of-fit statistics of the distribution fitting procedure itself, namely log-likelihoods and Anderson-Darling statistics (column “ad”).

tidy(results)
#> # A tibble: 369 × 4
#>    sample_ids Aerobic Anaerobic F_Anaerobic
#>    <chr>        <dbl>     <dbl>       <dbl>
#>  1 700014427   0.969   0.000461      0.997 
#>  2 700014521   0.0286  0.581         0.890 
#>  3 700014603   0.995   0.203         0.303 
#>  4 700014749   0.884   0.0342        0.960 
#>  5 700014791   0.562   0.0888        0.863 
#>  6 700014917   0.720   0.134         0.898 
#>  7 700014989   0.0249  0.992         0.0462
#>  8 700015076   0.685   0.0698        0.911 
#>  9 700015149   0.402   0.000344      1.00  
#> 10 700015215   0.0976  0.729         0.605 
#> # … with 359 more rows
glance(results, "fit_comparison")
#> # A tibble: 9 × 7
#>   set_ids     final_param  distr l_location l_scale l_skewness l_kurtosis
#>   <chr>       <named list> <chr>      <dbl>   <dbl>      <dbl>      <dbl>
#> 1 Aerobic     <dbl [6]>    data      6.98     15.2   0.0300        0.114 
#> 2 Aerobic     <dbl [6]>    perm     -0.0557    2.58 -0.0000123     0.131 
#> 3 Aerobic     <dbl [6]>    fit       0.0768   13.0  -0.00401       0.127 
#> 4 Anaerobic   <dbl [6]>    data     -6.42     20.0   0.0360        0.0917
#> 5 Anaerobic   <dbl [6]>    perm      0.758     2.54  0.0146        0.122 
#> 6 Anaerobic   <dbl [6]>    fit       0.986    15.0   0.0104        0.132 
#> 7 F_Anaerobic <dbl [6]>    data      5.88     16.5   0.0481        0.0952
#> 8 F_Anaerobic <dbl [6]>    perm     -0.465     2.71  0.000482      0.125 
#> 9 F_Anaerobic <dbl [6]>    fit      -0.591    15.1   0.000309      0.123
glance(results, "fit_diagnostic")
#> # A tibble: 6 × 5
#>   set_ids     final_param   loglik    ad type          
#>   <chr>       <named list>   <dbl> <dbl> <chr>         
#> 1 Aerobic     <dbl [6]>    -10853. 0.229 permuted_distr
#> 2 Aerobic     <dbl [6]>     -1735. 0.141 unperm_distr  
#> 3 Anaerobic   <dbl [6]>    -10791. 0.327 permuted_distr
#> 4 Anaerobic   <dbl [6]>     -1832. 0.185 unperm_distr  
#> 5 F_Anaerobic <dbl [6]>    -11036. 0.321 permuted_distr
#> 6 F_Anaerobic <dbl [6]>     -1762. 0.277 unperm_distr

Parallel computing

CBEA has a built-in capacity to parallelize calculations across sets. The engine for parallelization is BiocParallel. If the parallel_backend argument is NULL, the SerialParam backend is used.

BiocParallel::registered()
#> $SerialParam
#> class: SerialParam
#>   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#>   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#>   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
#>   bpexportglobals: TRUE; bpforceGC: FALSE
#>   bplogdir: NA
#>   bpresultdir: NA
cbea(abun, set = metab_sets, abund_values = "16SrRNA",
     output = "cdf", distr = "mnorm", adj = TRUE, thresh = 0.05, n_perm = 10, 
     parallel_backend = BiocParallel::MulticoreParam(workers = 2))
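To illustrate what parallelizing across sets means, the sketch below applies a per-set function through BiocParallel::bplapply with an explicit backend. It is a generic illustration and not CBEA's internal code: the workload function and the set sizes are hypothetical, and the code falls back to base lapply when BiocParallel is not installed.

```r
# Hypothetical per-set workload: one list element per set.
set_sizes <- list(Aerobic = 10, Anaerobic = 20, `F Anaerobic` = 30)
work <- function(n) sum(seq_len(n))

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # SerialParam runs sequentially; swapping in another backend such as
  # BiocParallel::MulticoreParam(workers = 2) distributes the sets over
  # workers on platforms that support forking.
  backend <- BiocParallel::SerialParam()
  out <- BiocParallel::bplapply(set_sizes, work, BPPARAM = backend)
} else {
  # Fallback when BiocParallel is unavailable.
  out <- lapply(set_sizes, work)
}

unlist(out)
```

Since each set is processed independently, the backend changes only how the work is scheduled, not the result.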

Citing CBEA

We hope that CBEA will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("CBEA")
#> 
#> Nguyen Q (2022). _CBEA: R package for performing CBEA approach_. doi:
#> 10.18129/B9.bioc.CBEA (URL: https://doi.org/10.18129/B9.bioc.CBEA),
#> https://github.com/qpmnguyen/CBEA - R package version 0.99.3, <URL:
#> http://www.bioconductor.org/packages/CBEA>.
#> 
#> Nguyen Q (2022). "CBEA: Competitive balances for taxonomic enrichment
#> analysis." _bioRxiv_. doi: 10.1101/TODO (URL:
#> https://doi.org/10.1101/TODO), <URL:
#> https://www.biorxiv.org/content/10.1101/TODO>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

Reproducibility

The CBEA package (Nguyen, 2022) was made possible thanks to:

  • R (R Core Team, 2021)
  • BiocStyle (Oleś, 2021)
  • knitr (Xie, 2021)
  • RefManageR (McLean, 2017)
  • rmarkdown (Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2022)
  • broom (Robinson, Hayes, and Couch, 2022)
  • sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2021)
  • testthat (Wickham, 2011)
  • mixtools (Benaglia, Chauveau, Hunter, and Young, 2009)
  • fitdistrplus (Delignette-Muller and Dutang, 2015)
  • tidyverse (Wickham, Averick, Bryan, Chang, McGowan, François, Grolemund, Hayes, Henry, Hester, Kuhn, Pedersen, Miller, Bache, Müller, Ooms, Robinson, Seidel, Spinu, Takahashi, Vaughan, Wilke, Woo, and Yutani, 2019)
  • BiocSet
  • phyloseq (McMurdie and Holmes, 2013)
  • BiocParallel (Morgan, Wang, Obenchain, Lang, Thompson, and Turaga, 2021)

This package was developed using biocthis.

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("basic_usage.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("basic_usage.Rmd", tangle = TRUE)

Date the vignette was generated.

#> [1] "2022-03-08 22:56:26 UTC"

Wallclock time spent generating the vignette.

#> Time difference of 19.547 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       UTC
#>  date     2022-03-08
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package                  * version  date (UTC) lib source
#>  AnnotationDbi              1.56.2   2021-11-09 [1] Bioconductor
#>  ape                        5.6-2    2022-03-02 [1] RSPM
#>  assertthat                 0.2.1    2019-03-21 [1] RSPM
#>  backports                  1.4.1    2021-12-13 [1] RSPM
#>  Biobase                    2.54.0   2021-10-26 [1] Bioconductor
#>  BiocGenerics               0.40.0   2021-10-26 [1] Bioconductor
#>  BiocIO                     1.4.0    2021-10-26 [1] Bioconductor
#>  BiocManager                1.30.16  2021-06-15 [1] RSPM
#>  BiocParallel               1.28.3   2021-12-09 [1] Bioconductor
#>  BiocSet                  * 1.8.1    2021-11-03 [1] Bioconductor
#>  BiocStyle                * 2.22.0   2021-10-26 [1] Bioconductor
#>  Biostrings                 2.62.0   2021-10-26 [1] Bioconductor
#>  bit                        4.0.4    2020-08-04 [1] RSPM
#>  bit64                      4.0.5    2020-08-30 [1] RSPM
#>  bitops                     1.0-7    2021-04-24 [1] RSPM
#>  blob                       1.2.2    2021-07-23 [1] RSPM
#>  bookdown                   0.24     2021-09-02 [1] RSPM
#>  broom                      0.7.12   2022-01-28 [1] RSPM
#>  bslib                      0.3.1    2021-10-06 [1] RSPM
#>  cachem                     1.0.6    2021-08-19 [1] RSPM
#>  CBEA                     * 0.99.3   2022-03-08 [1] local
#>  cellranger                 1.1.0    2016-07-27 [1] RSPM
#>  cli                        3.2.0    2022-02-14 [1] RSPM
#>  colorspace                 2.0-3    2022-02-21 [1] RSPM
#>  crayon                     1.5.0    2022-02-14 [1] RSPM
#>  DBI                        1.1.2    2021-12-20 [1] RSPM
#>  dbplyr                     2.1.1    2021-04-06 [1] RSPM
#>  DelayedArray               0.20.0   2021-10-26 [1] Bioconductor
#>  desc                       1.4.1    2022-03-06 [1] RSPM
#>  digest                     0.6.29   2021-12-01 [1] RSPM
#>  dplyr                    * 1.0.8    2022-02-08 [1] RSPM
#>  ellipsis                   0.3.2    2021-04-29 [1] RSPM
#>  evaluate                   0.15     2022-02-18 [1] RSPM
#>  fansi                      1.0.2    2022-01-14 [1] RSPM
#>  fastmap                    1.1.0    2021-01-25 [1] RSPM
#>  fitdistrplus               1.1-6    2021-09-28 [1] RSPM
#>  forcats                  * 0.5.1    2021-01-27 [1] RSPM
#>  fs                         1.5.2    2021-12-08 [1] RSPM
#>  generics                   0.1.2    2022-01-31 [1] RSPM
#>  GenomeInfoDb               1.30.1   2022-01-30 [1] Bioconductor
#>  GenomeInfoDbData           1.2.7    2022-03-03 [1] Bioconductor
#>  GenomicRanges              1.46.1   2021-11-18 [1] Bioconductor
#>  ggplot2                  * 3.3.5    2021-06-25 [1] RSPM
#>  glue                       1.6.2    2022-02-24 [1] RSPM
#>  goftest                    1.2-3    2021-10-07 [1] RSPM
#>  gtable                     0.3.0    2019-03-25 [1] RSPM
#>  haven                      2.4.3    2021-08-04 [1] RSPM
#>  hms                        1.1.1    2021-09-26 [1] RSPM
#>  htmltools                  0.5.2    2021-08-25 [1] RSPM
#>  httr                       1.4.2    2020-07-20 [1] RSPM
#>  IRanges                    2.28.0   2021-10-26 [1] Bioconductor
#>  jquerylib                  0.1.4    2021-04-26 [1] RSPM
#>  jsonlite                   1.8.0    2022-02-22 [1] RSPM
#>  KEGGREST                   1.34.0   2021-10-26 [1] Bioconductor
#>  kernlab                    0.9-29   2019-11-12 [1] RSPM
#>  knitr                      1.37     2021-12-16 [1] RSPM
#>  lattice                    0.20-45  2021-09-22 [2] CRAN (R 4.1.2)
#>  lazyeval                   0.2.2    2019-03-15 [1] RSPM
#>  lifecycle                  1.0.1    2021-09-24 [1] RSPM
#>  lmom                       2.8      2019-03-12 [1] RSPM
#>  lubridate                  1.8.0    2021-10-07 [1] RSPM
#>  magrittr                   2.0.2    2022-01-26 [1] RSPM
#>  MASS                       7.3-55   2022-01-13 [1] RSPM
#>  Matrix                     1.4-0    2021-12-08 [1] RSPM
#>  MatrixGenerics             1.6.0    2021-10-26 [1] Bioconductor
#>  matrixStats                0.61.0   2021-09-17 [1] RSPM
#>  memoise                    2.0.1    2021-11-26 [1] RSPM
#>  mixtools                   1.2.0    2020-02-07 [1] RSPM
#>  modelr                     0.1.8    2020-05-19 [1] RSPM
#>  munsell                    0.5.0    2018-06-12 [1] RSPM
#>  nlme                       3.1-155  2022-01-13 [1] RSPM
#>  ontologyIndex              2.7      2021-02-03 [1] RSPM
#>  pillar                     1.7.0    2022-02-01 [1] RSPM
#>  pkgconfig                  2.0.3    2019-09-22 [1] RSPM
#>  pkgdown                    2.0.2    2022-01-13 [1] RSPM
#>  plyr                       1.8.6    2020-03-03 [1] RSPM
#>  png                        0.1-7    2013-12-03 [1] RSPM
#>  purrr                    * 0.3.4    2020-04-17 [1] RSPM
#>  R6                         2.5.1    2021-08-19 [1] RSPM
#>  ragg                       1.2.2    2022-02-21 [1] RSPM
#>  Rcpp                       1.0.8    2022-01-13 [1] RSPM
#>  RCurl                      1.98-1.6 2022-02-08 [1] RSPM
#>  readr                    * 2.1.2    2022-01-30 [1] RSPM
#>  readxl                     1.3.1    2019-03-13 [1] RSPM
#>  RefManageR               * 1.3.0    2020-11-13 [1] RSPM
#>  reprex                     2.0.1    2021-08-05 [1] RSPM
#>  rlang                      1.0.2    2022-03-04 [1] RSPM
#>  rmarkdown                  2.12     2022-03-02 [1] RSPM
#>  rprojroot                  2.0.2    2020-11-15 [1] RSPM
#>  RSQLite                    2.2.10   2022-02-17 [1] RSPM
#>  rstudioapi                 0.13     2020-11-12 [1] RSPM
#>  rvest                      1.0.2    2021-10-16 [1] RSPM
#>  S4Vectors                  0.32.3   2021-11-21 [1] Bioconductor
#>  sass                       0.4.0    2021-05-12 [1] RSPM
#>  scales                     1.1.1    2020-05-11 [1] RSPM
#>  segmented                  1.4-0    2022-01-28 [1] RSPM
#>  sessioninfo              * 1.2.2    2021-12-06 [1] RSPM
#>  SingleCellExperiment       1.16.0   2021-10-26 [1] Bioconductor
#>  stringi                    1.7.6    2021-11-29 [1] RSPM
#>  stringr                  * 1.4.0    2019-02-10 [1] RSPM
#>  SummarizedExperiment       1.24.0   2021-10-26 [1] Bioconductor
#>  survival                   3.3-1    2022-03-03 [1] CRAN (R 4.1.2)
#>  systemfonts                1.0.4    2022-02-11 [1] RSPM
#>  textshaping                0.3.6    2021-10-13 [1] RSPM
#>  tibble                   * 3.1.6    2021-11-07 [1] RSPM
#>  tidyr                    * 1.2.0    2022-02-01 [1] RSPM
#>  tidyselect                 1.1.2    2022-02-21 [1] RSPM
#>  tidytree                   0.3.9    2022-03-04 [1] RSPM
#>  tidyverse                * 1.3.1    2021-04-15 [1] RSPM
#>  treeio                     1.18.1   2021-11-14 [1] Bioconductor
#>  TreeSummarizedExperiment   2.2.0    2021-10-26 [1] Bioconductor
#>  tzdb                       0.2.0    2021-10-27 [1] RSPM
#>  utf8                       1.2.2    2021-07-24 [1] RSPM
#>  vctrs                      0.3.8    2021-04-29 [1] RSPM
#>  withr                      2.5.0    2022-03-03 [1] RSPM
#>  xfun                       0.30     2022-03-02 [1] RSPM
#>  xml2                       1.3.3    2021-11-30 [1] RSPM
#>  XVector                    0.34.0   2021-10-26 [1] Bioconductor
#>  yaml                       2.3.5    2022-02-21 [1] RSPM
#>  yulab.utils                0.0.4    2021-10-09 [1] RSPM
#>  zlibbioc                   1.40.0   2021-10-26 [1] Bioconductor
#> 
#>  [1] /home/runner/work/_temp/Library
#>  [2] /opt/R/4.1.2/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2021) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2022) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.12. 2022. URL: https://github.com/rstudio/rmarkdown.

[2] T. Benaglia, D. Chauveau, D. R. Hunter, et al. “mixtools: An R Package for Analyzing Finite Mixture Models”. In: Journal of Statistical Software 32.6 (2009), pp. 1–29. URL: http://www.jstatsoft.org/v32/i06/.

[3] M. L. Delignette-Muller and C. Dutang. “fitdistrplus: An R Package for Fitting Distributions”. In: Journal of Statistical Software 64.4 (2015), pp. 1–34. URL: https://www.jstatsoft.org/article/view/v064i04.

[4] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[5] P. J. McMurdie and S. Holmes. “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data”. In: PLoS ONE 8.4 (2013), p. e61217. URL: http://dx.plos.org/10.1371/journal.pone.0061217.

[6] M. Morgan, J. Wang, V. Obenchain, et al. BiocParallel: Bioconductor facilities for parallel evaluation. R package version 1.28.3. 2021. URL: https://github.com/Bioconductor/BiocParallel.

[7] Q. Nguyen. CBEA: R package for performing CBEA approach. https://github.com/qpmnguyen/CBEA - R package version 0.99.3. 2022. DOI: 10.18129/B9.bioc.CBEA. URL: http://www.bioconductor.org/packages/CBEA.

[8] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.22.0. 2021. URL: https://github.com/Bioconductor/BiocStyle.

[9] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. URL: https://www.R-project.org/.

[10] D. Robinson, A. Hayes, and S. Couch. broom: Convert Statistical Objects into Tidy Tibbles. https://broom.tidymodels.org/, https://github.com/tidymodels/broom. 2022.

[11] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[12] H. Wickham, M. Averick, J. Bryan, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[13] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. https://github.com/r-lib/sessioninfo#readme, https://r-lib.github.io/sessioninfo/. 2021.

[14] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.37. 2021. URL: https://yihui.org/knitr/.