2. Parallelize Computation of Indices
Source: vignettes/fundiversity_1-parallel.Rmd
Note: This vignette presents performance tests run between the
non-parallel and parallel versions of fundiversity functions. To avoid
depending on other packages, this vignette is pre-computed.
Within fundiversity, the computation of most indices can be
parallelized using the future package. The goal of this vignette is to
explain how to toggle and use parallelization in fundiversity. The
functions that currently support parallelization are summarized in the
table below:
Function Name | Index Name | Parallelizable | Memoizable |
---|---|---|---|
fd_fric() | FRic | ✅ | ✅ |
fd_fric_intersect() | FRic_intersect | ✅ | ✅ |
fd_fdiv() | FDiv | ✅ | ✅ |
fd_feve() | FEve | ✅ | ❌ |
fd_fdis() | FDis | ✅ | ❌ |
fd_raoq() | Rao’s Q | ❌ | ❌ |
Note that memoization and parallelization cannot be used at the same
time. If the option fundiversity.memoise has been set to TRUE but the
computations are parallelized, fundiversity will use the unmemoised
versions of the functions.
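As a small illustration of the option mentioned above, the following sketch sets and reads fundiversity.memoise using base R only. It assumes the option is consulted when fundiversity is loaded, so it would be set before any library() call; please check the package documentation for the exact timing.

```r
# Request memoised versions of the functions (option name taken from the text above).
# Assumption: the option should be set before fundiversity is loaded.
old_opts <- options(fundiversity.memoise = TRUE)

# Read the current value back
memoise_requested <- isTRUE(getOption("fundiversity.memoise"))
memoise_requested

# Restore the previous value once done
options(old_opts)
```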
The future package provides a simple and general framework for
asynchronous computation that adapts to the resources available to the
user. The first vignette of future gives a general overview of all its
features. The main idea is that the user writes the code once and it
then runs seamlessly either sequentially, in parallel on a single
computer, on a cluster, or distributed over several computers.
fundiversity can thus run on any of these backends, according to the
user’s choice.
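To make the "write once, run on any backend" idea concrete, here is a minimal sketch that uses future directly, independently of fundiversity. The helper square_all() is hypothetical and only serves the illustration; the same function body runs unchanged under a sequential and a multisession plan.

```r
library(future)

# Hypothetical helper: square each value inside its own future
square_all <- function(x) {
  futures <- lapply(x, function(i) future(i^2))
  vapply(futures, value, numeric(1))
}

plan(sequential)                 # run in the current R session
res_sequential <- square_all(1:4)

plan(multisession, workers = 2)  # run in two background R sessions
res_parallel <- square_all(1:4)

plan(sequential)                 # reset to sequential execution

# Both plans should give the same values
all.equal(res_sequential, res_parallel)
```

Only the plan() calls change between the two runs; the code that creates and collects the futures stays the same.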
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")
Running code in parallel
By default, fundiversity code runs sequentially on a single core. To
trigger parallelization, the user needs to set a plan with
future::plan() using a parallel backend such as future::multisession,
which splits the execution across multiple R sessions.
# Sequential execution
fric1 <- fd_fric(traits_birds)
# Parallel execution
future::plan(future::multisession) # Plan definition
fric2 <- fd_fric(traits_birds) # The code resolves in the same way
identical(fric1, fric2)
#> [1] TRUE
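Because a plan stays active for the rest of the session, it can be useful to restore the previous one after the parallel part of a script. The sketch below relies on future::plan() returning the previously active plan when a new one is set; the variable names are illustrative.

```r
# Setting a plan returns the previous one invisibly, so it can be restored later
old_plan <- future::plan(future::multisession, workers = 2)

# ... parallel fundiversity calls would go here ...
n_workers_used <- future::nbrOfWorkers()

# Restore whatever plan was active before
future::plan(old_plan)
```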
With the future::multisession backend, you can specify the number of
cores over which the computation is parallelized through the workers
argument of the future::plan() call:
future::plan(future::multisession, workers = 2) # Only 2 cores are used
fric3 <- fd_fric(traits_birds)
identical(fric3, fric2)
#> [1] TRUE
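Rather than hard-coding the number of workers, it can be derived from the machine. The following sketch uses future::availableCores() and keeps one core free; leaving a core free is a common convention, not a requirement of fundiversity.

```r
# Leave one core free for the rest of the system, but always use at least one worker
n_workers <- max(1L, future::availableCores() - 1L)

future::plan(future::multisession, workers = n_workers)
future::nbrOfWorkers()  # number of workers the current plan will use

future::plan(future::sequential)  # reset to sequential execution
```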
To learn more about the available backends and the arguments they
require, please refer to the documentation of future::plan() and to the
overview vignette of future.
Performance comparison
We can now compare the performance of both versions to measure the gain from parallelization:
future::plan(future::sequential)
non_parallel_bench <- microbenchmark::microbenchmark(
  non_parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)
future::plan(future::multisession)
parallel_bench <- microbenchmark::microbenchmark(
  parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)
rbind(non_parallel_bench, parallel_bench)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> non_parallel 8.9509 9.2691 14.93812 13.32405 18.4841 33.153 20 a
#> parallel 224.7037 248.9997 345.59427 274.59615 304.6889 1660.164 20 b
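The non-parallelized code runs faster than the parallelized one! This
is because fundiversity parallelizes the computation across sites. The
call above passes no site-species matrix, so all species are treated as
a single site: there is nothing to split across workers, and the
overhead of communicating with the worker sessions dominates.
Parallelization should therefore be used when you have many sites on
which you want to compute the same indices.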
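The idea of splitting work across sites can be sketched with future.apply, which appears among the loaded packages in the session info below. This is an illustration under stated assumptions, not fundiversity's actual implementation: the toy matrix and the richness() function are hypothetical stand-ins for a real site-species matrix and a real diversity index.

```r
# Toy site-species matrix: 8 sites (rows) x 5 species (columns), presence/absence
set.seed(42)
toy_site_sp <- matrix(
  rbinom(40, size = 1, prob = 0.5), nrow = 8,
  dimnames = list(paste0("s", 1:8), paste0("sp", 1:5))
)

# Stand-in for a per-site index: number of species present
richness <- function(site_row) sum(site_row > 0)

future::plan(future::multisession, workers = 2)
# Each site (row) is an independent task, sent to the workers in chunks
richness_parallel <- future.apply::future_apply(toy_site_sp, 1, richness)
future::plan(future::sequential)  # reset to sequential execution

# Same computation with base R, for comparison
richness_sequential <- apply(toy_site_sp, 1, richness)
all.equal(richness_parallel, richness_sequential)
```

With a single row there would be only one task, which is why the benchmark on one site gains nothing from extra workers.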
# Function to make a bigger site-sp dataset
make_more_sites <- function(n) {
  site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
  rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))
  site_sp
}
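To see what this helper produces, here is the same row-stacking idea applied to a toy matrix. stack_sites() is a hypothetical variant that takes the matrix as an argument instead of relying on site_sp_birds, so the sketch runs with base R only.

```r
# Hypothetical variant of make_more_sites() that works on any site-species matrix
stack_sites <- function(site_sp, n) {
  stacked <- do.call(rbind, replicate(n, site_sp, simplify = FALSE))
  # Give every stacked row a unique site name
  rownames(stacked) <- paste0("s", seq_len(nrow(stacked)))
  stacked
}

toy <- matrix(c(1, 0, 1, 1), nrow = 2,
              dimnames = list(c("a", "b"), c("sp1", "sp2")))
bigger_toy <- stack_sites(toy, 3)
dim(bigger_toy)       # 6 rows (2 sites x 3 copies), 2 columns
rownames(bigger_toy)  # "s1" to "s6"
```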
For example, with a dataset 5000 times bigger:
bigger_site <- make_more_sites(5000)
microbenchmark::microbenchmark(
  seq = {
    future::plan(future::sequential)
    fd_fric(traits_birds, bigger_site)
  },
  multisession = {
    future::plan(future::multisession, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  multicore = {
    future::plan(future::multicore, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  times = 20
)
#> Unit: seconds
#> expr min lq mean median uq max neval cld
#> seq 78.13766 195.17853 184.92560 196.89360 197.90500 200.56116 20 a
#> multisession 34.23402 54.44036 53.39172 54.88206 55.19359 61.83829 20 b
#> multicore 75.43857 192.45136 183.07222 196.48277 201.16889 209.39847 20 a
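In the benchmark above, multisession with 4 workers is about 3.6 times faster than sequential execution at the median (54.9 s against 196.9 s), while the multicore timings stay close to the sequential ones. The session info below reports a Windows machine running RStudio; forked processing is not available on Windows, in which case future falls back to sequential execution for multicore. A minimal sketch for picking a backend that is actually parallel on the current machine, using future::supportsMulticore(), could look like this (the choice of 2 workers is arbitrary):

```r
# Use forked processes where supported, background R sessions otherwise
if (future::supportsMulticore()) {
  future::plan(future::multicore, workers = 2)
} else {
  future::plan(future::multisession, workers = 2)
}
n_workers_used <- future::nbrOfWorkers()

future::plan(future::sequential)  # reset to sequential execution
```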
Session info of the machine on which the benchmark was run, and the time it took to run
#> seconds needed to generate this document: 8443.78 sec elapsed
#> ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16 ucrt)
#> os Windows 11 x64 (build 22631)
#> system x86_64, mingw32
#> ui RStudio
#> language (EN)
#> collate French_France.utf8
#> ctype fr_FR.UTF-8
#> tz Europe/Paris
#> date 2024-03-26
#> rstudio 2023.12.1+402 Ocean Storm (desktop)
#> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ! package * version date (UTC) lib source
#> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1)
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2)
#> cluster 2.1.6 2023-12-01 [1] CRAN (R 4.3.2)
#> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1)
#> commonmark 1.9.1 2024-01-30 [1] CRAN (R 4.3.2)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1)
#> curl 5.2.1 2024-03-01 [1] CRAN (R 4.3.3)
#> desc 1.4.3 2023-12-10 [1] CRAN (R 4.3.2)
#> devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.2)
#> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
#> P fundiversity * 1.1.1 2024-01-03 [?] Github (funecology/fundiversity@d11d749)
#> future 1.33.1 2023-12-22 [1] CRAN (R 4.3.2)
#> future.apply 1.11.1 2023-12-21 [1] CRAN (R 4.3.2)
#> geometry 0.4.7 2023-02-03 [1] CRAN (R 4.3.1)
#> gh 1.4.0 2023-02-22 [1] CRAN (R 4.3.1)
#> gitcreds 0.1.2 2022-09-08 [1] CRAN (R 4.3.1)
#> globals 0.16.3 2024-03-08 [1] CRAN (R 4.3.3)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
#> htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.2)
#> httpuv 1.6.14 2024-01-26 [1] CRAN (R 4.3.2)
#> httr2 1.0.0 2023-11-14 [1] CRAN (R 4.3.2)
#> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
#> later 1.3.2 2023-12-06 [1] CRAN (R 4.3.2)
#> lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.2)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2)
#> listenv 0.9.1 2024-01-29 [1] CRAN (R 4.3.2)
#> magic 1.6-1 2022-11-16 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)
#> MASS 7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.2)
#> Matrix 1.6-5 2024-01-11 [1] CRAN (R 4.3.1)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)
#> mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.3.2)
#> microbenchmark 1.4.10 2023-04-28 [1] CRAN (R 4.3.3)
#> mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
#> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.2)
#> multcomp 1.4-25 2023-06-20 [1] CRAN (R 4.3.1)
#> mvtnorm 1.2-4 2023-11-27 [1] CRAN (R 4.3.2)
#> nlme 3.1-164 2023-11-27 [1] CRAN (R 4.3.2)
#> parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.3.3)
#> permute 0.9-7 2022-01-27 [1] CRAN (R 4.3.1)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)
#> pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.3.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)
#> pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.2)
#> profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.2)
#> promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1)
#> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1)
#> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2)
#> remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.3)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2)
#> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.3.3)
#> roxygen2 7.3.1 2024-01-22 [1] CRAN (R 4.3.2)
#> rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.2)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
#> sandwich 3.1-0 2023-12-11 [1] CRAN (R 4.3.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
#> shiny 1.8.0 2023-11-17 [1] CRAN (R 4.3.2)
#> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2)
#> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
#> survival 3.5-8 2024-02-14 [1] CRAN (R 4.3.3)
#> TH.data 1.1-2 2023-04-17 [1] CRAN (R 4.3.1)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)
#> tictoc 1.2.1 2024-03-18 [1] CRAN (R 4.3.3)
#> urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.2)
#> usethis 2.2.3 2024-02-19 [1] CRAN (R 4.3.3)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
#> vegan 2.6-4 2022-10-11 [1] CRAN (R 4.3.1)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2)
#> xfun 0.42 2024-02-08 [1] CRAN (R 4.3.3)
#> xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.2)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)
#> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2)
#> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.1)
#>
#> [1] C:/Users/greniem/AppData/Local/R/win-library/4.3
#> [2] C:/Program Files/R/R-4.3.1/library
#>
#> P ── Loaded and on-disk path mismatch.
#>
#> ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────