5. Design Principles for fundiversity
Source: vignettes/fundiversity_4-design-principles.Rmd
This package is built on a clear set of design principles based on current best practices. These principles were established before any code was written, so that obstacles encountered along the way would not lower our expectations.
Scope
fundiversity aims to provide a reliable tool to compute functional diversity indices. Because some of the most widely used functional diversity indices are defined in Villéger et al. (2008), fundiversity adopts this framework. Rao’s Quadratic Entropy is also a popular functional diversity index, which makes it a good addition to the set of indices computable with fundiversity.
Dependencies
We aim to have as few dependencies as possible, accepting one only if it remains relatively lightweight and provides a large speed boost.
As CRAN packages are now automatically archived at the same time as any of their strong dependencies, dependencies for fundiversity should be well established, have a good track record of remaining on CRAN, and ideally already have a large number of reverse dependencies.
Based on these guidelines, some acceptable dependencies are:
Additionally, special care is taken with packages that rely on external libraries, as their installation might be an issue on shared computing platforms where users don’t have super-user privileges.
fundiversity does, however, depend on vegan, which brings in quite a number of other dependencies. vegan is still heavily developed and as such should not be archived by CRAN without notice. fundiversity users probably already have vegan installed, as they are likely also interested in the other community ecology analyses it provides. Thus, depending on vegan is neither a dependency burden nor an installation burden.
Functions
Each index should be computed in its own separate function
Putting each index in its own function improves maintainability: functions are shorter, easier to read, and contain less control flow.
Additionally, it speeds up computation (compared with computing all indices on every call) and keeps the number of columns in the output constant (as opposed to a single function in which an argument controls which indices are returned).
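As a minimal sketch of this principle, the two functions below each compute a single made-up index and return a fixed set of columns. The names `toy_trait_range()` and `toy_mean_dist()`, the indices themselves, and the data are hypothetical illustrations, not fundiversity functions.

```r
# Hypothetical toy data: 3 species x 2 quantitative traits
traits <- matrix(
  c(1, 2, 4,      # trait t1
    10, 12, 20),  # trait t2
  ncol = 2,
  dimnames = list(c("sp1", "sp2", "sp3"), c("t1", "t2"))
)

# Hypothetical site-species matrix: 2 sites x 3 species (presence/absence)
sp_com <- matrix(
  c(1, 1, 0,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("siteA", "siteB"), c("sp1", "sp2", "sp3"))
)

# One function per index: sum of trait ranges among the species present
toy_trait_range <- function(traits, sp_com) {
  value <- vapply(rownames(sp_com), function(site) {
    present <- colnames(sp_com)[sp_com[site, ] > 0]
    sum(apply(traits[present, , drop = FALSE], 2, function(x) diff(range(x))))
  }, numeric(1))
  data.frame(site = rownames(sp_com), toy_trait_range = unname(value))
}

# One function per index: mean pairwise Euclidean distance among species present
toy_mean_dist <- function(traits, sp_com) {
  value <- vapply(rownames(sp_com), function(site) {
    present <- colnames(sp_com)[sp_com[site, ] > 0]
    mean(as.vector(dist(traits[present, , drop = FALSE])))
  }, numeric(1))
  data.frame(site = rownames(sp_com), toy_mean_dist = unname(value))
}

range_out <- toy_trait_range(traits, sp_com)
dist_out  <- toy_mean_dist(traits, sp_com)
```

Each call computes only the index that was asked for, and each output always has the same two columns.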
Input data should not be transformed without any explicit action from the user
Some packages in functional ecology transform the data before processing it. One such common transformation is the use of dimensionality reduction techniques.
Such transformations should only be done by the user if they wish to do so, never by the package itself. Even when documented in the function help, an automatic transformation is easily overlooked by users and may lead to misinterpreted results.
One acceptable exception is when a function needs a dissimilarity matrix as input, as for the computation of FEve or Rao’s Quadratic Entropy. Because other functions in fundiversity accept raw trait data, for the sake of coherence the functions that need a dissimilarity matrix should also accept raw trait data and compute the dissimilarity internally. To do so, they should make only a minimal assumption about the input dataset: that all traits are quantitative. If that is not the case, it is the user’s responsibility to choose the appropriate dissimilarity metric.
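The sketch below illustrates one way such an exception could be handled. The helper `as_dissimilarity()` is a hypothetical name, not part of fundiversity’s API. It also assumes that a precomputed dissimilarity is passed as a `dist` object, and that Euclidean distance is the default for raw quantitative traits.

```r
# Hypothetical helper (not fundiversity's API): return a dissimilarity matrix
# from either raw quantitative traits or a precomputed "dist" object.
as_dissimilarity <- function(x) {
  if (inherits(x, "dist")) {
    return(as.matrix(x))  # user-supplied dissimilarity, used as-is
  }
  x <- as.matrix(x)       # a character or factor column yields a character matrix
  if (!is.numeric(x)) {
    stop("Non-quantitative traits: please supply a dissimilarity matrix.")
  }
  as.matrix(dist(x))      # Euclidean distance on the traits, without rescaling
}

# Invented trait data for illustration
traits <- data.frame(
  t1 = c(1, 2, 4),
  t2 = c(10, 12, 20),
  row.names = c("sp1", "sp2", "sp3")
)

d_default <- as_dissimilarity(traits)                              # computed internally
d_custom  <- as_dissimilarity(dist(traits, method = "manhattan"))  # chosen by the user
```

The only transformation applied to raw traits is the distance computation itself. Any other choice (another metric, rescaling, handling of categorical traits) stays with the user.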
Inputs
Two inputs are acceptable in fundiversity functions:
- tabular data (such as a matrix or a tibble, but with a preference for data.frames) with individuals as rows and characteristics as columns.
- a distance matrix between individuals.
The rationale is that providing data.frames with characteristics is more user-friendly, as they are a common format in functional ecology, and data.frames are a familiar object in R.
However, only allowing data.frames doesn’t provide enough flexibility. In particular, advanced users may want to compute distances with a custom function (instead of Euclidean distances).
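As an illustration of the second kind of input, here is a sketch of how a user might build a custom distance matrix in base R. The trait values and the weights are invented. The weighted Euclidean distance is only one example of a custom choice, not a recommendation.

```r
# Invented trait data: individuals as rows, characteristics as columns
traits <- data.frame(
  t1 = c(1, 2, 4),
  t2 = c(10, 12, 20),
  row.names = c("sp1", "sp2", "sp3")
)

# Hypothetical trait weights chosen by the user
weights <- c(t1 = 2, t2 = 1)

# Weighted Euclidean distance: sqrt(sum_k w_k * (x_ik - x_jk)^2).
# Multiplying each column by sqrt(w_k) and taking the plain Euclidean
# distance gives the same quantity.
weighted_traits <- sweep(as.matrix(traits), 2, sqrt(weights), `*`)
custom_dist <- dist(weighted_traits)

# Full symmetric matrix form, with individuals as row and column names
custom_mat <- as.matrix(custom_dist)
```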
Outputs
Functions should output data.frames
The functions should return data.frames whenever possible for several reasons:
- data.frames are one of the most common objects in the R ecosystem, meaning they are familiar to beginners and many S3 methods exist for them.
- data.frames enable a pipe-able workflow, which is important because it is already prevalent in the tidyverse and is available in base R from version 4.1.
The outputs of functions should be similar in structure
To avoid further data wrangling after running analyses, all functions should output similarly structured data. When computing functional diversity indices, most users want site-level metrics. To make our functions easy to use and flexible, they should output only the strictly necessary data:
- a column named site containing the site names as given by the user (guessed from the row names in the input site-species matrix),
- a column named after the computed functional index (FDiv, FRic, etc.) that contains the values of the index.
That way, our functions output only unambiguous data that can easily be reused and merged with other site-level data.
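A short sketch of why this structure is convenient: outputs that share a site column can be combined with base R `merge()`. The data.frames below are hand-written stand-ins for function outputs. All values, including the elevation covariate, are made up for illustration.

```r
# Hand-written stand-ins for the outputs of two index functions (made-up values)
fric <- data.frame(site = c("siteA", "siteB"), FRic = c(0.5, 1.2))
fdiv <- data.frame(site = c("siteB", "siteA"), FDiv = c(0.8, 0.7))

# Hypothetical site-level covariate collected separately
env <- data.frame(site = c("siteA", "siteB"), elevation = c(120, 860))

# Because every table has a "site" column, they merge without further wrangling,
# regardless of the row order in each table
all_sites <- Reduce(
  function(x, y) merge(x, y, by = "site"),
  list(fric, fdiv, env)
)
```

The result has one row per site and one column per index or covariate.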