R has obscenely long function names

A crudely-drawn long dog with the R function name aspell_write_personal_dictionary_file written along the length of its body.

tl;dr

Use ls() on a package name in the form "package:base" to see all the objects it contains. I’ve done this to find the longest (and shortest) function names in base R and the {tidyverse} suite.

Naming things

I try to keep to a few rules when creating function names, like using a verb, being descriptive yet succinct, and adding a prefix for easier autocomplete.

It can be tricky to be succinct. Consider the base R function suppressPackageStartupMessages()[1]: it’s a whopping 30 characters, but all the words are important. Something shortened, like suppPkgStartMsg(), wouldn’t be so clear.

It made me wonder: what’s the longest function name in R?[2]

But! It seems tricky and time-consuming to find the longest function name across all R packages. CRAN alone has over 18,000 packages at the time of writing.

A much easier (lazier) approach is to focus on some package subsets. I’ll look at base R and the {tidyverse}.

The long and the short of it

Base R

Certain R packages are built-in and attached by default on startup.

base_names <- sessionInfo()$basePkgs
base_names
## [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [7] "base"

How can we fetch all the functions from these packages? We can use ls() to list all their objects, supplying the package name in the format "package:base", for example. Note that I said ‘objects’, not ‘functions’, since it will also return names that refer to things like datasets.
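Since ls() returns every object, one rough way to keep only the functions is to look up each name and test it with is.function(). This is a minimal sketch rather than the approach used in the rest of this post; the helper name list_functions is made up for illustration, and it assumes the package is already attached.

```r
# Hypothetical helper: names of the functions in an attached package
list_functions <- function(pkg) {
  env <- as.environment(paste0("package:", pkg))
  objs <- ls(env)
  # Look up each name in the package environment and test whether it's a function
  is_fn <- vapply(objs, function(x) is.function(get(x, envir = env)), logical(1))
  objs[is_fn]
}

length(ls("package:datasets"))      # plenty of objects...
length(list_functions("datasets"))  # ...but none should be functions
head(list_functions("stats"))
```

The rest of the post keeps all objects, so the counts reported there include datasets and other non-function objects.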

For fun, we can use this as an excuse to demo ‘lambda’ syntax and the dog’s balls approach to function-writing, both introduced in R v4.1.[3]

base_pkgs <- paste0("package:", base_names)

base_fns <- lapply(base_pkgs, ls) |>
  setNames(base_names) |> 
  lapply(\(object) as.data.frame(object)) |> 
  (\(x) do.call(rbind, x))()  # the balls ()()

base_fns$package <- gsub("\\.\\d+$", "", row.names(base_fns))  # strip the row index that rbind() appends, e.g. "stats.12"
row.names(base_fns) <- NULL
base_fns$nchar <- nchar(base_fns$object)

base_fns <- base_fns[order(-base_fns$nchar), ]
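In isolation, the two R 4.1 features used in that pipeline look like this. It's a toy sketch with made-up values, and it needs R 4.1 or later to run.

```r
# The long form and the R 4.1 shorthand define equivalent functions
square_long  <- function(x) x^2
square_short <- \(x) x^2
identical(square_long(4), square_short(4))  # TRUE

# To pipe into an anonymous function with |>, wrap the function in
# parentheses and then call it with a second pair: the ()()
total <- c(1, 2, 3) |>
  lapply(\(x) x^2) |>
  (\(x) do.call(sum, x))()

total  # 14
```

The extra pair of parentheses is needed because the right-hand side of the base pipe has to be a function call, not a bare function definition.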

Of the 2424 objects across these packages, a quick histogram shows that the most frequent character length is under 10, with a tail stretching out to over 30.

hist(
  base_fns$nchar,
  main = "Character length of base-object names",
  xlab = "Number of characters",
  las = 1
)

Histogram of character lengths for base object names. It's fairly normal around a bin of 6 to 8 characters, which has a peak frequency of over 400, plus there's a tail stretching out to over 30 characters.

Here’s the top 10 by character length.

base_fns_top <- base_fns[order(-base_fns$nchar), ]
rownames(base_fns_top) <- seq_len(nrow(base_fns_top))
head(base_fns_top, 10)
##                                   object package nchar
## 1  aspell_write_personal_dictionary_file   utils    37
## 2     getDLLRegisteredRoutines.character    base    34
## 3       getDLLRegisteredRoutines.DLLInfo    base    32
## 4        reconcilePropertiesAndPrototype methods    31
## 5         suppressPackageStartupMessages    base    30
## 6          as.data.frame.numeric_version    base    29
## 7           as.character.numeric_version    base    28
## 8            print.DLLRegisteredRoutines    base    27
## 9             as.data.frame.model.matrix    base    26
## 10            conditionMessage.condition    base    26

So there are four objects with names longer than suppressPackageStartupMessages(), though they are rarely used as far as I can tell. The longest is aspell_write_personal_dictionary_file(), which has 37(!) characters. It’s part of the spellcheck functions in {utils}.

It’s interesting to me that it follows some of those rules for function naming that I mentioned earlier. It has a verb, is descriptive and uses a prefix for easier autocomplete; ‘aspell’ refers to the GNU open-source Aspell spellchecker on which it’s based.

I’m intrigued that the function uses snake_case rather than camelCase or dot.case, which seem more prevalent in base functions. You could argue then that the underscores have ‘inflated’ the length by four characters. Similarly, the prefix adds another six characters. So maybe the function could be simplified to writePersonalDictionaryFile(), which is merely 27 characters.
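That arithmetic is easy to check in R itself. This is a quick sketch in base R, where the camelCase conversion is just one way to do it with a Perl-style regular expression.

```r
long_name <- "aspell_write_personal_dictionary_file"

nchar(long_name)  # 37

# Count the underscores and the length of the prefix
n_underscores <- lengths(regmatches(long_name, gregexpr("_", long_name)))
n_underscores    # 4
nchar("aspell")  # 6

# Drop the prefix, then convert the remainder from snake_case to camelCase
no_prefix <- sub("^aspell_", "", long_name)
camel <- gsub("_([a-z])", "\\U\\1", no_prefix, perl = TRUE)
camel         # "writePersonalDictionaryFile"
nchar(camel)  # 27
```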

What about shortest functions? There are many one-character functions in base R.

sort(base_fns[base_fns$nchar == 1, ][["object"]])
##  [1] "-" ":" "!" "?" "(" "[" "{" "@" "*" "/" "&" "^" "+" "<" "=" ">" "|" "~" "$"
## [20] "c" "C" "D" "F" "I" "q" "t" "T"

Some of these will be familiar, like c() to concatenate and t() to transpose. You might wonder why operators and brackets are in here. Remember: everything that happens in R is a function call, so `[`(mtcars, "hp") is the same as mtcars["hp"]. I have to admit that stats::C() and stats::D() were new to me.
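You can check the operators-are-functions idea for yourself by wrapping the names in backticks. A small sketch using the built-in mtcars dataset:

```r
# Subsetting by calling the bracket function by name
same <- identical(`[`(mtcars, "hp"), mtcars["hp"])
same  # TRUE

# Arithmetic operators can be called the same way
`+`(1, 2)         # 3
sapply(1:3, `-`)  # unary minus on each element: -1 -2 -3

# Even parentheses and braces are function objects
is.function(`(`)  # TRUE
is.function(`{`)  # TRUE
```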

{tidyverse}

How about object names from the {tidyverse}?

To start, we need to attach the packages. Running library(tidyverse) only attaches the core packages of the tidyverse, so we need another approach to attach them all.

One method is to get a vector of the package names with the tidyverse_packages() function and pass it to p_load() from {pacman}, which avoids the need for a separate library() call for each one.[4]

First, here are the tidyverse packages.

# install.packages("tidyverse")  # if not installed
suppressPackageStartupMessages(  # in action!
  library(tidyverse)
)
tidy_names <- tidyverse_packages()
tidy_names
##  [1] "broom"         "cli"           "crayon"        "dbplyr"       
##  [5] "dplyr"         "dtplyr"        "forcats"       "googledrive"  
##  [9] "googlesheets4" "ggplot2"       "haven"         "hms"          
## [13] "httr"          "jsonlite"      "lubridate"     "magrittr"     
## [17] "modelr"        "pillar"        "purrr"         "readr"        
## [21] "readxl"        "reprex"        "rlang"         "rstudioapi"   
## [25] "rvest"         "stringr"       "tibble"        "tidyr"        
## [29] "xml2"          "tidyverse"

And now to load them all.

# install.packages("pacman")  # if not installed
library(pacman)
p_load(char = tidy_names)
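If you’d rather not depend on {pacman}, a base R alternative is to lapply() over the names with library(). This is a sketch under assumptions: the vector pkgs is a stand-in (two packages that ship with R but aren’t attached by default) so that the example runs anywhere; in this post you would pass tidy_names instead.

```r
# Stand-in vector of package names; swap in tidy_names for the real thing
pkgs <- c("tools", "parallel")

# library() needs character.only = TRUE when the name is held in a variable
invisible(lapply(pkgs, library, character.only = TRUE))

# Confirm that the packages are now on the search path
attached <- paste0("package:", pkgs) %in% search()
attached
```

Unlike p_load(), library() will throw an error rather than attempt an installation if a package is missing.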

Once again we can ls() over packages in the form "package:dplyr". Now that the {tidyverse} is loaded, we might as well use it to run the same pipeline as we did for the base packages.

tidy_pkgs <- paste0("package:", tidy_names)

tidy_fns <- map(tidy_pkgs, ls) |>
  set_names(tidy_names) |> 
  enframe(name = "package", value = "object") |>
  unnest(object) |> 
  mutate(nchar = nchar(object))

So we’re looking at even more objects this time: the whole tidyverse contains 3018 of them.

The histogram is not too dissimilar to the one for the base packages, though the tail is shorter, the shape is arguably more normal-looking and the peak is perhaps slightly closer to 10. The shifted peak could be down to more liberal use of snake_case.

hist(
  tidy_fns$nchar,
  main = "Character length of {tidyverse} object names",
  xlab = "Number of characters",
  las = 1
)

Histogram of character lengths for tidyverse object names. It's fairly normal around a bin of 8 to 10 characters, which has a peak frequency of over 600, plus there's a tail stretching out to over 30 characters.

Here’s the top 10 by character length.

slice_max(tidy_fns, nchar, n = 10)
## # A tibble: 10 × 3
##    package       object                           nchar
##    <chr>         <chr>                            <int>
##  1 googlesheets4 vec_ptype2.googlesheets4_formula    32
##  2 googlesheets4 vec_cast.googlesheets4_formula      30
##  3 cli           cli_progress_builtin_handlers       29
##  4 rstudioapi    getRStudioPackageDependencies       29
##  5 rstudioapi    launcherPlacementConstraint         27
##  6 cli           ansi_has_hyperlink_support          26
##  7 ggplot2       scale_continuous_identity           25
##  8 ggplot2       scale_linetype_continuous           25
##  9 haven         vec_arith.haven_labelled            24
## 10 rstudioapi    getActiveDocumentContext            24

The longest two are 32 and 30 characters in length and are both from {googlesheets4}. The help pages say they’re ‘internal {vctrs} methods’. The names of these are long because of the construction: the first part tells us the method name, e.g. vec_ptype2, and the second part tells us that they apply to the googlesheets4_formula S3 class.

So maybe these don’t really count because they would be executed as vec_ptype2() and vec_cast()? And they’re inflated because they contain the package name, {googlesheets4}, which is quite a long one (13 characters). That would leave cli::cli_progress_builtin_handlers() and rstudioapi::getRStudioPackageDependencies() as the next longest (29 characters). The latter uses camelCase, which is typical of the {rstudioapi} package, so it isn’t bulked out by underscores.
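To see why S3 method names get so long, here’s a toy sketch of the generic.class construction. The generic describe() and the class my_package_long_class are invented for illustration; they aren’t from {vctrs} or {googlesheets4}.

```r
# Hypothetical generic: dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# The method is named <generic>.<class>, so a long class name means a long name
describe.my_package_long_class <- function(x, ...) {
  "method for my_package_long_class"
}

obj <- structure(list(), class = "my_package_long_class")

# Users call the short generic; R finds the long-named method for them
describe(obj)

nchar("describe.my_package_long_class")  # 30
```

So the long name exists for R’s dispatch system, while the name you actually type is the short generic.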

On the other end of the spectrum, there’s only one function with one character: dplyr::n(). I think it makes sense to avoid single-character functions in non-base packages, because they aren’t terribly descriptive. n() can at least be understood to mean ‘number’.

Instead, here are all the two-character functions from the {tidyverse}. Note that many of these are from {lubridate} and are shorthand expressions that make sense in context, like hm() for hour-minute. You can also see some of {rlang}’s operators creep in here, like bang-bang (!!) and the walrus (:=).[5]

filter(tidy_fns, nchar == 2)
## # A tibble: 16 × 3
##    package   object nchar
##    <chr>     <chr>  <int>
##  1 cli       no         2
##  2 dplyr     do         2
##  3 dplyr     id         2
##  4 lubridate am         2
##  5 lubridate hm         2
##  6 lubridate ms         2
##  7 lubridate my         2
##  8 lubridate pm         2
##  9 lubridate tz         2
## 10 lubridate ym         2
## 11 lubridate yq         2
## 12 magrittr  or         2
## 13 rlang     :=         2
## 14 rlang     !!         2
## 15 rlang     ll         2
## 16 rlang     UQ         2

Both the {dplyr} functions here are no longer intended for use. I’m sad especially for dplyr::do(): the help page says it ‘never really felt like it belong[ed] with the rest of dplyr’ 😢.

In memoriam: do().


Session info
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.1.0 (2021-05-18)
##  os       macOS Big Sur 10.16         
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2021-11-27                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                     
##  assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.1.0)             
##  backports       1.2.1      2020-12-09 [1] CRAN (R 4.1.0)             
##  blogdown        1.4        2021-07-23 [1] CRAN (R 4.1.0)             
##  bookdown        0.23       2021-08-13 [1] CRAN (R 4.1.0)             
##  broom         * 0.7.9      2021-07-27 [1] CRAN (R 4.1.0)             
##  bslib           0.3.1      2021-10-06 [1] CRAN (R 4.1.0)             
##  cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.1.0)             
##  cli           * 3.1.0      2021-10-27 [1] CRAN (R 4.1.0)             
##  colorspace      2.0-2      2021-06-24 [1] CRAN (R 4.1.0)             
##  crayon        * 1.4.2      2021-10-29 [1] CRAN (R 4.1.0)             
##  data.table      1.14.0     2021-02-21 [1] CRAN (R 4.1.0)             
##  DBI             1.1.1      2021-01-15 [1] CRAN (R 4.1.0)             
##  dbplyr        * 2.1.1      2021-04-06 [1] CRAN (R 4.1.0)             
##  digest          0.6.28     2021-09-23 [1] CRAN (R 4.1.0)             
##  dplyr         * 1.0.7      2021-06-18 [1] CRAN (R 4.1.0)             
##  dtplyr        * 1.1.0      2021-02-20 [1] CRAN (R 4.1.0)             
##  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.1.0)             
##  emo             0.0.0.9000 2021-08-26 [1] Github (hadley/emo@3f03b11)
##  evaluate        0.14       2019-05-28 [1] CRAN (R 4.1.0)             
##  fansi           0.5.0      2021-05-25 [1] CRAN (R 4.1.0)             
##  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.1.0)             
##  forcats       * 0.5.1      2021-01-27 [1] CRAN (R 4.1.0)             
##  fs              1.5.0      2020-07-31 [1] CRAN (R 4.1.0)             
##  gargle          1.2.0      2021-07-02 [1] CRAN (R 4.1.0)             
##  generics        0.1.1      2021-10-25 [1] CRAN (R 4.1.0)             
##  ggplot2       * 3.3.5      2021-06-25 [1] CRAN (R 4.1.0)             
##  glue            1.5.0      2021-11-07 [1] CRAN (R 4.1.0)             
##  googledrive   * 2.0.0      2021-07-08 [1] CRAN (R 4.1.0)             
##  googlesheets4 * 1.0.0      2021-07-21 [1] CRAN (R 4.1.0)             
##  gtable          0.3.0      2019-03-25 [1] CRAN (R 4.1.0)             
##  haven         * 2.4.3      2021-08-04 [1] CRAN (R 4.1.0)             
##  highr           0.9        2021-04-16 [1] CRAN (R 4.1.0)             
##  hms           * 1.1.1      2021-09-26 [1] CRAN (R 4.1.0)             
##  htmltools       0.5.2      2021-08-25 [1] CRAN (R 4.1.0)             
##  httr          * 1.4.2      2020-07-20 [1] CRAN (R 4.1.0)             
##  jquerylib       0.1.4      2021-04-26 [1] CRAN (R 4.1.0)             
##  jsonlite      * 1.7.2      2020-12-09 [1] CRAN (R 4.1.0)             
##  knitr           1.36       2021-09-29 [1] CRAN (R 4.1.0)             
##  lifecycle       1.0.1      2021-09-24 [1] CRAN (R 4.1.0)             
##  lubridate     * 1.8.0      2021-10-07 [1] CRAN (R 4.1.0)             
##  magrittr      * 2.0.1      2020-11-17 [1] CRAN (R 4.1.0)             
##  modelr        * 0.1.8      2020-05-19 [1] CRAN (R 4.1.0)             
##  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.1.0)             
##  pacman        * 0.5.1      2019-03-11 [1] CRAN (R 4.1.0)             
##  pillar        * 1.6.4      2021-10-18 [1] CRAN (R 4.1.0)             
##  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.1.0)             
##  purrr         * 0.3.4      2020-04-17 [1] CRAN (R 4.1.0)             
##  R6              2.5.1      2021-08-19 [1] CRAN (R 4.1.0)             
##  Rcpp            1.0.7      2021-07-07 [1] CRAN (R 4.1.0)             
##  readr         * 2.0.2      2021-09-27 [1] CRAN (R 4.1.0)             
##  readxl        * 1.3.1      2019-03-13 [1] CRAN (R 4.1.0)             
##  reprex        * 2.0.1      2021-08-05 [1] CRAN (R 4.1.0)             
##  rlang         * 0.4.12     2021-10-18 [1] CRAN (R 4.1.0)             
##  rmarkdown       2.10       2021-08-06 [1] CRAN (R 4.1.0)             
##  rstudioapi    * 0.13       2020-11-12 [1] CRAN (R 4.1.0)             
##  rvest         * 1.0.1      2021-07-26 [1] CRAN (R 4.1.0)             
##  sass            0.4.0      2021-05-12 [1] CRAN (R 4.1.0)             
##  scales          1.1.1      2020-05-11 [1] CRAN (R 4.1.0)             
##  sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 4.1.0)             
##  stringi         1.7.5      2021-10-04 [1] CRAN (R 4.1.0)             
##  stringr       * 1.4.0      2019-02-10 [1] CRAN (R 4.1.0)             
##  tibble        * 3.1.6      2021-11-07 [1] CRAN (R 4.1.0)             
##  tidyr         * 1.1.3      2021-03-03 [1] CRAN (R 4.1.0)             
##  tidyselect      1.1.1      2021-04-30 [1] CRAN (R 4.1.0)             
##  tidyverse     * 1.3.1      2021-04-15 [1] CRAN (R 4.1.0)             
##  tzdb            0.1.2      2021-07-20 [1] CRAN (R 4.1.0)             
##  utf8            1.2.2      2021-07-24 [1] CRAN (R 4.1.0)             
##  vctrs           0.3.8      2021-04-29 [1] CRAN (R 4.1.0)             
##  withr           2.4.2      2021-04-18 [1] CRAN (R 4.1.0)             
##  xfun            0.26       2021-09-14 [1] CRAN (R 4.1.0)             
##  xml2          * 1.3.2      2020-04-23 [1] CRAN (R 4.1.0)             
##  yaml            2.2.1      2020-02-01 [1] CRAN (R 4.1.0)             
## 
## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

  1. I recently wrote a whole post about package startup messages.

  2. Luke was curious too, so that’s at least two of us. (Luke also noticed that a link to my {linkrot} package was itself rotten, lol.)

  3. My understanding is that a future version of R will allow an underscore as the left-hand-side placeholder, in a similar manner to how the {tidyverse} allows a dot. That will do away with the need for ()(). Also ignore my badly-written base code; I’m trying to re-learn.

  4. In fact, p_load() will attempt installation if the package can’t be found in your library. Arguably, this is poor behaviour; you should always ask the user before installing something on their machine.

  5. Bang-Bang and the Walrus, touring Spring 2022.