Deciphering Life: One Bit at a Time - targets and Safe Functions

TL;DR

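If you are using the {targets} workflow manager for your analyses, and also using {purrr::safely} (or the other {purrr} adverbs) to control possible errors, do not do the simplest version of function replacement to make your function safe. Wrapping the function at the top level hides it from {targets}; call safely inside a function body instead. See the Better Functions section for the final versions.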

targets

First, a bit of an introduction. {targets} is a workflow manager written and managed completely in R (Landau 2021). It is the second workflow manager written by the amazing William Landau. It keeps track of all the interdependencies among function inputs and outputs, files, etc., for you, and only reruns things that need to be rerun. This makes it easier to modularize a script into separate functions without worrying about the order in which things run, or about which pieces need to be rerun after a change, and you no longer have to rerun the entire script.

If you are familiar with {knitr} caching, it’s somewhat like that, but very beefed up, and not dependent on the document at all. It is also, in my opinion, much smarter than {knitr} caches (Yihui Xie 2022).

safely?

Outside of iteration, another subset of functionality in {purrr} is its various adverbs, or functions that modify the effect of another function (Wickham and Henry 2023). One of these is safely. From the help page:

Creates a modified version of .f that always succeeds. It returns a list with components result and error. If the function succeeds, result contains the returned value and error is NULL. If an error occurred, error is an error object and result is either NULL or otherwise.

As you can imagine, this is incredibly useful for handling error conditions without explicitly handling them within your own function. By the way, it is worth looking at the help page for {purrr::safely} just to see all the adverbs that {purrr} provides, as some may be useful for you in other contexts.
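As a small illustration of that return shape, here is a minimal sketch that wraps base R's log(). The name safe_log is only for this example and does not appear elsewhere in the post.

```r
# Wrap base::log() so that errors are captured instead of thrown.
safe_log = purrr::safely(log)

# Success: `result` holds the returned value and `error` is NULL.
ok = safe_log(10)
ok$result
is.null(ok$error)

# Failure: `result` falls back to `otherwise` (NULL by default)
# and `error` holds the captured condition object.
bad = safe_log("a")
is.null(bad$result)
conditionMessage(bad$error)
```

Either way the call returns normally, so the caller decides what to do with the error instead of the whole workflow stopping.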

The Issue

So, what happens if you try to use {purrr::safely} to modify a user-defined function from within a {targets} workflow? It turns out that you have to be very careful how you incorporate {purrr::safely} into your workflow. If you do it wrong, then {targets} can’t tell that you changed the underlying function, and won’t rerun the affected parts of the workflow. I do want to thank Neil Wright for bringing this up on Mastodon, as it’s not obvious what the best solution is or why this happens (Wright 2023). I also note that Neil does post the solution in their next post.

Setup

This post assumes you have a recent version of {targets} installed (I’m using 1.0.0) as well as {purrr} (using 1.0.1). We are not going to set up a full {targets} workflow, but will use the functions tar_deparse_safe and digest_chr64 (both internal to {targets}), as that will illuminate the issues for us.

Initial Functions

We need a few functions for this, ideally ones that will generate the same output, but that we can implement in a couple of different ways. Thanks to the fact that R is a statistical language, we can do that fairly easily. For example, we can make two functions for calculating the mean of a set of values: one doing the actual calculation, and another that simply wraps the built-in mean function.

mean_by_hand = function(values)
{
  return(
    sum(values) / length(values)
  )
}

mean_builtin = function(values)
{
  return(
    mean(values)
  )
}

We can then wrap each of these in safely to make them better able to handle a possible error.

safe_by_hand = purrr::safely(mean_by_hand)

safe_builtin = purrr::safely(mean_builtin)

Hashes

Now, for all of these, we can get the object hash that {targets} would use to tell when things changed.

hash_values = purrr::map(c("by-hand" = mean_by_hand,
                           "built-in" = mean_builtin,
                           "safe-by-hand" = safe_by_hand,
                           "safe-built-in" = safe_builtin),
                         function(.x){
                           targets:::tar_deparse_safe(.x) |> targets:::digest_chr64()
                         })
hash_table = tibble::as_tibble_row(hash_values) |>
  tidyr::pivot_longer(cols = tidyselect::everything(), names_to = "which", values_to = "hash")
gt::gt(hash_table)

Table 1: Initial hashes of the various functions.
which	hash
by-hand	b3f5384d3554e21a
built-in	35321f32205c905b
safe-by-hand	00758262fad8ce25
safe-built-in	00758262fad8ce25

As we can see from Table 1, the two safe variants have identical hashes. If you look at the help page for tar_cue, you might see why:

User-defined functions are hashed in the following way:

1. Deparse the function with targets:::tar_deparse_safe(). This function computes a string representation of the function body and arguments. This string representation is invariant to changes in comments and whitespace, which means trivial changes to formatting do not cue targets to rerun.

2. Manually remove any literal pointers from the function string using targets:::mask_pointers(). Such pointers arise from inline compiled C/C++ functions.

3. Using static code analysis (i.e. tar_deps(), which is based on codetools::findGlobals()) identify any user-defined functions and global objects that the current function depends on. Append the hashes of those dependencies to the string representation of the current function.

4. Compute the hash of the final string representation using targets:::digest_chr64().
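The deparsing step is what makes the hash insensitive to formatting. Here is a minimal sketch in base R, using base::deparse() as a rough stand-in for the internal targets:::tar_deparse_safe(). The deparse controls chosen here are an assumption, and the names f_plain, f_commented, f_changed and deparse_fn are hypothetical.

```r
# Two functions that differ only in comments and whitespace.
f_plain = function(x) {
  x + 1
}

f_commented = function(x) {
  # add one to the input
  x   +   1
}

# A third function whose code actually differs.
f_changed = function(x) {
  x + 2
}

# Deparse to a single string; this regenerates the text from the parsed
# code, so comments and spacing from the source file are dropped.
deparse_fn = function(f) {
  paste(deparse(f, control = c("keepNA", "keepInteger")), collapse = "\n")
}

# Formatting-only differences disappear ...
identical(deparse_fn(f_plain), deparse_fn(f_commented))

# ... while a real change to the code does not.
identical(deparse_fn(f_plain), deparse_fn(f_changed))
```

Any hash computed from these strings inherits the same behaviour: equal strings give equal hashes.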

Note the bit about deparsing the function in step 1. What does that look like for the safe versions?

targets:::tar_deparse_safe(safe_by_hand)

[1] "function (...) \ncapture_error(.f(...), otherwise, quiet)"

targets:::tar_deparse_safe(safe_builtin)

[1] "function (...) \ncapture_error(.f(...), otherwise, quiet)"

They have the same output! Now, I understand why the functions should look this way due to how they work, and I also understand why Will implemented the hashing of user defined functions this way as well. However, for our purposes, it makes things a teensy bit harder.
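One way to see why is sketched below. It assumes that safely keeps the wrapped function in the wrapper's enclosing environment under the name .f. The deparsed output above suggests this, but it is a {purrr} implementation detail that could change. The two mean functions are repeated in compact form so the block runs on its own.

```r
# Compact copies of the two mean functions from earlier in the post.
mean_by_hand = function(values) sum(values) / length(values)
mean_builtin = function(values) mean(values)

safe_by_hand = purrr::safely(mean_by_hand)
safe_builtin = purrr::safely(mean_builtin)

# The wrappers share exactly the same body expression ...
identical(body(safe_by_hand), body(safe_builtin))

# ... because the wrapped function is not part of the body at all.
# It is found at call time in the wrapper's enclosing environment
# (assumption: stored there as `.f`).
environment(safe_by_hand)$.f
environment(safe_builtin)$.f
```

Deparsing only sees the shared wrapper code, so nothing that distinguishes the two wrapped functions ever reaches the hash.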

Better Functions

If we still want to use the magic of safely (and likely some of the other {purrr} adverbs) while using {targets}, then we need to think about this problem a little more. One way to make sure that any changes to the function will be reflected in the hash is to embed the safely call within the function body.

In the next two functions, we can see that we’ve done just that. Note that right now, we are concentrating on making functions that do the same thing, in a safe way, but their deparsed hash will be different.

safe_by_hand_v2 = function(values)
{
  internal_byhand = function(values)
  {
    sum(values) / length(values)
  }
  safe_internal = purrr::safely(internal_byhand)
  return(
    safe_internal(values)
  )
}

safe_builtin_v2 = function(values)
{
  safe_builtin = purrr::safely(mean)
  return(
    safe_builtin(values)
  )
}

More Hashes

hash_values2 = purrr::map(c("safe-by-hand-v2" = safe_by_hand_v2,
                           "safe-built-in-v2" = safe_builtin_v2),
                         function(.x){
                           targets:::tar_deparse_safe(.x) |> targets:::digest_chr64()
                         })
hash_table2 = tibble::as_tibble_row(hash_values2) |>
  tidyr::pivot_longer(cols = tidyselect::everything(), names_to = "which", values_to = "hash")
gt::gt(dplyr::bind_rows(hash_table, hash_table2))

Table 2: More hashes of the safe variants from the above functions.
which	hash
by-hand	b3f5384d3554e21a
built-in	35321f32205c905b
safe-by-hand	00758262fad8ce25
safe-built-in	00758262fad8ce25
safe-by-hand-v2	8259616adc47b8cc
safe-built-in-v2	b35ae69ffbd4ec65

As Table 2 shows, the two new variants now have different hashes, even though both functions are safe and generate the same results (as long as we don’t have any overflow or underflow issues).
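As a quick check of the "same results" claim, here is a minimal sketch that calls both v2 functions on the same input. The function definitions are repeated from above so the block runs on its own; the input vector x is made up for illustration.

```r
safe_by_hand_v2 = function(values)
{
  internal_byhand = function(values)
  {
    sum(values) / length(values)
  }
  safe_internal = purrr::safely(internal_byhand)
  return(
    safe_internal(values)
  )
}

safe_builtin_v2 = function(values)
{
  safe_builtin = purrr::safely(mean)
  return(
    safe_builtin(values)
  )
}

x = c(2, 4, 6, 8)

# Both return the usual safely list, with the same result and no error.
all.equal(safe_by_hand_v2(x)$result, safe_builtin_v2(x)$result)
is.null(safe_by_hand_v2(x)$error)

# Errors are still captured: sum() fails on character input.
failed = safe_by_hand_v2("a")
is.null(failed$result)
conditionMessage(failed$error)

# The deparsed code differs, which is why the hashes differ.
identical(deparse(safe_by_hand_v2), deparse(safe_builtin_v2))
```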

Changing the Referenced Function

The {targets} documentation notes that it is also examining the dependent user defined functions and then concatenating the hashes together to check for changes to the overall set of functions being called. This is hard to show here in our simple post, but I’ve verified that it does indeed work for our specific case (Flight 2023).

We can imagine this pair of functions to make a safe version:

mean_function = function(values)
{
  return(
    mean(values)
  )
}

safe_mean = function(values)
{
  safe_version = purrr::safely(mean_function)
  return(
    safe_version(values)
  )
}

If we change our mean_function to something different, {targets} will pick up the changes and change the hash accordingly:

mean_function = function(values)
{
  return(
    sum(values) / length(values)
  )
}
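The dependency detection behind this can be sketched with codetools::findGlobals(), which the tar_cue help text names as the basis for tar_deps(). This is only an approximation of what {targets} does internally; the functions are repeated from above so the block runs on its own.

```r
mean_function = function(values)
{
  return(
    mean(values)
  )
}

safe_mean = function(values)
{
  safe_version = purrr::safely(mean_function)
  return(
    safe_version(values)
  )
}

# Static analysis of safe_mean: which global names does its code refer to?
globals = codetools::findGlobals(safe_mean)
globals

# mean_function is mentioned by name inside the body, so it is visible
# as a dependency of safe_mean.
"mean_function" %in% globals
```

Because mean_function shows up by name in the body of safe_mean, a change to mean_function can be folded into the hash of safe_mean, which is what the top-level purrr::safely(mean_function) wrapper could not offer.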

Conclusions

If you want {targets} to pick up on changes to your functions, it helps to understand exactly how {targets} is generating hashable representations of your user defined functions, and what gets returned by them. Thankfully, it does not require much more in terms of lines of code to make sure that {targets} picks up on your changed function definition.
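To tie this together, here is a minimal sketch of what a _targets.R file using this pattern might look like. The target names input_values and mean_result are hypothetical, and defining the functions directly in _targets.R is a simplification; this is not the workflow used to verify the behaviour above.

```r
# _targets.R (sketch)
library(targets)

# The plain function that does the work. {targets} can track it
# because safe_mean refers to it by name.
mean_function = function(values)
{
  mean(values)
}

# The safe version, with the safely call inside the function body.
safe_mean = function(values)
{
  safe_version = purrr::safely(mean_function)
  safe_version(values)
}

# The pipeline: a small input vector and the safe mean of it.
list(
  tar_target(input_values, c(2, 4, 6, 8)),
  tar_target(mean_result, safe_mean(input_values))
)
```

After running targets::tar_make() once, editing the body of mean_function should cause targets::tar_outdated() to report mean_result as needing a rerun.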

References

Flight, Robert M. 2023. “Diff of Two Commits in a Targets Workflow.” https://fediscience.org/@neilstats/110312534619119343.

Landau, William Michael. 2021. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.

Wickham, Hadley, and Lionel Henry. 2023. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.

Wright, Neil. 2023. “Neil Wright on Fedi Science.” https://fediscience.org/@neilstats/110312534619119343.

Yihui Xie, Emily Riederer, Christophe Dervieux. 2022. R Markdown Cookbook. https://bookdown.org/yihui/rmarkdown-cookbook/cache.html.

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{mflight2023,
  author = {Robert M Flight},
  title = {Targets and {Safe} {Functions}},
  date = {2023-05-08},
  url = {https://rmflight.github.io/posts/2023-05-08-targets-and-safe-functions},
  langid = {en}
}

For attribution, please cite this work as:

Robert M Flight. 2023. “Targets and Safe Functions.” May 8, 2023. https://rmflight.github.io/posts/2023-05-08-targets-and-safe-functions.