mean_by_hand = function(values)
{
  return(sum(values) / length(values))
}

mean_builtin = function(values)
{
  return(mean(values))
}
TL;DR
If you are using the {targets} workflow manager for your analyses, and also using {purrr::safely} (or other adverbs) to control possible errors, do not use the simplest approach of replacing your function with its wrapped version to make it safe. See here for the final versions.
targets
As a brief introduction, {targets} is a workflow manager written and managed completely in R (Landau 2021). It is the second workflow manager written by the amazing William Landau. It keeps track of all the interdependencies among function inputs and outputs, files, etc., for you, and only reruns things that need to be rerun. This makes it easier to modularize a script into separate functions without worrying about the order in which things run, or about which pieces need to be rerun after a change, and without rerunning the entire script.
If you are familiar with {knitr} caching, it’s somewhat like that, but much more capable, and not dependent on the document at all. It is also, in my opinion, much smarter than {knitr} caches (Yihui Xie 2022).
safely?
Outside of iteration, another subset of functionality in {purrr} is its adverbs: functions that modify the behavior of another function (Wickham and Henry 2023). One of these is safely. From the help page:
Creates a modified version of .f that always succeeds. It returns a list with components result and error. If the function succeeds, result contains the returned value and error is NULL. If an error occurred, error is an error object and result is either NULL or otherwise.
As you can imagine, this is incredibly useful for handling error conditions without explicitly handling them within your own function. By the way, it is worth looking at the help page for {purrr::safely} just to see all the adverbs that {purrr} provides; some may be useful for you in other contexts.
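As a minimal sketch of what the help page describes, the snippet below wraps base R's log() with safely and calls it once with valid input and once with invalid input. The names safe_log, ok, and bad are only for illustration and do not appear elsewhere in this post.

```r
# Wrap base::log() so that failures are captured instead of thrown
safe_log = purrr::safely(log)

ok = safe_log(10)       # log() succeeds on a number
bad = safe_log("ten")   # log() errors on a character argument

ok$result                      # the value of log(10)
is.null(ok$error)              # no error was captured
is.null(bad$result)            # result falls back to `otherwise`, NULL by default
inherits(bad$error, "error")   # the error object is returned, not thrown
```

Both calls return a list with result and error components, so downstream code can check for failure without a tryCatch of its own.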
The Issue
So, what happens if you try to use {purrr::safely} to modify a user-defined function from within a {targets} workflow? It turns out you have to be very careful about how you incorporate {purrr::safely} into your workflow. If you do it wrong, {targets} can’t tell that you changed the underlying function, and won’t rerun the affected parts of the workflow. I do want to thank Neil Wright for bringing this up on Mastodon, as it’s not obvious what the best solution is or why this happens (Wright 2023). I also note that Neil posts the solution in their next post.
Setup
This post assumes you have a recent version of {targets} installed (I’m using 1.0.0) as well as {purrr} (using 1.0.1). We are not going to set up a full {targets} workflow, but will use the functions tar_deparse_safe and digest_chr64 (internal to {targets}), as they will illuminate the issues for us.
Initial Functions
So we need a few functions for this, ideally ones that will generate the same output, but that we can implement in a couple of different ways. Thanks to the fact that R is a statistical language, we can do that fairly easily. For example, we can make two functions for calculating the mean of a set of values: one doing the actual calculation, and another that simply wraps the built-in mean function.
We can then wrap each of these in safely to make them better able to handle a possible error.
safe_by_hand = purrr::safely(mean_by_hand)
safe_builtin = purrr::safely(mean_builtin)
Hashes
Now, for all of these, we can get the object hash that {targets} would use to tell when things have changed.
hash_values = purrr::map(c("by-hand" = mean_by_hand,
                           "built-in" = mean_builtin,
                           "safe-by-hand" = safe_by_hand,
                           "safe-built-in" = safe_builtin),
                         function(.x){
                           targets:::tar_deparse_safe(.x) |> targets:::digest_chr64()
                         })
hash_table = tibble::as_tibble_row(hash_values) |>
  tidyr::pivot_longer(cols = tidyselect::everything(), names_to = "which", values_to = "hash")
gt::gt(hash_table)
which | hash |
---|---|
by-hand | b3f5384d3554e21a |
built-in | 35321f32205c905b |
safe-by-hand | 00758262fad8ce25 |
safe-built-in | 00758262fad8ce25 |
As we can see from Table 1, the two safe variants have identical hashes. If you look at the help page for tar_cue, you might see why:
User-defined functions are hashed in the following way:
1. Deparse the function with targets:::tar_deparse_safe(). This function computes a string representation of the function body and arguments. This string representation is invariant to changes in comments and whitespace, which means trivial changes to formatting do not cue targets to rerun.
2. Manually remove any literal pointers from the function string using targets:::mask_pointers(). Such pointers arise from inline compiled C/C++ functions.
3. Using static code analysis (i.e. tar_deps(), which is based on codetools::findGlobals()) identify any user-defined functions and global objects that the current function depends on. Append the hashes of those dependencies to the string representation of the current function.
4. Compute the hash of the final string representation using targets:::digest_chr64().
See that bit about deparsing the function in step 1? What does that look like for the safe versions?
targets:::tar_deparse_safe(safe_by_hand)
[1] "function (...) \ncapture_error(.f(...), otherwise, quiet)"
targets:::tar_deparse_safe(safe_builtin)
[1] "function (...) \ncapture_error(.f(...), otherwise, quiet)"
They have the same output! Now, I understand why the functions should look this way due to how they work, and I also understand why Will implemented the hashing of user defined functions this way as well. However, for our purposes, it makes things a teensy bit harder.
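To see why this happens without relying on {targets} or {purrr} internals, here is a rough sketch using only base R. The function make_wrapper is a hypothetical stand-in for an adverb like safely, and base deparse() is used as an assumed approximation of what tar_deparse_safe sees; neither is the actual implementation. The point is that the returned closure always has the same code, while the wrapped function lives only in the closure's environment, which deparsing never looks at.

```r
# Hypothetical stand-in for an adverb such as purrr::safely (not its real code)
make_wrapper = function(.f) {
  force(.f)
  function(...) {
    tryCatch(
      list(result = .f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

wrap_sum = make_wrapper(sum)
wrap_mean = make_wrapper(mean)

# The code of both wrappers is the same text ...
same_code = identical(deparse(wrap_sum), deparse(wrap_mean))

# ... but the functions they hold in their environments are different
same_wrapped = identical(environment(wrap_sum)$.f, environment(wrap_mean)$.f)

same_code      # TRUE
same_wrapped   # FALSE
```

Any hash computed from the deparsed text alone therefore cannot distinguish the two wrappers, which matches what we saw for safe_by_hand and safe_builtin.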
Better Functions
If we still want to use the magic of safely (and likely some of the other {purrr} adverbs) while using {targets}, then we need to think about this problem a little more. One way to make sure that any changes to the function will be reflected in the hash is to embed the safely call within the function body.
In the next two functions, we can see that we’ve done just that. Note that right now, we are concentrating on making functions that do the same thing in a safe way, but whose deparsed representations, and therefore hashes, will be different.
safe_by_hand_v2 = function(values)
{
  internal_byhand = function(values)
  {
    sum(values) / length(values)
  }
  safe_internal = purrr::safely(internal_byhand)
  return(safe_internal(values))
}

safe_builtin_v2 = function(values)
{
  safe_builtin = purrr::safely(mean)
  return(safe_builtin(values))
}
More Hashes
hash_values2 = purrr::map(c("safe-by-hand-v2" = safe_by_hand_v2,
                            "safe-built-in-v2" = safe_builtin_v2),
                          function(.x){
                            targets:::tar_deparse_safe(.x) |> targets:::digest_chr64()
                          })
hash_table2 = tibble::as_tibble_row(hash_values2) |>
  tidyr::pivot_longer(cols = tidyselect::everything(), names_to = "which", values_to = "hash")
gt::gt(dplyr::bind_rows(hash_table, hash_table2))
which | hash |
---|---|
by-hand | b3f5384d3554e21a |
built-in | 35321f32205c905b |
safe-by-hand | 00758262fad8ce25 |
safe-built-in | 00758262fad8ce25 |
safe-by-hand-v2 | 8259616adc47b8cc |
safe-built-in-v2 | b35ae69ffbd4ec65 |
As shown in Table 2, we now see different hashes for the two v2 functions, even though both are safe and generate the same results (as long as we don’t have any overflow or underflow issues).
Changing the Referenced Function
The {targets} documentation notes that it also examines the user-defined functions that a function depends on, and appends their hashes so that a change anywhere in the set of functions being called is detected. This is hard to show here in our simple post, but I’ve verified that it does indeed work for our specific case (Flight 2023).
We can imagine this pair of functions to make a safe version:
mean_function = function(values)
{
  return(mean(values))
}

safe_mean = function(values)
{
  safe_version = purrr::safely(mean_function)
  return(safe_version(values))
}
If we change our mean_function to something different, {targets} will pick up the changes and change the hash accordingly:
mean_function = function(values)
{
  return(sum(values) / length(values))
}
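The dependency detection can be partly illustrated with codetools::findGlobals(), which the tar_cue help page quoted earlier names as the basis for tar_deps(). The sketch below only checks that static analysis sees mean_function as a global used inside safe_mean; it does not reproduce how {targets} combines the hashes, and the variable name globals is just for illustration.

```r
mean_function = function(values)
{
  return(mean(values))
}

safe_mean = function(values)
{
  # safely is only referenced here, never called, so purrr is not needed to run this sketch
  safe_version = purrr::safely(mean_function)
  return(safe_version(values))
}

# Static analysis of safe_mean: which global names does its body refer to?
globals = codetools::findGlobals(safe_mean)

# mean_function is referenced inside the body, so it is reported as a dependency
"mean_function" %in% globals
```

Because mean_function appears by name in the body of safe_mean, a change to mean_function can be traced back to safe_mean. In the first version of the safe functions, the wrapped function's name never appeared in the deparsed wrapper at all.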
Conclusions
If you want {targets} to pick up on changes to your functions, it helps to understand exactly how {targets} generates hashable representations of your user-defined functions, and what those representations contain. Thankfully, it does not take many more lines of code to make sure that {targets} picks up on your changed function definition.
Citation
@online{mflight2023,
author = {Robert M Flight},
title = {Targets and {Safe} {Functions}},
date = {2023-05-08},
url = {https://rmflight.github.io/posts/2023-05-08-targets-and-safe-functions},
langid = {en}
}