How to make sure you get a classification fit and not a probability fit from a random forest model using the tidymodels framework.
parsnip
tidymodels
machine-learning
random-forest
random-code-snippets
Author
Robert M Flight
Published
August 30, 2021
I’ve been working on a “machine learning” project, and in the process I’ve been learning to use the tidymodels framework (“Tidymodels” 2021), which helps keep you from leaking information from the test set into the training data, and lets you set up workflows in a consistent way across methods.
However, I got tripped up recently by one issue. When I’ve previously used Random Forests (“Random Forest Wiki Page” 2021), I’ve found that for classification problems, the out-of-bag (OOB) error reported is a good proxy for the area-under-the-curve (AUC), or an estimate of how well just about any other machine learning technique will do (see (Flight 2015) for an example using actual random data). Therefore, I like to put my data through a Random Forest first and check the OOB error, and then maybe reach for a tuned boosted tree to squeeze every last bit of performance out.
The tidymodels default is to fit a probability forest, even for classification problems. This isn’t normally a problem, because most people will have a train and test set and estimate performance on the test set using AUC. However, it is a problem if you just want to see the OOB error from the random forest, because that error is reported differently for a probability fit vs a classification fit.
Let’s run an example using the tidymodels cells data set.
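Here is a minimal sketch of the setup, assuming the cells data from the modeldata package with the case column dropped, a recipe containing step_dummy(), and min_n = 10 in the model specification; the exact recipe selectors, seeds, and object names are my assumptions based on the printed workflow below. The plain ranger() call is the baseline classification fit whose OOB error of roughly 17% I refer to further down.

library(tidymodels) # parsnip, recipes, workflows, etc.
library(ranger)

data(cells, package = "modeldata")
cells <- dplyr::select(cells, -case) # drop the train / test indicator

# baseline: a plain classification random forest, which reports the OOB error
# as the fraction misclassified (roughly 17% on this data)
set.seed(1234)
ranger(class ~ ., data = cells, min.node.size = 10)

# the default parsnip / ranger random forest inside a workflow
rf_spec <- rand_forest(min_n = 10) %>%
  set_engine("ranger") %>%
  set_mode("classification")

rf_recipe <- recipe(class ~ ., data = cells) %>%
  step_dummy(all_nominal_predictors())

rf_workflow <- workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(rf_spec)

set.seed(1234)
rf_default <- fit(rf_workflow, data = cells)
rf_default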
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: rand_forest()
── Preprocessor ────────────────────────────────────────────────────────────────
1 Recipe Step
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Ranger result
Call:
ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
Type: Probability estimation
Number of trees: 500
Sample size: 2019
Number of independent variables: 56
Mtry: 7
Target node size: 10
Variable importance mode: none
Splitrule: gini
OOB prediction error (Brier s.): 0.1198456
Here we see the OOB prediction error is 0.119, or about 12%, which is not hugely different from the 17% above, but it is still different, and it is reported as a Brier score rather than a misclassification rate. Also, the “Type” shows “Probability estimation” instead of “Classification”.
If we run ranger again, with probability = TRUE instead of a classification fit, do we match up with the result above?
set.seed(1234)
ranger(class ~ ., data = cells, min.node.size = 10, probability = TRUE)
Ranger result
Call:
ranger(class ~ ., data = cells, min.node.size = 10, probability = TRUE)
Type: Probability estimation
Number of trees: 500
Sample size: 2019
Number of independent variables: 56
Mtry: 7
Target node size: 10
Variable importance mode: none
Splitrule: gini
OOB prediction error (Brier s.): 0.119976
That is much closer to the tidymodels result! Great! Except it misestimates the true OOB error for classification. How do we get what we want while still using the tidymodels framework?
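The trick is to pass probability = FALSE through set_engine(), which hands it to ranger::ranger() as an engine argument (that’s the probability = ~FALSE you can see in the call below). A sketch, reusing the rf_recipe and cells objects from the sketch above; note that I don’t set min_n here, which is why the target node size in the output falls back to ranger’s classification default of 1.

# engine arguments in set_engine() are passed straight through to ranger
rf_classify <- rand_forest() %>%
  set_engine("ranger", probability = FALSE) %>%
  set_mode("classification")

rf_workflow_classify <- workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(rf_classify)

set.seed(1234)
rf_class_fit <- fit(rf_workflow_classify, data = cells)
rf_class_fit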
══ Workflow [trained] ══════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: rand_forest()
── Preprocessor ────────────────────────────────────────────────────────────────
1 Recipe Step
• step_dummy()
── Model ───────────────────────────────────────────────────────────────────────
Ranger result
Call:
ranger::ranger(x = maybe_data_frame(x), y = y, probability = ~FALSE, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
Type: Classification
Number of trees: 500
Sample size: 2019
Number of independent variables: 56
Mtry: 7
Target node size: 1
Variable importance mode: none
Splitrule: gini
OOB prediction error: 16.54 %
Aha! Now we are much closer to the original value of 17%, and the “Type” is “Classification”.
I know that in this case the two OOB errors are honestly not that far apart, but in my recent project they differed by 20 percentage points: I had 45% using classification and 25% using probability. So I was being fooled by my initial tidymodels investigation, and then left wondering why my final AUC on a tuned model was only hitting just over 55%.
So remember: this isn’t how I would run the model for a final classification fit and estimation of AUC on a test set, but if you want the OOB error as a quick “feel” for your data, it’s very useful.
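And if you just want the number itself rather than the printed summary, you should be able to pull the underlying ranger object out of the fitted workflow (this assumes the rf_class_fit object from the sketch above and a recent version of workflows that provides extract_fit_engine()):

# extract the ranger fit from the workflow and grab the OOB error directly
extract_fit_engine(rf_class_fit)$prediction.error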