Every time you run a binary classifier over a population, you get an estimate of the proportion of true positives in that population. That is known as the prevalence.
But that estimate is biased, because no classifier is perfect. For example, if your classifier tells you that you have 20% of positive cases, but its precision is known to be only 50%, you'd expect the true prevalence to be 0.2 × 0.5 = 0.1, i.e. 10%. That, however, assumes perfect recall (all true positives are flagged by the classifier). If the recall is lower than 1, then you know the classifier missed some true positives, so you also need to normalize the prevalence estimate by the recall.
This leads to the usual formula for getting the true prevalence Pr(y=1) from the positive prediction rate Pr(ŷ=1):
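Pr(y=1) = Pr(ŷ=1) × precision / recall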
But suppose you need to run the classifier more than once. For example, you may want to do this at regular intervals to detect trends in the prevalence. You can't use that formula anymore, because precision depends on the prevalence. To use the formula above you would have to re-estimate the precision regularly (say, with human eval), but then you might as well re-estimate the prevalence itself.
How do we get out of this circular reasoning? It turns out that binary classifiers produce other performance metrics (besides precision) that don't depend on the prevalence. These include not only the recall R but also the specificity S, and these metrics can be used to adjust Pr(ŷ=1) into an unbiased estimate of the true prevalence using this formula (known as prevalence adjustment):
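Pr(y=1) = (Pr(ŷ=1) − (1 − S)) / (R − (1 − S))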
where:
- Pr(y=1) is the true prevalence
- S is the specificity
- R is the sensitivity or recall
- Pr(ŷ=1) is the proportion of positive predictions
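In code, the adjustment is a one-liner. Here is a minimal R sketch (the adjust_prevalence helper and the example numbers are purely illustrative):

# prevalence adjustment: recover the true prevalence from the raw positive
# prediction rate, given the classifier's recall and specificity
adjust_prevalence <- function(p_hat, recall, specificity) {
  (p_hat - (1 - specificity)) / (recall - (1 - specificity))
}
# e.g. a 20% raw positive rate with recall 0.8 and specificity 0.9
adjust_prevalence(0.2, recall = 0.8, specificity = 0.9)  # ~0.14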
The proof is straightforward:
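By the law of total probability,

Pr(ŷ=1) = Pr(ŷ=1 | y=1) · Pr(y=1) + Pr(ŷ=1 | y=0) · Pr(y=0)
        = R · Pr(y=1) + (1 − S) · (1 − Pr(y=1))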
Solving for Pr(y=1) yields the formula above.
Note that this formula breaks down when the denominator R − (1 − S) becomes 0, that is, when the recall equals the false positive rate 1 − S. But remember what a typical ROC curve looks like: it plots the recall R (aka the true positive rate) against the false positive rate 1 − S, so a classifier for which R = 1 − S falls on the diagonal of the ROC diagram. That is a classifier that is, essentially, guessing at random. True cases and false cases are equally likely to be classified as positive, so the classifier is completely non-informative, and you can't learn anything from it, certainly not the true prevalence.
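To see that degenerate case concretely, here is a small sketch (made-up labels and a coin-flip "classifier"): recall and the false positive rate both come out around 0.5, so the adjustment denominator is roughly zero.

# a coin-flip classifier ignores the data entirely
y_true <- runif(10000) < 0.3        # true labels with a 30% prevalence
y_rand <- runif(10000) < 0.5        # random "predictions"
mean(y_rand[y_true])                # recall, ~0.5
mean(y_rand[!y_true])               # false positive rate 1 - S, ~0.5
mean(y_rand[y_true]) - mean(y_rand[!y_true])  # ~0: the denominator vanishes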
Enough theory, let's see if this works in practice:
# randomly draw some covariate
x <- runif(10000, -1, 1)
# take the logit and draw the outcome
logit <- plogis(x)
y <- runif(10000) < logit
# match a logistic regression model
m <- glm(y ~ x, family = binomial)
# make some predictions, using an absurdly low threshold
y_hat <- predict(m, type = "response") < 0.3
# get the recall (aka sensitivity) and specificity
cm <- caret::confusionMatrix(factor(y_hat), factor(y), positive = "TRUE")
recall <- unname(cm$byClass['Sensitivity'])
specificity <- unname(cm$byClass['Specificity'])
# get the adjusted prevalence
(mean(y_hat) - (1 - specificity)) / (recall - (1 - specificity))
# compare with the actual prevalence
mean(y)
In this simulation I get recall = 0.049 and specificity = 0.875. The estimated prevalence is a ridiculously biased 0.087, but the adjusted prevalence is essentially equal to the true prevalence (0.498).
To sum up: this shows how, using a classifier's recall and specificity, you can adjust the estimated prevalence in order to track it over time, assuming that the recall and specificity are stable over time. You can't do this using precision and recall, because precision depends on the prevalence, while recall and specificity don't.
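For example, reusing the illustrative adjust_prevalence helper from above (with made-up numbers), tracking the prevalence over time just means applying the same adjustment to each period's raw positive rate:

# raw positive prediction rates observed at successive intervals (made-up)
raw_rates <- c(0.18, 0.21, 0.25, 0.30)
# recall and specificity estimated once and assumed stable over time
adjust_prevalence(raw_rates, recall = 0.8, specificity = 0.9)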