Calibration Analysis of Neural Network Logits with Inferno
Calibration curves for Effusion and Atelectasis
Author
Maksim Ohvrill
Published
May 1, 2025
This notebook investigates how well neural network (CNN) outputs align with true outcome probabilities, using Inferno.
Calibration performance is assessed for two clinical conditions: Effusion and Atelectasis.
```r
# -------------------------------
# Setup
# -------------------------------

# Set working directory one level up
setwd("..")

library(inferno)

# Load reusable utilities
source("RScripts/reusableUtils.R")

# Define general output directory
output_dir <- "data/plots"
if (!dir.exists(output_dir)) {
  dir.create(output_dir)
}

# Define model and data paths
learnt_dir <- "data/inferno/combinedML50"
setup <- load_metadata_testdata(learnt_dir)
metadata <- setup$metadata
test_data <- setup$testdata
```
Conditional Probability of Lung Conditions vs Age
We estimate the conditional probability of two lung conditions, Effusion and Atelectasis, as a function of Age.
These probabilities are plotted:
- Solid line for Effusion
- Dashed line for Atelectasis
- 89% variability intervals included to represent model uncertainty
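The 89% interval corresponds to the quantile pair `c(0.055, 0.945)` that the notebook passes to `Pr()`. As a minimal sketch of what such an interval means, the snippet below computes it from simulated draws. The `samples` vector and its Beta distribution are assumptions made for illustration, not output from inferno.

```r
# Illustrative only: simulated draws standing in for the variability of a
# probability estimate (hypothetical, not produced by inferno).
set.seed(42)
samples <- rbeta(10000, 30, 70)

# The 5.5% and 94.5% quantiles bound the central 89% of the draws
interval <- quantile(samples, probs = c(0.055, 0.945))

# Fraction of draws that fall inside the interval (should be close to 0.89)
coverage <- mean(samples >= interval[1] & samples <= interval[2])

print(interval)
print(coverage)
```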
Interpretation
The plots show how the likelihood of each condition evolves with age.
```r
# -------------------------------
# Conditional Probability Curves vs Age
# -------------------------------

# Define input grids
x_age <- data.frame(AGE = 1:100)
y_eff <- data.frame(LABEL_EFFUSION = 1)
y_ale <- data.frame(LABEL_ATELECTASIS = 1)

# Infer probabilities
# NOTE: `parallel` is not defined in this notebook; it is assumed to be set
# by RScripts/reusableUtils.R.
condpr_eff <- Pr(Y = y_eff, X = x_age, learnt = learnt_dir,
                 parallel = parallel, quantiles = c(0.055, 0.945))
condpr_ale <- Pr(Y = y_ale, X = x_age, learnt = learnt_dir,
                 parallel = parallel, quantiles = c(0.055, 0.945))

# Define plotting function
plot_lung_condition_vs_age <- function() {
  par(family = "Palatino")
  plot(condpr_eff, ylim = 0:1, col = "#3fb2b7", lty = 1, lwd = 2,
       legend = FALSE,
       ylab = "Probability of Effusion/Atelectasis given Age (89% uncertainty)",
       font.lab = 2, font.axis = 2)
  plot(condpr_ale, ylim = 0:1, col = "#f8766d", lty = 2, lwd = 2,
       legend = FALSE, add = TRUE)
  legend("top", legend = c("Effusion", "Atelectasis"),
         col = c("#3fb2b7", "#f8766d"), lty = 1:2, lwd = 2,
         pch = NA, bty = "n", text.font = 2)
}

# Save and display plot
pdf2("lungcondition_vs_age", path = output_dir)
plot_lung_condition_vs_age()
dev.off()
plot_lung_condition_vs_age()
```
Registered doParallelSNOW with 15 workers
Closing connections to cores.
Registered doParallelSNOW with 15 workers
Closing connections to cores.
pdf: 2
Calibration Curves Across Age Groups
This experiment examines how the neural network's predicted probabilities align with actual outcomes across different age groups. The focus is on predicting pleural effusion (\(\text{LABEL\_EFFUSION}\)).
Goal of Calibration
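A model is well calibrated if its predicted probability matches the observed frequency of the outcome:
$ P(Y = 1 \mid \hat{p} = p) = p $
where \(\hat{p}\) is the model's predicted probability. This means, for example, that if the model predicts a 70% chance of effusion, then about 70% of those cases actually have effusion.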
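As a rough illustration of this definition, the sketch below bins predicted probabilities and compares each bin's mean prediction with the observed outcome frequency. The data are simulated so that the hypothetical model is calibrated by construction; `p_hat`, `y`, and the ten equal-width bins are assumptions for the example, not part of the notebook's analysis.

```r
# Simulated data (hypothetical): outcomes are drawn with exactly the predicted
# probability, so this toy "model" is calibrated by construction.
set.seed(1)
n <- 20000
p_hat <- runif(n)
y <- rbinom(n, size = 1, prob = p_hat)

# Group predictions into 10 equal-width bins
bins <- cut(p_hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Compare the mean prediction with the observed frequency in each bin
calib_table <- data.frame(
  mean_pred = as.numeric(tapply(p_hat, bins, mean)),
  obs_freq  = as.numeric(tapply(y, bins, mean))
)
max_gap <- max(abs(calib_table$mean_pred - calib_table$obs_freq))

print(round(calib_table, 3))
print(max_gap)
```

For a well-calibrated model the two columns stay close in every bin. A systematic gap in one direction indicates over- or underconfidence.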
Estimating Conditional Probabilities
Because inferno does not directly compute probabilities conditional on an interval, conditional probabilities are estimated using:
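$ P(Y = 1 \mid \text{logit},\ a < \text{Age} < b) = \frac{P(Y = 1,\ a < \text{Age} < b \mid \text{logit})}{P(a < \text{Age} < b \mid \text{logit})} $
where:
- \(Y\) is the binary target (1 for effusion),
- \(a\) and \(b\) define the age range,
- the numerator is the joint probability of effusion and age between \(a\) and \(b\),
- the denominator is the marginal probability of age between \(a\) and \(b\).

Both the numerator and the denominator are conditional on the network's logit output.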
This approach loses variability information but gives a usable estimate for calibration.
Steps in Calibration
1. The calibration is computed for the following non-overlapping age intervals:
   - 0–19 years
   - 20–39 years
   - 40–59 years
   - 60–79 years
   - 80–99 years
2. Calculate summed probabilities for each age group.
3. Divide joint by marginal probabilities to get conditional estimates.
4. Plot estimated probabilities versus the sigmoid output of the neural network.
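The summing and dividing steps can be sketched as follows. The notebook's own helper, `make_calibration_curve()`, is loaded from `RScripts/reusableUtils.R` and its source is not shown, so `group_conditional()` below is a hypothetical stand-in. It assumes plain matrices with one row per age and one column per logit grid point, filled with toy values rather than inferno output.

```r
# Hypothetical stand-in for make_calibration_curve(); not the notebook's code.
# joint[k, j]    ~ P(Y = 1, Age = k | logit_j)
# marginal[k, j] ~ P(Age = k | logit_j)
group_conditional <- function(joint, marginal, group_size = 20) {
  stopifnot(all(dim(joint) == dim(marginal)),
            nrow(joint) %% group_size == 0)
  # Assign consecutive blocks of `group_size` ages to the same group
  groups <- rep(seq_len(nrow(joint) / group_size), each = group_size)
  joint_sum <- rowsum(joint, groups)        # summed joint probability per group
  marginal_sum <- rowsum(marginal, groups)  # summed marginal probability per group
  # Conditional estimate; rows: logit grid points, columns: age groups
  t(joint_sum / marginal_sum)
}

# Toy inputs: uniform age distribution, risk rising with both age and logit
ages <- 0:99
logits <- seq(-5, 5, length.out = 5)
marginal <- matrix(1 / length(ages), nrow = length(ages), ncol = length(logits))
risk <- outer(ages, logits, function(a, l) plogis(l + (a - 50) / 25))
joint <- marginal * risk

cond <- group_conditional(joint, marginal)
print(round(cond, 3))
```

Because the toy age distribution is uniform, each entry of `cond` reduces to the average risk within its age group, which gives a quick sanity check on the grouping logic.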
Interpretation of the Calibration Plot
The X-axis shows the neural network's sigmoid-transformed logits (\(\text{plogis}(\text{logit output})\)).
The Y-axis shows the estimated true probability of effusion for different age groups.
A solid black diagonal line represents perfect calibration, where the sigmoid output would exactly match the true probability.
If the curves align with this line, the model outputs are well calibrated.
Deviation from the line indicates bias or miscalibration: the model's raw outputs overestimate the true probability where a curve lies below the diagonal and underestimate it where a curve lies above, and this can differ by age group.
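One simple way to read such a plot numerically is to take the signed gap between a calibration curve and the diagonal. The sketch below uses a made-up curve (`true_prob`), chosen only to show the sign convention. It is not derived from the notebook's results.

```r
# Hypothetical calibration curve: the true probability is lower than the
# sigmoid output everywhere (illustrative values only).
nn_sigmoid <- seq(0.05, 0.95, by = 0.05)
true_prob <- nn_sigmoid^2

# Signed gap to the diagonal y = x
# negative: the network overestimates; positive: it underestimates
gap <- true_prob - nn_sigmoid
direction <- ifelse(gap < 0, "overestimates",
                    ifelse(gap > 0, "underestimates", "calibrated"))

# A single summary number: the mean absolute distance from the diagonal
mean_abs_gap <- mean(abs(gap))

print(data.frame(nn_sigmoid, true_prob, gap, direction))
print(mean_abs_gap)
```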
```r
# -------------------------------
# Calibration Curves Across Age Groups
# -------------------------------

# Define grids for calibration
out_logit <- seq(-5, 5, length.out = 129)

# Effusion probabilities
# Joint: effusion and each age, given the logit
probs1_eff <- Pr(
  Y = data.frame(LABEL_EFFUSION = 1, AGE = 0:99),
  X = data.frame(LOGIT_EFFUSION = out_logit),
  learnt = learnt_dir, parallel = parallel,
  quantiles = NULL, nsamples = NULL
)
# Marginal: each age, given the logit
probs2_eff <- Pr(
  Y = data.frame(AGE = 0:99),
  X = data.frame(LOGIT_EFFUSION = out_logit),
  learnt = learnt_dir, parallel = parallel,
  quantiles = NULL, nsamples = NULL
)

# Create grouped calibration curves
cond_probs_eff <- make_calibration_curve(probs1_eff, probs2_eff, group_size = 20)

# Fixed color palette, one color per age group
color_palette <- c(
  "#1f77b4", # Age 0–19
  "#aec7e8", # Age 20–39
  "#004c6d", # Age 40–59
  "#d62728", # Age 60–79
  "#ff9896"  # Age 80–99
)

# Define plotting function
plot_calibration_vs_age <- function() {
  par(family = "Palatino")
  flexiplot(x = plogis(out_logit), y = cond_probs_eff,
            xlab = "NN sigmoid output", ylab = "Probability", main = "Effusion",
            ylim = 0:1, xlim = 0:1,
            col = color_palette, lty = 1:10, lwd = 3,
            font.lab = 2, font.axis = 2)
  # Diagonal reference line for perfect calibration
  lines(x = c(0, 1), y = c(0, 1), lty = 1, lwd = 2, col = "#000000")
  legend("topleft",
         legend = c("Age 0 -- 19", "Age 20 -- 39", "Age 40 -- 59",
                    "Age 60 -- 79", "Age 80 -- 99", "Perfect Calibration"),
         lty = c(1:5, 1), col = c(color_palette, "#000000"),
         lwd = 2, pch = NA, bty = "n", text.font = 2)
}

# Save and display plot
pdf2("calibration_vs_age", path = output_dir)
plot_calibration_vs_age()
dev.off()
plot_calibration_vs_age()
```
pdf: 2
Calibration Curves for Neural Net Outputs vs Inferred Probabilities
For each clinical condition (Effusion, Atelectasis), two calibration plots are generated:
Sigmoid Output Calibration Curve
X-axis: Neural network sigmoid output, computed as \(\text{sigmoid}(\text{logit})\).
Y-axis: Bayesian inferred probability, obtained using \(\text{Pr}()\).
The diagonal red line represents perfect calibration where \(y = x\).
Deviation from the red line shows miscalibration, meaning the neural network's predicted confidence does not match true probabilities.
Logit Output Calibration Curve
X-axis: Raw neural network logit scores.
Y-axis: Bayesian inferred probability.
This view visualizes how raw CNN scores translate to real probabilities, highlighting possible non-linear distortions in the output space.
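The two views differ only in the X-axis transform: the sigmoid view applies `plogis()` to the raw logit, and `qlogis()` inverts it. A minimal sketch of that mapping in base R follows. The logit values are arbitrary examples, not network outputs.

```r
# Arbitrary example logits (not taken from the network)
logit <- c(-5, -1, 0, 2, 5)

# Sigmoid transform used for the X-axis of the sigmoid view
sigmoid <- plogis(logit)   # same as 1 / (1 + exp(-logit))

# The inverse transform recovers the raw logits
back <- qlogis(sigmoid)

print(round(data.frame(logit, sigmoid, back), 4))
```

The sigmoid compresses large positive and negative logits toward 1 and 0, so differences that are visible on the logit axis can be hard to see on the sigmoid axis.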
Threshold at 0.28
A threshold at \(0.28\) is highlighted by a vertical line and a shaded region.
This threshold maximizes the expected clinical utility for binary classification.
Interpretation:
Patients with CNN sigmoid output above \(0.28\) are classified positive under a fixed rule.
This fixed threshold assumes perfect calibration across all patients, which is often unrealistic.
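To make the link between utility and threshold concrete, the sketch below derives a break-even probability from a utility table and applies it as a fixed rule. The utilities are hypothetical values picked so that the break-even point equals 0.28; the notebook does not state the utilities it used. The `sigmoid_out` and `calibrated` vectors are also invented for illustration.

```r
# Hypothetical utilities (NOT from the notebook), chosen to give 0.28
u_tp <- 72  # classified positive, condition present
u_fn <- 0   # classified negative, condition present
u_tn <- 28  # classified negative, condition absent
u_fp <- 0   # classified positive, condition absent

# Classifying positive has higher expected utility when
#   p * u_tp + (1 - p) * u_fp > p * u_fn + (1 - p) * u_tn,
# which holds for p above the break-even probability p_star:
p_star <- (u_tn - u_fp) / ((u_tn - u_fp) + (u_tp - u_fn))

classify <- function(p, threshold) as.integer(p > threshold)

# Invented example values: raw sigmoid outputs and calibrated probabilities
sigmoid_out <- c(0.10, 0.27, 0.29, 0.60)
calibrated  <- c(0.05, 0.31, 0.25, 0.70)

decision_raw <- classify(sigmoid_out, p_star)
decision_cal <- classify(calibrated, p_star)

print(p_star)
print(rbind(decision_raw, decision_cal))
```

In this toy case the two middle patients receive different decisions depending on whether the rule is applied to the raw sigmoid output or to the calibrated probability. That difference is what a fixed threshold on uncalibrated outputs risks.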