Metrics

🧪 Evaluation Metrics Overview

🎯 Precision

  • Definition: How many of the predicted positives are actually correct?
  • Formula: TP / (TP + FP)
  • Good for: Avoiding false alarms. Important when false positives are costly.

🎯 Recall (Sensitivity)

  • Definition: How many of the actual positives did the model find?
  • Formula: TP / (TP + FN)
  • Good for: Avoiding missed cases. Critical in healthcare to ensure no true cases are overlooked.

🔍 Precision vs Recall

  • Precision emphasizes correctness of positive predictions.
  • Recall emphasizes completeness of finding all true positives.
  • In medical AI, recall is often prioritized, but both matter depending on clinical consequences.

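To make the trade-off concrete, here is a minimal sketch that computes both quantities with scikit-learn. The labels, scores, and the 0.5 cut-off are illustrative only, not project data:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Illustrative ground truth and predicted probabilities (not project data)
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05, 0.55])

# Binarize at an example threshold of 0.5
y_pred = (y_score >= 0.5).astype(int)

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
```

Raising the threshold generally trades recall for precision; lowering it does the opposite.
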
📈 ROC Curve (Receiver Operating Characteristic)

  • Axes: X = False Positive Rate, Y = True Positive Rate
  • Good for: Overall class separation ability across all thresholds.
  • Note: Can be misleading with class imbalance.
  • How to read: The closer the curve hugs the top-left corner, the better the model. An AUC of 1.0 is perfect; 0.5 is equivalent to random guessing.

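A minimal plotting sketch on the same illustrative data as above; scikit-learn and matplotlib are assumed here purely for illustration, not as the project's plotting stack:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Same illustrative labels and scores as in the precision/recall sketch
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05, 0.55])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], "--", label="random (AUC = 0.5)")  # chance diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curve")
plt.legend()
plt.show()
```
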
📏 AUROC (Area Under ROC Curve)

  • Definition: Single number summarizing the ROC curve.
  • Range: 0.0 to 1.0 (1.0 = perfect classifier; 0.5 = no better than chance)
  • Limitation: Weighs both classes equally, so it is not ideal when positives are rare.

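Computing the scalar summary is a one-liner with scikit-learn; the arrays are the same illustrative ones as above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05, 0.55])

# Note: roc_auc_score takes the raw scores/probabilities, not thresholded labels
print("AUROC:", roc_auc_score(y_true, y_score))
```
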
📉 PR Curve (Precision-Recall)

  • Axes: X = Recall, Y = Precision
  • Good for: Measuring performance on the positive class only.
  • More useful than ROC when positives are rare (e.g., disease detection).
  • How to read: The closer the curve stays to the top-right corner, the better. A steep drop in precision indicates a growing number of false positives as recall rises.

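A sketch of the corresponding curve on the same illustrative data; the dashed horizontal line marks the positive-class prevalence, which is the precision of a random classifier:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05, 0.55])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision, label="model")
plt.axhline(y_true.mean(), linestyle="--", label="chance (prevalence)")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.legend()
plt.show()
```
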
📊 Average Precision (AP)

  • Definition: Area under the PR curve.
  • Computed as the precision at each threshold weighted by the corresponding increase in recall, not simply the mean of the precision values.
  • Best for: Summarizing model performance on minority class detection.

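The sketch below computes AP with scikit-learn and reproduces it by hand as the recall-weighted sum of precisions, which matches scikit-learn's definition of the metric (illustrative data again):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.05, 0.55])

ap = average_precision_score(y_true, y_score)

# AP = sum over thresholds of precision weighted by the increase in recall.
# precision_recall_curve returns recall in decreasing order, hence the minus sign.
precision, recall, _ = precision_recall_curve(y_true, y_score)
ap_by_hand = -np.sum(np.diff(recall) * precision[:-1])

print("AP (sklearn):", ap)
print("AP (by hand):", ap_by_hand)
```
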
🔍 AUROC vs Average Precision

| Metric            | Focus                  | Behavior under class imbalance | Sensitivity to thresholds |
|-------------------|------------------------|--------------------------------|---------------------------|
| AUROC             | Overall discrimination | Poor                           | Low                       |
| Average Precision | Positive class only    | Good                           | High                      |

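The gap between the two metrics is easiest to see on a rare-positive problem. A minimal sketch on synthetic data; the dataset, model, and roughly 2% prevalence are all illustrative, not the project's:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Illustrative, heavily imbalanced toy problem (~2% positives)
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# With rare positives, AUROC typically looks much better than AP,
# because AP is judged on the positive class alone.
print("AUROC:", round(roc_auc_score(y_te, scores), 3))
print("AP   :", round(average_precision_score(y_te, scores), 3))
```
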
✅ Metric Use in This Project

  • AUROC and AP are both reported to evaluate model quality.
  • PR curves and Average Precision are emphasized due to dataset imbalance.
  • Precision, Recall, and F1-score help assess decision quality at a 0.5 threshold.
  • Utility-matrix evaluation is included to account for clinical relevance beyond binary metrics (see the sketch below).
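
To illustrate the last point, here is a hedged sketch of a utility-based decision rule. The 2x2 utility values are made up for the example and are not the project's clinical utilities:

```python
import numpy as np

# Hypothetical utilities (NOT the project's values).
# Rows: true class (0 = negative, 1 = positive); columns: decision (0, 1).
# Missing a true positive is penalised most heavily in this illustration.
U = np.array([[ 1.0, -2.0],   # true negative: utility of TN, FP
              [-8.0,  5.0]])  # true positive: utility of FN, TP

def expected_utility(p_pos: float, decision: int) -> float:
    """Expected utility of a decision, given the model's P(positive)."""
    return (1.0 - p_pos) * U[0, decision] + p_pos * U[1, decision]

def decide(p_pos: float) -> int:
    """Pick the decision with the larger expected utility."""
    return int(expected_utility(p_pos, 1) >= expected_utility(p_pos, 0))

# With these utilities the implied decision threshold sits well below 0.5
for p in (0.05, 0.15, 0.30, 0.60):
    print(f"P(positive) = {p:.2f} -> decision {decide(p)}")
```

With asymmetric utilities the optimal operating point generally differs from 0.5, which is why a fixed-threshold evaluation and a utility-based rule can disagree.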
 
