📈 Inferno Mutual Information Between Predictands and Predictors

Author

Maksim Ohvrill

Published

May 1, 2025

This notebook analyzes the mutual information between patient predictors (CNN logits, age, gender, and X-ray view position) and lung condition predictands (effusion and atelectasis). It shows that the CNN logits account for about 94% of the mutual information with the labels, while the auxiliary predictors add little beyond what the logits already capture.

Code

# -------------------------------
# Setup
# -------------------------------

# Set working directory one level up
setwd("..")
library(inferno)

# Load reusable utilities
source("RScripts/reusableUtils.R")

# Define general output directory
output_dir <- "data/plots"
if (!dir.exists(output_dir)) {
  # recursive = TRUE also creates "data" if it does not exist yet
  dir.create(output_dir, recursive = TRUE)
}

# Define model and data paths
learnt_dir <- "data/inferno/combinedML50"
setup <- load_metadata_testdata(learnt_dir)
metadata <- setup$metadata
test_data <- setup$testdata

Mutual Information Analysis

📖 Overview

Mutual Information (MI) quantifies the statistical dependence between two sets of random variables.
In this study, MI measures how much information the predictors:

Patient metadata (age, gender, view position)
CNN logit outputs (raw scores before applying sigmoid)

carry about:

The ground truth labels: LABEL_EFFUSION and LABEL_ATELECTASIS.

📈 Mutual Information Between CNN Logits and Age

The first analysis computes the mutual information between CNN logits and patient age.

Inputs:
\(Y_1 = \{\text{LOGIT\_ATELECTASIS}, \text{LOGIT\_EFFUSION}\}\)
\(Y_2 = \{\text{AGE}\}\)
Formula (base-2 logarithm, so MI is expressed in shannons, Sh):

\[ MI(Y_1, Y_2) = \sum_{y_1, y_2} p(y_1, y_2) \log_2\left( \frac{p(y_1, y_2)}{p(y_1)p(y_2)} \right) \]

Results Reported:
- Mutual Information (\(MI\))
- Conditional entropies: \(H(Y_1|Y_2)\) and \(H(Y_2|Y_1)\)
- Marginal entropies: \(H(Y_1)\) and \(H(Y_2)\)
- Maximum achievable MI: \(MI_{\text{max}} = H(Y_1)\)
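
The reported quantities can be illustrated on a small discrete example. The sketch below is plain base R on a made-up 2×2 joint distribution. It is not how inferno estimates these quantities, and the names `joint`, `entropy_sh`, and `toy_result` are introduced here only for illustration. Logarithms are base 2, so all values are in Sh.

```r
# Hypothetical joint distribution p(y1, y2) of two binary variables
joint <- matrix(c(0.4, 0.1,
                  0.1, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y1 = c("0", "1"), Y2 = c("0", "1")))

# Shannon entropy in Sh; zero-probability cells contribute nothing
entropy_sh <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

p1 <- rowSums(joint)  # marginal p(y1)
p2 <- colSums(joint)  # marginal p(y2)

H1  <- entropy_sh(p1)     # H(Y1)
H2  <- entropy_sh(p2)     # H(Y2)
H12 <- entropy_sh(joint)  # joint entropy H(Y1, Y2)

# MI from the defining sum over non-zero cells
nz <- joint > 0
MI <- sum(joint[nz] * log2(joint[nz] / outer(p1, p2)[nz]))

toy_result <- c(
  MI       = MI,
  CondEn12 = H12 - H2,  # H(Y1 | Y2)
  CondEn21 = H12 - H1,  # H(Y2 | Y1)
  En1      = H1,
  En2      = H2,
  MImax    = H1         # same convention as in the list above
)
print(round(toy_result, 6))
```

In this toy case the sum formula and the identity \(MI = H(Y_1) + H(Y_2) - H(Y_1, Y_2)\) give the same number, and \(H(Y_1|Y_2) = H(Y_1) - MI\), which is the relationship between the En1, CondEn12, and MI rows of the result tables.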

🧪 Mutual Information Between Labels and Predictors

Mutual information is calculated between:

\(Y_1 = \{\text{LABEL\_ATELECTASIS}, \text{LABEL\_EFFUSION}\}\)
\(Y_2 = \{\text{AGE, GENDER, VP, LOGIT\_EFFUSION, LOGIT\_ATELECTASIS}\}\)

This measures how well patient metadata and CNN predictions explain the ground truth labels.

🔍 Isolation of Age and Logit Contributions

To isolate contributions:

Without Age:
\(Y_2 = \{\text{GENDER, VP, LOGIT\_EFFUSION, LOGIT\_ATELECTASIS}\}\)
Without Logits:
\(Y_2 = \{\text{AGE, GENDER, VP}\}\)

Comparing mutual information values shows how much each group adds uniquely.
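
The leave-one-group-out comparison can be sketched on simulated discrete data. This is only an illustration under stated assumptions: it uses a simple plug-in (frequency-based) MI estimate in base R rather than the inferno estimator, and the variables `label`, `score`, and `noise` as well as the helpers `entropy_sh` and `plugin_mi` are hypothetical.

```r
set.seed(1)
n <- 5000

# Hypothetical toy data: 'score' is informative about 'label', 'noise' is not
label <- rbinom(n, 1, 0.3)
score <- ifelse(runif(n) < 0.85, label, 1 - label)  # agrees with label ~85% of the time
noise <- rbinom(n, 1, 0.5)
toy <- data.frame(score = score, noise = noise)

# Plug-in entropy (in Sh) of the joint distribution of the supplied variables
entropy_sh <- function(...) {
  counts <- table(...)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Plug-in MI between y and a set of predictors given as data frame columns
plugin_mi <- function(y, predictors) {
  x <- interaction(predictors, drop = TRUE)  # treat the predictor set as one joint variable
  entropy_sh(y) + entropy_sh(x) - entropy_sh(y, x)
}

mi_toy_full     <- plugin_mi(label, toy)            # all predictors
mi_toy_no_noise <- plugin_mi(label, toy["score"])   # drop the uninformative predictor
mi_toy_no_score <- plugin_mi(label, toy["noise"])   # drop the informative predictor

print(round(c(full = mi_toy_full,
              without_noise = mi_toy_no_noise,
              without_score = mi_toy_no_score), 4))
```

Dropping the uninformative predictor barely changes the MI, while dropping the informative one removes almost all of it. This is the same pattern the notebook looks for when it removes Age or the logits.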

📏 Results Interpretation

Absolute Differences:

For age contribution:

\[ \Delta_{\text{age}} = MI_{\text{full}} - MI_{\text{no age}} \]

For logits contribution:

\[ \Delta_{\text{logits}} = MI_{\text{full}} - MI_{\text{no logits}} \]

Relative Importance:

Relative contribution of logits:

\[ \text{Relative Importance of Logits} = \frac{\Delta_{\text{logits}}}{MI_{\text{full}}} \times 100 \]
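
One way to read the absolute differences is through the chain rule for mutual information. Writing \(R\) for the predictors that remain after Age is removed (a symbol introduced here only for illustration), the full mutual information splits into the part carried by \(R\) and the part Age adds once \(R\) is known:

\[ MI(Y_1, \{R, \text{AGE}\}) = MI(Y_1, R) + MI(Y_1, \text{AGE} \mid R) \quad \Longrightarrow \quad \Delta_{\text{age}} = MI(Y_1, \text{AGE} \mid R) \geq 0 \]

The same identity applies to \(\Delta_{\text{logits}}\), with the two logits in place of Age. For exact quantities these differences cannot be negative. Estimated values carry numerical error, so a small difference should be compared against the reported error before it is interpreted.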

Code

# -------------------------------
# Mutual Information Calculations
# -------------------------------

# Mutual information between logits and age
mi_logits_age <- calculate_mi(
  Y1names = c("LOGIT_ATELECTASIS", "LOGIT_EFFUSION"),
  Y2names = c("AGE"),
  learntdir = learnt_dir
)

# Mutual information between predictands and all predictors
predictands <- c("LABEL_ATELECTASIS", "LABEL_EFFUSION")
predictors <- setdiff(metadata$name, predictands)
mi_full <- calculate_mi(
  Y1names = predictands,
  Y2names = predictors,
  learntdir = learnt_dir
)

# Mutual information without age
mi_no_age <- calculate_mi(
  Y1names = predictands,
  Y2names = setdiff(predictors, "AGE"),
  learntdir = learnt_dir
)

# Mutual information without logits
mi_no_logits <- calculate_mi(
  Y1names = predictands,
  Y2names = setdiff(predictors, c("LOGIT_EFFUSION", "LOGIT_ATELECTASIS")),
  learntdir = learnt_dir
)

Code

# -------------------------------
# Organize Results in Tables
# -------------------------------

# Helper to format and print MI results
format_mi_table <- function(mi_result, label) {
  cat("\n", label, "\n")
  print(data.frame(
    Metric = c("MI", "CondEn12", "CondEn21", "En1", "En2", "MImax"),
    Value = round(c(
      mi_result$MI['value'], mi_result$CondEn12['value'],
      mi_result$CondEn21['value'], mi_result$En1['value'],
      mi_result$En2['value'], mi_result$MImax['value']
    ), 6),
    Error = round(c(
      mi_result$MI['error'], mi_result$CondEn12['error'],
      mi_result$CondEn21['error'], mi_result$En1['error'],
      mi_result$En2['error'], mi_result$MImax['error']
    ), 6)
  ))
}

format_mi_table(mi_logits_age, "Logits vs Age")
format_mi_table(mi_full, "Predictands vs All Predictors")
format_mi_table(mi_no_age, "Predictands vs Predictors (No Age)")
format_mi_table(mi_no_logits, "Predictands vs Predictors (No Logits)")

# -------------------------------
# Compare Mutual Information
# -------------------------------

mi_diff_age <- mi_full$MI['value'] - mi_no_age$MI['value']
mi_diff_logits <- mi_full$MI['value'] - mi_no_logits$MI['value']
relative_importance_logits <- 100 * mi_diff_logits / mi_full$MI['value']

cat("\nDifferences and Relative Importance\n")
comparison_table <- data.frame(
  Comparison = c("Difference due to Age", "Difference due to Logits", "Relative Importance of Logits"),
  Value = round(c(mi_diff_age, mi_diff_logits, relative_importance_logits), 6)
)
print(comparison_table)


 Logits vs Age 
    Metric    Value    Error
1       MI 0.133806 0.009322
2 CondEn12 4.982198 0.023841
3 CondEn21 5.902876 0.016751
4      En1 5.115597 0.023321
5      En2 6.035979 0.016041
6    MImax 5.115597 0.023321

 Predictands vs All Predictors 
    Metric     Value    Error
1       MI  0.373046 0.015630
2 CondEn12  1.409143 0.018229
3 CondEn21 12.604340 0.035121
4      En1  1.781067 0.012561
5      En2 12.977324 0.031103
6    MImax  1.781067 0.012561

 Predictands vs Predictors (No Age) 
    Metric    Value    Error
1       MI 0.365143 0.015533
2 CondEn12 1.403404 0.018948
3 CondEn21 6.689791 0.030743
4      En1 1.767748 0.012568
5      En2 7.054970 0.026460
6    MImax 1.767748 0.012568

 Predictands vs Predictors (No Logits) 
    Metric    Value    Error
1       MI 0.022705 0.004892
2 CondEn12 1.765200 0.013556
3 CondEn21 7.993131 0.017449
4      En1 1.787959 0.012775
5      En2 8.015833 0.016959
6    MImax 1.787959 0.012775

Differences and Relative Importance
                     Comparison     Value
1         Difference due to Age  0.007903
2      Difference due to Logits  0.350342
3 Relative Importance of Logits 93.913743

🔢 Mutual Information Results

Summary Table

Predictor Set	MI	CondEn12	CondEn21	En1	En2	MImax
Logits vs Age	0.133806	4.982198	5.902876	5.115597	6.035979	5.115597
Predictands vs All Predictors	0.373046	1.409143	12.604340	1.781067	12.977324	1.781067
Predictands vs Predictors (No Age)	0.365143	1.403404	6.689791	1.767748	7.054970	1.767748
Predictands vs Predictors (No Logits)	0.022705	1.765200	7.993131	1.787959	8.015833	1.787959

Differences and Relative Importance

Comparison	Value
Difference due to Age	0.007903
Difference due to Logits	0.350342
Relative Importance of Logits	93.913743

🎯 Summary of Key Findings

Logits capture almost all predictive information:
- Predictands (effusion and atelectasis) are highly dependent on CNN logits.
- Removing logits causes mutual information (MI) to drop from \(0.3730\) to \(0.0227\).
- This corresponds to a loss of \(0.3503\), or \(93.9\%\) of the total predictive information.
Age contributes negligibly:
- Removing Age slightly decreases MI (from \(0.3730\) to \(0.3651\)).
- The difference due to Age is \(0.0079\), which is smaller than the reported error of either MI estimate (about \(0.016\)).
- Age explains very little about the lung condition labels beyond what the logits already encode.
CNN logits dominate predictive capability:
- The logits already account for most of what Age contributes to predicting the labels.
- Adding auxiliary data such as Age, Gender, or View Position (VP) adds almost no extra predictive value.
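
A rough check of these differences against the reported errors can be sketched as follows. The values are copied from the output tables. Two things are assumptions made here and not stated in the notebook: that the Error column can be read as a standard-error-like uncertainty, and that the errors of two MI estimates are independent, so they combine in quadrature. The helper `diff_with_error` is hypothetical.

```r
# MI estimates and reported errors (Sh), copied from the output tables
mi  <- c(full = 0.373046, no_age = 0.365143, no_logits = 0.022705)
err <- c(full = 0.015630, no_age = 0.015533, no_logits = 0.004892)

# Difference of two MI estimates with a combined error
# (quadrature sum; assumes the two errors are independent)
diff_with_error <- function(a, b) {
  c(diff  = unname(mi[a] - mi[b]),
    error = unname(sqrt(err[a]^2 + err[b]^2)))
}

delta_age    <- diff_with_error("full", "no_age")
delta_logits <- diff_with_error("full", "no_logits")

comparison <- rbind(age = delta_age, logits = delta_logits)
comparison <- cbind(comparison,
                    ratio = comparison[, "diff"] / comparison[, "error"])
print(round(comparison, 4))
```

Under these assumptions the Age difference is well below its combined error, while the logits difference is many times larger than its combined error.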

🧠 Interpretations and Information Theory Insights

General Predictive Power:
- The mutual information between all predictors and predictands is about \(0.3730\) shannons (\(\text{Sh}\)), roughly \(21\%\) of the predictands' entropy, indicating modest predictive ability.
- The entropy of the predictands is about \(1.78\, \text{Sh}\), which is the most that any set of predictors could explain.
- The conditional entropy that remains after using all available predictors is about \(1.41\, \text{Sh}\), so most uncertainty about the true lung condition remains.

Logits vs Age Dependence:
- The mutual information between the two logits and Age is \(0.1338\, \text{Sh}\), out of a maximum of about \(5.12\, \text{Sh}\).
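- Knowing Age reduces the uncertainty about the logits only slightly, from about \(5.12\, \text{Sh}\) to \(4.98\, \text{Sh}\) (equivalent to narrowing roughly \(35\) equally likely alternatives down to about \(32\)).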
- This agrees with the flatness of the conditional probability plots against Age.

Critical Role of Logits:
- CNN logits carry almost all usable information for predicting lung conditions.
- Without logits, prediction ability nearly disappears, showing a \(93.9\%\) loss in mutual information.

Strength of Bayesian Inference:
- CNN logits are strong but not perfectly reliable predictors.
- Bayesian recalibration (Inferno) converts the logits into patient-specific probabilities, which keeps the graded uncertainty that a hard threshold discards.

🛠️ Practical Implications

For clinical models:
- Decision-making should be based mainly on CNN logits, calibrated with Bayesian methods.
- Relying only on hard thresholds loses important case-by-case details.

Future improvements:
- Just adding auxiliary information like age or gender doesn’t improve results much.
- Big improvements will likely need new types of patient information that reveal risk factors the model doesn’t already pick up, such as:
- Geography:
Health risks vary by region. Differences in healthcare access, climate, or income levels can explain why some areas have more lung problems.
- Environmental exposure:
Living near factories, breathing polluted air, or working in risky jobs can lead to lung issues like effusion or atelectasis.
- Hospital practices:
Different hospitals use different methods for ventilation, surgery, and diagnosis, which can change how often lung problems occur or are detected.
- Pre-treatment factors:
Treatments like oxygen therapy, ventilation support, or surgery before imaging can affect what lung issues show up in scans.
- Other health conditions:
Conditions like heart failure or infections can raise the risk of lung problems. Age, gender, and X-ray view position, alone or combined, do not capture these risks.