Radiomics for glioblastoma survival analysis in pre-operative MRI: exploring feature robustness, class boundaries, and machine learning techniques

Table 3 Performance drop of models trained on single-center data and applied to unseen multi-center data, using non-robust features and robust features with priors, averaged across class boundaries (lower is better). Values are listed as the mean with 95% confidence intervals, calculated with the adjusted bootstrap percentile (BCa) method. The lowest drop is indicated in bold for each metric. Bal. acc.: balanced accuracy, Acc.: accuracy.

Feature set	AUC drop	Bal. acc. drop	Acc. drop	Specificity drop	Sensitivity drop	F1 drop	Precision drop
Non-robust features	0.52 CI: [0.50,0.56]	0.40 CI: [0.26,0.45]	0.48 CI: [0.33,0.53]	0.80 CI: [0.70,0.88]	0.06 CI: [0.00,0.15]	0.38 CI: [0.24,0.50]	0.54 CI: [0.39,0.63]
Robust features, sequence prior	0.30 CI: [0.22,0.36]	0.26 CI: [0.03,0.35]	0.37 CI: [0.33,0.43]	0.40 CI: [−0.10,0.75]	0.18 CI: [0.00,0.34]	0.38 CI: [0.24,0.53]	0.51 CI: [0.37,0.65]
Robust features, hand-picked	0.32 CI: [0.27,0.36]	0.26 CI: [0.18,0.31]	0.33 CI: [0.27,0.37]	0.42 CI: [0.27,0.50]	0.16 CI: [0.02,0.37]	0.35 CI: [0.22,0.54]	0.48 CI: [0.35,0.66]
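
As an illustration of how a mean drop and a BCa (bias-corrected and accelerated) interval of the kind reported in Table 3 could be computed, the following is a minimal sketch using only the Python standard library. It assumes that the drop is the single-center score minus the multi-center score and that resampling is done over per-class-boundary drops; neither detail is stated in the caption. The function name `bca_interval` and all score values are hypothetical placeholders, not data from the study.

```python
import random
from statistics import NormalDist, mean


def bca_interval(values, stat=mean, n_resamples=2000, confidence=0.95, seed=0):
    """Return (point estimate, lower, upper) of a BCa bootstrap interval."""
    rng = random.Random(seed)
    values = list(values)
    n = len(values)
    theta_hat = stat(values)
    norm = NormalDist()

    # Bootstrap distribution of the statistic (resampling with replacement).
    boot = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))

    # Bias correction z0: share of replicates below the point estimate,
    # clamped away from 0 and 1 so the inverse normal CDF is defined.
    prop = sum(b < theta_hat for b in boot) / n_resamples
    prop = min(max(prop, 1 / (n_resamples + 1)), n_resamples / (n_resamples + 1))
    z0 = norm.inv_cdf(prop)

    # Acceleration from jackknife (leave-one-out) estimates.
    jack = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_quantile(a):
        # Shift the nominal percentile by the bias and acceleration terms.
        z = norm.inv_cdf(a)
        adj = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        idx = min(max(int(adj * n_resamples), 0), n_resamples - 1)
        return boot[idx]

    alpha = (1 - confidence) / 2
    return theta_hat, adjusted_quantile(alpha), adjusted_quantile(1 - alpha)


# Hypothetical AUC values, one per class boundary (NOT from the paper).
single_center_auc = [0.78, 0.81, 0.75, 0.80, 0.77, 0.79]
multi_center_auc = [0.52, 0.49, 0.50, 0.55, 0.47, 0.51]

# Assumed definition: drop = single-center score minus multi-center score.
drops = [s - m for s, m in zip(single_center_auc, multi_center_auc)]

point, lower, upper = bca_interval(drops)
print(f"{point:.2f} CI: [{lower:.2f},{upper:.2f}]")
```

The printed string follows the same "mean CI: [lower,upper]" layout as the table cells. Unlike a plain percentile interval, the BCa adjustment may shift the interval asymmetrically around the mean when the bootstrap distribution is skewed.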