MET-RADS-P is a guideline for evaluating the treatment response of systemic metastases in patients with APC; it covers the primary tumor, bone metastases, lymph node metastases and organ metastases. In this study, we established a semiautomatic process for evaluating pelvic lymph node treatment response in patients with APC, built on deep learning-based lymph node segmentation. Our results showed that the accuracy of automated segmentation-based response assessment was high for target lesions, nontarget lesions and nonpathological lesions according to MET-RADS-P criteria, and that the assessment achieved good consistency with both the attending radiologist and the fellow radiologist.
Based on the morphology and signal characteristics of all acquired images, the MET-RADS-P system maps unequivocal disease to 14 predefined body regions [8, 15]. Analysis of lymph node metastases in the pelvis, which is the most common metastatic site, is crucial for clinical practice and drug studies in patients with APC [17]. Because many malignancies enlarge lymph nodes, lymph node size is highly correlated with survival time, and radiologists and clinicians measure it to monitor disease progression or assess therapeutic options [18]. According to the Response Evaluation Criteria in Solid Tumors 1.1 (RECIST 1.1) guidelines, lymph nodes with a short-axis diameter of at least 10 mm are considered enlarged and clinically significant [19]. The MRI-based size standard for pathological lymph nodes defined by MET-RADS-P is similar to that of RECIST 1.1, but MET-RADS-P provides a more complete assessment of nodal metastasis response that includes nontarget and nonpathological nodes, which are usually assessed only qualitatively under RECIST 1.1.
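The size rule described above can be illustrated with a minimal sketch. The 10 mm short-axis cut-off is the one quoted in the text; the 15 mm cut-off for measurable (target-eligible) nodes is the RECIST 1.1 convention and is an assumption here, since the exact MET-RADS-P thresholds are not restated in this section. The function names and labels are hypothetical.

```python
from typing import List

# Thresholds in mm. The 10 mm value is quoted in the text; the 15 mm value is
# the RECIST 1.1 cut-off for measurable nodes and is an ASSUMPTION in this sketch.
PATHOLOGICAL_MIN_MM = 10.0
TARGET_ELIGIBLE_MIN_MM = 15.0


def classify_node(short_axis_mm: float) -> str:
    """Label one lymph node by its short-axis diameter (illustrative only)."""
    if short_axis_mm < 0:
        raise ValueError("short-axis diameter must be non-negative")
    if short_axis_mm >= TARGET_ELIGIBLE_MIN_MM:
        return "target_eligible"
    if short_axis_mm >= PATHOLOGICAL_MIN_MM:
        return "pathological_nontarget"
    return "nonpathological"


def classify_nodes(short_axes_mm: List[float]) -> List[str]:
    """Apply classify_node to every measured node."""
    return [classify_node(d) for d in short_axes_mm]


if __name__ == "__main__":
    print(classify_nodes([18.0, 12.0, 6.0]))
```

In practice a reader selects only a limited number of eligible nodes as target lesions, so the first label marks eligibility rather than final selection.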
According to the MET-RADS-P criteria, the core whole-body MRI protocol designed for detecting bone and lymph node metastases includes T1WI (GRE Dixon technique) and axial DWI [8]. DWI is a well-recognized and widely used sequence for pelvic lymph node imaging that offers both qualitative and quantitative assessment for disease characterization [14, 20]. Therefore, in this study, we performed the treatment response assessment on DWI images only.
In this study, the semiautomatic process for evaluating pelvic lymph node treatment response according to MET-RADS-P criteria comprised two parts. First, a previously established pelvic lymph node segmentation model was used to segment the lymph nodes automatically. The model achieved good segmentation performance here, particularly for the target lesions, similar to the results reported in the previous literature (DSC and VS values for all visible lymph nodes of 0.76 ± 0.15 and 0.82 ± 0.14, respectively) [14], which further highlights its potential usefulness.
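For readers less familiar with the two segmentation metrics quoted above, the following sketch shows how they are commonly computed on binary masks. It assumes that VS denotes volumetric similarity in its usual form, 1 - ||P| - |R|| / (|P| + |R|), and that masks are given as flattened sequences of 0/1 values; it is not the evaluation code used in the study.

```python
from typing import Sequence


def _counts(pred: Sequence[int], ref: Sequence[int]):
    """Return (|P|, |R|, |P and R|) for two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    p = sum(1 for v in pred if v)
    r = sum(1 for v in ref if v)
    overlap = sum(1 for a, b in zip(pred, ref) if a and b)
    return p, r, overlap


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice similarity coefficient: 2 * |P and R| / (|P| + |R|)."""
    p, r, overlap = _counts(pred, ref)
    if p + r == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * overlap / (p + r)


def volumetric_similarity(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Volumetric similarity (assumed definition): 1 - | |P| - |R| | / (|P| + |R|)."""
    p, r, _ = _counts(pred, ref)
    if p + r == 0:
        return 1.0  # same convention as above
    return 1.0 - abs(p - r) / (p + r)


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0]
    reference = [0, 1, 1, 1, 0, 0]
    print(dice(predicted, reference), volumetric_similarity(predicted, reference))
```

The example masks have equal volume but are shifted by one voxel, so volumetric similarity is perfect while the Dice coefficient is not. This is one reason VS values can exceed DSC values for the same segmentation.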
Second, based on the quantitative measurements obtained from the automated segmentation, the treatment response can be evaluated directly according to MET-RADS-P criteria, which is more practical in clinical settings. A clinical radiology report provides a qualitative narrative but not standardized, quantitative information about the patient's progress or response to treatment [21]. Previous studies have used natural language processing and deep learning models to estimate response from clinical text [22, 23]. Such approaches may be feasible for quantitative assessment under MET-RADS-P criteria, but they are indirect because they infer response from the report text rather than from measurements on the images.
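To make concrete how a response label can follow directly from segmentation-derived measurements, the sketch below computes the percent change in the sum of short-axis diameters between two time points. The cut-offs are the RECIST 1.1 conventions (at least 30% decrease for partial response; at least 20% and 5 mm increase for progression; all nodes under 10 mm for complete response) and serve only as a stand-in: the MET-RADS-P response assessment categories themselves are not reproduced here. The function names are hypothetical, and each baseline node is assumed to have been matched manually to its follow-up measurement.

```python
from typing import Sequence


def percent_change(baseline_sum: float, followup_sum: float) -> float:
    """Percent change of the follow-up sum relative to the baseline sum."""
    if baseline_sum <= 0:
        raise ValueError("baseline sum must be positive")
    return 100.0 * (followup_sum - baseline_sum) / baseline_sum


def recist_like_response(baseline_mm: Sequence[float],
                         followup_mm: Sequence[float]) -> str:
    """Two-time-point label for target nodes using RECIST 1.1-style cut-offs.

    ASSUMPTION: these thresholds stand in for the MET-RADS-P rules, which
    are not restated in the text. With only two time points, the nadir
    used for progression equals the baseline sum.
    """
    if len(baseline_mm) != len(followup_mm):
        raise ValueError("each baseline node needs a matched follow-up value")
    base, follow = sum(baseline_mm), sum(followup_mm)
    change = percent_change(base, follow)
    if all(d < 10.0 for d in followup_mm):
        return "CR"  # every target node is back below the pathological size
    if change <= -30.0:
        return "PR"
    if change >= 20.0 and (follow - base) >= 5.0:
        return "PD"
    return "SD"


if __name__ == "__main__":
    print(recist_like_response([20.0, 16.0], [12.0, 11.0]))
```

Keeping the thresholds in one function makes it straightforward to substitute the criteria actually required by a given protocol.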
Our proposed semiautomated algorithm achieved high Kappa values for agreement in treatment response assessment with the attending and fellow radiologists when measuring the same set of target and nontarget lesions. Consistency for nonpathological lesions was lower, which may be due to the relatively poor segmentation performance for these lesions. Tang et al. [24] proposed a deep learning-based method for semiautomated RECIST measurement and evaluated it using the mean difference, in pixels, between the deep learning algorithm and manual measurement. Scores based on pixel difference, however, may not be reliable, because they are largely determined by data composition. In this study, we used Bland–Altman plots based on percent measurement difference to address this issue, as suggested by Woo et al. [25]. The Bland–Altman analysis indicated good consistency between automated and manual segmentation, with most values falling within the upper and lower limits of agreement (LOA).
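The two agreement statistics discussed above can be sketched as follows. The first function computes an unweighted Cohen's kappa for two raters' categorical response labels; the second computes the Bland–Altman bias and limits of agreement on percent differences. Two assumptions are made that the text does not specify: the percent difference is taken relative to the mean of the two measurements, and the limits are bias ± 1.96 sample standard deviations. Whether the study used weighted or unweighted kappa is also not stated. All names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Sequence, Tuple


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters need the same non-zero number of labels")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def bland_altman_percent(auto: Sequence[float],
                         manual: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower LOA, upper LOA) of the percent differences.

    ASSUMPTION: percent difference = 100 * (auto - manual) / mean(auto, manual).
    """
    if len(auto) != len(manual) or len(auto) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [100.0 * (a - m) / ((a + m) / 2.0) for a, m in zip(auto, manual)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    print(cohen_kappa(["PR", "PR", "SD", "PD"], ["PR", "SD", "SD", "PD"]))
    print(bland_altman_percent([11.0, 9.0, 15.2], [9.0, 11.0, 15.0]))
```

Expressing differences as percentages keeps a 1 mm disagreement on a 6 mm node from being weighted the same as a 1 mm disagreement on a 20 mm node, which is the issue with absolute pixel differences noted above.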
This study has some limitations. First, the deep learning-based treatment response assessment focused only on pelvic lymph nodes; the other body regions covered by the MET-RADS-P guideline need to be investigated in the future. Second, there remain opportunities for further model refinement, including lymph node registration between baseline and posttreatment images, which would enable fully automated lymph node treatment response evaluation. Finally, our results demonstrated that semiautomated treatment response assessment can be achieved on the DWI sequence, but the value of other sequences (e.g. T1WI, DCE or T2WI) for response assessment also needs to be investigated in further studies.