Radiomic nomogram based on MRI to predict grade of branching type intraductal papillary mucinous neoplasms of the pancreas: a multicenter study

Background Accurate diagnosis of high-grade branching type intraductal papillary mucinous neoplasms (BD-IPMNs) is challenging in clinical setting. We aimed to construct and validate a nomogram combining clinical characteristics and radiomic features for the preoperative prediction of low and high-grade in BD-IPMNs. Methods Two hundred and two patients from three medical centers were enrolled. The high-grade BD-IPMN group comprised patients with high-grade dysplasia and invasive carcinoma in BD-IPMN (n = 50). The training cohort comprised patients from the first medical center (n = 103), and the external independent validation cohorts comprised patients from the second and third medical centers (n = 48 and 51). Within 3 months prior to surgery, all patients were subjected to magnetic resonance examination. The volume of interest was delineated on T1-weighted (T1-w) imaging, T2-weighted (T2-w) imaging, and contrast-enhanced T1-weighted (CET1-w) imaging, respectively, on each tumor slice. Quantitative image features were extracted using MITK software (G.E.). The Mann-Whitney U test or independent-sample t-test, and LASSO regression, were applied for data dimension reduction, after which a radiomic signature was constructed for grade assessment. Based on the training cohort, we developed a combined nomogram model incorporating clinical variables and the radiomic signature. Decision curve analysis (DCA), a receiver operating characteristic curve (ROC), a calibration curve, and the area under the ROC curve (AUC) were used to evaluate the utility of the constructed model based on the external independent validation cohorts. Results To predict tumor grade, we developed a nine-feature-combined radiomic signature. For the radiomic signature, the AUC values of high-grade disease were 0.836 in the training cohort, 0.811 in external validation cohort 1, and 0.822 in external validation cohort 2. The CA19–9 level and main pancreatic duct size were identified as independent parameters of high-grade of BD-IPMNs using multivariate logistic regression analysis. The CA19–9 level and main pancreatic duct size were then used to construct the radiomic nomogram. Using the radiomic nomogram, the high-grade disease-associated AUC values were 0.903 (training cohort), 0.884 (external validation cohort 1), and 0.876 (external validation cohort 2). The clinical utility of the developed nomogram was verified using the calibration curve and DCA. Conclusions The developed radiomic nomogram model could effectively distinguish high-grade patients with BD-IPMNs preoperatively. This preoperative identification might improve treatment methods and promote personalized therapy in patients with BD-IPMNs. Supplementary Information The online version contains supplementary material available at 10.1186/s40644-021-00395-6.


Introduction
The pancreatic ductal system mucinous epithelium can develop mucin-producing tumors, such as intraductal papillary mucinous neoplasms (IPMNs) of the pancreas. IPMNs represent approximately 21-33% of cystic neoplasms and is one of the precursors of pancreatic cancer that is identifiable radiographically [1][2][3]. In the past two decades, we found that the detection rate and incidence of IPMNs have increased significantly as a result of the use of advanced diagnostic imaging technology [1,4,5]. When a patients is suspected of having an IPMN, determining the grade of malignancy for an individual patient permits decisions regarding whether surgery or surveillance is appropriate to be made. However, although clinicians are experienced at diagnosing and treating IPMNs, it is still challenging to distinguish high-grade IPMNs (i.e., high-grade dysplasia (HGD) to invasive carcinoma) from low-grade IPMNs (i.e., low-grade dysplasia (LGD) to intermediate-grade dysplasia (IGD)) before surgery.
Currently, laboratory tests, endoscopy, cytology, and imaging technologies play the main roles in differentiating between high-grade and low-grade IPMNs. The International Association of pancreatic diseases (IAP) recommends active surgical treatment for IPMNs of the main duct type and mixed type, while for IPMNs of the branch duct type, according to the revised 2017 international consensus guidelines, surgery is recommended for tumors with indicative features, such as mural nodules, cyst size > 3 cm, main pancreatic duct (MPD) size > 5 mm, rapid cyst growth (≥ 5 mm over 2 years), and increased serum carbohydrate antigen (CA19-9 > 37 ng/ ml) levels [6,7]. The IAP guidelines noted that the mean prevalence of invasive malignancy in BD-IPMNs was 17.7% (1-37%) [8]. In this setting, a considerable number of patients with benign lesions received unnecessary invasive surgery, and the existing grade assessment system, with unsatisfactory specificity and positive predictive value, remains unreliable [9,10]. Therefore, there is an urgent need for a highly sensitive and specific preoperative prediction system to help establish individualized treatment decisions. Medical imaging produces essential information for the preoperative assessment of BD-IPMNs. It has been suggested that computed tomography (CT)-derived radiological features could assess the grade of BD-IPMNs objectively [11][12][13]. Radiomics is the quantitative analysis of images that comprise a large number of features combined with machine learning [14]. Radiomics has demonstrated potential utility in oncological imaging in prognosis, detection, and differential diagnosis assessments, such as in the lung [15], breast [16], prostate [17,18] and liver [19]. However, to the best of our knowledge, there has been no previous research using magnetic resonance (MRI)-derived radiomics in the grade assessment in patients with BD-IPMNs. Some scoring systems or nomograms to predict malignancy from clinical variables have been suggested [20][21][22]; however, these methods had limitations, e.g., the lack of external or internal validation of their clinical efficacy.
Thus, the present study aimed to construct a predictive model that integrated clinical variables and a radiomic signature for preoperative grade assessment in patients with BD-IPMNs. This improved preoperative evaluation model could spare low-grade patients from potentially morbid surgery and permit high-grade patients to undergo resection before transformation into an invasive phenotype [23].

Workflow
The research workflow is shown in Fig. 1, and comprises four parts: Acquiring the images, segmenting the region of interest (ROI), extracting features, and prediction model construction. We acquired T1-w, T2-w MRI images and CET1-w images, and radiologists outlined the tumor area manually on all image slices. Then, the quantitative radiomic features were extracted from the ROI, after which a machine learning model was established to assess the grade of BD-IPMNs. The machine learning model-based tumor pathological grade assessment was developed and validated using three separate datasets. Medical center A (n = 103) data were used as the training dataset to construct the tumor pathological grade assessment model. Medical center B and C (n = 48 and 51) data were used as independent validation datasets to test the developed model.
After extraction of the quantitative radiomics features from the ROI, least absolute shrinkage and selection operator (LASSO) regression, Spearman correlation analysis, analysis of variance (ANOVA) tests, and Mann-Whitney U test, were applied to select the best radiomics features from the radiomic signature. Then, to construct the tumor pathological grade assessment nomogram, multivariate logistic regression was used to integrate the clinical variables and the radiomic signature.

Patients
The Institutional Review Boards of the three centers approved this retrospective study. The provision of signed informed consent was waived. This study was conducted following the tenets of the Declaration of Helsinki. Between Mar. 2012 and Feb. 2020, patients who were reported to have BD-IPMNs on pathological assessment and who underwent an MRI scan of their pancreas before surgery were included in this study. If the branch duct-type information was not included on the pathology report, patients with BD-IPMNs were selected according to their pre-operative MRI report. Among them, 91 patients were excluded as follows: Patients lacking complete clinical data (n = 23), suboptimal MRI image quality caused by severe motion artifacts (n = 9,) and patients who did not receive an MRI scan within 3 months before surgery (n = 59) (Fig. S1).
A multidisciplinary team comprising, surgeons, oncologists, and radiologists assessed the patients. Clinical characteristics (e.g., sex, symptoms (abdominal pain or fatigue), age, mural nodule, cyst size, main pancreatic duct (MPD) size, CA19-9 and CEA level) were obtained through review of the clinical data by a surgeon and a radiologist with over 10 years of clinical experience. The size of the MPD was determined at the maximum dilation point of the pancreatic duct. Mural nodules were defined as any solid papillary protuberances in the cyst. The serum level of CEA and CA19-9 were measured using ELISA. The serum CA19-9 level was judged to be elevated if it was higher than the upper limit of normal (37 U/mL). The serum CEA level was judged to be elevated if it was higher than the upper limit of normal (5 U/mL). The pathological grade (LGD to invasive carcinoma) was assessed by two pathologists with over 10 years of experience in abdominal tumor diagnosis according to the 2010 World Health Organization classification [24].

Acquisition of MRI images
Four different scanning images: preoperative T1-w, T2w, CET1-w arterial phase, and CET1-w portal venous phase were acquired for all images. T1-w and CET1-w images were collected during breath holding and T2 images were collected by breath triggering. The details of MRI image acquisition are shown in Supplementary material I.

Tumor segmentation and Radiomics feature extraction
To extract the features, portal venous phase imaging, arterial phase imaging, and pre-contrast T1-w, T2-w were used (Fig. S2). Radiologists manually contoured the ROIs on the MRI images. If possible, reference images were acquired using contrast-enhanced computed tomography (CT). Image standardization before ROI segmentation was conducted, which makes the thickness of the layer consistent, thus ensuring that the size and bedding of each ROI segmentation are consistent. Before ROI segmentation, the two radiologists who performed the analysis (Y.J.W. and W.Y) were blinded to the clinical outcome. The whole tumor was segmented manually using ITK-SNAP (www.itk-snap.org) on all slices of the tumor [25]. According to previously published studies, when there were multiple tumors, the tumors with the largest diameter were selected for analysis [26]. One of the radiologists drew the tumor boundary, which was verified by the other radiologist. The MITK software (Medical Imaging Interaction Tookit 3.1.0.A, GE Healthcare) was used to extract the radiomic features from the three-dimensional ROIs, resulting in 328 features. The extracted features included: Histograms, texture parameters, RLM (run length matrix), GLCM (gray level co-occurrence matrix), and form factor parameters. Each factor's average value was subtracted from all extracted radiometric features, and divided by the standard deviation value (Z score normalization), which eliminates the limitation imposed by each feature's units. To standardize the different scales used to process the variables, each feature's average value was subtracted from all the radiomic features in the training data set, and then divided by their standard deviation values, respectively. Then, using the mean and standard deviation values derived from the training dataset, the same normalization method was applied.
To construct a realistic radiomic signature that combines the most appropriate radiomic features we used the LASSO regression method, as described by Monica and Kumamaru, to select the most nonredundant and robust radiomic features [27,28]. Supplementary material II describes the details of the LASSO method. To predict the grade assessment for each patient, the best selected radiomic features from the training cohort were recalculated using a linear combination of selected features weighted by their respective coefficients. Receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) were used to evaluate the predictive accuracy of the developed radiomic signature. The reproducibility of radiomics feature extraction was assessed using intra-and inter-class correlation coefficients (ICCs). Initially, 30 ROIs were chosen randomly from each sequence. To calculate the intra-observer ICC, reader A segmentation was repeated at a 7-day interval and compared with the original segmentation. Comparing the segment extraction of reader B with that of reader A (first time) was used to calculate the interobserver ICC. An ICC value greater than 0.8 was considered to show good consistency of feature extraction [29].

Radiomic Nomogram construction
After univariate logistic regression, independent predictors of high-grade BD-IPMNs were selected using multivariate logistic regression analysis, and then the new radiomic nomogram was established using these independent predictors. Next, to test the calibration and recognition performance of the radiomic nomogram, the training and validation sets were used. The calibration curve displayed the performance characteristics of the multimodal radiomic nomogram models graphically. The predictive accuracy of the combined nomogram model was indicated by the degree of overlap between the diagonal in the graph and the calibration curve. The independent validation cohorts of 48 and 51 patients from the second and third medical centers were used to validate the radiomic nomogram. In the three cohorts (training and two independent validation cohorts), the clinical utility of the radiomic nomogram model was assessed using decision curve analysis (DCA). Supplementary material III shows the details of the DCA method.
Statistical analysis SPSS 18.0 (IBM, Armonk, NY, USA), R software (v. 3.5.1, Vienna, Austria), and MedCalc software v. 15.2.2 (https://www.medcalc.org/) were used to perform all the statistical analyses. To compare continuous variables, Mann-Whitney U-tests and the independent-sample ttest were performed when appropriate. To compare categorical variables between groups, Fisher's exact test or a chi-squared test was used. To assess the impact of variations between intra-and inter-readers in the extracted radiomics features, intra-and inter-class correlation coefficients (ICC) were determined. In the ROC analysis, the best threshold was determined using the Youden Index. For the developed logistic regression models, the goodness of fit was examined using the Hosmer-Lemeshow test. The "glmnet" package was used to carry out the LASSO regression analysis. The "pROC" package was used to produce the ROC plots. The "rms" package was used to plot the nomogram and calibration curve.
For DCA, the "rmda" package was used. Statistical significance was accepted at P < 0.05.

Patients characteristics
Among the three medical centers, we recruited 202 patients diagnosed with BD-IPMNs. The training cohort comprised patients from the first medical center (n = 103). The external independent validation cohorts comprised the patients from the second and third medical centers (n = 48 and 51). The characteristics of the patients in the training and validation datasets were evenly distributed (Table S1). There were no statistically significant differences in BD-IPMN pathological grade assessment and clinical characteristics (sex, age, symptoms, mural nodule, cyst size, MPD size, CA19-9, and CEA) between the training and validation datasets. Pathologically, high-grade BD-IPMNs were detected in 24.8% patients. The detailed distribution of clinical characteristics in the low-grade and high-grade groups is summarized in Table 1. In the training and validation datasets, we noted significant differences for the cyst size, mural nodule, MPD size, and CA19-9 between the low grade and high grade groups.

Selection of Radiomics features and construction of the Radiomics signature
In the training dataset, 334 statistically significant features (P < 0.05) between the low-grade and high-grade groups were identified from 1312 texture features, from which 51 features were identified by Spearman correlation analysis. To construct the radiomics signature, LASSO was used to select the nine most valuable texture features (Fig. S3), including five GLCM features, two histogram features, one texture parameter feature, and one form factor feature. The LASSO regression method derived the coefficient for each selected feature. Then the radiomic features classification and the calculation of texture features after dimension reduction are carried out (Table S2 and Fig. S4). Supplementary material II shows the details of the selected features and the formula used to calculate the radiomic signature. The developed radiomic signature model produced a good result when predicting the histological grade (LGD/IGD vs. HGD/associated invasive carcinoma), resulting in an AUC of 0.836 in the training set (95% confidence interval (CI), 0.750-0.901), 0.811 in validation set 1 (95% CI, 0.671-0.909), and 0.822 in validation set 2 (95% CI, 0.690-0.915). Figure 2a-c shows the ROC curves of the radiomic signature based on the three datasets. Next, we determined the quantitative scores of the radiomic signature for each patient with respect to the classification of grade assessment of BD-IPMNs, to show the effectiveness of the radiomic signature model at the individual level (Fig. 3a-c), the percentage of patients in the low grade category whose rad scores overlap with high grade category were 58.4% in the training set, 60.5% in validation set 1 and 51.4% in validation set 2. Table 2 shows the details of the performance evaluation of the radiomic signature.

Construction of the combined Nomogram
In the univariate analysis of the training cohort, the lowand high-grade groups showed significant differences for the CA19-9 level, largest cyst size, size of the MPD, mural nodule, and the radiomic signature. The AUC value for all the significant variables in the univariate analysis showed in Table 3. Multivariate logistic analysis showed that the size of the MPD (odds ratio (OR): 4.263, 95% CI: 1.762-10.3113, P = 0.001), the CA19-9 level (OR: 8.402, 95% CI: 1.622-43.524, P = 0.011), and the radiomic signature (OR: 3.434, 95% CI: 1.638-7.197, P = 0.001) were independent parameters of high-grade of BD-IPMN. Therefore, a radiomics nomogram model incorporating the developed radiomics signature with the size of the MPD and the CA19-9 level was constructed. A weighted number of points was assigned to each factor. The nomogram was then used to calculate the total point score of each patient, which was analyzed for its correlation with the estimated probability of high-grade of BD-IPMNs (Fig. 4a).

Radiomic Nomogram performance evaluation
ROC analysis was used to confirm the utility of the combined nomogram, resulting in an AUC of 0.903 (95% CI, 0.828-0.952), a sensitivity of 0.734, and a specificity of 0.948 for the training set; an AUC of 0.884 (95% CI, 0.759-0.958), a sensitivity of 0.900, and a specificity of 0.790 for validation set 1; and an AUC of 0.876 (95% CI, 0.754-0.952), a sensitivity of 0.857, and a specificity of 0.837 for validation set 2 (Fig. 5a-c). The AUC values showed that the combined nomogram performed well in the assessment of tumor grade. The calibration curve showed that there was sufficient consistency between the nomogram-estimated grade and the actual observed in the three cohorts (Fig. 4b-d). Table 4 shows the details of the radiomic nomogram's performance.
DCA was used to reveal the utility of the combined nomogram for clinical decision making. The clinical utility of the corresponding strategies was demonstrated by the area under the decision curve (Fig. 6ac). The area occupied by the combined nomogram (red) was larger than that of the radiomic signature (blue) alone, and was larger than those of the "all "(gray) or "none" (black) strategies in the training and the validation sets.

Discussion
In the present study, we constructed a combined nomogram and investigated its ability to predict tumor pathological grade preoperatively in patients with BD-IPMNs.
The radiomic signature was based on nine features, and was considered an effective method of preoperative tumor grade assessment, representing a non-invasive imaging biomarker. Combining the radiomics signature with clinical variables (CA19-9 and the MPD size), as a combined nomogram model, significantly improved the predictive performance. The repeatability and reliability of the developed prediction model was confirmed using independent datasets from other institutions. Preoperative grade assessment of patients with BD-IPMNs is clinically important. According to the current consensus, most of the excised BD-IPMNs are low-grade diseases. The average incidence of invasive malignant tumors in BD-IPMNs is 17.7% (1-37%) [8]; therefore, although many patients might have dangerous signs, such as mural nodules and dilation of the main pancreatic duct, they can still be followed up for a long time without selective surgery. However, currently, the performance of the existing discrimination system is unsatisfactory, and most studies recommend immediate resection for patients with obstructive jaundice, enhanced mural nodules, and main duct dilation > 10 mm [30]. Similarly, there are no biomarkers to predict highgrade BD-IPMNs. Therefore, an advanced discriminative method with high sensitivity and specificity would provide valuable information to determine clinical  Fig. 3 Rad-score prediction of grade of BD-IPMNs. The rad-score depicted using score dot diagrams in a the training cohort, b external validation cohort 1, and c external validation cohort 2. Red indicates a high grade of BD-IPMNs, blue represents a low grade of BD-IPMNs. A high score indicates a high likelihood of high-grade BD-IPMNs strategies, resulting in patients with low-grade IPMNs avoiding unnecessary surgery for a benign disease. Of note, the goal in stratifying patients is to identify those that are at risk of developing or harboring invasive malignancy, and so even if patients with "benign" disease undergo resection this may be of treatment benefit so that they do not develop future carcinoma.
In the present study, we analyzed the predictive power of quantitative MRI features to assess the grade of BD-IPMNs. The results showed the potential value of radiomic features. We analyzed both high and low-order radiomic features identified in previous studies. The histogram parameter (low-order feature) is associated to single pixel characteristics, describing the distribution of voxel intensity via common and basic measures [31]. Hoffman et al. performed preoperative MRI texture analysis [32] and showed that intensity histogram-based statistical features and entropy from MRI images could predict the malignancy of BD-IPMNs. High order radiomic features, including GLCM features, mainly assess the spatial relationships among local neighboring pixels [33,34]. Hanania et al. [13] identified 14 imaging biomarkers within GLCM features that predicted the histopathological grade within cyst contours. Tobaly et al. [35] developed a radiomic model mostly based on high order CT radiomic features, which showed high diagnostic performance in differentiating benign from malignant IPMNs. Our results showed that most of the selected features were high-order features, which was consistent with the results presented by previous studies. Therefore, we hypothesized that the tumor biology and heterogeneity were better represented by high-order features. Previous studies recognized the value of assessing pathological features among radiomic features, such as in nonfunctional pancreatic neuroendocrine tumors, soft-tissue masses, and rectal cancer [36][37][38]. However, it remains challenging to associate a single radiomic feature with the complex biological processes of tumors. Therefore, it is common to construct a multi-factor panel to estimate the results in a radiomic background. Multi-factor based radiomic methods are usually more suitable to describe the complex heterogeneity of BD-IPMNs. The results presented here indicated that our radiomic signature could satisfactorily discriminate lowgrade and high-grade of BD-IPMNs in the patients in the training and validation cohorts.
According to previous studies of IPMNs, the symptoms, obstructive jaundice, presence of mural nodules, cyst size, age, and sex were also associated with pathology grade assessment [21,39]. However, we failed to confirm these results in the present study. In addition, parameters like pain could be subjectively reported according to the different sensitivities of patients to tumor-related abdominal pain, which might also be   associated with tumor infiltration into the viscera, blood vessels, or peripheral nerves. We observed that age and sex were not related to grade, which might be related to populations from different ethnicities. Obstructive jaundice was rare in the patients in our study, which could be related to the limited sample size. The radiomic nomogram constructed in our study incorporated clinical factors (the size of MPD and CA19-9 levels), which might be useful to identify those patients at risk of malignancy and to select the best treatment.
The CA19-9 level is useful for the differential diagnosis of pancreatic carcinoma and benign pancreatic diseases, and increased CA19-9 levels are an independent predictor of malignant BD-IPMNs [20,[40][41][42]. However, in the present study, CA19-9 alone achieved an AUC of 0.762 to discriminate high-grade in patients training set. Thus, CA19-9 alone is not sufficient to evaluate the high-grade of BD-IPMNs accurately. Previous studies showed that main duct dilatation was a significant predictor of malignancy and suggested that the patients with main duct dilatation over 10 mm should undergo resection without further testing or calculation [39]. In the present study, we divided the MPD dilation into four degrees according to previous studies [10,39], and found that patients with a larger MPD dilation were more likely to have a high-grade of BD-IPMNs.
The DCA curve showed that the radiomic nomogram was superior to the radiomic signature over a large threshold probability range, which indicated that the clinical parameters increased the incremental value for BD-IPMN grade assessment in the training and validation sets.
The established grade assessment model has obvious advantages. For clinicians seeking low-cost techniques to improve patient management, quantitative image analysis is attractive, because it comprises a noninvasive disease assessment at multiple time points and tumor sites compared with conventional diagnostic imaging. In the current study, the results showed that the AUC of the model was further improved to 0.903 by combining the nomogram with clinical characteristics and the radiomic signature, suggesting that advanced imaging technology should be considered together with well-known clinical factors. In addition, the pre-operative data used to establish the grade assessment model is easy to obtain, with less additional costs and provides good results.
The present study had certain limitations. First, because of the retrospective design of our study, there may be selection bias. Second, this study involved a relatively small sample, and the low-grade and high-grade groups were not further separated into patients with low and intermediate-grade dysplasia or high-grade dysplasia and patients with associated invasive carcinoma. However, our cohort was sufficient to build a reliable model. Third, the area of potential peritumoral infiltration were not included in this study, which may result in underestimation of tumor area. In future, studies are still needed for detailed analysis of features extracted from peritumoral area. However, as we focus on the visible tumor area, the results are still reliable. Our study had  Fig. 6 Clinical utility evaluation of the radiomic signature and the radiomic nomogram using DCA curves. Evaluation in the training cohort (a), external validation cohort 1 (b), and external validation cohort 2 (c). The y-axis shows the net benefit. The x-axis shows the threshold probability. The highest net benefit was gained using the radiomic nomogram (red line) compared with the radiomic signature (blue line), the treat-all strategy (gray line), and the treat-none strategy (horizontal black line) another limitation: Tumor area segmentation has to be performed manually by radiologists. Automatic segmentation will be more convenient and should become the standard. In the future, we will construct a fully automatic prediction model incorporating automatic segmentation of the pancreas and cyst areas.

Conclusions
The preoperative pathological grade of BD-IPMNs could be predicted effectively using the developed nomogram model combining the radiomic signature and tumor clinical characteristics. The predictive nomogram model represents an accurate and noninvasive assessment method for patients with BD-IPMNs before surgery, which will help clinicians to alter the treatment protocol for each patient and thus obtain improved clinical outcomes in the future.
Additional file 1. The Supplementary Material for this article was available in additional file 1.