Intravoxel incoherent motion and ADC measurements for differentiating benign from malignant thyroid nodules: utilizing the most repeatable region of interest delineation at 3.0 T

Background There is a growing need for a reproducible and effective imaging method for the quantitative differentiation of benign from malignant thyroid nodules. This study aimed to investigate the performances of intravoxel incoherent motion (IVIM) parameters and the apparent diffusion coefficient (ADC) in differentiating malignant from benign thyroid nodules derived from the most repeatable region of interest (ROI) delineation. Methods Forty-three patients with 46 pathologically confirmed thyroid nodules underwent diffusion-weighted imaging (DWI) with 8 b values. Two observers measured the intravoxel incoherent motion (IVIM) parameters (D, f and D*) and the apparent diffusion coefficient (ADC), ADC600 and ADC990 values using whole-lesion (W-L) ROI and IVIM parameters using single-section (S-S) ROI delineation. The intraclass correlation coefficients (ICCs) and Bland-Altman plots were used to evaluate the intra- and interobserver variability. The diagnostic performance of these parameters was evaluated by generating receiver operating characteristic (ROC) curves. Results The ICC values of all IVIM with W-L ROI delineation were higher than those with S-S ROI delineation, and excellent intra- and interobserver reproducibility was obtained. According to the Bland-Altman plots, the 95% limits of agreement of the IVIM parameters determined by the W-L ROIs revealed smaller absolute intra- and interobserver variability than those determined by S-S ROIs. The D and ADC600 values obtained from the W-L ROIs were the most powerful parameters in differentiating benign from the malignant nodules [area under the ROC curve = 0.962 and 0.970, P = 0.771]. Conclusions The W-L ROI of the thyroid was considered an effective method for obtaining IVIM measurements with excellent reproducibility for differentiating benign from malignant nodules.


Background
Thyroid cancer is the most common malignant disease of the endocrine system, and its incidence is rapidly increasing [1]. Although most thyroid cancers are well differentiated papillary carcinomas [2], due to the heavy psychological and mental burden of this disease, most patients tend to undergo surgery. Because the thyroid gland is a superficial organ, the application of highresolution ultrasonography (US) in thyroid diseases has obvious advantages. However, thyroid nodules evaluated by US or US-guided fine needle aspiration biopsy are reportedly operator dependent, and no single sonographic feature has sufficient diagnostic value. According to a systematic review and meta-analysis of observational studies, the accuracy of US in predicting malignant thyroid nodules has a sensitivity in the range of 26~87% and a specificity in the range of 40~93% [3].
Diffusion-weighted imaging (DWI) is a functional magnetic resonance imaging (MRI) modality. The apparent diffusion coefficient (ADC) value is an objective parameter of DWI that reflects free water diffusion and provides a quantitative measurement of the motion of water molecules in various diseases. Previous studies have demonstrated the potential of ADC in distinguishing benign from malignant thyroid nodules and other thyroid diseases [4,5]. On the basis of a recent metaanalysis of studies differentiating benign from malignant thyroid nodules, the sensitivity of quantitative DWI and ADC was 0.90~0.91, and the specificity was 0.93~0.95 [6,7]. However, the diagnostic threshold demonstrated considerable variability (0.36-2.56 × 10 − 3 mm 2 /s) among studies. Considering the limitations of applying lower or higher b values in a linear form, investigations into head and neck cancers to differentiate benign from malignant neoplasms and predict treatment responses have been performed using intravoxel incoherent motion (IVIM), which could better reflect true diffusion in the tissue [8,9]. Recently, IVIM applications in thyroid and related diseases have been increasingly reported [10][11][12]. Texture analyses of thyroid nodules based on magnetic resonance imaging have also been reported [13]. Based on large-sample and multi-centre research, artificial intelligence and machine learning could gradually be applied to magnetic resonance imaging of thyroid nodules.
Accurate image segmentation is the premise of radiomics analyses. Most previous studies adopted a manual single-section (S-S) region of interest (ROI) analysis for the quantitative assessment of thyroid nodules to avoid necrotic, cystic and micro-calcified areas [6,7,[14][15][16][17], while few studies used whole-lesion (W-L) ROI analyses [12,[18][19][20]. Notably, the appropriate ROI size and positioning have considerable influence on the quantitative parameters of tumour and interobserver variability [21,22]. Furthermore, some studies used W-L ROI IVIM for rectal and hepatic tumours, believing that this method can accurately reflect and evaluate the information of the lesions [21,23,24]. At present, there are no comparative studies on the delineation of thyroid nodules in the literature.
Therefore, this study aimed to compare the reproducibility and repeatability of the IVIM parameters derived from W-L ROI and S-S ROIs and the IVIM and ADC parameters derived from W-L ROIs to distinguish malignant from benign thyroid nodules.

Study design
This study was approved by the local ethics committee. Inclusion criteria were as follows: (1) thyroid nodules or masses detected by physical examination or incidentally; (2) thyroid nodules > 1 cm detected by ultrasonography; (3) clinicians and/or patients believed that the lesion could be removed surgically; and (4) written informed consent for MRI examination was provided. For some nodules that were suspected to be benign, surgical treatment was performed according to the patients' preferences or symptoms. Patients were excluded if they had any MRI-incompatible metallic devices or motion artefacts during the MRI examination.

Image analysis
Image analysis was performed using MATLAB scripts (The MathWorks, Natick, MA, USA) that provided the D, D*, f, and ADC metrics mapped on a pixel-by-pixel basis. The images were independently analysed by two radiologists (one radiologist with 10 years of experience in MRI radiology and one radiologist with 23 years of experience) who were blinded to the clinical and histopathological data.
This study adopted two methods of delineation: W-L ROI and S-S ROI. For the W-L ROI delineation, a 3D ROI was traced along the margins of the whole-lesion slice by slice, including the cystic, necrotic, and haemorrhagic portions, and obtained at b = 0 s/mm 2 , with reference to the corresponding T2WI (Figs. 1 and 2). The W-L ROI delineation trajectory was then saved. For the S-S ROI delineation, the same two radiologists independently drew two sample round/oval-shaped ROIs ranging from 20 to 60 pixels in size within the most solid tumour part (in the b = 0 s/mm 2 ) of the two independent tumour-containing slices, which were drawn as large as possible while avoiding cystic lesions, necrotic tissues, and vessels. Repeated measurements using the two ROI methods were performed at an interval of 4 weeks by one radiologist.
The ROIs obtained from these two methods were applied in the bi-exponential IVIM model. Diffusivity values were set as follows: 1. D (true diffusion, × 10 − 3 mm 2 /s); 2. D* (pseudo-diffusion, × 10 − 3 mm 2 /s); and 3. f (fractional perfusion related to microcirculation, %). The IVIM model equation was as follows: [25]. All three parameters were derived by direct curve fitting, and no cut-off values were used. For curve fitting, both D and D* were set to be positive and were smaller than 0.01, while D* was larger than D. In addition, f ranged from 0 to 1.
The mono-exponential ADC was noted as the ADC (× 10 − 3 mm 2 /s). The ADC was calculated according to the mono-exponential fit of the signal intensity using b = 0 s/mm 2 and 600 s/mm 2 (ADC 600 ) and b = 0 s/mm 2 and 990 s/mm 2 (ADC 990 ) based on the following equation: ln (Sb) = ln (S0) − bADC. The ADC values were measured with W-L ROIs that were saved as described above for the same patient.

Statistical analysis
Normality testing was performed using the Shapiro-Wilk test, and homogeneity of variance was evaluated using one-way ANOVA (Levene test). The numerical continuous variables are expressed as the mean ± standard deviation. The intraclass correlation coefficient (ICC) was calculated to evaluate intra-and interobserver variability. Interobserver agreement for W-L ROI and S-S ROI measurements was analysed by calculating the ICC and was interpreted as follows: 0.00-0.20, poor correlation; 0.21-0.40, fair correlation; 0.41-0.60, moderate correlation; 0.61-0.80, good correlation; and 0.81-1.00, excellent correlation. We also analysed the level of agreement (both inter-and intraobserver) by plotting the differences between the two measurements against the averages of the two measurements according to the method described by Bland and Altman (i.e., Bland-Altman plots). Differences between the two methods (W-L ROIs and S-S ROIs) were compared by paired-sample t-tests. The mean values of all parameters in the malignant group and the benign group were compared by independent-sample t-tests. Receiver operating characteristic (ROC) curves were generated to evaluate the diagnostic performance of IVIM and ADC parameters for distinguishing benign from malignant thyroid nodules. The optimal threshold was selected according to the Youden index. The area under the ROC curve (AUC) was calculated, and the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy rate were determined. In addition, the AUC values of different parameters were compared with the Z-test proposed by DeLong et al. [26]. The parameters found to be statistically significant in the univariate analysis were subsequently entered into the multiple logistic regression models to determine their independent predictive value in predicting thyroid nodules. All statistical analyses were performed using MedCalc Online, version 16.2 (Medcalc Software, Mariakerke, Belgium) and SPSS (18.0 for Windows, SPSS, Chicago, IL). The adjusted P value was used for multiple comparisons. For all other statistical tests, a P value < 0.05 was used to indicate a statistically significant difference.

Results
The results of normality testing showed that all parameters (i.e., D, f and D* derived from W-L ROIs and S-S ROIs) had approximately normal distributions, and Levene's test revealed that all the variances were homogeneous (Table 1). Paired-sample t-tests of IVIM parameters measured using W-L ROIs and S-S ROIs demonstrated no statistically significant differences. The results of the intra-and interobserver reproducibility of the W-L ROI and S-S ROI analyses are provided in Table 2. The ICC values of all IVIM parameters (D, f and D*) using W-L ROI delineation were higher than those using S-S ROI delineation, and excellent intraand interobserver reproducibility was obtained (range, 0.902-0.986 and 0.894-0.970, respectively). When an S-S ROI was applied, the intra-and interobserver reproducibility of the D value alone was excellent (ICC = 0.908 and 0.920, respectively), but the intra-and interobserver reproducibility for the D* value was only fair (ICC = 0.510 and 0.562, respectively). For the f value, the intraobserver agreement was moderate (ICC = 0.701), while the interobserver agreement was excellent (ICC = 0.846). The Bland-Altman plots for the IVIM parameters derived from W-L ROIs revealed small absolute intraand interobserver variability, and the 95% limits of agreement (LoAs) for the IVIM parameters are presented in Table 3.
The results of independent-sample t-tests for benign and malignant nodules are summarized in Table 4. Among the three IVIM parameters derived from the two ROI methods, the D and f values were significantly lower in the malignant nodules than in the benign nodules. Their respective optimal cut-off values in the univariate and ROC curve analyses are summarized in Table 5 and are shown in Fig. 3. The cut-off values for D and f were 0.870 × 10 − 3 mm2/s (the D values of the benign nodules were greater than this value) and 35.30% (the f values of the benign nodules were greater than this value), respectively. The multivariate logistic regression showed that the AUC of D combined with the f value was not significantly higher than that of the D value [0.979 (95% confidence interval (CI), 0.886-1.000) vs. 0.962 (95% CI, 0.860-0.996), P = 0.461] (Fig. 4). The D and ADC 600 values obtained from W-L ROIs achieved the highest AUC [0.962 (95% CI, 0.860-0.996) and 0.970 (95% CI, 0.871-0.998), P = 0.771] for differentiating malignant thyroid nodules from benign nodules.

Discussion
The results reported in this study revealed that the IVIM parameters (D, f and D*) derived from W-L ROI delineation led to more reproducible and repeatable results than those derived from S-S ROI delineation. The diagnostic efficacy of the D value was almost as high as that of the ADC obtained from W-L ROI delineation. W-L ROI delineation of the thyroid to obtain IVIM measurements can be considered an effective method with excellent reproducibility for differentiating benign from malignant nodules.
In quantitative and qualitative analyses, unified delineation of lesions is very important. Most previous studies evaluating ADC values from thyroid nodules adopted S-S ROIs with solid areas, which did not include necrotic, cystic and micro-calcified areas [14][15][16][17]. However, most thyroid tumours are follicular-derived neoplasms, and the heterogeneity remained significant [2]. For this reason, S-S ROI delineation of solid areas may be associated with some problems. First, different observers or studies may select different slices for delineation. Second, even if the lesion is delineated on the same slice, the location and shape of the selected ROI are not always consistent. Third, the process may avoid visible cystic areas, but microcysts and micro-calcifications exist that cannot be avoided, leading to a decrease in the restrictive cell membranes and less hindrance to diffusion [2,8]. The results derived from the S-S ROI analysis showed variability in both sensitivity and specificity [6,7]. Whereas some studies have used W-L ADC imaging or histograms to evaluate thyroid nodules and distinguish different types of malignant thyroid nodules or benign nodules from malignant nodules [18][19][20], this study used IVIM with W-L ROI delineation that included cystic, necrotic and micro-calcified components of thyroid nodules, and the results demonstrated that all parameters had better reproducibility and repeatability than those of S-S ROI delineation. Notably, the perfusionrelated parameters (f and D*) derived from the W-L ROIs showed excellent intra-and interobserver agreement, which may be related to the use of ROIs across whole tumours, thus minimizing the intra-and interobserver differences in ROI delineations. In addition, such differences may be a concern with the application of rFOV DWI, which employs a special 2D RF excitation pulse that is spatially selective in both the slice-select and phase-encoding directions. This method effectively provides high-resolution images with fewer artefacts and less distortion, providing more reliable and repeatable results to delineate the lesions more accurately [27,28]. In our results, the intra-and interobserver agreement    Previous studies involving S-S ROI analysis of other organs have also shown that the repeatability of the D* or f values was poor [29][30][31], possibly because the analysis was associated with high uncertainty. A major challenge is the high variations in the signal-to-noise ratio (SNR) of the perfusion markers (D* and f), which has also been mentioned by several other studies [23,[32][33][34]. Furthermore, differences in the ROI analysis between the observers in this research who selected the slice of a lesion or delineation inconsistency may increase the uncertainty of D* values. The results of this study indicated that the repeatability of IVIM parameters is excellent when using W-L ROI analysis for thyroid nodules, providing a strong basic foundation for future multi-centre, large-sample studies as well as stratification research. These results revealed that the diffusion effect plays a major role in differentiating malignant thyroid nodules, which may be based on differences in the cell density and histopathologic features of benign and malignant thyroid nodules [18,35]. Some ADC and DKI studies with higher b values (1000 s/mm 2 to 2000 s/mm 2 ) have shown the role of diffusion in differentiating benign from malignant thyroid nodules [36,37]. However, Tan et al. "s [11]. recently published study regarding IVIM in the thyroid showed that the f value achieved the highest AUC. The scanning parameter settings, signal receiving coil, delineation method, and proportions of pathological types of thyroid nodules used in their study were different from those in this study. Notably, selecting only the solid part of a lesion, i.e., the area where the microvasculature is likely to be densely distributed, may have yielded the perfusion effect reflected in the results, which was more obvious in our study than in their study.
Based on the excellent reproducibility of W-L ROI IVIM measurements, the ADC 600 and ADC 990 values of W-L ROIs were analysed. The results showed that the ADC values were higher than the D values, which may be related to the mix-up of perfusion factors in the biexponential model. This finding is consistent with those of other studies on the head and neck [8,38]. IVIM was used to identify the effects of diffusion and perfusion (pseudo-diffusion). The D value was almost as good as the diagnostic efficiency because the ADC values were obtained from W-L ROIs. These results suggested that the IVIM model is an effective method for differentiating thyroid nodules. The AUC of D combined with the f value did not significantly differ from that of D. We interpreted the reasons for the above results as follows: 1) It may be related to different proportions of pathological types of thyroid nodules. Most lesions in our study were nodular goitres (22/46) and PTC (20/46), which showed higher percentages of cystic lesions and necrosis, thus reducing the overall vascularity and, consequently, confounding the results. 2) Expanding the Table 3 Intra-and interobserver variation in the assessment of IVIM parameters with two ROI methods Values are presented as the 95% limits of agreement (LoA; mean interdevice difference, spans of limits of agreement)   [40,41]. However, the present study has several limitations. First, the sample size was relatively small, as the study was confined to a single centre, and the patients did not have a large variety of nodules, resulting in selection bias. The next step will be to estimate the required sample size for greater statistical power and collect data from multiple centres and large samples. Second, the S-S ROI approach used in this study is based on the delineation method used by most reports in the literature. However, this delineation does not include necrosis, cystic or microcalcification areas, which may have led to some missing information regarding quantitative parameters. Third, it would be useful to assess MRI performance in relation to thyroid imaging reporting and data system based on the ultrasound or histopathologic characteristics of thyroid nodules. Finally, these data were derived using a specific vendor sequence; other major vendors have offered similar sequences that can be applied to thyroid examinations, but further investigation is needed in a clinical setting.

Conclusions
In conclusion, W-L ROI delineation to derive IVIM measurements with 2D RF rFOV DWI was confirmed to be a better method with excellent reproducibility and repeatability for differentiating benign from malignant thyroid nodules.