
Identifying factors that may influence the classification performance of radiomics models using contrast-enhanced mammography (CEM) images

Abstract

Background

Radiomics plays an important role in the field of oncology. Few studies have focused on the identification of factors that may influence the classification performance of radiomics models. The goal of this study was to use contrast-enhanced mammography (CEM) images to identify factors that may potentially influence the performance of radiomics models in diagnosing breast lesions.

Methods

A total of 157 women with 161 breast lesions were included. Least absolute shrinkage and selection operator (LASSO) regression and the random forest (RF) algorithm were employed to construct radiomics models. The classification result for each lesion was obtained by using 100 rounds of five-fold cross-validation. The image features interpreted by the radiologists were used in the exploratory factor analyses. Univariate and multivariate analyses were performed to determine the association between the image features and misclassification. Additional exploratory analyses were performed to examine the findings.

Results

Among the lesions misclassified by both LASSO and RF in ≥ 20% of the cross-validation iterations and those misclassified by both algorithms in ≤ 5% of the iterations, univariate analysis showed that larger lesion size and the presence of rim artifacts and/or ripple artifacts were associated with more misclassifications among benign lesions, and smaller lesion size was associated with more misclassifications among malignant lesions (all p < 0.050). Multivariate analysis showed that smaller lesion size (odds ratio [OR] = 0.699, p = 0.002) and the presence of air trapping artifacts (OR = 35.568, p = 0.025) were factors that may lead to misclassification among malignant lesions. Additional exploratory analyses showed that benign lesions with rim artifacts and small malignant lesions (< 20 mm) with air trapping artifacts had misclassification rates approximately 50 percentage points higher than benign and malignant lesions without these factors.

Conclusions

Lesion size and artifacts in CEM images may affect the diagnostic performance of radiomics models. The classification results for lesions presenting with certain factors may be less reliable.

Introduction

It is important to find an accurate and efficient way to detect and diagnose breast cancer. In recent years, radiomics has played an increasingly important role in the field of oncology [1,2,3,4]. In radiomics, high-throughput computer algorithms extract large numbers of image features and convert medical images into quantitative data, with promising results reported [5,6,7]. For breast cancer, radiomics has been extensively studied in research settings for diagnosis, treatment evaluation, and prognosis prediction [1,2,3,4,8].

Contrast-enhanced mammography (CEM) is a technique that can simultaneously show the morphological and angiogenic characteristics of breast lesions [9, 10] and has a high spatial resolution comparable to that of conventional mammography [11, 12]. Several studies have developed and validated radiomics models in an attempt to achieve high diagnostic accuracy for breast lesions [13,14,15,16,17,18]. Although the diagnostic performance of radiomics models is promising, concerns still persist, as radiomics approaches are often regarded as black boxes and are less acceptable for clinical application [1, 2, 19, 20]. In other words, improvement in the overall diagnostic performance of radiomics models is still difficult to convert into practical clinical benefits, such as a reduction in unnecessary biopsies. Radiomics models are still not sufficiently reliable and interpretable to be used in the real-world diagnostic setting. In addition, few studies have examined imaging factors that may influence the diagnostic performance of the models.

The purpose of this study was to examine the performance of radiomics analysis in breast cancer diagnosis and preliminarily disentangle the black box of radiomics by identifying factors that may influence the classification results of radiomics models. Our study focused on breast lesions that were more likely to be misclassified by radiomics analysis and attempted to identify the potential image features that may influence the classification results from an interpretable perspective.

Materials and methods

Study participants

This retrospective study was approved by the Institutional Review Board and Ethics Committee of the research center. The requirement for patient informed consent was waived. We collected consecutive CEM images between November 2018 and February 2020. The indications for CEM in this study included (1) problem solving for inconclusive findings on mammography or ultrasound screening; and (2) evaluation of symptomatic patients. The inclusion criteria were as follows: (1) patients with suspected breast lesions after physical examination or screening; (2) patients with referral for CEM by breast surgeons as part of diagnostic imaging; and (3) patients with final diagnoses that were confirmed by histopathological results. We excluded patients (1) with missing data and (2) with a history of breast surgery, breast radiotherapy, chemotherapy, or hormone treatment within 6 months prior to CEM examination. The patient inclusion and exclusion workflows are shown in Fig. 1. A total of 157 women with 161 breast lesions (47 benign, 29.2%; 114 malignant, 70.8%) were included in the study. The median age of the patients was 49 years (range, 21–70 years).

Fig. 1

Patient inclusion and exclusion flowchart. CEM = contrast-enhanced mammography

CEM examination

All CEM examinations were performed using Senographe Essential mammography units (GE Healthcare). Before the examination, a dose of 1.5 mL/kg body weight iodinated contrast material (Iohexol, 300–350 mg I/mL) was injected intravenously using an automated power injector at a flow rate of 3.0 mL/s, followed by a 10-mL bolus of saline. Two minutes after the injection, bilateral craniocaudal (CC) views were obtained first, beginning with the suspicious breast. Then, bilateral mediolateral oblique (MLO) views were acquired in the same order. In a single projection, a pair of low-energy (LE) and high-energy (HE) exposures was performed within 1.5 seconds. The HE and LE images were recombined to generate dual-energy subtraction (DES) images. All of the HE, LE, and DES images were used to construct the radiomics models.

CEM image evaluation

Two radiologists with 5–10 years of experience in breast imaging reviewed and interpreted all of the CEM images to obtain the image features. The radiologists were blinded to the histopathology results. When a discrepancy occurred in image evaluation, the final decision was made by consensus. The image features were divided into two main groups: (1) basic image features and (2) artifact features. The basic image features included breast density, degree of background parenchymal enhancement (BPE), and lesion size. Breast density (a, b, c, or d) was evaluated using the LE images according to the Breast Imaging Reporting and Data System (BI-RADS) mammography lexicon [21]. The degree of BPE (minimal, mild, moderate, or marked) was assessed using the DES images with reference to the BI-RADS MRI lexicon [22]. Lesion size was obtained by calculating the mean value of the largest lesion diameters on DES images measured by two independent radiologists. The artifact features included the presence of rim artifacts, ripple artifacts, vascular artifacts, and air trapping artifacts in DES images, as these artifacts occurred more often and might interfere with image quality [23, 24]. We defined artifacts located outside the lesion area as being absent, since all the radiomics features were extracted from inside the lesion area and therefore might not be affected by artifacts outside the lesion area.

In addition, we extracted three objective quantitative features that might reflect the enhancement degree of the lesions. These features included the signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and background contrast ratio (BCR). Since these features are obtained through calculation and can also be affected by the abovementioned image features, such as artifacts, they were excluded from the factor analysis. We only examined the distribution pattern of these features among the lesions with different classification results. The detailed processes and calculation methods of these features are provided in the Supplemental Materials (Appendix E1).

Lesion delineation and feature extraction

The lesion contours were manually delineated with ITK-SNAP (version 3.6; www.itksnap.org) (25) by two radiologists together. For each lesion, a total of 6 regions of interest (ROIs) were delineated on the HE, LE, and DES images in the CC and MLO views. For multiple lesions within one breast, only the largest lesion was delineated.

Because the voxel was isotropic in-plane, we omitted the image resampling step. Gray-level discretization was performed to discretize all the images to 256 gray levels. Spectral Mammography Kit (SMK) software (version 1.2.0, GE Healthcare) was used to extract the radiomics features. For each ROI, a total of 680 features, including 14 shape features, 18 first-order features, 24 gray-level cooccurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, and 592 wavelet features, were extracted (Supplemental Table 1).
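To illustrate the discretization step, the following minimal Python sketch maps intensities to 256 levels using equal-width bins and checks that the listed feature groups sum to 680. The binning scheme and the name `discretize` are assumptions made for illustration only; the text does not describe how the SMK software implements this step.

```python
def discretize(values, n_levels=256):
    """Map raw intensities to integer gray levels 0..n_levels-1 (equal-width bins).

    Assumption: fixed bin number over the observed intensity range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat region carries no contrast, so every value goes to level 0.
        return [0 for _ in values]
    width = (hi - lo) / n_levels
    # The maximum intensity would land in bin n_levels, so clip it to the top level.
    return [min(int((v - lo) / width), n_levels - 1) for v in values]


# Feature counts per ROI as listed in the text:
# shape, first-order, GLCM, GLRLM, GLSZM, wavelet.
feature_counts = [14, 18, 24, 16, 16, 592]
total_features = sum(feature_counts)

# Hypothetical intensities, chosen only to show the mapping.
levels = discretize([0.0, 0.5, 1.0, 2.0, 4.0])
```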

Statistical analysis

Feature selection and radiomics model building

We employed two algorithms, L1-based least absolute shrinkage and selection operator (LASSO) regression [25] and the random forest (RF) algorithm [26], with all the radiomics features (680 features for each ROI), to construct the classification models. The “one-standard-error” rule [27] was used to select the best model when implementing LASSO regression. The reference standard of the classification results was the histopathological results. To obtain robust results regarding how the radiomics models classified each lesion, we conducted 100 rounds of five-fold cross-validation. During each round of cross-validation, to account for imbalanced class numbers between malignant and benign lesions, adjusted weights inversely proportional to the frequencies of each class in the training data were calculated and incorporated in building the RF and LASSO regression models [28,29,30]. Before analysis, all the extracted radiomics features were normalized. We calculated the mean and standard deviation of each feature using the training data and normalized the training features accordingly. Subsequently, the same mean and standard deviation values were used to normalize the features in the testing data. In addition, the dimensionality of the radiomics features was reduced using the training data (80% of the whole data). We removed highly correlated redundant radiomics features if the pairwise correlations were greater than 0.8. Specifically, if two radiomics features had a correlation greater than 0.8, the radiomics feature with the largest mean absolute correlation was removed. Then, the models were built on the remaining features in the training data. The classification results for the testing data, determined by using the best cutoff value based on the Youden index [31] for both the LASSO and RF models, were summarized. The area under the curve (AUC), accuracy, sensitivity, and specificity values in the testing dataset were calculated. The misclassification probability for each lesion was obtained. The details of this statistical procedure are provided in the Supplemental Materials (Appendix E2).
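The normalization and correlation-filtering steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names, the use of the sample standard deviation, the use of Pearson correlation, and the order in which correlated pairs are processed (strongest pair first) are all assumptions. Features are assumed to be non-constant.

```python
import math
from itertools import combinations


def fit_zscore(train_values):
    """Estimate mean and sample standard deviation on the training data only."""
    n = len(train_values)
    mean = sum(train_values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in train_values) / (n - 1))
    return mean, sd


def apply_zscore(values, mean, sd):
    """Normalize any data (training or testing) with the training statistics."""
    return [(v - mean) / sd for v in values]


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def drop_correlated(train, threshold=0.8):
    """Return the names of features kept after removing redundant ones.

    train: dict mapping feature name -> list of training values.
    While any pair has |r| > threshold, take the most correlated pair and
    drop the member with the larger mean absolute correlation to the others.
    """
    kept = list(train)
    while True:
        corr = {frozenset(p): abs(pearson(train[p[0]], train[p[1]]))
                for p in combinations(kept, 2)}
        high = [p for p, r in corr.items() if r > threshold]
        if not high:
            return kept

        def mean_abs(name):
            return sum(r for p, r in corr.items() if name in p) / (len(kept) - 1)

        a, b = sorted(max(high, key=lambda p: corr[p]))
        kept.remove(a if mean_abs(a) >= mean_abs(b) else b)


# Hypothetical toy features: f1 and f2 are nearly collinear.
train = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 10.5], "f3": [5, 1, 4, 2, 3]}
kept = drop_correlated(train)

mean, sd = fit_zscore(train["f2"])
test_z = apply_zscore([6.1, 9.0], mean, sd)  # testing data reuses training mean/sd
```

Estimating the mean and standard deviation on the training folds only keeps information from the testing fold out of the model-building step.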

Definition of lesion with high/low misclassification probability

For both the LASSO and RF methods, we defined a lesion as having a high misclassification probability if it was incorrectly classified for no less than 20.0% of 100 iterations and as having a low misclassification probability if it was incorrectly classified for no more than 5.0% of the iterations. To combine the results of the LASSO regression and RF models, we defined a lesion as having a high misclassification probability for both algorithms if the lesion was defined with a high misclassification probability by each algorithm at the same time; the equivalent definition was used to identify lesions with a low misclassification probability for both algorithms. Unless otherwise specified, lesions described below as having a high/low misclassification probability are those with a high/low misclassification probability as determined by both algorithms simultaneously.
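The definitions above amount to a simple rule, sketched here in Python. The function names and the labels "intermediate" and "neither" are hypothetical and used only for illustration; the thresholds (20% and 5% of 100 iterations) are taken from the text.

```python
def misclassification_group(n_wrong, n_rounds=100, high=0.20, low=0.05):
    """Group one lesion for one algorithm by its misclassification rate."""
    rate = n_wrong / n_rounds
    if rate >= high:
        return "high"
    if rate <= low:
        return "low"
    return "intermediate"


def combined_group(n_wrong_lasso, n_wrong_rf, n_rounds=100):
    """A lesion is 'high' (or 'low') only if both algorithms agree on that group."""
    g_lasso = misclassification_group(n_wrong_lasso, n_rounds)
    g_rf = misclassification_group(n_wrong_rf, n_rounds)
    if g_lasso == g_rf and g_lasso != "intermediate":
        return g_lasso
    return "neither"
```

For example, a lesion misclassified in 25 LASSO iterations and 30 RF iterations falls in the "high" group, whereas one misclassified in 25 LASSO iterations but only 3 RF iterations belongs to neither group and is not used in the factor analysis.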

Identification of factors influencing the classification performance of radiomics models

Multivariate logistic regression was conducted using the type of lesion (high misclassification probability vs. low misclassification probability) as a dependent variable and the image features as independent variables. A factor that showed a statistically significant high or low odds ratio (OR) was determined as an influential factor.
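For reference, the standard form of a logistic regression model and its link to the odds ratio can be written as below. The symbols are notational assumptions: $x_j$ denotes the $j$-th image feature, $\beta_j$ its coefficient, and $p$ the probability that a lesion belongs to the high misclassification probability group.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j} \beta_j x_j ,
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Under this form, an OR below 1 means that the odds of a high misclassification probability fall as the feature increases, and an OR above 1 means that they rise.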

Additional exploratory analyses

To directly evaluate how the factors identified in the previous analysis influence the performances of the radiomics models, we compared the correct classification rates between lesions with certain factors and the lesions without these factors based on the results of cross-validation.
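A minimal Python sketch of this comparison is given below. The data layout (one dictionary per lesion), the field names, and the toy values are hypothetical; the sketch only computes the difference in mean correct classification rate and does not reproduce the standard deviations reported in the Results.

```python
from statistics import mean


def rate_difference(lesions, factor):
    """Mean correct classification rate with the factor minus without it.

    lesions: list of dicts holding 'correct_rate' (fraction of the
    cross-validation rounds in which the lesion was classified correctly)
    and a boolean flag per factor.
    """
    with_factor = [l["correct_rate"] for l in lesions if l[factor]]
    without_factor = [l["correct_rate"] for l in lesions if not l[factor]]
    return mean(with_factor) - mean(without_factor)


# Hypothetical benign lesions, two of them with rim artifacts.
toy_lesions = [
    {"correct_rate": 0.30, "rim": True},
    {"correct_rate": 0.40, "rim": True},
    {"correct_rate": 0.95, "rim": False},
    {"correct_rate": 0.99, "rim": False},
    {"correct_rate": 0.97, "rim": False},
]
diff = rate_difference(toy_lesions, "rim")
```

A negative difference indicates that lesions with the factor were classified correctly less often.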

In addition, to evaluate the performance of radiomics models on the data with/without influential factors, we performed two more sets of 100 rounds of five-fold cross-validation with both radiomics algorithms built on the data, including the lesions with/without the factors identified by the factor analysis. The AUC, accuracy, sensitivity, and specificity values in the testing dataset were calculated for comparison.
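The repeated cross-validation scheme used here and in the main analysis can be sketched as follows. This is an illustration under assumptions: folds are formed by plain shuffling (the text does not state whether they were stratified), `fit_predict` is a hypothetical callback standing in for the LASSO or RF model, and the majority-class rule below is only a placeholder classifier.

```python
import random


def five_fold_indices(n, rng, k=5):
    """Shuffle the lesion indices and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_error_rates(labels, fit_predict, rounds=100, k=5, seed=0):
    """Fraction of rounds in which each lesion was misclassified.

    fit_predict(train_idx, test_idx) must return one predicted label
    per index in test_idx.
    """
    rng = random.Random(seed)
    n = len(labels)
    wrong = [0] * n
    for _ in range(rounds):
        for test_idx in five_fold_indices(n, rng, k):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            predictions = fit_predict(train_idx, test_idx)
            for i, pred in zip(test_idx, predictions):
                if pred != labels[i]:
                    wrong[i] += 1
    return [w / rounds for w in wrong]


# Hypothetical labels: 1 = malignant, 0 = benign.
labels = [1] * 7 + [0] * 3


def majority_fit_predict(train_idx, test_idx):
    """Placeholder model: predict the majority class of the training fold."""
    n_pos = sum(labels[i] for i in train_idx)
    majority = 1 if n_pos * 2 >= len(train_idx) else 0
    return [majority for _ in test_idx]


error_rates = repeated_cv_error_rates(labels, majority_fit_predict)
```

Each lesion is held out exactly once per round, so its error rate over 100 rounds plays the role of the per-lesion misclassification probability.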

General statistical analysis

Continuous variables were described as the means ± standard deviations, and categorical variables were summarized as proportions (%). Independent t tests, Wilcoxon rank-sum tests, and Fisher’s exact tests were used as appropriate for the univariate analyses and additional exploratory analyses. A p value less than 0.050 was considered statistically significant. All analyses were implemented in R software (version 3.6.3) [32].

Results

Summary of the study cohort and image features

A summary of the study cohort and image features is shown in Table 1. The mean age and lesion size in the malignant lesion group were significantly greater than those in the benign lesion group (p <  0.050). The distributions of different types of breast densities and degrees of BPE were significantly different between the two groups (both p <  0.050). For the different kinds of artifacts, significant differences were observed in the presence of ripple artifacts (p = 0.005) and vascular artifacts (p = 0.042), but no differences were found in the presence of rim artifacts (p = 1.000) and air trapping artifacts (p = 0.104) between the two groups. For the objective quantitative features, the benign lesion group showed lower SNR (p <  0.001), CNR (p <  0.001), and BCR values (p <  0.001) than the malignant lesion group.

Table 1 Summary of the study cohort and image features

Performance of radiomics models based on cross-validation results

For the LASSO regression models, the average AUC, accuracy, sensitivity, and specificity values were 0.926 ± 0.047, 0.895 ± 0.061, 0.891 ± 0.085, and 0.908 ± 0.096, respectively. For the RF models, the average AUC, accuracy, sensitivity, and specificity values were 0.915 ± 0.055, 0.880 ± 0.068, 0.878 ± 0.097, and 0.886 ± 0.108, respectively. The statistics of the features used ≥20% of the times by LASSO and features with the largest permutation importance scores generated by RF in the cross-validation are given in the Supplemental Tables 2 and 3.
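The sensitivity and specificity values above depend on the cutoff applied to each model's output. As a hedged illustration of the Youden index criterion named in the Methods, the sketch below scans candidate cutoffs and keeps the one maximizing J = sensitivity + specificity − 1. The function name and the toy scores are hypothetical, and the text does not specify on which data the cutoff was determined.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    scores: predicted probabilities of malignancy; labels: 1 = malignant,
    0 = benign. A lesion is called malignant when score >= cutoff.
    Ties are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]


# Hypothetical model outputs for six lesions.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 1, 0, 1]
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
```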

Summary of classification results for the lesions

The lesion classification results are shown in Fig. 2 (LASSO regression) and Fig. 3 (RF). For the LASSO regression models, 20 (12.4%) of the 161 lesions (5 benign; 15 malignant) were incorrectly classified for no less than 20.0% of the 100 iterations, and 116 (72.0%) of the 161 lesions (37 benign; 79 malignant) were incorrectly classified for no more than 5.0% of the iterations; for the RF models, 33 (20.5%) lesions (8 benign; 25 malignant) were misclassified for no less than 20.0% of 100 iterations, and 116 (72.0%) lesions (35 benign; 81 malignant) were incorrectly classified for no more than 5.0% of the iterations. Based on our definition, a total of 16 (9.9%) lesions (5 benign; 11 malignant) were defined as having a high misclassification probability, and 101 (62.7%) lesions (32 benign; 69 malignant) were defined as having a low misclassification probability.
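The counts and percentages in this paragraph can be cross-checked with a few lines of Python. The helper name `pct` is an assumption for illustration; all numbers are those reported above.

```python
def pct(count, total=161):
    """Percentage of the 161 lesions, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Reported lesion counts mapped to their reported percentages.
reported = {20: 12.4, 116: 72.0, 33: 20.5, 16: 9.9, 101: 62.7}
recomputed = {count: pct(count) for count in reported}

# (benign, malignant) subgroup counts should add up to each reported total.
subgroups = {
    "LASSO high": (5, 15, 20),
    "LASSO low": (37, 79, 116),
    "RF high": (8, 25, 33),
    "RF low": (35, 81, 116),
    "both high": (5, 11, 16),
    "both low": (32, 69, 101),
}
sums_ok = all(b + m == total for b, m, total in subgroups.values())
```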

Fig. 2

Least absolute shrinkage and selection operator (LASSO) regression radiomics model classification results for 100 rounds of cross-validation. The blue dashed line is the cutoff line for a misclassification probability of 0.05, and the red dashed line is the cutoff line for a misclassification probability of 0.20 for benign and malignant lesions. The average AUC, accuracy, sensitivity, and specificity values and the standard deviation are 0.926 ± 0.047, 0.895 ± 0.061, 0.891 ± 0.085, and 0.908 ± 0.096

Fig. 3

Random forest (RF) radiomics model classification results for 100 rounds of cross-validation. The blue dashed line is the cutoff line for a misclassification probability of 0.05, and the red dashed line is the cutoff line for a misclassification probability of 0.20 for benign and malignant lesions. The average AUC, accuracy, sensitivity, and specificity values and the standard deviation are 0.915 ± 0.055, 0.880 ± 0.068, 0.878 ± 0.097, and 0.886 ± 0.108

Factors identified that may influence the classification performance of radiomics models

A summary of the image features and the objective quantitative features in the subgroups of interest is shown in Table 2. The univariate analysis showed that larger lesion size (p = 0.003), the presence of rim artifacts (p <  0.001), and ripple artifacts (p = 0.042) may increase the misclassification rate for benign lesions. Among the malignant lesions, a smaller lesion size (p <  0.001) was found to be a factor that may be associated with misclassification. The distributions of the objective quantitative features are shown in Fig. 4. Among the benign lesions, compared with lesions with a low misclassification probability, lesions with a high misclassification probability showed higher values for the SNR, CNR, and BCR. Among the malignant lesions, compared with the lesions with low misclassification probability, the lesions with high misclassification probability showed lower values for the SNR, CNR, and BCR. All of the differences between the lesions with a high misclassification probability and lesions with a low misclassification probability were statistically significant (p <  0.050).

Table 2 Summary of image features and objective quantitative features in subgroups of interest
Fig. 4

Distribution of values of quantitative features in the subgroups of interest. SNR = signal-to-noise ratio; CNR = contrast-to-noise ratio; BCR = background contrast ratio

Multivariate analysis was performed only in the malignant lesion group, since the small number of lesions in the benign lesion group prevented the logistic regression model from converging. As shown in Table 3, a smaller lesion size (odds ratio [OR] = 0.699, p = 0.002) and the presence of air trapping artifacts (OR = 35.568, p = 0.025) were factors that may result in the misclassification of malignant lesions.

Table 3 Multivariate factor analysis results for malignant lesions in the subgroups of interest

In addition, the univariate and multivariate analyses based on the LASSO regression models alone and on the RF models alone showed similar results (Supplemental Tables 4–7).

Results of additional exploratory analyses

Correct classification rates for lesions with/without influential factors

A summary of the correct classification rates for lesions with and without certain influential factors is given in Table 4. A smaller lesion size (< 20 mm) increased the correct classification rate among benign lesions by 0.223 ± 0.098 (mean ± standard deviation) and 0.231 ± 0.095 and reduced the correct classification rate among malignant lesions (change of −0.140 ± 0.049 and −0.256 ± 0.069) for LASSO and RF, respectively. The presence of rim artifacts reduced the correct classification rate among benign lesions (change of −0.613 ± 0.193 and −0.624 ± 0.140 for LASSO and RF, respectively). The presence of ripple artifacts reduced the correct classification rate among benign lesions (change of −0.126 ± 0.075 and −0.165 ± 0.106 for LASSO and RF, respectively). The presence of air trapping artifacts reduced the correct classification rate among malignant lesions (change of −0.148 ± 0.056 and −0.088 ± 0.054 for LASSO and RF, respectively). However, the presence of both smaller lesion size and air trapping artifacts reduced the correct classification rate among malignant lesions to a greater extent (change of −0.458 ± 0.168 and −0.559 ± 0.145 for LASSO and RF, respectively).

Table 4 Summary of additional exploratory analysis for correct classification rates between lesions with and without influential factors

Performance of radiomics models in the data with/without influential factors

We performed two more sets of 100 rounds of cross-validations among the data on the lesions with or without rim artifacts, ripple artifacts, and/or air trapping artifacts (with: 87 in total, 16 benign and 71 malignant; without: 74 in total, 31 benign and 43 malignant). We only considered valid classification results without prediction issues due to the small number of prediction categories. For the LASSO regression models in lesions with/without the abovementioned artifacts, the average AUC, accuracy, sensitivity, and specificity values were 0.875 ± 0.078 vs. 0.970 ± 0.071, 0.858 ± 0.097 vs. 0.965 ± 0.066, 0.851 ± 0.099 vs. 0.967 ± 0.088, and 0.898 ± 0.123 vs. 0.967 ± 0.092, respectively. For the RF models in lesions with/without the abovementioned artifacts, the average AUC, accuracy, sensitivity, and specificity values were 0.852 ± 0.085 vs. 0.961 ± 0.094, 0.830 ± 0.100 vs. 0.952 ± 0.079, 0.822 ± 0.123 vs. 0.953 ± 0.121, and 0.907 ± 0.124 vs. 0.968 ± 0.090, respectively.

Discussion

Overall, the performance of the two algorithms (LASSO and RF) used in this study was comparable to that of the models in the published literature using radiomics features of CEM to classify breast lesions (AUC = 0.848–0.950, accuracy = 78.4–90.0%) [13,14,15,16].

The results of factor analyses showed that small lesion size and the presence of rim artifacts, ripple artifacts, and air trapping artifacts might influence classification performances in the LASSO regression models and RF radiomics models. To illustrate the findings, we provided a set of CEM images as examples in Fig. 5. As shown in Fig. 5A-C, benign lesions with larger lesion size and presenting with rim artifacts or ripple artifacts were more likely to be misclassified. Benign lesions that were less likely to be misclassified (Fig. 5D-F) were smaller in size and generally did not contain rim or ripple artifacts. In Fig. 5G-H, malignant lesions with smaller lesion size and presenting with air trapping artifacts were more likely to be misclassified. Malignant lesions that were less likely to be misclassified were generally larger and did not present with air trapping artifacts (Fig. 5J-L). The presence of artifacts seemed to be an influential factor that resulted in misclassification, and the influence could be bidirectional: some artifacts, such as rim artifacts and ripple artifacts, tended to influence the classification of a lesion as malignant, probably because these artifacts increase the signal intensity and/or heterogeneity of the lesions, while other artifacts, such as air trapping artifacts and negative enhancement artifacts, decrease the signal intensity of the lesions. Thus, lesions with such artifacts might be more likely to be classified as benign.

Fig. 5

Examples of dual-energy subtraction (DES) images of contrast-enhanced mammography (CEM) classified by the radiomics models. A-C Examples of benign lesions with high misclassification probabilities. The lesions are annotated with arrowheads. A A 42-year-old woman with a markedly enhanced lesion in the upper quadrant of the right breast. Biopsy revealed a fibroadenoma. The diameter of the lesion is 31.5 mm (mean lesion size of all the benign lesions: 17.1 mm). The patient has marked BPE. B A 47-year-old woman with a moderately enhanced lesion in the outer quadrant of the right breast. Biopsy revealed adenosis with a fibroadenoma. Rim artifacts are present at the location of the lesion (arrows). The patient has marked BPE. C A 35-year-old woman with a moderately enhanced lesion in the lower quadrant of the left breast. Biopsy revealed an intraductal papilloma. Ripple artifacts are present at the location of the lesion (arrow). The patient has mild BPE. D-F Examples of benign lesions with low misclassification probabilities. D A 50-year-old woman with a moderately enhanced lesion in the outer quadrant of the right breast. Biopsy revealed a fibroadenoma. The diameter of the lesion is 10.5 mm. The patient has minimal BPE. E A 55-year-old woman with a mildly enhanced lesion in the outer quadrant of the right breast. Biopsy revealed a fibroadenoma. The diameter of the lesion is 8.0 mm. The patient has minimal BPE. F A 58-year-old woman with a mildly enhanced lesion in the outer quadrant of the left breast. Biopsy revealed a fibroadenoma. The diameter of the lesion is 10.3 mm. The patient has minimal BPE. G-I Examples of malignant lesions with high misclassification probabilities. The lesions are annotated with arrowheads. G A 60-year-old woman with a mildly enhanced lesion in the central area of the left breast. Biopsy revealed IDC with mucous secretion (grade III). The diameter of the lesion is 16.0 mm (mean lesion size of all malignant lesions: 28.8 mm). 
The patient has minimal BPE. H A 53-year-old woman with a moderately enhanced lesion in the upper quadrant of the right breast. Biopsy revealed IDC (grade II). The diameter of the lesion is 16.3 mm. The patient has minimal BPE with an air trapping artifact in the lesion area (arrow). I A 57-year-old woman with a lesion showing negative enhancement in the outer quadrant of the left breast. Biopsy revealed mucous adenocarcinoma. The diameter of the lesion is 27.5 mm. The patient has minimal BPE with negative enhancement artifacts (eclipse sign) in the lesion area (arrow). J-L Examples of malignant lesions with low misclassification probabilities. J A 58-year-old woman with a markedly enhanced lesion in the upper quadrant of the left breast. Biopsy revealed IDC (grade II). The diameter of the lesion is 31.0 mm. The patient has mild BPE. K A 49-year-old woman with a markedly enhanced lesion in the outer quadrant of the right breast. Biopsy revealed IDC (grade II). The diameter of the lesion is 39.5 mm. The patient has minimal BPE. L A 60-year-old woman with a markedly enhanced lesion in the retro-areola region of the right breast. Biopsy revealed IDC (grade II). The diameter of the lesion is 48.8 mm. The patient has minimal BPE. BPE = background parenchymal enhancement; IDC = invasive ductal carcinoma

Our findings were further supported by the results of the additional exploratory analyses. Based on the cross-validation results, correct classification rates decreased markedly (by approximately 50 percentage points on average) for benign lesions with rim artifacts and smaller malignant lesions (< 20 mm) with air trapping artifacts. Furthermore, model accuracy was on average 10–12 percentage points lower when the analyses were performed only for lesions with rim artifacts, ripple artifacts, and/or air trapping artifacts than when they were performed for lesions without these artifacts.
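The approximate figures quoted in this paragraph follow from the values reported in the Results, as the short Python check below illustrates. The variable names are hypothetical; the numbers are the reported means, and the averaging across the four rate changes is an assumption about how the "approximately 50" figure was obtained.

```python
from statistics import mean

# Mean accuracy (with artifacts, without artifacts) reported for each algorithm.
accuracy = {"LASSO": (0.858, 0.965), "RF": (0.830, 0.952)}
accuracy_drop = {
    model: round(without - with_artifacts, 3)
    for model, (with_artifacts, without) in accuracy.items()
}

# Reported changes in correct classification rate (LASSO, RF).
rim_benign = (-0.613, -0.624)
small_air_trapping_malignant = (-0.458, -0.559)
mean_rate_change = round(mean(rim_benign + small_air_trapping_malignant), 2)
```

The accuracy differences come to roughly 0.11 and 0.12, and the four rate changes average about −0.56.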

Our findings could also be potentially explained by objective quantitative image features in an interpretable way. The SNR, CNR, and BCR values showed significantly different distributions between lesions with high misclassification probability and lesions with low misclassification probability in both the benign and malignant lesion groups. These results were also in line with the abovementioned findings and inferences. It is worth mentioning that the quantitative features may be associated with the presence of artifacts as well, so we did not include these features in our exploratory analyses. Benign lesions with high misclassification probability showed higher signal intensity after enhancement (Fig. 5A-C), while malignant lesions with high misclassification probability showed lower signal intensity (Fig. 5G-I). Several aspects could contribute to high lesion signal intensity, including the inherent characteristics of the lesion itself and external influential factors, which may further cause lesion misclassification by the radiomics models. Several quantitative studies of CEM have demonstrated that malignant lesions tend to show more obvious enhancement than benign lesions [33,34,35]. Some studies [36, 37] have noted that the enhancement intensity depends on the size of the tumor and is more obvious for larger lesions than for smaller lesions. In other words, larger benign lesions can also display strong enhancement, and smaller malignant lesions can also display slight enhancement. Furthermore, as reported by Yagil et al. [38], rim and ripple artifacts were the main artifacts commonly seen on CEM. Researchers [23, 39, 40] have stated that DES images are prone to rim artifacts of increased density as a result of radiation scattering. Additionally, BPE, which refers to the uptake of contrast medium by normal fibroglandular breast tissue [41, 42], may also add to the signal intensity of the lesions. In contrast, air trapping artifacts, which represent the presence of air and create a dark area due to incomplete contact between the skin and the detector or compression paddle [23, 24], may result in more neutral signal intensity.

Although some scholars have considered that some artifacts in CEM images might not compromise image quality [24, 38], we found that some artifacts in CEM images might affect the diagnostic performance of radiomics models, and other scholars [23, 43, 44] have proposed that some artifacts may present challenges to image interpretation. Therefore, it is still necessary to stress the importance of high-quality images. Neppalli et al. [45] reported that the type, incidence, and severity of CEM-specific artifacts differ between imaging device vendors. To date, several image-processing algorithms have been developed to reduce artifacts and improve image quality [46,47,48]. For example, scatter correction techniques are becoming commercially available [48], and rim artifacts are not present in the newer systems [24]. Furthermore, in addition to equipment- or technique-related measures, CEM-specific artifacts can also be alleviated by addressing patient- or technologist-related factors. Therefore, it is also important to use standard and appropriate protocols during image acquisition and perform regular quality control tests [49] to prevent or minimize these artifacts.

There are some limitations in our study. First, the relatively small sample size is the main limitation. A larger sample may help provide more information with the same accuracy. Second, radiomics features derived from CEM, in general, could have inherent limitations caused by the two-dimensional nature of the images and compression. Third, more homogeneous baseline characteristics between benign and malignant lesions may potentially help better interpret the results. To avoid bias, we used 100 rounds of cross-validation instead of a single round to obtain “averaged” classification results. Performing factor analysis separately for benign and malignant lesions could further limit the impact of unbalanced characteristics.

Conclusions

Our study found that large lesion size and the presence of rim and/or ripple artifacts were associated with misclassification of benign lesions, and that small lesion size and the presence of air trapping artifacts were associated with misclassification of malignant lesions. These results imply that the outputs of radiomics models could be less reliable when these influential factors are present. Based on these findings, several measures could help to build more accurate and interpretable radiomics classification models: alleviating artifacts with specific postprocessing algorithms [48], applying adequate compression of the breast [24], incorporating image information from around the lesion [50], and employing an adjusted algorithm that takes these influential factors into account.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available because they contain confidential information, but they are available from the corresponding author on reasonable request.

Abbreviations

AUC: area under the curve
BPE: background parenchymal enhancement
CC: craniocaudal
CEM: contrast-enhanced mammography
CI: confidence interval
DES: dual-energy subtraction
GLCM: gray-level cooccurrence matrix
GLRLM: gray-level run length matrix
GLSZM: gray-level size zone matrix
HE: high-energy
LASSO: least absolute shrinkage and selection operator
LE: low-energy
MLO: mediolateral oblique
OR: odds ratio
RF: random forest

References

  1. Liu Z, Wang S, Dong D, Wei J, Fang C, Zhou X, et al. The applications of radiomics in precision diagnosis and treatment of oncology: opportunities and challenges. Theranostics. 2019;9:1303–22.

  2. Rogers W, Thulasi Seetha S, Refaee TAG, Lieverse RIY, Granzier RWY, Ibrahim A, et al. Radiomics: from qualitative to quantitative imaging. Br J Radiol. 2020;93:20190948.

  3. Conti A, Duggento A, Indovina I, Guerrisi M, Toschi N. Radiomics in breast cancer classification and prediction. Semin Cancer Biol. 2020;72:238–50.

  4. Lee SH, Park H, Ko ES. Radiomics in breast imaging from techniques to clinical applications: a review. Korean J Radiol. 2020;21:779–92.

  5. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RGPM, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6.

  6. Kumar V, Gu Y, Basu S, Berglund A, Eschrich SA, Schabath MB, et al. Radiomics: the process and the challenges. Magn Reson Imaging. 2012;30:1234–48.

  7. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2015;278:563–77.

  8. Valdora F, Houssami N, Rossi F, Calabrese M, Tagliafico AS. Rapid review: radiomics and breast cancer. Breast Cancer Res Treat. 2018;169:217–29.

  9. Dromain C, Thibault F, Muller S, Rimareix F, Delaloge S, Tardivon A, et al. Dual-energy contrast-enhanced digital mammography: initial clinical results. Eur Radiol. 2011;21:565–74.

  10. Ghaderi KF, Phillips J, Perry H, Lotfi P, Mehta TS. Contrast-enhanced mammography: current applications and future directions. Radiographics. 2019;39:1907–20.

  11. Lalji UC, Jeukens CRLPN, Houben I, Nelemans PJ, van Engen RE, van Wylick E, et al. Evaluation of low-energy contrast-enhanced spectral mammography images by comparing them to full-field digital mammography using EUREF image quality criteria. Eur Radiol. 2015;25:2813–20.

  12. Francescone MA, Jochelson MS, Dershaw DD, Sung JS, Hughes MC, Zheng J, et al. Low energy mammogram obtained in contrast-enhanced digital mammography (CEDM) is comparable to routine full-field digital mammography (FFDM). Eur J Radiol. 2014;83:1350–5.

  13. Fanizzi A, Losurdo L, Basile TMA, Bellotti R, Bottigli U, Delogu P, et al. Fully automated support system for diagnosis of breast cancer in contrast-enhanced spectral mammography images. J Clin Med. 2019;8:891.

  14. Danala G, Patel B, Aghaei F, Heidari M, Li J, Wu T, et al. Classification of breast masses using a computer-aided diagnosis scheme of contrast enhanced digital mammograms. Ann Biomed Eng. 2018;46:1419–31.

  15. Patel BK, Ranjbar S, Wu T, Pockaj BA, Li J, Zhang N, et al. Computer-aided diagnosis of contrast-enhanced spectral mammography: a feasibility study. Eur J Radiol. 2018;98:207–13.

  16. Fusco R, Vallone P, Filice S, Granata V, Petrosino T, Rubulotta MR, et al. Radiomic features analysis by digital breast tomosynthesis and contrast-enhanced dual-energy mammography to detect malignant breast lesions. Biomed Signal Process Control. 2019;53:101568.

  17. Losurdo L, Fanizzi A, Basile TMA, Bellotti R, Bottigli U, Dentamaro R, et al. Radiomics analysis on contrast-enhanced spectral mammography images for breast cancer diagnosis: a pilot study. Entropy. 2019;21:1110.

  18. Lin F, Wang Z, Zhang K, Yang P, Ma H, Shi Y, et al. Contrast-enhanced spectral mammography-based Radiomics Nomogram for identifying benign and malignant breast lesions of Sub-1 cm. Front Oncol. 2020;10:573630.

  19. Limkin EJ, Sun R, Dercle L, Zacharaki EI, Robert C, Reuzé S, et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann Oncol. 2017;28:1191–206.

  20. Verma V, Simone CB 2nd, Krishnan S, Lin SH, Yang J, Hahn SM. The rise of Radiomics and implications for oncologic management. J Natl Cancer Inst. 2017;109.

  21. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA. ACR BI-RADS® atlas: breast imaging reporting and data system. Reston: American College of Radiology; 2013.

  22. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA. ACR BI-RADS Atlas. Breast Imaging Reporting and Data System 2013.

  23. Bhimani C, Li L, Liao L, Roth RG, Tinney E, Germaine P. Contrast-enhanced spectral mammography: modality-specific artifacts and other factors which may interfere with image quality. Acad Radiol. 2017;24:89–94.

  24. Nori J, Gill MK, Vignoli C, Bicchierai G, De Benedetto D, Di Naro F, et al. Artefacts in contrast enhanced digital mammography: how can they affect diagnostic image quality and confuse clinical diagnosis? Insights Imaging. 2020;11:16.

  25. Yuan G-X, Ho C-H, Lin C-J. An improved glmnet for l1-regularized logistic regression. J Machine Learn Res. 2012;13:1999–2030.

  26. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  27. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1.

  28. Maalouf M. Logistic regression in data analysis: an overview. Int J Data Analysis Techniques Strategies. 2011;3:281–99.

  29. Chen C, Liaw A, Breiman L. Using random forest to learn imbalanced data. Berkeley: University of California; 2004;110:24.

  30. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Machine Learn Res. 2011;12:2825–30.

  31. Fluss R, Faraggi D, Reiser B. Estimation of the Youden index and its associated cutoff point. Biometrical J. 2005;47:458–72.

  32. Chambers J. Software for data analysis: programming with R: Springer Science & Business Media; 2008.

  33. Rudnicki W, Heinze S, Niemiec J, Kojs Z, Sas-Korczynska B, Hendrick E, et al. Correlation between quantitative assessment of contrast enhancement in contrast-enhanced spectral mammography (CESM) and histopathology-preliminary results. Eur Radiol. 2019;29:6220–6.

  34. Deng CY, Juan YH, Cheung YC, Lin YC, Lo YF, Lin G, et al. Quantitative analysis of enhanced malignant and benign lesions on contrast-enhanced spectral mammography. Br J Radiol. 2018;91:20170605.

  35. Lv Y, Chi X, Sun B, Lin S, Xing D. Diagnostic value of quantitative gray-scale analysis of contrast-enhanced spectral mammography for benign and malignant breast lesions. J Comput Assist Tomogr. 2020;44:405–12.

  36. Dromain C, Vietti-Violi N, Meuwly JY. Angiomammography: a review of current evidences. Diagn Interv Imaging. 2019;100:593–605.

  37. Luczynska E, Niemiec J, Heinze S, Adamczyk A, Ambicka A, Marcyniuk P, et al. Intensity and pattern of enhancement on CESM: prognostic significance and its relation to expression of Podoplanin in tumor Stroma - a preliminary report. Anticancer Res. 2018;38:1085–95.

  38. Yagil Y, Shalmon A, Rundstein A, Servadio Y, Halshtok O, Gotlieb M, et al. Challenges in contrast-enhanced spectral mammography interpretation: artefacts lexicon. Clin Radiol. 2016;71:450–7.

  39. Lancaster RB, Gulla S, De Los SJ, Umphrey HR. Contrast-enhanced spectral mammography in breast imaging. Semin Roentgenol. 2018;53:294–300.

  40. James JJ, Tennant SL. Contrast-enhanced spectral mammography (CESM). Clin Radiol. 2018;73:715–23.

  41. Savaridas SL, Taylor DB, Gunawardana D, Phillips M. Could parenchymal enhancement on contrast-enhanced spectral mammography (CESM) represent a new breast cancer risk factor? Correlation with known radiology risk factors. Clin Radiol. 2017;72:1085.e1081–1085.e1089.

  42. Sogani J, Morris EA, Kaplan JB, D’Alessio D, Goldman D, Moskowitz CS, et al. Comparison of background parenchymal enhancement at contrast-enhanced spectral mammography and breast MR imaging. Radiology. 2016;282:63–73.

  43. Zamora K, Allen E, Hermecz B. Contrast mammography in clinical practice: current uses and potential diagnostic dilemmas. Clin Imaging. 2021;71:126–35.

  44. Gluskin J, Click M, Fleischman R, Dromain C, Morris EA, Jochelson MS. Contamination artifact that mimics in-situ carcinoma on contrast-enhanced digital mammography. Eur J Radiol. 2017;95:147–54.

  45. Neppalli S, Kessell MA, Madeley CR, Hill ML, Vlaskovsky PS, Taylor DB. Artifacts in contrast-enhanced mammography: are there differences between vendors? Clin Imaging. 2021;80:123–30.

  46. Knogler T, Homolka P, Hörnig M, Leithner R, Langs G, Waitzbauer M, et al. Contrast-enhanced dual energy mammography with a novel anode/filter combination and artifact reduction: a feasibility study. Eur Radiol. 2016;26:1575–81.

  47. Sistermanns M, Kowall B, Hörnig M, Beiderwellen K, Uhlenbrock D. Motion artifact reduction in contrast-enhanced dual-energy mammography - a multireader study about the effect of nonrigid registration as motion correction on image quality. Rofo. 2021;193:1183–8.

  48. Lu Y, Peng B, Lau BA, Hu Y-H, Scaduto DA, Zhao W, et al. A scatter correction method for contrast-enhanced dual-energy digital breast tomosynthesis. Phys Med Biol. 2015;60:6323–54.

  49. Sensakovic WF, Carnahan MB, Czaplicki CD, Fahrenholtz S, Panda A, Zhou Y, et al. Contrast-enhanced mammography: how does it work? Radiographics. 2021;41:829–39.

  50. Wang S, Sun Y, Li R, Mao N, Li Q, Jiang T, et al. Diagnostic performance of perilesional radiomics analysis of contrast-enhanced mammography for the differentiation of benign and malignant breast lesions. Eur Radiol. 2022;32:639–49.

Acknowledgements

The authors thank Boran Pang, MD, for providing technical support and inspiration in experimental design. His permission to be acknowledged was obtained. We also acknowledge funding and support from the Clinical Research Plan of SHDC (SHDC2020CR2008A), the National Natural Science Foundation of China (NSFC 82071878), Shanghai Science and Technology Foundation (19DZ1930502), Shanghai Anticancer Association EYAS PROJECT (SACA-CY20B01), Shanghai Anticancer Association FLIGHT PROJECT (SACA-AX-201903), and Shanghai Science and Technology Innovation Action Plan Medical Innovation Research Project (21Y11910200).

Funding

This project was supported by grants from the Clinical Research Plan of SHDC (SHDC2020CR2008A), the National Natural Science Foundation of China (NSFC 82071878), Shanghai Science and Technology Foundation (19DZ1930502), Shanghai Anticancer Association EYAS PROJECT (SACA-CY20B01), Shanghai Anticancer Association FLIGHT PROJECT (SACA-AX-201903), and Shanghai Science and Technology Innovation Action Plan Medical Innovation Research Project (21Y11910200).

Author information

Contributions

Yuqi Sun and Simin Wang contributed equally to this work. The two authors were responsible for study design (Y.S. and S.W.), data collection (S.W.), data analysis (Y.S.) and manuscript drafting (S.W. and Y.S.). All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Henry S. Lynn or Yajia Gu.

Ethics declarations

Ethics approval and consent to participate

Institutional Review Board approval was obtained.

Consent for publication

Consent forms were obtained.

The datasets generated and/or analyzed during the current study are not publicly available due to confidentiality but are available from the corresponding author on reasonable request.

Competing interests

One of the coauthors (S.D.) is an employee of General Electric (GE) Healthcare (Shanghai, China). No other potential conflict of interest relevant to this article has been reported.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

40644_2022_460_MOESM1_ESM.docx

Additional file 1.

40644_2022_460_MOESM2_ESM.docx

Additional file 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, Y., Wang, S., Liu, Z. et al. Identifying factors that may influence the classification performance of radiomics models using contrast-enhanced mammography (CEM) images. Cancer Imaging 22, 22 (2022). https://doi.org/10.1186/s40644-022-00460-8

  • DOI: https://doi.org/10.1186/s40644-022-00460-8

Keywords

  • Mammography
  • Breast Cancer
  • Radiomics
  • Artifact