Man or machine? Prospective comparison of the version 2018 EASL, LI-RADS criteria and a radiomics model to diagnose hepatocellular carcinoma

Background: The Liver Imaging Reporting and Data System (LI-RADS) and European Association for the Study of the Liver (EASL) criteria are widely used for diagnosing hepatocellular carcinoma (HCC). Radiomics additionally allows quantitative profiling of tumor heterogeneity. This study aimed to compare the diagnostic accuracies of the version 2018 (v2018) EASL and LI-RADS criteria and radiomics models for HCC in high-risk patients.

Methods: Ethical approval by the institutional review board and informed consent were obtained for this study. From July 2015 to September 2018, consecutive high-risk patients were enrolled at our tertiary care hospital and underwent gadoxetic acid-enhanced magnetic resonance (MR) imaging followed by hepatic surgery. We constructed a multi-sequence-based three-dimensional whole-tumor radiomics signature using a least absolute shrinkage and selection operator model and multivariate logistic regression analysis. The diagnostic accuracy of the radiomics signature was validated in an independent cohort and compared with that of the EASL and LI-RADS criteria as assessed by two independent radiologists.

Results: Two hundred twenty-nine pathologically confirmed nodules (173 HCCs; mean size, 5.74 ± 3.17 cm) in 211 patients were included. Of these patients, 201 (95%) were infected with hepatitis B virus (HBV). The sensitivity and specificity were 73 and 71% for the radiomics signature, 91 and 71% for the EASL criteria, and 86 and 82% for the LI-RADS criteria, respectively. The areas under the receiver operating characteristic curves (AUCs) of the radiomics signature (0.810), LI-RADS (0.841) and EASL criteria (0.811) were comparable.

Conclusions: In HBV-predominant high-risk patients, the multi-sequence-based MR radiomics signature and the v2018 EASL and LI-RADS criteria demonstrated comparable overall accuracies for HCC.


Background
Hepatocellular carcinoma (HCC) is the fifth most common malignancy and the second leading cause of cancer-related death worldwide [1]. Currently, all major clinical guidelines [2][3][4] recommend the noninvasive diagnosis of HCC based on characteristic imaging findings on computed tomography, magnetic resonance (MR) imaging and/or contrast-enhanced ultrasound.
With the advent of novel imaging techniques, HCC diagnostic criteria have been continuously updated to incorporate new imaging features on various modalities; among them, the European Association for the Study of the Liver (EASL) criteria are widely considered a reliable scheme [2]. However, many of these criteria lack clear lexicons for modality-specific imaging features [2,3]. The introduction of the Liver Imaging Reporting and Data System (LI-RADS) offered an opportunity to standardize the interpretation, reporting and data collection of imaging results in patients at risk for HCC [5]. However, the assessment of several LI-RADS features can be subjective, depending on radiologists' experience and familiarity with the system [6,7]. In addition, LI-RADS was developed and modified predominantly on the basis of Western data [2,4], so validation of the system in Asian cohorts is still needed.
Radiomics, which allows quantitative profiling of tumor behavior and heterogeneity by extracting high-throughput data with advanced image processing techniques [8], may improve the accuracy and reproducibility of HCC diagnosis. Previous studies have demonstrated the potential of radiomics in the diagnosis of focal liver lesions [9] and several other solid tumors [10][11][12]. However, evidence comparing the accuracies of radiomics models with those of existing HCC diagnostic criteria remains limited, and few studies have optimized radiomics models with a multidisciplinary approach.
Thus, the aim of this prospective single-center study was to develop a diagnostic radiomics model for HCC and to compare its accuracy with that of the version 2018 (v2018) LI-RADS [5] and EASL [2] criteria in high-risk patients, with surgical histopathologic examination as the reference standard. We also explored the diagnostic benefit of a refined radiomics-clinical model incorporating both radiomics features and predictive clinical markers.

Study cohort
Ethical approval by the institutional review board and informed consent from all patients were obtained for this prospective study before the start of patient enrollment. From July 2015 to September 2018, we enrolled consecutive adult patients with hepatitis B virus infection and/or cirrhosis at our tertiary care hospital to undergo gadoxetic acid (Gd-EOB-DTPA)-enhanced MR imaging. Patients were excluded if they i) had Child-Pugh class C disease; ii) had received any previous antitumoral treatment (e.g., locoregional, surgical or systemic); iii) had any contraindication to Gd-EOB-DTPA-enhanced MR imaging; iv) had inadequate image quality (e.g., substantial to severe arterial phase motion artifacts); v) did not undergo, or were not eligible for, liver resection or transplantation in our center; or vi) had an inconclusive histopathologic diagnosis.

Imaging protocols
All MR examinations were performed on a MAGNETOM Skyra 3.0 T MR scanner (Siemens Healthcare, Erlangen, Germany). Gd-EOB-DTPA (Primovist®; Bayer Schering Pharma AG, Berlin, Germany) was injected at a dose of 0.025 mmol/kg and a rate of 2 ml/s. Detailed acquisition parameters are provided in Additional file 1: Supplementary material and Table S1.
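The weight-based dosing translates into an injection volume and duration with simple arithmetic, sketched below. The sketch assumes a Gd-EOB-DTPA concentration of 0.25 mmol/ml (the labeled Primovist concentration, which the text does not state) and a hypothetical 70 kg patient; only the 0.025 mmol/kg dose and the 2 ml/s rate come from the protocol.

```python
# Assumption: Primovist (Gd-EOB-DTPA) is supplied at 0.25 mmol/ml; this value
# comes from product labeling, not from the study text.
CONCENTRATION_MMOL_PER_ML = 0.25
DOSE_MMOL_PER_KG = 0.025        # weight-based dose stated in the protocol
INJECTION_RATE_ML_PER_S = 2.0   # injection rate stated in the protocol


def contrast_volume_ml(weight_kg: float) -> float:
    """Volume of contrast agent (ml) needed for a given body weight."""
    total_mmol = DOSE_MMOL_PER_KG * weight_kg
    return total_mmol / CONCENTRATION_MMOL_PER_ML


def injection_duration_s(weight_kg: float) -> float:
    """Time (s) needed to deliver that volume at the fixed injection rate."""
    return contrast_volume_ml(weight_kg) / INJECTION_RATE_ML_PER_S


if __name__ == "__main__":
    weight = 70.0  # hypothetical patient weight in kg
    print(f"{contrast_volume_ml(weight):.1f} ml over {injection_duration_s(weight):.1f} s")
```

For the hypothetical 70 kg patient this prints `7.0 ml over 3.5 s`, i.e. 0.1 ml/kg under the stated concentration assumption.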

Image analysis
Qualitative analysis
All MR imaging analyses were performed independently by two abdominal radiologists (with 10 and 4 years of experience in liver imaging, respectively) who were blinded to the other imaging results, all clinical information and the final pathological diagnoses. Before the start of image analysis, both reviewers received at least 2 months of intensive hands-on instruction in applying EASL v2018 and LI-RADS v2018 to Gd-EOB-DTPA-enhanced MR imaging.
According to the EASL v2018 criteria [2], observations were diagnosed as HCC if they displayed arterial phase hyperenhancement combined with washout, with washout assessed on the portal venous phase only. According to the LI-RADS v2018 criteria [5], each observation was assigned an LR category by navigating the diagnostic algorithm stepwise, using all major, ancillary and LR-M features. LR-4 V, LR-5 V and LR-MV were defined as LR-TIV contiguous with LR-4, LR-5 and LR-M lesions, respectively. All patient images were presented to the reviewers in random order, and both reviewers were asked to leave an interval of at least 1 month between the LI-RADS v2018 and EASL v2018 evaluations. Disagreements regarding LR categorization and HCC diagnosis were resolved by consensus with a senior abdominal radiologist with over 30 years of liver imaging experience.
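The EASL decision rule and the tumor-in-vein labeling described above can be expressed as small predicates. This is a hedged sketch: the `Observation` record and its field names are hypothetical, the full LI-RADS major/ancillary-feature algorithm is not reproduced (its output category is taken as an input), and mapping TIV contiguous with any other category to plain `LR-TIV` is an assumption. The positive LI-RADS call uses LR-5/LR-5 V, as in the comparisons reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-observation reading; field names are illustrative only."""
    arterial_hyperenhancement: bool  # APHE on the arterial phase
    portal_venous_washout: bool      # washout assessed on the portal venous phase only
    lr_category: str                 # output of the full LI-RADS algorithm, e.g. "LR-5"
    tumor_in_vein: bool              # LR-TIV contiguous with the observation


def easl_v2018_hcc(obs: Observation) -> bool:
    """HCC call as described in the text: APHE combined with portal venous washout."""
    return obs.arterial_hyperenhancement and obs.portal_venous_washout


# Tumor in vein contiguous with these categories receives a combined label.
_TIV_LABELS = {"LR-4": "LR-4 V", "LR-5": "LR-5 V", "LR-M": "LR-MV"}


def final_lr_label(obs: Observation) -> str:
    """Attach the tumor-in-vein suffix; other categories with TIV fall back to LR-TIV (assumption)."""
    if not obs.tumor_in_vein:
        return obs.lr_category
    return _TIV_LABELS.get(obs.lr_category, "LR-TIV")


def lirads_hcc_positive(label: str) -> bool:
    """Positive LI-RADS call used for the comparisons: LR-5 or LR-5 V."""
    return label in {"LR-5", "LR-5 V"}


if __name__ == "__main__":
    obs = Observation(True, True, "LR-5", tumor_in_vein=True)
    label = final_lr_label(obs)
    print(easl_v2018_hcc(obs), label, lirads_hcc_positive(label))
```

Keeping the LI-RADS category as an input keeps the sketch honest: only the final labeling and the two positivity rules are encoded here.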

Radiomics analysis
3D regions of interest were manually delineated along the entire tumor margin on T2-weighted, T1-weighted in-/opposed-phase, unenhanced, arterial phase, portal venous phase and hepatobiliary phase images, avoiding major vessels and any marked necrotic areas, with the 3D segmentation software ITK-SNAP [13] (version 3.6.0-RC1; http://www.itk-snap.org). The free-hand outlines were drawn independently by the two radiologists who performed the qualitative image analysis.
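Because both radiologists segmented every tumor, each radiomics feature has two readings per nodule, and its inter-reader stability can be screened with an intraclass correlation coefficient (ICC) against the 0.80 threshold used in this study. The text does not state which ICC form was used; this sketch assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss. Function names and toy values are hypothetical.

```python
def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per lesion, one column per reader.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def stable_features(feature_ratings, threshold=0.80):
    """Names of features whose inter-reader ICC exceeds the threshold."""
    return [name for name, r in feature_ratings.items() if icc_2_1(r) > threshold]


if __name__ == "__main__":
    toy = {
        "feature_identical": [[1, 1], [2, 2], [3, 3]],  # readers agree exactly
        "feature_shifted": [[1, 2], [2, 3], [3, 4]],    # reader 2 is systematically higher
    }
    print(stable_features(toy))
```

Because ICC(2,1) measures absolute agreement, a constant offset between readers lowers it (to 2/3 in the toy data), so only the first toy feature is kept.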
Radiomics analysis was performed with in-house texture analysis algorithms using the nonpublic 3D research software Analysis Kit (version 3.0.1.A, GE Healthcare, China). To standardize the data across all MR images, signal intensities were normalized to a common level by modifying the formulas of the original radiomics features. To account for pixel size, we applied a wavelet transformation and recalculated all features. Feature discretization, with bin size as the key parameter, was a critical step in standardizing feature extraction because it substantially affects the values of the radiomics features. A total of 396 radiomics features in the histogram, gray-level co-occurrence matrix, run-length matrix, gray-level size zone matrix, form factor and Haralick categories were extracted from each MR image.
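The discretization and co-occurrence steps can be illustrated with a minimal 2D sketch. Analysis Kit is not public, so this is not its implementation. The sketch assumes fixed-bin-width discretization, a single horizontal pixel offset and a symmetric co-occurrence matrix, and computes only one GLCM feature (contrast), whereas the study extracted features from 3D volumes.

```python
import math
from collections import Counter


def discretize(image, bin_width):
    """Fixed-bin-width discretization: map raw intensities to gray levels 1, 2, 3, ..."""
    lowest = min(min(row) for row in image)
    return [[math.floor((v - lowest) / bin_width) + 1 for v in row] for row in image]


def glcm(levels, dx=1, dy=0, symmetric=True):
    """Normalized co-occurrence probabilities p(i, j) for one pixel offset (dx, dy)."""
    counts = Counter()
    n_rows, n_cols = len(levels), len(levels[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = levels[r][c], levels[r2][c2]
                counts[(i, j)] += 1
                if symmetric:
                    counts[(j, i)] += 1  # count the reverse direction as well
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """GLCM contrast: sum of (i - j)^2 * p(i, j); zero for a uniform region."""
    return sum((i - j) ** 2 * prob for (i, j), prob in p.items())


if __name__ == "__main__":
    image = [[0, 0, 10, 10], [0, 0, 10, 10]]  # toy 2D intensity patch
    levels = discretize(image, bin_width=10)
    print(levels, glcm_contrast(glcm(levels)))
```

Changing `bin_width` changes the number of gray levels and therefore the feature values, which is why the text treats bin size as a key standardization parameter.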

Construction and validation of the radiomics models
All nodules were randomly divided into a training cohort (137 nodules [60%] in 133 patients) and a validation cohort (92 nodules [40%] in 78 patients) using a repeated stratified splitting method to reduce the selection bias associated with a single validation dataset. In multivariate analysis, the number of events should be at least 10 times the number of included covariates [14]. Therefore, we applied the least absolute shrinkage and selection operator (LASSO) model [15] with 10-fold cross-validation to select the radiomics features with the strongest diagnostic power in the training dataset. Radiomics features with an intraclass correlation coefficient above 0.80 between the two reviewers were considered stable and entered into further radiomics model construction [16]. A radiomics score (Rad-score) for each MR sequence was calculated as a linear combination of the selected radiomics features weighted by the corresponding LASSO regression coefficients:

$$\text{Rad-score} = \sum_{n} a_n X_n + b$$

where $a_n$ is the LASSO regression coefficient of variable $n$, $X_n$ is the value of variable $n$ determined from the input MR image and $b$ is the intercept. A summarized Rad-score of all sequences was generated as a linear combination of the Rad-score of each sequence, weighted by its logistic regression coefficient, to construct the diagnostic radiomics signature. The radiomics signature was further integrated with clinical markers that were independently predictive of HCC in the training cohort to formulate a radiomics-clinical nomogram using multivariate logistic regression analysis. The performances of the radiomics signature and radiomics-clinical nomogram were evaluated in the validation cohort (Fig. 2).
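The feature-selection and scoring steps can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it solves a squared-error LASSO by cyclic coordinate descent with a fixed penalty `lam`, whereas the study tuned its LASSO model by 10-fold cross-validation, and its loss function and software are not stated. `lasso_fit` and `rad_score` are hypothetical names; `rad_score` evaluates the selected feature values weighted by their coefficients plus the intercept b.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator; it is what produces exact zeros in LASSO."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=500):
    """Squared-error LASSO by cyclic coordinate descent.

    Minimizes (1 / 2n) * sum_i (y_i - b - sum_n a_n X_in)^2 + lam * sum_n |a_n|
    with an unpenalized intercept b. Returns (coefficients, intercept).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering lets the intercept be recovered after fitting the slopes.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    a = [0.0] * p
    residual = yc[:]  # equals yc - Xc @ a because a starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + a[j] * Xc[i][j]) for i in range(n)) / n
            new_aj = soft_threshold(rho, lam) / z[j]
            delta = new_aj - a[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                a[j] = new_aj
    intercept = y_mean - sum(a[j] * x_means[j] for j in range(p))
    return a, intercept


def rad_score(features, coefs, intercept):
    """Rad-score = sum_n a_n * X_n + b for one lesion's selected feature values."""
    return sum(a_n * x_n for a_n, x_n in zip(coefs, features)) + intercept


if __name__ == "__main__":
    X = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]  # toy feature matrix
    y = [2.0, 4.0, 6.0, 8.0]                                # toy outcome
    coefs, b = lasso_fit(X, y, lam=0.5)
    print(coefs, b, rad_score([3.0, 1.0], coefs, b))
```

In the toy data the second feature is uncorrelated with the outcome and receives a coefficient of exactly zero; a large enough `lam` zeroes every coefficient, which is how LASSO discards uninformative features.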

Reference standard
Histopathologic examination of the resected or explanted liver was used as the reference standard for all lesions. Two experienced pathologists (with 8 years and over 20 years of experience in liver oncology, respectively), who were aware of the clinical data and imaging results for co-localization of the target lesions, independently performed gross and histologic analyses of all resected or explanted specimens. All disagreements were resolved by consensus. Histopathologic diagnoses of the hepatic lesions were established according to the World Health Organization classification [17].
Per-lesion diagnostic performance was assessed by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and receiver operating characteristic (ROC) analysis. Diagnostic measures were compared with the McNemar test or the method described by DeLong et al. [18], as applicable. The diagnostic accuracies of the EASL and LI-RADS criteria were compared in the combined cohort comprising all patients, whereas all comparisons between the radiomics signature and the EASL or LI-RADS criteria were made in the validation cohort.
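The per-lesion measures and a paired McNemar comparison can be sketched as follows. The text does not say whether the exact (binomial) or chi-square form of the McNemar test was used; this sketch assumes the exact form, and all function names and toy labels are hypothetical. Specificity would be compared the same way, using the discordant calls among non-HCC lesions.

```python
from math import comb


def diagnostic_measures(truth, predicted):
    """Sensitivity, specificity, PPV and NPV from per-lesion labels (truthy = HCC)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def compare_sensitivity(truth, pred_a, pred_b):
    """Paired sensitivity comparison using discordant calls on HCC lesions only."""
    b = sum(1 for t, x, y in zip(truth, pred_a, pred_b) if t and x and not y)
    c = sum(1 for t, x, y in zip(truth, pred_a, pred_b) if t and y and not x)
    return mcnemar_exact_p(b, c)


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0]      # toy reference standard
    predicted = [1, 1, 0, 0, 1]  # toy calls from one method
    print(diagnostic_measures(truth, predicted))
```

Only discordant lesions enter the McNemar test, which is why it suits paired readings of the same nodules by two methods.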
All statistical analyses were performed with R software, version 3.3.1 (The R Foundation for Statistical Computing, Vienna, Austria). P values for multiple comparisons were adjusted by the Bonferroni method, and p < 0.05 was considered statistically significant.
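The Bonferroni adjustment multiplies each p value by the number of comparisons in the family (capping the result at 1), which is equivalent to testing the raw p values against 0.05 divided by that number. The sketch uses hypothetical p values; how the authors grouped comparisons into families is not stated.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values, alpha=0.05):
    """Flag comparisons whose adjusted p value falls below alpha."""
    return [p < alpha for p in bonferroni(p_values)]


if __name__ == "__main__":
    raw = [0.01, 0.02, 0.5]  # hypothetical raw p values from three comparisons
    print(bonferroni(raw), significant(raw))
```

With three comparisons, a raw p value of 0.02 is no longer significant after adjustment (0.06), whereas 0.01 remains significant (0.03).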
Among the included patients, 201 (95%) were infected with HBV. No difference in the proportions of nodule types (HCC, non-HCC malignancy and non-HCC benign lesion) or in any demographic, clinical or biological characteristic was detected between the training and validation cohorts (p > 0.05 for all). Table 2 summarizes the interrater reliability results for EASL v2018 and the different LI-RADS categories for all 229 nodules. Agreement between the two reviewers was substantial for LI-RADS categorization (κ = 0.7437), for the combination of LR-5/LR-5 V (κ = 0.6542), for the combination of LR-4/LR-4 V/LR-5/LR-5 V (κ = 0.7109) and for the EASL v2018 results (κ = 0.6809).
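Interrater agreement on categorical readings can be computed as sketched below. The sketch assumes unweighted Cohen's kappa (the text does not state whether a weighted variant was used for the ordinal LI-RADS categories). The verbal bands follow the commonly used Landis and Koch scale, which matches the "substantial" wording for the reported κ values. The reader labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers' categorical labels."""
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement from each reader's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if expected == 1.0:
        return 1.0  # kappa is undefined here; this sketch returns 1.0 by convention
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Verbal agreement label following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    a = ["LR-5", "LR-5", "LR-4", "LR-3"]  # hypothetical reader 1
    b = ["LR-5", "LR-4", "LR-4", "LR-3"]  # hypothetical reader 2
    k = cohens_kappa(a, b)
    print(round(k, 4), landis_koch(k))
```

In the toy example the readers agree on 3 of 4 observations, but because 5/16 agreement is expected by chance, kappa is 7/11 (about 0.64).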

Interrater agreement assessment
Agreement was substantial to almost perfect for all LI-RADS major features and most ancillary and tiebreaking features (Table S2). Agreement was not evaluated for nodule size or growth, because these were provided to the reviewers.

Construction and validation of the radiomics models
After LASSO regression analysis in the training dataset, 18 features with nonzero regression coefficients were selected from T1-weighted in-phase, opposed-phase, arterial phase and portal venous phase images and T2-weighted images (Additional file 3: Table S3). After multivariate logistic regression analysis, the summarized Rad-score (Fig. 3a), which captures the radiomics information of all predictive sequences, was generated as:
Serum AFP (p < 0.001), HBsAg (p = 0.01), AST (p = 0.046), IBIL (p < 0.001) and ALB (p = 0.049) were significantly predictive of HCC in multivariate logistic regression analysis of the training dataset and were combined with the Rad-score to formulate a radiomics-clinical nomogram (Fig. 3c).
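Combining per-sequence Rad-scores into a summarized Rad-score amounts to an ordinary logistic regression whose linear predictor is the summarized score. The sketch below fits one by batch gradient descent in plain Python. The learning rate, iteration count and toy data are hypothetical, and R (used for all statistical analyses here) would normally fit such a model by maximum likelihood with iteratively reweighted least squares. The fitted coefficients shown are properties of the toy data only.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights w and intercept b by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def summarized_rad_score(sequence_scores, w, b):
    """Summarized Rad-score: intercept plus per-sequence Rad-scores weighted by w."""
    return b + sum(ws * s for ws, s in zip(w, sequence_scores))


if __name__ == "__main__":
    # Toy per-sequence Rad-scores (two sequences) and labels (1 = HCC).
    X = [[-1.0, 0.0]] * 3 + [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2 + [[0.0, -1.0]] * 2
    y = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    w, b = fit_logistic(X, y)
    print([round(v, 3) for v in w], round(b, 3))
```

For these toy data the maximum-likelihood weights are ln 2 for the first sequence and 0 for the second (which carries no class information), and gradient descent converges to them.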
Diagnostic accuracy of the radiomics models, EASL and LI-RADS criteria
Table 3 summarizes the diagnostic performances of the radiomics model, EASL and LI-RADS v2018 criteria by consensus.

The radiomics models
The AUCs of the radiomics signature were 0.861 and 0.810 in the training and validation cohorts, respectively (Fig. 3b). The corresponding AUCs of the radiomics-clinical nomogram were 0.982 and 0.866. In the validation cohort, the sensitivity, specificity, PPV and NPV were 73, 77, 91 and 47%, respectively, for the radiomics signature and 77, 68, 89 and 48%, respectively, for the radiomics-clinical model. No difference was detected in any paired diagnostic measure between the radiomics signature and the radiomics-clinical model in the validation cohort (Fig. 3d), or for the radiomics signature between the training and validation cohorts (Fig. 3b).
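An AUC can be computed directly as the Mann-Whitney probability that a randomly chosen HCC receives a higher score than a randomly chosen non-HCC lesion, with ties counted as one half. The sketch below is a generic implementation with hypothetical scores. For a binary rule, such as a yes/no HCC call, the same formula reduces to (sensitivity + specificity) / 2.

```python
def empirical_auc(pos_scores, neg_scores):
    """Empirical ROC AUC: P(positive outscores negative), ties counted as 1/2."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    hcc = [0.9, 0.8, 0.4]  # hypothetical scores for HCC lesions
    non_hcc = [0.3, 0.5]   # hypothetical scores for non-HCC lesions
    print(empirical_auc(hcc, non_hcc))
```

In the toy example, 5 of the 6 HCC/non-HCC pairs are correctly ordered, giving an AUC of 5/6.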

Comparisons between the radiomics signature, the EASL and LI-RADS criteria
Diagnostic results for LR-5/LR-5 V were used to represent LI-RADS v2018 performance. After p value adjustment for multiple comparisons, the v2018 EASL and LI-RADS criteria yielded comparable diagnostic accuracies for HCC irrespective of underlying cirrhosis or lesion size. In the validation cohort, EASL v2018 demonstrated significantly higher sensitivity than the radiomics signature in all nodules (p = 0.01), in cirrhotic livers (p = 0.01) and in nodules ≤2 cm (p = 0.03). In non-cirrhotic livers, the radiomics signature was more specific than the EASL (p = 0.01) and LI-RADS (p = 0.045) criteria. The AUCs of all three diagnostic approaches were comparable in the validation dataset.
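Paired AUC comparisons like these follow the DeLong et al. method cited in the statistical analysis. The sketch below is a plain-Python rendering of that structural-components approach for two correlated AUCs measured on the same lesions. Names and toy scores are hypothetical, and an established implementation would normally be preferred in practice (for example, `roc.test(..., method = "delong")` in the R package pROC).

```python
import math


def _psi(x, y):
    """Kernel: 1 if the positive outscores the negative, 1/2 on a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """DeLong structural components for positives (V10) and negatives (V01)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def delong_paired(pos_a, neg_a, pos_b, neg_b):
    """Two-sided DeLong test for two correlated AUCs on the same lesions.

    pos_*/neg_* must list scores for the same positive/negative lesions in the same order.
    Returns (auc_a, auc_b, p_value).
    """
    v10_a, v01_a = _components(pos_a, neg_a)
    v10_b, v01_b = _components(pos_b, neg_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    m, n = len(pos_a), len(neg_a)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0.0:
        raise ValueError("variance of the AUC difference is zero; z statistic undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value


if __name__ == "__main__":
    pos_a, neg_a = [0.9, 0.8, 0.7, 0.4], [0.3, 0.5, 0.2]  # toy method A scores
    pos_b, neg_b = [0.6, 0.9, 0.2, 0.5], [0.4, 0.1, 0.7]  # toy method B, same lesions
    print(delong_paired(pos_a, neg_a, pos_b, neg_b))
```

The covariance term between the two methods is what makes the test paired: correlated readings of the same nodules shrink the variance of the AUC difference relative to an unpaired comparison.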

Discussion
Both updated in 2018, the EASL and LI-RADS criteria are currently the most widely used diagnostic criteria for HCC. However, concerns have been raised about the applicability of both criteria to Asian cohorts and to hepatobiliary-specific contrast agents. Advances in radiomics have improved the quantification of tumor heterogeneity and may assist in liver lesion characterization [9]. In this prospective study, we found that the multi-sequence-based MR radiomics signature, LI-RADS v2018 and EASL v2018 demonstrated comparable diagnostic accuracies for HCC in high-risk patients. First, we constructed the multi-sequence-based MR radiomics signature in the training cohort and compared its diagnostic accuracy with that of the EASL and LI-RADS criteria exclusively in the validation cohort to minimize the effect of overfitting. The AUCs of the radiomics signature were similar to those of the EASL and LI-RADS criteria irrespective of lesion size and the presence of underlying cirrhosis. Notably, in non-cirrhotic patients, the radiomics signature demonstrated 100% specificity, which was significantly higher than that of both the EASL (p = 0.008) and LI-RADS (p = 0.045) criteria, with an excellent AUC of 0.923. Because chronic HBV infection is currently the leading risk factor for HCC in Asian countries [3], and in this setting many HCCs can develop without cirrhosis, the radiomics signature may help increase diagnostic specificity and overall accuracy in these patients. However, the radiomics signature was less sensitive than the EASL criteria, particularly in cirrhotic livers and for lesions ≤2 cm. This may be because radiomics signatures derived from small lesions often cannot reliably capture sufficient biological information, as many such lesions have not yet fully developed [19].
Extracted from clinical radiologic images, radiomics features can reflect the gene expression profiles of HCC [20] and reveal key phenotypic characteristics, including tumor growth and vascular invasion [21][22][23]. In our multi-sequence-based radiomics signature, most selected imaging features belonged to the gray-level co-occurrence matrix (61%, 11/18) and run-length matrix (28%, 5/18) categories. Gray-level co-occurrence matrix parameters depict tumor texture through the spatial relationships between pixels [24]. Run-length matrix features quantify runs of consecutive voxels sharing the same gray level within complex 3D structures and have been reported to indicate HCC aggressiveness on Gd-EOB-DTPA-enhanced MR imaging [19]. However, the one-to-one correlations between individual radiomics features and complex tumor biological processes remain unclear and need to be explored in further studies. Interestingly, the radiomics-clinical model incorporating predictive clinical markers showed no diagnostic benefit over the radiomics signature alone. This finding highlights the central role of imaging in the HCC diagnostic workflow and suggests that clinical markers may add limited information for liver lesion characterization in high-risk patients.
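To make the run-length idea concrete, the sketch below builds a horizontal gray-level run-length matrix for a small 2D array of discretized gray levels and computes short-run and long-run emphasis. It uses the generic textbook definitions, not the (non-public) Analysis Kit implementation, and it considers one direction in 2D only, whereas the study used 3D volumes. Higher short-run emphasis indicates finer, more heterogeneous texture.

```python
from collections import Counter


def run_length_matrix(levels):
    """Count horizontal runs: key (gray_level, run_length) -> number of runs."""
    rlm = Counter()
    for row in levels:
        run_value, run_len = row[0], 1
        for v in row[1:]:
            if v == run_value:
                run_len += 1
            else:
                rlm[(run_value, run_len)] += 1
                run_value, run_len = v, 1
        rlm[(run_value, run_len)] += 1  # close the last run in the row
    return rlm


def short_run_emphasis(rlm):
    """Mean of 1 / run_length^2 over all runs; large for fine texture."""
    total_runs = sum(rlm.values())
    return sum(n / length ** 2 for (_, length), n in rlm.items()) / total_runs


def long_run_emphasis(rlm):
    """Mean of run_length^2 over all runs; large for coarse, homogeneous texture."""
    total_runs = sum(rlm.values())
    return sum(n * length ** 2 for (_, length), n in rlm.items()) / total_runs


if __name__ == "__main__":
    levels = [[1, 1, 2, 2], [1, 2, 1, 2]]  # toy discretized gray levels
    rlm = run_length_matrix(levels)
    print(dict(rlm), short_run_emphasis(rlm), long_run_emphasis(rlm))
```

The first toy row contributes two runs of length 2 and the second row four runs of length 1, giving a short-run emphasis of 0.75 and a long-run emphasis of 2.0.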
We then compared the performance of the EASL and LI-RADS criteria in the combined cohort comprising all patients. Both criteria demonstrated similar diagnostic accuracies irrespective of lesion size and underlying cirrhosis status, in line with the study of Ronot et al. [25]. However, although both EASL and LI-RADS were developed and modified to be nearly 100% specific, both showed relatively low specificities in our study. These results are not in accordance with previous studies [25][26][27][28], in which the specificities of earlier versions of the EASL and LI-RADS criteria reached 87.6-98.6% [25,26] and 83.6-100% [25][26][27][28], respectively. We therefore explored the origins of the limited specificities on a per-lesion level. Among all false-positive cases, 9 (Fig. 4) were misclassified by both the EASL and LI-RADS criteria (cHCC-CCA: n = 3; ICCA: n = 2; neuroendocrine tumor: n = 2; inflammatory pseudotumor: n = 1; angioleiomyolipoma: n = 1), 7 exclusively by the EASL criteria (ICCA: n = 5; cHCC-CCA: n = 1; dysplastic nodule: n = 1) and 1 exclusively by the LI-RADS criteria (ICCA). Of the false-positive lesions misdiagnosed exclusively by the EASL criteria, 86% (6/7) showed the "targetoid appearance", a target-like imaging morphology resulting from a highly cellular peripheral area surrounding central fibrotic/ischemic stroma, as defined by the LI-RADS criteria [5]. This feature is highly indicative of ICCA, cHCC-CCA and other non-HCC malignancies. In our study, the "targetoid appearance" was significantly more common in non-HCC malignancies (75.0%) than in HCCs (7.5-9.8%) (both p < 0.001), as previously reported [7,29]. Thus, a possible approach to improve the specificity of the EASL criteria for HCC is to eliminate the effect of the "targetoid appearance" in the diagnostic algorithm.

Fig. 4 Gd-EOB-DTPA-enhanced MR images of a 47-year-old man with chronic HBV infection and pathologically proven cirrhosis. The unenhanced image (a) shows a hypointense mass predominantly in segment VI. The mass demonstrates typical arterial phase (b) hyperenhancement (not rim), portal venous phase (c) washout and moderate T2 hyperintensity (e). No targetoid appearance is identified on hepatobiliary phase (d) or diffusion-weighted (f, b = 1200 s/mm²) images. Note the peritumoral corona enhancement pattern (b, white arrowheads) in the arterial phase, due to venous drainage from the tumor. The mass was histopathologically proven to be intrahepatic cholangiocarcinoma (g, hematoxylin-eosin staining, 200 × magnification). Immunohistochemical staining is positive for cytokeratin 19 (h, 200 × magnification). Serum alpha-fetoprotein (4.91 ng/ml) and carbohydrate antigen 19-9 (17.44 U/ml) levels were within the normal range.
However, neither the EASL nor the LI-RADS criteria demonstrated satisfactory specificity even after eliminating the effect of the "targetoid appearance", particularly in differentiating between HCC and non-HCC malignancies in cirrhotic patients. One possible explanation is that 49% (112/229) of the included lesions were >5 cm. Because larger lesions are more likely to show marked intratumoral heterogeneity and atypical imaging features, their differential diagnosis can be particularly challenging owing to considerable clinical and imaging overlap. In subgroup analysis, the specificities of both the EASL and LI-RADS criteria were lowest in nodules >5 cm, which may have substantially affected the overall diagnostic results. Another likely explanation for the limited specificities is that 64% (134/211) of the included patients were cirrhotic, and small duct type ICCAs and cHCC-CCAs can mimic HCCs in cirrhotic patients [30][31][32]. Similarly, Choi et al. reported a relatively low specificity (87%) for LI-RADS v2017 in differentiating between HCC, ICCA and cHCC-CCA in HBV-predominant patients [32]. As both EASL and LI-RADS were developed in Western countries, where hepatitis C virus infection is the most important risk factor for HCC [2,4], neither criterion may adequately address the diagnostic dilemma posed by these mimickers in patients with chronic HBV infection.
In summary, the radiomics signature demonstrated an AUC for HCC comparable to those of the v2018 EASL and LI-RADS criteria but significantly higher specificity in non-cirrhotic patients, which may be clinically beneficial for patients with chronic HBV infection. However, its sensitivity was limited, and its diagnostic results are difficult to interpret. In addition, radiomics results are prone to overfitting and are influenced by variations in image acquisition and modality [33,34]. Thus, a key requirement for applying radiomics in daily clinical practice is the optimal acquisition and integration of curated data in a standardized and reproducible manner.
The EASL criteria are currently among the most widely used diagnostic criteria for HCC. They are sensitive for small lesions, easy to apply and do not require advanced imaging techniques. However, their accuracy may be restricted by relatively low specificity. LI-RADS enables HCC probability assessment by integrating various imaging features with standardized interpretation and reporting. However, the diagnostic performance of LI-RADS was suboptimal in our HBV-predominant cohort. Apart from the geographical differences in HCC between Western and Eastern cohorts, another possible explanation for this suboptimal performance is that LI-RADS was designed predominantly for MR imaging with extracellular contrast agents rather than Gd-EOB-DTPA. Therefore, further tailoring of the system for Asian cohorts imaged with Gd-EOB-DTPA is necessary to optimize patient management. In addition, all LI-RADS ancillary features are optional and weighted equally, although some (e.g., hepatobiliary phase hypointensity and restricted diffusion) may merit more emphasis or weighting [35]. Notably, combining LR-4 with LR-5 [26,27] might improve the sensitivity of LI-RADS in Eastern cohorts.
This study has several limitations. First, the consecutive prospective cohort included limited numbers of non-HCC lesions and small HCCs. The small sample sizes of these categories of hepatic nodules might have introduced substantial selection bias into our diagnostic results. This is because only patients with reliable pathological results were included, and many patients with small HCCs or non-HCC lesions were excluded because they were not candidates for surgery (e.g., some patients with non-HCC benign lesions), received alternative therapies (e.g., ablation for small HCCs) or did not have conclusive histopathologic results. A different study design, such as one using either histopathologic diagnosis or imaging follow-up as the reference standard, might provide a larger number of these lesions. Second, we did not perform multicenter external validation of the radiomics models because of marked variations in MR imaging protocols and surgical procedures across centers. To mitigate this limitation, we assessed the performance of the radiomics-clinical model in an independent validation cohort from our center. However, further prospective studies with large-scale multicenter external validation are warranted to assess the reproducibility and generalizability of our findings.

Conclusions
In HBV-predominant high-risk patients, the multi-sequence-based MR radiomics signature was significantly more specific for HCC than the v2018 EASL and LI-RADS criteria in non-cirrhotic patients. However, the radiomics signature was less sensitive than the v2018 EASL criteria. The overall accuracies of the three diagnostic approaches were comparable.