Patients
This retrospective study was approved by our institutional review board and was conducted by searching for electronic medical records. A total of 1076 patients who underwent tumor resection or ablation at our institution with histopathologically confirmed HCC were recruited from January 2010 to September 2015. Figure 1 depicts the patient selection flow diagram. The inclusion criteria were as follows: (1) patients who had tumor resection or ablation with curative intent between January 2010 to September 2015 and (2) those who had preoperative CT performed within one month before treatment. Patients were excluded from the study if they met the following criteria: (1) those with a history of previous HCC treatment or a combination of other malignancies (n = 397); (2) those who received a combination of other anti-tumor treatments (n = 55), such as transarterial chemoembolization (TACE), targeting therapy, radiotherapy, and so on, or palliative care (n = 33); (3) patients who lacked digital CT imaging data or patients who did not undergo pretreatment CT 1 month before tumor resection or ablation (n = 200); (4) those with major thrombosis in a branch of the portal vein, hepatic vein thrombosis, or abdominal lymph node metastasis or distant metastases that were confirmed with pathology or imaging (n = 167); or (5) those who were followed up for less than 2 years (n = 68). Therefore, the final study population included 156 patients. The entire cohort was randomly divided into a training dataset (109 cases) and validation dataset (47 cases) by a ratio of 7:3. The training dataset was used to compose models that were evaluated by the validation dataset.
Follow-up surveillance after tumor resection or ablation
Our post-treatment tumor surveillance program consisted of physical examinations and laboratory tests, including tests for serum alpha-fetoprotein (AFP), performed 1 month after surgery and then every 3 months thereafter. In addition, abdominal CECT, CEMR or CEUS imaging was performed every 3 months. The endpoint was ER, which was defined as the presence of new intrahepatic lesions or metastasis with typical imaging features of HCC, or atypical findings with histopathological confirmation within 2 years after curative resection or ablation of HCC.
CT scan protocols
CECT was performed at our institute with one of the following machines: a 64-detector row (Aquilion CXL, Toshiba Medical System, Tokyo, Japan) or 320-detector row CT machine (Aquilion One, Toshiba Medical System, Tokyo, Japan). We used the same scanning parameters for both machines as follows: tube voltage, 120 kV; tube current, 250 mA; and slice thickness, 1 mm. After a routine unenhanced scan, 1.5 mL/kg of contrast media (Ultravist, Bayer, Germany) was injected into an antecubital vein at a rate of 3.0 mL/s via a pump injector (P3T abdomen module, Medrad Inc.). Hepatic arterial phase CT images were obtained at 35 s, and portal venous phase CT images were obtained at 65 s [23, 24].
Image analysis
Two radiologists (S.T.F. and P.Z.P.), both with 15 years of abdominal CT interpretation, and both blinded to the clinical data, independently evaluated the imaging features randomly. The radiologists independently recorded incidences of PT-E (defined as detectable arterial-enhancing portions adjacent to the tumor border on arterial-phase images that became isodense with the background liver parenchyma on delayed-phase images [25]); when there were disagreements, they reached a consensus by discussion.
CT images (1 mm) on the largest cross-sectional area of the tumor, including routine unenhanced (Fig. 2a), hepatic arterial and portal venous phases, were recorded as digital imaging data and communications in medicine (DICOM) files. The slice chosen for delineating the lesion was confirmed by two radiologists in consensus.
Radiomics features extraction and radiomics models building in the training set
DICOM images were used to extract radiomics features using A. K. software (Artificial Intelligence Kit, Version 1.0.0, GE Life Science, Institute of Precision Medicine), including routine unenhanced, hepatic arterial and portal venous phases. A T-RO region of interest (ROI) was manually delineated around the lesion (Fig. 2b). A PT-RO ROI of automatically expanded 2 cm from the lesion, and if the ROI was beyond the parenchyma of the liver after the expansion, the portion beyond the parenchyma was removed manually (Fig. 2c). The radiologists tried to keep ROIs in the three phases as consistent as possible.
Radiomics features were extracted from the ROIs using the A.K. software. A total of 1044 features were extracted from one single ROI, including four types of features: gray-level histogram texture, wavelet-transformed texture, transformed matrix texture, and filter-transformed texture. With the histogram texture, we extracted the texture feature parameters and made a quantitative or qualitative description of the texture based on the gray value of the images. With the wavelet-transformed texture, we analyzed the characteristics of the ROI through different levels of resolution. The transformed matrix texture reflected the high-level information of the ROI by a series of matrix transformations. With the filter-transformed texture, we obtained a series of target features by different types of filters.
Fifty patients were randomly selected, and their ROIs (containing T-RO and PT-RO) in the selected DICOM images were delineated by two radiologists (S.T.F. and P.Z.P.) blinded to the clinical data. Then, radiologist S.T.F. finished the final 106 patient ROIs. Radiomics features were automatically extracted from the ROIs by A. K. software through computing algorithms and recorded as comma separated values (CSVs).
The radiomics features extracted from the 50 patients by radiologist S.T.F. were compared with the features extracted by radiologist P.Z.P. using an independent sample t-test or a Kruskal-Wallis H test. Interclass correlation coefficients (ICCs) were used to assess the interobserver agreement of the feature extractions. Radiomics features with an ICC greater than 0.6 (indicating moderate-excellent agreement) were recorded for further analysis.
The linear regression least absolute shrinkage and selection operator (LASSO) regression was performed to select the features [26, 27] after manually eliminating the features that had an absolute value less than 0.6 for the coefficients of ER from the radiomics features extracted by radiologist S.T.F. in the training set of 109 patients. Finally, the PT-RO model was built using the selected features extracted from the ROIs of PT-RO, and the T-RO model was built using the selected features extracted from the ROIs of T-RO.
Performance of the PT-RO model, T-RO model and PT-E
The PT-RO model, T-RO model and PT-E were first assessed in the training set and then validated in the independent validation set. The receiver operating characteristic (ROC) curve was plotted to show the prediction accuracy of predicting ER. Prediction accuracy was quantified with area under the curve (AUC). The more the ROC curve deviated from the baseline, the greater the AUC value was, which indicated higher accuracy of the prediction. The significant difference in AUC between the training and validation cohorts indicated overfitting. Calibrations (i.e., the agreement between observed outcome frequencies and predicted probabilities) were plotted to explore the predictive accuracy of the models in the validation cohort. The unreliability (U) statistic was used to assess the calibration, and P values of more than 0.05 were considered well-calibrated [28]. Decision curve analysis (DCA) was conducted to determine the clinical usefulness of the prediction models by quantifying the net benefits at different threshold probabilities in the validation cohort [29]. The more the curve deviated from the baseline, the greater the benefit was. The improvement in the predictive accuracy of the models was evaluated by calculating the integrated discrimination improvement (IDI) and the category-free net reclassification index (cfNRI). CfNRI generalizes to any upward or downward movement in predicted risks. IDI is the absolute value of the change in predicting accuracy.
Statistical analysis
The baseline information in the training and validation cohorts were compared using the chi-squared test or the Fisher exact test for categorical variables and the two-sample t-test or the Mann–Whitney U test for continuous variables. P values of less than 0.05 (two-sided) were considered statistically significant. Computer-generated random numbers were used to assign 7/10 of the patients to the training dataset and 3/10 of the patients to the validation dataset. To test the intraobserver variability of the enhancement patterns, the intraclass correlation coefficient (ICC) was calculated. An ICC greater than 0.6 indicated moderate-excellent agreement.
The ROC curves were plotted to demonstrate the performance of the PT-RO model, T-RO model and PT-E in predicting ER in the training cohort and validation cohort, and AUC was used to evaluate the accuracy of the two models and PT-E in predicting the ER. Calibration curves were plotted to explore the predictive accuracy. DCA was conducted to determine the clinical usefulness by quantifying the net benefits at different threshold probabilities in the validation cohort. The improvement in the predictive accuracy of the models was evaluated by calculating IDI and cfNRI. CfNRI generalizes to any upward or downward movement in predicted risks. IDI is the absolute value of the change in predicting accuracy. The detailed methods introducing the calibration curves, DCA, cfNRI and IDI are provided in the Additional file 1.
All statistical analyses were conducted with the open-source statistical computing environment R (R Foundation for Statistical Computing, version 3.4.1; https://www.r-project.org/). The ICC was applied with the R package “irr”. Data cleaning was conducted using the R packages “knnImputation” and “DMwR”. The “glmnet” package of R was used for the LASSO regression. Univariate and multivariate logistic regressions were calculated and plotted using the R package “glm”. The “pROC” package was used to plot the ROC curves and measure the AUC. The “CalibrationCurves” package was used for the calibration curves. The “DecisionCurve” package was used to perform DCA. CfNRI and IDI were conducted with the R package “nricens” and “PredictABEL”.