AI Revolutionizes Ultrasound Diagnosis of Liver Lesions
In the rapidly evolving landscape of medical imaging, artificial intelligence (AI) is emerging as a transformative force, particularly in the field of ultrasound diagnostics. A recent comprehensive review published in the Academy Journal of the Chinese PLA Medical School highlights significant advancements in AI-assisted ultrasound for diagnosing liver lesions, offering new hope for earlier and more accurate detection of liver diseases.
The liver, one of the body’s most vital organs, is susceptible to a range of pathologies, from diffuse conditions like non-alcoholic fatty liver disease (NAFLD) to focal abnormalities such as tumors and cysts. Ultrasound has long been a frontline tool in hepatology due to its non-invasive nature, real-time imaging capabilities, portability, and absence of ionizing radiation. However, despite its widespread use, conventional ultrasound interpretation relies heavily on operator expertise and subjective visual assessment, leading to variability in diagnostic accuracy and inter-observer consistency.
This is where artificial intelligence steps in. As outlined in the review by Wang Yanjie, Song Qing, Han Peng, and Luo Yukun from the Department of Ultrasound at the First Medical Center of the Chinese PLA General Hospital and the Chinese PLA Medical School, AI—particularly deep learning (DL)—is poised to overcome many of the limitations inherent in traditional ultrasound diagnostics. By leveraging large datasets and sophisticated algorithms, AI models can learn to recognize subtle imaging patterns that may elude even experienced sonographers, thereby enhancing diagnostic precision and reproducibility.
One of the most pressing public health challenges addressed in the review is NAFLD, a condition that affects an estimated 15% to 30% of adults in China and has become the most prevalent chronic liver disease in the country. The diagnosis of NAFLD using conventional two-dimensional ultrasound typically depends on qualitative visual cues such as increased echogenicity, attenuation of posterior echoes, and diminished visualization of intrahepatic vessels. These criteria, while useful, are highly subjective and prone to inter-rater variability.
To address this, researchers have turned to AI-driven quantitative methods. Han and colleagues developed a model using radiofrequency (RF) ultrasound data trained against the gold standard of MRI-derived proton density fat fraction (MRI-PDFF). Their one-dimensional convolutional neural network achieved diagnostic performance exceeding 90% and predicted fat fraction with a mean error of just 0.8%, demonstrating that AI can provide a reliable, non-invasive, and quantitative assessment of hepatic steatosis.
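The shape of such a pipeline can be sketched in a few lines. The following is a minimal, illustrative numpy mock-up, not Han et al.'s actual network: the RF lines are synthetic (attenuation standing in for fat content), the convolution-ReLU-pooling stage stands in for the 1-D CNN's feature extractor, and a least-squares readout stands in for its dense head.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rf(fat_fraction, n=2048):
    """Synthetic RF line: higher fat fraction -> stronger depth attenuation."""
    depth = np.arange(n)
    return np.exp(-fat_fraction * depth / n) * rng.standard_normal(n)

def conv1d_features(signal, kernels, pool=256):
    """One convolution + ReLU + average-pooling stage of a 1-D CNN."""
    feats = []
    for k in kernels:
        resp = np.maximum(np.convolve(signal, k, mode="valid"), 0.0)
        n = len(resp) // pool * pool
        feats.append(resp[:n].reshape(-1, pool).mean(axis=1))
    return np.concatenate(feats)

kernels = [rng.standard_normal(16) / 4 for _ in range(4)]
fats = np.linspace(0.1, 3.0, 40)     # stand-in "fat fraction" labels
X = np.stack([conv1d_features(make_rf(f), kernels) for f in fats])

# least-squares readout standing in for the network's dense head
A = np.c_[X, np.ones(len(X))]
w, *_ = np.linalg.lstsq(A, fats, rcond=None)
mae = np.abs(A @ w - fats).mean()
print(f"features per line: {X.shape[1]}, training MAE: {mae:.3f}")
```

A real model learns the kernels and readout jointly by backpropagation against MRI-PDFF labels; the sketch only shows why raw RF, rather than the reconstructed B-mode image, carries quantitative attenuation information a network can exploit.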
Further enhancing the accuracy of fatty liver grading, Byra et al. employed transfer learning—a technique where a pre-trained deep learning model is fine-tuned for a specific task—on ultrasound images to assess steatosis severity. Their model, which combined high-level feature extraction with lasso regression, achieved an area under the curve (AUC) of 0.977 when validated against liver biopsy results, outperforming traditional methods such as the gray-level co-occurrence matrix and liver-kidney contrast index. This underscores the power of deep learning in extracting meaningful diagnostic information from complex image data.
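The "high-level features into lasso regression" step can be illustrated with synthetic data. Below, random vectors stand in for the deep features Byra et al. extracted, and a small coordinate-descent implementation stands in for a library lasso solver; the point is how the L1 penalty zeroes out uninformative features while keeping the few that drive the steatosis score.

```python
import numpy as np

rng = np.random.default_rng(1)

def lasso_cd(X, y, lam, iters=300):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# stand-in "deep features": 100 images x 50 features, only 3 informative
X = rng.standard_normal((100, 50))
true_w = np.zeros(50)
true_w[[2, 17, 40]] = [1.5, -2.0, 1.0]
y = X @ true_w + 0.1 * rng.standard_normal(100)  # mock steatosis score

w = lasso_cd(X, y, lam=5.0)
print("non-zero coefficients:", np.flatnonzero(np.abs(w) > 1e-6))
```

This sparsity is what makes the combination attractive clinically: of thousands of automatically extracted features, only a handful survive, which keeps the final model small and somewhat interpretable.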
Cao and team explored three different post-processing approaches—envelope signal analysis, grayscale analysis, and a deep learning index—for classifying liver images into four categories of steatosis. All three methods achieved AUCs above 0.85 in distinguishing the presence of NAFLD, with the deep learning index reaching 0.933. While promising, the study acknowledged limitations, including the lack of histopathological or advanced imaging validation and reliance on subjective expert grading, which may explain its reduced performance in differentiating mild from moderate steatosis.
In another innovative approach, Chen et al. compared a VGG-16-based deep learning algorithm with ultrasound entropy imaging, a method rooted in statistical analysis of backscattered signals. Both techniques showed strong performance in identifying moderate to severe fatty liver, but entropy imaging slightly outperformed the deep learning model and offered additional insights into tissue microstructure. This suggests that hybrid approaches combining AI with physics-based signal analysis may yield superior diagnostic tools.
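Entropy imaging itself is conceptually simple: compute the Shannon entropy of the backscattered envelope's amplitude histogram in a sliding window. The sketch below uses synthetic envelopes (window size, bin count, and amplitude range are arbitrary choices, not the paper's settings) to show that a more heterogeneous scatterer population yields higher window entropy.

```python
import numpy as np

rng = np.random.default_rng(2)

def window_entropy(env, win=128, bins=32, vmax=10.0):
    """Shannon entropy (bits) of the amplitude histogram in each window."""
    out = []
    for s in range(0, len(env) - win + 1, win):
        hist, _ = np.histogram(env[s:s + win], bins=bins, range=(0.0, vmax))
        p = hist[hist > 0] / hist.sum()
        out.append(float(-(p * np.log2(p)).sum()))
    return np.array(out)

# Rayleigh envelope: fully developed speckle, as in homogeneous tissue
uniform_tissue = rng.rayleigh(1.0, 4096)
# mixture with strong scatterers: a more heterogeneous backscatter pattern
mixed = np.concatenate([rng.rayleigh(1.0, 3072), rng.rayleigh(4.0, 1024)])
rng.shuffle(mixed)

print(window_entropy(uniform_tissue).mean(), window_entropy(mixed).mean())
```

Because the quantity has a direct physical interpretation, entropy maps remain legible to clinicians in a way that deep-network activations are not, which is part of the appeal of the hybrid approaches the review describes.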
Perhaps the most striking result came from Zamanian et al., who developed a combinational deep learning algorithm using four different pre-trained transfer learning models. Their system achieved an astonishing AUC of 0.9999 in detecting the presence of NAFLD. However, the authors caution that the dataset was limited to 55 severely obese patients undergoing bariatric surgery, with minimal representation of mild or moderate cases. This highlights a critical challenge in AI development: the risk of selection bias and the need for diverse, representative training data to ensure generalizability across broader patient populations.
Beyond fatty liver disease, AI is making significant inroads in the assessment of liver fibrosis—a progressive condition that, if untreated, can lead to cirrhosis and hepatocellular carcinoma. The transition from fibrosis to cirrhosis represents a critical juncture in patient management, as treatment strategies differ significantly between stages. Accurate staging is therefore essential for clinical decision-making.
Early efforts in this domain utilized texture analysis based on algorithms such as gray-level gradient co-occurrence matrices and gray-level co-occurrence matrices. Gao et al. applied these techniques in conjunction with a multilayer feedforward neural network to classify liver fibrosis into five stages, achieving over 70% accuracy across all categories and 100% accuracy in distinguishing normal liver (S0) from advanced fibrosis (S4). However, such methods are sensitive to variations in ultrasound machine settings and time-gain compensation, limiting their robustness.
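A gray-level co-occurrence matrix is small enough to implement directly. The toy example below (illustrative images, not liver data) builds the normalised GLCM for a one-pixel horizontal offset and derives two classic features, contrast and energy, of the kind such texture models fed into their neural classifiers.

```python
import numpy as np

rng = np.random.default_rng(3)

def glcm(img, levels=8, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one pixel offset."""
    q = np.clip((img * levels).astype(int), 0, levels - 1)  # quantise [0,1]
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def contrast(m):
    i, j = np.indices(m.shape)
    return float((m * (i - j) ** 2).sum())

def energy(m):
    return float((m ** 2).sum())

smooth = np.tile(np.linspace(0, 0.99, 64), (64, 1))  # gentle gradient
noisy = rng.uniform(0, 1, (64, 64))                  # speckle-like texture

print(f"contrast: smooth={contrast(glcm(smooth)):.3f} "
      f"noisy={contrast(glcm(noisy)):.3f}")
```

The sensitivity the review notes follows directly from this construction: gain or time-gain-compensation changes shift the quantised grey levels, so the co-occurrence counts, and hence every derived feature, change with the machine settings.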
To improve reliability, high-frequency ultrasound and radiofrequency time-series analysis have been explored. Gao Yongzhen and colleagues extracted seven backscattered echo features to build a classification model that achieved 87.5% accuracy in differentiating normal liver from cirrhotic tissue. Acharya et al. used curvelet transform and entropy features to classify normal liver, fibrotic liver, and cirrhotic liver, achieving an impressive 97.33% accuracy.
More recently, Cheng et al. analyzed ultrasound RF data from 160 rats with induced liver fibrosis, applying a bidirectional long short-term memory (BiLSTM) network to predict fibrosis stage. Their model showed excellent correlation with histopathological staging (R² > 0.93), demonstrating the potential of AI in longitudinal monitoring of disease progression.
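The structure of such a model, independent of any trained weights, can be shown with a numpy forward pass. The sketch below runs a randomly initialised LSTM over a mock RF time series in both directions, concatenates the two state sequences (the "Bi" in BiLSTM), and pools them into a scalar fibrosis score; dimensions and weights are arbitrary stand-ins, not Cheng et al.'s model.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x, Wx, Wh, b, reverse=False):
    """Run one LSTM direction over a (T, d) sequence; returns (T, H) states."""
    T = x.shape[0]
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outs = np.zeros((T, H))
    for t in (reversed(range(T)) if reverse else range(T)):
        z = x[t] @ Wx + h @ Wh + b           # all four gates in one product
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # cell state update
        h = o * np.tanh(c)
        outs[t] = h
    return outs

T, d, H = 200, 1, 16          # mock RF time series: 200 samples, 1 channel
x = rng.standard_normal((T, d))
params = lambda: (rng.standard_normal((d, 4 * H)) * 0.3,
                  rng.standard_normal((H, 4 * H)) * 0.3,
                  np.zeros(4 * H))
fwd = lstm_pass(x, *params())
bwd = lstm_pass(x, *params(), reverse=True)
states = np.concatenate([fwd, bwd], axis=1)  # (T, 2H) BiLSTM states

w_out = rng.standard_normal(2 * H) * 0.1
stage_score = float(states.mean(axis=0) @ w_out)  # pooled fibrosis score
print(states.shape, stage_score)
```

Reading the sequence in both directions is the relevant design choice here: each time step's state then reflects both shallower and deeper tissue along the RF line, rather than only what came before it.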
In human studies, Fu Tiantian and colleagues combined conventional grayscale ultrasound with two-dimensional shear wave elastography—a technique that measures tissue stiffness—to train a LeNet-5 deep learning model. Their system achieved up to 91.8% accuracy in classifying fibrosis stages, illustrating the value of multimodal data integration.
Xue et al. took this a step further by fusing grayscale and elastographic data within a transfer learning framework, achieving AUCs of 0.950, 0.932, and 0.930 for predicting stages S4, ≥S3, and ≥S2 fibrosis, respectively. These results surpass the performance of conventional serum biomarkers, suggesting that AI-enhanced ultrasound could become a first-line tool for non-invasive fibrosis staging.
Zhang Wanming’s work on radiomics—the high-throughput extraction of quantitative features from medical images—further advances this field. By analyzing 321 chronic hepatitis B patients and selecting 121 radiomic features, his team developed a model that achieved an AUC of 0.88 in detecting early fibrosis. This demonstrates that AI can capture subtle, subvisual patterns in ultrasound images that correlate with early pathological changes.
An innovative line of research focuses on the liver capsule, the thin connective tissue layer surrounding the organ. In early cirrhosis, the capsule becomes irregular and nodular. Liu et al. developed a method to manually trace the liver capsule in high-frequency ultrasound images, enabling early detection of cirrhosis. Building on this, they later introduced an automated capsule detection and analysis network, achieving a maximum AUC of 0.97. This approach exemplifies how AI can shift diagnostic paradigms by focusing on novel anatomical and biomechanical markers.
The application of AI extends beyond diffuse liver diseases to the detection and characterization of focal liver lesions. These include benign entities such as hemangiomas and cysts, and malignant tumors such as hepatocellular carcinoma (HCC) and metastases. Distinguishing between these is crucial, as benign lesions often require only surveillance, while malignant ones demand prompt intervention.
Early computer-aided diagnosis (CAD) systems relied on handcrafted texture features extracted from B-mode ultrasound images. Virmani et al. used wavelet-based texture descriptors with a support vector machine (SVM) classifier to differentiate normal liver, cirrhotic liver, and HCC, achieving an overall accuracy of 88.8% and a sensitivity of 86.6% for HCC detection. Other studies explored multi-class classification models for distinguishing cysts, hemangiomas, HCC, and metastatic cancers, laying the groundwork for more sophisticated AI systems.
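Wavelet-based texture descriptors of this kind are easy to sketch. The example below applies a single-level 2-D Haar transform (one simple wavelet; Virmani et al.'s exact descriptors are not specified here) to toy patches and uses the mean absolute energy of the detail subbands as features, which in a full CAD system would feed the SVM classifier.

```python
import numpy as np

rng = np.random.default_rng(5)

def haar2d(img):
    """Single-level 2-D Haar transform: approximation + 3 detail subbands."""
    a = (img[::2, :] + img[1::2, :]) / 2     # rows: average
    d = (img[::2, :] - img[1::2, :]) / 2     # rows: difference
    ll = (a[:, ::2] + a[:, 1::2]) / 2
    lh = (a[:, ::2] - a[:, 1::2]) / 2
    hl = (d[:, ::2] + d[:, 1::2]) / 2
    hh = (d[:, ::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def texture_features(img):
    """Mean absolute energy of each detail subband."""
    _, lh, hl, hh = haar2d(img)
    return np.array([np.abs(b).mean() for b in (lh, hl, hh)])

# toy patches: fine speckle vs the same noise crudely smoothed (coarser)
def patch(fine):
    p = rng.uniform(0, 1, (32, 32))
    if not fine:
        p = (p + np.roll(p, 1, 0) + np.roll(p, 1, 1)) / 3
    return p

fine_feats = np.mean([texture_features(patch(True)) for _ in range(20)], axis=0)
coarse_feats = np.mean([texture_features(patch(False)) for _ in range(20)], axis=0)
print(fine_feats, coarse_feats)
```

Finer texture concentrates energy in the detail subbands, so these three numbers already separate texture scales; cirrhotic and tumour tissue alter speckle statistics in ways such handcrafted features were designed to capture.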
With the advent of deep learning, the field has undergone a paradigm shift. Unlike traditional methods that require manual feature engineering, deep learning models automatically learn hierarchical representations from raw image data. Schmauch et al. applied a 50-layer residual neural network (ResNet-50) to classify 367 focal liver lesion images, achieving an average AUC of 0.935 for benign versus malignant differentiation. Similarly, Xi et al. trained a residual network on 596 ultrasound cases, achieving an overall accuracy of 0.84—comparable to the performance of two experienced radiologists.
Perhaps the most compelling evidence comes from a multicenter study involving 2,143 patients and 24,343 ultrasound images. The researchers developed a deep convolutional neural network that achieved an AUC of 0.924 in predicting lesion malignancy. Notably, this performance was on par with contrast-enhanced CT and ultrasound contrast imaging, and only slightly inferior to contrast-enhanced MRI—the current clinical gold standard for liver lesion characterization.
Despite these impressive results, the authors emphasize that most existing models are limited to binary classification (benign vs. malignant) and lack the granularity needed for precise pathological subtyping. Given that different tumor types require distinct treatment pathways, the development of fine-grained classification models remains a critical unmet need.
Beyond diagnosis, AI is also being applied to interventional guidance. For instance, in microwave ablation therapy for liver tumors, accurately delineating the ablation zone in real time is essential to ensure complete tumor destruction while sparing healthy tissue. Zhang et al. developed a convolutional neural network using 1,640 backscattered RF signals from porcine liver ablation experiments. Their model achieved AUCs above 0.85 in monitoring thermal damage, outperforming conventional 2D ultrasound, and showed strong correlation with gross pathological findings. This suggests that AI could enhance the safety and efficacy of minimally invasive liver procedures.
Despite these advances, the integration of AI into clinical ultrasound practice faces several challenges. First, ultrasound images are inherently noisy, with artifacts such as speckle, low contrast, blurred boundaries, and intensity inhomogeneity that can confound AI models. Second, unlike CT or MRI, ultrasound is highly operator-dependent. Variations in probe positioning, scanning technique, and machine settings (e.g., 2D gain, mechanical index, time-gain compensation) introduce significant variability in image appearance, posing a challenge for model generalizability.
Moreover, most current AI models are trained on single-center datasets, limiting their external validity. To achieve broad clinical adoption, models must be validated across diverse populations, imaging platforms, and geographic regions. Additionally, while image data is central, incorporating clinical history, laboratory results, and patient demographics could further enhance diagnostic accuracy through multimodal AI frameworks.
The future of AI in liver ultrasound lies in the development of robust, generalizable, and clinically integrated systems. As models become more sophisticated and datasets more diverse, AI has the potential to standardize diagnostic workflows, reduce inter-observer variability, and support less experienced practitioners. Ultimately, this could democratize access to high-quality liver diagnostics, particularly in resource-limited settings.
In conclusion, the convergence of artificial intelligence and ultrasound imaging is ushering in a new era of precision hepatology. From quantifying fat content in NAFLD to staging fibrosis and characterizing focal lesions, AI is augmenting the capabilities of sonographers and radiologists alike. While challenges remain, the trajectory is clear: AI is not replacing clinicians, but empowering them with tools that enhance diagnostic confidence, improve patient outcomes, and advance the standard of care.
Wang Yanjie, Song Qing, Han Peng, Luo Yukun. Academy Journal of the Chinese PLA Medical School, 2021. DOI: 10.3969/j.issn.2095-5227.2021.11.022