Current status of deep learning applications in abdominal ultrasonography
Abstract
Deep learning is one of the most popular artificial intelligence techniques used in the medical field. Although the application of deep learning to ultrasound imaging is at an earlier stage than its application to computed tomography or magnetic resonance imaging, studies in this area have been actively conducted. This review analyzes recent studies that applied deep learning to ultrasound imaging of various abdominal organs and explains the challenges encountered in these applications.
Introduction
Artificial intelligence has been applied in many fields, including medicine. Deep learning has recently become one of the most popular artificial intelligence techniques in the medical imaging field, and it has been applied to various organs using different imaging modalities. In the abdomen, the main imaging modality is computed tomography (CT) for most organs [1]; however, deep learning research regarding abdominal ultrasonography (US) is ongoing. In this article, I review the current status of the application of deep learning to abdominal US and discuss the challenges involved.
Liver
US is one of the most commonly used imaging modalities for evaluating liver disease. In particular, it is used to screen for liver tumors, to evaluate liver status in patients with chronic liver disease, and to evaluate hepatic steatosis.
Diffuse Liver Disease
Ten studies have applied deep learning to liver US imaging to evaluate diffuse liver disease, especially hepatic fibrosis and steatosis [2-11]. These studies are summarized in Table 1.
In terms of the type of data used, B-mode images are the most common, likely because B-mode imaging is the simplest to perform, making data acquisition easier. However, to evaluate liver fibrosis, some studies have used elastography. Wang et al. [6] used the full two-dimensional (2D) shear wave elastography (SWE) region of interest together with the surrounding B-mode imaging area, and demonstrated that the deep learning method was more accurate than 2D-SWE measurements for assessing liver fibrosis. Xue et al. [2] reported that deep learning using both B-mode and 2D-SWE images performed better than deep learning using only one of the two image types. These results suggest that analyzing the heterogeneity of intensity and texture in colored 2D-SWE and B-mode images can improve the accuracy of liver fibrosis assessment. The study of Han et al. [4] was the only one that used radiofrequency (RF) data (Fig. 1). RF signals are the raw data obtained from US equipment that are used to generate B-mode images; some information is lost or altered during this conversion. RF data therefore contain more information than B-mode images and are less dependent on system settings and postprocessing operations, such as the dynamic range setting or filtering. These characteristics may be advantageous for the generalizability of deep learning models developed from RF data, although further research is needed to determine whether using RF data actually helps.
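To illustrate how B-mode and 2D-SWE inputs might be combined in a single network, the following is a minimal two-branch sketch. It is not the architecture of any cited study: the layer sizes, channel counts, and class count are illustrative assumptions, and the sketch treats the SWE map as a single channel, whereas colored SWE maps would typically have three.

```python
# A minimal sketch (not a published architecture) of a two-branch CNN that
# fuses a B-mode image and a co-registered 2D-SWE map. All sizes are assumed.
import torch
import torch.nn as nn

class DualInputFibrosisNet(nn.Module):
    def __init__(self, n_classes: int = 4):  # e.g., fibrosis stages (assumed)
        super().__init__()
        def branch():
            # Small convolutional feature extractor, identical for each input
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1)
            )
        self.bmode_branch = branch()  # grayscale B-mode image
        self.swe_branch = branch()    # SWE map (treated here as 1 channel)
        self.classifier = nn.Linear(32 + 32, n_classes)

    def forward(self, bmode: torch.Tensor, swe: torch.Tensor) -> torch.Tensor:
        fb = self.bmode_branch(bmode).flatten(1)  # (N, 32)
        fs = self.swe_branch(swe).flatten(1)      # (N, 32)
        # Concatenated features let the classifier use both image types
        return self.classifier(torch.cat([fb, fs], dim=1))

# Example: one 224x224 B-mode image and one co-registered SWE map
model = DualInputFibrosisNet()
logits = model(torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224))
```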
Deep learning is fundamentally a data-driven method: it extracts and learns nonlinear features from data rather than relying on features handcrafted with domain expertise. Consequently, developing a deep learning model requires a large amount of data to avoid overfitting. The datasets used in the studies included in this review are small compared to the large optical-image challenge databases such as that of the ImageNet Large Scale Visual Recognition Challenge. However, recently published studies have tended to use larger amounts of data; for example, Lee et al. [3] used a total of 14,583 images to develop and validate a deep learning model for the evaluation of liver fibrosis.
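As an illustration of one common strategy to mitigate overfitting when data are limited, the following sketch applies on-the-fly image augmentation during training; the specific transforms and parameters are assumptions for illustration and are not drawn from the cited studies.

```python
# A minimal sketch of training-time image augmentation, a common way to
# mitigate overfitting on small datasets. Transforms and parameters are
# illustrative assumptions, not those of any cited study.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # simulate framing changes
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2),  # simulate gain/range shifts
    T.ToTensor(),
])
# Applied on the fly, each training epoch sees slightly different versions
# of the same US images, effectively enlarging the dataset.
```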
Careful confirmation of the clinical performance and utility of a developed model is required before it can be adopted in clinical practice; this involves more than the completeness of a model or its performance evaluation during development. Robust clinical confirmation of a model's performance requires external validation, and the ultimate clinical verification of a developed model requires evaluating its effect on patient outcomes [12]. Of the extant studies in this field, only Lee et al. [3] conducted external validation. Ideally, external validation should be performed on prospectively collected data from a clinical cohort that represents the target population of the developed model [12]; however, Lee et al. [3] performed external validation using retrospectively collected data from a case-control group.
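The following is a minimal sketch of what external validation looks like in code: a frozen model is evaluated, without any retraining, on data from another institution. The placeholder model, dataset, and choice of metric (area under the receiver operating characteristic curve) are all illustrative assumptions.

```python
# A minimal sketch of external validation: the development model is frozen
# and evaluated on an external cohort. The model, data, and metric below are
# placeholders; in practice the external data come from another institution.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import roc_auc_score

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))  # placeholder
external_loader = DataLoader(                               # placeholder data
    TensorDataset(torch.randn(100, 1, 64, 64), torch.randint(0, 2, (100,))),
    batch_size=32,
)

model.eval()  # no retraining or fine-tuning on the external cohort
scores, labels = [], []
with torch.no_grad():
    for images, targets in external_loader:
        probs = torch.softmax(model(images), dim=1)[:, 1]  # positive class
        scores.extend(probs.tolist())
        labels.extend(targets.tolist())
print("External AUROC:", roc_auc_score(labels, scores))
```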
Focal Liver Disease
Several studies have applied deep learning to liver US imaging to detect or characterize focal liver lesions [13-17]. These studies are summarized in Table 2.
Compared to diffuse liver disease, fewer studies have applied deep learning to focal liver disease. The amount of available image data for focal liver disease is relatively small, the US imaging findings of different focal liver lesions often overlap, and in clinical practice US is usually used as a screening tool rather than for disease confirmation. These factors likely explain why deep learning research on focal liver disease has been less active than that on diffuse liver disease.
In terms of the type of data used, contrast-enhanced US (CEUS) was used to develop deep learning models in three studies [13,15,17]. Among them, Liu et al. [17] and Pan et al. [15] used a three-dimensional (3D) convolutional neural network (CNN). CEUS cine images contain temporal as well as spatial information: a 2D-CNN can analyze only the spatial features, such as texture and edges, of a single frame, whereas a 3D-CNN can also analyze temporal features (Fig. 2).
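The following minimal sketch contrasts the two approaches: a 2D convolution operates on a single frame, whereas a 3D convolution also slides over the time axis of a cine clip, allowing it to capture enhancement dynamics. The tensor shapes and channel counts are illustrative assumptions.

```python
# A minimal sketch contrasting 2D and 3D convolutions on CEUS data.
# A 2D convolution sees one frame (N, C, H, W); a 3D convolution sees a cine
# clip (N, C, T, H, W) and convolves over time as well as space.
import torch
import torch.nn as nn

frame = torch.randn(1, 3, 224, 224)      # a single CEUS frame (assumed size)
clip = torch.randn(1, 3, 16, 224, 224)   # a 16-frame CEUS cine clip

conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1)           # spatial only
conv3d = nn.Conv3d(3, 8, kernel_size=(3, 3, 3), padding=1)   # spatiotemporal

print(conv2d(frame).shape)  # torch.Size([1, 8, 224, 224])
print(conv3d(clip).shape)   # torch.Size([1, 8, 16, 224, 224])
```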
Schmauch et al. [16] used a dataset provided for a public challenge at the 2018 Journées Francophones de Radiologie in Paris, France. Although their model was tested on this dataset by the challenge organizers, no detailed information was provided on how the dataset was collected or what lesions it contained. Apart from this study, no other study applying deep learning to focal liver disease has performed external validation.
Prostate
Most studies applying deep learning to prostate US imaging have focused on the detection and grading of prostate cancer and on segmentation of the prostate gland.
In the field of prostate cancer detection and grading, a group of researchers has conducted several studies [18-24]. They used multiparametric magnetic resonance imaging (MRI) data of the prostate gland, CEUS imaging data of lesions deemed suspicious on multiparametric MRI, and the histopathologic results of MRI-transrectal US (TRUS) fusion-guided targeted biopsies. They applied deep learning to CEUS imaging data to classify prostate lesions or grade prostate cancers [18,19,21], and in one study they integrated multiparametric MRI and CEUS imaging data to detect prostate cancer [24].
TRUS is commonly used as a guidance modality for prostate biopsies and for prostate cancer therapy. Accurate delineation of the prostate gland boundary on TRUS images is essential for inserting biopsy needles or cryoprobes, for treatment planning, and for brachytherapy. Accurate prostate segmentation can also assist in the registration and fusion of TRUS and MRI images. However, manual segmentation of the prostate on TRUS imaging is time-consuming and often not reproducible. For these reasons, several studies have applied deep learning to automatically segment the prostate on TRUS imaging [25-31].
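As an illustration of the general approach, the following is a minimal encoder-decoder sketch in the spirit of U-Net-style segmentation; the cited studies use considerably more elaborate architectures, and all layer sizes here are illustrative assumptions.

```python
# A minimal encoder-decoder sketch for segmentation (not a published model).
# The output is a per-pixel probability map of the prostate gland.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(            # encoder: extract features,
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                 # downsample 2x
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(            # decoder: recover resolution
            nn.ConvTranspose2d(32, 16, 2, stride=2),  # upsample back 2x
            nn.ReLU(),
            nn.Conv2d(16, 1, 1),             # 1-channel logit map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.dec(self.enc(x)))  # per-pixel probability

# Example: a 256x256 grayscale TRUS image -> (1, 1, 256, 256) mask in [0, 1]
mask = TinySegNet()(torch.randn(1, 1, 256, 256))
```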
Kidney
Very few studies have applied deep learning to kidney US imaging. Zheng et al. [32] evaluated the diagnostic performance of deep learning in classifying normal kidneys versus congenital abnormalities of the kidney and urinary tract. Kuo et al. [33] and Yin et al. [34] used deep learning with kidney US imaging to predict kidney function and to segment the kidney, respectively.
Other Abdominal Organs
There are no reports of deep learning being applied to US imaging of other abdominal organs such as the pancreas or the spleen.
Challenges in Applying Deep Learning to Abdominal US Imaging
Compared to CT or MRI, abdominal US imaging poses several challenges for the application of deep learning. First, US imaging is highly operator-dependent in both image acquisition and interpretation; obtaining abdominal US images, in which the target organs are located deep inside the body, is more operator-dependent than US of superficially located organs. Second, it is difficult to image organs obscured by bone or air: because of the rib cage and the air normally present in the bowel, the acoustic windows for abdominal US are often limited, and it is therefore often difficult to image an entire organ or to obtain a good-quality image. Lastly, there is variability across US systems from different manufacturers, and even systems from the same manufacturer vary across versions. These challenges make it difficult to standardize US images.
To overcome these challenges, efforts to reduce operator and system differences and to improve imaging technology are needed, and certain studies in this respect are worth noting. As mentioned previously, Han et al. [4] used US RF data, which is expected to reduce variability among US systems. Camps et al. [35] applied deep learning to automatically assess the quality of transperineal US images, which may help reduce inter-operator variability in image acquisition. Finally, Khan et al. [36] proposed a deep learning-based beamformer to generate high-quality US images.
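Purely as an illustration of the idea behind learned beamforming, and not of the published method of Khan et al. [36], the following sketch contrasts conventional delay-and-sum, which simply averages time-delayed channel samples per pixel, with a small untrained network that could learn a nonlinear mapping from the same samples to a beamformed value; the channel count and network sizes are assumptions.

```python
# A purely illustrative sketch of learned beamforming (not a published method).
# Input: time-delayed RF samples from each array channel, one row per pixel.
import torch
import torch.nn as nn

n_channels = 64                           # assumed transducer channel count
delayed = torch.randn(1024, n_channels)   # delayed RF samples for 1024 pixels

das = delayed.mean(dim=1)                 # conventional delay-and-sum output

learned_bf = nn.Sequential(               # learned beamformer (untrained here;
    nn.Linear(n_channels, 128), nn.ReLU(),  # would be trained against
    nn.Linear(128, 1),                      # high-quality reference images)
)
out = learned_bf(delayed).squeeze(1)      # per-pixel beamformed values
```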
Summary
In this review, recent articles applying deep learning to US imaging of various abdominal organs were analyzed. Many studies used datasets of only a few hundred images; only a few used thousands of images. Most studies were case-control studies at the proof-of-concept level. Although a few studies conducted external validation, none did so on a prospective cohort. Overall, the application of deep learning to abdominal US imaging is at an early stage. However, I expect that deep learning for US imaging will continue to progress, because US has many advantages compared to other imaging modalities and efforts are being made to overcome the existing challenges.
Notes
No potential conflict of interest relevant to this article was reported.