Artificial intelligence in breast ultrasonography
Abstract
Although breast ultrasonography is the mainstay modality for differentiating between benign and malignant breast masses, it has intrinsic problems with false positives and substantial interobserver variability. Artificial intelligence (AI), particularly with deep learning models, is expected to improve workflow efficiency and serve as a second opinion. AI is highly useful for performing three main clinical tasks in breast ultrasonography: detection (localization/segmentation), differential diagnosis (classification), and prognostication (prediction). This article provides a current overview of AI applications in breast ultrasonography, with a discussion of methodological considerations in the development of AI models and an up-to-date literature review of potential clinical applications.
Introduction
Breast cancer is the most common type of cancer in women in Korea according to data from the Korea National Cancer Incidence Database [1]. Its incidence rate has been increasing, with an annual percentage rise of 6%, and is expected to rise further over the next 10 years. The increase in cases may be related to reproductive and lifestyle factors and an aging society [2]. These epidemiological findings underscore the importance of effective and accurate breast cancer diagnosis using mammography and ultrasonography, which in turn increases radiologists' workload. Although mammography is known to reduce breast cancer mortality, it is limited as a diagnostic modality by wide variability in interpretation and diagnostic performance among radiologists. A benchmark study showed that a considerable proportion of radiologists in the Breast Cancer Surveillance Consortium had suboptimal performance in terms of the abnormal interpretation rate and specificity [3]. Moreover, mammography has an intrinsic problem of imperfect sensitivity due to obscured cancers, especially in women with dense breasts. As breast density notification legislation in the United States has been widely implemented, radiologists are now required to inform women with dense breasts that supplemental screening modalities (e.g., ultrasonography) may be necessary [4,5].
Ultrasonography is the mainstay modality for differentiating between benign and malignant breast masses and has traditionally been used in the diagnostic setting. Due to growing evidence that ultrasonography can detect mammographically occult cancers, interest in its use for screening has increased [6,7]. Moreover, ultrasonography has several advantages over other modalities (e.g., mammography, digital breast tomosynthesis, and magnetic resonance imaging): it is generally safer (non-ionizing), more economical, and easier to use, and it allows real-time guidance and monitoring. However, interobserver variability is high in the acquisition and interpretation of ultrasound images, even among experts [8], which contributes to a high rate of false positives and leads to unnecessary biopsies and surgical procedures.
Deep learning (DL), a subset of artificial intelligence (AI), has made great strides toward the automated detection and classification of medical images. For mammography, a modality with commercially available DL-based decision support systems, recent validation studies have shown that several AI systems can perform at the level of radiologists. These studies suggest that AI has the potential to democratize expertise in settings that lack experienced radiologists. In addition, AI systems can reduce radiologists' workload by improving workflow efficiency and can prevent overlooked findings or interpretation errors by providing a second opinion. These advantages may translate to ultrasonography. In recent years, DL algorithms have been increasingly applied to breast ultrasonography, mostly in feasibility studies for automated detection, differential diagnosis, and segmentation [9-15]. The development of DL-based AI systems for ultrasonography is in its early stages relative to mammography, and ultrasonography has unique characteristics in terms of the development process. This article provides a current overview of AI applications in breast ultrasonography, along with a discussion of methodological considerations in the development of these applications and an up-to-date literature review of potential clinical applications.
Methodological Considerations
Datasets
AI is a data-driven technology, and its performance is highly dependent on the quantity and quality of training data. To develop a robust AI model for ultrasonography, a multi-institutional large-scale dataset is required, with a wide spectrum of disease and non-disease entities and images obtained from ultrasound devices from multiple vendors. The same lesion can be captured and rendered differently depending on the acquisition conditions, because more than 10 companies produce ultrasound equipment with various transducers and technical settings. In addition, ultrasound technology has evolved over the decades: older ultrasonographic images typically have lower resolution and higher noise levels than newer images. Thus, AI algorithms trained on older images may not be externally valid for newer images.
The number of images used per patient in AI development has not been specified or standardized. Although most studies have used more than one image per patient for training and validation, specific details on the numbers of patients and images should be reported, in line with the recently proposed Checklist for Artificial Intelligence in Medical Imaging [16]. Further studies or guidelines may be necessary to specify the structure and composition of each dataset type (training/validation/test) so that selection and spectrum bias are minimized and the resulting AI system generalizes well [17,18].
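Because most studies use more than one image per patient, splitting a dataset at the image level can leak images of the same patient into both the training and test sets, inflating performance estimates. The following is a minimal sketch of a patient-level split; the record structure and split ratios are illustrative assumptions, not a prescription from the literature.

```python
import random
from collections import defaultdict

def patient_level_split(records, train=0.7, val=0.1, seed=42):
    """Split (patient_id, image_path, label) records so that all
    images of one patient fall into exactly one subset."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train, n_val = int(n * train), int(n * val)
    subsets = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand patient lists back into image-level records per subset.
    return {name: [r for p in pats for r in by_patient[p]]
            for name, pats in subsets.items()}
```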
Image Preprocessing and Data Augmentation
An image annotation process involving manual delineation of the region of interest (ROI) of the lesion is usually required to train AI models in a supervised manner. The ROI can be automatically detected using various computer-aided segmentation techniques; however, human verification of the ROI is still required to guarantee the quality of the training data. The need for a massive number (usually thousands or more) of annotated images is a barrier to the development of well-performing and robust AI systems, because the image annotation process is both time- and labor-intensive. In addition, manual annotation can be biased by subjective prejudgment of the lesion's character. To relax the requirement for manual ROI delineation in training data, weakly-supervised and semi-supervised methods are now emerging, in which unannotated images with only image-level labels (i.e., benign or malignant) are used for image classification and localization [19-21]. After image annotation, the images are usually cropped with a fixed margin around the ROI and resized. The margin is defined as the distance between the lesion boundary and the boundary of the cropped image itself. In a previous study, a 180-pixel margin showed the best performance [15]; however, cropping with variable margins (0–300 pixels) has been used in AI studies of breast ultrasonography. To input cropped images into AI models, the cropped images must be resized to a fixed size, which also varies from study to study.
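As an illustration of this cropping-and-resizing step, the following sketch crops a fixed margin around an ROI bounding box and resizes the result to a fixed model input size. The 180-pixel margin follows the best-performing value reported in [15]; the 224x224 output size and the bounding-box representation are illustrative assumptions.

```python
from PIL import Image

def crop_roi(image: Image.Image, box, margin=180, size=(224, 224)):
    """Crop a fixed margin around an ROI box (x0, y0, x1, y1),
    clip the crop to the image borders, and resize to the model
    input size."""
    x0, y0, x1, y1 = box
    w, h = image.size
    crop_box = (max(x0 - margin, 0), max(y0 - margin, 0),
                min(x1 + margin, w), min(y1 + margin, h))
    return image.crop(crop_box).resize(size, Image.BILINEAR)
```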
Data augmentation is commonly used to avoid overfitting and to increase the volume of training data [22,23]. Data augmentation is the process of creating new data (images) by manipulating the original data with a variety of strategies, including flipping, rotation, translation, and noise injection (Fig. 1). Although resizing and data augmentation are essential steps in AI model training, these processes also carry the risk of reducing classification performance by altering attributes of breast lesions in ultrasonographic images. Byra et al. [13] suggested that images should not be rotated in a way that shifts the longitudinal direction. For example, the posterior acoustic shadowing of a breast mass, which is one of the signs of malignancy, can end up located anteriorly after a vertical (longitudinal) flip [13].
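A minimal augmentation pipeline reflecting this caution might allow horizontal flips and small translations while deliberately omitting vertical flips and rotations, so that posterior acoustic features remain posterior to the mass. The specific parameter values below are illustrative assumptions, not values from the cited studies.

```python
import torch
from torchvision import transforms

# Horizontal flips preserve anterior/posterior orientation;
# vertical flips (transforms.RandomVerticalFlip) and rotations are
# deliberately omitted so posterior acoustic shadowing stays below
# the mass, following the caution in Byra et al. [13].
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # small shifts only
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),  # noise injection
])
```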
Explainable AI
Since decisions derived from AI systems affect clinical decisions and/or outcomes, there is a need to understand the AI decision-making process. Given the deep nature of current AI techniques, with hundreds of layers and millions of parameters, this is a "black box" problem, in which AI output lacks explainability and justification. Thus, eXplainable AI (XAI) is now a crucial trend in the deployment of responsible AI models in medical imaging. One well-known approach in XAI is class activation mapping (CAM), which provides weighted feature maps of each class at the last convolutional layer. CAM helps users understand the decision-making process by mapping the output back onto the input image to show which parts of the image were discriminative for the output [24-26]. With a CAM, breast lesions can be recognized (localized) in ultrasonographic images, and this localization is relevant for classification (Figs. 2, 3). Recent approaches and related issues have been reviewed elsewhere [27-30]. However, much work remains to be done on the clinical interpretation of the explainable outputs provided by AI models.
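The following sketch illustrates the CAM computation described above: the feature maps of the last convolutional layer are weighted by the fully connected weights of the target class and projected back onto the input image. A ResNet-18 stands in for a trained breast ultrasound classifier; the architecture and layer names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=2).eval()  # stand-in for a trained classifier
feature_maps = {}

def hook(module, inputs, output):
    # Capture the last convolutional feature maps during the forward pass.
    feature_maps["last_conv"] = output  # shape (1, C, h, w)

model.layer4.register_forward_hook(hook)

def class_activation_map(image, target_class):
    """Weight the last conv feature maps by the fully connected
    weights of the target class and upsample to the input size."""
    _ = model(image.unsqueeze(0))                    # triggers the hook
    fmap = feature_maps["last_conv"].squeeze(0)      # (C, h, w)
    weights = model.fc.weight[target_class]          # (C,)
    cam = F.relu(torch.einsum("c,chw->hw", weights, fmap))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return F.interpolate(cam[None, None], size=image.shape[1:],
                         mode="bilinear", align_corners=False)[0, 0]
```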
Potential Clinical Applications
In breast ultrasonography, AI performs three main clinical tasks: detection (localization or segmentation), differential diagnosis (classification), and prognostication (prediction).
Detection (Localization or Segmentation)
As with other medical imaging modalities, DL-based lesion detection on breast ultrasonography is mostly performed by convolutional neural networks (CNNs). CNN-based detection methods have shown superior accuracy in object detection compared with conventional computerized methods (i.e., radial gradient index filtering and multifractal filtering). For hand-held ultrasound (HHUS) images, Yap et al. [31] reported the performance of three different DL-based methods (a patch-based LeNet, a U-Net, and a transfer learning approach with FCN-AlexNet). They found that the transfer-learned FCN-AlexNet outperformed the other methods, with a true positive fraction (TPF) ranging from 0.92 to 0.98. Kumar et al. [32] proposed an ensemble of multiple U-Net models for the automated segmentation of suspicious breast masses seen on ultrasonography and reported a TPF of 0.84.
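As an illustration of how a detection TPF can be computed, the following sketch counts a ground-truth lesion as detected when at least one predicted box overlaps it beyond an intersection-over-union threshold. The matching rule and threshold are illustrative assumptions; the exact criteria used in the cited studies may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def true_positive_fraction(gt_boxes, det_boxes, thr=0.5):
    """Fraction of ground-truth lesions matched by at least one
    detection whose IoU exceeds the threshold."""
    hits = sum(any(iou(g, d) >= thr for d in det_boxes) for g in gt_boxes)
    return hits / len(gt_boxes) if gt_boxes else 0.0
```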
DL research using HHUS has intrinsic limitations because still images are obtained from HHUS only after a decision is made on whether and how to capture a certain portion of a lesion or an anatomical structure; thus, the process of image acquisition is highly dependent on the human imager. Hence, the clinical need for AI applications using HHUS necessarily extends to real-time detection. Zhang et al. [33] embedded a lightweight neural network into ultrasonographic equipment; the network was trained using a knowledge distillation technique to transfer knowledge from deeper models to the shallow network. They reported successful real-time operation at 24 frames per second [33]. This is the only study so far to implement real-time automated detection using single-source data, but it lacked performance metrics; thus, further studies are required to ascertain the potential clinical applications of AI to HHUS.
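The following sketch shows a generic, Hinton-style knowledge distillation loss of the kind alluded to above, in which a shallow student network is trained to match the temperature-softened outputs of a deeper teacher. The loss formulation and hyperparameters are illustrative assumptions; the exact scheme used by Zhang et al. [33] is not reproduced here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Soften both output distributions with a temperature, penalize
    their KL divergence, and mix in the usual cross-entropy against
    the hard labels."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_soft_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```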
Automated breast ultrasonography (ABUS) is the modality in which AI-powered lesion detection is most strongly expected to assist radiologists in performing initial screenings and to reduce the need for observational oversight, because the thousands of images per patient generated by ABUS necessitate a prolonged interpretation time. Several computer-aided detection (CAD) algorithms, including the commercial software QV CAD (QView Medical, Los Altos, CA, USA), have been developed [34-37]. Studies have shown that the QV CAD system, which marks potentially malignant lesions, helped radiologists (especially less-experienced radiologists) improve cancer detection and reduce interpretation time [38]. Moon et al. [39] and Chiang et al. [40] proposed a 3-D CNN with a sliding window method and achieved high sensitivity (91%-100%) with a false-positive rate per case of 3.6%-21.6%.
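A sliding-window approach of the kind used by Moon et al. [39] and Chiang et al. [40] can be sketched as follows: a fixed-size 3-D window is moved across the ABUS volume and each patch is scored by a 3-D CNN classifier. The window size, stride, and model interface are illustrative assumptions.

```python
import torch

def sliding_window_scores(volume, model, window=(64, 64, 64), stride=32):
    """Slide a fixed 3-D window over an ABUS volume (a float tensor
    of shape (D, H, W)) and score each patch with a 3-D CNN that
    outputs a single malignancy logit."""
    d, h, w = volume.shape
    wd, wh, ww = window
    detections = []
    for z in range(0, d - wd + 1, stride):
        for y in range(0, h - wh + 1, stride):
            for x in range(0, w - ww + 1, stride):
                patch = volume[z:z + wd, y:y + wh, x:x + ww]
                # Add batch and channel dimensions before scoring.
                score = model(patch[None, None]).sigmoid().item()
                detections.append(((z, y, x), score))
    # Windows above a chosen score threshold become candidate lesions.
    return detections
```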
Differential Diagnosis (Classification)
Differential diagnosis refers to the process of distinguishing a particular disease from others; in the context of breast imaging, it usually refers to the distinction between benignity and malignancy. In clinical practice, the Breast Imaging Reporting and Data System (BI-RADS), developed by the American College of Radiology, is used to standardize the reporting of breast ultrasound interpretations. Although BI-RADS provides a systematic approach to lesion characterization and assessment, interobserver and intraobserver variability has been a subject of intense scrutiny, and AI solutions are expected to provide more reliable diagnoses [41,42].
Byra et al. [13] presented a CNN model with a transfer learning strategy using VGGNet-19, which was pretrained on the ImageNet dataset and fine-tuned on 882 breast ultrasound images with a matching layer, to classify breast lesions as benign or malignant. The area under the receiver operating characteristic curve (AUC) of the better-performing CNN model was significantly greater than the highest AUC achieved by the radiologists (0.936 vs. 0.882). Other DL studies with CNN variants trained from scratch have demonstrated diagnostic performance comparable to or even higher than that of radiologists [9,12,43]. Despite these promising results, further studies are warranted to prove the clinical utility of AI-powered classification systems. Most of the algorithms developed in studies to date were trained on images obtained from a limited number of institutions and ultrasound systems; therefore, they do not necessarily perform well in different circumstances. Furthermore, ultrasound still images that capture only a certain portion of the lesion, rather than the whole lesion, may underrepresent or exaggerate its ground-truth characteristics.
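As an illustration of this transfer learning strategy, the following sketch loads an ImageNet-pretrained VGG-19, replaces its classification head with a two-class (benign vs. malignant) output, and freezes the early convolutional blocks for fine-tuning. The matching layer of Byra et al. [13] is not reproduced here, and the frozen-layer cutoff is an illustrative assumption.

```python
import torch.nn as nn
from torchvision.models import vgg19

# Load ImageNet-pretrained VGG-19 and replace the 1,000-class head
# with a two-class (benign vs. malignant) output layer.
model = vgg19(weights="IMAGENET1K_V1")
model.classifier[6] = nn.Linear(4096, 2)

# Freeze the early convolutional blocks; only the later layers and
# the new head are fine-tuned on breast ultrasound images.
for param in model.features[:20].parameters():
    param.requires_grad = False
```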
Han et al. [15] employed a transferred GoogLeNet model on 7,408 breast ultrasound images (4,254 benign and 3,154 malignant) and achieved an AUC above 0.9. This model is a component algorithm of S-Detect, a commercial CAD program embedded in ultrasound equipment that provides an automatic analysis based on BI-RADS descriptors; it has been implemented in the RS80A ultrasound machine (Samsung Medison Co. Ltd., Seoul, Korea). Using a feature extraction technique and a support vector machine classifier, S-Detect predicts the final assessment of breast masses in a dichotomized form (possibly benign or possibly malignant). Choi et al. [44] found that significant improvements in AUC were seen with CAD (0.823–0.839 vs. 0.623–0.759), especially for less-experienced radiologists. However, this program has limited applicability to other vendors' equipment and requires user-defined lesion annotation, which is a time-consuming process.
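The S-Detect pipeline itself is proprietary, but the general pattern described above (extracted feature vectors fed to a support vector machine that outputs a dichotomized assessment) can be sketched as follows; the feature dimensionality, kernel choice, and synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 128))  # stand-in extracted feature vectors
labels = rng.integers(0, 2, size=200)   # 0 = benign, 1 = malignant

# Standardize the features, then fit an RBF-kernel SVM classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(features[:150], labels[:150])

# Dichotomized output: possibly benign (0) or possibly malignant (1).
predictions = clf.predict(features[150:])
```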
Differential diagnosis concurrent with detection or segmentation using DL models has been reported in various studies. Yap et al. [45] performed localization and classification simultaneously with their model, obtaining sensitivities of 0.80-0.84 and 0.38-0.57 and Dice scores of 0.72-0.76 and 0.33-0.76 for benign and malignant masses, respectively. Shin et al. [46] proposed a semi-supervised method in which a small number of extensively annotated images and a larger number of image-level labeled images were used for model training; they reported a 4.5-percentage-point improvement in the correct localization measure compared with conventional fully-supervised methods using only the extensively annotated images, although quantitative metrics for classification were not presented. Kim et al. [47] introduced a weakly-supervised deep network with box convolution to detect suspicious regions of breast masses of various sizes and shapes for AI-based differential diagnosis. The box convolution method (accuracy, 89%) classified breast masses more accurately than conventional CNN models (86%-87%) by learning the clinically relevant features of the masses and their surrounding areas. Moreover, their proposed network provided more robust localization than conventional methods (AUC, 0.89 vs. 0.75-0.78). Although these efforts toward a framework for both localization and classification of breast lesions with less manual annotation are still at the level of feasibility studies, without clear performance metrics or validation through a multi-reader study design, they will provide the groundwork for AI applications in ABUS and real-time ultrasonography.
Prognostication (Prediction)
Prognostication in breast cancer patients is usually conducted to predict the histopathological characteristics of the tumor before surgery or treatment, as well as treatment response and survival time. AI technologies for prognostication have rarely been studied, because AI applications are still mostly explored at the level of detection and diagnosis. Several studies have attempted to predict axillary nodal status, which is clinically significant because it guides treatment selection (i.e., the type of axillary surgery) [48]. Zhou et al. [49] found that their best-performing CNN model yielded satisfactory predictions, with an AUC of 0.89-0.90, a sensitivity of 82%-85%, and a specificity of 72%-79%, and that this model outperformed three experienced radiologists in the receiver operating characteristic space. Another study reported that DL radiomics of conventional ultrasonography and shear wave elastography, combined with clinical parameters, showed the best performance in predicting axillary lymph node metastasis, with an AUC of 0.902 [50]. In addition, this model could also discriminate between a low and a heavy metastatic axillary nodal burden, with an AUC of 0.905.
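Prediction studies of this kind are typically evaluated with the AUC, sensitivity, and specificity, which can be computed from predicted probabilities as in the following sketch; the synthetic labels and scores are illustrative stand-ins for real model outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=100)   # 1 = axillary metastasis confirmed
# Synthetic scores loosely correlated with the labels, clipped to [0, 1].
y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, size=100), 0, 1)

auc = roc_auc_score(y_true, y_score)
fpr, tpr, thresholds = roc_curve(y_true, y_score)
# At each threshold: sensitivity = tpr, specificity = 1 - fpr.
```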
AI with DL is expected to reveal information that human experts cannot recognize and to integrate imaging features with clinical variables. Further studies may provide insights into whether and how AI-aided predictions of clinical outcomes can achieve accuracy superior and more reliable than that of human-crafted features. To date, only a very limited number of DL studies have been published on predicting tumor response to neoadjuvant chemotherapy, and those used magnetic resonance imaging [51,52].
Summary
AI has tremendous potential to improve workflow efficiency and reduce interobserver variability in breast ultrasonography. The studies reviewed in this article report potential clinical applications of AI for breast cancer detection, characterization, and prognostication using ultrasonography. However, several methodological issues must be carefully considered in the development of more robust and responsible AI systems.
Notes
Author Contributions
Conceptualization: Kim J, Kim HJ, Kim WH. Data acquisition: Kim J, Kim HJ, Kim C, Kim WH. Data analysis or interpretation: Kim J, Kim HJ, Kim C, Kim WH. Drafting of the manuscript: Kim J, Kim HJ, Kim WH. Critical revision of the manuscript: Kim C, Kim WH. Approval of the final version of the manuscript: all authors.
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1C1C1006453, No. 2019R1G1A1098655) and the Ministry of Education (No. 2020R1I1A3074639), and by the AI-based image analysis solution development program for diagnostic medical imaging devices (20011875, Development of AI based diagnostic technology for medical imaging devices) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) and the Korean Society of Breast Imaging & Korean Society for Breast Screening (KSBI & KSFBS-2017-01).