Abstract
Ultrasonography (US) is the primary diagnostic tool used to assess the risk of malignancy and to inform decision-making regarding the use of fine-needle aspiration (FNA) and post-FNA management in patients with thyroid nodules. However, since US image interpretation is operator-dependent and interobserver variability is moderate to substantial, unnecessary FNA and/or diagnostic surgery are common in practice. Artificial intelligence (AI)-based computer-aided diagnosis (CAD) systems have been introduced to help with the accurate and consistent interpretation of US features, ultimately leading to a decrease in unnecessary FNA. This review provides a developmental overview of the AI-based CAD systems currently used for thyroid nodules and describes the future developmental directions of these systems for the personalized and optimized management of thyroid nodules.
Introduction
Thyroid nodules are a common clinical problem, occurring in 19%-68% of the healthy population [1-3]. Ultrasonography (US) is an essential diagnostic tool used to assess the risk of malignancy and to inform decision-making regarding the use of fine-needle aspiration (FNA) and post-FNA management in patients with thyroid nodules [1-3]. However, accurate recognition and consistent interpretation of US features are challenging for less-experienced operators, resulting in moderate to substantial interobserver and intraobserver variability [4-8]. In addition to experienced radiologists, many other clinicians (including endocrinologists, surgeons, nuclear medicine physicians, cytopathologists, family practice physicians, and other non-imaging specialists) perform thyroid US at primary care centers; therefore, unnecessary FNA and/or diagnostic surgery are commonly performed, placing a significant burden on the healthcare system and causing considerable anxiety to patients [1-3]. In addition, examining thyroid nodules on US is relatively labor-intensive due to their high prevalence in practice.
Artificial intelligence (AI)-based computer-aided diagnosis (CAD) systems, based on machine learning (ML) and deep learning (DL) techniques, have been introduced for thyroid cancer diagnosis to overcome the limitations of US diagnosis by clinicians. Many studies have reported the potential roles of these systems in thyroid cancer diagnosis, and have demonstrated comparable or even higher diagnostic performance than experienced radiologists [8-13]. However, at this point, the use of AI tools in clinical practice is of great concern since most studies were designed as proof-of-concept or technical feasibility research without a thorough external validation of real-world clinical performance [14-16]. Most studies have been based on algorithms developed by individual researchers, and only a few have investigated the use of commercially available systems. In this review, we discuss the clinical background, development, and validation studies of AI-based CAD systems in thyroid cancer diagnosis, and describe the future developmental directions of these systems for the personalized and optimized management of thyroid nodules.
Development of AI-Based CAD Systems in Thyroid Imaging
US-based risk-stratification systems (RSS) have been used for the effective management of thyroid nodules since the early 2000s [17]. They were initially introduced as qualitative grading systems for the simple classification of thyroid nodules that show any US feature suspicious for malignancy, including microcalcifications, a taller-than-wide shape, a spiculated margin, and marked hypoechogenicity [3,17,18]. However, there has been a conceptual change in the use of RSS, which have evolved into more quantitative scoring systems that estimate the risk of malignancy by scoring the combined US features and categorizing the US patterns or adding US risk scores (Fig. 1) [3,19]. Na et al. [19] suggested that US predictors for malignancy were dependent on the solidity and echogenicity of thyroid nodules, as the suspicious US features of microcalcification, taller-than-wide shape, and spiculated/microlobulated margin were independent predictors of malignancy in solid hypoechoic nodules. Alongside this development, international societies of thyroid imaging specialists have devised a system with a more structured format, known as the Thyroid Imaging Reporting and Data System (TI-RADS) [1,2,20-22]. However, although the various TI-RADS classifications, based on the pattern- or point-based approach, have been widely applied in practice, some researchers have suggested the need for more segmented RSS for the personalized and optimized management of thyroid nodules [23,24]. The concern regarding the current RSS is that nodules with different risk factors are classified into four or five categories, each of which covers a broad range of risk yet is managed uniformly. Although these systems have the advantages of being simple and easily applied to clinical practice, further advances are needed in RSS of thyroid nodules to prevent unnecessary FNA in low-risk nodules [3].
Therefore, in response to the clinical demands to decrease unnecessary FNA further, AI-based CAD systems have been proposed as ways to increase the accuracy of US-based diagnosis for less-experienced operators and to address the complexity of the segmented RSS. Accurately estimating and stratifying the risk of malignancy on US could help to identify nodules with a high risk of cancer, while also avoiding unnecessary FNA by identifying nodules with an acceptably low likelihood of malignancy.
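The point-based approach to risk stratification described above can be illustrated with a minimal sketch. The feature names, point values, and category cut-offs below are hypothetical placeholders chosen only for illustration; they are not the values of any published TI-RADS, and a real system should be taken from the relevant guideline.

```python
# Illustrative point-based risk scoring. The weights and cut-offs below are
# hypothetical placeholders, NOT the values of any published TI-RADS.
HYPOTHETICAL_POINTS = {
    "solid_composition": 2,
    "marked_hypoechogenicity": 3,
    "taller_than_wide": 3,
    "spiculated_margin": 2,
    "microcalcifications": 3,
}

# (minimum total score, category label), checked from highest to lowest.
HYPOTHETICAL_CATEGORIES = [
    (7, "high suspicion"),
    (4, "intermediate suspicion"),
    (2, "low suspicion"),
    (0, "benign pattern"),
]


def risk_score(features):
    """Sum the points of every suspicious feature reported as present."""
    unknown = set(features) - set(HYPOTHETICAL_POINTS)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return sum(
        HYPOTHETICAL_POINTS[name] for name, present in features.items() if present
    )


def risk_category(score):
    """Map a total score to the first category whose minimum it reaches."""
    for minimum, label in HYPOTHETICAL_CATEGORIES:
        if score >= minimum:
            return label
    raise ValueError("score must be non-negative")


# A hypothetical solid nodule with microcalcifications and a normal shape.
nodule = {
    "solid_composition": True,
    "microcalcifications": True,
    "taller_than_wide": False,
}
total = risk_score(nodule)
print(total, risk_category(total))
```

Because every nodule in a category is managed in the same way, the coarseness of the cut-off table is what the more segmented RSS proposals aim to refine.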
AI-Based CAD Systems: Where Do We Stand?
AI-based CAD systems are based on one of two techniques: ML or DL. The ML technique relies on pre-defined engineered features extracted from the region of interest (ROI) based on expert knowledge, and the most robust features are selected and fed into the ML classifiers [25]. In thyroid imaging, many studies using the ML technique have developed CAD systems based on US features, such as composition, shape, margin, echogenicity, and calcifications, and have demonstrated their potential in thyroid cancer diagnosis [9,26,27]. Chang et al. [9] reported an area under the receiver operating characteristic curve (AUROC) of up to 0.986 when a support vector machine classifier was used to differentiate benign and malignant nodules, which was similar to the results obtained via visual inspection by radiologists (AUROC, 0.979). Therefore, many researchers have suggested that ML-based CAD systems may play a role in generating a second opinion for radiologists [9,26].
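For readers less familiar with the AUROC values quoted above, the following minimal sketch computes the metric directly from its probabilistic definition: the chance that a randomly chosen malignant nodule receives a higher classifier score than a randomly chosen benign one. The function name and the toy data are assumptions made for illustration and are not taken from any cited study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a malignant case outscores a benign one.

    labels: 1 = malignant, 0 = benign.
    scores: classifier outputs, where higher means more suspicious.
    Ties count as half a win, which matches the normalized Mann-Whitney U.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three malignant and three benign nodules.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a teaching example; larger datasets would normally use a rank-based implementation from a statistics library.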
However, there are several limitations, since few websites or commercialized programs are available to provide access to these systems for external validation [27]. In a systematic review by Sollini et al. [27], it was found that the AUROC values of ML approaches varied widely among published studies, ranging from 0.67 to 1.00, and that many methodological issues existed. They emphasized the limited comparability and reproducibility of the published studies, which arise from confounding variables, such as different imaging protocols, segmentation methods, and scanners/vendors. Feature type, selection, and classifiers also varied among studies. Finally, the test and validation datasets were lacking in most cases, even though the systems were developed using relatively small sample sizes [27].
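The effect of small sample sizes on the reliability of a reported sensitivity or specificity can be made concrete with a confidence interval. The sketch below uses the Wilson score interval for a binomial proportion, which is one common choice and is not necessarily the method used in the cited studies; the counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


# Hypothetical counts: the same 90% sensitivity observed in 20 versus 200
# malignant nodules gives very different levels of certainty.
small = wilson_interval(18, 20)
large = wilson_interval(180, 200)
print([round(x, 3) for x in small], [round(x, 3) for x in large])
```

With only 20 malignant nodules, the interval spans roughly 0.70 to 0.97, which illustrates why performance figures from small development cohorts are hard to compare across studies.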
Compared to the ML technique, the DL technique does not require prior definitions by human experts [25]. Many recent studies using DL techniques have developed classification models without the provision of any information on the US features. A recent study by Li et al. [13] reported promising results in the development of a classification model with a cohort of over 300,000 images. Interestingly, they reported that a newly developed CAD system had similar sensitivity to that of skilled radiologists who interpreted US images based on the American College of Radiology TI-RADS classification (84.3%-93.4% vs. 89.0%-96.9%, respectively), and even higher specificity (86.1%-87.8% vs. 57.1%-68.6%, respectively). Although debates exist concerning issues such as the relatively low specificity of the skilled radiologists (57.1%-68.6%) compared with that reported in previous studies (86.4%-95.5%), and the direct comparison made between radiologists who used the TI-RADS classifications and the CAD system, which used dichotomous outcomes, the technical success of this study is noteworthy and should be validated in a different geographic setting [28].
Two commercialized CAD systems for thyroid cancer diagnosis have been developed to overcome the software implementation and external validation issues (Table 1). AmCAD-UT (AmCAD Biomed, Taipei, Taiwan) is the first commercialized CAD system for the diagnosis of thyroid nodules using US. It is designed to characterize thyroid nodule features using statistical pattern recognition and quantification algorithms and provides the risk of malignancy based on TI-RADS classifications. In an external validation study by Reverter et al. [29] on 300 thyroid nodules, it showed a similar sensitivity (87.0% vs. 87.0%, P=0.76), but lower specificity (68.8% vs. 91.2%, P<0.01) and AUROC (0.72 vs. 0.88), compared to clinical experts using the American Thyroid Association TI-RADS classification [29]. S-Detect for thyroid (Samsung Medison Co., Ltd., Seoul, Korea) is another commercialized CAD system that is integrated into a commercially available US platform (Fig. 2). The S-Detect 1 for the thyroid is based on the ML technique of support vector machine models, and the S-Detect 2 for the thyroid utilizes convolutional neural network-based DL techniques [11]. This system has been used since the real-time application of CAD systems became possible during US examinations. When an ROI is manually drawn around a lesion during a US examination, the CAD software automatically calculates the mass contours and presents the US features and a possible diagnosis using a dichotomous outcome (probably benign vs. probably malignant) or a TI-RADS classification outcome. In a preliminary prospective external validation study by Choi et al. [12] on 102 thyroid nodules, this system showed comparable sensitivity (88.4% vs. 90.7%, P>0.99) to an experienced radiologist, but lower specificity and AUROC (74.6% vs. 94.9%, P=0.002; and 0.83 vs. 0.92, P=0.021, respectively). In a similar study by Yoo et al. [10] on 117 thyroid nodules, a CAD-assisted radiologist showed improved sensitivity (up to 92.0%), but lower specificity and positive predictive value (PPV) than those of the radiologist alone (85.1% vs. 95.5%, P=0.005; 82.1% vs. 93.3%, P=0.008). Chung et al. [30] suggested similar clinical applications in their prospective study. They demonstrated that the S-Detect may support decision-making in the diagnosis of malignant thyroid nodules for operators who have less experience with thyroid US [30]. In terms of the comparative diagnostic performance between CAD systems based on the TI-RADS classification and dichotomous outcomes, a study by Han et al. [31] showed that the dichotomous outcome method had higher specificity, PPV, and accuracy (i.e., it reduced unnecessary FNA). In contrast, the TI-RADS system had higher sensitivity and negative predictive value (NPV; an increase in unnecessary FNA) when using the CAD system [30]. However, some issues remained. Kim et al. [11] reported that areas needing improvement were inaccuracy and a poor detection rate of microcalcifications. Jeong et al. [8] demonstrated that the less-experienced operators showed lower sensitivity (68.8%-73.8%) and accuracy (71.0%-75.0%) than an experienced radiologist (88.6% and 86.0%, respectively), indicating that operator dependency remains an issue with regard to AI-based CAD systems. Considering the current status of commercialized CAD systems, CAD systems may support and generate a second opinion for doctors, especially less-experienced ones.
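The performance measures compared in the validation studies above (sensitivity, specificity, PPV, NPV, and accuracy) all derive from the same two-by-two table of CAD output against the reference standard. A minimal sketch of those definitions follows; the function name and the counts are hypothetical and do not reproduce the data of any cited study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2x2 table.

    tp: malignant nodules called malignant   fn: malignant called benign
    fp: benign nodules called malignant      tn: benign called benign
    """
    def ratio(numerator, denominator):
        # Return NaN instead of failing when a row or column is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for 150 nodules (50 malignant, 100 benign).
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this framing, each false positive corresponds to a potentially unnecessary FNA, which is why lower specificity and PPV are the recurring concern in the studies summarized above. Note that PPV and NPV also depend on the proportion of malignant nodules in the cohort, so they are not directly comparable across studies with different prevalence.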
Future Developmental Directions of AI-Based CAD Systems
AI-based CAD systems are currently evolving; however, none have been widely adopted worldwide, and conflicting issues remain (Table 2). Therefore, at this point, the current results are somewhat disappointing, making it difficult to estimate their actual clinical impact on thyroid nodule management. More practical, well-designed AI-based CAD systems are required to provide consistent nodule management in practice. Three key questions must be considered when developing an AI-based CAD system. The first issue is how any new tool will be integrated into the diagnostic workflow (as a first reader, second reader, or an offline [autonomous] reader). Although current evidence supports a role for AI-based CAD systems as potential decision-making aids for less-experienced operators as a second reader, further research and discussion are needed regarding the clinical implications of such systems. The second issue concerns the target benchmarks of these tools: high sensitivity (high NPV) or high specificity (high PPV). Since differentiated thyroid cancers have a good prognosis and low mortality rate, many researchers tend to focus on high specificity as a way to reduce unnecessary FNA. However, since previous studies have shown that CAD systems have similar sensitivity, but lower specificity and accuracy, in comparison with experienced radiologists, a possible option would be to use these systems as screening tools with high sensitivity to assist less-experienced operators at primary medical centers [8,10-12]. Decisions about whether to perform FNA could be referred to thyroid imaging experts to increase the specificity at secondary or tertiary medical centers. A social consensus on the appropriate levels of sensitivity and specificity of AI-based CAD systems in thyroid cancer diagnosis is required in the future.
The third issue concerns other clinical factors such as successfully reducing report variability, unnecessary downstream testing, and turnaround time. Although we hope that the combination of AI-based CAD with an RSS, implemented on a commercial US machine, could decrease operator dependency in image interpretation and assist with real-time interpretation to assess the risk of malignancy and FNA decisions in patients with thyroid nodules, the actual clinical significance of the CAD system requires further validation in different clinical settings. It is essential to obtain an adequate external dataset from a well-defined clinical cohort to avoid overestimating the clinical performance and to achieve a robust clinical evaluation [14-16]. Finally, clinical trials and observational outcome studies that go beyond performance metrics are needed in the future for the ultimate clinical verification of AI-based CAD systems.
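The trade-off between a sensitivity-oriented and a specificity-oriented benchmark, raised as the second issue above, comes down to where the operating cut-off is placed on the CAD output score. The sketch below picks the highest cut-off that still reaches a target sensitivity and reports the specificity that results. It assumes a CAD system that outputs a continuous suspicion score, which the commercial systems discussed here may not expose; the function name and data are hypothetical.

```python
def threshold_for_sensitivity(labels, scores, target_sensitivity):
    """Return (cut-off, sensitivity, specificity) for the highest cut-off
    that reaches the target sensitivity.

    labels: 1 = malignant, 0 = benign.
    A nodule is called suspicious when its score is >= the cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign case")
    # Lowering the cut-off can only raise sensitivity, so scan from the top.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        if tp / n_pos >= target_sensitivity:
            fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
            return cutoff, tp / n_pos, 1 - fp / n_neg
    raise ValueError("target sensitivity must be at most 1")


# Hypothetical scores for four malignant and four benign nodules.
demo_labels = [1, 1, 1, 1, 0, 0, 0, 0]
demo_scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.4, 0.3, 0.1]
print(threshold_for_sensitivity(demo_labels, demo_scores, 0.75))
print(threshold_for_sensitivity(demo_labels, demo_scores, 1.0))
```

In this toy example, raising the required sensitivity from 75% to 100% lowers the specificity from 75% to 25%, which mirrors the screening-versus-confirmation roles proposed for primary and tertiary centers.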
Besides thyroid cancer diagnosis, AI-based CAD systems also show some potential to identify and differentiate metastatic lymph nodes in patients with thyroid cancer. The presence of metastatic lymph nodes is a prognostic indicator for patients with thyroid cancer and an important determinant in surgical decision-making. However, evaluating neck lymph nodes requires more experience than is needed for the diagnosis of thyroid cancer. In a study by Lee et al. [32] using US, a CAD system showed accuracy, sensitivity, and specificity for predicting lymph node metastasis of 83.0%, 79.5%, and 87.5%, respectively. A recent development and validation study applied DL to the diagnosis of lymph node metastasis using computed tomography; the AUROCs for the eight tested algorithms were above 0.90, and the best-performing algorithm showed an AUROC of 0.874 in a validation set [33,34]. This approach may serve as a training tool to help resident physicians gain confidence in diagnosing thyroid cancer.
Notes
Author Contributions
Conceptualization: Ha EJ, Baek JH. Drafting of the manuscript: Ha EJ, Baek JH. Critical revision of the manuscript: Ha EJ, Baek JH. Approval of the final version of the manuscript: all authors.
References
1. Haugen BR. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: what is new and what has changed? Cancer 2017;123:372–381.
2. Shin JH, Baek JH, Chung J, Ha EJ, Kim JH, Lee YH, et al. Ultrasonography diagnosis and imaging-based management of thyroid nodules: revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean J Radiol 2016;17:370–395.
3. Ha EJ, Baek JH, Na DG. Risk stratification of thyroid nodules on ultrasonography: current status and perspectives. Thyroid 2017;27:1463–1468.
4. Choi SH, Kim EK, Kwak JY, Kim MJ, Son EJ. Interobserver and intraobserver variations in ultrasound assessment of thyroid nodules. Thyroid 2010;20:167–172.
5. Kim HG, Kwak JY, Kim EK, Choi SH, Moon HJ. Man to man training: can it help improve the diagnostic performances and interobserver variabilities of thyroid ultrasonography in residents? Eur J Radiol 2012;81:e352–e356.
6. Park CS, Kim SH, Jung SL, Kang BJ, Kim JY, Choi JJ, et al. Observer variability in the sonographic evaluation of thyroid nodules. J Clin Ultrasound 2010;38:287–293.
7. Park SH, Kim SJ, Kim EK, Kim MJ, Son EJ, Kwak JY. Interobserver agreement in assessing the sonographic and elastographic features of malignant thyroid nodules. AJR Am J Roentgenol 2009;193:W416–W423.
8. Jeong EY, Kim HL, Ha EJ, Park SY, Cho YJ, Han M. Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators. Eur Radiol 2019;29:1978–1985.
9. Chang Y, Paul AK, Kim N, Baek JH, Choi YJ, Ha EJ, et al. Computer-aided diagnosis for classifying benign versus malignant thyroid nodules based on ultrasound images: a comparison with radiologist-based assessments. Med Phys 2016;43:554.
10. Yoo YJ, Ha EJ, Cho YJ, Kim HL, Han M, Kang SY. Computer-aided diagnosis of thyroid nodules via ultrasonography: initial clinical experience. Korean J Radiol 2018;19:665–672.
11. Kim HL, Ha EJ, Han M. Real-world performance of computer-aided diagnosis system for thyroid nodules using ultrasonography. Ultrasound Med Biol 2019;45:2672–2678.
12. Choi YJ, Baek JH, Park HS, Shim WH, Kim TY, Shong YK, et al. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: initial clinical assessment. Thyroid 2017;27:546–552.
13. Li X, Zhang S, Zhang Q, Wei X, Pan Y, Zhao J, et al. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study. Lancet Oncol 2019;20:193–201.
14. Park SH, Han K. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 2018;286:800–809.
15. Park SH, Kressel HY. Connecting technological innovation in artificial intelligence to real-world medical practice through rigorous clinical validation: what peer-reviewed medical journals could do. J Korean Med Sci 2018;33:e152.
16. Park SH. Diagnostic case-control versus diagnostic cohort studies for clinical validation of artificial intelligence algorithm performance. Radiology 2019;290:272–273.
17. Kim EK, Park CS, Chung WY, Oh KK, Kim DI, Lee JT, et al. New sonographic criteria for recommending fine-needle aspiration biopsy of nonpalpable solid nodules of the thyroid. AJR Am J Roentgenol 2002;178:687–691.
18. Moon WJ, Jung SL, Lee JH, Na DG, Baek JH, Lee YH, et al. Benign and malignant thyroid nodules: US differentiation: multicenter retrospective study. Radiology 2008;247:762–770.
19. Na DG, Baek JH, Sung JY, Kim JH, Kim JK, Choi YJ, et al. Thyroid Imaging Reporting and Data System risk stratification of thyroid nodules: categorization based on solidity and echogenicity. Thyroid 2016;26:562–572.
20. Tessler FN, Middleton WD, Grant EG, Hoang JK, Berland LL, Teefey SA, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): white paper of the ACR TI-RADS Committee. J Am Coll Radiol 2017;14:587–595.
21. Camacho PM, Petak SM, Binkley N, Clarke BL, Harris ST, Hurley DL, et al. American Association of Clinical Endocrinologists and American College of Endocrinology clinical practice guidelines for the diagnosis and treatment of postmenopausal osteoporosis - 2016. Endocr Pract 2016;22(Suppl 4):1–42.
22. Russ G. Risk stratification of thyroid nodules on ultrasonography with the French TI-RADS: description and reflections. Ultrasonography 2016;35:25–38.
23. Kwak JY, Jung I, Baek JH, Baek SM, Choi N, Choi YJ, et al. Image reporting and characterization system for ultrasound features of thyroid nodules: multicentric Korean retrospective study. Korean J Radiol 2013;14:110–117.
24. Choi YJ, Baek JH, Baek SH, Shim WH, Lee KD, Lee HS, et al. Web-based malignancy risk estimation for thyroid nodules using ultrasonography characteristics: development and validation of a predictive model. Thyroid 2015;25:1306–1312.
25. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts H. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500–510.
26. Zhang B, Tian J, Pei S, Chen Y, He X, Dong Y, et al. Machine learning-assisted system for thyroid nodule diagnosis. Thyroid 2019;29:858–867.
27. Sollini M, Cozzi L, Chiti A, Kirienko M. Texture analysis and machine learning to characterize suspected thyroid nodules and differentiated thyroid cancer: where do we stand? Eur J Radiol 2018;99:1–8.
28. Ha EJ, Baek JH, Na DG. Deep convolutional neural network models for the diagnosis of thyroid cancer. Lancet Oncol 2019;20:e130.
29. Reverter JL, Vazquez F, Puig-Domingo M. Diagnostic performance evaluation of a computer-assisted imaging analysis system for ultrasound risk stratification of thyroid nodules. AJR Am J Roentgenol 2019;213:169–174.
30. Chung SR, Baek JH, Lee MK, Ahn Y, Choi YJ, Sung TY, et al. Computer-aided diagnosis system for the evaluation of thyroid nodules on ultrasonography: prospective non-inferiority study according to the experience level of radiologists. Korean J Radiol 2020;21:369–376.
31. Han M, Ha EJ, Park JH. Computer-aided diagnostic system for thyroid nodules on ultrasonography: diagnostic performance based on the TI-RADS classification and dichotomous outcomes. Am J Neuroradiol 2020;(In press).
32. Lee JH, Baek JH, Kim JH, Shim WH, Chung SR, Choi YJ, et al. Deep learning-based computer-aided diagnosis system for localization and diagnosis of metastatic lymph nodes on ultrasound: a pilot study. Thyroid 2018;28:1332–1338.
Table 1.
Table 2.