Malignancy risk stratification and subcategorization of K-TIRADS intermediate suspicion thyroid nodules: a retrospective multicenter study
Abstract
Purpose
This study aimed to develop the ultrasonography (US) criteria for risk stratification of the Korean Thyroid Imaging Reporting and Data System (K-TIRADS) 4 nodules, and to evaluate the diagnostic yield of a modified biopsy criterion in a multicenter cohort.
Methods
In total, 1,542 K-TIRADS 4 nodules (≥1 cm) were included in the study. US criteria for the subcategorization of K-TIRADS 4 nodules were developed based on high-risk US features. The diagnostic yields and false referral rates of biopsy criterion 1 (size cut-off of 1 cm), biopsy criterion 2 (size cut-off of 1.5 cm), and modified biopsy criterion 3 (size cut-off of 1 cm for K-TIRADS 4B and 1.5 cm for K-TIRADS 4A) were evaluated.
Results
The five high-risk US features (solid composition, marked hypoechogenicity, macrocalcification, punctate echogenic foci, and irregular margin) independently increased the malignancy risk of the K-TIRADS 4 nodules (P<0.001). The K-TIRADS 4 nodules could be subcategorized into higher- and lower-risk subcategories according to the number of high-risk US features: K-TIRADS 4B (≥2 US features) and K-TIRADS 4A (≤1 US feature). The modified biopsy criterion increased the diagnostic yield by 7.8% compared with criterion 2 and reduced the false referral rate by 15.3% compared with criterion 1 (P<0.001).
Conclusion
The K-TIRADS 4 nodules were subcategorized as K-TIRADS 4B and K-TIRADS 4A based on high-risk US features. The modified biopsy criterion 3 showed a similar diagnostic yield and reduced false referral rate compared to criterion 1.
Introduction
Ultrasonography (US) is the primary imaging modality for evaluating thyroid nodules [1,2]. The US risk stratification system (RSS) and the Thyroid Imaging Reporting and Data System (TIRADS) are essential in diagnosing and managing these nodules. The RSS is particularly important in selecting patients who are eligible for biopsy and in ruling out thyroid cancer. As a triage tool, RSSs must demonstrate adequate sensitivity in detecting thyroid cancer within clinically relevant thyroid nodules (>1 cm) and should aid in minimizing unnecessary biopsies of benign nodules [3].
The recently updated 2021 Korean (K)-TIRADS increased the biopsy size cut-offs for low and intermediate suspicion nodules, significantly reducing the rate of unnecessary biopsies for small (1-2 cm) nodules while maintaining high sensitivity for detecting malignancy in larger (>2 cm) nodules [4,5]. K-TIRADS 4 nodules present an intermediate malignancy risk ranging from 10%-40%, and the 2021 K-TIRADS recommends biopsy for K-TIRADS 4 nodules measuring 1-1.5 cm, taking into account US features, nodule location, clinical risk factors, and patient considerations [5]. However, the US features that increase the risk of malignancy in K-TIRADS 4 nodules have been seldom explored. Two studies have investigated the risk stratification and subcategorization of K-TIRADS 4 nodules based on US characteristics, but their findings are inconsistent [6,7]. There is a need to establish US criteria for risk stratification of K-TIRADS 4 nodules to provide useful information for making optimized biopsy decisions. Additionally, the diagnostic yield of the modified biopsy criteria, based on the subcategorization of K-TIRADS 4 nodules, should be evaluated against the biopsy criteria of the 2021 K-TIRADS to determine their effectiveness in diagnosing malignancy.
This study aimed to develop US criteria for the risk stratification of K-TIRADS 4 nodules and to evaluate the diagnostic yield of a modified biopsy criterion derived from subcategorizing K-TIRADS 4 nodules in a multicenter cohort.
Materials and Methods
Compliance with Ethical Standards
This study was approved by the institutional review board of GangNeung Asan Hospital in Korea (2018-04-008-003), and informed consent was waived for this retrospective study. The methods and data reporting were performed in accordance with the Standards for Reporting Diagnostic Accuracy Studies [8].
Study Population
This study included a subset of patient data retrospectively collected from 26 different hospitals in Korea (Thyroid Imaging Network of Korea registry), which included 5,081 consecutive patients with 5,708 thyroid nodules (≥1 cm) [4,9]. After the exclusion of 3,581 patients with thyroid nodules classified as K-TIRADS 2, 3, or 5 by the 2021 K-TIRADS [5], the study included 1,542 thyroid nodules classified as K-TIRADS 4 in 1,500 patients (1,244 women and 256 men; median age, 55 years) (Fig. 1).
Malignant nodules were diagnosed based on histopathological results after surgery (n=304) or malignant (Bethesda VI) fine needle aspiration (FNA) or core-needle biopsy (CNB) results (n=58). Benign nodules were diagnosed based on histopathological results after surgery (n=94), at least two benign FNA or CNB results (n=155), or a single benign FNA or CNB result (n=931).
US Examination and Image Analysis
All US examinations were performed using 5-14 MHz linear probes and high-resolution US systems. The US images of each thyroid nodule were reviewed by one of 17 experienced radiologists, who were blinded to both the biopsy results and the final diagnoses. They evaluated the US characteristics of the nodules, focusing on composition, echogenicity, orientation, margin, and echogenic foci (calcification). The detailed methodology for US assessment of thyroid nodules has been previously described [4,9]. In our study, the US features of the nodules were defined according to the lexicon provided by the 2021 K-TIRADS [5]. A nonparallel orientation was identified when the anteroposterior diameter of a nodule exceeded its transverse diameter in the transverse plane. Marked hypoechogenicity was defined as nodule echogenicity that was hypoechoic relative to, or similar to, that of the anterior neck muscle. A nodule that was entirely calcified (isolated macrocalcification) was classified as a K-TIRADS 4 nodule according to the 2021 K-TIRADS [5]. For the purposes of this study, such a nodule was considered to have macrocalcification and was placed into subgroup 1 for analysis.
Development of US Criteria for the Subcategorization of K-TIRADS 4 Nodules
The K-TIRADS 4 nodules were subdivided into two subgroups based on US patterns: solid hypoechoic nodules without suspicious US features (subgroup 1) and partially cystic or iso-/hyperechoic nodules with suspicious US features (subgroup 2), as defined by the 2021 K-TIRADS [5]. The associations between US features—such as composition, echogenicity, echogenic foci (including punctate echogenic foci [PEF], macrocalcification, and rim calcification), irregular margins, nonparallel orientation, and hypoechoic halos—and malignancy were assessed within each subgroup and across all K-TIRADS 4 nodules. Criteria for the subcategorization of K-TIRADS 4 nodules were established based on high-risk US features that significantly increased the likelihood of malignancy within this group. Consequently, K-TIRADS 4 nodules were further classified into K-TIRADS 4B nodules, which have a higher risk of malignancy, and K-TIRADS 4A nodules, which have a lower risk, according to these criteria. This subcategorization was then applied to a modified biopsy criterion, wherein the size threshold for biopsy was differentiated: a 1 cm cut-off for K-TIRADS 4B nodules and a 1.5 cm cut-off for K-TIRADS 4A nodules.
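The three size-based biopsy rules compared in this study can be written as a small decision function. The following is a minimal sketch, not the authors' software: the function name `biopsy_indicated` and the string labels are hypothetical, and it assumes that a nodule meets a cut-off when its maximum diameter is greater than or equal to that cut-off.

```python
def biopsy_indicated(criterion: int, subcategory: str, size_cm: float) -> bool:
    """Return True if a K-TIRADS 4 nodule meets the size cut-off for biopsy.

    Assumption (not stated explicitly in the text): a nodule meets a
    cut-off when size_cm >= cut-off.
    """
    if subcategory not in ("4A", "4B"):
        raise ValueError("subcategory must be '4A' or '4B'")

    if criterion == 1:
        cutoff_cm = 1.0  # criterion 1: 1 cm for all K-TIRADS 4 nodules
    elif criterion == 2:
        cutoff_cm = 1.5  # criterion 2: 1.5 cm for all K-TIRADS 4 nodules
    elif criterion == 3:
        # modified criterion 3: 1 cm for 4B, 1.5 cm for 4A
        cutoff_cm = 1.0 if subcategory == "4B" else 1.5
    else:
        raise ValueError("criterion must be 1, 2, or 3")

    return size_cm >= cutoff_cm
```

Under these assumptions, a 1.2 cm nodule is indicated for biopsy by criterion 3 only if it is K-TIRADS 4B, whereas criterion 1 would indicate it and criterion 2 would not, regardless of subcategory.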
Diagnostic Yield of Subcategories and Biopsy Criteria in K-TIRADS 4 Nodules
The diagnostic yields, false referral rates, and positive predictive values (PPVs) of two US subcategories (K-TIRADS 4A and 4B) and three biopsy criteria (criterion 1: biopsy size cut-off of 1 cm for all nodules, criterion 2: biopsy size cut-off of 1.5 cm for all nodules, and criterion 3: biopsy size cut-off of 1 cm for K-TIRADS 4B and 1.5 cm for K-TIRADS 4A nodules) were evaluated with 95% confidence intervals for K-TIRADS 4 nodules. Nodules were dichotomized into those for which a biopsy was indicated (test positivity) and those for which it was not (test negativity) according to each biopsy criterion. The diagnostic yield, or detection rate, was defined as the proportion of true-positive results (malignant nodules for which biopsy was indicated) among all nodules, calculated by dividing the number of true-positive results by the total number of nodules [10]. The false referral rate, or unnecessary biopsy rate, was defined as the proportion of false-positive results (benign nodules for which biopsy was indicated) among all nodules, calculated by dividing the number of false-positive results by the total number of nodules [10]. The PPV was defined as the proportion of true-positive results among the nodules indicated for biopsy by each biopsy criterion.
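The three metrics defined above reduce to simple ratios of counts. The sketch below is illustrative only: the function name `biopsy_metrics` and the example counts are hypothetical and are not taken from the study tables.

```python
def biopsy_metrics(tp: int, fp: int, total: int) -> dict:
    """Compute diagnostic yield, false referral rate, and PPV.

    tp    : malignant nodules for which biopsy was indicated
    fp    : benign nodules for which biopsy was indicated
    total : all nodules in the cohort (biopsy indicated or not)
    """
    if total <= 0 or tp < 0 or fp < 0 or tp + fp > total:
        raise ValueError("counts must be non-negative with tp + fp <= total")

    indicated = tp + fp
    return {
        # true positives over ALL nodules
        "diagnostic_yield": tp / total,
        # false positives over ALL nodules
        "false_referral_rate": fp / total,
        # true positives over nodules indicated for biopsy
        "ppv": tp / indicated if indicated else float("nan"),
    }


# Hypothetical counts, for illustration only.
example = biopsy_metrics(tp=200, fp=600, total=1000)
```

Note that the diagnostic yield and the false referral rate share the same denominator (all nodules), so they are not the complement of one another; only the PPV is conditioned on the nodules indicated for biopsy.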
Statistical Analysis
Due to their nonparametric distribution, continuous variables such as age and nodule size are presented using the median and interquartile range. The Mann-Whitney U test was utilized to compare age and nodule size between the two subgroups within K-TIRADS 4 nodules. Categorical variables are expressed as frequencies and percentages for each category. For categorical variables, the chi-square test or the Fisher exact test was employed to compare the clinicopathological features of patients between the two subgroups of K-TIRADS 4 nodules, as well as to assess the difference in malignancy risk between the subcategorized nodules (K-TIRADS 4A and 4B). These tests were also applied to evaluate the malignancy risk of nodules based on the number of high-risk US features independently associated with malignancy within the overall K-TIRADS 4 nodules.
Univariable and multivariable logistic regression analyses were conducted to identify independent US predictors of malignancy within each subgroup and across all K-TIRADS 4 nodules. We compared the diagnostic yield, false referral rate, and PPV among the biopsy criteria using the McNemar test or weighted generalized score tests. The statistical analyses were carried out using IBM SPSS version 25 for Windows (IBM Corp., Armonk, NY, USA) and R 3.6.3 for Windows (R Foundation for Statistical Computing, Vienna, Austria). A P-value of less than 0.05 was considered statistically significant.
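Because the biopsy criteria are applied to the same nodules, their comparison is a paired one, which is what the McNemar test addresses. The sketch below shows an exact (binomial) form of the test using only the Python standard library. It is an illustration and not the authors' analysis, which was run in SPSS and R; the text does not say whether an exact or asymptotic variant was used, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b : nodules classified positive by rule A but negative by rule B
    c : nodules classified negative by rule A but positive by rule B
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("discordant counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # One-sided binomial tail P(X <= k), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Only the discordant pairs enter the calculation; nodules on which the two criteria agree do not affect the p-value.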
Results
Demographic Data and Clinical Characteristics of the Patients
The demographic data and clinical characteristics of the patients are summarized in Table 1. The K-TIRADS 4 category nodules were divided into two subgroups based on the US patterns of thyroid nodules. There was no significant difference in the malignancy rates between subgroups 1 and 2 (22.5% vs. 24.6%, P=0.361). The proportion of women was slightly higher in subgroup 1 compared to subgroup 2 (P=0.021); however, the median ages of the two subgroups were not significantly different (P=0.720). The median size (maximum diameter) of the nodules was larger in subgroup 2 than in subgroup 1 (1.8 cm vs. 1.4 cm, P<0.001), and the incidence of large nodules (>2 cm) was greater in subgroup 2 than in subgroup 1 (40.4% vs. 24.3%, P<0.001). Out of the 1,542 nodules examined, 362 (23.5%) were malignant, and 1,180 (76.5%) were benign. There was no significant difference in the distribution of histological types of malignant tumors between the two subgroups (P=0.244).
US Features Associated with Thyroid Malignancy in K-TIRADS 4 Nodules
Table 2 shows the US features associated with malignancy in subgroups 1 and 2 of K-TIRADS 4 nodules. In subgroup 1, marked hypoechogenicity and macrocalcification emerged as significant independent predictors of thyroid malignancy in the multivariable analysis (P=0.002 and P<0.001, respectively). For subgroup 2, solid composition, marked hypoechogenicity, PEF, irregular margins, and macrocalcification were identified as significant independent predictors of malignancy, with all P-values being ≤0.009. The presence of two or three suspicious US features in subgroup 2 nodules significantly increased the risk of malignancy compared to nodules with only one suspicious feature (51.1% vs. 20.7%, P<0.001). However, the US feature of nonparallel orientation did not significantly affect the risk of malignancy in subgroup 2 nodules (P=0.899). Table 3 details the US features associated with malignancy across all K-TIRADS 4 nodules. Multivariable binary logistic regression analysis revealed that five US features—solid composition, marked hypoechogenicity, PEF, irregular margins, and macrocalcification—were independently associated with an increased risk of malignancy (all P<0.001).
US Criteria for Subcategorization of K-TIRADS 4 Nodules
The risk of malignancy varied significantly with the number of high-risk US features present (P<0.001), showing a tendency to increase as the number of high-risk US features within a nodule rose (P<0.001) (Supplementary Table 1). Nodules exhibiting two high-risk US features had a malignancy risk of 27.0%, which was significantly greater than the risk associated with nodules having none or one high-risk US feature (10%, P=0.039 and 15.4%, P<0.001, respectively). Furthermore, the malignancy risk for nodules with three high-risk US features was 50.7%, markedly higher than the risk for nodules with two high-risk US features (P<0.001).
Based on these findings, K-TIRADS 4 nodules can be divided into two subcategories (K-TIRADS 4A: lower risk and K-TIRADS 4B: higher risk). The K-TIRADS 4B subcategory included K-TIRADS 4 nodules that showed at least two of the five following US features: (1) solid composition, (2) marked hypoechogenicity, (3) macrocalcification, (4) PEF, and (5) irregular margin (Table 4). The K-TIRADS 4A subcategory included K-TIRADS 4 nodules with absent or one high-risk US feature. The malignancy risk in the K-TIRADS 4B category was significantly higher than that in the K-TIRADS 4A category (31.7% vs. 15.2%, P<0.001) (Table 4). The malignancy risk in the K-TIRADS 4B category was also significantly higher than that in the K-TIRADS 4A category in subgroup 1 (28.3% vs. 16.1%, P<0.001) and subgroup 2 (36.0% vs. 14.2%, P<0.001).
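The subcategorization rule described above amounts to counting how many of the five high-risk US features a nodule shows. The following is a minimal sketch of that rule; the feature labels and the function name `subcategorize` are hypothetical identifiers chosen for illustration, and features outside the five listed (for example, nonparallel orientation) are simply ignored.

```python
# The five high-risk US features listed in the text (labels are illustrative).
HIGH_RISK_FEATURES = frozenset({
    "solid_composition",
    "marked_hypoechogenicity",
    "macrocalcification",
    "punctate_echogenic_foci",
    "irregular_margin",
})


def subcategorize(features) -> str:
    """Return '4B' if at least two high-risk features are present, else '4A'."""
    n_high_risk = len(HIGH_RISK_FEATURES & set(features))
    return "4B" if n_high_risk >= 2 else "4A"
```

For example, a solid nodule with macrocalcification would be labeled 4B, while a nodule whose only high-risk feature is solid composition would be labeled 4A.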
The diagnostic yield of K-TIRADS 4B was significantly higher than that of K-TIRADS 4A (16.0% vs. 7.5%, P<0.001), and the false referral rate of K-TIRADS 4B was significantly lower than that of K-TIRADS 4A (34.4% vs. 42.1%, P<0.001) (Table 4).
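Because the diagnostic yield and false referral rate are expressed relative to all nodules, the values for each subcategory can be approximately recovered from the malignancy risks alone. The sketch below is a rough consistency check, not part of the original analysis: it infers the share of nodules in each subcategory from the reported risks (31.7%, 15.2%, and 23.5% overall), which is an assumption since the text does not state the subcategory counts, and it reproduces the reported figures only to within rounding.

```python
# Reported malignancy risks (%), taken from the text.
RISK_4B, RISK_4A, RISK_ALL = 31.7, 15.2, 23.5

# Inferred share of nodules in 4B, solving
#   share_4b * RISK_4B + (1 - share_4b) * RISK_4A = RISK_ALL
share_4b = (RISK_ALL - RISK_4A) / (RISK_4B - RISK_4A)
share_4a = 1.0 - share_4b

# Yield = malignant nodules in the subcategory / all nodules.
yield_4b = share_4b * RISK_4B
yield_4a = share_4a * RISK_4A

# False referral rate = benign nodules in the subcategory / all nodules.
false_referral_4b = share_4b * (100.0 - RISK_4B)
false_referral_4a = share_4a * (100.0 - RISK_4A)
```

Under this assumption, roughly half of the nodules fall into each subcategory, and the derived values agree with the reported 16.0%, 7.5%, 34.4%, and 42.1% to within about 0.1 percentage point.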
Diagnostic Yield of the Modified Biopsy Criterion Based on the Subcategorization of K-TIRADS 4 Nodules
Table 5 presents the diagnostic yields, false referral rates, and PPVs for two biopsy criteria from the 2021 K-TIRADS (criteria 1 and 2) alongside a modified biopsy criterion (criterion 3) based on the subcategorization of K-TIRADS 4 nodules within the overall K-TIRADS 4 category. Criterion 3 resulted in a 2.9% decrease in diagnostic yield and a 15.3% reduction in the false referral rate compared to criterion 1. In contrast, criterion 3 led to a 7.8% increase in diagnostic yield and a 16.0% rise in the false referral rate when compared to criterion 2. In the subgroup of small (≤2 cm) K-TIRADS 4 nodules, criterion 3 reduced the diagnostic yield by 4.3% and the false referral rate by 22.5% relative to criterion 1. Meanwhile, compared to criterion 2, criterion 3 increased the diagnostic yield by 11.3% and the false referral rate by 23.3% (Table 5). Both criteria 2 and 3 demonstrated significantly lower diagnostic yields and false referral rates than criterion 1 (P<0.001 for both). Criterion 3 also showed significantly higher diagnostic yield, false referral rate, and PPV compared to criterion 2 (all P<0.001). Additionally, criterion 3 exhibited significantly greater sensitivity for detecting malignancy in both the overall and small (≤2 cm) K-TIRADS 4 nodules than biopsy criterion 2 (87.6% vs. 54.4%, P<0.001 and 82.0% vs. 34.0%, P<0.001, respectively).
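The reported differences and sensitivities can be cross-checked with simple arithmetic. The sketch below is a back-calculation under two assumptions that the text does not state outright: that the quoted changes are absolute percentage-point differences, and that criterion 1 flags every nodule in this cohort (all nodules were ≥1 cm), so its diagnostic yield equals the overall malignancy rate of 23.5%. The derived values are approximations and are not the Table 5 figures themselves.

```python
PREVALENCE = 23.5  # % malignant among all K-TIRADS 4 nodules (362 of 1,542)

# Criterion 1 (assumed to flag every nodule in this >=1 cm cohort).
yield_c1 = PREVALENCE
frr_c1 = 100.0 - PREVALENCE

# Criterion 3 relative to criterion 1 (reported differences).
yield_c3 = yield_c1 - 2.9
frr_c3 = frr_c1 - 15.3

# Criterion 2 relative to criterion 3 (reported differences).
yield_c2 = yield_c3 - 7.8
frr_c2 = frr_c3 - 16.0


def sensitivity(diagnostic_yield: float) -> float:
    """Sensitivity (%) = true positives / all malignant nodules."""
    return 100.0 * diagnostic_yield / PREVALENCE


def ppv(diagnostic_yield: float, false_referral_rate: float) -> float:
    """PPV (%) = true positives / all nodules indicated for biopsy."""
    return 100.0 * diagnostic_yield / (diagnostic_yield + false_referral_rate)
```

Under these assumptions, the derived sensitivities for criteria 3 and 2 fall within about 0.1 percentage point of the reported 87.6% and 54.4%, and the derived PPV is higher for criterion 3 than for criterion 2, in line with the reported direction.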
Discussion
This study demonstrated that five high-risk US features (solid composition, marked hypoechogenicity, macrocalcification, PEF, and irregular margins) independently increased the malignancy risk of K-TIRADS 4 nodules. Furthermore, we found that K-TIRADS 4 nodules could be stratified into higher- and lower-risk subcategories based on the number of these high-risk US features. Specifically, K-TIRADS 4B is characterized by the presence of two or more high-risk US features, while K-TIRADS 4A is defined by the absence or presence of only one high-risk US feature. The modified biopsy criterion (criterion 3) based on the subcategorization of K-TIRADS 4 nodules reduced the false referral rate (unnecessary biopsy rate) by 15.3% and 22.5% compared with criterion 1 (biopsy size cut-off of 1 cm) and increased the diagnostic yield (detection rate) for malignancy by 7.8% and 11.3% compared with criterion 2 (biopsy size cut-off of 1.5 cm) in the overall and small (≤2 cm) K-TIRADS 4 nodules, respectively.
Marked hypoechogenicity and macrocalcification were identified as independent US features that increased the risk of malignancy in all K-TIRADS 4 nodules, as well as within each subgroup of nodules. These findings align with those of previous studies [11,12]. Lee et al. [12] found that nodules with moderately or markedly hypoechoic characteristics (similar in echogenicity or hypoechoic compared to neck muscle) had a significantly higher risk of malignancy than those that were mildly hypoechoic or iso-/hyperechoic across all subgroups, based on their composition and suspicious US features. Similarly, Shin et al. [11] found that macrocalcification was independently associated with malignancy in all nodules and significantly increased the malignancy risk in both subgroups of K-TIRADS 4 nodules. These consistent results strongly suggest that marked hypoechogenicity and macrocalcification are US features that independently elevate the risk of malignancy in K-TIRADS 4 nodules. A trend was also observed for an increased risk of malignancy with a higher number of high-risk US features in K-TIRADS 4 nodules, which is in line with the outcomes of previous research [13,14]. The US patterns of higher-risk K-TIRADS 4 nodules (K-TIRADS 4B) can be further categorized based on the subgroups of K-TIRADS 4 nodules. These subcategories include: (1) solid hypoechoic K-TIRADS 4 nodules with at least one feature of marked hypoechogenicity or macrocalcification, (2) solid isoechoic K-TIRADS 4 nodules with at least one feature of macrocalcification, punctate echogenic foci, or irregular margin, and (3) partially cystic K-TIRADS 4 nodules with at least two features of marked hypoechogenicity, macrocalcification, punctate echogenic foci, or irregular margin.
In small (≤2 cm) thyroid nodules, the sensitivity and unnecessary biopsy rate of the 2021 K-TIRADS with biopsy criterion 1 (a size cut-off of 1 cm for K-TIRADS 4 nodules) were the highest, whereas those of the 2021 K-TIRADS with biopsy criterion 2 (a size cut-off of 1.5 cm for K-TIRADS 4 nodules) were the lowest, when compared with the American College of Radiology (ACR)-TIRADS and European-TIRADS [9]. The diagnostic performance of current RSSs or TIRADS in detecting malignant tumors is primarily influenced by the nodule size cut-off for biopsy [15-17]. Utilizing higher size cut-offs for biopsy, such as those in ACR-TIRADS, results in relatively higher specificity, a lower rate of unnecessary biopsies, and greater diagnostic accuracy, but this approach inevitably reduces sensitivity in the diagnosis of malignancy [18-20]. The approach of lowering the rate of unnecessary biopsies at the expense of reduced sensitivity may be suitable for small (≤2 cm) malignant tumors, which generally have a better prognosis than larger (>2 cm) malignant tumors. Ideally, the most effective RSS should minimize unnecessary biopsies while preserving adequate sensitivity for detecting malignancy [3]. However, there is currently no consensus on the optimal sensitivity level for identifying malignancies in small (≤2 cm) thyroid nodules.
The diagnostic yields of biopsy criteria were analyzed in a subgroup of small (≤2 cm) K-TIRADS 4 nodules to discern the differences in diagnostic yield among the three biopsy criteria specific to small nodules. Our study revealed that the diagnostic yield and sensitivity of biopsy criterion 2 for detecting malignancy were notably low in small (≤2 cm) K-TIRADS 4 nodules. In contrast, the modified biopsy criterion 3, which is based on the subcategorization of K-TIRADS 4, demonstrated a significant improvement in diagnostic yield and sensitivity for malignancy, albeit with an increased false referral rate compared to biopsy criterion 2. It may be prudent to employ biopsy criterion 2 (size cut-off of 1.5 cm) for most K-TIRADS 4 nodules that lack high-risk clinical or ultrasonographic features of metastasis or gross extrathyroidal extension, despite the relatively low diagnostic yield, because the majority of malignant tumors missed are small (<1.5 cm) low-risk tumors. However, our findings suggest that the modified biopsy criterion 3 could be selectively utilized in place of biopsy criterion 2 for patients who prioritize a higher detection rate for malignancy and aim to minimize the oversight of small malignant tumors.
Subcategorizing K-TIRADS 4 nodules by risk stratification could provide valuable insights for the management of thyroid nodules. First, a K-TIRADS 4B nodule may be considered for biopsy if a patient presents with multiple K-TIRADS 4 nodules of comparable size. Second, modified biopsy criterion 3 might be applied in patients with clinical risk factors, particularly when the K-TIRADS 4 nodule is in proximity to vital structures such as the trachea or recurrent laryngeal nerves. Third, employing modified biopsy criterion 3 instead of criterion 2 could enhance the detection of malignancy in children, who may require more aggressive management [21-24]. However, the effectiveness of these modified biopsy criteria, based on subcategorization, will need future validation in pediatric populations.
A recent study from a single institution [7] reported a higher malignancy rate in K-TIRADS 4 nodules with suspicious findings (subgroup 2 nodules) compared to those without suspicious findings (subgroup 1 nodules). This finding contrasts with the results of the present study and previous research, which indicated similar malignancy rates between the two subgroups of K-TIRADS 4 nodules [11,25,26]. The discrepancy may be partly due to an overestimation of the malignancy risk associated with PEF in retrospective US assessments [27]. Notably, the malignancy risk for K-TIRADS 4 nodules with PEF in that study (47.9%) was considerably higher than the risks estimated in other studies, which ranged from 9.2% to 25.9% [25-27].
The present study has several limitations. First, this was a retrospective study and included only nodules that had undergone a biopsy. Therefore, selection bias was unavoidable. Second, the final diagnosis of many benign nodules was based on a single benign FNA or CNB result, so some malignant nodules may have been misclassified as benign (false-negative results).
In conclusion, K-TIRADS 4 nodules were subcategorized into K-TIRADS 4B (higher risk) and K-TIRADS 4A (lower risk), based on the presence of five high-risk US features that increase the risk of malignancy in K-TIRADS 4 nodules. The modified biopsy criterion 3 significantly improved the diagnostic yield for malignancy compared to biopsy criterion 2, which has a size cut-off of 1.5 cm. It also demonstrated a similar diagnostic yield and a reduced false referral rate compared to biopsy criterion 1, with a size cut-off of 1 cm. Modified biopsy criterion 3 could be used complementarily with biopsy criterion 2 in patients who require a higher detection rate for malignancy.
Notes
Author Contributions
Conceptualization: Lee B, Na DG, Kim JH. Data acquisition: Lee B, Na DG. Data analysis or interpretation: Lee B, Na DG. Drafting of the manuscript: Lee B, Na DG. Critical revision of the manuscript: Lee B, Na DG, Kim JH. Approval of the final version of the manuscript: all authors.
Ji-hoon Kim serves as an Editor of Ultrasonography but had no role in the decision to publish this article. All remaining authors have declared no conflicts of interest.
Acknowledgements
This research was supported by the Medical Research Promotion Program through the GangNeung Asan Hospital funded by the Asan Foundation (2018-C03).
Supplementary Material
References
Key point
The five high-risk ultrasonography (US) features independently increased the malignancy risk of Korean Thyroid Imaging Reporting and Data System (K-TIRADS) 4 nodules. K-TIRADS 4 nodules could be subcategorized according to the number of high-risk US features: K-TIRADS 4B (≥2 US features) and K-TIRADS 4A (≤1 US feature). The modified biopsy criterion 3 showed a similar or higher diagnostic yield compared to criteria 1 or 2.