GI-RADS versus O-RADS in the differential diagnosis of adnexal masses: a systematic review and head-to-head meta-analysis
Abstract
Purpose
The aim of this study was to compare the diagnostic performance of the Gynecology Imaging Reporting and Data System (GI-RADS) and Ovarian-Adnexal Reporting and Data System (O-RADS) ultrasound (US) classification systems and assess their capacity to stratify the risk of malignancy in adnexal masses (AMs).
Methods
A comprehensive search of MEDLINE (PubMed), Scopus, Web of Science, and Google Scholar was conducted to identify articles published between January 2020 and August 2023. The quality of the studies, the risk of bias, and concerns regarding applicability were assessed using QUADAS-2.
Results
The search yielded 132 citations. Five articles, which included a total of 2,448 AMs, were ultimately selected for inclusion. The risk of bias was high in all articles regarding patient selection, low in four studies for the index test, and unclear in three papers for the reference standard. For GI-RADS, the pooled sensitivity and specificity were 90.8% (95% confidence interval [CI], 86.0% to 94.0%) and 91.5% (95% CI, 89.0% to 93.0%), respectively. For O-RADS, the pooled sensitivity and specificity were 95.1% (95% CI, 93.0% to 97.0%) and 88.8% (95% CI, 85.0% to 92.0%), respectively. O-RADS demonstrated greater sensitivity for malignancy than GI-RADS (P<0.05). Heterogeneity was moderate for both sensitivity and specificity with respect to GI-RADS; for O-RADS, heterogeneity was moderate for sensitivity and high for specificity.
Conclusion
Both GI-RADS and O-RADS US demonstrate good diagnostic performance in the preoperative assessment of AMs. However, the O-RADS classification provides superior sensitivity.
Introduction
Most adnexal masses are benign and do not necessitate immediate surgical intervention [1]. However, ovarian cancer represents one of the most lethal malignancies, exhibiting an overall 5-year survival rate below 40% [2,3]. Consequently, accurate differentiation between benign and malignant adnexal masses is essential for appropriate treatment [4-9].
Transvaginal ultrasound is recognized as the first-line imaging technique for evaluating adnexal masses and offers the best diagnostic performance [10]. Historically, the assessment of adnexal masses via ultrasound has relied on subjective interpretation using a "pattern recognition" approach [11]. However, the main limitation of pattern recognition is that both its diagnostic accuracy and the examiner’s confidence in the diagnostic judgment depend heavily on the level of experience [12]. In recent decades, several scoring systems have been developed to identify and characterize the ultrasound features of adnexal masses [13-16].
The Gynecology Imaging Reporting and Data System (GI-RADS) was introduced in 2009 to standardize the reporting of adnexal masses [17]. This system utilizes pattern recognition based on criteria established by the International Ovarian Tumor Analysis (IOTA) group [18]. In 2020, the American College of Radiology introduced the Ovarian-Adnexal Reporting and Data System (O-RADS), which also classifies adnexal masses but with the specific aim of stratifying their risk [19]. Both systems categorize adnexal masses into five distinct risk groups.
Several studies have addressed the diagnostic performance of these two classifications [20-32]. However, limited data are available regarding the comparability and reproducibility of the systems. This systematic review and meta-analysis aimed to compare the diagnostic performance of both systems in classifying the risk of malignancy in adnexal masses.
Materials and Methods
Protocol and Registration
This systematic review and meta-analysis was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement (available at http://www.prisma-statement.org/) and the Synthesizing Evidence from Diagnostic Accuracy Tests (SEDATE) guidelines [33,34]. The meta-analysis was not registered. Given the nature of the research study, neither informed consent nor ethics committee approval was required.
Data Sources and Search Process
A search was conducted across PubMed, Scopus, Web of Science, and Google Scholar to identify studies comparing the O-RADS and GI-RADS classifications for adnexal masses, published between January 2020 and August 2023. The search terms were limited to the keywords "O-RADS" and "GI-RADS." No methodological filters or language restrictions were applied.
Study Selection and Data Collection
Two authors, M.P. and A.M., independently conducted literature searches. Records that were duplicates, unrelated to the research question, or not primary studies were excluded. The retrieved titles and abstracts were then examined by the authors to ensure that the articles compared the diagnostic performance of the two classification systems. Additionally, to prevent the inclusion of duplicate data from different reports by the same authors, the most recent publication was selected based on the date of publication. This was done under the assumption that patients from an earlier study may have been included in later research. Subsequently, M.P. and A.M. reviewed the full texts of the remaining articles.
The two reviewers (M.P. and A.M.) independently applied the selection criteria described below:
- A prospective or retrospective primary cohort study that includes at least 100 patients;
- Female patients presenting with an ovarian cyst or mass identified during routine transabdominal ultrasound, pelvic/transvaginal ultrasound, or both (as the index test);
- Reference standards of histopathology diagnosis or adequate follow-up of at least 1 year; and
- The presence of data necessary to populate 2×2 contingency tables as a minimum requirement for assessing diagnostic performance.
Authors were contacted and asked to provide any missing or discrepant data.
Risk of Bias in Individual Studies
The methodological quality of the primary studies included in this systematic review and meta-analysis, which focuses on diagnostic test accuracy, was assessed using the QUADAS-2 tool [35]. Three authors (M.P., A.M., and J.L.A.) independently evaluated the methodological quality of each report.
The QUADAS-2 tool comprises four key domains: patient selection, index test, reference standard, and flow and timing. Each domain is evaluated for risk of bias, which is rated as low, high, or unclear. Additionally, the first three domains are assessed for concerns regarding applicability, which are also rated as low, high, or unclear. The results of this assessment were used to judge the overall quality of each included study.
Statistical Analysis
Two authors, M.P. and A.M., independently extracted data on true positives, true negatives, false positives, and false negatives for the O-RADS and GI-RADS classification systems from each study included in the meta-analysis.
Pooled sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (DOR) were estimated using the random effects model. The sensitivity and specificity of O-RADS and GI-RADS classifications were compared using the bivariate method.
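As a rough illustration of the quantities listed above, the following sketch derives the per-study measures from a single 2×2 table. The counts are hypothetical and are not taken from any included study, and the names `TwoByTwo` and `accuracy_measures` are introduced here only for illustration. The pooled estimates in this review came from a random effects model fitted in STATA, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. histology)."""
    tp: int  # malignant mass, test positive
    fp: int  # benign mass, test positive
    fn: int  # malignant mass, test negative
    tn: int  # benign mass, test negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Per-study accuracy measures; no continuity correction is applied,
    so tables containing a zero cell would need separate handling."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    lr_pos = sens / (1 - spec)   # positive likelihood ratio (LR+)
    lr_neg = (1 - sens) / spec   # negative likelihood ratio (LR-)
    dor = lr_pos / lr_neg        # algebraically (tp * tn) / (fp * fn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
example = TwoByTwo(tp=90, fp=20, fn=10, tn=180)
m = accuracy_measures(example)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Expressing the DOR as LR+ divided by LR- makes explicit that it summarizes both likelihood ratios in a single number.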
Likelihood ratios were calculated to indicate the clinical utility of the test and to estimate the post-test probability of disease [36]. Fagan nomograms were used to graphically demonstrate the change from pretest to post-test probability.
Forest plots of sensitivity and specificity were created. Heterogeneity for sensitivity and specificity was assessed using the Cochran Q statistic and the I² index [37]. Hierarchical summary receiver operating characteristic curves were plotted, while publication bias was assessed using the method described by Deeks et al. [38].
All analyses were conducted using the MIDAS (Meta-analytical Integration of Diagnostic Accuracy Studies) and METANDI commands in STATA version 12.0 for Windows (StataCorp., College Station, TX, USA). A P-value of less than 0.05 was considered to indicate statistical significance.
Results
Search Results
The search yielded a total of 132 citations. Based on screening of the titles, 22 duplicates were removed. A total of 104 citations were further excluded for reasons including irrelevance to the topic under review, lack of comparison between the two classification systems, absence of data necessary to construct a 2×2 table, and non-primary nature of the studies. Additionally, two studies had the same first author but different publication dates; to prevent the inclusion of duplicate data, the more recent publication was included. Consequently, five articles were ultimately selected for qualitative and quantitative analysis. Fig. 1 presents a flowchart summarizing the literature search process.
Characteristics of Included Studies
Five studies published between January 2020 and August 2023, which analyzed 2,448 adnexal masses in 2,410 women, were included in the analysis [21,26-29]. The characteristics of these studies are presented in Table 1. Of the 2,448 masses, 658 (26.9%) were malignant. Across studies, the prevalence of malignancy ranged from 14.0% to 37.1%.
Histological diagnosis served as the reference standard in all included publications. In one study, the pathologist was blinded to the findings of the index test [21]. The remaining studies did not provide this information [26-29]. None of the studies specified the time interval between ultrasound evaluations and surgery.
Methodological Quality of Included Studies
The risk of bias and concerns regarding the applicability of the selected studies were assessed using QUADAS-2, as shown in Fig. 2.
The study design was retrospective in three studies [21,26,29] with only one specifying that it was consecutive [21]. In the two other studies, the study design was not specified [27,28].
All studies were considered to have a high risk of bias in the domain of patient selection due to inappropriate exclusion criteria, such as patients with poor-quality images, those with incomplete evaluations, and pregnant women.
Concerning bias in the index test, four reports indicated that observers were blinded to the reference standard [21,26,27,29]. However, one report was assessed as having an unclear risk of bias because it did not provide information about examiner blinding [28]. In addition, most studies used the IOTA lexicon for O-RADS classification and pattern recognition for the GI-RADS scoring system [21,26,27]. Thus, this study could compare the diagnostic performance of both approaches.
In the reference standard domain, three studies were deemed to have an unclear risk because they did not disclose whether the pathologist was blinded to prior sonographic evaluations [26-28]. The other articles explicitly stated that the pathologists were blinded to the imaging results.
In the flow and timing domain, the time elapsed between the index test and the reference standard was unclear for all studies.
Regarding applicability, all studies were assessed as low risk in the domains of patient selection (target population: patients with suspected adnexal masses), index test (ultrasound), and reference standard (histopathological diagnosis).
Diagnostic Performance of O-RADS and GI-RADS
The pooled sensitivity and specificity of the O-RADS scoring system for the preoperative characterization of benign and malignant ovarian tumors were 95.1% (95% confidence interval [CI], 93.0% to 97.0%) and 88.8% (95% CI, 85.0% to 92.0%), respectively. Moderate heterogeneity was observed for sensitivity (I²=28.26%; Cochran Q=5.6; P=0.23) along with high heterogeneity for specificity (I²=77.5%; Cochran Q=17.78; P<0.001) (Fig. 3A). The LR+ and LR- were 8.5 (95% CI, 6.5 to 11.2) and 0.06 (95% CI, 0.04 to 0.08), respectively. The DOR was 153 (95% CI, 92 to 256).
The pooled sensitivity, specificity, LR+, and LR- of the GI-RADS classification for differentiating between benign and malignant adnexal masses prior to surgery were 90.8% (95% CI, 86.0% to 94.0%), 91.5% (95% CI, 89.0% to 93.0%), 10.7 (95% CI, 8.2 to 14.0), and 0.10 (95% CI, 0.07 to 0.15), respectively. Moderate heterogeneity was observed for both sensitivity (I²=66.7%; Cochran Q=12.01; P=0.02) and specificity (I²=62.0%; Cochran Q=10.53; P=0.03) (Fig. 3B). The DOR was 107 (95% CI, 59 to 192).
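The relationship between the reported Cochran Q values and the I² index can be checked with a short sketch. It assumes the commonly used formula I² = max(0, (Q - df)/Q) × 100 with df equal to the number of studies minus one (here, 5 - 1 = 4). The text does not state which formula the software applied, so this is an assumption, and `i_squared` is a name chosen only for illustration.

```python
def i_squared(q: float, n_studies: int) -> float:
    """I² (%) from Cochran's Q, assuming df = n_studies - 1."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q falls below its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Cochran Q values as reported in the text for the five included studies.
reported_q = {
    "O-RADS sensitivity": 5.6,
    "O-RADS specificity": 17.78,
    "GI-RADS sensitivity": 12.01,
    "GI-RADS specificity": 10.53,
}

for label, q in reported_q.items():
    print(f"{label}: I² ≈ {i_squared(q, n_studies=5):.1f}%")
```

Under this assumption, the recomputed values closely match the reported I² figures. The small difference for O-RADS sensitivity is plausibly explained by the Q statistic being reported to one decimal place.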
When comparing the sensitivity and specificity of the O-RADS and GI-RADS classification systems, O-RADS demonstrated significantly higher sensitivity (P=0.038). No statistically significant difference in specificity was found between the two systems (P=0.126).
Summary receiver operating characteristic curves illustrating the diagnostic performance of the O-RADS and GI-RADS ultrasound scoring systems are presented in Fig. 4. The area under the curve was 0.97 (95% CI, 0.96 to 0.99) for the O-RADS score (Fig. 4A) and 0.96 (95% CI, 0.94 to 0.98) for the GI-RADS system (Fig. 4B).
Fagan nomograms showed that, for the O-RADS system, a positive test raised the probability of malignancy from a pretest value of 26% to a post-test value of 75%, whereas a negative test lowered it from 26% to 2% (Fig. 5A). For the GI-RADS system, a positive test raised the probability of malignancy from 26% to 79%, while a negative test lowered it from 26% to 3% (Fig. 5B).
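The arithmetic behind a Fagan nomogram can be sketched as follows: the pretest probability is converted to odds, multiplied by the likelihood ratio, and converted back to a probability. The inputs are the pretest probability (26%) and pooled likelihood ratios reported above; the function name `post_test_probability` is introduced here only for illustration.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRETEST = 0.26  # pretest probability of malignancy used in the nomograms

# Pooled likelihood ratios (LR+, LR-) as reported in the text.
likelihood_ratios = {
    "O-RADS": (8.5, 0.06),
    "GI-RADS": (10.7, 0.10),
}

for system, (lr_pos, lr_neg) in likelihood_ratios.items():
    after_positive = post_test_probability(PRETEST, lr_pos)
    after_negative = post_test_probability(PRETEST, lr_neg)
    print(f"{system}: positive test -> {after_positive:.0%}, "
          f"negative test -> {after_negative:.0%}")
```

Rounded to whole percentages, these calculations agree with the post-test probabilities read from the nomograms.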
Publication bias was not observed for either O-RADS (P>0.99) or GI-RADS (P=0.52).
Discussion
Summary of Evidence
In this systematic review and meta-analysis, the GI-RADS and O-RADS ultrasound classification systems were compared in terms of their diagnostic performance in stratifying the risk of malignancy of adnexal masses. Five papers were included, collectively examining a total of 2,448 adnexal masses; of these, 658 were confirmed as malignant by histology following surgical removal. The GI-RADS system demonstrated a pooled sensitivity of 90.8% (95% CI, 86.0% to 94.0%) and a pooled specificity of 91.5% (95% CI, 89.0% to 93.0%). In comparison, the O-RADS system had a pooled sensitivity of 95.1% (95% CI, 93.0% to 97.0%) and a pooled specificity of 88.8% (95% CI, 85.0% to 92.0%).
Regarding study quality, three of the five articles utilized a retrospective design [21,26,29]. The "patient selection" domain was identified as representing a high risk of bias for all included papers. Furthermore, as all studies were conducted in either Egypt or China, the generalizability of the quantitative data obtained is limited.
Interpretation of the Results and Relevance of the Topic
The results of the present meta-analysis indicate that both the O-RADS and GI-RADS classification systems demonstrate high sensitivity and specificity in the preoperative differentiation of ovarian and adnexal lesions, with the O-RADS system showing higher sensitivity. However, for the O-RADS system, heterogeneity across studies was moderate for sensitivity and high for specificity. For the GI-RADS system, heterogeneity was moderate for both sensitivity and specificity. Thus, these findings should be interpreted with caution. Furthermore, given the quality of the available articles, prospective studies involving substantial patient cohorts are required in future research.
From a clinical perspective, O-RADS represents an additional ultrasound scoring tool to improve the evaluation, reporting, and management of ovarian and adnexal lesions. The lower specificity of O-RADS ultrasound compared to its sensitivity may be indicative of its primary aim: to maximize the detection of malignant ovarian masses and thus prevent missed cancer diagnoses, while still minimizing unnecessary surgical procedures for patients with non-malignant adnexal masses. For the O-RADS classification, this study identified a false positive rate of 11.2%. This finding is meaningful, considering that the observers involved in most of the articles included in this meta-analysis were expert examiners. The false-positive rate for GI-RADS classification was slightly lower, at 8.5%.
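The false-positive rates quoted above are the complements of the pooled specificities, which a minimal sketch makes explicit. The specificities are those reported in this meta-analysis; the helper name `false_positive_rate` is illustrative only.

```python
def false_positive_rate(specificity: float) -> float:
    """False-positive rate is the complement of specificity."""
    return 1.0 - specificity


# Pooled specificities as reported in the text.
pooled_specificity = {"O-RADS": 0.888, "GI-RADS": 0.915}

# Express each rate as a percentage rounded to one decimal place.
fpr = {
    system: round(100 * false_positive_rate(spec), 1)
    for system, spec in pooled_specificity.items()
}

for system, rate in fpr.items():
    print(f"{system}: false-positive rate = {rate}%")
```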
Strengths and Limitations
Previous meta-analyses have addressed the diagnostic performance of GI-RADS and O-RADS. Regarding GI-RADS, Guo et al. [39] reported an analysis of 10 studies involving 2,474 women, showing that the pooled sensitivity and specificity for this classification system were 95% and 90%, respectively. Subsequently, Alcazar et al. [6] presented data from 26 studies including 7,350 women. They observed that the pooled sensitivity and specificity were 94% and 90%, figures comparable to the present findings.
In contrast, regarding the O-RADS classification, Vara et al. conducted a meta-analysis based on 11 studies involving 4,634 women [40]. These authors reported a pooled sensitivity of 97% and a specificity of 77%. Zhang et al. [41] reported another meta-analysis, which included data from 15 studies with a total of 6,223 women. Their findings indicated a pooled sensitivity and specificity of 95% and 82%, respectively. More recently, Lee et al. [42] presented a third meta-analysis that analyzed data from 18 studies, encompassing 11,605 women. They reported that the O-RADS classification had a pooled sensitivity and specificity of 96% and 77%, respectively. However, none of these meta-analyses performed a comparison between the two classification systems.
The primary strength of the present study is that, to the authors’ knowledge, it is the first head-to-head meta-analysis on this topic in the literature. Specifically, within each included study, both systems were applied to the same patient cohort, enabling a direct comparison. Furthermore, this analysis encompasses publications from January 2020 to August 2023, thereby including most of the published evidence on the O-RADS ultrasound system for classifying malignant and benign ovarian lesions, which was first described in 2020.
Furthermore, most studies classified adnexal masses using the IOTA lexicon for O-RADS classification and pattern recognition for the GI-RADS scoring system. By comparing these methods, this analysis could reveal whether both approaches provide similar diagnostic performance.
Regarding the limitations of the present meta-analysis, the primary concern is the small number of papers included, which was only five studies. Consequently, one must interpret the data with caution. Another limitation is that none of the included articles reported prospective data collection: three were retrospective and two did not specify their design. Despite these limitations, a relatively large sample size was achieved overall, and all cases included had histological confirmation of the diagnosis. Most of the articles exclusively included patients with confirmed pathology; this introduces potential selection bias, as it likely excluded functional ovarian lesions that had resolved spontaneously by ultrasound follow-up. Furthermore, as mentioned previously, another limitation is the geographical concentration of the research analyzed; all included studies were conducted in either Egypt or China, raising questions about the external validity of the results. Additionally, the arbitrary restriction to studies with at least 100 patients could be seen as another source of selection bias.
Despite these limitations, the present findings may have value due to their apparent clinical relevance. Further investigations should be aimed at more thoroughly comparing the diagnostic performance of the O-RADS and GI-RADS ultrasound evaluation systems.
Future Research Agenda
Considering the results of this meta-analysis, and given the limited number of studies included as well as their retrospective design, the need exists for large-scale, multicenter prospective studies to compare the diagnostic accuracy of these methods in preoperatively classifying adnexal masses.
Additionally, further studies are needed to assess the reproducibility of the O-RADS system. Similarly, research is required to determine the most appropriate follow-up guidelines. To date, no publications have evaluated the performance of the O-RADS system when used by non-expert examiners. Therefore, it is crucial to conduct more studies addressing this key issue.
This research demonstrates that both the O-RADS and GI-RADS ultrasound scoring systems possess high sensitivity in differentiating malignant ovarian and adnexal lesions. Notably, O-RADS exhibited greater sensitivity than GI-RADS. However, the specificity was higher in the GI-RADS classification, although this difference was not statistically significant. Prospective external validation of the O-RADS system is still required.
Notes
Author Contributions
Conceptualization: Vara J, Alcázar JL. Data acquisition: Pérez M, Meseguer A, Alcázar JL. Data analysis or interpretation: Vilches JC, Brunel I, Lozano M, Orozco R, Alcázar JL. Drafting of the manuscript: Pérez M, Meseguer A, Alcázar JL. Critical revision of the manuscript: Vara J, Vilches JC, Brunel I, Lozano M, Orozco R. Approval of the final version of the manuscript: all authors.
No potential conflict of interest relevant to this article was reported.
References
Key point
The Gynecology Imaging Reporting and Data System (GI-RADS) and Ovarian-Adnexal Reporting and Data System (O-RADS) classification systems demonstrate good diagnostic performance in distinguishing benign from malignant adnexal masses.