Combined application of deep learning and conventional computer vision for kidney ultrasound image classification in chronic kidney disease: preliminary study

Article information

Ultrasonography. 2025;44(5):346-353
Publication date (electronic): 2025 June 15
doi: https://doi.org/10.14366/usg.25074
1 Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
2 Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
3 Division of Nephrology, Department of Internal Medicine, Seoul National University Hospital and Seoul National University College of Medicine, Seoul, Korea
Correspondence to: Young H. Kim, MD, PhD, Department of Radiology, University of Massachusetts Chan Medical School, 55 N Lake Avenue, Worcester, MA 01655, USA Tel. +1-508-334-2087 Fax. +1-508-334-3947 E-mail: young.kim@umassmemorial.org
*These authors contributed equally to this work.

Received 2025 April 29; Revised 2025 June 15; Accepted 2025 June 15.

Abstract

Purpose

This study evaluates the feasibility of combining deep learning (DL) and conventional computer vision techniques to classify kidney ultrasound (US) images for the presence or absence of chronic kidney disease (CKD).

Methods

A retrospective analysis was conducted on 258 kidneys (124 normal and 134 with CKD). A DL model was trained using midsagittal US images of the right kidney and corresponding contour maps to automate measurements of parenchymal thickness and parenchyma-to-sinus ratios. These features were integrated with a convolutional neural network for classification. The ground truth was determined based on clinical CKD diagnosis and laboratory data.

Results

The combined DL and conventional feature extraction model achieved an accuracy of 82%, with a specificity of 93% and a negative predictive value of 97%. This approach outperformed models that relied solely on raw US images using DL, which achieved an accuracy of 64%. The inclusion of contour-based parenchymal measurements enhanced classification performance.

Conclusion

The integration of DL with automated feature extraction enables accurate classification of CKD using minimal user input. This proof-of-concept study highlights the potential of combining artificial intelligence–driven analysis with traditional metrics to serve as a noninvasive adjunct for CKD diagnosis and monitoring.

Introduction

Diagnosing chronic kidney disease (CKD) with ultrasound (US) remains challenging, even though US is well established for excluding causes of renal dysfunction other than chronic renal parenchymal abnormalities. Kidney US is a valuable diagnostic modality for identifying hydronephrosis and other structural abnormalities that may contribute to renal dysfunction. Once these causes are excluded, kidney US provides critical information such as renal size and parenchymal changes, although the manifestations of these findings can vary widely depending on the underlying renal disease and its stage. Additionally, the measurements and interpretations provided by US are often subject to operator and patient variability, introducing potential inconsistencies. Despite these limitations, US has been established as a valuable first-line imaging modality for renal assessment due to its accessibility, safety, and cost-effectiveness.

Given the subjective nature of image interpretation and the variability inherent in traditional US evaluation, we sought to investigate whether artificial intelligence (AI) could overcome these limitations. By leveraging AI, we aim to achieve more objective and consistent results, enhancing the diagnostic accuracy and reproducibility of kidney US in the evaluation of CKD. Success in this endeavor could affect a substantial portion of the global population, as CKD is a leading cause of global mortality and is projected to become one of the top five causes of death [1-3]. Employing this technology in conjunction with modern analytic tools could also have a significant financial impact, given the global economic burden imposed by the disease [4-6]. As such, an AI algorithm with high diagnostic performance for the noninvasive assessment of CKD could offer significant improvements to patient management and clinical workflow.

Materials and Methods

Compliance with Ethical Standards

This retrospective study was approved by the Institutional Review Board of the University of Massachusetts Chan Medical School with informed consent waived (IRB ID: STUDY00000348).

Patient Selection

Adult patients (≥18 years) who underwent kidney US between 2017 and 2021 were included. A radiology report database (Nuance mPower) was used to retrieve cases. A total of 300 patients were initially enrolled with clinical and laboratory data including age, sex, body mass index, diabetes and hypertension status, estimated glomerular filtration rate (GFR) within 6 months of the US, and International Classification of Diseases (ICD) codes for CKD status. Exclusion criteria included examinations with poor image quality (e.g., beam artifact, rib shadowing, or bowel gas obscuring >25% of renal parenchyma), absence of the contralateral kidney, and cases with hydronephrosis, renal masses, or large cysts. This schema is illustrated in Fig. 1. After exclusions based on image quality (removing 2 normal and 10 CKD cases), 136 normal cases (GFR >60 mL/min/1.73 m², no CKD ICD code in chart) and 152 CKD cases (GFR ≤60 mL/min/1.73 m², Kidney Disease Improving Global Outcomes [KDIGO] stages I-V) remained. On post-processing imaging analysis, 12 cases were removed from the normal cohort and 18 cases from the CKD cohort because mean cortical thickness could not be reliably measured or mask segmentation was absent. Ultimately, 124 normal and 134 CKD cases were analyzed after allocation into training, validation, and testing cohorts at an 80:10:10 split for AI algorithm development and testing. Type II diabetes mellitus was present in 16 normal cases (12.9%) and 55 CKD cases (41.0%). Similarly, hypertension was present in 43 normal cases (34.7%) and 102 CKD cases (76.1%). This proof-of-concept study focused on routine non-hydronephrotic kidneys for deep learning (DL) model development.

Fig. 1.

Recruitment flow diagram.

This schema was used for recruitment into the study, with case reports evaluated via Nuance mPower software. A total of 138 normal cases and 162 cases with history and imaging features of chronic kidney disease (CKD) were initially analyzed based on clinical criteria. After quality control and image review, 136 normal cases and 152 cases of CKD remained. After imaging post-processing review, the remaining 124 normal cases and 134 CKD cases underwent randomized assignment into training, validation, and testing sets at an 8:1:1 ratio. AI, artificial intelligence; KDIGO, Kidney Disease Improving Global Outcomes.

Clinical Data Collection and Analysis

Demographics, estimated GFR within 6 months prior to US, diabetes status, and clinical CKD diagnosis were collected (Table 1). Radiology reports were reviewed to document renal length. Patients with CKD were selected based on KDIGO criteria, which define CKD as kidney abnormalities persisting for over 3 months, categorized by GFR and albuminuria [7]. These clinical diagnoses served as the ground truth for DL algorithm training. A two-tailed independent samples t-test was used to assess differences in mean age, mean renal length, and mean cortical thickness. A two-proportion Z-test was used to assess differences in the presence of hypertension, the presence of diabetes, report-stated cortical thinning, and report-stated increased renal parenchymal echogenicity. For all tests, P-values were calculated, with statistical significance set at α=0.001.
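As an illustration of the two-proportion Z-test named above, the following minimal sketch applies it to the diabetes counts reported under Patient Selection (16 of 124 normal cases vs. 55 of 134 CKD cases). The function name is hypothetical, and the use of a pooled standard error is an assumption, since the study does not state which variant or software was used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error (assumed variant)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Diabetes: 16/124 normal cases vs. 55/134 CKD cases.
z, p = two_proportion_z_test(16, 124, 55, 134)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Under these assumptions the difference in diabetes prevalence falls well below the α=0.001 threshold stated above.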

US Image Acquisition

In accordance with institutional protocol, all US examinations were performed using 2-9 MHz curvilinear transducers (EPIQ Elite, Philips Healthcare, Bothell, WA, USA). Accredited sonographers obtained the images, which were subsequently reviewed by radiologists to ensure quality. For this study, single midsagittal greyscale images of the right kidney were selected to maintain strict image uniformity. Renal length was manually measured, and the images were cropped, de-identified, and assigned random IDs to ensure anonymity during analysis.

Contour Map Preparation

Contour maps were created to outline the external margins of the kidney and renal sinus using GIMP software (v2.10.34). These maps enabled the automated extraction of key structural metrics, including renal length, parenchymal thickness, parenchyma-to-sinus ratios, and functional renal parenchymal area (FRA). Renal length was provided as the basic input, as variations in the US field of view make it difficult to accurately measure the actual size. Using the provided renal length, other parameters were automatically calculated. The contour maps served as input for conventional computer vision algorithms, ensuring precise and reproducible measurements from static images. This process is shown in Fig. 2.

Fig. 2.

Selected cases

A. Normal kidney demonstrates normal corticomedullary differentiation, normal cortical echogenicity, and normal length (not shown to scale). B. Kidney with imaging features of chronic kidney disease demonstrates decreased renal cortical and parenchymal thickness, increased renal cortical echogenicity, and decreased renal length (not shown to scale). C, D. The corresponding masks, shown below each case, delineate the parenchyma in white and the renal sinus fat as the central black region.
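The role of the user-input renal length as a scale reference can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the label encoding (0 background, 1 parenchyma, 2 sinus), the function name, the assumption that the kidney's long axis runs horizontally in the cropped image, and the definition of the parenchyma-to-sinus ratio as an area ratio are all hypothetical.

```python
def mask_metrics(label_map, renal_length_cm):
    """Derive scale-dependent metrics from a labelled contour map.

    label_map: rows of integers, 0 = background, 1 = parenchyma, 2 = sinus
               (hypothetical encoding).
    renal_length_cm: user-input midsagittal renal length.
    """
    # Horizontal extent of the kidney in pixels (long axis assumed horizontal).
    cols = [x for row in label_map for x, v in enumerate(row) if v != 0]
    length_px = max(cols) - min(cols) + 1
    cm_per_px = renal_length_cm / length_px

    parenchyma_px = sum(v == 1 for row in label_map for v in row)
    sinus_px = sum(v == 2 for row in label_map for v in row)

    return {
        "cm_per_px": cm_per_px,
        # Functional renal parenchymal area: parenchyma pixel count times pixel area.
        "fra_cm2": parenchyma_px * cm_per_px ** 2,
        # Area ratio, assumed here as one possible parenchyma-to-sinus ratio.
        "parenchyma_to_sinus": parenchyma_px / sinus_px if sinus_px else float("inf"),
    }


toy_mask = [
    [0, 1, 1, 1, 1, 0],
    [1, 1, 2, 2, 1, 1],
    [0, 1, 1, 1, 1, 0],
]
print(mask_metrics(toy_mask, renal_length_cm=12.0))
```

Because every physical quantity is derived from the single length input, an error in the entered renal length propagates linearly to thickness and quadratically to area.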

Image Analysis

Segmentation masks were generated to fill the area between the external kidney and sinus margins. FRA was calculated by counting white pixels in the mask. Parenchymal thickness was determined using two ellipses approximating the outer and inner boundaries of the parenchyma; the segment of a straight line lying between the two ellipses represented the thickness, as demonstrated in Fig. 3.

Fig. 3.

Elliptical schema for parenchymal thickness determination.

Two cases that illustrate the process of measuring parenchymal thickness: (A) a normal kidney and (B) a kidney with chronic kidney disease. The inner (pink dashed) and outer (blue dashed) ellipses are fitted to the contour points of the mask delineating parenchyma using OpenCV’s fitEllipse function. The straight orange arrow originates from the center of the inner ellipse and is oriented according to the angle of rotation of the inner ellipse. The segment of the orange line between the inner and outer ellipses corresponds to the parenchymal thickness.
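The geometry described in Fig. 3 can be sketched with a ray-ellipse intersection. The sketch assumes each ellipse is already available as (center, semi-axes, rotation in radians); OpenCV's fitEllipse reports full axis lengths and an angle in degrees, so its output would need halving and conversion first. The function names and the tuple layout are hypothetical.

```python
import math


def ray_exit_distance(origin, direction_rad, ellipse):
    """Distance from origin (assumed inside the ellipse) to the ellipse along a ray."""
    (cx, cy), (a, b), angle = ellipse
    ox, oy = origin[0] - cx, origin[1] - cy
    dx, dy = math.cos(direction_rad), math.sin(direction_rad)
    c, s = math.cos(angle), math.sin(angle)
    # Rotate into the ellipse frame and scale so the ellipse becomes a unit circle.
    px, py = (c * ox + s * oy) / a, (-s * ox + c * oy) / b
    qx, qy = (c * dx + s * dy) / a, (-s * dx + c * dy) / b
    # Solve |p + t*q|^2 = 1 for the positive root t.
    qa = qx * qx + qy * qy
    qb = 2 * (px * qx + py * qy)
    qc = px * px + py * py - 1
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        raise ValueError("ray does not intersect the ellipse")
    return (-qb + math.sqrt(disc)) / (2 * qa)


def parenchymal_thickness(inner, outer, cm_per_px=1.0):
    """Length of the ray segment between the inner and outer ellipses.

    The ray starts at the inner ellipse's center and follows its rotation angle,
    as described in the Fig. 3 caption.
    """
    origin, direction = inner[0], inner[2]
    gap_px = ray_exit_distance(origin, direction, outer) - ray_exit_distance(origin, direction, inner)
    return gap_px * cm_per_px


inner = ((0.0, 0.0), (3.0, 1.0), 0.0)
outer = ((0.0, 0.0), (5.0, 2.5), 0.0)
print(parenchymal_thickness(inner, outer))
```

The optional cm_per_px argument converts the pixel distance to centimeters using a scale derived from the user-input renal length.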

DL Model Development

The dataset was split into training (80%), validation (10%), and testing (10%) subsets, with a balanced class distribution in the validation and testing sets. Images were resized to 640×448 pixels, and pixel intensities were normalized to the range -1 to 1.
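A minimal sketch of such a split and normalization is given below. It assumes 8-bit input images and assumes that "balanced" means an equal number of normal and CKD cases in each held-out set; the study does not describe the exact procedure, and all names here are hypothetical.

```python
import random


def normalize_pixel(value):
    """Map an 8-bit intensity (0 to 255) to the range [-1, 1]."""
    return value / 127.5 - 1.0


def split_cases(normal_ids, ckd_ids, holdout_fraction=0.10, seed=0):
    """Split case IDs into train/val/test with class-balanced val and test sets."""
    rng = random.Random(seed)
    normal, ckd = list(normal_ids), list(ckd_ids)
    rng.shuffle(normal)
    rng.shuffle(ckd)
    total = len(normal) + len(ckd)
    # Equal number of cases per class in each held-out set (assumption).
    per_class = round(holdout_fraction * total / 2)
    splits = {"train": [], "val": [], "test": []}
    for ids in (normal, ckd):
        splits["val"] += ids[:per_class]
        splits["test"] += ids[per_class:2 * per_class]
        splits["train"] += ids[2 * per_class:]
    return splits


normal_cases = [("normal", i) for i in range(124)]
ckd_cases = [("ckd", i) for i in range(134)]
splits = split_cases(normal_cases, ckd_cases)
print({name: len(ids) for name, ids in splits.items()})
```

With the cohort sizes reported in this study, this particular scheme leaves the training set slightly enriched in CKD cases, because the same number of cases is withdrawn from each class.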

The DL architecture combined a convolutional neural network (CNN) and a multilayer perceptron (MLP), as shown in Fig. 4. The pre-trained VGG16 CNN model extracted spatial features, which were flattened into a one-dimensional feature vector. In parallel, kidney structural metrics, such as renal length and parenchymal thickness, were processed through an MLP to produce a 128-dimensional feature vector. The image and structural-metric feature vectors were concatenated to form a combined representation. A 10% dropout layer was applied to improve robustness. These features were processed by another MLP with a softmax activation layer to classify images as either normal or CKD. The model was trained using supervised learning with a batch size of 32, a cross-entropy loss function, and the Adam optimizer (learning rate 0.0002) for 100 epochs. Python and TensorFlow were used for implementation.

Fig. 4.

Deep learning model architecture.

Pictorial diagram illustrates the algorithm development, as detailed under Materials and Methods. Images were processed by a convolutional neural network (VGG16 encoder) for image feature extraction, while input variables including renal length, parenchymal thickness, and mask ratios provided the metric features. Image and metric features were combined in a multilayer perceptron with 10% dropout to assign binary classes. This process was repeated until the model was optimized. CKD, chronic kidney disease; MLP, multilayer perceptron.
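The two-branch architecture described above could be sketched in Keras roughly as follows. This is not the authors' code: the hidden-layer size of the final MLP, the number of metric inputs, the three-channel input, and the reading of 640×448 as width×height are assumptions, and weights=None is used to avoid a download (the study used a pre-trained VGG16, for which weights="imagenet" would be the usual choice). TensorFlow is imported inside the function so that the shape arithmetic can be checked without it.

```python
def vgg16_flat_dim(height, width, channels=512):
    """Flattened VGG16 feature length: five 2x2 poolings shrink each side 32-fold."""
    return (height // 32) * (width // 32) * channels


def build_fusion_model(height=448, width=640, n_metrics=3):
    """Late-fusion classifier: VGG16 image features + MLP metric features."""
    import tensorflow as tf
    from tensorflow.keras import layers

    image_in = layers.Input(shape=(height, width, 3), name="image")
    metrics_in = layers.Input(shape=(n_metrics,), name="metrics")

    # Image branch: VGG16 encoder without its classification head, then flatten.
    encoder = tf.keras.applications.VGG16(
        include_top=False, weights=None, input_shape=(height, width, 3)
    )
    image_feat = layers.Flatten()(encoder(image_in))

    # Metric branch: structural metrics mapped to a 128-dimensional vector.
    metric_feat = layers.Dense(128, activation="relu")(metrics_in)

    # Fusion: concatenate, 10% dropout, then an MLP ending in a 2-class softmax.
    fused = layers.Concatenate()([image_feat, metric_feat])
    fused = layers.Dropout(0.10)(fused)
    hidden = layers.Dense(64, activation="relu")(fused)  # size is an assumption
    output = layers.Dense(2, activation="softmax")(hidden)

    model = tf.keras.Model([image_in, metrics_in], output)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


fused_dim = vgg16_flat_dim(448, 640) + 128
print(fused_dim)
```

Under these assumptions the image branch contributes far more dimensions to the fused vector than the 128-dimensional metric branch, which is worth keeping in mind when interpreting how much each input drives the classification.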

Results

Model Comparisons

Three models, each developed on the training and validation cohorts and applied to the testing cohort, are compared below:

(1) DL+contour map+renal length: This combined approach achieved the best results, with an accuracy of 82%, sensitivity of 71%, specificity of 93%, positive predictive value (PPV) of 0.53, negative predictive value (NPV) of 0.97, positive likelihood ratio (LR+) of 10.14, and negative likelihood ratio (LR-) of 0.31. Contour mapping improved classification by enabling automated calculation of parenchymal thickness, parenchyma-to-sinus ratios, and FRA.

(2) DL+contour map (excluding renal length): This model showed lower specificity (79%) but higher sensitivity (86%), with the same accuracy of 82%, PPV of 0.31, NPV of 0.98, LR+ of 4.10, and LR- of 0.18, underscoring the importance of renal length in enhancing specificity.

(3) DL only (raw image input): This model demonstrated the lowest performance, with an accuracy of 64%, sensitivity of 71%, specificity of 57%, LR+ of 1.65, and LR- of 0.51.

Including renal length and structural features extracted from contour maps substantially improved diagnostic accuracy and specificity over raw image input alone, highlighting the importance of integrating conventional metrics with DL (Table 2).
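The derived metrics reported above follow directly from sensitivity and specificity. The short sketch below recomputes them; the function name is hypothetical, and the 10% value is the population figure stated in the Table 2 footnote, treated here as the disease prevalence.

```python
def diagnostic_metrics(sensitivity, specificity, prevalence=0.10):
    """Likelihood ratios and predictive values from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    # Expected fractions of the population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "LR+": lr_pos,
        "LR-": lr_neg,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Model (1): sensitivity 0.71, specificity 0.93.
for name, value in diagnostic_metrics(0.71, 0.93).items():
    print(f"{name}: {value:.2f}")
```

To two decimals, these inputs reproduce the values reported for model (1): LR+ 10.14, LR- 0.31, PPV 0.53, and NPV 0.97. Because PPV and NPV depend on the assumed prevalence, they would differ in a population with a different CKD burden.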

Additional Metrics

Attempts to use ratios of renal parenchyma pixel intensity relative to the sinus for simulating echogenicity scoring were not discriminatory between CKD and normal groups, likely due to the limited field of view. However, combining manual and automated structural parameter assessments enhanced reproducibility and diagnostic performance overall.

Discussion

This study demonstrates the feasibility of combining DL and conventional computer vision techniques for kidney US analysis to predict CKD with minimal user input. Using ICD codes as the ground truth and incorporating midsagittal renal length, parenchymal thickness, and FRA, the model achieved an accuracy of 82%, with high specificity (93%) and high NPV (97%).

In traditional US interpretation, in the absence of an analysis tool such as AI, we see variability in US diagnostic performance dependent on the quantitative metrics used. Key reported US metrics of renal structural health include renal length, parenchymal thickness, as well as renal parenchymal echogenicity [8]. For example, parenchymal thickness of ≤1.4 cm achieves a sensitivity of 82% but a specificity of only 30% for early CKD detection. Similarly, renal length ≤9.5 cm demonstrates sensitivity and specificity values of 67% and 26%, respectively, for advanced CKD. Renal cortical thickness measurements of ≤6.1 mm perform slightly better, with a sensitivity of 82% and specificity of 48% [9-13]. These findings underscore the limitations of traditional US metrics in reliably diagnosing CKD and the need for innovative approaches to enhance diagnostic accuracy.

The results of our proof-of-concept study highlight the potential of integrating DL with traditional metrics to enhance diagnostic performance. Our algorithm integrates numerical features from US images to enable robust discrimination of CKD status. This approach sets the foundation for broader applications, including scenarios involving hydronephrosis and space-occupying lesions, with potential integration into clinical US workflows to improve diagnostic accuracy and efficiency. This study differs in several ways from the work of Lee et al. [14], who used machine learning with computer-extracted measurable features. For instance, our data input used a narrow field of view containing only the kidney (to allow generalizability of the algorithm despite unilaterality) and excluded diabetes status as an algorithmic input (because diabetes was present in both the normal and CKD populations), focusing solely on raw image input, renal length, and purely extractable renal features. We then generated several AI models to show how input parameters affected diagnostic performance, yielding an algorithm with high specificity without a diabetic status input.

Diagnosing CKD via US is challenging due to variability in disease manifestations and reliance on subjective parameters. By leveraging DL and automated feature extraction, our approach minimizes variability and enhances reproducibility. However, it remains uncertain how DL algorithms effectively address the fundamental limitation posed by the inherent variability of disease manifestations [15]. Future iterations could incorporate real-time cine US, allowing AI to independently detect and analyze kidneys under varying conditions, including hydronephrosis or lesions.

This study has several limitations. The sample size is relatively small, and data were sourced from a single institution, which limits generalizability. The analysis was limited to a single kidney per patient to maintain data uniformity. Although this approach had the potential to include patients with renal failure due to unilateral conditions (such as renal artery stenosis affecting the contralateral, non-studied kidney), no such cases were identified in our cohort. The sample also included diabetic patients in both arms, and diabetes can preserve normal kidney length until late stages of CKD [16]. Additionally, cases involving hydronephrosis and significant renal lesions were excluded. The cropping of images, while necessary in this proof-of-concept study for data anonymization, restricted the field of view and precluded the use of liver echogenicity as a reference for renal parenchyma comparison. Attempts to use internal echogenicity ratios failed to discriminate CKD effectively. Lastly, the current model relies on manual contour mapping, which restricts automation in its current iteration. Despite these limitations, this study highlights the clinical applicability of DL for US-based CKD assessment. Future steps include expanding datasets, incorporating common renal pathologies, integrating clinical information to enhance specificity, and enabling real-time analysis to develop a more robust, autonomous model for clinical use. This method has the potential to be applied in population-based studies and, in the near future, real-world clinical settings to improve CKD monitoring, streamline workflows, and assist in clinical decision-making.

Combined application of DL and conventional computer vision with minimal user input, focusing on kidney structural features, can serve as an automated clinical tool for CKD assessment. Future algorithmic development addressing the aforementioned limitations should help extend the algorithm's applicability across varying clinical scenarios. Real-time analysis of kidney US images using a further developed version of this algorithm has potential for significant clinical impact in resource-limited environments.

Notes

Author Contributions

Conceptualization: Svrcek PT, Jang J, Ge C, Kim YH. Data acquisition: Svrcek PT, Ge C. Data analysis or interpretation: Jang J, Ge C, Lee H, Kim YH. Drafting of the manuscript: Svrcek PT, Jang J, Ge C, Lee H, Kim YH. Critical revision of the manuscript: Svrcek PT, Jang J, Ge C, Lee H, Kim YH. Approval of the final version of the manuscript: all authors.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

References

1. Kovesdy CP. Epidemiology of chronic kidney disease: an update 2022. Kidney Int Suppl (2011) 2022;12:7–11.
2. Jager KJ, Kovesdy C, Langham R, Rosenberg M, Jha V, Zoccali C. A single number for advocacy and communication-worldwide more than 850 million individuals have kidney diseases. Kidney Int 2019;96:1048–1050.
3. Foreman KJ, Marquez N, Dolgert A, Fukutaki K, Fullman N, McGaughey M, et al. Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016-40 for 195 countries and territories. Lancet 2018;392:2052–2090.
4. Kim SH, Jo MW, Go DS, Ryu DR, Park J. Economic burden of chronic kidney disease in Korea using national sample cohort. J Nephrol 2017;30:787–793.
5. Manns B, Hemmelgarn B, Tonelli M, Au F, So H, Weaver R, et al. The cost of care for people with chronic kidney disease. Can J Kidney Health Dis 2019;6:2054358119835521.
6. Betts KA, Song J, Faust E, Yang K, Du Y, Kong SX, et al. Medical costs for managing chronic kidney disease and related complications in patients with chronic kidney disease and type 2 diabetes. Am J Manag Care 2021;27(20 Suppl):S369–S374.
7. KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease [Internet]. Brussels: Kidney Disease Improving Global Outcomes (KDIGO), 2013 [cited 2023 Sep 7]. Available from: https://kdigo.org/wp-content/uploads/2017/02/KDIGO_2012_CKD_GL.pdf.
8. Singla RK, Kadatz M, Rohling R, Nguan C. Kidney ultrasound for nephrologists: a review. Kidney Med 2022;4:100464.
9. Korkmaz M, Aras B, Guneyli S, Yilmaz M. Clinical significance of renal cortical thickness in patients with chronic kidney disease. Ultrasonography 2018;37:50–54.
10. Siddappa JK, Singla S, Al Ameen M, Rakshith SC, Kumar N. Correlation of ultrasonographic parameters with serum creatinine in chronic kidney disease. J Clin Imaging Sci 2013;3:28.
11. Singh A, Gupta K, Chander R, Vira M. Sonographic grading of renal cortical echogenicity and raised serum creatinine in patients with chronic kidney disease. J Evol Med Dent Sci 2016;5:2279–2286.
12. Troell S, Berg U, Johansson B, Wikstad I. Ultrasonographic renal parenchymal volume related to kidney function and renal parenchymal area in children with recurrent urinary tract infections and asymptomatic bacteriuria. Acta Radiol Diagn (Stockh) 1984;25:411–416.
13. Kodikara I, Gamage DTK, Nanayakkara G, Ilayperuma I. Diagnostic performance of renal ultrasonography in detecting chronic kidney disease of various severity. Asian Biomed (Res Rev News) 2020;14:195–202.
14. Lee S, Kang M, Byeon K, Lee SE, Lee IH, Kim YA, et al. Machine learning-aided chronic kidney disease diagnosis based on ultrasound imaging integrated with computer-extracted measurable features. J Digit Imaging 2022;35:1091–1100.
15. Esteva A, Chou K, Yeung S, Naik N, Madani A, Mottaghi A, et al. Deep learning-enabled medical computer vision. NPJ Digit Med 2021;4:5.
16. de Boer IH, Khunti K, Sadusky T, Tuttle KR, Neumiller JJ, Rhee CM, et al. Diabetes management in chronic kidney disease: a consensus report by the American Diabetes Association (ADA) and Kidney Disease: Improving Global Outcomes (KDIGO). Diabetes Care 2022;45:3075–3090.

Notes

Key point

Deep learning techniques can discriminate between presence and absence of chronic kidney disease with minimal user input up to an accuracy of 82% (sensitivity 71%, specificity 93%, positive predictive value 53%, negative predictive value 97%, positive likelihood ratio 10.1, negative likelihood ratio 0.3). Further development of deep learning techniques for automated sub-classification of chronic kidney disease has the potential to impact clinical management.

Table 1.

Patient demographics

Category | Normal | CKD (I–V) | P-value
Sample size | 124 | 134 | N/A
Mean age (year) | 55.1±17.3 | 73.0±13.6 | <0.001
Diabetes mellitus, type II | 16 (12.9) | 55 (41.0) | <0.01
Hypertension | 43 (34.7) | 102 (76.1) | <0.01
Mean right renal length (cm) | 11.1±1.1 | 10.1±1.6 | <0.001
Mean cortical thickness (mm) a) | 8.9±1.6 | 7.2±1.3 | <0.001
Reported cortical thinning | 5 (4.0) | 34 (25.4) | <0.01
Reported increased renal parenchymal echogenicity | 13 (10.5) | 94 (70.1) | <0.01

Values are presented as number (%) or mean±standard deviation.

CKD, chronic kidney disease.

a) Mean cortical thickness was a manually derived value from the raw image dataset. Ultimately, this was not used as an input parameter for our model and we utilized parenchymal thickness instead, as detailed in the results.

Table 2.

Results generated during AI algorithm development using deep learning, minimal user entry data, and extracting imaging features

Model input | Accuracy (%) | Sensitivity | Specificity | PPV a) | NPV a) | LR+ | LR-
Raw image | 64 | 0.71 | 0.57 | 0.16 | 0.95 | 1.65 | 0.51
Raw image+UIRL | 75 | 0.86 | 0.64 | 0.21 | 0.98 | 2.39 | 0.22
Raw image+FRA | 71 | 0.71 | 0.71 | 0.21 | 0.96 | 2.45 | 0.41
Raw image+parenchymal thickness input | 68 | 0.79 | 0.57 | 0.17 | 0.96 | 1.84 | 0.37
Raw image, UIRL+FRA | 75 | 0.86 | 0.71 | 0.25 | 0.98 | 2.97 | 0.20
Raw image, UIRL, parenchymal thickness | 79 | 0.86 | 0.71 | 0.25 | 0.98 | 2.97 | 0.20
Raw image, parenchymal thickness, FRA | 82 | 0.86 | 0.79 | 0.31 | 0.98 | 4.10 | 0.18
Raw image, UIRL, parenchymal thickness, FRA | 82 | 0.71 | 0.93 | 0.53 | 0.97 | 10.14 | 0.31

AI algorithm was generated from the training and validation cohorts and applied to the testing cohorts for generation of these results.

AI, artificial intelligence; PPV, positive predictive value; NPV, negative predictive value; LR, likelihood ratio; UIRL, user-input renal length.

a) PPV and NPV calculations were based on an assumed population prevalence of 10%. User-input renal length (UIRL) was the midsagittal kidney length entered for each case. The functional renal area (FRA) is also known as the functional renal parenchymal area. Parenchymal thickness is a metric generated from the input raw images and contour maps as a surrogate for cortical thickness.