Global–Local Attention Modeling for Reliable Multiclass Kidney Disease Classification from CT Images
Abstract
Automated analysis of kidney abnormalities from computed tomography (CT) has gained increasing importance as imaging volumes grow and radiological workloads intensify. Despite recent progress, robust multiclass classification remains challenging due to overlapping visual characteristics, acquisition variability, and class imbalance across renal conditions. In this work, we present an attention-driven framework for multiclass kidney disease classification from CT images. The proposed approach is based on a Vision Transformer (ViT-B/16) architecture that explicitly models global anatomical context while preserving discriminative local renal features. A comprehensive evaluation is conducted against established and modern convolutional models, including ResNet50, DenseNet121, EfficientNetV2-S, and ConvNeXt-Tiny, using a CT kidney dataset containing 12,446 images spanning normal, cyst, stone, and tumor classes. The proposed model achieves the best overall performance, with 98.90% accuracy and a PR-AUC of 99.23%, demonstrating strong class-wise discrimination under imbalance. To promote transparency, gradient- and attention-based explainability techniques are employed to visualize lesion-relevant regions influencing predictions. The results indicate that transformer-based modeling offers an effective and interpretable solution for reliable CT-based kidney disease screening.
Article information
Journal
Journal of Medical and Health Studies
Volume (Issue)
7 (5)
Pages
36-45
Published
Copyright
Copyright (c) 2026. Licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
