Explainable Transformer-Based Skin Lesion Classification from Clinical Images
Abstract
Early and reliable detection of skin cancer is critical for reducing disease burden and improving patient outcomes, yet large-scale screening remains constrained by limited specialist availability and heterogeneous image acquisition conditions. This paper presents an efficient transformer-based framework for automated multiclass skin lesion classification, centered on an EfficientViT architecture designed to balance representational capacity and computational efficiency. The proposed approach is evaluated against lightweight transformer and CNN baselines, including DeiT-Tiny, Axial Attention Transformer, Swin Transformer-Tiny, and EfficientNetV2-S, using the PAD-UFES-20 dataset comprising 2,298 smartphone-acquired clinical images across six lesion categories. Experimental results show that EfficientViT achieves superior performance, reaching 99.40% accuracy and 99.78% PR-AUC, indicating robust discrimination under real-world acquisition variability. To enhance transparency and support clinical interpretability, Grad-CAM visual explanations are integrated to highlight lesion-relevant regions driving model predictions. Overall, the results demonstrate that EfficientViT provides an accurate and interpretable solution for practical skin lesion screening using consumer-grade images.
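The two headline metrics in the abstract, accuracy and PR-AUC, can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes PR-AUC means one-vs-rest average precision, macro-averaged over classes (the abstract does not state the averaging scheme), and it uses a hypothetical three-class toy example rather than the six PAD-UFES-20 categories. The function names and toy values are illustrative only.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose highest-probability class matches the label."""
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)


def average_precision(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """One-class average precision: mean of precision@k over the ranks k
    at which a positive sample appears (ties are broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            total += hits / rank
    # A class with no positive samples has no defined average precision.
    return total / hits if hits else float("nan")


def macro_pr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                 n_classes: int) -> float:
    """Assumed definition: unweighted mean of one-vs-rest average precision."""
    per_class: List[float] = [
        average_precision([t == c for t in y_true], [p[c] for p in probs])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Hypothetical toy data: 4 samples, 3 classes, softmax-like probabilities.
y_true = [0, 1, 2, 1]
probs = [
    [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.50, 0.20, 0.30],  # true class 1, but predicted as class 0
]

print(f"accuracy     = {accuracy(y_true, probs):.4f}")
print(f"macro PR-AUC = {macro_pr_auc(y_true, probs, n_classes=3):.4f}")
```

PR-AUC is computed from the ranking of predicted scores rather than from hard decisions, so it can differ from accuracy on the same predictions. That makes it a useful complement when lesion classes are imbalanced.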
Article information
Journal
Journal of Medical and Health Studies
Volume (Issue)
7 (5)
Pages
46-55
Published
Copyright
Copyright (c) 2026 https://creativecommons.org/licenses/by/4.0/
Open access

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
