Detecting and Mitigating Regressions in ML Model Performance Due to Platform-Level Changes
Abstract
This article addresses the critical challenge of detecting and mitigating performance regressions in machine learning models caused by platform-level changes. Unlike traditional software systems, ML models exhibit probabilistic behavior that can silently degrade when the underlying infrastructure evolves. We present a comprehensive framework for identifying and addressing these hidden regressions through environment fingerprinting, performance correlation analysis, and automated detection pipelines. Our methodology captures detailed execution contexts, establishes reproducible benchmarks, and implements continuous monitoring to maintain model integrity during infrastructure transitions. Through extensive experimentation across diverse ML workloads, we identify specific regression patterns, including numerical precision shifts, memory access changes, and thread scheduling variations. We propose effective mitigation strategies, including infrastructure versioning, platform-aware model design principles, and continuous verification practices. This work bridges the gap between model development and infrastructure management, enabling organizations to evolve their technical platforms while maintaining consistent ML performance.
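
To make the environment-fingerprinting idea in the abstract concrete, the following is a minimal illustrative sketch (not the paper's implementation): it records the interpreter, operating system, and the versions of a few ML-relevant libraries, then hashes that context so benchmark results can be correlated with platform changes. The package list and fingerprint schema are assumptions made for illustration.

```python
# Illustrative environment fingerprinting sketch: capture the execution
# context and hash it so a change in the fingerprint flags a platform-level
# change worth correlating with any shift in model metrics.
import hashlib
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("numpy", "scipy")):
    """Return a stable hash plus the raw context used to produce it."""
    context = {
        "python": sys.version,
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": {},
    }
    # Record versions of ML-relevant libraries, if installed (names are
    # illustrative; a real pipeline would list its actual dependencies).
    for name in packages:
        try:
            context["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            context["packages"][name] = "not installed"
    digest = hashlib.sha256(
        json.dumps(context, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return digest, context


if __name__ == "__main__":
    fingerprint, context = environment_fingerprint()
    # Store the fingerprint alongside benchmark metrics for later correlation.
    print(fingerprint[:16], json.dumps(context, indent=2))
```

In practice, such a fingerprint would be stored with every benchmark run, so that a metric regression can be matched against the platform change that accompanied it.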
Article information
Journal
Journal of Computer Science and Technology Studies
Volume (Issue)
7 (8)
Pages
904-911
Published
Copyright
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.