Hardware-Accelerated Caching for Large-Scale AI Model Training: An Intelligent Architecture for Vector Database and Model Inference Optimization
Abstract
Modern AI infrastructures face significant challenges in efficiently managing data movement between vector databases and AI models during training and inference. Traditional caching approaches cannot address the unique characteristics of vector operations and embedding access patterns, which introduces serious performance bottlenecks. This article proposes a novel caching system that combines custom-designed vector processors with an adaptive hot/cold partitioning strategy enhanced by Bloom filters. It implements a hardware-accelerated hot cache for frequently accessed vectors, a cold storage queue for less frequently accessed data, and efficient Bloom filter-based lookups. By integrating hardware acceleration with workload-aware partitioning and probabilistic filtering, the system achieves substantial improvements across multiple dimensions. The architecture addresses the distinctive temporal and spatial locality patterns of AI vector operations, reducing data movement while maximizing the utilization of compute resources. Simulation results on large language model and computer vision workloads show that the design accelerates training and inference, reduces network data movement, and improves hardware utilization compared with conventional LRU-based architectures, potentially transforming the economics and performance characteristics of large-scale AI operations.
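To make the hot/cold partitioning and Bloom filter screening described above concrete, the following is a minimal software sketch, not the paper's implementation. The class and parameter names (HotColdVectorCache, promote_after, fetch_fn) are hypothetical, and the hardware-accelerated hot tier is modeled here as a plain in-memory LRU structure; it assumes a simple access-count threshold for promoting vectors from the cold queue to the hot cache.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Simple Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class HotColdVectorCache:
    """Hot cache (LRU over frequently accessed vectors) plus a cold queue,
    fronted by a Bloom filter that cheaply screens first-time lookups."""

    def __init__(self, hot_capacity=1024, cold_capacity=8192, promote_after=3):
        self.hot = OrderedDict()    # vector_id -> embedding (frequently accessed)
        self.cold = OrderedDict()   # vector_id -> embedding (infrequently accessed)
        self.hot_capacity = hot_capacity
        self.cold_capacity = cold_capacity
        self.promote_after = promote_after
        self.access_counts = {}
        self.seen = BloomFilter()

    def get(self, vector_id, fetch_fn):
        # Bloom filter: if it reports "never seen", skip cache probes and fetch directly.
        if not self.seen.might_contain(vector_id):
            return self._admit(vector_id, fetch_fn(vector_id))

        if vector_id in self.hot:
            self.hot.move_to_end(vector_id)       # refresh LRU position
            return self.hot[vector_id]

        if vector_id in self.cold:
            embedding = self.cold[vector_id]
            self.access_counts[vector_id] = self.access_counts.get(vector_id, 0) + 1
            if self.access_counts[vector_id] >= self.promote_after:
                del self.cold[vector_id]          # promote to hot on repeated hits
                self._insert_hot(vector_id, embedding)
            return embedding

        # Bloom false positive or previously evicted entry: fetch from the vector store.
        return self._admit(vector_id, fetch_fn(vector_id))

    def _admit(self, vector_id, embedding):
        self.seen.add(vector_id)
        self.access_counts[vector_id] = 1
        self.cold[vector_id] = embedding
        if len(self.cold) > self.cold_capacity:
            self.cold.popitem(last=False)         # evict oldest cold entry
        return embedding

    def _insert_hot(self, vector_id, embedding):
        self.hot[vector_id] = embedding
        if len(self.hot) > self.hot_capacity:
            evicted_id, evicted = self.hot.popitem(last=False)
            self.cold[evicted_id] = evicted       # demote to cold instead of dropping
```

In this sketch the Bloom filter avoids probing either cache tier for vectors that have never been requested, which is the role the article attributes to probabilistic filtering; the promotion threshold stands in for the adaptive, workload-aware partitioning policy, whose actual form in the proposed architecture may differ.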
Article information
Journal
Journal of Computer Science and Technology Studies
Volume (Issue)
7 (12)
Pages
252-259
Published
Copyright
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
