Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
Abstract
As AI agents become increasingly integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. This paper presents the first systematic study of cross-LLM behavioral backdoor detection in AI agent supply chains, evaluating generalization across six production LLMs: GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1. Through 1,198 execution traces and 36 cross-model experiments, we identify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point generalization gap that leaves cross-model performance no better than random guessing. Our analysis reveals that this gap stems from model-specific behavioral signatures, particularly in temporal features, whose coefficient of variation exceeds 0.8, while structural features remain stable across architectures. We demonstrate that a simple model-aware detection strategy, which incorporates model identity as an additional feature, achieves 90.6% accuracy across all evaluated models. These findings establish that organizations using multiple LLMs cannot rely on single-model detectors and instead require unified detection strategies. We release our multi-LLM trace dataset and detection framework to enable reproducible research in this emerging area.
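To make the model-aware strategy concrete, the sketch below shows one plausible realization: per-trace behavioral features are augmented with a one-hot encoding of the originating LLM's identity before classification. The feature set, the classifier choice, and the synthetic data are illustrative assumptions, not the paper's released framework or reported pipeline.

```python
# Minimal sketch of model-aware backdoor detection: append model identity
# as a one-hot feature to behavioral trace features, then classify.
# All feature names, shapes, and data below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

MODELS = ["gpt-5.1", "claude-sonnet-4.5", "grok-4.1",
          "llama-4-maverick", "gpt-oss-120b", "deepseek-chat-v3.1"]

rng = np.random.default_rng(0)
n = 600
# Synthetic stand-ins: 8 behavioral trace features per execution
# (e.g., temporal and structural statistics), the LLM that produced
# each trace, and a binary label (1 = backdoored, 0 = benign).
X_behavior = rng.normal(size=(n, 8))
model_ids = rng.choice(MODELS, size=n)
y = rng.integers(0, 2, size=n)

# Model-aware step: concatenate a one-hot model-identity feature so the
# detector can condition on model-specific behavioral signatures.
enc = OneHotEncoder(categories=[MODELS], sparse_output=False)
X_model = enc.fit_transform(model_ids.reshape(-1, 1))
X = np.hstack([X_behavior, X_model])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

With real traces, the intuition is that conditioning on model identity lets the classifier learn per-model baselines for the unstable temporal features while sharing the architecture-stable structural features, which is consistent with the feature analysis summarized in the abstract.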

Aims & scope
Call for Papers
Article Processing Charges
Publications Ethics
Google Scholar Citations
Recruitment