Research Article

Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains

Authors

Abstract

As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. This paper presents the first systematic study of cross-LLM behavioral backdoor detection in AI agent supply chains, evaluating generalization across six production LLMs: GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1. Using 1,198 execution traces and 36 cross-model experiments, we identify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point generalization gap that leaves cross-model detection no better than random guessing. Our analysis shows that this gap stems from model-specific behavioral signatures, particularly in temporal features whose coefficient of variation exceeds 0.8, while structural features remain stable across architectures. We demonstrate that a simple model-aware detection strategy, which incorporates model identity as an additional feature, achieves 90.6% accuracy across all evaluated models. These findings establish that organizations using multiple LLMs cannot rely on single-model detectors and require unified detection strategies. We release our multi-LLM trace dataset and detection framework to enable reproducible research in this emerging area.
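
The two ideas the abstract leans on, the coefficient of variation of a feature across models and model identity as an extra detector input, can be illustrated with a minimal Python sketch. This is not the paper's released framework: the function names, the one-hot encoding, and the per-model feature values are hypothetical, chosen only to show the shape of the computation.

```python
from statistics import mean, pstdev

# The six LLMs named in the abstract, in a fixed order for one-hot encoding.
MODELS = [
    "GPT-5.1", "Claude Sonnet 4.5", "Grok 4.1",
    "Llama 4 Maverick", "GPT-OSS 120B", "DeepSeek Chat V3.1",
]


def coefficient_of_variation(values):
    """Return population standard deviation divided by |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


def add_model_identity(features, model_name, known_models=MODELS):
    """Append a one-hot encoding of the source LLM to a feature vector.

    Assumption: one-hot encoding is only one possible way to represent
    model identity; the abstract does not say which encoding is used.
    """
    if model_name not in known_models:
        raise ValueError(f"unknown model: {model_name}")
    one_hot = [1.0 if model_name == m else 0.0 for m in known_models]
    return list(features) + one_hot


# Hypothetical per-model averages, invented for illustration only.
# A temporal feature (e.g. seconds between tool calls) varies widely...
per_model_latency = [0.4, 2.9, 0.7, 5.1, 0.3, 1.6]
# ...while a structural feature (e.g. tool-call depth) barely moves.
per_model_depth = [3.0, 3.2, 2.9, 3.1, 3.0, 3.1]

latency_cv = coefficient_of_variation(per_model_latency)
depth_cv = coefficient_of_variation(per_model_depth)

# A trace from Grok 4.1 with two behavioral features plus model identity.
augmented = add_model_identity([0.4, 3.0], "Grok 4.1")
```

With these made-up numbers, the temporal feature's coefficient of variation lands above the 0.8 threshold mentioned in the abstract, while the structural feature's stays close to zero. The augmented vector is what a single classifier trained on traces from all models would receive, letting it learn model-specific baselines instead of assuming one shared distribution.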

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

7 (12)

Pages

355-363

Published

2025-12-15

How to Cite

Sanna, A. C. (2025). Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains. Journal of Computer Science and Technology Studies, 7(12), 355-363. https://doi.org/10.32996/jcsts.2025.7.12.45

Keywords:

AI Security, Backdoor Detection, Large Language Models, Cross-LLM Generalization, Behavioral Anomaly Detection