AI model that predicts antibiotic resistance from bacterial genomes (AMR) using open sequencing data
Abstract
Antimicrobial resistance (AMR) prediction from bacterial whole-genome sequencing can shorten time to effective therapy, strengthen surveillance, and reduce reliance on slow culture-based susceptibility testing. However, many genomic AMR workflows still depend on curated gene and mutation catalogs, which can miss emerging mechanisms, vary in coverage across species, and rarely provide calibrated uncertainty. We present a phenotype-first framework that trains models to predict resistant, intermediate, or susceptible outcomes directly from genome sequences using open repositories. Genomes and linked phenotypes are assembled from NCBI SRA and GenBank records and harmonized with BV-BRC (formerly PATRIC) antibiotic panels. We compare three modeling families: catalog-based baselines (AMRFinderPlus and ResFinder), k-mer linear models with stability selection, and sequence transformers fine-tuned on long k-mer tokens. To support clinical decision-making, predicted probabilities are calibrated with temperature scaling and wrapped with conformal prediction to yield distribution-free confidence sets. Interpretability is addressed by mapping influential k-mers and attention-based attributions to genes and known resistance determinants in CARD and MEGARes, producing mutation-importance summaries that can be reviewed by microbiologists. We define evaluation protocols that avoid leakage through lineage structure, including temporal splits and site-stratified cross-validation, and report discrimination, calibration, and abstention metrics. The result is a reproducible template for AMR phenotype modeling that complements rule-based tools while providing transparent evidence and quantified uncertainty. The manuscript targets publication in November 2025 and emphasizes open science, auditability, and transferability across pathogens and antibiotics in public health practice.
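
The calibration and uncertainty step described in the abstract (temperature scaling followed by conformal prediction) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the grid search used to fit the temperature, and the nonconformity score 1 − p(true label) are all assumptions chosen for brevity, and the label order S/I/R is hypothetical.

```python
import math

# Hypothetical label order: susceptible, intermediate, resistant.
LABELS = ("S", "I", "R")

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one vector of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fit_temperature(cal_logits, cal_labels, grid=None):
    """Pick the temperature minimizing negative log-likelihood on held-out data.

    Assumption: a coarse grid search stands in for the gradient-based
    optimizer that is normally used for temperature scaling.
    """
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25 .. 10.0

    def nll(t):
        return -sum(
            math.log(max(softmax(z, t)[y], 1e-12))
            for z, y in zip(cal_logits, cal_labels)
        )

    return min(grid, key=nll)

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the scores 1 - p(true label)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-based rank of the quantile
    if rank > n:
        return 1.0  # too few calibration points: every label is included
    return scores[rank - 1]

def prediction_set(probs, qhat):
    """All labels whose nonconformity score falls within the threshold."""
    return [LABELS[k] for k, p in enumerate(probs) if 1.0 - p <= qhat]
```

In this sketch, a prediction set containing a single label would be reported as a confident call, while an empty set or a set with several labels could be treated as an abstention. How the framework actually maps set size to abstention is not stated in the abstract.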
