Artificial Intelligence in Sentencing: Evaluating Machine Learning Models for Sentencing Recommendations in the U.S.
Abstract
Artificial intelligence is increasingly deployed in high-stakes decision-making, raising critical questions about accuracy, fairness, and transparency in regulated domains. This study evaluates the use of machine learning models to generate sentencing recommendations within the U.S. criminal justice system, examining whether such models can reliably support judicial decision-making without amplifying existing inequities. Using a comprehensive dataset of sentencing records enriched with engineered features reflecting criminal history, offense severity, demographics, and jurisdictional context, we develop and compare a range of predictive models, including Logistic Regression, tree‑based ensembles (Random Forest, XGBoost, LightGBM), deep learning architectures (MLP, LSTM, Bi‑LSTM), and hybrid ensemble frameworks. Models are assessed on two tasks: continuous prediction of sentence length and classification of above‑median sentences, using metrics such as mean absolute error, R‑squared, AUC‑ROC, and F1‑score. Fairness metrics are computed across gender, age, and jurisdictional groups, and interpretability analyses employ feature importance, attention weights, and SHAP values to ensure transparency in decision logic. Results indicate that hybrid and stacked ensembles achieve the best balance of accuracy and fairness, outperforming baseline models on both dimensions, with interpretability tools confirming alignment with legal reasoning and established risk factors. These findings suggest that responsibly governed AI systems can augment sentencing decisions as decision‑support tools, provided continuous bias monitoring and ethical oversight are integrated into deployment practices. The study contributes empirical evidence and methodological guidance for integrating machine learning into judicial contexts.
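The dual evaluation setup described above can be illustrated with a minimal sketch. This is not the study's code: the data is synthetic, the feature names are placeholders for the engineered features the abstract mentions, and a single Random Forest stands in for the full model suite. It only shows how the four reported metrics (MAE, R‑squared, AUC‑ROC, F1) would be computed with scikit-learn for the regression and above‑median classification tasks.

```python
# Illustrative sketch only: synthetic data and a single model family stand in
# for the study's dataset and model suite. Shows how MAE/R^2 (sentence-length
# regression) and AUC-ROC/F1 (above-median classification) are computed.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import mean_absolute_error, r2_score, roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Toy features standing in for criminal history, offense severity, etc.
X = rng.normal(size=(n, 5))
# Hypothetical sentence length in months, driven by the first two features.
sentence_months = 24 + 6 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=2, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, sentence_months, random_state=0)

# Task 1: continuous sentence-length prediction.
reg = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
pred = reg.predict(X_te)
mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)

# Task 2: above-median sentencing (median taken on the training split
# to avoid leaking test information into the label threshold).
median = np.median(y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr > median)
proba = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te > median, proba)
f1 = f1_score(y_te > median, proba > 0.5)

print(f"MAE={mae:.2f}  R2={r2:.2f}  AUC={auc:.3f}  F1={f1:.3f}")
```

In the same spirit, the abstract's fairness analysis would compare these metrics per subgroup (gender, age band, jurisdiction) rather than on the pooled test set.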
