Google Translate Then and Now: Translations From Five Languages into English and Arabic (2012–2025)
Abstract
Google Translate (GT) is a free online service that instantly translates words, phrases, texts, and web pages across 249 languages, including English, Arabic, German, French, Hungarian, and Turkish. GT was launched in 2006 as a statistical machine translation (SMT) system and transitioned in 2016 to a neural machine translation (NMT) system, with marked differences in translation quality between the two eras. This study compares GT's translation of six texts from Hungarian, German, Spanish, Turkish, and Japanese into English and Arabic in 2012 (SMT era) and 2025 (NMT era) in terms of intelligibility, fluency, and semantic, lexical, and syntactic accuracy. For the SMT era (2012), holistic evaluations showed that Hungarian-English, German-English, and Spanish-English translations were somewhat intelligible and conveyed literal meaning, but were awkward and clumsy; GT failed to capture idiomatic nuance and produced sentences with broken syntax and robotic phrasing. Turkish-English and Japanese-English translations were far more problematic, riddled with broken syntax, incoherence, and nonsensical phrases. Arabic translations from all five source languages were largely unintelligible: lexical equivalents were inaccurate, and sentence structure was nonsensical, with jumbled, incoherent, fragmented, distorted, and clumsy word order. The Arabic translations were unusable. In 2025 (NMT era), GT's translations from both European and non-European languages into English and Arabic had changed drastically: they were intelligible, fluent, coherent, stylistically natural, and contextually accurate, and syntax and word order were preserved. The differences in translation quality between the two eras, for both English and Arabic, stem from differences in architecture, training, and linguistic modelling. In 2012, GT matched segments of the source text against stored bilingual units and recombined them according to probability. This worked reasonably well for European languages that share structural similarities with English, but it struggled with typologically distant languages such as Turkish, Japanese, and Arabic. By 2025, GT had been trained on much larger and more diverse datasets than its SMT predecessor, and this breadth of training data enables better handling of vocabulary, specialized terminology, and cross-domain variation. The study concludes with recommendations for students, instructors, researchers, and developers on the effective use and improvement of GT.
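
To make the architectural contrast concrete, the following is a minimal, hypothetical Python sketch (not taken from the study or from Google's actual system) of the phrase-matching-and-recombination idea behind 2012-era SMT; the phrase table, probabilities, and greedy decoder are invented purely for illustration.

# Toy sketch of phrase-based SMT decoding: the source sentence is matched
# against stored bilingual phrase pairs and recombined according to
# translation probabilities. All entries below are invented for illustration.
PHRASE_TABLE = {
    "guten": [("good", 0.9)],
    "morgen": [("morning", 0.8), ("tomorrow", 0.2)],
    "guten morgen": [("good morning", 0.95)],
}

def translate(source):
    """Greedy left-to-right decoding: at each position, take the longest
    stored source phrase and its most probable translation."""
    words = source.lower().split()
    output, score, i = [], 1.0, 0
    while i < len(words):
        best = None
        for j in range(len(words), i, -1):   # prefer longer matches
            phrase = " ".join(words[i:j])
            if phrase in PHRASE_TABLE:
                translation, prob = max(PHRASE_TABLE[phrase], key=lambda t: t[1])
                best = (j - i, translation, prob)
                break
        if best is None:                     # unknown word: pass it through
            best = (1, words[i], 0.1)
        length, translation, prob = best
        output.append(translation)
        score *= prob
        i += length
    return " ".join(output), score

print(translate("guten morgen"))   # -> ('good morning', 0.95)

Because this kind of segment matching largely inherits the word order of the source, it can look acceptable when source and target order are similar, which is consistent with the study's finding that 2012-era output degraded sharply for typologically distant pairs such as Turkish, Japanese, and Arabic.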
Article information
Journal: Journal of Computer Science and Technology Studies
Volume (Issue): 7 (12)
Pages: 413–427
Published:
Copyright: © 2025
Open access: This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
