Cohesion and Coherence in AI-generated Narrative Texts: ChatGPT vs Grok
Abstract
The rapid advancement of large language models has raised questions about their ability to produce cohesive and coherent texts. This exploratory study examines and compares referential and temporal cohesion in 20 narrative texts of approximately 500 words each, generated by ChatGPT (OpenAI) and Grok (xAI). Coh-Metrix 3.0 was used for automated text analysis. Eight referential cohesion indices (local and global noun, argument, stem, and content word overlap) and two temporal cohesion indices (incidence of temporal connectives, and semantic temporal overlap, i.e., tense and aspect consistency) were selected as relevant to the study. Given the small sample size, a series of Mann-Whitney U tests with Benjamini-Hochberg false discovery rate (FDR) correction was applied. The results revealed broad similarity in local and global referential cohesion across the two models, with small-to-moderate advantages for Grok in global noun and stem overlap (all FDR-adjusted p > .05). In contrast, temporal cohesion diverged clearly between the two models: ChatGPT made considerably greater use of explicit temporal connectives (CNCTemp, rank-biserial r = .73, large effect; FDR-adjusted p = .010), whereas Grok demonstrated stronger implicit temporal consistency (SMTEMP, r = .044, moderate-to-large effect). The latter difference, however, did not remain statistically significant after FDR correction (FDR-adjusted p = .245). Overall, these findings suggest that the two models adopt distinct strategies for achieving coherence in narrative texts: ChatGPT relies on explicit temporal cues, whereas Grok relies on deeper situation model coherence.
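The statistical procedure named in the abstract (Mann-Whitney U tests, rank-biserial effect sizes, and Benjamini-Hochberg adjustment) can be illustrated with a minimal standard-library sketch. This is not the authors' analysis script: the function names are hypothetical, the p-value uses the normal approximation with tie and continuity corrections rather than an exact test, and the sample scores at the bottom are invented purely for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney(a, b):
    """Return (U for sample a, two-sided approximate p, rank-biserial r)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    values = list(a) + list(b)
    ranks = average_ranks(values)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if variance == 0:
        p = 1.0
    else:
        # Normal approximation with a continuity correction of 0.5.
        z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
        p = math.erfc(z / math.sqrt(2))

    # Rank-biserial correlation: (U1 - U2) / (n1 * n2); positive if a > b.
    r = 2 * u1 / (n1 * n2) - 1
    return u1, p, r


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented index scores for two models; NOT data from the study.
    model_a = [12.1, 14.3, 13.8, 15.0, 12.9]
    model_b = [9.4, 10.2, 11.7, 8.8, 10.9]
    u, p, r = mann_whitney(model_a, model_b)
    print(f"U = {u}, p = {p:.3f}, rank-biserial r = {r:.2f}")
    print(benjamini_hochberg([p, 0.20, 0.04]))
```

With samples as small as those described in the abstract, an exact Mann-Whitney test would normally be preferable to the normal approximation used here; the sketch only shows how the three quantities relate to one another.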
