A Corpus-Based Multidimensional Analysis of Linguistic Features between Human-Authored and ChatGPT-Generated Compositions
Abstract
This study presents a corpus-based multidimensional comparative analysis of linguistic features in human-authored and ChatGPT-generated English compositions, with a focus on four core dimensions: lexical difficulty, syntactic complexity, textual cohesion, and error patterns. A total of 120 compositions were analyzed, 60 produced by ChatGPT-4 and 60 authored by Chinese L2 English learners drawn from the Ten-thousand English Compositions of Chinese Learners corpus, equally distributed across three educational levels: primary, secondary, and tertiary. Quantitative analyses indicate that human-authored compositions exhibit a progressive increase in lexical complexity aligned with educational advancement, while ChatGPT-generated texts demonstrate limited differentiation between primary and secondary levels, followed by a sharp rise in lexical complexity at the tertiary level. This pattern suggests an algorithmic reliance on generalized discourse rather than sensitivity to developmental variation. In terms of syntactic complexity, ChatGPT consistently produces structurally uniform texts with heavy use of subordinate clauses and logical subordination, whereas human writing displays greater contextual flexibility, albeit with occasional simplification. Regarding textual cohesion, ChatGPT-generated compositions, particularly at the tertiary level, rely heavily on overt logical connectors and referential markers, resulting in structurally coherent but stylistically formulaic discourse. In contrast, human-authored texts, while sometimes lacking explicit cohesion markers, employ more nuanced devices such as collocations and implicit semantic links. Error analysis reveals a near-absence of grammatical, lexical, and orthographic errors in ChatGPT outputs, contrasting with the relatively high error frequency in human compositions, especially at lower proficiency levels. These findings highlight ChatGPT’s strengths in producing grammatically accurate and syntactically complex texts, yet also underscore its limitations in mimicking authentic learner development and stylistic variability. The study concludes that while generative AI can serve as an effective auxiliary tool in L2 writing instruction, its pedagogical integration should be carefully calibrated to avoid undermining learners’ development of rhetorical sensitivity, authorial voice, and context-appropriate expression.
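The abstract does not specify the study's actual analytic instruments, so the following is a minimal illustrative sketch only: it assumes rough proxy measures (type-token ratio, sentence length, subordinator and connective counts) and hypothetical helpers `proxy_metrics` and `compare_groups` to show the kind of per-composition metrics a comparison across lexical diversity, syntactic complexity, and explicit cohesion marking might compute.

```python
import re
from statistics import mean

# Hypothetical, illustrative marker lists; not taken from the study.
SUBORDINATORS = {"because", "although", "which", "that", "when", "while", "if"}
CONNECTIVES = {"however", "therefore", "moreover", "furthermore", "thus", "consequently"}

def proxy_metrics(text):
    """Rough per-composition proxies for three of the four dimensions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        return {}
    return {
        # Lexical diversity proxy (type-token ratio).
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Syntactic complexity proxies.
        "mean_sentence_length": len(tokens) / len(sentences),
        "subordinators_per_sentence": sum(t in SUBORDINATORS for t in tokens) / len(sentences),
        # Explicit cohesion proxy.
        "connectives_per_100_words": 100 * sum(t in CONNECTIVES for t in tokens) / len(tokens),
    }

def compare_groups(human_texts, chatgpt_texts):
    """Average each proxy over the compositions in each group."""
    def group_mean(texts):
        rows = [m for m in (proxy_metrics(t) for t in texts) if m]
        return {k: mean(r[k] for r in rows) for k in rows[0]}
    return {"human": group_mean(human_texts), "chatgpt": group_mean(chatgpt_texts)}
```

In practice, a study of this kind would rely on validated lexical-sophistication, syntactic-complexity, and cohesion indices rather than these rough proxies; the sketch is only meant to make the four dimensions concrete.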
Article information
Journal: International Journal of Linguistics, Literature and Translation
Volume (Issue): 8 (5)
Pages: 102-110
Published: 2025
Copyright (c) 2025 JinLiang Wu
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.