A Scientometric Review of Syntactic Complexity in L2 Writing Based on Web of Science (2010-2022)

Abstract
As an important construct in second language teaching and assessment, syntactic complexity is closely related to the language proficiency and language development of L2 learners. Using the visualization software CiteSpace, this study conducts a scientometric analysis of 140 articles on written syntactic complexity published between 2010 and 2022, uncovering the current development of, and the challenges faced by, relevant studies. Specifically, a frequency analysis was first conducted to describe the overall development of written syntactic complexity research. The study then conducted a Document Co-Citation Analysis (DCA), which builds a network of co-cited references to identify underlying research hotspots and future trends. The results indicate that the cluster concerning automatic essay scoring is the most prominent, active from 2010 to 2021. In addition, Norris & Ortega (2009) is the most cited paper, followed by Ortega (2003) and Biber et al. (2011). Burst detection shows that McNamara et al. (2012) and Grant & Ginther (2000) generated the strongest citation bursts, with burst strengths of 3.14 and 3.09, respectively. The findings have implications for subsequent research on written syntactic complexity in language teaching and learning.


Introduction
Over the last few decades, the importance of syntactic complexity in second language (L2) writing has been recognized, as evidenced by studies investigating the relationship between syntactic complexity and the quality of L2 written texts. Syntactic complexity, also known as syntactic maturity, was defined by Ortega (2003) as the diversity and sophistication of syntactic structures in language production. Bulté & Housen (2012) further clarified the content and boundaries of syntactic complexity by introducing the concept of language complexity. According to them, syntactic complexity, a crucial dimension of language complexity, mainly involves the richness and elaborateness of syntactic structures. Syntactic richness refers to the diversity and variation of syntactic forms at the levels of phrase, clause, and sentence, while syntactic elaborateness refers to the sophistication of syntactic forms at these three levels. To a certain extent, written syntactic complexity also reflects L2 learners' ability to store syntactic knowledge in the brain and to produce it. Therefore, it is regarded as a gauge of L2 learners' language proficiency and of the quality of L2 writing.
Over the past few decades, there has been a surge of studies investigating the relationship between syntactic complexity and the quality of L2 writing. Most researchers have carried out synchronic studies to examine how syntactic complexity reflects the quality of L2 writing as independent variables change, such as the writer's first language, L2 proficiency, and writing task. A number of researchers have carried out cross-sectional studies to investigate the extent to which different syntactic complexity measures reliably index L2 writers' proficiency (e.g., Kim & Nam, 2019; Kyle & Crossley, 2017). There are also some longitudinal studies that track the development or change of syntactic complexity over time (e.g., Bulté ...). All these studies have provided useful insights into how the construct of syntactic complexity should be conceived and utilized in L2 writing research and pedagogy (Lu & Ai, 2015). However, given the multidimensional nature of written syntactic complexity and the disparity in research objects, methods, and designs, findings on the relationship between syntactic complexity and L2 writing quality remain inconsistent. Several reviews have therefore tried to clarify this relationship (Frantz et al., 2015; Jackson & Suethanapornkul, 2013; Jagaiah et al., 2020). However, they are limited in breadth and scope (Chen et al., 2009). For instance, Jagaiah et al. (2020) drew on 36 publications to examine how syntactic complexity measures (SCMs) varied by genre, grade level, students' writing ability, and writing quality. However, because the sample was small and relatively few of the studies were of high quality, it was hard to reach a consistent definition of SCMs, and thus hard to identify the connections among these 36 studies.
In view of this, the present research uses bibliometrics and the CiteSpace software to mine and analyze a sample of 140 articles on syntactic complexity published between 2010 and 2022 in the Web of Science (WoS) Core Collection, in terms of annual publication count, highly cited studies, core scholars, and research hotspots. Based on these results, the primary goal of the current study is to systematically examine how written syntactic complexity varies with different influencing factors. The results will have implications for L2 writing and pedagogy.

Data source
The data sources for this study are listed in Table 1. All data were selected from the Web of Science Core Collection (2010-2022). Web of Science (formerly ISI Web of Knowledge) is the premier research platform for information in the hard sciences, social sciences, arts, and humanities, and the Web of Science Core Collection is the world's most trusted publisher-independent global citation database (Zhang, 2020).
Two rounds of data retrieval were conducted in the Web of Science Core Collection on 16 December 2021. In the first round, the publication period was set from 2010 to 2022. The topic search "syntactic complexity" OR "syntactic maturity" was used to cover recent research in the field of linguistics. The search scope was restricted to the linguistics category to ensure accurate results, and the document type was limited to "Article" and "Review". This yielded 847 publications from linguistics journals from 2010 to 2020. The publications came from several branches of linguistics, such as psycholinguistics, cognitive linguistics, and applied linguistics. In the second round, to focus on the relationship between syntactic complexity and L2 writing, publications unrelated to this topic were excluded from the 847. After a systematic reading of the titles and abstracts, 140 articles were retained for analysis. The data were subsequently downloaded from the Web of Science Core Collection as plain-text files.
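As an illustration of how the downloaded text files could be read programmatically, the following is a minimal sketch, not the procedure used in this study. It assumes the usual Web of Science plain-text layout (two-character field tags such as AU, TI, DT, PY, indented continuation lines, and ER closing each record). The function names, the sample export, and its contents are hypothetical. The relevance screening by title and abstract was done by reading and is not automated here; only the year and document-type filters are shown.

```python
def parse_wos_export(text):
    """Parse a Web of Science plain-text export into a list of dicts.

    Each record maps a two-character field tag to a list of values.
    Continuation lines (indented) are appended to the previous tag.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank separator lines
        head, value = line[:2], line[3:].strip()
        if head == "ER":                  # end of one record
            records.append(current)
            current, tag = {}, None
        elif head == "  ":                # continuation of the previous field
            if tag is not None:
                current[tag].append(value)
        elif head in ("FN", "VR", "EF"):  # file-level header and footer tags
            continue
        else:                             # a new field starts
            tag = head
            current.setdefault(tag, []).append(value)
    return records


def in_scope(record, first_year=2010, last_year=2022):
    """Apply the year window and document-type limits described above."""
    year = int(record.get("PY", ["0"])[0])
    doc_type = record.get("DT", [""])[0]
    return first_year <= year <= last_year and doc_type in {"Article", "Review"}


# Hypothetical two-record export, for illustration only.
SAMPLE_EXPORT = """FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Doe, J
   Roe, R
TI A hypothetical study of syntactic complexity
DT Article
PY 2015
ER

PT J
AU Poe, E
TI A hypothetical study outside the search window
DT Article
PY 2008
ER

EF"""

records = parse_wos_export(SAMPLE_EXPORT)
kept = [r for r in records if in_scope(r)]
```

Keeping every field as a list makes multi-valued tags such as AU (authors) and CR (cited references) uniform to handle in later steps.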

Data analysis
In this article, we use CiteSpace (Version 5.7) to analyze core scholars, co-cited references, and research hotspots in the field of syntactic complexity. CiteSpace, an information visualization and analysis program, is widely used to analyze the knowledge structure of a given field (Chen, 2004; Chen, 2016; Chen et al., 2010). Its popularity stems from the fact that the knowledge maps it draws can capture the evolution of a knowledge field, with cited documents shown as nodes and co-citation cluster labels indicating research frontiers (Aryadoust & Ang, 2021; Lim & Aryadoust, 2021).
On the whole, two rounds of data analysis were implemented. The first round, a frequency analysis, revealed the number of written syntactic complexity articles published annually, the journals in which the papers were published, the most productive authors, and the universities or institutes with which the authors were affiliated when the papers were published. In the second round, we conducted a document co-citation analysis to explain the co-citation relationships among keywords and references, thus tracing the trend of written syntactic complexity research.
... (19) and Journal of English for Academic Purposes (13).

Fig. 1 Frequency Analysis of 140 Articles on Written Syntactic Complexity
In terms of the most productive authors, the third figure illustrates that Xiaofei Lu was at the top of the publication list (12 articles), followed by Hyung-Jo Yoon, Alex Housen, and Scott A. Crossley (4 articles). As for the authors' affiliations, the fourth figure shows that Georgia State University and Pennsylvania State University topped the list with 12 articles, while Vrije Universiteit Brussel came second with 7 articles.
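The frequency analysis described above amounts to counting records by year, journal, and author. The sketch below shows one way to do this with the Python standard library. The record layout (`year`, `journal`, `authors`) and the sample values are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def frequency_tables(records):
    """Count publications per year, per journal, and per author."""
    by_year = Counter(r["year"] for r in records)
    by_journal = Counter(r["journal"] for r in records)
    # An article with several authors counts once for each of them.
    by_author = Counter(a for r in records for a in r["authors"])
    return by_year, by_journal, by_author


# Hypothetical records, for illustration only.
SAMPLE_RECORDS = [
    {"year": 2015, "journal": "Journal A", "authors": ["Author X", "Author Y"]},
    {"year": 2015, "journal": "Journal B", "authors": ["Author X"]},
    {"year": 2018, "journal": "Journal A", "authors": ["Author Z"]},
]

by_year, by_journal, by_author = frequency_tables(SAMPLE_RECORDS)
top_journals = by_journal.most_common(3)  # journals ranked by article count
```

Affiliation counts follow the same pattern once each record carries a list of institutions.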

Document co-citation analysis
As a widely used visualization tool in scientometric analysis, Document Co-Citation Analysis (DCA) enables researchers to construct a network of co-cited references to identify the underlying research hotspots (Chen, Ibekwe-SanJuan, & Hou, 2010; Lim & Aryadoust, 2021).
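The core idea behind a co-citation network can be illustrated with a short sketch: two references are co-cited whenever they appear together in the reference list of the same citing paper, and the number of such papers gives the weight of the link between them. This is a simplified illustration under that assumption, not a description of CiteSpace's internal procedure. The reference labels below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of references, how many papers cite both."""
    pairs = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair is counted once per citing paper
        # and always stored in the same order.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical reference lists of three citing papers.
SAMPLE_REFERENCE_LISTS = [
    ["R1", "R2", "R3"],
    ["R1", "R2"],
    ["R2", "R3", "R3"],
]

counts = cocitation_counts(SAMPLE_REFERENCE_LISTS)
```

The resulting pair counts can be read as a weighted edge list, which is the input that clustering and visualization steps operate on.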

The knowledge mapping for keywords
A paper's keywords largely reflect its scheme and line of thought, so the research hotspots of a discipline can be identified from keyword frequency (Ma et al., 2020). Fig. 2 shows the knowledge mapping for keywords in the written syntactic complexity research field. The modularity Q index ranges from 0 to 1, and the mean silhouette ranges from −1 to 1. A higher modularity Q index indicates higher reliability, and a higher mean silhouette indicates better homogeneity, and vice versa (Lim & Aryadoust, 2021). In the current study, the modularity Q index and the weighted mean silhouette metric for the DCA network were 0.4842 and 0.7743, respectively, indicating a highly acceptable level of reliability and homogeneity for the network. As shown in Fig. 2, excluding small clusters that had little or no connection and clusters that were not clearly labelled, a total of 8 major clusters with keyword labels were automatically mined by CiteSpace (Version 5.7): academic writing (cluster #1), complexity (cluster #2), syntactic language development (cluster #3), teaching materials (cluster #4), linguistic complexity (cluster #5), automatic essay scoring (cluster #7), syntactic elaboration (cluster #8), and education (cluster #9).
On this basis, the current study drew a timeline of keywords through the co-citation analysis. Fig. 3 presents the major clusters in the data on horizontal lines to provide an overview of the development of research on written syntactic complexity over time (modularity Q index = 0.4842; weighted mean silhouette metric = 0.7743). The strength of the connection between nodes is presented as the size of the outer purple ring of each node (Aryadoust & Ang, 2021). As illustrated in Fig. 3, the longest cluster in the timeline view from 2010 to 2022 was cluster 7, which was labelled "automatic essay scoring" and was active from 2010 to 2021.
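For readers unfamiliar with the two network metrics reported above, their standard textbook forms can be written as follows. This is a sketch under the assumption that CiteSpace follows these common definitions; its exact implementation, including the weighting used for the "weighted mean silhouette", is not specified in this article. Here $A_{ij}$ is the link weight between nodes $i$ and $j$, $k_i$ is the total link weight of node $i$, $m$ is the total link weight in the network, $c_i$ is the cluster of node $i$, $a(i)$ is the mean dissimilarity of node $i$ to the other members of its own cluster, and $b(i)$ is the lowest mean dissimilarity of node $i$ to any other cluster.

```latex
% Modularity of a partition of the network into clusters
\[
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
\]

% Silhouette of node i, and its unweighted mean over the n nodes
\[
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)
\]
```

Since $s(i)$ lies between −1 and 1 by construction, so does its mean, which matches the range stated above.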
The next largest nodes were in cluster 4 (teaching materials) and cluster 3 (second language development), implying that the publications within these two clusters were highly cited. The timeline view also provides chronological information on the duration of each cluster's activity, in that the length of each line represents the lifetime of the cluster (Chen, 2016). For instance, cluster 2 (i.e., academic writing) and cluster 7 (automatic essay scoring) had the longest active duration of approximately 11 years. In contrast, cluster 5 (linguistic complexity) had the shortest lifetime of approximately 5 years, from 2015 to 2020.

The knowledge mapping for co-cited reference
Fig. 4 represents the knowledge mapping for the co-cited references network (2010-2022) in the field of written syntactic complexity. The importance of a reference is normally assessed by its citation frequency, centrality, and burstness. The five top-cited articles in written syntactic complexity are shown in Table 2.
As shown in Fig. 4 and Table 2, the most cited paper (Norris & Ortega, 2009) examined the challenges facing current practices in the measurement of syntactic complexity and proposed that measurement practices in relation to complexity, accuracy, and fluency (CAF) must become considerably more organic and sustainable, thus shedding new light on subsequent research on the construct of syntactic complexity. The second most cited paper, Ortega (2003), systematically investigated the relationship between L2 proficiency and measures of syntactic complexity in college-level L2 writing and found that this relationship was not static. Instead, it varied systematically across factors such as second or foreign language learning context and the methods of defining and rating L2 proficiency. Ortega also examined four measures of syntactic complexity that can better detect between-proficiency differences and proposed an observation period needed for substantial changes in the syntactic complexity of L2 writing. As the third most cited paper, Biber et al. (2011) challenged the validity of using T-units and clausal subordination to assess the development of grammatical complexity in L2 writing, especially in academic writing, and proposed a radically new approach focusing on the features that are most strongly characteristic of advanced academic writing. Lu (2010) put forward a computational system for the automatic analysis of syntactic complexity in second language writing, which provided technical support for the automatic analysis of L2 written texts. Bulté and Housen (2014) identified the measures of syntactic complexity that sensitively reflect short-term changes in English L2 writing proficiency. To sum up, all these highly cited papers have given impetus to the development of studies on written syntactic complexity.
Aside from the analysis of top-cited articles, the current study performed a burst detection analysis to identify the most influential publications in the field of written syntactic complexity. The top 10 publications with the strongest document co-citation burstness are shown in Table 3. A burst detection analysis enables researchers to explore the research trends of a field (Chen, 2006; Wang et al., 2019) and, to some extent, to reveal future trends (Guo, 2017).
For example, a burst with an end year of 2022 suggests that the citation burst will probably continue, thus reflecting a future trend in the field. The most prominent future trend is the automatic analysis of syntactic complexity. Lu (2017) examined the justification for using an automated measurement tool (the L2 Syntactic Complexity Analyzer) to analyze the complexity of syntactic structures in corpus-based written texts. Another notable finding in Table 3 ...
Table 3. Top 10 publications with the strongest document co-citation burstness

Discussion
As illustrated in Fig. 2, a total of 8 major clusters with keyword labels were automatically mined by CiteSpace (Version 5.7): academic writing (cluster #1), complexity (cluster #2), syntactic language development (cluster #3), teaching materials (cluster #4), linguistic complexity (cluster #5), automatic essay scoring (cluster #7), syntactic elaboration (cluster #8), and education (cluster #9). In this section, the 8 clusters are divided into four groups according to the keywords of the papers in these clusters.

Academic writing
The automatically extracted label of cluster #1 was "academic writing". The cluster focused on (i) synchronic and cross-sectional corpus-based studies of syntactic complexity measures across factors such as genre, register, discipline, and L2 proficiency, and (ii) longitudinal studies of how the syntactic characteristics of L2 learners' academic writing change over time.
In terms of the first focus, many researchers drew on different corpora to identify the use of grammatical complexity features in academic writing (e.g., Ansarifar ...). On the other hand, longitudinal studies of how the syntactic characteristics of L2 learners' academic writing change over time have also been carried out in the last decade. Mazgutova and Kormos (2015) compared two argumentative essays written at the beginning and at the end of an EAP course to identify differences in written syntactic complexity. However, a limitation of this study is that argumentative essays cannot, strictly speaking, be classified as academic writing.
Recent years have witnessed a new trend in academic writing research that combines the analysis of syntactic characteristics with the functional analysis of move structure (e.g., Lu, Casal, et al. ...).

Syntactic language development
Publications in cluster #3 concentrate on (i) cross-sectional studies of how measures of syntactic complexity vary with L2 proficiency, and (ii) longitudinal studies of how the syntactic characteristics of L2 writing change over time.
On the whole, cross-sectional studies are concerned with the extent to which different syntactic complexity measures reliably index L2 writers' proficiency (e.g., Kim ...).

Syntactic complexity in education
The label was derived from cluster #4 (teaching materials) and cluster #9 (education), which mainly deal with (i) the analysis of the readability and syntactic characteristics of various teaching materials, such as textbooks and exam materials, and (ii) how syntactic ...

Syntactic complexity
According to the keywords in these clusters, cluster #2 (complexity), cluster #5 (linguistic complexity), cluster #7 (automatic essay scoring), and cluster #8 (syntactic elaboration) were grouped under syntactic complexity itself, focusing on (i) the construct of syntactic complexity in L2 writing, (ii) the validity of syntactic complexity measures, and (iii) the analyzers of syntactic complexity.
As previously noted, syntactic complexity is a multidimensional concept. On the one hand, syntactic complexity, coupled with lexical complexity, is one of the constructs of linguistic complexity (Bulté & Housen, 2012); on the other hand, the concept itself involves syntactic diversity and syntactic elaboration, each of which has its own measures that need to be analyzed (Ortega, 2003).
Therefore, deciding which components of syntactic complexity are related to proficiency and which measures should be employed is crucial to the final results. Many authors have accordingly examined whether previous measures can fully reflect differences in syntactic complexity (e.g., Jagaiah et al., 2020).
In terms of the analysis of these syntactic complexity measures, there are some widely known automated text analysis tools, such as the Biber tagger, Coh-Metrix, the Tool for the Automatic Analysis of Syntactic Complexity (TAASSC), and the L2 Syntactic Complexity Analyzer (L2SCA), which make automatic essay scoring more efficient and trustworthy.
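To make concrete what such measures look like, the sketch below computes a few ratio-based indices of the kind reported by L2SCA (mean length of sentence, T-unit, and clause; clauses per T-unit; dependent clauses per clause; T-units per sentence) from unit counts. It is only an illustration: the function name and the counts are hypothetical, and in practice the counts come from syntactic parsing of the text, which is the hard part that the tools above automate.

```python
def complexity_ratios(words, sentences, t_units, clauses, dependent_clauses):
    """Compute a few ratio-based syntactic complexity indices from unit counts."""
    return {
        "MLS": words / sentences,             # mean length of sentence
        "MLT": words / t_units,               # mean length of T-unit
        "MLC": words / clauses,               # mean length of clause
        "C/T": clauses / t_units,             # clauses per T-unit
        "DC/C": dependent_clauses / clauses,  # dependent clauses per clause
        "T/S": t_units / sentences,           # T-units per sentence
    }


# Hypothetical counts for one short essay, for illustration only.
ratios = complexity_ratios(
    words=120, sentences=6, t_units=8, clauses=12, dependent_clauses=4
)
```

Because every index is a simple ratio, differences between studies usually arise from how the underlying units (for example, what counts as a clause) are defined and counted rather than from the arithmetic.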

Conclusion
In this article, we conducted a scientometric analysis of papers on written syntactic complexity published between 2010 and 2022 to explore the current development of, and the challenges faced by, related studies. Through visualization analysis, the current research trends of syntactic complexity are summarized as follows:
1. Extensiveness of research contents: At the macro level, studies on written syntactic complexity are interdisciplinary, involving fields such as linguistics, psychology, education, and computer science. Accordingly, the results of research on written syntactic complexity may vary across distinct research methods and purposes.
2. Depth of research contents: At the micro level, studies on written syntactic complexity have made great progress in research depth. For instance, along with the development of computational linguistics, the analysis of written syntactic complexity measures has moved from manual to automatic analysis, making the process more efficient and reliable. In terms of the measures themselves, many researchers have begun to explore their validity, namely, whether previous measures can flexibly reflect differences in syntactic complexity under different contexts.
Therefore, in light of the current trends in written syntactic complexity research, several points need to be considered in subsequent studies.
First, with respect to syntactic complexity measures (SCMs), because a consistent definition of these SCMs is still lacking and the total number of SCMs is large, it is difficult to compare results across studies or to identify consistent patterns in the use of SCMs. In addition, whether the SCMs are sensitive enough to gauge language learners' writing quality and language proficiency remains to be seen.
Second, in terms of the factors involved in the complexity of syntactic structure, existing research shows that syntactic complexity is affected by many factors, such as language proficiency, different SCMs, task design, writing genre, writing modality, and formality of task. However, some potential factors affecting the measures of syntactic complexity remain unexamined, resulting in inadequate control of experimental variables in most studies. It is therefore necessary to further clarify the factors that affect the complexity of syntactic structures and to examine their independent and overlapping effects.
To sum up, research on written syntactic complexity has attracted much attention and faces many challenges. We hope that this scientometric analysis of syntactic complexity can help bring new changes to writing instruction and assessment in the classroom.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.