Lexical and Rhetorical Features in The Time Machine: A Corpus-stylistic Analysis

Abstract
Corpus stylistics studies style by applying linguistic theory together with a corpus-based approach, and thus combines qualitative and quantitative analysis. This study adopts the stylistic theory of Leech and Short (2007) and a corpus-based approach to analyze the science fiction novel The Time Machine. Lexical and rhetorical features are extracted with the corpus tools WordSmith 7.0 and AntConc, and the writing style and theme are then discussed. The research enriches the empirical study of corpus-based fiction stylistics and encourages the appreciation of early science fiction.


Introduction
Stylistics, the study of style, plays a central role in the linguistic analysis of fiction. Various stylistic theories have been put forward and successively refined, and among them Leech and Short's (2007) theory of fiction stylistics is particularly influential. It classifies stylistic features into four categories: lexical categories, grammatical categories, figures of speech, and cohesion and context. Traditional stylistics, however, relies mainly on qualitative research. Thanks to the progress of computer technology, research in corpus linguistics is flourishing, and combining stylistics with a corpus-based approach adds a quantitative dimension to stylistic research. Based on Leech and Short's framework, this study seeks to identify the lexical and rhetorical features of the science fiction novel The Time Machine in a quantitative way and to explore its writing style and theme. The research enriches the empirical study of corpus stylistics and offers inspiration for the appreciation of early science fiction.

Literature Review
With the progress of statistics and the spread of computers, advanced corpus analysis software and toolkits have continued to emerge, and various kinds of corpora have been built. More and more linguists have recognized the research prospects offered by corpus resources and techniques, and corpus stylistics has come into being as a result. Corpus stylistics, as its name implies, is a new research field that combines the research methods of corpus linguistics with the study of style. Wales (2006:213) defined stylistics as 'the study of style' and explained that "it characteristically deals with the interpretation of texts by focusing in detail on relevant, distinctive linguistic features, patterns, structures, or levels and on their significance and effects on readers." The presupposition is that every linguistic feature in a text has potential significance. Among the linguistic theories developed for different genres, Leech and Short (2007:61) propose a theory of fiction stylistics that identifies four levels of style: lexical categories, grammatical categories, figures of speech, and context and cohesion. This classification gives researchers a practical way to analyze a text. Wynne (2006:223) held a similar view of stylistics, saying that its empirical approach relies on the evidence of the language in a work of literature. He pointed out that the typical approach in stylistics is to apply the systems of categorization and analysis of linguistic science to the study of poetry, fiction, and prose. Theories from other branches, such as sociolinguistics, pragmatics, cognitive linguistics, and historical linguistics, are used to describe the multifaceted nature of language, including phonology, syntax, and semantics.
Among the various methods of doing stylistics, corpus stylistics is a promising one. Stylistics is itself an empirical discipline that depends on the collection and descriptive analysis of literary data, and a large corpus can serve as a testbed for hypotheses and add a quantitative dimension to many linguistic studies (Hunston, 2006: 234). In a nutshell, corpus linguistics is a useful supplement to stylistics because of its emphasis on quantitative analysis, and the shared concern with the relationship between form and meaning further promotes the integration of the two fields. Wynne (2006:223) also distinguished two sub-approaches of corpus linguistics: corpus annotation and norm analysis. For corpus annotation of literary texts, the annotation of speech, thought, and writing presentation can serve as a benchmark. Based on Leech and Short's (1981:318) widely accepted model of speech and thought presentation, Semino and Short (2004: 42) extended and refined the scales and put a revised model of speech, writing, and thought presentation (SW&TP) to use. They built a written-language corpus that included three genres of text: fiction, news reports, and biography. All the presentational categories in the corpus were manually classified and annotated. Although such annotation is time-consuming and laborious, the resulting patterns are useful for analyzing textual meaning. The other sub-approach, norm analysis, is "to study literary effects in texts by using the evidence of language norms in a reference corpus" (Wynne, 2006: 224). By comparing the text under study with a reference corpus, deviations from the norms of language use can be identified. According to Stubbs (2005: 5), "individual texts can be explained only against a background of what is normal and expected in general language use, and this is precisely the comparative information that quantitative corpus data can provide."
Generally speaking, a central corpus-stylistic approach is to focus on the use of language, such as words, phrases, grammatical structures, and rhetorical devices. That is to say, on the basis of annotated literary texts, we can compute word frequency statistics, retrieve keywords, produce concordances, examine the distribution of word classes, and so on. All these means can be used to study the theme of a literary work, the shaping of characters, the development of narration, and the style of the writer. Publications in corpus stylistics have been plentiful, and more and more studies have appeared since the 1980s. Some scholars selected specific literary works for empirical research. Burrows (1987), for example, revealed the meaning and values of Jane Austen's novels by comparing the frequency of modal verbs used in the characters' dialogue and in the narration. Starcke (2006) extracted the most frequent phraseology of Jane Austen's novel Persuasion and presented a detailed analysis of its two most frequent 3-grams, illustrating how computer-assisted techniques can reveal new shades of meaning in a text. Mahlberg (2007) conducted a corpus-stylistic study of Dickens' works; by analyzing high-frequency word clusters in the corpus, the study found that clusters related to body parts often served as clues that advance key plots. Biber (2010) focused on keywords, keyword clusters, and word collocation and analyzed the distinctive language style of specific articles or authors. Balossi (2014) conducted a contrastive analysis of word class and semantic field use in the monologues of Woolf's novel The Waves, showing clear differences in language style among the six characters. Each of these studies was based on one particular aspect of the text.
This study adopts the second approach, norm analysis, to explore the stylistic features of fiction. The research object is a science fiction novel written by the English writer Herbert George Wells and first published in 1895. A computer-readable version is available from Project Gutenberg. As the representative work of H. G. Wells, The Time Machine can be expected to embody his writing style, including its narrative and descriptive features and its language characteristics. Existing research on this novel focuses on narrative techniques and literary criticism, and most of it consists of qualitative studies from the perspective of literature. For example, Ruddic (2001) discussed the unusual narrative structure of the novel and explained that it might be accounted for by the novel's unacknowledged topicality. By contrast, there are few empirical studies from the perspective of linguistics, especially from the corpus-stylistic angle.
This study therefore combines the corpus approach with the stylistic theory of Leech and Short (2007), focusing on lexis and rhetoric, and aims to identify these two kinds of stylistic features and the meanings they convey in the science fiction novel The Time Machine in a quantitative way. Another objective is to test the explanatory power and practicability of fiction stylistic theory when applied to science fiction. In addition, the study aims to enrich the empirical study of corpus stylistics and offer inspiration for the appreciation of early science fiction. The research questions of this study include:
The corpus analysis software used in this study is WordSmith 7.0 and AntConc. As for the research design, two corpora are first established. The observed corpus is built from the electronic text of The Time Machine. The reference corpus contains 14 other English science fiction novels of the 21st century, all chosen from the winners of the British Science Fiction Association Awards after 2000. They are among the most representative and best-known works of science fiction in 21st-century English literature. By comparing these two corpora, norms and deviations can be recognized. The results provide quantitative evidence for answering the research questions.

Methodology
The methodology applied in this study follows Leech and Short's stylistic theory of fiction and the corpus-based approach. This part introduces in detail the construction of the observed corpus and the reference corpus, the software used for corpus analysis, the research procedure, and the methods of data analysis.
The fundamental framework used in this study comes from Leech and Short (2007: 61). In their representative work Style in Fiction: A Linguistic Introduction to English Fictional Prose, four categories of stylistic features are summarized: lexis of the text, grammatical categories, figures of speech, and context and cohesion. This framework offers a practical way to analyze a literary text from different perspectives and is an essential reference for the current research. The research questions focus on the lexical and rhetorical categories of literary style.
As for lexical style in fiction, the points of interest include the complexity of the words, their degree of formality, whether the vocabulary is descriptive or evaluative, general or specific, and whether the text contains idiomatic phrases or notable collocations. Lexical complexity relates to the depth and breadth of lexical knowledge possessed by speakers, writers, and readers (Meara 2005). The lexical analysis in this study concentrates on vocabulary data obtained through the type/token ratio, the mean word length, and word frequency. The corpus-based method is used to discuss four aspects: lexical density, word length, word frequency, and keywords.
The rhetoric of a novel refers to the techniques and strategies that a writer applies to establish a channel of communication with readers. It covers the various measures that aim to guide the reader's response and to persuade readers to accept the values of the characters and the main ideas of the novel. The devices include metaphor, simile, personification, repetition, ellipsis, and others. There are many metaphors in The Time Machine, and the dominant one is selected for discussion. Read against the era in which the novel was written, the analysis of this metaphor can reveal the author's deeper thinking.
To meet these research needs, semantic annotation and part-of-speech tagging are required. The texts are annotated with the free USAS English web tagger and the free CLAWS web tagger, respectively, and the tagset used for the latter is C7. The tools used to analyze the corpora are WordSmith 7.0 and AntConc. The construction of the reference corpus was described above.

Results and Discussion
General statistical information of the two corpora
In this research, WordSmith 7.0 is used to obtain the general statistical information of The Time Machine and the reference corpus. The word tokens, word types, TTR, standardized TTR, keywords, average word length, and average sentence length are listed below:

[Table 1: general statistical information of The Time Machine and the reference corpus]

Lexical level
Lexical density
A token is any running word in the text, and a type is a distinct word form. The type/token ratio (TTR) is used here to measure lexical density. Although TTR indicates the size of the vocabulary used in a particular text, it is strongly affected by the length of the text. A more reliable indicator, the standardized type/token ratio, is therefore used: the TTR is computed for each consecutive chunk of a fixed size, such as 1,000 words, and the results are averaged. Table 1 shows that the standardized TTRs of The Time Machine and the reference corpus are 45.32 and 45.64, respectively. This suggests that, although it is a 19th-century science fiction novel, The Time Machine has a vocabulary range almost the same as that of the contemporary novels in the reference corpus.
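The chunk-and-average calculation described above can be sketched in Python as follows. This is only an illustration: the tokenizer, the function names, and the choice to drop a trailing partial chunk are assumptions of the sketch, not a description of how WordSmith 7.0 is implemented, so the values it yields need not match Table 1.

```python
import re

def tokenize(text):
    """Lowercase the text and extract word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ttr(tokens):
    """Plain type/token ratio, expressed as a percentage."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    Assumption of this sketch: a trailing chunk shorter than chunk_size
    is ignored; texts shorter than one chunk fall back to the plain TTR.
    """
    chunks = [
        tokens[start:start + chunk_size]
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)
```

Averaging over equal-sized chunks keeps the measure comparable between a single novel and a 14-novel reference corpus, whereas the plain TTR falls as a text gets longer.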

Word length
According to Bailin and Grafstein (2016: 97-98), word length is a key indicator of the complexity of a text. Researchers calculate word length in different ways; in this paper, the average word length is the average number of letters per word in a given text. As shown in Table 1, the mean word length of The Time Machine is 4.41, and that of the reference corpus is 4.42. This means that the words used in the target corpus and the reference corpus are of similar length and that the average word is not long. Combined with the lexical density, we can conclude that The Time Machine has an ordinary level of vocabulary. Its vocabulary is comparable to that of award-winning contemporary science fiction, so the novel poses some reading threshold but remains relatively readable.
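As a minimal sketch of the definition used here (average number of letters per word), the calculation might look like this in Python. The tokenizer and the function name are assumptions made for illustration; WordSmith's own word boundaries may differ, so small discrepancies from Table 1 would be expected.

```python
import re

def mean_word_length(text):
    """Average number of letters per word token in the text.

    Assumption of this sketch: a word is a run of ASCII letters, optionally
    with an internal apostrophe; only letters are counted toward the length.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    letters = sum(sum(ch.isalpha() for ch in token) for token in tokens)
    return letters / len(tokens)
```

For instance, the four words "The Time Traveller smiled" contain 3, 4, 9, and 6 letters, giving a mean of 5.5.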

Frequency
The frequency of words can reveal the theme of a work. This study selects the 100 most frequent words in The Time Machine for specific analysis. Figure 1 shows the results obtained with AntConc. As can be seen from Figure 1, the most frequent words include "the," "of," "and," and "I." The results confirm that English text contains a large number of articles, prepositions, and conjunctions. It is worth noting that "I" ranks 4th, which may indicate that the novel is narrated from a first-person perspective. The male pronoun "he" ranks 12th and the female pronoun "her" ranks 22nd, which suggests that there is more than one character. The proper noun "andrew," ranked 50th, may be the name of the leading character.
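A ranked word list of this kind can be approximated with a few lines of Python. The sketch below is illustrative only: the tokenizer and the function name are assumptions, and the ranks it produces may differ from AntConc's because of differences in tokenization and case handling.

```python
import re
from collections import Counter

def frequency_list(text, top_n=100):
    """Return the top_n (word, count) pairs, most frequent first.

    Assumption of this sketch: words are lowercased runs of ASCII letters,
    optionally with an internal apostrophe.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(top_n)

def rank_of(word, freq_list):
    """1-based rank of a word in a frequency list, or None if it is absent."""
    for rank, (candidate, _count) in enumerate(freq_list, start=1):
        if candidate == word:
            return rank
    return None
```

Looking up the rank of pronouns such as "i," "he," or "her" in the resulting list is the kind of check that supports the observations about narrative perspective made above.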

Keywords
The keyword list is obtained with AntConc by comparing the wordlist of The Time Machine with the wordlist of the reference corpus. The top 20 keywords with the highest keyness are shown in Figure 2. The high keyness of "I," "my," and "me" indicates that the first-person perspective is of vital importance to the story and sets the tone of the work. By examining the contexts of the proper nouns "weena," "andrew," "morlocks," "filby," and "marie," we can tell that "weena," "andrew," "filby," and "marie" are personal names, while "morlocks" is the name of a group. This confirms that the interaction of characters makes up a large part of the novel. Place is also crucial to the progress of the story. Finally, the words "traveler," "machine," "time," and "presently" echo the title, which indicates that the story is based on time travel.
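The paper does not state which keyness statistic was selected in AntConc. As a hedged illustration, the sketch below uses the log-likelihood statistic, one commonly used keyness measure; the function names and the restriction to positive keywords are assumptions of the sketch rather than a record of the settings used in this study.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood keyness of one word, given its frequency in the target
    and reference corpora and the total token counts of both corpora."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score

def keywords(target_counts, ref_counts, top_n=20):
    """Rank words that are relatively more frequent in the target corpus.

    target_counts and ref_counts map words to raw frequencies; both
    corpora are assumed to be non-empty.
    """
    size_target = sum(target_counts.values())
    size_ref = sum(ref_counts.values())
    scored = []
    for word, freq in target_counts.items():
        ref_freq = ref_counts.get(word, 0)
        # Keep positive keywords only (an assumption of this sketch).
        if freq / size_target > ref_freq / size_ref:
            scored.append((word, log_likelihood(freq, ref_freq, size_target, size_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

A word whose relative frequency is the same in both corpora scores zero, so function words shared by all novels drop out, while names and topic words specific to the target text rise to the top of the list.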
To identify the features of the novel further, this study removes the proper nouns and function words and classifies the remaining keywords into semantic fields on the basis of the semantic tagging described earlier. The keywords involve many semantic fields, among which words about the world and the environment account for a large proportion. To a large extent this reflects the author's ideas on ecological ethics. Considering that in the late 19th century Britain's industrial civilization had reached an advanced level, we can infer that the author is criticizing the destruction of the natural environment and the ecosystem that science and technology had brought about in modern society.

Rhetorical level
In this novel, the Eloi and the Morlocks carry strong symbolism. They can be read as metaphors for the bourgeoisie and the proletariat in Britain at that time. By extracting the descriptions of "the Morlocks" from their contexts in the novel, we can see that the Morlocks, creatures living underground, produce goods for the Eloi, who live above ground. The Morlocks are engaged in machine production, but they do not form any management or social organization. They thus correspond to the proletariat. The Eloi, on the other hand, depend on the Morlocks for all their necessities, and the Morlocks feed on the fattened Eloi. In other words, the Eloi degenerate into food for the Morlocks, while the Morlocks devolve into beast-like predators.
This shows Wells's concern over the increasingly obvious stratification of society and the intensification of class contradictions.

Major findings
The major findings of the quantitative and qualitative stylistic analyses of The Time Machine from the perspective of lexis and rhetoric are summarized as follows.
1) At the lexical level, the analysis of the lexical density and word length of the two corpora shows that The Time Machine is similar to 21st-century science fiction. Its vocabulary is relatively simple, and the text is readable. The word frequencies reveal the first-person narrative perspective adopted by the author. As regards the keywords, the main semantic field is the world and environment, which reflects the author's ideas on ecological ethics.
2) At the rhetorical level, the use of metaphor is discussed by examining the contexts in which "Morlocks" occurs. The Eloi and the Morlocks serve as metaphors for the bourgeoisie and the proletariat in Britain at that time, which shows Wells's concern about the intensification of class contradictions between them.

Limitations and implications
In terms of implications, this study shows the applicability of Leech and Short's theory of stylistics to science fiction and enriches both the empirical study of corpus stylistics and the research on The Time Machine. The stylistic features at different levels help us better understand the writing style of Wells and the theme of the work.
As for the limitations, first, the syntactic level is not covered here although it is worth studying; sentence-level features of the novel could also be extracted and analyzed to enrich the research. Second, other rhetorical devices could be studied in more depth.

Conflicts of Interest:
The authors declare no conflict of interest.