Sample thesis: A Corpus-Based Study of Lexical Bundles in the Research Articles From Disciplines of Finance and Accounting

Source: unknown. Author: paper. Published: 2022-07-07 09:39
Region: China. Language: Chinese. Type: accounting paper
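Lexical bundles are multi-word combinations whose study has attracted wide
attention from scholars in China and abroad. As an important component of
academic writing, lexical bundles are used with high frequency in academic
discourse. In recent years, a growing number of studies have adopted the lexical
bundle approach, mainly addressing pedagogy, genre, and disciplinary variation.
In terms of genre, however, only a few previous studies have looked at
business-type texts. In addition, few studies have examined bundle differences
between closely related disciplines.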
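Accordingly, this thesis takes a corpus-based approach to study the lexical
bundles in research articles from business-type disciplines, namely finance and
accounting.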
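Three corpora were built for this study: two business-type corpora (the finance
English corpus, CFE, and the accounting English corpus, CAE) and a reference
corpus (the applied linguistics English corpus, CAL). The three corpora total
about 1.8 million words. The valid bundles are analyzed with the structural
framework of Biber et al. (1999) and the functional framework of Hyland (2008),
and additional bundle categories suited to this study are proposed along the way.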
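This study has four main findings. First, lexical bundles occur with high
frequency in the finance and accounting corpora, which reflects their importance
in business-type academic writing. Second, bundle use in the two corpora is
similar in several respects. Structurally, phrasal bundles (e.g. in the case of)
predominate, while clausal bundles (e.g. this table reports the) are also fairly
common. Functionally, research-oriented bundles predominate, most of them dealing
with statistical and quantification topics. Third, each corpus also has its own
characteristics, which reflect the features of business-type genres. For example,
PP-based bundles account for a higher share in the finance corpus, whereas the
accounting corpus contains more VP-based bundles. In addition, the former uses a
wider variety of bundles to guide readers, while the latter uses a variety of
bundles to describe research results. Fourth, taken as a whole, bundle use in
business-type research articles is close to that of the natural science
disciplines.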
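These findings help in understanding the general picture of lexical bundles in
research articles and enrich what is known about the link between bundle use and
disciplinary variation in business-type genres. The study also has pedagogical
implications and offers a reference for the lexical bundle approach from a
disciplinary perspective.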
Keywords: lexical bundles; corpus-based; research articles; finance; accounting
Contents
Chapter One Introduction
1.1 Research background
1.2 Research purpose and significance
1.3 Structure of the thesis
Chapter Two Literature review
2.1 Previous studies on business-type texts
2.2 The lexical bundle approach adopted in previous genre-based studies
2.2.1 Comparative and non-comparative studies on lexical bundles
2.2.2 The intradisciplinary studies of lexical bundles in business-type texts
Chapter Three Theoretical framework
3.1 Definitions of lexical bundles
3.2 Characteristics of lexical bundles
3.3 Classification of lexical bundles
3.3.1 Structural classification of lexical bundles
3.3.2 Functional classification of lexical bundles
3.4 Lexical bundle approach to genre analysis
Chapter Four Methodology
4.1 Research questions
4.2 Data for the current study
4.2.1 The two ‘business-type’ corpora
4.2.2 The reference corpus
4.3 Research procedures
4.3.1 Data collection
4.3.2 Data cleaning
4.3.3 Bundle identification
4.4 Corpus tool
Chapter Five Results
5.1 General distribution of bundles in the ‘business-type’ corpora
5.2 Shared features in the ‘business-type’ corpora
5.2.1 Ratio of bundles
5.2.2 Structural features in the ‘business-type’ corpora
5.2.2.1 NP-based bundles
5.2.2.2 PP-based bundles
5.2.2.3 VP-based bundles
5.2.3 Functional features in the ‘business-type’ corpora
5.2.3.1 Research-oriented bundles
5.2.3.2 Text-oriented bundles
5.2.3.3 Participant-oriented bundles
5.3 Unique features in the ‘business-type’ corpora
5.3.1 Unique structural features
5.3.1.1 PP-based bundles
5.3.1.2 VP-based bundles
5.3.2 Unique functional features
Chapter Six Discussions
6.1 Relationship between the structural and functional taxonomy
6.2 Relationship between lexical bundles and disciplinary variations
Chapter Seven Conclusions
7.1 Major findings
7.2 Implications
7.3 Limitations of the current study and suggestions for future research
References
Appendix
Acknowledgements
List of research output during the period of study
Chapter One Introduction
This chapter presents the research background, the research purpose and
significance, and the structure of this thesis.
1.1 Research background
Lexical bundles, that is, frequently recurring sequences of words, are an
indispensable element of phraseology studies. Sinclair (2008) emphasized the
importance of phrases by noting that meaningful units are formed and shaped by
the persistent recurrence and co-selection of words. Biber, Conrad and Cortes
(2004) suggested that lexical bundles should be regarded as a basic linguistic
construct with important functions for discourse construction. Similarly, as
noted by Hyland (2012), this type of multi-word sequence is considered familiar
to users of a language and has customary pragmatic or discoursal functions, while
Cortes (2004) argued that its frequent recurrence signals fluent linguistic
production and competent language use, which makes it a significant linguistic
unit.
As a result, research on lexical bundles has attracted increasing attention from
scholars in China and abroad. Lexical bundles are studied with one major method,
the corpus approach. It is currently regarded as a scientific approach to
linguistic study because it stresses obtaining evidence from naturally occurring
language. This differs from traditional theoretical linguistic studies, which
concentrate mainly on the structures of the language itself. The corpus approach
is also applied in phraseology studies, in which word combinations of this type
can be viewed as wholes and treated as single units for statistical purposes
(Sinclair, 2008).
The applications of lexical bundles have been highlighted from diverse
perspectives, such as research in the areas of English for Specific Purposes
(ESP) and English for Academic Purposes (EAP). Among them, how bundles are
embedded in academic discourse with discipline-oriented characteristics has
become a major focus. Lexical bundles are reported to be central to the creation
of academic discourse, and they offer an important means of differentiating
written texts by discipline (Hyland, 2008). Wang and Wang (2015) also argued that
lexical bundles are expressions grounded in academic writers’ discipline-specific
methodological understanding, and that these expressions gradually become
disciplinary practices.
Several previous studies have explored disciplinary features across academic
texts through bundle analysis and revealed that bundle usage demonstrates
discipline-specific discourse functions to some extent. However, only a few
existing studies concentrate on the lexical bundles in research articles from
business-type disciplines. Moreover, few studies have investigated the
disciplinary variations of lexical bundles across closely related disciplines.
1.2 Research purpose and significance
As mentioned above, it is important to examine how far lexical bundles differ by
discipline. Hence, the present study investigates research articles from the
disciplines of finance and accounting through lexical bundles, using a
corpus-based approach.
The major purpose is to explore the genre features of research articles, as
revealed by bundle analysis, with regard to disciplinary variation. More
specifically, this study aims first to present how lexical bundles are used in
the two business-type disciplines, and second to show how these bundle features
reveal the discoursal characteristics of the disciplines.
By comparing the 4-word bundles in the self-constructed corpora, based on the
extraction and categorization criteria from Biber et al. (1999) and Hyland
(2008), the current study will demonstrate the grammatical structures and
pragmatic functions of lexical bundles from the two business-type disciplines,
together with the relation between lexical bundles and disciplinary variations in
academic texts.
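To make the idea of extracting 4-word bundles concrete, the following is a
minimal sketch in Python rather than the procedure used in this thesis. The
tokenizer, the function name four_word_bundles, and both cut-offs
(min_per_million and min_texts) are illustrative assumptions; the thesis's own
identification criteria belong to its methodology chapter and are not reproduced
here.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; a deliberately simple stand-in for a corpus tool's tokenizer.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def four_word_bundles(texts, min_per_million=20.0, min_texts=2):
    """Return {bundle: (raw_count, per_million, n_texts)} for 4-word sequences.

    Both cut-offs are placeholder assumptions, not values taken from the thesis.
    """
    counts = Counter()          # raw frequency of each 4-word sequence
    spread = defaultdict(set)   # indices of the texts each sequence occurs in
    total_words = 0
    for idx, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        for i in range(len(tokens) - 3):
            gram = " ".join(tokens[i:i + 4])
            counts[gram] += 1
            spread[gram].add(idx)

    result = {}
    for gram, raw in counts.items():
        # Normalize to occurrences per million words so corpora of different sizes compare.
        per_million = raw / total_words * 1_000_000
        if per_million >= min_per_million and len(spread[gram]) >= min_texts:
            result[gram] = (raw, per_million, len(spread[gram]))
    return result
```

Calling four_word_bundles on the list of article texts in one corpus returns the
sequences that pass both filters. The dispersion filter (min_texts) is included
because a sequence repeated by a single author is usually not treated as a
bundle of the discipline.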
 The significance of the present study lies in three aspects.
 First, previous research on lexical bundles is mostly concerned with
disciplines that clearly stem from different areas, such as the comparison of
bundles in history and biology in Cortes (2004). It is therefore worth discussing
how lexical bundles are associated with disciplinary features in research
articles, especially in two closely related disciplines.
 Second, within the broad category of business-type texts, several previous
studies focus on research articles from the disciplines of business and
economics, but the core research focus and the choice of lexical items in these
areas vary to some extent. Moreover, studies involving language data from the
disciplines of finance and accounting appear to be few.
 Third, the findings might strengthen the understanding of the structures and
functions of lexical bundles in academic prose, and also enrich lexical bundle
studies in discipline-specific contexts, particularly regarding research articles
in finance and accounting. Moreover, the findings may serve as a reference for
novice authors in academic writing, who can use the bundle information as
guidance on cohesion and coherence when building texts in academic genres.
Hopefully, the exploration of academic genre through bundle analysis can
contribute to pedagogical guidance in EAP and ESP. It may help EAP and ESP
researchers and practitioners appreciate the prominent role of lexical bundles,
which are viewed as one of the key criteria for language choices in
discipline-specific contexts for certain genres.
1.3 Structure of the thesis
This thesis consists of seven parts, each building on the previous one.
The first part introduces the research background, purpose and significance.
The second part will discuss the existing studies on business-type texts and
the lexical bundle approach adopted in previous genre-based studies.
The third part will present the theoretical framework, that is, the lexical
bundle approach to genre analysis. The definitions, characteristics, and
classifications of lexical bundles will be given.
The fourth part will describe the method adopted in this study, namely the
research questions, the construction of corpora, and the descriptions of research
procedures, such as data collection, data cleaning, criteria of bundle identification,
and the corpus tool.
The fifth part will present the results from aspects such as general distribution,
shared features, and unique features of lexical bundles across the corpora.
The sixth part, based on the reviews and findings in the previous sections,
will discuss the connections between the structural and functional taxonomies
in bundle classification, along with the relationship between lexical bundles
and disciplinary variations.
The last part will provide the overall conclusions, including the major
findings, implications and limitations of the current study, together with the
suggestions for future research.
Chapter Two Literature review
As mentioned in Chapter One, the current study focuses on business-type texts
and adopts the lexical bundle approach to genre analysis. This chapter therefore
reviews, first, the existing research on business-type texts and, second,
previous genre-based studies that take a lexical bundle approach.
2.1 Previous studies on business-type texts
Business-type texts (also referred to as commerce-type texts) cover a broad area
related to any type of business communication, including texts from the
disciplines of finance and accounting. This categorization is recognized by many
previous studies. For example, Coxhead (2000) constructed a corpus of
commerce-related language data drawn from disciplines including finance,
accounting, economics, management, and marketing.
As shown in Table 2.1, research on business-type texts addresses various topics
in business situations with different focuses, such as comparisons between
native and non-native English speakers, genre analysis, and lexical studies.
Table 2.1 The research on business-type texts
Author                Year  Topic
Cho & Yoon            2013  corporate earnings call
Zhu                   2013  business faxes
Piqué-Noguera         2012  business research article abstracts
Nathan                2013  business case reports
Ma & Ren              2012  academic article introductions
Rutherford            2013  financial reporting
Li                    2008  public annual report
Tsai, Wang & Chien    2016  financial keywords
Gardner & Davies      2014  list of core academic vocabulary
 One group of researchers focused on business communication between non-native
and native English speakers. For example, Cho and Yoon (2013) investigated
business English fluency by comparing the use of corporate earnings calls by
Korean and native speakers. Because the findings implied that the Korean
participants’ low genre awareness led to misunderstanding and confusion during
communication, the study highlighted the importance of awareness of professional
genres in business English. Zhu (2013) conducted a cross-cultural analysis of
English and Chinese business faxes, concluding that the two sides differed in
emphasis in many areas, such as the persuasive orientations used in building
business relationships. These two studies suggest that differences in cultural
background may affect communication in business environments.
 Another group of scholars conducted genre analyses of business-type texts.
For instance, Piqué-Noguera (2012) studied the structure and content of business
research article abstracts. The results showed an inconsistent writing pattern
among the collected abstracts, and the study stressed the importance of
presenting abstracts in a more informative way. Nathan (2013) investigated
business case reports and identified several common features of this type of
text in terms of structure, style, and lexis. The rhetorical moves were found to
be tied to business specialism, indicating the value of specialism-oriented
pedagogy. Ma and Ren (2012) conducted a genre analysis of the introduction
sections of academic articles in the disciplines of finance and accounting,
concluding that the core moves were similar, with variations in the use of lexis
and tense. Rutherford (2013) approached the topic from a theoretical
perspective, treating financial reporting as a genre and discussing how genre
theory applies to this type of text. These previous genre analyses mainly
focused on the overall moves of the target texts. Research on a specific genre
from a micro perspective, such as at the level of the “word”, has also been
noted.
Other researchers have focused on specialized vocabulary in business-type texts.
Regarding financial reports, Li (2008) explored the readability of public annual
reports with reference to the Management Discussion and Analysis section,
examining its complex words with statistical tools. Tsai, Wang, and Chien (2016)
proposed an approach for discovering financial keywords in a large number of
financial reports. From an academic perspective, Gardner and Davies (2014)
compiled a list of core academic vocabulary, whose corpus data included a
finance section.
The above studies provide evidence of the specialized nature of business-type
texts. It is therefore important to examine the characteristics of these texts
in particular communicative environments.
2.2 The lexical bundle approach adopted in previous genre-based studies
2.2.1 Comparative and non-comparative studies on lexical bundles
Previous research on lexical bundles includes both comparative and
non-comparative studies. Most researchers focus on comparative studies, while
non-comparative studies make up only a small proportion.
Non-comparative studies concentrate on how bundles are used in a single
setting. In pedagogical practice, for example, Herbel-Eisenmann and Cortes (2010)
elaborated on how stance bundles were particular to the context of mathematics
classrooms, noting that teacher stance was significant to the experience of both
teachers and students. Csomay (2013) documented bundle functions in the opening
and instructional phases of class sessions, showing that the bundle functions
shifted in line with the discourse structure.
As for research articles in particular disciplines, Zheng (2014) identified the
most frequently used lexical bundles in English research articles of applied
linguistics, and summarized the typical features of bundle usage by Chinese
authors. With a similar method, the structures and functions of lexical bundles
were analyzed in research articles from the disciplines of mechanical
engineering (Pan, 2016) and telecommunication (Pan, Reppen, and Biber, 2016).
Existing comparative studies approach bundle usage mostly from three
perspectives (Table 2.2): comparisons between different groups of speakers,
between genres, and between disciplines.
Table 2.2 Perspective of contrastive studies on lexical bundles
Perspective  Author                   Year  Focus
Speaker      Adel & Ermann            2012  L1 Swedish and native English speakers
             Paquot                   2013  L1 French and native English speakers
             Pérez-Llantada           2014  L1 Spanish and native English speakers
             Esfandiari & Barbary     2017  L1 Persian and native English speakers
             Jhang, Kim & Qi          2018  L1 Japanese and native English speakers
             Chen & Baker             2010  L1 Chinese and native English speakers
             Pan, Reppen & Biber      2016  L1 Chinese and native English speakers
             Bychkovska & Lee         2017  L1 Chinese and native English speakers
Genre        Breeze                   2013  four legal genres
             Wright                   2019  stand-alone literature review and literature review in research paper
             Biber, Conrad & Cortes   2004  university teaching and textbooks
             Biber                    2015  conversation and academic writing
Discipline   Hyland                   2008  electrical engineering, business studies, applied linguistics, and biology
             Wang & Wang              2015  linguistics, energy and electrical engineering
             Gao                      2017  physics, computer science, management, and linguistics
             Hong & Hua               2018  international business management, and academic formulas list
             Hyland & Jiang           2018  diachronic study on research articles from four disciplines
             Damchevska               2019  economics, applied linguistics and business studies, biology and electrical engineering
Note: “L1” refers to the first language.
 Firstly, comparisons of bundle usage between speakers of different proficiency
levels have attracted great attention from researchers, especially the bundles
produced by native and non-native English speakers. In these studies, the first
languages of the non-native English speakers ranged from Swedish (Adel & Ermann,
2012), French (Paquot, 2013), Spanish (Pérez-Llantada, 2014), and Persian
(Esfandiari & Barbary, 2017) to Japanese (Jhang, Kim & Qi, 2018). Bundles in
English writing by Chinese speakers have also been explored. The findings mainly
stressed students’ overuse, underuse, and misuse of lexical bundles,
highlighting the role of formulaic language in pragmatic competence. For
example, Chen and Baker (2010) compared the lexical bundles in native expert
writing, native peer writing, and learner writing by Chinese speakers. The
results showed that expert writing exhibited the widest range of bundles, while
overuse and underuse of bundles were found in student writing. Pan, Reppen and
Biber (2016) examined professional writing by native English speakers and by
Chinese speakers in English-medium telecommunications journals. The findings
revealed major structural differences between native and non-native writers: the
former preferred phrasal bundles and the latter clausal bundles. The results
also suggested functional differences in the writers’ choices of bundles.
Bychkovska and Lee (2017) studied the use of lexical bundles in English
argumentative essays by native English and non-native Chinese undergraduate
students, and identified the most common bundle misuses in student writing. It
was found that native English writers employed a larger quantity of bundles.
 Secondly, some studies focused on bundle usage across genres. Breeze (2013)
discussed lexical bundles across four legal genres, namely academic law, case
law, legislation, and documents. The findings showed that the bundle types could
differ considerably, while content phrases accounted for the largest share.
Wright (2019) studied stand-alone literature reviews in terms of the frequency,
dispersion, and discourse functions of lexical bundles, and then compared them
with the literature review sections of academic writing. The results implied
that a core set of bundles for written academic prose might exist. Previous
studies also compared written and spoken lexical bundles, including Biber,
Conrad and Cortes (2004) and Biber (2015). The former concentrated on lexical
bundles in university teaching and textbooks, and the latter on the most common
multi-word patterns and the differing pattern types in conversation and academic
writing.
 Thirdly, another group of studies concentrated on discipline-specific bundles.
Hyland (2008) compared the lexical bundles in electrical engineering, business
studies, applied linguistics, and biology, concluding that bundles offer a way
to distinguish discourses by discipline. Wang and Wang (2015) compared the choice
of bundles in linguistics with that in energy and electrical engineering. They
explained that bundles could reconcile grammar, meaning, and context, reflecting
the authors’ considerations of specific themes and purposes. Gao (2017) explored
the bundles in physics, computer science, management, and linguistics,
commenting that core lexical bundles existed across disciplines and that
disciplinary variation had a profound effect on bundle usage. Hyland and Jiang
(2018) conducted a diachronic study of lexical bundles in research articles from
four disciplines (applied linguistics, sociology, electrical engineering and
biology) over a time span of fifty years, proposing that lexical bundles were
not invariant but changed in accordance with varying contexts. The conclusions
provided insights into how bundles and their functions changed over time in hard
and soft science disciplines. Hong and Hua (2018) explored journal articles in
international business management, and the results revealed that some lexical
bundles were relatively specific to this field compared with the previously
identified academic formulas list (Simpson-Vlach & Ellis, 2010). Damchevska
(2019) looked into journal articles in economics, and the results suggested that
the target bundles were employed in ways similar to those in applied linguistics
and business studies but were structurally different from those in biology and
electrical engineering.
These studies indicate a close relationship between lexical bundles and
disciplinary variation, which strengthens the notion that lexical bundles can
reflect disciplinary characteristics to some extent.
2.2.2 The intradisciplinary studies of lexical bundles in business-type texts
To explain the intradisciplinary studies of lexical bundles, it is necessary to
define discipline first. The term discipline was depicted by Hyland (2015) as a
notion with remarkable persistence in using language to engage with others in
certain recognized and familiar ways. The relationship between different
disciplines can thus be further described as either interdisciplinary or
intradisciplinary.
The interdisciplinary studies refer to the comparisons between the disciplines
that are clearly from different areas, such as history and biology. The reviews in
section 2.2.1 concerning the lexical bundles with disciplinary variations can be
more specifically regarded as interdisciplinary studies.
Intradisciplinary studies, on the other hand, are concerned with disciplines
that have a close affinity, such as finance and accounting. The two disciplines
chosen in the present study are similar in some respects. Within the broad
business-type disciplines, the study of finance tends to involve exploring
economics from a macro perspective, while research on accounting involves the
study of management from a micro perspective. Nevertheless, studies in finance
and accounting share features in areas such as research on financial management
and economic activities, along with the application of arithmetic and
statistical tools.
The disciplines explored in previous research differ greatly from each other
and show distinct bundle usage in both structure and function, which is not
surprising considering the interdisciplinary nature of their research paradigms,
subjects, and methods. The characteristics of these components in research
articles can be reflected through language use to a certain extent, of which
bundle usage is an important element. However, it is worth discussing whether
the results will remain the same when two closely related disciplines with
shared properties are considered.
Previous research on lexical bundles in the disciplines of finance and
accounting is relatively rare. The few studies that exist refer to related
disciplines under subject labels such as “business”, “commerce”, and
“economics”.
Table 2.3 The research on lexical bundles in business-type texts
Author          Year  Focus
Wood & Appel    2014  business textbooks
Ruan            2016  corporate annual reports and CEO’s letters
Hong & Hua      2018  International Business Management journals
Hyland          2008  academic articles in business studies
Sun             2015  academic writings in economics
The research on lexical bundles in business-type texts is listed in Table 2.3.
For example, although Wood and Appel (2014) studied the lexical bundles in
business textbooks, their findings stressed only EAP instruction, without
touching upon the connection between bundles and business-type texts. Ruan
(2016) focused on the phraseological features of one type of multi-word nominal
group, with the word “business” as the head noun, in business English texts from
corporate annual reports and CEO’s letters. Although the findings suggested
distinctive features of informational writing in written business discourse,
they did not explain the general characteristics of multi-word constructions in
business disciplines.
Moreover, concerning the language data in business-type journals, Hong and Hua
(2018) revealed that several lexical bundles in International Business
Management journals were relatively specific to this field. In addition, Hyland
(2008) included research articles from business studies as part of the language
data in exploring bundle usage. These two studies indicate that academic
phraseological sequences are constructed in accordance with specific academic
needs and purposes. Furthermore, among the small number of studies touching upon
bundles in the discipline of economics, Sun (2015) constructed a corpus
involving language data from economics. However, its focus lay in comparing
bundles in native and non-native English writing, without conducting
discipline-specific analysis.
Against the background of dividing disciplines, from a macro perspective, into
soft sciences and hard sciences, Durrant (2017) argued that “commerce” belongs
to the intermediate disciplines. Therefore, the two business-type disciplines of
finance and accounting should also be regarded as intermediate ones. It is thus
worth discussing how lexical bundles could demonstrate the characteristics of
the intermediate disciplines (specifically the two business-type disciplines)
and their relations with the soft and hard science disciplines.
Chapter Three Theoretical framework
This chapter will present the theoretical framework, that is, the lexical bundle
approach to genre analysis. Further, the definitions, characteristics, and
classifications of lexical bundles will also be given.
3.1 Definitions of lexical bundles
Following Firth’s (1957) remark that one shall know a word by the company it
keeps, the research focus on the single word was extended to include the
surrounding lexis. With the arguments of phraseologists such as Sinclair (2008)
that the phrase is central and pivotal in linking structures and meanings,
research on phraseology has gradually attracted attention in linguistic studies,
where the phrase rather than the word is considered the normal carrier of
meaning. Lexical bundles differ from other similar phraseological units, such as
idioms and collocations. These multi-word expressions can be distinguished
according to their invariability and idiomaticity (Biber et al., 1999). The two
extremes of multi-word expressions are shown in Figure 3.1.
Figure 3.1 The extremes of multi-word expressions
In short, lexical bundles could be viewed as extended collocations, that is, longer
sequences that show a statistical tendency to co-occur (Biber et al., 1999: 989). By
comparison, the idiom is the most invariable type due to its structural and semantic
invariability, and its meaning as a whole is usually unpredictable
from the meanings of its parts. Esfandiari and Barbary (2017) described idioms as
structurally complete units that can be replaced by single words with similar
meanings. Idiomatic expressions include phrasal verbs (e.g. put off) and longer
expressions (e.g. kick the bucket), which can be replaced by verbs such as
postpone and die.
At the other extreme, there is no agreed definition of
collocation, which is regarded as one of the most controversial notions in
linguistics. Collocations have been defined quite broadly, and lexical
bundles are sometimes included in the broad category of collocation, so the
distinction between these two concepts is not always unequivocal (Crossley &
Salsbury, 2011).
However, slight distinctions do exist. Collocations refer to specific types
of co-occurrence that are less rigid than idioms and follow syntactic patterns
(e.g. “adjective + noun collocations”, such as strong tea) (ibid). In addition, as
defined by Biber et al. (1999: 988), collocation indicates an association between
lexical words which co-occur more frequently than expected by chance. The
individual words retain their own meanings, although the extended meaning of the
collocation contributes to the reason for co-occurrence. This association was
clarified by Esfandiari and Barbary (2017) as a relation between two lexical words.
Besides, Evert (2008: 1212) noted that the notion of collocation is based on a
compelling and widely shared intuition that certain words have a tendency to occur
near each other in natural language. Taken together, these definitions share the
notion of “co-occurrence”, and the components of a collocation are variable and
usually not idiomatic. Moreover, the distinction between collocations and lexical
bundles also lies in the length of the sequence. Given the criterion of
co-occurrence, two words are able to form a collocation, while lexical bundles can
only be formed with three or more words.
Under a corpus approach, a major difference between idioms,
collocations, and lexical bundles lies in their frequency of occurrence: lexical
bundles are much more commonly used than the other two, and a sequence must recur
frequently to be considered a lexical bundle.
Lexical bundles, as a type of multi-word combination, have been studied
under different terms (Table 3.1), including “formulaic chunks” by Widdowson
(1989), “multi-word item” by Moon (1998), “formulaic sequences” by Wray
(2000), and “clusters” by Scott and Tribble (2006), while the term “lexical bundles” is
used by Biber et al. (1999), Cortes (2004), Hyland (2008) and others.
Table 3.1 Terminology of lexical bundles in previous studies
author           year  term
Widdowson        1989  formulaic chunks
Moon             1998  multi-word item
Biber et al.     1999  lexical bundles
Wray             2000  formulaic language/formulaic sequences
Cortes           2004  lexical bundles
Scott & Tribble  2006  clusters
Hyland           2008  lexical bundles
In fact, there are also several other definitions of such word
combinations, each with a distinctive focus. Nattinger and DeCarrico (1992)
suggested that multi-word lexical phenomena exist somewhere between the
traditional poles of lexicon and syntax; they occur more frequently and have
more idiomatically determined meaning than language that is put together each
time. Moon (1998) noted that phrasal lexemes stand for the whole range of fixed
and semi-fixed complex items which are classified and treated as “phrases” or
“idioms”. These sorts of items are regarded as holistic units rather than
compositional strings, with regard to semantics, lexico-grammar, and pragmatics. Wray
(2002) commented that formulaic sequence refers to a continuous or discontinuous
sequence of words or other elements, which is or appears to be prefabricated. It is
stored and retrieved whole from memory at the time of use. Moreover, lexical
bundles have been defined by Biber et al. (1999) as the recurring sequences of
three or more words, or simply sequences of word forms that commonly go
together in natural discourse. They are also considered as the recurrent discourse
building blocks with the surrounding slots containing specific information in
individual situations.
3.2 Characteristics of lexical bundles
This study adopts the term “lexical bundles” because this term has been
widely used in previous corpus-based studies. Further, it is important to note that
lexical bundles are associated with several characteristics, which are summarized
in Table 3.2.
Table 3.2 The characteristics of lexical bundles
characteristic                     interpretation
multi-word unit                    combination of words
frequency-driven approach          a large number of occurrences
semi-idiomatic expression          partially equivalent to idiomatic expression
context-dependent/independent      stand alone or be meaningful with context
holistically saved and extracted   be stored and retrieved as a whole
language fluency criterion         a signal of competent language use
First, it is a multi-word unit, that is, a combination of words that
is relatively fixed while some of its component words can be substituted. For
example, this table reports the, this table shows the, and this table presents the
are all meaningful 4-word bundles, despite the slight change in the choice of
verb.
Second, it follows a frequency-driven approach. The identification of lexical
bundles initially relies on their occurrences. With a frequency-driven starting point,
word combinations with few occurrences may not be regarded as significant
bundle candidates, since the aim is to capture generally used bundles rather than
idiolect or other minority styles of wording. These minority styles are certainly
worth studying, but not under a corpus-based paradigm with large-scale data at its core.
Usually, a conservative cut-off frequency is set at 20 times per million words, and
researchers may adjust this setting according to their research purpose.
Third, it can be viewed as a semi-idiomatic expression. A lexical bundle may,
but need not, correspond to a phrase or idiomatic expression. For
example, the 3-word bundle as well as may be taken as a phrase, but the 4-word
bundle as well as the may not. Idioms can only be
regarded as lexical bundles when their occurrence reaches the cut-off frequency,
which appears to be rare.
Fourth, it can be context-dependent or context-independent. Lexical
bundles can either stand alone or become meaningful only in context. Since bundles are
often fragmented units, some of them are structurally complete, while others rely on
a larger sequence. For example, it is easy to interpret the meaning of the
bundle on the other hand, which signals a transition to a contrasting point, but
harder to interpret bundles such as takes the value of or to one if the. The word
“value” has more than one meaning, so ambiguities may arise
without looking at a larger stretch of text. The same is true for “to one
if the”, which seems meaningless and structurally incomplete on its own. The context of
such bundles is therefore checked, since they meet the identification criteria
in the first place.
Fifth, it is stored and retrieved holistically. Lexical bundles are
usually stored and retrieved as a whole from one’s memory. As noted by Wray and
Perkins (2000), this notion holds for academic writing when advanced
writers follow a set of expressions in which a group of lexical bundles is
involved.
Sixth, it is regarded as a criterion of language fluency. For language learners,
this so-called multi-word sequence is as important as collocation, which forms a
criterion of language fluency. The production of lexical bundles shapes the
successful language learning process, especially for second language learning,
which relies on language instruction, compared with first language
acquisition, which depends more on the environment. Cortes (2004) even regarded
the frequent use of lexical bundles as a signal of competent language use within a
text type. Writers in specific areas also need such language instruction, either
consciously or unconsciously, so that they follow the established or popular
format and their stylistic features match the genre. It is also
important for speakers of a certain community to know what to say within a
range of lexical choices on a particular occasion or in a particular region, so as
to be mutually intelligible without sounding weird or awkward.
3.3 Classification of lexical bundles
Lexical bundles can be classified from two aspects,
structural and functional. While the structures relate to the form of the language
itself, the functions are associated with how language is meaningful in real
conditions.
3.3.1 Structural classification of lexical bundles
The structural analysis in the current study mainly follows the framework
of Biber et al. (1999) because it offers a comprehensive categorization according to
the structural correlates of bundles. It has been applied in many previous studies
(e.g. Csomay, 2013; Perez-Llantada, 2014; Chen & Baker, 2016; Wang, 2017;
Jhang, Kim & Qi, 2018) and is dependable as a well-rounded frame and a point of
departure for classifying bundle structures.
In addition, this framework was specifically designed for bundles in
academic prose. Since the current study focuses on academic articles, the
major bundle features are expected to match this framework. Chen and
Baker (2010) provided a supplementary scheme by grouping the subcategories into
3 broader categories: Noun Phrase-based, Prepositional Phrase-based, and Verb
Phrase-based bundles (hereinafter referred to as NP-based, PP-based, and
VP-based bundles). In the framework of Biber et al. (1999), a total of 12 detailed
categories are distinguished, as shown in Table 3.3.
Table 3.3 Structural category of lexical bundles in academic prose
(based on Biber et al., 1999: 1014-1024)
structural sub-category                                 example
noun phrase with of-phrase fragment                     the end of the
noun phrase with other post-modifier fragment           the fact that the
prepositional phrase with embedded of-phrase fragment   as a matter of
other prepositional phrase fragment                     at the same time
anticipatory it + verb phrase/adjective phrase          it should be noted
passive verb + prepositional phrase fragment            is referred to as
copula be + noun phrase/adjective phrase                is part of the
(verb phrase +) that-clause fragment                    should be noted that
(verb/adjective +) to-clause fragment                   is likely to be
adverbial clause fragment                               as we shall see
pronoun/noun phrase + be (+ ...)                        there has been a
other expressions                                       as well as the
3.3.2 Functional classification of lexical bundles
As to functional classification, two frameworks, from Biber et al.
(2004) and Hyland (2008), have mainly been employed in previous research. Biber et al.
(2004) presented an overall framework (Table 3.4), which was applied by
scholars such as Adel and Erman (2012), Perez-Llantada (2014), Wood and Appel
(2014), Chen and Baker (2016), Wang (2017), Jhang, Kim, and Qi (2018), and
Wright (2019). However, Hyland (2008) commented that Biber’s taxonomy
emerged from a broader corpus of spoken and written texts, covering a wide range
of texts apart from academic writing. He therefore developed a new taxonomy
based on Biber et al. (2004), narrowing the framework down to
research-focused genres (Table 3.5). Therefore, Hyland’s framework was chosen
as the basis for functional analysis in this paper.
Table 3.4 Functional category of lexical bundles in general (based on Biber et al., 2004: 384-388)
category                       sub-category                              example
stance bundles                 epistemic stance bundles                  I don’t think so
                               attitudinal/modality stance bundles       I would like to
discourse organizing bundles   topic introduction/focus bundles          what I want to
                               topic elaboration/clarification bundles   on the other hand
referential bundles            identification/focus bundles              one of the things
                               imprecision bundles                       or something like that
                               bundles specifying attributes             a little bit of
                               time/place/text-deixis bundles            the end of the
Table 3.5 Functional category of lexical bundles in research-focused genres
(based on Hyland, 2008: 13-14)
category               sub-category          example
research-oriented      location              at the beginning of
                       procedure             the purpose of the
                       quantification        one of the most
                       description           the size of the
                       topic                 the currency board system
text-oriented          transition signals    in contrast to the
                       resultative signals   it was found that
                       structuring signals   as shown in figure
                       framing signals       in the presence of
participant-oriented   stance features       may be due to
                       engagement features   as can be seen
In total, 11 subcategories are presented. The three categories, “research-oriented,
text-oriented, and participant-oriented”, concern distinct aspects. To illustrate,
research-oriented bundles put emphasis on describing the research process, in
which five elements are normally involved. With their particular focus, “location”
bundles indicate time and place, “procedure” bundles help record how a piece of
research is conducted, “quantification” bundles are associated with numbers,
“topic” bundles relate to the research subject, and research-related bundles that
are not primarily concerned with the other four functions are classified into the
subcategory of “description” bundles.
Text-oriented bundles are sequences more concerned with the language
from a micro perspective, that is, with how the research is logically
presented through the use of bundles. Four subcategories are noted: the
“transition signals” refer to bundles linking either additive or contrastive
elements; the “resultative signals” help elaborate causal relationships; the
“structuring signals” direct readers to follow the author’s narration; the “framing
signals” set limits by narrowing the conditions of discussion.
Participant-oriented bundles focus on the author or reader of a text. On
the one hand, bundles with “stance features” are used to convey the author’s
attitudes. On the other hand, those with “engagement features” convince the
readers of the author’s viewpoints.
 In Hyland’s framework, three dimensions, “research, text, and participant”,
are sketched with a broader sense of function relating to systemic-functional
linguistics, in which three core functions respectively accord with the meaning and
orientation of target bundles, forming a well-rounded categorization model
specially aiming at academic texts.
3.4 Lexical bundle approach to genre analysis
Genre was defined by Swales (1990: 45-58) as the categorization of
communicative events, suggesting that a genre is employed by members of a
discourse community who share particular communicative purposes. Bhatia (2004:
23) regarded genre as the language use in a conventionalized communicative
setting, with specific communicative goals of a disciplinary or social institution.
Against this background, genre analysis, as a way to investigate these purposes,
helps analyze the strategies of language use in a certain kind of text. Bhatia (2012)
noted that genre analysis is considered a most useful tool for analyzing
academic genres for ESP applications, as it is able to offer a partial view of
language aspects in typical contexts.
One of the major reasons for using a lexical bundle approach is that lexical
bundles are firmly connected with one layer of genre analysis summarized by
Swales (2004). This implies that lexical bundles offer an important research
perspective for exploring genre features.
According to Swales (ibid), research article genre analysis involves three
layers (as shown in Table 3.6), focusing respectively on part-genres (e.g.
the IMRD organization: introduction, method, results, discussion), moves (e.g. the
CARS model: create a research space), and language features (e.g. lexical bundles).
Table 3.6 Three layers upon research article genre analysis
layer     focus               example
layer 1   part-genres         IMRD organization
layer 2   moves               CARS model
layer 3   language features   lexical bundles
In terms of the first two layers, recent studies of genre analysis convey the
relationships between organizational structures, stylistic features, and social
functions from a macro perspective. To illustrate, Parkinson (2017) used
move analysis to identify the genre-based features of student laboratory reports,
which were then compared with the previously distinguished moves in research
articles. The different purposes of these genres were revealed through move
constructions. Drawing on the CARS model from Swales (1990), Chen, Yu and Shi
(2019) analyzed the introduction sections of English research articles written by
Chinese and native English authors in the area of biology, and the findings
suggested a generally consistent macro structure of texts.
Table 3.7 The research with lexical bundle approach to genre analysis
author            year  featured genre
Breeze            2013  four legal genres
Allan             2016  business meetings
Allan             2017  five self-study books
Jhang, Kim & Qi   2018  investigation report
Hyland            2012  academic discourse
From the aspect of the third layer, genre features can also be demonstrated
through lexical bundles. Cai (2016) commented that although lexical phrases and
genre studies have different original research paradigms, the former composes an
important component of the latter.
Studies applying the lexical bundle approach to genre analysis are listed in
Table 3.7. For instance, the results from Breeze (2013) suggested that
specialized texts contained a wide range of bundles which could be categorized as
discipline-specific terms. Allan (2016) studied the lexical bundles in business
meetings and found that this genre had a high degree of overlap with general
spoken texts, which confirmed the idea that stable core features of lexical
bundles exist in spoken English. Allan (2017) also analyzed the lexical bundles
in five self-study books, finding that the quantities and forms of bundles displayed
notable differences, while their functions were broadly similar. Jhang, Kim, and Qi
(2018) took the marine accident investigation report as an example to explore
lexical bundles in ESP writing. The similar use of some types of lexical bundles
by native and non-native writers was reported to reflect the special characteristics
of this genre.
Regarding the lexical bundles in academic discourse with disciplinary
contexts, Hyland (2012) discussed the existing research concerning the bundles in
spoken and written academic texts, and summarized their ideas in respect of
bundle identification, the importance of academic sequences, pedagogical issues,
together with disciplinary and generic variations. He agreed with Biber and
Barbieri (2007) that genre was important in distinguishing bundle distributions,
and the variation of bundles contributed to the interpretation of generic patterning.
These studies suggest that the lexical bundle approach has been applied to
research on distinct genres, and they confirm that lexical bundles are able
to reveal some characteristics of certain genres.
From bundle features to genre specificity, the lexical bundle approach provides
an important perspective on genre analysis, which could be viewed as another
significant approach in addition to the commonly applied models mentioned
earlier. Furthermore, it has been shown that a core list of lexical bundles
contributes to the construction of academic discourse (Simpson-Vlach & Ellis,
2010; Wright, 2019), forming one of the characteristics of the academic genre. Hyland
(2012) also noted that the description of lexical bundles can improve the
understanding of how disciplinary arguments are achieved in distinct contexts.
Against this background, this study adopts the lexical bundle approach to explore
the research articles in two business-type disciplines.
Chapter Four Methodology
This chapter will provide the research questions, data for the current study,
research procedures, and the corpus tool.
4.1 Research questions
The present study investigates the published research articles from the
disciplines of finance and accounting to explore their use of lexical bundles by
adopting a corpus-based approach. More specifically, the study addresses the
following research questions.
1. What are the frequently occurring 4-word lexical bundles in the Corpus of
Financial English (hereinafter referred to as the CFE), and in the Corpus of
Accounting English (hereinafter referred to as the CAE)?
2. What are the structural and functional features of lexical bundles in the
CFE and the CAE?
3. How are these structural and functional features of lexical bundles
associated with the two business-type disciplines of finance and accounting?
4.2 Data for the current study
Three corpora were compiled in this study: two business-type
corpora (the CFE and the CAE) and a reference corpus (Corpus of Applied
Linguistics, hereinafter referred to as the CAL). The results from the CFE and the
CAE were supplemented by comparison with those from the CAL. The total
size of the three corpora reaches over 1.8 million words, as shown in Table 4.1.
It is necessary to note that in the comparison of certain lexical bundles across the
corpora, the normalized frequency and the log-likelihood test are taken into account
due to the differences in the size of each corpus.
Table 4.1 Corpora for the current study
Name of Corpus          CFE                              CAE                              CAL
Discipline              Finance                          Accounting                       Applied Linguistics
Source                  Journal of Financial Economics   Journal of Accounting Research   Applied Linguistics
Year                    2017-2019                        2017-2019                        2017-2019
Type of text            research articles                research articles                research articles
Number of texts         39                               38                               45
Total number of words   763,640 words                    691,164 words                    444,358 words
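Because the three corpora differ in size, raw counts of a bundle cannot be compared directly. The following is a minimal sketch of the two measures mentioned above, assuming the commonly used two-corpus log-likelihood formula; the function names and the example counts are hypothetical and are not taken from the study, while the corpus sizes come from Table 4.1.

```python
import math

# Corpus sizes in words, taken from Table 4.1.
CORPUS_SIZES = {"CFE": 763_640, "CAE": 691_164, "CAL": 444_358}


def per_million(raw_count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def log_likelihood(count_a, count_b, size_a, size_b):
    """Two-corpus log-likelihood (G2) for a single bundle.

    Expected counts assume the bundle is equally likely in both corpora.
    A zero observed count contributes nothing to the sum.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical raw counts of one bundle in the CFE and the CAE.
cfe_count, cae_count = 120, 60
print(per_million(cfe_count, CORPUS_SIZES["CFE"]))
print(per_million(cae_count, CORPUS_SIZES["CAE"]))
print(log_likelihood(cfe_count, cae_count,
                     CORPUS_SIZES["CFE"], CORPUS_SIZES["CAE"]))
```

Under this formulation, a log-likelihood value above 3.84 is usually read as a difference significant at p < 0.05 (one degree of freedom).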
4.2.1 The two ‘business-type’ corpora
All the language data were taken from published journal articles due to
their representativeness in academic writing, and because published articles must meet
certain criteria in terms of writing proficiency. Besides, although the writing style
of authors may vary, the overall writing performance is unlikely to vary as
much as that of student writing. More general disciplinary features can thus be
exemplified by excluding this interference factor to some extent.
The lexical bundles in the disciplines of finance and accounting were taken as
the data of this research, and the two corpora, the CFE and the CAE, were constructed
accordingly. These two disciplines were chosen firstly because previous studies
paid much attention to comparing disciplines with completely different research
contents and methodologies (see Section 2.2.2), while studies on these two
disciplines are relatively few. Secondly, finance belongs to the economic study
areas and is an important branch of economic research, while the
major subjects in accounting research are also related to economic activities. Thus
the bundle characteristics in the two closely related disciplines are worth
discussing. Thirdly, the writings in these two areas constitute an indispensable part
of business-type texts, so the results might suggest intradisciplinary
characteristics through bundle usage across the two disciplines.
The journals chosen were Journal of Financial Economics and
Journal of Accounting Research, which were taken as the data sources,
each representing one discipline. The selected journals are introduced
below.
Journal of Financial Economics1 is a leading peer-reviewed academic
journal covering theoretical and empirical topics in financial economics. It places
emphasis on the areas including capital markets, financial institutions, corporate
finance, corporate governance, and the economics of organizations.
Journal of Accounting Research2 is a general-interest accounting journal. It
publishes original research in all areas of accounting. The related research
typically uses analytical, empirical archival, experimental, and field study methods.
The studies address economic questions in accounting, auditing, disclosure,
financial reporting, taxation, and information, as well as related fields such as
corporate finance, investments, capital markets, law, contracting, and information
economics.
4.2.2 The reference corpus
In order to further explore the similarities and differences of bundle usages in
the CFE and the CAE, a third corpus was built as a reference for the analyses.
The discipline of applied linguistics was chosen because it has been used to
construct self-built corpora in many previous studies, which might help increase
the comparability with the results from other related research. Besides, applied
linguistics focuses on language studies and differs completely in content from both
business-type disciplines, so it could be treated as a suitable
source for reference.
The data that make up the third corpus were derived from the journal
Applied Linguistics3, which publishes research into language and promotes
research on language-related concerns in the various fields encompassed by the
area of applied linguistics.
1 See more details at https://www.journals.elsevier.com/journal-of-financial-economics
2 See more details at https://onlinelibrary.wiley.com/page/journal/1475679x/homepage/productinformation.html
Taken together, these three journals were chosen mainly
due to their high impact factors, the reputation of their publishers, their
internationally recognized standing, and their comprehensiveness and reliability in
each field, which contribute to the representativeness of each corpus, and thus of
each discipline.
4.3 Research procedures
Three stages were involved in the research procedures: data collection, data
cleaning, and bundle identification.
4.3.1 Data collection
Data collection for the three corpora involved gathering published
articles in the disciplines of finance, accounting, and applied linguistics.
All the articles were randomly selected from the three years from 2017
to 2019. Firstly, a total of 122 texts were downloaded from the publishers or online
databases in “.pdf” format. Secondly, all the “.pdf” files were converted into
electronic “.doc” files.
4.3.2 Data cleaning
The documents obtained for the three corpora were cleaned after the collection of
articles. In addition to the main body of the texts, the titles, notes, tables
(including captions, descriptions, footnotes), captions of figures, appendices,
excerpts, samples, extracts, and transcriptions were retained. Although these parts
are regarded as additional materials for research articles, they are equally
important, considering that bundle usage might be reflected in supplementary
components as well.
3 See more details at https://academic.oup.com/applij/pages/About
The removed elements were non-linguistic components, including headers,
footers, pictures, and figures. Three main steps were involved.
Firstly, headers and footers, which do not necessarily contribute to the original
work, were excluded. Secondly, the rest of the contents in “.doc” format were
converted into “.txt” files. In this step, pictures and figures were automatically
deleted, and tables were converted into plain text, while all the other
text was preserved. Note that formulas and digits were retained,
because the subsequent extraction step would not identify these elements under the
identification criteria, so they have no effect on the results. Thirdly, all the
paragraph markers were manually deleted so that the lexical
bundles could be retrieved with the AntConc software. Then the cleaned files were saved.
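The third step was carried out manually in this study. Purely as an illustration, the same effect could be approximated in a script; the sketch below assumes the text of a “.txt” file has already been read into a string, and the function name is hypothetical.

```python
import re


def strip_paragraph_markers(text):
    """Join all lines into one running text and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text).strip()


# A made-up fragment with line breaks and a blank line between paragraphs.
sample = "This table reports the\nresults of the test.\n\nWe find that the effect holds."
print(strip_paragraph_markers(sample))
```

Removing the breaks in this way lets a sequence that was split across two lines be counted as one continuous string of words.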
4.3.3 Bundle identification
The bundle identification procedure is a crucial step in bundle analysis. As
summarized by Cortes (2004), two approaches can be used to identify frequent
word combinations. One is to pre-select a group of expressions. These expressions
should be regarded as notable based on the language produced by native speakers,
or be considered frequent in the related literature. The other approach relies
more on search tools, extracting lexical co-occurrences according to a
preset length, cut-off frequency, and dispersion.
The second approach was adopted in the present study. Three criteria
were preset in order to generate a list of lexical bundles with the corpus tool.
The 4-word bundles. Firstly, 4-word bundles were chosen for the current
study. There are three major reasons for this choice. First, they occur more
frequently and are more common than 5-word bundles, so a larger number of
lexical bundles is potentially involved. Second, although 3-word
bundles are assumed to be even more numerous than 4-word ones, many 4-word bundles
contain 3-word bundles in terms of structure and function (e.g. we find that the
contains we find that). These two points are also supported by Hyland (2008: 8), who
notes that 4-word bundles are “far more common than 5-word strings and offer a clearer
range of structures and functions than 3-word bundles”. Third, Chen and Baker
(2010) noted that the number of 4-word bundles is within a manageable size for
manual processing and concordance checks.
The cut-off frequency. Secondly, the cut-off frequency was set at 40 times
per million words. This criterion varies according to the size of the corpora and
the purpose of each study. Normally, a conservative point is set at 20 times per
million words, as in Lu and Deng (2019). Nevertheless, since this study
focuses on high-frequency lexical bundles, it follows the moderately high
cut-off point applied by studies such as Grabowski (2015), Pan, Reppen, and Biber
(2016), Esfandiari and Barbary (2017), and Bychkovska and Lee (2017). The
stricter normalized cut-off frequency was therefore set at 40 times per million
words.
The dispersion threshold. The third criterion concerns the number of texts in
which the same lexical bundle occurs. In order to avoid idiosyncrasies of
individual writers or speakers, a dispersion threshold should be preset.
Previous studies used different criteria, ranging from 5 texts (Biber et al.,
2004) to over 10% of texts (Hyland, 2008; Grabowski, 2015). Because the texts in
the current study are relatively few but long on average, bundles were required
to occur in at least 4 texts, following Hyland’s criterion.
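The three criteria above can be illustrated with a short Python sketch. It is an illustration only and not the procedure run in the corpus tool: the function name extract_bundles, the simple regular-expression tokenizer, and the decision to let word sequences cross sentence boundaries are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=40, min_texts=4):
    """Return {bundle: (raw_freq, norm_freq, text_range)} for n-word
    sequences that meet both the frequency and the dispersion threshold.

    `texts` is a list of strings, one string per research article.
    """
    freq = Counter()                # raw frequency of each n-gram
    text_range = defaultdict(set)   # ids of the texts containing each n-gram
    total_tokens = 0

    for text_id, text in enumerate(texts):
        # Assumed tokenizer: lower-cased alphabetic words only.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)

    bundles = {}
    for gram, raw in freq.items():
        norm = raw / total_tokens * 1_000_000   # occurrences per million words
        spread = len(text_range[gram])          # number of texts (range)
        if norm >= min_per_million and spread >= min_texts:
            bundles[gram] = (raw, round(norm), spread)
    return bundles
```

With the default arguments the sketch mirrors the thresholds stated above (4 words, 40 per million words, at least 4 texts); a real extraction would also need the manual concordance checks described in this chapter.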
A Corpus-Based Study of Lexical Bundles in the Research Articles
 From Disciplines of Finance and Accounting
4.4 Corpus tool
The software Antconc (Version 3.5.7)4, developed by Laurence Anthony, is a
free corpus analysis toolkit for concordancing and text analysis, with
functions for extracting concordances, clusters/N-Grams, collocates, word lists,
and keyword lists. In the present study it was used to generate the 4-word
bundle lists and to determine the structures and functions of the extracted
bundles by checking the concordance lines.
4 Anthony, L. (2018). AntConc (Version 3.5.7) [Windows]. Tokyo, Japan: Waseda University. Available from
https://www.laurenceanthony.net/software
Chapter Five Results
This chapter will present the results of the current study, including the general
distribution of bundles, along with the shared and unique features of bundles in the
two business-type corpora.
5.1 General distribution of bundles in the ‘business-type’ corpora
Table 5.1 shows the general distribution of bundles in the three corpora.
The item “type of bundles” in this table is the number of distinct eligible
bundles extracted from each corpus. The “raw frequency” is the total number of
occurrences in each corpus, while the “normalized frequency” is the number of
occurrences per one million words. The “word ratio of bundles” is the
percentage of running words in each corpus that are covered by qualified
bundles. It was calculated in two steps. First, the total frequency of bundles
was multiplied by the number of words each bundle contains (4, since 4-word
bundles are the research subject). Second, the result was divided by the total
number of words in each corpus.
Table 5.1 General distribution of bundles in the three corpora
name of corpus                                    CFE          CAE          CAL
discipline                                        Finance      Accounting   Applied Linguistics
type of bundles                                   78           73           29
raw frequency of bundles (normalized frequency)   3577 (4684)  3335 (4825)  884 (1989)
word ratio of bundles                             1.87%        1.93%        0.79%
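The two derived measures in Table 5.1 can be written out as a minimal sketch. The function names are hypothetical, and the corpus size in the usage note is a round illustrative figure rather than the actual size of any of the three corpora.

```python
def normalized_frequency(raw_freq, corpus_tokens, per=1_000_000):
    """Occurrences per `per` words (per million words by default)."""
    return raw_freq / corpus_tokens * per


def word_ratio(raw_freq, corpus_tokens, bundle_length=4):
    """Percentage of running words covered by bundles.

    Step 1: multiply the total bundle frequency by the bundle length.
    Step 2: divide by the total number of words in the corpus.
    """
    return raw_freq * bundle_length / corpus_tokens * 100
```

For an illustrative corpus of exactly 1,000,000 words containing 4,684 bundle occurrences, word_ratio(4684, 1_000_000) gives about 1.87, which agrees with the ratio reported for a normalized frequency of 4684 in Table 5.1.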
The bundle selection criteria of frequency, range, and length yielded 123
different 4-word bundles in the CFE and the CAE out of altogether 1.45 million
words. As the total number of words in the CFE is slightly larger than that in
the CAE, the CFE yields more bundle types and thus a higher total raw bundle
frequency. In all, 78 types of 4-word bundles were retrieved from the CFE and 73
from the CAE, with 28 bundles shared. The normalized frequency of bundles in the
CAL is less than half that of the other two corpora, suggesting that relatively
fixed expressions of this type appear more frequently in the two business-type
corpora.
Regarding the word ratio of bundles, the CAE displays a slightly higher
proportion than the CFE, with bundles covering around 1.9% of the total words in
both corpora. In the CAL, by contrast, the ratio reaches only around 0.8%. The
proportion is thus higher in finance and accounting than in applied
linguistics, implying a stronger tendency to use bundles in business-type
articles.
Table 5.2 lists the top 10 lexical bundles retrieved from the CFE and the CAE
(the complete list of bundles can be seen in the Appendix). Each of the top 10
bundles occurs at least 90 times per million words. The most frequent bundles
are at the end of, with 101 occurrences in 22 texts in the CFE, and are more
likely to, with 122 occurrences in 28 texts in the CAE.
Table 5.2 Top 10 lexical bundles in the CFE and the CAE
Lexical bundles in the CFE                                 Lexical bundles in the CAE
Bundles                     Frequency  Norm. Freq  Range   Bundles                      Frequency  Norm. Freq  Range
at the end of               101        132         22      are more likely to           122        177         28
the cross section of        86         113         20      equal to one if              90         130         13
panel a of table            84         110         27      is an indicator variable     86         124         18
on the other hand           77         101         24      indicator variable equal to  81         117         11
panel b of table            74         97          26      an indicator variable equal  80         116         10
are more likely to          73         96          23      variable equal to one        79         114         11
is a dummy variable         72         94          14      an indicator variable that   73         106         8
we find that the            72         94          22      the effect of the            72         104         15
the difference between the  71         93          23      the extent to which          69         100         21
the dependent variable is   69         90          18      we find that the             68         98          22
Note: “Frequency” represents the raw frequency. “Norm. Freq” stands for the normalized frequency, suggesting the occurrence
out of one million words. “Range” indicates the number of texts in which each bundle appears. The bundles written in bold format
suggest repeating appearance across the two corpora.
The 4-word bundles that meet the criteria in one corpus were then checked for
any occurrence in the other corpus. Seven bundles in the CAE do not occur at all
in the CFE: is an indicator variable, as the dependent variable, indicator
variable that takes, securities and exchange commission, value of one if, after
the adoption of, and reports the results of. Meanwhile, five bundles in the CFE
are absent from the CAE: the corporate bond market, a one standard deviation,
the end of month, cross sectional relation between, and in the corporate bond.
Notably, almost all of the bundles exclusive to one corpus carry
discipline-specific topical information (more details are provided in section
5.2.3.1).
5.2 Shared features in the ‘business-type’ corpora
5.2.1 Ratio of bundles
Table 5.3 presents the percentage of unique and shared bundles. In total, 28
bundles were observed in both corpora, accounting for 35.90% and 38.36% of the
bundle types in the CFE and the CAE respectively, while the bundles unique to
each corpus account for 64.10% and 61.64%.
Table 5.3 Percentage of unique and shared bundles in the CFE and the CAE
category bundle type percentage
CFE-unique 50 64.10%
CFE-shared 28 35.90%
CAE-unique 45 61.64%
CAE-shared 28 38.36%
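The percentages in Table 5.3 follow from a set intersection of the two bundle lists. The sketch below is a hypothetical illustration: the function name shared_profile is an assumption, and the usage note uses placeholder integer IDs instead of the actual bundles, chosen only so that the set sizes match the counts in the table.

```python
def shared_profile(bundles_a, bundles_b):
    """Return (number shared, % of list A that is shared, % of list B that is shared)."""
    shared = set(bundles_a) & set(bundles_b)
    pct_a = len(shared) / len(set(bundles_a)) * 100
    pct_b = len(shared) / len(set(bundles_b)) * 100
    return len(shared), round(pct_a, 2), round(pct_b, 2)
```

With placeholder sets of 78 and 73 items that overlap in 28, for example set(range(78)) and set(range(50, 123)), the function returns 28 shared items and shares of 35.9 and 38.36 percent, in line with Table 5.3.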
The proportion of shared bundles is around 37% when comparing all the
identified bundles, and 41% when comparing the top 50 bundles. These
percentages are moderately higher than those in previous studies. For example,
Hyland (2008) showed that business studies and applied linguistics, from the
social sciences, share 36% of their 50 most frequent bundles, while electrical
engineering and microbiology, from the pure sciences, share 32%. It is thus
worth noting that similarity between disciplines in research subject, research
methodology, or other aspects may contribute to affinity in bundle usage.
Among the 28 shared bundles, 12 are regular bundles that frequently appear in
general texts, such as on the other hand and as well as the, and are normally
viewed as the best candidate bundles for general EAP discourse. The other 16
bundles are all associated with the theme of “statistics”. The shared bundles
may thus reflect some genre features: both bundles that occur generally in
academic articles and bundles performing the “statistics” function are commonly
employed in these two business-type corpora.
5.2.2 Structural features in the ‘business-type’ corpora
Table 5.4 compares the structural categories across the three corpora using
the broader categories of Chen and Baker (2010). The 12 types of lexical
bundles (see Section 3.3.1) were further grouped into “NP-based”, “PP-based”,
and “VP-based” bundles. “NP-based” bundles are noun phrases co-occurring with
their modifiers; “PP-based” bundles are prepositional phrases; “VP-based”
bundles are the various word combinations with verb components. Biber et al.
(1999) explained that NP-based and PP-based bundles are mainly phrasal
bundles, while clausal bundles consist of simple verb phrases together with
bundles that incorporate a main clause.
Table 5.4 Broad comparison of structural categories
category              type                            total frequency                        LL
                      CFE       CAE       CAL         CFE         CAE         CAL
NP-based (phrasal)    33(42%)   29(40%)   7(24%)      1526(43%)   1434(43%)   229(26%)    -1.04
PP-based (phrasal)    19(24%)   13(18%)   19(66%)     876(24%)    518(15%)    590(66%)    60.74**
VP-based (clausal)    23(30%)   29(40%)   2(7%)       1040(29%)   1290(39%)   42(5%)      -57.59**
others                3(4%)     2(2%)     1(3%)       135(4%)     93(3%)      23(3%)      4.15*
total                 78(100%)  73(100%)  29(100%)    3577(100%)  3335(100%)  884(100%)   61.55**
Note: LL=Loglikelihood. *=significant at p<0.05 level. **= significant at p<0.01 level. LL in this study compares
the two corpora of the CFE and the CAE.
Bundle frequency was computed both in types and in total frequency. The former
is the number of distinct bundles belonging to each structural category, and
the latter is the total number of occurrences of those bundles. Log-likelihood
was calculated on the raw frequencies to test whether the differences between
the CFE and the CAE are statistically significant.
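For readers unfamiliar with the statistic, the following Python sketch shows the two-corpus log-likelihood calculation that is widely used in corpus comparison. It is offered as an assumption about the kind of computation behind the LL column, not as the exact procedure of this study: the function name is hypothetical, and the sign convention (positive when the item is relatively more frequent in the first corpus, negative otherwise) is inferred from the direction of the values in Table 5.4.

```python
import math


def signed_log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one item.

    freq_a, freq_b: raw frequencies of the item in corpus A and corpus B.
    size_a, size_b: total number of words in corpus A and corpus B.
    The result is positive if the item is relatively more frequent in A,
    negative if it is relatively more frequent in B (assumed convention).
    """
    total = freq_a + freq_b
    if total == 0:
        return 0.0
    # Expected frequencies if the item were equally likely in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)

    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:            # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    ll *= 2

    more_frequent_in_a = freq_a / size_a >= freq_b / size_b
    return ll if more_frequent_in_a else -ll
```

With one degree of freedom, an absolute value above 3.84 corresponds to p<0.05 and one above 6.63 to p<0.01, which matches the single and double asterisks used in the tables.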
The two business-type corpora show a similar tendency in overall distribution.
NP-based bundles account for the largest percentage, while VP-based bundles
rank second and PP-based bundles third. When phrasal and clausal bundles are
each considered as a whole, the proportions in each corpus are consistent with
the finding of previous studies (Biber, 2015; Pan, Reppen & Biber, 2016) that
phrasal bundles are predominantly used in academic prose. These studies explain
that the phrasal feature is related to the high informational focus of academic
prose, which results in a shift from a clausal to a phrasal style.
In contrast, PP-based bundles take up the largest proportion in the CAL, where
NP-based and VP-based bundles together make up less than half of the total.
Besides, the CFE and the CAE contain a higher percentage of clausal bundles than
the CAL. According to Biber et al. (1999), clausal bundles are used more
frequently in conversation than in academic prose. That is to say, the features
of academic prose are reflected more clearly in the research articles from
applied linguistics, while the two business-type disciplines agree with each
other in their use of bundles at the level of the broad categories. This
finding is consistent with Damchevska (2019), in which the bundles in economics
research articles were found to be clausal in grammatical structure, implying
that clausal bundles are widely employed in business-type texts and that the
two closely related disciplines might share structurally similar bundles.
Looking at other disciplines, the findings of Cortes (2004) suggested
that the bundles in history are structurally all phrasal, while those in biology
include clausal bundles. Since biology is a pure science discipline, the bundles
in the two business-type corpora lean towards the natural sciences in their use
of clausal descriptions. The discipline of applied linguistics, on the other
hand, is closer to history than to finance and accounting in its preference
between phrasal and clausal bundles.
Table 5.5 presents the structural distribution of the extracted bundles. Overall,
the CFE and the CAE show a similar tendency across the structural subcategories.
NP-based structures account for a major share, led by the “noun phrase with
of-phrase fragment”, which is consistent with the previous results in
Pan, Reppen, and Biber (2016), and Esfandiari and Barbary (2017).
Table 5.5 Structural distribution of bundles
category structural sub-category type total frequency LL
CFE CAE CFE CAE
NP-based NP with of-phrase fragment 18 16 851 749 0.31
NP with other post-modifier fragment 15 13 675 685 -4.45*
PP-based PP with embedded of-phrase fragment 8 9 377 344 -0.01
other PP fragment 11 4 499 174 132.96**
VP-based anticipatory it + VP/adjective phrase 0 0 0 0 -
passive verb + PP fragment 4 4 160 169 -1.96
#verb (past participle) + PP fragment 0 4 0 123 -
copula be + NP/adjective phrase 5 5 252 228 0.00
(VP +) that-clause fragment 1 3 72 143 -31.51**
(verb/adjective +) to-clause fragment 2 1 107 122 -3.05
adverbial clause fragment 2 3 98 190 -39.8**
pronoun/noun phrase + be (+ ...) 3 1 152 50 44.28**
#(pronoun/noun phrase)+VP (non-be) +NP 6 8 199 265 -17.15**
others 3 2 135 93 4.16*
total 78 73 3577 3335 61.55**
Note: Sub-categories beginning with the sign # are newly added to the framework of Biber et al. (1999).
5.2.2.1 NP-based bundles
Table 5.6 and Table 5.7 present the patterns found in NP-based bundles.
Within the “noun phrase with of-phrase fragment” structure, three major patterns
can be seen. The “the + Noun + of + *” structure is the most frequently used one,
in which the “the + Noun + of + the” pattern accounts for a large proportion. The
second and third patterns are distinguished mainly by the first word of the
bundle: the former begins with the article the or a, while the latter begins
with a regular noun. The CAL contains bundles from all three patterns, but
with fewer types and lower frequencies.
Table 5.6 Noun phrase with of-phrase fragment
pattern            CFE                              CAE                              CAL
the+noun+of+*      the size of the (69)             the effect of the (104)          the meaning of the (43)
                   the end of the (62)              the end of the (77)
                   the number of analysts (54)      the ratio of the (64)
                   the results of the (52)          the beginning of the (62)
                   the sum of the (52)              the quality of the (58)
                   the end of month (48)            the value of one (55)
                   the variance of the (48)         the magnitude of the (54)
                   the probability of a (47)
                   the value of the (41)
the(a)+*+noun+of   the cross section of (113)       the total number of (97)         the total number of (54)
                   the natural logarithm of (81)    a small number of (67)
                   the book value of (62)           the natural logarithm of (62)
                   the market value of (52)         the absolute value of (59)
                   the standard deviation of (43)   the natural log of (55)
                                                    the standard deviation of (52)
noun+noun+of+*     panel a of table (110)           panel a of table (88)            per cent of the (72)
                   panel b of table (97)            panel b of table (71)
                   standard deviation of the (42)   market value of equity (59)
                   cross section of stock (41)
Note: * means alternative component. Figures in brackets denote the normalized frequency of each bundle. The bundles
occurring in both the CFE and the CAE are written in bold. The bundles occurring in the CFE, the CAE, and the CAL are
written in bold and underlined.
In the other category of NP-based structures, the “noun phrase with other
post-modifier fragment”, the total frequency differs significantly between the
CFE and the CAE, yet some bundles overlap. The post-modifier fragment can be
further divided into four categories, as shown in Table 5.7. Overall, judging
from the repeated bundles, this structure tends to take a prepositional phrase
as the post-modifier in form and to emphasize statistics in meaning.
Table 5.7 Noun phrase with other post-modifier fragment
pattern                      CFE                                    CAE                                      CAL
noun+prepositional phrase    the difference between the (93)        standard errors clustered by (59)        english as a lingua (115)
                             an increase in the (72)                the results in table (51)
                             standard deviation increase in (67)    statistical significance at the (46)
                             statistical significance at the (63)   cross sectional variation in (42)
                             the change in the (59)
                             cross sectional relation between (45)
                             the results in table (45)
                             market to book ratio (42)
noun+clause                  the extent to which (45)               an indicator variable that (106)         the extent to which (110)
                                                                    the extent to which (100)                the ways in which (81)
                                                                    indicator variable that takes (69)
                                                                    variable that takes the (43)
noun+adjective phrase        variable equal to one (55)             indicator variable equal to (117)
                                                                    an indicator variable equal (116)
                                                                    variable equal to one (114)
noun phrase (proper noun)    the risk free rate (73)                and year fixed effects (64)
                             the corporate bond market (68)         securities and exchange commission (64)
                             a one standard deviation (65)
                             one standard deviation increase (51)
                             panels a and b (42)
The only bundle occurring in all three corpora, the extent to which,
has the “noun + clause” structure. Concordances (1-3) show that this expression
quite frequently follows notional verbs such as ascertain, examine,
investigate, determine, and estimate, which are used to describe research
questions and objectives. Further concordances suggest that the bundle is
usually used to discuss the practices of previous studies in the literature
review section or the authors’ own research steps in the methodology section.
Given the large number of hits of this bundle, the three disciplines might
share a broader research paradigm.
One point that needs to be mentioned is that several 4-word bundles are
fragments of 5-word or 6-word bundles, which inflates the total number of bundle
types. For example, the 4-word bundles indicator variable equal to, variable
equal to one, and an indicator variable equal appear to be clipped from the
6-word bundle an indicator variable equal to one; the 4-word bundles a one
standard deviation, one standard deviation increase, and standard deviation
increase in can be derived from the 6-word bundle a one standard deviation
increase in. The actual bundle diversity in the CFE and the CAE is therefore
somewhat narrower than the type counts suggest.
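The overlap described above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of the original procedure: it groups 4-word bundles under a longer candidate sequence whenever they occur inside it as a contiguous run of words. The function names and the idea of supplying the longer sequences by hand are assumptions for illustration.

```python
def contained_in(short_bundle, long_bundle):
    """True if the words of short_bundle occur contiguously in long_bundle."""
    short, long_ = short_bundle.split(), long_bundle.split()
    k = len(short)
    return any(long_[i:i + k] == short for i in range(len(long_) - k + 1))


def group_by_parent(bundles, parents):
    """Map each longer sequence to the shorter bundles clipped from it."""
    return {p: [b for b in bundles if contained_in(b, p)] for p in parents}
```

Applied to the examples in the text, group_by_parent places indicator variable equal to, variable equal to one, and an indicator variable equal under an indicator variable equal to one, so the three types could be counted as one underlying expression.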
In general, however, when the NP-based structures in Table 5.6 and Table 5.7
are combined, the bundle types and frequencies in the CFE and the CAE are closer
to each other than to those in the CAL, with the exception of a few partially
overlapping bundles.
5.2.2.2 PP-based bundles
PP-based bundles conform to the norm of academic prose and are primarily
built on the “prepositional phrase + of” pattern, as shown in Table 5.8.
Overall, more than half of the bundle types recur in both the CFE and the CAE,
and only two bundles, at the end of and in the case of, occur in all three
corpora.
The initial prepositions at and in dominate this structure across the three
corpora. Most are followed by the determiner the, forming a structure that
narrows the statement to a specific condition (Concordances 4-6). The repeated
bundles are mainly used for indicating time (at the end of) and limiting the
scope of discussion (in the case of).
Table 5.8 Prepositional phrase + of fragment
pattern  CFE                        CAE                         CAL
pp + of  at the end of (132)        of the number of (82)       in the context of (83)
         at the time of (60)        at the end of (71)          on the basis of (77)
         of the number of (58)      in the presence of (58)     in the case of (59)
         in panel b of (56)         at the beginning of (55)    in terms of the (54)
         in the case of (50)        as a function of (52)       at the end of (43)
         in panel a of (47)         after the adoption of (48)  as a means of (41)
         at the beginning of (46)   at the time of (45)         in the form of (41)
         for each of the (45)       in the case of (43)
                                    to the number of (43)
Moreover, as to the total frequency of the “pp + of” fragment in each corpus,
the CFE and the CAE reach almost the same normalized frequency of around 500 per
million words, while this fragment occurs about 400 times per million words in
the CAL. Bundles with this structure in the CFE and the CAE thus cover a broader
range of meanings, with nearly identical numbers of hits.
5.2.2.3 VP-based bundles
Table 5.4 in section 5.2.2 has shown that VP-based bundles account for around
30% and 40% of the overall structures in the CFE and the CAE respectively, but
only 7% in the CAL. In Table 5.5, the log-likelihood values indicate significant
differences in several sub-categories, the exceptions being the “anticipatory it
+ verb phrase/adjective phrase”, “passive verb + prepositional phrase fragment”,
“copula be + noun phrase/adjective phrase”, and “(verb/adjective +) to-clause
fragment” structures.
Among them, the sub-category “anticipatory it + verb phrase/adjective
phrase” does not occur at all under the bundle identification criteria in either
the CFE or the CAE, although this structure has been noted by Cortes (2004) as a
depersonalized mode frequently found in academic prose.
A check of whether this structure occurs at all in the CFE and the CAE shows
that it does, but with only a few hits, as shown in concordances (7) and (8).
While the “3rd person singular ‘it’ + verb phrase fragment” is part of academic
writing style, the articles in the CFE and the CAE show little reliance on it.
This may be because the “anticipatory it” form lengthens a sentence without
adding more content than other VP-based structures such as the simple “verb +
noun” form, which accounts for a large share.
Other VP-based forms, including “passive verb + prepositional phrase
fragment”, “copula be + noun phrase/adjective phrase”, and “(verb/adjective +)
to-clause fragment”, are also similarly distributed in the CFE and the CAE, in
both the types and the frequency of bundles.
5.2.3 Functional features in the ‘business-type’ corpora
Each bundle was manually annotated with one of three major discourse
functions, following the taxonomy of Hyland (2008, see section 3.3.2). When a
bundle carries more than one function, only the primary one is recorded. For
context-dependent bundles, the concordance lines were checked to determine the
appropriate function, giving priority to the most commonly used one.
Table 5.9 Broad comparison of functional categories
category              type                            total frequency                        LL
                      CFE       CAE       CAL         CFE         CAE         CAL
research-oriented     47(60%)   50(69%)   17(59%)     2128(59%)   2262(68%)   509(58%)    -28.36**
text-oriented         27(35%)   22(30%)   11(38%)     1272(36%)   951(28%)    355(40%)    20.03**
participant-oriented  4(5%)     1(1%)     1(3%)       177(5%)     122(4%)     20(2%)      5.43*
total                 78(100%)  73(100%)  29(100%)    3577(100%)  3335(100%)  884(100%)   61.55**
Table 5.9 shows the broad comparison of functional categories across the
corpora. In this table, research-oriented expressions are bundles that depict
the author’s activities and operations so as to present the entire research
process. These bundles directly help to narrate the considerations given by
the authors. The type and total frequency of these expressions are roughly
consistent across the three corpora, covering around 60% to 70% of the total
bundles in each. According to Hyland (2008), the significantly greater use of
research-oriented bundles in the hard sciences emphasizes generalization rather
than individual interpretation by highlighting research practices, methods, and
procedures. In this respect, the bundles in the research articles from finance,
accounting, and applied linguistics share some features with those in pure
science studies.
The second largest group is the text-oriented expressions, which link textual
elements so as to organize the text into a larger unit. These expressions take
up around 30% to 40% in each of the three corpora. The signals included in this
group, such as transition, result, structure, and frame, function as
explanatory patterns that a research article needs in order to advance
smoothly.
Participant-oriented bundles take up the lowest proportion. These bundles
stress the authors’ or readers’ attitude. The distribution appears similar to
that in Esfandiari and Barbary (2017), where research-oriented and
text-oriented bundles account for the main share. However, the proportion of
participant-oriented bundles in this study is considerably lower than that
reported in previous literature.
For instance, Hyland’s (2008) study showed that biology and electrical
engineering both comprise around 9% participant-oriented bundles, while
business studies and applied linguistics comprise around 17%. The former two
fields are reported to share the properties of the pure sciences, and the latter
two those of the social sciences. Surprisingly, in this study all three corpora
contain around 5% or less participant-oriented bundles. There may be two
potential reasons for this difference.
Firstly, Hyland (ibid) constructed larger corpora of about 3.5 million words, in
which the language data came from doctoral and master’s writing by non-native
English speakers along with published journal articles, while the present
research involves fewer than 2 million words from published journals only,
without identifying the authors’ backgrounds. The corpora in the two studies
thus serve different objectives, and their data are representative of different
populations.
Secondly, the three corpora in the present study display a consistent
preference in the use of participant-oriented bundles: the proportion is even
lower than in the pure science disciplines and far from the properties generally
reported for social science disciplines. It is thus assumed that the choice of
bundles in research articles from finance, accounting, and applied linguistics
has taken on some pure science attributes.
Table 5.10 presents the functional distribution of the extracted bundles. On
the whole, both similarities and differences can be seen between the CFE and the
CAE: in both, bundles with the primary function of “topic” show the highest
frequency, while no engagement bundles were found. The following sections
exemplify some sub-categories with notable features, taking into consideration
the frequency (the highest and lowest occurrence) and the types (completely
identical bundles or similar numbers of bundles) in the CFE and the CAE.
Table 5.10 Functional distribution of bundles
category              sub-category          type        total frequency   LL
                                            CFE   CAE   CFE    CAE
research-oriented     location              6     9     304    355        -10.67**
                      procedure             1     2     44     80         -14.50**
                      quantification        8     11    326    480        -46.96**
                      description           9     3     457    155        127.16**
                      topic                 23    25    997    1192       -42.27**
text-oriented         transition signals    4     3     226    122        22.05**
                      resultative signals   4     9     189    318        -47.31**
                      structuring signals   13    3     596    148        246.20**
                      framing signals       6     7     261    363        -28.02**
participant-oriented  stance features       4     1     177    122        5.43*
                      engagement features   0     0     0      0          -
total                                       78    73    3577   3335       61.55**
5.2.3.1 Research-oriented bundles
Research-oriented bundles play an indispensable role in academic writing, as
their main purpose is to present the research. They make up the largest portion
among the three functional categories.
“Topic” bundles. Bundles relating to the research “topic” make up around half of
all the research-oriented ones. Topic-related bundles are mainly concerned with
two groups of themes. One is the detailed discussion of a specific subject, where
the CFE and the CAE have distinct concentrations: not a single “topic” bundle of
this kind recurs across the two corpora, as shown in concordances (9-10). The
other theme concerns bundles indicating statistical tests; here the two corpora
share a few bundles, which suggests similar approaches to data examination, as
exhibited in concordances (11-12).
The second theme makes up a far higher proportion than the first, but the
means of statistical measurement are confined to a relatively fixed number of
methods, especially the commonly recognized ones. Given the limited range of
statistical tests, the large proportion of these bundles implies that finance and
accounting studies emphasize empirical over introspective methods, primarily
developing the research procedure under the guidance of experiments. Besides, it
is noteworthy that the ways of reporting data are partially alike, as shown in
concordances (13-14). Meanwhile, the word choice is not entirely free, but relies
on frequently occurring bundles or on slight changes to these bundles, such as
switching the word order, adding or deleting a word, or altering the pattern
without replacing the words that the bundle originally contains.
The bundles from the CFE and the CAE give prominence to data reporting,
similar to pure science writing, in which descriptions of tables or figures are
among the most frequent strings. Of the 16 statistics-related bundles, four
contain the word table: panel a of table, panel b of table, this table reports
the, and the results in table. This shows that academic studies in finance and
accounting are concerned with data, with figures serving as the basis or
starting point of analysis.
An additional check shows that the word table occurs 1,976 times in the CFE,
ranking 29th among all single words, and 1,359 times in the CAE, ranking 44th.
The four bundles containing the word table describe the statistical information
in the tables (see concordances 15 and 16); descriptions of tables are thus a
salient feature of the two corpora.
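A frequency-and-rank check of this kind can be reproduced with a word list. The sketch below is only an illustration under assumed conditions: the function name word_rank and the alphabetic tokenizer are hypothetical, and ties in frequency are ranked in order of first appearance, which may differ from the ordering of a given corpus tool.

```python
import re
from collections import Counter


def word_rank(text, target):
    """Return (raw frequency, rank) of `target` among all single words.

    Rank 1 is the most frequent word; rank is None if the word is absent.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = [word for word, _ in counts.most_common()]
    rank = ranked.index(target) + 1 if target in counts else None
    return counts[target], rank
```

Calling word_rank on the concatenated text of a corpus with the target table would return the two figures reported in the paragraph above for that corpus.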
The rest of the topic-related bundles are more or less related to statistics as
well, such as the dependent variable is, variable equal to one, the natural
logarithm of, results are robust to, the standard deviation of, is the number
of, and of the number of, along with significant at the level, indicate
significance at the, and statistical significance at the. The key words among
them include the object of testing (variable), the measures (natural logarithm
and standard deviation), and the test results (robust and significance).
Hence it is inferred that the CFE and the CAE both rely strongly on bundles
that describe data collection, arithmetic calculation, and mathematical results,
and that summarize numerical values. In other words, statistics play a major
role in finance and accounting studies and thus account for a major part of the
bundles that carry semantic information.
“Quantification” bundles. Following the bundles for experimental tests
discussed above, the sub-category of “quantification” emerges as the second
largest group among research-oriented bundles, as can be seen in Table 5.11.
Table 5.11 The bundles with “quantification” function
function        CFE                          CAE
quantification  is the number of (64)        the total number of (97)
                the book value of (62)       of the number of (82)
                of the number of (58)        is the number of (68)
                of one plus the (56)         a small number of (67)
                the number of analysts (54)  the ratio of the (64)
                the market value of (52)     the absolute value of (59)
                is the ratio of (41)         takes the value of (58)
                the value of the (41)        the value of one (55)
                                             the magnitude of the (54)
                                             that takes the value (48)
                                             to the number of (43)
The key words number, value, and ratio occur repeatedly in both corpora to
specify the figure attached to each subject. The 3-word sequence the number of is
also prominent, together with its variations such as the total number of and a
small number of. The shared bundles is the number of (concordances 17-18) and
of the number of (concordances 19-20) are among the most frequent. The former
serves to define a concept in quantitative terms, while the latter helps expand
information from a quantitative perspective.
“Procedure” bundles. Among research-oriented bundles, those performing the
“procedure” function occur least frequently: only two bundles appear, is defined
as the and as the function of, and the former occurs in both corpora. On the one
hand, although describing the research procedure is an indispensable part of
academic studies, the authors of the two business-type texts employ fewer bundles
for simply discussing the procedure than for the other functions extracted under
the same criteria. On the other hand, bundles with other functions may also
indicate “procedure”, since only the primary function of each bundle is recorded;
the “location” function, shown in Table 5.12, is one example.
Table 5.12 The bundles with “location” function
function        CFE                           CAE
location        at the end of (132)           the end of the (77)
                the end of the (62)           at the end of (71)
                at the time of (60)           are defined in appendix (62)
                at the same time (50)         the beginning of the (62)
                the end of month (48)         at the beginning of (55)
                at the beginning of (46)      in the online appendix (54)
                                              after the adoption of (48)
                                              at the time of (45)
                                              defined in appendix b (41)
“Location” bundles. The key words in bundles with the “location” function
mainly include beginning, end, and time across the two corpora. They form bundles
such as at the beginning of, the beginning of the, at the end of, the end of the,
at the time of, and at the same time, which in some cases can also be considered
to describe the procedure.
As exemplified in concordances 21-23, the context of each bundle is related to
the description of procedures. However, most bundles with the “location” function
stay close to their major function of indicating “time” or “place”, with few
exceptions such as the overlapping functions mentioned above.
5.2.3.2 Text-oriented bundles
The CFE and the CAE contain similar bundles for the transitional and framing
signals among text-oriented bundles, as can be seen in Table 5.13 and Table 5.14.
All the transitional bundles in the CAE also occur in the CFE, and 5 of the 6
framing bundles in the CFE also occur in the CAE. These bundles can therefore be
regarded as highly similar across the two corpora despite the difference in total
frequency.
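The overlap described above can be checked with simple set operations. The
following is a minimal sketch, not the procedure used in this study: the bundle
lists are copied from Table 5.13 and Table 5.14 with frequencies omitted, and the
variable names and the helper overlap are illustrative assumptions.

```python
# Bundle lists copied from Table 5.13 and Table 5.14 (frequencies omitted).
# All names below are illustrative, not part of the original study.
cfe_transition = {
    "on the other hand", "is consistent with the",
    "as well as the", "is associated with a",
}
cae_transition = {
    "on the other hand", "as well as the", "is consistent with the",
}
cfe_framing = {
    "equal to one if", "in a given year", "in the case of",
    "to one if the", "for each of the", "the extent to which",
}
cae_framing = {
    "equal to one if", "the extent to which", "to one if the",
    "in the presence of", "value of one if", "in a given year",
    "in the case of",
}


def overlap(source, target):
    """Return the bundles of `source` that also occur in `target`."""
    return source & target


transition_shared = overlap(cae_transition, cfe_transition)
framing_shared = overlap(cfe_framing, cae_framing)

print(len(transition_shared), "of", len(cae_transition), "CAE transitional bundles occur in the CFE")
print(len(framing_shared), "of", len(cfe_framing), "CFE framing bundles occur in the CAE")
print("CFE framing bundles absent from the CAE:", sorted(cfe_framing - cae_framing))
```

The sketch compares bundle types only; it says nothing about the difference in
total frequency noted above.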
“Transitional” bundles. The transitional bundles are mainly employed to show
additive links (e.g. is consistent with the, as well as the) and to introduce
comparisons (e.g. on the other hand). In particular, the bundle is consistent with
the usually illustrates the relation between results and evidence (as shown in
concordances 24-27).
Table 5.13 The bundles with “transition” function
function        CFE                           CAE
transition      on the other hand (101)       on the other hand (74)
                is consistent with the (75)   as well as the (55)
                as well as the (71)           is consistent with the (48)
                is associated with a (50)
An interesting point is that while on the other hand occurs frequently in both
corpora, its counterpart on the one hand appears less often. This was checked
against a larger unit of context (concordances 28-29). The check shows that
although both corpora employ on the other hand as a transitional signal, its
counterpart is omitted in most cases. In this way the focus shifts to the latter
half, where on the other hand also serves as a prompt signal.
The omission has no effect on the transitional function performed by the
remaining half, and the transitional signal stays clear. Taken together with the
other bundles signalling turning points, the authors from the two business-type
disciplines display a similar selection of transitional bundles.
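The check on the pairing of on the one hand and on the other hand can be
approximated programmatically. The sketch below is an assumption about how such
a check might be set up, not the concordance procedure used in this study: the
function paired_share, the window size, and the sample text are all hypothetical.

```python
import re

FIRST = ["on", "the", "one", "hand"]
SECOND = ["on", "the", "other", "hand"]


def paired_share(text, window=10):
    """Count 'on the other hand' and how many occurrences are preceded by
    'on the one hand' within the previous `window` tokens.

    Returns a tuple (total, paired). The window size is an arbitrary choice.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    paired = 0
    for i in range(len(tokens) - 3):
        if tokens[i:i + 4] == SECOND:
            total += 1
            context = tokens[max(0, i - window):i]
            if any(context[j:j + 4] == FIRST for j in range(len(context) - 3)):
                paired += 1
    return total, paired


# Hypothetical sample text, not taken from the CFE or the CAE.
sample = (
    "On the one hand, prices rose. On the other hand, volume fell. "
    "Returns were flat. On the other hand, volatility increased."
)
total, paired = paired_share(sample)
print(f"{paired} of {total} occurrences are paired")
```

A low share of paired occurrences in a corpus would correspond to the omission of
on the one hand described above.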
“Framing” bundles. The framing signals are used to specify limits to a certain
value (e.g. equal to one if), period (e.g. in a given year) or situation (e.g. in the
case of), as shown in Table 5.14.
Table 5.14 The bundles with “framing” function
function        CFE                           CAE
framing         equal to one if (81)          equal to one if (130)
                in a given year (75)          the extent to which (100)
                in the case of (50)           to one if the (90)
                to one if the (47)            in the presence of (58)
                for each of the (45)          value of one if (55)
                the extent to which (45)      in a given year (49)
                                              in the case of (43)
Among the shared bundles, in a given year and equal to one if (which resembles
value of one if and to one if the) are relatively uncommon. Semantically, in a
given year (concordances 30-31) refers to a certain year-long period, presuming
that some events may take place within it. Concordances 32-33 suggest that equal
to one if forms part of longer expressions used in setting out the research
methodology. Both examples involve a hypothetical condition based on previous
data and state cases in a general way. To sum up, the finance and accounting
disciplines use this kind of assumption as a way to generalize certain conditions
when explaining the research procedure.
5.2.3.3 Participant-oriented bundles
Participant-oriented bundles were observed only with “stance” features; no
bundles with “engagement” features occurred among the extracted bundles.
“Stance” bundles. The bundles performing the stance function are shown in Table
5.15. Stance bundles largely convey inference through words expressing
possibility, such as likely and probability.
Table 5.15 The bundles with “stance” function
function        CFE                           CAE
stance          are more likely to (96)       are more likely to (177)
                the probability of a (47)
                more likely to be (45)
                we focus on the (45)
Possibility, on the other hand, can also imply uncertainty. Academic authors
draw on these bundles to put forward their propositions, thus guiding readers to
stay in line with the authors. It is likely that these bundles strengthen the
confidence of an inference without imposing any claim on readers as fact.
Although the CFE contains more types of stance bundles while the CAE contains
only are more likely to, this bundle is the most frequently used of all the
identified bundles. It is inferred that accounting studies employ fewer bundle
types to show authors’ stance, yet use them with very high frequency.
“Engagement” bundles. As “engagement” bundles were not found among the
retrieved bundles, an additional test was conducted to check whether bundles of
this kind occurred at all. Two bundles with engagement attributes, taken from the
examples in Hyland (2008), were selected as targets (concordances 34-35). The
results show that such bundles exist but at a low frequency.
As participant-oriented bundles, those with “engagement” features serve to
involve readers in the research in a straightforward way, guiding them through
the textual organization that the author has set in advance. The research
articles from finance and accounting tend to avoid directing readers explicitly
to the key points, leaving them more room for their own consideration. This
result is consistent with Esfandiari and Barbary (2017), where engagement
features account for only an extremely small percentage of all bundle functions.
As noted in Zou and Hyland (2020), disciplinary communities have their own ways
of engaging readers, such as imperatives, reader pronouns, and questions, and the
majority of engagement features occur in soft-discipline texts. It is thus
inferred that these two business-type disciplines tend to deviate from the
typical soft-science ways of engaging readers.
5.3 Unique features in the ‘business-type’ corpora
5.3.1 Unique structural features
The bundles in the two business-type corpora also demonstrate distinctive
structural usages in some respects. PP-based bundles were found to occur more
frequently in the CFE, while the CAE contained more VP-based bundles. The
difference between the CFE and the CAE also proved significant in the
log-likelihood test. The following discussion presents detailed information on
the unique features of each corpus and discipline.
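For readers unfamiliar with the log-likelihood test, the sketch below shows one
common two-corpus formulation used in corpus linguistics. It is offered as an
illustration only: the exact tool and settings used in this study are not
restated here, the function name log_likelihood is an assumption, and the counts
and corpus sizes in the usage lines are hypothetical rather than taken from the
CFE or the CAE.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    freq_a, freq_b: observed frequencies of the item in corpus A and corpus B.
    size_a, size_b: total number of words in corpus A and corpus B.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical values for illustration only.
g2 = log_likelihood(freq_a=120, size_a=500_000, freq_b=60, size_b=500_000)
# 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
print(round(g2, 2), "significant at p < 0.05" if g2 > 3.84 else "not significant")
```

A value above 3.84 indicates that the difference in relative frequency between
the two corpora is unlikely to be due to chance at the 0.05 level.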
5.3.1.1 PP-based bundles
PP-based bundles in the CFE and the CAL show a broader variety, and in the
“other prepositional phrase fragment” structure the total frequency of eligible
bundles in the CFE exceeds that in the CAE (Table 5.16). The difference between
the CFE and the CAE in this structure is statistically significant.
Table 5.16 The other prepositional phrase fragment
structure      CFE                             CAE                             CAL
other          on the other hand (101)         as the dependent variable (75)  in second language acquisition (140)
prepositional  in this section we (88)         on the other hand (74)          on the other hand (137)
phrase         in a given year (75)            in the online appendix (54)     as a lingua franca (124)
               of one plus the (56)            in a given year (49)            at the same time (99)
               at the firm level (54)                                          of english as a (90)
               in the cross section (54)                                       of the variance in (65)
               at the same time (50)                                           in the current study (61)
               in this paper we (47)                                           as a foreign language (45)
               in table panel a (47)                                           in second language learning (45)
               as the difference between (43)                                  in the present study (45)
               in the corporate bond (42)                                      as a second language (41)
                                                                               in a second language (41)
The bundles with the structure “other prepositional phrase fragment” in the
CFE were found to perform four roles: transitioning (e.g. on the other hand, at
the same time), organizing the textual structure (e.g. in this section we, in
this paper we), limiting the range of discussion (e.g. in a given year, as the
difference between), and pointing to topic-related areas (e.g. in the corporate
bond).
These functions were also observed in the CAE, but with narrower coverage.
The shared structures are mainly used to introduce turning points (e.g. on the
other hand) or to help clarify topic-related information (e.g. in a given year).
The CAL contains more bundles than the other two corpora in both types and
frequency. In addition to the bundles that connect sentences, the others center
on the key word language, which is the starting point of linguistic studies. It
is therefore presumed that the bundles performing the “topic” function in the
CAL are largely PP-based. However, this assumption is limited to the CAL, since
the feature is not prominent in the CFE or the CAE.
On the whole, the authors from the three disciplines make different choices
regarding “other prepositional phrase fragment”. This may be because the
structure is able to perform diversified functions in the first place.
5.3.1.2 VP-based bundles
Among the extracted VP-based bundles, some do not fit properly into the
framework of Biber et al. (1999) unless they are placed in the “Others”
sub-category, which is a rather general grouping. These exceptional bundles
were therefore assigned to two independent sub-categories proposed in this
paper, as shown in Table 5.17 and Table 5.18.
Bundles such as scaled by total assets, clustered by firm and, defined in
appendix b, and divided by total assets are similar in structure: each begins
with a past participle followed by a prepositional phrase. Concordances 36-39
show that these fragments function grammatically either as post-modifiers or as
parts of passive constructions.
Table 5.17 The verb (past participle) + prepositional phrase fragment
pattern                                   CFE                                 CAE
passive verb + prepositional phrase       is defined as the (58)              variables are defined in (74)
fragment                                  are clustered at the (52)           is defined as the (64)
                                          is associated with a (50)           are defined in appendix (62)
                                          standard errors are clustered (50)  all variables are defined (45)
#verb (past participle) + prepositional   /                                   scaled by total assets (55)
phrase fragment                                                               clustered by firm and (42)
                                                                              defined in appendix b (41)
                                                                              divided by total assets (41)
As can be seen in Table 5.17, the “#verb (past participle) + prepositional
phrase fragment” sub-category is treated as a supplement to “passive verb +
prepositional phrase fragment”. This structure was not found in the CFE but
accounts for 4 bundle types in the CAE. Each corpus has 4 bundle types with the
“passive verb + prepositional phrase fragment” structure. The passive structure
within the bundles is thus notable in both corpora, with the past participle
either making the description more concise or de-emphasizing the subject so as
to sound objective.
Table 5.18 The (pronoun/noun phrase) verb phrase (non-be) + noun phrase fragment
pattern CFE CAE
pronoun/noun phrase + be (+ ...) the dependent variable is (90) the dependent variable is (72)
the sample period is (59)
dependent variable is the (50)
#(pronoun/noun phrase) verb phrase this table reports the (47) takes the value of (58)
(non-be) + noun phrase panel b shows the (45) this table reports the (55)
we focus on the (45) indicate significance at the (48)
and indicate significance at (42) table presents the results (48)
indicate significance at the (42) that takes the value (48)
panel a reports the (41) reports the results of (43)
table reports the results (43)
we do not find (41)
The two corpora contain another structure that is not specified in Biber et
al. (ibid.), as shown in Table 5.18. Lexical bundles such as this table reports
the and indicate significance at the form the most frequent pattern among
VP-based bundles. This structure derives from the “verb + noun” pattern, which
uses the active voice in a “subject + verb + object” format. The pattern is
categorized in Cortes (2004) as “(noun phrase) + verb + (complement)”, reflecting
the style of academic prose. These bundles are mainly used to report results,
with reporting verbs such as report, show, present, indicate, and find. Across
the two corpora, only two bundles take the first person plural we as the subject,
while the others tend to use table or panel to point to the results.
Considering that the highly frequent VP-based bundles fall within the frame
“#(pronoun/noun phrase) verb phrase (non-be) + noun phrase”, one possible
hypothesis is that finance and accounting research articles tend to use
relatively straightforward language despite some academic hedging. Another
possibility is that academic authors attempt to convey the message accurately
and concisely, so as to distinguish scientific writing from everyday wording.
5.3.2 Unique functional features
The unique functional features were identified by comparing the types and
normalized frequency of the extracted bundles, and they appear mainly in three
sub-functions: “description”, “resultative”, and “structuring”. The following
sections discuss in turn how these functions are achieved through bundles.
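Normalized frequency makes counts from corpora of different sizes comparable.
The sketch below assumes a per-million-words basis, which is one common
convention; the basis actually used in this study is not restated here. The
function name per_million and both corpus sizes are hypothetical, while the raw
counts for are more likely to are taken from Table 5.15.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical corpus sizes, for illustration only.
HYPOTHETICAL_CFE_SIZE = 600_000
HYPOTHETICAL_CAE_SIZE = 500_000

# Raw counts of "are more likely to" as listed in Table 5.15.
cfe_norm = per_million(96, HYPOTHETICAL_CFE_SIZE)
cae_norm = per_million(177, HYPOTHETICAL_CAE_SIZE)
print(cfe_norm, cae_norm)
```

With real corpus sizes in place of the hypothetical ones, the same conversion
applies to every bundle frequency reported in the tables.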
Table 5.19 The bundles with “description” function
function        CFE                                     CAE
description     the cross section of (113)              the effect of the (104)
                the difference between the (93)         that the effect of (62)
                an increase in the (72)                 the quality of the (58)
                the size of the (69)
                the change in the (59)
                are clustered at the (52)
                the sum of the (52)
                cross sectional relation between (45)
                as the difference between (43)
“Description” bundles. As shown in Table 5.19, not a single bundle with the
“description” function is shared by the CFE and the CAE. Besides, the highly
frequent bundles are used in different ways in the two corpora. The descriptions
touch upon a variety of meanings in the CFE, while the CAE covers only two
themes, “effect” and “quality”. However, as mentioned earlier, only the primary
function of each bundle has been recorded. A possible inference is therefore
that the CAE makes use of more bundles whose primary function lies elsewhere,
such as “quantification” (see Table 5.11), given the high frequency of that
function and its structural similarity (mainly NP-based bundles) to the
“description” function.
“Resultative” bundles. Text-oriented expressions serve to organize the structure
of the research procedure, providing logical connections between activities.
Resultative expressions, as the name implies, concern the relation between
methods and results. The bundles with the “resultative” function are presented
in Table 5.20.
Table 5.20 The bundles with “resultative” function
function        CFE                           CAE
resultative     we find that the (94)         we find that the (98)
                results are robust to (56)    the results in table (51)
                the results of the (52)       table presents the results (48)
                the results in table (45)     we also find that (46)
                                              results are consistent with (45)
                                              results are robust to (45)
                                              reports the results of (43)
                                              table reports the results (43)
                                              we do not find (41)
It can be seen from the table that one third of the bundles in the CAE begin
with the first person subject we and the predicate verb find, displaying a strong
stance feature (Type 1). Another third take report or present as the predicate,
both of which are commonly used as reporting verbs (Type 2). The remaining
bundles place results or the results at the head of the bundle, highlighting
the resultative topic (Type 3). Examples of these three sub-types are shown in
concordances 40-42.
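The three sub-types can be expressed as a simple rule-based classification. The
sketch below is an illustration rather than the coding procedure used in this
study: the function resultative_type and its rules are assumptions that
approximate the descriptions of Types 1 to 3, and the bundle lists are copied
from Table 5.20 with frequencies omitted.

```python
from collections import Counter

# Bundles copied from Table 5.20 (frequencies omitted).
cae_resultative = [
    "we find that the", "the results in table", "table presents the results",
    "we also find that", "results are consistent with", "results are robust to",
    "reports the results of", "table reports the results", "we do not find",
]
cfe_resultative = [
    "we find that the", "results are robust to",
    "the results of the", "the results in table",
]

REPORTING_VERBS = {"report", "reports", "present", "presents"}


def resultative_type(bundle):
    """Assign a resultative bundle to Type 1, 2, or 3 (illustrative rules)."""
    words = bundle.split()
    if words[0] == "we" and "find" in words:
        return 1  # first person subject we + predicate find
    if any(word in REPORTING_VERBS for word in words):
        return 2  # report or present as the predicate
    if "results" in words:
        return 3  # results or the results foregrounded
    return None


cae_types = Counter(resultative_type(b) for b in cae_resultative)
cfe_types = Counter(resultative_type(b) for b in cfe_resultative)
print("CAE:", dict(cae_types))
print("CFE:", dict(cfe_types))
```

Applied to the lists in Table 5.20, the rules place three CAE bundles in each
type, and three of the four CFE bundles in Type 3.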
By contrast, three out of four resultative bundles in the CFE belong to the
third type identified in the CAE. Longer word combinations, namely the 5-word
bundles present the results in table and report the results in table, were also
found in the CFE; these correspond to the second type in the CAE, where the
same longer strings appear as well. It is thus implied that research articles
from the finance discipline tend to focus on the word results in reporting
and describing the findings, while those from the accounting discipline use
more forms of bundles in presenting the results.
“Structuring” bundles. The text-oriented sub-category “structuring” also
reveals differences in bundle types across the two corpora. As shown in Table
5.21, the CAE contains only three bundle types, fewer than the CFE.
Table 5.21 The bundles with “structuring” function
function        CFE                                 CAE
structuring     panel a of table (110)              panel a of table (88)
                panel b of table (97)               panel b of table (71)
                in this section we (88)             this table reports the (55)
                in parentheses and indicate (63)
                in panel b of (56)
                in the cross section (54)
                in panel a of (47)
                in this paper we (47)
                panel b shows the (45)
                panels a and b (42)
The structuring bundles serve two major purposes. One is to arrange the
overall layout, as in in this section we and in this paper we. The other is to
give directions, building a bridge between tables, panels, and the written text,
as in panel a of table and panels a and b. The first purpose can be considered a
general use of bundles in academic articles: such bundles are normally not
affected by disciplinary variation and might instead be regarded as candidates
for general bundles. The second, on the other hand, reflects discipline-specific
features: provided that the target articles contain tables reporting data
models, the accompanying narration can be expressed through these bundles.
Although the CFE comprises more types of structuring bundles, over half of
them are concerned with one theme, namely the panel data. It may therefore be
inferred that the CFE demonstrates a wider variety of word combinations for
describing data models. Meanwhile, the CAE also takes panel data description as
a major part of the “structuring” function, but with less diversified bundles.
From the above analysis, it is inferred that the unique functional features of
bundles are mostly connected with discipline-specific topics.
To conclude, the two business-type corpora have both shared and unique
characteristics, and the unique features can be revealed through a lexical
bundle approach. Variations in bundle type and frequency serve as important
clues for identifying the distinguishing features of language choices in
different disciplinary communities.