What is corpus linguistics, and why is it relevant to education?

Illustration of keywords in context from the British National Corpus at English-corpora.org

Teachers and students often have questions about how language is used. 

At one level, they might want to know about particular words: What’s the difference between “absolutely”, “completely”, and “totally”? Should I put “however” at the beginning, middle, or end of a sentence? Or they might want to know how best to shape their language to the expectations of particular situations: How is the language I need for a school presentation different from the language I need to talk to tourists? When do academics use “I” in their writing, and when do they avoid it?

At a broader level, they might have questions related to planning a course of study: What are the most important words to learn if I’m studying to be a doctor? Are the language examples in this textbook really idiomatic? What vocabulary/grammatical forms can I expect a typical student to be using by Year 9?

While the intuitions of a proficient speaker might give us some clues, these can often be misleading. Language users tend to remember points that are salient or that are emphasized in traditional teaching materials, and these may not reflect the facts of language use. As a teacher of English, I had always assumed that the central use of “like” was as a verb (“I like football”, etc.), but it turns out that its use as a preposition (“people like you”) is over three times more frequent (Davies, 2018). Intuitions can also overlook key facts about language use. One example is the tendency for words like “cause” to be used to describe unpleasant events: we are likely to “cause problems”, “cause damage”, and “cause pain”, but we rarely “cause solutions”, “cause remedies”, or “cause comfort”.

Intuitions tend to draw a complete blank when it comes to identifying broader patterns of language variation: Is the present perfect tense more frequent in conversation or in academic writing? Which are the most frequent reporting verbs in Physics writing, and how does this compare to Philosophy? How frequently do Year 9 students use relative clauses, and how does this compare to adults? (For answers to these questions, see Biber et al., 2021; Hyland, 1999; Durrant & Brenchley, 2023.)

Corpus linguistics is an empirical approach to language study that enables us to answer questions like these. It involves collecting large samples (typically millions, sometimes billions, of words) of language use that are intended to be representative of a particular domain of use, in much the same way that an opinion poll is intended to be representative of a population of voters. These samples are stored on a computer, and a range of tools is used to learn about language use. We can identify which forms are most frequent in a particular context or amongst particular groups of users; we can compare frequencies with those of other contexts or users; and we can view hundreds of contextualized examples to understand how forms are used and to identify the other linguistic features with which they co-occur.
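To make these operations a little more concrete, here is a minimal sketch in Python of three of them: a normalized frequency count, a keyword-in-context display, and a count of the words that follow a search word. Everything in it is an assumption made for illustration: the two tiny "corpora" are invented sentences rather than real data, the function names are hypothetical, and the tokenizer is deliberately crude. Real corpus tools work on millions of words and use far more careful text processing.

```python
import re
from collections import Counter

# Invented toy texts standing in for two corpora (illustration only).
CONVERSATION = (
    "I have seen that film. It did cause problems for us. "
    "People like you cause trouble. I like football."
)
ACADEMIC = (
    "Smoking may cause damage to the lungs. "
    "The results suggest that stress can cause pain."
)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(tokens, word):
    """Frequency of `word`, normalized per million tokens so that
    corpora of different sizes can be compared."""
    return tokens.count(word) / len(tokens) * 1_000_000


def kwic(tokens, word, width=3):
    """Keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def right_collocates(tokens, word):
    """Count the tokens that immediately follow `word`."""
    return Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == word
    )


if __name__ == "__main__":
    conv, acad = tokenize(CONVERSATION), tokenize(ACADEMIC)
    print("'cause' per million, conversation:", round(per_million(conv, "cause")))
    print("'cause' per million, academic:", round(per_million(acad, "cause")))
    for line in kwic(conv + acad, "cause", width=2):
        print(line)
    print(right_collocates(conv + acad, "cause").most_common())
```

Normalizing to a rate per million words is what allows a comparison between samples of different sizes. With samples this small the numbers mean nothing, of course; the point is only to show the shape of the calculation.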

In the MULTIWRITE project, we are harnessing the tools of corpus linguistics to study writing in Norwegian, English, Spanish, French and German by children in Norwegian secondary schools. We want to find out how each of these languages develops as children progress and how they influence each other. Data from the project will give us empirically based understandings to support teachers’ work across these different languages.

Sources:

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (2021). Grammar of spoken and written English. Amsterdam: John Benjamins Publishing Company.

Davies, M. (2018). Word frequency data. Retrieved November 2012 from https://www.wordfrequency.info.

Durrant, P., & Brenchley, M. (2023). Development of noun phrase complexity across genres in children’s writing. Applied Linguistics, 44(2), 239-264.

Hyland, K. (1999). Academic attribution: Citation and the construction of disciplinary knowledge. Applied Linguistics, 20(3), 341-367.
 

By Philip Durrant
Published 29 May 2024 17:54 - Last modified 29 May 2024 17:56

About the blog

Welcome to the Multiwrite blog! This blog belongs to the project “MULTIWRITE – connections between first, second and third languages”. Here we write about multilingualism, language learning and language teaching, and comment on findings we make along the way in the project.