Writing a Paper in English Corpus Linguistics

By Stig Johansson and Hilde Hasselgård [1]


1 Purpose

A term paper ('semesteroppgave') is an exercise in linguistic method. The object is to show that you can:

2 Choosing a topic

Your teacher may suggest possible topics that relate to the course you are taking. You can also propose a topic yourself or expand on one of the tasks you have been given during the course. Another possibility is to carry out a study similar to one that you have read about in the course syllabus. It is important to be realistic and limit the topic, so that you can finish your paper within a reasonable period of time and write it up within the stated 10 standard pages. Contact your teacher at an early stage to draw up a plan for your paper. The topic will often have to be restricted and modified as you go along.

In choosing your topic and defining your research question, it may be useful to ask yourself the following questions:

3 Sources

Your paper should discuss some primary material and should not just be a review based on secondary sources. Primary material is the actual linguistic data you write about, such as written texts, transcriptions of spoken material, tape recordings, or elicited responses from native speakers. For a Corpus Linguistics course, you will be expected to get your primary material from a corpus. Secondary material is what has been written previously on the topic.

3.1 Secondary material
Start by making a general survey of secondary material. This will show you what has been done before (which means that you do not have to do it) and will probably give you ideas on how you should (or should not) deal with your primary material, on problems that you had not thought of, etc. You can find secondary material in a number of places, such as:

In connection with a term paper there is of course a limit to the amount of secondary sources you can be expected to go through. Consult your teacher, who may advise you on this point.

In going through the secondary material, you should make notes and collect excerpts as you go along. It is important to organize your notes in such a way that you can survey (and rearrange) them while you are working on your paper. Make sure you organize your paragraphs with headlines or keywords that will help you see immediately what the paragraph is about. Make sure that you copy the excerpts correctly and write the name of the source and the page number or URL immediately. This will save you a lot of extra work and trouble later on. Getting the facts right is fundamental in scholarly procedure.

3.2 Primary material
Although it is important to study the secondary material carefully, your main task is to collect and analyse some primary material. You may draw your primary material from corpora such as the LOB Corpus, the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the English-Norwegian Parallel Corpus (ENPC), the Oslo Multilingual Corpus (OMC), the International Corpus of English (ICE), or the International Corpus of Learner English (ICLE, including the Norwegian subcorpus NICLE). Your choice of corpus will depend on your research question. If you are interested in grammar, a one-million-word corpus such as LOB or ICE-GB will often be sufficient. For lexical phenomena beyond the core vocabulary you will need a much larger corpus, such as the BNC or COCA. If you want to study translation or compare languages, you will find a good source in the ENPC or the OMC. Aspects of learner English can be studied, for example, on the basis of ICLE or VESPA.

To use a corpus, you need a search tool. Many corpora can be searched by means of a general concordancer such as AntConc or WordSmith Tools (see Scott 2001). Some corpora, for example the BNC, ICE-GB (the British part of ICE), and the ENPC, have their own search engines, as do the corpora found at https://www.english-corpora.org/.

It is often useful to make a pilot study first, i.e. to collect a small amount of material and analyse it along the lines intended. This will show you if your material is giving you enough data, and the right kind of data, or whether you may have to use different kinds of material.

Be careful in registering your material. Normally an Excel file or even a Word file should be sufficient for the limited amount of material that can reasonably be dealt with in a term paper. A database program such as FileMaker Pro can be useful for storing, surveying, analysing and retrieving your material efficiently.

Whichever system you use, make sure to copy all examples correctly (and/or save them in a useful format) and to include enough context. Also record the precise source at once, to save time and trouble later on. You may want to look up the example in the corpus again (for example, if you need to look at the wider context), and you should specify the source of corpus examples when you quote them in your paper.
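As an illustration of this kind of record keeping, here is a minimal sketch in Python that stores each example together with its source and a note, and writes the records in CSV format (which Excel can open). The class name `CorpusExample`, the field names, and the function `write_examples` are assumptions made for this sketch. The two examples and their LOB references are the ones quoted in section 4.5.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class CorpusExample:
    """One example from the primary material (hypothetical record layout)."""
    corpus: str     # which corpus the example comes from
    reference: str  # text and line reference within the corpus
    example: str    # the example itself, with enough context
    note: str = ""  # e.g. a remark on why the example is doubtful


def write_examples(examples, handle):
    """Write the examples as CSV rows, one column per field."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(CorpusExample)])
    writer.writeheader()
    for ex in examples:
        writer.writerow(asdict(ex))


examples = [
    CorpusExample("LOB", "L11:73", "The barman put two glasses down on the counter."),
    CorpusExample("LOB", "A16:178",
                  "A final dividend of 10 p.c. brings the total distribution up to 17.5 p.c. ...",
                  note="particle followed by a prepositional phrase"),
]

# An in-memory buffer is used here; an opened file would work the same way.
buffer = io.StringIO()
write_examples(examples, buffer)
print(buffer.getvalue())
```

Keeping the reference in its own column means that every example can be traced back to the corpus later without retyping anything.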

4 The investigation process

We can broadly distinguish between the following stages:

4.1 The research question
Suppose you are interested in studying the position of the direct object with phrasal verbs; i.e. do you say she switched the light on or she switched on the light? The secondary sources will tell you that a pronoun as direct object is placed between the verb and the particle (e.g. she switched it on), while a noun or noun-headed phrase may appear either before or after the particle. You decide to focus on the word order problem in the latter case. The working title of your term paper could be "What factors determine the placement of nouns and noun phrases as direct objects of phrasal verbs?".

4.2 Data collection
To start your investigation you first need to retrieve relevant examples from a corpus. The examples may be extracted in the form of concordance lines, sentences, or other units, depending on your research question. Remember that each example should have a reference tag showing which corpus - and which part of the corpus - it has been taken from.

In collecting the material you immediately meet a number of problems. It is not always easy to distinguish phrasal verbs (e.g. she put on her coat) from other superficially similar constructions, e.g. constructions with prepositional verbs (e.g. she called on Mr. X to speak) or with prepositional phrases as adverbials (e.g. she called on Monday). And what about examples like these (from the LOB Corpus): Dr Horn swayed two or three inches back, Ugo had his glasses off now, I own some land up in the foothills? The first example looks superficially like a construction with a direct object, but should no doubt be analysed as containing an adverbial. In the last example up clearly goes with the following prepositional phrase rather than with own. The second example is a real problem. Do we recognize have off as a phrasal verb?

Collecting the primary material is not easy. It is necessary to be alert, so as not to miss out relevant examples or include irrelevant ones. If you have studied the secondary material carefully, you will have a good idea about what to look for and how to distinguish relevant from irrelevant examples. Nevertheless, there will always be doubtful examples. Make sure to include these, with a note on the type of problem. Such material is usually important to discuss in your paper.

As you go on you may discover that it is necessary to limit the material a great deal. For example, in a study of phrasal verbs, it may be necessary to limit the search to certain verbs with different particles, or to certain particles with different verbs. A dictionary of phrasal verbs and information in previous studies may guide you in the process.

In a study of phrasal verbs, Johansson (1986) decided to focus on all examples in the LOB Corpus of six lexical verbs co-occurring with one of ten particles and a noun or noun-headed phrase as direct object; see Table 1. To find these, it was necessary to search for all forms of the relevant verbs followed within a certain span by one of the ten particles. Such searches can easily be done using a program like WordSmith Tools.
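The idea of searching for verb forms followed within a span by a particle can be sketched in a few lines of Python. This is only an illustration of the principle, not the procedure used in the 1986 study. The verbs, verb forms, and particles below are assumptions chosen for the example (they are not the six verbs and ten particles of Table 1), as are the names `VERB_FORMS`, `PARTICLES`, `MAX_SPAN`, and `find_candidates`.

```python
import re

# Illustrative lists only (assumed for this sketch).
VERB_FORMS = {
    "bring": {"bring", "brings", "bringing", "brought"},
    "put": {"put", "puts", "putting"},
}
PARTICLES = {"back", "down", "up", "on", "off"}
MAX_SPAN = 5  # maximum number of words allowed between verb and particle


def find_candidates(sentence, max_span=MAX_SPAN):
    """Return (lemma, particle, words_in_between) for every verb form
    that is followed by a particle within the given span."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    form_to_lemma = {form: lemma
                     for lemma, forms in VERB_FORMS.items()
                     for form in forms}
    hits = []
    for i, token in enumerate(tokens):
        lemma = form_to_lemma.get(token)
        if lemma is None:
            continue
        # Look at the next max_span + 1 words after the verb.
        window = tokens[i + 1 : i + 2 + max_span]
        for words_in_between, candidate in enumerate(window):
            if candidate in PARTICLES:
                hits.append((lemma, candidate, words_in_between))
    return hits


hits = find_candidates("The barman put two glasses down on the counter.")
print(hits)
```

For this sentence the search returns both put ... down and put ... on. The second hit is irrelevant, since on is a preposition here. An automatic search of this kind only yields candidates, which still have to be checked by hand.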

After irrelevant instances had been discarded, the material was as presented in Table 1. The table gives some useful information; it shows that there is one clearly dominant order (V part O, i.e. verb + particle + direct object). But this is only the starting-point of the analysis.

4.3 Classification of the material
The next logical step in the investigation process is the classification of the material. It is natural to start thinking about the classification while the material is being collected. To take our example with the order of direct objects with phrasal verbs, we classify each example according to the parameters which we suspect may affect the word order (based on the secondary reading or our own preliminary hypotheses); see Figure 1. Here either a spreadsheet or a database program (such as FileMaker) will be an invaluable tool, because you will be able to count, sort, and resort your examples along a number of variables. Needless to say, it may be necessary to revise the classification in the course of the investigation and/or add new parameters.
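Counting and re-sorting classified examples can also be done with a short script. The following is a minimal sketch. The six rows are invented for illustration and are not taken from any corpus, and the names `rows`, `length_band`, and `cross_tabulate` are assumptions. The division into objects of one to two words and objects of three or more words follows the discussion in section 4.5.

```python
from collections import Counter

# Invented rows for illustration only; real rows would come from your corpus.
rows = [
    {"order": "V part O", "object_length": 4, "definite": True,  "meaning": "figurative"},
    {"order": "V part O", "object_length": 3, "definite": False, "meaning": "figurative"},
    {"order": "V part O", "object_length": 2, "definite": True,  "meaning": "literal"},
    {"order": "V O part", "object_length": 1, "definite": True,  "meaning": "literal"},
    {"order": "V O part", "object_length": 2, "definite": False, "meaning": "literal"},
    {"order": "V part O", "object_length": 5, "definite": True,  "meaning": "literal"},
]


def length_band(n_words):
    """Group object length into two bands."""
    return "1-2 words" if n_words <= 2 else "3+ words"


def cross_tabulate(rows, row_of, col_of):
    """Count how many examples fall into each (row, column) cell."""
    return Counter((row_of(r), col_of(r)) for r in rows)


by_length = cross_tabulate(rows,
                           lambda r: r["order"],
                           lambda r: length_band(r["object_length"]))
by_meaning = cross_tabulate(rows,
                            lambda r: r["order"],
                            lambda r: r["meaning"])

# Re-sort the examples, first by word order and then by object length.
resorted = sorted(rows, key=lambda r: (r["order"], r["object_length"]))

for (order, band), count in sorted(by_length.items()):
    print(f"{order:10} {band:10} {count}")
```

If a new parameter turns out to be relevant, it can be added as a further key in each row and cross-tabulated in the same way.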

4.4 The analysis
Having classified the examples, we can rearrange and analyse the material in various ways. In a quantitative study it is usually best to draw up tables first, and then go on to describing and commenting on the data. Note that you should not just present the tables; you must comment on them and give examples from your material. Make sure that each table is numbered and has a legend that says what the table is supposed to show and explains any abbreviations or codes. Be careful in using numbers. Do not give percentages without presenting the raw frequencies. If you compare frequencies from two different corpora, include normalized frequencies. If you give averages, also provide some measure of dispersion. In addition to tables, you may wish to present figures or diagrams. These too should have a legend and adequate explanation.
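The points about numbers can be illustrated with a small sketch in Python. All counts below are hypothetical. The corpus sizes are round figures (one million words for LOB, and roughly 100 million words for the BNC, which is an approximation assumed here). The function names `per_million` and `count_with_percentage` are likewise assumptions.

```python
import statistics


def per_million(raw_count, corpus_size):
    """Normalized frequency: occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def count_with_percentage(count, total):
    """Present the raw frequency together with the percentage."""
    return f"{count} ({100 * count / total:.1f}%)"


# Hypothetical raw frequencies of the same item in two corpora.
lob_frequency = per_million(150, 1_000_000)
bnc_frequency = per_million(9_000, 100_000_000)
print(lob_frequency, bnc_frequency)

# A table cell showing both the raw count and the percentage.
cell = count_with_percentage(20, 80)
print(cell)

# An average accompanied by a measure of dispersion (sample standard deviation).
object_lengths = [1, 2, 2, 3, 7]  # hypothetical lengths in words
mean_length = statistics.mean(object_lengths)
sd_length = statistics.stdev(object_lengths)
print(mean_length, round(sd_length, 2))
```

In this invented case the raw count is much higher in the larger corpus, but the normalized frequency is higher in the smaller one. This is why raw figures from corpora of different sizes cannot be compared directly.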

4.5 Discussing the findings
To continue with our example case, see Tables 2-5. These show that the less common order (V O part, with the direct object between the verb and the particle) is more frequent with a short direct object, a definite form as direct object, and with a literal meaning of the verb plus particle combination. We can now deal with each parameter in turn, illustrating the main tendencies and commenting on any deviations from the main tendencies. We find, for example, that the four examples in Table 2 of V O part with a direct object consisting of three or more words are all definite and that the particle in three of the examples is followed by a prepositional phrase indicating direction, as in:

(1) ... bring these opulent days back to life ... (LOB F35:68)

With respect to the seven examples in Table 3 of V O part with indefinite noun phrases, we find that the direct object is short (one to two words) and in several cases followed by a prepositional phrase indicating direction, as in:

(2) The barman put two glasses down on the counter. (LOB L11:73)

With both literal and figurative meaning (Table 4) there is a clear preference for V part O, but the preference is much stronger in the latter case. The examples which deviate from the main pattern generally contain a particle followed by adverbial specification in the form of a prepositional phrase, as in:

(3) A final dividend of 10 p.c. brings the total distribution up to 17.5 p.c. ... (LOB A16:178)

Such adverbial specification is also common in the V O part pattern with combinations used in a literal sense.

Analysing the material means asking interesting questions about it. In our case we found three parameters which turned out to be important: the length of the direct object, the structure of the object NP (including definiteness), and literal vs. figurative meaning of the verb + particle combination. The minority pattern V O part is used particularly with short and definite noun phrases as direct object and when the verb + particle combination has a literal meaning. In the process of the analysis we discovered another important factor: the occurrence of adverbial specification after the particle in the form of a prepositional phrase. This strengthens the minority pattern V O part.

The results may not always be clear-cut; see Table 5. There seems to be a slight tendency for the minority pattern V O part to be more frequent in fiction. In such cases we need to bring in statistical tests. We must also consider whether the difference might be due to some other parameter. Could it, for example, be a reflection of the length of the direct objects? We could expect noun phrases as direct objects, like noun phrases in general, to be less complex in fiction than in informative prose. If there really is a difference between fiction and informative prose which is not due to some other parameter, what might it be due to? These are the sorts of questions that must be asked.
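One statistical test that is often used for frequency tables of this kind is the chi-square test of independence. The sketch below computes it by hand for a 2 x 2 table. It is one possible choice of test, not necessarily the one best suited to your data. The counts are hypothetical and are not the figures of Table 5, and the function name `chi_square_2x2` is an assumption.

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table of observed frequencies.

    Expected frequency of a cell = row total * column total / grand total.
    """
    (a, b), (c, d) = table
    grand_total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = text category, columns = word order.
#                 V part O   V O part
observed = [
    [60, 20],   # fiction
    [90, 10],   # informative prose
]

chi2 = chi_square_2x2(observed)

# Critical value for 1 degree of freedom at the 0.05 level.
CRITICAL_VALUE = 3.841
significant = chi2 > CRITICAL_VALUE
print(round(chi2, 2), significant)
```

A value above the critical value suggests that the difference between the two text categories is unlikely to be due to chance alone. It does not show what causes the difference, so the questions about other parameters still have to be asked. The test is also unreliable when expected frequencies are very low (a common rule of thumb is that they should be at least 5).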

If our results are to be of real value, we should try to generalize beyond our data and find deeper explanations for the regularities observed. Do we have reason to believe that the regularities we have noted extend to phrasal verbs in general (and do not just apply to the six verbs and the ten particles we selected for our material)? Can we find a deeper explanation for the regularities observed? In our case we can relate two of our parameters to two general word order principles: end weight (cf. Table 2) and end focus (cf. Table 3; indefinite noun phrases are more likely to introduce new information and appear in final focus position). It seems reasonable to suppose that a figurative combination (cf. Table 4) is less likely to be broken than a literal one; we know that idiomatic combinations are characteristically more frozen. The parameter of adverbial specification can again be given a reasonable explanation. Note that there is a tendency for adverbial particles to be attracted to related prepositional phrases; the result may even be a compound preposition: into, up to, out of, etc. An indirect result of this attraction is that we get the minority pattern V O part, with the particle plus prepositional phrase in final focus position.

In other words, we have answered the question posed at the start of the study and have been able to relate our findings to other phenomena in the language. We have reached the final stage of the investigation process and can finish the writing of the paper.

5 Some characteristics of descriptive linguistics

To a linguist (possibly as opposed to a historian or a literary scholar), describing the language of a geographical area, a period, a genre, or a socially defined group of people, is something that is well worth doing. In fact, it is exactly what descriptive linguistics is about: describing the language as it is actually used, and not as certain people think that it should be used. A description of somebody's language therefore should not normally contain prescriptive comments. (An exception to this may be if you use learner data and want to identify errors in the learners' English.)

Even so, a linguistic description is not simply a presentation or summary of some material. Judgments have to be made at each stage of the investigation, and these should be reflected in the written paper. What sort of a problem are you dealing with? What material might be appropriate to use as a basis of the investigation? How should it be delimited? What method of analysis is appropriate? Example material may require interpretation. What patterns can you see? What do they mean? What is the effect of this or that construction? The results of the analysis must also be interpreted. How do they fit in with previous knowledge about the language? And finally, to what extent have you been able to answer your research question?

In order to describe and analyse linguistic features you need precise categories. You must also use the terminology correctly and consistently. Your secondary reading will provide guidance on terminology. Note also that there are dictionaries of linguistic terms (such as Crystal 2008, Matthews 2014). There are cases in which scholars disagree about terminology (unfortunately this is quite common in the humanities!). Then it is an advantage if you know the criteria by which a category is defined. There are also occasions when you cannot rely on ready-made categories; sometimes you may have to define your own, for instance if the categories found in your reading are inadequate for your particular research purpose. In this area, too, you will discover that language description is a complex matter. There are many instances where linguistic categories are less than clear. For example, where do you draw the line between adjective and verb with forms ending in -ed and -ing (e.g. He was baffled. The question is intriguing)? How do you delimit phrasal verbs from related constructions? How do you decide what a discourse marker means? Distinctions are certainly not always clear-cut. Sometimes we need to talk about 'more or less' rather than 'either or'; that is, the difference between categories is scalar rather than absolute. In sum, a description of language always involves a good element of analysis and interpretation.

6 Writing strategies

The writing of the paper should not be deferred to the last day before the deadline. It is natural to write notes while working on the material, such as comments on examples or notes or ideas that you think may become useful later. You might even have a brainstorming session at an early stage when you write down in telegraphic form whatever you think is relevant to the topic.

At a fairly early stage it is useful to write a brief outline of your paper, organized according to some major headings (very often the ones listed below will be useful) and with some notes under each heading.

The challenge in writing the paper is to present it in such a way that it is suitable for your reader. What do you need to explain? What can be taken for granted? It is often natural to skip the introduction and write the main body of the paper first. The first draft is ... a draft. Make sure that there is a logical progression in your paper, without gaps in the argument. Ask your teacher or a fellow student to read your draft. And be prepared to re-write!

There are at least two ways of presenting your investigation (and they can be combined):

7 Supporting your statements

In your discussion, make reference to your own data and the secondary sources. When you quote, include as much as is necessary, but no more. If you need to abbreviate an example or a quotation from a secondary source, insert three dots (indicating ellipsis). Editorial comments can be added within square brackets. When you quote or paraphrase from somebody else's work, always give a reference to your source. In giving examples from your primary material, it is useful to number the examples (as above). This makes it easier to refer to them in your discussion. Each example should have an identification of the source. Look at the articles on your syllabus to see how this is done.

Distinguish between safe conclusions clearly validated by the data, and uncertain ones, for which you have inconclusive or incomplete evidence. Do not conceal data which may be difficult to account for. On the contrary, such material may require special comment.

8 Organizing the paper

Organize your paper into sections, with headings. This makes it easier to follow the steps in your investigation. It is often useful to number the sections (as in this paper).

8.1 Before the main text
On the title page, write the title of the paper, your candidate number (which you get from StudentWeb), the date (term), and the course your term paper relates to. Make sure the title of the paper reflects its aim and scope. If your paper is long, you may include a Table of contents. If you use a lot of abbreviations (apart from conventional ones), you should include a list of abbreviations. If your paper includes many tables and figures, include a list of these.

8.2 The main text
The main text can often be organized as follows (needless to say, the organization may vary depending upon the type of topic):

Throughout the paper, give references to relevant secondary sources.

8.3 Around and after the main text

9 Some writing/formatting conventions
Writing about linguistic items: Use italic type to mark out a word or phrase that you are discussing and which is not an integrated part of the running text. Examples: (1) The most common word in the English language is the. (2) The word foot enters into a lot of more or less fixed phrases such as foot the bill, foot-and-mouth disease, put one's foot in it and set foot in.

Quoting examples: Use normal type for the example. However, you may want to add emphasis (for instance by using italics) for the part of the example that is most relevant to the discussion. Examples should be numbered for ease of reference. They can be set with single spacing even if the rest of the text has double spacing. See further advice on how to quote examples.

Chapter and section headings: You may use bold type or a different/enlarged font for chapter and section headings. Normally chapter headings are more prominent than section headings. In a term paper it will often not be necessary to subdivide sections (e.g. 2.1, 2.2, 2.3), but in an MA thesis this is often done, just as in linguistics books. Do not go overboard with your decimals; as a general rule, try to avoid using more than two or three sublevels (so 2.1.1 and possibly 2.1.1.2 are all right, but not 2.1.2.5.2).

10 Language use

Before you hand in your paper, make sure that it is free of errors in language (grammar, vocabulary, spelling). Check that pronoun references are clear: do not overuse this and that in referring to the preceding text. Make sure the verbs agree with their subjects. Avoid sentence fragments that lack a subject and a verb. On the other hand, do not use long run-on sentences, with main clauses loosely strung together. If you can replace a comma by a full stop, do so. Be careful with paragraphing. This will contribute greatly to the clarity and readability of your paper.

The genre of an academic paper involves a fairly formal style. Avoid contractions (she's, aren't, etc.). Reduce reference to yourself to a minimum. Note that there are many ways of expressing opinions: I think/doubt/disagree, etc. may be regarded as too informal for an academic paper. Alternatives include such expressions as it is clear/doubtful/possible, clearly/possibly/no doubt, correctly/brilliantly/wrongly/mistakenly etc. Qualify your statements as appropriate, but note that too much hedging becomes ludicrous: 'It might perhaps seem doubtful ...'. Say what you think is true, but no more and no less. It may be a good idea to collect useful and/or elegant turns of phrase from books and articles about linguistics.

Finally, to avoid boring your reader you should try to vary your language (but not at the expense of clarity!). A synonym dictionary or a thesaurus may give you ideas for alternative ways of expression. The English language, the subject of your paper, is a rich and flexible instrument, and it is great if this shows in your paper.


Note

  1. This paper is based on Stig Johansson's manuscript "Writing a Term Paper in English Linguistics: Some Hints for Mellomfag and Hovedfag Students" (unpublished, University of Oslo, 1997). Parts of Johansson's paper were based on Altenberg et al. (1980) and on material provided by Kay Wikberg. Extensive revisions to the original documents have been (and are being) made by Hilde Hasselgård.


References

Altenberg, Bengt, Jan Svartvik, and Gunnel Tottie. 1980. How to write a term paper in English linguistics. Department of English, Lund University. Unpublished.

Crystal, David. 2008. A Dictionary of Linguistics and Phonetics. 6th ed. Oxford: Wiley Blackwell.

Johannesson, Nils-Lennart. 1993. English Language Essays: Investigation Method and Writing Strategies. 4th ed. English Department, Stockholm University.

Johansson, Stig. 1986. Some observations on the order of adverbial particles and objects in the LOB Corpus. In Sven Jacobson (ed.), Papers from the Third Scandinavian Symposium on Syntactic Variation, Stockholm, May 11-12, 1985. Stockholm Studies in English 56. Stockholm: Almqvist & Wiksell. 51-62.

Matthews, P.H. 2014. The Concise Oxford Dictionary of Linguistics. 3rd ed. Oxford: Oxford University Press.

Scott, Mike. 2001. Comparing corpora and identifying key words, collocations, and frequency distributions through the WordSmith Tools suite of computer programs. In Mohsen Ghadessy, Alex Henry, and Robert L. Roseberry (eds), Small Corpus Studies: Theory and Practice, 47-67. Amsterdam: Benjamins.


(c) Stig Johansson/Hilde Hasselgård and the Department of Literature, Area Studies and European Languages, University of Oslo

Last edited September 2023, HH