Norwegian noun-noun compounds

These lists of compounds and the frequency list comprise the data material used in Eli Anne Eiesland's PhD dissertation, "The Semantics of Norwegian Noun-Noun Compounds: A Corpus-Based Study".

The lists are compiled from NoWaC.

The file words_from_nowac.txt contains all the words found in the NoWaC corpus, together with their frequencies. As far as possible, strings that are not well-formed words have been removed. These include web addresses, strings consisting of the same letter repeated over and over, and strings containing characters such as @. In addition, strings that differ only in capitalization have been merged. Hyphens have also been removed, and the frequencies of the forms that were merged as a result have been updated accordingly. The result is a frequency list that can be used as a database of the words found in NoWaC.
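The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to build words_from_nowac.txt. The function name `clean_frequency_list`, the rule that a well-formed word consists only of letters and hyphens, and the threshold of four identical characters in a row are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical filter: a "well-formed" string is assumed to consist only of
# letters (including æ, ø, å), optionally joined by single hyphens.
WORD_RE = re.compile(r"^[^\W\d_]+(?:-[^\W\d_]+)*$")
# Hypothetical junk filter: four or more identical characters in a row.
REPEAT_RE = re.compile(r"(.)\1{3,}")


def clean_frequency_list(raw_counts):
    """Drop ill-formed strings, then merge case and hyphen variants, summing frequencies."""
    merged = Counter()
    for token, freq in raw_counts.items():
        if not WORD_RE.match(token) or REPEAT_RE.search(token.lower()):
            continue  # e.g. web addresses, strings with "@", "aaaaaa"
        # Lowercasing merges capitalization variants; removing hyphens merges
        # "hus-dyr" with "husdyr", so their frequencies are added together.
        key = token.lower().replace("-", "")
        merged[key] += freq
    return merged


raw = {"Husdyr": 5, "husdyr": 10, "hus-dyr": 2, "www.uio.no": 7, "aaaaaa": 3}
print(clean_frequency_list(raw))
```

Here the three spellings of *husdyr* collapse into one entry whose frequency is their sum, while the web address and the repeated-letter string are discarded. A threshold of four rather than three identical characters is used because removing a hyphen can legitimately produce three identical letters in a row.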

Download words_from_nowac.txt

The remaining files on this page list compounds containing nouns from eight semantic categories: animals, artefacts, body parts, emotions, foods, persons, substances and plants. They were collected by searching words_from_nowac.txt, and the results were then manually cleaned to ensure that only well-formed Norwegian compounds were included. The frequency of each compound is given in a separate column. Files with "modifier" in their name contain compounds that begin with the noun in question, while files with "head" in their name contain compounds that end with it.
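The search step described above, before any manual cleaning, can be approximated by simple prefix and suffix matching against the frequency list. The sketch below is a hypothetical illustration; the function name `compound_candidates` and the in-memory dictionary of word counts are assumptions, not part of the published files.

```python
def compound_candidates(word_counts, noun):
    """Return (modifier, head) candidate dicts for one noun, before manual cleaning.

    word_counts: mapping from lowercased word to corpus frequency.
    """
    noun = noun.lower()
    modifier, head = {}, {}
    for word, freq in word_counts.items():
        if word == noun:
            continue  # the simple noun itself is not a compound
        if word.startswith(noun):
            modifier[word] = freq  # noun in first (modifier) position
        if word.endswith(noun):
            head[word] = freq  # noun in last (head) position
    return modifier, head


counts = {"hund": 50, "hundehus": 4, "gjeterhund": 3, "hundre": 90}
modifier, head = compound_candidates(counts, "hund")
print(modifier)  # {'hundehus': 4, 'hundre': 90}
print(head)      # {'gjeterhund': 3}
```

The toy output shows why the manual cleaning step is needed: *hundre* ("hundred") begins with *hund* but is not a compound, so string matching alone over-collects.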


About the dissertation

The semantics of Norwegian noun-noun compounds: a corpus-based study

Noun-noun compounding, in which two or more nouns are combined into one, is highly productive in many languages, including Norwegian. In spite of this productivity, the meanings of noun-noun compounds are highly unpredictable. In my dissertation I use data from the NoWaC corpus (Guevara 2010) to investigate the range of semantic relations found in such compounds.

I compile a database of approximately 60,000 compound types containing nouns from eight different semantic groups (“animals,” “artefacts,” “body parts,” “emotions,” “foods,” “persons,” “plants,” and “substances”). Comparing these groups to each other, I find a great deal of variation between them, both in productivity and in positional preference. While some groups, like “substances,” show a strong preference for the modifier position, others show no preference. No noun types show a preference for the head position.

By analyzing a subset of 2,000 of the 60,000 compounds, I propose a model of their semantic relations in which each relation is a category with a prototypical structure, having central and peripheral members. These relations account for 94.8% of the analyzed compounds. I further demonstrate that the semantics of the constituent nouns influences which semantic relations they occur with most frequently.

A central finding is that there is more variation in the modifier position than in the head position, both in terms of openness (more different nouns occur in the modifier position than in the head position) and in terms of semantic relations: compounds sharing the same modifier noun have, on average, more different semantic relations than compounds sharing the same head noun. I discuss how this finding is congruent with the view in Cognitive Grammar (Langacker 1987) that the head noun in a compound is the profile determinant, and that this profile determinant has an elaboration site that is “filled” by the modifier noun.
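One way to make the modifier-versus-head comparison concrete is to group annotated compounds by one constituent, count the distinct relations in each group, and average those counts. The sketch below assumes annotations stored as (modifier, head, relation) triples; the function name and the relation labels (`MADE_OF`, `PURPOSE`) are hypothetical placeholders, and this is not necessarily how the dissertation computes the measure.

```python
from collections import defaultdict
from statistics import mean


def mean_distinct_relations(annotations, position):
    """Average number of distinct semantic relations per constituent noun.

    annotations: iterable of (modifier, head, relation) triples.
    position: "modifier" or "head", the slot used to group compounds.
    """
    slot = 0 if position == "modifier" else 1
    relations = defaultdict(set)
    for triple in annotations:
        relations[triple[slot]].add(triple[2])  # collect distinct relations per noun
    return mean(len(rels) for rels in relations.values())


# Hypothetical toy annotations, invented for illustration only.
toy = [
    ("fisk", "suppe", "MADE_OF"),
    ("fisk", "båt", "PURPOSE"),
    ("kjøtt", "suppe", "MADE_OF"),
    ("tomat", "suppe", "MADE_OF"),
]
print(mean_distinct_relations(toy, "modifier"))  # (2 + 1 + 1) / 3 ≈ 1.33
print(mean_distinct_relations(toy, "head"))      # (1 + 1) / 2 = 1
```

In this toy set the modifier *fisk* occurs with two different relations, while each head occurs with only one, which mirrors the direction of the reported finding without reproducing its data.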

Eli Anne Eiesland, January 2016

References

Guevara, Emiliano (2010), 'NoWaC: a large web-based corpus for Norwegian', Sixth Web as Corpus Workshop (Los Angeles, California), 1-7.
Langacker, Ronald W. (1987), Foundations of cognitive grammar (Stanford, Calif.: Stanford University Press) 516.

Published 7 Jan. 2016 16:57 - Last modified 28 Apr. 2016 15:13