Frequency lists from NoWaC

The frequency lists from NoWaC contain frequencies of word forms and lemmas.

Homonyms are counted separately according to how the Oslo-Bergen grammatical tagger has tagged them. For example, the verb "arbeid" and the noun "arbeid" are counted separately, as are the past tense and the past participle of verbs like "hoppe".

All words are converted to lowercase so that e.g. "The" and "the" are counted together. Proper names are an exception: they retain their original form.
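The counting rules described above can be illustrated with a minimal Python sketch. It is not the code used to build the NoWaC lists. The tag labels ("verb", "noun", "prop") and the function name count_tagged_forms are hypothetical and chosen only for illustration; the actual labels of the Oslo-Bergen tagger may differ.

```python
from collections import Counter

# Hypothetical label for proper names; the real tagset may use another one.
PROPER_NAME_TAG = "prop"


def count_tagged_forms(tokens):
    """Count (word form, tag) pairs.

    Word forms are lowercased unless the token is tagged as a proper name,
    so homonyms with different tags end up in separate counts.
    """
    counts = Counter()
    for form, tag in tokens:
        key_form = form if tag == PROPER_NAME_TAG else form.lower()
        counts[(key_form, tag)] += 1
    return counts


# Invented sample input: (word form, tag) pairs as a tagger might emit them.
sample = [
    ("Arbeid", "verb"),
    ("arbeid", "noun"),
    ("arbeid", "verb"),
    ("Oslo", "prop"),
    ("oslo", "noun"),
]
counts = count_tagged_forms(sample)
```

With this sample, the two verb tokens are merged after lowercasing, the noun "arbeid" is kept as a separate entry, and "Oslo" tagged as a proper name keeps its capital letter.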

Note that parts of the corpus contain text in formats that are difficult for the grammatical tagger to recognize (e.g. various newspaper bylines or question-answer formats on chat sites). As a result, many words have been analysed as proper names when they are in fact sentence-initial common nouns, pronouns etc.

 

Download

The frequency lists are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic license.

Character frequencies in Norwegian

This list of character frequencies in Norwegian is generated from NoWaC. The list is sorted by descending frequency and includes all characters that appear in the corpus. Uppercase and lowercase characters are merged. Each character is listed with its absolute frequency, followed by its percentage in brackets.
Download the character frequency list
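As a rough illustration of how such a list can be computed, here is a minimal Python sketch. It is not the script used for NoWaC. The function names, the sample text and the exact output layout (tab-separated, two decimals) are assumptions made for the example; the sketch also assumes that the percentage is each character's share of all characters counted.

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count, percent) rows, most frequent first.

    Uppercase and lowercase characters are merged by lowercasing the text.
    """
    counts = Counter(text.lower())
    total = sum(counts.values())
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


def format_rows(rows):
    """Format rows as 'char<TAB>count (percent%)'; this layout is an assumption."""
    return [f"{ch}\t{n} ({pct:.2f}%)" for ch, n, pct in rows]


# Invented sample text; a real run would read the corpus instead.
rows = char_frequencies("Blåbær")
lines = format_rows(rows)
```

Here "B" and "b" are counted together, so "b" comes first with a count of 2 out of 6 characters.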

 

Published Oct. 23, 2023 3:46 PM - Last modified Jan. 24, 2024 1:38 PM