Frequency lists from NoWaC

The frequency lists from NoWaC contain frequencies of word forms and lemmas.

Homonyms are counted separately according to how they have been tagged by the Oslo-Bergen grammatical tagger. For example, the verb "arbeid" and the noun "arbeid" are counted separately, as are the past tense and the past participle of verbs like "hoppe".

All words are converted to lowercase so that e.g. "The" and "the" are counted together. The exception is proper names, which retain their original form.
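The counting rules above can be illustrated with a minimal Python sketch. It is not the code used to build the lists: the input format (pairs of word form and tag), the tag labels "subst", "verb" and "prop", and the function name are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label for proper names; the tagger's actual tag set may differ.
PROPER_NAME_TAG = "prop"


def count_tagged_forms(tokens):
    """Count (form, tag) pairs, lowercasing everything except proper names."""
    counts = Counter()
    for form, tag in tokens:
        # Proper names keep their original form; all other words are lowercased.
        key_form = form if tag == PROPER_NAME_TAG else form.lower()
        # Keying on (form, tag) keeps homonyms with different tags apart.
        counts[(key_form, tag)] += 1
    return counts


# Invented sample input, for illustration only.
tokens = [
    ("Arbeid", "subst"),
    ("arbeid", "subst"),
    ("arbeid", "verb"),
    ("Oslo", "prop"),
]
counts = count_tagged_forms(tokens)
```

In this sketch the noun "arbeid" and the verb "arbeid" end up under separate keys, while "Arbeid" and "arbeid" tagged as nouns are merged.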

It should be noted that parts of the corpus contain text in formats that are difficult for the grammatical tagger to recognize (e.g. different newspaper bylines or question-answer formats on chat sites). As a result, many words have been analysed as proper names when they are in fact sentence-initial common nouns, pronouns etc.

Download

The frequency lists are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic license.

Download the frequency list of analyzed word forms
Download the raw word form frequency list
Download the lemma frequency list
Download the list of inflected forms for the 1000 most frequent lemmas
Download the frequency list sorted primarily alphabetically by initial character and secondarily by frequency within each character.
(Usage example: search for a blank followed by the character b to get the most frequent words starting with b.)
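The two-level ordering of the last list can be sketched in Python as follows. This is only an illustration under assumptions: the words and counts are invented, the function names are hypothetical, and the actual file layout of the downloadable list is not described here.

```python
def sort_by_initial_then_frequency(counts):
    """Sort words by first character, then by descending frequency."""
    # Assumes every word is non-empty. Ties are broken alphabetically by word.
    return sorted(
        counts.items(),
        key=lambda item: (item[0][0], -item[1], item[0]),
    )


def most_frequent_with_initial(sorted_items, initial, limit=3):
    """Return the first `limit` words that start with `initial`."""
    return [word for word, _ in sorted_items if word.startswith(initial)][:limit]


# Invented counts, for illustration only.
word_counts = {"bil": 40, "og": 900, "bare": 120, "bok": 75, "i": 850, "om": 300}
ordered = sort_by_initial_then_frequency(word_counts)
top_b = most_frequent_with_initial(ordered, "b")
```

Because the list is already ordered by frequency within each initial character, the first entries found for a given character are also its most frequent words.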

Norwegian noun-noun compounds (compiled from NoWaC and edited by Eli Anne Eiesland)

Character frequencies in Norwegian

This list of character frequencies in Norwegian is generated from NoWaC. The list is sorted by descending frequency and includes all characters that appear in the corpus. Uppercase and lowercase characters are merged. The characters are listed with absolute frequency and with percentage in brackets.
Download the character frequency list
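
A list of this kind could be produced along the following lines. This is a rough sketch, not the procedure actually used for NoWaC: the function name and the output layout are hypothetical, and skipping whitespace characters is an assumption made here for readability.

```python
from collections import Counter


def character_frequencies(text):
    """Return (character, count, percentage) rows, most frequent first."""
    # Merge uppercase and lowercase; whitespace is skipped (an assumption).
    counts = Counter(ch.lower() for ch in text if not ch.isspace())
    total = sum(counts.values())
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


# Invented sample text, for illustration only.
rows = character_frequencies("Aa b")
lines = [f"{ch}\t{n} ({pct:.2f}%)" for ch, n, pct in rows]
```

Each line in `lines` gives a character, its absolute frequency, and the percentage in brackets.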

Published Oct. 23, 2023 3:46 PM - Last modified Jan. 24, 2024 1:38 PM