HaBiT

Last modified Apr. 16, 2024 10:35 AM by root@localhost
Last modified Oct. 23, 2023 4:07 PM by lenkeretter@localhost

The Text Laboratory now offers two large web corpora for Norwegian, completed in 2017.

• HaBiT Norwegian Web Corpus 2015 (Bokmål) with 1.18 billion words (3.4 million documents).

• HaBiT Norwegian Web Corpus 2015 (Nynorsk) with more than 55 million words (214 000 documents).

The Nynorsk corpus is the first web corpus collected for this language.