-
Haug, Dag Trygve Truslew; Yildirim, Ahmet; Nøklestad, Anders & Hagen, Kristin
(2023).
Rules and neural nets for morphological tagging of Norwegian - Results and challenges.
-
Larsson, Ida; Lundquist, Bjørn; Westendorp, Maud; Nøklestad, Anders & Tengesdal, Eirik
(2020).
The Nordic Word Order Database.
-
Golden, Anne; Nøklestad, Anders & Johansson, Sofie
(2019).
The vocabulary of the Nordic heritage speakers in the US. An attempt of categorization.
-
Lundquist, Bjørn; Larsson, Ida; Westendorp, Maud; Tengesdal, Eirik & Nøklestad, Anders
(2019).
Presenting the Nordic Word Order Database.
-
Johannessen, Janne Bondi; Askeland, Anne Renette; Hagen, Kristin; Håberg, Live; Jensen, Bård Uri & Nøklestad, Anders
[et al.; 8 authors in total]
(2018).
Utfordringa med fonetisk transkripsjon av dialekter i den digitale tidsalderen: Oslo-translitteratoren.
-
Nøklestad, Anders; Hagen, Kristin; Johannessen, Janne Bondi; Kosek, Michał & Priestley, Joel
(2017).
A Modernised Version of the Glossa Corpus Search System.
-
Søfteland, Åshild & Nøklestad, Anders
(2016).
Korpus-workshop.
-
Johannessen, Janne Bondi; Vangsnes, Øystein A; Lundquist, Bjørn; Larsson, Ida; Bentzen, Kristine & Garbacz, Piotr
[et al.; 10 authors in total]
(2014).
Nye isoglosser illustrert i det nye nettstedet for nordisk språk: NALS – Nordic Atlas of Language Structures (Online).
-
Priestley, Joel; Johannessen, Janne Bondi; Hagen, Kristin; Nøklestad, Anders & Lynum, André
(2012).
Maps as a central linguistic research tool.
-
Johannessen, Janne Bondi; Priestley, Joel; Hagen, Kristin; Nøklestad, Anders & Lynum, André
(2012).
The Nordic Dialect Corpus.
-
Johannessen, Janne Bondi & Nøklestad, Anders
(2010).
Recent developments in the Nordic Dialect Corpus and the Nordic Syntactic Judgments Database.
-
Johannessen, Janne Bondi; Hagen, Kristin; Nøklestad, Anders & Priestley, Joel
(2010).
Enhancing Language Resources with Maps.
-
Nøklestad, Anders
(2010).
Bruk av et norsk leksikon til tagging og andre språkteknologiske formål.
-
Johannessen, Janne Bondi; Nøklestad, Anders & Priestley, Joel
(2007).
Developing multi-Scandinavian word-lists for multi-Scandinavian texts.
-
Johannessen, Janne Bondi; Hagen, Kristin; Laake, Signe; Lindstad, Arne Martinus; Vangsnes, Øystein A. & Åfarli, Tor A.
[et al.; 7 authors in total]
(2007).
Dialektkorpus - presentasjon av prosjekt, metode, innsamling og materiale.
-
Søfteland, Åshild & Nøklestad, Anders
(2006).
Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.
-
Nøklestad, Anders
(2005).
Memory-based PP Attachment Disambiguation for Norwegian.
-
Nøklestad, Anders; Johansson, Christer & van den Bosch, Antal
(2004).
Pronominal anaphora resolution in Norwegian using TiMBL and z-scores.
-
Nøklestad, Anders
(2004).
Memory-based Classification of Proper Names in Norwegian.
-
Johannessen, Janne Bondi; Nøklestad, Anders; Hagen, Kristin & Lindstad, Arne Martinus
(2000).
Det åpne laboratoriet.
[Newspaper].
Apollon.
-
Johannessen, Janne Bondi; Nøklestad, Anders & Hagen, Kristin
(2000).
A Web-Based Advanced and User Friendly System: The Oslo Corpus of Tagged Norwegian Texts.
Abstract:
A general-purpose text corpus meant for linguists and lexicographers needs to satisfy quality criteria on at least four different levels. The first two criteria are fairly well established: the corpus should contain a wide variety of texts and be tagged according to a fine-grained system. The last two criteria are, unfortunately, much less widely appreciated. One concerns the variety of search criteria: the user should be able to search for any information contained in the corpus, in any possible combination. In addition, the search results should be presented in a choice of ways. The fourth criterion concerns accessibility. It is rather surprising that, while user interfaces tend to be simple and self-explanatory in most areas of life represented electronically, corpus interfaces are still extremely user-unfriendly. In this paper, we present a corpus whose interface and search options we have given a lot of thought, viz. the Oslo Corpus of Tagged Norwegian Texts.
-
Johannessen, Janne Bondi & Nøklestad, Anders
(1999).
Tavle-analyse ut - data-analyse inn.
[Newspaper].
Aftenposten.
-
Johannessen, Janne Bondi & Nøklestad, Anders
(1999).
Mot et maksimalt brukervennlig korpus.
-
Hagen, Kristin; Johannessen, Janne Bondi & Nøklestad, Anders
(1999).
The shortcomings of a tagger.
Abstract:
The tagger used for the Oslo Corpus of Tagged Norwegian Texts achieves very good statistical results. In spite of this, it makes mistakes. In this paper we take a closer look at some of them. Although some mistakes are of a kind that would disappear if we improved the tagger, many are impossible or very difficult to do anything about. They are due to errors in the corpus (spelling errors, foreign words, non-standard spellings), to elliptic sentences such as headlines, and to structural ambiguity, which abounds to a surprising extent. Proofreading the corpus would have removed the first kind of problem, but the other two types cannot be resolved in any obvious way.
-
Johannessen, Janne Bondi & Nøklestad, Anders
(1999).
Oslo-korpuset av taggede, norske tekster.
-
Hagen, Kristin; Nøklestad, Anders & Johannessen, Janne Bondi
(1998).
A Constraint-based Tagger for Norwegian.
Abstract:
Disambiguating morphosyntactic taggers are computer programs that provide the words in a text with grammatical information and are able to pick the correct reading for ambiguous words based on linguistic context. We describe such a tagger for Norwegian Bokmål and Nynorsk, based on the Constraint Grammar formalism (Karlsson et al. 1995). The tagger disambiguates through the use of linguistic constraints that operate only at the level of individual words, which means that no phrase structure is established. We show how it is possible to perform morphological and syntactic disambiguation of Norwegian texts without recourse to a phrasal level.
-
Nøklestad, Anders
(2009).
A Machine Learning Approach to Anaphora Resolution Including Named Entity Recognition, PP Attachment Disambiguation, and Animacy Detection.
Unipub forlag.
ISSN 0806-3222.
Abstract:
The thesis describes an automatic anaphora resolution (AR) system for Norwegian, focussing on the resolution of pronominal anaphora in fiction material. The system relies primarily on machine learning (ML) methods, and is the first Norwegian AR system to use machine learning. A set of linguistically motivated filters removes incompatible antecedent candidates before the remaining ones are classified as either antecedent or non-antecedent. The closest candidate classified as a suitable antecedent (if any) is selected as the antecedent of the pronoun.
For the classifier, three different machine learning methods are evaluated and compared: memory-based learning (MBL), maximum entropy modelling (MaxEnt), and support vector machines (SVMs). The methods are tested with default as well as automatically optimized parameter settings. Different pronouns are handled by separate classifiers. Two other knowledge-poor approaches, a factor/indicator-based approach and a Centering Theory approach, are compared to the machine learning methods. The best machine learning approaches perform significantly better than the non-ML approaches and significantly better than the only previously existing Norwegian AR system.
The thesis also describes the development and evaluation of three support modules providing information to the AR system: a named entity recognizer, a PP attachment disambiguator, and an animacy detector. Various machine learning methods are tested and compared with respect to the first two modules. The PP module introduces a novel kind of semi-supervised learning, while the animacy detector employs two different procedures for using the World Wide Web to obtain animacy information for nouns. The three support modules are evaluated both as standalone NLP tools and as information sources for the AR system.
In almost all experiments described in this thesis, MBL performs as well as or better than MaxEnt, while the SVMs perform significantly worse.