Record Details

Outomatiese lemma-identifisering vir Afrikaans

Literator

 
 
Field Value
 
Title Outomatiese lemma-identifisering vir Afrikaans
 
Creator Groenewald, H.J.; van Huyssteen, G.B.
 
Subject Afrikaans; Feature Selection; Inflection; Lemmatisation; Machine Learning; Morphology; Natural Language Processing; Parameter Optimisation; Text Technology
Description Automatic lemmatisation for Afrikaans. Automatic lemmatisation is a general normalisation procedure in text processing, in which all inflected forms of a lexical word are normalised to a single lemma (i.e. a meaningful, uninflected base form from which more complex word forms can be formed). Traditionally, lemmatisers are developed by writing language-specific rules to identify lemmas. This article investigates an alternative, a machine learning approach, to develop a lemmatiser for Afrikaans (LIA: “Lemmaidentifiseerder vir Afrikaans”). An overview of inflection in Afrikaans is provided in order to identify the categories of inflection that are relevant for lemmatisation. The format of the input and output is described, with special reference to the nine inflectional categories for Afrikaans that the system should be able to handle. Lemmatisation is then framed as a classification task for machine learning, and a concise introduction to memory-based learning is provided. The development and evaluation of LIA are discussed in detail, and it is shown how the performance of the initial classifier is improved through feature selection and parameter optimisation. The best classifier reaches an accuracy of 92.8%. The article concludes with a view on some future work.
 
Publisher AOSIS
 
Contributor
Date 2008-07-25
 
Type info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion
Format application/pdf
Identifier 10.4102/lit.v29i1.101
 
Source Literator; Vol 29, No 1 (2008); 65-92; ISSN 2219-8237; ISSN 0258-2279
 
Language eng
 
Relation
The following web links (URLs) may trigger a file download or direct you to an alternative webpage to gain access to a publication file format of the published article:

https://literator.org.za/index.php/literator/article/view/101/85
 
Coverage
Rights Copyright (c) 2008 H.J. Groenewald, G.B. van Huyssteen https://creativecommons.org/licenses/by/4.0