Use this URL to cite the publication: https://cris.mruni.eu/cris/handle/007/18195
LOD-connected offensive language ontology and tagset enrichment
Type of publication
Article in conference proceedings in Scopus database (P1a2)
Author(s)
Lewandowska-Tomaszczyk, Barbara | State University of Applied Sciences
Žitnik, Slavko | Faculty for Computer and Information Science
Bączkowska, Anna | University of Gdansk
Liebeskind, Chaya | Jerusalem College of Technology
Mitrović, Jelena | University of Passau
Title
LOD-connected offensive language ontology and tagset enrichment
Date Issued
2021
Extent
p. 1-16
Is part of
CEUR Workshop Proceedings: Proceedings of the Workshops and Tutorials - Language Data and Knowledge 2021 (LDK 2021), Zaragoza, Spain, September 1-4, 2021 / Edited by: Sara Carvalho, Renato Rocha Souza. [S.l.] : CEUR-WS, 2021, vol. 3064.
Abstract
The paper focuses on the definitional revision and enrichment of offensive language typology, drawing on publicly available offensive language datasets and testing them on available pre-trained lexical embedding systems. We review over 60 available corpora and compare the tagging schemas applied in them, while attempting to explain semantic differences between particular concepts of the category OFFENSIVE in English. We present a finite set of classes that cover aspects of offensive language representation, along with linguistically sound explanations. The classes are based on the offensive language categorization schemata originally proposed by Zampieri et al. [1, 2] and were tested by means of Sketch Engine tools on a large web-based corpus. The schemata are juxtaposed and discussed with reference to the non-contextual word embeddings fastText, Word2Vec, and GloVe. A methodology for mapping existing corpora to a unified ontology is also provided. The proposed schema will enable further comparable research and effective use of corpora of languages other than English. It will also be applied in building an enriched tagset to be trained and used on new data, with the application of recently developed LLOD techniques [3].
Type of document
type::text::conference output::conference proceedings::conference paper
ISSN (of the container)
1613-0073
SCOPUS
2-s2.0-85126015890
eLABa
124175741
Coverage Spatial
United States of America (US)
Language
English (en)
Bibliographic Details
54
Funding(s)
European Cooperation in Science and Technology
Bundesministerium für Bildung und Forschung
Creative Commons License
Access Rights
Open Access
Journal | CiteScore | SNIP | SJR | Year | Quartile |
---|---|---|---|---|---|
CEUR Workshop Proceedings | 1.1 | 0.317 | 0.228 | 2021 | Q4 |