Use this URL to cite this publication: https://cris.mruni.eu/cris/handle/007/46483
Multi-word Expressions as Discourse Markers in Multilingual TED-ELH Parallel Corpus
Type of publication
Article in peer-reviewed foreign conference proceedings (P1g)
Author(s)
Liebeskind, Chaya | Jerusalem College of Technology
Publisher
NOVA CLUNL, 2023
Date Issued
2023
Extent
p. 466-469
Start Page
466
End Page
469
Abstract
Authors: Giedrė Valūnaitė Oleškevičienė (Institute of Humanities, Mykolas Romeris University, Ateities 20, LT-08303 Vilnius, Lithuania; gvalunaite@mruni.eu) and Chaya Liebeskind (Department of Computer Science, Jerusalem College of Technology, 21 Havaad Haleumi St., 9116001 Jerusalem, Israel; liebchaya@gmail.com).
In this paper, we present the outcome of research inspired by the Nexus Linguarum network. As a theoretical basis, we discuss multi-word expressions as a part of the formulaic language used as discourse markers for organizing discourse. We also note that parallel research in multiple languages may provide inter-lingual insights. We created a parallel multilingual corpus, TED-ELH, for our research and applied a parallel corpus alignment algorithm to extract multi-word discourse markers and their translations in Lithuanian and Hebrew. The analysis of the translations of multi-word discourse markers showed that they demonstrate a certain variability: they either remain multi-word expressions or turn into one-word translations due to the linguistic characteristics of the target languages.
Part Of
Language, Data and Knowledge 2023 (LDK 2023) : Proceedings of the 4th Conference on Language, Data and Knowledge, 12–15 September 2023. Vienna, Austria / ed. Sara Carvalho... [et al.]
Type of document
text::conference output::conference proceedings::conference paper
ISBN (of the container)
9789895408153
eLABa
181418322
Coverage Spatial
Portugal (PT)
Language
English (en)
Bibliographic Details
20
Funding(s)
COST Action CA18209 – European network for Web-centred linguistic data science, supported by COST (European Cooperation in Science and Technology)
Date Reporting
2023
Access Rights
Open Access