Multi-word Expressions as Discourse Markers in Multilingual TED-ELH Parallel Corpus
Author(s) | Affiliation | Country |
---|---|---|
Liebeskind, Chaya | Jerusalem College of Technology | IL |
Date Issued | Start Page | End Page |
---|---|---|
2023 | 466 | 469 |
Giedrė Valūnaitė Oleškevičienė
Institute of Humanities, Mykolas Romeris University
Ateities 20, LT-08303 Vilnius, Lietuva
gvalunaite@mruni.eu

Chaya Liebeskind
Department of Computer Science, Jerusalem College of Technology
21 Havaad Haleumi st., 9116001 Jerusalem, Israel
liebchaya@gmail.com

Abstract

In this paper, we present the results of research inspired by the Nexus Linguarum network. As a theoretical basis, we discuss multi-word expressions as part of the formulaic language used as discourse markers to organize discourse. We also note that parallel research in multiple languages may provide inter-lingual insights. We created a parallel multilingual corpus, TED-ELH, for our research and applied a parallel corpus alignment algorithm to extract multi-word discourse markers and their translations in Lithuanian and Hebrew. The analysis of these translations shows that they vary: they either remain multi-word expressions or become one-word translations, owing to the linguistic characteristics of the target languages.