MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
| Author | Affiliation | |
|---|---|---|
Gromann, Dagmar | University of Vienna | AT |
Oliveira, Hugo Gonçalo | University of Coimbra | PT |
Pitarch, Lucia | University of Zaragoza | ES |
Apostol, Elena-Simona | National University of Science and Technology Politehnica Bucharest | RO |
Bernad, Jordi | University of Zaragoza | ES |
Bytyçi, Eliot | University of Prishtina | KS |
Cantone, Chiara | University of Siena | IT |
Carvalho, Sara | University of Aveiro-CLLC/NOVA CLUNL | PT |
Frontini, Francesca | CNR-ILC | IT |
Garabik, Radovan | Slovak Academy of Sciences | SK |
Gracia, Jorge | University of Zaragoza | ES |
Granata, Letizia | University of Naples “L’Orientale” | IT |
Khan, Fahad | CNR-ILC | IT |
Knez, Timotej | University of Ljubljana | SI |
Labropoulou, Penny | Athena R.C. – ILSP | GR |
Liebeskind, Chaya | Jerusalem College of Technology | IL |
Buono, Maria Pia di | University of Naples “L’Orientale” | IT |
Anić, Ana Ostroški | Institute for the Croatian Language | HR |
Rodrigues, Ricardo | Polytechnic Institute of Coimbra / CISUC | PT |
Sérasset, Gilles | University Grenoble Alpes | FR |
Sidibé, Mahammadou | University Grenoble Alpes | FR |
Silvano, Purificação | University of Porto | PT |
Spahiu, Blerina | University of Milan-Bicocca | IT |
Sogutlu, Enriketa | University of Porto | PT |
Stanković, Ranka | University of Belgrade | RS |
Truică, Ciprian-Octavian | National University of Science and Technology Politehnica Bucharest | RO |
Zitnik, Slavko | University of Ljubljana | SI |
Zdravkova, Katerina | University Ss Cyril and Methodius in Skopje | MK |
| Date |
|---|
2024 |
Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has focused either on analysing lexical semantic relations in word embeddings or on probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages, including low-resource languages such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.
European Social Fund Plus |
| Journal | Cite Score | SNIP | SJR | Year | Quartile |
|---|---|---|---|---|---|
Proceedings - International Conference on Computational Linguistics, COLING | 6.6 | 1.702 | 0 | 2024 | Q1 |