A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding
| Author | Affiliation | Country | Affiliation 2 | Country 2 |
|---|---|---|---|---|
Torunoğlu-Selamet, Dilara | Istanbul Technical University | TR | ||
Arslan, Doğukan | Istanbul Technical University | TR | ||
Wilkens, Rodrigo | University of Exeter | GB | ||
He, Wei | University of Exeter | GB | ||
Eryiğit, Doruk | ITU NLP | XX | ||
Pickard, Thomas | University of Sheffield | GB | ||
Pagano, Adriana S. | Federal University of Minas Gerais | BR | ||
Villavicencio, Aline | University of Exeter | GB | University of Sheffield | GB |
Eryiğit, Gülşen | Istanbul Technical University | TR | ||
Abuczki, Ágnes | Károli Gáspár University of the Reformed Church in Hungary | HU | ||
Cardoso, Aida | University of Lisbon | PT | ||
Lazarenka, Alesia | Tecnológico de Estudios Superiores de Ixtapaluca | MX | ||
Almassova, Dina | Nazarbayev University | KZ | ||
Mendes, Amalia | University of Lisbon | PT | ||
Kanellopoulou, Anna | Aristotle University of Thessaloniki | GR | ||
Brosa-Rodríguez, Antoni | Universitat Rovira i Virgili | ES | ||
Valkovska, Baiba | University of Latvia | LV | ||
Wojtowicz, Beata | University of Warsaw | PL | ||
Pedersen, Bolette | University of Copenhagen | DK | ||
Hidalgo-Ternero, Carlos Manuel | Universidad de Málaga | ES | ||
Liebeskind, Chaya | Jerusalem College of Technology | IL | ||
Jokić, Danka | University of Belgrade | RS | ||
Alves, Diego | Saarland University | DE | ||
Triantafyllidi, Eleni | Aristotle University of Thessaloniki | GR | ||
Velldal, Erik | University of Oslo | NO | ||
Philippy, Fred | University of Luxembourg | LU | ||
Rizgelienė, Ieva | Vilniaus universitetas | LT | ||
Skadina, Inguna | University of Latvia | LV | ||
Lobzhanidze, Irina | Ilia State University | GE | ||
Stinessen Haugen, Isabell | University of Bergen | NO | ||
Akbar Krito, Jauza | Universitas Gadjah Mada | ID | ||
Marković, Jelena M. | University of East Sarajevo | BA | ||
Monti, Johanna | University of Naples - L'Orientale | IT | ||
Sauca, Josue Alejandro | Universitat de València | ES | ||
Dobrovoljc, Kaja | University of Ljubljana | SI | Jožef Stefan Institute | SI |
Ugwuanyi, Kingsley O. | SOAS University of London | GB | ||
Rituma, Laura | University of Latvia | LV | ||
Øvrelid, Lilja | University of Oslo | NO | ||
Tufail Agro, Maha | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Abjalova, Manzura | Alisher Navo'i Tashkent State University of Uzbek Language and Literature | UZ | ||
Chatzigrigoriou, Maria | National and Kapodistrian University of Athens | GR | ||
del Mar Sánchez Ramos, María | Universidad de Alcalá | ES | ||
Pendevska, Marija | Ss. Cyril and Methodius University in Skopje | MK | ||
Seyyedrezaei, Masoumeh | Istinye University | TR | ||
Shamsfard, Mehrnoush | Shahid Beheshti University | IR | ||
Ahsan, Momina | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Riaz Khan, Muhammad Ahsan | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Hau Norman, Nathalie Carmen | University of Copenhagen | DK | ||
Ayyıldız, Nilay Erdem | Fırat University | TR | ||
Hosseini-Kivanani, Nina | University of Luxembourg | LU | ||
Ligeti-Nagy, Noémi | ELTE Research Centre for Linguistics | HU | ||
Naeem, Numaan | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Kanishcheva, Olha | Heidelberg University | US | Selected Electronic Technologies | DE |
Yatsyshyna, Olha | Ternopil Volodymyr Hnatiuk National Pedagogical University | UA | ||
Orel, Daniil | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Giommarelli, Petra | University of Pisa | IT | University of Naples - L'Orientale | IT |
Osenova, Petya | Bulgarian Academy of Sciences | BG | ||
Garabik, Radovan | Slovak Academy of Sciences | SK | ||
Semou, Regina E. | National and Kapodistrian University of Athens | GR | ||
Rebechi, Rozane | Federal University of Rio Grande do Sul | BR | ||
Pranida, Salsabila Zahirah | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Touileb, Samia | University of Bergen | NO | ||
Nimb, Sanni | Society for Danish Language and Literature | DK | ||
Ahmad, Sarfraz | Mohamed bin Zayed University of Artificial Intelligence | AE | ||
Sharipova, Sarvinoz | Samarkand State Institute of Foreign Languages | UZ | ||
Golan, Shahar | Jerusalem College of Technology | IL | ||
Ji, Shaoxiong | ELLIS Institute Finland | FI | ||
Aboh, Sopuruchi Christian | Hong Kong Polytechnic University | CN | ||
Sucur, Srdjan | University of East Sarajevo | BA | ||
Markantonatou, Stella | ILSP and Archimedes Unit-RC ATHENA | GR | ||
Olsen, Sussi | University of Copenhagen | DK | ||
Tajalli, Vahide | Shahid Beheshti University | IR | ||
Lipp, Veronika | ELTE Research Centre for Linguistics | HU | ||
Giouli, Voula | Aristotle University of Thessaloniki | GR | ||
Eraydın, Yelda Yeşildal | Fırat University | TR | ||
Saaberi, Zahra | Shahid Beheshti University | IR | ||
Xie, Zhuohan | Mohamed bin Zayed University of Artificial Intelligence | AE |
| Date |
|---|
2026 |
Potentially idiomatic expressions (PIEs) carry meanings inherently tied to the everyday experience of a given language community. As such, they pose an interesting challenge for assessing the linguistic (and, to some extent, cultural) capabilities of NLP systems. In this paper, we present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions. The dataset, covering 34 languages and over ten thousand items, supports comparative analyses of idiomatic patterns across language-specific realisations and preferences, yielding insights into shared cultural aspects. Because the data is parallel, it allows language model performance on a given PIE to be evaluated across languages, and it tests whether idiomatic understanding in one language transfers to another. Moreover, the dataset supports the study of PIEs across textual and visual modalities, measuring the extent to which PIE understanding in one modality (text or image) transfers to, or implies, understanding in the other. The data was created by language experts, with both textual and visual components crafted under multilingual guidelines. Each PIE is accompanied by five images representing a spectrum from idiomatic to literal meanings, including semantically related and random distractors. The result is a high-quality benchmark for evaluating multilingual and multimodal idiomatic language understanding.
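To make the item structure concrete, the following is a minimal sketch of how one benchmark item and a simple image-selection score might be represented. It is not the released data format: the class `PieItem`, the role names in `ROLES`, the helper `toy_item`, the file names, and the metric `top1_accuracy` are all hypothetical, chosen only to mirror the description of five images per PIE ranging from idiomatic to literal and including related and random distractors.

```python
from dataclasses import dataclass

# Hypothetical role labels (an assumption): the abstract only states that the
# five images span idiomatic to literal meanings and include semantically
# related and random distractors.
ROLES = ("idiomatic", "idiomatic_related", "literal_related", "literal", "random")


@dataclass(frozen=True)
class PieItem:
    language: str            # language code, e.g. "en"
    expression: str          # the potentially idiomatic expression
    images: dict[str, str]   # role -> image identifier, one entry per role

    def __post_init__(self):
        # Enforce the five-image structure assumed in this sketch.
        if set(self.images) != set(ROLES):
            raise ValueError("expected exactly one image per role")


def toy_item(language: str) -> PieItem:
    """Build a placeholder item; names and files are illustrative only."""
    images = {role: f"{language}_{role}.png" for role in ROLES}
    return PieItem(language=language, expression=f"<PIE in {language}>", images=images)


def top1_accuracy(items, predictions, target_role="idiomatic"):
    """Fraction of items where the predicted image is the one with target_role."""
    if not items:
        return 0.0
    hits = sum(
        1 for item, pred in zip(items, predictions)
        if item.images[target_role] == pred
    )
    return hits / len(items)


if __name__ == "__main__":
    items = [toy_item("en"), toy_item("tr")]
    # One correct pick (idiomatic image) and one incorrect pick (literal image).
    predictions = ["en_idiomatic.png", "tr_literal.png"]
    print(top1_accuracy(items, predictions))
```

Because the items are parallel, the same score could be computed per language and compared for a given PIE, which is one way to probe cross-lingual transfer under these assumptions.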