Machine Learning Methods for Discourse Marker Detection in Italian
Author(s) |
---|
Montechiari, Emma Angela |
Stankov, Stanko |
Mishev, Kostadin |
Damova, Mariana |

Date Issued |
---|
2022 |
Recent advances in NLP, in particular Transformer models, show strong performance in building universal language representations. The trained word or sentence vectors can provide a single representation for multiple languages, extracting the semantic information of a text and mapping it into a shared embedding space. This semantic information can be leveraged to train a model for specific downstream tasks, such as text classification and clustering. The resulting model can then be applied to any language covered by the shared vector space, avoiding the need to train a separate model for each language.
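The cross-lingual transfer idea above can be sketched with a toy simulation: if a multilingual encoder maps sentences of the same class close together regardless of language, a classifier fitted on one language's embeddings works on another's. The mock embeddings, cluster centres, and nearest-centroid classifier below are illustrative assumptions, not the paper's actual model or data; in practice the vectors would come from a multilingual Transformer sentence encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Mock shared embedding space: one cluster centre per semantic class.
# A real multilingual encoder would place sentences of the same class
# near each other in this space, whatever their language.
centre_marker = rng.normal(0.0, 1.0, dim)  # "contains a discourse marker"
centre_plain = rng.normal(0.0, 1.0, dim)   # "no discourse marker"

def embed(centre, n):
    # Sentences of one class scatter around their class centre;
    # the geometry is the same for every language in the shared space.
    return centre + rng.normal(0.0, 0.3, (n, dim))

# "Train" on embeddings of one language (e.g. English sentences)...
X_train = np.vstack([embed(centre_plain, 50), embed(centre_marker, 50)])
y_train = np.array([0] * 50 + [1] * 50)

# ...and test on embeddings of another language (e.g. Italian sentences).
X_test = np.vstack([embed(centre_plain, 20), embed(centre_marker, 20)])
y_test = np.array([0] * 20 + [1] * 20)

# Nearest-centroid classifier: one centroid per class from the training set.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
pred = np.argmin(dists, axis=1)

accuracy = (pred == y_test).mean()
print(f"cross-lingual accuracy: {accuracy:.2f}")
```

Because both languages share one vector space, no Italian-specific training step is needed; the classifier learned on the first language's vectors transfers directly.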