Teisinių technologijų (LegalTech) centras / Legal Technologies (LegalTech) Centre
English-Lithuanian Comparable Cybersecurity Corpus - DVITAS
Item type: Dataset [2022][D][H004]; Utka, Andrius; Vytauto Didžiojo universitetas, 2022
The English-Lithuanian comparable corpus (DVITAS COMPARABLE) is morphologically annotated. It includes original English and Lithuanian texts on cybersecurity from the period 2010-2021. The corpus was compiled for the bilingual terminology extraction project together with the English-Lithuanian parallel corpus. There are 1,708 files in English and 2,567 in Lithuanian. The total size of the corpus is 4m words (EN-2m; LT-2m). The corpus is composed of texts representing 4 text types: academic (EN-19%; LT-30%), administrative-informative (EN-8%; LT-11%), legal (EN-18%; LT-4%), and media (EN-55%; LT-55%).
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS
Item type: Dataset [2022][D][H004]; Utka, Andrius; Vytauto Didžiojo universitetas, 2022
The English-Lithuanian parallel corpus DVITAS includes original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. The corpus was compiled for the bilingual terminology extraction project together with the English-Lithuanian comparable corpus. The parallel corpus includes EU legal acts and other documents from the period 2006-2021. The documents were extracted from the EUR-Lex database and other EU institutional repositories. The dataset contains 80 aligned English-Lithuanian files in TMX format, as well as 160 raw files (80 in English and 80 in Lithuanian). The total size of the corpus is 1.4m words (EN-0.77m; LT-0.63m). The corpus contains 35,415 aligned segments.
English-Lithuanian Parallel Cybersecurity Corpus - DVITAS v2.0
Item type: Dataset [2024][D][H004]; Mickevič, Jolanta; Utka, Andrius; CLARIN-LT, 2024
The English-Lithuanian parallel corpus DVITAS v2 includes original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. Version 1 of the corpus was compiled for the bilingual terminology extraction project DVITAS together with the English-Lithuanian comparable corpus. The current second version expands the first with 27 additional files and metadata. The parallel corpus includes EU legal acts and other documents from the period 2006-2022. The documents were extracted from the EUR-Lex database and other EU institutional repositories. The dataset contains 107 aligned English-Lithuanian files in TMX format, as well as 214 raw files (107 in English and 107 in Lithuanian). The total size of the corpus is 1.97m words (EN-1.08m; LT-0.88m). The corpus contains 53,792 aligned segments.
Lithuanian-English Cybersecurity Termbase v.0.1 [Lietuvių-anglų kalbų kibernetinio saugumo terminų bazė v.0.1]
Item type: Dataset [2023][D][H004]; Utka, Andrius; Vytauto Didžiojo universitetas, 2023
The bilingual termbase is a TBX export of the online termbase https://www.terminologue.org/csterms/. The termbase includes terms for 233 cybersecurity concepts.