English-Lithuanian Parallel Cybersecurity Corpus - DVITAS

Utka, Andrius; Rackevičienė, Sigita; Rokas, Aivaras; Bielinskienė, Agnė; Mockienė, Liudmila; Laurinaitis, Marius

Use this URL to cite the dataset: https://cris.mruni.eu/cris/handle/007/47354

Type of document

dataset

Author(s)

Creator	Affiliation
Utka, Andrius

Title

English-Lithuanian Parallel Cybersecurity Corpus - DVITAS

Publisher

Vytauto Didžiojo universitetas / Vytautas Magnus University

Mykolo Romerio universitetas / Mykolas Romeris University

Date Issued

2022

eLABa

151093184

Abstract

The English-Lithuanian parallel corpus DVITAS consists of original English texts on cybersecurity and their Lithuanian translations, aligned at the sentence level. The corpus was compiled for a bilingual terminology extraction project, together with an English-Lithuanian comparable corpus. The parallel corpus includes EU legal acts and other documents from 2006 to 2021. The documents were extracted from the EUR-Lex database and other EU institutional repositories. The dataset contains 80 aligned English-Lithuanian files in TMX format, as well as 160 raw files (80 in English and 80 in Lithuanian). The total size of the corpus is 1.4 million words (EN: 0.77 million; LT: 0.63 million). The corpus contains 35,415 aligned segments.
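Because the aligned files are distributed in TMX format, the sentence pairs can be read with a standard XML parser. The following is a minimal sketch, not part of the dataset: it assumes the files follow the usual TMX layout (`tu` units containing `tuv` variants with a `seg` child) and that the language codes begin with `en` and `lt`, which should be checked against the actual files. The function name `read_tmx_pairs` and the sample segment are hypothetical and used only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="en", tgt_lang="lt"):
    """Yield (source, target) segment pairs from a TMX file path or file object.

    Assumption: language codes in the files start with "en" / "lt"
    (for example "en", "en-GB", "lt-LT").
    """
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup elements.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


# Made-up one-unit sample for illustration; it is not taken from the corpus.
SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en"><seg>Network security</seg></tuv>
  <tuv xml:lang="lt"><seg>Tinklo saugumas</seg></tuv>
</tu>
</body></tmx>"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE)))
for en, lt in pairs:
    print(en, "|", lt)
```

To process a downloaded file, pass its path instead of the in-memory sample, for example `read_tmx_pairs("some_file.tmx")`, where the file name is a placeholder.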

Geographic Coverage

LT

Language

Anglų / English (en)

URI

https://cris.mruni.eu/cris/handle/007/47354
https://clarin.vdu.lt/xmlui/handle/20.500.11821/46
http://hdl.handle.net/20.500.11821/46

Affiliation(s)

Humanitarinių mokslų institutas / Institute of Humanities

Teisinių technologijų (LegalTech) centras / Legal Technologies (LegalTech) Centre

Teisės mokykla / Law School

Žmogaus ir visuomenės studijų fakultetas / Faculty of Human and Social Studies

Mykolo Romerio universitetas / Mykolas Romeris University

Project(s)

Bilingual Automatic Terminology Extraction P-MIP-20-282

Funding(s)

Research Council of Lithuania