Two news portals were selected for building comparable corpora: the Lithuanian portal DELFI and the English portal The Guardian. The compiled corpora comprise 135 Lithuanian articles from DELFI and 135 English articles from The Guardian. The main criterion for extracting an article was the presence of the two keywords "vaccination" and "vaccine" in its text. The articles cover the period from January 2021 to September 2021. Thirty articles (15 Lithuanian and 15 English) were selected for each of the nine months of this period. The extracted articles were used to build the two types of comparable corpora needed for further analysis: full-text corpora, composed of the full texts of the articles, and extract corpora, composed of the titles and lead paragraphs of the articles. The Lithuanian full-text corpus, ‘Lithuanian media articles on vaccination’, contains 45,827 words; the English full-text corpus, ‘English media articles on vaccination’, contains 96,759 words. The Lithuanian extract corpus contains 4,863 words; the English extract corpus contains 3,828 words.
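The selection procedure described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to compile the dataset: the article fields (`date`, `title`, `lead`, `text`), the function names, the case-insensitive substring matching, the requirement that both keywords occur, and the rule of keeping the earliest matching articles in each month are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Keywords and limits taken from the description; the matching rules are assumed.
KEYWORDS = ("vaccination", "vaccine")
PER_MONTH = 15                                   # articles per month per language
START, END = date(2021, 1, 1), date(2021, 9, 30)  # January to September 2021


def select_articles(articles, keywords=KEYWORDS, per_month=PER_MONTH):
    """Group matching articles by (year, month), keeping at most per_month each.

    `articles` is an iterable of dicts with hypothetical keys
    'date' (datetime.date), 'title', 'lead' and 'text'.
    """
    by_month = defaultdict(list)
    for article in sorted(articles, key=lambda a: a["date"]):
        if not (START <= article["date"] <= END):
            continue  # outside the study period
        text = article["text"].lower()
        # Assumption: both keywords must occur somewhere in the article text.
        if not all(keyword in text for keyword in keywords):
            continue
        bucket = by_month[(article["date"].year, article["date"].month)]
        if len(bucket) < per_month:
            bucket.append(article)
    return dict(by_month)


def extract_text(article):
    """Return the extract-corpus form of an article: title plus lead paragraph."""
    return f"{article['title']}\n{article['lead']}"
```

Under these assumptions, nine months with 15 articles each yield the 135 articles per language reported above. For the Lithuanian portal, the keyword tuple would presumably hold the Lithuanian equivalents of the two keywords.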
Use this URL to cite the dataset: https://cris.mruni.eu/cris/handle/007/47349