doclevel-MT-benchmark-discoMT2019

Tiedemann, Jörg; Scherrer, Yves

dc.contributor.affiliation	University of Helsinki - Tiedemann, Jörg
dc.contributor.affiliation	University of Helsinki - Scherrer, Yves
dc.contributor.author	Tiedemann, Jörg
dc.contributor.author	Scherrer, Yves
dc.date.accessioned	2025-03-24T15:14:52Z
dc.date.issued	2019-11-01
dc.description	This release contains data sets for experiments with document-level machine translation. The data sets have been used in previous studies and are provided here for replicability and comparison with other systems. They are taken from the English-German news translation task at WMT 2019 and the English-German bitext in the OpenSubtitles collection v2016 from OPUS. All data sets are sentence-aligned, with corresponding lines aligned to each other. Document boundaries are marked with empty lines (on both sides of the parallel corpus). The data set has been used in the following publication: @inproceedings{scherrer-tiedemann-loaiciga-2019, title = "Analysing concatenation approaches to document-level NMT in two different domains", author = {Scherrer, Yves and Tiedemann, J{\"o}rg and Lo{\'a}iciga, Sharid}, booktitle = "Proceedings of the Third Workshop on Discourse in Machine Translation", month = nov, year = "2019", address = "Hong-Kong", publisher = "Association for Computational Linguistics", } Please cite that paper if you use the data set in your own work.
dc.identifier	https://doi.org/10.5281/zenodo.3525366
dc.identifier.uri	https://hydatakatalogi-test-24.it.helsinki.fi/handle/123456789/9399
dc.rights	Open
dc.rights.license	cc-by-4.0
dc.subject	natural language processing
dc.subject	machine translation
dc.subject	language technology
dc.subject	NLP
dc.title	doclevel-MT-benchmark-discoMT2019
dc.type	dataset

Zenodo
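
The description states that the files are line-aligned and that an empty line on both sides marks a document boundary. A minimal sketch of how such a pair of files might be grouped into documents is given below. The function name `read_parallel_documents` and the file names in the usage note are hypothetical and not part of the release; treating a line that is empty on only one side as an ordinary sentence pair is likewise an assumption.

```python
from itertools import zip_longest
from typing import Iterable, List, Tuple

# One document is a list of (source sentence, target sentence) pairs.
Document = List[Tuple[str, str]]


def read_parallel_documents(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> List[Document]:
    """Group line-aligned parallel text into documents.

    Hypothetical helper: a line that is empty on BOTH sides is treated as
    a document boundary, following the data set description.
    """
    documents: List[Document] = []
    current: Document = []

    for line_no, (src, tgt) in enumerate(
        zip_longest(src_lines, tgt_lines), start=1
    ):
        # zip_longest pads the shorter input with None, which signals
        # that the two sides do not have the same number of lines.
        if src is None or tgt is None:
            raise ValueError(f"line {line_no}: inputs differ in length")

        src = src.rstrip("\r\n")
        tgt = tgt.rstrip("\r\n")

        if not src.strip() and not tgt.strip():
            # Empty on both sides: close the current document, if any.
            if current:
                documents.append(current)
                current = []
            continue

        # Assumption: a line that is empty on one side only is kept as
        # an ordinary pair rather than treated as a boundary.
        current.append((src, tgt))

    # The last document may not be followed by an empty line.
    if current:
        documents.append(current)
    return documents
```

With real files this could be called as `read_parallel_documents(open("train.en", encoding="utf-8"), open("train.de", encoding="utf-8"))`, where both file names are placeholders for whichever file pair from the release is being read.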