doclevel-MT-benchmark-discoMT2019

Tiedemann, Jörg; Scherrer, Yves

Date

2019-11-01

Creator/contributor

Tiedemann, Jörg

Scherrer, Yves

Publication Type

dataset

Repositories

Zenodo

Access rights

Open

Description

This release contains data sets for experiments with document-level machine translation. The data sets have been used in previous studies and are provided here for replicability and comparison with other systems. They are taken from the English-German news translation task at WMT 2019 and from the English-German bitext in the OpenSubtitles collection v2016 from OPUS.

All data sets are sentence-aligned: corresponding lines on the two sides of the parallel corpus are aligned to each other. Document boundaries are marked with empty lines (on both sides of the parallel corpus).

The data set has been used in the following publication:

@inproceedings{scherrer-tiedemann-loaiciga-2019,
    title = "Analysing concatenation approaches to document-level NMT in two different domains",
    author = {Scherrer, Yves and Tiedemann, J{\"o}rg and Lo{\'a}iciga, Sharid},
    booktitle = "Proceedings of the Third Workshop on Discourse in Machine Translation",
    month = nov,
    year = "2019",
    address = "Hong-Kong",
    publisher = "Association for Computational Linguistics",
}

Please cite that paper if you use the data set in your own work.
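The file layout described above (line-aligned source and target files, with an empty line on both sides marking a document boundary) can be read back into documents with a few lines of Python. The following is only a minimal sketch, not part of the release: the function names split_documents and read_documents are hypothetical, and so are any file names passed to them. It also assumes that an empty line on only one side indicates a problem in the data and should raise an error; relax that check if the files you use contain such lines.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

SentencePair = Tuple[str, str]


def split_documents(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Iterator[List[SentencePair]]:
    """Group line-aligned sentence pairs into documents.

    A document ends where BOTH sides contain an empty line.
    """
    document: List[SentencePair] = []
    pairs = zip_longest(src_lines, tgt_lines)
    for line_no, (src, tgt) in enumerate(pairs, start=1):
        if src is None or tgt is None:
            raise ValueError(f"files differ in length at line {line_no}")
        src, tgt = src.rstrip("\r\n"), tgt.rstrip("\r\n")
        src_empty, tgt_empty = not src.strip(), not tgt.strip()
        if src_empty and tgt_empty:
            # Document boundary: emit the document collected so far.
            if document:
                yield document
                document = []
        elif src_empty or tgt_empty:
            # Assumption of this sketch: a one-sided empty line is an error.
            raise ValueError(f"one-sided empty line at line {line_no}")
        else:
            document.append((src, tgt))
    if document:
        # Last document when the files do not end with an empty line.
        yield document


def read_documents(src_path: str, tgt_path: str) -> List[List[SentencePair]]:
    """Read two parallel files (paths supplied by the caller) into documents."""
    with open(src_path, encoding="utf-8") as src_file, \
            open(tgt_path, encoding="utf-8") as tgt_file:
        return list(split_documents(src_file, tgt_file))
```

Calling read_documents with the paths of an English file and its German counterpart returns one list of (source, target) sentence pairs per document, which is a convenient starting point for building concatenated or context-aware training examples.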

Link to original dataset

https://doi.org/10.5281/zenodo.3525366

Keyword

natural language processing, machine translation, language technology, NLP

University of Helsinki

University of Helsinki Data catalogue
