Annotated sets of compounds
Annotated compounds in German and Swedish. The zipped file contains four compound data sets, two in German and two in Swedish. In all cases, the sets consist of running text from Europarl, with compounds annotated in one of two ways:
- 1-to-1: compounds are annotated only if their parts are in 1-to-1 correspondence with the English Europarl translation (see Koehn and Knight, EACL 2003).
- No suffix: compounds are annotated based on linguistic intuition.
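The 1-to-1 criterion can be illustrated with a minimal sketch. It assumes that each compound has already been split into its parts (with linking elements such as the -s- in Aktionsplan removed) and that a bilingual lexicon of part translations is available. The names `TOY_LEXICON` and `is_one_to_one` are hypothetical and chosen for illustration; this is not the procedure used to build these data sets. Here a split counts as 1-to-1 if every part can be paired with its own distinct word in the English sentence, which is checked as a bipartite matching.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon mapping German compound parts to English words.
# A real system would learn such translations from the parallel corpus.
TOY_LEXICON: Dict[str, Set[str]] = {
    "aktion": {"action"},
    "plan": {"plan", "scheme"},
    "frei": {"free"},
    "zeit": {"time"},
}


def is_one_to_one(parts: List[str], english_tokens: List[str],
                  lexicon: Dict[str, Set[str]]) -> bool:
    """Return True if every compound part can be paired with its own,
    distinct English token that the lexicon lists as a translation."""
    tokens = [tok.lower() for tok in english_tokens]
    owner: Dict[int, int] = {}  # English token index -> index of the part using it

    def assign(p: int, seen: Set[int]) -> bool:
        # Augmenting-path search (Kuhn's algorithm) for bipartite matching.
        translations = lexicon.get(parts[p].lower(), set())
        for i, tok in enumerate(tokens):
            if tok in translations and i not in seen:
                seen.add(i)
                # Take a free token, or move its current owner to another token.
                if i not in owner or assign(owner[i], seen):
                    owner[i] = p
                    return True
        return False

    # The split is 1-to-1 only if every part gets a token of its own.
    return all(assign(p, set()) for p in range(len(parts)))


print(is_one_to_one(["aktion", "plan"], "The action plan was adopted".split(), TOY_LEXICON))  # True
print(is_one_to_one(["frei", "zeit"], "We need more leisure".split(), TOY_LEXICON))  # False
```

In this sketch, Aktionsplan aligned with "action plan" passes the check, while Freizeit translated as "leisure" does not, because neither part has a matching English word of its own. A matching is used rather than a greedy word-by-word lookup so that parts with several possible translations cannot block each other.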
For references, see:
- Swedish: Sara Stymne, Maria Holmqvist and Lars Ahrenberg. Effects of Morphological Analysis in Translation between German and English. In Proceedings of the ACL 2008 Third Workshop on Statistical Machine Translation. Pages 135-138. June 19, 2008. Columbus, Ohio.
- German 1-to-1: Sara Stymne. German Compounds in Factored Statistical Machine Translation. In Proceedings of GoTAL, 6th International Conference on Natural Language Processing, eds. A. Ranta and B. Nordström, Springer LNCS/LNAI Volume 5221. Pages 464-475. August 25-27, 2008. Gothenburg, Sweden.