Wikipedia parallel quotations corpora

A tiny German-English parallel corpus extracted from the German Wikipedia, where quotations are sometimes given both in German translation and in the original language. It contains 6,802 parallel sentences. The corpus can be useful for testing or tuning statistical machine translation systems.

ISLRN 544-353-662-209-8

Download

Download the gzipped tar archive zitate-dewiki-20141024.tgz, which contains two parallel files in Moses format (UTF-8).
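Moses format means plain-text files aligned line by line: line i of one file is the translation of line i of the other. Below is a minimal Python sketch that reads both files straight from the .tgz and pairs their lines. It assumes the archive holds exactly two regular files. Their names, and which one is German, are not stated here, so the code simply sorts them by name and returns the names for you to inspect. `read_moses_archive` is a hypothetical helper, not part of any tooling shipped with the corpus.

```python
import tarfile
from typing import List, Tuple


def read_moses_archive(path: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Read a .tgz holding two line-aligned Moses-format text files.

    Returns the two member names (sorted) and the list of sentence pairs.
    """
    with tarfile.open(path, mode="r:gz") as tar:
        # Keep only regular files; the archive layout is an assumption.
        members = sorted(
            (m for m in tar.getmembers() if m.isfile()), key=lambda m: m.name
        )
        if len(members) != 2:
            raise ValueError(f"expected 2 files, found {len(members)}")
        sides = []
        for member in members:
            fh = tar.extractfile(member)  # not None: members are regular files
            text = fh.read().decode("utf-8")
            # Split on "\n" only: str.splitlines() also breaks on Unicode
            # separators (e.g. U+2028) that could occur inside a sentence.
            lines = text.split("\n")
            if lines and lines[-1] == "":
                lines.pop()  # drop the empty string after a final newline
            sides.append(lines)
    left, right = sides
    # Line i of one file is the translation of line i of the other.
    if len(left) != len(right):
        raise ValueError(f"line counts differ: {len(left)} vs {len(right)}")
    return [m.name for m in members], list(zip(left, right))


if __name__ == "__main__":
    names, pairs = read_moses_archive("zitate-dewiki-20141024.tgz")
    print(names, len(pairs))
```

Reading through `extractfile` avoids unpacking the archive to disk. The line-count check catches a misaligned pair of files early, before it can silently corrupt tuning or test data.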

License

The Wikipedia Parallel Quotations Corpus is derived from Wikipedia and is therefore made available under the same license as Wikipedia: the Creative Commons Attribution-ShareAlike license.

