We offer tools and data for natural language analysis and processing:

Corpora

Corpora are the fundamental prerequisite for natural language processing tools. We provide you with monolingual, comparable, and parallel corpora in many languages.

Text analysis

LinA is a multilingual, highly efficient pipeline of text analysis tools based on cutting-edge technology (Apache UIMA and Apache OpenNLP). It comprises morphological analysis, part-of-speech tagging, lemmatization, and other language processing tools.

Context Dictionary

To feed our online context dictionary with more bilingual example sentences, we have built BSP, the Example Sentence Generation Pipeline. BSP is a web crawler that identifies web pages that are translations of each other, extracts the pages’ contents, splits them into sentences using LinA, and aligns the sentences. Finally, advanced classifiers separate the wheat from the chaff, discarding low-quality sentence pairs.
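To make the alignment step more concrete, here is a minimal sketch of length-based sentence alignment in Java. It follows the general idea of Gale and Church (1993): sentences that translate each other tend to have similar lengths, so a dynamic program searches for the cheapest sequence of 1-1, 1-0, 0-1, 2-1 and 1-2 sentence pairings ("beads"). This is an illustration under stated assumptions, not BSP's actual aligner: the class name `LengthAligner`, the cost function and the penalty values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified length-based sentence aligner in the spirit of Gale and Church (1993).
 * Hypothetical illustration only; not BSP's alignment algorithm.
 */
public class LengthAligner {

    /** One bead: source sentences [s0, s1) aligned to target sentences [t0, t1). */
    public record Bead(int s0, int s1, int t0, int t1) {}

    // Allowed bead shapes {source count, target count} and an assumed fixed penalty for each.
    private static final int[][] SHAPES = {{1, 1}, {1, 0}, {0, 1}, {2, 1}, {1, 2}};
    private static final double[] PENALTY = {0.0, 4.0, 4.0, 1.5, 1.5};

    /** Cost of pairing two spans: normalized difference of their character lengths. */
    private static double lengthCost(int srcLen, int tgtLen) {
        if (srcLen == 0 || tgtLen == 0) return 0.0; // 1-0 and 0-1 beads are priced by PENALTY only
        return Math.abs(srcLen - tgtLen) / Math.sqrt(srcLen + tgtLen);
    }

    private static int totalLength(List<String> sentences, int from, int to) {
        int len = 0;
        for (int x = from; x < to; x++) len += sentences.get(x).length();
        return len;
    }

    public static List<Bead> align(List<String> src, List<String> tgt) {
        int n = src.size(), m = tgt.size();
        double[][] cost = new double[n + 1][m + 1];
        int[][] back = new int[n + 1][m + 1]; // index into SHAPES used to reach each cell
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0.0;

        // Fill the DP table: cost[i][j] = cheapest alignment of the first i source and j target sentences.
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) continue;
                for (int k = 0; k < SHAPES.length; k++) {
                    int di = SHAPES[k][0], dj = SHAPES[k][1];
                    if (i < di || j < dj) continue;
                    double c = cost[i - di][j - dj] + PENALTY[k]
                            + lengthCost(totalLength(src, i - di, i), totalLength(tgt, j - dj, j));
                    if (c < cost[i][j]) {
                        cost[i][j] = c;
                        back[i][j] = k;
                    }
                }
            }
        }

        // Trace back from (n, m) to recover the cheapest sequence of beads.
        List<Bead> beads = new ArrayList<>();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            int k = back[i][j];
            int di = SHAPES[k][0], dj = SHAPES[k][1];
            beads.add(0, new Bead(i - di, i, j - dj, j));
            i -= di;
            j -= dj;
        }
        return beads;
    }

    public static void main(String[] args) {
        List<String> en = List.of("The house is small.", "It is old.", "It has two windows.");
        List<String> de = List.of("Das Haus ist klein und alt.", "Es hat zwei Fenster.");
        // The first two English sentences should merge into one 2-1 bead with the first German sentence.
        for (Bead b : align(en, de)) System.out.println(b);
    }
}
```

The cost function assumes a length ratio of roughly 1:1, which suits English and German reasonably well; a real aligner would estimate the ratio per language pair and typically combine length with lexical cues.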

Semantic Similarity

DISCO is a Java tool that retrieves the semantic similarity between arbitrary words. The similarities are based on statistical analysis of very large text collections.
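As a rough illustration of how word similarity can be derived from text statistics, the sketch below uses the classic count-based distributional approach. Each word is represented by counts of the words that occur within a fixed window around it, and two words are compared by the cosine of their count vectors. This is an assumption-laden toy, neither DISCO's API nor its similarity measure. The class `CooccurrenceSimilarity`, the window size and the tiny corpus are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy count-based distributional similarity (co-occurrence counts + cosine).
 * Illustrative sketch only; not DISCO's API or its exact similarity measure.
 */
public class CooccurrenceSimilarity {

    // word -> (context word -> co-occurrence count)
    private final Map<String, Map<String, Integer>> vectors = new HashMap<>();

    /** Builds co-occurrence vectors from tokenized sentences using a symmetric window. */
    public CooccurrenceSimilarity(List<List<String>> sentences, int window) {
        for (List<String> tokens : sentences) {
            for (int i = 0; i < tokens.size(); i++) {
                Map<String, Integer> vec = vectors.computeIfAbsent(tokens.get(i), w -> new HashMap<>());
                int from = Math.max(0, i - window);
                int to = Math.min(tokens.size() - 1, i + window);
                for (int j = from; j <= to; j++) {
                    if (j != i) vec.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
    }

    /** Cosine similarity of the two words' context vectors; 0.0 if either word is unknown. */
    public double similarity(String a, String b) {
        Map<String, Integer> va = vectors.get(a), vb = vectors.get(b);
        if (va == null || vb == null) return 0.0;
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            Integer other = vb.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        double norms = norm(va) * norm(vb);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) sum += (double) c * c;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "drinks", "milk"),
                List.of("the", "dog", "drinks", "water"),
                List.of("the", "car", "needs", "fuel"));
        CooccurrenceSimilarity sim = new CooccurrenceSimilarity(corpus, 2);
        // "cat" and "dog" share the contexts "the" and "drinks", so they score higher than "cat" and "fuel".
        System.out.println(sim.similarity("cat", "dog"));
        System.out.println(sim.similarity("cat", "fuel"));
    }
}
```

Raw counts are the simplest choice; with very large text collections one would usually weight the counts (for example with a mutual-information measure) so that frequent function words such as "the" do not dominate the vectors.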


We provide several language APIs.

If you’d like to stay informed about corpus updates and new text analysis tools, you can subscribe to the linguatools newsletter by providing your email address.