We offer tools and data for natural language analysis and processing:
Corpora
Corpora are the fundamental prerequisite of natural language processing tools. We provide you with monolingual, comparable, and parallel corpora in many languages. Read more
Text analysis
LinA is a multilingual, highly efficient pipeline of text analysis tools built on Apache UIMA and Apache OpenNLP. It comprises morphological analysis, part-of-speech tagging, lemmatization, and other language processing tools. Read more
Context Dictionary
To supply our online context dictionary with more bilingual example sentences, we have built BSP, the Example Sentence Generation Pipeline. BSP is a web crawler that identifies web pages that are translations of each other, extracts the pages' contents, splits them into sentences using LinA, and aligns the sentences. Classifiers then separate the wheat from the chaff, filtering out poor sentence pairs. Read more
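The split-and-align steps can be illustrated with a minimal sketch. It is not BSP's implementation: the regex splitter stands in for LinA, the 1:1 positional alignment with a length-ratio filter stands in for the real aligner and classifiers, and the names `split_sentences`, `align_one_to_one`, and `max_ratio` are hypothetical.

```python
import re


def split_sentences(text):
    # Naive stand-in for a real sentence splitter:
    # break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align_one_to_one(source, target, max_ratio=2.0):
    # Pair sentences by position and drop pairs whose character lengths
    # differ too much (a crude plausibility filter, assumed for illustration).
    pairs = []
    for src, tgt in zip(source, target):
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio <= max_ratio:
            pairs.append((src, tgt))
    return pairs


english = "The house is small. It has a red door."
german = "Das Haus ist klein. Es hat eine rote Tür."

pairs = align_one_to_one(split_sentences(english), split_sentences(german))
for src, tgt in pairs:
    print(src, "|", tgt)
```

Real web pages rarely translate sentence for sentence, so a production aligner has to handle merged, split, and missing sentences, which this positional pairing does not.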
Semantic Similarity
DISCO is a Java tool that lets you retrieve the semantic similarity between arbitrary words. The similarities are based on the statistical analysis of very large text collections. Read more
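To make the idea of similarity from corpus statistics concrete, here is a small sketch of one common approach: count which words co-occur within a context window, then compare the resulting count vectors with the cosine measure. It does not use DISCO's API or reproduce its similarity measure; the function names, the window size, and the toy corpus are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    # Map each word to a Counter of the words seen within `window`
    # positions to its left or right.
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(sim_cat_dog, sim_cat_milk)
```

In this toy corpus "cat" and "dog" appear in nearly identical contexts, so they score higher than "cat" and "milk". Meaningful results need far larger text collections.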
API
We provide several language APIs.
If you’d like to stay informed about corpora updates and new tools for text analysis, you can subscribe to the linguatools newsletter by providing your email address.