Supporting Language Machines

Software & Demos

CLAM

by Maarten van Gompel https://proycon.github.io/clam

CLAM allows you to quickly and transparently transform your Natural Language Processing application into a RESTful webservice, with which both human end-users and automated clients can interact.

FoLiA: Format for Linguistic Annotation

by Maarten van Gompel https://proycon.github.io/folia

FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources. FoLiA’s intended use is as a format for storing and/or exchanging language resources, including corpora.

Fowlt.net

by Antal van den Bosch, Wessel Stoop http://fowlt.net/

Fowlt is a spelling correction system for English.

Frog

by Antal van den Bosch, Maarten van Gompel, Ko van der Sloot https://languagemachines.github.io/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package. Most modules were created in the 1990s at the ILK Research Group (Tilburg University, the Netherlands) and the CLiPS Research Centre (University of Antwerp, Belgium). Over the years they have been integrated into a single text processing tool, which is currently maintained and developed by the Language Machines Research Group and the Centre for Language and Speech Technology at Radboud University Nijmegen. A dependency parser, a base phrase chunker, and a named-entity recognizer module were added more recently. Where possible, Frog makes use of multi-processor support to run subtasks in parallel.

LaMachine

by Maarten van Gompel https://proycon.github.io/LaMachine

LaMachine is not a single tool but a distribution of almost all our software, bundled in three different ways to facilitate use on a wide variety of systems. LaMachine can be used as a virtual machine (the easiest option, which lets you run our software on any host OS), as a Docker application, or as a compilation/installation script in a virtual environment. It contains software such as TiMBL, Ucto, Frog, Colibri Core and all the Python bindings.

MBT: Memory-based tagger generator and tagging

by Antal van den Bosch, Ko van der Sloot https://github.com/LanguageMachines/mbt/

MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
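The split between generating a tagger and tagging with it can be illustrated with a small Python sketch. This is not MBT's code or its command-line interface: the feature window (previous tag, current word, next word), the function names and the toy training data are all assumptions made for the illustration.

```python
from collections import Counter, defaultdict


def make_instances(tagged_sentence):
    """Yield ((previous_tag, word, next_word), tag) for every token.

    The three-feature window is an assumption made for this sketch.
    """
    words = [word for word, _ in tagged_sentence]
    tags = [tag for _, tag in tagged_sentence]
    for i, word in enumerate(words):
        previous_tag = tags[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        yield (previous_tag, word, next_word), tags[i]


def generate_tagger(training_sentences):
    """'Generate' a tagger by storing every training instance in memory."""
    memory = defaultdict(Counter)
    for sentence in training_sentences:
        for features, tag in make_instances(sentence):
            memory[features][tag] += 1
    return memory


def tag_sequence(memory, words, default_tag="NOUN"):
    """Tag left to right, copying the tag of the best-overlapping stored instance."""
    tags = []
    for i, word in enumerate(words):
        previous_tag = tags[-1] if tags else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        query = (previous_tag, word, next_word)
        best_score, best_tag = -1, default_tag
        for features, tag_counts in memory.items():
            # Overlap score: number of feature positions with identical values.
            score = sum(a == b for a, b in zip(features, query))
            if score > best_score:
                best_score = score
                best_tag = tag_counts.most_common(1)[0][0]
        tags.append(best_tag)
    return tags


# Toy training set (hypothetical data).
training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
tagger = generate_tagger(training)
print(tag_sequence(tagger, ["the", "cat", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

The test sentence never occurs in the training data, yet each token is tagged by analogy with the stored instance it overlaps with most.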

TiMBL: Tilburg Memory-Based Learner

by Antal van den Bosch, Maarten van Gompel, Ko van der Sloot, Walter Daelemans, Jakub Zavrel https://languagemachines.github.io/timbl

TiMBL is an open source software package implementing several memory-based learning algorithms, including IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases. For over fifteen years TiMBL has been used mostly in natural language processing as a machine learning classifier component, but its use extends to virtually any supervised machine learning domain. Due to its particular decision-tree-based implementation, TiMBL is in many cases far more efficient in classification than a standard k-nearest neighbor algorithm would be.
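The idea behind k-nearest neighbor classification with feature weighting on symbolic features can be sketched in a few lines of Python. This is a minimal illustration, not TiMBL's implementation or API: it weights each feature by its information gain, measures distance as the summed weight of mismatching features, and takes a majority vote over the k closest stored instances. It does a plain linear scan, without the tree-based indexing mentioned above. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Reduction in class entropy obtained by knowing one feature's value."""
    labels_by_value = {}
    for features, label in zip(instances, labels):
        labels_by_value.setdefault(features[feature_index], []).append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in labels_by_value.values()
    )
    return entropy(labels) - remainder


def classify(instances, labels, weights, query, k=1):
    """Majority vote over the k stored instances closest to the query.

    Distance is the weighted overlap metric: the sum of the weights of
    all features whose symbolic values differ.
    """
    scored = []
    for features, label in zip(instances, labels):
        distance = sum(w for w, a, b in zip(weights, features, query) if a != b)
        scored.append((distance, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): features are (previous tag, word suffix).
train_instances = [
    ("DET", "ing"), ("DET", "ed"), ("PRON", "ing"),
    ("PRON", "ed"), ("DET", "s"), ("PRON", "s"),
]
train_labels = ["NOUN", "ADJ", "VERB", "VERB", "NOUN", "VERB"]

weights = [information_gain(train_instances, train_labels, i) for i in range(2)]
print(classify(train_instances, train_labels, weights, ("PRON", "ly"), k=3))  # VERB
```

In this toy set the previous tag is more informative than the suffix, so it receives the larger weight, and a query with an unseen suffix is still classified by its match on the previous tag.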

Ucto: Unicode Tokenizer

by Maarten van Gompel, Ko van der Sloot https://languagemachines.github.io/ucto

Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.
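To make these steps concrete, here is a deliberately naive Python sketch of tokenization, sentence splitting and case changing. It does not use Ucto or reproduce its language-specific rules. The regular expression and function names are assumptions for illustration only, and the sentence splitter will break on abbreviations, which a real tokenizer has to handle.

```python
import re

# A word (optionally with internal apostrophes or hyphens) or a single
# punctuation character. This pattern is a simplification for the sketch.
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")
SENTENCE_FINAL = {".", "!", "?"}


def tokenize(text, lowercase=False):
    """Separate words from punctuation; optionally lowercase every token."""
    tokens = TOKEN_PATTERN.findall(text)
    return [token.lower() for token in tokens] if lowercase else tokens


def split_sentences(tokens):
    """Group tokens into sentences, ending each one at ., ! or ?"""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_FINAL:
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence without final punctuation
        sentences.append(current)
    return sentences


tokens = tokenize("Hello, world! It's fine.")
print(tokens)                   # ['Hello', ',', 'world', '!', "It's", 'fine', '.']
print(split_sentences(tokens))  # [['Hello', ',', 'world', '!'], ["It's", 'fine', '.']]
```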

Valkuil.net

by Antal van den Bosch, Maarten van Gompel http://valkuil.net/

Valkuil is a Dutch spelling correction system.