Ko van der Sloot

Scientific Programmer, Centre for Language Studies, since 1 September 2015
E4.04 k.vandersloot@let.ru.nl

Publications

M. Reynaert, M. van Gompel, K. van der Sloot, and A. van den Bosch
PICCL: Philosophical Integrator of Computational and Corpus Libraries
Proceedings of CLARIN Annual Conference 2015: Book of Abstracts, CLARIN ERIC, 2015

H. Pander Maat, R. Kraf, A. van den Bosch, N. Dekker, M. van Gompel, S. Kleijn, T. Sanders, and K. van der Sloot
T-Scan: a new tool for analyzing Dutch text
Computational Linguistics in the Netherlands Journal, 4, 2014

Software

Frog

by Antal van den Bosch, Maarten van Gompel, Ko van der Sloot https://languagemachines.github.io/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package. Most modules were created in the 1990s at the ILK Research Group (Tilburg University, the Netherlands) and the CLiPS Research Centre (University of Antwerp, Belgium). Over the years they have been integrated into a single text processing tool, which is currently maintained and developed by the Language Machines Research Group and the Centre for Language and Speech Technology at Radboud University Nijmegen. A dependency parser, a base phrase chunker, and a named-entity recognizer module were added more recently. Where possible, Frog makes use of multi-processor support to run subtasks in parallel.

MBT: Memory-based tagger generator and tagging

by Antal van den Bosch, Ko van der Sloot https://github.com/LanguageMachines/mbt/

MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
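
The two roles described above, generating a tagger from tagged sequences and then tagging new sequences, can be illustrated with a minimal Python sketch. It is not MBT's code or its actual feature set: the three-feature window (previous tag, focus word, next word), the "_" padding symbol, and the function names make_instances, train, and tag are all assumptions chosen for illustration, and the toy Dutch sentences are invented.

```python
PAD = "_"  # assumed padding symbol for positions outside the sentence


def make_instances(tagged_sentence):
    """Turn one tagged sentence into (features, tag) pairs.

    Features (an illustrative choice): previous tag, focus word, next word.
    """
    words = [word for word, _ in tagged_sentence]
    tags = [tag_ for _, tag_ in tagged_sentence]
    instances = []
    for i, word in enumerate(words):
        prev_tag = tags[i - 1] if i > 0 else PAD
        next_word = words[i + 1] if i + 1 < len(words) else PAD
        instances.append(((prev_tag, word, next_word), tags[i]))
    return instances


def train(tagged_sentences):
    """'Generate' a tagger: store every training instance in memory."""
    memory = []
    for sentence in tagged_sentences:
        memory.extend(make_instances(sentence))
    return memory


def tag(words, memory):
    """Tag left to right, reusing the tags already predicted as context."""
    predicted = []
    for i, word in enumerate(words):
        prev_tag = predicted[i - 1] if i > 0 else PAD
        next_word = words[i + 1] if i + 1 < len(words) else PAD
        features = (prev_tag, word, next_word)
        # Nearest neighbour: the stored instance sharing the most feature values.
        _, best_tag = max(
            memory,
            key=lambda item: sum(a == b for a, b in zip(features, item[0])),
        )
        predicted.append(best_tag)
    return predicted


# Toy training data (invented for this example).
training = [
    [("de", "DET"), ("kat", "N"), ("slaapt", "V")],
    [("de", "DET"), ("hond", "N"), ("blaft", "V")],
]
tagger_memory = train(training)
print(tag(["de", "kat", "blaft"], tagger_memory))
```

The point of the sketch is that training consists only of storing windowed instances, and that tagging proceeds left to right so that earlier predictions become context for later ones.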

T-scan: Tekst Complexiteits Analyse voor het Nederlands (Text Complexity Analysis for Dutch)

by Maarten van Gompel, Ko van der Sloot, Rogier Kraf, Martijn van der Klis https://github.com/proycon/tscan/

T-scan is an analysis tool that assesses the complexity of Dutch texts. It is based on original work by Rogier Kraf (Utrecht University) [see Kraf et al., 2009]. The code was reimplemented and extended by Ko van der Sloot (Tilburg University) and is currently maintained and developed by Martijn van der Klis (Utrecht University).

TiMBL: Tilburg Memory-Based Learner

by Antal van den Bosch, Maarten van Gompel, Ko van der Sloot, Walter Daelemans, Jakub Zavrel https://languagemachines.github.io/timbl

TiMBL is an open source software package implementing several memory-based learning algorithms, including IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases. For over fifteen years TiMBL has mostly been used in natural language processing as a machine learning classifier component, but its use extends to virtually any supervised machine learning domain. Because of its decision-tree-based implementation, TiMBL is in many cases far more efficient in classification than a standard k-nearest neighbor algorithm would be.
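
To make the idea of k-nearest neighbor classification with feature weighting over symbolic features concrete, here is a minimal Python sketch. It is not TiMBL's implementation or API: it assumes plain information gain as the feature weight and a weighted overlap distance, takes k as a number of neighbors, and omits the tree-based indexing that gives TiMBL its speed. The function names and the toy data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain_weights(instances, labels):
    """One weight per feature: how much knowing its value reduces class entropy."""
    base = entropy(labels)
    weights = []
    for i in range(len(instances[0])):
        by_value = defaultdict(list)
        for instance, label in zip(instances, labels):
            by_value[instance[i]].append(label)
        remainder = sum(
            len(subset) / len(labels) * entropy(subset)
            for subset in by_value.values()
        )
        weights.append(base - remainder)
    return weights


def classify(instance, memory, labels, weights, k=1):
    """Majority vote among the k stored cases at the smallest weighted overlap distance."""
    scored = []
    for stored, label in zip(memory, labels):
        # Overlap metric: a mismatching symbolic value costs that feature's weight.
        distance = sum(w for w, a, b in zip(weights, instance, stored) if a != b)
        scored.append((distance, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): feature 0 determines the class, feature 1 is noise.
memory = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["P", "P", "Q", "Q"]
weights = information_gain_weights(memory, labels)
print(weights)
print(classify(("a", "z"), memory, labels, weights))
```

In the toy data the informative feature receives all of the weight, so an unseen value in the noisy feature does not affect the predicted class.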

Ucto: Unicode Tokenizer

by Maarten van Gompel, Ko van der Sloot https://languagemachines.github.io/ucto

Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It also offers other basic preprocessing steps, such as case conversion, which help prepare text for further processing such as indexing, part-of-speech tagging, or machine translation.
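
The three steps named above (separating words from punctuation, splitting sentences, and changing case) can be sketched in a few lines of Python. This is an illustration of the task only, not Ucto's rules or interface: the regular expression, the set of sentence-final marks, and the tokenize function are assumptions, and the sketch ignores abbreviations, URLs, and other cases a real tokenizer must handle.

```python
import re

# A word (optionally with internal hyphens or apostrophes), or one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Assumed sentence-final marks; real text needs far more careful rules.
SENTENCE_END = {".", "!", "?"}


def tokenize(text, lowercase=False):
    """Return a list of sentences, each a list of tokens."""
    sentences = []
    current = []
    for token in TOKEN_PATTERN.findall(text):
        current.append(token.lower() if lowercase else token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # text that does not end in a sentence-final mark
        sentences.append(current)
    return sentences


for sentence in tokenize("Dit is een zin. En dit ook!"):
    print(" ".join(sentence))
```

Passing lowercase=True applies the case change to every token while it is being collected, so the output can go straight to a case-insensitive index.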