Louis Onrust (lama-fan) MA MSc
The research is part of a joint-doctorate degree between Radboud University Nijmegen and KU Leuven. In Nijmegen, with the group of Antal van den Bosch, we focus on example-based language learning, where we learn language models solely by looking at words and their patterns. The group in Leuven, led by Hugo Van hamme, has a strong focus on speech recognition and on the associated problem of reducing the dimensionality of training data by means of latent variable modelling, which keeps the models computationally tractable.
One of the major problems in natural language processing is cross-domain language modelling. If we train on medical texts and evaluate on medical texts, we expect reasonable performance. However, if we train on medical texts and evaluate on legal texts, performance drops. Although this seems obvious, we see the same behaviour even between more similar domains, such as Wikipedia texts and newspaper texts. We hypothesise that part of the answer is to develop rich example-based language models that capture, in a statistical and implicit way, the regularities underlying sequences of words. We aim to do so by combining rich representations of words with Bayesian non-parametric models.
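One such rich representation, used in the publication listed below, is the skipgram: an n-gram whose tokens need not be adjacent, so that a limited number of intervening words may be skipped. The following is a minimal illustrative sketch of k-skip-n-gram extraction, not the implementation from the paper; the function name and signature are hypothetical.

```python
from itertools import combinations

def skipgrams(tokens, n=2, k=1):
    """Extract k-skip-n-grams: n-token subsequences whose total
    span covers at most n + k consecutive positions, i.e. at most
    k tokens are skipped in total. Illustrative sketch only."""
    result = []
    for start in range(len(tokens) - n + 1):
        # a window of n + k tokens bounds the total number of skips at k
        window = tokens[start:start + n + k]
        # always keep the window's first token, so each skipgram is
        # generated exactly once (for its leftmost position)
        for idx in combinations(range(1, len(window)), n - 1):
            result.append((window[0],) + tuple(window[i] for i in idx))
    return result

# 1-skip-bigrams of "a b c d": adjacent pairs plus pairs skipping one word
print(skipgrams(["a", "b", "c", "d"], n=2, k=1))
# → [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

With k=0 this reduces to ordinary n-grams; increasing k trades a larger model for more robust pattern coverage, which is the intuition behind using skipgrams across domains.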
Dreams, the involuntary perceptions that occur in our minds during sleep, have been a topic of study in many fields of research, including psychiatry, psychology, neurobiology, and religious studies. Their narrative content also links dreams to other forms of storytelling, with sharp distinctions (such as the focus on one's personal life and the typical first-person perspective) but also interesting overlaps with genres such as orally transmitted folktales. We present a study aimed at the large-scale analysis of dreams using text analytics.
Improving cross-domain n-gram language modelling with skipgrams
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2016
Full text (external), RIS, BibTeX