Projects

AfriBooms

Building a treebank for Afrikaans

July, 2013 - June, 2014

While a wide range of resources and technologies exists for Dutch, enabling the implementation of sophisticated language technologies and advanced linguistic research, Afrikaans is still considered a resource-scarce language. Because Afrikaans is under-resourced and closely related to Dutch, it is often faster and cheaper to adapt existing Dutch tools than to develop tools for Afrikaans from scratch. The aim of this project is to create an Afrikaans treebank using a Dutch parser. The treebank will be included in GrETEL to make it searchable online.

GrETEL 2.0

Querying very large treebanks

June, 2013 - May, 2014

The aim of this project is to adapt GrETEL for querying very large treebanks, such as the reference corpus for written Dutch (ca. 500M tokens, 41M sentences). The main challenge is to make searching this huge treebank as fast as searching the small treebanks that are currently included in GrETEL.

NEDERBOOMS

Exploitation of Dutch treebanks for linguistic research

October, 2010 - February, 2012

The recent construction of large treebanks for spoken and written Dutch (CGN, LASSY, SoNaR) has created new and exciting opportunities for the empirical investigation of Dutch syntax and semantics. At the moment, exploiting those treebanks requires knowledge of specific data structures and query languages. The purpose of this project is to develop user-friendly and well-documented tools that allow linguists who are not familiar with language technology to exploit treebanks. The Nederbooms project is in line with the main CLARIN goal of applying the results of speech and language technology to research in the humanities.