Building a treebank for Afrikaans
July, 2013 - June, 2014
While a wide range of resources and technologies exists for Dutch, enabling the implementation of sophisticated language technologies and advanced linguistic research, Afrikaans is still considered a resource-scarce language. Because Afrikaans is under-resourced and closely related to Dutch, it is often faster and cheaper to adapt existing Dutch tools than to develop tools for Afrikaans from scratch. The aim of this project is to create an Afrikaans treebank using a Dutch parser. The treebank will be included in GrETEL to make it searchable online.
Querying very large treebanks
June, 2013 - May, 2014
The aim of this project is to adapt GrETEL for querying very large treebanks, such as the reference corpus for written Dutch (ca. 500M tokens, 41M sentences). The main challenge is to make this huge treebank as quick to search as the small treebanks currently included in GrETEL.
Exploitation of Dutch treebanks for linguistic research
October, 2010 - February, 2012
The recent construction of large treebanks for spoken and written Dutch (CGN, LASSY, SoNaR) has created new and exciting opportunities for the empirical investigation of Dutch syntax and semantics. At the moment, exploiting those treebanks requires knowledge of specific data structures and query languages. The purpose of this project is to develop user-friendly and well-documented tools that allow linguists who are not familiar with language technology to exploit these treebanks. The Nederbooms project is in line with the main CLARIN goal of applying the results of speech and language technology to research in the humanities.
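To give a sense of the data structures and query languages mentioned above, the following minimal sketch runs an XPath-style query over a single parse tree. It is an illustration only, not the project's tooling: the sample sentence and tree are hand-written, the XML layout (nested node elements carrying rel, cat, pos and word attributes) is assumed to resemble the Alpino-style format used for Dutch treebanks, and the function name subject_heads is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical parse of "de hond blaft" ("the dog barks").
# The element and attribute names are assumptions modelled on an
# Alpino-style layout; real treebank files carry many more attributes.
SAMPLE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node cat="np" rel="su">
        <node rel="det" pos="det" word="de"/>
        <node rel="hd" pos="noun" word="hond"/>
      </node>
      <node rel="hd" pos="verb" word="blaft"/>
    </node>
  </node>
</alpino_ds>
"""


def subject_heads(xml_text):
    """Return the head word of every subject constituent in one parse tree."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports a limited XPath subset: find any node with
    # rel="su" (subject), then its direct child with rel="hd" (head).
    heads = root.findall(".//node[@rel='su']/node[@rel='hd']")
    return [node.get("word") for node in heads]


print(subject_heads(SAMPLE))
```

Even this small query requires knowing how constituents are nested and which attribute encodes the dependency relation, which is the kind of barrier that tools for non-technical users are meant to remove.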