
What is a treebank?

A treebank or parsed corpus is a text corpus in which every sentence is syntactically annotated.
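
As a rough illustration of what "syntactically annotated" means, the sketch below stores one sentence as a nested tree and reads the words back out. The tuple representation, the helper names `leaves` and `bracketed`, and the labels (S, NP, VP, ...) are assumptions made for this example; they do not come from any of the Dutch treebanks discussed here.

```python
# A toy treebank: a list of syntactic trees, one per sentence.
# Trees are nested tuples (label, child, child, ...); leaves are plain strings.
# The labels are illustrative only, not the tag set of a real Dutch treebank.
toy_treebank = [
    ("S",
     ("NP", ("DET", "de"), ("N", "hond")),
     ("VP", ("V", "blaft"))),
]


def leaves(tree):
    """Return the words of a tree in sentence order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the label, the rest are children
        words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render a tree in the common bracketed notation."""
    if isinstance(tree, str):
        return tree
    children = " ".join(bracketed(child) for child in tree[1:])
    return "(" + tree[0] + " " + children + ")"


print(" ".join(leaves(toy_treebank[0])))  # de hond blaft
print(bracketed(toy_treebank[0]))         # (S (NP (DET de) (N hond)) (VP (V blaft)))
```

Real treebanks store far richer information per node (for example dependency relations and lemmas), but the principle is the same: the structure is stored alongside the words.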

Dutch treebanks

Several treebanks exist for Dutch, such as the Alpino Treebank, the CGN treebank, and LASSY.

Treebanks in the Nederbooms tools

The Nederbooms tools use the official version of the LASSY small treebank. To include the CGN treebank as well, its official version (in Tiger-XML) was converted to the Alpino-XML format by Gertjan van Noord.



The annotations of both LASSY small and the CGN treebank have been manually corrected. For LASSY, the Alpino parser was used as a first annotation step; for CGN it was not. The annotations of the two treebanks are largely the same, but there are some differences. The major one is that the CGN treebank lacks Alpino POS tags.
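
One way to check for this difference in practice is to count how many word nodes in a file carry an Alpino POS tag. The sketch below does this with Python's standard XML parser. The two XML fragments are hand-written for this example and are not taken from either treebank. The code assumes that word nodes are `node` elements with a `word` attribute and that the Alpino POS tag is stored in a `pos` attribute. The function name `pos_coverage` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragments in the style of Alpino-XML (not real treebank data).
# Assumption: the Alpino POS tag lives in a "pos" attribute on word nodes.
WITH_POS = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pos="noun" word="Jan"/>
      <node rel="hd" pos="verb" word="slaapt"/>
    </node>
  </node>
  <sentence>Jan slaapt</sentence>
</alpino_ds>
"""

WITHOUT_POS = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" word="Jan"/>
      <node rel="hd" word="slaapt"/>
    </node>
  </node>
  <sentence>Jan slaapt</sentence>
</alpino_ds>
"""


def pos_coverage(xml_text):
    """Return (word nodes with a 'pos' attribute, total word nodes)."""
    root = ET.fromstring(xml_text.strip())
    # Word nodes are the <node> elements that carry a "word" attribute.
    words = [n for n in root.iter("node") if "word" in n.attrib]
    tagged = [n for n in words if "pos" in n.attrib]
    return len(tagged), len(words)


print(pos_coverage(WITH_POS))     # (2, 2)
print(pos_coverage(WITHOUT_POS))  # (0, 2)
```

A query that filters on Alpino POS tags would match nothing in the second fragment, which is why the difference matters when the same query is run against both treebanks.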