Authors: Nelleke Oostdijk, Martin Reynaert
Updated: Thu 01 December 2011
Type: text corpus
Languages: Dutch
Keywords: language, linguistics, speech, Dutch
Open Access: yes
License: none
Publications: Oostdijk, N., Reynaert, M., Hoste, V., Schuurman, I. (2013). The Construction of a 500 Million Word Reference Corpus of Contemporary Written Dutch. In: Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme (eds. P. Spyns, J. Odijk), Springer Verlag.
Citation: Oostdijk, N. (2011). SoNaR: STEVIN Dutch Reference Corpus. STEVIN Program.

The SoNaR project aims to build a large corpus (at least 500 million words) of contemporary written Dutch that can serve as a general reference for all kinds of research into language and language use. This includes descriptive research (as reflected in, e.g., dictionaries and grammars) as well as research in the field of language and speech technology. For such research, it is essential that large amounts of data are available and that researchers can process these data with their own software.