Speech and Language Resource Bank
The Spoken Dutch Corpus (CGN) contains 900 hours (and approximately 3.3 million words) of Dutch and Flemish speech.
Authors: W.J.M. Levelt, S.G. Nooteboom, J. Bil, G.E. Booij, P. Dengis, E. DeWallef, A. Hulk, B. Krekels, C. Lucas, D. Van Compernolle, W. Vonk
Updated: 2014-07-30
Source: https://ivdnt.org/images/stories/producten/documentatie/cgn_website/doc_English/topics/index.htm
Keywords: language, phonology, syntax, word-frequency, Dutch
EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database.
Authors: Andrew Duchon, Manuel Perea, Nuria Sebastián-Gallés, Antonia Martí, Manuel Carreiras
Updated: 2013-01-26
Source: https://www.bcbl.eu/databases/espal/
Keywords: phonological-neighbors, frequency, database, neighborhood-density, psycholinguistics, language, Spanish
This is a corpus of three European sign languages. It contains linguistically annotated video files of Sign Language of the Netherlands (Nederlandse Gebarentaal), British Sign Language, and Swedish Sign Language; the data include narratives, dialogues, small lexicons, and poetry.
Authors: Stephen Levinson and Louis Boves
Updated: 2010-04-30
Source: http://sign-lang.ruhosting.nl/echo/
Keywords: language, dialogue, lexicon, Dutch Sign Language, British Sign Language, Swedish Sign Language