Speech and Language Resource Bank

open documented

The English Crowdsourcing Project (L2 speakers)

The English Crowdsourcing Project (L2) contains word recognition times for 61,851 English words in a yes/no vocabulary recognition task.

Authors: Marc Brysbaert, Emmanuel Keuleers, Pawel Mandera
Updated: 2020-05-18
Source: http://crr.ugent.be/programs-data/lexicon-projects
Keywords: word-prevalence, word-frequency, word-knowledge, crowdsourcing, language-acquisition, vocabulary

open documented

The French Lexicon Project

The French Lexicon Project contains lexical decision times for over 38,000 French words.

Authors: Ludovic Ferrand, Boris New, Marc Brysbaert, Emmanuel Keuleers, Patrick Bonin, Alain Meot, Maria Augustinova, Christophe Pallier
Updated: 2020-02-03
Source: https://osf.io/f8kc4/
Keywords: lexicon, vocabulary, French, word-frequency, word-recognition

open

The Hansard Corpus

A corpus of speeches given in the British Parliament from 1803 to 2005.

Authors: Marc Alexander, Fraser Dallachy, Stephen Wattam, Paul Rayson, Mark Davies
Updated: 2016-04-30
Source: https://www.english-corpora.org/hansard/
Keywords: English, semantics, language, linguistics, corpora, collocates, word-frequency

open documented

Corpus Gesproken Nederlands (The Spoken Dutch Corpus)

The Spoken Dutch Corpus (CGN) contains about 900 hours of Dutch and Flemish speech (approximately 3.3 million words).

Authors: W.J.M. Levelt, S.G. Nooteboom, J. Bil, G.E. Booij, P. Dengis, E. DeWallef, A. Hulk, B. Krekels, C. Lucas, D. Van Compernolle, W. Vonk
Updated: 2014-07-30
Source: https://ivdnt.org/images/stories/producten/documentatie/cgn_website/doc_English/topics/index.htm
Keywords: language, phonology, syntax, word-frequency, Dutch

open

Warriner English Affective Ratings

We have collected affective norms of valence, arousal, and dominance for 13,915 English words (lemmas). These norms complement our age-of-acquisition ratings and subtitle word frequencies.

Authors: Marc Brysbaert, Victor Kuperman, Amy Warriner
Updated: 2013-01-05
Source: http://crr.ugent.be/archives/1003
Keywords: semantics, crowdsourcing, word-frequency, English, emotion

open documented

The CMU Statistical Language Modeling Toolkit

The CMU-Cambridge Statistical Language Modeling Toolkit is a suite of UNIX software tools to facilitate the construction and testing of statistical language models. The toolkit is designed to handle large amounts of training data.

Authors: Ronald Rosenfeld, Philip Clarkson
Updated: 1999-06-07
Source: http://www.speech.cs.cmu.edu/SLM_info.html
Keywords: language, data, experiment, word-frequency
