open   documented  

The English Crowdsourcing Project (L2 speakers)

The English Crowdsourcing Project (L2) contains word recognition times for 61,851 English words in a Y/N vocabulary recognition task.

Authors:  Marc Brysbaert, Emmanuel Keuleers, Pawel Mandera
Updated:  2020-05-18
Source:  http://crr.ugent.be/programs-data/lexicon-projects
Keywords:  word-prevalence, word-frequency, word-knowledge, crowdsourcing, language-acquisition, vocabulary

open   documented  

The French Lexicon Project

The French Lexicon Project contains lexical decision times for over 38,000 French words.

Authors:  Ludovic Ferrand, Boris New, Marc Brysbaert, Emmanuel Keuleers, Patrick Bonin, Alain Meot, Maria Augustinova, Christophe Pallier
Updated:  2020-02-03
Source:  https://osf.io/f8kc4/
Keywords:  lexicon, vocabulary, French, word-frequency, word-recognition

open  

The Hansard Corpus

A corpus of the speeches given in the British Parliament from 1803 to 2005.

Authors:  Marc Alexander, Fraser Dallachy, Stephen Wattam, Paul Rayson, Mark Davies
Updated:  2016-09-22
Source:  https://www.english-corpora.org/hansard/
Keywords:  English, semantics, language, linguistics, corpora, collocates, word-frequency

open   documented  

Corpus Gesproken Nederlands (The Spoken Dutch Corpus)

The Spoken Dutch Corpus (CGN) contains 900 hours (and approximately 3.3 million words) of Dutch and Flemish speech.

Authors:  W.J.M. Levelt, S.G. Nooteboom, J. Bil, G.E. Booij, P. Dengis, E. DeWallef, A. Hulk, B. Krekels, C. Lucas, D. Van Compernolle, W. Vonk
Updated:  2014-07-22
Source:  https://ivdnt.org/images/stories/producten/documentatie/cgn_website/doc_English/topics/index.htm
Keywords:  language, phonology, syntax, word-frequency, Dutch

open  

Warriner English Affective Ratings

We have collected affective norms of valence, arousal, and dominance for 13,915 English words (lemmas). They complement our age-of-acquisition ratings and subtitle word frequencies.

Authors:  Marc Brysbaert, Victor Kuperman, and Amy Warriner
Updated:  2013-01-05
Source:  http://crr.ugent.be/archives/1003
Keywords:  semantics, crowdsourcing, word-frequency, English, emotion

open   documented  

The CMU Statistical Language Modeling Toolkit

The CMU-Cambridge Statistical Language Modeling toolkit is a suite of UNIX software tools to facilitate the construction and testing of statistical language models. The SLM toolkit is meant for large amounts of training data.

Authors:  Ronald Rosenfeld & Philip Clarkson
Updated:  1999-06-07
Source:  http://www.speech.cs.cmu.edu/SLM_info.html
Keywords:  language, data, experiment, word-frequency