Spanish Confusions Corpus

Behavioral data and stimuli from a large-scale corpus of noise-induced misperceptions in Spanish.

Authors: Máté Attila Tóth, María Luisa García Lecumberri, Yan Tang, Martin Cooke
Updated: 2014-01-01
Source: https://zenodo.org/record/3521449
Keywords: misperceptions, errors, word-recognition, noise, spanish, audio-data, behavioral-data

Nanny

A multimodal corpus of speech to infant and adult listeners.

Authors: Elizabeth Johnson, Mybeth Lahey, Mirjam Ernestus, Anne Cutler
Updated: 2013-11-08
Source: https://asa.scitation.org/doi/10.1121/1.4828977
Keywords: speech, language, vocabulary, Dutch

English Concreteness Ratings

We have collected concreteness ratings for 40 thousand English lemma words with Amazon Mechanical Turk. The ratings come from a larger list of 63 thousand words and represent all English words known to 85% of the raters. As such, the list can be used as a reference list for future word recognition research in (American) English.

Authors: Marc Brysbaert, Victor Kuperman, Amy Beth Warriner
Updated: 2013-09-15
Source: http://crr.ugent.be/archives/1330
Keywords: word-recognition, linguistics, concreteness, psycholinguistics

The World Atlas of Language Structures

The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors.

Authors: Matthew S. Dryer, Martin Haspelmath, et al.
Updated: 2013-04-30
Source: https://wals.info/
Keywords: phonology, semantics, linguistics, grammar, lexical, database, language-structure

EsPal

EsPal is based on an extensible set of data sources, beginning with a 300-million-token written database and a 460-million-token subtitle database.

Authors: Andrew Duchon, Manuel Perea, Nuria Sebastián-Gallés, Antonia Martí, Manuel Carreiras
Updated: 2013-01-26
Source: https://www.bcbl.eu/databases/espal/
Keywords: phonological-neighbors, frequency, database, neighborhood-density, psycholinguistics, language, Spanish

Warriner English Affective Ratings

We have collected affective norms of valence, arousal, and dominance for 13,915 English words (lemmas). They complement our age-of-acquisition ratings and subtitle word frequencies.

Authors: Marc Brysbaert, Victor Kuperman, Amy Warriner
Updated: 2013-01-05
Source: http://crr.ugent.be/archives/1003
Keywords: semantics, crowdsourcing, word-frequency, English, emotion

Nijmegen Corpus of Casual Spanish

Around 30 hours of high-quality recordings featuring 52 Spanish speakers from Madrid conversing among friends.

Authors: Mirjam Ernestus, Francisco Torreira
Updated: 2012-09-18
Source: https://mirjamernestus.nl/Ernestus/NCCSp/index.php
Keywords: language, speech, morphology, Spanish

CLEARPOND: Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities

CLEARPOND provides a user-friendly, web-based interface for obtaining Dutch, English, French, German, and Spanish phonological and orthographic neighborhood densities (PONDs).

Authors: Viorica Marian, James Bartolotti, Sarah Chabal, Anthony Shook
Updated: 2012-04-30
Source: https://clearpond.northwestern.edu/index.html
Keywords: linguistics, phonetics, neighborhood-density, database, lexicon, Dutch, English, French, German, Spanish

Korp

Språkbanken's tool for searching a large collection of written Swedish text.

Authors: Lars Borin, Markus Forsberg, Johan Roxendal
Updated: 2012-04-30
Source: https://spraakbanken.gu.se/korp/#?lang=eng
Keywords: Swedish, Corpus, Språkbanken

British Lexicon Project

Database of lexical decision times for approximately 14,000 English words and nonwords, collected from British participants.

Authors: Emmanuel Keuleers, Paula Lacey, Kathleen Rastle, Marc Brysbaert
Updated: 2012-01-01
Source: http://crr.ugent.be/blp
Keywords: text-database, lexicon, behavioural-perception, visual-word-recognition, lexical-decision, megastudy, virtual-experiment, english