SpiCE: Speech in Cantonese and English

Authors: Khia A. Johnson, Molly Babel, Ivan Fong, Nancy Yiu
Updated: Thu 20 May 2021
Source: https://doi.org/10.5683/SP2/MJOXP3
Type: speech-database
Languages: cantonese, english
Keywords: bilingual, conversation, corpus, cantonese, english
Open Access: yes
License: CC BY 4.0
Documentation: https://spice-corpus.readthedocs.io/
Publications: Johnson (2021), Johnson et al. (2020)
Citation: Johnson, K. A. (2021). SpiCE: Speech in Cantonese and English. Scholars Portal Dataverse. Version 1. https://doi.org/10.5683/SP2/MJOXP3; Johnson, K. A., Babel, M., Fong, I., & Yiu, N. (2020). SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and English. Proceedings of The 12th Language Resources and Evaluation Conference, 4089–4095.

The Speech in Cantonese and English (SpiCE) corpus is an audio corpus of conversational Cantonese-English bilingual speech recorded in Vancouver, Canada, between 2018 and 2020. The corpus includes high-quality recordings of 34 early bilinguals in both English and Cantonese. Participants completed a sentence reading task, a storyboard narration, and a conversational interview in each language. For each talker, the three tasks are combined in a single audio file per language. A Praat TextGrid file accompanies each audio file and provides hand-corrected orthographic transcription and phoneme-level forced alignment in Cantonese and English. As an open-access language resource, SpiCE will promote bilingualism research on a typologically distinct pair of languages, of which Cantonese remains understudied despite having millions of speakers around the world. The corpus is especially well suited to phonetic research on conversational speech, and it enables researchers to study cross-language, within-speaker phenomena in a diverse group of early Cantonese-English bilinguals. These are areas with few existing high-quality resources. Corpus documentation is available at https://spice-corpus.readthedocs.io/.
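
Because each audio file is described as having an accompanying TextGrid, a first sanity check after downloading might be to pair the two kinds of file. The following is a minimal Python sketch under stated assumptions, not part of the corpus tooling: it assumes audio files end in .wav, TextGrids end in .TextGrid, and the two share a file stem. The function name pair_recordings and the directory argument are hypothetical; consult the corpus documentation for the actual file naming.

```python
from pathlib import Path


def pair_recordings(corpus_dir):
    """Pair each audio file with the TextGrid that shares its file stem.

    Assumes (not confirmed by the corpus record) that audio files use a
    .wav extension and annotations use a .TextGrid extension.
    Returns (pairs, missing): a dict mapping stem -> (audio, textgrid),
    and a list of audio files that have no matching TextGrid.
    """
    root = Path(corpus_dir)
    files = [p for p in root.rglob("*") if p.is_file()]

    # Index TextGrids by stem; compare extensions case-insensitively.
    grids = {p.stem: p for p in files if p.suffix.lower() == ".textgrid"}

    pairs = {}
    missing = []
    for wav in sorted(p for p in files if p.suffix.lower() == ".wav"):
        grid = grids.get(wav.stem)
        if grid is None:
            missing.append(wav)
        else:
            pairs[wav.stem] = (wav, grid)
    return pairs, missing
```

If the download matches the description above (34 talkers, one file per language each), calling pair_recordings on the corpus directory would be expected to yield 68 pairs and an empty list of unmatched audio files.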