Automatic Linguistic Unit Count Estimator (ALICE)

Authors:	Okko Räsänen, Shreyas Seshadri, Marvin Lavechin, Alejandrina Cristia, Marisa Casillas
Updated:	Tue 02 November 2021
Source:	https://github.com/orasanen/ALICE
Type:	GitHub repository
Languages:	Argentinian Spanish, Tseltal, Yélî Dnye, English
Keywords:	language, linguistics, phonetics, speech-production, Argentinian Spanish, Tseltal, Yélî Dnye, English
Open Access:	yes
License:	https://github.com/orasanen/ALICE/blob/new_diarizer/docs/license.md
Documentation:	https://github.com/orasanen/ALICE/tree/new_diarizer/docs
Publications:	Räsänen, O., Seshadri, S., Lavechin, M. et al. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53, 818–835. https://doi.org/10.3758/s13428-020-01460-x
Citation:	Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A. & Casillas, M. (2021): ALICE: An open-source tool for automatic linguistic unit count estimation from child-centered daylong recordings. Behavior Research Methods. https://link.springer.com/article/10.3758/s13428-020-01460-x.
Summary:	ALICE uses SylNet for feature extraction and a voice type classifier for broad-class speaker diarization. The model used for linguistic unit counts has been optimized across four languages: Argentinian Spanish, Tseltal, Yélî Dnye, and English (American and UK variants). The SylNet model has been adapted for daylong child-centered audio, starting from the baseline model available in standard SylNet. ALICE outputs an estimate of the number of phonemes, syllables, and words in the input. Only speech detected as spoken by adult male or female talkers counts towards these totals. Unit counts from ALICE are not (and are not meant to be) accurate at short time-scales; they are optimized for counting across several minutes of audio. Also note that ALICE is NOT designed for "typical" high-quality audio recordings and may not work properly on such data.
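
The two usage constraints in the summary (only adult speech is counted, and counts are meaningful only over several minutes) can be illustrated with a minimal Python sketch. It is not ALICE's code and does not read ALICE's actual output format: the segment tuples, the speaker labels (FEM, MAL, CHI), the function name adult_word_counts, and the 300-second window are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical speaker labels for adult female and adult male talkers.
# These names are assumed for this sketch, not taken from ALICE's output.
ADULT_LABELS = {"FEM", "MAL"}


def adult_word_counts(segments, window_s=300.0):
    """Sum estimated word counts from adult segments into fixed windows.

    segments: iterable of (onset_seconds, speaker_label, estimated_words).
    Returns a dict mapping window index to the summed estimate.
    """
    totals = defaultdict(float)
    for onset_s, label, words in segments:
        if label not in ADULT_LABELS:
            continue  # child and other speech does not contribute
        totals[int(onset_s // window_s)] += words
    return dict(totals)


# Made-up example values, only to show the shape of the computation.
segments = [
    (12.0, "FEM", 4.2),
    (47.5, "CHI", 3.0),
    (130.0, "MAL", 6.1),
    (310.0, "FEM", 2.5),
]
counts = adult_word_counts(segments)
```

Summing over windows of several minutes reflects the summary's caveat that individual short-segment estimates are not meant to be read on their own.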