VoxPopuli

Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux
Updated: Fri 30 April 2021
Source: https://aclanthology.org/2021.acl-long.80/
Type: Corpus
Languages: English, German, French, Spanish, Polish, Italian, Romanian, Hungarian, Czech, Dutch, Finnish, Croatian, Slovak, Slovenian, Estonian, Lithuanian, Portuguese, Bulgarian, Greek, Latvian, Maltese, Swedish, Danish
Keywords: English, German, French, Spanish, Polish, Italian, Romanian, Hungarian, Czech, Dutch, Finnish, Slovak, Slovenian, Estonian, Lithuanian, Portuguese, Bulgarian, Greek, Latvian, Maltese, Swedish, Danish, speech synthesis, machine learning, Accented Speech
Open Access: yes
License:
Documentation: https://github.com/facebookresearch/voxpopuli
Citation: Wang, C., Riviere, M., Lee, A., Wu, A., Talnikar, C., Haziza, D., Williamson, M., Pino, J., & Dupoux, E. (2021). VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 993–1003). Association for Computational Linguistics.
Summary:

A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation. The corpus consists of recordings of European Parliament events in 23 languages and is released together with pre-trained models for speech recognition.