PBCM: Code-mixed Hindi-English corpus

Authors: Ayushi Pandey, Brij Mohan Lal Srivastava, Rohit Kumar, BT Nellore, KS Teja, SV Gangashetty
Updated: Mon 24 December 2018
Source: https://brijmohan.github.io/publication/pbcm-lrec18/
Type: audio data
Languages: hindi, english, hindi-english
Keywords: code-mixing, hindi-english, hindi, english, multi-speaker
Open Access: yes
License:
Documentation: http://www.lrec-conf.org/proceedings/lrec2018/pdf/940.pdf
Publications: Pandey et al. (2018)
Citation: Pandey, A., Srivastava, B. M. L., Kumar, R., Nellore, B. T., Teja, K. S., & Gangashetty, S. V. (2018). Phonetically balanced code-mixed speech corpus for Hindi-English automatic speech recognition. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC).
Summary:

This is a phonetically balanced read-speech corpus of code-mixed Hindi-English. The speech was recorded by 113 native Hindi speakers (58 male and 55 female), all of whom were fluent in English. The sentences were sourced from selected sections of Hindi newspapers and contain frequent English words embedded within Hindi sentences.