Open Access


Abstract

The Bahnar are an ethnic minority group in Vietnam whose cultural heritage, traditions, and language the government has prioritized for preservation. Synthesizing Bahnar voices with current AI technology could substantially support these preservation efforts. Voice conversion technology has improved the quality and naturalness of synthesized speech, but it has focused predominantly on widely spoken languages, leaving low-resource languages such as those of the Bahnaric family at a disadvantage in voice synthesis. This study addresses the challenge of synthesizing natural-sounding speech in low-resource languages by applying voice conversion techniques to the Bahnaric language. We introduce the BN-TTS-VC system, which integrates a text-to-speech system based on Grad-TTS with voice conversion techniques derived from StarGANv2-VC, both tailored to the Bahnaric language. Grad-TTS allows the system to articulate Bahnaric words without vocabulary limitations, while StarGANv2-VC enhances the naturalness of the synthesized speech. We also introduce a HiFi-GAN model fine-tuned on Bahnaric to further improve voice quality with native accents, giving a more authentic representation of Bahnaric speech. To assess the effectiveness of our approach, we conducted experiments based on human evaluations from volunteers. The preliminary results are promising and indicate the potential of our methodology for synthesizing natural-sounding Bahnaric speech. Through this research, we aim to contribute to ongoing efforts to preserve and promote the linguistic and cultural heritage of the Bahnar ethnic minority group. By leveraging AI technology, we aspire to narrow the gap in speech synthesis for low-resource languages and to facilitate the preservation of their cultural heritage.



Authors' Affiliations
  • Dang Tran Dat

  • Tang Quoc Thai

  • Nguyen Quang Duc

  • Vo Duy Hung

  • Quan Thanh Tho

    Email for correspondence: qttho@hcmut.edu.vn


 Copyright Info

Creative Commons License

Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

 How to Cite
Dat, D., Thai, T., Duc, N., Hung, V., & Tho, Q. (2024). Voice conversion for natural-sounding speech generation on low-resource languages: A case study of Bahnaric. VNUHCM Journal of Engineering and Technology, 6(SI8), In press. https://doi.org/10.32508/stdjet.v6iSI8.1198

