Voice conversion for natural-sounding speech generation on low-resource languages: A case study of Bahnaric

Dang Tran Dat; Tang Quoc Thai; Nguyen Quang Duc; Vo Duy Hung; Quan Thanh Tho

doi:10.32508/stdjet.v6iSI8.1198


Open Access


Abstract

The Bahnar are an ethnic minority group in Vietnam whose cultural heritage, traditions, and language the government has prioritized for preservation. In the current era of AI technology, synthesizing Bahnar voices has substantial potential to support these preservation efforts. While voice conversion technology has improved the quality and naturalness of synthesized speech, it has focused predominantly on widely spoken languages. Consequently, low-resource languages such as those of the Bahnaric language family are at a disadvantage in voice synthesis. This study addresses the challenge of synthesizing natural-sounding speech in low-resource languages by applying voice conversion techniques to the Bahnaric language. We introduce the BN-TTS-VC system, a pioneering approach that integrates a text-to-speech system based on Grad-TTS with voice conversion techniques derived from StarGANv2-VC, both tailored to the Bahnaric language. Grad-TTS allows the system to articulate Bahnaric words without vocabulary limitations, while StarGANv2-VC enhances the naturalness of synthesized speech, particularly in the context of low-resource languages like Bahnaric. Moreover, we introduce a Bahnaric-fine-tuned HiFi-GAN model to further enhance voice quality with native accents, giving a more authentic representation of Bahnaric speech. To assess the effectiveness of our approach, we conducted experiments based on human evaluations from volunteers. The preliminary results are promising, indicating the potential of our methodology for synthesizing natural-sounding Bahnaric speech. Through this research, we aim to contribute to the ongoing efforts to preserve and promote the linguistic and cultural heritage of the Bahnar ethnic minority group. By leveraging AI technology, we aspire to bridge the gap in speech synthesis for low-resource languages and facilitate the preservation of their cultural heritage.
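The abstract names three components (a Grad-TTS-based text-to-speech model, a StarGANv2-VC-based voice converter, and a fine-tuned HiFi-GAN vocoder) but does not spell out how they are chained. The sketch below shows one plausible composition, assuming the text-to-speech model emits a mel-spectrogram, the converter maps it to the target voice, and the vocoder renders the waveform. The function `synthesize`, the toy stand-in stages, and the constants `N_MELS` and `HOP` are all hypothetical and are not taken from the article; the stand-ins only mimic tensor shapes and contain no real models.

```python
from typing import Callable, List

Mel = List[List[float]]   # frames x mel bins (placeholder representation)
Waveform = List[float]    # audio samples

N_MELS = 80   # assumed number of mel bins, not stated in the abstract
HOP = 256     # assumed samples per mel frame, not stated in the abstract


def synthesize(text: str,
               acoustic_model: Callable[[str], Mel],
               voice_converter: Callable[[Mel], Mel],
               vocoder: Callable[[Mel], Waveform]) -> Waveform:
    """Chain the three stages; the ordering is an assumption."""
    mel = acoustic_model(text)         # role played by the Grad-TTS-based model
    converted = voice_converter(mel)   # role played by the StarGANv2-VC-based model
    return vocoder(converted)          # role played by the fine-tuned HiFi-GAN


# Toy stand-ins so the sketch runs end to end. They are NOT real models.
def toy_acoustic_model(text: str) -> Mel:
    # One all-zero frame per input character.
    return [[0.0] * N_MELS for _ in text]


def toy_voice_converter(mel: Mel) -> Mel:
    # Shift every bin by a constant to imitate a change of voice.
    return [[value + 1.0 for value in frame] for frame in mel]


def toy_vocoder(mel: Mel) -> Waveform:
    # Emit HOP samples per frame, each equal to the frame's mean value.
    return [sum(frame) / len(frame) for frame in mel for _ in range(HOP)]


if __name__ == "__main__":
    wave = synthesize("abc", toy_acoustic_model, toy_voice_converter, toy_vocoder)
    print(len(wave))  # 3 frames * 256 samples per frame = 768
```

Passing the stages as callables keeps the chain independent of any particular framework, so each stand-in could be replaced by a wrapper around a trained model without changing `synthesize`.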


Authors' Affiliations

Dang Tran Dat

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Ho Chi Minh City, Vietnam; Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh City, Vietnam

Tang Quoc Thai

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam; Vietnam National University Ho Chi Minh City (VNU-HCM), Vietnam

Nguyen Quang Duc

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam; Vietnam National University Ho Chi Minh City (VNU-HCM), Vietnam

Vo Duy Hung

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam; Vietnam National University Ho Chi Minh City (VNU-HCM), Vietnam

Quan Thanh Tho

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam; Vietnam National University Ho Chi Minh City (VNU-HCM), Vietnam

Email for correspondence: qttho@hcmut.edu.vn

Article Details

Issue: Vol 6 No SI8 (2023): Advanced technologies for computer science and engineering 2023

Page No.: In press

Published: Jun 7, 2024

Section: Research article

DOI: https://doi.org/10.32508/stdjet.v6iSI8.1198

Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

How to Cite

Dat, D., Thai, T., Duc, N., Hung, V., & Tho, Q. (2024). Voice conversion for natural-sounding speech generation on low-resource languages: A case study of Bahnaric. VNUHCM Journal of Engineering and Technology, 6(SI8), In press. https://doi.org/10.32508/stdjet.v6iSI8.1198


VNUHCM Journal of Engineering and Technology

An official journal of Viet Nam National University Ho Chi Minh City, Viet Nam since 2018

ISSN 2615-9872
