Open Access

Abstract

Bilingual dictionaries are vital tools for automated machine translation. With machine learning techniques, they can be constructed by automatically learning lexical mappings from bilingual corpora. However, procuring extensive bilingual corpora for low-resource languages such as Bahnaric poses a significant challenge. Recent studies suggest that non-parallel corpora, supplemented with a handful of anchor words, can aid in learning these mappings, which contain the parameters for automated translation between the source and target languages. The prevailing methodology uses Generative Adversarial Networks (GANs) and solves the orthogonal Procrustes problem to generate this mapping. This approach, while innovative, is unstable and demands substantial computational resources, which can be a problem in the rural regions where Bahnaric is spoken natively. To mitigate this, we propose a low-rank adaptation strategy that circumvents the limitations of GANs by directly calculating the rigid transformation between the source and target languages. We evaluated our approach on a French-English dataset and on a low-resource Vietnamese-Bahnaric dataset. Notably, the Vietnamese-Bahnaric lexical mapping produced by our method is valuable not only to the field of computer science but also to the preservation of Bahnaric cultural heritage within Vietnam’s ethnic minority communities.
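To make the orthogonal Procrustes step mentioned in the abstract concrete, the following is a minimal sketch of its standard closed-form solution via singular value decomposition. It is not the authors' low-rank adaptation procedure, whose details the abstract does not give. The function names `procrustes_mapping` and `nearest_target`, the embedding dimension, and the synthetic anchor embeddings are all assumptions made for illustration.

```python
import numpy as np


def procrustes_mapping(src, tgt):
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_anchor, dim) arrays whose rows are embeddings of
    anchor word pairs (row i of src translates to row i of tgt).
    """
    # SVD of the cross-covariance between the two anchor sets.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_target(word_vec, W, tgt_matrix):
    """Map one source vector and return the index of the closest
    target row by cosine similarity."""
    mapped = word_vec @ W
    norms = np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped)
    return int(np.argmax((tgt_matrix @ mapped) / norms))


# Synthetic data (assumption): the target space is an exact rotation
# of the source space, so the mapping can be recovered exactly.
rng = np.random.default_rng(0)
dim, n_anchor = 8, 20
src = rng.normal(size=(n_anchor, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ true_rotation

W = procrustes_mapping(src, tgt)
```

On real embeddings the two spaces are only approximately related by a rigid transformation, so `W` would be a least-squares fit over the anchor pairs rather than an exact recovery.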



Author's Affiliation
  • Huy Cam La

  • Minh Quang Le

  • Oanh Ngoc Tran

  • Dong Duc Le

  • Duc Quang Nguyen

  • Sang Tan Nguyen

  • Quan Tran

  • Tho Thanh Quan

    Email for correspondence: qttho@hcmut.edu.vn

Article Details

 Copyright Info

Creative Commons License

Copyright: The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

 How to Cite
La, H., Le, M., Tran, O., Le, D., Nguyen, D., Nguyen, S., Tran, Q., & Quan, T. (2024). Low-Rank Adaptation Approach for Vietnamese-Bahnaric Lexical Mapping from Non-Parallel Corpora. VNUHCM Journal of Engineering and Technology, 6(SI8), In press. https://doi.org/10.32508/stdjet.v6iSI8.1197
