Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss
Document Title
Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss
Author
Lwin H.Y.; Kumwilaisak W.; Hansakunbuntheung C.; Thatphithakkul N.
Affiliations
King Mongkut's University of Technology Thonburi, Bangkok, Thailand; National Science and Technology Development Agency, Pathum Thani, Thailand
Type
Conference Paper
Source Title
ACM International Conference Proceeding Series
Year
2023
Open Access
All Open Access; Hybrid Gold
Publisher
Association for Computing Machinery
DOI
10.1145/3628454.3631582
Abstract
This paper introduces a data augmentation technique for alaryngeal speech using voice conversion within the MaskCycleGAN-VC framework [6]. Our method leverages two masking techniques: Articulatory Dimension Masking (ADM), and the combination of ADM with Consecutive Time Masking (CTM), called SpecAugment [11]. The MaskCycleGAN-VC framework originally uses CTM alone for masking; our proposed additional masking techniques enhance the quality and performance of voice conversion for alaryngeal speech and expand the variability of voice characteristics within the converted alaryngeal speech dataset. One notable enhancement in our approach is the incorporation of a timbre similarity score into the generator loss, known as the Timbre-Enhanced Loss. This score dynamically guides the conversion process to prioritize preserving timbral characteristics during voice transformation. Experiments with several objective metrics show that the proposed method produces synthesized alaryngeal speech with characteristics close to those of actual alaryngeal speech. © 2023 Owner/Author.
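The masking and loss ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that ADM zeroes a contiguous band of feature dimensions of a mel-spectrogram (analogous to SpecAugment frequency masking), that CTM zeroes a run of consecutive frames, and that the timbre similarity score is a cosine similarity between two timbre embeddings added to the generator loss as a weighted penalty. All function names, parameters, and the weighting scheme are hypothetical.

```python
import numpy as np


def consecutive_time_mask(n_dims, n_frames, max_width, rng):
    """0/1 mask that zeroes one run of consecutive frames (CTM-style)."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, n_frames - width + 1))
    mask[:, start:start + width] = 0.0
    return mask


def dimension_mask(n_dims, n_frames, max_width, rng):
    """0/1 mask that zeroes one band of feature dimensions.

    Assumption: this is how ADM is read here; the abstract does not
    define the exact masking rule.
    """
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, n_dims - width + 1))
    mask[start:start + width, :] = 0.0
    return mask


def combined_mask(n_dims, n_frames, max_time_width, max_dim_width, rng):
    """Combine both masks, as in SpecAugment-style augmentation."""
    time_m = consecutive_time_mask(n_dims, n_frames, max_time_width, rng)
    dim_m = dimension_mask(n_dims, n_frames, max_dim_width, rng)
    return time_m * dim_m


def timbre_enhanced_loss(generator_loss, emb_converted, emb_target, weight=1.0):
    """Hypothetical form: generator loss plus weight * (1 - cosine similarity)."""
    a = np.asarray(emb_converted, dtype=np.float64)
    b = np.asarray(emb_target, dtype=np.float64)
    similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return generator_loss + weight * (1.0 - similarity)


rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64)).astype(np.float32)  # stand-in for a mel-spectrogram
mask = combined_mask(80, 64, max_time_width=25, max_dim_width=10, rng=rng)
masked_mel = mel * mask  # masked input; the mask itself is also fed to the generator
```

In MaskCycleGAN-VC the generator receives both the masked spectrogram and the mask, so the sketch returns the mask rather than only the masked features.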
License
CC BY
Rights
Author
Publication Source
WOS