Classification-based spoken text selection for LVCSR language modeling
Document Title
Classification-based spoken text selection for LVCSR language modeling
Author
Chunwijitra V., Wutiwiwatchai C.
Affiliations
NECTEC, National Science and Technology Development Agency (NSTDA), 112 Pahonyothin Road, Pathumthani, 12120, Thailand
Type
Article
Source Title
Eurasip Journal on Audio, Speech, and Music Processing
ISSN
1687-4714
Year
2017
Volume
2017
Issue
1
Open Access
All Open Access, Gold
Publisher
Springer International Publishing
DOI
10.1186/s13636-017-0121-5
Format
Abstract
Large vocabulary continuous speech recognition (LVCSR) is in demand for transcribing daily conversations, but developing the spoken text data needed to train LVCSR is costly and time-consuming. In this paper, we propose a classification-based method to automatically select social media data for constructing a spoken-style language model for LVCSR. Three classification techniques, SVM, CRF, and LSTM, trained on words and parts of speech, are compared experimentally for identifying the degree of spoken style in each social media sentence. Spoken-style utterances are chosen by incremental greedy selection based on the score of the SVM or CRF classifier, or on the output classified as “spoken” by the LSTM classifier. With the proposed method, 51.8, 91.6, and 79.9% of the utterances in a Twitter text collection are marked as spoken utterances by the SVM, CRF, and LSTM classifiers, respectively. A baseline language model is then improved by interpolating it with one trained on these selected utterances. The proposed model is evaluated on two Thai LVCSR tasks: social media conversations and a speech-to-speech translation application. Experimental results show that all three classification-based data selection methods clearly help reduce the overall spoken test set perplexities. Regarding the LVCSR word error rate (WER), they achieve 3.38, 3.44, and 3.39% WER reduction, respectively, over the baseline language model, and 1.07, 0.23, and 0.38% WER reduction, respectively, over the conventional perplexity-based text selection approach. © 2017, The Author(s).
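The selection-then-interpolation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes add-one smoothed unigram models in place of real n-gram models, precomputed classifier scores, a fixed score threshold in place of the incremental greedy selection, linear interpolation with a hand-picked weight, and English toy sentences. All function names, scores, and data below are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split() if w in vocab)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_base, p_sel, lam):
    """Linear interpolation: lam * baseline + (1 - lam) * selected."""
    return {w: lam * p_base[w] + (1 - lam) * p_sel[w] for w in p_base}


def perplexity(model, sentences):
    """Per-word perplexity of the sentences under a unigram model."""
    words = [w for s in sentences for w in s.split()]
    log_prob = sum(math.log(model[w]) for w in words)
    return math.exp(-log_prob / len(words))


def select_spoken(scored, threshold):
    """Keep sentences whose (hypothetical) classifier score meets the threshold."""
    return [s for s, score in scored if score >= threshold]


# Toy data (assumed): baseline text, scored social media text, spoken dev set.
formal = ["the report is ready", "the report is final"]
scored = [
    ("i am gonna go", 0.9),
    ("gonna go now", 0.8),
    ("the committee is ready", 0.1),
]
dev = ["i am gonna go now"]

all_text = formal + [s for s, _ in scored] + dev
vocab = {w for s in all_text for w in s.split()}

selected = select_spoken(scored, threshold=0.5)
base_lm = train_unigram(formal, vocab)
spoken_lm = train_unigram(selected, vocab)
mixed_lm = interpolate(base_lm, spoken_lm, lam=0.5)

ppl_base = perplexity(base_lm, dev)
ppl_mixed = perplexity(mixed_lm, dev)
```

On this toy data the interpolated model assigns lower perplexity to the spoken-style development sentence than the baseline alone. In practice the interpolation weight would be tuned on held-out data rather than fixed by hand.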
License
N/A
Rights
N/A
Publication Source
Scopus