Classification-based spoken text selection for LVCSR language modeling

Document Title

Classification-based spoken text selection for LVCSR language modeling

Author

Chunwijitra V., Wutiwiwatchai C.

Name from Authors Collection

Chunwijitra V.

Chai Wutiwiwatchai

Affiliations

NECTEC, National Science and Technology Development Agency (NSTDA), 112 Pahonyothin Road, Pathumthani, 12120, Thailand

Type

Article

Source Title

EURASIP Journal on Audio, Speech, and Music Processing

ISSN

1687-4714

Year

2017

Volume

2017

Open Access

All Open Access, Gold

Publisher

Springer International Publishing

DOI

10.1186/s13636-017-0121-5

Format

PDF

Abstract

Large vocabulary continuous speech recognition (LVCSR) is naturally in demand for transcribing daily conversations, yet developing spoken text data to train LVCSR is costly and time-consuming. In this paper, we propose a classification-based method to automatically select social media data for constructing a spoken-style language model in LVCSR. Three classification techniques, support vector machines (SVM), conditional random fields (CRF), and long short-term memory networks (LSTM), are trained on words and parts of speech and compared experimentally for identifying the degree of spoken style in each social media sentence. Spoken-style utterances are chosen by incremental greedy selection based on the SVM or CRF classifier score, or on the LSTM classifier's “spoken” output. With the proposed method, just 51.8, 91.6, and 79.9% of the utterances in a Twitter text collection are marked as spoken by the SVM, CRF, and LSTM classifiers, respectively. A baseline language model is then improved by interpolating it with a model trained on these selected utterances. The proposed model is evaluated on two Thai LVCSR tasks: social media conversations and a speech-to-speech translation application. Experimental results show that all three classification-based data selection methods clearly help reduce the overall perplexity on the spoken test sets. In terms of LVCSR word error rate (WER), they achieve reductions of 3.38, 3.44, and 3.39%, respectively, over the baseline language model, and 1.07, 0.23, and 0.38%, respectively, over the conventional perplexity-based text selection approach. © 2017, The Author(s).
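The abstract describes improving a baseline language model by interpolating it with a model trained on the selected spoken-style utterances, and judging the result by test-set perplexity. The following is a minimal sketch of standard linear interpolation and perplexity, not the authors' implementation: it assumes toy unigram models (BASE_LM, SPOKEN_LM are hypothetical), and a hand-picked interpolation weight lam. The paper's n-gram order, smoothing, and weight tuning are not stated in this record.

```python
import math

# Hypothetical toy unigram models (word -> probability). Real LVCSR language
# models are smoothed n-gram models; these only illustrate the arithmetic.
BASE_LM = {"hello": 0.2, "the": 0.5, "report": 0.3}
SPOKEN_LM = {"hello": 0.5, "the": 0.3, "yeah": 0.2}


def interpolate(p_base, p_sel, lam):
    """Linear interpolation: P(w) = lam * P_base(w) + (1 - lam) * P_sel(w)."""
    vocab = set(p_base) | set(p_sel)
    return {w: lam * p_base.get(w, 0.0) + (1.0 - lam) * p_sel.get(w, 0.0)
            for w in vocab}


def perplexity(model, tokens):
    """PP = exp(-(1/N) * sum_i log P(w_i)); infinite if any token has zero probability."""
    log_sum = 0.0
    for w in tokens:
        p = model.get(w, 0.0)
        if p == 0.0:
            return math.inf
        log_sum += math.log(p)
    return math.exp(-log_sum / len(tokens))


if __name__ == "__main__":
    mixed = interpolate(BASE_LM, SPOKEN_LM, lam=0.5)  # lam = 0.5 is an arbitrary assumption
    test = ["hello", "yeah", "the"]  # toy "spoken" test sentence
    print("baseline PP:", perplexity(BASE_LM, test))
    print("interpolated PP:", perplexity(mixed, test))
```

In this toy case the baseline assigns zero probability to the conversational word "yeah", so its perplexity is infinite, while the interpolated model covers it. In practice the weight is typically tuned on held-out data rather than fixed by hand.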

License

N/A

Rights

N/A

Link

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85031999128&doi=10.1186%2fs13636-017-0121-5&partnerID=40&md5=3289f63922f730c11148c3c44f835400

Publication Source

Scopus
