Blind Estimation of Speech Transmission Index and Room Acoustic Parameters by Using Extended Model of Room Impulse Response Derived From Speech Signals
Document Title
Blind Estimation of Speech Transmission Index and Room Acoustic Parameters by Using Extended Model of Room Impulse Response Derived From Speech Signals
Author
Wang L., Duangpummet S., Unoki M.
Affiliations
Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Nomi, Ishikawa, 923-1292, Japan; Klong Luang, NECTEC, National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
Type
Article
Source Title
IEEE Access
ISSN
2169-3536
Year
2023
Volume
11
Page
49431-49444
Open Access
All Open Access, Gold
Publisher
Institute of Electrical and Electronics Engineers Inc.
DOI
10.1109/ACCESS.2023.3276327
Format
Abstract
The speech transmission index (STI) and room acoustic parameters (RAPs) are essential metrics for assessing speech quality and predicting listening difficulty in a sound field. Although the STI and important RAPs, such as reverberation time and clarity, can be derived from the room impulse response (RIR), measuring the RIR in regularly occupied spaces is difficult. Hence, simultaneous blind estimation of the STI and RAPs is a pressing challenge. However, most existing methods estimate only a single parameter and require a massive dataset for model training. A deterministic method is presented for blindly estimating the STI and five RAPs using a stochastic RIR model that approximates an unknown RIR. An algorithm is formulated that uses the temporal power envelope of a reverberant speech signal to determine the optimal parameters of the RIR model. A mathematical model of the reverberation and dereverberation processes is proposed based on the temporal power envelopes of the signals. This model maps the parameters of the RIR model to the observed reverberant signal. The estimated RIR can then be synthesized using the optimal parameters to estimate the STI and RAPs. A simulation was conducted to evaluate the simultaneous estimation of the STI and five essential RAPs from observed reverberant speech signals, in comparison with the best previous work. The root-mean-square error (RMSE) and Pearson correlation coefficient between the estimated and measured values were used as evaluation metrics. For the STI, the proposed method achieves an RMSE of 0.037. For the reverberation time and other RAPs, the accuracy is comparable to that of previous works. The results show that the proposed method can effectively estimate the STI and RAPs simultaneously without any training. © 2013 IEEE.
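The two evaluation metrics named in the abstract, the RMSE and the Pearson correlation coefficient between estimated and measured values, can be illustrated with a minimal Python sketch. The function names `rmse` and `pearson` and the sample STI values are assumptions made for illustration only; they are not taken from the paper and do not reproduce its results.

```python
import math


def rmse(estimated, measured):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(measured) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(estimated, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(estimated) != len(measured) or len(estimated) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_e == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_m)


if __name__ == "__main__":
    # Hypothetical STI values for illustration; NOT data from the paper.
    measured_sti = [0.45, 0.60, 0.72, 0.81]
    estimated_sti = [0.48, 0.57, 0.70, 0.85]
    print("RMSE:", rmse(estimated_sti, measured_sti))
    print("Pearson r:", pearson(estimated_sti, measured_sti))
```

A low RMSE indicates small absolute estimation errors, while a Pearson coefficient near 1 indicates that the estimates track the measured values linearly; the two are complementary, which is presumably why the abstract reports both.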
License
CC BY-NC-ND
Rights
Authors
Publication Source
WOS