Physiological-physical feature fusion for automatic voice spoofing detection
Higher Education Press
image: The proposed model structure
Credit: Higher Education Press Limited Company
Biometric speech recognition systems are frequent targets of spoofing attacks, most commonly speech synthesis and voice conversion. A successful attack causes the system to accept forged speech as genuine, which compromises its security. Researchers have made many efforts to address this problem, but existing voice spoofing detection methods consider only the physical features of speech, which limits their detection performance.
To address this limitation, a research team led by Junxiao XUE published new research on 15 April 2023 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposes a voice spoofing detection method based on physiological-physical feature fusion. The method comprises a feature extractor, a densely connected convolutional neural network with squeeze-and-excitation blocks (SE-DenseNet), and a feature fusion strategy. Compared with existing methods, the tandem detection cost function (t-DCF) and equal error rate (EER) scores improved by 5% and 7%, respectively.
Specifically, physiological features are first extracted from the audio by a pre-trained convolutional network. SE-DenseNet is then used to extract the physical features. Such a densely connected model has high parameter efficiency, and the squeeze-and-excitation blocks enhance the efficiency of feature transmission. Finally, the two sets of features are fused and fed into the classification network for voice spoofing detection.
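As a rough illustration of the two ideas in this paragraph, the sketch below shows a generic squeeze-and-excitation step followed by a simple fusion of two feature vectors. It is not the authors' implementation: the function names, the tiny hand-written weight matrices, and the choice of concatenation as the fusion strategy are all assumptions made for this example, and a real system would use a deep learning framework rather than plain Python lists.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """Generic squeeze-and-excitation step (illustrative, not the paper's code).

    feature_map: list of C channels, each a flat list of activations.
    w1: bottleneck weights, shape (C // r) x C  (hypothetical values).
    w2: expansion weights, shape C x (C // r)   (hypothetical values).
    """
    # Squeeze: global average pooling gives one summary value per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]

    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]

    # Excitation, step 2: expansion layer followed by a sigmoid gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: reweight every activation in a channel by that channel's gate.
    return [[g * x for x in channel] for g, channel in zip(gates, feature_map)]


def fuse(physiological, physical):
    """Fuse two feature vectors by concatenation (an assumed strategy)."""
    return list(physiological) + list(physical)


if __name__ == "__main__":
    # Two channels with two activations each; all-zero weights make every
    # gate equal to sigmoid(0) = 0.5, so each activation is halved.
    fmap = [[1.0, 3.0], [2.0, 2.0]]
    w1 = [[0.0, 0.0]]
    w2 = [[0.0], [0.0]]
    recalibrated = squeeze_excite(fmap, w1, w2)
    physical = [x for channel in recalibrated for x in channel]
    print(fuse([0.2, 0.7], physical))
```

The gate computed for each channel lets the network emphasize informative channels and suppress less useful ones, which is the usual motivation for adding such blocks to a convolutional backbone.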
The team compared the proposed model with some of the best single systems. The experiments show that the proposed model performs better on both EER and t-DCF. To validate the effectiveness of the face features, which serve as the physiological features, they also evaluated the performance of several baseline models extended with face features. Different baseline methods showed different degrees of performance improvement when combined with the face features, indicating that the face features are also useful for the baseline models.
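For readers unfamiliar with the EER metric mentioned above, the following sketch shows one simple way to estimate it from detection scores. It assumes that higher scores mean "more likely genuine" and uses a plain threshold sweep without interpolation, so it only approximates what official evaluation tools compute; the function name and the sample scores are invented for illustration. The t-DCF metric additionally depends on the speaker verification system and cost parameters, and is not shown.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    Assumes higher scores indicate genuine (bona fide) speech.
    Returns the error rate at the threshold where the false acceptance
    rate and false rejection rate are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: genuine speech scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Made-up scores: one genuine and one spoofed sample are on the
    # wrong side of the best threshold, out of four each.
    genuine = [0.9, 0.8, 0.7, 0.3]
    spoofed = [0.1, 0.2, 0.4, 0.6]
    print(equal_error_rate(genuine, spoofed))  # prints 0.25
```

A lower EER means the detector separates genuine and spoofed speech more cleanly; a value of 0 would mean some threshold separates the two sets of scores perfectly.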
Future work can attempt to extract more accurate face features and study more effective feature fusion strategies to detect spoofing attacks.
###
Research Article
Junxiao XUE, Hao ZHOU. Physiological-physical feature fusion for automatic voice spoofing detection. Front. Comput. Sci., 2023, 17(2): 172318, https://doi.org/10.1007/s11704-022-2121-6
About Frontiers of Computer Science (FCS)
FCS was launched in 2007. It is published bimonthly both online and in print by HEP and Springer. Prof. Zhi-Hua Zhou from Nanjing University serves as the Editor-in-Chief. It aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. FCS covers all major branches of computer science, including: architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, interdisciplinary, etc. The readers may be interested in the special columns "Perspective" and "Excellent Young Scholars Forum".
FCS is indexed by SCI(E), EI, DBLP, Scopus, etc. The latest IF is 2.669. FCS solicits the following article types: Review, Research Article, Letter.
Journal: Frontiers of Computer Science
DOI: 10.1007/s11704-022-2121-6
Method of Research: Experimental study
Subject of Research: Not applicable
Article Title: Physiological-physical feature fusion for automatic voice spoofing detection
Article Publication Date: 15-Apr-2023
Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.