
    Dialect Language Recognition Based on Multi-Task Learning

    • Abstract: Recent advances in deep learning, and neural networks in particular, have opened new approaches to complex pattern-classification problems such as speech recognition. To strengthen the preservation of Chinese dialects, improve the accuracy of dialect language recognition, and enrich the pre-processing modules of speech recognition systems, this paper first builds SLNet, a single-task dialect language recognition baseline based on the LSTM model, currently the most widely used architecture in speech recognition. Second, to address the diversity and complexity of Chinese dialects, it exploits the parameter-sharing mechanism of multi-task learning: a multi-task neural network uncovers latent correlations among different dialects, yielding MTLNet, a dialect recognition model built on multilingual tasks. Further, drawing on the regional characteristics of Chinese dialects, it adopts multi-task learning with hard parameter sharing to construct ATLNet, a multi-task neural network driven by auxiliary tasks. Experiments comparing the single-task baseline with MTLNet and ATLNet show that the multi-task models raise recognition accuracy to 80.2%, compensating for the narrow scope and weak generalization of the single-task model.
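The hard parameter sharing described above can be illustrated with a minimal numpy sketch: one shared trunk computes a representation that both the main task (dialect classification) and the auxiliary task (region classification) read from, so the trunk's weights receive gradient signal from every task. The layer sizes, the feed-forward trunk, and the class counts below are illustrative assumptions; the paper's actual models are LSTM-based.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: one hidden layer whose weights serve every task
# (this is the "hard parameter sharing" of ATLNet's design).
W_shared = rng.normal(scale=0.1, size=(40, 64))   # 40-dim features -> 64 hidden units (assumed sizes)

# Task-specific heads: each task owns its output weights.
W_dialect = rng.normal(scale=0.1, size=(64, 8))   # main task: 8 dialect classes (assumed)
W_region  = rng.normal(scale=0.1, size=(64, 3))   # auxiliary task: 3 regions (assumed)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """One forward pass: the hidden representation h is computed once
    and consumed by both task heads."""
    h = np.tanh(x @ W_shared)                     # shared representation
    return softmax(h @ W_dialect), softmax(h @ W_region)

x = rng.normal(size=(2, 40))                      # a batch of 2 feature vectors
p_dialect, p_region = forward(x)
print(p_dialect.shape, p_region.shape)            # (2, 8) (2, 3)
```

Because both heads backpropagate into `W_shared`, the auxiliary region task regularizes the shared representation, which is the mechanism the abstract credits for the improved generalization over the single-task model.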

       
