ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2022, Vol. 59 ›› Issue (3): 683-693. doi: 10.7544/issn1000-1239.20200789

• Artificial Intelligence •



  • Published: 2022-03-07

Tolerance Feature Extension of Substandard Sign Language Recognition with Finite Samples

Kong Leyi1,2, Zhang Jinyi1,2, Lou Liangliang3   

  1. 1(Key Laboratory of Specialty Fiber Optics and Optical Access Networks (Shanghai University), Shanghai 200444); 2(Joint International Research Laboratory of Specialty Fiber Optics and Advanced Communication (Shanghai University), Shanghai 200444); 3(Key Laboratory of Wireless Sensor Network & Communication, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050)
  • Online: 2022-03-07
  • Supported by: 
    This work was supported by the Subject Innovation and Talent Introduction Program (111) in Colleges and Universities (D20031) and the Key Disciplines Funded by Shanghai Education Commission (J50104).


Abstract: In everyday use, loosely performed sign language is semantically ambiguous, and substandard gestures are easily confused with one another. At the same time, finite samples rarely provide enough features to train a sign language recognition model, and a model that is too complex tends to overfit, which lowers recognition accuracy. To address this problem, we propose a representation learning method that extends the fault-tolerant features used for substandard sign language recognition when only finite samples are available. Based on the motion information of the human skeleton during signing, the method constructs an autoencoder that targets the spatiotemporal correlation of sign language and extracts standard features from a small number of original samples in a sign language corpus. A generative adversarial network then generates a large number of substandard samples from the standard features, and the autoencoder extends the fault-tolerant features from these samples to build a new fault-tolerant feature set for subsequent sign language recognition tasks. Experimental results show that, with limited samples, the substandard samples generated by this method are semantically clear, and the features of different classes in the new feature set are easy to separate. A sign language recognition model trained on a fault-tolerant feature set built with this method on the CSL (Chinese Sign Language) dataset reaches 97.5% recognition accuracy, which indicates that the method has broad application prospects.

Key words: sign language recognition, finite sample, autoencoder, generative adversarial network, representation learning
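
The data flow summarized in the abstract (encode original samples, generate substandard variants from the standard features, encode again, collect a labeled feature set) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the function names, the toy corpus, the mean-pooling "encoder" and the Gaussian-jitter "generator" are simple stand-ins for the trained autoencoder and generative adversarial network, whose architectures the abstract does not specify.

```python
import random

def encode(sample):
    """Stand-in encoder (assumption): average each joint coordinate over time."""
    n_frames = len(sample)
    return [sum(frame[d] for frame in sample) / n_frames
            for d in range(len(sample[0]))]

def generate_substandard(feature, n_frames, rng, jitter=0.05):
    """Stand-in generator (assumption, not a GAN): rebuild a sequence from a
    standard feature and add random deviations to every frame."""
    return [[value + rng.gauss(0.0, jitter) for value in feature]
            for _ in range(n_frames)]

def build_feature_set(corpus, n_generated=5, n_frames=8, seed=0):
    """Return (label, feature) pairs: one standard feature per original
    sample plus n_generated features from generated substandard samples."""
    rng = random.Random(seed)
    feature_set = []
    for label, sample in corpus:
        standard = encode(sample)
        feature_set.append((label, standard))
        for _ in range(n_generated):
            substandard = generate_substandard(standard, n_frames, rng)
            feature_set.append((label, encode(substandard)))
    return feature_set

# Hypothetical toy corpus: two signs, one skeleton sequence each
# (3 frames, 2 coordinates per frame).
corpus = [
    ("hello", [[0.0, 1.0], [0.2, 1.0], [0.4, 1.0]]),
    ("thanks", [[1.0, 0.0], [1.0, 0.2], [1.0, 0.4]]),
]
features = build_feature_set(corpus)
print(len(features))
```

In this sketch each original sample contributes one standard feature and `n_generated` additional features, so a small corpus grows into a larger labeled set that a downstream classifier could be trained on.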