Abstract:
Cross-lingual word embedding aims to exploit the embedding space of resource-rich languages to improve the embeddings of resource-scarce languages, and it is widely used in a variety of cross-lingual tasks. Most existing methods address word alignment by learning a linear mapping between the two embedding spaces. Among them, methods based on adversarial models have received widespread attention because they achieve good performance without using any dictionary. However, these methods do not perform well on dissimilar language pairs. A likely reason is that the mapping is learned only from a distance measure over the entire space, without the guidance of a seed dictionary, which leaves multiple possibilities for each aligned word pair and yields unsatisfactory alignment. Therefore, in this paper, a semi-supervised cross-lingual word embedding method based on an adversarial model with dual discriminators is proposed. On top of the existing adversarial model, a bi-directional, shared, fine-grained discriminator is added, yielding an adversarial model with dual discriminators. In addition, a negative-sample dictionary is introduced as a supplement to the supervised seed dictionary to guide the fine-grained discriminator in a semi-supervised way. By minimizing the distance between the initial word pairs and the supervised dictionaries, i.e., the seed dictionary and the negative dictionary, the fine-grained discriminator reduces the ambiguity among candidate word pairs and recognizes the correctly aligned pairs among the initially generated dictionaries. Finally, experimental results on two cross-lingual datasets show that the proposed method effectively improves the performance of cross-lingual word embedding.
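The core alignment idea sketched in the abstract — learning a linear mapping that pulls seed-dictionary pairs together while a negative dictionary pushes mismatched pairs apart — can be illustrated on synthetic data. The following is a minimal numpy sketch under assumed toy conditions (random embeddings linked by a hypothetical ground-truth rotation, plain gradient descent, a hinge penalty standing in for the fine-grained discriminator's negative signal); it is not the paper's actual dual-discriminator architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (toy value, an assumption for illustration)

# Synthetic embeddings: a hypothetical ground-truth rotation links the spaces.
true_W = np.linalg.qr(rng.normal(size=(d, d)))[0]
X = rng.normal(size=(50, d))                       # "source-language" vectors
Y = X @ true_W + 0.01 * rng.normal(size=(50, d))   # "target-language" vectors

pos = np.arange(20)                  # seed dictionary: correctly aligned indices
neg_src = np.arange(20)              # negative dictionary: deliberately
neg_tgt = (np.arange(20) + 5) % 50   # mismatched pairs to be kept apart

W = np.eye(d)                        # linear mapping to learn
lr, margin = 0.1, 1.0
for _ in range(1000):
    # Pull seed pairs together: gradient of mean ||X_p W - Y_p||^2.
    diff = X[pos] @ W - Y[pos]
    grad = X[pos].T @ diff / len(pos)
    # Push negative pairs apart via a hinge: max(0, margin - ||X_n W - Y_n||).
    ndiff = X[neg_src] @ W - Y[neg_tgt]
    dist = np.linalg.norm(ndiff, axis=1, keepdims=True)
    active = (dist < margin).astype(float)
    grad -= X[neg_src].T @ (active * ndiff / np.maximum(dist, 1e-8)) / len(neg_src)
    W -= lr * grad

# Relative alignment error on the seed pairs after training.
final_err = np.linalg.norm(X[pos] @ W - Y[pos]) / np.linalg.norm(Y[pos])
```

In this sketch the supervised seed pairs alone would suffice to recover the mapping; the hinge term only matters when a candidate mapping collapses mismatched pairs, which mirrors the abstract's motivation for using negatives to disambiguate among multiple plausible alignments.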