Abstract:
To address the problem that knowledge graph representation learning models use only triple information, a representation model with semantic analysis is proposed, named Bidirectional Encoder Representations from Transformers-Pruning Knowledge Embedding (BERT-PKE). It employs bidirectional encoder representations to analyze text and mines the deep semantic information of entities and relations from their textual descriptions. Since BERT is expensive to train, we propose a pruning strategy based on word frequency and k-nearest neighbors to extract a selected subset of the textual descriptions. In addition, since the construction of negative samples affects model training, two strategies are introduced to improve random sampling. The first is a negative sampling method based on entity distribution, in which a Bernoulli probability determines which entity of a triple is replaced; this reduces the pseudo-labelling problem caused by negative sampling. The second is a negative sampling method based on entity similarity, which uses TransE to represent entities as vectors and k-means to cluster them. High-quality negative triples are then obtained by replacing entities with others from the same cluster, which aids entity feature learning. Experimental results show that the proposed model significantly outperforms the SOTA baselines.
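To make the two negative sampling strategies concrete, the following is a minimal Python sketch, not the authors' implementation. It interprets the abstract's Bernoulli selection as the widely used "bern" heuristic of Wang et al. (2014), which replaces the head with probability tph/(tph+hpt) per relation, and it clusters entity vectors (assumed to be pretrained, e.g. by TransE) with k-means so that replacements are drawn from the same cluster. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans


def bernoulli_probs(triples):
    """Per-relation probability of corrupting the head: tph / (tph + hpt),
    the 'bern' heuristic (assumed interpretation of the abstract)."""
    tails_per_head = defaultdict(set)   # (r, h) -> {t}
    heads_per_tail = defaultdict(set)   # (r, t) -> {h}
    for h, r, t in triples:
        tails_per_head[(r, h)].add(t)
        heads_per_tail[(r, t)].add(h)
    probs = {}
    for r in {r for _, r, _ in triples}:
        tph = np.mean([len(ts) for (rr, _), ts in tails_per_head.items() if rr == r])
        hpt = np.mean([len(hs) for (rr, _), hs in heads_per_tail.items() if rr == r])
        probs[r] = tph / (tph + hpt)
    return probs


def corrupt_bernoulli(triple, probs, entities, rng):
    """Replace head or tail according to the relation-specific probability.
    In practice, accidental true triples would be filtered out afterwards."""
    h, r, t = triple
    if rng.random() < probs[r]:
        return (rng.choice(entities), r, t)   # corrupt the head
    return (h, r, rng.choice(entities))       # corrupt the tail


def build_clusters(entity_vecs, n_clusters):
    """Cluster pretrained entity embeddings (assumed TransE vectors)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(entity_vecs)
    clusters = defaultdict(list)
    for ent, lab in enumerate(labels):
        clusters[lab].append(ent)
    return labels, clusters


def corrupt_by_cluster(triple, labels, clusters, probs, rng):
    """Draw the replacement entity from the same k-means cluster, yielding
    a harder ('high-quality') negative that is still likely false."""
    h, r, t = triple
    if rng.random() < probs[r]:
        return (rng.choice(clusters[labels[h]]), r, t)
    return (h, r, rng.choice(clusters[labels[t]]))


# Toy usage: 6 entities, 2 relations, random 8-d stand-ins for TransE vectors.
rng = random.Random(0)
triples = [(0, 0, 1), (0, 0, 2), (3, 1, 4), (5, 1, 4)]
entities = list(range(6))
probs = bernoulli_probs(triples)
vecs = np.random.RandomState(0).randn(6, 8)
labels, clusters = build_clusters(vecs, n_clusters=2)
print(corrupt_bernoulli(triples[0], probs, entities, rng))
print(corrupt_by_cluster(triples[0], labels, clusters, probs, rng))
```

The cluster-based variant trades extra preprocessing (one embedding pass plus k-means) for negatives that are semantically close to the replaced entity, which is what makes them more informative for feature learning than uniformly sampled replacements.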