ISSN 1000-1239 CN 11-1777/TP


    Journal of Computer Research and Development    2016, 53 (2): 229-230.  
    Research on the Big Data Fusion: Issues and Challenges
    Meng Xiaofeng and Du Zhijuan
    Journal of Computer Research and Development    2016, 53 (2): 231-246.   DOI: 10.7544/issn1000-1239.2016.20150874
    Large-scale linking and crossover of data have changed both the characteristics of data and the practical demands placed on it. Data that is large in scale, multi-source and heterogeneous, cross-domain, cross-media, cross-language, dynamically evolving and generalized is playing an important role, and the corresponding data storage, analysis and understanding face major challenges. The immediate problem is how to exploit the association, crossover and integration of data to maximize the value of big data. We argue that the key to solving this problem lies in data integration, and therefore put forward the concept of big data fusion. Using the fusion of Web data, scientific data and business data as cases, we analyze the demand for and necessity of data fusion, propose big data fusion as a new task, and summarize and analyze existing fusion technologies. Finally, we analyze the challenges that big data fusion may face and the problems it may cause.
    Knowledge Representation Learning: A Review
    Liu Zhiyuan, Sun Maosong, Lin Yankai, Xie Ruobing
    Journal of Computer Research and Development    2016, 53 (2): 247-261.   DOI: 10.7544/issn1000-1239.2016.20160020
    Knowledge bases are usually represented as networks with entities as nodes and relations as edges. Under this network representation, specific algorithms have to be designed to store and utilize knowledge bases; these algorithms are usually time-consuming and suffer from data sparsity. Recently, representation learning, typified by deep learning, has attracted much attention in natural language processing, computer vision and speech analysis. Representation learning aims to project the objects of interest into a dense, real-valued, low-dimensional semantic space, and knowledge representation learning focuses on learning such representations for the entities and relations in knowledge bases. Representation learning can efficiently measure semantic correlations between entities and relations, alleviate sparsity issues, and significantly improve the performance of knowledge acquisition, fusion and inference. In this paper, we introduce recent advances in representation learning, summarize the key challenges and possible solutions, and give an outlook on future research and application directions.
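
The idea of projecting entities and relations into a low-dimensional vector space can be illustrated with a translation-style scoring function in the spirit of TransE, one well-known model in this family. The abstract does not name a specific model, so this is only a minimal sketch: the entity names, the two-dimensional vectors and the `capital_of` relation are hand-picked assumptions, not learned embeddings or data from the paper.

```python
import math

def transe_score(head, relation, tail):
    """Return the L2 distance ||head + relation - tail||.

    In translation-style models a smaller distance means the triple
    (head, relation, tail) is considered more plausible.
    """
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical toy embeddings, chosen by hand for illustration only.
embeddings = {
    "Beijing": [0.9, 0.1],
    "China": [1.0, 1.0],
    "Paris": [-0.8, 0.2],
    "France": [-0.7, 1.1],
}
capital_of = [0.1, 0.9]  # hypothetical relation vector

plausible = transe_score(embeddings["Beijing"], capital_of, embeddings["China"])
implausible = transe_score(embeddings["Beijing"], capital_of, embeddings["France"])

print(f"(Beijing, capital_of, China):  {plausible:.3f}")
print(f"(Beijing, capital_of, France): {implausible:.3f}")
```

With these toy vectors the correct triple receives a smaller distance than the corrupted one, which is the property a trained model is meant to have for real knowledge base triples.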
    Short Text Understanding: A Survey
    Wang Zhongyuan, Cheng Jianpeng, Wang Haixun, Wen Jirong
    Journal of Computer Research and Development    2016, 53 (2): 262-269.   DOI: 10.7544/issn1000-1239.2016.20150742
    Short text understanding is an important but challenging task in machine intelligence. It can potentially benefit various online applications, such as search engines, automatic question answering, online advertising and recommendation systems. In all these applications, the necessary first step is to transform an input text into a machine-interpretable representation, namely to “understand” the short text. To achieve this goal, various approaches have been proposed that leverage external knowledge sources to complement the inadequate contextual information accompanying short texts. This survey reviews current progress in short text understanding with a focus on vector-based approaches, which aim to derive a vector encoding for a short text. We also explore a few potential research topics in the field of short text understanding.
    Graph-Based Collective Chinese Entity Linking Algorithm
    Liu Qiao, Zhong Yun, Li Yang, Liu Yao, Qin Zhiguang
    Journal of Computer Research and Development    2016, 53 (2): 270-283.   DOI: 10.7544/issn1000-1239.2016.20150832
    Entity linking is a central concern of knowledge base population research. Traditional entity linking methods are usually limited by the immaturity of the local knowledge base, and deliberately ignore the semantic correlation between mentions that co-occur within a text corpus. In this work, we propose a novel graph-based collective entity linking algorithm for Chinese information processing, which not only takes full advantage of the structured relationships between entities offered by the local knowledge base, but also makes use of the additional background information offered by external knowledge sources. Through an incremental evidence mining process, the algorithm links the mentions extracted from the text corpus to their corresponding entities in the local knowledge base in a batch manner. Experimental results on several open-domain corpora demonstrate the validity of the proposed referent graph construction method, the incremental evidence mining process, and the coherence criterion between mention-entity pairs. Experimental evidence shows that the proposed entity linking algorithm consistently outperforms other state-of-the-art algorithms.
    Chinese Named Entity Relation Extraction Based on Syntactic and Semantic Features
    Gan Lixin, Wan Changxuan, Liu Dexi, Zhong Qing, Jiang Tengjiao
    Journal of Computer Research and Development    2016, 53 (2): 284-302.   DOI: 10.7544/issn1000-1239.2016.20150842
    Named entity relations are a foundation of semantic networks and ontologies, and are widely used in information retrieval, machine translation and automatic question answering systems. In named entity relation extraction, relation feature selection and extraction are two key issues. Chinese long sentences with complicated sentence patterns and many entities, together with the data sparsity problem, pose challenges for Chinese entity relation detection and extraction. To deal with these problems, a novel method based on syntactic and semantic features is proposed. The dependency relation composition feature is obtained by combining the respective dependency relations of the two entities. The verb feature with the nearest syntactic dependency is captured from the dependency relations and POS (part-of-speech) tags. These features are incorporated into feature-based relation detection and extraction using SVM. Evaluation on a real text corpus in the tourism domain shows that the two features, drawn from syntactic and semantic aspects, effectively improve the performance of entity relation detection and extraction, and outperform the previously best-reported systems in terms of precision, recall and F1 value. In addition, the verb feature with the nearest syntactic dependency is highly effective for relation detection and extraction; it contributes most prominently to the performance improvement on data-sparse entity relations, and significantly outperforms state-of-the-art methods based on verb features.
    A Graph-Based Approach for Query Answering Under Inconsistency-Tolerant Semantics
    Fu Xuefeng, Qi Guilin, Zhang Yong
    Journal of Computer Research and Development    2016, 53 (2): 303-315.   DOI: 10.7544/issn1000-1239.2016.20150839
    Inconsistency often arises during ontology evolution and invalidates standard reasoning. To tackle this problem, inconsistency-tolerant semantics can be provided for the target language. However, ill-defined inconsistency-tolerant semantics may require massive computation and lose valuable information. In this paper, a variant of the classical inconsistency-tolerant semantics, named IPAR-semantics, is proposed. The new semantics avoids computing the closure of an ABox w.r.t. the corresponding TBox, and thus reduces computation time and preserves as much information as possible. Based on it, we further propose a graph-based approach for consistent query answering. In our approach, the given ontology and the target query are both transformed into graphs by different rules and stored in a graph database. IPAR-semantics ensures that inconsistent instances are not included in query answers, and the new approach avoids redundant rewritings of a user query. Finally, we conduct comparative experiments on ontologies generated by the UOBM generator. In the experiments, we implement a query answering system under IPAR-semantics using our graph-based approach and compare it with a query answering approach under ICAR-semantics. The experimental results show that our approach performs better in both efficiency and scalability.
    Semiring Provenance for Data Fusion
    Xue Jianxin, Shen Derong, Kou Yue, Nie Tiezheng, Yu Ge
    Journal of Computer Research and Development    2016, 53 (2): 316-325.   DOI: 10.7544/issn1000-1239.2016.20150872
    As an important part of Web data integration, Web data fusion assures the quality of integrated data and is a precondition for accurate analysis and mining. However, data fusion is usually treated as a uniform black box, which leaves the fusion process lacking interpretability and debuggability. Therefore, to describe the fusion process and the origin of the data used to resolve conflicts, a provenance mechanism based on data provenance should be constructed. Data provenance describes how data is generated and evolves over time; it shows not only which input tuples contribute to the data but also how they contribute. We study semiring provenance for data fusion. First, we propose an approximate iterative approach to optimize the computation of semiring provenance. Then, to speed up convergence, we present a Newton-like approach. Because recursion may complicate the situation, we analyze the characteristics of semiring provenance and show that the Kleene sequence and the Newton-like sequence converge after only n steps. Experiments show that the techniques in this paper are highly effective and feasible.
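
The Kleene sequence mentioned above can be illustrated on a small recursive system over a semiring. The sketch below is not the paper's algorithm: it assumes the tropical semiring (min as addition, + as multiplication), a linear system x = b ⊕ (A ⊗ x), and a hypothetical three-variable example, and it simply iterates from the semiring zero until the values stop changing.

```python
import math

INF = math.inf  # additive identity ("zero") of the tropical (min, +) semiring

def kleene_fixpoint(A, b):
    """Iterate x <- b (+) (A (x) x) over the (min, +) semiring.

    Starts from the all-zero vector (all INF) and stops when an iteration
    leaves x unchanged. Returns the fixed point and the number of
    iterations that actually changed x. At most n + 1 rounds are tried,
    where n is the number of variables.
    """
    n = len(b)
    x = [INF] * n
    for step in range(1, n + 2):
        nxt = [min(b[i], min(A[i][j] + x[j] for j in range(n))) for i in range(n)]
        if nxt == x:
            return x, step - 1
        x = nxt
    return x, n + 1

# Hypothetical example: variable 2 is a base tuple with cost 0;
# variable 1 derives from 2 at cost 2; variable 0 derives from 1 at
# cost 1 or directly from 2 at cost 5.
A = [
    [INF, 1, 5],
    [INF, INF, 2],
    [INF, INF, INF],
]
b = [INF, INF, 0]

solution, steps = kleene_fixpoint(A, b)
print(solution, steps)
```

In this example the iteration stabilizes after three productive steps, matching the number of variables; a Newton-like scheme would aim to reach the same fixed point in fewer iterations.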