Abstract:
The Semantic Web aims to elevate the World Wide Web to a Web of data, in which machines can process annotations and relations between resources and can derive implicit information by using ontologies and shared vocabularies. Realizing this vision requires a method for automatic semantic annotation. This paper proposes a methodology, guided by a domain ontology, for the semantic annotation of Chinese Web pages. It combines statistical methods with natural language processing, and it maps sentences to RDF representations in two phases: an identification phase and a grouping phase. The major technical contributions are as follows: (1) a domain lexicon constructed by statistical methods, rather than a linguistic ontology, serves as the external domain knowledge; (2) an explicit property type tagging algorithm recognizes both the instances and the properties contained in sentences, which facilitates relation extraction; (3) after the dependency tree or dependency forest of a sentence has been built, the identified instances and properties are grouped into RDF statements according to the dependency relationships among Chinese words. Experimental results show that this method is significantly more effective than a semantic annotation method based on the subject-verb-object grammatical relationship.
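
The grouping phase described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Token` class, the `group_into_triples` rule (pair each tagged property with the nearest tagged instance on either side among its dependency descendants), the `ex:` identifiers, and the romanized example sentence are all assumptions made for illustration. The sketch assumes the identification phase has already tagged instances and properties and that a dependency parser has supplied head indices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    head: int                   # index of the governing token, -1 for the root
    tag: Optional[str] = None   # "INSTANCE", "PROPERTY", or None (hypothetical tag set)
    uri: Optional[str] = None   # hypothetical ontology identifier


def is_descendant(tokens: List[Token], child: int, ancestor: int) -> bool:
    """Return True if `ancestor` lies on the head chain of `child`."""
    seen = set()
    node = tokens[child].head
    while node != -1 and node not in seen:
        if node == ancestor:
            return True
        seen.add(node)
        node = tokens[node].head
    return False


def group_into_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Pair each property with the nearest instance before and after it,
    considering only instances that depend on the property (directly or not)."""
    triples = []
    for p, prop in enumerate(tokens):
        if prop.tag != "PROPERTY":
            continue
        instances = [
            i for i, t in enumerate(tokens)
            if t.tag == "INSTANCE" and is_descendant(tokens, i, p)
        ]
        subjects = [i for i in instances if i < p]
        objects = [i for i in instances if i > p]
        if subjects and objects:
            s, o = subjects[-1], objects[0]
            triples.append((tokens[s].uri, prop.uri, tokens[o].uri))
    return triples


# Toy sentence, romanized: "Zhang_San graduate from Tsinghua_University".
# The object hangs off the preposition, not directly off the verb.
sentence = [
    Token("Zhang_San", head=1, tag="INSTANCE", uri="ex:Zhang_San"),
    Token("graduate", head=-1, tag="PROPERTY", uri="ex:graduatedFrom"),
    Token("from", head=1),
    Token("Tsinghua_University", head=2, tag="INSTANCE", uri="ex:Tsinghua_University"),
]
triples = group_into_triples(sentence)
print(triples)
```

In this toy case the object instance is attached to the preposition rather than to the verb, so following the dependency chain recovers a statement that a direct subject-verb-object match could miss.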