Abstract:
Because the Web environment is complex and a single Web page often covers several topics, it is difficult to retrieve all the Web pages relevant to a specific topic. An irrelevant Web page may link to a relevant one, so a topical crawler must sometimes traverse irrelevant pages to reach more relevant pages. This procedure is called tunneling. This paper presents a study of the tunneling technique, together with a correction to previous results. Tunneling is partitioned into grey tunneling and black tunneling. In grey tunneling, to limit the effect during crawling of pages that are irrelevant to the topic as a whole but partially relevant, a multi-topic page is divided into several blocks and each block is processed individually. In black tunneling, each irrelevant page is assigned a depth value according to the relevance of its parent page; this value determines whether the page is kept, which broadens the scope of the topical crawler. The experimental results show that both tunneling methods achieve the expected effect, indicating that the approaches are effective, robust, and practical.
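As a rough illustration of the black tunneling idea described above, the following minimal sketch runs a breadth-first crawl over a toy link graph. It is not the paper's algorithm: the function name `black_tunnel_crawl`, the `max_depth` parameter, and the specific rule used here (a relevant page resets the depth value to `max_depth`, an irrelevant page receives its parent's depth value minus one, and a page whose value would fall below zero is discarded) are all assumptions made for the example.

```python
from collections import deque


def black_tunnel_crawl(links, relevant, seed, max_depth):
    """Toy breadth-first crawl with a depth budget for irrelevant pages.

    links:     dict mapping a page to the list of pages it links to
    relevant:  set of pages judged relevant to the topic (stand-in for a classifier)
    seed:      starting page
    max_depth: assumed budget; how many consecutive irrelevant pages may be traversed
    """
    depth = {seed: max_depth}      # depth value assigned to each kept page
    collected = []                 # relevant pages, in the order they are visited
    queue = deque([seed])

    while queue:
        page = queue.popleft()
        if page in relevant:
            collected.append(page)

        for child in links.get(page, []):
            if child in depth:
                continue           # already kept; first assignment wins (simplification)
            if child in relevant:
                child_depth = max_depth          # relevant page resets the budget
            else:
                child_depth = depth[page] - 1    # irrelevant page inherits a reduced budget
            if child_depth < 0:
                continue           # budget exhausted: discard this irrelevant page
            depth[child] = child_depth
            queue.append(child)

    return collected


# Hypothetical link graph: D sits behind two irrelevant pages, H behind three.
links = {
    "A": ["B", "E"],
    "B": ["C"], "C": ["D"],
    "E": ["F"], "F": ["G"], "G": ["H"],
}
relevant = {"A", "D", "H"}

print(black_tunnel_crawl(links, relevant, "A", max_depth=2))
print(black_tunnel_crawl(links, relevant, "A", max_depth=3))
```

With a budget of 2 the crawl in this toy graph reaches `D` but gives up before `H`; raising the budget to 3 lets it tunnel through to `H` as well, at the cost of fetching more irrelevant pages.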