Advanced Search
    Yuan Xinpan, Long Jun, Zhang Zuping, Luo Yueyi, Zhang Hao, and Gui Weihua. Connected Bit Minwise Hashing[J]. Journal of Computer Research and Development, 2013, 50(4): 883-890.
    Citation: Yuan Xinpan, Long Jun, Zhang Zuping, Luo Yueyi, Zhang Hao, and Gui Weihua. Connected Bit Minwise Hashing[J]. Journal of Computer Research and Development, 2013, 50(4): 883-890.
    • Minwise Hashing has become a standard technique for estimating the similarity of the collection (e.g., resemblance) with applications in information retrieval. While traditional Minwise hashing methods store each hashed value using 64 bits, storing only the lowest b bits of each (Minwise) hashed value (e.g., b=1 or 2). The b-bit Minwise hashing algorithm can gain substantial advantages in terms of computational efficiency and storage space. Based on the b-bit Minwise hashing theory, a connected bit Minwise hashing algorithm is proposed. The unbiased estimator of the resemblance and storage factor of connected bit Minwise hashing are provided theoretically. It could be theoretically proved that the efficiency of similarity estimation is improved by the connected bit Minwise hashing algorithm since the number of comparisons is greatly reduced without significant loss of accuracy. Several key parameters (e.g., precision, recall and efficiency) are analyzed, and the availability of several estimators for connected bit Minwise hashing is analyzed. Theoretical analysis and experimental results demonstrate the effectiveness of this method.
    • loading

    Catalog

      Turn off MathJax
      Article Contents

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return