Abstract:
Data deduplication is a key technology for managing big data efficiently and is widely adopted in cloud storage systems to eliminate redundant copies and save storage space. Convergent encryption has become a common way to combine deduplication with encryption: because the encryption key is derived from the data itself, identical plaintexts produce identical ciphertexts, so encrypted data can still be deduplicated. However, the outsourcing model of cloud service providers, together with the deterministic nature of convergent encryption, introduces data security risks. Because equal plaintexts always map to equal ciphertexts, the ciphertexts themselves can reveal patterns in the underlying data, such as whether two users store the same content or how often a given chunk occurs, and attackers can exploit these patterns to infer sensitive information. As a result, encrypted data deduplication has emerged as an important research topic in cloud storage security. This paper first introduces data deduplication and encrypted deduplication algorithms, and then discusses the security challenges of encrypting and deduplicating data in cloud storage. It next reviews the current state of research from both the attack and the defense perspectives, covering three main types of attacks: brute-force attacks, which recover plaintexts by exhaustively guessing candidate messages; frequency analysis attacks, which exploit the frequency characteristics of ciphertexts; and side-channel attacks, which exploit information leaked through server responses or network traffic characteristics. For each attack type, representative defense strategies are analyzed along with their strengths and weaknesses. Finally, the paper highlights the challenges facing existing defenses for encrypted data deduplication and suggests future research directions for improving them.
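To make the determinism concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce any specific scheme surveyed here. It assumes the common convergent-encryption construction in which the key is the SHA-256 hash of the plaintext. So that the example needs only the standard library, a SHA-256 counter-mode keystream stands in for a real block cipher such as AES; this toy cipher is NOT secure. The names `convergent_encrypt`, `DedupStore`, and `brute_force` are hypothetical. The sketch shows two things: identical plaintexts yield identical ciphertexts, which is what enables deduplication, and an attacker who can enumerate a predictable message space can recover a plaintext offline by re-encrypting each guess.

```python
from __future__ import annotations

import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks.
    Hypothetical stand-in for AES-CTR; not secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Derive the key from the message itself, then encrypt deterministically."""
    key = hashlib.sha256(plaintext).digest()
    ks = keystream(key, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream recovers the plaintext."""
    ks = keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


class DedupStore:
    """Hypothetical cloud store that keeps one copy per distinct ciphertext."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def upload(self, ciphertext: bytes) -> bool:
        """Return True if the blob was new and stored, False if deduplicated."""
        tag = hashlib.sha256(ciphertext).hexdigest()
        if tag in self.blobs:
            return False
        self.blobs[tag] = ciphertext
        return True


def brute_force(target_ct: bytes, candidates: list[bytes]) -> bytes | None:
    """Offline brute-force attack: re-encrypt each guessed plaintext and
    compare with the target ciphertext. No key is needed, because the key
    is derived from the guess itself."""
    for guess in candidates:
        _, ct = convergent_encrypt(guess)
        if ct == target_ct:
            return guess
    return None
```

With this sketch, encrypting `b"salary=85000"` twice produces the same ciphertext, so the second `upload` call is deduplicated. `brute_force` then recovers the plaintext from a candidate list that covers the plausible salary range. The same determinism underlies frequency analysis, since repeated chunks produce repeated ciphertexts. The `upload` return value also mirrors the kind of duplicate versus non-duplicate signal that side-channel attacks observe in client-side deduplication.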