Hybrid memory systems composed of non-volatile memory (NVM) and DRAM can offer large memory capacity and DRAM-like performance. However, with the increasing memory capacity and application footprints, the address translation overhead becomes another system performance bottleneck due to lower translation lookaside buffer (TLB) converge. Large pages can significantly improve the TLB converge, however, they impede fine-grained page migration in hybrid memory systems. In this paper, we propose a hierarchical hybrid memory system that supports both large pages and fine-grained page caching. We manage NVM and DRAM with large pages and small pages, respectively. The DRAM is used as a cache to NVM by using a direct mapping mechanism. We propose a cache filtering mechanism to only fetch frequently-access (hot) data into the DRAM cache. CPUs can still access the cold data directly in NVM through a DRAM bypassing mechanism. We dynamically adjust the threshold of hot data classification to adapt to the diversifying and dynamic memory access patterns of applications. Experimental results show that our strategy improves the application performance by 69.9% and 15.2% compared with a NVM-only system and the state-of-the-art CHOP scheme, respectively. The performance gap is only 8.8% compared with a DRAM-only memory system with large pages support.