For sort acceleration on FPGAs, the selection and optimization of performance metrics such as latency, throughput, power efficiency, hardware utilization, and bandwidth efficiency are critically important. This paper reviews the evolution of performance-driven sort acceleration, covering advances toward larger data sizes, more data types, broader algorithm support, hardware-software cooperation, and new hardware-based designs, and analyzes the problems and optimization strategies that arise at each stage, including design, implementation, and testing. Among the many sorting algorithms, merge sort has become the mainstream choice because of its excellent hardware parallelism, scalability, and simple control logic. Sort acceleration is an architectural design deeply tied to specific application scenarios. From the perspective of database system acceleration, this paper analyzes the architectural adjustments made to address the problems that databases face: resource competition, data arrangement, unique operations, and the diversity of user requests. Finally, to address the problems and shortcomings of existing studies, we provide an outlook on future directions: distributed sort acceleration for very large data scales, the introduction of new hardware devices such as data processing units (DPUs), and the improvement of auxiliary toolchains such as high-level synthesis (HLS) to drive the iterative update of sort acceleration designs.