In recent years, the rapid development of high-throughput next-generation sequencing (NGS) technologies has greatly reduced the cost and time of sequencing. However, both the explosion of generated NGS data and the massively parallel computation it requires pose great challenges to the capability of existing computers. We take PerM, an open-source re-sequencing algorithm based on a hash index, as an example to investigate optimizations for accelerating NGS on commercial multi-core CPUs as well as on customized parallel architectures. First, we optimize the original algorithm by reordering the sequence of bucket accesses so that data locality in the shared cache is improved. Second, to exclude empty hash buckets, we propose a hash-index compression algorithm that matches the sequential access pattern of the optimized algorithm. Experiments on a 64-core SMP (Intel Xeon X7550) show that the optimized algorithm reduces the last-level cache (LLC) miss ratio to about 10% of that of the original algorithm, improving overall performance by 4 to 11 times. Furthermore, a parallel accelerator architecture is designed and evaluated on our customized FPGA accelerator card, which carries a Xilinx LX330 FPGA. As a prototype, a systolic array of 100 processing elements (PEs) is built, operating at 175 MHz. The proposed parallel accelerator architecture achieves a speedup of 30 to 65 times over an 8-core CPU.
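The two software optimizations can be illustrated with a minimal sketch. It is not PerM's actual data layout: the contiguous k-mer seeds, the 2-bit base encoding, and all function names (`kmer_key`, `build_compressed_index`, `lookup_in_bucket_order`) are assumptions made for illustration only. The index stores only non-empty buckets in sorted key order, and queries are sorted by bucket key so that the index is scanned once, front to back.

```python
def kmer_key(kmer):
    """Encode a DNA k-mer as an integer bucket key, 2 bits per base (assumed encoding)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    key = 0
    for base in kmer:
        key = (key << 2) | code[base]
    return key


def build_compressed_index(reference, k):
    """Build an index that holds only the non-empty buckets.

    Returns (keys, starts, positions):
      keys      -- sorted keys of the non-empty buckets
      starts    -- starts[i]:starts[i + 1] delimits bucket i inside positions
      positions -- reference positions, grouped by bucket
    A dense table would need 4**k entries, most of them empty.
    """
    buckets = {}
    for pos in range(len(reference) - k + 1):
        buckets.setdefault(kmer_key(reference[pos:pos + k]), []).append(pos)

    keys, starts, positions = [], [0], []
    for key in sorted(buckets):
        keys.append(key)
        positions.extend(buckets[key])
        starts.append(len(positions))
    return keys, starts, positions


def lookup_in_bucket_order(index, kmers):
    """Look up many k-mers, visiting buckets in increasing key order.

    Sorting the queries by key turns random bucket accesses into a single
    sequential merge-scan over the compressed index.
    """
    keys, starts, positions = index
    hits = {}
    i = 0
    for key, kmer in sorted((kmer_key(m), m) for m in set(kmers)):
        # Advance the scan pointer; it never moves backwards.
        while i < len(keys) and keys[i] < key:
            i += 1
        if i < len(keys) and keys[i] == key:
            hits[kmer] = positions[starts[i]:starts[i + 1]]
        else:
            hits[kmer] = []  # empty bucket: not stored in the index at all
    return hits


index = build_compressed_index("ACGTACGT", k=3)
hits = lookup_in_bucket_order(index, ["TAC", "ACG", "AAA"])
```

In this toy example the compressed index stores 4 buckets instead of the 64 that a dense table for k = 3 would allocate, and the sorted scan touches each stored bucket at most once per batch of queries.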