An area- and power-efficient VLSI architecture is presented for a quasi-cyclic LDPC (QC-LDPC) decoder based on an optimized turbo-decoding message-passing (TDMP) algorithm. The optimization lies mainly in the check-node update, which uses the normalized Min-Sum (NMS) scheme originally proposed for the two-phase message-passing (TPMP) algorithm, also known as the two-phase belief propagation algorithm. The primary advantages of the proposed architecture over recent work are: 1) lower power dissipation, because the decoder converges in fewer than half as many decoding iterations; 2) more than 50% savings in memory, achieved by compressing the posterior messages, which leads to a large reduction in chip area; 3) good performance at low complexity, and hence small area and low power, owing to the normalized Min-Sum algorithm used for check-node updating; and 4) reduced interconnection congestion, obtained by fully exploiting the quasi-cyclic structure of the parity-check matrix together with the proposed logarithmic shifters. The proposed architecture is implemented for LDPC decoding in the Chinese Digital Television Terrestrial Multimedia Broadcasting (DTMB) system. The decoder occupies 0.58 million gates and reaches a throughput of 107 Mbps at a clock frequency of 100 MHz. The architecture can be extended to other digital communication systems that adopt LDPC codes as the forward error correction (FEC) scheme, such as wireless local area networks (WLAN).
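To make the normalized Min-Sum check-node update concrete, the following Python sketch computes the outgoing messages of a single check node. It is a floating-point software model for illustration only, not the hardware datapath of the proposed decoder; the function name `nms_check_node_update` and the normalization factor `alpha = 0.75` are assumptions chosen for the example and are not taken from the paper.

```python
def nms_check_node_update(v2c, alpha=0.75):
    """Normalized Min-Sum update for one check node.

    v2c   : incoming variable-to-check messages (LLRs), one per edge
    alpha : normalization factor (0.75 is an assumed, illustrative value)
    Returns the check-to-variable messages in the same edge order.
    """
    if len(v2c) < 2:
        raise ValueError("a check node needs at least two edges")

    mags = [abs(m) for m in v2c]

    # Only the two smallest magnitudes and the position of the smallest
    # are needed to form every outgoing magnitude.
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min)

    # Product of the signs of all incoming messages.
    sign_prod = 1
    for m in v2c:
        if m < 0:
            sign_prod = -sign_prod

    out = []
    for i, m in enumerate(v2c):
        # Exclude the edge's own contribution: its magnitude (use min2 when
        # this edge held the minimum) and its sign (multiply it out again).
        mag = min2 if i == idx_min else min1
        own_sign = -1 if m < 0 else 1
        out.append(alpha * sign_prod * own_sign * mag)
    return out


if __name__ == "__main__":
    print(nms_check_node_update([2.0, -1.0, 4.0], alpha=0.5))
```

Because every outgoing message is determined by the two smallest magnitudes, the index of the smallest, and the signs, a check node's outputs can be described compactly, which is one reason Min-Sum variants are attractive for low-area hardware.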