As network speed continues to grow, new challenges of network processing are emerging. Although many innovated solutions have been proposed in recent years, based on the analysis of the memory accessing trace and program locality in network processing, we point out that there are still defects in current processor network subsystem designs. Moreover, we find that the interaction and context switch between network processing and local programs are bottlenecks of network performance promotion, which have not been paid enough attention before. Motivated by the studies, a hardware and software co-design solution for network optimization is proposed, which includes improved direct cache access scheme, cache locking for system software, related interconnection architecture and the coherence protocol. The experiment shows that based on the proposed system, the peak TCP bandwidth is increased about 48%, while the UDP package loss rate is decreased by 40% under heavy pressure, and the network latency is decreased by more than 10%. Especially, the network bandwidth is improved about 44% when network processing benchmark executes with SPEC2000 programs in parallel. Also we discuss the collaboration scheme among the proposed solution and other main stream network optimization technologies, as well as the basic rules for the collaboration of multiple network optimization techniques.