Deep neural networks (DNNs) have been widely deployed in a variety of intelligent applications (e.g., image and video recognition). Nevertheless, because of DNNs’ heavy computational burden, resource-constrained IoT devices are ill-suited to executing DNN inference tasks locally. Existing cloud-assisted approaches suffer from unpredictable communication latency and the unstable performance of remote servers. A promising alternative is to have collaborating IoT devices perform distributed and scalable DNN inference. However, existing works consider only homogeneous IoT devices with static partitioning. There is therefore an urgent need for a framework that adaptively partitions DNN tasks and orchestrates distributed inference among heterogeneous, resource-constrained IoT devices. Building such a framework poses two main challenges. First, it is difficult to accurately profile the multi-layer inference latency of DNNs. Second, it is difficult to learn a collaborative inference strategy adaptively and in real time in heterogeneous environments. To address the first challenge, we propose an interpretable multi-layer prediction model that abstracts complex layer parameters. To address the second, we leverage evolutionary reinforcement learning (ERL) to adaptively determine a near-optimal partitioning strategy for DNN inference tasks. Real-world experiments on Raspberry Pi devices show that the proposed method can significantly accelerate inference in dynamic and heterogeneous environments.