Urban traffic flow prediction analyzes traffic-related big data to forecast future traffic flow, and it is crucial for early warning of traffic congestion in intelligent transportation systems. Effective prediction is challenging, however, because traffic flow is affected by many complex factors, e.g., the spatial-temporal dependencies and temporal dynamics of traffic networks. Previous work has applied convolutional neural networks (CNNs) or recurrent neural networks (RNNs) to traffic flow prediction, but these models have difficulty capturing the spatial-temporal correlations in temporal data related to traffic flow. In this paper, we propose a novel sequence-to-sequence spatial-temporal attention framework for urban traffic flow forecasting. It is an end-to-end deep learning model that combines convolutional LSTM layers and LSTM layers with an attention mechanism to adaptively learn the spatial-temporal dependencies and non-linear correlations in multivariate sequence data related to urban traffic flow. Extensive experiments on three real-world traffic flow datasets show that our model achieves the best forecasting performance among the compared state-of-the-art methods.