Cross-modal anomaly detection in big data environments is valuable but challenging. Existing cross-modal anomaly detection frameworks often suffer from incomplete anomaly detection and low data utilization. To alleviate these problems, an efficient cross-modal anomaly detection framework is proposed, based on hierarchical deep networks and a similarity-based bi-quintuple loss. First, the framework introduces a single-view anomaly detection network to detect attribute anomalies and part of the class-attribute anomalies in the data samples. Then, the similarity-based bi-quintuple loss, integrated with double-branch deep networks, is developed to detect class anomalies and the remaining class-attribute anomalies. This loss regularizes data with different attributes to be orthogonal and data with the same attribute to be linearly correlated, which enlarges the feature difference between different-attribute data and increases the feature correlation between same-attribute data. In addition, the bidirectional and neighborhood constraints significantly improve data utilization and the generalization ability of the model. Extensive experimental results show that the proposed framework detects abnormal sample points in different modalities and clearly outperforms the corresponding state-of-the-art methods.
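The orthogonality and correlation objectives described above can be illustrated with a minimal sketch. This is not the bi-quintuple loss itself, whose exact form is not given in this abstract; it only shows one simple way to push cross-modal features with the same attribute toward cosine similarity 1 and features with different attributes toward cosine similarity 0. The function names `cosine_matrix` and `similarity_loss`, the squared penalties, and the use of integer attribute labels are all assumptions made for illustration.

```python
import numpy as np


def cosine_matrix(a, b, eps=1e-12):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T


def similarity_loss(feat_a, feat_b, labels_a, labels_b):
    """Illustrative loss (an assumption, not the paper's formulation).

    Same-attribute pairs are penalized by (1 - cos)^2, which favors
    linearly correlated features; different-attribute pairs are
    penalized by cos^2, which favors orthogonal features.
    """
    cos = cosine_matrix(feat_a, feat_b)
    same = labels_a[:, None] == labels_b[None, :]
    pull = ((1.0 - cos) ** 2)[same]   # same attribute: want cos near 1
    push = (cos ** 2)[~same]          # different attribute: want cos near 0
    pull_term = pull.mean() if pull.size else 0.0
    push_term = push.mean() if push.size else 0.0
    return float(pull_term + push_term)


# Hypothetical features from two modalities (e.g. two network branches).
feat_a = np.array([[1.0, 0.0], [0.0, 1.0]])
feat_b = np.array([[2.0, 0.0], [0.0, 3.0]])
labels = np.array([0, 1])

aligned = similarity_loss(feat_a, feat_b, labels, labels)
mismatched = similarity_loss(feat_a, feat_b, labels, labels[::-1])
print(aligned, mismatched)
```

In this toy example the aligned pairing gives a loss close to zero, while swapping the labels of the second modality gives a large loss, which is the behavior the orthogonality and correlation constraints are meant to encourage.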