Abstract:
GML (Geography Markup Language), an XML-based modeling language for geographic information, has become a de facto standard for encoding geospatial data. GML documents are usually extremely verbose because tags and attribute names repeat very frequently, a redundancy that underlies GML's self-describing advantage. GML documents are also rich in data, containing many space-consuming textual items such as attribute values and element contents. Worse, they often hold large amounts of high-precision spatial coordinates in text form, which occupy more storage space than an equivalent binary representation. Storing and transferring GML documents is therefore very costly. This paper proposes an effective schema-based approach to GML compression. The compressor first infers a schema from the document and validates the document against this inferred schema. It then encodes, as bits, the state transition paths that the tree automaton follows during validation, compresses the coordinate data with a delta encoding scheme, and finally passes the inferred schema and all encodings to a general-purpose text compressor. Experiments on real GML documents show that the proposed compressor achieves a better compression ratio than typical general-purpose text compressors (gzip and PPMD), state-of-the-art XML compressors (XMill, XMLPPM, and XWRT), and the GML compressor GPress.
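To illustrate the coordinate step, the following is a minimal sketch of delta encoding applied to a GML coordinate string, not the paper's actual encoder. It assumes a whitespace-separated list of decimal numbers (as in a `gml:posList`) with a known dimension, and it assumes, as a hypothetical design choice, that values are converted to fixed-point integers at the largest decimal precision found so that successive differences are exact integers. The helper names `delta_encode` and `delta_decode` are illustrative, and scientific notation is not handled.

```python
from decimal import Decimal


def delta_encode(coord_text: str, dim: int = 2):
    """Hypothetical helper: turn a posList-style string into
    (scale, first point, per-dimension deltas between consecutive points)."""
    tokens = coord_text.split()
    if not tokens:
        return 0, [], []
    if len(tokens) % dim != 0:
        raise ValueError("token count is not a multiple of dim")
    # Largest number of fractional digits, so every value becomes an exact integer.
    scale = max(len(t.split(".")[1]) if "." in t else 0 for t in tokens)
    factor = 10 ** scale
    ints = [int(Decimal(t) * factor) for t in tokens]
    points = [ints[i:i + dim] for i in range(0, len(ints), dim)]
    # Each delta is the difference from the previous point, coordinate by coordinate.
    deltas = [[c - p for c, p in zip(cur, prev)]
              for prev, cur in zip(points, points[1:])]
    return scale, points[0], deltas


def delta_decode(scale: int, first, deltas) -> str:
    """Inverse of delta_encode: rebuild the points and format them as text."""
    if not first:
        return ""
    points = [list(first)]
    for d in deltas:
        points.append([p + x for p, x in zip(points[-1], d)])
    # scaleb(-scale) shifts the decimal point back; format 'f' avoids exponent notation.
    return " ".join(format(Decimal(v).scaleb(-scale), "f")
                    for pt in points for v in pt)


coords = "116.391275 39.906217 116.391300 39.906250 116.391420 39.906301"
scale, first, deltas = delta_encode(coords)
print(scale, first, deltas)  # 6 [116391275, 39906217] [[25, 33], [120, 51]]
print(delta_decode(scale, first, deltas))
```

Nearby coordinates produce small integer deltas, which a general-purpose compressor can typically pack more tightly than the original long decimal strings. Decoding is numerically lossless, although trailing zeros may differ from the original text when values have mixed precision.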