XML-constraints

XML-constraints projects

TLDKS 2013

This repository contains the resources used in the experiments of the paper "The Finite Implication Problem for Expressive XML Keys: Foundations, Applications, and Performance Evaluation", written for a Special Issue of the Springer journal "Transactions on Large-Scale Data- and Knowledge-Centered Systems" (TLDKS).

On-line version of the paper

Abstract

The increasing popularity of XML for persistent data storage, processing and exchange has triggered the demand for efficient algorithms to manage XML data. Both industry and academia have long since recognized the importance of keys in XML data management. In this paper we make a theoretical as well as a practical contribution to this area. This endeavour is ambitious given the multitude of intractability results that have been established. Our theoretical contribution is based on the definition of a new fragment of XML keys that keeps the right balance between expressiveness and efficiency of maintenance. More precisely, we characterize the associated implication problem axiomatically and develop a low-degree polynomial time decision algorithm. In comparison to previous work, this new fragment of XML keys provides designers with an enhanced ability to capture properties of XML data that are significant for the application at hand. Our practical contribution includes an efficient implementation of this decision algorithm and a thorough evaluation of its performance, demonstrating that reasoning about expressive notions of XML keys can be done efficiently in practice, and scales well. Our results promote the use of XML keys in real-world XML practice, where a little more semantics makes applications a lot more effective. To exemplify this potential, we use the decision algorithm to calculate non-redundant covers for sets of XML keys. In turn, this allows us to significantly reduce the time required to validate large XML documents against keys from the proposed fragment.

Files

The "cover-dataset" folder contains the sets of Max-Keys defined for each XML document.
The "implication-dataset" folder contains the sets of Max-Keys used to decide the implication problem for Max-Keys.
The "validation-dataset" folder contains the sets of Max-Keys, obtained from the cover set computation, used to validate the XML documents.
"cover-uniq" is the application that computes a single cover (in one pass) for a given set of Max-Keys.
"cover-all" is the application that computes all possible covers for a given set of Max-Keys.
"max-keys" is the application that decides the implication problem over Max-Keys.
"validator" is the application that validates an XML document against a given set of Max-Keys.
"xml-constraints-covers-TLDKS2013.xls" is an Excel file with all the results obtained from computing the cover sets and validating the documents.
"xml-constraints-implication-TLDKS2013.xls" is an Excel file with all the results obtained from deciding the implication problem.

(Note: For now, only executable versions of the applications are available.)
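
Since the sources are not published, the following is only a rough sketch of the general idea behind computing one non-redundant cover in a single pass: each key is dropped if the remaining keys already imply it. It is not the code of "cover-uniq". The function names are hypothetical, and `toy_implies` is a stand-in oracle for relational-style keys (a key implies every superset of itself); for Max-Keys the oracle would have to be the decision procedure implemented by "max-keys".

```python
from typing import Callable, FrozenSet, List, Sequence

Key = FrozenSet[str]  # toy representation: a key is a set of attribute names


def toy_implies(sigma: Sequence[Key], phi: Key) -> bool:
    """Stand-in implication oracle (assumption, NOT the Max-Keys algorithm):
    phi is implied if some key in sigma is a subset of phi."""
    return any(key <= phi for key in sigma)


def non_redundant_cover(
    keys: Sequence[Key],
    implies: Callable[[Sequence[Key], Key], bool],
) -> List[Key]:
    """Return a subset of `keys` in which no key is implied by the others."""
    cover = list(keys)
    i = 0
    while i < len(cover):
        rest = cover[:i] + cover[i + 1:]
        if implies(rest, cover[i]):
            cover = rest  # cover[i] is redundant: drop it, stay at index i
        else:
            i += 1  # cover[i] is needed: keep it and move on
    return cover


keys = [frozenset("a"), frozenset("ab"), frozenset("cd"), frozenset("c")]
print(non_redundant_cover(keys, toy_implies))
```

The result depends on the order in which keys are visited, which is why a set of keys can have several different covers.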

How to run

If you want to run our implementations, execute the following commands in a console:

To compute one cover set in one pass:

./cover-uniq -f cover-dataset/keys-padron.in -l label

To compute all possible covers:

./cover-all -f cover-dataset/keys-padron.in -l label

where label is the fixed replacement label needed in the implication problem. Both applications write the obtained cover(s) to a result.out file.
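
To run the cover computation over a whole folder of key sets, a small driver script can help. The sketch below is not part of the repository: the function names are hypothetical, and it assumes that key files end in ".in" and that result.out is written to the current working directory, neither of which is confirmed here.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def cover_command(keys_file: str, label: str, all_covers: bool = False) -> List[str]:
    """Build the argument list for cover-uniq (default) or cover-all."""
    binary = "./cover-all" if all_covers else "./cover-uniq"
    return [binary, "-f", keys_file, "-l", label]


def run_on_dataset(dataset_dir: str, label: str, out_dir: str) -> None:
    """Run cover-uniq on every *.in file and keep one output file per input."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for keys_file in sorted(Path(dataset_dir).glob("*.in")):
        subprocess.run(cover_command(str(keys_file), label), check=True)
        # Assumption: result.out appears in the current working directory.
        shutil.move("result.out", str(Path(out_dir) / (keys_file.stem + ".out")))


# Example call (requires the executables in the current directory):
# run_on_dataset("cover-dataset", "label", "covers")
print(cover_command("cover-dataset/keys-padron.in", "label"))
```

Renaming result.out after each run keeps a later run from overwriting an earlier result.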

To validate an XML document against a set of Max-Keys:

./validator -f xml-document.xml -k validation-dataset/padron-cover-1.in -r padron

where the final parameter, "padron", is the root node of the "xml-document.xml" file.
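
If the root node of a document is not known in advance, it can be read from the file before calling the validator. The following is a minimal sketch using Python's standard library; the helper names are hypothetical, and for documents that use XML namespaces the returned tag carries a "{namespace}" prefix that the validator may not expect.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, List


def root_element_name(stream: BinaryIO) -> str:
    """Return the tag of the root element, stopping at the first start event."""
    _event, element = next(ET.iterparse(stream, events=("start",)))
    return element.tag


def validator_command(xml_file: str, keys_file: str, root: str) -> List[str]:
    """Build the argument list for the validator call shown above."""
    return ["./validator", "-f", xml_file, "-k", keys_file, "-r", root]


# Tiny in-memory document used only for illustration.
sample = io.BytesIO(b"<padron><persona/></padron>")
root = root_element_name(sample)
print(validator_command("xml-document.xml", "validation-dataset/padron-cover-1.in", root))
```

For a file on disk, open it in binary mode (`open(path, "rb")`) and pass the file object to `root_element_name`.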

Configuration

The experiments on the implication problem were run on an Intel Core i7 2.8 GHz machine with 4 GB of RAM, running a 64-bit Linux kernel 2.6.32. The validation experiments were run on a cluster of 160 nodes of type Xeon E5-2680(i7), with 128 GB and 16 cores per node (2.68 GHz), running Linux RHEL 6.1. We compiled our C++ implementation of the algorithms using the standard g++ compiler from the GNU Compiler Collection 4.6.3.

Contact

For further questions about this project, please do not hesitate to contact us:
Flavio Ferrarotti (flavio.ferrarotti@vuw.ac.nz)
Emir Muñoz (emir.munoz@ie.fujitsu.com)

Emir Muñoz
Fujitsu Ireland Ltd.
National University of Ireland
Galway, Ireland