TLDKS 2013
This repository contains the resources used in the experiments of the paper "The Finite Implication Problem for Expressive XML Keys: Foundations, Applications, and Performance Evaluation" for an Special Issue of the Springer journal "Transactions on Large Scale Data and Knowledge Centered Systems" (TLDKS).
Abstract
The increasing popularity of XML for persistent data storage, processing and exchange has triggered the demand for efficient algorithms to manage XML data. Both industry and academia have long since recognized the importance of keys in XML data management. In this paper we make a theoretical as well as a practical contribution to this area. This endeavour is ambitious given the multitude of intractability results that have been established. Our theoretical contribution is based in the definition of a new fragment of XML keys that keeps the right balance between expressiveness and efficiency of maintenance. More precisely, we characterize the associated implication problem axiomatically and develop a low-degree polynomial time decision algorithm. In comparison to previous work, this new fragment of XML keys provides designers with an enhanced ability to capture properties of XML data that are significant for the application at hand. Our practical contribution includes an efficient implementation of this decision algorithm and a thorough evaluation of its performance, demonstrating that reasoning about expressive notions of XML keys can be done efficiently in practice, and scales well. Our results promote the use of XML keys on real-world XML practice, where a little more semantics makes applications a lot more effective. To exemplify this potential, we use the decision algorithm to calculate non-redundant covers for sets of XML keys. In turn, this allow us to reduce significantly the time required to validate large XML documents against keys from the proposed fragment.
Files
- "cover-dataset" folder contains the sets of Max-Keys defined for each XML document.
- "implication-dataset" folder contains the sets of Max-Keys used to decide the implication problem for Max-Keys.
- "validation-dataset" folder contains the sets of Max-Keys, obtained from cover sets computation, to validate the XML documents.
- "cover-uniq" is the application to compute a cover (in one-pass) for a given set of Max-Keys.
- "cover-all" is the application to compute all possible covers for a given set of Max-Keys.
- "max-keys" is the application to decide the implication problem over Max-Keys.
- "validator" is the application to validate an XML document against a given set of Max-Keys.
- "xml-constraints-covers-TLDKS2013.xls" an excel file with all the results obtained computing the cover sets and validating the documents.
- "xml-constraints-implication-TLDKS2013.xls" an excel file with all the results obtained deciding the implication problem.
(Note: For now we only leave available executable versions for the applications.)
How to run
If you want to run our implementations execute the following commands in a console:
To compute one cover set in one-pass:./cover-uniq -f cover-dataset/keys-padron.in -l labelTo compute all possible covers:
./cover-all -f cover-dataset/keys-padron.in -l label
where label is the fixed replacement label needed in the implication problem. The final response for both applications is a result.out file with the obtained cover(s).
To validate an XML document against a set of Max-Keys:./validator -f xml-document.xml -k validation-dataset/padron-cover-1.in -r padron
where the final parameter, "padron", is the root node of the "xml-document.xml" file.
Configuration
The experiments with implication problem were executed in a Intel Core i7 2.8 GHz machine, with 4 GB of RAM, running a 64 bits Linux kernel 2.6.32. While the experiments of validation were run on a cluster of 160 nodes of type Xeon E5-2680(i7) with 128 GB and 16 cores per node (2.68 GHZ) running Linux RHEL 6.1. We compiled our C++ implementation of the algorithms using the standard g++ compiler from the GNU Compiler Collection 4.6.3.Contact
For further question about this project please do not hesitate to contact us:
Flavio Ferrarotti (flavio.ferrarottivuw.ac.nz)
Emir Muñoz (emir.munozie.fujitsu.com)
Emir Muñoz
Fujitsu Ireland Ltd.
National University of Ireland
Galway, Ireland