Schema Profiling of Document Stores
Description
In document stores, schema is a soft concept and the documents in a collection can have different schemata. This gives designers and implementers greater flexibility, but it takes extra effort to understand the rules that drove the use of alternative schemata when heterogeneous documents are to be analyzed or integrated. In this paper we outline a technique, called schema profiling, that explains the schema variants within a collection by capturing the hidden rules behind their use. We express these rules as a decision tree, called a schema profile, whose main feature is the coexistence of value-based and schema-based conditions. Consistent with the requirements we elicited from real users, we aim to create explicative, precise, and concise schema profiles; to assess these qualities quantitatively, we introduce a novel measure of entropy.
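To make the idea of a schema profile concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that a document's schema is simply its set of top-level field names, that a schema-based condition tests whether a field is present, and it uses plain Shannon entropy as a stand-in for the novel entropy measure mentioned above, which is not defined in this abstract. The collection, the field names, and the hand-written tree `profile` are all hypothetical.

```python
from collections import Counter
from math import log2


def schema_of(doc):
    """Assumed notion of schema: the sorted tuple of top-level field names."""
    return tuple(sorted(doc))


def shannon_entropy(docs):
    """Shannon entropy (in bits) of the schema distribution in a group of documents.

    A stand-in for the paper's own entropy measure, which is not given here.
    """
    counts = Counter(schema_of(d) for d in docs)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


# Hypothetical collection containing three schema variants.
docs = [
    {"type": "run", "user": 1, "distance": 5.0},
    {"type": "run", "user": 2, "distance": 8.2},
    {"type": "swim", "user": 1, "laps": 20},
    {"type": "swim", "user": 3, "laps": 12, "pool": "A"},
]


def profile(doc):
    """Hand-written decision tree mixing the two kinds of conditions."""
    if doc["type"] == "run":      # value-based condition: tests a field's value
        return "run"
    if "pool" in doc:             # schema-based condition: tests a field's presence
        return "swim, pool recorded"
    return "swim, no pool"


# Route every document to a leaf of the tree.
leaves = {}
for d in docs:
    leaves.setdefault(profile(d), []).append(d)

root_entropy = shannon_entropy(docs)
leaf_entropies = {leaf: shannon_entropy(group) for leaf, group in leaves.items()}
```

In this toy collection the schemata are mixed at the root, while each leaf holds documents of a single schema, so the leaf entropies drop to zero. That is the intuition behind a precise profile: the conditions along each path fully explain which schema variant is used.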
Files
Name | Size | Checksum |
---|---|---|
sebd17.pdf | 343.1 kB | md5:d3f07f872afe2454e9b5842c0977fc06 |