Published July 29, 2017 | Version v1
Conference paper Open

Schema Profiling of Document Stores

Creators

  • 1. C
  • 2. CI

Description

In document stores, schema is a soft concept, and the documents in a collection can have different schemata. This gives designers and implementers greater flexibility, but it takes extra effort to understand the rules that drove the use of alternative schemata when heterogeneous documents are to be analyzed or integrated. In this paper we outline a technique, called schema profiling, that explains the schema variants within a collection by capturing the hidden rules behind their use. We express these rules in the form of a decision tree, called a schema profile, whose main feature is the coexistence of value-based and schema-based conditions. Consistent with the requirements we elicited from real users, we aim at creating explicative, precise, and concise schema profiles; to quantitatively assess these qualities we introduce a novel measure of entropy.
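The idea of mixing value-based and schema-based conditions can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes flat documents with hashable values, treats the set of field names as the schema variant, and uses plain Shannon entropy with information gain in place of the novel entropy measure mentioned above, which the abstract does not define. All function names (`schema_of`, `entropy`, `candidate_conditions`, `best_split`) and the sample documents are hypothetical.

```python
from collections import Counter
from math import log2


def schema_of(doc):
    """Schema variant of a flat document: its sorted tuple of field names."""
    return tuple(sorted(doc))


def entropy(docs):
    """Shannon entropy (in bits) of the schema variants among docs.

    Assumption: standard Shannon entropy, not the paper's own measure.
    """
    counts = Counter(schema_of(d) for d in docs)
    total = len(docs)
    return -sum(c / total * log2(c / total) for c in counts.values())


def candidate_conditions(docs):
    """Yield (label, predicate) pairs for both kinds of condition."""
    fields = sorted({f for d in docs for f in d})
    for f in fields:
        # Schema-based condition: does the field exist in the document?
        yield f"exists({f})", (lambda d, f=f: f in d)
        # Value-based conditions: does the field hold a given value?
        values = {d[f] for d in docs if f in d}
        for v in sorted(values, key=repr):
            yield f"{f} == {v!r}", (lambda d, f=f, v=v: f in d and d[f] == v)


def best_split(docs):
    """Return (label, gain) of the condition with the highest information gain."""
    base = entropy(docs)
    best = None
    for label, pred in candidate_conditions(docs):
        yes = [d for d in docs if pred(d)]
        no = [d for d in docs if not pred(d)]
        if not yes or not no:
            continue  # the condition does not separate anything
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(docs)
        gain = base - remainder
        if best is None or gain > best[1]:
            best = (label, gain)
    return best


# Hypothetical collection with two schema variants.
docs = [
    {"type": "run", "distance": 5, "pace": 6},
    {"type": "run", "distance": 10, "pace": 5},
    {"type": "swim", "laps": 20},
    {"type": "swim", "laps": 30},
]
label, gain = best_split(docs)
```

In this toy collection several conditions separate the two variants equally well, including the value-based test on `type` and the schema-based tests on `distance`, `laps`, and `pace`. The sketch simply keeps the first one found; a real schema profile would apply such splits recursively and would need a principled way to prefer the more explicative condition.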

Files

sebd17.pdf (343.1 kB)
md5:d3f07f872afe2454e9b5842c0977fc06

Additional details

Funding

TOREADOR – TrustwOrthy model-awaRE Analytics Data platfORm (grant no. 688797)
European Commission