Article

Multi-structural databases

Authors:
Ronald Fagin, IBM Almaden Research Center, San Jose, CA
R. Guha, IBM Almaden Research Center, San Jose, CA
Ravi Kumar, IBM Almaden Research Center, San Jose, CA
Jasmine Novak, IBM Almaden Research Center, San Jose, CA
D. Sivakumar, IBM Almaden Research Center, San Jose, CA
Andrew Tomkins, IBM Almaden Research Center, San Jose, CA

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 2005, Pages 184–195
https://doi.org/10.1145/1065167.1065191

Published: 13 June 2005

ABSTRACT

We introduce the Multi-Structural Database, a new data framework to support efficient analysis of large, complex data sets. An instance of the model consists of a set of data objects, together with a schema that specifies segmentations of the set of data objects according to multiple distinct criteria (e.g., into a taxonomy based on a hierarchical attribute). Within this model, we develop a rich set of analytical operations and design highly efficient algorithms for these operations. Our operations are formulated as optimization problems, and allow the user to analyze the underlying data in terms of the allowed segmentations.
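As a rough illustration of the data model described above, the following Python sketch segments a small set of data objects along two dimensions (a hierarchical topic attribute and a numeric year attribute) and then answers a toy optimization question over the resulting cells. Everything here is hypothetical and chosen for illustration only: the `Doc` type, the `topic_segment`, `year_segment`, `count_cells`, and `largest_cell` helpers, and the sample data are assumptions, not the operations or algorithms defined in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Doc:
    """A hypothetical data object with one hierarchical and one numeric attribute."""
    doc_id: int
    topic: Tuple[str, ...]  # path in a topic taxonomy, e.g. ("sports", "tennis")
    year: int


def topic_segment(doc, depth):
    """Segment by taxonomy node: keep the first `depth` levels of the topic path."""
    return doc.topic[:depth]


def year_segment(doc, width):
    """Segment by time interval: the `width`-year bucket containing doc.year."""
    start = doc.year - doc.year % width
    return (start, start + width - 1)


def count_cells(docs, depth, width):
    """Count data objects in each (topic segment, year segment) cell."""
    return Counter((topic_segment(d, depth), year_segment(d, width)) for d in docs)


def largest_cell(docs, depth, width):
    """Toy optimization: return the (cell, count) pair with the most objects."""
    cells = count_cells(docs, depth, width)
    return max(cells.items(), key=lambda kv: kv[1])


# Illustrative sample data (invented for this sketch).
DOCS = [
    Doc(1, ("sports", "tennis"), 2001),
    Doc(2, ("sports", "golf"), 2003),
    Doc(3, ("politics", "elections"), 2004),
    Doc(4, ("sports", "tennis"), 2007),
    Doc(5, ("politics", "elections"), 2008),
    Doc(6, ("politics", "budget"), 2009),
    Doc(7, ("politics", "elections"), 2006),
]

if __name__ == "__main__":
    print(largest_cell(DOCS, depth=1, width=5))
```

Changing `depth` or `width` switches to a finer or coarser segmentation of the same objects, which is the sense in which an analysis is expressed "in terms of the allowed segmentations".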

Our algorithms and results extend those of Fagin et al. [8], who studied composition of mappings given by several kinds of constraints. In particular, they proved that full source-to-target tuple-generating dependencies (tgds) are closed under composition, but embedded source-to-target tgds are not. They introduced a class of second-order constraints, SO tgds, that is closed under composition and has desirable properties for data exchange.

We study constraints that need not be source-to-target and we concentrate on obtaining (first-order) embedded dependencies. As part of this study, we also consider full dependencies and second-order constraints that arise from Skolemizing embedded dependencies. For each of the three classes of mappings that we study, we provide (a) an algorithm that attempts to compute the composition and (b) sufficient conditions on the input mappings that guarantee that the algorithm will succeed.

In addition, we give several negative results. In particular, we show that full dependencies are not closed under composition, and that second-order dependencies that are not limited to be source-to-target are not closed under restricted composition. Furthermore, we show that determining whether the composition can be given by these kinds of dependencies is undecidable.

References

R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 13th Intl. Conference on Data Engineering, pages 232--243, 1997.
D. Barbará, Y. Li, and J. Couto. COOLCAT: An entropy-based algorithm for categorical clustering. In Proc. 11th Intl. Conference on Information and Knowledge Management, pages 582--589, 2002.
L. Cabibbo and R. Torlone. A logical framework for querying multidimensional data. In Intl. Seminar on New Techniques and Technologies for Statistics, pages 155--162, 1998.
E. F. Codd, S. B. Codd, and C. T. Salley. Providing OLAP (on-line analytical processing) to user analysts: An IT mandate, 1993. Arbor Software, now Hyperion Solutions Corp., White Paper.
W. F. Cody, J. T. Kreulen, V. Krishna, and W. S. Spangler. The integration of business intelligence and knowledge management. IBM Systems Journal, 41(4):697--713, 2002.
U. Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634--652, 1998.
R. Feldman and I. Dagan. Knowledge discovery in textual databases (KDT). In Knowledge Discovery and Data Mining, pages 112--117, 1995.
V. Ganti, J. Gehrke, and R. Ramakrishnan. Cactus: clustering categorical data using summaries. In Proc. 5th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, pages 73--83, 1999.
S. Gollapudi and D. Sivakumar. Framework and algorithms for trend analysis in massive temporal data sets. In Proc. 13th Intl. Conference on Information and Knowledge Management, pages 168--177, 2004.
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proc. 12th Intl. Conference on Data Engineering, pages 152--159, 1996.
M. Grigni and F. Manne. On the complexity of the generalized block distribution. In Proc. 3rd Intl. Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 319--326, 1996.
D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien. How to build a WebFountain: An architecture for very large-scale text analytics. IBM Systems Journal, 43(1):64--77, 2004.
S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. 15th Intl. Conference on Data Engineering, page 512, 1999.
M. Gyssens and L. Lakshmanan. A foundation for multi-dimensional databases. In Proc. 23rd Intl. Conference on Very Large Data Bases, pages 106--115, 1997.
J. Han. Towards on-line analytical mining in large databases. SIGMOD Record, 27(1):97--107, 1998.
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. ACM SIGMOD Intl. Conference on Management of Data, pages 205--216, 1996.
J. Håstad. Clique is hard to approximate within n^{1-ε}. Acta Mathematica, pages 105--142, 1999.
S. Khanna, S. Muthukrishnan, and S. Skiena. Efficient array partitioning. In Proc. 24th Intl. Colloquium on Automata, Languages and Programming, pages 616--626, 1997.
R. Kimball. The Data Warehouse Toolkit. J. Wiley and Sons, Inc, 1996.
L. Lakshmanan, J. Pei, and J. Han. Quotient cube: How to summarize the semantics of a data cube. In Proc. 28th Intl. Conference on Very Large Data Bases, pages 778--789, 2002.
B. Lent, R. Agrawal, and R. Srikant. Discovering trends in text databases. In Proc. 3rd Intl. Conference on Knowledge Discovery in Databases and Data Mining, August 1997.
C. Lund and M. Yannakakis. On the hardness of approximating minimization problems. J. ACM, 41(5):960--981, 1994.
K. E. Paluch. A 2(1/8)-approximation algorithm for rectangle tiling. In Proc. 31st Intl. Colloquium on Automata, Languages and Programming, pages 1054--1065, 2004.
S. Sarawagi. User-adaptive exploration of multidimensional data. In Proc. 26th Intl. Conference on Very Large Data Bases, pages 307--316, 2000.
S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proc. 6th Intl. Conference on Extending Database Technology, pages 168--182, 1998.
S. Sarawagi and G. Sathe. i3: Intelligent, interactive investigation of OLAP data cubes. In Proc. ACM SIGMOD Intl. Conference on Management of Data, page 589, 2000.
J. Tremblay and R. Manohar. Discrete Mathematical Structures with Applications to Computer Science. McGraw Hill Book Company, 1975.
P. Vassiliadis and T. Sellis. A survey of logical models for OLAP databases. SIGMOD Record, 28(4):64--69, 1999.

Published in
PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2005
388 pages
ISBN: 1595930620
DOI: 10.1145/1065167
General Chair: Georg Gottlob, Vienna University of Technology, Austria
Program Chair: Foto Afrati, National Technical University of Athens, Greece
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 642 of 2,707 submissions, 24%