Performance analysis of parallelization models for path expression queries

https://doi.org/10.1016/S0020-0255(98)10099-3

Abstract

In this paper, parallelization models for path expression queries are studied. Path expression queries involve multiple classes along aggregation/association hierarchies. Two parallelization models for path expression queries are considered: “inter-object parallelization” and “inter-class parallelization”. Inter-object parallelization exploits the associativity within complex objects, whereas inter-class parallelization relies on process independence. The behaviours of these parallelization models are described in terms of analytical models. Performance evaluation is also performed to confirm the results from the quantitative analysis.

Introduction

The path expression (PE) query is one of the strengths of Object-Oriented Database (OODB) systems: such queries can be processed through pointer navigation, and hence an expensive relational join can be avoided [2], [11]. Parallelization of object-oriented queries, particularly path expression queries, has been studied for quite some time [10], [14], [22]. A number of parallelization methods exist, such as path parallelism [10], nested parallelism [19], pointer-based join [16] and ParSets [5].

Path parallelism, proposed by Kim [10], processes all the different paths of a query in parallel. The results of each path are consolidated to obtain the final query result. If the paths are connected through an AND operator, an intersection operator needs to be applied. Path parallelism is implemented through node parallelism [10], in which each node is itself evaluated in parallel. Hence, path parallelism is concerned only with parallelism between different paths.
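The consolidation step can be illustrated with a minimal sketch. It is not Kim's implementation: the data, the function names (`eval_path`, `path_parallel_and`) and the use of Python threads are all assumptions made for illustration. Two hypothetical paths are evaluated concurrently and, because they are joined by AND, their result sets are intersected.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each path maps a root object id to the values
# reached by navigating that path from the root.
PATH_A = {1: [5, 12], 2: [3], 3: [20]}
PATH_B = {1: ["x"], 2: ["y", "z"], 3: []}


def eval_path(path, predicate):
    """Return the root ids with at least one reached value satisfying predicate."""
    return {root for root, values in path.items()
            if any(predicate(v) for v in values)}


def path_parallel_and(tasks):
    """Evaluate every (path, predicate) pair concurrently, then intersect (AND)."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(lambda t: eval_path(*t), tasks))
    return set.intersection(*results)


selected = path_parallel_and([
    (PATH_A, lambda v: v > 10),           # selects roots 1 and 3
    (PATH_B, lambda v: v in ("x", "y")),  # selects roots 1 and 2
])
print(selected)  # {1}
```

Only root 1 satisfies both paths, so it is the only object left after the intersection.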

Nested parallelism [19] is naturally associated with nested collections. Parallelism is achieved at two levels (and possibly at an arbitrary number of levels). This parallelism model is influenced by the collection types supported by ODMG [3], where an attribute of a class can be of a collection type. The first level of parallelism is to evaluate a root (first) node in parallel. For each selected root object, a TRUE flag is attached. Since the number of associated objects per root object may vary, it becomes necessary to re-distribute all of the associated objects to balance the load of each processor. The second level of parallelism is to evaluate the associated objects in parallel. Nested parallelism has a similarity to path parallelism. Nested parallelism can be viewed as parallelism among object paths, whereas path parallelism views parallelism from a class relationship point of view, not from an internal object relationship.
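The two levels and the intermediate re-distribution can be sketched as follows. This is only an illustration under stated assumptions, not the method of [19]: the data layout, the round-robin `chunks` helper, the "at least one associated object qualifies" semantics, and the use of threads in place of processors are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical nested collection: root id -> (root value, associated values).
ROOTS = {
    1: (50, [7, 8, 9, 10]),
    2: (5, [1]),
    3: (70, [2, 11]),
}


def chunks(items, n):
    """Round-robin split of items into n lists of near-equal size."""
    return [items[i::n] for i in range(n)]


def nested_parallel(roots, root_pred, child_pred, n=2):
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Level 1: evaluate root objects in parallel; keep those flagged TRUE.
        flagged = pool.map(
            lambda part: [(rid, kids) for rid, (val, kids) in part if root_pred(val)],
            chunks(list(roots.items()), n),
        )
        # Re-distribution: flatten the associated objects of the selected roots
        # so that each worker receives a similar number of them.
        pairs = [(rid, kid) for part in flagged for rid, kids in part for kid in kids]
        # Level 2: evaluate the associated objects in parallel.
        hits = pool.map(
            lambda part: {rid for rid, kid in part if child_pred(kid)},
            chunks(pairs, n),
        )
        return set().union(*hits)


result = nested_parallel(ROOTS, lambda v: v > 10, lambda k: k > 9)
print(result)  # {1, 3}
```

Root 1 contributes four associated objects and root 3 only two, which is why the flattened pairs are split again before the second level instead of being left with the worker that selected their root.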

A more “traditional” pointer-based join for parallelization of path expression queries was influenced by relational join algorithms. A number of parallel pointer-based join algorithms (i.e. hash-loops, probe-children, hybrid hash) have been proposed [16]. All of them are based on hash join. A pointer-based join is influenced by a conventional explicit join and hence it is a binary operation. Path expression queries involving multiple classes are decomposed into multiple binary operations and each operation is a pointer-based join.

Another parallelization model called ParSets was introduced by DeWitt et al. [5]. ParSet is used to exploit parallelization of the graph traversal portion of the OO7 benchmark. ParSet was originally proposed as a way of adopting the data parallel approach to C++. Essentially, it allows a program to invoke a method on every object in a set in parallel.
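The core idea, invoking one method on every member of a set in parallel, can be sketched in a few lines. ParSets were proposed for C++; the Python version below only mirrors the shape of the idea, and the `Part` class, the `parset_apply` helper and the thread pool are assumptions, not the interface described in [5].

```python
from concurrent.futures import ThreadPoolExecutor


class Part:
    """Hypothetical object standing in for a node of the traversed graph."""

    def __init__(self, part_id, weight):
        self.part_id = part_id
        self.weight = weight

    def scaled_weight(self, factor):
        return self.weight * factor


def parset_apply(objects, method_name, *args, workers=4):
    """Call the named method on every object concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda obj: getattr(obj, method_name)(*args), objects))


parts = [Part(i, weight=i * 10) for i in range(1, 4)]
weights = parset_apply(parts, "scaled_weight", 2)
print(weights)  # [20, 40, 60]
```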

Our previous work reported in [20] identifies two primitive forms of parallelization for path expression queries, namely inter-object parallelization and inter-class parallelization. Inter-object parallelization is a parallelization technique where multiple complex objects are processed simultaneously, whereas inter-class parallelization is a parallelization technique where several classes are processed concurrently. The strength of these methods is that they are based on the pointer navigation approach. Another main point is that these two parallelization models are primitive, as they are concerned with 2-class path expressions only. Additionally, they are the basic building blocks of complex path expression queries, since complex queries can be decomposed into multiple 2-class path expression sub-queries. Therefore, we would like to confine our discussion to basic parallelization models for 2-class path expression queries (or sub-queries).
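The contrast between the two models can be made concrete with a small sketch for a 2-class path expression. It is an interpretation of the descriptions above, not the algorithms of [20]: the schema, the predicates, the "a root qualifies if at least one associated object qualifies" semantics, and the thread pool standing in for processors are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 2-class schema: each root object holds pointers (ids) to
# objects of the associated class.
ROOTS = {1: {"val": 30, "refs": [10, 11]},
         2: {"val": 5, "refs": [12]},
         3: {"val": 40, "refs": [12, 13]}}
ASSOC = {10: 1, 11: 9, 12: 2, 13: 3}


def split(items, n):
    """Round-robin split of items into n partitions."""
    return [items[i::n] for i in range(n)]


def inter_object(root_pred, assoc_pred, n=2):
    """Each worker handles whole complex objects: the root, then its associated objects."""
    def work(part):
        return {rid for rid, r in part
                if root_pred(r["val"]) and any(assoc_pred(ASSOC[a]) for a in r["refs"])}
    with ThreadPoolExecutor(max_workers=n) as pool:
        return set().union(*pool.map(work, split(list(ROOTS.items()), n)))


def inter_class(root_pred, assoc_pred):
    """Each class is filtered independently and concurrently, then results are consolidated."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_roots = pool.submit(lambda: {rid for rid, r in ROOTS.items() if root_pred(r["val"])})
        f_assoc = pool.submit(lambda: {aid for aid, v in ASSOC.items() if assoc_pred(v)})
        roots_ok, assoc_ok = f_roots.result(), f_assoc.result()
    # Consolidation: keep roots that point to at least one qualifying associated object.
    return {rid for rid in roots_ok if assoc_ok.intersection(ROOTS[rid]["refs"])}


by_object = inter_object(lambda v: v > 10, lambda v: v > 2)
by_class = inter_class(lambda v: v > 10, lambda v: v > 2)
print(by_object, by_class)  # {1, 3} {1, 3}
```

Both functions return the same answer but do different work. In `inter_object`, a root that fails its own predicate is never navigated, so the selection on the root class acts as a filter. In `inter_class`, every object of both classes is examined regardless of the other class, and the pointers are only followed during consolidation.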

The objective of this paper is not to propose new parallelization methods for path expression queries, but rather to evaluate inter-object parallelization and inter-class parallelization. The main reason for this need is that most existing work concentrates on identifying parallelization models for object-oriented queries, while a complete analysis has yet to be made. In this paper, we present a thorough analysis of the two parallelization techniques that we have proposed in our previous work [20]. A comparison between these two parallelization techniques is also made. The main aims of this paper are to study the behaviour of each parallelization model and to establish a foundation for parallelization of more complex object-oriented queries, in which simple (i.e. 2-class) path expression queries serve as a basis of complex queries. To reach these aims we will present cost models for each parallelization model, validate the cost models, compare the two parallelization models, and perform extensive performance evaluation.

The rest of this paper is organized as follows. Section 2 briefly describes path expression queries. Section 3 explains our previous work on parallelization models for path expression queries including inter-object and inter-class parallelization. Section 4 presents a quantitative analysis of the two parallelization models. Section 5 presents simulation results. Section 6 discusses future work which includes issues in optimizing general path expression queries. Finally, Section 7 gives the conclusions.

Section snippets

Path expression queries: a brief overview

Objects are usually not independent, and they can be connected to each other through aggregation/association relationships [6], [12]. These relationships form complex objects which may include objects from many different classes. A typical object-oriented query is to retrieve objects which satisfy certain predicates. These predicates may appear at any of the classes connected through aggregation/association relationships. Queries on these hierarchies are known as path expression queries [12]. Processing

Parallelization models: our previous work

Parallelization of path expression queries can be done through simultaneous processing among complex objects (inter-object parallelization), or concurrent processing among classes (inter-class parallelization) [20]. These two parallelization models view parallel path expression query processing from two different angles, namely from an object point of view and from a class point of view.

Analytical performance comparison

To compare the performance of the two parallelization models, it is necessary, first, to describe the behaviour of each parallelization model in terms of a cost model, and second, to perform an analytical performance comparison between the two models. It is also critical that the basic cost models are validated through an implementation before further quantitative performance evaluation is carried out.

Simulation results

A simulation program is constructed using Transim [9]. The simulation is built based on the validated models presented earlier. Transim is a transputer-based simulator program. Using Transim, the number of processors and the architecture topology can be configured. Communication is done through channels which are able to connect any pair of processors. The default processor is the IMS T800 transputer, clock speed 20 MHz, nominal link speed 10 Mbit/s, internal memory assumed sufficiently large

Future work

In this section, we highlight future work relating to the optimization of general path expression queries. General path expression queries normally consist of more than 2 classes connected through relationships. As these queries can be built upon multiple 2-class path expressions, the results of our analysis concerning inter-object and inter-class parallelization can be used as guidelines. The two lemmas on parallelization models can be used as a foundation for the parallel optimization of

Conclusions

Parallelization models for path expression queries are available in two forms: inter-object parallelization, which exploits the associativity of complex objects, and inter-class parallelization, which relies on process independence. From our analysis and experiments, it can be concluded that inter-object parallelization will function well if a filtering mechanism in the form of a selection operation exists. On the other hand, inter-class parallelization relies upon independence among classes,

References (22)

  • E. Bertino et al., Object-oriented query languages: the notion and the issues, IEEE Transactions on Knowledge and Data...
  • E. Bertino, L. Martino, Object-Oriented Database Systems: Concepts and Architectures, Addison-Wesley, Reading, MA,...
  • R.G.G. Cattell (Ed.), The Object Database Standard ODMG-93, Release 1.1, Morgan Kaufmann, Los Altos, CA,...
  • D. DeWitt et al., Parallel database systems: the future of high performance database systems, Communications of the ACM (1992)
  • D.J. DeWitt et al., Parallelizing OODBMS traversals: a performance evaluation, The VLDB Journal 5 (1996)...
  • T.S. Dillon, P.L. Tan, Object-Oriented Conceptual Model, Prentice-Hall, Englewood Cliffs, NJ,...
  • R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 2nd ed., Benjamin Cummings, Redwood City, CA,...
  • G. Graefe, Query evaluation techniques for large databases, ACM Computing Surveys (1993)
  • E. Hart, Transim: Prototyping Parallel Algorithms, User Guide and Reference Manual, Transim version 3.5, University of...
  • K.-C. Kim, Parallelism in object-oriented query processing, Proceedings of the Sixth International Conference on Data...
  • W. Kim, A model of queries for object-oriented databases, Proceedings of the 15th International Conference on Very...