Elsevier

Information Systems

Volume 29, Issue 5, July 2004, Pages 365-384

Memory-adaptive association rules mining

https://doi.org/10.1016/S0306-4379(03)00035-8

Abstract

New application areas resulted in an increase of the diversity of the workloads that Data Base Management Systems have to confront. Resource management for mixed workloads is attained with the prioritization of their tasks, which during their execution may be forced to release some of their resources. In this paper, we consider workloads that consist of mixtures of OLTP transactions and association rule mining queries. We propose and evaluate a new scheme for memory-adaptive association rule mining. It is designed to be used in the case of memory fluctuations, which are due to OLTP transactions that run with higher priority. The proposed scheme uses dynamic adjustment to the provided buffer space. Thus, it avoids the drawbacks of simple but naive approaches; namely the thrashing due to large disk accesses that can be caused by the direct use of virtual memory or long delay times due to suspension. Detailed experimental results, which consider a wide range of factors, indicate the superiority of the proposed scheme.

Introduction

The development of new application areas has increased the diversity of workloads that Data Base Management Systems (DBMS) have to confront. On one hand, we have the important class of tasks that process transactions, which comprise the OLTP workload. On the other hand, there exists the class of tasks that process complex, long-running queries for Decision Support applications. The latter class comprises the DSS workload and has achieved legitimacy as a business necessity in recent years. Each of these two types of workloads has its own objectives and requirements. Nevertheless, the natural question arises whether one can provide DSS access to the data stored for OLTP in the same DBMS. In other words, can the mixture of OLTP and DSS workloads lead to consistent performance, or is the OLTP workload prone to suffer so much contention from the DSS one that its mission-critical nature is compromised? The converse question is also of much interest, i.e., whether the performance of the DSS workload can remain acceptable when it is mixed with the OLTP one.

One of the important issues that one has to consider in order to successfully mix OLTP and DSS workloads is how best to allocate resources (e.g., processors, memory, disk) between them. The governing of mixed workloads is attained through the prioritization of their tasks that compete for resources. Therefore, scheduling algorithms have to be used, which will dynamically decide the distribution of resources [1]. On the basis of this paradigm, the tasks in the OLTP and DSS workloads can be forced (during their execution) to release some or all of their resources, or, in contrast, they may be given additional ones [2]. Evidently, the definition of a priority scheme depends on the particular type of OLTP or DSS applications. A common case is to consider OLTP as a mission-critical application type, so that the OLTP workload is assigned the highest priority. In contrast, there exist some cases where real-time Decision Support is needed, for which the DSS workload will get higher priority.

Data mining queries, and association rules mining queries in particular [3], correspond to a significant type of DSS tasks today. Their characteristic is that they are resource intensive, since they require large CPU time and main memory space, and (usually) several database scans. For this reason, a large number of stand-alone, specialized algorithms and data mining systems have been developed. However, such approaches lead to a loose coupling with the DBMS, which incurs a significant overhead when mining large operational databases [4]. To address this problem, several methods have been proposed that achieve a tighter coupling of association rules mining queries with the DBMS [4], [5], [6]. Nevertheless, the issue of mixing a workload that contains association rules mining queries with an OLTP workload has not been examined by most of the aforementioned methods. Recently, Riedel et al. [7] proposed a disk scheduling algorithm for mixed workloads that consist of OLTP transactions and general data mining tasks (which can be easily applied to association rules). According to this approach, the OLTP workload provides a consistent portion of its bandwidth to a background data mining task, without impacting OLTP transactions. The prioritization scheme in [7] assigns the highest priority to the OLTP workload, while trying to favor the DSS workload as much as possible at the same time. As described previously, this kind of prioritization corresponds to a logical choice and applies to most cases of interest.

Besides disk, main memory is another critical resource that affects the performance of mixed workloads. However, this factor has not been examined so far in the context of mixed workloads that contain data mining queries and OLTP transactions. Several association-rules mining algorithms assume the existence of unbounded main memory. In contrast, there exist algorithms that take into account the case of limited memory and use specialized buffering schemes like the ones in, e.g., [8], [9], [10]. However, even these approaches assume the allocation of a fixed amount of buffer space to the mining query throughout its lifetime. Therefore, for the purposes of prioritized execution of mixed workloads (as described previously), none of these algorithms can adapt (during their execution) to possible requests for releasing memory back to the DBMS. Instead, they may hold an amount of main memory until their completion, whereas, at the same time, higher-priority OLTP transactions may not be able to execute due to memory shortage. Clearly, the existing approaches will violate prioritization schemes that consider OLTP as mission-critical, rendering mixed workloads unfeasible.

It is important to notice that the previously described issues form a new problem (cf. Section 3), which calls for the development of schemes that will allow association-rule mining queries to dynamically adapt to memory fluctuations. This is analogous to the introduction of memory-adaptive schemes for external sorting [2], which differ from regular external sorting algorithms (i.e., those not considering memory fluctuations). However, the problem of memory-adaptive sorting is quite different from the one examined in this paper. Moreover, the development of dynamic memory-adaptive schemes for association rule mining differs significantly from the problem of developing efficient algorithms for the stand-alone execution of association rule mining (i.e., when the queries are not mixed with OLTP transactions), which has been the focus of recent research. The reason is that in the context of mixed workloads, the execution of mining queries may be impeded by higher-priority OLTP transactions, and the buffer space provided to these queries will not be constant. Therefore, the execution time of a mining query does not depend solely on the efficiency of the corresponding algorithm (designed for stand-alone execution) that is used for the query. The effectiveness of the memory-adaptive scheme, the use of which is necessary in this context, is equally important, as it determines how the execution of the query proceeds under memory fluctuations. Hence, the roles of the two aforementioned factors are complementary. Moreover, simplistic memory-adaptive schemes, like the suspension of the query until sufficient memory is available or the direct use of virtual memory, are not a viable solution. As will be shown, such simplistic schemes impact the efficiency even of the best stand-alone mining algorithms, since they lead to large delays or thrashing.

In this paper, we are interested in mixed workloads consisting of association rule queries and OLTP transactions, where the latter are considered mission-critical. Therefore, through the use of a prioritizing scheme, memory fluctuations occur that move buffer space from the mining queries to the OLTP transactions (and vice versa). The main objective of the paper is the development of memory-adaptive schemes for association rule algorithms, which will allow them to handle memory fluctuations. To the best of the authors' knowledge, no previous work has addressed this issue, which emerges when considering the described context of mixed workloads.

The technical contributions of the paper are the following:

  • An effective memory-adaptive scheme, which, compared to simplistic ones, does not incur large delays or thrashing (in terms of disk I/O). Therefore, the proposed method can help make mixed workloads feasible.

  • The description of ways to apply the proposed scheme to a significant class of association rules mining algorithms, especially those designed for tight-coupling with a DBMS.

  • A detailed experimental comparison, which considers a wide range of factors. The experimental results help to understand the performance tradeoffs of each scheme and to identify the advantages of the proposed one. Moreover, they show that the proposed method compares favorably against simplistic schemes even when the latter ones are combined with very efficient (in terms of stand-alone execution) mining algorithms, a fact that indicates the new requirements of the examined context.

The remainder of this paper is organized as follows. Section 2 presents the related work and Section 3 contains the problem description. In Section 4, we develop a new algorithm that addresses the problem of limited buffer size and serves as the framework for the development of the proposed memory-adaptive scheme. The latter is given in Section 5, along with two other schemes that are given for comparison purposes. The results on the performance evaluation of the described schemes are illustrated in Section 6. Finally, Section 7 provides the conclusions.

Section snippets

Related work

The problem of association rule mining was proposed in [3] and since then, a large number of algorithms have been developed to address its different aspects. These algorithms can be generally categorized into two paradigms, according to the way they prune the search space and perform pattern generation. The first paradigm is denoted as Candidate set Generation and Test (CGT), which is based on the iterative generation of a set of candidate patterns, followed (at each iteration) by the

Problem description

The context studied in this paper considers mixed workloads consisting of association-rule mining queries and OLTP transactions, where the latter ones are mission-critical (i.e., take the highest priority). For this reason, under a prioritization scheme, main memory can be dynamically reallocated between the association rule queries and OLTP transactions. Therefore, the size of buffer memory provided to association rule queries fluctuates during their execution, and the queries may be forced to

The framework algorithm

In this section, we describe the framework algorithm upon which we will develop the proposed memory-adaptive scheme (that is given in the following section). The framework comprises a basic algorithm that takes into account the case of constrained available memory. First, for reasons of clarity in presentation, we develop a basic form of the framework in terms of the general structure of the Apriori algorithm, and then we describe ways to extend it to other, more advanced algorithms of the CGT
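The snippet above is cut off before the framework itself is described, so the following is only a minimal sketch of the general idea of running Apriori-style counting under a bounded buffer, not the algorithm of the paper. It assumes that memory is measured as the number of candidate counters that fit in the buffer (the hypothetical parameter max_in_memory), and that each batch of candidates costs one additional scan of the transactions. The function names are illustrative.

```python
from itertools import combinations


def count_in_batches(transactions, candidates, max_in_memory):
    """Count candidate supports while holding at most max_in_memory counters.

    Assumption of this sketch: every batch of candidates needs its own
    scan of the transactions, so a smaller buffer means more scans.
    """
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    supports, scans = {}, 0
    for start in range(0, len(candidates), max_in_memory):
        batch = candidates[start:start + max_in_memory]
        counts = dict.fromkeys(batch, 0)
        for t in transactions:              # one scan per batch
            for c in batch:
                if c <= t:                  # candidate is a subset of t
                    counts[c] += 1
        supports.update(counts)
        scans += 1
    return supports, scans


def apriori_limited(transactions, min_count, max_in_memory):
    """Apriori-style level-wise mining with a bounded number of counters."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, total_scans, k = {}, 0, 1
    while candidates:
        supports, scans = count_in_batches(
            transactions, candidates, max_in_memory)
        total_scans += scans
        level = {c: n for c, n in supports.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = sorted(
            (c for c in joined
             if all(frozenset(s) in level for s in combinations(c, k - 1))),
            key=sorted,
        )
    return frequent, total_scans
```

Calling apriori_limited with the same transactions and two different values of max_in_memory returns the same frequent itemsets but a different number of scans, which illustrates the trade-off between buffer size and disk I/O that a memory-constrained framework has to manage.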

Memory-adaptive schemes

In the previous section we described the algorithm-framework for the development of memory-adaptive schemes. These schemes determine the different policies that can be followed to handle memory fluctuations. For comparison purposes, we first describe two simplistic approaches and next we develop the proposed dynamic scheme.
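To make the difference between policies concrete, the following toy model contrasts a suspension policy with a dynamic one. It is a sketch under strong assumptions and does not reproduce the schemes of the paper: time is divided into steps, trace[i] is the buffer space (in hypothetical units) left to the mining query at step i, and one unit of buffer completes one unit of work per step. The model ignores the I/O cost of adjusting to a new buffer size and does not cover the paging approach.

```python
def steps_with_suspension(trace, work, needed):
    """Progress only in steps where the full `needed` buffer is available.

    Returns the step at which `work` units are done, or None if the
    trace ends first.
    """
    done = 0
    for step, available in enumerate(trace, start=1):
        if available >= needed:
            done += needed      # full-speed step
        # otherwise the query is suspended and makes no progress
        if done >= work:
            return step
    return None


def steps_with_dynamic(trace, work, needed):
    """Progress with whatever buffer is available, capped at `needed`."""
    done = 0
    for step, available in enumerate(trace, start=1):
        done += min(available, needed)
        if done >= work:
            return step
    return None
```

For example, with trace = [4, 1, 1, 4, 2, 4], work = 12 and needed = 4, the suspended query finishes at step 6 while the dynamic one finishes at step 5, because the latter keeps making partial progress during the steps in which memory has been taken away.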

Performance evaluation

In this section we present the experimental results on the performance. All the described schemes were implemented in C, using common components. Henceforth we refer to the memory-adaptive schemes as follows: to the suspension scheme as ‘SPND’, to the paging scheme as ‘PGNG’, and to the dynamic scheme as ‘DNMC’. In our measurements we considered the following factors: the magnitude and the rate of memory fluctuations, the duration of the period for each memory release (memory given from the

Conclusions

We have examined main memory management for mixed workloads consisting of OLTP transactions and association-rule mining queries. We adopt a prioritization scheme that assigns the highest priority to OLTP transactions and considers that mining queries run in the background. Therefore, the latter are presented with the problem of having to adapt to varying buffer size.

We have developed a novel memory-adaptive scheme for association rule mining queries. To our knowledge, no prior work has

References (26)

  • R. Agarwal et al., A tree projection algorithm for generation of frequent item sets, J. Parallel Distrib. Comput. (2001)
  • N. Pasquier et al., Efficient mining of association rules using closed itemset lattices, Inform. Systems (1999)
  • K. Brown, M. Carey, M. Livny, Managing memory to meet multiclass workload response time goals, Proceedings of the...
  • H. Pang, M. Carey, M. Livny, Memory-adaptive external sorting, Proceedings of the International Conference on Very...
  • R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, Proceedings of...
  • S. Sarawagi, S. Thomas, R. Agrawal, Integrating association rule mining with relational database systems: Alternatives...
  • S. Thomas, S. Chakravarthy, Performance evaluation and optimization of join queries for association rule mining,...
  • K. Rajamani, A. Cox, B. Iyer, A. Chadha, Efficient mining for association rules with relational database systems,...
  • E. Riedel, C. Faloutsos, G. Ganger, D. Nagle, Data mining on an OLTP system (Nearly) for free, Proceedings of the ACM...
  • R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proceedings of the...
  • J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, Proceedings of the International ACM...
  • Y. Xiao, M. Dunham, Considering main memory in mining association rules, Proceedings of the International Conference on...
  • S. Brin, R. Motwani, J. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data,...

    Recommended by Kenneth A. Ross, Area Editor.
