Elsevier

Information Sciences

Volume 263, 1 April 2014, Pages 198-210

Dividing secrets to secure data outsourcing

https://doi.org/10.1016/j.ins.2013.10.006

Abstract

Data outsourcing, or database as a service, is a new paradigm for data management in which a third-party service provider hosts databases as a service. These providers offer efficient and cheap data management by obviating the need to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks. However, due to recent government legislation, competition among companies, and database thefts, companies cannot use database service providers directly. They need secure and privacy-preserving data management techniques to be able to use them in practice. Because the data is stored remotely in a privacy-preserving manner, efficiency problems such as poor query response time arise. We propose a new framework that provides efficient and scalable query response times by reducing the computation and communication costs. Furthermore, the proposed technique uses several service providers to guarantee the availability of the services while detecting dishonest or faulty service providers without introducing additional overhead on the query response time. The evaluations demonstrate that our data outsourcing framework is scalable and practical.

Introduction

Data outsourcing, or database as a service, is a new paradigm for data management in which a third-party service provider hosts databases as a service. The service provides data management for its customers and thus obviates the need for the service user to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks. Because an external database service promises reliable data storage at a low cost by eliminating the need for expensive in-house data-management infrastructure, it is very attractive for companies. However, recent government legislation, competition among companies, and database thefts have pushed companies to use secure and privacy-preserving data management techniques. Using an external database service is a straightforward server–client application in an environment where service providers and clients are honest and clients do not hesitate to share their data with database service providers. This is usually not the case, however, so the research challenge is to build a robust and efficient service that manages data in a secure and privacy-preserving manner.

Current research has focused only on how to index and query encrypted data [20], [21], [9]. Although querying encrypted data efficiently is one of the main problems, it is not the only problem in data outsourcing. Since thousands of clients per database service provider are expected, the scalability of the proposed techniques and the availability of the services are very important problems. However, current proposals do not consider these issues and assume a simple scenario consisting of an always-available database service provider and a single service user. Furthermore, they assume that both parties are honest and trust each other. In practice, a service provider may corrupt the data, and it would then be impossible for the service user to recover it. To be able to use external database service providers in real life, there should be a mechanism to recover the data and also to prove that the data has been corrupted. Providing a trust mechanism that pushes both database service providers and clients to behave honestly is another important problem.

We propose a new data outsourcing framework providing efficient and scalable query response times. In addition to this, the proposed technique uses multiple service providers to guarantee the availability of the services and to be able to recover from hardware failures. Furthermore, we propose a technique to identify the dishonest or faulty service providers.

Current proposals use encryption to hide the content from service providers [20], [9]. However, the computational cost of encrypting and decrypting data to execute a query increases the query response time, and this cost is one of the bottlenecks in current solutions [3]. The solution proposed in this paper uses information-theoretically secure techniques similar to Shamir’s secret sharing mechanism [29] instead of computationally secure techniques such as encryption. Therefore, the computational complexity of our solution is much lower than that of current proposals using encryption. Furthermore, current proposals use label-based filtration to execute range queries [20], [22]. However, by labeling a row, a data provider reveals some information about the underlying data, so there is a privacy–performance tradeoff in these solutions. Our technique does not reveal any information about the content of the data, and only the required data is retrieved from the service providers.

In this paper, we use multiple service providers for fault tolerance. Fault tolerance in this context means the availability of service providers and the ability to recover from data corruption. Data corruption may happen due to either disk failures or malicious service providers. Our solution deals with both of these faults without adding any overhead to the query response time.

The rest of the paper is organized as follows. Section 2 introduces the model and the types of queries and reviews related work. Section 3 discusses basic attempts to solve the problem. Section 4 presents the data distribution technique. Section 5 studies the query processing methods for our data distribution technique. Section 6 discusses the fault tolerance of the proposed technique. Section 7 analyzes the query response time of the technique. The last section discusses future work and concludes the paper.

Solution overview and background

In this section, we define the problem and introduce the model. Then, we briefly discuss our solution and finally we review the related work.

Simple solutions for outsourcing numeric attributes

Data source D divides each numeric value of the numeric attribute into n shares and stores them at service providers DAS1, DAS2, …, DASn (one share for each service provider). The goal is to divide a secret value into n shares to be stored at n service providers such that the providers cannot figure out the secret even if they combine their shares. The solution is based on, but slightly different from, Shamir’s secret sharing method [29].

Our scheme allows data source D to distribute a secret
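The snippet above does not give the details of the scheme, so the following is only a minimal sketch of textbook Shamir secret sharing, on which the text says the scheme is based. It is not the paper's variant: in textbook Shamir any k shares reveal the secret, whereas the paper states that its providers cannot recover the secret even by combining shares. The field size `PRIME`, the function names `split_secret` and `recover_secret`, and the evaluation points 1..n are assumptions made for illustration.

```python
import secrets

# Assumption for this sketch: arithmetic over the prime field of size 2^61 - 1.
PRIME = 2**61 - 1


def split_secret(secret, k, n, prime=PRIME):
    """Split `secret` into n shares such that any k of them recover it."""
    if not 1 <= k <= n < prime:
        raise ValueError("need 1 <= k <= n < prime")
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):  # share x would be stored at provider DAS_x
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


shares = split_secret(123456, k=3, n=5)
recovered = recover_secret(shares[:3])  # any 3 of the 5 shares suffice
```

Because the polynomial has degree k − 1, fewer than k shares leave the constant term completely undetermined, which is what makes this kind of scheme information-theoretically secure rather than computationally secure.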

Practical solutions for secure data outsourcing

The solutions proposed in Section 3 are impractical because the data source needs to retrieve all the information from the service providers to execute a query, which incurs high communication and computation costs. In this section, we extend the techniques in Section 3 so that only the required data is retrieved from the service providers.

The key observation to achieve this is that the order of the values in the domain DOM = {v1, v2, …, vn} needs to remain the same

Query processing

In this section, we will discuss how to process queries in the Encryption with Labeling (EL) [20], [21] and Secret Dividing (SD) techniques discussed in Section 4. The queries are Exact Match Queries, Range Queries and Aggregation Queries.

Fault tolerance

There are two issues related to fault tolerance: (1) service availability and (2) malicious service providers. Both issues are very important when using database services.

Data sources always need to answer their queries. In our scheme, a polynomial of degree k − 1 is used to divide the secret and thus k shares and parties are needed to compute the secret. Therefore, in the secret dividing scheme if k of the n service providers are available, the queries can be answered using the shares
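The snippet stops before describing how faulty providers are identified, so the following is an illustrative brute-force consistency check, not the paper's method (which is stated to add no overhead to query response time). It relies only on the fact stated above: k shares determine a polynomial of degree k − 1, so any additional honest share must lie on that polynomial. The function names, the field size `PRIME`, and the example polynomial are assumptions.

```python
from itertools import combinations

# Assumption for this sketch: arithmetic over the prime field of size 2^61 - 1.
PRIME = 2**61 - 1


def interpolate(points, x, prime=PRIME):
    """Evaluate at `x` the unique polynomial through `points` (Lagrange form)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (x - xj)) % prime
                den = (den * (xi - xj)) % prime
        total = (total + yi * num * pow(den, -1, prime)) % prime
    return total


def recover_and_flag(shares, k, prime=PRIME):
    """Return (secret, x-coordinates of shares inconsistent with the majority).

    Tries every k-subset of the shares and keeps the polynomial that the
    largest number of shares agree with. This is exponential in general and
    is only meant to illustrate the idea on small n.
    """
    best_secret, best_agree = None, []
    for subset in combinations(shares, k):
        agree = [(x, y) for x, y in shares
                 if interpolate(subset, x, prime) == y % prime]
        if len(agree) > len(best_agree):
            best_secret, best_agree = interpolate(subset, 0, prime), agree
    faulty = [x for x, y in shares if (x, y) not in best_agree]
    return best_secret, faulty


# Hypothetical example: shares of f(x) = 42 + 7x + 3x^2 (k = 3, n = 5),
# where the provider holding x = 4 returns a corrupted value (118 is correct).
shares = [(1, 52), (2, 68), (3, 90), (4, 999), (5, 152)]
result = recover_and_flag(shares, k=3)  # → (42, [4])
```

With n = 5 and k = 3, two providers may be unavailable and the secret is still recoverable; when all five respond, the two extra shares provide the redundancy that exposes the single corrupted one.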

Evaluation

In this section, we will compute the query response time of the two techniques EL and SD for exact match, range and aggregation queries such as sum and average.

Let Cd be the cost of encryption, B be the bandwidth, T be the number of tuples required to answer the query, and S be the selectivity of the filtration. To answer the query in EL method, data source D retrieves S × T tuples and decrypts all of them. If the size of each tuple is b, then the communication cost would be: (S × T × b) / B. And the cost
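As a worked example of the communication cost expression above, the following sketch evaluates (S × T × b) / B for one set of made-up numbers. The function name, the units (bytes and bytes per second), and the sample values are assumptions for illustration; they are not measurements from the paper.

```python
def el_communication_cost(selectivity, tuples_needed, tuple_size, bandwidth):
    """Communication cost (S * T * b) / B of the EL method.

    selectivity   -- S, so that S * T tuples are actually retrieved
    tuples_needed -- T, tuples required to answer the query
    tuple_size    -- b, size of one tuple (assumed here to be in bytes)
    bandwidth     -- B, link bandwidth (assumed here to be in bytes/second)
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return selectivity * tuples_needed * tuple_size / bandwidth


# Hypothetical values: the filter returns 4x the needed tuples, 1000 tuples
# are needed, each tuple is 100 bytes, and the link carries 1 MB/s.
cost_seconds = el_communication_cost(4, 1000, 100, 1_000_000)  # → 0.4
```

The expression is linear in S, so any over-fetching caused by coarse filtration translates directly into proportionally more transfer time.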

Conclusion

We proposed a novel privacy preserving data outsourcing framework in this paper. The proposed data outsourcing framework provides efficient and scalable query response times by introducing new efficient methods to store data at several service providers and also query them in a privacy preserving manner. Since the proposed technique uses several service providers, it guarantees the availability of the services. Furthermore, the dishonest or faulty service providers can be detected without

References (29)

  • Advances in cryptology – crypto 2007, in: A. Menezes, (Ed.), 27th Annual International Cryptology Conference, Santa...
  • G. Aggarwal, M. Bawa, P. Ganesan, H. Garcia-Molina, K. Kenthapadi, N. Mishra, R. Motwani, U. Srivastava, D. Thomas, J....
  • G. Aggarwal, M. Bawa, P. Ganesan, H. Garcia-Molina, K. Kenthapadi, R. Motwani, U. Srivastava, D. Thomas, Y. Xu, Two can...
  • G. Aggarwal, N. Mishra, B. Pinkas, Privacy-preserving computation of the k’th-ranked element, in: Proc. of IACR...
  • R. Agrawal, A. Evfimievski, R. Srikant, Information sharing across private databases, in: Proc. of the 2003 ACM SIGMOD...
  • R. Agrawal et al., A system for watermarking relational databases

  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Hippocratic databases, in: 28th Int’l Conf. on Very Large Databases (VLDB),...
  • R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Implementing p3p using database technology, in: Proc. of the 19th Int’l...
  • R. Agrawal et al., Order preserving encryption for numeric data
  • R. Agrawal et al., Privacy-preserving data mining

  • S. Agrawal, J.R. Haritsa, A framework for high-accuracy privacy-preserving mining, in: ICDE, 2005, pp....
  • E. Bertino, B.C. Ooi, Y. Yang, R.H. Deng, Privacy and ownership preserving of outsourced medical data, in: ICDE,...
  • C. Cachin et al., Computationally private information retrieval with polylogarithmic communication, Lecture Notes in Computer Science (1999)
  • B. Chor et al., Computationally private information retrieval (extended abstract)
