Task Oriented Privacy Preserving Data Publishing Using Feature Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8436)

Abstract

In this work we show that feature selection can be used to preserve individuals' privacy without compromising the accuracy of data classification. Furthermore, when feature selection is combined with anonymization techniques, we are able to publish privacy-preserving datasets. We use several UCI datasets to empirically support our claim. The results show that these privacy-preserving datasets provide classification accuracy comparable to, and in some cases better than, the accuracy obtained on the original datasets. We generalize the results using a paired t-test applied at different levels of anonymization.
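The abstract does not spell out the authors' pipeline. As a rough, hypothetical illustration of the two ideas it relies on, the Python sketch below (a) measures the smallest group of records that share the same values on a chosen set of attributes, which is the k of k-anonymity, showing how keeping only a subset of attributes can raise it, and (b) computes the t statistic of a paired t-test on matched scores. The toy table, the attribute names, and the function names `min_class_size` and `paired_t_statistic` are assumptions made for illustration; they are not the paper's method or data.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def min_class_size(records, attributes):
    """Size of the smallest group of records that share identical values on
    `attributes`. The table is k-anonymous on these attributes iff this is >= k."""
    groups = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(groups.values())


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on matched scores a[i], b[i] (df = n - 1).
    Raises ZeroDivisionError if every paired difference is identical."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical toy table; attribute names and values are illustrative only.
records = [
    {"age": 34, "zip": "K1A", "sex": "F", "label": "yes"},
    {"age": 34, "zip": "K1B", "sex": "F", "label": "no"},
    {"age": 51, "zip": "K1A", "sex": "M", "label": "yes"},
    {"age": 51, "zip": "K1B", "sex": "M", "label": "no"},
]

# All three attributes together single out every record (k = 1) ...
print(min_class_size(records, ["age", "zip", "sex"]))  # → 1
# ... while a selected subset of them groups records in pairs (k = 2).
print(min_class_size(records, ["age", "sex"]))  # → 2
```

Turning the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` performs the same paired test and returns both the statistic and the p-value.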




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jafer, Y., Matwin, S., Sokolova, M. (2014). Task Oriented Privacy Preserving Data Publishing Using Feature Selection. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_13


  • DOI: https://doi.org/10.1007/978-3-319-06483-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06482-6

  • Online ISBN: 978-3-319-06483-3

  • eBook Packages: Computer Science (R0)
