Efficient Personalized Privacy Preservation Using Anonymization

ABSTRACT


I. INTRODUCTION
Privacy is a very important issue when one wants to make use of data that includes sensitive information. Studies on protecting the privacy of individuals and the confidentiality of data have been contributed from many fields, including computer science, statistics, and economics. This is a field that attempts to answer the problem of how an organization, such as a hospital, a government agency, or any other organization, can release data to the public without harming the confidentiality of personal information. We focus on privacy measures that provide legal safety, present algorithms that protect data to make it safe for access while preserving useful information, and discuss methods for analyzing the sensitive data. Many challenges still remain. This paper provides a summary of the current state of the field, based on which we expect to see advances in the years to come. As personal information is collected at an increasingly detailed level by various organizations, privacy-related concerns are introducing significant challenges for data management organizations. Data anonymization methods have been proposed in order to allow processing of personal data without compromising users' privacy. Nevertheless, practical problems such as dependencies between values in personal records have not yet obtained a satisfactory solution. Here, we focus on the anonymization of tree-structured personal records. Personal information does not comprise just a single tuple in modern information systems. The information concerning a single person usually spans several tables, or it is kept in a more flexible representation such as an XML record. Such tree-structured data cannot be anonymized effectively with table-based anonymization techniques, since the structural relations between different fields substantially change the problem. The difficulty of anonymizing tree-structured data has been considered in the existing research literature, in the technique of multirelational k-anonymity.
In our method we consider the general case of tree-structured data, and we propose an anonymization method that depends not solely on the generalization of values but also on the simplification of the data tree.

II. LITERATURE SURVEY
This section introduces the concepts behind Efficient Personalized Privacy Preservation Using Anonymization by reviewing the work of several authors. In the paper Anonymizing Collections of Tree-Structured Data, Olga Gkountouna and Manolis Terrovitis [1] consider real-world data that have implicit or explicit structural relations. Privacy preservation research has focused on data with a very simple structure, e.g. relational tables, or on data with a very complex structure, such as network graphs, but has ignored intermediate cases. Here we focus on tree-structured data. Such data arise in various applications, e.g. XML documents. An example is a database where information about a person is scattered amongst tables that are associated through foreign keys. The authors define k(m;n)-anonymity as the protection guarantee and propose a greedy anonymization technique that sanitizes large datasets.
Q. Wang and C. Wang [21], in Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing, observe that cloud computing has been thought of as the next-generation architecture of the IT enterprise. It moves the application software and databases to large data repositories, where the management of data and services may not be fully trustworthy. This brings about many new challenges, which are not yet well understood. This work studies the problem of ensuring the integrity of data storage in cloud computing. The authors consider the task of allowing a third-party auditor (TPA) to act on behalf of the client. The TPA removes the involvement of the client in checking whether the data stored in the cloud is indeed intact. Support for data dynamics through the most general forms of operations performed on data, such as insertion and deletion, is also an important step toward practicality, since services in cloud computing are not limited to backup data only.
Ateniese [3] developed a dynamic provable data possession (PDP) protocol based on a cryptographic hash function and symmetric-key encryption. The main idea is to precompute a certain number of metadata items during the setup period, so that the number of challenges is limited and fixed beforehand. The authors construct a highly efficient and secure PDP technique based largely on symmetric-key cryptography. This technique allows outsourcing of dynamic data; that is, it efficiently supports operations such as block modification, deletion, and append.
A. Juels and B. S. Kaliski [4] introduce an HLA-based solution. It supports public auditing without retrieving the data blocks, and it requires constant bandwidth. It is possible to compute an HLA that authenticates a linear combination of the individual data blocks.
N. Cao, S. Yu, and S. Yang [5] describe the use of virtual machines. They proposed virtual machines that use the RSA algorithm for client data encryption and decryption. The SHA-512 algorithm is also used to produce a message digest and check data integrity. A digital signature is used as an identity measure for the client. This approach addresses the problems of unauthorized access, integrity, privacy, and consistency. C. Erway and A. Kupcu [6] introduce nonlinear authentication, in which they suggest a homomorphic nonlinear authenticator with randomized masking techniques to obtain security. K. Gonvinda proposed a digital signature method to protect the privacy and integrity of data. The RSA algorithm is used for encryption and decryption, with digital signatures providing message authentication. S. Marium [7] introduced an extensible authentication protocol through a handshake with RSA. They proposed an identity-based signature for a class conscious architecture. They provide an authentication protocol for cloud computing (APCC). APCC is simpler and more efficient than the SSL authentication protocol. Here, the Challenge Handshake Authentication Protocol (CHAP) is used. When a client makes a request for any data or service on the cloud, the service provider authenticator (SPA) first requests the client's identity. The steps are as follows:
1) When the client requests a service from the service provider, the SPA sends a CHAP challenge to the client.
2) The client sends the SPA a CHAP response, which is calculated from the challenge by using a hash function.
3) The SPA compares the received response value with its own calculated value. If they match, the SPA sends a CHAP success message to the client.
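
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the APCC protocol of [7]: the function names, the shared secret, and the choice of SHA-256 as the hash function are all hypothetical, and the order of the hashed fields (identifier, secret, challenge) follows the CHAP definition in RFC 1994.

```python
import hashlib
import hmac
import os


def make_challenge(num_bytes: int = 16) -> bytes:
    """Step 1 (SPA side): draw a fresh random challenge for this request."""
    return os.urandom(num_bytes)


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Step 2 (client side): hash the identifier octet, secret, and challenge."""
    # Assumption: SHA-256 stands in for the unspecified "hash function".
    return hashlib.sha256(bytes([identifier]) + secret + challenge).digest()


def spa_verify(identifier: int, secret: bytes, challenge: bytes,
               response: bytes) -> bool:
    """Step 3 (SPA side): recompute the value and compare in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    shared_secret = b"example-shared-secret"  # hypothetical pre-shared secret
    challenge = make_challenge()
    response = chap_response(1, shared_secret, challenge)
    ok = spa_verify(1, shared_secret, challenge, response)
    print("CHAP success" if ok else "CHAP failure")
```

Because the challenge is random for every request, a captured response cannot simply be replayed later, which is the property the handshake relies on.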

III. PROPOSED SYSTEM
We have proposed a novel privacy measure called closeness. We introduce two instantiations: a base model called t-closeness and a more flexible privacy model called (n, t)-closeness. We explain the rationale of the (n, t)-closeness model and show that it gives a better balance between privacy and utility. The (n, t)-closeness model better protects the data while improving the utility of the released data. The t-closeness model was introduced to overcome attacks that were possible on l-diversity (such as the similarity attack). The l-diversity model treats all values of a given attribute in the same way (as distinct) even if they are semantically related. Moreover, not all values of a sensitive attribute are equally sensitive. The algorithm to check (n, t)-closeness can be given as follows.
The algorithm consists of the following three steps:
1) Choosing a dimension on which to partition: find the number of rows in patient-enq.
2) Selecting a value to split on and starting suppression: here we suppress using the zipcode. The zipcode has 5 digits, for example 46982. The variable inc is the value to split on; if we set inc = 4, the zipcode is displayed with only its first 4 digits, as 4698*. We set the threshold value t = 0.5, and n is the second-highest value of the table age-count, computed from the patients' ages in the table. For example, consider the data in our patient table.
3) Checking whether partitioning violates the privacy requirement: after that, we check the calculation in (1). If each row of our table satisfies the condition, our privacy requirement is satisfied. Otherwise we decrement the inc value and test again whether each row satisfies the condition, repeating until the condition is satisfied.
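
The mask, test, and decrement loop described in these steps might look as follows in Python. This is only a sketch: the row layout, the function names, and the example data are hypothetical, and the privacy condition is passed in as a caller-supplied predicate rather than fixed in the code. The group-size predicate in the example is a simple stand-in, not the closeness test itself.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[str, int, str]  # hypothetical layout: (zipcode, age, disease)
Check = Callable[[List[Row], Sequence[Row]], bool]


def mask_zipcode(zipcode: str, inc: int) -> str:
    """Keep the first `inc` digits and replace the remaining ones with '*'."""
    return zipcode[:inc] + "*" * (len(zipcode) - inc)


def partition(rows: Sequence[Row], inc: int) -> Dict[str, List[Row]]:
    """Group rows that share the same masked zipcode."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[mask_zipcode(row[0], inc)].append(row)
    return dict(groups)


def suppress_until_safe(rows: Sequence[Row], is_safe: Check,
                        start_inc: int = 4) -> Tuple[int, Dict[str, List[Row]]]:
    """Decrement inc until every group passes the supplied privacy check."""
    for inc in range(start_inc, -1, -1):
        groups = partition(rows, inc)
        if all(is_safe(group, rows) for group in groups.values()):
            return inc, groups
    raise ValueError("no prefix length satisfies the privacy check")


if __name__ == "__main__":
    patients = [  # made-up illustration data
        ("46982", 30, "flu"), ("46915", 41, "cancer"),
        ("46711", 52, "flu"), ("46750", 29, "gastritis"),
    ]
    # Stand-in predicate: every group must hold at least two rows.
    inc, groups = suppress_until_safe(patients, lambda g, table: len(g) >= 2)
    print(inc, sorted(groups))
```

Passing the check as a function keeps the loop unchanged when a stricter condition, such as a distance threshold t on the sensitive attribute, replaces the stand-in.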

[Figure: Flow of algorithm]

EARTH MOVER'S DISTANCE
The Earth Mover's Distance (EMD) is a measure of the distance between two probability distributions over a region D. The EMD is based on the minimal amount of work needed to transform one distribution into another. Intuitively, if one distribution is viewed as a mass of earth spread over the region and the other as a collection of holes, the EMD measures the least amount of work needed to fill the holes with earth. A unit of work corresponds to moving a unit of earth by a unit of ground distance.
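
For discrete sensitive attributes the EMD has simple closed forms, which the sketch below computes. It assumes two ground distances that are common in the closeness literature but are not stated in this paper: an ordered distance |i - j| / (m - 1) between the i-th and j-th of m sorted values, and an equal distance of 1 between any two different values. The function names and the salary example are illustrative only.

```python
from typing import Sequence


def emd_ordered(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD for m ordered values with ground distance |i - j| / (m - 1)."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("distributions must have the same length, at least 2")
    running = 0.0  # earth carried over from the values seen so far
    work = 0.0
    for p_i, q_i in zip(p, q):
        running += p_i - q_i
        work += abs(running)  # each unit carried one step costs one unit
    return work / (len(p) - 1)


def emd_equal(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD when every pair of different values is at ground distance 1."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))


if __name__ == "__main__":
    # Nine ordered salary levels: the whole table is uniform, while one
    # group contains only the three lowest levels.
    table = [1 / 9] * 9
    group = [1 / 3, 1 / 3, 1 / 3, 0, 0, 0, 0, 0, 0]
    print(emd_ordered(group, table))
```

A group would then be accepted under a threshold t when the computed distance between its distribution and the reference distribution does not exceed t.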

IV. LIMITATIONS OF L-DIVERSITY
While the l-diversity principle represents an important step beyond k-anonymity in protecting against attribute disclosure, it has several shortcomings that we now discuss. l-diversity may be difficult to achieve and may not provide sufficient privacy protection. l-diversity assumes an adversary who has knowledge of the form "Carl does not have heart disease," while closeness measures consider an adversary who knows the distributional information of the sensitive attributes. The goal is to propose an alternative technique for data publishing that remedies the limitations of l-diversity in some applications. An interesting question is how to effectively combine the existing techniques with generalization and suppression to achieve better data quality and privacy.
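
A small Python sketch can make the protection gap concrete. It counts distinct sensitive values in a group, as distinct l-diversity does, and then counts distinct semantic categories. The disease names and the category map are hypothetical illustration data, and the function names are not part of any library.

```python
from typing import Dict, Sequence


def distinct_l(values: Sequence[str]) -> int:
    """Number of distinct sensitive values in a group (distinct l-diversity)."""
    return len(set(values))


def semantic_l(values: Sequence[str], category_of: Dict[str, str]) -> int:
    """Number of distinct semantic categories among the group's values."""
    return len({category_of[value] for value in values})


if __name__ == "__main__":
    group = ["gastric ulcer", "gastritis", "stomach cancer"]
    # Hypothetical mapping from each disease to a broader category.
    category_of = {
        "gastric ulcer": "stomach disease",
        "gastritis": "stomach disease",
        "stomach cancer": "stomach disease",
    }
    print(distinct_l(group), semantic_l(group, category_of))
```

In this example the group counts as 3-diverse, yet every value falls in the same category, so an adversary still learns that each person in the group has a stomach-related disease.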

V. CONCLUSION
As seen above, while k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure. The technique of l-diversity attempts to solve this problem. We have shown that l-diversity has a number of limitations and, in particular, discussed two attacks on l-diversity. Motivated by these limitations, we have proposed a novel privacy measure called closeness. We propose two techniques: a base model called t-closeness and a more flexible privacy technique called (n, t)-closeness. We explain the logic of the (n, t)-closeness model and show that it achieves a better balance between privacy and utility. Finally, through experiments on real data, we show that similarity attacks are a real problem and that the (n, t)-closeness model better protects the data while improving the utility of the released data. (n, t)-closeness allows us to take advantage of anonymization techniques other than generalization of quasi-identifiers and suppression of records.

ACKNOWLEDGMENT
I feel great pleasure in submitting this project paper on EFFICIENT PERSONALIZED PRIVACY PRESERVATION USING ANONYMIZATION. I wish to express my sincere gratitude to my project guide, Prof. R. N. Phursule, who at every discrete step in the study of this project contributed his valuable guidance and helped to solve every problem that arose. My great obligation remains due to Prof. S. R. Todmal (Head of Department), who was a constant inspiration during my project. He provided me with the opportunity to undertake the project at JSPM's Imperial College of Engineering and Research, Wagholi, Pune. I feel highly indebted to them, for they provided me with all my project requirements and did much beyond my expectations to bring out the best in me.
I offer sincere thanks to our respected Principal, Dr. S. V. Admane, who proved to be a constant source of motivation for knowledge acquisition and of moral support during our course curriculum.