Differentially Private Publication of Database Streams via Hybrid Video Coding

While most anonymization technology available today is designed for static and small data, the current picture is one of massive volumes of dynamic data arriving at unprecedented velocities. From the standpoint of anonymization, the most challenging type of dynamic data is the data stream. However, while the majority of proposals deal with publishing either count-based or aggregated statistics about the underlying stream, little attention has been paid to the problem of continuously publishing the stream itself with differential privacy guarantees. In this work, we propose an anonymization method that can publish multiple numerical-attribute, finite microdata streams with high protection as well as high utility, the latter aspect measured as data distortion, delay and record reordering. Our method, which relies on the well-known differential pulse-code modulation scheme, adapts techniques originally intended for hybrid video encoding to favor and leverage dependencies among the blocks of the original stream and thereby reduce data distortion. The proposed solution is assessed experimentally on two of the largest data sets used by the data anonymization research community. Our extensive empirical evaluation shows the trade-off among privacy protection, data distortion, delay and record reordering, and demonstrates the suitability of adapting video-compression techniques to anonymize database streams.


INTRODUCTION
Much of what we touch and work with today automatically generates data that someone is disposed to collect and analyze. The availability of massive amounts of such data, frequently at the individual level, plays a fundamental role in the extraction of knowledge and decision-making in contexts as varied as business competitiveness, marketing, social relationships, transportation, health and wellbeing, education and politics [1].
Despite the economic and societal good that comes from big-data research, rising tensions exist with the perceived risks to individuals' privacy [2], [3], [4]. To deal with these tensions, current legal frameworks in Europe and other regions limit the collection, processing and sharing of personally identifiable information (PII). Basically, the controllers of PII have a series of obligations towards the individuals to whom the PII corresponds, which include, among others, seeking their consent and guaranteeing them rights to access, rectification and erasure.
The advent of big data, together with the development of data science in general and machine learning in particular, has raised the question of how to leverage those PII data for secondary purposes (i.e., other than the purpose at collection time), since complying with the above-mentioned legal obligations is extremely difficult in a scenario where many controllers may exchange and fuse data. It is precisely in this situation where anonymization comes into the picture, as the tool that legitimately allows circumventing the legal restrictions applicable to those data.
Differential privacy (DP) [5] is one of the most prominent privacy notions in the field of anonymization. In the interactive setting, the assumption is that an anonymization mechanism sits between an analyst submitting queries and the database answering them. In the non-interactive scenario, on the other hand, a protected version of the original database is generated and released, which allows any entity (not necessarily the data analyst in question) to perform any analyses on the protected data, and permits using such data, possibly in combination with other information, for secondary purposes.
The assumption in most of the current anonymization technology, however, is that the original database does not change over time and there is no need to publish it more than once [6]. Nonetheless, in the current context where colossal amounts of data are generated every single day [7], this is by no means realistic.
Our work tackles the problem of anonymizing dynamic databases with DP guarantees. We focus on the most challenging case, data streams, where only new records and record updates are published at certain release times, data freshness is critical, and the order in which the protected data are released matters. For this type of data, the vast majority of proposals deal with publishing either count-based or aggregated statistics about the underlying dynamic data (e.g., [8], [9]). To the best of our knowledge, only [10] has studied the publication of the database itself (rather than statistics derived from it) in a data stream context. Nonetheless, that work is intended only for data sets with a single attribute and does not contemplate record updates, which renders the anonymization scheme unsuitable for practical stream-based systems.

Contribution and Plan of this Paper
The main contribution of this paper is an anonymization method that can publish multiple numerical-attribute, finite database streams with DP guarantees through hybrid video encoding techniques. The proposed method relies on the signal compression scheme differential pulse-code modulation (DPCM), and is optimized in a number of different ways to allow record updates and to provide high privacy protection and high utility in terms of data distortion, delay and record reordering.
Our solution operates with blocks of records, which are input into a closed loop consisting of several modules: preprocessing, analysis-synthesis, quantization, prediction and encoder control. On the one hand, the preprocessing, prediction and encoder control modules work jointly to select a permutation of the records of the block and a configuration of the prediction module that minimize the error in predicting the block; the prediction module can be configured to both leverage statistical dependencies inside frames (i.e., groups of blocks protected together) and exploit dependencies among the different frames of the database stream. On the other hand, the analysis, synthesis and quantization modules operate jointly to choose the transform coding scheme and the number of transform coefficients that will be protected in order to minimize the mean squared error (MSE) incurred in releasing the synthesized, protected block (instead of the original one).
The proposed solution is evaluated experimentally on two real data sets, "(Very) Large Census" and "Quant Forest", which are two of the largest data sets in the community of statistical disclosure control. A variety of empirical results shows the trade-off among privacy protection, data distortion, delay and record reordering, and demonstrates the suitability of our approach.
The remainder of this paper is organized as follows. Sec. 2 establishes some preliminaries and reviews the state of the art relevant to this work. Sec. 3 formally states the problem tackled in this paper. Sec. 4 describes our approach to generate DP database streams through hybrid video encoding techniques. Sec. 5 conducts an experimental evaluation of the proposed anonymization method. Sec. 6 discusses previous work on differentially private transform coding. Finally, conclusions are drawn in Sec. 7.

Differential Privacy
DP was originally proposed as a privacy model in an interactive setting to protect the outcomes of queries to a database. In this setting, the assumption is that an anonymization mechanism sits between a user submitting queries and a (trusted) database curator answering them.
Our work focuses on a non-interactive setting, where the curator releases a protected version of the database, allowing the user to perform hopefully any analysis on the data without further interacting with the curator.
Central to DP is the notion of neighbor databases, which can be interpreted in two different ways. On the one hand, the unbounded case assumes one entry is either removed or added. On the other hand, the bounded notion considers the replacement of one record by another. An important difference is that the bounded case assumes the size n of the database to be publicly known, whereas the unbounded case treats this parameter as private. Nonetheless, the two notions of neighborhood are closely related and mechanisms satisfying one can be adapted to meet the other. For the sake of mathematical simplicity, we use the bounded definition.
We shall consider central DP, as defined below.
Definition 1 (L1-sensitivity [5]). Let $\mathcal{D}$ be the class of possible data sets. The global sensitivity or L1-sensitivity of a query function $f : \mathcal{D} \to \mathbb{R}^d$ is defined as
$$\Delta(f) = \max_{x, x'} \| f(x) - f(x') \|_1,$$
where $x, x'$ are any two neighbor databases in the sense described above.
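As an illustration of Definition 1, the following minimal sketch computes the L1-sensitivity of two toy queries by exhaustive search over bounded neighbors (one record replaced). The function names, the discrete toy domain and the database size are assumptions made for the example only; they are not part of the proposed method.

```python
from itertools import product


def l1_sensitivity(query, domain, n):
    """Brute-force max ||f(x) - f(x')||_1 over bounded neighbors.

    Neighbors differ in exactly one record (replacement); n is fixed.
    Only feasible for tiny toy domains.
    """
    worst = 0.0
    for x in product(domain, repeat=n):
        fx = query(x)
        for pos in range(n):
            for value in domain:
                x_prime = x[:pos] + (value,) + x[pos + 1:]
                fx_prime = query(x_prime)
                dist = sum(abs(a - b) for a, b in zip(fx, fx_prime))
                worst = max(worst, dist)
    return worst


def sum_query(x):
    # One-dimensional query: total of the single attribute.
    return (float(sum(x)),)


def mean_query(x):
    # One-dimensional query: average of the single attribute.
    return (sum(x) / len(x),)


if __name__ == "__main__":
    toy_domain = (0, 1, 2, 3)  # hypothetical attribute values in [0, 3]
    print(l1_sensitivity(sum_query, toy_domain, 3))
    print(l1_sensitivity(mean_query, toy_domain, 3))
```

In this toy setting the sum query has a sensitivity equal to the attribute range, whereas the mean over three records has roughly a third of it, which is the kind of reduction that aggregation-based approaches seek.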

Related Work
In this subsection, we review the state of the art relevant to this work. We first examine the classical approaches to anonymize static data sets, and then analyze those proposals aimed at protecting dynamic data. In both cases, the privacy model assumed is DP.

Histograms versus Record Masking
Even if DP was initially proposed to limit disclosure risk in database queries, mechanisms to generate DP data sets (i.e., the so-called non-interactive setting) appeared soon after its inception. Nonetheless, except for the simplest data domains, publishing useful DP data sets (i.e., data sets that approximate the original ones well) remains a highly challenging task.
There exist two main approaches to generate DP data sets: histograms and record masking. In the former case, given an original data set $x$, we generate a histogram $h$ through a suitable partitioning of the data domain. From this point on, we discard $x$ and the target of protection is $h$. Hence, the goal is to publish $h^\varepsilon$, an ε-DP version of $h$. In the latter case, the aim is to generate $x^\varepsilon$, an ε-DP version of $x$, that is, an anonymized version of the data in the original format.
The histogram approach takes advantage of the low sensitivity of counting queries over a partition of the data domain [11]. The naive application of this mechanism, however, becomes problematic as the complexity of the data domain increases. Note that, for a fixed accuracy, the cardinality of the partition (number of bins) grows exponentially with the number of attributes, which may have important effects on the computational cost and the accuracy of the protected data. Some mitigation strategies have been proposed to tackle the issues caused by data dimensionality. In [12], given a partition, the authors propose an algorithm that minimizes the error for a given family of counting queries. In [13], data summarization techniques are utilized to reduce the time and space complexity, by making time and space proportional to the number of non-empty cells in the summarized data set. An alternative way to deal with those issues is to apply dimensionality-reduction techniques. This is the strategy followed by [14], which models the dependency between attributes to generate the DP data set from a set of low-order marginals.
The alternative of generating DP data sets through record masking avoids partitioning the data domain. Instead, the data set is protected by masking the original records. However, masking each record by adding Laplace-distributed noise with magnitude proportional to the record sensitivity is not a feasible solution. Since the purpose of DP is to hide the presence of any single record, such a naive approach inescapably needs to introduce too much noise, thereby producing significant utility damage.
As a result, a wide body of research has investigated how to reduce the sensitivity of the queries used to generate the DP data sets. A few examples include [15], [16], [17], where microaggregation [18] is utilized for that purpose. In the cited works, rather than querying each original record, only the representatives of the microaggregation clusters are queried. Since a cluster representative is an aggregation of the records in the cluster, intuitively its global sensitivity is smaller than that of any single record. Clearly, the amount of sensitivity reduction depends on how such representative values are computed.

Differentially Private Publication of Dynamic Data
The aforementioned anonymization schemes assume that the original data set does not change over time and, therefore, that there is no need to publish it more than once. However, in the current context of big data, this does not seem realistic.
Obviously, a straightforward application of the previous schemes to the scenario at hand would still be possible. Nonetheless, applying those methods independently at each release time, i.e., without considering correlations between consecutive releases or the dynamics of the data stream, may not be an appropriate approach.
Few recent works have tackled the problem of protecting dynamic data sets with DP guarantees. Essentially, the privacy research community has focused on two distinct scenarios. In the former scenario, all available data or a synopsis thereof (e.g., histograms) are anonymized periodically, although not necessarily at regular time instants. In the latter scenario, by contrast, (i) data items are not republished in multiple versions, i.e., only new or updated data are protected at a given release time; (ii) time is critical, in the sense that a new or updated data item must be anonymized and published within a predefined, short time frame; and (iii) the order in which the protected data are released matters. Following the terminology of [19], we shall refer to these two scenarios respectively as multiple release and data stream.
Distinct technologies have been developed for each case. In the multiple-release scenario, [20] studies the problem of publishing histograms of dynamic data sets. Instead of generating a DP histogram at each release time, the cited work proposes computing new histograms only when the update is significant, that is, when a distance measure between the current histogram and the latest released histogram exceeds a threshold. The proposed strategy is independent of how histograms are computed at each release time, and the goal is to adjust the threshold adaptively based on data dynamics. The main problem of this proposal is that it suffers from all the limitations of the static histogram approach mentioned in Sec. 2.2.1.
Another proposal for multiple release is [21], which deals with the publication of histograms as well, but combines sampling [22] with clustering (i.e., time units with similar trends are grouped) to improve utility. The proposed solution, however, adopts an event-level DP approach [23], which protects the presence of an individual event, i.e., an individual's contribution to the data stream at a single time point, rather than their presence or contribution to the entire publication series (also known as user-level DP).
In the data stream scenario, the vast majority of proposals focus on publishing either count-based or aggregated statistics. One of those works is [8], which aims to continuously protect count series over individuals (e.g., the daily count of people diagnosed with HIV/AIDS). The proposed scheme provides user-level DP and assumes the series are generated by an underlying process from which predictions are made to enhance the accuracy of the released data. However, a statistical model of the process needs to be assumed or inferred from public data with similar patterns, and therefore the anonymization scheme may not be effective when the actual data deviate from it.
PeGaSus [24] is another proposal that aims to release continuous count-based statistics. Unlike [8], the notion of neighborhood between databases (and so DP) is modified here to suit streaming analytics, but it is only intended to protect single-data events, analogously to event-level DP.
A more recent work is OptStream [9], which generates a sequence of protected data where each term represents a private version of the aggregated data (e.g., a count) up to a given time instant. The proposed solution relies on the w-event framework [25], which extends the definition of DP to protect stream analytics. However, like PeGaSus, it cannot be applied to release the database stream itself and, besides, the target of protection is not individuals' full contributions to the stream (footnote 3).
To the best of our knowledge, only [10] has studied the publication of the database itself (rather than statistics derived from it) in a data stream context, which is the focus of this work. δ-DOCA, as the method is called, adopts a record-masking approach and provides central ε-DP, which means all contributions (and not only some consecutive pieces thereof) are protected. Nonetheless, it is intended only for data sets with a single attribute and does not contemplate record updates.
3. w-event privacy does not protect event sequences occurring beyond a time window of size w.

PROBLEM STATEMENT
We shall follow the convention of using uppercase letters for random variables (r.v.'s), lowercase letters for the particular values they take on, and bold letters for matrices. Probability density functions (PDFs) and probability mass functions (PMFs) are denoted by p and subindexed by the corresponding r.v. We adopt the same notation for vectors as in [26] and use parentheses to construct column vectors from comma-separated lists.
We study the protection of database streams with central DP guarantees, which means there is a trusted entity (i.e., the curator) that gathers data continuously from a population and takes charge of protecting them from the outside world.
There are multiple ways to define DP in such a data streaming setting, e.g., at the granularity of attributes [27], events [23], windows of events [25], records or individuals. This work assumes the required protection is at the individual level (also known as user-level DP), that is, the curator aims to protect all tuples or records corresponding to any individual in the database stream.
Mathematically, we model database streams as discrete-time vector processes. An original database stream $\{S_i\}$ is defined, accordingly, as a sequence of continuously incoming tuples $S_i = (I_i, A_{i1}, \ldots, A_{id})$, where $I_i$ is an r.v. denoting the identity of the subject to whom $S_i$ corresponds, and $A_{i1}, \ldots, A_{id}$ are r.v.'s representing $d$ attributes of that subject. Throughout this work, we shall also refer to the tuples of a database stream as records.
In general, the protection of a stream requires some sort of distortion (e.g., Laplace-noise addition) of the original attribute values, and therefore inevitably implies some information loss. We denote by $\{S^\varepsilon_i\}$ an ε-DP version of the original database stream $\{S_i\}$, that is, a sequence of continuously output tuples $S^\varepsilon_i = (A^\varepsilon_{i1}, \ldots, A^\varepsilon_{id})$, where identities are removed and $A^\varepsilon_{i1}, \ldots, A^\varepsilon_{id}$ are suitably distorted versions of the attribute values in the original tuple corresponding to $S^\varepsilon_i$. To quantify how well the distorted attribute values approximate the original ones, we shall use the sum of squared errors (SSE), a measure of distortion frequently employed in the evaluation of DP mechanisms.
The degree of distortion of the protected attribute values is one dimension of the information loss incurred by a protection method. The other dimensions are related to the fact that some, or all, of the records in the original stream may be delayed and reordered; obviously, any method for data streams must buffer incoming tuples before protecting them. Next, we slightly generalize the delay-constraint definition of [28].
Definition 3 (Delay constraint). Let $M$ be a protection mechanism that takes as input a database stream $\{S_i\}$ and outputs an ε-DP stream $\{S^\varepsilon_i\}$. For a positive integer $\delta$, $M$ is said to satisfy the delay constraint $\delta$ if, upon receiving any new tuple $S_i$, $M$ has already output all the protected tuples corresponding to tuples in $\{S_i\}$ with position less than $i - \delta + 1$.
While delay constraints are common in the context of data streams, to the best of our knowledge no attempt has been made to preserve the order of the incoming records. In [10], for example, tuples are reordered as much as needed to satisfy maximum attribute homogeneity for a given delay, ignoring the value of the information encoded in such order. To make our analysis as comprehensive as possible, we shall quantify the impact of such reordering through a reordering cost function.
Unlike [10], we also contemplate tuple updates, meaning there can be tuples arriving at different time instants that belong to the same subject but contain different attribute values. In this work, we require that such updates satisfy the following mild constraint.
Definition 4 (Tuple-update constraint). Let $\{S_i\}$ be an original database stream and $T \subseteq \{S_i\}$ the sequence of all tuples corresponding to a given subject. For a positive integer $\alpha$, the original stream satisfies the tuple-update constraint $\alpha$ if, for any subject and any two consecutive tuples of $T$, the two tuples are at least $\alpha$ positions apart in $\{S_i\}$.
Informally, Definition 4 tells us that we should expect a lag between a tuple and its update, or between two consecutive updates. With a mild loss of generality, this work will assume $\alpha \geq \delta$. Since, by Definition 3, the maximum number of buffered tuples at any moment is $\delta$, the tuple-update constraint ensures those two tuples (i.e., a tuple and its update, or two consecutive updates) will not coincide in the buffer. In practice, however, if the condition $\alpha \geq \delta$ is not met, only the most recent tuple will be output.
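The tuple-update constraint can be checked on a recorded sequence of subject identities with a few lines of code. The sketch below is only an illustration under the reading that two consecutive tuples of the same subject must be at least α positions apart; the function name and the toy identifiers are hypothetical.

```python
def satisfies_update_constraint(subject_ids, alpha):
    """Return True if consecutive tuples of every subject are >= alpha apart.

    subject_ids[i] is the identity attached to the i-th incoming tuple.
    """
    last_seen = {}
    for position, subject in enumerate(subject_ids):
        if subject in last_seen and position - last_seen[subject] < alpha:
            return False
        last_seen[subject] = position
    return True


if __name__ == "__main__":
    ids = ["a", "b", "c", "a", "d", "b"]  # toy stream of subject identities
    print(satisfies_update_constraint(ids, 3))
    print(satisfies_update_constraint(ids, 4))
```

With a buffer that never holds more than δ consecutive tuples, a stream passing this check for some α no smaller than δ cannot place a tuple and its update in the buffer at the same time.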
A direct consequence of the fact that tuples can be updated is the finite length of the protected database stream. Since the level of protection ε is necessarily finite, by the sequential composition property of DP [29] the privacy budget will be consumed completely at some time instant. We shall denote by $l$ the target length of the protected database stream, that is, the number of incoming records the database curator wishes to protect.
Given all such considerations, the problem tackled in this work is as follows. We aim to design a DP mechanism suitable for database streams that, for a given ε and $l$, achieves serviceable points of operation in the privacy-utility trade-off, with utility measured as distortion, delay and reordering.

ING
This section describes our methodology to publish DP database streams through hybrid video encoding techniques.
In this work, we propose the masking of database streams at the record level, instead of at the histogram level. Doing so is computationally efficient, since the cost is linear in the number of records. However, plain independent masking of the records in the original database stream may degrade utility severely, as we describe next. For a positive integer $r$, define the identity function $I_r(\{S_i\})$ as the function that returns the attribute values of the $r$-th element (i.e., record) of $\{S_i\}$. Since the whole process $\{S_i\}$ can be interpreted as the collected answers (except for subjects' identities) to the queries $I_r(\{S_i\})$ for all available elements, an intuitive way to generate the protected stream $S^\varepsilon_1, \ldots, S^\varepsilon_l$ with $\delta = 1$ is to collect an $\varepsilon/l$-DP response to each $I_r(\{S_i\})$ for $r = 1, \ldots, l$. Since we allow record updates, several of these responses may concern the same subject, and it follows from the sequential composition property that $S^\varepsilon_1, \ldots, S^\varepsilon_l$ also meets the desired ε-DP requirement. In short, with this methodology, the protected database stream is generated by providing a DP response to the queries asking for the values of all attributes in $l$ records of the original sequence.
Although this record-level perturbation methodology does not make any assumptions on the uses of the output data, unfortunately it may come at the expense of a huge information loss. Throughout this paper, we shall assume each attribute $j$ takes on values in the interval $[0, \Lambda_j]$, and denote by $\Lambda$ the column vector $(\Lambda_1, \ldots, \Lambda_d)$. Since each query $I_r$ refers to a single individual, its L1-sensitivity is as large as $\sum_{k=1}^{d} \Lambda_k$, which implies a huge distortion to attain ε-DP. The result is a database stream $S^\varepsilon_1, \ldots, S^\varepsilon_l$ with very limited utility.
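To make the magnitude of this information loss concrete, the following minimal sketch implements the naive record-level baseline just described: each record receives a budget of ε/l and is perturbed with Laplace noise calibrated to the sensitivity given by the sum of the attribute ranges. Function names and the toy parameter values are assumptions for illustration; the Laplace sampler uses the fact that the difference of two independent exponential variables is Laplace-distributed.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def naive_laplace_scale(lams, eps, l):
    """Noise scale of the naive baseline: sensitivity sum(lams), budget eps/l."""
    return sum(lams) * l / eps


def mask_record(record, lams, eps, l, rng):
    """Perturb every attribute of one record with i.i.d. Laplace noise."""
    scale = naive_laplace_scale(lams, eps, l)
    return [value + laplace(scale, rng) for value in record]


def expected_sse_per_record(lams, eps, l):
    """Expected squared error per record: d attributes, variance 2*scale^2 each."""
    scale = naive_laplace_scale(lams, eps, l)
    return len(lams) * 2.0 * scale ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    lams = [1.0, 2.0]  # hypothetical attribute ranges
    print(naive_laplace_scale(lams, eps=1.0, l=100))
    print(mask_record([0.3, 1.7], lams, 1.0, 100, rng))
    print(expected_sse_per_record(lams, 1.0, 100))
```

Even for these small toy values, the noise scale is two orders of magnitude larger than the attribute ranges, which is why the remainder of this section looks for queries with lower sensitivity.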
To make record-level masking viable to generate DP data sets, there is an evident need to reduce the sensitivity of the query functions to be used. In the following subsections, we shall describe a method that protects groups of conveniently sorted tuples at a time, and exploits statistical dependencies among releases.

Overview
We propose a protection method that relies on DPCM, which is closely related to the concept of closed-loop predictive quantization. The basic structure of our method is illustrated in Fig. 1 and described succinctly next. Each individual module is analyzed in greater detail in the following subsections.
In our protection method, the database stream tuples $\{S_i\}$ are not directly processed, but buffered at a preprocessing module, where records are appropriately sorted. In particular, as soon as $m$ records are available at this module, groups of $n < m$ consecutive records are removed from the buffer and input into the closed loop successively, i.e., one after another. We shall assume that $m = nb$ for some integer $b > 1$, and that groups are processed in the order of their records. Following the terminology of numerous image and video compression formats, we shall refer to this processing unit as a block. In analogy to video coding, the set of $b$ such blocks will be called a frame.
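The buffering logic described above can be sketched as a small generator. This is only an illustration of how a stream is cut into frames of b blocks with n records each; the sorting performed by the preprocessing module is deliberately left out, and the function name is hypothetical.

```python
def frames(stream, n, b):
    """Yield frames of b blocks, each block holding n consecutive records.

    Records are buffered until m = n * b of them are available. The sorting
    step of the preprocessing module is omitted in this sketch, and records
    left in an incomplete final frame are not emitted.
    """
    m = n * b
    buffer = []
    for record in stream:
        buffer.append(record)
        if len(buffer) == m:
            yield [buffer[k * n:(k + 1) * n] for k in range(b)]
            buffer = []


if __name__ == "__main__":
    for frame in frames(range(12), n=2, b=3):
        print(frame)
```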
From a notational point of view, note that, while $i$ indexes individual records within the original database stream, $j$ indexes blocks of $n$ records within the closed loop. On the other hand, since all modules inside the loop operate at the block level, for mathematical convenience we shall model such blocks as random matrices of dimension $n \times d$. Hence the notation of Fig. 1.
Essentially, each block $X_j$ is predicted based on the previous protected blocks $\tilde{X}_{j-1}, \tilde{X}_{j-2}, \ldots, \tilde{X}_{j-\pi}$, for some integer $\pi$. The prediction block $\hat{X}_j$ is subtracted from the preprocessed input block $X_j$, thereby yielding a prediction error $E_j = X_j - \hat{X}_j$. The block $E_j$ is then transformed, quantized and protected with $\varepsilon_j$-DP by the analysis and quantization modules. The synthesis module afterwards reverses the previous transformation and the upshot is a protected and reconstructed block $\tilde{E}_j$ for the prediction error $E_j$. Then, $\tilde{E}_j$ is added to the predictor $\hat{X}_j$, resulting in the reconstructed output block $\tilde{X}_j$. Releasing $\tilde{X}_j$ in a single batch yields $n$ consecutive records of the protected database stream $S^\varepsilon_1, \ldots, S^\varepsilon_l$.
The fundamental principle upon which the above methodology relies is difference quantization. One simple but important result that follows from the fact that
$$X_j - \tilde{X}_j = (\hat{X}_j + E_j) - (\hat{X}_j + \tilde{E}_j) = E_j - \tilde{E}_j \tag{1}$$
is that the overall MSE in releasing $\tilde{X}_j$ instead of $X_j$ is equal to the MSE incurred in quantizing $E_j$. Formally,
$$\mathrm{E}\, \| X_j - \tilde{X}_j \|_F^2 = \mathrm{E}\, \| E_j - \tilde{E}_j \|_F^2, \tag{2}$$
where $\| \cdot \|_F$ denotes the Frobenius norm. When $\hat{X}_j$ in (1) is a prediction of $X_j$ based on some information about the past of $X_j$, Eq. (2) is called the fundamental theorem of predictive quantization [30]. Note, however, that (2) holds for any $\hat{X}_j$ regardless of whether it is a prediction of $X_j$ or not. When it is, in the context of image and video compression, algorithms can be more efficient. In our context of database streams, we shall show that privacy protection can be provided with less distortion, drawing an analogy between these two fields.
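The closed loop and the identity behind the fundamental theorem of predictive quantization can be illustrated with a short simulation. The sketch below makes several simplifying assumptions that are not part of the proposed method: blocks are flattened to vectors, the predictor is simply the last reconstructed block, and the analysis, quantization and synthesis chain is replaced by direct Laplace-noise addition to the prediction error. All names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dpcm_release(blocks, noise_scale, rng):
    """Closed-loop DPCM over flattened blocks.

    Returns a list of tuples (x, x_rec, e, e_rec): the input block, the
    reconstructed output, the prediction error and its noisy reconstruction.
    """
    prediction = [0.0] * len(blocks[0])  # no past available for the first block
    trace = []
    for x in blocks:
        e = [xi - pi for xi, pi in zip(x, prediction)]
        # Stand-in for analysis + quantization + synthesis: noise on the error.
        e_rec = [ei + laplace(noise_scale, rng) for ei in e]
        x_rec = [pi + ei for pi, ei in zip(prediction, e_rec)]
        trace.append((x, x_rec, e, e_rec))
        prediction = x_rec  # predict the next block from the protected past only
    return trace


def sse(u, v):
    """Sum of squared errors between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    blocks = [[10.0, 11.0, 12.0], [10.5, 11.5, 12.5], [11.0, 12.0, 13.0]]
    for x, x_rec, e, e_rec in dpcm_release(blocks, 0.5, rng):
        # Output distortion and error distortion agree up to rounding.
        print(sse(x, x_rec), sse(e, e_rec))
```

Because the predictor only reads previously released blocks, the distortion of each output block equals the distortion introduced on its prediction error, whatever the quality of the prediction.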
We have mentioned that the analysis module applies a transformation on the prediction error block $E_j$. Although multiple transformations are possible, here we use the most popular one in image and video compression, the discrete cosine transform (DCT), as well as the discrete sine transform (DST) and the discrete Hartley transform (DHT). Apart from variety, the reason for our choice is as follows. They are all orthogonal, two-dimensional separable and data-independent, and they all exhibit high energy compaction, meaning that information, after being transformed, tends to be concentrated in a few low-frequency transform coefficients.
As we shall describe in Sec. 4.3, the quantization module will be in charge of selecting which coefficients are retained and perturbed with the Laplace mechanism, and which ones are removed. Regardless of the selection criterion, however, predicting $X_j$ from the reconstructed (and protected) past has two immediate advantages. On the one hand, the variance of the error block $E_j$ will in principle (footnote 5) be less than the variance of the original block $X_j$, so that a reduced range of values will be transformed and protected. In image coding, predictive quantization (without transform coding) has the ability to increase the accuracy of the quantized values without increasing the number of coding bits. In our case (where we additionally consider transform coding), a smaller variance of the elements of $E_j$ will intuitively translate into a smaller number of high-frequency transform coefficients. As a result, the same privacy budget $\varepsilon_j$ will be distributed among fewer coefficients, thus yielding less distortion.
On the other hand, predicting $X_j$ from reconstructed blocks has an evident advantage both in video coding and in database streams. In the former application, it allows both an encoder and a decoder to generate the same block $\hat{X}_j$ without transmitting any additional information from the former to the latter. In our case, due to the post-processing property [31] of DP, we shall generate each prediction block without consuming any privacy budget.
With the proposed method, we shall therefore be able to output $\varepsilon_j$-DP blocks. At the frame level, since $\alpha \geq \delta \geq m > n$, each of the blocks of the same frame will contain records belonging to different subjects (footnote 6). The result is that each protected frame will also satisfy $\varepsilon_j$-DP by the parallel composition property of DP [31]. To meet the requirement of protecting $l$ input records, it will suffice to set $\varepsilon_j = \varepsilon\, m / l$ for all $j$.
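The budget allocation in the last sentence can be verified with simple arithmetic: a stream of l records contains l/m frames, each frame consumes the same per-block budget by parallel composition, and the frames compose sequentially. The sketch below assumes, for simplicity, that l is a multiple of m; the function names are hypothetical.

```python
def frame_budget(eps, m, l):
    """Per-block (and per-frame) budget eps * m / l."""
    if l % m != 0:
        raise ValueError("this sketch assumes l is a multiple of m")
    return eps * m / l


def total_budget(eps, m, l):
    """Budget spent over the whole stream: sequential composition over frames."""
    num_frames = l // m
    return num_frames * frame_budget(eps, m, l)


if __name__ == "__main__":
    print(frame_budget(1.0, m=100, l=1000))
    print(total_budget(1.0, m=100, l=1000))
```

Compared with the naive baseline, where each record receives ε/l, every block here receives a budget that is m times larger.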

Transform Coding
The aim of transform coding is to apply an adequate linear transformation on each input block, so that the transform coefficients are much less correlated than the original samples and the information is more "compact" in the sense of being concentrated in only a few of the transform coefficients (footnote 7). It is important to note that transform coding exploits only dependencies among the samples of a single block. To additionally utilize dependencies among transform blocks and frames, intra-picture and inter-picture prediction techniques can be used.
5. As long as the prediction is good enough.
6. Said otherwise, the sets of subjects protected in those blocks will be non-overlapping.
7. We emphasize that there is no general theoretical result stating that uncorrelated quantities can be quantized more efficiently than correlated ones.
Transform codes are popular because they provide an attractive compromise between computational complexity and performance. As mentioned in Sec. 4.1, we shall use, among others, the DCT, a data-independent transform that is employed in all practical video coding schemes. Although there are several DCTs, the DCT-II is probably the most commonly used form and is often simply referred to as "the DCT". In addition to the DCT, our scheme also incorporates the DST-I and the DHT.
For notational simplicity, in this subsection we shall drop the subindex $j$ of the r.v.'s represented in Fig. 1. In addition, we shall assume realizations of these variables.
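Let $a^n = [a^n_{ij}]$ denote the $n \times n$ transformation matrix of any of the three transforms employed by the analysis and synthesis modules. In the case of the DCT, the entries of $a^n$ are
$$a^n_{ij} = \begin{cases} \sqrt{1/n}, & i = 1, \\ \sqrt{2/n}\, \cos \dfrac{(2j-1)(i-1)\pi}{2n}, & i = 2, \ldots, n, \end{cases}$$
for $j = 1, \ldots, n$. In the case of the DST and DHT, the entries of the corresponding matrices are respectively
$$a^n_{ij} = \sqrt{\frac{2}{n+1}}\, \sin \frac{i j \pi}{n+1} \quad \text{and} \quad a^n_{ij} = \frac{1}{\sqrt{n}} \left( \cos \frac{2\pi (i-1)(j-1)}{n} + \sin \frac{2\pi (i-1)(j-1)}{n} \right).$$
Recall [32] that, given a matrix $x$ of dimensions $n \times d$, the forward and inverse transforms of a separable, two-dimensional transformation are given respectively by
$$c = a^n x\, (a^d)^T \quad \text{and} \quad x = (a^n)^T c\, a^d. \tag{3}$$
Our next result, Lemma 1, derives the global sensitivity of the transformed coefficients of a separable, two-dimensional transformation, when a prediction block is subtracted from an input block. The strength of this result lies in that it is not restricted to the transforms contemplated in this work.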
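Before turning to the lemma, the transforms just introduced can be sketched in a few lines of code. The sketch assumes the standard orthonormal definitions of the DCT-II, DST-I and DHT (written with 0-based indices) and the separable forward and inverse transforms of an n × d block; the helper names are hypothetical and plain lists are used instead of a numerical library.

```python
import math


def dct2(n):
    """Orthonormal DCT-II matrix (rows = frequencies), 0-based indices."""
    return [[math.sqrt((1.0 if i == 0 else 2.0) / n)
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]


def dst1(n):
    """Orthonormal DST-I matrix."""
    return [[math.sqrt(2.0 / (n + 1))
             * math.sin((i + 1) * (j + 1) * math.pi / (n + 1))
             for j in range(n)] for i in range(n)]


def dht(n):
    """Orthonormal discrete Hartley transform matrix (cos + sin kernel)."""
    return [[(math.cos(2 * math.pi * i * j / n)
              + math.sin(2 * math.pi * i * j / n)) / math.sqrt(n)
             for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def forward(x, a_n, a_d):
    """Separable 2-D forward transform: c = a_n x a_d^T."""
    return matmul(matmul(a_n, x), transpose(a_d))


def inverse(c, a_n, a_d):
    """Separable 2-D inverse transform: x = a_n^T c a_d."""
    return matmul(matmul(transpose(a_n), c), a_d)


def is_orthogonal(a, tol=1e-9):
    prod = matmul(a, transpose(a))
    size = len(a)
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(size) for j in range(size))


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.0, 1.0, 0.0]]
    for make in (dct2, dst1, dht):
        c = forward(x, make(4), make(3))
        print(make.__name__, is_orthogonal(make(4)), inverse(c, make(4), make(3))[0])
```

Orthogonality is what makes the inverse transform a plain transposition, and it is also why distortion measured on the coefficients equals distortion measured on the reconstructed block.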
Lemma 1 (Sensitivity of transform coefficients). For any $i = 1, \dots, n$, denote by $r^*(i)$ the index that maximizes $|a^n_{ir}|$. Let $x$ be an observed block of $n \geq 2$ records and $d$ attributes, $\hat{x}$ a prediction block, and $e$ the corresponding error. Denote by $f_{c_{ij}}$ the query function that returns the element $(i, j)$ of the transform block $c = a^n e\,(a^d)^T$. The L1-sensitivity of this function is

Proof: Consider two neighboring input blocks $x$ and $x'$, and their corresponding transformed error blocks $c$ and $c'$. For any $r \in \{1, \dots, n\}$, denote by $x_r = (x_{r1}, \dots, x_{rd})$ and $x'_r = (x'_{r1}, \dots, x'_{rd})$ the respective values of the differing record in either input block. Clearly, since $\hat{x}$ does not depend on $x$ or $x'$, but on previous reconstructed blocks,

From (3), simple algebraic manipulation then shows

Accordingly,

where (a) reflects that the maximization of $|a^n_{ri}|$ with respect to all $x_r$ and $x'_r$ depends just on the position index $r$; and (b) holds with equality since the components of $x_r$ and $x'_r$ can be chosen so that all terms $a^d_{kj}(x_{rk} - x'_{rk})$ have the same sign.

An important conclusion that follows from Lemma 1 is that the sensitivity of any coefficient $c_{ij}$ (regardless of the particular transformation used) depends on the sensitivity of each and every attribute, rather than on a single $\Lambda_k$. In other words, there is no one-to-one correspondence between the sensitivity of the attribute value of a record in $x$ and that of the transform coefficients, which in principle may limit the benefits of transform coding. Our next result, Corollary 1, shows that this limitation is, fortunately, compensated in part by an averaging effect of $\Lambda$. Before proceeding, we first prove an interesting property of the DCT transform matrix, used in the corollary.
Next, assume that $2j - 1$ and $2n$ are not coprime integers. Denote by $d$ their greatest common divisor and verify that $d \geq 3$. Define $i = 2n/d + 1$ and check that $i$ may take values in $\{2, \dots, n\}$. Since $2j - 1 = \beta d$ for some $\beta \in \mathbb{Z}$, it follows that

and therefore $|a^n_{ij}| > a^n_{1j}$.

The following result, Corollary 1, compares the sensitivity of the coefficients of the DCT, DST and DHT with that of $I_r$, the identity function used by the naive record-perturbation approach, which we described at the beginning of this section. The corollary also shows the low sensitivity of the DCT coefficients of the first row.
Corollary 1. Let $GS(I_r)$ denote the L1-sensitivity of $I_r$, and $f^c_{c_{ij}}$ the query function that returns the element $(i, j)$ of the DCT. For any $i = 1, \dots, n$ and any $j = 1, \dots, d$,

The first claim is immediate from Proposition 1 and Lemma 1, by noting that

for the DCT matrix $a^n$. For the same transform and for $i \geq 2$, it follows that

In the case of a DST and a DHT, an entirely analogous derivation leads to $GS(f^s_{c_{ij}}) \leq 2\,GS(I_r)/\sqrt{nd}$ and $GS(f^h_{c_{ij}}) \leq 2\,GS(I_r)/\sqrt{(n+1)(d+1)}$, respectively. Since $GS(f^c_{c_{1j}}) \leq GS(f^c_{c_{ij}})$ from claim (i), we prove the second statement.
Corollary 1 tells us that the sensitivity values of the transform coefficients are significantly lower than that of the baseline identity function. Specifically, for $n = d$, $GS(f_{c_{ij}})$ can be interpreted roughly as averaging $\Lambda$ over the number of records (attributes).
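The averaging effect can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's own computation: it assumes that attribute $k$ ranges over an interval of width $\Lambda_k$, that neighboring blocks differ in one record, and that $GS(I_r) = \sum_k \Lambda_k$. Under these assumptions, changing record $r$ alters $c_{ij}$ by $a^n_{ir} \sum_k a^d_{jk}(x_{rk} - x'_{rk})$, so the largest change is $\max_r |a^n_{ir}| \sum_k |a^d_{jk}| \Lambda_k$. The function names and the attribute ranges are hypothetical.

```python
import math

def dct_matrix(n):
    """Standard orthonormal DCT-II matrix (0-based row i, column j)."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def coefficient_sensitivities(n, attr_ranges):
    """Worst-case change of each DCT coefficient when one record changes.

    Assumes c = a_n e a_d^T and attribute k confined to a range of width
    attr_ranges[k]; entry (i, j) is max_r |a_n[i][r]| * sum_k |a_d[j][k]| * range_k.
    """
    d = len(attr_ranges)
    a_n, a_d = dct_matrix(n), dct_matrix(d)
    row_peak = [max(abs(v) for v in a_n[i]) for i in range(n)]
    col_mass = [sum(abs(a_d[j][k]) * attr_ranges[k] for k in range(d))
                for j in range(d)]
    return [[row_peak[i] * col_mass[j] for j in range(d)] for i in range(n)]

ranges = [10.0, 4.0, 1.0, 25.0]   # hypothetical attribute ranges (Lambda)
n = 8
gs = coefficient_sensitivities(n, ranges)
gs_identity = sum(ranges)          # assumed sensitivity of releasing a raw record
ratio_dc = gs[0][0] / gs_identity  # equals 1 / sqrt(n * d) for the first coefficient
```

With these numbers every coefficient sensitivity stays below $2\,GS(I_r)/\sqrt{nd}$, and the first-row coefficients are never more sensitive than those of the other rows in the same column.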
Direct application of Lemma 1 allows us to examine the differences in terms of sensitivity among the cosine, sine and Hartley transforms. For ease of comparison, we define the sensitivity relative difference between transforms $\sigma$ and $\rho$ as

where $\sigma, \rho \in \{c, s, h\}$. Fig. 2 shows the percentage values of the quantities $r_{cs}$, $r_{ch}$ and $r_{sh}$ for two square block sizes, namely, $n = 8$ and $n = 16$.
Several remarks are in order from this figure. First, we observe that the DST is preferable to the DCT except for roughly two rows, $i = 1$ and $i = 5$ for $n = 8$, and similarly for $n = 16$ (Fig. 2(a)); this observation is consistent with the first claim of Corollary 1. When compared to the DHT, however, the sensitivities of the DST coefficients are observed to be much larger in odd rows for $n = 8$. In contrast, this latter transform exhibits smaller sensitivities in columns 1, 5, 9 and 11 for $n = 16$.
In general, the sine and Hartley transforms seem to be more suitable, as reflected in Fig. 2(d), where for each coefficient we represent the transformation with the least sensitivity. This is evident for $n = 8$, where, for all but 6 coefficients, these transforms outperform the DCT. The case $n = 16$ is less clear, although it still shows the DHT as the transformation with the largest number of coefficients with least sensitivity. We would like to stress that this does not mean that the other two transforms are inappropriate. In fact, the suitability of any transform will hinge upon the block size, the individual attribute sensitivities $\Lambda$ and, more importantly, the specific coefficients to be protected as well as the ability of the transform to compact energy.

Quantization
In source coding, lossy systems are characterized by the fact that the reconstructed signal is not identical to the source signal. The process that introduces the corresponding loss of information is called quantization, and the algorithm that performs the quantization process is referred to as a quantizer. Although in image and video coding the information loss is due to analog-to-digital conversion, in a mild abuse of terminology we refer to quantization more generally as the process whereby distortion is introduced. In this subsection, we shall omit the block index $j$ and therefore subindexes will denote elements of the corresponding matrices. For simplicity, we shall also drop the subindex of $\varepsilon_j$.
The purpose of introducing distortion is to satisfy a DP requirement. As we shall show next, our quantizer will be designed to cause the least possible loss of information while meeting this requirement. Although we shall be looking at the overall MSE in releasing $\tilde{X}_j$ (instead of $X_j$), a typical measure of performance for the quantizer is the coding gain [30], defined as the ratio

which is simply the signal-to-noise (SNR) power ratio achieved by the quantizer.
Our quantizer aims to appropriately select a subset of transform coefficients of $C$, protect them through the Laplace mechanism, and eliminate the remaining ones. Let $t$ be the number of retained coefficients, and $\varepsilon \in \mathbb{R}_+^{n \times d}$ a matrix with the privacy budget $\varepsilon_{ij}$ assigned to each of them. We consider implicitly that $\varepsilon_{ij} = 0$ if the transform coefficient $C_{ij}$ is not selected. On the other hand, we assume

where $L$ is a zero-mean Laplacian r.v. with scale $GS(f_{c_{ij}})/\varepsilon_{ij}$. Quantization therefore incurs two sources of error: first, the error due to eliminating $nd - t$ coefficients, and secondly, the noise added to the remaining $t$ coefficients to attain $\varepsilon^Q_L$-DP. We shall refer to these two errors as coefficients-removal and Laplace errors, respectively. Clearly, there is a trade-off between these two errors. For a fixed $\varepsilon^Q_L$, if $t$ approaches $nd$, the coefficients-removal error will likely be small or even negligible, but the privacy budget will need to be distributed among a significant number of coefficients, thereby causing the Laplace error to be large. The opposite occurs if $t$ is small compared to $nd$. The fundamental questions that we address next are (i) how to choose $t$; and (ii) given $t$, which coefficients of $C$ need to be protected, so that these two decisions cause the minimum overall distortion. We tackle these two questions in reverse order.
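The two error sources can be seen in a minimal sketch of such a quantizer: the first `t` coefficients of a given order receive Laplace noise of scale sensitivity/budget, and the remaining ones are set to zero. The helper names, the order, and all numeric values below are illustrative assumptions; Laplace noise is drawn as the difference of two independent exponential variates, which is one standard way to sample it.

```python
import random

def laplace_noise(scale, rng):
    """Zero-mean Laplace sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def quantize(coeffs, order, t, sens, budgets, rng):
    """Keep the first t coefficients of `order` with Laplace noise; zero the rest.

    sens[i][j] is the coefficient sensitivity and budgets[i][j] the privacy
    budget assigned to it (only read for the retained coefficients).
    """
    n, d = len(coeffs), len(coeffs[0])
    out = [[0.0] * d for _ in range(n)]       # removed coefficients stay at zero
    for (i, j) in order[:t]:
        scale = sens[i][j] / budgets[i][j]    # Laplace scale GS / epsilon
        out[i][j] = coeffs[i][j] + laplace_noise(scale, rng)
    return out

rng = random.Random(1)
coeffs = [[10.0, 3.0], [2.0, 0.5]]            # hypothetical transform block
order = [(0, 0), (0, 1), (1, 0), (1, 1)]      # hypothetical coefficient order
sens = [[1.0, 1.0], [1.0, 1.0]]
budgets = [[0.5, 0.5], [0.0, 0.0]]            # budget only for the t = 2 kept ones
noisy = quantize(coeffs, order, 2, sens, budgets, rng)
```

Retaining more coefficients shrinks the removal error but forces the same total budget to be spread over more entries, which raises every Laplace scale.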

Selection and Protection of Transform Coefficients
Intuitively, in the choice of transform coefficients, their global sensitivities as well as the possible values they may take on will play an important role. Let $\nu_{ij} = \Pr\{C_{ij} > 0\}$.
In video coding, it is typically advantageous to arrange the transform coefficients C ij of a block in the order of decreasing probabilities ν ij .However, the transform coefficients of a block have to be transmitted in a certain order that is also known to the decoder.Making this order data-dependent is clearly inefficient, since it would need to be conveyed on a per block basis.
Most video coding standards adopt a predefined, signal-independent approach by leveraging the fact that, in transformed error blocks, $\nu_{ij}$ usually decreases with increasing frequency indexes $i$ and $j$. A signal-independent scan in video coding that approximately arranges the transform coefficient values in the desired order is the zig-zag scan. This scan, which is illustrated in Fig. 3(a) for the example of a 4 × 4 block, is used in most video coding standards. H.265, also known as MPEG-H Part 2 or high efficiency video coding (HEVC), may operate with the diagonal scan depicted in Fig. 3(b). The two scans have similar properties, but the latter provides some benefits for certain implementations.
In our case, arranging the coefficients of C according to ν ij is a data-dependent operation and, as such, would not satisfy DP.To cope with this, we follow an approach entirely analogous to that of video coding and assume the coefficients of C are arranged in an order defined by a coefficients order O. Accordingly, given such an order and a number t of coefficients to protect, our quantizer proceeds just by selecting the first t coefficients in the given order.Next, we examine how these coefficients are protected.
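A data-independent order of this kind can be generated without looking at the coefficients at all. The sketch below produces the classical JPEG-style zig-zag order for an n × d block with 0-based indices; the traversal direction on each anti-diagonal follows that common convention and may differ from the one drawn in Fig. 3(a). The function names are illustrative.

```python
def zigzag_order(n, d):
    """Zig-zag scan of an n x d block as a list of (row, column) pairs.

    Cells are grouped by anti-diagonal (row + column); odd diagonals are
    traversed top to bottom and even diagonals bottom to top.
    """
    cells = [(i, j) for i in range(n) for j in range(d)]

    def key(cell):
        i, j = cell
        s = i + j
        return (s, i if s % 2 else -i)

    return sorted(cells, key=key)

def select_first(order, t):
    """Indices of the t coefficients retained by the quantizer."""
    return order[:t]

order_4x4 = zigzag_order(4, 4)
kept = select_first(order_4x4, 6)
# kept == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Since the order depends only on the block dimensions, selecting the first t positions consumes no privacy budget; only the choice of t itself is data-dependent.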
Denote by $\xi_X(\varepsilon, O, t)$ the MSE incurred in outputting $\tilde{X}$ instead of $X$, where conveniently we make explicit its dependence on the assignment of the privacy budget to the $t$ selected coefficients, and on the parameters specifying which concrete coefficients are to be protected. Our next result shows that this error is the sum of the MSEs due to the removal of coefficients and to DP protection at the quantizer.
Lemma 2 (Laplace and coefficients-removal errors). Given $\varepsilon$, $O$ and $t$, the MSE in releasing $\tilde{X}$ rather than $X$ is

Proof: From (2), we know that

On the other hand,

where (a) and (c) follow from the orthogonality of $a^n$ and $a^d$, respectively; and (b) uses the fact that the trace is invariant under cyclic permutations.
Using the matrix indexes given by $O$, it follows that

Finally, we derive the expression claimed in the statement by recalling that the variance of a Laplacian r.v. of scale parameter $b$ is $2b^2$.

Lemma 2 provides the MSE incurred by quantization, and shows that the Laplace and coefficients-removal errors are strictly increasing and non-increasing with $t$, respectively. We explore next how to distribute the privacy budget among the selected coefficients so that the total error is minimized.
Denote by $\varepsilon^*$ the optimal assignment of $\varepsilon^Q_L$,

Theorem 1 (Optimal assignment of $\varepsilon^Q_L$). For any given $O$ and any $t \in \{1, \dots, nd\}$, the optimal assignment $\varepsilon^*$ is

$\varepsilon^Q_L$ for $i = 1, \dots, t$, and the corresponding minimum MSE yields

Proof: The proof is organized in two steps. First, we show that the optimization problem implicit in (8) is convex. Secondly, we use Karush-Kuhn-Tucker (KKT) conditions to solve the problem.
For notational conciseness, we denote $\varepsilon_{O(1)}, \dots, \varepsilon_{O(t)}$ by $\varepsilon_1, \dots, \varepsilon_t$, and define

To show that the problem is convex, note that, from Lemma 2,

is the sum of strictly convex functions $f_k$, and observe that the inequality and equality constraint functions are linear and affine. Since the objective and constraint functions are also differentiable and Slater's constraint qualification holds, KKT conditions are necessary and sufficient conditions for optimality [26, §5]. The application of these optimality conditions leads to the following Lagrangian cost,

and finally to the conditions

Since $\varepsilon_k > 0$, it follows from the complementary slackness condition that $\lambda_k = 0$, which, by the dual optimality condition, implies

From the primal equality constraint,

and hence

[Fig. 4 (caption, partially recovered): ... which the Laplace-noise error is minimized. These points constitute the optimal trade-off. The points in gray, on the other hand, reflect a non-optimal assignment of $\varepsilon$. In (b), we observe that $t = 2$ minimizes the minimum MSE. In this example, $\varepsilon^Q_L = 1$ and $\Lambda$ is the maximum value of each attribute within the block.]
Substituting the above expression for µ into f k −1 (µ) leads to the expression of the optimal ε given in the theorem.Then, the MSE follows by substituting the solution into ξ X (ε, O, t).
A couple of remarks follow from Theorem 1. On the one hand, the optimal assignment of $\varepsilon^Q_L$ conforms to intuition, as those coefficients with smaller sensitivities are assigned smaller $\varepsilon_{O(i)}$. On the other hand, we observe that the MSE due to the Laplace error is proportional to the inverse of the square of $\varepsilon^Q_L$. This means, for example, that increasing $\varepsilon^Q_L$ from 1 to 2 implies a reduction of 75 percent in MSE.
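Both remarks can be reproduced with a short numerical sketch. It assumes that the Laplace error is $\sum_k 2\,GS_k^2/\varepsilon_k^2$ (the variance $2b^2$ with $b = GS_k/\varepsilon_k$) and that the budgets must add up to $\varepsilon^Q_L$; under that assumption, setting the derivative of each term equal to a common multiplier gives $\varepsilon_k \propto GS_k^{2/3}$ and a minimum Laplace error of $2\,(\sum_k GS_k^{2/3})^3 / (\varepsilon^Q_L)^2$. This closed form is our own working of the KKT conditions and should be checked against the expression in Theorem 1; the function names and sensitivities are hypothetical.

```python
def optimal_budget(sensitivities, eps_total):
    """Split eps_total proportionally to sensitivity ** (2/3).

    Minimizer of sum_k 2 * (g_k / eps_k) ** 2 subject to sum_k eps_k = eps_total,
    obtained from the stationarity condition of the Lagrangian.
    """
    weights = [g ** (2.0 / 3.0) for g in sensitivities]
    total = sum(weights)
    return [eps_total * w / total for w in weights]

def laplace_mse(sensitivities, budgets):
    """Total Laplace error: each coefficient contributes the variance 2 * b ** 2."""
    return sum(2.0 * (g / e) ** 2 for g, e in zip(sensitivities, budgets))

gs = [0.5, 1.0, 4.0]                      # hypothetical coefficient sensitivities
best = optimal_budget(gs, 1.0)
uniform = [1.0 / len(gs)] * len(gs)
mse_best = laplace_mse(gs, best)
mse_uniform = laplace_mse(gs, uniform)
mse_doubled = laplace_mse(gs, optimal_budget(gs, 2.0))
reduction = 1.0 - mse_doubled / mse_best  # fraction of MSE removed by doubling eps
```

In this example the coefficient with the smallest sensitivity receives the smallest share, the optimal split beats the uniform one, and doubling the budget removes three quarters of the Laplace error.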

Choice of t and Transform
For a given transform and O, the trade-off between the Laplace and the coefficients-removal errors is determined by t.In Fig. 4, we provide an example of this trade-off in the case of (i) a 32 × 13 input block X corresponding to the first 32 records of the "Census" data set [33]; (ii) the DCT; (iii) a zig-zag order, and (iv) no prediction.
In this particular example we show that there exists a value of $t$ minimizing the sum of the two errors above for the DCT. This subsection aims to compute, in a DP manner, this value of $t$ and the transform $\sigma \in \{c, s, h\}$ that jointly minimize such total error⁸. Since this computation is a data-dependent operation, we resort to the exponential mechanism [34] of DP. Henceforth, we shall denote the optimal values of those two parameters by $t^*$ and $\sigma^*$. For notational compactness, we shall use $\kappa$ to refer to the tuple of quantization parameters $(O, t, \sigma)$.
The exponential mechanism requires designing a proper scoring function.To investigate the impact of this design decision on our quantizer, we consider a parametrized family ω θ of such functions, where θ denotes the exponent of both the Laplace-noise and the coefficients-removal errors in ξ X (ε * , κ).

Note that minimizing $\mathrm{E}\,\|X - \tilde{X}\|_F^2$ implies maximizing the coding gain of the quantizer, since

Intuitively, the purpose of using these error-based functions is for the exponential mechanism to favor values of $t$ and $\sigma$ causing less MSE. Let $T$ and $\Sigma$ be the r.v.'s modeling the response of this mechanism, and $\varepsilon^Q_E$ the desired level of protection of said mechanism. Ideally, we would like the joint PMF $p_{T\Sigma}$ to be as large as possible for $T = t^*$ and $\Sigma = \sigma^*$, and as small as possible for the rest of values. Since $p_{T\Sigma}(t, \sigma; \theta)$ is proportional to $\varepsilon^Q_E\,\omega_\theta(c, \kappa)/GS(\omega_\theta(c, \kappa))$, one might be tempted to choose $\theta \gg 1$. However, this may not be an appropriate choice, since the sensitivity of the corresponding function is likely to increase accordingly.
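The selection step can be sketched with the textbook form of the exponential mechanism [34], in which a candidate with score $u$ is drawn with probability proportional to $\exp(\varepsilon u / (2\,\Delta u))$. The factor 2 follows the usual statement of the mechanism and may differ from the normalization used in this work; the candidate errors, the sensitivity value and all names below are hypothetical, with scores taken as negative errors so that smaller errors are favored.

```python
import math
import random

def exponential_mechanism_pmf(scores, sensitivity, eps):
    """PMF proportional to exp(eps * score / (2 * sensitivity)).

    Scores are shifted by their maximum for numerical stability; the shift
    cancels after normalization.
    """
    top = max(scores)
    weights = [math.exp(eps * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Candidates are pairs (t, transform); errors are made-up total MSE values.
candidates = [(t, s) for s in "csh" for t in range(4)]
errors = [9.0, 4.0, 2.5, 6.0, 8.0, 3.0, 1.0, 5.0, 9.5, 4.5, 2.0, 7.0]
scores = [-e for e in errors]

pmf = exponential_mechanism_pmf(scores, sensitivity=1.5, eps=1.0)
rng = random.Random(0)
choice = rng.choices(candidates, weights=pmf, k=1)[0]
most_likely = candidates[pmf.index(max(pmf))]   # (2, "s"), the smallest error
```

A larger sensitivity bound flattens the PMF toward uniform, which is why a scoring function with sharper scores but a much larger sensitivity is not necessarily the better choice.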
For conciseness, our analysis only contemplates the cases θ = 1 /2 and θ = 1, and for simplicity the scoring functions operate with current observed values rather than expected values.Accordingly, the respective scoring functions are and Our next result computes upper bounds on the sensitivities of these two functions.Before proceeding, however, we introduce some notation.Denote by Λ a matrix of dimension n×d with all rows being Λ T , and by f c the query function that returns all elements of the transform block c.Accordingly, define σ = arg max σ∈{c,s,h} Furthermore, the absolute value function, when applied to a matrix, will denote the element-wise absolute value of such matrix.
Lemma 3 (Sensitivities of $\omega_{1/2}$ and $\omega_1$). Under the assumptions of Lemma 1, and for a given prediction block $\hat{x}$, the L1-sensitivities of the scoring functions $\omega_{1/2}$ and $\omega_1$ satisfy

Proof: Let $x$ and $x'$ be two neighboring input blocks, and $c$ and $c'$ their corresponding transformed error blocks. For any $r \in \{1, \dots, n\}$, denote by $x_r = (x_{r1}, \dots, x_{rd})$ and $x'_r = (x'_{r1}, \dots, x'_{rd})$ the respective values of the differing record in either input block.
Let O be any order.For k ∈ {1, . . ., nd}, let O(k, 1) and O(k, 2) denote the first and the second index of O, respectively.Accordingly, define where a n and a d are transformation matrices of dimensions n × n and d × d, as specified in (3).
From the definition of L1-sensitivity, we have that = max = max where ( 9) follows from the reverse triangle inequality and does not depend on x; (10) results from (4) and from the strict monotonicity of the square root function; (11) follows from the fact that the maximum of a sum is at most the sum of maxima; (12) holds since the squaring function preserves the order of nonnegative numbers; (13) follows from Lemma 1; and from ( 13) we immediately verify claim (i) in the lemma, as it is maximized for t = 0 (and hence for any O) and σ = σ.
To prove the second claim, we use c for any x, y and any positive real-valued functions g, h.
Accordingly, it follows that

We know from claim (i) that the maximum on the left-hand side is upper bounded by $\|GS(f^\sigma_c)\|_F$. On the other hand, we have that

where (14) follows from the orthogonality of the three transforms under consideration. To complete the proof, note that each summand in (14) is maximized for either $x_{ij} = \Lambda_j$ or $x_{ij} = 0$, depending on the largest absolute difference between $x_{ij}$ and $\hat{x}_{ij}$. The strict inequality in claim (ii) is due to the fact that $x$ and $x'$ must differ in one record.

[Fig. 5: PMF $p_{T\Sigma}(\theta)$ of the exponential mechanism for $\theta = 1/2$ and $\theta = 1$. We have used the zig-zag order, ...]

Several conclusions follow from Lemma 3. First, and most evident, the upper bounds on the sensitivities of $\omega_{1/2}$ and $\omega_1$ do not depend on $O$. The reason is that the bounds are maximized for $t = 0$, which means all terms $GS(f^\sigma_{c_{O(k)}})^2$ in (13) must be added up. Likewise, the upper bound on the sensitivity of $\omega_{1/2}$ does not hinge on $\hat{x}$ either, as the difference $c_{O(k)} - c'_{O(k)}$ in (9) does not. However, this is not the case for $\theta = 1$, which requires that the prediction module share $\hat{x}$ with the quantization module.
In this latter case, we can observe the straightforward effect that prediction may have on the obtained bound.Specifically, it is immediate to verify that which indicates that, to reduce the sensitivity bound of ω 1 and thus obtain more accurate results from the exponential mechanism, the predictions x = 0 (right inequality) and x = x − Λ/2 (left inequality) represent worst and best-case scenarios.We note that this latter prediction simply reduces the domain of each attribute to be [0, Λ j /2].
Another interesting conclusion is that the sensitivity results are valid for any set of orthogonal, separable, two-dimensional transforms, which extends the scope of our selection algorithm to include the vast majority of transform-coding techniques.
Finally, we observe that squaring the error terms in $\omega_{1/2}$ (i.e., moving from $\theta = 1/2$ to $\theta = 1$) has a significant impact on L1-sensitivity. While the resulting function may yield larger scores for $(t^*, \sigma^*)$ (which may help the exponential mechanism choose the optimal number of coefficients and transform), we note that its sensitivity may in the worst case become $2\|\Lambda\|_F$ times larger than that of $\omega_{1/2}$, which may cancel out the benefits of such an exponentiation.
Despite this latter observation, we would like to stress that determining which function will cause the least distortion is not possible a priori, since one would need to know $c$ in advance. The appropriateness of $\omega_{1/2}$ and $\omega_1$ will therefore depend on the actual data. Fig. 5 reflects this situation by comparing the PMFs $p_{T\Sigma}(\theta)$ for $\theta = 1/2$ and $\theta = 1$, and for two different input blocks. In Figs. 5(a,b), the smaller dispersion of $\theta = 1$ and the fact that $\mathrm{E}_{p_{T\Sigma}(1)}[T \mid \sigma]$ is close to $t^*$ for all $\sigma$ make this function more suitable. In Figs. 5(c,d), however, $\theta = 1/2$ seems to be more appropriate: the PMF exhibits a smaller dispersion than $\theta = 1$, and it attains its maximum value exactly at $t^* = 2$ for the three transforms.

Algorithm 1: Transform coding and quantization.
Input: An input block $X$; a prediction block $\hat{X}$; a coefficients order $O$; the respective privacy parameters $\varepsilon^Q_L$ and $\varepsilon^Q_E$ of the Laplace and the exponential mechanisms; the scoring-function parameter $\theta$
Output: A protected error block $\tilde{E}$ satisfying ...
1 ... for the given order, all $t = 0, \dots, nd$ and all $\sigma \in \{c, s, h\}$
2 Calculate the upper bounds⁹ on the L1-sensitivity of $\omega_\theta(C, \kappa)$ from Lemma 3
...
The joint operation of the analysis, quantization and synthesis modules is summarized in Algorithm 1. The interaction among the three modules is reflected in lines 5 and 14, where quantization decides on the transform to be used by the transform-coding modules. Since quantization also requires the prediction block to compute $GS(\omega_1(c, \kappa))$, the algorithm takes $X$ and $\hat{X}$ as input, rather than just $E$.
Returning to the notation of block subindexes, we also note that a decision must be made with regard to the distribution of the privacy budget $\varepsilon_j$ available for each block and frame. In Algorithm 1 we make no assumption, apart from the fact that the budget devoted to the Laplace and to the exponential mechanism must satisfy

9. The bounds of Lemma 3 are for $\theta = 1/2$ and $\theta = 1$.

Prediction
Transform coding is a simple albeit efficient technique for utilizing statistical dependencies among the records within a single transform block. To additionally exploit dependencies among transform blocks within the same or different frames, image and video coding rely on prediction techniques.
In video compression, there exist two classical prediction modes, intra-prediction and inter-prediction. In the former mode, the transform coefficients or original samples of a transform block are predicted using already coded samples of neighboring blocks. That is to say, intra-prediction only leverages statistical dependencies inside frames. However, as video sequences usually contain significant temporal redundancies, the additional exploitation of dependencies among the different frames of a video sequence can notably enhance coding efficiency. This latter approach is referred to as inter-prediction.
In this work, we propose a hybrid video coding scheme to protect database streams, meaning that the protection algorithm is a hybrid of three fundamental techniques, namely, transform coding for dependencies within blocks, and the two prediction modes above. However, unlike video compression, these modes will be applied in a more general sense: we shall allow both intra-prediction and inter-prediction to generate $\hat{X}$ from reconstructed blocks of the same frame and from reconstructed blocks of different frames.
Intuitively, the better the future of an input block (modeled as a vector process) is predicted from its past output blocks and the more redundancy the input block contains, the less new information is contributed by each successive block of the database stream [35]; for a fixed privacy budget, if less information needs to be protected, less distortion is introduced.
Next we recover the subindex notation for blocks. A measure of prediction performance is the closed-loop prediction gain ratio [30], which is defined as

From (2), (7), and (15), the overall SNR power ratio of the DPCM system can be expressed as

We shall adopt the most commonly used criterion for the optimality of a predictor [36], [30], the minimization of the denominator of (15), which implies the minimization of the variance and the mean of the prediction error.
We shall denote by Φ the set of modes and types of prediction of the video coding standards available to the module at hand.Accordingly, each φ ∈ Φ will represent a unique configuration of the prediction module, e.g., the intra-mode of H.264 with horizontal prediction, the latter being the prediction type.
We shall consider spatial prediction modes¹⁰, which operate with original samples, in contrast to those that estimate $\hat{X}$ from transform coefficients. Formally,

where the function $f$ is chosen adequately to generate a good estimate of $X_j$ from the $\pi$ past values of the reproduced process $\{S^\varepsilon_i\}$. Although a variety of "standard" functions will be considered for intra-prediction in our evaluation (a couple of examples are shown in Fig. 6), we shall only contemplate block matching [37] as inter-prediction technique. In our case, when applying block matching we will be selecting the reconstructed block that minimizes the denominator of (16). The reason for restricting to block matching is that we expect small inter-frame redundancies, in contrast to video sequences.

10. Predictions in the sample domain have the advantage that predictor blocks can be generated for arbitrary prediction directions [35].

Preprocessing
Recall that a permutation matrix is a square (0, 1)-matrix in which each row and each column has exactly one entry of 1 and zeros elsewhere.Let Ψ denote the set of permutation matrices.For any ψ ∈ Ψ, notice that the product ψX is a permutation of the rows of X.
Informally speaking, the goal of the preprocessing module is to find a permutation of the rows of $X$ that helps the predictor generate a better prediction $\hat{X}$ of $X$. Since the actual $\hat{X}$ is not available to the preprocessing module at the time when it is to permute $X$, the module will be devised to find the permutation that minimizes the prediction error for all $\phi \in \Phi$. We shall see in Sec. 4.6 that this operation is conducted jointly with the encoder-control module.
The minimization of the prediction error, however, is not without constraints, since the cost of permuting must be kept to an acceptable level. In this work, we quantify this cost with Spearman's footrule distance [38]¹¹, given by

which measures the total element-wise displacement from the original order, denoted by the identity matrix $i_n$.
Formally, for a given set $\Phi$ of prediction modes and types, the preprocessing module is designed to compute the solution to the optimization problem

which describes the optimal trade-off between prediction error on the one hand, and permutation or reordering cost on the other. Intuitively, the larger the maximum acceptable cost, the smaller the prediction error, and vice versa.

11. Spearman's footrule is the most popular metric to evaluate distances between permutations.
Let $v, w \in \mathbb{R}^{n \times d}$ and $z \in \mathbb{R}$ be the parameters of an assignment problem with side constraints (APSC) [39]¹². Recall that the formulation of an APSC in standard form is given by

Our next result shows the equivalence of the problems (17) and (18).
Lemma 4. For a fixed φ, the optimization problem ( 17) is an APSC.
Proof: For brevity, we write $\hat{X}$ instead of $\hat{X}(\phi)$. Recall that the Frobenius inner product of two matrices $a, b \in \mathbb{R}^{n \times d}$ is defined as $\langle a, b \rangle_F = \mathrm{tr}(a^T b)$ and induces the corresponding Frobenius norm $\|a\|_F = \sqrt{\langle a, a \rangle_F}$. Accordingly, we have that

where (19) is due to the orthogonality of the permutation matrices, and (20) follows from the invariance of the trace under cyclic permutations. Eq. (20) implies that minimizing $\|\psi X - \hat{X}\|_F^2$ for a given $\phi$ is equivalent to the problem of finding the permutation of the rows of $X \hat{X}^T$ that maximizes the trace. The equivalence of problems (17) and (18) in terms of their objective functions is verified immediately by noting that (i) the objective function of Eq. (18) can be recast as the trace of $\psi v^T$ and (ii) a problem in which the objective function is to be maximized can be converted into a minimization problem just by multiplying $v$ by $-1$. To check the equivalence of the inequality constraint functions, simply observe that $F(\psi) = \mathrm{tr}(w\psi)$ for $w_{ij} = |i - j|$. This completes the proof.
The strength of recasting (17) as an APSC lies in that it allows us to resort to efficient methods [39], [40] to compute the optimal permutation $\psi^*$. This is of great practical relevance, as an APSC is NP-complete and our scheme must satisfy the delay constraint $\delta$, as specified in Definition 3.
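For very small blocks, the constrained problem can be explored by exhaustive search, which makes the trade-off between prediction error and reordering cost tangible. The sketch below is only an illustration: it enumerates all permutations, which is exponential in n and is not one of the efficient APSC methods of [39], [40]. The footrule is computed as the sum of absolute displacements, all names are hypothetical, and the toy blocks are chosen so that the best unconstrained permutation is a full reversal.

```python
from itertools import permutations

def footrule(perm):
    """Spearman's footrule: total displacement of a permutation from the identity."""
    return sum(abs(i - p) for i, p in enumerate(perm))

def prediction_error(x, x_hat, perm):
    """Squared Frobenius error when row i of the permuted block is x[perm[i]]."""
    return sum((x[perm[i]][k] - x_hat[i][k]) ** 2
               for i in range(len(x)) for k in range(len(x[0])))

def best_permutation(x, x_hat, max_cost):
    """Brute-force minimizer of the prediction error subject to footrule <= max_cost."""
    n = len(x)
    feasible = (p for p in permutations(range(n)) if footrule(p) <= max_cost)
    return min(feasible, key=lambda p: prediction_error(x, x_hat, p))

x = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]        # toy input block
x_hat = [[4.0, 0.0], [3.0, 0.0], [2.0, 0.0], [1.0, 0.0]]    # toy prediction block

no_reordering = best_permutation(x, x_hat, 0)   # only the identity is feasible
mild = best_permutation(x, x_hat, 2)            # at most one adjacent swap
free = best_permutation(x, x_hat, 8)            # 8 = floor(4 ** 2 / 2), the maximum
errors = [prediction_error(x, x_hat, p) for p in (no_reordering, mild, free)]
# errors == [20.0, 18.0, 0.0]
```

The error decreases as the admissible cost grows, from 20 with no reordering to 0 once the full reversal (cost 8) is allowed.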

Extreme Regions of the Trade-off Plane
Even though powerful methods are available to compute $\psi^*$, the fact that (17) is a minimization over all $\phi \in \Phi$ means we need to solve an APSC for each available prediction mode and type, and each input block. This imposes an important computational burden on the preprocessing module and may compromise the fulfillment of the delay constraint $\delta$. In the special cases when the system is designed to operate at the extreme regions of the trade-off, we may alleviate this burden as described below.
12. The problem has also been investigated in [40] where it is referred to as the resource constraint minimum weight assignment problem.

Low Prediction Error.
It can be shown [41] that

This result implies that if we accept permutation costs larger than or equal to $\lfloor n^2/2 \rfloor$, then the optimization problem (17) becomes an unconstrained linear assignment problem. Optimization problems of this kind can be solved in polynomial time $O(n^4)$ with the original Hungarian algorithm, and more efficiently with several algorithms that achieve $O(n^3)$. We refer the reader to [42] for further details on this topic.
Low Reordering Cost.
In the case when there are stringent, tight constraints on the permutation cost, intuitively the feasible set of (17) will mostly include permutations of nearby records. We contemplate two strategies, S1 and S2, that exploit this fact for the sake of computational efficiency.
Recall that an assignment problem can be regarded as a minimum weight perfect matching problem. S1 decomposes the blocks to be matched (i.e., $X$ and $\hat{X}$) into blocks of smaller sizes, and finds the matching of each of those sub-blocks. More specifically, it computes the solution of $r$ optimization problems of the form (17), where $X$ and $\hat{X}$ are now replaced with $X_i$ and $\hat{X}_i$, which denote sub-blocks of size $n/r \times d$ containing the records $\frac{n(i-1)}{r} + 1, \dots, \frac{n i}{r}$ of $X$ and $\hat{X}$, respectively. Naturally, $\sum_i c^i_R = c_R$. The strategy S2, on the other hand, tackles the original problem with a weight matrix that prevents the matching of records belonging to different sub-blocks. Specifically, we consider the matrix

which produces the same effect as S1, but without having to split $c_R$ up into the $r$ sub-problems. This is precisely the reason why the minimum prediction error attained by S1 will never be smaller than that achieved by S2, and also the reason why S1 may be more efficient than S2. Fig. 7 shows the performance of S1 and S2, expressed in relative terms with respect to the original optimization problem (17). We generated 100 instances of $X$ and $\hat{X}$ completely at random and computed the average runtime and prediction error for $d = 16$, $n = 8, 16, 24, 32$ and $r = 2$. Since we are assuming low reordering costs, the performance was assessed for values of $c_R$ up to 1/5 of the maximum $F(\psi)$ (see Eq. (21)).
The results show that the proposed strategies may reduce the computational burden significantly, with the highest reduction being an 80% for n = 24 and c R 33.As for the differences between the two strategies, we note that S2 performed better than S1 in terms of runtime for n = 8, while the opposite was observed for n = 24, 32.An important consideration is that both S1 and S2 may exhibit, for certain values of n and c R , larger runtimes than those required to compute (17).
The results also seem to indicate that the price to pay is relatively small. In our experiments, the minimum error value was observed to be just 9% larger than that attained by the original problem. In short, although these results obviously depend on the data, and thus we cannot draw conclusions on which strategy is more appropriate for a given data set, they show the potential benefits of operating in the region of low permutation costs.

Encoder Control
Coding efficiency describes the ability of a video codec to trade off bit rate and reconstruction quality [43]. In video applications one typically wants the best possible reconstruction quality for a given available bit rate.
A multitude of parameters including coding modes and intra-prediction modes have to be selected on a per-block or per-frame basis.These selections determine the coding efficiency of a generated bitstream and are referred to as encoder control.
A larger set of coding and prediction modes is only advantageous in video coding if the reduction in bit rate that results from the improved prediction and transform coding outweighs the additional bit rate required for transmitting the selected modes to the decoder.In the case of database streams, we have an entirely analogous trade-off.Since such a selection is a data-dependent operation, re-distributing a fixed privacy budget to allow one more DP algorithm only makes sense if a larger set of prediction modes and types can effectively reduce the overall distortion.
We design the encoder control to decide, on a per-block basis, the prediction mode (i.e., intra or inter) and the specific prediction type to be used (e.g., average vertical, horizontal). Consistently with the optimality criterion of the predictor module, we define the scoring function $\kappa(x, \hat{x}(\phi))$. Our next result computes the sensitivity of this function, which we shall use to design the exponential mechanism selecting the specific prediction mode and type.
Lemma 5 (Sensitivity of the scoring function for the selection of the encoding parameters). The L1-sensitivity of the scoring function κ is $GS(\kappa(x, \hat{x}(\phi))) = \|\Lambda\|_2^2$.

Proof: Let $x$ and $x'$ be two neighboring input blocks, and $x_r = (x_{r1}, \ldots, x_{rd})$ and $x'_r = (x'_{r1}, \ldots, x'_{rd})$ the records in which the two input blocks differ, respectively. Direct application of the definition of L1-sensitivity expresses $GS(\kappa(x, \hat{x}(\phi)))$ as a maximum, over all pairs of neighboring blocks, of a sum over the attributes of the differing records. Note that each of the summands is maximized when $\hat{x}_{ij}(\phi) = x_{ij}$, and that the minimum achievable value of each summand is zero. Accordingly, the sensitivity reduces to a maximum over $x_r$ and $x'_r$, which is clearly attained at the extreme values of $x_r$ and $x'_r$.

Algorithm 2 shows how the preprocessing, encoder control and prediction modules interact to select, in a DP manner, a permutation ψ and a configuration φ that minimize the prediction error. Specifically, the predictor estimates X for all possible configurations in line 1. All prediction blocks are then sent to the preprocessing module, which computes the permutations minimizing the prediction error for each of these blocks, as specified in (17) (lines 2 to 4). Lastly, the encoder control decides on the configuration of the predictor and the corresponding optimal permutation (line 5), which are conveyed to the predictor and the preprocessing modules, respectively.
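The selection step of the encoder control can be sketched with the standard exponential mechanism, which picks a candidate with probability proportional to exp(ε · score / (2 · GS)). The sketch below is illustrative only and rests on assumptions not stated in the text: the score of a configuration is taken to be the negative squared prediction error, the sensitivity is read from Lemma 5 as the sum of the squared attribute ranges, and the configuration names, error values and function names are hypothetical.

```python
import math
import random

def selection_probabilities(scores, epsilon, sensitivity):
    """Exponential-mechanism probabilities: p(k) proportional to exp(eps * score_k / (2 * GS))."""
    top = max(scores.values())  # shift the scores for numerical stability
    weights = {k: math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def select_configuration(scores, epsilon, sensitivity, rng=random):
    """Draw one configuration according to the probabilities above."""
    probs = selection_probabilities(scores, epsilon, sensitivity)
    keys = list(probs)
    return rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]

# Hypothetical example with d = 3 attributes normalized to the range [0, 1]
ranges = [1.0, 1.0, 1.0]                      # attribute ranges (Lambda)
sensitivity = sum(v ** 2 for v in ranges)     # assumed reading of Lemma 5
errors = {"intra-vertical": 0.8, "intra-horizontal": 1.5, "inter": 0.3}
scores = {k: -e for k, e in errors.items()}   # lower error means higher score
probs = selection_probabilities(scores, epsilon=0.5, sensitivity=sensitivity)
choice = select_configuration(scores, 0.5, sensitivity, random.Random(0))
```

With a small privacy parameter the probabilities are close to uniform, which is the cost of making the choice in a DP manner; a larger value concentrates the mass on the configuration with the lowest prediction error.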
By the sequential and parallel composition properties of DP, it is immediate to verify that our DPCM-based protection method provides ε-DP frames, with

EXPERIMENTAL EVALUATION
In this section, we experimentally evaluate the protection method proposed in Section 4. The aim is to show that our approach, which builds on hybrid video encoding techniques to enhance data utility, may in fact diminish the amount of noise required to attain ε-DP. The empirical analysis has been conducted in its entirety with Matlab 2019b, on a Ryzen 7 1800X at 4 GHz.

Data Sets
To try to capture the voluminous and continuous characteristics of database streams, our experiments are targeted toward large datasets.
Algorithm 2: Preprocessing, encoder control and prediction.
Input: An input block $X$; the reconstructed blocks $X_{j-1}, X_{j-2}, \ldots, X_{j-\pi}$; the privacy parameter $\varepsilon_E$ of the exponential mechanism; the maximum desirable permutation cost $c_R$
Output: A permutation ψ and a prediction configuration φ, both satisfying $\varepsilon_E$-DP
1 Compute $\hat{X}(\phi)$ for all φ ∈ Φ
2 forall φ ∈ Φ such that $\hat{X}(\phi)$ has at least one non-constant column do
3     Compute $\|E(\phi)\|_F^2$ as in (17) and denote the minimizer by φ(ψ)
4 end
5 Select φ(ψ) with the exponential mechanism with privacy parameter $\varepsilon_E$

Our experimental evaluation will use two standardized data sets, known as "(Very) Large Census" and "Quant Forest", which are two of the largest data sets in the community of statistical disclosure control. For brevity, we shall refer to them as vlCensus and forest, respectively. The former data set contains 149 642 records and has 13 numerical attributes. It was previously documented and used in [44], [45], and has been chosen to adhere to the de facto convention in the area as well as for its large number of records.
The latter has 581 012 records and is based on the Forest CoverType data set available at the UCI KDD data repository [46]. Exactly as in [45], [47], we selected just the real-valued attributes, which reduced the number of attributes from 54 to 10, and for computational reasons we took the first 150 000 records. In our analysis, all attributes have been treated as quasi-identifiers and therefore all of them have been the target of protection.

Baseline Method
As we mentioned in Sec. 2.2.2, only [10] has tackled the problem of publishing DP database streams in a continuous manner. However, since that work is limited to single-attribute databases, an experimental comparison of that protection method with ours would not be meaningful.
Consequently, we can only compare our solution with the baseline approach described at the beginning of Sec. 4. The plain Laplace noise (PLN) method, as we shall call it, adds Laplace noise directly to the incoming records, without introducing delay or reordering them. Although it is a rather naive strategy, it will allow us to assess the benefits of our method and derive worst-case bounds on distortion.
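A minimal sketch of such a baseline is given below, under assumptions that are not taken from the text: each attribute is assumed to lie in an interval of known length, the L1-sensitivity of releasing a record is taken to be the sum of those lengths, and the per-record budget `eps_record` is a hypothetical parameter whose relation to the overall ε is left open here. The function names are illustrative.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def plain_laplace_noise(record, ranges, eps_record, rng=None):
    """Perturb one incoming record as soon as it arrives (no delay, no reordering)."""
    rng = rng or random.Random()
    scale = sum(ranges) / eps_record  # assumed L1-sensitivity divided by the budget
    return [value + laplace(scale, rng) for value in record]

# Hypothetical record with three attributes normalized to [0, 1]
record = [0.2, 0.5, 0.9]
noisy = plain_laplace_noise(record, ranges=[1.0, 1.0, 1.0], eps_record=0.5,
                            rng=random.Random(1))
```

Because every record is perturbed independently, the noise scale grows linearly with the number of attributes under this assumed sensitivity, which is the kind of distortion the block-based scheme tries to reduce.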

Configuration Parameters
Next, we specify the range of configuration parameters used in our experiments.

Coefficients Order
As explained in Sec. 4.3.1, scans are designed following the empirical evidence that Pr{$C_{ij}$ > 0} typically decreases with i and j. In our experiments we use the zig-zag and the diagonal scan orders shown in Fig. 3.
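For readers unfamiliar with coefficient scans, the sketch below generates two common orderings over an n × d coefficient matrix: a JPEG-style zig-zag, which alternates direction on consecutive anti-diagonals, and a plain diagonal scan, which always traverses them in the same direction. Whether these match the exact orders of Fig. 3 is an assumption, and the function name is hypothetical.

```python
def scan_order(n, d, zigzag=True):
    """Return the (row, column) positions of an n x d matrix in scan order.

    Positions are grouped by anti-diagonal (row + column constant), so
    coefficients with small indices come first.
    """
    order = []
    for s in range(n + d - 1):
        # Anti-diagonal s, listed from its top-right end to its bottom-left end
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < d]
        if zigzag and s % 2 == 0:
            diagonal.reverse()  # even anti-diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order

zz = scan_order(3, 3)                   # zig-zag scan of a 3 x 3 block
diag = scan_order(3, 3, zigzag=False)   # diagonal scan of the same block
```

Keeping only the first t positions of such an order retains the coefficients that, according to the empirical evidence above, are most likely to be non-zero.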

Block Sizes
In practice, d will be given by the database stream to be protected, and thus is fixed, whereas n is a parameter of the scheme and needs to be chosen appropriately.
In Fig. 8, we have computed the quantity $\|GS(f^{\sigma}_{c})\|_F$ for different block sizes. This quantity is central to computing the sensitivities of the scoring functions θ = 1/2 and θ = 1, and therefore to choosing t and σ; hence its importance.
The results have been obtained for Λ = 1, which is equivalent to dividing each attribute value by its maximum value, and essentially indicate that the specific value of n will not have a large impact on the sensitivity of either scoring function. The number of attributes, however, does have a greater effect on $\|GS(f^{\sigma}_{c})\|_F$, which appears to grow roughly linearly with d.
It is worth emphasizing that the transforms shown for each block size are the ones maximizing the quantity at hand. In other words, they are the worst choice, among the three transforms under study, in terms of data distortion. However, as Fig. 8 shows, the differences in terms of $\|GS(f^{\sigma}_{c})\|_F$ among the DCT, DST and DHT are small.
Although n does not seem to have an impact on the sensitivities of the scoring functions θ = 1/2 and θ = 1, it does pose various trade-offs in our DPCM scheme. For example, the larger n, the larger the number of transformed coefficients, and the more Laplacian noise will be added to each of them, but the larger the coding efficiency of transform coding.¹³ Likewise, the smaller n, the fewer permutations will be available for the preprocessing module, and therefore the worse the prediction $\hat{X}$ of $X$. In order to capture the effect of n on the proposed scheme, our experiments will be conducted for block lengths of 8, 16, 32 and 48 records.

13. The coding efficiency of transform coding typically increases with the block size. Nonetheless, the potential gains may become insignificant beyond a certain block size [35].

Fig. 10: Average distortion versus privacy protection for a fixed delay δ = 7 872, several block sizes n and two allowed reordering costs $c_R \in \{1, n^2/2\}$ in the data set vlCensus. The baseline approach and the proposed solution are represented with black and coloured lines and points, respectively.

Preprocessing
In those cases when the preprocessing module is to operate at the extreme regions of the prediction-reordering trade-off, we shall use the strategies described in Sec. 4.5.1 to alleviate the computational burden on the module. In the low-reordering case, we shall employ S1 for n = 8, 16 and S2 for n = 32, 48. In any case, we shall set a timeout of 2 seconds for the computation of either the original problem (17) or the strategies S1 and S2.

Distortion Metric and Privacy Parameters
We use the sum of squared errors (SSE) to evaluate the impact on distortion caused by anonymization. The SSE is a measure of overall information loss that is frequently employed in the evaluation of statistical disclosure control methods.
On the other hand, we shall conduct our series of experiments for levels of privacy protection in the interval ε ∈ [1, 3], which covers the usual range of values observed in the literature [51], [52], [15], [16], [17]. In this regard, we shall set ε

Lastly, note that the sensitivity values derived in Section 4 are essentially proportional to the length of the intervals in which these attributes take values. Since the attributes of our two data sets are not naturally upper-bounded, we need to delimit the domain of each attribute. For the sake of comparison, we follow the methodology described in [15], [53], [16], [54] and upper-bound the domain of an attribute to be 1.5 times the maximum value of this attribute in the data set.
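The two quantities just described are simple to compute. The sketch below shows one way to do so, assuming the data set is held as a list of numeric records; the function names and the toy values are hypothetical, and the handling of lower bounds is deliberately left out because it is not specified above.

```python
def attribute_upper_bounds(data, factor=1.5):
    """Upper bound of each attribute: `factor` times its maximum value in the data set."""
    return [factor * max(column) for column in zip(*data)]

def sse(original, protected):
    """Sum of squared errors between the original and the protected records."""
    return sum((a - b) ** 2
               for rec, prot in zip(original, protected)
               for a, b in zip(rec, prot))

# Toy data set with two records and two attributes (hypothetical values)
data = [[1.0, 10.0], [2.0, 4.0]]
released = [[1.0, 9.0], [4.0, 4.0]]
bounds = attribute_upper_bounds(data)
distortion = sse(data, released)
```

Since the sensitivities scale with the interval lengths, a looser bound on the domain directly translates into more noise for the same ε.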

Results
First of all, it should be noted that the series of experiments shown in the sequel have been conducted for the scoring functions $\omega_{1/2}$ and $\omega_1$. However, since the observed differences are negligible, we just report on the results for one of them, namely, $\omega_1$.
Fig. 9 shows average¹⁴ distortion values for ten equally spaced values of ε within the interval [1, 3]. In our experiments, we set δ = m, which means all records experienced a delay of m records, that is to say, a frame¹⁵; and evaluated the proposed system for four delay-constraint values (shown in the figure), which account for roughly 0.5%, 3.65%, 6.82% and 10% of the total length of the data set. Furthermore, we allowed a reordering cost of half of the maximum acceptable cost, that is, $c_R = (n^2/2)/2$. The log-distortion obviously decreases with ε, and does so in an almost linear way, both in our system (coloured lines) and in the baseline approach (black lines). In any of the four subfigures, Figs. 9(a-d), we can see that higher delays translate into lower distortion. This is not because there are more record blocks available for inter-prediction or block matching, as these parameters are fixed; it is simply because $\varepsilon_j$ is larger, on account of the fact that $\varepsilon_j = \varepsilon\, m/l$ and m = δ. Also, in the process of decreasing distortion, the effect of the delay is much more important in our system than in the baseline approach, essentially because the latter does not leverage the delay for anything other than increasing $\varepsilon_j$.

14. Given the randomness of the DP mechanisms employed, we used one hundred repetitions for each combination of system parameters and averaged all of them.

15. Recall that m is the number of records within a frame.
In comparative terms, it may seem that there is not a large difference in distortion between our solution and the baseline approach. However, there is: a reduction of 0.3 or 0.4 in $\log_2$ SSE in fact represents a relative reduction of 23.11% or 31.95% in SSE. This is what we observe in Fig. 9(d): our approach yields 32% less distortion than the baseline solution for ε = 3, n = 48, δ = 14 928 and $c_R$ = 576. However, for the smallest delay value (δ = 720), it appears that larger block sizes do not diminish distortion much. It should be noted, though, that these gain margins are obtained despite the low values of $\varepsilon_j$ our system operates with, going approximately from 0.0048 (when δ = 720 and ε = 1) to 0.2986 (when δ = 14 928 and ε = 3).
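The arithmetic behind these figures can be checked with a few lines of Python. The sketch is only a reading aid and makes two assumptions: the quoted percentages appear to correspond to the ratio $2^{\Delta} - 1$ between the two SSE values (the same gap measured relative to the baseline SSE would be $1 - 2^{-\Delta}$, which is smaller), and the stream length l = 150 000 is the value that reproduces the reported interval of $\varepsilon_j$. The function names are hypothetical.

```python
def relative_gap(delta_log2):
    """Excess of the larger SSE over the smaller one for a gap of delta_log2 in log2 SSE."""
    return 2.0 ** delta_log2 - 1.0

def relative_reduction(delta_log2):
    """The same gap expressed as a fraction of the larger (baseline) SSE."""
    return 1.0 - 2.0 ** (-delta_log2)

def block_epsilon(eps, m, l):
    """Per-frame privacy parameter eps_j = eps * m / l."""
    return eps * m / l

gap_03 = round(100 * relative_gap(0.3), 2)        # 23.11
gap_04 = round(100 * relative_gap(0.4), 2)        # 31.95
red_03 = round(100 * relative_reduction(0.3), 2)  # 18.77
red_04 = round(100 * relative_reduction(0.4), 2)  # 24.21

eps_low = round(block_epsilon(1, 720, 150_000), 4)      # 0.0048
eps_high = round(block_epsilon(3, 14_928, 150_000), 4)  # 0.2986
half_max_cost = (48 ** 2 // 2) // 2                     # 576 for n = 48
```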
For a fixed delay, Fig. 9 shows how the distortion decreases with the pair (n, $c_R$). This seems to indicate that the coding efficiency of transform coding increases with n (despite the fact that we may have potentially more coefficients and thus more noise added to them) and/or that a greater number of permutations available for the preprocessing module notably improves the prediction $\hat{X}$ of $X$. Fig. 10 clarifies this latter point. Here we show the distortion for a fixed delay m = 7 872 and two values of $c_R$, namely, $c_R = 1$ (no reordering allowed) and $c_R = n^2/2$ (no constraints on reordering). The results indicate that the gains due to allowing any reordering are not significant, which suggests that the block size has a greater impact on distortion. In short, it seems that, out of the main parameters controlling the trade-off among distortion, delay, reordering and privacy, n and δ have a greater effect on distortion than $c_R$, at least in this data set.
Fig. 11 shows the same variables as Fig. 9 but for the data set forest. In general, we can observe a behaviour very similar to that in vlCensus, including the same small impact of reordering on distortion reduction. There are some slight differences, however. First, the minimum difference in distortion between the baseline approach and our scheme, which is observed for δ = 720, n = 8, $c_R$ = 16 and ε = 1, is 0.66%, whereas in vlCensus it is 0.078%. Secondly, the maximum difference in distortion between our solution and the baseline approach is observed, as in the data set vlCensus, in Fig. 11(d) and amounts to 31%.
Fig. 13 shows the average processing time per block we recorded in the computation of Figs. 9, 10, 11 and 12. We observe that the 75th percentile for vlCensus is 0.6038, 0.2495, 0.1039 and 0.02456 seconds for n = 8, 16, 32 and 48, respectively, and 0.5997, 0.2141, 0.0991 and 0.0214 seconds for forest. We also notice that the processing time is slightly greater for the vlCensus data set, which is consistent with its 3 additional numerical attributes. In this regard, we would like to emphasize that the efficiency of our method for high-dimensional stream databases (i.e., large d) will depend on the efficiency of the employed transforms. As a matter of fact, the computational burden on the analysis and synthesis blocks represents, on average, 41% of the time needed to protect a block.

PREVIOUS WORK ON DIFFERENTIALLY-PRIVATE TRANSFORM CODING
As described in Secs. 2.2.2 and 5.2, the only work that has dealt with the problem of publishing DP database streams is [10]. Nonetheless, we could not compare our approach with that work experimentally since it just operates with single-attribute databases and does not allow record updates.
Although to the best of our knowledge this is the only such work, the conceptual approach presented here shares some similarities with two distinct protection methods, [55] and [56]. Although neither of them is intended for database streams, for the sake of rigor we deem it appropriate to highlight the main differences between those two works and ours.
The former work, [55], aims to answer a fixed number of queries over time-series data under DP. To this end, the authors propose a protection method called sampling perturbation algorithm (SPA) that perturbs the one-dimensional discrete Fourier transform (DFT) of such query answers. In particular, the SPA chooses the number of such coefficients adaptively with the exponential mechanism by sampling a multidimensional hyperbolic distribution and then perturbing them. On the other hand, [56] aims to protect static histograms with DP. The proposed solution, called enhanced SPA (ESPA), essentially uses a different scoring function in the exponential mechanism of the SPA.
First and foremost, we would like to emphasize that SPA and ESPA are not aimed at, nor can be trivially adapted to, database streams.¹⁶ Secondly, their fundamental operation relies merely on a single, one-dimensional transform-coding scheme and the elimination of certain coefficients; they do not address the problem through a hybrid video coding approach, nor interpret the processing of those coefficients as a quantization step, nor consider prediction, encoding control or data permutations. Thirdly, [55] and [56] capitalize upon the DFT of the original input data, whereas our work operates with the two-dimensional DCT, DST and Hartley transforms of the residual signal. Fourthly, our approach distributes the privacy budget among the transformed coefficients in an optimal fashion, so as to minimize the MSE. Fifthly, we use a family of parameterized scoring functions in the exponential mechanism to select not only the number of transform coefficients but also the type of transform. Finally, our approach leverages the exact sensitivity of those coefficients, while SPA and ESPA operate with a sensitivity bound, whose mathematical derivation is flawed.

16. Note that [56] does not even address the case of continuous data.

CONCLUSIONS AND FUTURE RESEARCH
With the advent of big-data analytics, complying with current data-protection frameworks in Europe and some Western countries has become very challenging. Our work focuses on the anonymization of database streams (a particular class of dynamic data), a technique whereby data controllers can legitimately circumvent such legal frameworks.
Among a variety of privacy notions, DP is one of the most popular in the scientific community working in data anonymization. In this work, we have tackled the protection of database streams with DP in the compelling case when the data controller wishes to publish those streams, rather than statistics derived from them.
We have proposed an anonymization method that can publish multiple numerical-attribute, finite database streams with DP guarantees and provide high protection as well as high utility in terms of data distortion, delay and record reordering.
The proposed method, which relies on the DPCM compression scheme, adapts techniques originally intended for hybrid video encoding, to favor and leverage dependencies among the blocks of the stream to be protected. In video coding, the exploitation of statistical dependencies can enhance coding efficiency and reduce the information contributed by image blocks and frames. In our context of database anonymization, we have shown that the adapted techniques can help introduce significantly less distortion.
We have designed our method to operate with blocks of records going through a series of modules analogous to those of the DPCM scheme, except for the preprocessing module. The design of our solution has been optimized in a number of different ways to minimize the MSE incurred in releasing the synthesized, protected block (instead of the original one).
Our extensive experimental evaluation demonstrates the suitability of utilizing hybrid video encoding to publish DP database streams. For the two data sets under study, we have shown that our method can achieve a relative reduction in SSE of 32% and 31% with respect to the baseline approach for the vlCensus and forest data sets, respectively. Remarkably enough, these results have been obtained for extremely low values of $\varepsilon_j$ (i.e., for extremely high values of block protection), in the interval [0.0048, 0.2986].
We have also observed that distortion decreases with the block size and the maximum acceptable reordering cost, which suggests that the coding efficiency of transform coding increases with the former parameter and/or that a larger number of permutations at the preprocessing module significantly reduces the prediction error. Furthermore, our experimental results seem to confirm that the block size and the delay have a greater effect on distortion than the maximum acceptable reordering cost.
With this work we have also shown the riveting interplay between the field of information privacy on the one hand, and on the other the fields of data compression and video encoding, while bridging the gap between the respective communities.
Finally, an interesting and necessary avenue for future research is to develop anonymization algorithms for database streams containing categorical attributes.

Fig. 1: Overview of the proposed scheme to generate DP database streams. Dashed and continuous lines indicate that the data at those points are protected and unprotected, respectively.

Fig. 2: (a-c) Relative difference in L1-sensitivity, as defined in (6), among the discrete cosine, sine and Hartley transforms for two block sizes, n = d = 8 (top-row figures) and n = d = 16 (bottom-row figures). (d) Transform with the minimum sensitivity value for each coefficient, for n = d = 8 (top figure) and n = d = 16 (bottom figure). The results have been computed for Λ = 1.

Fig. 4: (a) Trade-off between the Laplace-noise error and the coefficients-removal error, and (b) minimum MSE due to quantization. Each black point in (a) corresponds to one of the 33 × 14 possible values of t for which the Laplace-noise error is minimized. These points constitute the optimal trade-off. The points in gray, on the other hand, reflect a non-optimal assignment of ε. In (b), we observe that t = 2 minimizes the minimum MSE. In this example, ε Q L = 1 and Λ is the maximum value of each attribute within the block.

Fig. 6: Vertical intra-prediction modes of the standards H.263 (left) and H.264/MPEG-4 AVC (right). The former estimates $X_j$ from the column averages of the previously reproduced block $X_{j-1}$. The latter directly uses adjacent samples of already protected blocks.

Fig. 7: Reduction in execution time and increase of the minimum prediction error provided by the two proposed strategies (S1 and S2), when the system is designed to operate at low permutation costs $c_R$. The results have been obtained for different block lengths n and for d = 16 and r = 2.

Fig. 8: Quantity $\|GS(f^{\sigma}_{c})\|_F$ for different block sizes and for the three transforms under study. From the figure, we note that this quantity does not vary significantly with the block size.

Fig. 9: Average distortion versus privacy protection for several values of record delay δ, block size n and maximum allowed reordering cost $c_R$ in the data set vlCensus. The baseline approach and the proposed solution are represented with black and coloured lines and points, respectively.

Fig. 11: Average distortion versus privacy protection for several values of record delay δ, block size n and maximum allowed reordering cost $c_R$ in the data set forest. The baseline approach and the proposed solution are represented with black and coloured lines and points, respectively.

Fig. 12: Average distortion versus privacy protection for a fixed delay δ = 7 872, several block sizes n and two allowed reordering costs $c_R \in \{1, n^2/2\}$ in the data set forest. The baseline approach and the proposed solution are represented with black and coloured lines and points, respectively.

Fig. 10
clarifies this latter point.Here we show the distortion for a fixed delay m = 7 872 and two values of c R , namely, c R = 1 (no reordering allowed) and c R = n 2
1, and x = 0.The input data are an 8 × 8 block corresponding to the last 8 records and first 8 attributes of the "Census" data set (a,b); and a 48 × 13 block corresponding to the last 48 records and all attributes of the same data set (c,d).The pairs (t, σ) that minimize ξ X (ε * , κ) are (1, c) for the former block and (2, h) for the latter.