We also want to go a bit further by creating attacks on our encrypted tables. According to the assumptions for the attacks given in [4], we will need to use multiple sensitive attributes: indeed, if we consider just one sensitive attribute, i.e. just one attribute used for the indexation, we won't be able to discover more information with our attacks.
Another challenging part of our paper is the generalization of QOB to two attributes by using, among other things, the Hilbert curve.
3. OUR IMPLEMENTATION
3.1 Technologies
We decided to use Java because it is the programming language we know best, and we naturally used Eclipse (cf. [2]) for our development. Moreover, to facilitate the project we used a free Subversion hosting provider, Project Locker (cf. [6]), and to control and check our work we used JUnit tests [5]. We also used the JavaDoc in order to facilitate the reading of the code2.
3.1.1 Subversion
Subversion (SVN) is a version-control system: it allows people to work at the same time on the same files, which is what we need for a team project. Project Locker is a free (for fewer than 5 users) SVN provider; for more information about this provider cf. [6].
Subclipse (a plug-in for Eclipse, cf. [7]) makes the use of SVN really easy: we just have to connect to our SVN3 to synchronize our project. In addition to the SVN, Project Locker provides a Trac, which is a project manager using Scrum and a Wiki.
3.1.2 JUnit
JUnit is a unit testing framework for Java. It allows us to execute tests, which is an additional tool to check the merging of our work. Basically, for each method we want to test, we specify the expected behaviour according to its inputs. Please refer to appendix A.2 page 9 for more information about its installation.
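To illustrate the idea, here is a minimal, self-contained sketch (the class, the method and the toy data are ours, not the project's): we state the expected result of a simplified range-query method for a given input, in the same spirit as a JUnit assertEquals, and fail loudly when reality disagrees.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeQueryTest {

    // Toy stand-in for a range query: keep the values inside [low, high].
    static List<Integer> applyQuery(List<Integer> values, int low, int high) {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            if (v >= low && v <= high) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected behaviour for a known input, as in a JUnit test.
        List<Integer> expected = List.of(49, 75);
        List<Integer> actual = applyQuery(List.of(3, 49, 75, 100), 10, 80);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("applyQuery test passed");
    }
}
```

In the real project the same pattern is written with JUnit annotations and assertions instead of the manual check above.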
3.2 Model of a database
3.2.1 Class Diagram
To facilitate your understanding, please refer to figure 1 (page 3), which represents a simplified class diagram of our
2 If you like object-oriented programming, you may find some beauty in our code, well at least we do.
3 our SVN (limited access):
4 The class diagrams of this paper have been made with
MetaUML, a MetaPost library.
[Class diagram (image not recoverable): IBucket (l: int), specialized by OptimalBucket (id, lB, hB: int — the id is the value of the index that appears in the encrypted table); IPartition containing 0..* buckets, specialized by CompositePartition; Table (name: String) holding 0..* Tuple (value: int), with createDataSet(): DataSet, applyQuery(query: RangeQuery): List<Tuple> and applyQueryOnSecuredTable(query: RangeQuery, partition: IPartition): List<Tuple>; Table <<creates>> DataSet (frequencies: Map<Integer,Integer> — the mapping between the values of the attribute and their frequencies).]

Figure 1: Our first implementation: single attribute
[Histogram (image not recoverable); data labels: 6, 4, 3, 2, 1, 1, 1, 1, 0, 0 for the values 0 to 9.]

Figure 2: DataSet for our JUnit tests
first model of the database and of the bucketization. The current model is a bit more complex because of the multi-attribute possibilities.
3.2.2 DataSet
The class DataSet is used to easily model the distribution of an attribute; you can see in figure 2 (page 3) the specific dataset used for our basic JUnit tests.
3.2.3 Multi-attribute
The scope of our paper concerning multi-attributes is two sensitive attributes, but our model is easily generalizable to more attributes. To deal with multi-attribute tuples, we have implemented the class IValue (cf. figure 3 page 3); most of our classes became parametrized and the type of IValue used must be specified. For instance, here is how you create an instance of Table and add some tuples:
Example 2.

// Creation of a Table with one sensitive
// value that will be used for indexation.
Table<ValueOneAttribute> myTable =
    new Table<ValueOneAttribute>("myTable");
// Adding a tuple to the table with
// 49 as its sensitive value.
myTable.addTuple(
    new Tuple<ValueOneAttribute>(
        new ValueOneAttribute(49)));

5 The histograms of this paper have been made with Microsoft Excel.

[Class diagram (image not recoverable): IValue, implemented by Value and DValue; ValueOneAttribute (attribute1: int); ValueTwoAttributes (attribute1, attribute2: int); DoubleValueOneAttribute (attribute1: double); DoubleValueTwoAttributes (attribute1, attribute2: double).]

Figure 3: Model upgrades for two attributes
The multi-attribute implementation brought some changes: the classes (of figure 1) that deal with unidimensional values now have a new attribute columnName in order to know which column they are linked to.
3.3 Indicators
In order to perform some measures on the tables and on the partitions, we have implemented the indicators introduced in [4]. In our Java package indicator, we have the classes Indicators and StatIndicators, which contain the following methods. For more information about the implementation of these basic methods please refer to the JavaDoc of our project or directly to our code.
3.3.1 Total number of False Positive (TFP)

Indicator 1. As defined in subsection 3.2 of [4]:

TFP = Σ_{q∈Q} ( |R^S_{T(q)}| − |R_q| )

where:

• Q: set of all "legal" range queries over R, the clear-text relation.
• R_q: set of tuples satisfying query q.
• R^S_{T(q)}: set of tuples in R^S (the encrypted and bucketized relation) satisfying T(q), the translated query.
• |·|: the cardinality operator.

TFP is the sum over all queries of the "false positives": the tuples returned by the server that do not satisfy the original query.

For the implementation of totalFalsePositive(.), we basically use two methods in order to get the tuples satisfying the query from the clear-text table and from the pseudo-encrypted table:

• applyQuery(.)
• applyQueryOnSecuredTable(.)

3.3.2 Average Query Precision (AQP)

Indicator 2. As defined in subsection 3.2 of [4], AQP can be expressed as:

AQP = Σ_{q∈Q} |R_q| / Σ_{q∈Q} |R^S_{T(q)}| = 1 − TFP / Σ_{q∈Q} |R^S_{T(q)}|

As for our implementation of TFP, we use the same principle for averageQueryPrecision(.).

Some considerations on AQP are used to introduce the QOB algorithm in subsections 3.2 and 3.3 of [4] (page 723).

3.3.3 Variance

Indicator 3. As defined in [10], the variance of a random variable X taking n distinct values x_i, i ∈ [1, n], with corresponding probabilities p_i, i ∈ [1, n], is given by:

Var(X) = Σ_{i=1}^{n} p_i × (x_i − E(X))²   where E(X) = Σ_{i=1}^{n} p_i × x_i

If the dimension of X is superior to one, we use the Euclidean norm to generalize the distance between x_i and E(X). It gives:

Var(X) = Σ_{i=1}^{n} p_i × ‖x_i − E(X)‖²

where E(X) and the x_i, i ∈ [1, n], are vectors. A classic way to write it is:

Var(X) = E( ‖X − E(X)‖² )

Just note that, in our model of a database, we consider the value of the tuples as a random variable. The point is that, for multi-attributes (say two attributes), we can either see one two-dimensional random variable or two one-dimensional random variables. In our attacks this consideration will be quite important.

3.3.4 Standard Deviation

For the experiments conducted in section 5.2 (page 5), we will use the standard deviation.

Indicator 4. As classically defined, the standard deviation σ of a random variable X is the square root of its variance:

σ² = Var(X)

3.3.5 Entropy

Indicator 5. As defined in subsection 4.3 of [4], the entropy of a random variable X taking n distinct values with corresponding probabilities p_i, i ∈ [1, n], is given by:

Entropy(X) = H(X) = − Σ_{i=1}^{n} p_i × log₂(p_i)

3.3.6 Average Squared Error of Estimation (ASEE)

Indicator 6. As defined in definition 4.1 of subsection 4.2 of [4], ASEE is defined through the distribution of the values within a bucket B. If there are N values in the domain of B, the ASEE formula can be written as:

ASEE(X_B, X'_B) = Σ_{j=1}^{N} Σ_{i=1}^{N} p_B(x_i) × p'_B(x_j) × (x_i − x_j)²

Assumptions: X_B is a random variable that follows the same distribution as the elements of bucket B, and p_B denotes X_B's probability distribution; x_i is a value of X_B. X'_B is a random variable coming from another assumption of statistical estimator (external assumption), p'_B is the probability distribution of X'_B, and x_j is a value guessed from X'_B.

Goal of the adversary: to estimate the true value of a random element chosen from the bucket B.

In this project, our group also compares the result of the ASEE formula above with theorem 4.2 of [4].

Property 1. This theorem makes explicit a relation between ASEE and the variances and expectations of the random variables:

ASEE(X, X') = Var(X) + Var(X') + (E(X) − E(X'))²
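To make Indicator 5 concrete, here is a minimal sketch of an entropy computation over a DataSet-like frequency map (value → number of occurrences). The class and method names are ours; in the project itself this kind of measure lives in StatIndicators.

```java
import java.util.Map;

public class EntropySketch {

    // H(X) = - sum_i p_i * log2(p_i), with p_i derived from raw counts.
    static double entropy(Map<Integer, Integer> frequencies) {
        int total = 0;
        for (int f : frequencies.values()) {
            total += f;
        }
        double h = 0.0;
        for (int f : frequencies.values()) {
            if (f > 0) {
                double p = (double) f / total;            // p_i
                h -= p * (Math.log(p) / Math.log(2.0));   // - p_i * log2(p_i)
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // Four equally likely values: the entropy is log2(4) = 2 bits.
        System.out.println(entropy(Map.of(1, 5, 2, 5, 3, 5, 4, 5)));
    }
}
```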
3.4 Bucketization algorithms
3.4.1 Query-Optimal-Bucketization (QOB)
3.4.2 Controlled-Diffusion (CD)
4. CONFRONTATION OF THE QOB BUCKETIZATIONS

Table 3: Deductive Sudokus are easy, right?
[Sudoku grid not recoverable from the extraction.]
In [4], they introduced composite buckets, which are supposed to answer a lack of privacy of the QOB "cutting" partition (but not of the mapping of the indexes). We want to show this lack of privacy by designing two attack algorithms.
4.1 Assumptions
We assume that the attacker can access the encrypted table and knows the partition and the exact distribution (i.e. the DataSet of the table).
The objective of the attacker is to obtain the mapping between the indexes and the partition. In that case he will be able to place each eTuple (the old and the new ones) into its associated bucket, and he can know an approximation (depending on the width of the buckets) of the sensitive columns.
4.2 Our Basic Attack Algorithm (BAA)
4.2.1 Principle
4.2.2 Pseudo-Code
4.3 Our Sudoku Attack Algorithm (SAA)
4.3.1 Origin of the name
We have called it the Sudoku Attack Algorithm because the way we find the mapping is deductive, like medium-level Sudokus are. Basically, we use the certainty of a specific link between an index and a bucket to increase the probability on the mapping of the other indexes and buckets.
You can enjoy the sudoku6 in table 3 (page 5); you can start by placing the "1", "4", "6" and "3" in the bottom-right 3×3 square... Notice that you have to put the "1" first. If you try to put the "4" first, you have two possibilities, i.e. a 0.5 probability of being right. Well, it is similar when revealing the mapping, except that for the sudoku we know that the solution can be found.
You can find the solution of this sudoku in table 6 (page 9).
4.3.2 Principle
Figure 4: DataSet generated normally at random
4.3.3 Pseudo-Code
5. EXPERIMENTS
5.1 Implementation
To conduct some experiments we need data to simulate information. For our experiments, we consider two sources of information, modeled by two random variables that are respectively:
• Uniformly distributed
• Normally distributed
Property 2. If we denote by N(µ, σ²) the normal distribution7 with mean µ and variance σ², we have:

N(µ, σ²) = σ · N(0, 1) + µ
With this property, we have easily implemented some functions that create a table with random values. In figure 4 (page 5) we asked for 100 tuples following N(100, 30): the generated table has an expectation of 100.74 and a variance of 28.76. In figure 5 (page 6) we asked for 1000 tuples following the uniform distribution on [0, 99].
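Property 2 translates directly into code: a standard normal draw is scaled by σ and shifted by µ. Here is a hedged sketch of such a generation function (the names, the rounding to integers and the explicit seed are our choices for this illustration):

```java
import java.util.Random;

public class NormalTableSketch {

    // Produce `count` integer values following (approximately, after
    // rounding) the normal distribution N(mu, sigma2).
    static int[] randomNormalValues(int count, double mu, double sigma2, long seed) {
        Random random = new Random(seed);
        double sigma = Math.sqrt(sigma2);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            // nextGaussian() follows N(0, 1); apply sigma * N(0,1) + mu.
            values[i] = (int) Math.round(sigma * random.nextGaussian() + mu);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = randomNormalValues(100, 100.0, 30.0, 42L);
        double mean = 0.0;
        for (int v : values) {
            mean += v;
        }
        // The empirical mean should be close to mu = 100.
        System.out.println("empirical mean ~ " + mean / values.length);
    }
}
```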
5.2 Remake
5.2.1 Datasets and experimental setup
To “remake” the experiments of [4] we use two sets of data and two corresponding sets of queries:
6 This sudoku has been found in [9].
7 For more information about the normal distribution you may refer to [8].
Figure 5: DataSet generated uniformly at random
Figure 6: An Experimental Uniform DataSet
1. Uniform Data Set: It has been produced like the "Synthetic Data Set" in subsection 7.1 of [4] (page 728). It consists of 10^5 integer values generated uniformly at random from the domain [0, 999]. You can see a representation of it in figure 6 (page 6). In theory this dataset has an expectation of 500 and a variance of 83333.5.
2. Normal Data Set: We use our method to produce 10^5 integer values generated normally at random, following N(500, 83333.5). You can see a representation of the dataset in figure 7 (page 6); the domain created in this illustration is [−759, 1880].
3. Random Range Query Sets: We also generated two different sets of 100000 queries (one for each previous data set). Each query is created uniformly at random by picking two integers in the domain of the corresponding data set.
4. List of parameters: For each table we compute the
QOB and CD algorithms for:
• M = {100, 150, 200, 250, 300, 350, 400}
• K = {2, 4, 6, 8, 10}
So for each table we will have 7 optimal partitions and 35 composite partitions.
5.2.2 Procedure
Because it is a remake, we won't try to be original, not here anyway. We have conducted the same experiments as the ones presented in [4].
In order to protect ourselves from an OutOfMemoryError, we had to cut the experiment into smaller parts. Another
Figure 7: An Experimental Normal DataSet
problem that we had was to calculate the AQP rate for our experimental setup.
Indeed the basic calculation (for our first implementation) is, for one cutting partition, in O(#queries · #tuples) and, for one composite partition, in O(#queries · M² · #tuples).
The trick to reduce the computation is to first compute the DataSet of the encrypted tables and to sum the frequencies of the buckets that are returned by the queries, instead of searching for the matching tuples in the whole tables. We wrote an O(#queries · M) algorithm that directly returns the AQP rate, which is still long but more reasonable.
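The per-query part of this trick can be sketched as follows (a simplified illustration under our own naming and array-based layout, not the project's actual classes): one pass over the M buckets suffices to count the tuples the server would return, from which the false positives of a query follow.

```java
public class FastAqpSketch {

    // bucketLow[i]..bucketHigh[i] is the value range of bucket i, and
    // bucketCount[i] the number of tuples it holds (the encrypted table's
    // DataSet). exactMatches is |R_q|, known from the clear-text DataSet.
    static long falsePositives(int[] bucketLow, int[] bucketHigh, long[] bucketCount,
                               int queryLow, int queryHigh, long exactMatches) {
        long returned = 0;
        for (int i = 0; i < bucketCount.length; i++) {   // O(M) per query
            boolean overlaps = bucketHigh[i] >= queryLow && bucketLow[i] <= queryHigh;
            if (overlaps) {
                returned += bucketCount[i];              // the whole bucket is fetched
            }
        }
        return returned - exactMatches;                  // false positives for q
    }

    public static void main(String[] args) {
        int[] low = {0, 10, 20};
        int[] high = {9, 19, 29};
        long[] count = {4, 6, 5};
        // Query [5, 12] touches buckets 0 and 1, so 10 tuples are returned;
        // if only 3 really match, 7 are false positives.
        System.out.println(falsePositives(low, high, count, 5, 12, 3)); // prints 7
    }
}
```

Summing these per-query counts over all queries gives TFP, and hence the AQP rate, without ever scanning the tuples themselves.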
We made a global script to print all the measures made on a table and its partitions:
Algorithm 1. Experiment algorithm

Creation of a random Table and of its DataSet.
Creation of random queries.
Computation of the BC table for the QOB algorithm.
Realization of measures on the clear-text DataSet.
FOR every M
    QOB(M) on the DataSet.
    Realization of measures on the Optimal DataSet.
    FOR every K
        CD(M,K) on the DataSet.
        Realization of measures on the Composite DataSet.
    endFOR
endFOR
We applied it to our “uniform” and “normal” tables.
5.2.3 Experiments
You may find our results in the attached file RemakeExperiments.xlsx.
[Figures 8 to 13 (chart images not recoverable): ratios of precision, standard deviation and entropy against the number of buckets (100 to 400), for the uniform and normal distributions and for k = 2, 4, 6, 8, 10.]

[Figures 14 to 17 (chart images not recoverable): trade-offs of precision versus standard deviation and of precision versus entropy, for the uniform and normal distributions, for the optimal partition (OP) and for k = 2, 4, 6, 8, 10.]

Time taken. The table ?? (page ??) shows the correspondence of the time needed for every part of our experiment for the normally distributed table. The total time to run the experiment algorithm with our parameters is 57 min.

Table 4: Time taken (uniform distribution)
[Table contents not recoverable.]

Table 5: Time taken (normal distribution)
[Table contents not recoverable.]
5.3 Attacks
6. OUR SOLUTION
7. CONCLUSIONS
Blah blah, filler or not, because we are from Supélec: we do not do what we say and, above all, we do not say what we do... Anyway, this is just a bit of content to test the page layout. QED, isn't it cool?
8. REFERENCES
[1] E. Damiani, S. D. C. Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati. Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs. In CCS '03: Proceedings of the 10th ACM conference on Computer and communications security, pages 93–102, New York, NY, USA, 2003. ACM.
[2] Eclipse.
[3] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra. Executing SQL over Encrypted Data in the Database-Service-Provider Model. In SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 216–227, New York, NY, USA, 2002. ACM.
[4] B. Hore, S. Mehrotra, and G. Tsudik. A Privacy-Preserving Index for Range Queries. In VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases, pages 720–731. VLDB Endowment, 2004.
[5] JUnit.
[6] Project Locker.
[7] Subclipse.
[8] Gaussian distribution.
[9] Sudoku.
[10] Variance - Discrete case.
APPENDIX
A. INSTALLATION
In this appendix you may find some basic instructions to help you use our Java project. If you have any difficulties you can contact us during the one-week warranty.
A.1 Java Project
To install Eclipse, please refer to [2].
To import the project into Eclipse:
• File -> New -> Java Project
• Create project from existing source: select the unzipped folder DatabasesSecurity
• Write "DatabasesSecurity" in the Project Name (if necessary)
• Click on Finish
If you are comfortable with Eclipse and Subclipse you might want to download the project directly from our SVN; if you want to do so, please ask us for access to our Project Locker account on [6].
A.2 JUnit
We used junit-4.8.1.jar, which you can download from [5]; after that you might want to:

• Right click on the DatabasesSecurity project -> Build Path -> Configure Build Path
• Select the Libraries tab
• Click on Add External JARs and select the recently downloaded .jar
Now you should be able to run the tests. In order to do so, right click on what you want to run (the test source folder, a package or even a class) and Run As -> JUnit Test.
The JUnit tab should open automatically.
B. SOLUTION OF THE SUDOKU
Here is the solution of the sudoku in the table 3 (page 5):
Table 6: The solution of the sudoku
2009/2010 (Semester 2) - 9 / 9 -
Monika Damien Forest Rattanak Chhung
Yann-Loup Phan Van Song