Crate scrambledb
source ·Expand description
§ScrambleDB
This document describes ScrambleDB
, a protocol between several
parties for the pseudonymization and non-transitive joining of data.
§Overview and Concepts
ScrambleDB
operates on tables of data, where a table is a collection
of attribute entries for entities identified by unique keys.
ScrambleDB
offers two sub-protocols for blindly converting between different
table types.
§Conversion from plain tables to pseudonymized columns
A plain table contains attribute data organized by (possibly
sensitive) entity identifiers, e.g. a table might store attribute
data for attributes Address
and Date of Birth (DoB)
under the
entity identifier Full Name
:
Full Name (Identifier) | Address | DoB |
---|---|---|
Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |
The result of ScrambleDB
pseudonymization of such a
table can be thought of as computed in two steps:
- Splitting the original table by attributes, resulting in single-column tables, one per attribute, indexed by the original identifier.
Full Name (Identifier) | Address |
---|---|
Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire |
Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire |
Full Name (Identifier) | DoB |
---|---|
Bilbo Baggins | Sept. 22 1290 |
Frodo Baggins | Sept. 22 1368 |
- Pseudonymization and shuffling of split columns, such that the original identifiers are replaced by pseudonyms which are unlinkable between different columns.
Pseudonym (Identifier) | Address |
---|---|
pseudo1 | 1 Bagshot Row, Hobbiton, the Shire |
pseudo2 | 1 Bagshot Row, Hobbiton, the Shire |
Pseudonym (Identifier) | DoB |
---|---|
pseudo3 | Sept. 22 1368 |
pseudo4 | Sept. 22 1290 |
Since the result of pseudonymizing a plain table is a set of pseudonymized single-column tables we refer to this operation as a split conversion.
§Conversion from pseudonymized columns to non-transitively joined tables
Pseudonymized columns may be selectively re-joined such that the
original link between data is restored, but under a fresh pseudonymous
identifier instead of the original (sensitive) identifier. In the
above example, a join of pseudonymized columns Address
and DoB
would result in the following pseudonymized joined table.
Join Pseudonym (Identifier) | Address | DoB |
---|---|---|
pseudo5 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
pseudo6 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |
The contained pseudonyms are fresh for each join and are non-transitive, i.e. it is not possible to further join two join-results based on the join pseudonym.
Since the result of this conversion is a joined table, we refer to the operation as a join conversion.
§Data Sources, Stores and Converter
ScrambleDB
is a multiparty protocol where parties serve different
roles as origins or destinations of data.
Non-pseudonymized data originates at a data source.
Data stores hold pseudonymized data and come in two forms:
- The data lake is a designated data store which stores pseudonymized data columns fed to it by data sources via the ScrambleDB protocol.
- A data processor is a data store which acquires pseudonymized joined tables from a data lake via the ScrambleDB protocol.
The converter facilitates the protocol in an oblivious fashion by blindly performing the two types of conversion operations.
§Cryptographic Preliminaries
§Rerandomizable Public Key Encryption
A rerandomizable public key encryption scheme RPKE
is parameterized by a
set of possible plaintexts PlainText
as well as a set of ciphertexts
Ciphertext
.
It offers the following interface:
- Key Generation:
fn RPKE.generate_key_pair(randomness) -> (ek, dk)
Inputs:
randomness
Ouputs:
ek: EncryptionKey
dk: DecryptionKey
- Encryption:
fn RPKE.encrypt(ek, msk, randomness) -> ctxt
Inputs:
ek: EncryptionKey
msg: Plaintext
randomness
Outputs:
ctxt: Ciphertext
- Decryption:
fn RPKE.decrypt(dk, ctxt) -> msg'
Inputs:
dk: DecryptionKey
ctxt: Ciphertext
Output:
msg': Plaintext
Failures:
DecryptionFailure
- Ciphertext rerandomization:
fn RPKE.rerandomize(ek, ctxt, randomness) -> ctxt'
Inputs:
ek: EncryptionKey
ctxt: Ciphertext
randomness
Output:
ctxt': Ciphertext
§Convertible Pseudorandom Function (coPRF)
TODO: Describe coPRF interface
§Pseudorandom Permutation
A pseudorandom permutation is a keyed pseudorandom permutation with
the following interface, where PRPKey
is the set of possible keys
for the permutation and PRPValue
is the both the domain and range of
the permutation.
- Permutation:
PRP.eval(k, x) -> y
Inputs:
k: PRPKey
x: PRPValue
Output:
y: PRPValue
- Inversion:
PRP.inverse(k, x) -> y
Inputs:
k: PRPKey
x: PRPValue
Output:
y: PRPValue
We require that for all possible PRP keys k
, successive application
of PRP.eval
, then PRP.inverse
or reversed is the identity
function, i.e. for any x
in PRPValue
:
PRP.eval(k, PRP.inverse(k, x)) = PRP.inverse(k, PRP.eval(k, x)) = x
Modules§
- This module defines ScrambleDB transformations at the level of individual pieces of data as defined in
data_types
. - This module defines data structures for indvidual pieces of data in ScrambleDB.
- Pseudonym Conversion
- Setup
- Pseudonymization
- Tables and Data Types
Structs§
- A wrapper type to facilitate (de-)serialization of HPKE ciphertexts to (and from) linear byte vectors.