Crate scrambledb

§ScrambleDB

This document describes ScrambleDB, a protocol between several parties for the pseudonymization and non-transitive joining of data.

§Overview and Concepts

ScrambleDB operates on tables of data, where a table is a collection of attribute entries for entities identified by unique keys.

ScrambleDB offers two sub-protocols for blindly converting between different table types.

§Conversion from plain tables to pseudonymized columns

A plain table contains attribute data organized by (possibly sensitive) entity identifiers, e.g. a table might store attribute data for attributes Address and Date of Birth (DoB) under the entity identifier Full Name:

Full Name (Identifier)	Address	DoB
Bilbo Baggins	1 Bagshot Row, Hobbiton, the Shire	Sept. 22 1290
Frodo Baggins	1 Bagshot Row, Hobbiton, the Shire	Sept. 22 1368

The result of ScrambleDB pseudonymization of such a table can be thought of as being computed in two steps:

1. Splitting the original table by attributes, resulting in single-column tables, one per attribute, indexed by the original identifier.

Full Name (Identifier)	Address
Bilbo Baggins	1 Bagshot Row, Hobbiton, the Shire
Frodo Baggins	1 Bagshot Row, Hobbiton, the Shire

Full Name (Identifier)	DoB
Bilbo Baggins	Sept. 22 1290
Frodo Baggins	Sept. 22 1368

2. Pseudonymization and shuffling of the split columns, such that the original identifiers are replaced by pseudonyms which are unlinkable between different columns.

Pseudonym (Identifier)	Address
pseudo1	1 Bagshot Row, Hobbiton, the Shire
pseudo2	1 Bagshot Row, Hobbiton, the Shire

Pseudonym (Identifier)	DoB
pseudo3	Sept. 22 1368
pseudo4	Sept. 22 1290

Since the result of pseudonymizing a plain table is a set of pseudonymized single-column tables, we refer to this operation as a split conversion.
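The two steps of a split conversion can be sketched as follows. This is only an illustration of the resulting data shape, not the protocol itself: the names `Column`, `toy_pseudonym`, `split_by_attribute` and `pseudonymize` are hypothetical, the pseudonym is derived with the standard library's non-cryptographic `DefaultHasher` as a stand-in for the actual cryptographic construction, and sorting by pseudonym stands in for shuffling.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A single-column table: rows of (identifier or pseudonym, attribute value).
pub type Column = Vec<(String, String)>;

/// Toy stand-in for pseudonym derivation (assumption, NOT cryptographic):
/// hashes a per-column key together with the original identifier.
fn toy_pseudonym(column_key: u64, identifier: &str) -> String {
    let mut hasher = DefaultHasher::new();
    column_key.hash(&mut hasher);
    identifier.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Step 1: split a plain table into one single-column table per attribute,
/// still indexed by the original identifier.
pub fn split_by_attribute(
    attributes: &[&str],
    rows: &[(&str, Vec<&str>)],
) -> Vec<(String, Column)> {
    attributes
        .iter()
        .enumerate()
        .map(|(i, attribute)| {
            let column = rows
                .iter()
                .map(|(id, values)| (id.to_string(), values[i].to_string()))
                .collect();
            (attribute.to_string(), column)
        })
        .collect()
}

/// Step 2: replace identifiers by per-column pseudonyms, then reorder the
/// rows by pseudonym so the row order no longer follows the original table.
pub fn pseudonymize(column: &Column, column_key: u64) -> Column {
    let mut out: Column = column
        .iter()
        .map(|(id, value)| (toy_pseudonym(column_key, id), value.clone()))
        .collect();
    out.sort();
    out
}
```

Using a different `column_key` for each attribute is what makes the pseudonyms of the same entity differ between columns in this sketch.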

§Conversion from pseudonymized columns to non-transitively joined tables

Pseudonymized columns may be selectively re-joined such that the original link between data is restored, but under a fresh pseudonymous identifier instead of the original (sensitive) identifier. In the above example, a join of pseudonymized columns Address and DoB would result in the following pseudonymized joined table.

Join Pseudonym (Identifier)	Address	DoB
pseudo5	1 Bagshot Row, Hobbiton, the Shire	Sept. 22 1290
pseudo6	1 Bagshot Row, Hobbiton, the Shire	Sept. 22 1368

The pseudonyms in a joined table are fresh for each join and are non-transitive, i.e. two join results cannot be joined further on the basis of their join pseudonyms.

Since the result of this conversion is a joined table, we refer to the operation as a join conversion.
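The effect of a join conversion can be illustrated with a deliberately simplified model. The sketch below collapses all parties into a single function and is not how the protocol computes joins: column pseudonyms are modelled as an insecure XOR mask over a numeric entity handle, and join pseudonyms as a non-cryptographic hash under a fresh join key. All names (`PseudoColumn`, `column_pseudonym`, `join`, `toy_hash`) are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// A pseudonymized column: rows of (column pseudonym, attribute value).
pub type PseudoColumn = Vec<(u64, String)>;

/// Toy keyed hash (assumption, NOT cryptographic).
fn toy_hash(key: u64, handle: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    handle.hash(&mut hasher);
    hasher.finish()
}

/// Toy column pseudonym: an XOR mask, chosen only because it is invertible.
pub fn column_pseudonym(column_key: u64, handle: u64) -> u64 {
    handle ^ column_key
}

/// Join two pseudonymized columns under a fresh `join_key`. Each column is
/// passed together with the key it was pseudonymized under.
pub fn join(
    left: (&PseudoColumn, u64),
    right: (&PseudoColumn, u64),
    join_key: u64,
) -> BTreeMap<u64, (String, String)> {
    // Convert a column pseudonym into a join pseudonym.
    let convert = |pseudonym: u64, column_key: u64| toy_hash(join_key, pseudonym ^ column_key);

    // Index the right column by join pseudonym.
    let right_by_join: BTreeMap<u64, &String> = right
        .0
        .iter()
        .map(|(p, value)| (convert(*p, right.1), value))
        .collect();

    // Pair up rows whose join pseudonyms coincide.
    left.0
        .iter()
        .filter_map(|(p, value)| {
            let join_pseudonym = convert(*p, left.1);
            right_by_join
                .get(&join_pseudonym)
                .map(|rv| (join_pseudonym, (value.clone(), (*rv).clone())))
        })
        .collect()
}
```

In this model, two joins of the same columns under different join keys restore the same links between values but yield unrelated identifiers, which mirrors the non-transitivity property described above.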

§Data Sources, Stores and Converter

ScrambleDB is a multiparty protocol where parties serve different roles as origins or destinations of data.

Non-pseudonymized data originates at a data source.

Data stores hold pseudonymized data and come in two forms:

The data lake is a designated data store which stores pseudonymized data columns fed to it by data sources via the ScrambleDB protocol.
A data processor is a data store which acquires pseudonymized joined tables from a data lake via the ScrambleDB protocol.

The converter facilitates the protocol in an oblivious fashion by blindly performing the two types of conversion operations.

§Cryptographic Preliminaries

§Rerandomizable Public Key Encryption

A rerandomizable public key encryption scheme RPKE is parameterized by a set of possible plaintexts Plaintext as well as a set of possible ciphertexts Ciphertext.

It offers the following interface:

Key Generation:

fn RPKE.generate_key_pair(randomness) -> (ek, dk)

Inputs:
    randomness

Outputs:
    ek: EncryptionKey
    dk: DecryptionKey

Encryption:

fn RPKE.encrypt(ek, msg, randomness) -> ctxt

Inputs:
    ek: EncryptionKey
    msg: Plaintext
    randomness

Outputs:
    ctxt: Ciphertext

Decryption:

fn RPKE.decrypt(dk, ctxt) -> msg'

Inputs:
    dk: DecryptionKey
    ctxt: Ciphertext

Output:
    msg': Plaintext

Failures:
    DecryptionFailure

Ciphertext rerandomization:

fn RPKE.rerandomize(ek, ctxt, randomness) -> ctxt'

Inputs:
    ek: EncryptionKey
    ctxt: Ciphertext
    randomness

Output:
    ctxt': Ciphertext
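
One way to instantiate this interface is ElGamal encryption, whose ciphertexts can be rerandomized with only the encryption key. The following is a minimal toy sketch under that assumption; the source does not state which scheme the crate uses. The group (integers modulo the Mersenne prime 2^61 - 1), the base `G` and the use of caller-supplied `u64` values as randomness are choices made here for illustration only and offer no security.

```rust
// Toy ElGamal over the multiplicative group modulo a Mersenne prime.
// Illustration only: parameters are far too small to be secure.
const P: u64 = (1 << 61) - 1; // 2^61 - 1, a prime
const G: u64 = 3; // fixed base (assumption made for this sketch)

pub type EncryptionKey = u64;
pub type DecryptionKey = u64;
pub type Plaintext = u64; // must lie in 1..P for exact round trips

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ciphertext {
    pub c1: u64,
    pub c2: u64,
}

#[derive(Debug, PartialEq)]
pub struct DecryptionFailure;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

pub fn generate_key_pair(randomness: u64) -> (EncryptionKey, DecryptionKey) {
    let dk = randomness % (P - 2) + 1; // exponent in 1..=P-2
    (pow_mod(G, dk), dk)
}

pub fn encrypt(ek: EncryptionKey, msg: Plaintext, randomness: u64) -> Ciphertext {
    let r = randomness % (P - 2) + 1;
    Ciphertext { c1: pow_mod(G, r), c2: mul_mod(msg, pow_mod(ek, r)) }
}

pub fn decrypt(dk: DecryptionKey, ctxt: Ciphertext) -> Result<Plaintext, DecryptionFailure> {
    // Reject components that are not valid group elements.
    if ctxt.c1 == 0 || ctxt.c1 >= P || ctxt.c2 == 0 || ctxt.c2 >= P {
        return Err(DecryptionFailure);
    }
    // By Fermat's little theorem, c1^(P-1-dk) is the inverse of c1^dk.
    Ok(mul_mod(ctxt.c2, pow_mod(ctxt.c1, P - 1 - dk)))
}

/// Multiplies in a fresh encryption of 1, changing the ciphertext but not
/// the plaintext it decrypts to.
pub fn rerandomize(ek: EncryptionKey, ctxt: Ciphertext, randomness: u64) -> Ciphertext {
    let s = randomness % (P - 2) + 1;
    Ciphertext {
        c1: mul_mod(ctxt.c1, pow_mod(G, s)),
        c2: mul_mod(ctxt.c2, pow_mod(ek, s)),
    }
}
```

The property of interest is that `rerandomize` needs only `ek`, so a party that cannot decrypt can still produce a fresh-looking ciphertext of the same message.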

§Convertible Pseudorandom Function (coPRF)

TODO: Describe coPRF interface

§Pseudorandom Permutation

A pseudorandom permutation PRP is a keyed permutation with the following interface, where PRPKey is the set of possible keys for the permutation and PRPValue is both the domain and the range of the permutation.

Permutation:

fn PRP.eval(k, x) -> y

Inputs:
    k: PRPKey
    x: PRPValue

Output:
    y: PRPValue

Inversion:

fn PRP.inverse(k, x) -> y

Inputs:
    k: PRPKey
    x: PRPValue

Output:
    y: PRPValue

We require that for all possible PRP keys k, applying PRP.eval followed by PRP.inverse, or the two in the reverse order, yields the identity function, i.e. for any x in PRPValue:

PRP.eval(k, PRP.inverse(k, x)) = PRP.inverse(k, PRP.eval(k, x)) = x
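
The interface and its round-trip requirement can be illustrated with a small Feistel network, a standard way of building an invertible keyed permutation from a non-invertible round function. This is a sketch under assumptions made here, not the crate's PRP: `PRPValue` is taken to be `u64`, `PRPKey` four 32-bit round keys, and the round function is a toy, non-cryptographic mixer.

```rust
/// Four 32-bit round keys (assumption made for this sketch).
pub type PRPKey = [u32; 4];
/// The permutation acts on 64-bit values (assumption made for this sketch).
pub type PRPValue = u64;

/// Toy round function: NOT cryptographic, it only needs to be deterministic.
fn round(half: u32, round_key: u32) -> u32 {
    let mut v = half ^ round_key;
    v = v.wrapping_mul(0x9E37_79B1);
    v ^= v >> 15;
    v = v.wrapping_mul(0x85EB_CA6B);
    v ^ (v >> 13)
}

/// Forward direction: each round maps (l, r) to (r, l XOR round(r, key)).
pub fn eval(k: &PRPKey, x: PRPValue) -> PRPValue {
    let (mut l, mut r) = ((x >> 32) as u32, x as u32);
    for round_key in k.iter() {
        let next = l ^ round(r, *round_key);
        l = r;
        r = next;
    }
    ((l as u64) << 32) | r as u64
}

/// Inverse direction: undo the rounds in reverse key order.
pub fn inverse(k: &PRPKey, y: PRPValue) -> PRPValue {
    let (mut l, mut r) = ((y >> 32) as u32, y as u32);
    for round_key in k.iter().rev() {
        let prev = r ^ round(l, *round_key);
        r = l;
        l = prev;
    }
    ((l as u64) << 32) | r as u64
}
```

Because each Feistel round is undone by recomputing the same round function and XORing it away, both compositions of `eval` and `inverse` are the identity for every key, regardless of how the round function is chosen.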

Modules§

data_transformations: This module defines ScrambleDB transformations at the level of individual pieces of data as defined in data_types.
data_types: This module defines data structures for individual pieces of data in ScrambleDB.
error
finalize
join: Pseudonym Conversion
setup: Setup
split: Pseudonymization
table: Tables and Data Types

Structs§

SerializedHPKE: A wrapper type to facilitate (de-)serialization of HPKE ciphertexts to (and from) linear byte vectors.