Crate scrambledb

§ScrambleDB

This document describes ScrambleDB, a protocol between several parties for the pseudonymization and non-transitive joining of data.

§Overview and Concepts

ScrambleDB operates on tables of data, where a table is a collection of attribute entries for entities identified by unique keys.

ScrambleDB offers two sub-protocols for blindly converting between different table types.

§Conversion from plain tables to pseudonymized columns

A plain table contains attribute data organized by (possibly sensitive) entity identifiers. For example, a table might store the attributes Address and Date of Birth (DoB) under the entity identifier Full Name:

| Full Name (Identifier) | Address | DoB |
|---|---|---|
| Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
| Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |

The result of ScrambleDB pseudonymization of such a table can be thought of as computed in two steps:

  1. Splitting the original table by attributes, resulting in single-column tables, one per attribute, indexed by the original identifier.

| Full Name (Identifier) | Address |
|---|---|
| Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire |
| Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire |

| Full Name (Identifier) | DoB |
|---|---|
| Bilbo Baggins | Sept. 22 1290 |
| Frodo Baggins | Sept. 22 1368 |

  2. Pseudonymization and shuffling of split columns, such that the original identifiers are replaced by pseudonyms which are unlinkable between different columns.

| Pseudonym (Identifier) | Address |
|---|---|
| pseudo1 | 1 Bagshot Row, Hobbiton, the Shire |
| pseudo2 | 1 Bagshot Row, Hobbiton, the Shire |

| Pseudonym (Identifier) | DoB |
|---|---|
| pseudo3 | Sept. 22 1368 |
| pseudo4 | Sept. 22 1290 |

Since the result of pseudonymizing a plain table is a set of pseudonymized single-column tables, we refer to this operation as a split conversion.
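
The two steps of a split conversion can be illustrated with a minimal, non-cryptographic sketch. Everything in it is an assumption made for illustration: the names `PlainRow`, `PseudonymizedColumn`, `toy_pseudonym` and `split_conversion` are hypothetical and not part of this crate, a standard-library hasher stands in for the real pseudonym derivation, and sorting by pseudonym stands in for shuffling. It offers no security and does not model the blind, multiparty nature of the protocol.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// One row of a plain table: an identifier plus (attribute name, value) pairs.
/// Hypothetical type, for illustration only.
pub type PlainRow = (String, Vec<(String, String)>);

/// A pseudonymized single-column table: (pseudonym, value) entries.
/// Hypothetical type, for illustration only.
pub type PseudonymizedColumn = Vec<(u64, String)>;

/// Toy stand-in for pseudonym derivation (NOT the coPRF, NOT secure).
/// It hashes a key together with the attribute name and the identifier, so
/// the same identifier receives a different pseudonym in every column.
pub fn toy_pseudonym(key: u64, attribute: &str, identifier: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    attribute.hash(&mut hasher);
    identifier.hash(&mut hasher);
    hasher.finish()
}

/// Step 1: split the table by attribute.
/// Step 2: replace identifiers by per-column pseudonyms and reorder entries.
pub fn split_conversion(
    key: u64,
    table: &[PlainRow],
) -> BTreeMap<String, PseudonymizedColumn> {
    let mut columns: BTreeMap<String, PseudonymizedColumn> = BTreeMap::new();
    for (identifier, attributes) in table {
        for (attribute, value) in attributes {
            let pseudonym = toy_pseudonym(key, attribute, identifier);
            columns
                .entry(attribute.clone())
                .or_default()
                .push((pseudonym, value.clone()));
        }
    }
    // Sorting by pseudonym stands in for shuffling: the output order no
    // longer reflects the row order of the plain table.
    for column in columns.values_mut() {
        column.sort();
    }
    columns
}
```

Calling `split_conversion` on the example table above yields one column per attribute (Address and DoB), each holding two entries keyed by pseudonyms instead of full names.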

§Conversion from pseudonymized columns to non-transitively joined tables

Pseudonymized columns may be selectively re-joined such that the original link between data is restored, but under a fresh pseudonymous identifier instead of the original (sensitive) identifier. In the above example, a join of pseudonymized columns Address and DoB would result in the following pseudonymized joined table.

| Join Pseudonym (Identifier) | Address | DoB |
|---|---|---|
| pseudo5 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
| pseudo6 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |

The contained pseudonyms are fresh for each join and are non-transitive, i.e. it is not possible to further join two join-results based on the join pseudonym.

Since the result of this conversion is a joined table, we refer to the operation as a join conversion.

§Data Sources, Stores and Converter

ScrambleDB is a multiparty protocol where parties serve different roles as origins or destinations of data.

Non-pseudonymized data originates at a data source.

Data stores hold pseudonymized data and come in two forms:

  • The data lake is a designated data store which stores pseudonymized data columns fed to it by data sources via the ScrambleDB protocol.
  • A data processor is a data store which acquires pseudonymized joined tables from a data lake via the ScrambleDB protocol.

The converter facilitates the protocol in an oblivious fashion by blindly performing the two types of conversion operations.

§Cryptographic Preliminaries

§Rerandomizable Public Key Encryption

A rerandomizable public key encryption scheme RPKE is parameterized by a set of possible plaintexts Plaintext and a set of possible ciphertexts Ciphertext.

It offers the following interface:

  • Key Generation:
fn RPKE.generate_key_pair(randomness) -> (ek, dk)

Inputs:
    randomness

Outputs:
    ek: EncryptionKey
    dk: DecryptionKey
  • Encryption:
fn RPKE.encrypt(ek, msg, randomness) -> ctxt

Inputs:
    ek: EncryptionKey
    msg: Plaintext
    randomness

Outputs:
    ctxt: Ciphertext
  • Decryption:
fn RPKE.decrypt(dk, ctxt) -> msg'

Inputs:
    dk: DecryptionKey
    ctxt: Ciphertext

Output:
    msg': Plaintext

Failures:
    DecryptionFailure
  • Ciphertext rerandomization:
fn RPKE.rerandomize(ek, ctxt, randomness) -> ctxt'

Inputs:
    ek: EncryptionKey
    ctxt: Ciphertext
    randomness

Output:
    ctxt': Ciphertext
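
As an illustration of this interface, the following sketch instantiates it with textbook ElGamal over a deliberately tiny group. This is an assumption made for exposition, not the scheme this crate uses: the modulus `P`, the base `G`, the helper functions, and the representation of randomness as a single `u64` are all hypothetical, and the parameters are far too small to be secure. Plaintexts are assumed to lie in the range `1..P`.

```rust
/// Toy group modulus: the Mersenne prime 2^61 - 1 (far too small for real use).
pub const P: u64 = (1 << 61) - 1;
/// Fixed base element of the toy scheme.
pub const G: u64 = 3;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EncryptionKey(pub u64);
#[derive(Debug, Clone, Copy)]
pub struct DecryptionKey(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Ciphertext {
    pub c1: u64,
    pub c2: u64,
}
#[derive(Debug, PartialEq, Eq)]
pub struct DecryptionFailure;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Maps caller-supplied randomness to an exponent in 1..=P-2.
fn exponent(randomness: u64) -> u64 {
    1 + randomness % (P - 2)
}

pub fn generate_key_pair(randomness: u64) -> (EncryptionKey, DecryptionKey) {
    let dk = exponent(randomness);
    (EncryptionKey(pow_mod(G, dk)), DecryptionKey(dk))
}

/// Encrypts `msg`, which is assumed to lie in 1..P.
pub fn encrypt(ek: EncryptionKey, msg: u64, randomness: u64) -> Ciphertext {
    let r = exponent(randomness);
    Ciphertext {
        c1: pow_mod(G, r),
        c2: mul_mod(msg, pow_mod(ek.0, r)),
    }
}

pub fn decrypt(dk: DecryptionKey, ctxt: Ciphertext) -> Result<u64, DecryptionFailure> {
    let in_group = |x: u64| x >= 1 && x < P;
    if !in_group(ctxt.c1) || !in_group(ctxt.c2) {
        return Err(DecryptionFailure);
    }
    // By Fermat's little theorem, c1^(P-1-dk) is the inverse of c1^dk.
    let shared_inverse = pow_mod(ctxt.c1, P - 1 - dk.0);
    Ok(mul_mod(ctxt.c2, shared_inverse))
}

/// Multiplies in a fresh encryption of 1, which changes the ciphertext
/// but leaves the underlying plaintext untouched.
pub fn rerandomize(ek: EncryptionKey, ctxt: Ciphertext, randomness: u64) -> Ciphertext {
    let s = exponent(randomness);
    Ciphertext {
        c1: mul_mod(ctxt.c1, pow_mod(G, s)),
        c2: mul_mod(ctxt.c2, pow_mod(ek.0, s)),
    }
}
```

The property the sketch is meant to show is that rerandomization needs only the encryption key, and that decrypting a rerandomized ciphertext still returns the original message.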

§Convertible Pseudorandom Function (coPRF)

TODO: Describe coPRF interface

§Pseudorandom Permutation

A pseudorandom permutation PRP is a keyed permutation with the following interface, where PRPKey is the set of possible keys for the permutation and PRPValue is both the domain and the range of the permutation.

  • Permutation:
fn PRP.eval(k, x) -> y

Inputs:
    k: PRPKey
    x: PRPValue

Output:
    y: PRPValue
  • Inversion:
fn PRP.inverse(k, x) -> y

Inputs:
    k: PRPKey
    x: PRPValue

Output:
    y: PRPValue

We require that for all possible PRP keys k, applying PRP.eval followed by PRP.inverse, or the two in the reverse order, yields the identity function, i.e. for any x in PRPValue:

PRP.eval(k, PRP.inverse(k, x)) = PRP.inverse(k, PRP.eval(k, x)) = x
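
The inversion requirement can be checked on a small example. The sketch below is a toy four-round Feistel network over 32-bit values, chosen only because a Feistel construction is invertible for any round function. It is an assumption for illustration and not the permutation this crate uses: the concrete `PRPKey` and `PRPValue` representations and the round function are hypothetical, and the construction is not cryptographically strong.

```rust
/// Hypothetical key type: one round key per Feistel round.
pub type PRPKey = [u32; 4];
/// Hypothetical value type: the permutation acts on 32-bit values.
pub type PRPValue = u32;

/// Toy round function (NOT cryptographically strong).
fn round(round_key: u32, half: u16) -> u16 {
    let mixed = (half as u32 ^ round_key).wrapping_mul(0x9E37_79B1);
    (mixed >> 16) as u16
}

/// Forward direction: each round maps (left, right) to
/// (right, left XOR round(right)).
pub fn eval(k: &PRPKey, x: PRPValue) -> PRPValue {
    let (mut left, mut right) = ((x >> 16) as u16, x as u16);
    for &rk in k.iter() {
        let next = left ^ round(rk, right);
        left = right;
        right = next;
    }
    ((left as u32) << 16) | right as u32
}

/// Backward direction: undoes the rounds of `eval` in reverse key order.
pub fn inverse(k: &PRPKey, y: PRPValue) -> PRPValue {
    let (mut left, mut right) = ((y >> 16) as u16, y as u16);
    for &rk in k.iter().rev() {
        let prev = right ^ round(rk, left);
        right = left;
        left = prev;
    }
    ((left as u32) << 16) | right as u32
}
```

For any key `k` and value `x`, both `inverse(&k, eval(&k, x))` and `eval(&k, inverse(&k, x))` return `x`, matching the requirement stated above.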

Modules§

Structs§

  • A wrapper type to facilitate (de-)serialization of HPKE ciphertexts to (and from) linear byte vectors.