# Crate scrambledb


## ScrambleDB

This document describes `ScrambleDB`, a protocol between several
parties for the pseudonymization and non-transitive joining of data.

### Overview and Concepts

`ScrambleDB` operates on tables of data, where a table is a collection
of attribute entries for entities identified by unique keys.

`ScrambleDB` offers two sub-protocols for blindly converting between
different table types.

#### Conversion from plain tables to pseudonymized columns

A plain table contains attribute data organized by (possibly
sensitive) entity identifiers, e.g. a table might store attribute
data for attributes `Address` and `Date of Birth (DoB)` under the
entity identifier `Full Name`:

| Full Name (Identifier) | Address | DoB |
|---|---|---|
| Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
| Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |

The result of `ScrambleDB` pseudonymization of such a
table can be thought of as computed in two steps:

- Splitting the original table by attributes, resulting in single-column tables, one per attribute, indexed by the original identifier.

  | Full Name (Identifier) | Address |
  |---|---|
  | Bilbo Baggins | 1 Bagshot Row, Hobbiton, the Shire |
  | Frodo Baggins | 1 Bagshot Row, Hobbiton, the Shire |

  | Full Name (Identifier) | DoB |
  |---|---|
  | Bilbo Baggins | Sept. 22 1290 |
  | Frodo Baggins | Sept. 22 1368 |

- Pseudonymization and shuffling of split columns, such that the original identifiers are replaced by pseudonyms which are unlinkable between different columns.

  | Pseudonym (Identifier) | Address |
  |---|---|
  | pseudo1 | 1 Bagshot Row, Hobbiton, the Shire |
  | pseudo2 | 1 Bagshot Row, Hobbiton, the Shire |

  | Pseudonym (Identifier) | DoB |
  |---|---|
  | pseudo3 | Sept. 22 1368 |
  | pseudo4 | Sept. 22 1290 |

Since the result of pseudonymizing a plain table is a set of
pseudonymized single-column tables we refer to this operation as a
*split conversion*.
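The two steps above can be sketched in a few lines of Rust. This is only an illustration of the data flow, not the protocol itself: the type names, the `toy_pseudonym` helper and the per-column keys are hypothetical, the pseudonym is a non-cryptographic hash standing in for the real pseudonymization primitive, and sorting by pseudonym stands in for shuffling.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical plain table: attribute names plus rows of (identifier, values).
pub struct PlainTable {
    pub attributes: Vec<String>,
    pub rows: Vec<(String, Vec<String>)>,
}

/// Hypothetical pseudonymized single-column table.
pub struct PseudonymizedColumn {
    pub attribute: String,
    pub entries: Vec<(u64, String)>,
}

/// Toy stand-in for a keyed pseudonym function. NOT cryptographically secure.
fn toy_pseudonym(column_key: u64, id: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    column_key.hash(&mut hasher);
    id.hash(&mut hasher);
    hasher.finish()
}

/// Split the table by attribute, then replace identifiers with per-column pseudonyms.
pub fn split_conversion(table: &PlainTable, column_keys: &[u64]) -> Vec<PseudonymizedColumn> {
    assert_eq!(table.attributes.len(), column_keys.len(), "one key per attribute");
    table
        .attributes
        .iter()
        .zip(column_keys)
        .enumerate()
        .map(|(i, (attribute, &key))| {
            // Step 1: take column `i`. Step 2: pseudonymize its identifiers.
            let mut entries: Vec<(u64, String)> = table
                .rows
                .iter()
                .map(|(id, values)| (toy_pseudonym(key, id), values[i].clone()))
                .collect();
            // Sorting by pseudonym discards the original row order (stand-in for shuffling).
            entries.sort();
            PseudonymizedColumn { attribute: attribute.clone(), entries }
        })
        .collect()
}
```

Because each column uses its own key in this sketch, the same identifier maps to different pseudonyms in the `Address` and `DoB` columns, mirroring `pseudo1` to `pseudo4` in the tables above.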

#### Conversion from pseudonymized columns to non-transitively joined tables

Pseudonymized columns may be selectively re-joined such that the
original link between data is restored, but under a fresh pseudonymous
identifier instead of the original (sensitive) identifier. In the
above example, a join of pseudonymized columns `Address` and `DoB`
would result in the following pseudonymized joined table.

| Join Pseudonym (Identifier) | Address | DoB |
|---|---|---|
| pseudo5 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
| pseudo6 | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |

The contained pseudonyms are fresh for each join and are non-transitive, i.e. it is not possible to further join two join-results based on the join pseudonym.

Since the result of this conversion is a joined table, we refer to the
operation as a *join conversion*.
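The shape of a join conversion can be illustrated with a deliberately insecure toy model. Everything below is an assumption made for illustration: identifiers are modelled as `u64` values, a column pseudonym is the identifier XORed with a per-column key, and `toy_convert` re-keys a pseudonym to a fresh join key. The actual protocol relies on the coPRF for this conversion, and XOR masking offers none of its security properties.

```rust
use std::collections::BTreeMap;

/// Hypothetical pseudonymized column: (pseudonym, attribute value) pairs.
pub type Column = Vec<(u64, String)>;

/// Toy conversion of a pseudonym from one key to another. NOT secure:
/// it only works because the toy pseudonym is `identifier ^ key`.
fn toy_convert(pseudonym: u64, from_key: u64, to_key: u64) -> u64 {
    pseudonym ^ from_key ^ to_key
}

/// Join columns under a fresh join key. Each input is (column key, column).
/// Values in each joined row appear in the order the columns were passed.
pub fn join_conversion(columns: &[(u64, Column)], join_key: u64) -> BTreeMap<u64, Vec<String>> {
    let mut joined = BTreeMap::new();
    for (column_key, column) in columns {
        for (pseudonym, value) in column {
            // Entries of the same entity land on the same join pseudonym.
            let join_pseudonym = toy_convert(*pseudonym, *column_key, join_key);
            joined
                .entry(join_pseudonym)
                .or_insert_with(Vec::new)
                .push(value.clone());
        }
    }
    joined
}
```

Running the join twice with different join keys yields different join pseudonyms for the same entity, which is the behaviour the text describes as fresh pseudonyms per join. In this toy model that property holds only against someone who does not know the keys.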

#### Data Sources, Stores and Converter

`ScrambleDB` is a multiparty protocol where parties serve different
roles as origins or destinations of data.

Non-pseudonymized data originates at a **data source**.

**Data stores** hold pseudonymized data and come in two forms:

- The **data lake** is a designated data store which stores pseudonymized data columns fed to it by data sources via the ScrambleDB protocol.
- A **data processor** is a data store which acquires pseudonymized joined tables from a data lake via the ScrambleDB protocol.

The **converter** facilitates the protocol in an oblivious fashion by
blindly performing the two types of conversion operations.

### Cryptographic Preliminaries

#### Rerandomizable Public Key Encryption

A rerandomizable public key encryption scheme `RPKE` is parameterized by a
set of possible plaintexts `Plaintext` as well as a set of ciphertexts
`Ciphertext`.

It offers the following interface:

- Key Generation:

```
fn RPKE.generate_key_pair(randomness) -> (ek, dk)
Inputs:
randomness
Outputs:
ek: EncryptionKey
dk: DecryptionKey
```

- Encryption:

```
fn RPKE.encrypt(ek, msg, randomness) -> ctxt
Inputs:
ek: EncryptionKey
msg: Plaintext
randomness
Outputs:
ctxt: Ciphertext
```

- Decryption:

```
fn RPKE.decrypt(dk, ctxt) -> msg'
Inputs:
dk: DecryptionKey
ctxt: Ciphertext
Output:
msg': Plaintext
Failures:
DecryptionFailure
```

- Ciphertext rerandomization:

```
fn RPKE.rerandomize(ek, ctxt, randomness) -> ctxt'
Inputs:
ek: EncryptionKey
ctxt: Ciphertext
randomness
Output:
ctxt': Ciphertext
```
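As a concrete instance of this interface, the sketch below implements textbook ElGamal, a well-known rerandomizable scheme, over a toy group. It is not taken from the crate: the modulus `2^31 - 1`, the generator `7`, the use of plain `u64` values as randomness, and all type aliases are assumptions chosen to keep the example short, and the parameters are far too small to be secure.

```rust
// Toy ElGamal modulo the Mersenne prime 2^31 - 1. Illustration only.
const P: u64 = 2_147_483_647; // 2^31 - 1, prime
const G: u64 = 7; // a primitive root modulo P

pub type EncryptionKey = u64;
pub type DecryptionKey = u64;
pub type Plaintext = u64; // must lie in 1..P

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ciphertext {
    pub c1: u64,
    pub c2: u64,
}

#[derive(Debug, PartialEq)]
pub struct DecryptionFailure;

fn mul_mod(a: u64, b: u64) -> u64 {
    (a % P) * (b % P) % P // both factors are below 2^31, so no overflow
}

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

pub fn generate_key_pair(randomness: u64) -> (EncryptionKey, DecryptionKey) {
    let dk = randomness % (P - 2) + 1; // exponent in 1..=P-2
    (pow_mod(G, dk), dk)
}

pub fn encrypt(ek: EncryptionKey, msg: Plaintext, randomness: u64) -> Ciphertext {
    Ciphertext {
        c1: pow_mod(G, randomness),
        c2: mul_mod(msg, pow_mod(ek, randomness)),
    }
}

pub fn decrypt(dk: DecryptionKey, ctxt: Ciphertext) -> Result<Plaintext, DecryptionFailure> {
    if ctxt.c1 % P == 0 || ctxt.c2 % P == 0 {
        return Err(DecryptionFailure); // not a valid group element
    }
    let shared = pow_mod(ctxt.c1, dk);
    let inverse = pow_mod(shared, P - 2); // Fermat inverse modulo the prime P
    Ok(mul_mod(ctxt.c2, inverse))
}

/// Multiply in a fresh encryption of 1: the plaintext is unchanged,
/// but the ciphertext components change.
pub fn rerandomize(ek: EncryptionKey, ctxt: Ciphertext, randomness: u64) -> Ciphertext {
    Ciphertext {
        c1: mul_mod(ctxt.c1, pow_mod(G, randomness)),
        c2: mul_mod(ctxt.c2, pow_mod(ek, randomness)),
    }
}
```

Rerandomization needs only the encryption key, which is what allows a party that cannot decrypt to refresh ciphertexts it forwards.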

#### Convertible Pseudorandom Function (coPRF)

**TODO: Describe coPRF interface**

#### Pseudorandom Permutation

A pseudorandom permutation (PRP) is a keyed permutation with
the following interface, where `PRPKey` is the set of possible keys
for the permutation and `PRPValue` is both the domain and the range of
the permutation.

- Permutation:

```
fn PRP.eval(k, x) -> y
Inputs:
k: PRPKey
x: PRPValue
Output:
y: PRPValue
```

- Inversion:

```
fn PRP.inverse(k, x) -> y
Inputs:
k: PRPKey
x: PRPValue
Output:
y: PRPValue
```

We require that for all possible PRP keys `k`, successive application
of `PRP.eval`, then `PRP.inverse`, or reversed, is the identity
function, i.e. for any `x` in `PRPValue`:

```
PRP.eval(k, PRP.inverse(k, x)) = PRP.inverse(k, PRP.eval(k, x)) = x
```
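The round-trip requirement can be demonstrated with a small Feistel network, a standard construction that is a permutation for any round function. The sketch below is an assumption-laden toy: `PrpKey`, the four 32-bit round keys, the 64-bit `PRPValue` and the mixing function are all hypothetical, and the result is not a secure PRP.

```rust
/// Hypothetical key type: four 32-bit round keys.
pub struct PrpKey(pub [u32; 4]);

/// Arbitrary keyed mixing function. It does not need to be invertible:
/// the Feistel structure provides invertibility on its own.
fn round(k: u32, half: u32) -> u32 {
    (half ^ k).wrapping_mul(0x9E37_79B9).rotate_left(13) ^ k
}

/// Forward direction: each round maps (l, r) to (r, l ^ F(k, r)).
pub fn eval(key: &PrpKey, x: u64) -> u64 {
    let (mut l, mut r) = ((x >> 32) as u32, x as u32);
    for &k in key.0.iter() {
        let next = l ^ round(k, r);
        l = r;
        r = next;
    }
    ((l as u64) << 32) | (r as u64)
}

/// Inverse direction: undo the rounds in reverse key order.
pub fn inverse(key: &PrpKey, y: u64) -> u64 {
    let (mut l, mut r) = ((y >> 32) as u32, y as u32);
    for &k in key.0.iter().rev() {
        let prev = r ^ round(k, l);
        r = l;
        l = prev;
    }
    ((l as u64) << 32) | (r as u64)
}
```

For any key, `inverse(&k, eval(&k, x))` and `eval(&k, inverse(&k, x))` both return `x`, matching the identity stated above.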

## Modules

- This module defines ScrambleDB transformations at the level of individual pieces of data as defined in `data_types`.
- This module defines data structures for individual pieces of data in ScrambleDB.
- Pseudonym Conversion
- Setup
- Pseudonymization
- Tables and Data Types

## Structs

- A wrapper type to facilitate (de-)serialization of HPKE ciphertexts to (and from) linear byte vectors.