//! # ScrambleDB
//!
//! This document describes `ScrambleDB`, a protocol between several
//! parties for the pseudonymization and non-transitive joining of data.
//!
//! ## Overview and Concepts
//! `ScrambleDB` operates on tables of data, where a table is a collection
//! of attribute entries for entities identified by unique keys.
//!
//! `ScrambleDB` offers two sub-protocols for blindly converting between different
//! table types.
//!
//! ### Conversion from plain tables to pseudonymized columns
//! A plain table contains attribute data organized by (possibly
//! sensitive) entity identifiers, e.g. a table might store attribute
//! data for attributes `Address` and `Date of Birth (DoB)` under the
//! entity identifier `Full Name`:
//!
//! | Full Name (Identifier) | Address                            | DoB           |
//! |------------------------|------------------------------------|---------------|
//! | Bilbo Baggins          | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
//! | Frodo Baggins          | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |
//!
//! The result of pseudonymizing such a table with `ScrambleDB` can be
//! thought of as being computed in two steps:
//!
//! 1. Splitting the original table by attributes, resulting in
//!    single-column tables, one per attribute, indexed by the original
//!    identifier.
//!
//! | Full Name (Identifier) | Address                            |
//! |------------------------|------------------------------------|
//! | Bilbo Baggins          | 1 Bagshot Row, Hobbiton, the Shire |
//! | Frodo Baggins          | 1 Bagshot Row, Hobbiton, the Shire |
//!
//! | Full Name (Identifier) | DoB           |
//! |------------------------|---------------|
//! | Bilbo Baggins          | Sept. 22 1290 |
//! | Frodo Baggins          | Sept. 22 1368 |
//!
//! 2. Pseudonymization and shuffling of the split columns, such that the
//!    original identifiers are replaced by pseudonyms which are unlinkable
//!    between different columns.
//!
//! | Pseudonym (Identifier) | Address                            |
//! |------------------------|------------------------------------|
//! | _pseudo1_              | 1 Bagshot Row, Hobbiton, the Shire |
//! | _pseudo2_              | 1 Bagshot Row, Hobbiton, the Shire |
//!
//! | Pseudonym (Identifier) | DoB           |
//! |------------------------|---------------|
//! | _pseudo3_              | Sept. 22 1368 |
//! | _pseudo4_              | Sept. 22 1290 |
//!
//! Since the result of pseudonymizing a plain table is a set of
//! pseudonymized single-column tables, we refer to this operation as a
//! _split conversion_.
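//!
//! The two steps of a split conversion can be sketched in plain Rust.
//! The sketch only illustrates the *functional* result under stated
//! assumptions: the hypothetical `split_and_pseudonymize` function sees
//! the plaintext table, whereas in `ScrambleDB` the conversion is
//! performed blindly by the converter. Pseudonyms are derived with
//! `std`'s non-cryptographic `DefaultHasher` from a hypothetical
//! per-column key, and sorting by pseudonym stands in for shuffling.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::collections::BTreeMap;
//! use std::hash::{Hash, Hasher};
//!
//! // A plain table: entity identifier -> (attribute name -> value).
//! type PlainTable = BTreeMap<String, BTreeMap<String, String>>;
//!
//! // A pseudonymized single-column table as (pseudonym, value) pairs.
//! type PseudonymizedColumn = Vec<(u64, String)>;
//!
//! // Hypothetical, NON-cryptographic pseudonym derivation (illustration
//! // only): hashes a per-column key together with the identifier.
//! fn toy_pseudonym(column_key: u64, identifier: &str) -> u64 {
//!     let mut hasher = DefaultHasher::new();
//!     column_key.hash(&mut hasher);
//!     identifier.hash(&mut hasher);
//!     hasher.finish()
//! }
//!
//! // Step 1: split the table by attribute.
//! // Step 2: replace identifiers by per-column pseudonyms and reorder.
//! // Panics if an attribute has no entry in `column_keys`.
//! fn split_and_pseudonymize(
//!     table: &PlainTable,
//!     column_keys: &BTreeMap<String, u64>,
//! ) -> BTreeMap<String, PseudonymizedColumn> {
//!     let mut columns: BTreeMap<String, PseudonymizedColumn> = BTreeMap::new();
//!     for (identifier, attributes) in table {
//!         for (attribute, value) in attributes {
//!             // A distinct key per attribute makes the pseudonyms of one
//!             // entity differ between columns.
//!             let key = column_keys[attribute];
//!             columns
//!                 .entry(attribute.clone())
//!                 .or_default()
//!                 .push((toy_pseudonym(key, identifier), value.clone()));
//!         }
//!     }
//!     // Sorting by pseudonym discards the original row order.
//!     for column in columns.values_mut() {
//!         column.sort();
//!     }
//!     columns
//! }
//! ```
//!
//! A keyed hash per column only makes the unlinkability between columns
//! visible in the sketch; it provides none of the protocol's blindness
//! or cryptographic guarantees.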
//!
//! ### Conversion from pseudonymized columns to non-transitively joined tables
//! Pseudonymized columns may be selectively re-joined such that the
//! original link between data is restored, but under a fresh pseudonymous
//! identifier instead of the original (sensitive) identifier. In the
//! above example, a join of the pseudonymized columns `Address` and `DoB`
//! would result in the following pseudonymized joined table:
//!
//! | Join Pseudonym (Identifier) | Address                            | DoB           |
//! |-----------------------------|------------------------------------|---------------|
//! | _pseudo5_                   | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1290 |
//! | _pseudo6_                   | 1 Bagshot Row, Hobbiton, the Shire | Sept. 22 1368 |
//!
//! The join pseudonyms are fresh for each join and non-transitive,
//! i.e. it is not possible to further join two join results on their
//! join pseudonyms.
//!
//! Since the result of this conversion is a joined table, we refer to the
//! operation as a _join conversion_.
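//!
//! Ignoring blindness, the functional outcome of a join conversion can be
//! sketched in plain Rust. The hypothetical `join_columns` function takes
//! columns that are still indexed by the original identifier, as after
//! step 1 of the split conversion, together with a hypothetical fresh
//! per-join key, and derives each join pseudonym from that key using
//! `std`'s non-cryptographic `DefaultHasher` as a stand-in. Because every
//! join uses a new key, the same entity receives unrelated pseudonyms in
//! different join results, mirroring the non-transitivity described above.
//!
//! ```rust
//! use std::collections::hash_map::DefaultHasher;
//! use std::collections::BTreeMap;
//! use std::hash::{Hash, Hasher};
//!
//! // A single-column table indexed by the original identifier.
//! type Column = BTreeMap<String, String>;
//!
//! // A joined row: join pseudonym and one value per joined column.
//! type JoinedRow = (u64, Vec<String>);
//!
//! // Hypothetical, NON-cryptographic join pseudonym (illustration only).
//! fn toy_join_pseudonym(join_key: u64, identifier: &str) -> u64 {
//!     let mut hasher = DefaultHasher::new();
//!     join_key.hash(&mut hasher);
//!     identifier.hash(&mut hasher);
//!     hasher.finish()
//! }
//!
//! // Joins columns on their shared identifiers and replaces each identifier
//! // by a pseudonym derived from the fresh, per-join `join_key`.
//! fn join_columns(columns: &[&Column], join_key: u64) -> Vec<JoinedRow> {
//!     let first = match columns.first() {
//!         Some(first) => first,
//!         None => return Vec::new(),
//!     };
//!     let mut rows: Vec<JoinedRow> = first
//!         .keys()
//!         // Keep only entities present in every requested column.
//!         .filter(|id| columns.iter().all(|c| c.contains_key(*id)))
//!         .map(|id| {
//!             let values: Vec<String> = columns.iter().map(|c| c[id].clone()).collect();
//!             (toy_join_pseudonym(join_key, id), values)
//!         })
//!         .collect();
//!     // Order rows by pseudonym so the output does not follow identifier order.
//!     rows.sort();
//!     rows
//! }
//! ```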
//!
//! ### Data Sources, Stores and Converter
//! `ScrambleDB` is a multiparty protocol where parties serve different
//! roles as origins or destinations of data.
//!
//! Non-pseudonymized data originates at a **data source**.
//!
//! **Data stores** hold pseudonymized data and come in two forms:
//! - The **data lake** is a designated data store which stores
//!   pseudonymized data columns fed to it by data sources via the
//!   `ScrambleDB` protocol.
//! - A **data processor** is a data store which acquires pseudonymized
//!   joined tables from a data lake via the `ScrambleDB` protocol.
//!
//! The **converter** facilitates the protocol in an oblivious fashion by
//! blindly performing the two types of conversion operations.
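//!
//! The direction of data flow between these roles can be summarized in a
//! small sketch. It assumes, as a reading of the description above rather
//! than a statement about this implementation, that every transfer passes
//! through the converter: split conversions run from a data source via the
//! converter to the data lake, and join conversions run from the data lake
//! via the converter to a data processor. The `Role` and `Conversion`
//! enums and the `allowed_flow` function are hypothetical.
//!
//! ```rust
//! // Hypothetical model of the ScrambleDB roles.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum Role {
//!     DataSource,
//!     Converter,
//!     DataLake,
//!     DataProcessor,
//! }
//!
//! // The conversion a transfer belongs to.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum Conversion {
//!     Split,
//!     Join,
//! }
//!
//! // Returns the conversion a direct transfer from `from` to `to` is part
//! // of, or `None` if the (assumed) protocol has no such transfer.
//! fn allowed_flow(from: Role, to: Role) -> Option<Conversion> {
//!     match (from, to) {
//!         (Role::DataSource, Role::Converter) | (Role::Converter, Role::DataLake) => {
//!             Some(Conversion::Split)
//!         }
//!         (Role::DataLake, Role::Converter) | (Role::Converter, Role::DataProcessor) => {
//!             Some(Conversion::Join)
//!         }
//!         _ => None,
//!     }
//! }
//! ```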
//!
//! ## Cryptographic Preliminaries
//!
//! ### Rerandomizable Public Key Encryption
//! A rerandomizable public key encryption scheme `RPKE` is parameterized by a
//! set of possible plaintexts `Plaintext` as well as a set of ciphertexts
//! `Ciphertext`.
//!
//! It offers the following interface:
//! - Key Generation:
//! ```text
//! fn RPKE.generate_key_pair(randomness) -> (ek, dk)
//!
//! Inputs:
//!     randomness
//!
//! Outputs:
//!     ek: EncryptionKey
//!     dk: DecryptionKey
//! ```
//! - Encryption:
//! ```text
//! fn RPKE.encrypt(ek, msg, randomness) -> ctxt
//!
//! Inputs:
//!     ek: EncryptionKey
//!     msg: Plaintext
//!     randomness
//!
//! Output:
//!     ctxt: Ciphertext
//! ```
//! - Decryption:
//! ```text
//! fn RPKE.decrypt(dk, ctxt) -> msg'
//!
//! Inputs:
//!     dk: DecryptionKey
//!     ctxt: Ciphertext
//!
//! Output:
//!     msg': Plaintext
//!
//! Failures:
//!     DecryptionFailure
//! ```
//! - Ciphertext rerandomization:
//!
//! ```text
//! fn RPKE.rerandomize(ek, ctxt, randomness) -> ctxt'
//!
//! Inputs:
//!     ek: EncryptionKey
//!     ctxt: Ciphertext
//!     randomness
//!
//! Output:
//!     ctxt': Ciphertext
//! ```
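//!
//! One way to express this interface in Rust is a trait with associated
//! types, shown here with textbook ElGamal as a rerandomizable instance.
//! This is a sketch under stated assumptions, not necessarily the scheme
//! used by this crate: the `Rpke` trait, the `ToyElGamal` type and the
//! `DecryptionFailure` struct are hypothetical, randomness is passed as a
//! plain `u64`, and the group (nonzero integers modulo the prime
//! 2^31 - 1 with generator 7) is far too small to offer any security.
//!
//! ```rust
//! // Hypothetical error type mirroring `DecryptionFailure` above.
//! #[derive(Debug, PartialEq, Eq)]
//! struct DecryptionFailure;
//!
//! // Hypothetical trait mirroring the RPKE interface above.
//! trait Rpke {
//!     type EncryptionKey;
//!     type DecryptionKey;
//!     type Plaintext;
//!     type Ciphertext;
//!
//!     fn generate_key_pair(randomness: u64) -> (Self::EncryptionKey, Self::DecryptionKey);
//!     fn encrypt(ek: &Self::EncryptionKey, msg: &Self::Plaintext, randomness: u64) -> Self::Ciphertext;
//!     fn decrypt(dk: &Self::DecryptionKey, ctxt: &Self::Ciphertext) -> Result<Self::Plaintext, DecryptionFailure>;
//!     fn rerandomize(ek: &Self::EncryptionKey, ctxt: &Self::Ciphertext, randomness: u64) -> Self::Ciphertext;
//! }
//!
//! // Toy parameters: prime p = 2^31 - 1 and generator g = 7. INSECURE.
//! const P: u64 = 2_147_483_647;
//! const G: u64 = 7;
//!
//! // Square-and-multiply modular exponentiation; operands stay below
//! // 2^31, so every product fits into a u64.
//! fn mod_pow(mut base: u64, mut exp: u64) -> u64 {
//!     let mut acc = 1;
//!     base %= P;
//!     while exp > 0 {
//!         if exp & 1 == 1 {
//!             acc = acc * base % P;
//!         }
//!         base = base * base % P;
//!         exp >>= 1;
//!     }
//!     acc
//! }
//!
//! struct ToyElGamal;
//!
//! impl Rpke for ToyElGamal {
//!     type EncryptionKey = u64; // h = g^x
//!     type DecryptionKey = u64; // x
//!     type Plaintext = u64; // a nonzero residue modulo p
//!     type Ciphertext = (u64, u64); // (g^r, m * h^r)
//!
//!     fn generate_key_pair(randomness: u64) -> (u64, u64) {
//!         let x = 1 + randomness % (P - 2);
//!         (mod_pow(G, x), x)
//!     }
//!
//!     fn encrypt(ek: &u64, msg: &u64, randomness: u64) -> (u64, u64) {
//!         (mod_pow(G, randomness), msg % P * mod_pow(*ek, randomness) % P)
//!     }
//!
//!     fn decrypt(dk: &u64, ctxt: &(u64, u64)) -> Result<u64, DecryptionFailure> {
//!         let (c1, c2) = *ctxt;
//!         // Zero or out-of-range components are not group elements.
//!         if c1 == 0 || c2 == 0 || c1 >= P || c2 >= P {
//!             return Err(DecryptionFailure);
//!         }
//!         // m = c2 / c1^x, inverting via Fermat's little theorem.
//!         let shared = mod_pow(c1, *dk);
//!         Ok(c2 * mod_pow(shared, P - 2) % P)
//!     }
//!
//!     fn rerandomize(ek: &u64, ctxt: &(u64, u64), randomness: u64) -> (u64, u64) {
//!         // Multiply in a fresh encryption of 1: (g^r', h^r').
//!         let (c1, c2) = *ctxt;
//!         (
//!             c1 % P * mod_pow(G, randomness) % P,
//!             c2 % P * mod_pow(*ek, randomness) % P,
//!         )
//!     }
//! }
//! ```
//!
//! Rerandomizing multiplies the ciphertext by a fresh encryption of 1, so
//! the result decrypts to the same plaintext while its components differ
//! from those of the input ciphertext.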
//!
//! ### Convertible Pseudorandom Function (coPRF)
//! **TODO: Describe coPRF interface**
//!
//! ### Pseudorandom Permutation
//! A pseudorandom permutation `PRP` is a keyed permutation which, without
//! knowledge of the key, is indistinguishable from a random permutation.
//! It offers the following interface, where `PRPKey` is the set of
//! possible keys for the permutation and `PRPValue` is both the domain
//! and range of the permutation.
//!
//! - Permutation:
//!
//! ```text
//! fn PRP.eval(k, x) -> y
//!
//! Inputs:
//!     k: PRPKey
//!     x: PRPValue
//!
//! Output:
//!     y: PRPValue
//! ```
//!
//! - Inversion:
//!
//! ```text
//! fn PRP.inverse(k, x) -> y
//!
//! Inputs:
//!     k: PRPKey
//!     x: PRPValue
//!
//! Output:
//!     y: PRPValue
//! ```
//!
//! We require that for all possible PRP keys `k`, applying `PRP.eval`
//! followed by `PRP.inverse`, or the two in reverse order, yields the
//! identity function, i.e. for any `x` in `PRPValue`:
//!
//! ```text
//! PRP.eval(k, PRP.inverse(k, x)) = PRP.inverse(k, PRP.eval(k, x)) = x
//! ```
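//!
//! The interface and the identity requirement can be sketched as a Rust
//! trait. The `Prp` trait and the `ToyFeistel` type are hypothetical
//! names, and the instance is a four-round Feistel network on 64-bit
//! values with a simple non-cryptographic round function. It is chosen
//! only because a Feistel network is invertible for any round function;
//! it is not a secure PRP.
//!
//! ```rust
//! // Hypothetical trait mirroring the PRP interface above.
//! trait Prp {
//!     type Key;
//!     type Value;
//!     fn eval(k: &Self::Key, x: &Self::Value) -> Self::Value;
//!     fn inverse(k: &Self::Key, x: &Self::Value) -> Self::Value;
//! }
//!
//! const ROUNDS: usize = 4;
//!
//! // Non-cryptographic round function mixing a round key into one half.
//! fn round_fn(round_key: u32, half: u32) -> u32 {
//!     let v = (half ^ round_key).wrapping_mul(0x9E37_79B9);
//!     v ^ (v >> 16)
//! }
//!
//! // Toy Feistel network: INSECURE, for illustration only.
//! struct ToyFeistel;
//!
//! impl Prp for ToyFeistel {
//!     type Key = [u32; ROUNDS];
//!     type Value = u64;
//!
//!     fn eval(k: &Self::Key, x: &u64) -> u64 {
//!         let (mut l, mut r) = ((*x >> 32) as u32, *x as u32);
//!         for &rk in k.iter() {
//!             // One round: (l, r) -> (r, l ^ F(rk, r))
//!             let new_r = l ^ round_fn(rk, r);
//!             l = r;
//!             r = new_r;
//!         }
//!         ((l as u64) << 32) | r as u64
//!     }
//!
//!     fn inverse(k: &Self::Key, y: &u64) -> u64 {
//!         let (mut l, mut r) = ((*y >> 32) as u32, *y as u32);
//!         for &rk in k.iter().rev() {
//!             // Undo one round: (l, r) -> (r ^ F(rk, l), l)
//!             let old_l = r ^ round_fn(rk, l);
//!             r = l;
//!             l = old_l;
//!         }
//!         ((l as u64) << 32) | r as u64
//!     }
//! }
//! ```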

use libcrux::hpke::{aead::AEAD, kdf::KDF, kem::KEM, HPKECiphertext, HPKEConfig, Mode};

/// Security parameter in bytes
const SECPAR_BYTES: usize = 16;

/// The HPKE configuration used in the implementation of double HPKE
/// encryption using the HPKE single-shot API.
const HPKE_CONF: HPKEConfig = HPKEConfig(
    Mode::mode_base,
    KEM::DHKEM_P256_HKDF_SHA256,
    KDF::HKDF_SHA256,
    AEAD::ChaCha20Poly1305,
);

/// A wrapper type to facilitate (de-)serialization of HPKE
/// ciphertexts to (and from) linear byte vectors.
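///
/// The serialized form consists of the KEM output length and the
/// ciphertext length, each as a 4-byte big-endian integer, followed by
/// the KEM output and then the ciphertext. The following std-only sketch
/// restates that layout on plain byte slices; `encode_layout` and
/// `decode_layout` are hypothetical helpers, not part of this crate, and
/// unlike [`SerializedHPKE::from_bytes`] the decoder checks lengths and
/// returns `None` instead of panicking on malformed input.
///
/// ```rust
/// // Hypothetical helper: len_kem_output (u32, BE) || len_ciphertext (u32, BE)
/// // || kem_output || ciphertext.
/// fn encode_layout(kem_output: &[u8], ciphertext: &[u8]) -> Vec<u8> {
///     let mut bytes = Vec::with_capacity(8 + kem_output.len() + ciphertext.len());
///     bytes.extend_from_slice(&(kem_output.len() as u32).to_be_bytes());
///     bytes.extend_from_slice(&(ciphertext.len() as u32).to_be_bytes());
///     bytes.extend_from_slice(kem_output);
///     bytes.extend_from_slice(ciphertext);
///     bytes
/// }
///
/// // Hypothetical checked inverse of `encode_layout`: `None` if the input is
/// // shorter than the 8-byte header or the lengths do not match the body.
/// fn decode_layout(bytes: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
///     let header = bytes.get(0..8)?;
///     let len_kem = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
///     let len_ct = u32::from_be_bytes([header[4], header[5], header[6], header[7]]) as usize;
///     let body = &bytes[8..];
///     if body.len() != len_kem.checked_add(len_ct)? {
///         return None;
///     }
///     let (kem_output, ciphertext) = body.split_at(len_kem);
///     Some((kem_output.to_vec(), ciphertext.to_vec()))
/// }
/// ```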
pub struct SerializedHPKE {
    len_kem_output: u32,
    len_ciphertext: u32,
    bytes: Vec<u8>,
}

impl SerializedHPKE {
    /// Prepare an HPKE ciphertext for serialization by wrapping it in
    /// a `SerializedHPKE`.
    pub fn from_hpke_ct(ct: &HPKECiphertext) -> Self {
        // Store the KEM output and the ciphertext back to back, keeping
        // their lengths so that they can be split apart again.
        let mut bytes = ct.0.clone();
        bytes.extend_from_slice(&ct.1);

        Self {
            len_kem_output: ct.0.len() as u32,
            len_ciphertext: ct.1.len() as u32,
            bytes,
        }
    }

    /// Reconstruct an HPKE ciphertext from the wrapper type. This
    /// does not perform validation of the reconstructed ciphertext.
    ///
    /// # Panics
    /// Panics if `len_kem_output` exceeds the number of wrapped bytes.
    pub fn to_hpke_ct(&self) -> HPKECiphertext {
        let (kem_output, ciphertext) = self.bytes.split_at(self.len_kem_output as usize);
        HPKECiphertext(kem_output.to_vec(), ciphertext.to_vec())
    }

    /// Serialize the wrapper type to a byte vector: the KEM output length
    /// and the ciphertext length (4 bytes each, big-endian), followed by
    /// the wrapped bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::with_capacity(8 + self.bytes.len());
        bytes.extend_from_slice(&self.len_kem_output.to_be_bytes());
        bytes.extend_from_slice(&self.len_ciphertext.to_be_bytes());
        bytes.extend_from_slice(&self.bytes);
        bytes
    }

    /// Deserialize a wrapped HPKE ciphertext from a byte vector. This
    /// does not perform validation of the deserialized ciphertext.
    ///
    /// # Panics
    /// Panics if `bytes` is shorter than 8 bytes.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        let len_kem_output = u32::from_be_bytes(bytes[0..4].try_into().unwrap());
        let len_ciphertext = u32::from_be_bytes(bytes[4..8].try_into().unwrap());
        Self {
            len_kem_output,
            len_ciphertext,
            bytes: bytes[8..].to_vec(),
        }
    }
}

pub mod table;
pub mod setup;
pub mod split;
pub mod join;
pub mod finalize;
pub mod data_transformations;
pub mod data_types;

pub mod error;

#[cfg(feature = "wasm")]
pub mod wasm_demo;

#[cfg(test)]
mod test_util;
279
280#[cfg(test)]
281mod test_util;