In a recent blog post, we described Cryspen’s new Rust implementation of ML-KEM, and talked about how our high-assurance development methodology helped us find a new timing bug in various other Kyber implementations.
In this post, we will go into some more detail on how we use the hax verification toolchain to formally prove that our code is panic free and functionally correct.
Background on High-Assurance Cryptography
Cryptographic libraries lie at the heart of the trusted computing base of all networked applications and, consequently, programming bugs in these libraries are always treated as security vulnerabilities, sometimes with high-profile acronyms and news articles. In the past decade, a number of companies and research groups have made a concerted attempt to improve confidence in cryptographic libraries by using techniques from formal verification. This research area, called computer-aided cryptography or high-assurance cryptography, is surveyed in this paper and even has a dedicated workshop.
There are several notable successes from this community effort. Projects like HACL* and Fiat-Crypto generate C code from formally verified implementations written in proof-oriented languages like F* and Coq, and this C code is now widely used in mainstream software including Google Chrome, Mozilla Firefox, the Linux kernel, the WireGuard VPN, mbed TLS, Python, ElectionGuard, and Tezos. Other projects like Vale and libjade generate verified assembly code that is as fast as or faster than the most optimized implementations in mainstream crypto libraries. Still other projects like CryptoLine seek to verify existing code in important libraries like OpenSSL.
Despite these landmark achievements, many open questions remain. Cryptographic verification tools largely focus on software written in research languages, and verifying a new cryptographic algorithm still requires PhD-level skills. Our verification toolchain hax offers a new way forward, targeting implementations written by crypto developers in Rust.
Running the hax toolchain
We use the hax toolchain to extract F* code from Rust code. More details on how hax works, and on its history, can be found in this blog post. The hax toolchain supports multiple backend provers, but for the purposes of this blog post, we focus on the F* backend.
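(As an aside: hax is invoked as a cargo subcommand. At the time of writing, running cargo hax into fstar inside a crate performs this extraction, though the exact invocation may differ across hax versions.)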
We run hax on the ML-KEM implementation in libcrux. For each module M in the implementation, hax produces an F* implementation file M.fst and a corresponding interface file M.fsti. Each function and constant in Rust is compiled to an equivalent function and constant in F*. Each Rust trait turns into an F* typeclass, and each Rust type is translated to the corresponding type in F*.
Example: Barrett Reduction
As a first example, consider the following function for Barrett reduction, taken from the arithmetic.rs file in the libcrux ML-KEM implementation:
pub(crate) fn barrett_reduce(input: i32) -> i32 {
    let t = (i64::from(input) * BARRETT_MULTIPLIER) + (BARRETT_R >> 1);
    let quotient = (t >> BARRETT_SHIFT) as i32;
    let remainder = input - (quotient * FIELD_MODULUS);
    remainder
}
This function implements the classic Barrett reduction algorithm to compute the modulus of a 32-bit signed integer input with respect to the ML-KEM prime q = 3329. The algorithm uses the constant BARRETT_R = 2.pow(26) with BARRETT_SHIFT = 26. In the first line, we multiply the input by the pre-computed constant BARRETT_MULTIPLIER = 20159 (equal to ⌈BARRETT_R/q⌋, i.e. BARRETT_R/q rounded to the nearest integer) and add BARRETT_R/2 to obtain t. The next line divides t by BARRETT_R to find the quotient, and the third line computes and returns the remainder.
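To make the arithmetic above concrete, here is a small, self-contained Rust sketch of the same computation (illustrative only: the constant definitions below are reconstructed from the description above rather than copied from libcrux), together with a check of the congruence and range properties on a few sample inputs:

const FIELD_MODULUS: i32 = 3329;
const BARRETT_SHIFT: i64 = 26;
const BARRETT_R: i64 = 1i64 << BARRETT_SHIFT; // 2^26
const BARRETT_MULTIPLIER: i64 = 20159; // BARRETT_R / 3329, rounded to nearest

fn barrett_reduce(input: i32) -> i32 {
    let t = (i64::from(input) * BARRETT_MULTIPLIER) + (BARRETT_R >> 1);
    let quotient = (t >> BARRETT_SHIFT) as i32;
    input - (quotient * FIELD_MODULUS)
}

fn main() {
    for input in [-(BARRETT_R as i32), -3329, -1, 0, 1, 3328, 3329, BARRETT_R as i32] {
        let r = barrett_reduce(input);
        // The result is congruent to the input modulo 3329 ...
        assert_eq!(r.rem_euclid(FIELD_MODULUS), input.rem_euclid(FIELD_MODULUS));
        // ... and lies within [-3328, 3328].
        assert!(r.abs() <= FIELD_MODULUS - 1);
        println!("barrett_reduce({input}) = {r}");
    }
}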
When we run hax, this function is translated to the following function in F* (taken from the module Libcrux.Kem.Kyber.Arithmetic.fst):
let barrett_reduce (input: i32) =
  let t: i64 =
    ((Core.Convert.f_from input <: i64) *! v_BARRETT_MULTIPLIER <: i64) +!
    (v_BARRETT_R >>! 1l <: i64)
  in
  let quotient: i32 = cast (t >>! v_BARRETT_SHIFT <: i64) <: i32 in
  let remainder: i32 =
    input -!
    (quotient *! Libcrux.Kem.Kyber.Constants.v_FIELD_MODULUS <: i32)
  in
  remainder
As we can see, hax has translated each Rust arithmetic operation (e.g. +) to the corresponding operation in F* (e.g. +!), and every constant (e.g. 1l), variable (e.g. t), and function (e.g. f_from) is fully annotated with its complete path and type.
This F* code uses a set of hand-written, annotated F* libraries (e.g. Core.Convert) that model the behavior of the Rust core libraries and operators. These libraries are trusted, but must be carefully audited to ensure that they correctly reflect the operational semantics of Rust.
Why bother with Barrett reduction?
One may ask why we use a complicated algorithm like Barrett reduction for a single i32. Isn’t this algorithm meant only for large integers stored as bignums? Why can’t we just calculate the modulus as input % 3329?
The reason we cannot use % is that input is a secret here and division is not a constant-time operation, as we discussed in our previous blog post.
In F*, we treat i32 as a secret integer type (a public integer must be explicitly labeled as pub_i32), and operations like % are not defined for secret integers. Consequently, the F* typechecker would immediately reject input % 3329 as ill-typed.
We could also use a different modular reduction algorithm, and indeed our code also uses Montgomery reduction, but Barrett Reduction is well-suited for certain computations in Kyber.
Expressiveness and limitations
We note that the F* type system is much more powerful than Rust’s, allowing the programmer to attach invariants to types (refinement types), pre- and post-conditions to functions (dependent function types), and so on; however, the hax translation only uses simple F* types by default. As we start doing proofs over this code in F*, we use this expressiveness to extend the generated types with logical annotations.
We also note that although F* has an expressive effect system to handle stateful code, we only use the purely functional (Tot) effect. This gives us a significant advantage: we can prove functional correctness without worrying about stateful side-effects.
The hax toolchain does not support the full Rust language, at least at present. For example, a function that relies on delicate pointer manipulation and aliasing in doubly-linked lists will likely be rejected by hax. However, cryptographic code like ML-KEM primarily performs numeric computations using simple data structures like arrays and structs, with some generalization via generic types, const generics, and traits, all of which are supported by hax.
Proving Panic Freedom
Once we have generated F* code from Rust using hax, the verification work begins. Looking again at the Barrett reduction function, we observe that it uses the following arithmetic operations:
- shift right (>>)
- addition (+)
- multiplication (*)
- subtraction (-)
All of these operations can panic if their arguments are not in the right range. For example, i32 addition requires that the sum of the two (signed) arguments does not overflow or underflow the i32 type; otherwise, the operation will trigger a panic in Debug mode and produce a (possibly) unexpected result in Release mode.
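To make the failure mode concrete, here is a tiny illustrative Rust snippet (not from libcrux): with default build settings, writing a + 1 directly would panic in a debug build and silently wrap around in a release build, whereas checked_add surfaces the overflow explicitly in both:

fn main() {
    let a: i32 = i32::MAX;
    // `a + 1` would panic with "attempt to add with overflow" in debug builds
    // and wrap to i32::MIN in release builds; checked_add returns None instead.
    match a.checked_add(1) {
        Some(sum) => println!("sum = {sum}"),
        None => println!("i32 addition overflowed"),
    }
}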
In our methodology, we first require that all the code be proved panic-free. To do this, we annotate the F* libraries containing these operations with pre-conditions that say that their inputs have to be in the right range. We also provide specifications and lemmas for each operation that model its input-output behavior. For example, right-shift behaves like division by a power of two, addition maps to arithmetic addition, and so on.
Then, we run the F* typechecker on the generated code, which tries to prove these library pre-conditions. If F* fails to prove a pre-condition, it produces a type error, indicating either that the function may panic, or that the prover needs more annotations or proof hints.
When we run F* on the barrett_reduce function, it uses the library definitions of the operators to prove, in sequence, that:
- input fits in an i64 (trivial since input : i32)
- input * 20159 fits in an i64 (trivial since both inputs are i32)
- t = input * 20159 + 2.pow(25) fits in an i64 (requires some math reasoning)
- quotient = t / 2.pow(26) fits in an i32 (requires some math reasoning)
- input - (quotient * 3329) fits in an i32 (requires some math reasoning)
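To give a flavor of the math reasoning involved (this is our own back-of-the-envelope version of the argument, not the F* proof itself), consider the third obligation: since input is an i32, we know |input| <= 2.pow(31), so

|t| <= 2.pow(31) * 20159 + 2.pow(25) < 2.pow(46),

which is comfortably below the i64 limit of 2.pow(63) - 1.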
Using the annotations and lemmas in our library, F* is able to prove all of the above automatically using the Z3 SMT solver, and hence it proves that the barrett_reduce function is panic-free.
As we can see, panic freedom already requires some mathematical and logical reasoning, but it is only the first step towards functional correctness. Often, panic freedom can be proved automatically by F*, but functions with loops, as we will see below, require more annotations.
Proving Functional Correctness
To prove that a function like barrett_reduce is correct, we first need to formally specify its expected input-output behavior. If the input were a natural number, the Barrett reduction algorithm would simply compute input % 3329. However, in our code, input is a signed integer and may be negative, in which case the remainder may also be negative. Furthermore, in the way we use this function in Kyber, we know that the input is always in the range -BARRETT_R <= input <= BARRETT_R.
So, we annotate the type of barrett_reduce (in the interface file Libcrux.Kem.Kyber.Arithmetic.fsti) with the types and post-condition shown below:
val barrett_reduce (input: i32_b (v v_BARRETT_R))
  : Pure (i32_b 3328)
    (requires True)
    (ensures fun result ->
      v result % v Libcrux.Kem.Kyber.Constants.v_FIELD_MODULUS
        = v input % v Libcrux.Kem.Kyber.Constants.v_FIELD_MODULUS)
The refinement type i32_b N is defined as the type of 32-bit signed integers x:i32 such that -N <= x <= N. So the type annotation on input is a pre-condition that says that the input to this function must be in the range -BARRETT_R <= input <= BARRETT_R. The annotation on the result says that it must be in the range -3328 <= result <= 3328, and the post-condition in the ensures clause says that the result is the same as the input modulo the ML-KEM prime (3329).
Proving the functional correctness of barrett_reduce requires more work from the proof engineer, since we need to use properties of division and modulus in ways that F* cannot infer automatically. There are many ways of helping F* complete this proof. One such proof is shown below.
A Proof of Barrett Reduction in F*
The proof is interleaved with the code of the barrett_reduce function in F*:
let barrett_reduce input =
  let t: i64 =
    ((Core.Convert.f_from input <: i64) *! v_BARRETT_MULTIPLIER <: i64) +!
    (v_BARRETT_R >>! 1l <: i64)
  in
  assert_norm (v v_BARRETT_MULTIPLIER == (pow2 27 + 3329) / (2*3329));
  assert (v t = v input * v v_BARRETT_MULTIPLIER + pow2 25);
  let quotient: i32 = cast (t >>! v_BARRETT_SHIFT <: i64) <: i32 in
  assert (v quotient = v t / pow2 26);
  let remainder: i32 =
    input -!
    (quotient *! Libcrux.Kem.Kyber.Constants.v_FIELD_MODULUS <: i32)
  in
  calc (==) {
    v remainder % 3329;
    (==) { }
    (v input - (v quotient * 3329)) % 3329;
    (==) { Math.Lemmas.lemma_mod_sub_distr (v input) (v quotient * 3329) 3329 }
    (v input - (v quotient * 3329) % 3329) % 3329;
    (==) { Math.Lemmas.cancel_mul_mod (v quotient) 3329 }
    (v input - 0) % 3329;
    (==) { }
    (v input) % 3329;
  };
  remainder
This proof uses assertions (assert) to tell F* to prove some useful logical property, normalizing assertions (assert_norm) to ask F* to verify a concrete calculation, and a calculational tactic (calc) to prove the post-condition using lemmas from the F* math library (Math.Lemmas). This is a fairly simple explicit proof in F*; it could be further optimized and made more elegant, but it works for the specification we have here and is representative of the other proofs we do for our ML-KEM code.
Verifying Properties about Loops
Barrett reduction is a good example of our methodology since it is self-contained and has a clean mathematical specification. However, large parts of the code in ML-KEM (and other cryptographic algorithms) implement algorithms over arrays, vectors, and matrices of values. In these functions, we need to reason about loop invariants to prove that they are panic free and functionally correct.
As an example of this class of code, we consider another function from arithmetic.rs, which adds two ML-KEM polynomial ring elements:
pub(crate) fn add_to_ring_element(
    mut lhs: PolynomialRingElement,
    rhs: &PolynomialRingElement,
) -> PolynomialRingElement {
    for i in 0..lhs.coefficients.len() {
        lhs.coefficients[i] += rhs.coefficients[i];
    }
    lhs
}
This function takes two polynomial ring elements (arrays of 256 i32 coefficients) called lhs and rhs and returns the sum of these elements in lhs. For simplicity, we return lhs here, but of course we could just mutate lhs in place instead.
To prove that this function is panic-free, we require two properties. First, the function must never access lhs or rhs out of bounds. Since we only access these arrays at indices i in the right range, F* can automatically prove these accesses to be safe.
Second, the addition operation + within the loop should never overflow. F* cannot prove this property on its own, since it indeed does not hold for all inputs. Consequently, we need to add a pre-condition to the function in F* that limits the range of inputs it can be called with, and we further need to define a loop invariant in F* to ensure that this property is preserved as the loop executes.
To specify functional correctness, we add a post-condition to the function that says that the output coefficients are the sums of the input coefficients. Proving this property also requires us to extend the loop invariant in F*. Details of the proof are below.
Specifying and Verifying add_to_ring_element
The specification for the add_to_ring_element function is:
val add_to_ring_element (#b1: nat) (#b2: nat{b1 + b2 < pow2_31})
    (lhs: t_PolynomialRingElement_b b1) (rhs: t_PolynomialRingElement_b b2)
  : Pure (t_PolynomialRingElement_b (b1 + b2))
    (requires True)
    (ensures fun result ->
      (forall i. v result.f_coefficients.[i]
        == v lhs.f_coefficients.[i] + v rhs.f_coefficients.[i]))
It requires that every coefficient x in the input lhs must be bound by some b1 (-b1 <= x <= b1) and every coefficient y in rhs must be bound by some b2 (-b2 <= y <= b2), such that b1 + b2 < pow2 31.
If this pre-condition holds, then the function produces an output ring element where every coefficient z is bound by b1 + b2 (-(b1+b2) <= z <= b1+b2). Furthermore, the ensures clause states a post-condition that each coefficient in the output is the point-wise sum of the corresponding coefficients in the two inputs.
The F* code and proof of this function are too verbose to include here, but the key element is the loop invariant defined below:
let inv = fun (acc: t_PolynomialRingElement_b (b1 + b2)) (i: usize) ->
  (forall j. j <. i ==>
    acc.f_coefficients.[j]
      == lhs.f_coefficients.[j] +! rhs.f_coefficients.[j]) /\
  (forall j. j >=. i ==>
    acc.f_coefficients.[j]
      == orig_lhs.f_coefficients.[j])
It says that after i iterations, the coefficient at every index j < i of the result is the sum of the corresponding values in lhs and rhs, and the coefficient at every index j >= i of the result is the same as the corresponding coefficient in the original lhs. Once we state this invariant, F* is able to prove the rest of the properties automatically.
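To relate this invariant back to the Rust code, here is a hedged, self-contained sketch: PolynomialRingElement below is a simplified stand-in for the libcrux type, and the debug_assert! calls are a runtime rendering of the invariant that F* proves statically (the inputs in main respect the b1 + b2 < pow2 31 pre-condition, so the += never overflows):

#[derive(Clone, Copy)]
struct PolynomialRingElement {
    coefficients: [i32; 256],
}

fn add_to_ring_element(
    mut lhs: PolynomialRingElement,
    rhs: &PolynomialRingElement,
) -> PolynomialRingElement {
    let orig_lhs = lhs; // Snapshot of lhs, so we can state the invariant against it.
    for i in 0..lhs.coefficients.len() {
        // Invariant on entry to iteration i: indices below i already hold the
        // point-wise sums, indices from i upward still hold the original lhs values.
        debug_assert!((0..i)
            .all(|j| lhs.coefficients[j] == orig_lhs.coefficients[j] + rhs.coefficients[j]));
        debug_assert!((i..lhs.coefficients.len())
            .all(|j| lhs.coefficients[j] == orig_lhs.coefficients[j]));
        lhs.coefficients[i] += rhs.coefficients[i];
    }
    lhs
}

fn main() {
    let lhs = PolynomialRingElement { coefficients: [1; 256] };
    let rhs = PolynomialRingElement { coefficients: [2; 256] };
    let sum = add_to_ring_element(lhs, &rhs);
    assert!(sum.coefficients.iter().all(|&c| c == 3));
}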
Lessons and Benefits of Formal Verification
In this post, we have tried to provide a taste of using hax and F* to verify our ML-KEM Rust implementation. The full implementation is quite large, and we are still cleaning up and polishing the last of our proofs before a final release.
Our experience with ML-KEM demonstrates several benefits of formal verification:
- Finding bugs and attacks: Our formal analysis led us to find a timing bug in our own code, which then pointed the way towards bugs in other implementations of ML-KEM. Although the bug itself does not require deep verification in F*, it does rely on a model of integer arithmetic and an analysis of how secret information flows through the ML-KEM implementation.
- Code understanding: Our Rust code was comprehensively tested and so we did not expect to find many bugs, but even so, there were several instances where the formal verification found flaws in our assumptions. For example, in some cases, the calculations were much closer to the boundaries of an integer type than we thought.
- Justifying optimizations: Formal verification allowed us to have confidence in the low-level optimizations that we used to speed up our code. For example, in many places in the code, we postpone modular reduction because we can prove that we would not overflow the integer type, and this yields significant performance gains.
In summary, this project shows that formal verification has advantages beyond finding an attack or constructing a proof of a snapshot of the code; it can be used as a constructive part of high-assurance, high-performance cryptographic software development.
To learn more about our methodology, results, or how we can help your project, contact us.