Modern programming languages typically come with a large standard library that implements essential language features like machine arithmetic and I/O, offers efficient data structures, provides interfaces to system libraries, etc. Although it is often overlooked, this standard library should be considered as much part of the trusted computing base (TCB) as the language compiler. Indeed, any bug in the standard library is likely to break applications and could result in security vulnerabilities.
Recognizing this concern, the Rust community has initiated an effort to formally verify the Rust standard library. Cryspen is participating in this project, and as our first contribution, we proposed and implemented a new methodology to formally specify and test the platform-specific SIMD functions in the Rust core library. These functions are used in a variety of performance-sensitive Rust applications, including cryptographic libraries like libcrux.
Modeling Rust’s core::arch library
The
core::arch
library in Rust provides a number of platform-specific
functionalities, including wrappers for specific machine instructions.
In particular, it provides functions corresponding to the
single-instruction-multiple-data (SIMD) instructions provided by
different vectorized architectures such as Intel’s AVX, AVX2 and
AVX-512, and Arm’s Aarch64/Neon.
To model the behaviour of each SIMD instruction in a testable manner, we essentially write a reference implementation of the function in Rust based on the instruction’s documentation.
There are three kinds of SIMD intrinsics in core::arch.
The first kind are builtin Rust compiler intrinsics, some of which are
in the intrinsics/simd.rs file
in the core crate, and others are in the simd.rs file of core_arch.
These builtin intrinsics define generic SIMD operations that the Rust compiler knows how to implement on each platform.
The second kind are extern intrinsics that are links to definitions in LLVM.
See, for example, this list
of extern intrinsics used in the Intel x86 AVX2 library.
These extern intrinsics are typically platform-specific functions that map to low-level instructions.
The third kind are defined intrinsics that are given proper definitions in Rust, and their code may
depend on the builtin intrinsics or the extern intrinsics. These defined intrinsics represent higher-level
operations that are wrappers around one or more assembly instructions.
For builtin and extern intrinsics, we need to define reference
implementations ourselves, but for defined intrinsics, we have a
choice: we can either define a completely independent model of the
intrinsic based on its documentation, or we can copy and adapt the
implementation provided in the Rust core library and treat it as a
reference implementation. If our goal is to check the Rust core
library, it is always better to write a fresh model that can serve as
the official specification. Conversely, deriving models from the Rust
library implementations allows us to quickly obtain models for a large
class of functions, and these models accurately reflect what Rust does
in practice, which is useful when analyzing Rust programs that use
these intrinsics. We need to balance this tradeoff based on our goals,
and here, we only derive models for intrinsics whose bodies are
shallow wrappers around builtin or extern intrinsics.
An Example: _mm256_bsrli_epi128
For example, let us say the intrinsic we are modeling is
_mm256_bsrli_epi128 from the Intel AVX2 feature set.
The official Intel documentation of this function provides a textual description:
Shift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
It also provides pseudocode for the instruction:
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
dst[255:128] := a[255:128] >> (tmp*8)
dst[MAX:256] := 0
Based on this pseudocode, we can write a model for the function,
interpreting the 256-bit vector type __m256i as an array of two
128-bit integers (i128x2):
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: i128x2) -> i128x2 {
i128x2::from_fn(|i| {
let mut tmp = IMM8%256;
if tmp > 15 {tmp = 16}
((a[i] as u128 ) >> (tmp*8)) as i128
})
}
However, in the Rust core library, this is a defined intrinsic, and
when we began our project, this intrinsic was implemented as shown
here. Notably,
the implementation aims to be efficient by branching over the shift
value (tmp) and implementing the shift using byte-level shuffling
operations, presumably because this is faster than using a shift instruction.
We now have two different interpretations of the same
instruction, a model that follows the Intel documentation and an
implementation that aims to be efficient. Since both these functions
are executable, we can test them against each other.
Testing the Model agains the Rust Core library
We use the standard Rust testing framework to run a large number of tests for each intrinsic we model against the same intrinsic in the Rust core library. For convenience, we implemented a set of macros that allows us to generate random tests for large inputs and comprehensive tests for small inputs.
For the _mm256_bsrli_epi128 function above, we test it for all
values of IMM8 from 0 to 256 (since the Intel pseudocode only looks
at the bottom 8 bits), and for a 100 random bitvectors representing
the a vector (the number of tests is configurable). For each input,
we test whether our model returns the same output as the Rust
core. Often, this process reveals bugs in our hand-written models,
where we may have misinterpreted the Intel or Arm documentation.
In the case of _mm256_bsrli_epi128 our tests revealed a bug in the
Rust core library. When the value of tmp in the pseudocode above is
greater than 15, instead of setting the value to 16 as specified by
Intel, the Rust core implementation set it to tmp % 16. This bug can
be seen on this
line. As
a result, the tests fail for values of IMM8 above 15.
A similar bug affects _mm512_bsrli_epi128.
We opened an issue in the Rust stdarch repository, reporting this bug,
along with the precise tests that fail. This helped the Rust developers
quickly agree on the bug and accept our fix, which has since been upstreamed.
Testable Models for the Rust SIMD Intrinsics
In total, we formally specified 384 intrinsics for x86 platforms, 181 intrinsics for aarch64, and tested these specifications against the Rust core library. In the course of this testing, we found and fixed bugs in two intrinsics in the Rust library.
- Our challenge to Verify Rust Std: Challenge 15
- Our solution to the challenge: Testable Models for SIMD Intrinsics
- The bug we found in the Rust SIMD libraries: Bug, Fix
This work was done as part of Aniket Mishra’s Internship at Cryspen, and is part
of a larger project to formally specify the Rust core and std libraries in a way that can be used in formal proofs of Rust programs in hax.
If you want to learn more about how we test and verify Rust libraries, get in touch.