Avoid reduce_once in scalar_centered_binomial_distribution_eta_2_with_prf

The comment that a barrier-less reduce_once was safe turned out to not
*quite* be true. In Clang configurations without auto-vectorization
(notably -O1), Clang would emit a branch instead of a CMOV.

Unfortunately, adding a barrier to reduce_once has significant
performance costs. The problem seems to be that auto-vectorization
breaks. I suspect it is primarily because the value barrier forces the
value into a general-purpose register, while vectorized code puts it
straight into a SIMD register. Though knowing the comparison is a
comparison seems to also help a bit.

Based on what we've understood of Clang's select transforms thus far, it
would make sense that ML-KEM might not need the barrier. The main
culprit is turning multiple selects with the same condition into a
branch, and that does not happen in ML-KEM. Yet we observe a problem.

Based on valgrind instrumentation, the problem seems to be limited to
scalar_centered_binomial_distribution_eta_2_with_prf, likely
because the value has such a limited range of values. For some reason,
this causes many recent versions of Clang to emit a branch.

I think this may actually be a misoptimization. Indeed the very latest
trunk build of Clang on godbolt does not have this problem. Somewhere
between 8cb44859cc31929521c09fc6a8add66d53db44de and
8daf4f16fa08b5d876e98108721dd1743a360326, LLVM seems to have fixed this
issue.

We can avoid this by computing it differently. We currently write
reduce_once(kPrime + a + b - (c + d)), where a through d are 0 or 1.
Instead, we can write a + b - (c + d), let the underflow happen, and
then conditionally add kPrime based on the sign bit of the result. This
seems to avoid mishaps, for now.

If this breaks down again, we may need to get better value barriers, or
to stop relying on auto-vectorization and vectorize ourselves.

Change-Id: I917456348d63628880467d21138a57297532bc9a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/74447
Auto-Submit: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2 files changed
tree: c21a6b2b4a7d7d34bc6482b9de7e68237f1e377c
  1. .bcr/
  2. .github/
  3. cmake/
  4. crypto/
  5. decrepit/
  6. docs/
  7. fuzz/
  8. gen/
  9. include/
  10. infra/
  11. pki/
  12. rust/
  13. ssl/
  14. third_party/
  15. tool/
  16. util/
  17. .bazelignore
  18. .bazelrc
  19. .bazelversion
  20. .clang-format
  21. .gitignore
  22. API-CONVENTIONS.md
  23. AUTHORS
  24. BREAKING-CHANGES.md
  25. BUILD.bazel
  26. build.json
  27. BUILDING.md
  28. CMakeLists.txt
  29. codereview.settings
  30. CONTRIBUTING.md
  31. FUZZING.md
  32. go.mod
  33. go.sum
  34. INCORPORATING.md
  35. LICENSE
  36. MODULE.bazel
  37. MODULE.bazel.lock
  38. PORTING.md
  39. PrivacyInfo.xcprivacy
  40. README.md
  41. SANDBOXING.md
  42. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

To file a security issue, use the Chromium process and mention in the report this is for BoringSSL. You can ignore the parts of the process that are specific to Chromium/Chrome.

There are other files in this directory which might be helpful: