Fix missing vzeroupper in gcm_ghash_vpclmulqdq_avx2() for len=16

gcm_ghash_vpclmulqdq_avx2() executes vzeroupper only when len >= 32, as
it was supposed to use only xmm registers for len == 16.  But actually
it was writing to two ymm registers unconditionally for $BSWAP_MASK and
$GFPOLY.  Therefore, there could be a slow-down in later code using
legacy SSE instructions, in the (probably rare) case where
gcm_ghash_vpclmulqdq_avx2() was called with len=16 *and* wasn't followed
by a function that does vzeroupper, e.g. aes_gcm_enc_update_vaes_avx2().

(The Windows xmm register restore epilogue of
gcm_ghash_vpclmulqdq_avx2() itself does use legacy SSE instructions, so
probably was slowed down by this, but that's just 8 instructions.)

Fix this by updating gcm_ghash_vpclmulqdq_avx2() to correctly use only
xmm registers when len=16.  This makes it match the similar code in
aes-gcm-avx10-x86_64.pl, which does do it correctly, more closely.

Also, make both functions just execute vzeroupper unconditionally, so
that it won't be missed again.  It's actually only 1 cycle on the CPUs
this code runs on, and it no longer seems worth executing conditionally.

Change-Id: I975cd5b526e5cdae1a567f4085c2484552bf6bea
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/77227
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
8 files changed
tree: 2648865cbedc9c56ef9e2b7bdeba4b02c9577eca
  1. .bcr/
  2. .github/
  3. cmake/
  4. crypto/
  5. decrepit/
  6. docs/
  7. fuzz/
  8. gen/
  9. include/
  10. infra/
  11. pki/
  12. rust/
  13. ssl/
  14. third_party/
  15. tool/
  16. util/
  17. .bazelignore
  18. .bazelrc
  19. .bazelversion
  20. .clang-format
  21. .gitignore
  22. API-CONVENTIONS.md
  23. AUTHORS
  24. BREAKING-CHANGES.md
  25. BUILD.bazel
  26. build.json
  27. BUILDING.md
  28. CMakeLists.txt
  29. codereview.settings
  30. CONTRIBUTING.md
  31. FUZZING.md
  32. go.mod
  33. go.sum
  34. INCORPORATING.md
  35. LICENSE
  36. MODULE.bazel
  37. MODULE.bazel.lock
  38. PORTING.md
  39. PrivacyInfo.xcprivacy
  40. README.md
  41. SANDBOXING.md
  42. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

To file a security issue, use the Chromium process and mention in the report this is for BoringSSL. You can ignore the parts of the process that are specific to Chromium/Chrome.

There are other files in this directory which might be helpful: