Manually unroll pi and rho steps in Keccak

We've effectively been relying on Clang (on x86_64) to unroll this loop
for us, but other compiler/architecture combinations don't unroll it as
readily. I originally did this to experiment with the 32-bit
bit-interleaving trick, which requires the unrolled form, but it turns
out to be a significant win on its own.
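
For readers unfamiliar with the step, here is a minimal sketch of the idea. It is not the code from this change. The names rotl64, kRotations, kIndexes, rho_pi_loop, rho_pi_unrolled and RHO_PI_STEP are hypothetical, and the state is assumed to be a flat array of 25 64-bit lanes. The tables are the usual combined rho/pi constants found in compact Keccak implementations.

```c
#include <stddef.h>
#include <stdint.h>

// Rotate left by n bits. n must be in 1..63, which holds for every
// rotation constant used below.
static inline uint64_t rotl64(uint64_t x, unsigned n) {
  return (x << n) | (x >> (64 - n));
}

// Rho rotation amounts and pi destination lanes, listed in the order in
// which the combined step visits the 24 lanes other than lane 0.
static const uint8_t kRotations[24] = {
    1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
    27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const uint8_t kIndexes[24] = {
    10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

// Loop form: the compiler has to unroll this itself before the rotation
// amounts and lane indices become constants.
static void rho_pi_loop(uint64_t st[25]) {
  uint64_t prev = st[1];
  for (size_t i = 0; i < 24; i++) {
    const uint64_t value = rotl64(prev, kRotations[i]);
    const size_t index = kIndexes[i];
    prev = st[index];
    st[index] = value;
  }
}

// Manually unrolled form: the same 24 steps, in the same order, with the
// table entries written out as literals.
static void rho_pi_unrolled(uint64_t st[25]) {
  uint64_t prev = st[1];
#define RHO_PI_STEP(index, rotation)                 \
  do {                                               \
    const uint64_t value = rotl64(prev, (rotation)); \
    prev = st[(index)];                              \
    st[(index)] = value;                             \
  } while (0)
  RHO_PI_STEP(10, 1);
  RHO_PI_STEP(7, 3);
  RHO_PI_STEP(11, 6);
  RHO_PI_STEP(17, 10);
  RHO_PI_STEP(18, 15);
  RHO_PI_STEP(3, 21);
  RHO_PI_STEP(5, 28);
  RHO_PI_STEP(16, 36);
  RHO_PI_STEP(8, 45);
  RHO_PI_STEP(21, 55);
  RHO_PI_STEP(24, 2);
  RHO_PI_STEP(4, 14);
  RHO_PI_STEP(15, 27);
  RHO_PI_STEP(23, 41);
  RHO_PI_STEP(19, 56);
  RHO_PI_STEP(13, 8);
  RHO_PI_STEP(12, 25);
  RHO_PI_STEP(2, 43);
  RHO_PI_STEP(20, 62);
  RHO_PI_STEP(14, 18);
  RHO_PI_STEP(22, 39);
  RHO_PI_STEP(9, 61);
  RHO_PI_STEP(6, 20);
  RHO_PI_STEP(1, 44);
#undef RHO_PI_STEP
}
```

Both functions should produce the same state. In the unrolled form, every rotation amount and lane index is a compile-time constant regardless of what the optimizer decides to do with the loop.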

Clang (NDK), aarch32, Pixel 5A
Before:
Did 4836 Kyber generate + decap operations in 2036618us (2374.5 ops/sec)
Did 6237 Kyber parse + encap operations in 2055784us (3033.9 ops/sec)
After:
Did 8610 Kyber generate + decap operations in 2051048us (4197.9 ops/sec) [+76.8%]
Did 12138 Kyber parse + encap operations in 2042363us (5943.1 ops/sec) [+95.9%]

Clang (NDK), aarch64, Pixel 5A
Before:
Did 16720 Kyber generate + decap operations in 2011039us (8314.1 ops/sec)
Did 30000 Kyber parse + encap operations in 2023170us (14828.2 ops/sec)
After:
Did 17080 Kyber generate + decap operations in 2005310us (8517.4 ops/sec) [+2.4%]
Did 31000 Kyber parse + encap operations in 2059104us (15055.1 ops/sec) [+1.5%]

GCC, x86_64
Before:
Did 14535 Kyber generate + decap operations in 2015051us (7213.2 ops/sec)
Did 21000 Kyber parse + encap operations in 2017842us (10407.2 ops/sec)
After:
Did 19900 Kyber generate + decap operations in 2016747us (9867.4 ops/sec) [+36.8%]
Did 34000 Kyber parse + encap operations in 2059643us (16507.7 ops/sec) [+58.6%]

Clang, x86_64
Before:
Did 19584 Kyber generate + decap operations in 2006839us (9758.6 ops/sec)
Did 34000 Kyber parse + encap operations in 2050513us (16581.2 ops/sec)
After:
Did 19928 Kyber generate + decap operations in 2020249us (9864.1 ops/sec) [+1.1%]
Did 34000 Kyber parse + encap operations in 2052970us (16561.4 ops/sec) [-0.1%]

Change-Id: Iee9315667c1d2044785faa9370815a3c7555c259
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/63992
Auto-Submit: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
1 file changed
tree: 1ce6b526df1cb9b5f7c3c01e7467fec16a61847b
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

There are other files in this directory which might be helpful: