Add bn_add_words and bn_sub_words assembly for aarch64.

It is 2023 and compilers *still* cannot use carry flags effectively,
particularly GCC.

There are some Clang-specific built-ins which help x86_64 (where we have
asm anyway) but, on aarch64, the built-ins actually *regress
performance* over the current formulation! I suspect Clang is getting
confused by Arm and Intel having opposite borrow flags.
https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins
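
For context, here is a minimal sketch of the kind of portable C that has
to recover the carry or borrow by comparison, since C has no direct way to
name the flag. The names add_words_sketch and sub_words_sketch are
hypothetical stand-ins, 64-bit limbs are assumed, and this is not the
actual generic code in the tree. Note that aarch64 sets C=1 after a
subtraction with *no* borrow, while x86 sets CF=1 when there *is* one.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch: r = a + b over num limbs, returning the final carry.
// Assumes 64-bit limbs; not the actual generic implementation.
static uint64_t add_words_sketch(uint64_t *r, const uint64_t *a,
                                 const uint64_t *b, size_t num) {
  uint64_t carry = 0;
  for (size_t i = 0; i < num; i++) {
    uint64_t sum = a[i] + carry;
    uint64_t c1 = sum < carry;  // Overflow from adding the incoming carry.
    sum += b[i];
    uint64_t c2 = sum < b[i];   // Overflow from adding b[i].
    r[i] = sum;
    carry = c1 | c2;            // At most one of the two can be set.
  }
  return carry;
}

// Hypothetical sketch: r = a - b over num limbs, returning the final borrow.
static uint64_t sub_words_sketch(uint64_t *r, const uint64_t *a,
                                 const uint64_t *b, size_t num) {
  uint64_t borrow = 0;
  for (size_t i = 0; i < num; i++) {
    uint64_t diff = a[i] - b[i];
    uint64_t b1 = a[i] < b[i];    // Borrow from a[i] - b[i].
    uint64_t b2 = diff < borrow;  // Borrow from subtracting the incoming borrow.
    r[i] = diff - borrow;
    borrow = b1 | b2;             // At most one of the two can be set.
  }
  return borrow;
}
```

In assembly, each loop body collapses to a single adcs or sbcs per limb,
with the flag carried in place rather than materialized in a register.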

Just include aarch64 assembly to avoid this. This provides a noticeable
perf boost in code that uses these functions. Where bn_mul_mont is
available, they're not used much in RSA, but the generic EC
implementation does modular additions, and RSA private key checking
spends a lot of time in our add/sub-based bn_div_consttime.

The new code is also smaller than the generic one (18 instructions
each), probably because it avoids all the flag spills and only tries to
unroll by two iterations.

Before:
Did 7137 RSA 2048 signing operations in 4022094us (1774.4 ops/sec)
Did 326000 RSA 2048 verify (same key) operations in 4001828us (81462.8 ops/sec)
Did 278000 RSA 2048 verify (fresh key) operations in 4001392us (69475.8 ops/sec)
Did 34830 RSA 2048 private key parse operations in 4038893us (8623.7 ops/sec)
Did 1196 RSA 4096 signing operations in 4015759us (297.8 ops/sec)
Did 90000 RSA 4096 verify (same key) operations in 4041959us (22266.4 ops/sec)
Did 79000 RSA 4096 verify (fresh key) operations in 4034561us (19580.8 ops/sec)
Did 12222 RSA 4096 private key parse operations in 4004831us (3051.8 ops/sec)
Did 10626 ECDSA P-384 signing operations in 4030764us (2636.2 ops/sec)
Did 10800 ECDSA P-384 verify operations in 4052718us (2664.9 ops/sec)
Did 4182 ECDSA P-521 signing operations in 4076198us (1026.0 ops/sec)
Did 4059 ECDSA P-521 verify operations in 4063819us (998.8 ops/sec)

After:
Did 7189 RSA 2048 signing operations in 4021331us (1787.7 ops/sec) [+0.7%]
Did 326000 RSA 2048 verify (same key) operations in 4010811us (81280.3 ops/sec) [-0.2%]
Did 278000 RSA 2048 verify (fresh key) operations in 4004206us (69427.0 ops/sec) [-0.1%]
Did 53040 RSA 2048 private key parse operations in 4050953us (13093.2 ops/sec) [+51.8%]
Did 1200 RSA 4096 signing operations in 4035548us (297.4 ops/sec) [-0.2%]
Did 90000 RSA 4096 verify (same key) operations in 4035686us (22301.0 ops/sec) [+0.2%]
Did 80000 RSA 4096 verify (fresh key) operations in 4020989us (19895.6 ops/sec) [+1.6%]
Did 20468 RSA 4096 private key parse operations in 4037474us (5069.5 ops/sec) [+66.1%]
Did 11070 ECDSA P-384 signing operations in 4023595us (2751.3 ops/sec) [+4.4%]
Did 11232 ECDSA P-384 verify operations in 4063116us (2764.4 ops/sec) [+3.7%]
Did 4387 ECDSA P-521 signing operations in 4052728us (1082.5 ops/sec) [+5.5%]
Did 4305 ECDSA P-521 verify operations in 4064660us (1059.1 ops/sec) [+6.0%]

Change-Id: If2f739373cdd10fa1d4925d5e2725e87d2255fc0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56966
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
3 files changed
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.
