Add prefetch to sha1_block_data_order_shaext

Similar idea to https://boringssl-review.googlesource.com/c/boringssl/+/55466

Results are pretty close to the current state,
e.g. tool speed goes from
Did 74000 SHA-1 (16384 bytes) operations in 1004094us (73698.3 ops/sec): 1207.5 MB/s
to
Did 75000 SHA-1 (16384 bytes) operations in 1004022us (74699.6 ops/sec): 1223.9 MB/s

But on AMD with prefetchers disabled and a data size large enough to
force cache misses, this gives a ~3x improvement:
name              old time/op  new time/op  delta
BM_SHA1Hash/2      141ns ± 1%   143ns ± 2%     ~     (p=0.421 n=5+5)
BM_SHA1Hash/4      143ns ± 2%   143ns ± 3%     ~     (p=0.841 n=5+5)
BM_SHA1Hash/8      141ns ± 1%   141ns ± 2%     ~     (p=1.000 n=5+5)
BM_SHA1Hash/16     141ns ± 1%   141ns ± 1%     ~     (p=0.841 n=5+5)
BM_SHA1Hash/32     143ns ± 2%   143ns ± 1%     ~     (p=0.690 n=5+5)
BM_SHA1Hash/64     178ns ± 1%   179ns ± 1%     ~     (p=0.151 n=5+5)
BM_SHA1Hash/512    454ns ± 1%   454ns ± 1%     ~     (p=0.841 n=5+5)
BM_SHA1Hash/4k    2.66µs ± 1%  2.65µs ± 1%     ~     (p=1.000 n=5+5)
BM_SHA1Hash/32k   20.3µs ± 1%  20.3µs ± 2%     ~     (p=1.000 n=5+5)
BM_SHA1Hash/256k   162µs ± 1%   161µs ± 1%     ~     (p=0.548 n=5+5)
BM_SHA1Hash/1M     644µs ± 1%   645µs ± 1%     ~     (p=0.841 n=5+5)
BM_SHA1Hash/2M    1.29ms ± 1%  1.29ms ± 2%     ~     (p=0.690 n=5+5)
BM_SHA1Hash/4M    2.58ms ± 1%  2.58ms ± 1%     ~     (p=0.841 n=5+5)
BM_SHA1Hash/8M    5.14ms ± 0%  5.15ms ± 1%     ~     (p=0.286 n=4+5)
BM_SHA1Hash/16M   11.4ms ± 3%  10.3ms ± 1%   -9.04%  (p=0.016 n=4+5)
BM_SHA1Hash/128M   249ms ± 0%    83ms ± 1%  -66.73%  (p=0.008 n=5+5)
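
The general technique can be sketched in C. This is only an illustration, not the code from this change: the real change is in the assembly behind sha1_block_data_order_shaext. The sketch assumes GCC or Clang for `__builtin_prefetch`; `toy_compress`, `toy_block_data_order`, and the look-ahead distance `PREFETCH_BLOCKS` are hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64      /* SHA-1 consumes input in 64-byte blocks */
#define PREFETCH_BLOCKS 4  /* hypothetical look-ahead distance, not from the commit */

/* Hypothetical stand-in for a compression function. This is NOT SHA-1;
 * it only folds one block into the state so the loop has work to do. */
static void toy_compress(uint32_t state[5], const uint8_t *block) {
  for (size_t i = 0; i < BLOCK_SIZE; i++) {
    uint32_t s = state[i % 5];
    state[i % 5] = ((s << 5) | (s >> 27)) + block[i];
  }
}

/* Processes num_blocks blocks, asking the CPU to start loading a block
 * a few iterations before it is needed. */
static void toy_block_data_order(uint32_t state[5], const uint8_t *data,
                                 size_t num_blocks) {
  for (size_t i = 0; i < num_blocks; i++) {
    /* The bounds check keeps the pointer arithmetic inside the buffer.
     * Hand-written assembly may instead prefetch unconditionally, since
     * prefetch instructions do not fault. */
    if (i + PREFETCH_BLOCKS < num_blocks) {
      /* Arguments: address, 0 = read access, 3 = keep in all cache levels. */
      __builtin_prefetch(data + (i + PREFETCH_BLOCKS) * BLOCK_SIZE, 0, 3);
    }
    toy_compress(state, data + i * BLOCK_SIZE);
  }
}
```

A prefetch is only a hint and does not change the computed result. That matches the numbers above: with the input already in cache the timings are unchanged, and a difference appears only once the input is large enough to miss.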

Change-Id: I7cae746b6d8a705d6bf2d5c5df6a2dca6d44791a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57826
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
1 file changed
tree: c09df46475804ad4a5e02155e55b44e82c62049d
  .github/
  cmake/
  crypto/
  decrepit/
  fuzz/
  include/
  rust/
  ssl/
  third_party/
  tool/
  util/
  .clang-format
  .gitignore
  API-CONVENTIONS.md
  BREAKING-CHANGES.md
  BUILDING.md
  CMakeLists.txt
  codereview.settings
  CONTRIBUTING.md
  FUZZING.md
  go.mod
  go.sum
  INCORPORATING.md
  LICENSE
  PORTING.md
  README.md
  SANDBOXING.md
  sources.cmake
  STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: