commit | 90e3b6e68c0a9ff79de8cfb92fa81e6cd2a9d57d | |
---|---|---|
author | Ilya Tokar <tokarip@google.com> | Thu Dec 01 17:21:54 2022 -0500 |
committer | Boringssl LUCI CQ <boringssl-scoped@luci-project-accounts.iam.gserviceaccount.com> | Wed Dec 21 21:32:14 2022 +0000 |
tree | 66255f2ec324419b08dc2714df6afb8a3b7b39fa | |
parent | 837ade76fd2a0ce8a310b0d7c8d7b523fcb23047 | |
Add prefetch to aesni_ctr32_ghash_6x

Performance is neutral (~1% change with ~2% noise level):

name | old speed | new speed | delta |
---|---|---|---|
BM_AesCtrEncrypt/999 | 940MB/s ± 2% | 941MB/s ± 1% | ~ (p=0.811 n=40+39) |
BM_AesCtrEncrypt/4k | 1.11GB/s ± 2% | 1.11GB/s ± 2% | ~ (p=0.452 n=40+40) |
BM_AesCtrEncrypt/8k | 1.14GB/s ± 2% | 1.14GB/s ± 1% | ~ (p=0.101 n=40+39) |
BM_AesCtrEncrypt/12k | 1.14GB/s ± 1% | 1.14GB/s ± 2% | ~ (p=0.629 n=39+40) |
BM_AesCtrEncrypt/16k | 1.16GB/s ± 2% | 1.16GB/s ± 1% | ~ (p=0.193 n=40+38) |
BM_AesCtrEncrypt/24k | 1.15GB/s ± 2% | 1.15GB/s ± 2% | +0.32% (p=0.037 n=40+40) |
BM_AesCtrEncrypt/64k | 1.15GB/s ± 2% | 1.15GB/s ± 2% | ~ (p=0.246 n=40+38) |
BM_AesCtrEncrypt/128k | 1.15GB/s ± 2% | 1.15GB/s ± 2% | +0.32% (p=0.042 n=40+79) |
BM_AesCtrEncryptWithFlush/4k | 1.03GB/s ± 2% | 1.03GB/s ± 2% | ~ (p=0.707 n=39+40) |
BM_AesCtrEncryptWithFlush/8k | 1.08GB/s ± 2% | 1.08GB/s ± 2% | ~ (p=0.381 n=40+40) |
BM_AesCtrEncryptWithFlush/12k | 1.10GB/s ± 2% | 1.10GB/s ± 1% | ~ (p=0.980 n=40+37) |
BM_AesCtrEncryptWithFlush/16k | 1.12GB/s ± 2% | 1.12GB/s ± 2% | ~ (p=0.568 n=39+40) |
BM_AesCtrEncryptWithFlush/24k | 1.12GB/s ± 2% | 1.12GB/s ± 2% | ~ (p=0.620 n=39+40) |
BM_AesCtrEncryptWithFlush/64k | 1.13GB/s ± 2% | 1.14GB/s ± 2% | ~ (p=0.289 n=40+39) |
BM_AesCtrEncryptWithFlush/128k | 1.14GB/s ± 2% | 1.14GB/s ± 2% | +0.38% (p=0.011 n=40+78) |
BM_AesGcmEncrypt/999 | 1.60GB/s ± 2% | 1.59GB/s ± 2% | -0.67% (p=0.000 n=40+39) |
BM_AesGcmEncrypt/4k | 2.16GB/s ± 2% | 2.14GB/s ± 1% | -0.72% (p=0.000 n=40+40) |
BM_AesGcmEncrypt/8k | 2.29GB/s ± 2% | 2.28GB/s ± 1% | -0.49% (p=0.003 n=40+40) |
BM_AesGcmEncrypt/12k | 2.29GB/s ± 2% | 2.27GB/s ± 2% | -0.67% (p=0.002 n=40+40) |
BM_AesGcmEncrypt/16k | 2.37GB/s ± 2% | 2.35GB/s ± 2% | -0.70% (p=0.000 n=39+40) |
BM_AesGcmEncrypt/24k | 2.32GB/s ± 2% | 2.31GB/s ± 2% | -0.49% (p=0.018 n=40+40) |
BM_AesGcmEncrypt/64k | 2.33GB/s ± 2% | 2.31GB/s ± 2% | -0.54% (p=0.005 n=40+40) |
BM_AesGcmEncrypt/128k | 2.31GB/s ± 2% | 2.30GB/s ± 2% | -0.49% (p=0.000 n=40+80) |
BM_AesCtrDecrypt/999 | 93.2MB/s ± 2% | 93.4MB/s ± 1% | ~ (p=0.788 n=40+40) |
BM_AesCtrDecrypt/4k | 363MB/s ± 2% | 364MB/s ± 1% | ~ (p=0.239 n=40+39) |
BM_AesCtrDecrypt/8k | 680MB/s ± 2% | 680MB/s ± 1% | ~ (p=0.852 n=40+40) |
BM_AesCtrDecrypt/12k | 959MB/s ± 2% | 963MB/s ± 1% | +0.49% (p=0.013 n=40+37) |
BM_AesCtrDecrypt/16k | 1.21GB/s ± 2% | 1.21GB/s ± 2% | +0.41% (p=0.038 n=40+38) |
BM_AesCtrDecrypt/24k | 960MB/s ± 2% | 964MB/s ± 2% | +0.44% (p=0.006 n=40+39) |
BM_AesCtrDecrypt/64k | 1.21GB/s ± 2% | 1.21GB/s ± 2% | ~ (p=0.114 n=40+39) |
BM_AesCtrDecrypt/128k | 1.21GB/s ± 2% | 1.21GB/s ± 2% | ~ (p=0.110 n=40+77) |
BM_AesCtrDecryptRandomOffset/999 | 92.7MB/s ± 1% | 92.9MB/s ± 1% | ~ (p=0.386 n=40+40) |
BM_AesCtrDecryptRandomOffset/4k | 188MB/s ± 1% | 188MB/s ± 2% | ~ (p=0.055 n=38+39) |
BM_AesCtrDecryptRandomOffset/8k | 363MB/s ± 2% | 363MB/s ± 1% | ~ (p=0.890 n=40+40) |
BM_AesCtrDecryptRandomOffset/12k | 526MB/s ± 2% | 527MB/s ± 1% | ~ (p=0.107 n=40+40) |
BM_AesCtrDecryptRandomOffset/16k | 679MB/s ± 2% | 681MB/s ± 2% | ~ (p=0.162 n=40+40) |
BM_AesCtrDecryptRandomOffset/24k | 681MB/s ± 2% | 682MB/s ± 2% | ~ (p=0.307 n=40+40) |
BM_AesCtrDecryptRandomOffset/64k | 1.01GB/s ± 2% | 1.01GB/s ± 1% | ~ (p=0.574 n=38+39) |
BM_AesCtrDecryptRandomOffset/128k | 1.10GB/s ± 2% | 1.10GB/s ± 2% | ~ (p=0.073 n=40+80) |
BM_AesGcmDecrypt/999 | 177MB/s ± 2% | 175MB/s ± 2% | -0.77% (p=0.000 n=39+40) |
BM_AesGcmDecrypt/4k | 704MB/s ± 2% | 698MB/s ± 2% | -0.76% (p=0.000 n=40+40) |
BM_AesGcmDecrypt/8k | 1.35GB/s ± 2% | 1.34GB/s ± 2% | -0.50% (p=0.001 n=39+39) |
BM_AesGcmDecrypt/12k | 1.95GB/s ± 2% | 1.95GB/s ± 1% | -0.43% (p=0.004 n=40+39) |
BM_AesGcmDecrypt/16k | 2.54GB/s ± 1% | 2.53GB/s ± 2% | -0.69% (p=0.000 n=39+40) |
BM_AesGcmDecrypt/24k | 1.95GB/s ± 1% | 1.94GB/s ± 1% | -0.57% (p=0.001 n=39+40) |
BM_AesGcmDecrypt/64k | 2.52GB/s ± 1% | 2.51GB/s ± 2% | -0.68% (p=0.000 n=39+40) |
BM_AesGcmDecrypt/128k | 2.51GB/s ± 2% | 2.50GB/s ± 2% | -0.67% (p=0.000 n=40+79) |
BM_AesGcmDecryptRandomOffset/999 | 173MB/s ± 2% | 172MB/s ± 1% | -0.64% (p=0.000 n=39+39) |
BM_AesGcmDecryptRandomOffset/4k | 356MB/s ± 2% | 354MB/s ± 2% | -0.66% (p=0.000 n=40+40) |
BM_AesGcmDecryptRandomOffset/8k | 700MB/s ± 2% | 694MB/s ± 2% | -0.82% (p=0.000 n=40+40) |
BM_AesGcmDecryptRandomOffset/12k | 1.03GB/s ± 2% | 1.03GB/s ± 2% | -0.50% (p=0.002 n=40+39) |
BM_AesGcmDecryptRandomOffset/16k | 1.35GB/s ± 2% | 1.34GB/s ± 2% | ~ (p=0.057 n=40+40) |
BM_AesGcmDecryptRandomOffset/24k | 1.35GB/s ± 2% | 1.34GB/s ± 2% | -0.59% (p=0.003 n=39+40) |
BM_AesGcmDecryptRandomOffset/64k | 2.06GB/s ± 2% | 2.05GB/s ± 1% | -0.46% (p=0.008 n=40+40) |
BM_AesGcmDecryptRandomOffset/128k | 2.26GB/s ± 2% | 2.25GB/s ± 2% | -0.60% (p=0.000 n=40+80) |

However, on AMD with hardware prefetchers disabled, the gain is very significant (see the 128M cases, a microbenchmark that doesn't fit in cache, for a 50+% speed-up):

name | old time/op | new time/op | delta |
---|---|---|---|
BM_AesCtrEncrypt/999 | 1.06µs ± 2% | 1.06µs ± 2% | +0.42% (p=0.011 n=38+40) |
BM_AesCtrEncrypt/128k | 114µs ± 2% | 114µs ± 2% | ~ (p=0.333 n=78+80) |
BM_AesCtrEncrypt/4k | 3.70µs ± 2% | 3.71µs ± 2% | ~ (p=0.355 n=40+40) |
BM_AesCtrEncrypt/8k | 7.15µs ± 2% | 7.19µs ± 2% | +0.44% (p=0.015 n=38+39) |
BM_AesCtrEncrypt/12k | 10.7µs ± 2% | 10.8µs ± 2% | ~ (p=0.366 n=39+40) |
BM_AesCtrEncrypt/16k | 14.1µs ± 2% | 14.1µs ± 1% | ~ (p=0.264 n=40+40) |
BM_AesCtrEncrypt/24k | 21.3µs ± 2% | 21.4µs ± 2% | ~ (p=0.075 n=38+39) |
BM_AesCtrEncrypt/64k | 56.8µs ± 2% | 56.8µs ± 1% | ~ (p=0.464 n=40+40) |
BM_AesCtrEncrypt/128M | 200ms ± 3% | 201ms ± 3% | ~ (p=0.677 n=38+37) |
BM_AesCtrEncryptWithFlush/128k | 115µs ± 2% | 115µs ± 2% | ~ (p=0.273 n=76+79) |
BM_AesCtrEncryptWithFlush/4k | 3.95µs ± 1% | 3.95µs ± 1% | ~ (p=0.664 n=39+40) |
BM_AesCtrEncryptWithFlush/8k | 7.53µs ± 2% | 7.56µs ± 1% | +0.30% (p=0.011 n=40+38) |
BM_AesCtrEncryptWithFlush/12k | 11.1µs ± 2% | 11.1µs ± 2% | ~ (p=0.298 n=38+39) |
BM_AesCtrEncryptWithFlush/16k | 14.6µs ± 2% | 14.7µs ± 2% | ~ (p=0.184 n=40+40) |
BM_AesCtrEncryptWithFlush/24k | 21.9µs ± 2% | 21.9µs ± 2% | ~ (p=0.615 n=39+40) |
BM_AesCtrEncryptWithFlush/64k | 57.7µs ± 2% | 57.8µs ± 2% | ~ (p=0.747 n=38+40) |
BM_AesCtrEncryptWithFlush/128M | 201ms ± 3% | 201ms ± 4% | ~ (p=0.969 n=33+40) |
BM_AesGcmEncrypt/999 | 625ns ± 2% | 629ns ± 2% | +0.69% (p=0.000 n=35+37) |
BM_AesGcmEncrypt/128k | 56.7µs ± 2% | 57.1µs ± 2% | +0.85% (p=0.000 n=72+79) |
BM_AesGcmEncrypt/4k | 1.90µs ± 2% | 1.91µs ± 2% | +0.92% (p=0.000 n=36+40) |
BM_AesGcmEncrypt/8k | 3.58µs ± 2% | 3.60µs ± 1% | +0.55% (p=0.000 n=39+37) |
BM_AesGcmEncrypt/12k | 5.36µs ± 2% | 5.42µs ± 2% | +1.15% (p=0.000 n=37+40) |
BM_AesGcmEncrypt/16k | 6.91µs ± 1% | 6.96µs ± 2% | +0.75% (p=0.000 n=37+37) |
BM_AesGcmEncrypt/24k | 10.6µs ± 2% | 10.7µs ± 2% | +0.90% (p=0.000 n=37+39) |
BM_AesGcmEncrypt/64k | 28.1µs ± 3% | 28.3µs ± 1% | +0.51% (p=0.001 n=39+36) |
BM_AesGcmEncrypt/128M | 217ms ± 2% | 199ms ± 1% | -8.42% (p=0.000 n=40+37) |
BM_AesCtrDecrypt/999 | 10.7µs ± 1% | 10.7µs ± 1% | ~ (p=0.683 n=38+38) |
BM_AesCtrDecrypt/128k | 108µs ± 1% | 108µs ± 2% | ~ (p=0.098 n=77+78) |
BM_AesCtrDecrypt/4k | 11.3µs ± 2% | 11.3µs ± 2% | ~ (p=0.950 n=40+40) |
BM_AesCtrDecrypt/8k | 12.0µs ± 2% | 12.0µs ± 2% | ~ (p=0.126 n=39+38) |
BM_AesCtrDecrypt/12k | 12.7µs ± 1% | 12.8µs ± 2% | +0.39% (p=0.010 n=37+40) |
BM_AesCtrDecrypt/16k | 13.5µs ± 2% | 13.5µs ± 2% | ~ (p=0.148 n=40+40) |
BM_AesCtrDecrypt/24k | 25.5µs ± 2% | 25.6µs ± 2% | +0.32% (p=0.047 n=39+39) |
BM_AesCtrDecrypt/64k | 53.9µs ± 1% | 54.1µs ± 2% | ~ (p=0.197 n=38+40) |
BM_AesCtrDecrypt/128M | 190ms ± 3% | 189ms ± 2% | ~ (p=0.656 n=40+40) |
BM_AesCtrDecryptRandomOffset/999 | 10.8µs ± 2% | 10.8µs ± 2% | ~ (p=0.811 n=40+39) |
BM_AesCtrDecryptRandomOffset/128k | 119µs ± 2% | 119µs ± 2% | ~ (p=0.072 n=80+77) |
BM_AesCtrDecryptRandomOffset/4k | 21.8µs ± 2% | 21.8µs ± 2% | ~ (p=0.386 n=39+38) |
BM_AesCtrDecryptRandomOffset/8k | 22.5µs ± 2% | 22.6µs ± 2% | ~ (p=0.298 n=40+38) |
BM_AesCtrDecryptRandomOffset/12k | 23.3µs ± 2% | 23.3µs ± 2% | ~ (p=0.964 n=38+39) |
BM_AesCtrDecryptRandomOffset/16k | 24.0µs ± 2% | 24.1µs ± 2% | +0.33% (p=0.022 n=38+39) |
BM_AesCtrDecryptRandomOffset/24k | 36.0µs ± 1% | 35.9µs ± 1% | ~ (p=0.376 n=38+35) |
BM_AesCtrDecryptRandomOffset/64k | 64.5µs ± 1% | 64.6µs ± 1% | ~ (p=0.237 n=38+39) |
BM_AesCtrDecryptRandomOffset/128M | 190ms ± 2% | 191ms ± 2% | +0.54% (p=0.029 n=40+38) |
BM_AesGcmDecrypt/999 | 5.65µs ± 1% | 5.71µs ± 2% | +0.99% (p=0.000 n=36+40) |
BM_AesGcmDecrypt/128k | 51.8µs ± 2% | 52.5µs ± 2% | +1.17% (p=0.000 n=77+75) |
BM_AesGcmDecrypt/4k | 5.82µs ± 2% | 5.86µs ± 2% | +0.68% (p=0.000 n=39+39) |
BM_AesGcmDecrypt/8k | 6.07µs ± 2% | 6.11µs ± 2% | +0.69% (p=0.000 n=39+39) |
BM_AesGcmDecrypt/12k | 6.26µs ± 1% | 6.33µs ± 1% | +1.04% (p=0.000 n=38+39) |
BM_AesGcmDecrypt/16k | 6.42µs ± 1% | 6.49µs ± 1% | +1.04% (p=0.000 n=38+38) |
BM_AesGcmDecrypt/24k | 12.6µs ± 2% | 12.7µs ± 2% | +1.02% (p=0.000 n=39+39) |
BM_AesGcmDecrypt/64k | 26.0µs ± 2% | 26.2µs ± 1% | +0.88% (p=0.000 n=40+38) |
BM_AesGcmDecrypt/128M | 210ms ± 2% | 94ms ±12% | -55.31% (p=0.000 n=40+32) |
BM_AesGcmDecryptRandomOffset/999 | 5.77µs ± 2% | 5.83µs ± 2% | +1.11% (p=0.000 n=39+40) |
BM_AesGcmDecryptRandomOffset/128k | 57.7µs ± 2% | 58.4µs ± 2% | +1.19% (p=0.000 n=80+76) |
BM_AesGcmDecryptRandomOffset/4k | 11.5µs ± 2% | 11.6µs ± 2% | +0.67% (p=0.000 n=40+36) |
BM_AesGcmDecryptRandomOffset/8k | 11.6µs ± 2% | 11.8µs ± 1% | +1.04% (p=0.000 n=39+37) |
BM_AesGcmDecryptRandomOffset/12k | 11.9µs ± 1% | 12.0µs ± 2% | +0.95% (p=0.000 n=39+39) |
BM_AesGcmDecryptRandomOffset/16k | 12.1µs ± 2% | 12.2µs ± 2% | +0.84% (p=0.000 n=40+40) |
BM_AesGcmDecryptRandomOffset/24k | 18.1µs ± 2% | 18.3µs ± 1% | +0.97% (p=0.000 n=40+38) |
BM_AesGcmDecryptRandomOffset/64k | 31.6µs ± 1% | 32.0µs ± 2% | +1.32% (p=0.000 n=39+39) |
BM_AesGcmDecryptRandomOffset/128M | 209ms ± 2% | 93ms ± 2% | -55.34% (p=0.000 n=40+31) |

Change-Id: I6312e01ff0da70cc52f09194846b82cc6b69d37a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/55466
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
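For readers unfamiliar with software prefetching, the general idea behind the change can be sketched in C. This is not the code from the commit: the routine it names is presumably hand-written assembly, and the commit message does not state the prefetch distance or instruction used. The names `process_with_prefetch`, `process_plain`, and `process_chunk`, the 96-byte chunk size (a guess based on the `6x` suffix, six 16-byte AES blocks), and the 512-byte distance are all assumptions for illustration. `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the commit. */
#define CHUNK_BYTES ((size_t)96)           /* hypothetical work unit per iteration */
#define PREFETCH_AHEAD_BYTES ((size_t)512) /* hypothetical prefetch distance */

/* Stand-in for the per-chunk cipher work: folds bytes into a running value. */
static uint32_t process_chunk(const uint8_t *p, size_t n, uint32_t acc) {
    for (size_t i = 0; i < n; i++) {
        acc = acc * 31u + p[i];
    }
    return acc;
}

/* Streams over `in`, hinting the cache about data needed a few iterations
 * ahead so the load latency overlaps with the current chunk's work. */
uint32_t process_with_prefetch(const uint8_t *in, size_t len) {
    assert(in != NULL || len == 0);
    uint32_t acc = 0;
    for (size_t off = 0; off < len; off += CHUNK_BYTES) {
        /* Only prefetch addresses that lie inside the buffer. */
        if (len - off > PREFETCH_AHEAD_BYTES) {
            /* Arguments: address, 0 = read access, 3 = high temporal locality. */
            __builtin_prefetch(in + off + PREFETCH_AHEAD_BYTES, 0, 3);
        }
        size_t n = (len - off < CHUNK_BYTES) ? (len - off) : CHUNK_BYTES;
        acc = process_chunk(in + off, n, acc);
    }
    return acc;
}

/* Same computation without the hint. Prefetching affects timing only,
 * so both functions must return the same value. */
uint32_t process_plain(const uint8_t *in, size_t len) {
    assert(in != NULL || len == 0);
    return process_chunk(in, len, 0);
}
```

A hint like this tends to matter only when the data is not already in cache and the hardware prefetcher is not doing the job, which is consistent with the results above: the in-cache sizes are within noise, while the 128M cases with hardware prefetchers disabled change substantially.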
BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.
Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.
Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.
BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up, and the effort involved in maintaining all these patches in multiple places was growing steadily.
Currently, BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK), and a number of other apps/programs.