author: David Benjamin <firstname.lastname@example.org> (Sun Aug 25 21:17:55 2019 -0400)
committer: Adam Langley <email@example.com> (Fri Sep 13 17:47:18 2019 +0000)
Use a mix of bsaes and vpaes for CTR on NEON.

tl;dr: AES is now constant-time on 32-bit ARM with NEON. Combined with all the past work, we now have constant-time AES and GHASH on ARM and x86 chips, 32-bit and 64-bit, provided NEON (required by Chrome on Android, aside from https://crbug.com/341598) or SSSE3 (almost all Chrome on Windows users) is available!

CTR-like bsaes modes are harder to resolve than CBC decryption. They use both bulk (ctr128_f) and one-off (block128_f) operations. We currently use ctr128_f of bsaes and block128_f of aes_nohw (not constant-time), which reaches 22.0 MB/s on my test chip.

Implement a vpaes/bsaes hybrid to get the best of both worlds. The key is kept in vpaes form and, when the input is large enough, we convert the key to bsaes on demand. This retains bsaes performance, but with no variable-time gaps.

Alternatives considered:

- Convert to bsaes form immediately and only use bsaes. This makes the one-off block128_f calls very expensive. One 8-block batch of bsaes_ctr32_encrypt_blocks costs as much as 5.76 vpaes_encrypt calls.

- Do the above, but fold the one-off calls into bsaes batches because GCM is parallelizable. This is a mess with the current internal structure and doesn't apply to, e.g., CCM.

- Drop bsaes in favor of vpaes. However, even with vpaes_ctr32_encrypt_blocks, vpaes is 15.5 MB/s. The hybrid is a 40% win on an important platform.

- Try to narrow the gap, as we did for x86_64, with a "2x" optimization. I attempted this here, but the register pressure was tricky. (x86_64 was already tight, and NEON can't address memory in vtbl.) Even ignoring the register pressure (which gives wrong answers), the gap was still 20-25%. Perf here is slower overall (20 MB/s for old ARM vs 120-140 MB/s for old x86_64), so that gap is scarier.

I retained vpaes_ctr32_encrypt_blocks because it's fairly compact (only 84 bytes assembled), though it's less important in the bsaes hybrid.
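A minimal sketch of the hybrid dispatch, under stated assumptions. Every name here (`toy_key`, `hybrid_ctr32`, and the `*_stub` functions) is hypothetical: the real vpaes/bsaes routines are NEON assembly, so the stubs do no encryption and only record what they were asked to do. The tail cutoff of 6 follows from the 5.76 figure above if one assumes a partial bsaes batch costs as much as a full one: bsaes only beats looping over vpaes when the tail has more than 5.76 blocks. The "fewer than 8 blocks stays in vpaes" threshold is likewise an assumption standing in for "when the input is large enough".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the vpaes/bsaes routines. They do no
 * encryption; they only record calls so the dispatch logic can be checked.
 * Input/output buffers are omitted for brevity. */
typedef struct { int form; } toy_key;

static size_t g_vpaes_blocks, g_bsaes_blocks, g_conversions;
static uint32_t g_last_vpaes_ctr;

static uint32_t load_be32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)(v >> 24);
  p[1] = (uint8_t)(v >> 16);
  p[2] = (uint8_t)(v >> 8);
  p[3] = (uint8_t)v;
}

static void vpaes_ctr32_stub(size_t blocks, const toy_key *key,
                             const uint8_t ivec[16]) {
  (void)key;
  g_vpaes_blocks += blocks;
  g_last_vpaes_ctr = load_be32(ivec + 12); /* counter this call starts at */
}

static void bsaes_ctr32_stub(size_t blocks, const toy_key *key,
                             const uint8_t ivec[16]) {
  (void)key;
  (void)ivec;
  g_bsaes_blocks += blocks;
}

static void vpaes_key_to_bsaes_stub(toy_key *out, const toy_key *in) {
  out->form = in->form + 1; /* pretend to re-layout the key schedule */
  g_conversions++;
}

enum {
  BSAES_BATCH = 8,    /* bsaes processes 8 blocks per batch */
  MIN_BSAES_TAIL = 6, /* assumes a partial batch costs ~5.76 vpaes calls */
};

/* Process |blocks| CTR blocks starting at |ivec|. The key stays in vpaes
 * form; a temporary bsaes key is built only for bulk input. */
static void hybrid_ctr32(size_t blocks, const toy_key *vpaes_key,
                         const uint8_t ivec[16]) {
  if (blocks < BSAES_BATCH) {
    /* Assumed threshold: too small to amortize the key conversion. */
    vpaes_ctr32_stub(blocks, vpaes_key, ivec);
    return;
  }

  size_t bulk = blocks;
  if (bulk % BSAES_BATCH < MIN_BSAES_TAIL) {
    bulk -= bulk % BSAES_BATCH; /* leave a short tail for vpaes */
  }
  assert(bulk <= blocks);

  toy_key bsaes_key;
  vpaes_key_to_bsaes_stub(&bsaes_key, vpaes_key); /* on-demand conversion */
  bsaes_ctr32_stub(bulk, &bsaes_key, ivec);
  /* Real code should wipe with a call the compiler cannot elide, such as
   * BoringSSL's OPENSSL_cleanse; plain memset may be optimized away. */
  memset(&bsaes_key, 0, sizeof(bsaes_key));

  /* CTR32 increments only the last 32 bits (big-endian), mod 2^32. */
  uint8_t next[16];
  memcpy(next, ivec, 16);
  store_be32(next + 12, load_be32(next + 12) + (uint32_t)bulk);
  vpaes_ctr32_stub(blocks - bulk, vpaes_key, next);
}
```

With 21 blocks, 16 go to bsaes and the 5-block tail goes to vpaes with the counter advanced by 16; with 22 blocks, the 6-block tail is large enough that bsaes handles everything; with 5 blocks, the key is never converted.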
Cortex-A53 (Raspberry Pi 3 Model B+)

Before:
Did 267000 AES-128-GCM (16 bytes) seal operations in 2004871us (133175.7 ops/sec): 2.1 MB/s
Did 135000 AES-128-GCM (256 bytes) seal operations in 2013825us (67036.6 ops/sec): 17.2 MB/s
Did 31000 AES-128-GCM (1350 bytes) seal operations in 2059039us (15055.6 ops/sec): 20.3 MB/s
Did 5565 AES-128-GCM (8192 bytes) seal operations in 2073607us (2683.7 ops/sec): 22.0 MB/s
Did 2709 AES-128-GCM (16384 bytes) seal operations in 2020264us (1340.9 ops/sec): 22.0 MB/s
Did 209000 AES-256-GCM (16 bytes) seal operations in 2005654us (104205.4 ops/sec): 1.7 MB/s
Did 109000 AES-256-GCM (256 bytes) seal operations in 2011293us (54194.0 ops/sec): 13.9 MB/s
Did 25000 AES-256-GCM (1350 bytes) seal operations in 2082385us (12005.5 ops/sec): 16.2 MB/s
Did 4452 AES-256-GCM (8192 bytes) seal operations in 2080729us (2139.6 ops/sec): 17.5 MB/s
Did 2226 AES-256-GCM (16384 bytes) seal operations in 2079819us (1070.3 ops/sec): 17.5 MB/s

After:
Did 542000 AES-128-GCM (16 bytes) seal operations in 2003408us (270539.0 ops/sec): 4.3 MB/s [+104.8%]
Did 124000 AES-128-GCM (256 bytes) seal operations in 2012579us (61612.5 ops/sec): 15.8 MB/s [-8.1%]
Did 30000 AES-128-GCM (1350 bytes) seal operations in 2020636us (14846.8 ops/sec): 20.0 MB/s [-1.5%]
Did 5502 AES-128-GCM (8192 bytes) seal operations in 2068807us (2659.5 ops/sec): 21.8 MB/s [-0.9%]
Did 2772 AES-128-GCM (16384 bytes) seal operations in 2085176us (1329.4 ops/sec): 21.8 MB/s [-0.9%]
Did 459000 AES-256-GCM (16 bytes) seal operations in 2003587us (229089.1 ops/sec): 3.7 MB/s [+117.6%]
Did 100000 AES-256-GCM (256 bytes) seal operations in 2018311us (49546.4 ops/sec): 12.7 MB/s [-8.6%]
Did 24000 AES-256-GCM (1350 bytes) seal operations in 2026975us (11840.3 ops/sec): 16.0 MB/s [-1.2%]
Did 4410 AES-256-GCM (8192 bytes) seal operations in 2079581us (2120.6 ops/sec): 17.4 MB/s [-0.6%]
Did 2226 AES-256-GCM (16384 bytes) seal operations in 2099318us (1060.3 ops/sec): 17.4 MB/s [-0.6%]

Bug: 256
Change-Id: Ib74ab7e63974d3ddae8ce5fc35c9b44e73dce305
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37429
Reviewed-by: Adam Langley <firstname.lastname@example.org>
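The benchmark lines can be cross-checked with a little arithmetic. A minimal sketch, assuming MB/s means 10^6 bytes per second (so throughput is simply bytes per microsecond), which matches the printed figures; the helper names are hypothetical, not part of BoringSSL. The bracketed percentages appear to be consistent with the rounded, printed MB/s values rather than the raw timings.

```c
#include <assert.h>

/* Throughput in MB/s (10^6 bytes per second), i.e. bytes per microsecond. */
static double mb_per_sec(double ops, double bytes_per_op, double micros) {
  assert(micros > 0);
  return ops * bytes_per_op / micros;
}

/* Round a positive value to one decimal place, as the output prints it. */
static double round1(double x) {
  return (double)(long)(x * 10.0 + 0.5) / 10.0;
}

/* Percent change computed from the rounded (printed) MB/s figures. */
static double pct_change(double before_mbs, double after_mbs) {
  return (round1(after_mbs) / round1(before_mbs) - 1.0) * 100.0;
}
```

For example, 2709 operations of 16384 bytes in 2020264us gives about 21.97 MB/s, printed as 22.0. For the 16-byte AES-128-GCM case, the rounded values 2.1 and 4.3 give +104.8%, whereas the unrounded throughputs would give roughly +103%.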
BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.
Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.
Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.
BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprang up and the effort involved in maintaining all these patches in multiple places was growing steadily.
Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.
There are other files in this directory which might be helpful: