Add vpaes-armv7.pl and replace non-parallel modes.

This is translated from vpaes-armv8.pl. See top of the new file for
details. Unfortunately, vpaes's performance is disappointing here. The
vpaes paper notes NEON's vector permutation instructions are not very
fast. But this is now constant-time.

Parallel modes, notably CTR derivatives, are performance-sensitive and
worth further work. (They currently use bsaes.) Thus this CL only
replaces non-parallel uses, which currently use a variable-time
table-based implementation.

Note QUIC packet number encryption will do a single one-off AES block
operation per packet and use this file. But the single-block speeds
below should be fine for a per-packet operation.

Alternatives considered: I toyed with BearSSL's 32-bit C bitsliced
implementation, but it appears to be slower than this implementation.

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 124000 AES-128-CBC-SHA1 (16 bytes) seal operations in 1005644us (123304.1 ops/sec): 2.0 MB/s
Did 45000 AES-128-CBC-SHA1 (256 bytes) seal operations in 1009513us (44575.9 ops/sec): 11.4 MB/s
Did 12000 AES-128-CBC-SHA1 (1350 bytes) seal operations in 1009735us (11884.3 ops/sec): 16.0 MB/s
Did 2266 AES-128-CBC-SHA1 (8192 bytes) seal operations in 1060631us (2136.5 ops/sec): 17.5 MB/s
Did 1078 AES-128-CBC-SHA1 (16384 bytes) seal operations in 1002268us (1075.6 ops/sec): 17.6 MB/s
Did 114000 AES-256-CBC-SHA1 (16 bytes) seal operations in 1004576us (113480.7 ops/sec): 1.8 MB/s
Did 38000 AES-256-CBC-SHA1 (256 bytes) seal operations in 1001777us (37932.6 ops/sec): 9.7 MB/s
Did 9999 AES-256-CBC-SHA1 (1350 bytes) seal operations in 1028518us (9721.8 ops/sec): 13.1 MB/s
Did 1892 AES-256-CBC-SHA1 (8192 bytes) seal operations in 1095702us (1726.7 ops/sec): 14.1 MB/s
Did 902 AES-256-CBC-SHA1 (16384 bytes) seal operations in 1038989us (868.2 ops/sec): 14.2 MB/s
Did 2094000 AES-128 encrypt setup operations in 1000296us (2093380.4 ops/sec)
Did 1505000 AES-128 encrypt operations in 1000596us (1504103.6 ops/sec)
Did 465000 AES-128 decrypt setup operations in 1000354us (464835.4 ops/sec)
Did 1468000 AES-128 decrypt operations in 1000178us (1467738.7 ops/sec)
Did 1751000 AES-256 encrypt setup operations in 1000189us (1750669.1 ops/sec)
Did 1113000 AES-256 encrypt operations in 1000004us (1112995.5 ops/sec)
Did 339000 AES-256 decrypt setup operations in 1002970us (337996.2 ops/sec)
Did 1103000 AES-256 decrypt operations in 1000882us (1102028.0 ops/sec)

After:
Did 119000 AES-128-CBC-SHA1 (16 bytes) seal operations in 1000259us (118969.2 ops/sec): 1.9 MB/s [-5.0%]
Did 39000 AES-128-CBC-SHA1 (256 bytes) seal operations in 1001341us (38947.8 ops/sec): 10.0 MB/s [-12.3%]
Did 10571 AES-128-CBC-SHA1 (1350 bytes) seal operations in 1067614us (9901.5 ops/sec): 13.4 MB/s [-16.3%]
Did 1903 AES-128-CBC-SHA1 (8192 bytes) seal operations in 1090907us (1744.4 ops/sec): 14.3 MB/s [-18.3%]
Did 957 AES-128-CBC-SHA1 (16384 bytes) seal operations in 1093380us (875.3 ops/sec): 14.3 MB/s [-18.8%]
Did 108000 AES-256-CBC-SHA1 (16 bytes) seal operations in 1005090us (107453.1 ops/sec): 1.7 MB/s [-5.6%]
Did 33000 AES-256-CBC-SHA1 (256 bytes) seal operations in 1026530us (32147.1 ops/sec): 8.2 MB/s [-15.5%]
Did 8393 AES-256-CBC-SHA1 (1350 bytes) seal operations in 1064768us (7882.5 ops/sec): 10.6 MB/s [-19.1%]
Did 1496 AES-256-CBC-SHA1 (8192 bytes) seal operations in 1090316us (1372.1 ops/sec): 11.2 MB/s [-20.6%]
Did 737 AES-256-CBC-SHA1 (16384 bytes) seal operations in 1070396us (688.5 ops/sec): 11.3 MB/s [-20.4%]
Did 695000 AES-128 encrypt setup operations in 1000325us (694774.2 ops/sec) [-66.8%]
Did 1043000 AES-128 encrypt operations in 1000568us (1042407.9 ops/sec) [-30.7%]
Did 495000 AES-128 decrypt setup operations in 1000680us (494663.6 ops/sec) [-6.4%]
Did 743000 AES-128 decrypt operations in 1000892us (742337.8 ops/sec) [-49.4%]
Did 550000 AES-256 encrypt setup operations in 1000228us (549874.6 ops/sec) [-68.6%]
Did 786000 AES-256 encrypt operations in 1000978us (785232.0 ops/sec) [-29.4%]
Did 377000 AES-256 decrypt setup operations in 1002252us (376152.9 ops/sec) [-11.3%]
Did 547000 AES-256 decrypt operations in 1000168us (546908.1 ops/sec) [-50.3%]

Bug: 266
Change-Id: Ia5f9c90bcf5e713e40cacc954c604a6ffb432d6c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/37426
Reviewed-by: Adam Langley <agl@google.com>
5 files changed
tree: 908401339bdb2683b77680a43562f02a630d70a0
  1. .github/
  2. crypto/
  3. decrepit/
  4. fuzz/
  5. include/
  6. ssl/
  7. third_party/
  8. tool/
  9. util/
  10. .clang-format
  11. .gitignore
  12. API-CONVENTIONS.md
  13. BREAKING-CHANGES.md
  14. BUILDING.md
  15. CMakeLists.txt
  16. codereview.settings
  17. CONTRIBUTING.md
  18. FUZZING.md
  19. go.mod
  20. INCORPORATING.md
  21. LICENSE
  22. PORTING.md
  23. README.md
  24. sources.cmake
  25. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: