{
  "commit": "32ce6032ff769aef2ebb93189d14398adad357b1",
  "tree": "f7d51428295836beb5c44900051a9da46b23d6ab",
  "parents": [
    "5501a26915ad33d8bcd37b7e99b7de460016d586"
  ],
  "author": {
    "name": "David Benjamin",
    "email": "davidben@google.com",
    "time": "Tue Mar 19 21:59:49 2019 -0500"
  },
  "committer": {
    "name": "CQ bot account: commit-bot@chromium.org",
    "email": "commit-bot@chromium.org",
    "time": "Sat Mar 23 06:59:22 2019 +0000"
  },
  "message": "Add an optimized x86_64 vpaes ctr128_f and remove bsaes.\n\nBrian Smith suggested applying vpaes-armv8\u0027s \"2x\" optimization to\nvpaes-x86_64. The registers are a little tight (aarch64 has a whole 32\nSIMD registers, while x86_64 only has 16), but it\u0027s doable with some\nspills and makes vpaes much more competitive with bsaes. At small- and\nmedium-sized inputs, vpaes now matches bsaes. At large inputs, it\u0027s a\n~10% perf hit.\n\nbsaes is thus pulling much less weight. Losing an entire AES\nimplementation and having constant-time AES for SSSE3 is attractive.\nSome notes:\n\n- The fact that these are older CPUs tempers the perf hit, but CPUs\n  without AES-NI are still common enough to matter.\n\n- This CL does regress CBC decrypt performance nontrivially (see below).\n  If this matters, we can double-up CBC decryption too. CBC in TLS is\n  legacy and already pays a costly Lucky13 mitigation.\n\n- The difference between 1350 and 8192 bytes is likely bsaes AES-GCM\n  paying for two slow (and variable-time!) aes_nohw_encrypt\n  calls for EK0 and the trailing partial block. At larger inputs, those\n  two calls are more amortized.\n\n- To that end, bsaes would likely be much faster on AES-GCM with smarter\n  use of bsaes. (Fold one-off calls above into bulk data.) Implementing\n  this is a bit of a nuisance though, especially considering we don\u0027t\n  wish to regress hwaes.\n\n- I\u0027d discarded the key conversion idea, but I think I did it wrong.\n  Benchmarks from\n  https://boringssl-review.googlesource.com/c/boringssl/+/33589 suggest\n  converting to bsaes format on-demand for large ctr32 inputs should\n  give the best of both worlds, but at the cost of an entire AES\n  implementation relative to this CL.\n\n- ARMv7 still depends on bsaes and has no vpaes. It also has 16 SIMD\n  registers, so my plan is to translate it, with the same 2x\n  optimization, and see how it compares. 
Hopefully that, or some\n  combination of the above, will work for ARMv7.\n\nSandy Bridge\nbsaes (before):\nDid 3144750 AES-128-GCM (16 bytes) seal operations in 5016000us (626943.8 ops/sec): 10.0 MB/s\nDid 2053750 AES-128-GCM (256 bytes) seal operations in 5016000us (409439.8 ops/sec): 104.8 MB/s\nDid 469000 AES-128-GCM (1350 bytes) seal operations in 5015000us (93519.4 ops/sec): 126.3 MB/s\nDid 92500 AES-128-GCM (8192 bytes) seal operations in 5016000us (18441.0 ops/sec): 151.1 MB/s\nDid 46750 AES-128-GCM (16384 bytes) seal operations in 5032000us (9290.5 ops/sec): 152.2 MB/s\nvpaes-1x (for reference, not this CL):\nDid 8684750 AES-128-GCM (16 bytes) seal operations in 5015000us (1731754.7 ops/sec): 27.7 MB/s [+177%]\nDid 1731500 AES-128-GCM (256 bytes) seal operations in 5016000us (345195.4 ops/sec): 88.4 MB/s [-15.6%]\nDid 346500 AES-128-GCM (1350 bytes) seal operations in 5016000us (69078.9 ops/sec): 93.3 MB/s [-26.1%]\nDid 61250 AES-128-GCM (8192 bytes) seal operations in 5015000us (12213.4 ops/sec): 100.1 MB/s [-33.8%]\nDid 32500 AES-128-GCM (16384 bytes) seal operations in 5031000us (6459.9 ops/sec): 105.8 MB/s [-30.5%]\nvpaes-2x (this CL):\nDid 8840000 AES-128-GCM (16 bytes) seal operations in 5015000us (1762711.9 ops/sec): 28.2 MB/s [+182%]\nDid 2167750 AES-128-GCM (256 bytes) seal operations in 5016000us (432167.1 ops/sec): 110.6 MB/s [+5.5%]\nDid 474000 AES-128-GCM (1350 bytes) seal operations in 5016000us (94497.6 ops/sec): 127.6 MB/s [+1.0%]\nDid 81750 AES-128-GCM (8192 bytes) seal operations in 5015000us (16301.1 ops/sec): 133.5 MB/s [-11.6%]\nDid 41750 AES-128-GCM (16384 bytes) seal operations in 5031000us (8298.5 ops/sec): 136.0 MB/s [-10.6%]\n\nPenryn\nbsaes (before):\nDid 958000 AES-128-GCM (16 bytes) seal operations in 1000264us (957747.2 ops/sec): 15.3 MB/s\nDid 420000 AES-128-GCM (256 bytes) seal operations in 1000480us (419798.5 ops/sec): 107.5 MB/s\nDid 96000 AES-128-GCM (1350 bytes) seal operations in 1001083us (95896.1 ops/sec): 129.5 
MB/s\nDid 18000 AES-128-GCM (8192 bytes) seal operations in 1042491us (17266.3 ops/sec): 141.4 MB/s\nDid 9482 AES-128-GCM (16384 bytes) seal operations in 1095703us (8653.8 ops/sec): 141.8 MB/s\nDid 758000 AES-256-GCM (16 bytes) seal operations in 1000769us (757417.5 ops/sec): 12.1 MB/s\nDid 359000 AES-256-GCM (256 bytes) seal operations in 1001993us (358285.9 ops/sec): 91.7 MB/s\nDid 82000 AES-256-GCM (1350 bytes) seal operations in 1009583us (81221.7 ops/sec): 109.6 MB/s\nDid 15000 AES-256-GCM (8192 bytes) seal operations in 1022294us (14672.9 ops/sec): 120.2 MB/s\nDid 7884 AES-256-GCM (16384 bytes) seal operations in 1070934us (7361.8 ops/sec): 120.6 MB/s\nvpaes-1x (for reference, not this CL):\nDid 2030000 AES-128-GCM (16 bytes) seal operations in 1000227us (2029539.3 ops/sec): 32.5 MB/s [+112%]\nDid 382000 AES-128-GCM (256 bytes) seal operations in 1001949us (381256.9 ops/sec): 97.6 MB/s [-9.2%]\nDid 81000 AES-128-GCM (1350 bytes) seal operations in 1007297us (80413.2 ops/sec): 108.6 MB/s [-16.1%]\nDid 14000 AES-128-GCM (8192 bytes) seal operations in 1031499us (13572.5 ops/sec): 111.2 MB/s [-21.4%]\nDid 7008 AES-128-GCM (16384 bytes) seal operations in 1030706us (6799.2 ops/sec): 111.4 MB/s [-21.4%]\nDid 1838000 AES-256-GCM (16 bytes) seal operations in 1000238us (1837562.7 ops/sec): 29.4 MB/s [+143%]\nDid 321000 AES-256-GCM (256 bytes) seal operations in 1001666us (320466.1 ops/sec): 82.0 MB/s [-10.6%]\nDid 67000 AES-256-GCM (1350 bytes) seal operations in 1010359us (66313.1 ops/sec): 89.5 MB/s [-18.3%]\nDid 12000 AES-256-GCM (8192 bytes) seal operations in 1072706us (11186.7 ops/sec): 91.6 MB/s [-23.8%]\nDid 5680 AES-256-GCM (16384 bytes) seal operations in 1009214us (5628.1 ops/sec): 92.2 MB/s [-23.5%]\nvpaes-2x (this CL):\nDid 2072000 AES-128-GCM (16 bytes) seal operations in 1000066us (2071863.3 ops/sec): 33.1 MB/s [+116%]\nDid 432000 AES-128-GCM (256 bytes) seal operations in 1000732us (431684.0 ops/sec): 110.5 MB/s [+2.8%]\nDid 92000 AES-128-GCM (1350 
bytes) seal operations in 1000580us (91946.7 ops/sec): 124.1 MB/s [-4.2%]\nDid 16000 AES-128-GCM (8192 bytes) seal operations in 1016422us (15741.5 ops/sec): 129.0 MB/s [-8.8%]\nDid 8448 AES-128-GCM (16384 bytes) seal operations in 1073962us (7866.2 ops/sec): 128.9 MB/s [-9.1%]\nDid 1865000 AES-256-GCM (16 bytes) seal operations in 1000043us (1864919.8 ops/sec): 29.8 MB/s [+146%]\nDid 364000 AES-256-GCM (256 bytes) seal operations in 1001561us (363432.7 ops/sec): 93.0 MB/s [+1.4%]\nDid 77000 AES-256-GCM (1350 bytes) seal operations in 1004123us (76683.8 ops/sec): 103.5 MB/s [-5.6%]\nDid 14000 AES-256-GCM (8192 bytes) seal operations in 1071179us (13069.7 ops/sec): 107.1 MB/s [-10.9%]\nDid 7008 AES-256-GCM (16384 bytes) seal operations in 1074125us (6524.4 ops/sec): 106.9 MB/s [-11.4%]\n\nPenryn, CBC mode decryption\nbsaes (before):\nDid 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1001019us (158838.1 ops/sec): 2.5 MB/s\nDid 114000 AES-128-CBC-SHA1 (256 bytes) open operations in 1006485us (113265.5 ops/sec): 29.0 MB/s\nDid 65000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1008441us (64455.9 ops/sec): 87.0 MB/s\nDid 17000 AES-128-CBC-SHA1 (8192 bytes) open operations in 1005440us (16908.0 ops/sec): 138.5 MB/s\nvpaes (after):\nDid 167000 AES-128-CBC-SHA1 (16 bytes) open operations in 1003556us (166408.3 ops/sec): 2.7 MB/s [+8%]\nDid 112000 AES-128-CBC-SHA1 (256 bytes) open operations in 1005673us (111368.2 ops/sec): 28.5 MB/s [-1.7%]\nDid 56000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1005647us (55685.5 ops/sec): 75.2 MB/s [-13.6%]\nDid 13635 AES-128-CBC-SHA1 (8192 bytes) open operations in 1020486us (13361.3 ops/sec): 109.5 MB/s [-20.9%]\n\nBug: 256\nChange-Id: I11ed773323ec7a5ee61080c9ed9ed4761849828a\nReviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35364\nCommit-Queue: David Benjamin \u003cdavidben@google.com\u003e\nReviewed-by: Adam Langley \u003cagl@google.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "fbf25ac88ab0bc96eb687707c3efb8695481c929",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/CMakeLists.txt",
      "new_id": "d1e2cb9d63d01481937631be198a1b0e8d6d2858",
      "new_mode": 33188,
      "new_path": "crypto/fipsmodule/CMakeLists.txt"
    },
    {
      "type": "delete",
      "old_id": "3bb28190da4b896df259dfbe9f3d128490f8b9fb",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/aes/asm/bsaes-x86_64.pl",
      "new_id": "0000000000000000000000000000000000000000",
      "new_mode": 0,
      "new_path": "/dev/null"
    },
    {
      "type": "modify",
      "old_id": "47d9972f4aa71ea68d5a091d4211454a95af966a",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/aes/asm/vpaes-x86_64.pl",
      "new_id": "9429344bc11f020c0e982d8ed3378ae0ea21df83",
      "new_mode": 33188,
      "new_path": "crypto/fipsmodule/aes/asm/vpaes-x86_64.pl"
    },
    {
      "type": "modify",
      "old_id": "63070bc6bc3764668523b5b607ffd3ce64df6dbc",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/aes/internal.h",
      "new_id": "0cebb04cae34da881f726065f811957b9b42af0f",
      "new_mode": 33188,
      "new_path": "crypto/fipsmodule/aes/internal.h"
    }
  ]
}
