{
  "commit": "1599fea8705622e80d8bcab6a36a0fdff4c97992",
  "tree": "7cb563c1d8262b537a94f6d7db09cd58fb9d40b7",
  "parents": [
    "a05691d5d88ea944c35c148755ed231c7a899a15"
  ],
  "author": {
    "name": "David Benjamin",
    "email": "davidben@google.com",
    "time": "Sun Jan 08 16:50:48 2023 -0800"
  },
  "committer": {
    "name": "Boringssl LUCI CQ",
    "email": "boringssl-scoped@luci-project-accounts.iam.gserviceaccount.com",
    "time": "Tue May 16 21:21:48 2023 +0000"
  },
  "message": "Remove read locks from PRNG steady state\n\nWe don\u0027t take write locks in the PRNG, steady state, but we do take some\nread locks: computing fork generation, reading the fork-unsafe buffering\nflag, and a FIPS-only artifact of some global state clearing mess. That\nlast one is completely useless, but it\u0027s a consequence of FIPS\u0027s\nunderstanding of process exit being comically inconsistent with reality.\n\nTaking read locks is, in principle, parallel, but the cacheline write\ncauses some contention, even in newer glibcs with faster read locks. Fix\nthese:\n\n- Use atomic reads to check the fork generation. We only need to lock\n  when we observe a fork.\n\n- Replace the fork-unsafe buffering flag with an atomic altogether.\n\n- Split state_clear_all_lock into a per-rand_thread_state lock. We still\n  need a read lock, but a completely uncontended one until process exit.\n\nWith many threads, this gives a significant perf boost.\n\nx86_64, non-FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:\nBefore:\nDid 45131875 RNG (16 bytes) operations in 300039649us (150419.7 ops/sec): 2.4 MB/s\nDid 44089000 RNG (32 bytes) operations in 300053237us (146937.3 ops/sec): 4.7 MB/s\nDid 43328000 RNG (256 bytes) operations in 300058423us (144398.5 ops/sec): 37.0 MB/s\nDid 45857000 RNG (1350 bytes) operations in 300095943us (152807.8 ops/sec): 206.3 MB/s\nDid 43249000 RNG (8192 bytes) operations in 300102698us (144114.0 ops/sec): 1180.6 MB/s\nAfter:\nDid 296204000 RNG (16 bytes) operations in 300009524us (987315.3 ops/sec): 15.8 MB/s\nDid 311347000 RNG (32 bytes) operations in 300014396us (1037773.5 ops/sec): 33.2 MB/s\nDid 295104000 RNG (256 bytes) operations in 300012657us (983638.5 ops/sec): 251.8 MB/s\nDid 255721000 RNG (1350 bytes) operations in 300016481us (852356.5 ops/sec): 1150.7 MB/s\nDid 103339000 RNG (8192 bytes) operations in 300040059us (344417.3 ops/sec): 2821.5 MB/s\n\n(Smaller PRNG draws are more impacted because they spend less 
time in the\nDRBG. But they\u0027re also more likely because you rarely need to pull 8K of\ndata out at once.)\n\nx86_64, FIPS, Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 30 threads:\nBefore:\nDid 29060000 RNG (16 bytes) operations in 300081190us (96840.5 ops/sec): 1.5 MB/s\nDid 31882000 RNG (32 bytes) operations in 300118031us (106231.5 ops/sec): 3.4 MB/s\nDid 30925000 RNG (256 bytes) operations in 300113646us (103044.3 ops/sec): 26.4 MB/s\nDid 31969000 RNG (1350 bytes) operations in 300096688us (106529.0 ops/sec): 143.8 MB/s\nDid 33434000 RNG (8192 bytes) operations in 300093240us (111412.0 ops/sec): 912.7 MB/s\nAfter:\nDid 299013000 RNG (16 bytes) operations in 300012167us (996669.6 ops/sec): 15.9 MB/s\nDid 289788000 RNG (32 bytes) operations in 300014611us (965913.0 ops/sec): 30.9 MB/s\nDid 298699000 RNG (256 bytes) operations in 300013443us (995618.7 ops/sec): 254.9 MB/s\nDid 247061000 RNG (1350 bytes) operations in 300018215us (823486.7 ops/sec): 1111.7 MB/s\nDid 100479000 RNG (8192 bytes) operations in 300037708us (334887.9 ops/sec): 2743.4 MB/s\n\nOn an M1 Pro, it\u0027s mostly a wash by default (fewer threads because this chip has fewer cores)\n\naarch64, M1 Pro, 8 threads:\nBefore:\nDid 23218000 RNG (16 bytes) operations in 80009131us (290191.9 ops/sec): 4.6 MB/s\nDid 23021000 RNG (256 bytes) operations in 80007544us (287735.4 ops/sec): 73.7 MB/s\nDid 22853000 RNG (1350 bytes) operations in 80013184us (285615.4 ops/sec): 385.6 MB/s\nDid 25407000 RNG (8192 bytes) operations in 80008371us (317554.3 ops/sec): 2601.4 MB/s\nDid 22128000 RNG (16384 bytes) operations in 80013269us (276554.1 ops/sec): 4531.1 MB/s\nAfter:\nDid 23303000 RNG (16 bytes) operations in 80011433us (291245.9 ops/sec): 4.7 MB/s\nDid 23072000 RNG (256 bytes) operations in 80008755us (288368.4 ops/sec): 73.8 MB/s\nDid 22807000 RNG (1350 bytes) operations in 80013355us (285039.9 ops/sec): 384.8 MB/s\nDid 23759000 RNG (8192 bytes) operations in 80010212us (296949.6 ops/sec): 2432.6 MB/s\nDid 
23193000 RNG (16384 bytes) operations in 80011537us (289870.7 ops/sec): 4749.2 MB/s\n\nThis is likely because, without RDRAND or MADV_WIPEONFORK, we draw from\nthe OS on every call. We\u0027re likely bottlenecked by getentropy, whether\nit\u0027s some internal synchronization or syscall overhead. With\nfork-unsafe buffering enabled, this change shows even more significant\nwins on the M1 Pro.\n\naarch64, fork-unsafe buffering, M1 Pro, 8 threads:\nBefore:\nDid 25727000 RNG (16 bytes) operations in 80010579us (321545.0 ops/sec): 5.1 MB/s\nDid 25776000 RNG (32 bytes) operations in 80008587us (322165.4 ops/sec): 10.3 MB/s\nDid 25780000 RNG (256 bytes) operations in 80006127us (322225.3 ops/sec): 82.5 MB/s\nDid 33171250 RNG (1350 bytes) operations in 80002532us (414627.5 ops/sec): 559.7 MB/s\nDid 54784000 RNG (8192 bytes) operations in 80005706us (684751.2 ops/sec): 5609.5 MB/s\nAfter:\nDid 573826000 RNG (16 bytes) operations in 80000668us (7172765.1 ops/sec): 114.8 MB/s\nDid 571329000 RNG (32 bytes) operations in 80000423us (7141574.7 ops/sec): 228.5 MB/s\nDid 435043750 RNG (256 bytes) operations in 80000214us (5438032.3 ops/sec): 1392.1 MB/s\nDid 229536000 RNG (1350 bytes) operations in 80001888us (2869132.3 ops/sec): 3873.3 MB/s\nDid 57253000 RNG (8192 bytes) operations in 80004974us (715618.0 ops/sec): 5862.3 MB/s\n\nNote that, on hardware with RDRAND, the read lock in\nrand_fork_unsafe_buffering_enabled() doesn\u0027t do much. But without\nRDRAND, we hit that on every RAND_bytes call. More importantly, the\nsubsequent CL will fix a bug that will require us to hit it more\nfrequently.\n\nI\u0027ve removed the volatile on g_fork_detect_addr because I think we\ndidn\u0027t need it and this avoids thinking about the interaction between\nvolatile and atomics. The pointer is passed into madvise, so the\ncompiler knows the pointer escapes. 
For it to be invalid, the compiler\nwould need to go out of its way to model madvise as not remembering the\npointer, which would be incorrect of it for MADV_WIPEONFORK.\n\nBug: 570\nCq-Include-Trybots: luci.boringssl.try:linux_clang_rel_tsan\nChange-Id: Ie6977acd1b8e7639aaa419cf6f4f5f0645bde9d1\nReviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/59849\nCommit-Queue: David Benjamin \u003cdavidben@google.com\u003e\nReviewed-by: Adam Langley \u003cagl@google.com\u003e\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "58b068743427af0161e1883642ce696cffc8b288",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/rand/fork_detect.c",
      "new_id": "9e46223c4ea1ec4be78c86c5126884c2a620cf97",
      "new_mode": 33188,
      "new_path": "crypto/fipsmodule/rand/fork_detect.c"
    },
    {
      "type": "modify",
      "old_id": "0ead1822cad25d44923825626a8ae680789ad68d",
      "old_mode": 33188,
      "old_path": "crypto/fipsmodule/rand/rand.c",
      "new_id": "bf6b0469fd5b8ddcdcf648414f0004ad06a8e40a",
      "new_mode": 33188,
      "new_path": "crypto/fipsmodule/rand/rand.c"
    },
    {
      "type": "modify",
      "old_id": "00f0582fecfc24572666b705806cc3cd4f7d17bc",
      "old_mode": 33188,
      "old_path": "crypto/internal.h",
      "new_id": "9edfd0e2d5e859575fef5ac71af70f44248997cd",
      "new_mode": 33188,
      "new_path": "crypto/internal.h"
    },
    {
      "type": "modify",
      "old_id": "0f1ececc89789d43644b8036c807677694658b86",
      "old_mode": 33188,
      "old_path": "crypto/rand_extra/forkunsafe.c",
      "new_id": "356afddf8146492ad21aa14e1487a9756c93bc48",
      "new_mode": 33188,
      "new_path": "crypto/rand_extra/forkunsafe.c"
    }
  ]
}
