Rework how DTLS ACKs and retransmits are flushed

This CL addresses a number of issues:

1. If an ACK or retransmit hit SSL_ERROR_WANT_WRITE, we do not provide a
   way for the caller to retry it.

2. If the retransmit timer fires too many times, we do not track the
   fatal error. Instead, we return an error out of the
   DTLSv1_handle_timeout and leave the timeout unhandled. If the caller
   does not tear down the connection in response to
   DTLSv1_handle_timeout failing, they may register a timeout for 0s,
   wake up again and loop forever.

   This happened in https://crbug.com/42224241. I filed
   https://crbug.com/42290595 to track fixing this API wart.

3. When the DTLS open_app_data hook needs to retransmit or send an ACK,
   we write immediately in that callback, but the eventual aim was to
   provide an in-place encrypt/decrypt API, so those hooks should not
   perform I/O.

   DTLS 1.3 adds a lot more cases where we'll want to do this:

   - We might read a message and then want to ACK on a timer
   - We might read a message and then want to ACK immediately
   - We might read a past retransmit and then want to retransmit
     the next flight
   - We might read an ACK, clear the current flight, and then want to
     send a queued up KeyUpdate that was waiting on it.

To address all these, this CL rearranges things slightly. Now the DTLS
record layer keeps track of whether it owes the peer an ACK and a
flight. If so, the flush() operation will try to write those out. If it
fails, it keeps track of how far it got. If it succeeds, it clears those
flags.

Various operations can set those flags, notably the timeouts. This means
the actual "handle timeout" part of DTLSv1_handle_timeout is
infallible. Handling a timeout means cashing the timeout expiry in for
setting the flag.

From there, we drive flush() to completion where we need to. For now,
I've added it to the handshake and SSL_read(), though I'm not sure if
I've gotten exactly the right spots. (Should we also flush on
SSL_write?) We could also go further and have flush() drive
handle_timeout(), and then handle_timeout() can be deprecated
altogether. The model is just "you must retry this operation after the
timeout, in addition to the SSL_ERROR_WANT_WHATEVER you got". That's
quite attractive, but for now I haven't quite gone that far.

Another complication is that flush() currently behaves very differently
between TLS and DTLS. In DTLS, I had to add an explicit finish_flight()
to queue up the flight for sending. If you leave things in there, it's
no big deal because everything in DTLS is unordered. In TLS, everything
is ordered, so once we've encrypted records, we have to flush them
regardless. To that end, in TLS we generally like to flush things
implicitly on write, because that's when the caller is expecting I/O
anyway, and we can concatenate the, say, encrypted KeyUpdate with the
encrypted application data. Not sure what's best to do there. For now
I've just gated flush() calls on SSL_is_dtls().

I'm sure we'll find this also isn't quite right and rearrange it again
later, but hopefully this works a bit better than what we had before.

Bug: 42290594, 376718254
Fixed: 42290595
Change-Id: I4d9b5c753530b514a1bc5e4e69ddb25dfc8c1d06
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/73527
Reviewed-by: Nick Harper <nharper@chromium.org>
Commit-Queue: David Benjamin <davidben@google.com>
17 files changed
tree: 78d1bd9c4f71beb05c1dea5321b236951ff4c049
  1. .bcr/
  2. .github/
  3. cmake/
  4. crypto/
  5. decrepit/
  6. docs/
  7. fuzz/
  8. gen/
  9. include/
  10. infra/
  11. pki/
  12. rust/
  13. ssl/
  14. third_party/
  15. tool/
  16. util/
  17. .bazelignore
  18. .bazelrc
  19. .bazelversion
  20. .clang-format
  21. .gitignore
  22. API-CONVENTIONS.md
  23. BREAKING-CHANGES.md
  24. BUILD.bazel
  25. build.json
  26. BUILDING.md
  27. CMakeLists.txt
  28. codereview.settings
  29. CONTRIBUTING.md
  30. FUZZING.md
  31. go.mod
  32. go.sum
  33. INCORPORATING.md
  34. LICENSE
  35. MODULE.bazel
  36. MODULE.bazel.lock
  37. PORTING.md
  38. PrivacyInfo.xcprivacy
  39. README.md
  40. SANDBOXING.md
  41. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

To file a security issue, use the Chromium process and mention in the report this is for BoringSSL. You can ignore the parts of the process that are specific to Chromium/Chrome.

There are other files in this directory which might be helpful: