Only draw from RDRAND for additional_data if it's fast.

We seek to incorporate entropy into every |RAND_bytes| call to avoid
problems with fork() and VM cloning. However, on some chips, RDRAND is
significantly slower than a system call, which crushes the performance
of |RAND_bytes|.

This change disables the use of RDRAND for this opportunistic draw on
non-Intel chips. BoringSSL will then fall back to either the OS or
nothing (if fork-unsafe mode has been set).

RDRAND is still used for seeding the PRNG whenever it is available.
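
As a rough illustration of the selection described above, here is a
minimal sketch. Every name in it (|cpu_info|, |is_intel|,
|choose_additional_data_source|, |seed_uses_rdrand|) is hypothetical
and is not BoringSSL's actual code, and it ignores the case where
RDRAND itself fails.

```c
// Hypothetical stand-ins for illustration only; not BoringSSL names.
enum additional_data_source {
  SOURCE_RDRAND,          // draw additional_data from RDRAND
  SOURCE_OS_NONBLOCKING,  // draw additional_data from the OS
  SOURCE_NONE             // fork-unsafe mode: no additional_data
};

struct cpu_info {
  int has_rdrand;  // CPUID reports RDRAND support
  int is_intel;    // assumption: "fast RDRAND" is approximated as "Intel"
};

// Picks the source for the opportunistic per-call additional_data draw.
static enum additional_data_source choose_additional_data_source(
    const struct cpu_info *cpu, int fork_unsafe_mode) {
  if (cpu->has_rdrand && cpu->is_intel) {
    return SOURCE_RDRAND;
  }
  if (!fork_unsafe_mode) {
    return SOURCE_OS_NONBLOCKING;
  }
  return SOURCE_NONE;
}

// Seeding is unaffected: RDRAND is used whenever it is available.
static int seed_uses_rdrand(const struct cpu_info *cpu) {
  return cpu->has_rdrand;
}
```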

This introduces a new blocking case: RDRAND may now be used for
seeding while the syscall to get additional_data was still a blocking
one. Previously that didn't matter because, if a syscall was used to
get additional_data, then a blocking one had already been used to
seed. The syscall for additional_data is therefore now non-blocking.
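
A non-blocking draw of this kind might look like the sketch below. It
assumes Linux with getrandom() and GRND_NONBLOCK (glibc 2.25 or
later); the helper name |additional_data_nonblocking| and the choice
to zero the buffer on failure are illustrative assumptions, not the
code in this change.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>  // getrandom(), GRND_NONBLOCK
#include <sys/types.h>   // ssize_t

// Hypothetical helper: opportunistically fills |out| with |len| bytes
// from the OS without ever blocking. Returns 1 on success. If the
// kernel pool is not yet initialized (EAGAIN) or any other error
// occurs, zeros |out| and returns 0 so the caller proceeds without
// additional data.
static int additional_data_nonblocking(unsigned char *out, size_t len) {
  size_t done = 0;
  while (done < len) {
    ssize_t r = getrandom(out + done, len - done, GRND_NONBLOCK);
    if (r > 0) {
      done += (size_t)r;
      continue;
    }
    if (r < 0 && errno == EINTR) {
      continue;  // interrupted by a signal; retry
    }
    memset(out, 0, len);
    return 0;
  }
  return 1;
}
```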

Also, we had both |hwrand| and |rdrand| names hanging around. We don't
support entropy instructions other than RDRAND, so unify around |rdrand|
naming. If we ever add support for others, we can abstract properly at
that time.

Change-Id: I91121b270a2ebc667543dad1324f37285daad893
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40565
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
diff --git a/crypto/cpu-intel.c b/crypto/cpu-intel.c
index cc41fc4..53ece95 100644
--- a/crypto/cpu-intel.c
+++ b/crypto/cpu-intel.c
@@ -123,7 +123,9 @@
 // and |out[1]|. See the comment in |OPENSSL_cpuid_setup| about this.
 static void handle_cpu_env(uint32_t *out, const char *in) {
   const int invert = in[0] == '~';
-  const int hex = in[invert] == '0' && in[invert+1] == 'x';
+  const int or = in[0] == '|';
+  const int skip_first_byte = invert || or;
+  const int hex = in[skip_first_byte] == '0' && in[skip_first_byte+1] == 'x';
 
   int sscanf_result;
   uint64_t v;
@@ -140,6 +142,9 @@
   if (invert) {
     out[0] &= ~v;
     out[1] &= ~(v >> 32);
+  } else if (or) {
+    out[0] |= v;
+    out[1] |= (v >> 32);
   } else {
     out[0] = v;
     out[1] = v >> 32;
@@ -264,10 +269,14 @@
 
   // OPENSSL_ia32cap can contain zero, one or two values, separated with a ':'.
   // Each value is a 64-bit, unsigned value which may start with "0x" to
-  // indicate a hex value. Prior to the 64-bit value, a '~' may be given.
+  // indicate a hex value. Prior to the 64-bit value, a '~' or '|' may be given.
   //
-  // If '~' isn't present, then the value is taken as the result of the CPUID.
-  // Otherwise the value is inverted and ANDed with the probed CPUID result.
+  // If the '~' prefix is present:
+  //   the value is inverted and ANDed with the probed CPUID result
+  // If the '|' prefix is present:
+  //   the value is ORed with the probed CPUID result
+  // Otherwise:
+  //   the value is taken as the result of the CPUID
   //
   // The first value determines OPENSSL_ia32cap_P[0] and [1]. The second [2]
   // and [3].
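
The three prefix cases described in the comment above can be exercised
in isolation with a sketch like the following. It mirrors the logic of
|handle_cpu_env| as shown in this diff, but |apply_cap_value| is a
hypothetical stand-alone function, and the return value on a parse
failure is an assumption made for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical stand-alone version of the prefix handling. |out| holds
// the probed CPUID words and |in| is one OPENSSL_ia32cap value.
// Returns 1 if |in| parsed and was applied, 0 otherwise.
static int apply_cap_value(uint32_t out[2], const char *in) {
  const int invert = in[0] == '~';
  const int or_prefix = in[0] == '|';
  const int skip = invert || or_prefix;
  const int hex = in[skip] == '0' && in[skip + 1] == 'x';

  uint64_t v;
  const int n = hex ? sscanf(in + skip + 2, "%" SCNx64, &v)
                    : sscanf(in + skip, "%" SCNu64, &v);
  if (n != 1) {
    return 0;  // unparsable: leave |out| untouched
  }

  if (invert) {
    // '~': clear the given bits from the probed result.
    out[0] &= ~(uint32_t)v;
    out[1] &= ~(uint32_t)(v >> 32);
  } else if (or_prefix) {
    // '|': set the given bits on top of the probed result.
    out[0] |= (uint32_t)v;
    out[1] |= (uint32_t)(v >> 32);
  } else {
    // No prefix: the value replaces the probed result.
    out[0] = (uint32_t)v;
    out[1] = (uint32_t)(v >> 32);
  }
  return 1;
}
```

For example, starting from {0x0000000f, 0x00000001}, the value
"|0x4000000000000010" yields {0x0000001f, 0x40000001}.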