Assume hyper-threading-like vulnerabilities are always present.
It's not clear that CPUID will always report the correct value here,
especially for hyper-threading environments. It also isn't clear that
the assumptions made by AMD processors are correct and will always be
correct. It also seems likely that, if a code path is
security-sensitive w.r.t. SMT, it is probably also security-sensitive
w.r.t. other processor (mis)features. Finally, it isn't clear that all
dynamic analysis (fuzzing, SDE, etc.) is done separately for the cross
product of all CPU feature combinations * the value of this bit.
With all that in mind, instruct code sensitive to this bit to always
choose the more conservative path.
I only found one place that's sensitive to this bit, though I didn't
look too hard:
```
aes_nohw_cbc_encrypt:
[...]
leaq OPENSSL_ia32cap_P(%rip),%r10
mov (%r10), %r10d
[...]
bt \$28,%r10d
jc .Lcbc_slow_prologue
```
I didn't verify that the code in the HTT-enabled paths is any better
than the code in the HTT-disabled paths.
Change-Id: Ifd643e6a1301e5ca2174b84c344eb933d49e0067
Reviewed-on: https://boringssl-review.googlesource.com/c/33404
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
diff --git a/crypto/cpu-intel.c b/crypto/cpu-intel.c
index 20cfbe8..5c21f4a 100644
--- a/crypto/cpu-intel.c
+++ b/crypto/cpu-intel.c
@@ -173,29 +173,11 @@
extended_features[1] = ecx;
}
- // Determine the number of cores sharing an L1 data cache to adjust the
- // hyper-threading bit.
- uint32_t cores_per_cache = 0;
- if (is_amd) {
- // AMD CPUs never share an L1 data cache between threads but do set the HTT
- // bit on multi-core CPUs.
- cores_per_cache = 1;
- } else if (num_ids >= 4) {
- // TODO(davidben): The Intel manual says this CPUID leaf enumerates all
- // caches using ECX and doesn't say which is first. Does this matter?
- OPENSSL_cpuid(&eax, &ebx, &ecx, &edx, 4);
- cores_per_cache = 1 + ((eax >> 14) & 0xfff);
- }
-
OPENSSL_cpuid(&eax, &ebx, &ecx, &edx, 1);
- // Adjust the hyper-threading bit.
- if (edx & (1u << 28)) {
- uint32_t num_logical_cores = (ebx >> 16) & 0xff;
- if (cores_per_cache == 1 || num_logical_cores <= 1) {
- edx &= ~(1u << 28);
- }
- }
+ // Force the hyper-threading bit so that the more conservative path is always
+ // chosen.
+ edx |= 1u << 28;
// Reserved bit #20 was historically repurposed to control the in-memory
// representation of RC4 state. Always set it to zero.