Clear AVX512 feature bits when AVX512 not actually supported

According to Intel's documentation, if the AVX512 state bits in XCR0
(opmask, ZMM_Hi256, and Hi16_ZMM) are not all set, meaning that the
operating system doesn't fully support AVX512, then no AVX512 feature
can be used, even on XMM and YMM registers. Make OPENSSL_cpuid_setup()
handle this case correctly by clearing all the AVX512 feature bits
whenever it is detected.
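
The check described above can be sketched in isolation as follows. This is
only an illustration, not the code in the patch: the helper name
clear_avx512_if_unusable and the two-element array layout (EBX, then ECX, of
CPUID leaf 7, subleaf 0) are assumptions made for the example. The XCR0 mask
and the bit positions match the ones used in the diff.

```c
#include <stdint.h>

// XCR0 state bits that must all be set before any AVX512 feature is usable:
// SSE (bit 1), AVX (bit 2), opmask (bit 5), ZMM_Hi256 (bit 6), and
// Hi16_ZMM (bit 7).
#define XCR0_AVX512_STATE 0xe6u

// Hypothetical helper (not a BoringSSL function). |extended_features[0]| is
// assumed to hold EBX and |extended_features[1]| ECX from CPUID with EAX = 7,
// ECX = 0. Returns 1 if the AVX512 bits were cleared, 0 if left untouched.
int clear_avx512_if_unusable(uint64_t xcr0, uint32_t extended_features[2]) {
  if ((xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE) {
    return 0;  // The OS saves all AVX512 state; keep the CPUID bits as-is.
  }
  // EBX: AVX512F, DQ, IFMA, PF, ER, CD, BW, VL.
  extended_features[0] &= ~((1u << 16) | (1u << 17) | (1u << 21) |
                            (1u << 26) | (1u << 27) | (1u << 28) |
                            (1u << 30) | (1u << 31));
  // ECX: AVX512VBMI, VBMI2, VNNI, BITALG, VPOPCNTDQ.
  extended_features[1] &= ~((1u << 1) | (1u << 6) | (1u << 11) |
                            (1u << 12) | (1u << 14));
  return 1;
}
```

For example, with XCR0 = 0x07 (x87, SSE, and AVX state only), every AVX512
bit is cleared while AVX2 (EBX bit 5) is left alone; with XCR0 = 0xe7 nothing
changes.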

Change-Id: I2774dbc28bfbac1196e405c0920ba2909e7f0eb3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/68907
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: Eric Biggers <ebiggers@google.com>
diff --git a/crypto/cpu_intel.c b/crypto/cpu_intel.c
index 347eadc..3453fa3 100644
--- a/crypto/cpu_intel.c
+++ b/crypto/cpu_intel.c
@@ -227,19 +227,39 @@
ecx &= ~(1u << 28); // AVX
ecx &= ~(1u << 12); // FMA
ecx &= ~(1u << 11); // AMD XOP
- // Clear AVX2 and AVX512* bits.
- //
- // TODO(davidben): Should bits 17 and 26-28 also be cleared? Upstream
- // doesn't clear those. See the comments in
- // |CRYPTO_hardware_supports_XSAVE|.
- extended_features[0] &=
- ~((1u << 5) | (1u << 16) | (1u << 21) | (1u << 30) | (1u << 31));
+ extended_features[0] &= ~(1u << 5); // AVX2
}
- // See Intel manual, volume 1, section 15.2.
+ // See Intel manual, volume 1, sections 15.2 ("Detection of AVX-512 Foundation
+ // Instructions") through 15.4 ("Detection of Intel AVX-512 Instruction Groups
+ // Operating at 256 and 128-bit Vector Lengths").
if ((xcr0 & 0xe6) != 0xe6) {
- // Clear AVX512F. Note we don't touch other AVX512 extensions because they
- // can be used with YMM.
- extended_features[0] &= ~(1u << 16);
+ // Without XCR0.111xx11x, no AVX512 feature can be used. This includes ZMM
+ // registers, masking, SIMD registers 16-31 (even if accessed as YMM or
+ // XMM), and EVEX-coded instructions (even on YMM or XMM). Even if only
+ // XCR0.ZMM_Hi256 is missing, it isn't valid to use AVX512 features on
+ // shorter vectors, since AVX512 ties everything to the availability of
+ // 512-bit vectors. See the above-mentioned sections of the Intel manual,
+ // which say that *all* these XCR0 bits must be checked even when just using
+ // 128-bit or 256-bit vectors, and also volume 2a section 2.7.11 ("#UD
+ // Equations for EVEX") which says that all EVEX-coded instructions raise an
+ // undefined-instruction exception if any of these XCR0 bits is zero.
+ //
+ // AVX10 fixes this by reorganizing the features that used to be part of
+ // "AVX512" and allowing them to be used independently of 512-bit support.
+ // TODO: add AVX10 detection.
+ extended_features[0] &= ~(1u << 16); // AVX512F
+ extended_features[0] &= ~(1u << 17); // AVX512DQ
+ extended_features[0] &= ~(1u << 21); // AVX512IFMA
+ extended_features[0] &= ~(1u << 26); // AVX512PF
+ extended_features[0] &= ~(1u << 27); // AVX512ER
+ extended_features[0] &= ~(1u << 28); // AVX512CD
+ extended_features[0] &= ~(1u << 30); // AVX512BW
+ extended_features[0] &= ~(1u << 31); // AVX512VL
+ extended_features[1] &= ~(1u << 1); // AVX512VBMI
+ extended_features[1] &= ~(1u << 6); // AVX512VBMI2
+ extended_features[1] &= ~(1u << 11); // AVX512VNNI
+ extended_features[1] &= ~(1u << 12); // AVX512BITALG
+ extended_features[1] &= ~(1u << 14); // AVX512VPOPCNTDQ
}
OPENSSL_ia32cap_P[0] = edx;
diff --git a/crypto/internal.h b/crypto/internal.h
index 89d1e80..209c85a 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -1390,13 +1390,13 @@
// ECX for CPUID where EAX = 1
// Bit 11 is used to indicate AMD XOP support, not SDBG
// Index 2:
-// EBX for CPUID where EAX = 7
+// EBX for CPUID where EAX = 7, ECX = 0
// Index 3:
-// ECX for CPUID where EAX = 7
+// ECX for CPUID where EAX = 7, ECX = 0
//
-// Note: the CPUID bits are pre-adjusted for the OSXSAVE bit and the YMM and XMM
-// bits in XCR0, so it is not necessary to check those. (WARNING: See caveats
-// in cpu_intel.c.)
+// Note: the CPUID bits are pre-adjusted for the OSXSAVE bit and the XMM, YMM,
+// and AVX512 bits in XCR0, so it is not necessary to check those. (WARNING: See
+// caveats in cpu_intel.c.)
//
// From C, this symbol should only be accessed with |OPENSSL_get_ia32cap|.
extern uint32_t OPENSSL_ia32cap_P[4];