Enable the SSE2 Poly1305 implementation on clang-cl.

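poly1305_vec.c requires SSE2 intrinsics and uint128_t. Unlike MSVC, clang-cl
supports uint128_t just fine (as long as we do not do division), so gate the
file on BORINGSSL_HAS_UINT128 rather than excluding all of Windows. This makes
ChaCha20-Poly1305 up to roughly 1.75x faster for large inputs on Windows when
built with clang-cl; 16-byte inputs are essentially unchanged.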
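
For illustration, the kind of 128-bit arithmetic the vector code depends on
can be sketched as below. This is a minimal sketch, not code from this change:
the SKETCH_HAS_UINT128 macro, the sketch_uint128_t typedef, and both function
names are hypothetical, and it assumes the compiler advertises the type via
__SIZEOF_INT128__ (as GCC and Clang generally do on 64-bit targets). Only
multiplication and shifts are used, never division.

```c
#include <stdint.h>

// Hypothetical feature macro for this sketch only. The real change keys off
// BORINGSSL_HAS_UINT128, whose definition is not part of this diff.
#if defined(__SIZEOF_INT128__)
#define SKETCH_HAS_UINT128
typedef unsigned __int128 sketch_uint128_t;
#endif

// Returns the low 64 bits of the 128-bit product a * b.
uint64_t sketch_mul_lo(uint64_t a, uint64_t b) {
  // Unsigned 64-bit multiplication wraps, which is exactly the low half.
  return a * b;
}

// Returns the high 64 bits of the 128-bit product a * b.
uint64_t sketch_mul_hi(uint64_t a, uint64_t b) {
#if defined(SKETCH_HAS_UINT128)
  // One widening multiply and a shift; no 128-bit division is involved.
  sketch_uint128_t p = (sketch_uint128_t)a * b;
  return (uint64_t)(p >> 64);
#else
  // Portable fallback: schoolbook multiplication on 32-bit halves.
  uint64_t a0 = a & 0xffffffffu, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffffu, b1 = b >> 32;
  uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
  // Sum of the middle column plus the carry out of the lowest column.
  uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
  return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
#endif
}
```

Both branches compute the same value, so a caller does not need to know which
one was compiled in; the 128-bit path is simply a shorter way to express the
widening multiply.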

Before:
Did 2219000 ChaCha20-Poly1305 (16 bytes) seal operations in 1016000us (2184055.1 ops/sec): 34.9 MB/s
Did 1279500 ChaCha20-Poly1305 (256 bytes) seal operations in 1016000us (1259350.4 ops/sec): 322.4 MB/s
Did 428250 ChaCha20-Poly1305 (1350 bytes) seal operations in 1015000us (421921.2 ops/sec): 569.6 MB/s
Did 84000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1016000us (82677.2 ops/sec): 677.3 MB/s
Did 39750 ChaCha20-Poly1305 (16384 bytes) seal operations in 1015000us (39162.6 ops/sec): 641.6 MB/s

After:
Did 2096250 ChaCha20-Poly1305 (16 bytes) seal operations in 1015000us (2065270.9 ops/sec): 33.0 MB/s
Did 1453250 ChaCha20-Poly1305 (256 bytes) seal operations in 1016000us (1430364.2 ops/sec): 366.2 MB/s
Did 642500 ChaCha20-Poly1305 (1350 bytes) seal operations in 1015000us (633004.9 ops/sec): 854.6 MB/s
Did 136250 ChaCha20-Poly1305 (8192 bytes) seal operations in 1016000us (134104.3 ops/sec): 1098.6 MB/s
Did 69750 ChaCha20-Poly1305 (16384 bytes) seal operations in 1016000us (68651.6 ops/sec): 1124.8 MB/s

(Benchmarks were gathered in a VM, but the difference is still significant.)

Change-Id: Ia0a856e75995623c5621d2e48d61d945c41b17de
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39345
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/poly1305/poly1305.c b/crypto/poly1305/poly1305.c
index c3e9272..a6dd145 100644
--- a/crypto/poly1305/poly1305.c
+++ b/crypto/poly1305/poly1305.c
@@ -26,7 +26,7 @@
 #include "../internal.h"
 
 
-#if defined(OPENSSL_WINDOWS) || !defined(OPENSSL_X86_64)
+#if !defined(BORINGSSL_HAS_UINT128) || !defined(OPENSSL_X86_64)
 
 // We can assume little-endian.
 static uint32_t U8TO32_LE(const uint8_t *m) {
@@ -315,4 +315,4 @@
   U32TO8_LE(&mac[12], f3);
 }
 
-#endif  // OPENSSL_WINDOWS || !OPENSSL_X86_64
+#endif  // !BORINGSSL_HAS_UINT128 || !OPENSSL_X86_64
diff --git a/crypto/poly1305/poly1305_vec.c b/crypto/poly1305/poly1305_vec.c
index 7860e0a..29cd5c3 100644
--- a/crypto/poly1305/poly1305_vec.c
+++ b/crypto/poly1305/poly1305_vec.c
@@ -23,7 +23,7 @@
 #include "../internal.h"
 
 
-#if !defined(OPENSSL_WINDOWS) && defined(OPENSSL_X86_64)
+#if defined(BORINGSSL_HAS_UINT128) && defined(OPENSSL_X86_64)
 
 #include <emmintrin.h>
 
@@ -853,4 +853,4 @@
   store_u64_le(mac + 8, ((h1 >> 20) | (h2 << 24)));
 }
 
-#endif  // !OPENSSL_WINDOWS && OPENSSL_X86_64
+#endif  // BORINGSSL_HAS_UINT128 && OPENSSL_X86_64