Build 32-bit assembly with SSE2 enabled.
This affects bignum and sha. Also now that we're passing the SSE2 flag, revert
the change to ghash-x86.pl which unconditionally sets $sse2, just to minimize
upstream divergence. Chromium assumes SSE2 support, so relying on it is okay.
See https://crbug.com/349320.
Note: this change needs to be mirrored in Chromium to take.
bssl speed numbers:
SSE2:
Did 552 RSA 2048 signing operations in 3007814us (183.5 ops/sec)
Did 19003 RSA 2048 verify operations in 3070779us (6188.3 ops/sec)
Did 72 RSA 4096 signing operations in 3055885us (23.6 ops/sec)
Did 4650 RSA 4096 verify operations in 3024926us (1537.2 ops/sec)
Without SSE2:
Did 350 RSA 2048 signing operations in 3042021us (115.1 ops/sec)
Did 11760 RSA 2048 verify operations in 3003197us (3915.8 ops/sec)
Did 46 RSA 4096 signing operations in 3042692us (15.1 ops/sec)
Did 3400 RSA 4096 verify operations in 3083035us (1102.8 ops/sec)
SSE2:
Did 16407000 SHA-1 (16 bytes) operations in 3000141us (5468743.0 ops/sec): 87.5 MB/s
Did 4367000 SHA-1 (256 bytes) operations in 3000436us (1455455.1 ops/sec): 372.6 MB/s
Did 185000 SHA-1 (8192 bytes) operations in 3002666us (61611.9 ops/sec): 504.7 MB/s
Did 9444000 SHA-256 (16 bytes) operations in 3000052us (3147945.4 ops/sec): 50.4 MB/s
Did 2283000 SHA-256 (256 bytes) operations in 3000457us (760884.1 ops/sec): 194.8 MB/s
Did 89000 SHA-256 (8192 bytes) operations in 3016024us (29509.0 ops/sec): 241.7 MB/s
Did 5550000 SHA-512 (16 bytes) operations in 3000350us (1849784.2 ops/sec): 29.6 MB/s
Did 1820000 SHA-512 (256 bytes) operations in 3001039us (606456.6 ops/sec): 155.3 MB/s
Did 93000 SHA-512 (8192 bytes) operations in 3007874us (30918.8 ops/sec): 253.3 MB/s
Without SSE2:
Did 10573000 SHA-1 (16 bytes) operations in 3000261us (3524026.7 ops/sec): 56.4 MB/s
Did 2937000 SHA-1 (256 bytes) operations in 3000621us (978797.4 ops/sec): 250.6 MB/s
Did 123000 SHA-1 (8192 bytes) operations in 3033202us (40551.2 ops/sec): 332.2 MB/s
Did 5846000 SHA-256 (16 bytes) operations in 3000294us (1948475.7 ops/sec): 31.2 MB/s
Did 1377000 SHA-256 (256 bytes) operations in 3000335us (458948.8 ops/sec): 117.5 MB/s
Did 54000 SHA-256 (8192 bytes) operations in 3027962us (17833.8 ops/sec): 146.1 MB/s
Did 2075000 SHA-512 (16 bytes) operations in 3000967us (691443.8 ops/sec): 11.1 MB/s
Did 638000 SHA-512 (256 bytes) operations in 3000576us (212625.8 ops/sec): 54.4 MB/s
Did 30000 SHA-512 (8192 bytes) operations in 3042797us (9859.3 ops/sec): 80.8 MB/s
BUG=430237
Change-Id: I47d1c1ffcd71afe4f4a192272f8cb92af9505ee1
Reviewed-on: https://boringssl-review.googlesource.com/4130
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/CMakeLists.txt b/crypto/CMakeLists.txt
index bbd4f20..c454664 100644
--- a/crypto/CMakeLists.txt
+++ b/crypto/CMakeLists.txt
@@ -2,7 +2,7 @@
if(APPLE)
if (${ARCH} STREQUAL "x86")
- set(PERLASM_FLAGS "-fPIC")
+ set(PERLASM_FLAGS "-fPIC -DOPENSSL_IA32_SSE2")
endif()
set(PERLASM_STYLE macosx)
set(ASM_EXT S)
@@ -13,7 +13,7 @@
# in order to decide whether to generate 32- or 64-bit asm.
set(PERLASM_STYLE linux64)
elseif (${ARCH} STREQUAL "x86")
- set(PERLASM_FLAGS "-fPIC")
+ set(PERLASM_FLAGS "-fPIC -DOPENSSL_IA32_SSE2")
set(PERLASM_STYLE elf)
else()
set(PERLASM_STYLE elf)
@@ -27,6 +27,7 @@
else()
message("Using win32n")
set(PERLASM_STYLE win32n)
+ set(PERLASM_FLAGS "-DOPENSSL_IA32_SSE2")
endif()
# On Windows, we use the NASM output, specifically built with Yasm.
diff --git a/crypto/modes/asm/ghash-x86.pl b/crypto/modes/asm/ghash-x86.pl
index eb6d55e..23a5527 100644
--- a/crypto/modes/asm/ghash-x86.pl
+++ b/crypto/modes/asm/ghash-x86.pl
@@ -131,8 +131,8 @@
&asm_init($ARGV[0],"ghash-x86.pl",$x86only = $ARGV[$#ARGV] eq "386");
-$sse2=1;
-#for (@ARGV) { $sse2=1 if (/-DOPENSSL_IA32_SSE2/); }
+$sse2=0;
+for (@ARGV) { $sse2=1 if (/-DOPENSSL_IA32_SSE2/); }
($Zhh,$Zhl,$Zlh,$Zll) = ("ebp","edx","ecx","ebx");
$inp = "edi";