Use dec/jnz instead of loop in bn_add_words and bn_sub_words.

Imported from upstream's a78324d95bd4568ce2c3b34bfa1d6f14cddf92ef. I think
the "regression" part of that change is some tweak to BN_usub, and I guess
the bn_*_words change was to compensate for it, but we may as well import
it. Apparently the loop instruction is terrible.

Before:
Did 39871000 bn_add_words operations in 1000002us (39870920.3 ops/sec)
Did 38621750 bn_sub_words operations in 1000001us (38621711.4 ops/sec)

After:
Did 64012000 bn_add_words operations in 1000007us (64011551.9 ops/sec)
Did 81792250 bn_sub_words operations in 1000002us (81792086.4 ops/sec)

loop sets no flags (it even performs the comparison to zero without
touching ZF), while dec sets all flags except CF. Either one therefore
preserves the CF value that the adcq/sbbq carry chain passes between
iterations. Andres and I are assuming that because loop's flag behavior
prevents Intel from microcoding it as dec/jnz, they otherwise can't be
bothered to add more circuitry, since every compiler has long since
learned never to emit loop.

Change-Id: I3927cd1c7b707841bbe9963e3d4afd7ba9bd9b36
Reviewed-on: https://boringssl-review.googlesource.com/23344
Reviewed-by: Adam Langley <agl@google.com>
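For readers less familiar with the assembly, the following is a hypothetical portable sketch of what the adcq loop in bn_add_words computes; it is not BoringSSL's code. It assumes 64-bit limbs, and the names `limb_t` and `add_words_ref` are illustrative only. The point it makes explicit is that a carry flows from each iteration into the next; in the assembly that carry lives in CF, which is why the counter update (dec, or formerly loop) must not clobber it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit limbs, as on x86_64. */
typedef uint64_t limb_t;

/* Hypothetical reference for the adcq loop: rp[i] = ap[i] + bp[i] + carry,
 * where carry is the carry-out of the previous limb. In the assembly the
 * carry is held in CF across iterations. Returns the final carry (0 or 1). */
static limb_t add_words_ref(limb_t *rp, const limb_t *ap, const limb_t *bp,
                            size_t n) {
  limb_t carry = 0;
  for (size_t i = 0; i < n; i++) {
    limb_t t = ap[i] + carry;
    limb_t c1 = (t < carry);  /* adding the incoming carry wrapped */
    limb_t r = t + bp[i];
    limb_t c2 = (r < t);      /* adding bp[i] wrapped */
    rp[i] = r;
    carry = c1 | c2;          /* at most one of c1, c2 can be set */
  }
  return carry;
}
```

For example, adding {UINT64_MAX, 1} and {1, 2} (least significant limb first) yields {0, 4} with a final carry of 0: the carry out of limb 0 is absorbed by limb 1.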

commit: 02514002fd67e9494294e6020878c844a3fe9b83
author: David Benjamin <davidben@google.com> Wed Nov 22 11:08:45 2017 -0500
committer: Adam Langley <agl@google.com> Wed Nov 22 21:56:05 2017 +0000
tree: 70a0725a61b4781ac38e901483ad21973912b838
parent: 2056d7290a05c9cfd98889ef8b5519ddc81bd4d8
diff --git a/crypto/fipsmodule/bn/asm/x86_64-gcc.c b/crypto/fipsmodule/bn/asm/x86_64-gcc.c
index 4059dcc..49351c1 100644
--- a/crypto/fipsmodule/bn/asm/x86_64-gcc.c
+++ b/crypto/fipsmodule/bn/asm/x86_64-gcc.c

@@ -202,7 +202,8 @@
       "	adcq	(%5,%2,8),%0	\n"
       "	movq	%0,(%3,%2,8)	\n"
       "	lea	1(%2),%2	\n"
-      "	loop	1b		\n"
+      "	dec	%1		\n"
+      "	jnz	1b		\n"
       "	sbbq	%0,%0		\n"
       : "=&r"(ret), "+c"(n), "+r"(i)
       : "r"(rp), "r"(ap), "r"(bp)
@@ -229,7 +230,8 @@
       "	sbbq	(%5,%2,8),%0	\n"
       "	movq	%0,(%3,%2,8)	\n"
       "	lea	1(%2),%2	\n"
-      "	loop	1b		\n"
+      "	dec	%1		\n"
+      "	jnz	1b		\n"
       "	sbbq	%0,%0		\n"
       : "=&r"(ret), "+c"(n), "+r"(i)
       : "r"(rp), "r"(ap), "r"(bp)
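The second hunk makes the same change to bn_sub_words, where CF carries a borrow rather than a carry. A hypothetical portable sketch of what that sbbq loop computes follows; it is not BoringSSL's code, it assumes 64-bit limbs, and `limb_t` and `sub_words_ref` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit limbs, as on x86_64. */
typedef uint64_t limb_t;

/* Hypothetical reference for the sbbq loop: rp[i] = ap[i] - bp[i] - borrow,
 * where borrow is the borrow-out of the previous limb (held in CF in the
 * assembly). Returns the final borrow (0 or 1). */
static limb_t sub_words_ref(limb_t *rp, const limb_t *ap, const limb_t *bp,
                            size_t n) {
  limb_t borrow = 0;
  for (size_t i = 0; i < n; i++) {
    limb_t a = ap[i], b = bp[i];
    limb_t t = a - b;
    limb_t b1 = (a < b);       /* a - b wrapped below zero */
    limb_t b2 = (t < borrow);  /* subtracting the incoming borrow wrapped */
    rp[i] = t - borrow;
    borrow = b1 | b2;          /* at most one of b1, b2 can be set */
  }
  return borrow;
}
```

For example, {0, 5} minus {1, 2} (least significant limb first) yields {UINT64_MAX, 2} with no final borrow, since limb 1 absorbs the borrow from limb 0.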