Enable ADX in x86_64-mont*.pl.

This is a reland of https://boringssl-review.googlesource.com/18965
which was reverted due to Windows toolchain problems that have since
been fixed.

We have an SDE bot now and can more easily test things. We also enabled
ADX in rsaz-avx2.pl which does not work without x86_64-mont*.pl enabled.
rsa-avx2.pl's ADX code only turns itself off so that the faster ADX code
can be used... but we disable it.

Verified, after reverting the fix, the test vectors we imported combined
with Intel SDE catches CVE-2016-7055, so we do indeed have test
coverage. Also verified on the Windows version of Intel SDE.

Thanks to Alexey Ivanov for pointing out the discrepancy.

Skylake numbers:

Before:
Did 7296 RSA 2048 signing operations in 10038191us (726.8 ops/sec)
Did 209000 RSA 2048 verify operations in 10030629us (20836.2 ops/sec)
Did 1080 RSA 4096 signing operations in 10072221us (107.2 ops/sec)
Did 60836 RSA 4096 verify operations in 10053929us (6051.0 ops/sec)

ADX consistently off:
Did 9360 RSA 2048 signing operations in 10025823us (933.6 ops/sec)
Did 220000 RSA 2048 verify operations in 10024339us (21946.6 ops/sec)
Did 1048 RSA 4096 signing operations in 10006782us (104.7 ops/sec)
Did 61936 RSA 4096 verify operations in 10088011us (6139.6 ops/sec)

After (ADX consistently on):
Did 10444 RSA 2048 signing operations in 10006781us (1043.7 ops/sec)
Did 323000 RSA 2048 verify operations in 10012192us (32260.7 ops/sec)
Did 1610 RSA 4096 signing operations in 10044930us (160.3 ops/sec)
Did 96000 RSA 4096 verify operations in 10075606us (9528.0 ops/sec)

Change-Id: I2502ce80e9cfcdea40907512682e3a6663000faa
Reviewed-on: https://boringssl-review.googlesource.com/19105
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
diff --git a/BUILDING.md b/BUILDING.md
index 6dfeddb..d5d3fa0 100644
--- a/BUILDING.md
+++ b/BUILDING.md
@@ -32,8 +32,9 @@
   * [Go](https://golang.org/dl/) is required. If not found by CMake, the go
     executable may be configured explicitly by setting `GO_EXECUTABLE`.
 
-  * To build the x86 and x86\_64 assembly, your assembler must support AVX2
-    instructions and MOVBE. If using GNU binutils, you must have 2.22 or later.
+  * To build the x86 and x86\_64 assembly, your assembler must support AVX2,
+    MOVBE, and ADX. If using GNU binutils, you must have 2.23 or later. If using
+    Yasm, you must have 1.3.0 or later.
 
 ## Building
 
diff --git a/crypto/fipsmodule/bn/asm/rsaz-avx2.pl b/crypto/fipsmodule/bn/asm/rsaz-avx2.pl
index 14a28dc..437c0f7 100755
--- a/crypto/fipsmodule/bn/asm/rsaz-avx2.pl
+++ b/crypto/fipsmodule/bn/asm/rsaz-avx2.pl
@@ -82,8 +82,6 @@
 # In upstream, this is controlled by shelling out to the compiler to check
 # versions, but BoringSSL is intended to be used with pre-generated perlasm
 # output, so this isn't useful anyway.
-#
-# TODO(davidben): Enable these after testing. $avx goes up to 2 and $addx to 1.
 $avx = 2;
 $addx = 1;
 
diff --git a/crypto/fipsmodule/bn/asm/x86_64-mont.pl b/crypto/fipsmodule/bn/asm/x86_64-mont.pl
index 286e0a0..c27f65d 100755
--- a/crypto/fipsmodule/bn/asm/x86_64-mont.pl
+++ b/crypto/fipsmodule/bn/asm/x86_64-mont.pl
@@ -56,9 +56,7 @@
 # In upstream, this is controlled by shelling out to the compiler to check
 # versions, but BoringSSL is intended to be used with pre-generated perlasm
 # output, so this isn't useful anyway.
-#
-# TODO(davidben): Enable this option after testing. $addx goes up to 1.
-$addx = 0;
+$addx = 1;
 
 # int bn_mul_mont(
 $rp="%rdi";	# BN_ULONG *rp,
diff --git a/crypto/fipsmodule/bn/asm/x86_64-mont5.pl b/crypto/fipsmodule/bn/asm/x86_64-mont5.pl
index db09624..73abb71 100755
--- a/crypto/fipsmodule/bn/asm/x86_64-mont5.pl
+++ b/crypto/fipsmodule/bn/asm/x86_64-mont5.pl
@@ -41,9 +41,7 @@
 # In upstream, this is controlled by shelling out to the compiler to check
 # versions, but BoringSSL is intended to be used with pre-generated perlasm
 # output, so this isn't useful anyway.
-#
-# TODO(davidben): Enable this after testing. $addx goes up to 1.
-$addx = 0;
+$addx = 1;
 
 # int bn_mul_mont_gather5(
 $rp="%rdi";	# BN_ULONG *rp,