Enable Montgomery optimisations on ARM.

These were accidently disabled for ARM.

Before:

Did 38 RSA 2048 signing operations in 1051209us (36.1 ops/sec)
Did 1500 RSA 2048 verify operations in 1069611us (1402.4 ops/sec)
Did 65 RSA 2048 (3 prime, e=3) signing operations in 1055664us (61.6 ops/sec)
Did 4719 RSA 2048 (3 prime, e=3) verify operations in 1029144us (4585.4 ops/sec)
Did 5 RSA 4096 signing operations in 1092346us (4.6 ops/sec)
Did 418 RSA 4096 verify operations in 1069977us (390.7 ops/sec)

After:

Did 156 RSA 2048 signing operations in 1000672us (155.9 ops/sec)
Did 6071 RSA 2048 verify operations in 1068512us (5681.7 ops/sec)
Did 84 RSA 2048 (3 prime, e=3) signing operations in 1068847us (78.6 ops/sec)
Did 11000 RSA 2048 (3 prime, e=3) verify operations in 1023620us (10746.2 ops/sec)
Did 26 RSA 4096 signing operations in 1028320us (25.3 ops/sec)
Did 1788 RSA 4096 verify operations in 1072479us (1667.2 ops/sec)

Change-Id: I448698f7d8e5b481a06f98d54d608f0278827cd1
Reviewed-on: https://boringssl-review.googlesource.com/6443
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/bn/montgomery.c b/crypto/bn/montgomery.c
index db673a1..90cb149 100644
--- a/crypto/bn/montgomery.c
+++ b/crypto/bn/montgomery.c
@@ -119,7 +119,7 @@
 
 
 #if !defined(OPENSSL_NO_ASM) && \
-    (defined(OPENSSL_X86) || defined(OPENSSL_X86_64))
+    (defined(OPENSSL_X86) || defined(OPENSSL_X86_64) || defined(OPENSSL_ARM))
 #define OPENSSL_BN_ASM_MONT
 #endif