Break early on composites in the primality test.

|a| is usually much smaller than |w_bits|. We only need to loop up to
|w_bits| and hide |a| when the value is possibly composite. If
Miller-Rabin has not hit -1 by then, break early.

This speeds up RSA keygen by a bit.

Before:
Did 248 RSA 2048 key-gen operations in 30041496us (8.3 ops/sec)
  min: 31690us, median: 109097us, max: 373911us
Did 71 RSA 3072 key-gen operations in 30096719us (2.4 ops/sec)
  min: 108650us, median: 370844us, max: 1768070us
Did 27 RSA 4096 key-gen operations in 32829007us (0.8 ops/sec)
  min: 205485us, median: 1107051us, max: 4035040us

After:
Did 340 RSA 2048 key-gen operations in 30026342us (11.3 ops/sec)
  min: 24681us, median: 77749us, max: 350477us
Did 67 RSA 3072 key-gen operations in 30089075us (2.2 ops/sec)
  min: 75070us, median: 394220us, max: 1101562us
Did 38 RSA 4096 key-gen operations in 30283788us (1.3 ops/sec)
  min: 284947us, median: 742688us, max: 1970468us

Change-Id: If1b48e9306c3fe1be56c304143e206c3bdb3301d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38165
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/fipsmodule/bn/prime.c b/crypto/fipsmodule/bn/prime.c
index 1dd0870..404139e 100644
--- a/crypto/fipsmodule/bn/prime.c
+++ b/crypto/fipsmodule/bn/prime.c
@@ -670,6 +670,11 @@
   // iterations once |j| = |a|.
   for (int j = 1; j < miller_rabin->w_bits; j++) {
     loop_done |= constant_time_eq_int(j, miller_rabin->a);
+    if (loop_done & ~is_possibly_prime) {
+      // If the loop is done and we haven't seen z = 1 or z = w-1 yet, the
+      // value is composite and we can break in variable time.
+      break;
+    }
 
     // Step 4.5.1.
     if (!BN_mod_mul_montgomery(z, z, z, mont, ctx)) {