Make ECDSA signing 10% faster and plug some timing leaks.

None of the asymmetric crypto we inherented from OpenSSL is
constant-time because of BIGNUM. BIGNUM chops leading zeros off the
front of everything, so we end up leaking information about the first
word, in theory. BIGNUM functions additionally tend to take the full
range of inputs and then call into BN_nnmod at various points.

All our secret values should be acted on in constant-time, but k in
ECDSA is a particularly sensitive value. So, ecdsa_sign_setup, in an
attempt to mitigate the BIGNUM leaks, would add a couple copies of the
order.

This does not work at all. k is used to compute two values: k^-1 and kG.
The first operation when computing k^-1 is to call BN_nnmod if k is out
of range. The entry point to our tuned constant-time curve
implementations is to call BN_nnmod if the scalar has too many bits,
which this causes. The result is both corrections are immediately undone
but cause us to do more variable-time work in the meantime.

Replace all these computations around k with the word-based functions
added in the various preceding CLs. In doing so, replace the BN_mod_mul
calls (which internally call BN_nnmod) with Montgomery reduction. We can
avoid taking k^-1 out of Montgomery form, which combines nicely with
Brian Smith's trick in 3426d1011946b26ff1bb2fd98a081ba4753c9cc8. Along
the way, we avoid some unnecessary mallocs.

BIGNUM still affects the private key itself, as well as the EC_POINTs.
But this should hopefully be much better now. Also it's 10% faster:

Before:
Did 15000 ECDSA P-224 signing operations in 1069117us (14030.3 ops/sec)
Did 18000 ECDSA P-256 signing operations in 1053908us (17079.3 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1087853us (990.9 ops/sec)
Did 473 ECDSA P-521 signing operations in 1069835us (442.1 ops/sec)

After:
Did 16000 ECDSA P-224 signing operations in 1064799us (15026.3 ops/sec)
Did 19000 ECDSA P-256 signing operations in 1007839us (18852.2 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1079413us (998.7 ops/sec)
Did 484 ECDSA P-521 signing operations in 1083616us (446.7 ops/sec)

Change-Id: I2a25e90fc99dac13c0616d0ea45e125a4bd8cca1
Reviewed-on: https://boringssl-review.googlesource.com/23075
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/include/openssl/bn.h b/include/openssl/bn.h
index 85fdae4..bb32c2f 100644
--- a/include/openssl/bn.h
+++ b/include/openssl/bn.h
@@ -618,17 +618,6 @@
 // BN_pseudo_rand_range is an alias for BN_rand_range.
 OPENSSL_EXPORT int BN_pseudo_rand_range(BIGNUM *rnd, const BIGNUM *range);
 
-// BN_generate_dsa_nonce generates a random number 0 <= out < range. Unlike
-// BN_rand_range, it also includes the contents of |priv| and |message| in the
-// generation so that an RNG failure isn't fatal as long as |priv| remains
-// secret. This is intended for use in DSA and ECDSA where an RNG weakness
-// leads directly to private key exposure unless this function is used.
-// It returns one on success and zero on error.
-OPENSSL_EXPORT int BN_generate_dsa_nonce(BIGNUM *out, const BIGNUM *range,
-                                         const BIGNUM *priv,
-                                         const uint8_t *message,
-                                         size_t message_len, BN_CTX *ctx);
-
 // BN_GENCB holds a callback function that is used by generation functions that
 // can take a very long time to complete. Use |BN_GENCB_set| to initialise a
 // |BN_GENCB| structure.
diff --git a/include/openssl/ec.h b/include/openssl/ec.h
index dee41b7..b34605f 100644
--- a/include/openssl/ec.h
+++ b/include/openssl/ec.h
@@ -402,5 +402,6 @@
 #define EC_R_GROUP_MISMATCH 130
 #define EC_R_INVALID_COFACTOR 131
 #define EC_R_PUBLIC_KEY_VALIDATION_FAILED 132
+#define EC_R_INVALID_SCALAR 133
 
 #endif  // OPENSSL_HEADER_EC_H