Remove p > q normalization in RSA keys
RSA CRT is tiny bit messier when p < q.
https://boringssl-review.googlesource.com/25263 solved this by
normalizing to p > q. The cost was we sometimes had to compute a new
iqmp.
Modular inversion is expensive. We did it only once per key, but it's
still a performance cliff in per-key costs. When later work moves
freeze_private_key into RSA private key parsing, it will be a
performance cliff in the private key parser.
Instead, just handle p < q in the CRT function. The only difference is
needing one extra reduction before the modular subtraction. Even using
the fully general mod_montgomery function (as opposed to checking p < q,
or using bn_reduce_once when num_bits(p) == num_bits(q)) was not
measurable.
In doing so, I noticed we didn't actually have tests that exercise the
reduction step. I added one to evp_tests.txt, but it is only meaningful
when blinding is disabled. (Another cost of blinding.) When blinding is
enabled, the answers mod p and q are randomized and we hit this case
with about 1.8% probability. See comment in evp_test.txt.
I kept the optimization where we store iqmp in Montgomery form, not
because the optimization matters, but because we need to store a
corrected, fixed-width version of the value anyway, so we may as well
store it in a more convenient form.
M1 Max
Before:
Did 9048 RSA 2048 signing operations in 5033403us (1797.6 ops/sec)
Did 1500 RSA 4096 signing operations in 5009288us (299.4 ops/sec)
After:
Did 9116 RSA 2048 signing operations in 5053802us (1803.8 ops/sec) [+0.3%]
Did 1500 RSA 4096 signing operations in 5008283us (299.5 ops/sec) [+0.0%]
Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Before:
Did 9282 RSA 2048 signing operations in 5019395us (1849.2 ops/sec)
Did 1302 RSA 4096 signing operations in 5055011us (257.6 ops/sec)
After:
Did 9240 RSA 2048 signing operations in 5024845us (1838.9 ops/sec) [-0.6%]
Did 1302 RSA 4096 signing operations in 5046157us (258.0 ops/sec) [+0.2%]
Bug: 316
Change-Id: Icb90c7d5f5188f9b69a6d7bcc63db13d92ec26d5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60705
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/evp/evp_test.cc b/crypto/evp/evp_test.cc
index 9c4f14d..fafd50b 100644
--- a/crypto/evp/evp_test.cc
+++ b/crypto/evp/evp_test.cc
@@ -125,7 +125,7 @@
return EVP_PKEY_NONE;
}
-static int GetRSAPadding(FileTest *t, int *out, const std::string &name) {
+static bool GetRSAPadding(FileTest *t, int *out, const std::string &name) {
if (name == "PKCS1") {
*out = RSA_PKCS1_PADDING;
return true;
@@ -138,6 +138,10 @@
*out = RSA_PKCS1_OAEP_PADDING;
return true;
}
+ if (name == "None") {
+ *out = RSA_NO_PADDING;
+ return true;
+ }
ADD_FAILURE() << "Unknown RSA padding mode: " << name;
return false;
}
diff --git a/crypto/evp/evp_tests.txt b/crypto/evp/evp_tests.txt
index 238a602..cbce1b0 100644
--- a/crypto/evp/evp_tests.txt
+++ b/crypto/evp/evp_tests.txt
@@ -655,6 +655,16 @@
Input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Output = 07fa4e3de9c002c41c952dc292ef5a814c4c17dc1a6cf958c4c971e8089676d6661b442270ef9295c41e5385c9628aa1bdee2cc2558b8473ba212f2ba04b9ff2264c19187b9506b1d0a1cc2751844cc8dedf555d62ce81bc0e70bfe83d0184ee964593af91b9b327c0fb272c799148cd8737d412cbf36c2ad25fd66977bf805f
+# The input is 1 (mod p) and q - 1 (mod q). If the CRT implementation does not
+# account for p < q, the subtraction step will break. However, this test only
+# works when RSA blinding is disabled. When blinding is enabled, these values
+# are randomized and, instead, either this or the above test will hit this case
+# if repeated enough. (It hits it with probability (q-p)^2 / 2pq, or 1.8% here.)
+Encrypt = RSA-Swapped
+RSAPadding = None
+Input = 7ab490e1f5e718ff23a9e738f9867559e3a6ea72332e3bcc1b6f58585218ed815a865dab75e3f44ea550bdef815101d6039251a513fd99188c1916abb94b15ef72de793f6472a342b33a125cfc96a4fc89d140e337f5fe6ff937ed3f34b6b09f5227f598e12faddf4423254acbee748197bed26954502c7484c65e279fed7fed
+CheckDecrypt
+
# Though we will never generate such a key, test that RSA keys where p and q are
# different sizes work properly.
PrivateKey = RSA-PrimeMismatch
diff --git a/crypto/fipsmodule/rsa/internal.h b/crypto/fipsmodule/rsa/internal.h
index c6bf60a..5a993a2 100644
--- a/crypto/fipsmodule/rsa/internal.h
+++ b/crypto/fipsmodule/rsa/internal.h
@@ -103,10 +103,8 @@
// the |freeze_private_key| implementation.
BIGNUM *d_fixed, *dmp1_fixed, *dmq1_fixed;
- // inv_small_mod_large_mont is q^-1 mod p in Montgomery form, using |mont_p|,
- // if |p| >= |q|. Otherwise, it is p^-1 mod q in Montgomery form, using
- // |mont_q|.
- BIGNUM *inv_small_mod_large_mont;
+ // iqmp_mont is q^-1 mod p in Montgomery form, using |mont_p|.
+ BIGNUM *iqmp_mont;
// num_blindings contains the size of the |blindings| and |blindings_inuse|
// arrays. This member and the |blindings_inuse| array are protected by
diff --git a/crypto/fipsmodule/rsa/rsa_impl.c b/crypto/fipsmodule/rsa/rsa_impl.c
index 1206397..6cdc290 100644
--- a/crypto/fipsmodule/rsa/rsa_impl.c
+++ b/crypto/fipsmodule/rsa/rsa_impl.c
@@ -243,37 +243,24 @@
}
// CRT components are only publicly bounded by their corresponding
- // moduli's bit lengths. |rsa->iqmp| is unused outside of this one-time
- // setup, so we do not compute a fixed-width version of it.
+ // moduli's bit lengths.
if (!ensure_fixed_copy(&rsa->dmp1_fixed, rsa->dmp1, p_fixed->width) ||
!ensure_fixed_copy(&rsa->dmq1_fixed, rsa->dmq1, q_fixed->width)) {
goto err;
}
- // Compute |inv_small_mod_large_mont|. Note that it is always modulo the
- // larger prime, independent of what is stored in |rsa->iqmp|.
- if (rsa->inv_small_mod_large_mont == NULL) {
- BIGNUM *inv_small_mod_large_mont = BN_new();
- int ok;
- if (BN_cmp(rsa->p, rsa->q) < 0) {
- ok = inv_small_mod_large_mont != NULL &&
- bn_mod_inverse_secret_prime(inv_small_mod_large_mont, rsa->p,
- rsa->q, ctx, rsa->mont_q) &&
- BN_to_montgomery(inv_small_mod_large_mont,
- inv_small_mod_large_mont, rsa->mont_q, ctx);
- } else {
- ok = inv_small_mod_large_mont != NULL &&
- BN_to_montgomery(inv_small_mod_large_mont, rsa->iqmp,
- rsa->mont_p, ctx);
- }
- if (!ok) {
- BN_free(inv_small_mod_large_mont);
+ // Compute |iqmp_mont|, which is |iqmp| in Montgomery form and with the
+ // correct bit width.
+ if (rsa->iqmp_mont == NULL) {
+ BIGNUM *iqmp_mont = BN_new();
+ if (iqmp_mont == NULL ||
+ !BN_to_montgomery(iqmp_mont, rsa->iqmp, rsa->mont_p, ctx)) {
+ BN_free(iqmp_mont);
goto err;
}
- rsa->inv_small_mod_large_mont = inv_small_mod_large_mont;
- CONSTTIME_SECRET(
- rsa->inv_small_mod_large_mont->d,
- sizeof(BN_ULONG) * rsa->inv_small_mod_large_mont->width);
+ rsa->iqmp_mont = iqmp_mont;
+ CONSTTIME_SECRET(rsa->iqmp_mont->d,
+ sizeof(BN_ULONG) * rsa->iqmp_mont->width);
}
}
}
@@ -302,8 +289,8 @@
rsa->dmp1_fixed = NULL;
BN_free(rsa->dmq1_fixed);
rsa->dmq1_fixed = NULL;
- BN_free(rsa->inv_small_mod_large_mont);
- rsa->inv_small_mod_large_mont = NULL;
+ BN_free(rsa->iqmp_mont);
+ rsa->iqmp_mont = NULL;
for (size_t i = 0; i < rsa->num_blindings; i++) {
BN_BLINDING_free(rsa->blindings[i]);
@@ -798,42 +785,37 @@
goto err;
}
- // Implementing RSA with CRT in constant-time is sensitive to which prime is
- // larger. Canonicalize fields so that |p| is the larger prime.
- const BIGNUM *dmp1 = rsa->dmp1_fixed, *dmq1 = rsa->dmq1_fixed;
- const BN_MONT_CTX *mont_p = rsa->mont_p, *mont_q = rsa->mont_q;
- if (BN_cmp(rsa->p, rsa->q) < 0) {
- mont_p = rsa->mont_q;
- mont_q = rsa->mont_p;
- dmp1 = rsa->dmq1_fixed;
- dmq1 = rsa->dmp1_fixed;
- }
-
// Use the minimal-width versions of |n|, |p|, and |q|. Either works, but if
// someone gives us non-minimal values, these will be slightly more efficient
// on the non-Montgomery operations.
const BIGNUM *n = &rsa->mont_n->N;
- const BIGNUM *p = &mont_p->N;
- const BIGNUM *q = &mont_q->N;
+ const BIGNUM *p = &rsa->mont_p->N;
+ const BIGNUM *q = &rsa->mont_q->N;
// This is a pre-condition for |mod_montgomery|. It was already checked by the
// caller.
assert(BN_ucmp(I, n) < 0);
if (// |m1| is the result modulo |q|.
- !mod_montgomery(r1, I, q, mont_q, p, ctx) ||
- !BN_mod_exp_mont_consttime(m1, r1, dmq1, q, ctx, mont_q) ||
+ !mod_montgomery(r1, I, q, rsa->mont_q, p, ctx) ||
+ !BN_mod_exp_mont_consttime(m1, r1, rsa->dmq1_fixed, q, ctx,
+ rsa->mont_q) ||
// |r0| is the result modulo |p|.
- !mod_montgomery(r1, I, p, mont_p, q, ctx) ||
- !BN_mod_exp_mont_consttime(r0, r1, dmp1, p, ctx, mont_p) ||
- // Compute r0 = r0 - m1 mod p. |p| is the larger prime, so |m1| is already
- // fully reduced mod |p|.
- !bn_mod_sub_consttime(r0, r0, m1, p, ctx) ||
+ !mod_montgomery(r1, I, p, rsa->mont_p, q, ctx) ||
+ !BN_mod_exp_mont_consttime(r0, r1, rsa->dmp1_fixed, p, ctx,
+ rsa->mont_p) ||
+ // Compute r0 = r0 - m1 mod p. |m1| is reduced mod |q|, not |p|, so we
+ // just run |mod_montgomery| again for simplicity. This could be more
+ // efficient with more cases: if |p > q|, |m1| is already reduced. If
+ // |p < q| but they have the same bit width, |bn_reduce_once| suffices.
+ // However, compared to over 2048 Montgomery multiplications above, this
+ // difference is not measurable.
+ !mod_montgomery(r1, m1, p, rsa->mont_p, q, ctx) ||
+ !bn_mod_sub_consttime(r0, r0, r1, p, ctx) ||
// r0 = r0 * iqmp mod p. We use Montgomery multiplication to compute this
- // in constant time. |inv_small_mod_large_mont| is in Montgomery form and
- // r0 is not, so the result is taken out of Montgomery form.
- !BN_mod_mul_montgomery(r0, r0, rsa->inv_small_mod_large_mont, mont_p,
- ctx) ||
+ // in constant time. |iqmp_mont| is in Montgomery form and r0 is not, so
+ // the result is taken out of Montgomery form.
+ !BN_mod_mul_montgomery(r0, r0, rsa->iqmp_mont, rsa->mont_p, ctx) ||
// r0 = r0 * q + m1 gives the final result. Reducing modulo q gives m1, so
// it is correct mod p. Reducing modulo p gives (r0-m1)*iqmp*q + m1 = r0,
// so it is correct mod q. Finally, the result is bounded by [m1, n + m1),
@@ -1308,8 +1290,7 @@
replace_bignum(&rsa->d_fixed, &tmp->d_fixed);
replace_bignum(&rsa->dmp1_fixed, &tmp->dmp1_fixed);
replace_bignum(&rsa->dmq1_fixed, &tmp->dmq1_fixed);
- replace_bignum(&rsa->inv_small_mod_large_mont,
- &tmp->inv_small_mod_large_mont);
+ replace_bignum(&rsa->iqmp_mont, &tmp->iqmp_mont);
rsa->private_key_frozen = tmp->private_key_frozen;
ret = 1;
diff --git a/crypto/rsa_extra/rsa_test.cc b/crypto/rsa_extra/rsa_test.cc
index 87e0396..e332cdf 100644
--- a/crypto/rsa_extra/rsa_test.cc
+++ b/crypto/rsa_extra/rsa_test.cc
@@ -983,7 +983,7 @@
EXPECT_FALSE(rsa->d_fixed);
EXPECT_FALSE(rsa->dmp1_fixed);
EXPECT_FALSE(rsa->dmq1_fixed);
- EXPECT_FALSE(rsa->inv_small_mod_large_mont);
+ EXPECT_FALSE(rsa->iqmp_mont);
EXPECT_FALSE(rsa->private_key_frozen);
// Failed key generations leave the previous contents alone.