Remove p > q normalization in RSA keys RSA CRT is tiny bit messier when p < q. https://boringssl-review.googlesource.com/25263 solved this by normalizing to p > q. The cost was we sometimes had to compute a new iqmp. Modular inversion is expensive. We did it only once per key, but it's still a performance cliff in per-key costs. When later work moves freeze_private_key into RSA private key parsing, it will be a performance cliff in the private key parser. Instead, just handle p < q in the CRT function. The only difference is needing one extra reduction before the modular subtraction. Even using the fully general mod_montgomery function (as opposed to checking p < q, or using bn_reduce_once when num_bits(p) == num_bits(q)) was not measurable. In doing so, I noticed we didn't actually have tests that exercise the reduction step. I added one to evp_tests.txt, but it is only meaningful when blinding is disabled. (Another cost of blinding.) When blinding is enabled, the answers mod p and q are randomized and we hit this case with about 1.8% probability. See comment in evp_test.txt. I kept the optimization where we store iqmp in Montgomery form, not because the optimization matters, but because we need to store a corrected, fixed-width version of the value anyway, so we may as well store it in a more convenient form. M1 Max Before: Did 9048 RSA 2048 signing operations in 5033403us (1797.6 ops/sec) Did 1500 RSA 4096 signing operations in 5009288us (299.4 ops/sec) After: Did 9116 RSA 2048 signing operations in 5053802us (1803.8 ops/sec) [+0.3%] Did 1500 RSA 4096 signing operations in 5008283us (299.5 ops/sec) [+0.0%] Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz Before: Did 9282 RSA 2048 signing operations in 5019395us (1849.2 ops/sec) Did 1302 RSA 4096 signing operations in 5055011us (257.6 ops/sec) After: Did 9240 RSA 2048 signing operations in 5024845us (1838.9 ops/sec) [-0.6%] Did 1302 RSA 4096 signing operations in 5046157us (258.0 ops/sec) [+0.2%] Bug: 316 Change-Id: Icb90c7d5f5188f9b69a6d7bcc63db13d92ec26d5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/60705 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/evp/evp_test.cc b/crypto/evp/evp_test.cc index 9c4f14d..fafd50b 100644 --- a/crypto/evp/evp_test.cc +++ b/crypto/evp/evp_test.cc
@@ -125,7 +125,7 @@ return EVP_PKEY_NONE; } -static int GetRSAPadding(FileTest *t, int *out, const std::string &name) { +static bool GetRSAPadding(FileTest *t, int *out, const std::string &name) { if (name == "PKCS1") { *out = RSA_PKCS1_PADDING; return true; @@ -138,6 +138,10 @@ *out = RSA_PKCS1_OAEP_PADDING; return true; } + if (name == "None") { + *out = RSA_NO_PADDING; + return true; + } ADD_FAILURE() << "Unknown RSA padding mode: " << name; return false; }
diff --git a/crypto/evp/evp_tests.txt b/crypto/evp/evp_tests.txt index 238a602..cbce1b0 100644 --- a/crypto/evp/evp_tests.txt +++ b/crypto/evp/evp_tests.txt
@@ -655,6 +655,16 @@ Input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" Output = 07fa4e3de9c002c41c952dc292ef5a814c4c17dc1a6cf958c4c971e8089676d6661b442270ef9295c41e5385c9628aa1bdee2cc2558b8473ba212f2ba04b9ff2264c19187b9506b1d0a1cc2751844cc8dedf555d62ce81bc0e70bfe83d0184ee964593af91b9b327c0fb272c799148cd8737d412cbf36c2ad25fd66977bf805f +# The input is 1 (mod p) and q - 1 (mod q). If the CRT implementation does not +# account for p < q, the subtraction step will break. However, this test only +# works when RSA blinding is disabled. When blinding is enabled, these values +# are randomized and, instead, either this or the above test will hit this case +# if repeated enough. (It hits it with probability (q-p)^2 / 2pq, or 1.8% here.) +Encrypt = RSA-Swapped +RSAPadding = None +Input = 7ab490e1f5e718ff23a9e738f9867559e3a6ea72332e3bcc1b6f58585218ed815a865dab75e3f44ea550bdef815101d6039251a513fd99188c1916abb94b15ef72de793f6472a342b33a125cfc96a4fc89d140e337f5fe6ff937ed3f34b6b09f5227f598e12faddf4423254acbee748197bed26954502c7484c65e279fed7fed +CheckDecrypt + # Though we will never generate such a key, test that RSA keys where p and q are # different sizes work properly. PrivateKey = RSA-PrimeMismatch
diff --git a/crypto/fipsmodule/rsa/internal.h b/crypto/fipsmodule/rsa/internal.h index c6bf60a..5a993a2 100644 --- a/crypto/fipsmodule/rsa/internal.h +++ b/crypto/fipsmodule/rsa/internal.h
@@ -103,10 +103,8 @@ // the |freeze_private_key| implementation. BIGNUM *d_fixed, *dmp1_fixed, *dmq1_fixed; - // inv_small_mod_large_mont is q^-1 mod p in Montgomery form, using |mont_p|, - // if |p| >= |q|. Otherwise, it is p^-1 mod q in Montgomery form, using - // |mont_q|. - BIGNUM *inv_small_mod_large_mont; + // iqmp_mont is q^-1 mod p in Montgomery form, using |mont_p|. + BIGNUM *iqmp_mont; // num_blindings contains the size of the |blindings| and |blindings_inuse| // arrays. This member and the |blindings_inuse| array are protected by
diff --git a/crypto/fipsmodule/rsa/rsa_impl.c b/crypto/fipsmodule/rsa/rsa_impl.c index 1206397..6cdc290 100644 --- a/crypto/fipsmodule/rsa/rsa_impl.c +++ b/crypto/fipsmodule/rsa/rsa_impl.c
@@ -243,37 +243,24 @@ } // CRT components are only publicly bounded by their corresponding - // moduli's bit lengths. |rsa->iqmp| is unused outside of this one-time - // setup, so we do not compute a fixed-width version of it. + // moduli's bit lengths. if (!ensure_fixed_copy(&rsa->dmp1_fixed, rsa->dmp1, p_fixed->width) || !ensure_fixed_copy(&rsa->dmq1_fixed, rsa->dmq1, q_fixed->width)) { goto err; } - // Compute |inv_small_mod_large_mont|. Note that it is always modulo the - // larger prime, independent of what is stored in |rsa->iqmp|. - if (rsa->inv_small_mod_large_mont == NULL) { - BIGNUM *inv_small_mod_large_mont = BN_new(); - int ok; - if (BN_cmp(rsa->p, rsa->q) < 0) { - ok = inv_small_mod_large_mont != NULL && - bn_mod_inverse_secret_prime(inv_small_mod_large_mont, rsa->p, - rsa->q, ctx, rsa->mont_q) && - BN_to_montgomery(inv_small_mod_large_mont, - inv_small_mod_large_mont, rsa->mont_q, ctx); - } else { - ok = inv_small_mod_large_mont != NULL && - BN_to_montgomery(inv_small_mod_large_mont, rsa->iqmp, - rsa->mont_p, ctx); - } - if (!ok) { - BN_free(inv_small_mod_large_mont); + // Compute |iqmp_mont|, which is |iqmp| in Montgomery form and with the + // correct bit width. + if (rsa->iqmp_mont == NULL) { + BIGNUM *iqmp_mont = BN_new(); + if (iqmp_mont == NULL || + !BN_to_montgomery(iqmp_mont, rsa->iqmp, rsa->mont_p, ctx)) { + BN_free(iqmp_mont); goto err; } - rsa->inv_small_mod_large_mont = inv_small_mod_large_mont; - CONSTTIME_SECRET( - rsa->inv_small_mod_large_mont->d, - sizeof(BN_ULONG) * rsa->inv_small_mod_large_mont->width); + rsa->iqmp_mont = iqmp_mont; + CONSTTIME_SECRET(rsa->iqmp_mont->d, + sizeof(BN_ULONG) * rsa->iqmp_mont->width); } } } @@ -302,8 +289,8 @@ rsa->dmp1_fixed = NULL; BN_free(rsa->dmq1_fixed); rsa->dmq1_fixed = NULL; - BN_free(rsa->inv_small_mod_large_mont); - rsa->inv_small_mod_large_mont = NULL; + BN_free(rsa->iqmp_mont); + rsa->iqmp_mont = NULL; for (size_t i = 0; i < rsa->num_blindings; i++) { BN_BLINDING_free(rsa->blindings[i]); @@ -798,42 +785,37 @@ goto err; } - // Implementing RSA with CRT in constant-time is sensitive to which prime is - // larger. Canonicalize fields so that |p| is the larger prime. - const BIGNUM *dmp1 = rsa->dmp1_fixed, *dmq1 = rsa->dmq1_fixed; - const BN_MONT_CTX *mont_p = rsa->mont_p, *mont_q = rsa->mont_q; - if (BN_cmp(rsa->p, rsa->q) < 0) { - mont_p = rsa->mont_q; - mont_q = rsa->mont_p; - dmp1 = rsa->dmq1_fixed; - dmq1 = rsa->dmp1_fixed; - } - // Use the minimal-width versions of |n|, |p|, and |q|. Either works, but if // someone gives us non-minimal values, these will be slightly more efficient // on the non-Montgomery operations. const BIGNUM *n = &rsa->mont_n->N; - const BIGNUM *p = &mont_p->N; - const BIGNUM *q = &mont_q->N; + const BIGNUM *p = &rsa->mont_p->N; + const BIGNUM *q = &rsa->mont_q->N; // This is a pre-condition for |mod_montgomery|. It was already checked by the // caller. assert(BN_ucmp(I, n) < 0); if (// |m1| is the result modulo |q|. - !mod_montgomery(r1, I, q, mont_q, p, ctx) || - !BN_mod_exp_mont_consttime(m1, r1, dmq1, q, ctx, mont_q) || + !mod_montgomery(r1, I, q, rsa->mont_q, p, ctx) || + !BN_mod_exp_mont_consttime(m1, r1, rsa->dmq1_fixed, q, ctx, + rsa->mont_q) || // |r0| is the result modulo |p|. - !mod_montgomery(r1, I, p, mont_p, q, ctx) || - !BN_mod_exp_mont_consttime(r0, r1, dmp1, p, ctx, mont_p) || - // Compute r0 = r0 - m1 mod p. |p| is the larger prime, so |m1| is already - // fully reduced mod |p|. - !bn_mod_sub_consttime(r0, r0, m1, p, ctx) || + !mod_montgomery(r1, I, p, rsa->mont_p, q, ctx) || + !BN_mod_exp_mont_consttime(r0, r1, rsa->dmp1_fixed, p, ctx, + rsa->mont_p) || + // Compute r0 = r0 - m1 mod p. |m1| is reduced mod |q|, not |p|, so we + // just run |mod_montgomery| again for simplicity. This could be more + // efficient with more cases: if |p > q|, |m1| is already reduced. If + // |p < q| but they have the same bit width, |bn_reduce_once| suffices. + // However, compared to over 2048 Montgomery multiplications above, this + // difference is not measurable. + !mod_montgomery(r1, m1, p, rsa->mont_p, q, ctx) || + !bn_mod_sub_consttime(r0, r0, r1, p, ctx) || // r0 = r0 * iqmp mod p. We use Montgomery multiplication to compute this - // in constant time. |inv_small_mod_large_mont| is in Montgomery form and - // r0 is not, so the result is taken out of Montgomery form. - !BN_mod_mul_montgomery(r0, r0, rsa->inv_small_mod_large_mont, mont_p, - ctx) || + // in constant time. |iqmp_mont| is in Montgomery form and r0 is not, so + // the result is taken out of Montgomery form. + !BN_mod_mul_montgomery(r0, r0, rsa->iqmp_mont, rsa->mont_p, ctx) || // r0 = r0 * q + m1 gives the final result. Reducing modulo q gives m1, so // it is correct mod p. Reducing modulo p gives (r0-m1)*iqmp*q + m1 = r0, // so it is correct mod q. Finally, the result is bounded by [m1, n + m1), @@ -1308,8 +1290,7 @@ replace_bignum(&rsa->d_fixed, &tmp->d_fixed); replace_bignum(&rsa->dmp1_fixed, &tmp->dmp1_fixed); replace_bignum(&rsa->dmq1_fixed, &tmp->dmq1_fixed); - replace_bignum(&rsa->inv_small_mod_large_mont, - &tmp->inv_small_mod_large_mont); + replace_bignum(&rsa->iqmp_mont, &tmp->iqmp_mont); rsa->private_key_frozen = tmp->private_key_frozen; ret = 1;
diff --git a/crypto/rsa_extra/rsa_test.cc b/crypto/rsa_extra/rsa_test.cc index 87e0396..e332cdf 100644 --- a/crypto/rsa_extra/rsa_test.cc +++ b/crypto/rsa_extra/rsa_test.cc
@@ -983,7 +983,7 @@ EXPECT_FALSE(rsa->d_fixed); EXPECT_FALSE(rsa->dmp1_fixed); EXPECT_FALSE(rsa->dmq1_fixed); - EXPECT_FALSE(rsa->inv_small_mod_large_mont); + EXPECT_FALSE(rsa->iqmp_mont); EXPECT_FALSE(rsa->private_key_frozen); // Failed key generations leave the previous contents alone.