OpenSSL: make final reduction in Montgomery multiplication constant-time.

(The issue was reported by Shay Gueron.)

The final reduction in Montgomery multiplication computes if (X >= m) then X =
X - m else X = X

In OpenSSL, this was done by computing T = X - m,  doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral.

This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.

TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
3 files changed