Test and fix an ABI issue with small parameters.

Calling conventions must specify how to handle arguments smaller than a
machine word. Should the caller pad them up to a machine word size with
predictable values (zero/sign-extended), or should the callee tolerate
an arbitrary bit pattern?

Annoyingly, I found no text in either SysV or Win64 ABI documentation
describing any of this and resorted to experiment. The short answer is
that callees must tolerate an arbitrary bit pattern on x86_64, which
means we must test this. See the comment in abi_test::internal::ToWord
for the long answer.

CHECK_ABI now, if the type of the parameter is smaller than
crypto_word_t, fills the remaining bytes with 0xaa. This is so the
number is out of bounds for code expecting either zero or sign
extension. (Not that crypto assembly has any business seeing negative
numbers.)

Doing so reveals a bug in ecp_nistz256_ord_sqr_mont. The rep parameter
is typed int, but the code expected uint64_t. In practice, the compiler
will always compile this correctly because:

- On both Win64 and SysV, rep is a register parameter.

- The rep parameter is always a constant, so the compiler has no reason
  to leave garbage in the upper half.

However, I was indeed able to get a bug out of GCC via:

  uint64_t foo = (1ull << 63) | 2;  // Some global the compiler can't
                                    // prove constant.
  ecp_nistz256_ord_sqr_mont(res, a, foo >> 1);

Were ecp_nistz256_ord_sqr_mont a true int-taking function, this would
act like ecp_nistz256_ord_sqr_mont(res, a, 1). Instead, it hung. Fix
this by having it take a full-width word.

This mess has several consequences:

- ABI testing now ideally needs a functional testing component to fully cover
  this case. A bad input might merely produce the wrong answer. Still,
  this is fairly effective as it will cause most code to either segfault
  or loop forever. (Not the enc parameter to AES however...)

- We cannot freely change the type of assembly function prototypes. If the
  prototype says int or unsigned, it must be ignoring the upper half and
  thus "fixing" it to size_t cannot have handled the full range. (Unless
  it was simply wrong of the parameter is already bounded.) If the
  prototype says size_t, switching to int or unsigned will hit this type
  of bug. The former is a safer failure mode though.

- The simplest path out of this mess: new assembly code should *only*
  ever take word-sized parameters. This is not a tall order as the bad
  parameters are usually ints that should have been size_t.

Calling conventions are hard.

Change-Id: If8254aff8953844679fbce4bd3e345e5e2fa5213
Reviewed-on: https://boringssl-review.googlesource.com/c/34627
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl b/crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl
index 2f5ae86..5402885 100755
--- a/crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl
+++ b/crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl
@@ -488,7 +488,7 @@
 # void ecp_nistz256_ord_sqr_mont(
 #   uint64_t res[4],
 #   uint64_t a[4],
-#   int rep);
+#   uint64_t rep);
 
 .globl	ecp_nistz256_ord_sqr_mont
 .type	ecp_nistz256_ord_sqr_mont,\@function,3
diff --git a/crypto/fipsmodule/ec/p256-x86_64.h b/crypto/fipsmodule/ec/p256-x86_64.h
index 2d70ca7..5deb81a 100644
--- a/crypto/fipsmodule/ec/p256-x86_64.h
+++ b/crypto/fipsmodule/ec/p256-x86_64.h
@@ -89,7 +89,7 @@
 // outputs are in Montgomery form. That is, |res| is
 // (|a| * 2^-256)^(2*|rep|) * 2^256 mod N.
 void ecp_nistz256_ord_sqr_mont(BN_ULONG res[P256_LIMBS],
-                               const BN_ULONG a[P256_LIMBS], int rep);
+                               const BN_ULONG a[P256_LIMBS], BN_ULONG rep);
 
 // beeu_mod_inverse_vartime sets out = a^-1 mod p using a Euclidean algorithm.
 // Assumption: 0 < a < p < 2^(256) and p is odd.
diff --git a/crypto/test/abi_test.h b/crypto/test/abi_test.h
index 4ff42d1..91b2d42 100644
--- a/crypto/test/abi_test.h
+++ b/crypto/test/abi_test.h
@@ -127,6 +127,57 @@
                             const crypto_word_t *argv, size_t argc,
                             bool unwind);
 
+template <typename T>
+inline crypto_word_t ToWord(T t) {
+  static_assert(sizeof(T) <= sizeof(crypto_word_t),
+                "T is larger than crypto_word_t");
+  // Functions declared to take arguments smaller than native words cannot
+  // assume anything about the unused bits.
+  //
+  // TODO(davidben): Find authoritative citations for all supported assembly
+  // architectures. This is based on observed behavior in Clang, GCC, and MSVC
+  // for x86_64. The results are complex.
+  //
+  // ABI rules here may be inferred from two kinds of experiments:
+  //
+  // 1. When passing a value to a small-argument-taking function, does the
+  //    compiler ensure unused bits are cleared, sign-extended, etc.? Tests for
+  //    register parameters are confounded by x86_64's implicit clearing of
+  //    registers' upper halves, but passing some_u64 >> 1 usually clears this.
+  //
+  // 2. When compiling a small-argument-taking function, does the compiler make
+  //    assumptions about unused bits of arguments?
+  //
+  // Stack parameters are straightforward. As both caller and callee, all
+  // compilers consistently use the minimally-sized read and write. Both SysV
+  // and Windows ABIs tolerate and produce arbitrary values for unused stack
+  // parameter bits.
+  //
+  // MSVC also appears to tolerate and produce arbitrary values for unused
+  // register parameter bits. The SysV ABI is messier. GCC and Clang tolerate
+  // and produce arbitrary values for the upper 32 bits of each register, but
+  // types smaller than |int| are promoted before passing to a register. (Zero
+  // or sign extension depends on signedness of the type.) When compiling a
+  // callee, Clang takes advantage of this conversion, but I was unable to make
+  // GCC do so.
+  //
+  // Note that, although the Win64 rules are sufficient to require our assembly
+  // be conservative, we wish for |CHECK_ABI| to support C-compiled functions,
+  // so it must enforce the correct rules for each platform.
+  //
+  // This is all a mess so, for now, do not support parameter types smaller than
+  // |int| in |CHECK_ABI|. In practice, assembly functions only use 4- and
+  // 8-byte values. (And, given this behavior, we should avoid parameters
+  // smaller than native words in all new code.)
+  static_assert(sizeof(T) >= 4, "types under four bytes are complicated");
+  crypto_word_t ret;
+  // Filling extra bits with 0xaa will be vastly out of bounds for code
+  // expecting either sign- or zero-extension. (0xaa is 0b10101010.)
+  OPENSSL_memset(&ret, 0xaa, sizeof(ret));
+  OPENSSL_memcpy(&ret, &t, sizeof(t));
+  return ret;
+}
+
 // CheckImpl runs |func| on |args|, recording ABI errors in |out|. If |unwind|
 // is true and unwind tests have been enabled, |func| is single-stepped under an
 // unwind test.
@@ -144,7 +195,7 @@
 
   // Allocate one extra entry so MSVC does not complain about zero-size arrays.
   crypto_word_t argv[sizeof...(args) + 1] = {
-      (crypto_word_t)args...,
+      ToWord(args)...,
   };
   return RunTrampoline(out, reinterpret_cast<crypto_word_t>(func), argv,
                        sizeof...(args), unwind);