Switch to new fiat pipeline.

This new version makes it much easier to tell which code is handwritten
and which is verified. For some reason, it is also *dramatically* faster
for 32-bit x86 GCC: the Curve25519 and Ed25519 operations run nearly six
times as fast. Clang x86_64, however, takes a small hit of roughly 4-8%
on the same operations. Benchmarks below.
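
The bracketed figures on the After lines are the relative change in
ops/sec against the matching Before line. A minimal sketch of that
arithmetic, assuming a hypothetical helper named percent_change that is
written only for this note and is not part of BoringSSL:

```c
#include <assert.h>

// Hypothetical helper (not part of BoringSSL): relative change in
// throughput, in percent, going from |before_ops| to |after_ops|. Both
// arguments are ops/sec figures as printed in the benchmark lines.
static double percent_change(double before_ops, double after_ops) {
  assert(before_ops > 0.0);
  return (after_ops / before_ops - 1.0) * 100.0;
}
```

For the first x86 GCC row, percent_change(1709.0, 9965.6) comes to about
+483.1, and for the first x86 Clang row, percent_change(8232.9, 8008.6)
comes to about -2.7, matching the figures reported for those rows.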

x86, GCC 7.3.0, OPENSSL_SMALL
(For some reason, GCC used to be really bad at compiling the 32-bit curve25519
code. The new generated code does not have this problem. I'm not sure what
changed.)
Before:
Did 17135 Ed25519 key generation operations in 10026402us (1709.0 ops/sec)
Did 17170 Ed25519 signing operations in 10074192us (1704.4 ops/sec)
Did 9180 Ed25519 verify operations in 10034025us (914.9 ops/sec)
Did 17271 Curve25519 base-point multiplication operations in 10050837us (1718.4 ops/sec)
Did 10605 Curve25519 arbitrary point multiplication operations in 10047714us (1055.5 ops/sec)
Did 7800 ECDH P-256 operations in 10018331us (778.6 ops/sec)
Did 24308 ECDSA P-256 signing operations in 10019241us (2426.1 ops/sec)
Did 9191 ECDSA P-256 verify operations in 10081639us (911.7 ops/sec)
After:
Did 99873 Ed25519 key generation operations in 10021810us (9965.6 ops/sec) [+483.1%]
Did 99960 Ed25519 signing operations in 10052236us (9944.1 ops/sec) [+483.4%]
Did 53676 Ed25519 verify operations in 10009078us (5362.7 ops/sec) [+486.2%]
Did 102000 Curve25519 base-point multiplication operations in 10039764us (10159.6 ops/sec) [+491.2%]
Did 60802 Curve25519 arbitrary point multiplication operations in 10056897us (6045.8 ops/sec) [+472.8%]
Did 7900 ECDH P-256 operations in 10054509us (785.7 ops/sec) [+0.9%]
Did 24926 ECDSA P-256 signing operations in 10050919us (2480.0 ops/sec) [+2.2%]
Did 9494 ECDSA P-256 verify operations in 10064659us (943.3 ops/sec) [+3.5%]

x86, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 82750 Ed25519 key generation operations in 10051177us (8232.9 ops/sec)
Did 82400 Ed25519 signing operations in 10035806us (8210.6 ops/sec)
Did 41511 Ed25519 verify operations in 10048919us (4130.9 ops/sec)
Did 83300 Curve25519 base-point multiplication operations in 10044283us (8293.3 ops/sec)
Did 49700 Curve25519 arbitrary point multiplication operations in 10007005us (4966.5 ops/sec)
Did 14039 ECDH P-256 operations in 10093929us (1390.8 ops/sec)
Did 40950 ECDSA P-256 signing operations in 10006757us (4092.2 ops/sec)
Did 16068 ECDSA P-256 verify operations in 10095996us (1591.5 ops/sec)
After:
Did 80476 Ed25519 key generation operations in 10048648us (8008.6 ops/sec) [-2.7%]
Did 79050 Ed25519 signing operations in 10049180us (7866.3 ops/sec) [-4.2%]
Did 40501 Ed25519 verify operations in 10048347us (4030.6 ops/sec) [-2.4%]
Did 81300 Curve25519 base-point multiplication operations in 10017480us (8115.8 ops/sec) [-2.1%]
Did 48278 Curve25519 arbitrary point multiplication operations in 10092500us (4783.6 ops/sec) [-3.7%]
Did 15402 ECDH P-256 operations in 10096705us (1525.4 ops/sec) [+9.7%]
Did 44200 ECDSA P-256 signing operations in 10037715us (4403.4 ops/sec) [+7.6%]
Did 17000 ECDSA P-256 verify operations in 10008813us (1698.5 ops/sec) [+6.7%]

x86_64, GCC 7.3.0
(Note the P-256 numbers in this configuration are not affected by this change.
They are included to give a sense of the noise.)
Before:
Did 557000 Ed25519 key generation operations in 10011721us (55634.8 ops/sec)
Did 550000 Ed25519 signing operations in 10016449us (54909.7 ops/sec)
Did 190000 Ed25519 verify operations in 10014565us (18972.4 ops/sec)
Did 587000 Curve25519 base-point multiplication operations in 10015402us (58609.7 ops/sec)
Did 230000 Curve25519 arbitrary point multiplication operations in 10023827us (22945.3 ops/sec)
Did 179000 ECDH P-256 operations in 10016294us (17870.9 ops/sec)
Did 557000 ECDSA P-256 signing operations in 10014158us (55621.3 ops/sec)
Did 198000 ECDSA P-256 verify operations in 10036694us (19727.6 ops/sec)
After:
Did 569000 Ed25519 key generation operations in 10004965us (56871.8 ops/sec) [+2.2%]
Did 563000 Ed25519 signing operations in 10000064us (56299.6 ops/sec) [+2.5%]
Did 196000 Ed25519 verify operations in 10025650us (19549.9 ops/sec) [+3.0%]
Did 596000 Curve25519 base-point multiplication operations in 10008666us (59548.4 ops/sec) [+1.6%]
Did 229000 Curve25519 arbitrary point multiplication operations in 10028921us (22834.0 ops/sec) [-0.5%]
Did 182910 ECDH P-256 operations in 10014905us (18263.8 ops/sec) [+2.2%]
Did 562000 ECDSA P-256 signing operations in 10011944us (56133.0 ops/sec) [+0.9%]
Did 202000 ECDSA P-256 verify operations in 10046901us (20105.7 ops/sec) [+1.9%]

x86_64, GCC 7.3.0, OPENSSL_SMALL
Before:
Did 350000 Ed25519 key generation operations in 10002540us (34991.1 ops/sec)
Did 344000 Ed25519 signing operations in 10010420us (34364.2 ops/sec)
Did 197000 Ed25519 verify operations in 10030593us (19639.9 ops/sec)
Did 362000 Curve25519 base-point multiplication operations in 10004615us (36183.3 ops/sec)
Did 235000 Curve25519 arbitrary point multiplication operations in 10025951us (23439.2 ops/sec)
Did 32032 ECDH P-256 operations in 10056486us (3185.2 ops/sec)
Did 96354 ECDSA P-256 signing operations in 10007297us (9628.4 ops/sec)
Did 37774 ECDSA P-256 verify operations in 10044892us (3760.5 ops/sec)
After:
Did 343000 Ed25519 key generation operations in 10025108us (34214.1 ops/sec) [-2.2%]
Did 340000 Ed25519 signing operations in 10014870us (33949.5 ops/sec) [-1.2%]
Did 192000 Ed25519 verify operations in 10025082us (19152.0 ops/sec) [-2.5%]
Did 355000 Curve25519 base-point multiplication operations in 10013220us (35453.1 ops/sec) [-2.0%]
Did 231000 Curve25519 arbitrary point multiplication operations in 10010775us (23075.1 ops/sec) [-1.6%]
Did 31540 ECDH P-256 operations in 10009664us (3151.0 ops/sec) [-1.1%]
Did 99012 ECDSA P-256 signing operations in 10090296us (9812.6 ops/sec) [+1.9%]
Did 37695 ECDSA P-256 verify operations in 10092859us (3734.8 ops/sec) [-0.7%]

x86_64, Clang 8.0.0 trunk 349417
(Note the P-256 numbers in this configuration are not affected by this change.
They are included to give a sense of the noise.)
Before:
Did 600000 Ed25519 key generation operations in 10000278us (59998.3 ops/sec)
Did 595000 Ed25519 signing operations in 10010375us (59438.3 ops/sec)
Did 184000 Ed25519 verify operations in 10013984us (18374.3 ops/sec)
Did 636000 Curve25519 base-point multiplication operations in 10005250us (63566.6 ops/sec)
Did 229000 Curve25519 arbitrary point multiplication operations in 10006059us (22886.1 ops/sec)
Did 179250 ECDH P-256 operations in 10026354us (17877.9 ops/sec)
Did 547000 ECDSA P-256 signing operations in 10017585us (54604.0 ops/sec)
Did 197000 ECDSA P-256 verify operations in 10013020us (19674.4 ops/sec)
After:
Did 560000 Ed25519 key generation operations in 10009295us (55948.0 ops/sec) [-6.8%]
Did 548000 Ed25519 signing operations in 10007912us (54756.7 ops/sec) [-7.9%]
Did 170000 Ed25519 verify operations in 10056948us (16903.7 ops/sec) [-8.0%]
Did 592000 Curve25519 base-point multiplication operations in 10016818us (59100.6 ops/sec) [-7.0%]
Did 214000 Curve25519 arbitrary point multiplication operations in 10043918us (21306.4 ops/sec) [-6.9%]
Did 180000 ECDH P-256 operations in 10026019us (17953.3 ops/sec) [+0.4%]
Did 550000 ECDSA P-256 signing operations in 10004943us (54972.8 ops/sec) [+0.7%]
Did 198000 ECDSA P-256 verify operations in 10021714us (19757.1 ops/sec) [+0.4%]

x86_64, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 326000 Ed25519 key generation operations in 10003266us (32589.4 ops/sec)
Did 322000 Ed25519 signing operations in 10026783us (32114.0 ops/sec)
Did 181000 Ed25519 verify operations in 10015635us (18071.7 ops/sec)
Did 335000 Curve25519 base-point multiplication operations in 10000359us (33498.8 ops/sec)
Did 224000 Curve25519 arbitrary point multiplication operations in 10027245us (22339.1 ops/sec)
Did 68552 ECDH P-256 operations in 10018900us (6842.3 ops/sec)
Did 184000 ECDSA P-256 signing operations in 10014516us (18373.3 ops/sec)
Did 76020 ECDSA P-256 verify operations in 10016891us (7589.2 ops/sec)
After:
Did 310000 Ed25519 key generation operations in 10022086us (30931.7 ops/sec) [-5.1%]
Did 308000 Ed25519 signing operations in 10007543us (30776.8 ops/sec) [-4.2%]
Did 173000 Ed25519 verify operations in 10005829us (17289.9 ops/sec) [-4.3%]
Did 321000 Curve25519 base-point multiplication operations in 10027058us (32013.4 ops/sec) [-4.4%]
Did 212000 Curve25519 arbitrary point multiplication operations in 10015203us (21167.8 ops/sec) [-5.2%]
Did 64059 ECDH P-256 operations in 10042781us (6378.6 ops/sec) [-6.8%]
Did 170000 ECDSA P-256 signing operations in 10030896us (16947.6 ops/sec) [-7.8%]
Did 72176 ECDSA P-256 verify operations in 10075369us (7163.6 ops/sec) [-5.6%]

Bug: 254
Change-Id: Ib04c773f01b542bcb8611cceb582466bfa6f6d52
Reviewed-on: https://boringssl-review.googlesource.com/c/34306
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/third_party/fiat/METADATA b/third_party/fiat/METADATA
index 6cd1612..0e4012f 100644
--- a/third_party/fiat/METADATA
+++ b/third_party/fiat/METADATA
@@ -6,8 +6,8 @@
     type: GIT
     value: "https://github.com/mit-plv/fiat-crypto"
   }
-  version: "6c4d4afb26de639718fcac39094353ca7feec365"
-  last_upgrade_date { year: 2017 month: 11 day: 3 }
+  version: "4441785fb44b88bb6943ddbf639d872c8c903281"
+  last_upgrade_date { year: 2019 month: 1 day: 16 }
 
   local_modifications: "Fiat-generated code has been integrated into existing BoringSSL code"
 }
diff --git a/third_party/fiat/curve25519.c b/third_party/fiat/curve25519.c
index b64956e..c5fa5da 100644
--- a/third_party/fiat/curve25519.c
+++ b/third_party/fiat/curve25519.c
@@ -45,8 +45,14 @@
 // Various pre-computed constants.
 #include "./curve25519_tables.h"
 
+#if defined(BORINGSSL_CURVE25519_64BIT)
+#include "./curve25519_64.c"
+#else
+#include "./curve25519_32.c"
+#endif  // BORINGSSL_CURVE25519_64BIT
 
-// Low-level intrinsic operations (hand-written).
+
+// Low-level intrinsic operations
 
 static uint64_t load_3(const uint8_t *in) {
   uint64_t result;
@@ -65,706 +71,111 @@
   return result;
 }
 
-#if defined(BORINGSSL_CURVE25519_64BIT)
-static uint64_t load_8(const uint8_t *in) {
-  uint64_t result;
-  result = (uint64_t)in[0];
-  result |= ((uint64_t)in[1]) << 8;
-  result |= ((uint64_t)in[2]) << 16;
-  result |= ((uint64_t)in[3]) << 24;
-  result |= ((uint64_t)in[4]) << 32;
-  result |= ((uint64_t)in[5]) << 40;
-  result |= ((uint64_t)in[6]) << 48;
-  result |= ((uint64_t)in[7]) << 56;
-  return result;
-}
-
-static uint8_t /*bool*/ addcarryx_u51(uint8_t /*bool*/ c, uint64_t a,
-                                      uint64_t b, uint64_t *low) {
-  // This function extracts 51 bits of result and 1 bit of carry (52 total), so
-  // a 64-bit intermediate is sufficient.
-  uint64_t x = a + b + c;
-  *low = x & ((UINT64_C(1) << 51) - 1);
-  return (x >> 51) & 1;
-}
-
-static uint8_t /*bool*/ subborrow_u51(uint8_t /*bool*/ c, uint64_t a,
-                                      uint64_t b, uint64_t *low) {
-  // This function extracts 51 bits of result and 1 bit of borrow (52 total), so
-  // a 64-bit intermediate is sufficient.
-  uint64_t x = a - b - c;
-  *low = x & ((UINT64_C(1) << 51) - 1);
-  return x >> 63;
-}
-
-static uint64_t cmovznz64(uint64_t t, uint64_t z, uint64_t nz) {
-  t = -!!t; // all set if nonzero, 0 if 0
-  return (t&nz) | ((~t)&z);
-}
-
-#else
-
-static uint8_t /*bool*/ addcarryx_u25(uint8_t /*bool*/ c, uint32_t a,
-                                      uint32_t b, uint32_t *low) {
-  // This function extracts 25 bits of result and 1 bit of carry (26 total), so
-  // a 32-bit intermediate is sufficient.
-  uint32_t x = a + b + c;
-  *low = x & ((1 << 25) - 1);
-  return (x >> 25) & 1;
-}
-
-static uint8_t /*bool*/ addcarryx_u26(uint8_t /*bool*/ c, uint32_t a,
-                                      uint32_t b, uint32_t *low) {
-  // This function extracts 26 bits of result and 1 bit of carry (27 total), so
-  // a 32-bit intermediate is sufficient.
-  uint32_t x = a + b + c;
-  *low = x & ((1 << 26) - 1);
-  return (x >> 26) & 1;
-}
-
-static uint8_t /*bool*/ subborrow_u25(uint8_t /*bool*/ c, uint32_t a,
-                                      uint32_t b, uint32_t *low) {
-  // This function extracts 25 bits of result and 1 bit of borrow (26 total), so
-  // a 32-bit intermediate is sufficient.
-  uint32_t x = a - b - c;
-  *low = x & ((1 << 25) - 1);
-  return x >> 31;
-}
-
-static uint8_t /*bool*/ subborrow_u26(uint8_t /*bool*/ c, uint32_t a,
-                                      uint32_t b, uint32_t *low) {
-  // This function extracts 26 bits of result and 1 bit of borrow (27 total), so
-  // a 32-bit intermediate is sufficient.
-  uint32_t x = a - b - c;
-  *low = x & ((1 << 26) - 1);
-  return x >> 31;
-}
-
-static uint32_t cmovznz32(uint32_t t, uint32_t z, uint32_t nz) {
-  t = -!!t; // all set if nonzero, 0 if 0
-  return (t&nz) | ((~t)&z);
-}
-
-#endif
-
 
 // Field operations.
 
 #if defined(BORINGSSL_CURVE25519_64BIT)
 
-#define assert_fe(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 5; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < 1.125*(UINT64_C(1)<<51)); \
-  } \
-} while (0)
+typedef uint64_t fe_limb_t;
+#define FE_NUM_LIMBS 5
 
-#define assert_fe_loose(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 5; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < 3.375*(UINT64_C(1)<<51)); \
-  } \
-} while (0)
-
-#define assert_fe_frozen(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 5; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < (UINT64_C(1)<<51)); \
-  } \
-} while (0)
-
-static void fe_frombytes_impl(uint64_t h[5], const uint8_t s[32]) {
-  // Ignores top bit of s.
-  uint64_t a0 = load_8(s);
-  uint64_t a1 = load_8(s+8);
-  uint64_t a2 = load_8(s+16);
-  uint64_t a3 = load_8(s+24);
-  // Use 51 bits, 64-51 = 13 left.
-  h[0] = a0 & ((UINT64_C(1) << 51) - 1);
-  // (64-51) + 38 = 13 + 38 = 51
-  h[1] = (a0 >> 51) | ((a1 & ((UINT64_C(1) << 38) - 1)) << 13);
-  // (64-38) + 25 = 26 + 25 = 51
-  h[2] = (a1 >> 38) | ((a2 & ((UINT64_C(1) << 25) - 1)) << 26);
-  // (64-25) + 12 = 39 + 12 = 51
-  h[3] = (a2 >> 25) | ((a3 & ((UINT64_C(1) << 12) - 1)) << 39);
-  // (64-12) = 52, ignore top bit
-  h[4] = (a3 >> 12) & ((UINT64_C(1) << 51) - 1);
-  assert_fe(h);
-}
-
-static void fe_frombytes(fe *h, const uint8_t s[32]) {
-  fe_frombytes_impl(h->v, s);
-}
-
-static void fe_freeze(uint64_t out[5], const uint64_t in1[5]) {
-  { const uint64_t x7 = in1[4];
-  { const uint64_t x8 = in1[3];
-  { const uint64_t x6 = in1[2];
-  { const uint64_t x4 = in1[1];
-  { const uint64_t x2 = in1[0];
-  { uint64_t x10; uint8_t/*bool*/ x11 = subborrow_u51(0x0, x2, 0x7ffffffffffed, &x10);
-  { uint64_t x13; uint8_t/*bool*/ x14 = subborrow_u51(x11, x4, 0x7ffffffffffff, &x13);
-  { uint64_t x16; uint8_t/*bool*/ x17 = subborrow_u51(x14, x6, 0x7ffffffffffff, &x16);
-  { uint64_t x19; uint8_t/*bool*/ x20 = subborrow_u51(x17, x8, 0x7ffffffffffff, &x19);
-  { uint64_t x22; uint8_t/*bool*/ x23 = subborrow_u51(x20, x7, 0x7ffffffffffff, &x22);
-  { uint64_t x24 = cmovznz64(x23, 0x0, 0xffffffffffffffffL);
-  { uint64_t x25 = (x24 & 0x7ffffffffffed);
-  { uint64_t x27; uint8_t/*bool*/ x28 = addcarryx_u51(0x0, x10, x25, &x27);
-  { uint64_t x29 = (x24 & 0x7ffffffffffff);
-  { uint64_t x31; uint8_t/*bool*/ x32 = addcarryx_u51(x28, x13, x29, &x31);
-  { uint64_t x33 = (x24 & 0x7ffffffffffff);
-  { uint64_t x35; uint8_t/*bool*/ x36 = addcarryx_u51(x32, x16, x33, &x35);
-  { uint64_t x37 = (x24 & 0x7ffffffffffff);
-  { uint64_t x39; uint8_t/*bool*/ x40 = addcarryx_u51(x36, x19, x37, &x39);
-  { uint64_t x41 = (x24 & 0x7ffffffffffff);
-  { uint64_t x43; addcarryx_u51(x40, x22, x41, &x43);
-  out[0] = x27;
-  out[1] = x31;
-  out[2] = x35;
-  out[3] = x39;
-  out[4] = x43;
-  }}}}}}}}}}}}}}}}}}}}}
-}
-
-static void fe_tobytes(uint8_t s[32], const fe *f) {
-  assert_fe(f->v);
-  uint64_t h[5];
-  fe_freeze(h, f->v);
-  assert_fe_frozen(h);
-
-  s[0] = h[0] >> 0;
-  s[1] = h[0] >> 8;
-  s[2] = h[0] >> 16;
-  s[3] = h[0] >> 24;
-  s[4] = h[0] >> 32;
-  s[5] = h[0] >> 40;
-  s[6] = (h[0] >> 48) | (h[1] << 3);
-  s[7] = h[1] >> 5;
-  s[8] = h[1] >> 13;
-  s[9] = h[1] >> 21;
-  s[10] = h[1] >> 29;
-  s[11] = h[1] >> 37;
-  s[12] = (h[1] >> 45) | (h[2] << 6);
-  s[13] = h[2] >> 2;
-  s[14] = h[2] >> 10;
-  s[15] = h[2] >> 18;
-  s[16] = h[2] >> 26;
-  s[17] = h[2] >> 34;
-  s[18] = h[2] >> 42;
-  s[19] = (h[2] >> 50) | (h[3] << 1);
-  s[20] = h[3] >> 7;
-  s[21] = h[3] >> 15;
-  s[22] = h[3] >> 23;
-  s[23] = h[3] >> 31;
-  s[24] = h[3] >> 39;
-  s[25] = (h[3] >> 47) | (h[4] << 4);
-  s[26] = h[4] >> 4;
-  s[27] = h[4] >> 12;
-  s[28] = h[4] >> 20;
-  s[29] = h[4] >> 28;
-  s[30] = h[4] >> 36;
-  s[31] = h[4] >> 44;
-}
-
-// h = 0
-static void fe_0(fe *h) {
-  OPENSSL_memset(h, 0, sizeof(fe));
-}
-
-static void fe_loose_0(fe_loose *h) {
-  OPENSSL_memset(h, 0, sizeof(fe_loose));
-}
-
-// h = 1
-static void fe_1(fe *h) {
-  OPENSSL_memset(h, 0, sizeof(fe));
-  h->v[0] = 1;
-}
-
-static void fe_loose_1(fe_loose *h) {
-  OPENSSL_memset(h, 0, sizeof(fe_loose));
-  h->v[0] = 1;
-}
-
-static void fe_add_impl(uint64_t out[5], const uint64_t in1[5], const uint64_t in2[5]) {
-  { const uint64_t x10 = in1[4];
-  { const uint64_t x11 = in1[3];
-  { const uint64_t x9 = in1[2];
-  { const uint64_t x7 = in1[1];
-  { const uint64_t x5 = in1[0];
-  { const uint64_t x18 = in2[4];
-  { const uint64_t x19 = in2[3];
-  { const uint64_t x17 = in2[2];
-  { const uint64_t x15 = in2[1];
-  { const uint64_t x13 = in2[0];
-  out[0] = (x5 + x13);
-  out[1] = (x7 + x15);
-  out[2] = (x9 + x17);
-  out[3] = (x11 + x19);
-  out[4] = (x10 + x18);
-  }}}}}}}}}}
-}
-
-// h = f + g
-// Can overlap h with f or g.
-static void fe_add(fe_loose *h, const fe *f, const fe *g) {
-  assert_fe(f->v);
-  assert_fe(g->v);
-  fe_add_impl(h->v, f->v, g->v);
-  assert_fe_loose(h->v);
-}
-
-static void fe_sub_impl(uint64_t out[5], const uint64_t in1[5], const uint64_t in2[5]) {
-  { const uint64_t x10 = in1[4];
-  { const uint64_t x11 = in1[3];
-  { const uint64_t x9 = in1[2];
-  { const uint64_t x7 = in1[1];
-  { const uint64_t x5 = in1[0];
-  { const uint64_t x18 = in2[4];
-  { const uint64_t x19 = in2[3];
-  { const uint64_t x17 = in2[2];
-  { const uint64_t x15 = in2[1];
-  { const uint64_t x13 = in2[0];
-  out[0] = ((0xfffffffffffda + x5) - x13);
-  out[1] = ((0xffffffffffffe + x7) - x15);
-  out[2] = ((0xffffffffffffe + x9) - x17);
-  out[3] = ((0xffffffffffffe + x11) - x19);
-  out[4] = ((0xffffffffffffe + x10) - x18);
-  }}}}}}}}}}
-}
-
-// h = f - g
-// Can overlap h with f or g.
-static void fe_sub(fe_loose *h, const fe *f, const fe *g) {
-  assert_fe(f->v);
-  assert_fe(g->v);
-  fe_sub_impl(h->v, f->v, g->v);
-  assert_fe_loose(h->v);
-}
-
-static void fe_carry_impl(uint64_t out[5], const uint64_t in1[5]) {
-  { const uint64_t x7 = in1[4];
-  { const uint64_t x8 = in1[3];
-  { const uint64_t x6 = in1[2];
-  { const uint64_t x4 = in1[1];
-  { const uint64_t x2 = in1[0];
-  { uint64_t x9 = (x2 >> 0x33);
-  { uint64_t x10 = (x2 & 0x7ffffffffffff);
-  { uint64_t x11 = (x9 + x4);
-  { uint64_t x12 = (x11 >> 0x33);
-  { uint64_t x13 = (x11 & 0x7ffffffffffff);
-  { uint64_t x14 = (x12 + x6);
-  { uint64_t x15 = (x14 >> 0x33);
-  { uint64_t x16 = (x14 & 0x7ffffffffffff);
-  { uint64_t x17 = (x15 + x8);
-  { uint64_t x18 = (x17 >> 0x33);
-  { uint64_t x19 = (x17 & 0x7ffffffffffff);
-  { uint64_t x20 = (x18 + x7);
-  { uint64_t x21 = (x20 >> 0x33);
-  { uint64_t x22 = (x20 & 0x7ffffffffffff);
-  { uint64_t x23 = (x10 + (0x13 * x21));
-  { uint64_t x24 = (x23 >> 0x33);
-  { uint64_t x25 = (x23 & 0x7ffffffffffff);
-  { uint64_t x26 = (x24 + x13);
-  { uint64_t x27 = (x26 >> 0x33);
-  { uint64_t x28 = (x26 & 0x7ffffffffffff);
-  out[0] = x25;
-  out[1] = x28;
-  out[2] = (x27 + x16);
-  out[3] = x19;
-  out[4] = x22;
-  }}}}}}}}}}}}}}}}}}}}}}}}}
-}
-
-static void fe_carry(fe *h, const fe_loose* f) {
-  assert_fe_loose(f->v);
-  fe_carry_impl(h->v, f->v);
-  assert_fe(h->v);
-}
-
-static void fe_mul_impl(uint64_t out[5], const uint64_t in1[5], const uint64_t in2[5]) {
-  assert_fe_loose(in1);
-  assert_fe_loose(in2);
-  { const uint64_t x10 = in1[4];
-  { const uint64_t x11 = in1[3];
-  { const uint64_t x9 = in1[2];
-  { const uint64_t x7 = in1[1];
-  { const uint64_t x5 = in1[0];
-  { const uint64_t x18 = in2[4];
-  { const uint64_t x19 = in2[3];
-  { const uint64_t x17 = in2[2];
-  { const uint64_t x15 = in2[1];
-  { const uint64_t x13 = in2[0];
-  { uint128_t x20 = ((uint128_t)x5 * x13);
-  { uint128_t x21 = (((uint128_t)x5 * x15) + ((uint128_t)x7 * x13));
-  { uint128_t x22 = ((((uint128_t)x5 * x17) + ((uint128_t)x9 * x13)) + ((uint128_t)x7 * x15));
-  { uint128_t x23 = (((((uint128_t)x5 * x19) + ((uint128_t)x11 * x13)) + ((uint128_t)x7 * x17)) + ((uint128_t)x9 * x15));
-  { uint128_t x24 = ((((((uint128_t)x5 * x18) + ((uint128_t)x10 * x13)) + ((uint128_t)x11 * x15)) + ((uint128_t)x7 * x19)) + ((uint128_t)x9 * x17));
-  { uint64_t x25 = (x10 * 0x13);
-  { uint64_t x26 = (x7 * 0x13);
-  { uint64_t x27 = (x9 * 0x13);
-  { uint64_t x28 = (x11 * 0x13);
-  { uint128_t x29 = ((((x20 + ((uint128_t)x25 * x15)) + ((uint128_t)x26 * x18)) + ((uint128_t)x27 * x19)) + ((uint128_t)x28 * x17));
-  { uint128_t x30 = (((x21 + ((uint128_t)x25 * x17)) + ((uint128_t)x27 * x18)) + ((uint128_t)x28 * x19));
-  { uint128_t x31 = ((x22 + ((uint128_t)x25 * x19)) + ((uint128_t)x28 * x18));
-  { uint128_t x32 = (x23 + ((uint128_t)x25 * x18));
-  { uint64_t x33 = (uint64_t) (x29 >> 0x33);
-  { uint64_t x34 = ((uint64_t)x29 & 0x7ffffffffffff);
-  { uint128_t x35 = (x33 + x30);
-  { uint64_t x36 = (uint64_t) (x35 >> 0x33);
-  { uint64_t x37 = ((uint64_t)x35 & 0x7ffffffffffff);
-  { uint128_t x38 = (x36 + x31);
-  { uint64_t x39 = (uint64_t) (x38 >> 0x33);
-  { uint64_t x40 = ((uint64_t)x38 & 0x7ffffffffffff);
-  { uint128_t x41 = (x39 + x32);
-  { uint64_t x42 = (uint64_t) (x41 >> 0x33);
-  { uint64_t x43 = ((uint64_t)x41 & 0x7ffffffffffff);
-  { uint128_t x44 = (x42 + x24);
-  { uint64_t x45 = (uint64_t) (x44 >> 0x33);
-  { uint64_t x46 = ((uint64_t)x44 & 0x7ffffffffffff);
-  { uint64_t x47 = (x34 + (0x13 * x45));
-  { uint64_t x48 = (x47 >> 0x33);
-  { uint64_t x49 = (x47 & 0x7ffffffffffff);
-  { uint64_t x50 = (x48 + x37);
-  { uint64_t x51 = (x50 >> 0x33);
-  { uint64_t x52 = (x50 & 0x7ffffffffffff);
-  out[0] = x49;
-  out[1] = x52;
-  out[2] = (x51 + x40);
-  out[3] = x43;
-  out[4] = x46;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-  assert_fe(out);
-}
-
-static void fe_mul_ltt(fe_loose *h, const fe *f, const fe *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_mul_llt(fe_loose *h, const fe_loose *f, const fe *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_mul_ttt(fe *h, const fe *f, const fe *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_mul_tlt(fe *h, const fe_loose *f, const fe *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_mul_ttl(fe *h, const fe *f, const fe_loose *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_mul_tll(fe *h, const fe_loose *f, const fe_loose *g) {
-  fe_mul_impl(h->v, f->v, g->v);
-}
-
-static void fe_sqr_impl(uint64_t out[5], const uint64_t in1[5]) {
-  assert_fe_loose(in1);
-  { const uint64_t x7 = in1[4];
-  { const uint64_t x8 = in1[3];
-  { const uint64_t x6 = in1[2];
-  { const uint64_t x4 = in1[1];
-  { const uint64_t x2 = in1[0];
-  { uint64_t x9 = (x2 * 0x2);
-  { uint64_t x10 = (x4 * 0x2);
-  { uint64_t x11 = ((x6 * 0x2) * 0x13);
-  { uint64_t x12 = (x7 * 0x13);
-  { uint64_t x13 = (x12 * 0x2);
-  { uint128_t x14 = ((((uint128_t)x2 * x2) + ((uint128_t)x13 * x4)) + ((uint128_t)x11 * x8));
-  { uint128_t x15 = ((((uint128_t)x9 * x4) + ((uint128_t)x13 * x6)) + ((uint128_t)x8 * (x8 * 0x13)));
-  { uint128_t x16 = ((((uint128_t)x9 * x6) + ((uint128_t)x4 * x4)) + ((uint128_t)x13 * x8));
-  { uint128_t x17 = ((((uint128_t)x9 * x8) + ((uint128_t)x10 * x6)) + ((uint128_t)x7 * x12));
-  { uint128_t x18 = ((((uint128_t)x9 * x7) + ((uint128_t)x10 * x8)) + ((uint128_t)x6 * x6));
-  { uint64_t x19 = (uint64_t) (x14 >> 0x33);
-  { uint64_t x20 = ((uint64_t)x14 & 0x7ffffffffffff);
-  { uint128_t x21 = (x19 + x15);
-  { uint64_t x22 = (uint64_t) (x21 >> 0x33);
-  { uint64_t x23 = ((uint64_t)x21 & 0x7ffffffffffff);
-  { uint128_t x24 = (x22 + x16);
-  { uint64_t x25 = (uint64_t) (x24 >> 0x33);
-  { uint64_t x26 = ((uint64_t)x24 & 0x7ffffffffffff);
-  { uint128_t x27 = (x25 + x17);
-  { uint64_t x28 = (uint64_t) (x27 >> 0x33);
-  { uint64_t x29 = ((uint64_t)x27 & 0x7ffffffffffff);
-  { uint128_t x30 = (x28 + x18);
-  { uint64_t x31 = (uint64_t) (x30 >> 0x33);
-  { uint64_t x32 = ((uint64_t)x30 & 0x7ffffffffffff);
-  { uint64_t x33 = (x20 + (0x13 * x31));
-  { uint64_t x34 = (x33 >> 0x33);
-  { uint64_t x35 = (x33 & 0x7ffffffffffff);
-  { uint64_t x36 = (x34 + x23);
-  { uint64_t x37 = (x36 >> 0x33);
-  { uint64_t x38 = (x36 & 0x7ffffffffffff);
-  out[0] = x35;
-  out[1] = x38;
-  out[2] = (x37 + x26);
-  out[3] = x29;
-  out[4] = x32;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-  assert_fe(out);
-}
-
-static void fe_sq_tl(fe *h, const fe_loose *f) {
-  fe_sqr_impl(h->v, f->v);
-}
-
-static void fe_sq_tt(fe *h, const fe *f) {
-  fe_sqr_impl(h->v, f->v);
-}
-
-// Replace (f,g) with (g,f) if b == 1;
-// replace (f,g) with (f,g) if b == 0.
+// assert_fe asserts that |f| satisfies bounds:
 //
-// Preconditions: b in {0,1}.
-static void fe_cswap(fe *f, fe *g, uint64_t b) {
-  b = 0-b;
-  for (unsigned i = 0; i < 5; i++) {
-    uint64_t x = f->v[i] ^ g->v[i];
-    x &= b;
-    f->v[i] ^= x;
-    g->v[i] ^= x;
-  }
-}
-
-// NOTE: based on fiat-crypto fe_mul, edited for in2=121666, 0, 0..
-static void fe_mul_121666_impl(uint64_t out[5], const uint64_t in1[5]) {
-  { const uint64_t x10 = in1[4];
-  { const uint64_t x11 = in1[3];
-  { const uint64_t x9 = in1[2];
-  { const uint64_t x7 = in1[1];
-  { const uint64_t x5 = in1[0];
-  { const uint64_t x18 = 0;
-  { const uint64_t x19 = 0;
-  { const uint64_t x17 = 0;
-  { const uint64_t x15 = 0;
-  { const uint64_t x13 = 121666;
-  { uint128_t x20 = ((uint128_t)x5 * x13);
-  { uint128_t x21 = (((uint128_t)x5 * x15) + ((uint128_t)x7 * x13));
-  { uint128_t x22 = ((((uint128_t)x5 * x17) + ((uint128_t)x9 * x13)) + ((uint128_t)x7 * x15));
-  { uint128_t x23 = (((((uint128_t)x5 * x19) + ((uint128_t)x11 * x13)) + ((uint128_t)x7 * x17)) + ((uint128_t)x9 * x15));
-  { uint128_t x24 = ((((((uint128_t)x5 * x18) + ((uint128_t)x10 * x13)) + ((uint128_t)x11 * x15)) + ((uint128_t)x7 * x19)) + ((uint128_t)x9 * x17));
-  { uint64_t x25 = (x10 * 0x13);
-  { uint64_t x26 = (x7 * 0x13);
-  { uint64_t x27 = (x9 * 0x13);
-  { uint64_t x28 = (x11 * 0x13);
-  { uint128_t x29 = ((((x20 + ((uint128_t)x25 * x15)) + ((uint128_t)x26 * x18)) + ((uint128_t)x27 * x19)) + ((uint128_t)x28 * x17));
-  { uint128_t x30 = (((x21 + ((uint128_t)x25 * x17)) + ((uint128_t)x27 * x18)) + ((uint128_t)x28 * x19));
-  { uint128_t x31 = ((x22 + ((uint128_t)x25 * x19)) + ((uint128_t)x28 * x18));
-  { uint128_t x32 = (x23 + ((uint128_t)x25 * x18));
-  { uint64_t x33 = (uint64_t) (x29 >> 0x33);
-  { uint64_t x34 = ((uint64_t)x29 & 0x7ffffffffffff);
-  { uint128_t x35 = (x33 + x30);
-  { uint64_t x36 = (uint64_t) (x35 >> 0x33);
-  { uint64_t x37 = ((uint64_t)x35 & 0x7ffffffffffff);
-  { uint128_t x38 = (x36 + x31);
-  { uint64_t x39 = (uint64_t) (x38 >> 0x33);
-  { uint64_t x40 = ((uint64_t)x38 & 0x7ffffffffffff);
-  { uint128_t x41 = (x39 + x32);
-  { uint64_t x42 = (uint64_t) (x41 >> 0x33);
-  { uint64_t x43 = ((uint64_t)x41 & 0x7ffffffffffff);
-  { uint128_t x44 = (x42 + x24);
-  { uint64_t x45 = (uint64_t) (x44 >> 0x33);
-  { uint64_t x46 = ((uint64_t)x44 & 0x7ffffffffffff);
-  { uint64_t x47 = (x34 + (0x13 * x45));
-  { uint64_t x48 = (x47 >> 0x33);
-  { uint64_t x49 = (x47 & 0x7ffffffffffff);
-  { uint64_t x50 = (x48 + x37);
-  { uint64_t x51 = (x50 >> 0x33);
-  { uint64_t x52 = (x50 & 0x7ffffffffffff);
-  out[0] = x49;
-  out[1] = x52;
-  out[2] = (x51 + x40);
-  out[3] = x43;
-  out[4] = x46;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-}
-
-static void fe_mul121666(fe *h, const fe_loose *f) {
-  assert_fe_loose(f->v);
-  fe_mul_121666_impl(h->v, f->v);
-  assert_fe(h->v);
-}
-
-// Adapted from Fiat-synthesized |fe_sub_impl| with |out| = 0.
-static void fe_neg_impl(uint64_t out[5], const uint64_t in2[5]) {
-  { const uint64_t x10 = 0;
-  { const uint64_t x11 = 0;
-  { const uint64_t x9 = 0;
-  { const uint64_t x7 = 0;
-  { const uint64_t x5 = 0;
-  { const uint64_t x18 = in2[4];
-  { const uint64_t x19 = in2[3];
-  { const uint64_t x17 = in2[2];
-  { const uint64_t x15 = in2[1];
-  { const uint64_t x13 = in2[0];
-  out[0] = ((0xfffffffffffda + x5) - x13);
-  out[1] = ((0xffffffffffffe + x7) - x15);
-  out[2] = ((0xffffffffffffe + x9) - x17);
-  out[3] = ((0xffffffffffffe + x11) - x19);
-  out[4] = ((0xffffffffffffe + x10) - x18);
-  }}}}}}}}}}
-}
-
-// h = -f
-static void fe_neg(fe_loose *h, const fe *f) {
-  assert_fe(f->v);
-  fe_neg_impl(h->v, f->v);
-  assert_fe_loose(h->v);
-}
-
-// Replace (f,g) with (g,g) if b == 1;
-// replace (f,g) with (f,g) if b == 0.
+//  [[0x0 ~> 0x8cccccccccccc],
+//   [0x0 ~> 0x8cccccccccccc],
+//   [0x0 ~> 0x8cccccccccccc],
+//   [0x0 ~> 0x8cccccccccccc],
+//   [0x0 ~> 0x8cccccccccccc]]
 //
-// Preconditions: b in {0,1}.
-static void fe_cmov(fe_loose *f, const fe_loose *g, uint64_t b) {
-  b = 0-b;
-  for (unsigned i = 0; i < 5; i++) {
-    uint64_t x = f->v[i] ^ g->v[i];
-    x &= b;
-    f->v[i] ^= x;
-  }
-}
+// See comments in curve25519_64.c for which functions use these bounds for
+// inputs or outputs.
+#define assert_fe(f)                                                    \
+  do {                                                                  \
+    for (unsigned _assert_fe_i = 0; _assert_fe_i < 5; _assert_fe_i++) { \
+      assert(f[_assert_fe_i] <= UINT64_C(0x8cccccccccccc));             \
+    }                                                                   \
+  } while (0)
+
+// assert_fe_loose asserts that |f| satisfies bounds:
+//
+//  [[0x0 ~> 0x1a666666666664],
+//   [0x0 ~> 0x1a666666666664],
+//   [0x0 ~> 0x1a666666666664],
+//   [0x0 ~> 0x1a666666666664],
+//   [0x0 ~> 0x1a666666666664]]
+//
+// See comments in curve25519_64.c for which functions use these bounds for
+// inputs or outputs.
+#define assert_fe_loose(f)                                              \
+  do {                                                                  \
+    for (unsigned _assert_fe_i = 0; _assert_fe_i < 5; _assert_fe_i++) { \
+      assert(f[_assert_fe_i] <= UINT64_C(0x1a666666666664));            \
+    }                                                                   \
+  } while (0)
 
 #else
 
-#define assert_fe(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 10; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < 1.125*(1<<(26-(_assert_fe_i&1)))); \
-  } \
-} while (0)
+typedef uint32_t fe_limb_t;
+#define FE_NUM_LIMBS 10
 
-#define assert_fe_loose(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 10; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < 3.375*(1<<(26-(_assert_fe_i&1)))); \
-  } \
-} while (0)
+// assert_fe asserts that |f| satisfies bounds:
+//
+//  [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333],
+//   [0x0 ~> 0x4666666], [0x0 ~> 0x2333333],
+//   [0x0 ~> 0x4666666], [0x0 ~> 0x2333333],
+//   [0x0 ~> 0x4666666], [0x0 ~> 0x2333333],
+//   [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+//
+// See comments in curve25519_32.c for which functions use these bounds for
+// inputs or outputs.
+#define assert_fe(f)                                                     \
+  do {                                                                   \
+    for (unsigned _assert_fe_i = 0; _assert_fe_i < 10; _assert_fe_i++) { \
+      assert(f[_assert_fe_i] <=                                          \
+             ((_assert_fe_i & 1) ? 0x2333333u : 0x4666666u));            \
+    }                                                                    \
+  } while (0)
 
-#define assert_fe_frozen(f) do { \
-  for (unsigned _assert_fe_i = 0; _assert_fe_i< 10; _assert_fe_i++) { \
-    assert(f[_assert_fe_i] < (1u<<(26-(_assert_fe_i&1)))); \
-  } \
-} while (0)
+// assert_fe_loose asserts that |f| satisfies bounds:
+//
+//  [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999],
+//   [0x0 ~> 0xd333332], [0x0 ~> 0x6999999],
+//   [0x0 ~> 0xd333332], [0x0 ~> 0x6999999],
+//   [0x0 ~> 0xd333332], [0x0 ~> 0x6999999],
+//   [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+//
+// See comments in curve25519_32.c for which functions use these bounds for
+// inputs or outputs.
+#define assert_fe_loose(f)                                               \
+  do {                                                                   \
+    for (unsigned _assert_fe_i = 0; _assert_fe_i < 10; _assert_fe_i++) { \
+      assert(f[_assert_fe_i] <=                                          \
+             ((_assert_fe_i & 1) ? 0x6999999u : 0xd333332u));            \
+    }                                                                    \
+  } while (0)
 
-static void fe_frombytes_impl(uint32_t h[10], const uint8_t s[32]) {
-  // Ignores top bit of s.
-  uint32_t a0 = load_4(s);
-  uint32_t a1 = load_4(s+4);
-  uint32_t a2 = load_4(s+8);
-  uint32_t a3 = load_4(s+12);
-  uint32_t a4 = load_4(s+16);
-  uint32_t a5 = load_4(s+20);
-  uint32_t a6 = load_4(s+24);
-  uint32_t a7 = load_4(s+28);
-  h[0] = a0&((1<<26)-1);                    // 26 used, 32-26 left.   26
-  h[1] = (a0>>26) | ((a1&((1<<19)-1))<< 6); // (32-26) + 19 =  6+19 = 25
-  h[2] = (a1>>19) | ((a2&((1<<13)-1))<<13); // (32-19) + 13 = 13+13 = 26
-  h[3] = (a2>>13) | ((a3&((1<< 6)-1))<<19); // (32-13) +  6 = 19+ 6 = 25
-  h[4] = (a3>> 6);                          // (32- 6)              = 26
-  h[5] = a4&((1<<25)-1);                    //                        25
-  h[6] = (a4>>25) | ((a5&((1<<19)-1))<< 7); // (32-25) + 19 =  7+19 = 26
-  h[7] = (a5>>19) | ((a6&((1<<12)-1))<<13); // (32-19) + 12 = 13+12 = 25
-  h[8] = (a6>>12) | ((a7&((1<< 6)-1))<<20); // (32-12) +  6 = 20+ 6 = 26
-  h[9] = (a7>> 6)&((1<<25)-1); //                                     25
-  assert_fe(h);
+#endif  // BORINGSSL_CURVE25519_64BIT
+
+OPENSSL_STATIC_ASSERT(sizeof(fe) == sizeof(fe_limb_t) * FE_NUM_LIMBS,
+                      "fe_limb_t[FE_NUM_LIMBS] is inconsistent with fe");
+
+static void fe_frombytes_strict(fe *h, const uint8_t s[32]) {
+  // |fiat_25519_from_bytes| requires the top-most bit be clear.
+  assert((s[31] & 0x80) == 0);
+  fiat_25519_from_bytes(h->v, s);
+  assert_fe(h->v);
 }
 
 static void fe_frombytes(fe *h, const uint8_t s[32]) {
-  fe_frombytes_impl(h->v, s);
-}
-
-static void fe_freeze(uint32_t out[10], const uint32_t in1[10]) {
-  { const uint32_t x17 = in1[9];
-  { const uint32_t x18 = in1[8];
-  { const uint32_t x16 = in1[7];
-  { const uint32_t x14 = in1[6];
-  { const uint32_t x12 = in1[5];
-  { const uint32_t x10 = in1[4];
-  { const uint32_t x8 = in1[3];
-  { const uint32_t x6 = in1[2];
-  { const uint32_t x4 = in1[1];
-  { const uint32_t x2 = in1[0];
-  { uint32_t x20; uint8_t/*bool*/ x21 = subborrow_u26(0x0, x2, 0x3ffffed, &x20);
-  { uint32_t x23; uint8_t/*bool*/ x24 = subborrow_u25(x21, x4, 0x1ffffff, &x23);
-  { uint32_t x26; uint8_t/*bool*/ x27 = subborrow_u26(x24, x6, 0x3ffffff, &x26);
-  { uint32_t x29; uint8_t/*bool*/ x30 = subborrow_u25(x27, x8, 0x1ffffff, &x29);
-  { uint32_t x32; uint8_t/*bool*/ x33 = subborrow_u26(x30, x10, 0x3ffffff, &x32);
-  { uint32_t x35; uint8_t/*bool*/ x36 = subborrow_u25(x33, x12, 0x1ffffff, &x35);
-  { uint32_t x38; uint8_t/*bool*/ x39 = subborrow_u26(x36, x14, 0x3ffffff, &x38);
-  { uint32_t x41; uint8_t/*bool*/ x42 = subborrow_u25(x39, x16, 0x1ffffff, &x41);
-  { uint32_t x44; uint8_t/*bool*/ x45 = subborrow_u26(x42, x18, 0x3ffffff, &x44);
-  { uint32_t x47; uint8_t/*bool*/ x48 = subborrow_u25(x45, x17, 0x1ffffff, &x47);
-  { uint32_t x49 = cmovznz32(x48, 0x0, 0xffffffff);
-  { uint32_t x50 = (x49 & 0x3ffffed);
-  { uint32_t x52; uint8_t/*bool*/ x53 = addcarryx_u26(0x0, x20, x50, &x52);
-  { uint32_t x54 = (x49 & 0x1ffffff);
-  { uint32_t x56; uint8_t/*bool*/ x57 = addcarryx_u25(x53, x23, x54, &x56);
-  { uint32_t x58 = (x49 & 0x3ffffff);
-  { uint32_t x60; uint8_t/*bool*/ x61 = addcarryx_u26(x57, x26, x58, &x60);
-  { uint32_t x62 = (x49 & 0x1ffffff);
-  { uint32_t x64; uint8_t/*bool*/ x65 = addcarryx_u25(x61, x29, x62, &x64);
-  { uint32_t x66 = (x49 & 0x3ffffff);
-  { uint32_t x68; uint8_t/*bool*/ x69 = addcarryx_u26(x65, x32, x66, &x68);
-  { uint32_t x70 = (x49 & 0x1ffffff);
-  { uint32_t x72; uint8_t/*bool*/ x73 = addcarryx_u25(x69, x35, x70, &x72);
-  { uint32_t x74 = (x49 & 0x3ffffff);
-  { uint32_t x76; uint8_t/*bool*/ x77 = addcarryx_u26(x73, x38, x74, &x76);
-  { uint32_t x78 = (x49 & 0x1ffffff);
-  { uint32_t x80; uint8_t/*bool*/ x81 = addcarryx_u25(x77, x41, x78, &x80);
-  { uint32_t x82 = (x49 & 0x3ffffff);
-  { uint32_t x84; uint8_t/*bool*/ x85 = addcarryx_u26(x81, x44, x82, &x84);
-  { uint32_t x86 = (x49 & 0x1ffffff);
-  { uint32_t x88; addcarryx_u25(x85, x47, x86, &x88);
-  out[0] = x52;
-  out[1] = x56;
-  out[2] = x60;
-  out[3] = x64;
-  out[4] = x68;
-  out[5] = x72;
-  out[6] = x76;
-  out[7] = x80;
-  out[8] = x84;
-  out[9] = x88;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+  uint8_t s_copy[32];
+  OPENSSL_memcpy(s_copy, s, 32);
+  s_copy[31] &= 0x7f;
+  fe_frombytes_strict(h, s_copy);
 }
 
 static void fe_tobytes(uint8_t s[32], const fe *f) {
   assert_fe(f->v);
-  uint32_t h[10];
-  fe_freeze(h, f->v);
-  assert_fe_frozen(h);
-
-  s[0] = h[0] >> 0;
-  s[1] = h[0] >> 8;
-  s[2] = h[0] >> 16;
-  s[3] = (h[0] >> 24) | (h[1] << 2);
-  s[4] = h[1] >> 6;
-  s[5] = h[1] >> 14;
-  s[6] = (h[1] >> 22) | (h[2] << 3);
-  s[7] = h[2] >> 5;
-  s[8] = h[2] >> 13;
-  s[9] = (h[2] >> 21) | (h[3] << 5);
-  s[10] = h[3] >> 3;
-  s[11] = h[3] >> 11;
-  s[12] = (h[3] >> 19) | (h[4] << 6);
-  s[13] = h[4] >> 2;
-  s[14] = h[4] >> 10;
-  s[15] = h[4] >> 18;
-  s[16] = h[5] >> 0;
-  s[17] = h[5] >> 8;
-  s[18] = h[5] >> 16;
-  s[19] = (h[5] >> 24) | (h[6] << 1);
-  s[20] = h[6] >> 7;
-  s[21] = h[6] >> 15;
-  s[22] = (h[6] >> 23) | (h[7] << 3);
-  s[23] = h[7] >> 5;
-  s[24] = h[7] >> 13;
-  s[25] = (h[7] >> 21) | (h[8] << 4);
-  s[26] = h[8] >> 4;
-  s[27] = h[8] >> 12;
-  s[28] = (h[8] >> 20) | (h[9] << 6);
-  s[29] = h[9] >> 2;
-  s[30] = h[9] >> 10;
-  s[31] = h[9] >> 18;
+  fiat_25519_to_bytes(s, f->v);
 }
 
 // h = 0
@@ -787,272 +198,36 @@
   h->v[0] = 1;
 }
 
-static void fe_add_impl(uint32_t out[10], const uint32_t in1[10], const uint32_t in2[10]) {
-  { const uint32_t x20 = in1[9];
-  { const uint32_t x21 = in1[8];
-  { const uint32_t x19 = in1[7];
-  { const uint32_t x17 = in1[6];
-  { const uint32_t x15 = in1[5];
-  { const uint32_t x13 = in1[4];
-  { const uint32_t x11 = in1[3];
-  { const uint32_t x9 = in1[2];
-  { const uint32_t x7 = in1[1];
-  { const uint32_t x5 = in1[0];
-  { const uint32_t x38 = in2[9];
-  { const uint32_t x39 = in2[8];
-  { const uint32_t x37 = in2[7];
-  { const uint32_t x35 = in2[6];
-  { const uint32_t x33 = in2[5];
-  { const uint32_t x31 = in2[4];
-  { const uint32_t x29 = in2[3];
-  { const uint32_t x27 = in2[2];
-  { const uint32_t x25 = in2[1];
-  { const uint32_t x23 = in2[0];
-  out[0] = (x5 + x23);
-  out[1] = (x7 + x25);
-  out[2] = (x9 + x27);
-  out[3] = (x11 + x29);
-  out[4] = (x13 + x31);
-  out[5] = (x15 + x33);
-  out[6] = (x17 + x35);
-  out[7] = (x19 + x37);
-  out[8] = (x21 + x39);
-  out[9] = (x20 + x38);
-  }}}}}}}}}}}}}}}}}}}}
-}
-
 // h = f + g
 // Can overlap h with f or g.
 static void fe_add(fe_loose *h, const fe *f, const fe *g) {
   assert_fe(f->v);
   assert_fe(g->v);
-  fe_add_impl(h->v, f->v, g->v);
+  fiat_25519_add(h->v, f->v, g->v);
   assert_fe_loose(h->v);
 }
 
-static void fe_sub_impl(uint32_t out[10], const uint32_t in1[10], const uint32_t in2[10]) {
-  { const uint32_t x20 = in1[9];
-  { const uint32_t x21 = in1[8];
-  { const uint32_t x19 = in1[7];
-  { const uint32_t x17 = in1[6];
-  { const uint32_t x15 = in1[5];
-  { const uint32_t x13 = in1[4];
-  { const uint32_t x11 = in1[3];
-  { const uint32_t x9 = in1[2];
-  { const uint32_t x7 = in1[1];
-  { const uint32_t x5 = in1[0];
-  { const uint32_t x38 = in2[9];
-  { const uint32_t x39 = in2[8];
-  { const uint32_t x37 = in2[7];
-  { const uint32_t x35 = in2[6];
-  { const uint32_t x33 = in2[5];
-  { const uint32_t x31 = in2[4];
-  { const uint32_t x29 = in2[3];
-  { const uint32_t x27 = in2[2];
-  { const uint32_t x25 = in2[1];
-  { const uint32_t x23 = in2[0];
-  out[0] = ((0x7ffffda + x5) - x23);
-  out[1] = ((0x3fffffe + x7) - x25);
-  out[2] = ((0x7fffffe + x9) - x27);
-  out[3] = ((0x3fffffe + x11) - x29);
-  out[4] = ((0x7fffffe + x13) - x31);
-  out[5] = ((0x3fffffe + x15) - x33);
-  out[6] = ((0x7fffffe + x17) - x35);
-  out[7] = ((0x3fffffe + x19) - x37);
-  out[8] = ((0x7fffffe + x21) - x39);
-  out[9] = ((0x3fffffe + x20) - x38);
-  }}}}}}}}}}}}}}}}}}}}
-}
-
 // h = f - g
 // Can overlap h with f or g.
 static void fe_sub(fe_loose *h, const fe *f, const fe *g) {
   assert_fe(f->v);
   assert_fe(g->v);
-  fe_sub_impl(h->v, f->v, g->v);
+  fiat_25519_sub(h->v, f->v, g->v);
   assert_fe_loose(h->v);
 }
 
-static void fe_carry_impl(uint32_t out[10], const uint32_t in1[10]) {
-  { const uint32_t x17 = in1[9];
-  { const uint32_t x18 = in1[8];
-  { const uint32_t x16 = in1[7];
-  { const uint32_t x14 = in1[6];
-  { const uint32_t x12 = in1[5];
-  { const uint32_t x10 = in1[4];
-  { const uint32_t x8 = in1[3];
-  { const uint32_t x6 = in1[2];
-  { const uint32_t x4 = in1[1];
-  { const uint32_t x2 = in1[0];
-  { uint32_t x19 = (x2 >> 0x1a);
-  { uint32_t x20 = (x2 & 0x3ffffff);
-  { uint32_t x21 = (x19 + x4);
-  { uint32_t x22 = (x21 >> 0x19);
-  { uint32_t x23 = (x21 & 0x1ffffff);
-  { uint32_t x24 = (x22 + x6);
-  { uint32_t x25 = (x24 >> 0x1a);
-  { uint32_t x26 = (x24 & 0x3ffffff);
-  { uint32_t x27 = (x25 + x8);
-  { uint32_t x28 = (x27 >> 0x19);
-  { uint32_t x29 = (x27 & 0x1ffffff);
-  { uint32_t x30 = (x28 + x10);
-  { uint32_t x31 = (x30 >> 0x1a);
-  { uint32_t x32 = (x30 & 0x3ffffff);
-  { uint32_t x33 = (x31 + x12);
-  { uint32_t x34 = (x33 >> 0x19);
-  { uint32_t x35 = (x33 & 0x1ffffff);
-  { uint32_t x36 = (x34 + x14);
-  { uint32_t x37 = (x36 >> 0x1a);
-  { uint32_t x38 = (x36 & 0x3ffffff);
-  { uint32_t x39 = (x37 + x16);
-  { uint32_t x40 = (x39 >> 0x19);
-  { uint32_t x41 = (x39 & 0x1ffffff);
-  { uint32_t x42 = (x40 + x18);
-  { uint32_t x43 = (x42 >> 0x1a);
-  { uint32_t x44 = (x42 & 0x3ffffff);
-  { uint32_t x45 = (x43 + x17);
-  { uint32_t x46 = (x45 >> 0x19);
-  { uint32_t x47 = (x45 & 0x1ffffff);
-  { uint32_t x48 = (x20 + (0x13 * x46));
-  { uint32_t x49 = (x48 >> 0x1a);
-  { uint32_t x50 = (x48 & 0x3ffffff);
-  { uint32_t x51 = (x49 + x23);
-  { uint32_t x52 = (x51 >> 0x19);
-  { uint32_t x53 = (x51 & 0x1ffffff);
-  out[0] = x50;
-  out[1] = x53;
-  out[2] = (x52 + x26);
-  out[3] = x29;
-  out[4] = x32;
-  out[5] = x35;
-  out[6] = x38;
-  out[7] = x41;
-  out[8] = x44;
-  out[9] = x47;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-}
-
 static void fe_carry(fe *h, const fe_loose* f) {
   assert_fe_loose(f->v);
-  fe_carry_impl(h->v, f->v);
+  fiat_25519_carry(h->v, f->v);
   assert_fe(h->v);
 }
 
-static void fe_mul_impl(uint32_t out[10], const uint32_t in1[10], const uint32_t in2[10]) {
+static void fe_mul_impl(fe_limb_t out[FE_NUM_LIMBS],
+                        const fe_limb_t in1[FE_NUM_LIMBS],
+                        const fe_limb_t in2[FE_NUM_LIMBS]) {
   assert_fe_loose(in1);
   assert_fe_loose(in2);
-  { const uint32_t x20 = in1[9];
-  { const uint32_t x21 = in1[8];
-  { const uint32_t x19 = in1[7];
-  { const uint32_t x17 = in1[6];
-  { const uint32_t x15 = in1[5];
-  { const uint32_t x13 = in1[4];
-  { const uint32_t x11 = in1[3];
-  { const uint32_t x9 = in1[2];
-  { const uint32_t x7 = in1[1];
-  { const uint32_t x5 = in1[0];
-  { const uint32_t x38 = in2[9];
-  { const uint32_t x39 = in2[8];
-  { const uint32_t x37 = in2[7];
-  { const uint32_t x35 = in2[6];
-  { const uint32_t x33 = in2[5];
-  { const uint32_t x31 = in2[4];
-  { const uint32_t x29 = in2[3];
-  { const uint32_t x27 = in2[2];
-  { const uint32_t x25 = in2[1];
-  { const uint32_t x23 = in2[0];
-  { uint64_t x40 = ((uint64_t)x23 * x5);
-  { uint64_t x41 = (((uint64_t)x23 * x7) + ((uint64_t)x25 * x5));
-  { uint64_t x42 = ((((uint64_t)(0x2 * x25) * x7) + ((uint64_t)x23 * x9)) + ((uint64_t)x27 * x5));
-  { uint64_t x43 = (((((uint64_t)x25 * x9) + ((uint64_t)x27 * x7)) + ((uint64_t)x23 * x11)) + ((uint64_t)x29 * x5));
-  { uint64_t x44 = (((((uint64_t)x27 * x9) + (0x2 * (((uint64_t)x25 * x11) + ((uint64_t)x29 * x7)))) + ((uint64_t)x23 * x13)) + ((uint64_t)x31 * x5));
-  { uint64_t x45 = (((((((uint64_t)x27 * x11) + ((uint64_t)x29 * x9)) + ((uint64_t)x25 * x13)) + ((uint64_t)x31 * x7)) + ((uint64_t)x23 * x15)) + ((uint64_t)x33 * x5));
-  { uint64_t x46 = (((((0x2 * ((((uint64_t)x29 * x11) + ((uint64_t)x25 * x15)) + ((uint64_t)x33 * x7))) + ((uint64_t)x27 * x13)) + ((uint64_t)x31 * x9)) + ((uint64_t)x23 * x17)) + ((uint64_t)x35 * x5));
-  { uint64_t x47 = (((((((((uint64_t)x29 * x13) + ((uint64_t)x31 * x11)) + ((uint64_t)x27 * x15)) + ((uint64_t)x33 * x9)) + ((uint64_t)x25 * x17)) + ((uint64_t)x35 * x7)) + ((uint64_t)x23 * x19)) + ((uint64_t)x37 * x5));
-  { uint64_t x48 = (((((((uint64_t)x31 * x13) + (0x2 * (((((uint64_t)x29 * x15) + ((uint64_t)x33 * x11)) + ((uint64_t)x25 * x19)) + ((uint64_t)x37 * x7)))) + ((uint64_t)x27 * x17)) + ((uint64_t)x35 * x9)) + ((uint64_t)x23 * x21)) + ((uint64_t)x39 * x5));
-  { uint64_t x49 = (((((((((((uint64_t)x31 * x15) + ((uint64_t)x33 * x13)) + ((uint64_t)x29 * x17)) + ((uint64_t)x35 * x11)) + ((uint64_t)x27 * x19)) + ((uint64_t)x37 * x9)) + ((uint64_t)x25 * x21)) + ((uint64_t)x39 * x7)) + ((uint64_t)x23 * x20)) + ((uint64_t)x38 * x5));
-  { uint64_t x50 = (((((0x2 * ((((((uint64_t)x33 * x15) + ((uint64_t)x29 * x19)) + ((uint64_t)x37 * x11)) + ((uint64_t)x25 * x20)) + ((uint64_t)x38 * x7))) + ((uint64_t)x31 * x17)) + ((uint64_t)x35 * x13)) + ((uint64_t)x27 * x21)) + ((uint64_t)x39 * x9));
-  { uint64_t x51 = (((((((((uint64_t)x33 * x17) + ((uint64_t)x35 * x15)) + ((uint64_t)x31 * x19)) + ((uint64_t)x37 * x13)) + ((uint64_t)x29 * x21)) + ((uint64_t)x39 * x11)) + ((uint64_t)x27 * x20)) + ((uint64_t)x38 * x9));
-  { uint64_t x52 = (((((uint64_t)x35 * x17) + (0x2 * (((((uint64_t)x33 * x19) + ((uint64_t)x37 * x15)) + ((uint64_t)x29 * x20)) + ((uint64_t)x38 * x11)))) + ((uint64_t)x31 * x21)) + ((uint64_t)x39 * x13));
-  { uint64_t x53 = (((((((uint64_t)x35 * x19) + ((uint64_t)x37 * x17)) + ((uint64_t)x33 * x21)) + ((uint64_t)x39 * x15)) + ((uint64_t)x31 * x20)) + ((uint64_t)x38 * x13));
-  { uint64_t x54 = (((0x2 * ((((uint64_t)x37 * x19) + ((uint64_t)x33 * x20)) + ((uint64_t)x38 * x15))) + ((uint64_t)x35 * x21)) + ((uint64_t)x39 * x17));
-  { uint64_t x55 = (((((uint64_t)x37 * x21) + ((uint64_t)x39 * x19)) + ((uint64_t)x35 * x20)) + ((uint64_t)x38 * x17));
-  { uint64_t x56 = (((uint64_t)x39 * x21) + (0x2 * (((uint64_t)x37 * x20) + ((uint64_t)x38 * x19))));
-  { uint64_t x57 = (((uint64_t)x39 * x20) + ((uint64_t)x38 * x21));
-  { uint64_t x58 = ((uint64_t)(0x2 * x38) * x20);
-  { uint64_t x59 = (x48 + (x58 << 0x4));
-  { uint64_t x60 = (x59 + (x58 << 0x1));
-  { uint64_t x61 = (x60 + x58);
-  { uint64_t x62 = (x47 + (x57 << 0x4));
-  { uint64_t x63 = (x62 + (x57 << 0x1));
-  { uint64_t x64 = (x63 + x57);
-  { uint64_t x65 = (x46 + (x56 << 0x4));
-  { uint64_t x66 = (x65 + (x56 << 0x1));
-  { uint64_t x67 = (x66 + x56);
-  { uint64_t x68 = (x45 + (x55 << 0x4));
-  { uint64_t x69 = (x68 + (x55 << 0x1));
-  { uint64_t x70 = (x69 + x55);
-  { uint64_t x71 = (x44 + (x54 << 0x4));
-  { uint64_t x72 = (x71 + (x54 << 0x1));
-  { uint64_t x73 = (x72 + x54);
-  { uint64_t x74 = (x43 + (x53 << 0x4));
-  { uint64_t x75 = (x74 + (x53 << 0x1));
-  { uint64_t x76 = (x75 + x53);
-  { uint64_t x77 = (x42 + (x52 << 0x4));
-  { uint64_t x78 = (x77 + (x52 << 0x1));
-  { uint64_t x79 = (x78 + x52);
-  { uint64_t x80 = (x41 + (x51 << 0x4));
-  { uint64_t x81 = (x80 + (x51 << 0x1));
-  { uint64_t x82 = (x81 + x51);
-  { uint64_t x83 = (x40 + (x50 << 0x4));
-  { uint64_t x84 = (x83 + (x50 << 0x1));
-  { uint64_t x85 = (x84 + x50);
-  { uint64_t x86 = (x85 >> 0x1a);
-  { uint32_t x87 = ((uint32_t)x85 & 0x3ffffff);
-  { uint64_t x88 = (x86 + x82);
-  { uint64_t x89 = (x88 >> 0x19);
-  { uint32_t x90 = ((uint32_t)x88 & 0x1ffffff);
-  { uint64_t x91 = (x89 + x79);
-  { uint64_t x92 = (x91 >> 0x1a);
-  { uint32_t x93 = ((uint32_t)x91 & 0x3ffffff);
-  { uint64_t x94 = (x92 + x76);
-  { uint64_t x95 = (x94 >> 0x19);
-  { uint32_t x96 = ((uint32_t)x94 & 0x1ffffff);
-  { uint64_t x97 = (x95 + x73);
-  { uint64_t x98 = (x97 >> 0x1a);
-  { uint32_t x99 = ((uint32_t)x97 & 0x3ffffff);
-  { uint64_t x100 = (x98 + x70);
-  { uint64_t x101 = (x100 >> 0x19);
-  { uint32_t x102 = ((uint32_t)x100 & 0x1ffffff);
-  { uint64_t x103 = (x101 + x67);
-  { uint64_t x104 = (x103 >> 0x1a);
-  { uint32_t x105 = ((uint32_t)x103 & 0x3ffffff);
-  { uint64_t x106 = (x104 + x64);
-  { uint64_t x107 = (x106 >> 0x19);
-  { uint32_t x108 = ((uint32_t)x106 & 0x1ffffff);
-  { uint64_t x109 = (x107 + x61);
-  { uint64_t x110 = (x109 >> 0x1a);
-  { uint32_t x111 = ((uint32_t)x109 & 0x3ffffff);
-  { uint64_t x112 = (x110 + x49);
-  { uint64_t x113 = (x112 >> 0x19);
-  { uint32_t x114 = ((uint32_t)x112 & 0x1ffffff);
-  { uint64_t x115 = (x87 + (0x13 * x113));
-  { uint32_t x116 = (uint32_t) (x115 >> 0x1a);
-  { uint32_t x117 = ((uint32_t)x115 & 0x3ffffff);
-  { uint32_t x118 = (x116 + x90);
-  { uint32_t x119 = (x118 >> 0x19);
-  { uint32_t x120 = (x118 & 0x1ffffff);
-  out[0] = x117;
-  out[1] = x120;
-  out[2] = (x119 + x93);
-  out[3] = x96;
-  out[4] = x99;
-  out[5] = x102;
-  out[6] = x105;
-  out[7] = x108;
-  out[8] = x111;
-  out[9] = x114;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+  fiat_25519_carry_mul(out, in1, in2);
   assert_fe(out);
 }
 
@@ -1080,297 +255,42 @@
   fe_mul_impl(h->v, f->v, g->v);
 }
 
-static void fe_sqr_impl(uint32_t out[10], const uint32_t in1[10]) {
-  assert_fe_loose(in1);
-  { const uint32_t x17 = in1[9];
-  { const uint32_t x18 = in1[8];
-  { const uint32_t x16 = in1[7];
-  { const uint32_t x14 = in1[6];
-  { const uint32_t x12 = in1[5];
-  { const uint32_t x10 = in1[4];
-  { const uint32_t x8 = in1[3];
-  { const uint32_t x6 = in1[2];
-  { const uint32_t x4 = in1[1];
-  { const uint32_t x2 = in1[0];
-  { uint64_t x19 = ((uint64_t)x2 * x2);
-  { uint64_t x20 = ((uint64_t)(0x2 * x2) * x4);
-  { uint64_t x21 = (0x2 * (((uint64_t)x4 * x4) + ((uint64_t)x2 * x6)));
-  { uint64_t x22 = (0x2 * (((uint64_t)x4 * x6) + ((uint64_t)x2 * x8)));
-  { uint64_t x23 = ((((uint64_t)x6 * x6) + ((uint64_t)(0x4 * x4) * x8)) + ((uint64_t)(0x2 * x2) * x10));
-  { uint64_t x24 = (0x2 * ((((uint64_t)x6 * x8) + ((uint64_t)x4 * x10)) + ((uint64_t)x2 * x12)));
-  { uint64_t x25 = (0x2 * (((((uint64_t)x8 * x8) + ((uint64_t)x6 * x10)) + ((uint64_t)x2 * x14)) + ((uint64_t)(0x2 * x4) * x12)));
-  { uint64_t x26 = (0x2 * (((((uint64_t)x8 * x10) + ((uint64_t)x6 * x12)) + ((uint64_t)x4 * x14)) + ((uint64_t)x2 * x16)));
-  { uint64_t x27 = (((uint64_t)x10 * x10) + (0x2 * ((((uint64_t)x6 * x14) + ((uint64_t)x2 * x18)) + (0x2 * (((uint64_t)x4 * x16) + ((uint64_t)x8 * x12))))));
-  { uint64_t x28 = (0x2 * ((((((uint64_t)x10 * x12) + ((uint64_t)x8 * x14)) + ((uint64_t)x6 * x16)) + ((uint64_t)x4 * x18)) + ((uint64_t)x2 * x17)));
-  { uint64_t x29 = (0x2 * (((((uint64_t)x12 * x12) + ((uint64_t)x10 * x14)) + ((uint64_t)x6 * x18)) + (0x2 * (((uint64_t)x8 * x16) + ((uint64_t)x4 * x17)))));
-  { uint64_t x30 = (0x2 * (((((uint64_t)x12 * x14) + ((uint64_t)x10 * x16)) + ((uint64_t)x8 * x18)) + ((uint64_t)x6 * x17)));
-  { uint64_t x31 = (((uint64_t)x14 * x14) + (0x2 * (((uint64_t)x10 * x18) + (0x2 * (((uint64_t)x12 * x16) + ((uint64_t)x8 * x17))))));
-  { uint64_t x32 = (0x2 * ((((uint64_t)x14 * x16) + ((uint64_t)x12 * x18)) + ((uint64_t)x10 * x17)));
-  { uint64_t x33 = (0x2 * ((((uint64_t)x16 * x16) + ((uint64_t)x14 * x18)) + ((uint64_t)(0x2 * x12) * x17)));
-  { uint64_t x34 = (0x2 * (((uint64_t)x16 * x18) + ((uint64_t)x14 * x17)));
-  { uint64_t x35 = (((uint64_t)x18 * x18) + ((uint64_t)(0x4 * x16) * x17));
-  { uint64_t x36 = ((uint64_t)(0x2 * x18) * x17);
-  { uint64_t x37 = ((uint64_t)(0x2 * x17) * x17);
-  { uint64_t x38 = (x27 + (x37 << 0x4));
-  { uint64_t x39 = (x38 + (x37 << 0x1));
-  { uint64_t x40 = (x39 + x37);
-  { uint64_t x41 = (x26 + (x36 << 0x4));
-  { uint64_t x42 = (x41 + (x36 << 0x1));
-  { uint64_t x43 = (x42 + x36);
-  { uint64_t x44 = (x25 + (x35 << 0x4));
-  { uint64_t x45 = (x44 + (x35 << 0x1));
-  { uint64_t x46 = (x45 + x35);
-  { uint64_t x47 = (x24 + (x34 << 0x4));
-  { uint64_t x48 = (x47 + (x34 << 0x1));
-  { uint64_t x49 = (x48 + x34);
-  { uint64_t x50 = (x23 + (x33 << 0x4));
-  { uint64_t x51 = (x50 + (x33 << 0x1));
-  { uint64_t x52 = (x51 + x33);
-  { uint64_t x53 = (x22 + (x32 << 0x4));
-  { uint64_t x54 = (x53 + (x32 << 0x1));
-  { uint64_t x55 = (x54 + x32);
-  { uint64_t x56 = (x21 + (x31 << 0x4));
-  { uint64_t x57 = (x56 + (x31 << 0x1));
-  { uint64_t x58 = (x57 + x31);
-  { uint64_t x59 = (x20 + (x30 << 0x4));
-  { uint64_t x60 = (x59 + (x30 << 0x1));
-  { uint64_t x61 = (x60 + x30);
-  { uint64_t x62 = (x19 + (x29 << 0x4));
-  { uint64_t x63 = (x62 + (x29 << 0x1));
-  { uint64_t x64 = (x63 + x29);
-  { uint64_t x65 = (x64 >> 0x1a);
-  { uint32_t x66 = ((uint32_t)x64 & 0x3ffffff);
-  { uint64_t x67 = (x65 + x61);
-  { uint64_t x68 = (x67 >> 0x19);
-  { uint32_t x69 = ((uint32_t)x67 & 0x1ffffff);
-  { uint64_t x70 = (x68 + x58);
-  { uint64_t x71 = (x70 >> 0x1a);
-  { uint32_t x72 = ((uint32_t)x70 & 0x3ffffff);
-  { uint64_t x73 = (x71 + x55);
-  { uint64_t x74 = (x73 >> 0x19);
-  { uint32_t x75 = ((uint32_t)x73 & 0x1ffffff);
-  { uint64_t x76 = (x74 + x52);
-  { uint64_t x77 = (x76 >> 0x1a);
-  { uint32_t x78 = ((uint32_t)x76 & 0x3ffffff);
-  { uint64_t x79 = (x77 + x49);
-  { uint64_t x80 = (x79 >> 0x19);
-  { uint32_t x81 = ((uint32_t)x79 & 0x1ffffff);
-  { uint64_t x82 = (x80 + x46);
-  { uint64_t x83 = (x82 >> 0x1a);
-  { uint32_t x84 = ((uint32_t)x82 & 0x3ffffff);
-  { uint64_t x85 = (x83 + x43);
-  { uint64_t x86 = (x85 >> 0x19);
-  { uint32_t x87 = ((uint32_t)x85 & 0x1ffffff);
-  { uint64_t x88 = (x86 + x40);
-  { uint64_t x89 = (x88 >> 0x1a);
-  { uint32_t x90 = ((uint32_t)x88 & 0x3ffffff);
-  { uint64_t x91 = (x89 + x28);
-  { uint64_t x92 = (x91 >> 0x19);
-  { uint32_t x93 = ((uint32_t)x91 & 0x1ffffff);
-  { uint64_t x94 = (x66 + (0x13 * x92));
-  { uint32_t x95 = (uint32_t) (x94 >> 0x1a);
-  { uint32_t x96 = ((uint32_t)x94 & 0x3ffffff);
-  { uint32_t x97 = (x95 + x69);
-  { uint32_t x98 = (x97 >> 0x19);
-  { uint32_t x99 = (x97 & 0x1ffffff);
-  out[0] = x96;
-  out[1] = x99;
-  out[2] = (x98 + x72);
-  out[3] = x75;
-  out[4] = x78;
-  out[5] = x81;
-  out[6] = x84;
-  out[7] = x87;
-  out[8] = x90;
-  out[9] = x93;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-  assert_fe(out);
-}
-
 static void fe_sq_tl(fe *h, const fe_loose *f) {
-  fe_sqr_impl(h->v, f->v);
+  assert_fe_loose(f->v);
+  fiat_25519_carry_square(h->v, f->v);
+  assert_fe(h->v);
 }
 
 static void fe_sq_tt(fe *h, const fe *f) {
-  fe_sqr_impl(h->v, f->v);
+  assert_fe_loose(f->v);
+  fiat_25519_carry_square(h->v, f->v);
+  assert_fe(h->v);
 }
 
 // Replace (f,g) with (g,f) if b == 1;
 // replace (f,g) with (f,g) if b == 0.
 //
 // Preconditions: b in {0,1}.
-static void fe_cswap(fe *f, fe *g, unsigned int b) {
+static void fe_cswap(fe *f, fe *g, fe_limb_t b) {
   b = 0-b;
-  unsigned i;
-  for (i = 0; i < 10; i++) {
-    uint32_t x = f->v[i] ^ g->v[i];
+  for (unsigned i = 0; i < FE_NUM_LIMBS; i++) {
+    fe_limb_t x = f->v[i] ^ g->v[i];
     x &= b;
     f->v[i] ^= x;
     g->v[i] ^= x;
   }
 }
 
-// NOTE: based on fiat-crypto fe_mul, edited for in2=121666, 0, 0..
-static void fe_mul_121666_impl(uint32_t out[10], const uint32_t in1[10]) {
-  { const uint32_t x20 = in1[9];
-  { const uint32_t x21 = in1[8];
-  { const uint32_t x19 = in1[7];
-  { const uint32_t x17 = in1[6];
-  { const uint32_t x15 = in1[5];
-  { const uint32_t x13 = in1[4];
-  { const uint32_t x11 = in1[3];
-  { const uint32_t x9 = in1[2];
-  { const uint32_t x7 = in1[1];
-  { const uint32_t x5 = in1[0];
-  { const uint32_t x38 = 0;
-  { const uint32_t x39 = 0;
-  { const uint32_t x37 = 0;
-  { const uint32_t x35 = 0;
-  { const uint32_t x33 = 0;
-  { const uint32_t x31 = 0;
-  { const uint32_t x29 = 0;
-  { const uint32_t x27 = 0;
-  { const uint32_t x25 = 0;
-  { const uint32_t x23 = 121666;
-  { uint64_t x40 = ((uint64_t)x23 * x5);
-  { uint64_t x41 = (((uint64_t)x23 * x7) + ((uint64_t)x25 * x5));
-  { uint64_t x42 = ((((uint64_t)(0x2 * x25) * x7) + ((uint64_t)x23 * x9)) + ((uint64_t)x27 * x5));
-  { uint64_t x43 = (((((uint64_t)x25 * x9) + ((uint64_t)x27 * x7)) + ((uint64_t)x23 * x11)) + ((uint64_t)x29 * x5));
-  { uint64_t x44 = (((((uint64_t)x27 * x9) + (0x2 * (((uint64_t)x25 * x11) + ((uint64_t)x29 * x7)))) + ((uint64_t)x23 * x13)) + ((uint64_t)x31 * x5));
-  { uint64_t x45 = (((((((uint64_t)x27 * x11) + ((uint64_t)x29 * x9)) + ((uint64_t)x25 * x13)) + ((uint64_t)x31 * x7)) + ((uint64_t)x23 * x15)) + ((uint64_t)x33 * x5));
-  { uint64_t x46 = (((((0x2 * ((((uint64_t)x29 * x11) + ((uint64_t)x25 * x15)) + ((uint64_t)x33 * x7))) + ((uint64_t)x27 * x13)) + ((uint64_t)x31 * x9)) + ((uint64_t)x23 * x17)) + ((uint64_t)x35 * x5));
-  { uint64_t x47 = (((((((((uint64_t)x29 * x13) + ((uint64_t)x31 * x11)) + ((uint64_t)x27 * x15)) + ((uint64_t)x33 * x9)) + ((uint64_t)x25 * x17)) + ((uint64_t)x35 * x7)) + ((uint64_t)x23 * x19)) + ((uint64_t)x37 * x5));
-  { uint64_t x48 = (((((((uint64_t)x31 * x13) + (0x2 * (((((uint64_t)x29 * x15) + ((uint64_t)x33 * x11)) + ((uint64_t)x25 * x19)) + ((uint64_t)x37 * x7)))) + ((uint64_t)x27 * x17)) + ((uint64_t)x35 * x9)) + ((uint64_t)x23 * x21)) + ((uint64_t)x39 * x5));
-  { uint64_t x49 = (((((((((((uint64_t)x31 * x15) + ((uint64_t)x33 * x13)) + ((uint64_t)x29 * x17)) + ((uint64_t)x35 * x11)) + ((uint64_t)x27 * x19)) + ((uint64_t)x37 * x9)) + ((uint64_t)x25 * x21)) + ((uint64_t)x39 * x7)) + ((uint64_t)x23 * x20)) + ((uint64_t)x38 * x5));
-  { uint64_t x50 = (((((0x2 * ((((((uint64_t)x33 * x15) + ((uint64_t)x29 * x19)) + ((uint64_t)x37 * x11)) + ((uint64_t)x25 * x20)) + ((uint64_t)x38 * x7))) + ((uint64_t)x31 * x17)) + ((uint64_t)x35 * x13)) + ((uint64_t)x27 * x21)) + ((uint64_t)x39 * x9));
-  { uint64_t x51 = (((((((((uint64_t)x33 * x17) + ((uint64_t)x35 * x15)) + ((uint64_t)x31 * x19)) + ((uint64_t)x37 * x13)) + ((uint64_t)x29 * x21)) + ((uint64_t)x39 * x11)) + ((uint64_t)x27 * x20)) + ((uint64_t)x38 * x9));
-  { uint64_t x52 = (((((uint64_t)x35 * x17) + (0x2 * (((((uint64_t)x33 * x19) + ((uint64_t)x37 * x15)) + ((uint64_t)x29 * x20)) + ((uint64_t)x38 * x11)))) + ((uint64_t)x31 * x21)) + ((uint64_t)x39 * x13));
-  { uint64_t x53 = (((((((uint64_t)x35 * x19) + ((uint64_t)x37 * x17)) + ((uint64_t)x33 * x21)) + ((uint64_t)x39 * x15)) + ((uint64_t)x31 * x20)) + ((uint64_t)x38 * x13));
-  { uint64_t x54 = (((0x2 * ((((uint64_t)x37 * x19) + ((uint64_t)x33 * x20)) + ((uint64_t)x38 * x15))) + ((uint64_t)x35 * x21)) + ((uint64_t)x39 * x17));
-  { uint64_t x55 = (((((uint64_t)x37 * x21) + ((uint64_t)x39 * x19)) + ((uint64_t)x35 * x20)) + ((uint64_t)x38 * x17));
-  { uint64_t x56 = (((uint64_t)x39 * x21) + (0x2 * (((uint64_t)x37 * x20) + ((uint64_t)x38 * x19))));
-  { uint64_t x57 = (((uint64_t)x39 * x20) + ((uint64_t)x38 * x21));
-  { uint64_t x58 = ((uint64_t)(0x2 * x38) * x20);
-  { uint64_t x59 = (x48 + (x58 << 0x4));
-  { uint64_t x60 = (x59 + (x58 << 0x1));
-  { uint64_t x61 = (x60 + x58);
-  { uint64_t x62 = (x47 + (x57 << 0x4));
-  { uint64_t x63 = (x62 + (x57 << 0x1));
-  { uint64_t x64 = (x63 + x57);
-  { uint64_t x65 = (x46 + (x56 << 0x4));
-  { uint64_t x66 = (x65 + (x56 << 0x1));
-  { uint64_t x67 = (x66 + x56);
-  { uint64_t x68 = (x45 + (x55 << 0x4));
-  { uint64_t x69 = (x68 + (x55 << 0x1));
-  { uint64_t x70 = (x69 + x55);
-  { uint64_t x71 = (x44 + (x54 << 0x4));
-  { uint64_t x72 = (x71 + (x54 << 0x1));
-  { uint64_t x73 = (x72 + x54);
-  { uint64_t x74 = (x43 + (x53 << 0x4));
-  { uint64_t x75 = (x74 + (x53 << 0x1));
-  { uint64_t x76 = (x75 + x53);
-  { uint64_t x77 = (x42 + (x52 << 0x4));
-  { uint64_t x78 = (x77 + (x52 << 0x1));
-  { uint64_t x79 = (x78 + x52);
-  { uint64_t x80 = (x41 + (x51 << 0x4));
-  { uint64_t x81 = (x80 + (x51 << 0x1));
-  { uint64_t x82 = (x81 + x51);
-  { uint64_t x83 = (x40 + (x50 << 0x4));
-  { uint64_t x84 = (x83 + (x50 << 0x1));
-  { uint64_t x85 = (x84 + x50);
-  { uint64_t x86 = (x85 >> 0x1a);
-  { uint32_t x87 = ((uint32_t)x85 & 0x3ffffff);
-  { uint64_t x88 = (x86 + x82);
-  { uint64_t x89 = (x88 >> 0x19);
-  { uint32_t x90 = ((uint32_t)x88 & 0x1ffffff);
-  { uint64_t x91 = (x89 + x79);
-  { uint64_t x92 = (x91 >> 0x1a);
-  { uint32_t x93 = ((uint32_t)x91 & 0x3ffffff);
-  { uint64_t x94 = (x92 + x76);
-  { uint64_t x95 = (x94 >> 0x19);
-  { uint32_t x96 = ((uint32_t)x94 & 0x1ffffff);
-  { uint64_t x97 = (x95 + x73);
-  { uint64_t x98 = (x97 >> 0x1a);
-  { uint32_t x99 = ((uint32_t)x97 & 0x3ffffff);
-  { uint64_t x100 = (x98 + x70);
-  { uint64_t x101 = (x100 >> 0x19);
-  { uint32_t x102 = ((uint32_t)x100 & 0x1ffffff);
-  { uint64_t x103 = (x101 + x67);
-  { uint64_t x104 = (x103 >> 0x1a);
-  { uint32_t x105 = ((uint32_t)x103 & 0x3ffffff);
-  { uint64_t x106 = (x104 + x64);
-  { uint64_t x107 = (x106 >> 0x19);
-  { uint32_t x108 = ((uint32_t)x106 & 0x1ffffff);
-  { uint64_t x109 = (x107 + x61);
-  { uint64_t x110 = (x109 >> 0x1a);
-  { uint32_t x111 = ((uint32_t)x109 & 0x3ffffff);
-  { uint64_t x112 = (x110 + x49);
-  { uint64_t x113 = (x112 >> 0x19);
-  { uint32_t x114 = ((uint32_t)x112 & 0x1ffffff);
-  { uint64_t x115 = (x87 + (0x13 * x113));
-  { uint32_t x116 = (uint32_t) (x115 >> 0x1a);
-  { uint32_t x117 = ((uint32_t)x115 & 0x3ffffff);
-  { uint32_t x118 = (x116 + x90);
-  { uint32_t x119 = (x118 >> 0x19);
-  { uint32_t x120 = (x118 & 0x1ffffff);
-  out[0] = x117;
-  out[1] = x120;
-  out[2] = (x119 + x93);
-  out[3] = x96;
-  out[4] = x99;
-  out[5] = x102;
-  out[6] = x105;
-  out[7] = x108;
-  out[8] = x111;
-  out[9] = x114;
-  }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
-}
-
 static void fe_mul121666(fe *h, const fe_loose *f) {
   assert_fe_loose(f->v);
-  fe_mul_121666_impl(h->v, f->v);
+  fiat_25519_carry_scmul_121666(h->v, f->v);
   assert_fe(h->v);
 }
 
-// Adapted from Fiat-synthesized |fe_sub_impl| with |out| = 0.
-static void fe_neg_impl(uint32_t out[10], const uint32_t in2[10]) {
-  { const uint32_t x20 = 0;
-  { const uint32_t x21 = 0;
-  { const uint32_t x19 = 0;
-  { const uint32_t x17 = 0;
-  { const uint32_t x15 = 0;
-  { const uint32_t x13 = 0;
-  { const uint32_t x11 = 0;
-  { const uint32_t x9 = 0;
-  { const uint32_t x7 = 0;
-  { const uint32_t x5 = 0;
-  { const uint32_t x38 = in2[9];
-  { const uint32_t x39 = in2[8];
-  { const uint32_t x37 = in2[7];
-  { const uint32_t x35 = in2[6];
-  { const uint32_t x33 = in2[5];
-  { const uint32_t x31 = in2[4];
-  { const uint32_t x29 = in2[3];
-  { const uint32_t x27 = in2[2];
-  { const uint32_t x25 = in2[1];
-  { const uint32_t x23 = in2[0];
-  out[0] = ((0x7ffffda + x5) - x23);
-  out[1] = ((0x3fffffe + x7) - x25);
-  out[2] = ((0x7fffffe + x9) - x27);
-  out[3] = ((0x3fffffe + x11) - x29);
-  out[4] = ((0x7fffffe + x13) - x31);
-  out[5] = ((0x3fffffe + x15) - x33);
-  out[6] = ((0x7fffffe + x17) - x35);
-  out[7] = ((0x3fffffe + x19) - x37);
-  out[8] = ((0x7fffffe + x21) - x39);
-  out[9] = ((0x3fffffe + x20) - x38);
-  }}}}}}}}}}}}}}}}}}}}
-}
-
 // h = -f
 static void fe_neg(fe_loose *h, const fe *f) {
   assert_fe(f->v);
-  fe_neg_impl(h->v, f->v);
+  fiat_25519_opp(h->v, f->v);
   assert_fe_loose(h->v);
 }
 
@@ -1378,18 +298,22 @@
 // replace (f,g) with (f,g) if b == 0.
 //
 // Preconditions: b in {0,1}.
-static void fe_cmov(fe_loose *f, const fe_loose *g, unsigned b) {
+static void fe_cmov(fe_loose *f, const fe_loose *g, fe_limb_t b) {
+  // Silence an unused function warning. |fiat_25519_selectznz| isn't quite the
+  // calling convention the rest of this code wants, so implement it by hand.
+  //
+  // TODO(davidben): Switch to fiat's calling convention, or ask fiat to emit a
+  // different one.
+  (void)fiat_25519_selectznz;
+
   b = 0-b;
-  unsigned i;
-  for (i = 0; i < 10; i++) {
-    uint32_t x = f->v[i] ^ g->v[i];
+  for (unsigned i = 0; i < FE_NUM_LIMBS; i++) {
+    fe_limb_t x = f->v[i] ^ g->v[i];
     x &= b;
     f->v[i] ^= x;
   }
 }
 
-#endif  // BORINGSSL_CURVE25519_64BIT
-
 // h = f
 static void fe_copy(fe *h, const fe *f) {
   OPENSSL_memmove(h, f, sizeof(fe));
@@ -1813,10 +737,12 @@
 
   unsigned i;
   for (i = 0; i < 15; i++) {
+    // The precomputed table is assumed to already clear the top bit, so
+    // |fe_frombytes_strict| may be used directly.
     const uint8_t *bytes = &precomp_table[i*(2 * 32)];
     fe x, y;
-    fe_frombytes(&x, bytes);
-    fe_frombytes(&y, bytes + 32);
+    fe_frombytes_strict(&x, bytes);
+    fe_frombytes_strict(&y, bytes + 32);
 
     ge_precomp *out = &multiples[i];
     fe_add(&out->yplusx, &y, &x);
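
Note on the |fe_cmov| hunk above: the hand-written conditional move turns the bit b into an all-zeros or all-ones mask with 0-b, then XORs the masked difference into f. The following is a minimal standalone sketch of that idea, not the code from this change. The names cmov_limbs and NUM_LIMBS are hypothetical stand-ins, and it assumes ten uint32_t limbs as in the 32-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FE_NUM_LIMBS in the 32-bit build. */
#define NUM_LIMBS 10

/*
 * Replace f with g if b == 1; leave f unchanged if b == 0.
 * Precondition: b is 0 or 1.
 *
 * The loop body has no branch on b: the choice is made by masking.
 */
static void cmov_limbs(uint32_t f[NUM_LIMBS], const uint32_t g[NUM_LIMBS],
                       uint32_t b) {
  assert(b <= 1);
  /* Unsigned wraparound: 0 - 0 == 0x00000000, 0 - 1 == 0xffffffff. */
  uint32_t mask = 0 - b;
  for (unsigned i = 0; i < NUM_LIMBS; i++) {
    uint32_t x = f[i] ^ g[i]; /* bits where f and g differ */
    x &= mask;                /* keep them only when b == 1 */
    f[i] ^= x;                /* f ^ (f ^ g) == g, f ^ 0 == f */
  }
}
```

Calling cmov_limbs(f, g, 1) overwrites f with g, and cmov_limbs(f, g, 0) leaves f as it was. Writing the select this way keeps the in-place (f, g, b) calling convention, whereas |fiat_25519_selectznz| takes a separate output array and two inputs.
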
diff --git a/third_party/fiat/curve25519_32.c b/third_party/fiat/curve25519_32.c
new file mode 100644
index 0000000..820a5c9
--- /dev/null
+++ b/third_party/fiat/curve25519_32.c
@@ -0,0 +1,905 @@
+/* Autogenerated */
+/* curve description: 25519 */
+/* requested operations: carry_mul, carry_square, carry_scmul121666, carry, add, sub, opp, selectznz, to_bytes, from_bytes */
+/* n = 10 (from "10") */
+/* s = 0x8000000000000000000000000000000000000000000000000000000000000000 (from "2^255") */
+/* c = [(1, 19)] (from "1,19") */
+/* machine_wordsize = 32 (from "32") */
+
+#include <stdint.h>
+typedef unsigned char fiat_25519_uint1;
+typedef signed char fiat_25519_int1;
+
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x3ffffff]
+ *   arg3: [0x0 ~> 0x3ffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x3ffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_addcarryx_u26(uint32_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  uint32_t x1 = ((arg1 + arg2) + arg3);
+  uint32_t x2 = (x1 & UINT32_C(0x3ffffff));
+  fiat_25519_uint1 x3 = (fiat_25519_uint1)(x1 >> 26);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x3ffffff]
+ *   arg3: [0x0 ~> 0x3ffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x3ffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_subborrowx_u26(uint32_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  int32_t x1 = ((int32_t)(arg2 - arg1) - (int32_t)arg3);
+  fiat_25519_int1 x2 = (fiat_25519_int1)(x1 >> 26);
+  uint32_t x3 = (x1 & UINT32_C(0x3ffffff));
+  *out1 = x3;
+  *out2 = (fiat_25519_uint1)(0x0 - x2);
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x1ffffff]
+ *   arg3: [0x0 ~> 0x1ffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x1ffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_addcarryx_u25(uint32_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  uint32_t x1 = ((arg1 + arg2) + arg3);
+  uint32_t x2 = (x1 & UINT32_C(0x1ffffff));
+  fiat_25519_uint1 x3 = (fiat_25519_uint1)(x1 >> 25);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x1ffffff]
+ *   arg3: [0x0 ~> 0x1ffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x1ffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_subborrowx_u25(uint32_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  int32_t x1 = ((int32_t)(arg2 - arg1) - (int32_t)arg3);
+  fiat_25519_int1 x2 = (fiat_25519_int1)(x1 >> 25);
+  uint32_t x3 = (x1 & UINT32_C(0x1ffffff));
+  *out1 = x3;
+  *out2 = (fiat_25519_uint1)(0x0 - x2);
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffff]
+ *   arg3: [0x0 ~> 0xffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ */
+static void fiat_25519_cmovznz_u32(uint32_t* out1, fiat_25519_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  fiat_25519_uint1 x1 = (!(!arg1));
+  uint32_t x2 = ((fiat_25519_int1)(0x0 - x1) & UINT32_C(0xffffffff));
+  uint32_t x3 = ((x2 & arg3) | ((~x2) & arg2));
+  *out1 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ *   arg2: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ */
+static void fiat_25519_carry_mul(uint32_t out1[10], const uint32_t arg1[10], const uint32_t arg2[10]) {
+  uint64_t x1 = ((uint64_t)(arg1[9]) * ((arg2[9]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x2 = ((uint64_t)(arg1[9]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x3 = ((uint64_t)(arg1[9]) * ((arg2[7]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x4 = ((uint64_t)(arg1[9]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x5 = ((uint64_t)(arg1[9]) * ((arg2[5]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x6 = ((uint64_t)(arg1[9]) * ((arg2[4]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x7 = ((uint64_t)(arg1[9]) * ((arg2[3]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x8 = ((uint64_t)(arg1[9]) * ((arg2[2]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x9 = ((uint64_t)(arg1[9]) * ((arg2[1]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x10 = ((uint64_t)(arg1[8]) * ((arg2[9]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x11 = ((uint64_t)(arg1[8]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x12 = ((uint64_t)(arg1[8]) * ((arg2[7]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x13 = ((uint64_t)(arg1[8]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x14 = ((uint64_t)(arg1[8]) * ((arg2[5]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x15 = ((uint64_t)(arg1[8]) * ((arg2[4]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x16 = ((uint64_t)(arg1[8]) * ((arg2[3]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x17 = ((uint64_t)(arg1[8]) * ((arg2[2]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x18 = ((uint64_t)(arg1[7]) * ((arg2[9]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x19 = ((uint64_t)(arg1[7]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x20 = ((uint64_t)(arg1[7]) * ((arg2[7]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x21 = ((uint64_t)(arg1[7]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x22 = ((uint64_t)(arg1[7]) * ((arg2[5]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x23 = ((uint64_t)(arg1[7]) * ((arg2[4]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x24 = ((uint64_t)(arg1[7]) * ((arg2[3]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x25 = ((uint64_t)(arg1[6]) * ((arg2[9]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x26 = ((uint64_t)(arg1[6]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x27 = ((uint64_t)(arg1[6]) * ((arg2[7]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x28 = ((uint64_t)(arg1[6]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x29 = ((uint64_t)(arg1[6]) * ((arg2[5]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x30 = ((uint64_t)(arg1[6]) * ((arg2[4]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x31 = ((uint64_t)(arg1[5]) * ((arg2[9]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x32 = ((uint64_t)(arg1[5]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x33 = ((uint64_t)(arg1[5]) * ((arg2[7]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x34 = ((uint64_t)(arg1[5]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x35 = ((uint64_t)(arg1[5]) * ((arg2[5]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x36 = ((uint64_t)(arg1[4]) * ((arg2[9]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x37 = ((uint64_t)(arg1[4]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x38 = ((uint64_t)(arg1[4]) * ((arg2[7]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x39 = ((uint64_t)(arg1[4]) * ((arg2[6]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x40 = ((uint64_t)(arg1[3]) * ((arg2[9]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x41 = ((uint64_t)(arg1[3]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x42 = ((uint64_t)(arg1[3]) * ((arg2[7]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x43 = ((uint64_t)(arg1[2]) * ((arg2[9]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x44 = ((uint64_t)(arg1[2]) * ((arg2[8]) * (uint32_t)UINT8_C(0x13)));
+  uint64_t x45 = ((uint64_t)(arg1[1]) * ((arg2[9]) * ((uint32_t)0x2 * UINT8_C(0x13))));
+  uint64_t x46 = ((uint64_t)(arg1[9]) * (arg2[0]));
+  uint64_t x47 = ((uint64_t)(arg1[8]) * (arg2[1]));
+  uint64_t x48 = ((uint64_t)(arg1[8]) * (arg2[0]));
+  uint64_t x49 = ((uint64_t)(arg1[7]) * (arg2[2]));
+  uint64_t x50 = ((uint64_t)(arg1[7]) * ((arg2[1]) * (uint32_t)0x2));
+  uint64_t x51 = ((uint64_t)(arg1[7]) * (arg2[0]));
+  uint64_t x52 = ((uint64_t)(arg1[6]) * (arg2[3]));
+  uint64_t x53 = ((uint64_t)(arg1[6]) * (arg2[2]));
+  uint64_t x54 = ((uint64_t)(arg1[6]) * (arg2[1]));
+  uint64_t x55 = ((uint64_t)(arg1[6]) * (arg2[0]));
+  uint64_t x56 = ((uint64_t)(arg1[5]) * (arg2[4]));
+  uint64_t x57 = ((uint64_t)(arg1[5]) * ((arg2[3]) * (uint32_t)0x2));
+  uint64_t x58 = ((uint64_t)(arg1[5]) * (arg2[2]));
+  uint64_t x59 = ((uint64_t)(arg1[5]) * ((arg2[1]) * (uint32_t)0x2));
+  uint64_t x60 = ((uint64_t)(arg1[5]) * (arg2[0]));
+  uint64_t x61 = ((uint64_t)(arg1[4]) * (arg2[5]));
+  uint64_t x62 = ((uint64_t)(arg1[4]) * (arg2[4]));
+  uint64_t x63 = ((uint64_t)(arg1[4]) * (arg2[3]));
+  uint64_t x64 = ((uint64_t)(arg1[4]) * (arg2[2]));
+  uint64_t x65 = ((uint64_t)(arg1[4]) * (arg2[1]));
+  uint64_t x66 = ((uint64_t)(arg1[4]) * (arg2[0]));
+  uint64_t x67 = ((uint64_t)(arg1[3]) * (arg2[6]));
+  uint64_t x68 = ((uint64_t)(arg1[3]) * ((arg2[5]) * (uint32_t)0x2));
+  uint64_t x69 = ((uint64_t)(arg1[3]) * (arg2[4]));
+  uint64_t x70 = ((uint64_t)(arg1[3]) * ((arg2[3]) * (uint32_t)0x2));
+  uint64_t x71 = ((uint64_t)(arg1[3]) * (arg2[2]));
+  uint64_t x72 = ((uint64_t)(arg1[3]) * ((arg2[1]) * (uint32_t)0x2));
+  uint64_t x73 = ((uint64_t)(arg1[3]) * (arg2[0]));
+  uint64_t x74 = ((uint64_t)(arg1[2]) * (arg2[7]));
+  uint64_t x75 = ((uint64_t)(arg1[2]) * (arg2[6]));
+  uint64_t x76 = ((uint64_t)(arg1[2]) * (arg2[5]));
+  uint64_t x77 = ((uint64_t)(arg1[2]) * (arg2[4]));
+  uint64_t x78 = ((uint64_t)(arg1[2]) * (arg2[3]));
+  uint64_t x79 = ((uint64_t)(arg1[2]) * (arg2[2]));
+  uint64_t x80 = ((uint64_t)(arg1[2]) * (arg2[1]));
+  uint64_t x81 = ((uint64_t)(arg1[2]) * (arg2[0]));
+  uint64_t x82 = ((uint64_t)(arg1[1]) * (arg2[8]));
+  uint64_t x83 = ((uint64_t)(arg1[1]) * ((arg2[7]) * (uint32_t)0x2));
+  uint64_t x84 = ((uint64_t)(arg1[1]) * (arg2[6]));
+  uint64_t x85 = ((uint64_t)(arg1[1]) * ((arg2[5]) * (uint32_t)0x2));
+  uint64_t x86 = ((uint64_t)(arg1[1]) * (arg2[4]));
+  uint64_t x87 = ((uint64_t)(arg1[1]) * ((arg2[3]) * (uint32_t)0x2));
+  uint64_t x88 = ((uint64_t)(arg1[1]) * (arg2[2]));
+  uint64_t x89 = ((uint64_t)(arg1[1]) * ((arg2[1]) * (uint32_t)0x2));
+  uint64_t x90 = ((uint64_t)(arg1[1]) * (arg2[0]));
+  uint64_t x91 = ((uint64_t)(arg1[0]) * (arg2[9]));
+  uint64_t x92 = ((uint64_t)(arg1[0]) * (arg2[8]));
+  uint64_t x93 = ((uint64_t)(arg1[0]) * (arg2[7]));
+  uint64_t x94 = ((uint64_t)(arg1[0]) * (arg2[6]));
+  uint64_t x95 = ((uint64_t)(arg1[0]) * (arg2[5]));
+  uint64_t x96 = ((uint64_t)(arg1[0]) * (arg2[4]));
+  uint64_t x97 = ((uint64_t)(arg1[0]) * (arg2[3]));
+  uint64_t x98 = ((uint64_t)(arg1[0]) * (arg2[2]));
+  uint64_t x99 = ((uint64_t)(arg1[0]) * (arg2[1]));
+  uint64_t x100 = ((uint64_t)(arg1[0]) * (arg2[0]));
+  uint64_t x101 = (x100 + (x45 + (x44 + (x42 + (x39 + (x35 + (x30 + (x24 + (x17 + x9)))))))));
+  uint64_t x102 = (x101 >> 26);
+  uint32_t x103 = (uint32_t)(x101 & UINT32_C(0x3ffffff));
+  uint64_t x104 = (x91 + (x82 + (x74 + (x67 + (x61 + (x56 + (x52 + (x49 + (x47 + x46)))))))));
+  uint64_t x105 = (x92 + (x83 + (x75 + (x68 + (x62 + (x57 + (x53 + (x50 + (x48 + x1)))))))));
+  uint64_t x106 = (x93 + (x84 + (x76 + (x69 + (x63 + (x58 + (x54 + (x51 + (x10 + x2)))))))));
+  uint64_t x107 = (x94 + (x85 + (x77 + (x70 + (x64 + (x59 + (x55 + (x18 + (x11 + x3)))))))));
+  uint64_t x108 = (x95 + (x86 + (x78 + (x71 + (x65 + (x60 + (x25 + (x19 + (x12 + x4)))))))));
+  uint64_t x109 = (x96 + (x87 + (x79 + (x72 + (x66 + (x31 + (x26 + (x20 + (x13 + x5)))))))));
+  uint64_t x110 = (x97 + (x88 + (x80 + (x73 + (x36 + (x32 + (x27 + (x21 + (x14 + x6)))))))));
+  uint64_t x111 = (x98 + (x89 + (x81 + (x40 + (x37 + (x33 + (x28 + (x22 + (x15 + x7)))))))));
+  uint64_t x112 = (x99 + (x90 + (x43 + (x41 + (x38 + (x34 + (x29 + (x23 + (x16 + x8)))))))));
+  uint64_t x113 = (x102 + x112);
+  uint64_t x114 = (x113 >> 25);
+  uint32_t x115 = (uint32_t)(x113 & UINT32_C(0x1ffffff));
+  uint64_t x116 = (x114 + x111);
+  uint64_t x117 = (x116 >> 26);
+  uint32_t x118 = (uint32_t)(x116 & UINT32_C(0x3ffffff));
+  uint64_t x119 = (x117 + x110);
+  uint64_t x120 = (x119 >> 25);
+  uint32_t x121 = (uint32_t)(x119 & UINT32_C(0x1ffffff));
+  uint64_t x122 = (x120 + x109);
+  uint64_t x123 = (x122 >> 26);
+  uint32_t x124 = (uint32_t)(x122 & UINT32_C(0x3ffffff));
+  uint64_t x125 = (x123 + x108);
+  uint64_t x126 = (x125 >> 25);
+  uint32_t x127 = (uint32_t)(x125 & UINT32_C(0x1ffffff));
+  uint64_t x128 = (x126 + x107);
+  uint64_t x129 = (x128 >> 26);
+  uint32_t x130 = (uint32_t)(x128 & UINT32_C(0x3ffffff));
+  uint64_t x131 = (x129 + x106);
+  uint64_t x132 = (x131 >> 25);
+  uint32_t x133 = (uint32_t)(x131 & UINT32_C(0x1ffffff));
+  uint64_t x134 = (x132 + x105);
+  uint64_t x135 = (x134 >> 26);
+  uint32_t x136 = (uint32_t)(x134 & UINT32_C(0x3ffffff));
+  uint64_t x137 = (x135 + x104);
+  uint64_t x138 = (x137 >> 25);
+  uint32_t x139 = (uint32_t)(x137 & UINT32_C(0x1ffffff));
+  uint64_t x140 = (x138 * (uint64_t)UINT8_C(0x13));
+  uint64_t x141 = (x103 + x140);
+  uint32_t x142 = (uint32_t)(x141 >> 26);
+  uint32_t x143 = (uint32_t)(x141 & UINT32_C(0x3ffffff));
+  uint32_t x144 = (x142 + x115);
+  uint32_t x145 = (x144 >> 25);
+  uint32_t x146 = (x144 & UINT32_C(0x1ffffff));
+  uint32_t x147 = (x145 + x118);
+  out1[0] = x143;
+  out1[1] = x146;
+  out1[2] = x147;
+  out1[3] = x121;
+  out1[4] = x124;
+  out1[5] = x127;
+  out1[6] = x130;
+  out1[7] = x133;
+  out1[8] = x136;
+  out1[9] = x139;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ */
+static void fiat_25519_carry_square(uint32_t out1[10], const uint32_t arg1[10]) {
+  uint32_t x1 = ((arg1[9]) * (uint32_t)UINT8_C(0x13));
+  uint32_t x2 = (x1 * (uint32_t)0x2);
+  uint32_t x3 = ((arg1[9]) * (uint32_t)0x2);
+  uint32_t x4 = ((arg1[8]) * (uint32_t)UINT8_C(0x13));
+  uint64_t x5 = (x4 * (uint64_t)0x2);
+  uint32_t x6 = ((arg1[8]) * (uint32_t)0x2);
+  uint32_t x7 = ((arg1[7]) * (uint32_t)UINT8_C(0x13));
+  uint32_t x8 = (x7 * (uint32_t)0x2);
+  uint32_t x9 = ((arg1[7]) * (uint32_t)0x2);
+  uint32_t x10 = ((arg1[6]) * (uint32_t)UINT8_C(0x13));
+  uint64_t x11 = (x10 * (uint64_t)0x2);
+  uint32_t x12 = ((arg1[6]) * (uint32_t)0x2);
+  uint32_t x13 = ((arg1[5]) * (uint32_t)UINT8_C(0x13));
+  uint32_t x14 = ((arg1[5]) * (uint32_t)0x2);
+  uint32_t x15 = ((arg1[4]) * (uint32_t)0x2);
+  uint32_t x16 = ((arg1[3]) * (uint32_t)0x2);
+  uint32_t x17 = ((arg1[2]) * (uint32_t)0x2);
+  uint32_t x18 = ((arg1[1]) * (uint32_t)0x2);
+  uint64_t x19 = ((uint64_t)(arg1[9]) * (x1 * (uint32_t)0x2));
+  uint64_t x20 = ((uint64_t)(arg1[8]) * x2);
+  uint64_t x21 = ((uint64_t)(arg1[8]) * x4);
+  uint64_t x22 = ((arg1[7]) * (x2 * (uint64_t)0x2));
+  uint64_t x23 = ((arg1[7]) * x5);
+  uint64_t x24 = ((uint64_t)(arg1[7]) * (x7 * (uint32_t)0x2));
+  uint64_t x25 = ((uint64_t)(arg1[6]) * x2);
+  uint64_t x26 = ((arg1[6]) * x5);
+  uint64_t x27 = ((uint64_t)(arg1[6]) * x8);
+  uint64_t x28 = ((uint64_t)(arg1[6]) * x10);
+  uint64_t x29 = ((arg1[5]) * (x2 * (uint64_t)0x2));
+  uint64_t x30 = ((arg1[5]) * x5);
+  uint64_t x31 = ((arg1[5]) * (x8 * (uint64_t)0x2));
+  uint64_t x32 = ((arg1[5]) * x11);
+  uint64_t x33 = ((uint64_t)(arg1[5]) * (x13 * (uint32_t)0x2));
+  uint64_t x34 = ((uint64_t)(arg1[4]) * x2);
+  uint64_t x35 = ((arg1[4]) * x5);
+  uint64_t x36 = ((uint64_t)(arg1[4]) * x8);
+  uint64_t x37 = ((arg1[4]) * x11);
+  uint64_t x38 = ((uint64_t)(arg1[4]) * x14);
+  uint64_t x39 = ((uint64_t)(arg1[4]) * (arg1[4]));
+  uint64_t x40 = ((arg1[3]) * (x2 * (uint64_t)0x2));
+  uint64_t x41 = ((arg1[3]) * x5);
+  uint64_t x42 = ((arg1[3]) * (x8 * (uint64_t)0x2));
+  uint64_t x43 = ((uint64_t)(arg1[3]) * x12);
+  uint64_t x44 = ((uint64_t)(arg1[3]) * (x14 * (uint32_t)0x2));
+  uint64_t x45 = ((uint64_t)(arg1[3]) * x15);
+  uint64_t x46 = ((uint64_t)(arg1[3]) * ((arg1[3]) * (uint32_t)0x2));
+  uint64_t x47 = ((uint64_t)(arg1[2]) * x2);
+  uint64_t x48 = ((arg1[2]) * x5);
+  uint64_t x49 = ((uint64_t)(arg1[2]) * x9);
+  uint64_t x50 = ((uint64_t)(arg1[2]) * x12);
+  uint64_t x51 = ((uint64_t)(arg1[2]) * x14);
+  uint64_t x52 = ((uint64_t)(arg1[2]) * x15);
+  uint64_t x53 = ((uint64_t)(arg1[2]) * x16);
+  uint64_t x54 = ((uint64_t)(arg1[2]) * (arg1[2]));
+  uint64_t x55 = ((arg1[1]) * (x2 * (uint64_t)0x2));
+  uint64_t x56 = ((uint64_t)(arg1[1]) * x6);
+  uint64_t x57 = ((uint64_t)(arg1[1]) * (x9 * (uint32_t)0x2));
+  uint64_t x58 = ((uint64_t)(arg1[1]) * x12);
+  uint64_t x59 = ((uint64_t)(arg1[1]) * (x14 * (uint32_t)0x2));
+  uint64_t x60 = ((uint64_t)(arg1[1]) * x15);
+  uint64_t x61 = ((uint64_t)(arg1[1]) * (x16 * (uint32_t)0x2));
+  uint64_t x62 = ((uint64_t)(arg1[1]) * x17);
+  uint64_t x63 = ((uint64_t)(arg1[1]) * ((arg1[1]) * (uint32_t)0x2));
+  uint64_t x64 = ((uint64_t)(arg1[0]) * x3);
+  uint64_t x65 = ((uint64_t)(arg1[0]) * x6);
+  uint64_t x66 = ((uint64_t)(arg1[0]) * x9);
+  uint64_t x67 = ((uint64_t)(arg1[0]) * x12);
+  uint64_t x68 = ((uint64_t)(arg1[0]) * x14);
+  uint64_t x69 = ((uint64_t)(arg1[0]) * x15);
+  uint64_t x70 = ((uint64_t)(arg1[0]) * x16);
+  uint64_t x71 = ((uint64_t)(arg1[0]) * x17);
+  uint64_t x72 = ((uint64_t)(arg1[0]) * x18);
+  uint64_t x73 = ((uint64_t)(arg1[0]) * (arg1[0]));
+  uint64_t x74 = (x73 + (x55 + (x48 + (x42 + (x37 + x33)))));
+  uint64_t x75 = (x74 >> 26);
+  uint32_t x76 = (uint32_t)(x74 & UINT32_C(0x3ffffff));
+  uint64_t x77 = (x64 + (x56 + (x49 + (x43 + x38))));
+  uint64_t x78 = (x65 + (x57 + (x50 + (x44 + (x39 + x19)))));
+  uint64_t x79 = (x66 + (x58 + (x51 + (x45 + x20))));
+  uint64_t x80 = (x67 + (x59 + (x52 + (x46 + (x22 + x21)))));
+  uint64_t x81 = (x68 + (x60 + (x53 + (x25 + x23))));
+  uint64_t x82 = (x69 + (x61 + (x54 + (x29 + (x26 + x24)))));
+  uint64_t x83 = (x70 + (x62 + (x34 + (x30 + x27))));
+  uint64_t x84 = (x71 + (x63 + (x40 + (x35 + (x31 + x28)))));
+  uint64_t x85 = (x72 + (x47 + (x41 + (x36 + x32))));
+  uint64_t x86 = (x75 + x85);
+  uint64_t x87 = (x86 >> 25);
+  uint32_t x88 = (uint32_t)(x86 & UINT32_C(0x1ffffff));
+  uint64_t x89 = (x87 + x84);
+  uint64_t x90 = (x89 >> 26);
+  uint32_t x91 = (uint32_t)(x89 & UINT32_C(0x3ffffff));
+  uint64_t x92 = (x90 + x83);
+  uint64_t x93 = (x92 >> 25);
+  uint32_t x94 = (uint32_t)(x92 & UINT32_C(0x1ffffff));
+  uint64_t x95 = (x93 + x82);
+  uint64_t x96 = (x95 >> 26);
+  uint32_t x97 = (uint32_t)(x95 & UINT32_C(0x3ffffff));
+  uint64_t x98 = (x96 + x81);
+  uint64_t x99 = (x98 >> 25);
+  uint32_t x100 = (uint32_t)(x98 & UINT32_C(0x1ffffff));
+  uint64_t x101 = (x99 + x80);
+  uint64_t x102 = (x101 >> 26);
+  uint32_t x103 = (uint32_t)(x101 & UINT32_C(0x3ffffff));
+  uint64_t x104 = (x102 + x79);
+  uint64_t x105 = (x104 >> 25);
+  uint32_t x106 = (uint32_t)(x104 & UINT32_C(0x1ffffff));
+  uint64_t x107 = (x105 + x78);
+  uint64_t x108 = (x107 >> 26);
+  uint32_t x109 = (uint32_t)(x107 & UINT32_C(0x3ffffff));
+  uint64_t x110 = (x108 + x77);
+  uint64_t x111 = (x110 >> 25);
+  uint32_t x112 = (uint32_t)(x110 & UINT32_C(0x1ffffff));
+  uint64_t x113 = (x111 * (uint64_t)UINT8_C(0x13));
+  uint64_t x114 = (x76 + x113);
+  uint32_t x115 = (uint32_t)(x114 >> 26);
+  uint32_t x116 = (uint32_t)(x114 & UINT32_C(0x3ffffff));
+  uint32_t x117 = (x115 + x88);
+  uint32_t x118 = (x117 >> 25);
+  uint32_t x119 = (x117 & UINT32_C(0x1ffffff));
+  uint32_t x120 = (x118 + x91);
+  out1[0] = x116;
+  out1[1] = x119;
+  out1[2] = x120;
+  out1[3] = x94;
+  out1[4] = x97;
+  out1[5] = x100;
+  out1[6] = x103;
+  out1[7] = x106;
+  out1[8] = x109;
+  out1[9] = x112;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ */
+static void fiat_25519_carry_scmul_121666(uint32_t out1[10], const uint32_t arg1[10]) {
+  uint64_t x1 = ((uint64_t)UINT32_C(0x1db42) * (arg1[9]));
+  uint64_t x2 = ((uint64_t)UINT32_C(0x1db42) * (arg1[8]));
+  uint64_t x3 = ((uint64_t)UINT32_C(0x1db42) * (arg1[7]));
+  uint64_t x4 = ((uint64_t)UINT32_C(0x1db42) * (arg1[6]));
+  uint64_t x5 = ((uint64_t)UINT32_C(0x1db42) * (arg1[5]));
+  uint64_t x6 = ((uint64_t)UINT32_C(0x1db42) * (arg1[4]));
+  uint64_t x7 = ((uint64_t)UINT32_C(0x1db42) * (arg1[3]));
+  uint64_t x8 = ((uint64_t)UINT32_C(0x1db42) * (arg1[2]));
+  uint64_t x9 = ((uint64_t)UINT32_C(0x1db42) * (arg1[1]));
+  uint64_t x10 = ((uint64_t)UINT32_C(0x1db42) * (arg1[0]));
+  uint32_t x11 = (uint32_t)(x10 >> 26);
+  uint32_t x12 = (uint32_t)(x10 & UINT32_C(0x3ffffff));
+  uint64_t x13 = (x11 + x9);
+  uint32_t x14 = (uint32_t)(x13 >> 25);
+  uint32_t x15 = (uint32_t)(x13 & UINT32_C(0x1ffffff));
+  uint64_t x16 = (x14 + x8);
+  uint32_t x17 = (uint32_t)(x16 >> 26);
+  uint32_t x18 = (uint32_t)(x16 & UINT32_C(0x3ffffff));
+  uint64_t x19 = (x17 + x7);
+  uint32_t x20 = (uint32_t)(x19 >> 25);
+  uint32_t x21 = (uint32_t)(x19 & UINT32_C(0x1ffffff));
+  uint64_t x22 = (x20 + x6);
+  uint32_t x23 = (uint32_t)(x22 >> 26);
+  uint32_t x24 = (uint32_t)(x22 & UINT32_C(0x3ffffff));
+  uint64_t x25 = (x23 + x5);
+  uint32_t x26 = (uint32_t)(x25 >> 25);
+  uint32_t x27 = (uint32_t)(x25 & UINT32_C(0x1ffffff));
+  uint64_t x28 = (x26 + x4);
+  uint32_t x29 = (uint32_t)(x28 >> 26);
+  uint32_t x30 = (uint32_t)(x28 & UINT32_C(0x3ffffff));
+  uint64_t x31 = (x29 + x3);
+  uint32_t x32 = (uint32_t)(x31 >> 25);
+  uint32_t x33 = (uint32_t)(x31 & UINT32_C(0x1ffffff));
+  uint64_t x34 = (x32 + x2);
+  uint32_t x35 = (uint32_t)(x34 >> 26);
+  uint32_t x36 = (uint32_t)(x34 & UINT32_C(0x3ffffff));
+  uint64_t x37 = (x35 + x1);
+  uint32_t x38 = (uint32_t)(x37 >> 25);
+  uint32_t x39 = (uint32_t)(x37 & UINT32_C(0x1ffffff));
+  uint32_t x40 = (x38 * (uint32_t)UINT8_C(0x13));
+  uint32_t x41 = (x12 + x40);
+  uint32_t x42 = (x41 >> 26);
+  uint32_t x43 = (x41 & UINT32_C(0x3ffffff));
+  uint32_t x44 = (x42 + x15);
+  uint32_t x45 = (x44 >> 25);
+  uint32_t x46 = (x44 & UINT32_C(0x1ffffff));
+  uint32_t x47 = (x45 + x18);
+  out1[0] = x43;
+  out1[1] = x46;
+  out1[2] = x47;
+  out1[3] = x21;
+  out1[4] = x24;
+  out1[5] = x27;
+  out1[6] = x30;
+  out1[7] = x33;
+  out1[8] = x36;
+  out1[9] = x39;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ */
+static void fiat_25519_carry(uint32_t out1[10], const uint32_t arg1[10]) {
+  uint32_t x1 = (arg1[0]);
+  uint32_t x2 = ((x1 >> 26) + (arg1[1]));
+  uint32_t x3 = ((x2 >> 25) + (arg1[2]));
+  uint32_t x4 = ((x3 >> 26) + (arg1[3]));
+  uint32_t x5 = ((x4 >> 25) + (arg1[4]));
+  uint32_t x6 = ((x5 >> 26) + (arg1[5]));
+  uint32_t x7 = ((x6 >> 25) + (arg1[6]));
+  uint32_t x8 = ((x7 >> 26) + (arg1[7]));
+  uint32_t x9 = ((x8 >> 25) + (arg1[8]));
+  uint32_t x10 = ((x9 >> 26) + (arg1[9]));
+  uint32_t x11 = ((x1 & UINT32_C(0x3ffffff)) + ((x10 >> 25) * (uint32_t)UINT8_C(0x13)));
+  uint32_t x12 = ((x11 >> 26) + (x2 & UINT32_C(0x1ffffff)));
+  uint32_t x13 = (x11 & UINT32_C(0x3ffffff));
+  uint32_t x14 = (x12 & UINT32_C(0x1ffffff));
+  uint32_t x15 = ((x12 >> 25) + (x3 & UINT32_C(0x3ffffff)));
+  uint32_t x16 = (x4 & UINT32_C(0x1ffffff));
+  uint32_t x17 = (x5 & UINT32_C(0x3ffffff));
+  uint32_t x18 = (x6 & UINT32_C(0x1ffffff));
+  uint32_t x19 = (x7 & UINT32_C(0x3ffffff));
+  uint32_t x20 = (x8 & UINT32_C(0x1ffffff));
+  uint32_t x21 = (x9 & UINT32_C(0x3ffffff));
+  uint32_t x22 = (x10 & UINT32_C(0x1ffffff));
+  out1[0] = x13;
+  out1[1] = x14;
+  out1[2] = x15;
+  out1[3] = x16;
+  out1[4] = x17;
+  out1[5] = x18;
+  out1[6] = x19;
+  out1[7] = x20;
+  out1[8] = x21;
+  out1[9] = x22;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ *   arg2: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ */
+static void fiat_25519_add(uint32_t out1[10], const uint32_t arg1[10], const uint32_t arg2[10]) {
+  uint32_t x1 = ((arg1[0]) + (arg2[0]));
+  uint32_t x2 = ((arg1[1]) + (arg2[1]));
+  uint32_t x3 = ((arg1[2]) + (arg2[2]));
+  uint32_t x4 = ((arg1[3]) + (arg2[3]));
+  uint32_t x5 = ((arg1[4]) + (arg2[4]));
+  uint32_t x6 = ((arg1[5]) + (arg2[5]));
+  uint32_t x7 = ((arg1[6]) + (arg2[6]));
+  uint32_t x8 = ((arg1[7]) + (arg2[7]));
+  uint32_t x9 = ((arg1[8]) + (arg2[8]));
+  uint32_t x10 = ((arg1[9]) + (arg2[9]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+  out1[5] = x6;
+  out1[6] = x7;
+  out1[7] = x8;
+  out1[8] = x9;
+  out1[9] = x10;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ *   arg2: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ */
+static void fiat_25519_sub(uint32_t out1[10], const uint32_t arg1[10], const uint32_t arg2[10]) {
+  uint32_t x1 = ((UINT32_C(0x7ffffda) + (arg1[0])) - (arg2[0]));
+  uint32_t x2 = ((UINT32_C(0x3fffffe) + (arg1[1])) - (arg2[1]));
+  uint32_t x3 = ((UINT32_C(0x7fffffe) + (arg1[2])) - (arg2[2]));
+  uint32_t x4 = ((UINT32_C(0x3fffffe) + (arg1[3])) - (arg2[3]));
+  uint32_t x5 = ((UINT32_C(0x7fffffe) + (arg1[4])) - (arg2[4]));
+  uint32_t x6 = ((UINT32_C(0x3fffffe) + (arg1[5])) - (arg2[5]));
+  uint32_t x7 = ((UINT32_C(0x7fffffe) + (arg1[6])) - (arg2[6]));
+  uint32_t x8 = ((UINT32_C(0x3fffffe) + (arg1[7])) - (arg2[7]));
+  uint32_t x9 = ((UINT32_C(0x7fffffe) + (arg1[8])) - (arg2[8]));
+  uint32_t x10 = ((UINT32_C(0x3fffffe) + (arg1[9])) - (arg2[9]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+  out1[5] = x6;
+  out1[6] = x7;
+  out1[7] = x8;
+  out1[8] = x9;
+  out1[9] = x10;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999], [0x0 ~> 0xd333332], [0x0 ~> 0x6999999]]
+ */
+static void fiat_25519_opp(uint32_t out1[10], const uint32_t arg1[10]) {
+  uint32_t x1 = (UINT32_C(0x7ffffda) - (arg1[0]));
+  uint32_t x2 = (UINT32_C(0x3fffffe) - (arg1[1]));
+  uint32_t x3 = (UINT32_C(0x7fffffe) - (arg1[2]));
+  uint32_t x4 = (UINT32_C(0x3fffffe) - (arg1[3]));
+  uint32_t x5 = (UINT32_C(0x7fffffe) - (arg1[4]));
+  uint32_t x6 = (UINT32_C(0x3fffffe) - (arg1[5]));
+  uint32_t x7 = (UINT32_C(0x7fffffe) - (arg1[6]));
+  uint32_t x8 = (UINT32_C(0x3fffffe) - (arg1[7]));
+  uint32_t x9 = (UINT32_C(0x7fffffe) - (arg1[8]));
+  uint32_t x10 = (UINT32_C(0x3fffffe) - (arg1[9]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+  out1[5] = x6;
+  out1[6] = x7;
+  out1[7] = x8;
+  out1[8] = x9;
+  out1[9] = x10;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ *   arg3: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_25519_selectznz(uint32_t out1[10], fiat_25519_uint1 arg1, const uint32_t arg2[10], const uint32_t arg3[10]) {
+  uint32_t x1;
+  fiat_25519_cmovznz_u32(&x1, arg1, (arg2[0]), (arg3[0]));
+  uint32_t x2;
+  fiat_25519_cmovznz_u32(&x2, arg1, (arg2[1]), (arg3[1]));
+  uint32_t x3;
+  fiat_25519_cmovznz_u32(&x3, arg1, (arg2[2]), (arg3[2]));
+  uint32_t x4;
+  fiat_25519_cmovznz_u32(&x4, arg1, (arg2[3]), (arg3[3]));
+  uint32_t x5;
+  fiat_25519_cmovznz_u32(&x5, arg1, (arg2[4]), (arg3[4]));
+  uint32_t x6;
+  fiat_25519_cmovznz_u32(&x6, arg1, (arg2[5]), (arg3[5]));
+  uint32_t x7;
+  fiat_25519_cmovznz_u32(&x7, arg1, (arg2[6]), (arg3[6]));
+  uint32_t x8;
+  fiat_25519_cmovznz_u32(&x8, arg1, (arg2[7]), (arg3[7]));
+  uint32_t x9;
+  fiat_25519_cmovznz_u32(&x9, arg1, (arg2[8]), (arg3[8]));
+  uint32_t x10;
+  fiat_25519_cmovznz_u32(&x10, arg1, (arg2[9]), (arg3[9]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+  out1[5] = x6;
+  out1[6] = x7;
+  out1[7] = x8;
+  out1[8] = x9;
+  out1[9] = x10;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0x7f]]
+ */
+static void fiat_25519_to_bytes(uint8_t out1[32], const uint32_t arg1[10]) {
+  uint32_t x1;
+  fiat_25519_uint1 x2;
+  fiat_25519_subborrowx_u26(&x1, &x2, 0x0, (arg1[0]), UINT32_C(0x3ffffed));
+  uint32_t x3;
+  fiat_25519_uint1 x4;
+  fiat_25519_subborrowx_u25(&x3, &x4, x2, (arg1[1]), UINT32_C(0x1ffffff));
+  uint32_t x5;
+  fiat_25519_uint1 x6;
+  fiat_25519_subborrowx_u26(&x5, &x6, x4, (arg1[2]), UINT32_C(0x3ffffff));
+  uint32_t x7;
+  fiat_25519_uint1 x8;
+  fiat_25519_subborrowx_u25(&x7, &x8, x6, (arg1[3]), UINT32_C(0x1ffffff));
+  uint32_t x9;
+  fiat_25519_uint1 x10;
+  fiat_25519_subborrowx_u26(&x9, &x10, x8, (arg1[4]), UINT32_C(0x3ffffff));
+  uint32_t x11;
+  fiat_25519_uint1 x12;
+  fiat_25519_subborrowx_u25(&x11, &x12, x10, (arg1[5]), UINT32_C(0x1ffffff));
+  uint32_t x13;
+  fiat_25519_uint1 x14;
+  fiat_25519_subborrowx_u26(&x13, &x14, x12, (arg1[6]), UINT32_C(0x3ffffff));
+  uint32_t x15;
+  fiat_25519_uint1 x16;
+  fiat_25519_subborrowx_u25(&x15, &x16, x14, (arg1[7]), UINT32_C(0x1ffffff));
+  uint32_t x17;
+  fiat_25519_uint1 x18;
+  fiat_25519_subborrowx_u26(&x17, &x18, x16, (arg1[8]), UINT32_C(0x3ffffff));
+  uint32_t x19;
+  fiat_25519_uint1 x20;
+  fiat_25519_subborrowx_u25(&x19, &x20, x18, (arg1[9]), UINT32_C(0x1ffffff));
+  uint32_t x21;
+  fiat_25519_cmovznz_u32(&x21, x20, 0x0, UINT32_C(0xffffffff));
+  uint32_t x22;
+  fiat_25519_uint1 x23;
+  fiat_25519_addcarryx_u26(&x22, &x23, 0x0, (x21 & UINT32_C(0x3ffffed)), x1);
+  uint32_t x24;
+  fiat_25519_uint1 x25;
+  fiat_25519_addcarryx_u25(&x24, &x25, x23, (x21 & UINT32_C(0x1ffffff)), x3);
+  uint32_t x26;
+  fiat_25519_uint1 x27;
+  fiat_25519_addcarryx_u26(&x26, &x27, x25, (x21 & UINT32_C(0x3ffffff)), x5);
+  uint32_t x28;
+  fiat_25519_uint1 x29;
+  fiat_25519_addcarryx_u25(&x28, &x29, x27, (x21 & UINT32_C(0x1ffffff)), x7);
+  uint32_t x30;
+  fiat_25519_uint1 x31;
+  fiat_25519_addcarryx_u26(&x30, &x31, x29, (x21 & UINT32_C(0x3ffffff)), x9);
+  uint32_t x32;
+  fiat_25519_uint1 x33;
+  fiat_25519_addcarryx_u25(&x32, &x33, x31, (x21 & UINT32_C(0x1ffffff)), x11);
+  uint32_t x34;
+  fiat_25519_uint1 x35;
+  fiat_25519_addcarryx_u26(&x34, &x35, x33, (x21 & UINT32_C(0x3ffffff)), x13);
+  uint32_t x36;
+  fiat_25519_uint1 x37;
+  fiat_25519_addcarryx_u25(&x36, &x37, x35, (x21 & UINT32_C(0x1ffffff)), x15);
+  uint32_t x38;
+  fiat_25519_uint1 x39;
+  fiat_25519_addcarryx_u26(&x38, &x39, x37, (x21 & UINT32_C(0x3ffffff)), x17);
+  uint32_t x40;
+  fiat_25519_uint1 x41;
+  fiat_25519_addcarryx_u25(&x40, &x41, x39, (x21 & UINT32_C(0x1ffffff)), x19);
+  uint32_t x42 = (x40 << 6);
+  uint32_t x43 = (x38 << 4);
+  uint32_t x44 = (x36 << 3);
+  uint32_t x45 = (x34 * (uint32_t)0x2);
+  uint32_t x46 = (x30 << 6);
+  uint32_t x47 = (x28 << 5);
+  uint32_t x48 = (x26 << 3);
+  uint32_t x49 = (x24 << 2);
+  uint32_t x50 = (x22 >> 8);
+  uint8_t x51 = (uint8_t)(x22 & UINT8_C(0xff));
+  uint32_t x52 = (x50 >> 8);
+  uint8_t x53 = (uint8_t)(x50 & UINT8_C(0xff));
+  uint8_t x54 = (uint8_t)(x52 >> 8);
+  uint8_t x55 = (uint8_t)(x52 & UINT8_C(0xff));
+  uint32_t x56 = (x54 + x49);
+  uint32_t x57 = (x56 >> 8);
+  uint8_t x58 = (uint8_t)(x56 & UINT8_C(0xff));
+  uint32_t x59 = (x57 >> 8);
+  uint8_t x60 = (uint8_t)(x57 & UINT8_C(0xff));
+  uint8_t x61 = (uint8_t)(x59 >> 8);
+  uint8_t x62 = (uint8_t)(x59 & UINT8_C(0xff));
+  uint32_t x63 = (x61 + x48);
+  uint32_t x64 = (x63 >> 8);
+  uint8_t x65 = (uint8_t)(x63 & UINT8_C(0xff));
+  uint32_t x66 = (x64 >> 8);
+  uint8_t x67 = (uint8_t)(x64 & UINT8_C(0xff));
+  uint8_t x68 = (uint8_t)(x66 >> 8);
+  uint8_t x69 = (uint8_t)(x66 & UINT8_C(0xff));
+  uint32_t x70 = (x68 + x47);
+  uint32_t x71 = (x70 >> 8);
+  uint8_t x72 = (uint8_t)(x70 & UINT8_C(0xff));
+  uint32_t x73 = (x71 >> 8);
+  uint8_t x74 = (uint8_t)(x71 & UINT8_C(0xff));
+  uint8_t x75 = (uint8_t)(x73 >> 8);
+  uint8_t x76 = (uint8_t)(x73 & UINT8_C(0xff));
+  uint32_t x77 = (x75 + x46);
+  uint32_t x78 = (x77 >> 8);
+  uint8_t x79 = (uint8_t)(x77 & UINT8_C(0xff));
+  uint32_t x80 = (x78 >> 8);
+  uint8_t x81 = (uint8_t)(x78 & UINT8_C(0xff));
+  uint8_t x82 = (uint8_t)(x80 >> 8);
+  uint8_t x83 = (uint8_t)(x80 & UINT8_C(0xff));
+  uint8_t x84 = (uint8_t)(x82 & UINT8_C(0xff));
+  uint32_t x85 = (x32 >> 8);
+  uint8_t x86 = (uint8_t)(x32 & UINT8_C(0xff));
+  uint32_t x87 = (x85 >> 8);
+  uint8_t x88 = (uint8_t)(x85 & UINT8_C(0xff));
+  fiat_25519_uint1 x89 = (fiat_25519_uint1)(x87 >> 8);
+  uint8_t x90 = (uint8_t)(x87 & UINT8_C(0xff));
+  uint32_t x91 = (x89 + x45);
+  uint32_t x92 = (x91 >> 8);
+  uint8_t x93 = (uint8_t)(x91 & UINT8_C(0xff));
+  uint32_t x94 = (x92 >> 8);
+  uint8_t x95 = (uint8_t)(x92 & UINT8_C(0xff));
+  uint8_t x96 = (uint8_t)(x94 >> 8);
+  uint8_t x97 = (uint8_t)(x94 & UINT8_C(0xff));
+  uint32_t x98 = (x96 + x44);
+  uint32_t x99 = (x98 >> 8);
+  uint8_t x100 = (uint8_t)(x98 & UINT8_C(0xff));
+  uint32_t x101 = (x99 >> 8);
+  uint8_t x102 = (uint8_t)(x99 & UINT8_C(0xff));
+  uint8_t x103 = (uint8_t)(x101 >> 8);
+  uint8_t x104 = (uint8_t)(x101 & UINT8_C(0xff));
+  uint32_t x105 = (x103 + x43);
+  uint32_t x106 = (x105 >> 8);
+  uint8_t x107 = (uint8_t)(x105 & UINT8_C(0xff));
+  uint32_t x108 = (x106 >> 8);
+  uint8_t x109 = (uint8_t)(x106 & UINT8_C(0xff));
+  uint8_t x110 = (uint8_t)(x108 >> 8);
+  uint8_t x111 = (uint8_t)(x108 & UINT8_C(0xff));
+  uint32_t x112 = (x110 + x42);
+  uint32_t x113 = (x112 >> 8);
+  uint8_t x114 = (uint8_t)(x112 & UINT8_C(0xff));
+  uint32_t x115 = (x113 >> 8);
+  uint8_t x116 = (uint8_t)(x113 & UINT8_C(0xff));
+  uint8_t x117 = (uint8_t)(x115 >> 8);
+  uint8_t x118 = (uint8_t)(x115 & UINT8_C(0xff));
+  out1[0] = x51;
+  out1[1] = x53;
+  out1[2] = x55;
+  out1[3] = x58;
+  out1[4] = x60;
+  out1[5] = x62;
+  out1[6] = x65;
+  out1[7] = x67;
+  out1[8] = x69;
+  out1[9] = x72;
+  out1[10] = x74;
+  out1[11] = x76;
+  out1[12] = x79;
+  out1[13] = x81;
+  out1[14] = x83;
+  out1[15] = x84;
+  out1[16] = x86;
+  out1[17] = x88;
+  out1[18] = x90;
+  out1[19] = x93;
+  out1[20] = x95;
+  out1[21] = x97;
+  out1[22] = x100;
+  out1[23] = x102;
+  out1[24] = x104;
+  out1[25] = x107;
+  out1[26] = x109;
+  out1[27] = x111;
+  out1[28] = x114;
+  out1[29] = x116;
+  out1[30] = x118;
+  out1[31] = x117;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0x7f]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333], [0x0 ~> 0x4666666], [0x0 ~> 0x2333333]]
+ */
+static void fiat_25519_from_bytes(uint32_t out1[10], const uint8_t arg1[32]) {
+  uint32_t x1 = ((uint32_t)(arg1[31]) << 18);
+  uint32_t x2 = ((uint32_t)(arg1[30]) << 10);
+  uint32_t x3 = ((uint32_t)(arg1[29]) << 2);
+  uint32_t x4 = ((uint32_t)(arg1[28]) << 20);
+  uint32_t x5 = ((uint32_t)(arg1[27]) << 12);
+  uint32_t x6 = ((uint32_t)(arg1[26]) << 4);
+  uint32_t x7 = ((uint32_t)(arg1[25]) << 21);
+  uint32_t x8 = ((uint32_t)(arg1[24]) << 13);
+  uint32_t x9 = ((uint32_t)(arg1[23]) << 5);
+  uint32_t x10 = ((uint32_t)(arg1[22]) << 23);
+  uint32_t x11 = ((uint32_t)(arg1[21]) << 15);
+  uint32_t x12 = ((uint32_t)(arg1[20]) << 7);
+  uint32_t x13 = ((uint32_t)(arg1[19]) << 24);
+  uint32_t x14 = ((uint32_t)(arg1[18]) << 16);
+  uint32_t x15 = ((uint32_t)(arg1[17]) << 8);
+  uint8_t x16 = (arg1[16]);
+  uint32_t x17 = ((uint32_t)(arg1[15]) << 18);
+  uint32_t x18 = ((uint32_t)(arg1[14]) << 10);
+  uint32_t x19 = ((uint32_t)(arg1[13]) << 2);
+  uint32_t x20 = ((uint32_t)(arg1[12]) << 19);
+  uint32_t x21 = ((uint32_t)(arg1[11]) << 11);
+  uint32_t x22 = ((uint32_t)(arg1[10]) << 3);
+  uint32_t x23 = ((uint32_t)(arg1[9]) << 21);
+  uint32_t x24 = ((uint32_t)(arg1[8]) << 13);
+  uint32_t x25 = ((uint32_t)(arg1[7]) << 5);
+  uint32_t x26 = ((uint32_t)(arg1[6]) << 22);
+  uint32_t x27 = ((uint32_t)(arg1[5]) << 14);
+  uint32_t x28 = ((uint32_t)(arg1[4]) << 6);
+  uint32_t x29 = ((uint32_t)(arg1[3]) << 24);
+  uint32_t x30 = ((uint32_t)(arg1[2]) << 16);
+  uint32_t x31 = ((uint32_t)(arg1[1]) << 8);
+  uint8_t x32 = (arg1[0]);
+  uint32_t x33 = (x32 + (x31 + (x30 + x29)));
+  uint8_t x34 = (uint8_t)(x33 >> 26);
+  uint32_t x35 = (x33 & UINT32_C(0x3ffffff));
+  uint32_t x36 = (x3 + (x2 + x1));
+  uint32_t x37 = (x6 + (x5 + x4));
+  uint32_t x38 = (x9 + (x8 + x7));
+  uint32_t x39 = (x12 + (x11 + x10));
+  uint32_t x40 = (x16 + (x15 + (x14 + x13)));
+  uint32_t x41 = (x19 + (x18 + x17));
+  uint32_t x42 = (x22 + (x21 + x20));
+  uint32_t x43 = (x25 + (x24 + x23));
+  uint32_t x44 = (x28 + (x27 + x26));
+  uint32_t x45 = (x34 + x44);
+  uint8_t x46 = (uint8_t)(x45 >> 25);
+  uint32_t x47 = (x45 & UINT32_C(0x1ffffff));
+  uint32_t x48 = (x46 + x43);
+  uint8_t x49 = (uint8_t)(x48 >> 26);
+  uint32_t x50 = (x48 & UINT32_C(0x3ffffff));
+  uint32_t x51 = (x49 + x42);
+  uint8_t x52 = (uint8_t)(x51 >> 25);
+  uint32_t x53 = (x51 & UINT32_C(0x1ffffff));
+  uint32_t x54 = (x52 + x41);
+  uint32_t x55 = (x54 & UINT32_C(0x3ffffff));
+  uint8_t x56 = (uint8_t)(x40 >> 25);
+  uint32_t x57 = (x40 & UINT32_C(0x1ffffff));
+  uint32_t x58 = (x56 + x39);
+  uint8_t x59 = (uint8_t)(x58 >> 26);
+  uint32_t x60 = (x58 & UINT32_C(0x3ffffff));
+  uint32_t x61 = (x59 + x38);
+  uint8_t x62 = (uint8_t)(x61 >> 25);
+  uint32_t x63 = (x61 & UINT32_C(0x1ffffff));
+  uint32_t x64 = (x62 + x37);
+  uint8_t x65 = (uint8_t)(x64 >> 26);
+  uint32_t x66 = (x64 & UINT32_C(0x3ffffff));
+  uint32_t x67 = (x65 + x36);
+  out1[0] = x35;
+  out1[1] = x47;
+  out1[2] = x50;
+  out1[3] = x53;
+  out1[4] = x55;
+  out1[5] = x57;
+  out1[6] = x60;
+  out1[7] = x63;
+  out1[8] = x66;
+  out1[9] = x67;
+}
+
diff --git a/third_party/fiat/curve25519_64.c b/third_party/fiat/curve25519_64.c
new file mode 100644
index 0000000..23bf361
--- /dev/null
+++ b/third_party/fiat/curve25519_64.c
@@ -0,0 +1,553 @@
+/* Autogenerated */
+/* curve description: 25519 */
+/* requested operations: carry_mul, carry_square, carry_scmul121666, carry, add, sub, opp, selectznz, to_bytes, from_bytes */
+/* n = 5 (from "5") */
+/* s = 0x8000000000000000000000000000000000000000000000000000000000000000 (from "2^255") */
+/* c = [(1, 19)] (from "1,19") */
+/* machine_wordsize = 64 (from "64") */
+
+#include <stdint.h>
+typedef unsigned char fiat_25519_uint1;
+typedef signed char fiat_25519_int1;
+typedef signed __int128 fiat_25519_int128;
+typedef unsigned __int128 fiat_25519_uint128;
+
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x7ffffffffffff]
+ *   arg3: [0x0 ~> 0x7ffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x7ffffffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_addcarryx_u51(uint64_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  uint64_t x1 = ((arg1 + arg2) + arg3);
+  uint64_t x2 = (x1 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint1 x3 = (fiat_25519_uint1)(x1 >> 51);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0x7ffffffffffff]
+ *   arg3: [0x0 ~> 0x7ffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0x7ffffffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_25519_subborrowx_u51(uint64_t* out1, fiat_25519_uint1* out2, fiat_25519_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  int64_t x1 = ((int64_t)(arg2 - (int64_t)arg1) - (int64_t)arg3);
+  fiat_25519_int1 x2 = (fiat_25519_int1)(x1 >> 51);
+  uint64_t x3 = (x1 & UINT64_C(0x7ffffffffffff));
+  *out1 = x3;
+  *out2 = (fiat_25519_uint1)(0x0 - x2);
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffffffffffff]
+ *   arg3: [0x0 ~> 0xffffffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ */
+static void fiat_25519_cmovznz_u64(uint64_t* out1, fiat_25519_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  fiat_25519_uint1 x1 = (!(!arg1));
+  uint64_t x2 = ((fiat_25519_int1)(0x0 - x1) & UINT64_C(0xffffffffffffffff));
+  uint64_t x3 = ((x2 & arg3) | ((~x2) & arg2));
+  *out1 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ *   arg2: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ */
+static void fiat_25519_carry_mul(uint64_t out1[5], const uint64_t arg1[5], const uint64_t arg2[5]) {
+  fiat_25519_uint128 x1 = ((fiat_25519_uint128)(arg1[4]) * ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x2 = ((fiat_25519_uint128)(arg1[4]) * ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x3 = ((fiat_25519_uint128)(arg1[4]) * ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x4 = ((fiat_25519_uint128)(arg1[4]) * ((arg2[1]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x5 = ((fiat_25519_uint128)(arg1[3]) * ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x6 = ((fiat_25519_uint128)(arg1[3]) * ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x7 = ((fiat_25519_uint128)(arg1[3]) * ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x8 = ((fiat_25519_uint128)(arg1[2]) * ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x9 = ((fiat_25519_uint128)(arg1[2]) * ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x10 = ((fiat_25519_uint128)(arg1[1]) * ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
+  fiat_25519_uint128 x11 = ((fiat_25519_uint128)(arg1[4]) * (arg2[0]));
+  fiat_25519_uint128 x12 = ((fiat_25519_uint128)(arg1[3]) * (arg2[1]));
+  fiat_25519_uint128 x13 = ((fiat_25519_uint128)(arg1[3]) * (arg2[0]));
+  fiat_25519_uint128 x14 = ((fiat_25519_uint128)(arg1[2]) * (arg2[2]));
+  fiat_25519_uint128 x15 = ((fiat_25519_uint128)(arg1[2]) * (arg2[1]));
+  fiat_25519_uint128 x16 = ((fiat_25519_uint128)(arg1[2]) * (arg2[0]));
+  fiat_25519_uint128 x17 = ((fiat_25519_uint128)(arg1[1]) * (arg2[3]));
+  fiat_25519_uint128 x18 = ((fiat_25519_uint128)(arg1[1]) * (arg2[2]));
+  fiat_25519_uint128 x19 = ((fiat_25519_uint128)(arg1[1]) * (arg2[1]));
+  fiat_25519_uint128 x20 = ((fiat_25519_uint128)(arg1[1]) * (arg2[0]));
+  fiat_25519_uint128 x21 = ((fiat_25519_uint128)(arg1[0]) * (arg2[4]));
+  fiat_25519_uint128 x22 = ((fiat_25519_uint128)(arg1[0]) * (arg2[3]));
+  fiat_25519_uint128 x23 = ((fiat_25519_uint128)(arg1[0]) * (arg2[2]));
+  fiat_25519_uint128 x24 = ((fiat_25519_uint128)(arg1[0]) * (arg2[1]));
+  fiat_25519_uint128 x25 = ((fiat_25519_uint128)(arg1[0]) * (arg2[0]));
+  fiat_25519_uint128 x26 = (x25 + (x10 + (x9 + (x7 + x4))));
+  uint64_t x27 = (uint64_t)(x26 >> 51);
+  uint64_t x28 = (uint64_t)(x26 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x29 = (x21 + (x17 + (x14 + (x12 + x11))));
+  fiat_25519_uint128 x30 = (x22 + (x18 + (x15 + (x13 + x1))));
+  fiat_25519_uint128 x31 = (x23 + (x19 + (x16 + (x5 + x2))));
+  fiat_25519_uint128 x32 = (x24 + (x20 + (x8 + (x6 + x3))));
+  fiat_25519_uint128 x33 = (x27 + x32);
+  uint64_t x34 = (uint64_t)(x33 >> 51);
+  uint64_t x35 = (uint64_t)(x33 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x36 = (x34 + x31);
+  uint64_t x37 = (uint64_t)(x36 >> 51);
+  uint64_t x38 = (uint64_t)(x36 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x39 = (x37 + x30);
+  uint64_t x40 = (uint64_t)(x39 >> 51);
+  uint64_t x41 = (uint64_t)(x39 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x42 = (x40 + x29);
+  uint64_t x43 = (uint64_t)(x42 >> 51);
+  uint64_t x44 = (uint64_t)(x42 & UINT64_C(0x7ffffffffffff));
+  uint64_t x45 = (x43 * (uint64_t)UINT8_C(0x13));
+  uint64_t x46 = (x28 + x45);
+  uint64_t x47 = (x46 >> 51);
+  uint64_t x48 = (x46 & UINT64_C(0x7ffffffffffff));
+  uint64_t x49 = (x47 + x35);
+  uint64_t x50 = (x49 >> 51);
+  uint64_t x51 = (x49 & UINT64_C(0x7ffffffffffff));
+  uint64_t x52 = (x50 + x38);
+  out1[0] = x48;
+  out1[1] = x51;
+  out1[2] = x52;
+  out1[3] = x41;
+  out1[4] = x44;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ */
+static void fiat_25519_carry_square(uint64_t out1[5], const uint64_t arg1[5]) {
+  uint64_t x1 = ((arg1[4]) * (uint64_t)UINT8_C(0x13));
+  uint64_t x2 = (x1 * (uint64_t)0x2);
+  uint64_t x3 = ((arg1[4]) * (uint64_t)0x2);
+  uint64_t x4 = ((arg1[3]) * (uint64_t)UINT8_C(0x13));
+  uint64_t x5 = (x4 * (uint64_t)0x2);
+  uint64_t x6 = ((arg1[3]) * (uint64_t)0x2);
+  uint64_t x7 = ((arg1[2]) * (uint64_t)0x2);
+  uint64_t x8 = ((arg1[1]) * (uint64_t)0x2);
+  fiat_25519_uint128 x9 = ((fiat_25519_uint128)(arg1[4]) * x1);
+  fiat_25519_uint128 x10 = ((fiat_25519_uint128)(arg1[3]) * x2);
+  fiat_25519_uint128 x11 = ((fiat_25519_uint128)(arg1[3]) * x4);
+  fiat_25519_uint128 x12 = ((fiat_25519_uint128)(arg1[2]) * x2);
+  fiat_25519_uint128 x13 = ((fiat_25519_uint128)(arg1[2]) * x5);
+  fiat_25519_uint128 x14 = ((fiat_25519_uint128)(arg1[2]) * (arg1[2]));
+  fiat_25519_uint128 x15 = ((fiat_25519_uint128)(arg1[1]) * x2);
+  fiat_25519_uint128 x16 = ((fiat_25519_uint128)(arg1[1]) * x6);
+  fiat_25519_uint128 x17 = ((fiat_25519_uint128)(arg1[1]) * x7);
+  fiat_25519_uint128 x18 = ((fiat_25519_uint128)(arg1[1]) * (arg1[1]));
+  fiat_25519_uint128 x19 = ((fiat_25519_uint128)(arg1[0]) * x3);
+  fiat_25519_uint128 x20 = ((fiat_25519_uint128)(arg1[0]) * x6);
+  fiat_25519_uint128 x21 = ((fiat_25519_uint128)(arg1[0]) * x7);
+  fiat_25519_uint128 x22 = ((fiat_25519_uint128)(arg1[0]) * x8);
+  fiat_25519_uint128 x23 = ((fiat_25519_uint128)(arg1[0]) * (arg1[0]));
+  fiat_25519_uint128 x24 = (x23 + (x15 + x13));
+  uint64_t x25 = (uint64_t)(x24 >> 51);
+  uint64_t x26 = (uint64_t)(x24 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x27 = (x19 + (x16 + x14));
+  fiat_25519_uint128 x28 = (x20 + (x17 + x9));
+  fiat_25519_uint128 x29 = (x21 + (x18 + x10));
+  fiat_25519_uint128 x30 = (x22 + (x12 + x11));
+  fiat_25519_uint128 x31 = (x25 + x30);
+  uint64_t x32 = (uint64_t)(x31 >> 51);
+  uint64_t x33 = (uint64_t)(x31 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x34 = (x32 + x29);
+  uint64_t x35 = (uint64_t)(x34 >> 51);
+  uint64_t x36 = (uint64_t)(x34 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x37 = (x35 + x28);
+  uint64_t x38 = (uint64_t)(x37 >> 51);
+  uint64_t x39 = (uint64_t)(x37 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x40 = (x38 + x27);
+  uint64_t x41 = (uint64_t)(x40 >> 51);
+  uint64_t x42 = (uint64_t)(x40 & UINT64_C(0x7ffffffffffff));
+  uint64_t x43 = (x41 * (uint64_t)UINT8_C(0x13));
+  uint64_t x44 = (x26 + x43);
+  uint64_t x45 = (x44 >> 51);
+  uint64_t x46 = (x44 & UINT64_C(0x7ffffffffffff));
+  uint64_t x47 = (x45 + x33);
+  uint64_t x48 = (x47 >> 51);
+  uint64_t x49 = (x47 & UINT64_C(0x7ffffffffffff));
+  uint64_t x50 = (x48 + x36);
+  out1[0] = x46;
+  out1[1] = x49;
+  out1[2] = x50;
+  out1[3] = x39;
+  out1[4] = x42;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ */
+static void fiat_25519_carry_scmul_121666(uint64_t out1[5], const uint64_t arg1[5]) {
+  fiat_25519_uint128 x1 = (UINT32_C(0x1db42) * (fiat_25519_uint128)(arg1[4]));
+  fiat_25519_uint128 x2 = (UINT32_C(0x1db42) * (fiat_25519_uint128)(arg1[3]));
+  fiat_25519_uint128 x3 = (UINT32_C(0x1db42) * (fiat_25519_uint128)(arg1[2]));
+  fiat_25519_uint128 x4 = (UINT32_C(0x1db42) * (fiat_25519_uint128)(arg1[1]));
+  fiat_25519_uint128 x5 = (UINT32_C(0x1db42) * (fiat_25519_uint128)(arg1[0]));
+  uint64_t x6 = (uint64_t)(x5 >> 51);
+  uint64_t x7 = (uint64_t)(x5 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x8 = (x6 + x4);
+  uint64_t x9 = (uint64_t)(x8 >> 51);
+  uint64_t x10 = (uint64_t)(x8 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x11 = (x9 + x3);
+  uint64_t x12 = (uint64_t)(x11 >> 51);
+  uint64_t x13 = (uint64_t)(x11 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x14 = (x12 + x2);
+  uint64_t x15 = (uint64_t)(x14 >> 51);
+  uint64_t x16 = (uint64_t)(x14 & UINT64_C(0x7ffffffffffff));
+  fiat_25519_uint128 x17 = (x15 + x1);
+  uint64_t x18 = (uint64_t)(x17 >> 51);
+  uint64_t x19 = (uint64_t)(x17 & UINT64_C(0x7ffffffffffff));
+  uint64_t x20 = (x18 * (uint64_t)UINT8_C(0x13));
+  uint64_t x21 = (x7 + x20);
+  uint64_t x22 = (x21 >> 51);
+  uint64_t x23 = (x21 & UINT64_C(0x7ffffffffffff));
+  uint64_t x24 = (x22 + x10);
+  uint64_t x25 = (x24 >> 51);
+  uint64_t x26 = (x24 & UINT64_C(0x7ffffffffffff));
+  uint64_t x27 = (x25 + x13);
+  out1[0] = x23;
+  out1[1] = x26;
+  out1[2] = x27;
+  out1[3] = x16;
+  out1[4] = x19;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ */
+static void fiat_25519_carry(uint64_t out1[5], const uint64_t arg1[5]) {
+  uint64_t x1 = (arg1[0]);
+  uint64_t x2 = ((x1 >> 51) + (arg1[1]));
+  uint64_t x3 = ((x2 >> 51) + (arg1[2]));
+  uint64_t x4 = ((x3 >> 51) + (arg1[3]));
+  uint64_t x5 = ((x4 >> 51) + (arg1[4]));
+  uint64_t x6 = ((x1 & UINT64_C(0x7ffffffffffff)) + ((x5 >> 51) * (uint64_t)UINT8_C(0x13)));
+  uint64_t x7 = ((x6 >> 51) + (x2 & UINT64_C(0x7ffffffffffff)));
+  uint64_t x8 = (x6 & UINT64_C(0x7ffffffffffff));
+  uint64_t x9 = (x7 & UINT64_C(0x7ffffffffffff));
+  uint64_t x10 = ((x7 >> 51) + (x3 & UINT64_C(0x7ffffffffffff)));
+  uint64_t x11 = (x4 & UINT64_C(0x7ffffffffffff));
+  uint64_t x12 = (x5 & UINT64_C(0x7ffffffffffff));
+  out1[0] = x8;
+  out1[1] = x9;
+  out1[2] = x10;
+  out1[3] = x11;
+  out1[4] = x12;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ *   arg2: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ */
+static void fiat_25519_add(uint64_t out1[5], const uint64_t arg1[5], const uint64_t arg2[5]) {
+  uint64_t x1 = ((arg1[0]) + (arg2[0]));
+  uint64_t x2 = ((arg1[1]) + (arg2[1]));
+  uint64_t x3 = ((arg1[2]) + (arg2[2]));
+  uint64_t x4 = ((arg1[3]) + (arg2[3]));
+  uint64_t x5 = ((arg1[4]) + (arg2[4]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ *   arg2: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ */
+static void fiat_25519_sub(uint64_t out1[5], const uint64_t arg1[5], const uint64_t arg2[5]) {
+  uint64_t x1 = ((UINT64_C(0xfffffffffffda) + (arg1[0])) - (arg2[0]));
+  uint64_t x2 = ((UINT64_C(0xffffffffffffe) + (arg1[1])) - (arg2[1]));
+  uint64_t x3 = ((UINT64_C(0xffffffffffffe) + (arg1[2])) - (arg2[2]));
+  uint64_t x4 = ((UINT64_C(0xffffffffffffe) + (arg1[3])) - (arg2[3]));
+  uint64_t x5 = ((UINT64_C(0xffffffffffffe) + (arg1[4])) - (arg2[4]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
+ */
+static void fiat_25519_opp(uint64_t out1[5], const uint64_t arg1[5]) {
+  uint64_t x1 = (UINT64_C(0xfffffffffffda) - (arg1[0]));
+  uint64_t x2 = (UINT64_C(0xffffffffffffe) - (arg1[1]));
+  uint64_t x3 = (UINT64_C(0xffffffffffffe) - (arg1[2]));
+  uint64_t x4 = (UINT64_C(0xffffffffffffe) - (arg1[3]));
+  uint64_t x5 = (UINT64_C(0xffffffffffffe) - (arg1[4]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ *   arg3: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_25519_selectznz(uint64_t out1[5], fiat_25519_uint1 arg1, const uint64_t arg2[5], const uint64_t arg3[5]) {
+  uint64_t x1;
+  fiat_25519_cmovznz_u64(&x1, arg1, (arg2[0]), (arg3[0]));
+  uint64_t x2;
+  fiat_25519_cmovznz_u64(&x2, arg1, (arg2[1]), (arg3[1]));
+  uint64_t x3;
+  fiat_25519_cmovznz_u64(&x3, arg1, (arg2[2]), (arg3[2]));
+  uint64_t x4;
+  fiat_25519_cmovznz_u64(&x4, arg1, (arg2[3]), (arg3[3]));
+  uint64_t x5;
+  fiat_25519_cmovznz_u64(&x5, arg1, (arg2[4]), (arg3[4]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0x7f]]
+ */
+static void fiat_25519_to_bytes(uint8_t out1[32], const uint64_t arg1[5]) {
+  uint64_t x1;
+  fiat_25519_uint1 x2;
+  fiat_25519_subborrowx_u51(&x1, &x2, 0x0, (arg1[0]), UINT64_C(0x7ffffffffffed));
+  uint64_t x3;
+  fiat_25519_uint1 x4;
+  fiat_25519_subborrowx_u51(&x3, &x4, x2, (arg1[1]), UINT64_C(0x7ffffffffffff));
+  uint64_t x5;
+  fiat_25519_uint1 x6;
+  fiat_25519_subborrowx_u51(&x5, &x6, x4, (arg1[2]), UINT64_C(0x7ffffffffffff));
+  uint64_t x7;
+  fiat_25519_uint1 x8;
+  fiat_25519_subborrowx_u51(&x7, &x8, x6, (arg1[3]), UINT64_C(0x7ffffffffffff));
+  uint64_t x9;
+  fiat_25519_uint1 x10;
+  fiat_25519_subborrowx_u51(&x9, &x10, x8, (arg1[4]), UINT64_C(0x7ffffffffffff));
+  uint64_t x11;
+  fiat_25519_cmovznz_u64(&x11, x10, 0x0, UINT64_C(0xffffffffffffffff));
+  uint64_t x12;
+  fiat_25519_uint1 x13;
+  fiat_25519_addcarryx_u51(&x12, &x13, 0x0, (x11 & UINT64_C(0x7ffffffffffed)), x1);
+  uint64_t x14;
+  fiat_25519_uint1 x15;
+  fiat_25519_addcarryx_u51(&x14, &x15, x13, (x11 & UINT64_C(0x7ffffffffffff)), x3);
+  uint64_t x16;
+  fiat_25519_uint1 x17;
+  fiat_25519_addcarryx_u51(&x16, &x17, x15, (x11 & UINT64_C(0x7ffffffffffff)), x5);
+  uint64_t x18;
+  fiat_25519_uint1 x19;
+  fiat_25519_addcarryx_u51(&x18, &x19, x17, (x11 & UINT64_C(0x7ffffffffffff)), x7);
+  uint64_t x20;
+  fiat_25519_uint1 x21;
+  fiat_25519_addcarryx_u51(&x20, &x21, x19, (x11 & UINT64_C(0x7ffffffffffff)), x9);
+  uint64_t x22 = (x20 << 4);
+  uint64_t x23 = (x18 * (uint64_t)0x2);
+  uint64_t x24 = (x16 << 6);
+  uint64_t x25 = (x14 << 3);
+  uint64_t x26 = (x12 >> 8);
+  uint8_t x27 = (uint8_t)(x12 & UINT8_C(0xff));
+  uint64_t x28 = (x26 >> 8);
+  uint8_t x29 = (uint8_t)(x26 & UINT8_C(0xff));
+  uint64_t x30 = (x28 >> 8);
+  uint8_t x31 = (uint8_t)(x28 & UINT8_C(0xff));
+  uint64_t x32 = (x30 >> 8);
+  uint8_t x33 = (uint8_t)(x30 & UINT8_C(0xff));
+  uint64_t x34 = (x32 >> 8);
+  uint8_t x35 = (uint8_t)(x32 & UINT8_C(0xff));
+  uint8_t x36 = (uint8_t)(x34 >> 8);
+  uint8_t x37 = (uint8_t)(x34 & UINT8_C(0xff));
+  uint64_t x38 = (x36 + x25);
+  uint64_t x39 = (x38 >> 8);
+  uint8_t x40 = (uint8_t)(x38 & UINT8_C(0xff));
+  uint64_t x41 = (x39 >> 8);
+  uint8_t x42 = (uint8_t)(x39 & UINT8_C(0xff));
+  uint64_t x43 = (x41 >> 8);
+  uint8_t x44 = (uint8_t)(x41 & UINT8_C(0xff));
+  uint64_t x45 = (x43 >> 8);
+  uint8_t x46 = (uint8_t)(x43 & UINT8_C(0xff));
+  uint64_t x47 = (x45 >> 8);
+  uint8_t x48 = (uint8_t)(x45 & UINT8_C(0xff));
+  uint8_t x49 = (uint8_t)(x47 >> 8);
+  uint8_t x50 = (uint8_t)(x47 & UINT8_C(0xff));
+  uint64_t x51 = (x49 + x24);
+  uint64_t x52 = (x51 >> 8);
+  uint8_t x53 = (uint8_t)(x51 & UINT8_C(0xff));
+  uint64_t x54 = (x52 >> 8);
+  uint8_t x55 = (uint8_t)(x52 & UINT8_C(0xff));
+  uint64_t x56 = (x54 >> 8);
+  uint8_t x57 = (uint8_t)(x54 & UINT8_C(0xff));
+  uint64_t x58 = (x56 >> 8);
+  uint8_t x59 = (uint8_t)(x56 & UINT8_C(0xff));
+  uint64_t x60 = (x58 >> 8);
+  uint8_t x61 = (uint8_t)(x58 & UINT8_C(0xff));
+  uint64_t x62 = (x60 >> 8);
+  uint8_t x63 = (uint8_t)(x60 & UINT8_C(0xff));
+  fiat_25519_uint1 x64 = (fiat_25519_uint1)(x62 >> 8);
+  uint8_t x65 = (uint8_t)(x62 & UINT8_C(0xff));
+  uint64_t x66 = (x64 + x23);
+  uint64_t x67 = (x66 >> 8);
+  uint8_t x68 = (uint8_t)(x66 & UINT8_C(0xff));
+  uint64_t x69 = (x67 >> 8);
+  uint8_t x70 = (uint8_t)(x67 & UINT8_C(0xff));
+  uint64_t x71 = (x69 >> 8);
+  uint8_t x72 = (uint8_t)(x69 & UINT8_C(0xff));
+  uint64_t x73 = (x71 >> 8);
+  uint8_t x74 = (uint8_t)(x71 & UINT8_C(0xff));
+  uint64_t x75 = (x73 >> 8);
+  uint8_t x76 = (uint8_t)(x73 & UINT8_C(0xff));
+  uint8_t x77 = (uint8_t)(x75 >> 8);
+  uint8_t x78 = (uint8_t)(x75 & UINT8_C(0xff));
+  uint64_t x79 = (x77 + x22);
+  uint64_t x80 = (x79 >> 8);
+  uint8_t x81 = (uint8_t)(x79 & UINT8_C(0xff));
+  uint64_t x82 = (x80 >> 8);
+  uint8_t x83 = (uint8_t)(x80 & UINT8_C(0xff));
+  uint64_t x84 = (x82 >> 8);
+  uint8_t x85 = (uint8_t)(x82 & UINT8_C(0xff));
+  uint64_t x86 = (x84 >> 8);
+  uint8_t x87 = (uint8_t)(x84 & UINT8_C(0xff));
+  uint64_t x88 = (x86 >> 8);
+  uint8_t x89 = (uint8_t)(x86 & UINT8_C(0xff));
+  uint8_t x90 = (uint8_t)(x88 >> 8);
+  uint8_t x91 = (uint8_t)(x88 & UINT8_C(0xff));
+  out1[0] = x27;
+  out1[1] = x29;
+  out1[2] = x31;
+  out1[3] = x33;
+  out1[4] = x35;
+  out1[5] = x37;
+  out1[6] = x40;
+  out1[7] = x42;
+  out1[8] = x44;
+  out1[9] = x46;
+  out1[10] = x48;
+  out1[11] = x50;
+  out1[12] = x53;
+  out1[13] = x55;
+  out1[14] = x57;
+  out1[15] = x59;
+  out1[16] = x61;
+  out1[17] = x63;
+  out1[18] = x65;
+  out1[19] = x68;
+  out1[20] = x70;
+  out1[21] = x72;
+  out1[22] = x74;
+  out1[23] = x76;
+  out1[24] = x78;
+  out1[25] = x81;
+  out1[26] = x83;
+  out1[27] = x85;
+  out1[28] = x87;
+  out1[29] = x89;
+  out1[30] = x91;
+  out1[31] = x90;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0x7f]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
+ */
+static void fiat_25519_from_bytes(uint64_t out1[5], const uint8_t arg1[32]) {
+  uint64_t x1 = ((uint64_t)(arg1[31]) << 44);
+  uint64_t x2 = ((uint64_t)(arg1[30]) << 36);
+  uint64_t x3 = ((uint64_t)(arg1[29]) << 28);
+  uint64_t x4 = ((uint64_t)(arg1[28]) << 20);
+  uint64_t x5 = ((uint64_t)(arg1[27]) << 12);
+  uint64_t x6 = ((uint64_t)(arg1[26]) << 4);
+  uint64_t x7 = ((uint64_t)(arg1[25]) << 47);
+  uint64_t x8 = ((uint64_t)(arg1[24]) << 39);
+  uint64_t x9 = ((uint64_t)(arg1[23]) << 31);
+  uint64_t x10 = ((uint64_t)(arg1[22]) << 23);
+  uint64_t x11 = ((uint64_t)(arg1[21]) << 15);
+  uint64_t x12 = ((uint64_t)(arg1[20]) << 7);
+  uint64_t x13 = ((uint64_t)(arg1[19]) << 50);
+  uint64_t x14 = ((uint64_t)(arg1[18]) << 42);
+  uint64_t x15 = ((uint64_t)(arg1[17]) << 34);
+  uint64_t x16 = ((uint64_t)(arg1[16]) << 26);
+  uint64_t x17 = ((uint64_t)(arg1[15]) << 18);
+  uint64_t x18 = ((uint64_t)(arg1[14]) << 10);
+  uint64_t x19 = ((uint64_t)(arg1[13]) << 2);
+  uint64_t x20 = ((uint64_t)(arg1[12]) << 45);
+  uint64_t x21 = ((uint64_t)(arg1[11]) << 37);
+  uint64_t x22 = ((uint64_t)(arg1[10]) << 29);
+  uint64_t x23 = ((uint64_t)(arg1[9]) << 21);
+  uint64_t x24 = ((uint64_t)(arg1[8]) << 13);
+  uint64_t x25 = ((uint64_t)(arg1[7]) << 5);
+  uint64_t x26 = ((uint64_t)(arg1[6]) << 48);
+  uint64_t x27 = ((uint64_t)(arg1[5]) << 40);
+  uint64_t x28 = ((uint64_t)(arg1[4]) << 32);
+  uint64_t x29 = ((uint64_t)(arg1[3]) << 24);
+  uint64_t x30 = ((uint64_t)(arg1[2]) << 16);
+  uint64_t x31 = ((uint64_t)(arg1[1]) << 8);
+  uint8_t x32 = (arg1[0]);
+  uint64_t x33 = (x32 + (x31 + (x30 + (x29 + (x28 + (x27 + x26))))));
+  uint8_t x34 = (uint8_t)(x33 >> 51);
+  uint64_t x35 = (x33 & UINT64_C(0x7ffffffffffff));
+  uint64_t x36 = (x6 + (x5 + (x4 + (x3 + (x2 + x1)))));
+  uint64_t x37 = (x12 + (x11 + (x10 + (x9 + (x8 + x7)))));
+  uint64_t x38 = (x19 + (x18 + (x17 + (x16 + (x15 + (x14 + x13))))));
+  uint64_t x39 = (x25 + (x24 + (x23 + (x22 + (x21 + x20)))));
+  uint64_t x40 = (x34 + x39);
+  uint8_t x41 = (uint8_t)(x40 >> 51);
+  uint64_t x42 = (x40 & UINT64_C(0x7ffffffffffff));
+  uint64_t x43 = (x41 + x38);
+  uint8_t x44 = (uint8_t)(x43 >> 51);
+  uint64_t x45 = (x43 & UINT64_C(0x7ffffffffffff));
+  uint64_t x46 = (x44 + x37);
+  uint8_t x47 = (uint8_t)(x46 >> 51);
+  uint64_t x48 = (x46 & UINT64_C(0x7ffffffffffff));
+  uint64_t x49 = (x47 + x36);
+  out1[0] = x35;
+  out1[1] = x42;
+  out1[2] = x45;
+  out1[3] = x48;
+  out1[4] = x49;
+}
+
diff --git a/third_party/fiat/p256.c b/third_party/fiat/p256.c
index 414b7e0..3c2ce1d 100644
--- a/third_party/fiat/p256.c
+++ b/third_party/fiat/p256.c
@@ -46,791 +46,11 @@
 // MSVC does not implement uint128_t, and crashes with intrinsics
 #if defined(BORINGSSL_HAS_UINT128)
 #define BORINGSSL_NISTP256_64BIT 1
-#endif
-
-// "intrinsics"
-
-#if defined(BORINGSSL_NISTP256_64BIT)
-
-static uint64_t mulx_u64(uint64_t a, uint64_t b, uint64_t *high) {
-  uint128_t x = (uint128_t)a * b;
-  *high = (uint64_t) (x >> 64);
-  return (uint64_t) x;
-}
-
-static uint64_t addcarryx_u64(uint8_t c, uint64_t a, uint64_t b, uint64_t *low) {
-  uint128_t x = (uint128_t)a + b + c;
-  *low = (uint64_t) x;
-  return (uint64_t) (x>>64);
-}
-
-static uint64_t subborrow_u64(uint8_t c, uint64_t a, uint64_t b, uint64_t *low) {
-  uint128_t t = ((uint128_t) b + c);
-  uint128_t x = a-t;
-  *low = (uint64_t) x;
-  return (uint8_t) (x>>127);
-}
-
-static uint64_t cmovznz_u64(uint64_t t, uint64_t z, uint64_t nz) {
-  t = -!!t; // all set if nonzero, 0 if 0
-  return (t&nz) | ((~t)&z);
-}
-
+#include "p256_64.c"
 #else
-
-static uint32_t mulx_u32(uint32_t a, uint32_t b, uint32_t *high) {
-  uint64_t x = (uint64_t)a * b;
-  *high = (uint32_t) (x >> 32);
-  return (uint32_t) x;
-}
-
-static uint32_t addcarryx_u32(uint8_t c, uint32_t a, uint32_t b, uint32_t *low) {
-  uint64_t x = (uint64_t)a + b + c;
-  *low = (uint32_t) x;
-  return (uint32_t) (x>>32);
-}
-
-static uint32_t subborrow_u32(uint8_t c, uint32_t a, uint32_t b, uint32_t *low) {
-  uint64_t t = ((uint64_t) b + c);
-  uint64_t x = a-t;
-  *low = (uint32_t) x;
-  return (uint8_t) (x>>63);
-}
-
-static uint32_t cmovznz_u32(uint32_t t, uint32_t z, uint32_t nz) {
-  t = -!!t; // all set if nonzero, 0 if 0
-  return (t&nz) | ((~t)&z);
-}
-
+#include "p256_32.c"
 #endif
 
-// fiat-crypto generated code
-
-#if defined(BORINGSSL_NISTP256_64BIT)
-
-static void fe_add(uint64_t out[4], const uint64_t in1[4], const uint64_t in2[4]) {
-  { const uint64_t x8 = in1[3];
-  { const uint64_t x9 = in1[2];
-  { const uint64_t x7 = in1[1];
-  { const uint64_t x5 = in1[0];
-  { const uint64_t x14 = in2[3];
-  { const uint64_t x15 = in2[2];
-  { const uint64_t x13 = in2[1];
-  { const uint64_t x11 = in2[0];
-  { uint64_t x17; uint8_t x18 = addcarryx_u64(0x0, x5, x11, &x17);
-  { uint64_t x20; uint8_t x21 = addcarryx_u64(x18, x7, x13, &x20);
-  { uint64_t x23; uint8_t x24 = addcarryx_u64(x21, x9, x15, &x23);
-  { uint64_t x26; uint8_t x27 = addcarryx_u64(x24, x8, x14, &x26);
-  { uint64_t x29; uint8_t x30 = subborrow_u64(0x0, x17, 0xffffffffffffffffL, &x29);
-  { uint64_t x32; uint8_t x33 = subborrow_u64(x30, x20, 0xffffffff, &x32);
-  { uint64_t x35; uint8_t x36 = subborrow_u64(x33, x23, 0x0, &x35);
-  { uint64_t x38; uint8_t x39 = subborrow_u64(x36, x26, 0xffffffff00000001L, &x38);
-  { uint64_t _1; uint8_t x42 = subborrow_u64(x39, x27, 0x0, &_1);
-  { uint64_t x43 = cmovznz_u64(x42, x38, x26);
-  { uint64_t x44 = cmovznz_u64(x42, x35, x23);
-  { uint64_t x45 = cmovznz_u64(x42, x32, x20);
-  { uint64_t x46 = cmovznz_u64(x42, x29, x17);
-  out[0] = x46;
-  out[1] = x45;
-  out[2] = x44;
-  out[3] = x43;
-  }}}}}}}}}}}}}}}}}}}}}
-}
-
-// fe_op sets out = -in
-static void fe_opp(uint64_t out[4], const uint64_t in1[4]) {
-  const uint64_t x5 = in1[3];
-  const uint64_t x6 = in1[2];
-  const uint64_t x4 = in1[1];
-  const uint64_t x2 = in1[0];
-  uint64_t x8; uint8_t x9 = subborrow_u64(0x0, 0x0, x2, &x8);
-  uint64_t x11; uint8_t x12 = subborrow_u64(x9, 0x0, x4, &x11);
-  uint64_t x14; uint8_t x15 = subborrow_u64(x12, 0x0, x6, &x14);
-  uint64_t x17; uint8_t x18 = subborrow_u64(x15, 0x0, x5, &x17);
-  uint64_t x19 = (uint64_t)cmovznz_u64(x18, 0x0, 0xffffffffffffffffL);
-  uint64_t x20 = (x19 & 0xffffffffffffffffL);
-  uint64_t x22; uint8_t x23 = addcarryx_u64(0x0, x8, x20, &x22);
-  uint64_t x24 = (x19 & 0xffffffff);
-  uint64_t x26; uint8_t x27 = addcarryx_u64(x23, x11, x24, &x26);
-  uint64_t x29; uint8_t x30 = addcarryx_u64(x27, x14, 0x0, &x29);
-  uint64_t x31 = (x19 & 0xffffffff00000001L);
-  uint64_t x33; addcarryx_u64(x30, x17, x31, &x33);
-  out[0] = x22;
-  out[1] = x26;
-  out[2] = x29;
-  out[3] = x33;
-}
-
-static void fe_mul(uint64_t out[4], const uint64_t in1[4], const uint64_t in2[4]) {
-  const uint64_t x8 = in1[3];
-  const uint64_t x9 = in1[2];
-  const uint64_t x7 = in1[1];
-  const uint64_t x5 = in1[0];
-  const uint64_t x14 = in2[3];
-  const uint64_t x15 = in2[2];
-  const uint64_t x13 = in2[1];
-  const uint64_t x11 = in2[0];
-  uint64_t x18;  uint64_t x17 = mulx_u64(x5, x11, &x18);
-  uint64_t x21;  uint64_t x20 = mulx_u64(x5, x13, &x21);
-  uint64_t x24;  uint64_t x23 = mulx_u64(x5, x15, &x24);
-  uint64_t x27;  uint64_t x26 = mulx_u64(x5, x14, &x27);
-  uint64_t x29; uint8_t x30 = addcarryx_u64(0x0, x18, x20, &x29);
-  uint64_t x32; uint8_t x33 = addcarryx_u64(x30, x21, x23, &x32);
-  uint64_t x35; uint8_t x36 = addcarryx_u64(x33, x24, x26, &x35);
-  uint64_t x38; addcarryx_u64(0x0, x36, x27, &x38);
-  uint64_t x42;  uint64_t x41 = mulx_u64(x17, 0xffffffffffffffffL, &x42);
-  uint64_t x45;  uint64_t x44 = mulx_u64(x17, 0xffffffff, &x45);
-  uint64_t x48;  uint64_t x47 = mulx_u64(x17, 0xffffffff00000001L, &x48);
-  uint64_t x50; uint8_t x51 = addcarryx_u64(0x0, x42, x44, &x50);
-  uint64_t x53; uint8_t x54 = addcarryx_u64(x51, x45, 0x0, &x53);
-  uint64_t x56; uint8_t x57 = addcarryx_u64(x54, 0x0, x47, &x56);
-  uint64_t x59; addcarryx_u64(0x0, x57, x48, &x59);
-  uint64_t _2; uint8_t x63 = addcarryx_u64(0x0, x17, x41, &_2);
-  uint64_t x65; uint8_t x66 = addcarryx_u64(x63, x29, x50, &x65);
-  uint64_t x68; uint8_t x69 = addcarryx_u64(x66, x32, x53, &x68);
-  uint64_t x71; uint8_t x72 = addcarryx_u64(x69, x35, x56, &x71);
-  uint64_t x74; uint8_t x75 = addcarryx_u64(x72, x38, x59, &x74);
-  uint64_t x78;  uint64_t x77 = mulx_u64(x7, x11, &x78);
-  uint64_t x81;  uint64_t x80 = mulx_u64(x7, x13, &x81);
-  uint64_t x84;  uint64_t x83 = mulx_u64(x7, x15, &x84);
-  uint64_t x87;  uint64_t x86 = mulx_u64(x7, x14, &x87);
-  uint64_t x89; uint8_t x90 = addcarryx_u64(0x0, x78, x80, &x89);
-  uint64_t x92; uint8_t x93 = addcarryx_u64(x90, x81, x83, &x92);
-  uint64_t x95; uint8_t x96 = addcarryx_u64(x93, x84, x86, &x95);
-  uint64_t x98; addcarryx_u64(0x0, x96, x87, &x98);
-  uint64_t x101; uint8_t x102 = addcarryx_u64(0x0, x65, x77, &x101);
-  uint64_t x104; uint8_t x105 = addcarryx_u64(x102, x68, x89, &x104);
-  uint64_t x107; uint8_t x108 = addcarryx_u64(x105, x71, x92, &x107);
-  uint64_t x110; uint8_t x111 = addcarryx_u64(x108, x74, x95, &x110);
-  uint64_t x113; uint8_t x114 = addcarryx_u64(x111, x75, x98, &x113);
-  uint64_t x117;  uint64_t x116 = mulx_u64(x101, 0xffffffffffffffffL, &x117);
-  uint64_t x120;  uint64_t x119 = mulx_u64(x101, 0xffffffff, &x120);
-  uint64_t x123;  uint64_t x122 = mulx_u64(x101, 0xffffffff00000001L, &x123);
-  uint64_t x125; uint8_t x126 = addcarryx_u64(0x0, x117, x119, &x125);
-  uint64_t x128; uint8_t x129 = addcarryx_u64(x126, x120, 0x0, &x128);
-  uint64_t x131; uint8_t x132 = addcarryx_u64(x129, 0x0, x122, &x131);
-  uint64_t x134; addcarryx_u64(0x0, x132, x123, &x134);
-  uint64_t _3; uint8_t x138 = addcarryx_u64(0x0, x101, x116, &_3);
-  uint64_t x140; uint8_t x141 = addcarryx_u64(x138, x104, x125, &x140);
-  uint64_t x143; uint8_t x144 = addcarryx_u64(x141, x107, x128, &x143);
-  uint64_t x146; uint8_t x147 = addcarryx_u64(x144, x110, x131, &x146);
-  uint64_t x149; uint8_t x150 = addcarryx_u64(x147, x113, x134, &x149);
-  uint8_t x151 = (x150 + x114);
-  uint64_t x154;  uint64_t x153 = mulx_u64(x9, x11, &x154);
-  uint64_t x157;  uint64_t x156 = mulx_u64(x9, x13, &x157);
-  uint64_t x160;  uint64_t x159 = mulx_u64(x9, x15, &x160);
-  uint64_t x163;  uint64_t x162 = mulx_u64(x9, x14, &x163);
-  uint64_t x165; uint8_t x166 = addcarryx_u64(0x0, x154, x156, &x165);
-  uint64_t x168; uint8_t x169 = addcarryx_u64(x166, x157, x159, &x168);
-  uint64_t x171; uint8_t x172 = addcarryx_u64(x169, x160, x162, &x171);
-  uint64_t x174; addcarryx_u64(0x0, x172, x163, &x174);
-  uint64_t x177; uint8_t x178 = addcarryx_u64(0x0, x140, x153, &x177);
-  uint64_t x180; uint8_t x181 = addcarryx_u64(x178, x143, x165, &x180);
-  uint64_t x183; uint8_t x184 = addcarryx_u64(x181, x146, x168, &x183);
-  uint64_t x186; uint8_t x187 = addcarryx_u64(x184, x149, x171, &x186);
-  uint64_t x189; uint8_t x190 = addcarryx_u64(x187, x151, x174, &x189);
-  uint64_t x193;  uint64_t x192 = mulx_u64(x177, 0xffffffffffffffffL, &x193);
-  uint64_t x196;  uint64_t x195 = mulx_u64(x177, 0xffffffff, &x196);
-  uint64_t x199;  uint64_t x198 = mulx_u64(x177, 0xffffffff00000001L, &x199);
-  uint64_t x201; uint8_t x202 = addcarryx_u64(0x0, x193, x195, &x201);
-  uint64_t x204; uint8_t x205 = addcarryx_u64(x202, x196, 0x0, &x204);
-  uint64_t x207; uint8_t x208 = addcarryx_u64(x205, 0x0, x198, &x207);
-  uint64_t x210; addcarryx_u64(0x0, x208, x199, &x210);
-  uint64_t _4; uint8_t x214 = addcarryx_u64(0x0, x177, x192, &_4);
-  uint64_t x216; uint8_t x217 = addcarryx_u64(x214, x180, x201, &x216);
-  uint64_t x219; uint8_t x220 = addcarryx_u64(x217, x183, x204, &x219);
-  uint64_t x222; uint8_t x223 = addcarryx_u64(x220, x186, x207, &x222);
-  uint64_t x225; uint8_t x226 = addcarryx_u64(x223, x189, x210, &x225);
-  uint8_t x227 = (x226 + x190);
-  uint64_t x230;  uint64_t x229 = mulx_u64(x8, x11, &x230);
-  uint64_t x233;  uint64_t x232 = mulx_u64(x8, x13, &x233);
-  uint64_t x236;  uint64_t x235 = mulx_u64(x8, x15, &x236);
-  uint64_t x239;  uint64_t x238 = mulx_u64(x8, x14, &x239);
-  uint64_t x241; uint8_t x242 = addcarryx_u64(0x0, x230, x232, &x241);
-  uint64_t x244; uint8_t x245 = addcarryx_u64(x242, x233, x235, &x244);
-  uint64_t x247; uint8_t x248 = addcarryx_u64(x245, x236, x238, &x247);
-  uint64_t x250; addcarryx_u64(0x0, x248, x239, &x250);
-  uint64_t x253; uint8_t x254 = addcarryx_u64(0x0, x216, x229, &x253);
-  uint64_t x256; uint8_t x257 = addcarryx_u64(x254, x219, x241, &x256);
-  uint64_t x259; uint8_t x260 = addcarryx_u64(x257, x222, x244, &x259);
-  uint64_t x262; uint8_t x263 = addcarryx_u64(x260, x225, x247, &x262);
-  uint64_t x265; uint8_t x266 = addcarryx_u64(x263, x227, x250, &x265);
-  uint64_t x269;  uint64_t x268 = mulx_u64(x253, 0xffffffffffffffffL, &x269);
-  uint64_t x272;  uint64_t x271 = mulx_u64(x253, 0xffffffff, &x272);
-  uint64_t x275;  uint64_t x274 = mulx_u64(x253, 0xffffffff00000001L, &x275);
-  uint64_t x277; uint8_t x278 = addcarryx_u64(0x0, x269, x271, &x277);
-  uint64_t x280; uint8_t x281 = addcarryx_u64(x278, x272, 0x0, &x280);
-  uint64_t x283; uint8_t x284 = addcarryx_u64(x281, 0x0, x274, &x283);
-  uint64_t x286; addcarryx_u64(0x0, x284, x275, &x286);
-  uint64_t _5; uint8_t x290 = addcarryx_u64(0x0, x253, x268, &_5);
-  uint64_t x292; uint8_t x293 = addcarryx_u64(x290, x256, x277, &x292);
-  uint64_t x295; uint8_t x296 = addcarryx_u64(x293, x259, x280, &x295);
-  uint64_t x298; uint8_t x299 = addcarryx_u64(x296, x262, x283, &x298);
-  uint64_t x301; uint8_t x302 = addcarryx_u64(x299, x265, x286, &x301);
-  uint8_t x303 = (x302 + x266);
-  uint64_t x305; uint8_t x306 = subborrow_u64(0x0, x292, 0xffffffffffffffffL, &x305);
-  uint64_t x308; uint8_t x309 = subborrow_u64(x306, x295, 0xffffffff, &x308);
-  uint64_t x311; uint8_t x312 = subborrow_u64(x309, x298, 0x0, &x311);
-  uint64_t x314; uint8_t x315 = subborrow_u64(x312, x301, 0xffffffff00000001L, &x314);
-  uint64_t _6; uint8_t x318 = subborrow_u64(x315, x303, 0x0, &_6);
-  uint64_t x319 = cmovznz_u64(x318, x314, x301);
-  uint64_t x320 = cmovznz_u64(x318, x311, x298);
-  uint64_t x321 = cmovznz_u64(x318, x308, x295);
-  uint64_t x322 = cmovznz_u64(x318, x305, x292);
-  out[0] = x322;
-  out[1] = x321;
-  out[2] = x320;
-  out[3] = x319;
-}
-
-static void fe_sub(uint64_t out[4], const uint64_t in1[4], const uint64_t in2[4]) {
-  const uint64_t x8 = in1[3];
-  const uint64_t x9 = in1[2];
-  const uint64_t x7 = in1[1];
-  const uint64_t x5 = in1[0];
-  const uint64_t x14 = in2[3];
-  const uint64_t x15 = in2[2];
-  const uint64_t x13 = in2[1];
-  const uint64_t x11 = in2[0];
-  uint64_t x17; uint8_t x18 = subborrow_u64(0x0, x5, x11, &x17);
-  uint64_t x20; uint8_t x21 = subborrow_u64(x18, x7, x13, &x20);
-  uint64_t x23; uint8_t x24 = subborrow_u64(x21, x9, x15, &x23);
-  uint64_t x26; uint8_t x27 = subborrow_u64(x24, x8, x14, &x26);
-  uint64_t x28 = (uint64_t)cmovznz_u64(x27, 0x0, 0xffffffffffffffffL);
-  uint64_t x29 = (x28 & 0xffffffffffffffffL);
-  uint64_t x31; uint8_t x32 = addcarryx_u64(0x0, x17, x29, &x31);
-  uint64_t x33 = (x28 & 0xffffffff);
-  uint64_t x35; uint8_t x36 = addcarryx_u64(x32, x20, x33, &x35);
-  uint64_t x38; uint8_t x39 = addcarryx_u64(x36, x23, 0x0, &x38);
-  uint64_t x40 = (x28 & 0xffffffff00000001L);
-  uint64_t x42; addcarryx_u64(x39, x26, x40, &x42);
-  out[0] = x31;
-  out[1] = x35;
-  out[2] = x38;
-  out[3] = x42;
-}
-
-#else // 64BIT, else 32BIT
-
-static void fe_add(uint32_t out[8], const uint32_t in1[8], const uint32_t in2[8]) {
-  const uint32_t x16 = in1[7];
-  const uint32_t x17 = in1[6];
-  const uint32_t x15 = in1[5];
-  const uint32_t x13 = in1[4];
-  const uint32_t x11 = in1[3];
-  const uint32_t x9 = in1[2];
-  const uint32_t x7 = in1[1];
-  const uint32_t x5 = in1[0];
-  const uint32_t x30 = in2[7];
-  const uint32_t x31 = in2[6];
-  const uint32_t x29 = in2[5];
-  const uint32_t x27 = in2[4];
-  const uint32_t x25 = in2[3];
-  const uint32_t x23 = in2[2];
-  const uint32_t x21 = in2[1];
-  const uint32_t x19 = in2[0];
-  uint32_t x33; uint8_t x34 = addcarryx_u32(0x0, x5, x19, &x33);
-  uint32_t x36; uint8_t x37 = addcarryx_u32(x34, x7, x21, &x36);
-  uint32_t x39; uint8_t x40 = addcarryx_u32(x37, x9, x23, &x39);
-  uint32_t x42; uint8_t x43 = addcarryx_u32(x40, x11, x25, &x42);
-  uint32_t x45; uint8_t x46 = addcarryx_u32(x43, x13, x27, &x45);
-  uint32_t x48; uint8_t x49 = addcarryx_u32(x46, x15, x29, &x48);
-  uint32_t x51; uint8_t x52 = addcarryx_u32(x49, x17, x31, &x51);
-  uint32_t x54; uint8_t x55 = addcarryx_u32(x52, x16, x30, &x54);
-  uint32_t x57; uint8_t x58 = subborrow_u32(0x0, x33, 0xffffffff, &x57);
-  uint32_t x60; uint8_t x61 = subborrow_u32(x58, x36, 0xffffffff, &x60);
-  uint32_t x63; uint8_t x64 = subborrow_u32(x61, x39, 0xffffffff, &x63);
-  uint32_t x66; uint8_t x67 = subborrow_u32(x64, x42, 0x0, &x66);
-  uint32_t x69; uint8_t x70 = subborrow_u32(x67, x45, 0x0, &x69);
-  uint32_t x72; uint8_t x73 = subborrow_u32(x70, x48, 0x0, &x72);
-  uint32_t x75; uint8_t x76 = subborrow_u32(x73, x51, 0x1, &x75);
-  uint32_t x78; uint8_t x79 = subborrow_u32(x76, x54, 0xffffffff, &x78);
-  uint32_t _; uint8_t x82 = subborrow_u32(x79, x55, 0x0, &_);
-  uint32_t x83 = cmovznz_u32(x82, x78, x54);
-  uint32_t x84 = cmovznz_u32(x82, x75, x51);
-  uint32_t x85 = cmovznz_u32(x82, x72, x48);
-  uint32_t x86 = cmovznz_u32(x82, x69, x45);
-  uint32_t x87 = cmovznz_u32(x82, x66, x42);
-  uint32_t x88 = cmovznz_u32(x82, x63, x39);
-  uint32_t x89 = cmovznz_u32(x82, x60, x36);
-  uint32_t x90 = cmovznz_u32(x82, x57, x33);
-  out[0] = x90;
-  out[1] = x89;
-  out[2] = x88;
-  out[3] = x87;
-  out[4] = x86;
-  out[5] = x85;
-  out[6] = x84;
-  out[7] = x83;
-}
-
-static void fe_mul(uint32_t out[8], const uint32_t in1[8], const uint32_t in2[8]) {
-  const uint32_t x16 = in1[7];
-  const uint32_t x17 = in1[6];
-  const uint32_t x15 = in1[5];
-  const uint32_t x13 = in1[4];
-  const uint32_t x11 = in1[3];
-  const uint32_t x9 = in1[2];
-  const uint32_t x7 = in1[1];
-  const uint32_t x5 = in1[0];
-  const uint32_t x30 = in2[7];
-  const uint32_t x31 = in2[6];
-  const uint32_t x29 = in2[5];
-  const uint32_t x27 = in2[4];
-  const uint32_t x25 = in2[3];
-  const uint32_t x23 = in2[2];
-  const uint32_t x21 = in2[1];
-  const uint32_t x19 = in2[0];
-  uint32_t x34;  uint32_t x33 = mulx_u32(x5, x19, &x34);
-  uint32_t x37;  uint32_t x36 = mulx_u32(x5, x21, &x37);
-  uint32_t x40;  uint32_t x39 = mulx_u32(x5, x23, &x40);
-  uint32_t x43;  uint32_t x42 = mulx_u32(x5, x25, &x43);
-  uint32_t x46;  uint32_t x45 = mulx_u32(x5, x27, &x46);
-  uint32_t x49;  uint32_t x48 = mulx_u32(x5, x29, &x49);
-  uint32_t x52;  uint32_t x51 = mulx_u32(x5, x31, &x52);
-  uint32_t x55;  uint32_t x54 = mulx_u32(x5, x30, &x55);
-  uint32_t x57; uint8_t x58 = addcarryx_u32(0x0, x34, x36, &x57);
-  uint32_t x60; uint8_t x61 = addcarryx_u32(x58, x37, x39, &x60);
-  uint32_t x63; uint8_t x64 = addcarryx_u32(x61, x40, x42, &x63);
-  uint32_t x66; uint8_t x67 = addcarryx_u32(x64, x43, x45, &x66);
-  uint32_t x69; uint8_t x70 = addcarryx_u32(x67, x46, x48, &x69);
-  uint32_t x72; uint8_t x73 = addcarryx_u32(x70, x49, x51, &x72);
-  uint32_t x75; uint8_t x76 = addcarryx_u32(x73, x52, x54, &x75);
-  uint32_t x78; addcarryx_u32(0x0, x76, x55, &x78);
-  uint32_t x82;  uint32_t x81 = mulx_u32(x33, 0xffffffff, &x82);
-  uint32_t x85;  uint32_t x84 = mulx_u32(x33, 0xffffffff, &x85);
-  uint32_t x88;  uint32_t x87 = mulx_u32(x33, 0xffffffff, &x88);
-  uint32_t x91;  uint32_t x90 = mulx_u32(x33, 0xffffffff, &x91);
-  uint32_t x93; uint8_t x94 = addcarryx_u32(0x0, x82, x84, &x93);
-  uint32_t x96; uint8_t x97 = addcarryx_u32(x94, x85, x87, &x96);
-  uint32_t x99; uint8_t x100 = addcarryx_u32(x97, x88, 0x0, &x99);
-  uint8_t x101 = (0x0 + 0x0);
-  uint32_t _1; uint8_t x104 = addcarryx_u32(0x0, x33, x81, &_1);
-  uint32_t x106; uint8_t x107 = addcarryx_u32(x104, x57, x93, &x106);
-  uint32_t x109; uint8_t x110 = addcarryx_u32(x107, x60, x96, &x109);
-  uint32_t x112; uint8_t x113 = addcarryx_u32(x110, x63, x99, &x112);
-  uint32_t x115; uint8_t x116 = addcarryx_u32(x113, x66, x100, &x115);
-  uint32_t x118; uint8_t x119 = addcarryx_u32(x116, x69, x101, &x118);
-  uint32_t x121; uint8_t x122 = addcarryx_u32(x119, x72, x33, &x121);
-  uint32_t x124; uint8_t x125 = addcarryx_u32(x122, x75, x90, &x124);
-  uint32_t x127; uint8_t x128 = addcarryx_u32(x125, x78, x91, &x127);
-  uint8_t x129 = (x128 + 0x0);
-  uint32_t x132;  uint32_t x131 = mulx_u32(x7, x19, &x132);
-  uint32_t x135;  uint32_t x134 = mulx_u32(x7, x21, &x135);
-  uint32_t x138;  uint32_t x137 = mulx_u32(x7, x23, &x138);
-  uint32_t x141;  uint32_t x140 = mulx_u32(x7, x25, &x141);
-  uint32_t x144;  uint32_t x143 = mulx_u32(x7, x27, &x144);
-  uint32_t x147;  uint32_t x146 = mulx_u32(x7, x29, &x147);
-  uint32_t x150;  uint32_t x149 = mulx_u32(x7, x31, &x150);
-  uint32_t x153;  uint32_t x152 = mulx_u32(x7, x30, &x153);
-  uint32_t x155; uint8_t x156 = addcarryx_u32(0x0, x132, x134, &x155);
-  uint32_t x158; uint8_t x159 = addcarryx_u32(x156, x135, x137, &x158);
-  uint32_t x161; uint8_t x162 = addcarryx_u32(x159, x138, x140, &x161);
-  uint32_t x164; uint8_t x165 = addcarryx_u32(x162, x141, x143, &x164);
-  uint32_t x167; uint8_t x168 = addcarryx_u32(x165, x144, x146, &x167);
-  uint32_t x170; uint8_t x171 = addcarryx_u32(x168, x147, x149, &x170);
-  uint32_t x173; uint8_t x174 = addcarryx_u32(x171, x150, x152, &x173);
-  uint32_t x176; addcarryx_u32(0x0, x174, x153, &x176);
-  uint32_t x179; uint8_t x180 = addcarryx_u32(0x0, x106, x131, &x179);
-  uint32_t x182; uint8_t x183 = addcarryx_u32(x180, x109, x155, &x182);
-  uint32_t x185; uint8_t x186 = addcarryx_u32(x183, x112, x158, &x185);
-  uint32_t x188; uint8_t x189 = addcarryx_u32(x186, x115, x161, &x188);
-  uint32_t x191; uint8_t x192 = addcarryx_u32(x189, x118, x164, &x191);
-  uint32_t x194; uint8_t x195 = addcarryx_u32(x192, x121, x167, &x194);
-  uint32_t x197; uint8_t x198 = addcarryx_u32(x195, x124, x170, &x197);
-  uint32_t x200; uint8_t x201 = addcarryx_u32(x198, x127, x173, &x200);
-  uint32_t x203; uint8_t x204 = addcarryx_u32(x201, x129, x176, &x203);
-  uint32_t x207;  uint32_t x206 = mulx_u32(x179, 0xffffffff, &x207);
-  uint32_t x210;  uint32_t x209 = mulx_u32(x179, 0xffffffff, &x210);
-  uint32_t x213;  uint32_t x212 = mulx_u32(x179, 0xffffffff, &x213);
-  uint32_t x216;  uint32_t x215 = mulx_u32(x179, 0xffffffff, &x216);
-  uint32_t x218; uint8_t x219 = addcarryx_u32(0x0, x207, x209, &x218);
-  uint32_t x221; uint8_t x222 = addcarryx_u32(x219, x210, x212, &x221);
-  uint32_t x224; uint8_t x225 = addcarryx_u32(x222, x213, 0x0, &x224);
-  uint8_t x226 = (0x0 + 0x0);
-  uint32_t _2; uint8_t x229 = addcarryx_u32(0x0, x179, x206, &_2);
-  uint32_t x231; uint8_t x232 = addcarryx_u32(x229, x182, x218, &x231);
-  uint32_t x234; uint8_t x235 = addcarryx_u32(x232, x185, x221, &x234);
-  uint32_t x237; uint8_t x238 = addcarryx_u32(x235, x188, x224, &x237);
-  uint32_t x240; uint8_t x241 = addcarryx_u32(x238, x191, x225, &x240);
-  uint32_t x243; uint8_t x244 = addcarryx_u32(x241, x194, x226, &x243);
-  uint32_t x246; uint8_t x247 = addcarryx_u32(x244, x197, x179, &x246);
-  uint32_t x249; uint8_t x250 = addcarryx_u32(x247, x200, x215, &x249);
-  uint32_t x252; uint8_t x253 = addcarryx_u32(x250, x203, x216, &x252);
-  uint8_t x254 = (x253 + x204);
-  uint32_t x257;  uint32_t x256 = mulx_u32(x9, x19, &x257);
-  uint32_t x260;  uint32_t x259 = mulx_u32(x9, x21, &x260);
-  uint32_t x263;  uint32_t x262 = mulx_u32(x9, x23, &x263);
-  uint32_t x266;  uint32_t x265 = mulx_u32(x9, x25, &x266);
-  uint32_t x269;  uint32_t x268 = mulx_u32(x9, x27, &x269);
-  uint32_t x272;  uint32_t x271 = mulx_u32(x9, x29, &x272);
-  uint32_t x275;  uint32_t x274 = mulx_u32(x9, x31, &x275);
-  uint32_t x278;  uint32_t x277 = mulx_u32(x9, x30, &x278);
-  uint32_t x280; uint8_t x281 = addcarryx_u32(0x0, x257, x259, &x280);
-  uint32_t x283; uint8_t x284 = addcarryx_u32(x281, x260, x262, &x283);
-  uint32_t x286; uint8_t x287 = addcarryx_u32(x284, x263, x265, &x286);
-  uint32_t x289; uint8_t x290 = addcarryx_u32(x287, x266, x268, &x289);
-  uint32_t x292; uint8_t x293 = addcarryx_u32(x290, x269, x271, &x292);
-  uint32_t x295; uint8_t x296 = addcarryx_u32(x293, x272, x274, &x295);
-  uint32_t x298; uint8_t x299 = addcarryx_u32(x296, x275, x277, &x298);
-  uint32_t x301; addcarryx_u32(0x0, x299, x278, &x301);
-  uint32_t x304; uint8_t x305 = addcarryx_u32(0x0, x231, x256, &x304);
-  uint32_t x307; uint8_t x308 = addcarryx_u32(x305, x234, x280, &x307);
-  uint32_t x310; uint8_t x311 = addcarryx_u32(x308, x237, x283, &x310);
-  uint32_t x313; uint8_t x314 = addcarryx_u32(x311, x240, x286, &x313);
-  uint32_t x316; uint8_t x317 = addcarryx_u32(x314, x243, x289, &x316);
-  uint32_t x319; uint8_t x320 = addcarryx_u32(x317, x246, x292, &x319);
-  uint32_t x322; uint8_t x323 = addcarryx_u32(x320, x249, x295, &x322);
-  uint32_t x325; uint8_t x326 = addcarryx_u32(x323, x252, x298, &x325);
-  uint32_t x328; uint8_t x329 = addcarryx_u32(x326, x254, x301, &x328);
-  uint32_t x332;  uint32_t x331 = mulx_u32(x304, 0xffffffff, &x332);
-  uint32_t x335;  uint32_t x334 = mulx_u32(x304, 0xffffffff, &x335);
-  uint32_t x338;  uint32_t x337 = mulx_u32(x304, 0xffffffff, &x338);
-  uint32_t x341;  uint32_t x340 = mulx_u32(x304, 0xffffffff, &x341);
-  uint32_t x343; uint8_t x344 = addcarryx_u32(0x0, x332, x334, &x343);
-  uint32_t x346; uint8_t x347 = addcarryx_u32(x344, x335, x337, &x346);
-  uint32_t x349; uint8_t x350 = addcarryx_u32(x347, x338, 0x0, &x349);
-  uint8_t x351 = (0x0 + 0x0);
-  uint32_t _3; uint8_t x354 = addcarryx_u32(0x0, x304, x331, &_3);
-  uint32_t x356; uint8_t x357 = addcarryx_u32(x354, x307, x343, &x356);
-  uint32_t x359; uint8_t x360 = addcarryx_u32(x357, x310, x346, &x359);
-  uint32_t x362; uint8_t x363 = addcarryx_u32(x360, x313, x349, &x362);
-  uint32_t x365; uint8_t x366 = addcarryx_u32(x363, x316, x350, &x365);
-  uint32_t x368; uint8_t x369 = addcarryx_u32(x366, x319, x351, &x368);
-  uint32_t x371; uint8_t x372 = addcarryx_u32(x369, x322, x304, &x371);
-  uint32_t x374; uint8_t x375 = addcarryx_u32(x372, x325, x340, &x374);
-  uint32_t x377; uint8_t x378 = addcarryx_u32(x375, x328, x341, &x377);
-  uint8_t x379 = (x378 + x329);
-  uint32_t x382;  uint32_t x381 = mulx_u32(x11, x19, &x382);
-  uint32_t x385;  uint32_t x384 = mulx_u32(x11, x21, &x385);
-  uint32_t x388;  uint32_t x387 = mulx_u32(x11, x23, &x388);
-  uint32_t x391;  uint32_t x390 = mulx_u32(x11, x25, &x391);
-  uint32_t x394;  uint32_t x393 = mulx_u32(x11, x27, &x394);
-  uint32_t x397;  uint32_t x396 = mulx_u32(x11, x29, &x397);
-  uint32_t x400;  uint32_t x399 = mulx_u32(x11, x31, &x400);
-  uint32_t x403;  uint32_t x402 = mulx_u32(x11, x30, &x403);
-  uint32_t x405; uint8_t x406 = addcarryx_u32(0x0, x382, x384, &x405);
-  uint32_t x408; uint8_t x409 = addcarryx_u32(x406, x385, x387, &x408);
-  uint32_t x411; uint8_t x412 = addcarryx_u32(x409, x388, x390, &x411);
-  uint32_t x414; uint8_t x415 = addcarryx_u32(x412, x391, x393, &x414);
-  uint32_t x417; uint8_t x418 = addcarryx_u32(x415, x394, x396, &x417);
-  uint32_t x420; uint8_t x421 = addcarryx_u32(x418, x397, x399, &x420);
-  uint32_t x423; uint8_t x424 = addcarryx_u32(x421, x400, x402, &x423);
-  uint32_t x426; addcarryx_u32(0x0, x424, x403, &x426);
-  uint32_t x429; uint8_t x430 = addcarryx_u32(0x0, x356, x381, &x429);
-  uint32_t x432; uint8_t x433 = addcarryx_u32(x430, x359, x405, &x432);
-  uint32_t x435; uint8_t x436 = addcarryx_u32(x433, x362, x408, &x435);
-  uint32_t x438; uint8_t x439 = addcarryx_u32(x436, x365, x411, &x438);
-  uint32_t x441; uint8_t x442 = addcarryx_u32(x439, x368, x414, &x441);
-  uint32_t x444; uint8_t x445 = addcarryx_u32(x442, x371, x417, &x444);
-  uint32_t x447; uint8_t x448 = addcarryx_u32(x445, x374, x420, &x447);
-  uint32_t x450; uint8_t x451 = addcarryx_u32(x448, x377, x423, &x450);
-  uint32_t x453; uint8_t x454 = addcarryx_u32(x451, x379, x426, &x453);
-  uint32_t x457;  uint32_t x456 = mulx_u32(x429, 0xffffffff, &x457);
-  uint32_t x460;  uint32_t x459 = mulx_u32(x429, 0xffffffff, &x460);
-  uint32_t x463;  uint32_t x462 = mulx_u32(x429, 0xffffffff, &x463);
-  uint32_t x466;  uint32_t x465 = mulx_u32(x429, 0xffffffff, &x466);
-  uint32_t x468; uint8_t x469 = addcarryx_u32(0x0, x457, x459, &x468);
-  uint32_t x471; uint8_t x472 = addcarryx_u32(x469, x460, x462, &x471);
-  uint32_t x474; uint8_t x475 = addcarryx_u32(x472, x463, 0x0, &x474);
-  uint8_t x476 = (0x0 + 0x0);
-  uint32_t _4; uint8_t x479 = addcarryx_u32(0x0, x429, x456, &_4);
-  uint32_t x481; uint8_t x482 = addcarryx_u32(x479, x432, x468, &x481);
-  uint32_t x484; uint8_t x485 = addcarryx_u32(x482, x435, x471, &x484);
-  uint32_t x487; uint8_t x488 = addcarryx_u32(x485, x438, x474, &x487);
-  uint32_t x490; uint8_t x491 = addcarryx_u32(x488, x441, x475, &x490);
-  uint32_t x493; uint8_t x494 = addcarryx_u32(x491, x444, x476, &x493);
-  uint32_t x496; uint8_t x497 = addcarryx_u32(x494, x447, x429, &x496);
-  uint32_t x499; uint8_t x500 = addcarryx_u32(x497, x450, x465, &x499);
-  uint32_t x502; uint8_t x503 = addcarryx_u32(x500, x453, x466, &x502);
-  uint8_t x504 = (x503 + x454);
-  uint32_t x507;  uint32_t x506 = mulx_u32(x13, x19, &x507);
-  uint32_t x510;  uint32_t x509 = mulx_u32(x13, x21, &x510);
-  uint32_t x513;  uint32_t x512 = mulx_u32(x13, x23, &x513);
-  uint32_t x516;  uint32_t x515 = mulx_u32(x13, x25, &x516);
-  uint32_t x519;  uint32_t x518 = mulx_u32(x13, x27, &x519);
-  uint32_t x522;  uint32_t x521 = mulx_u32(x13, x29, &x522);
-  uint32_t x525;  uint32_t x524 = mulx_u32(x13, x31, &x525);
-  uint32_t x528;  uint32_t x527 = mulx_u32(x13, x30, &x528);
-  uint32_t x530; uint8_t x531 = addcarryx_u32(0x0, x507, x509, &x530);
-  uint32_t x533; uint8_t x534 = addcarryx_u32(x531, x510, x512, &x533);
-  uint32_t x536; uint8_t x537 = addcarryx_u32(x534, x513, x515, &x536);
-  uint32_t x539; uint8_t x540 = addcarryx_u32(x537, x516, x518, &x539);
-  uint32_t x542; uint8_t x543 = addcarryx_u32(x540, x519, x521, &x542);
-  uint32_t x545; uint8_t x546 = addcarryx_u32(x543, x522, x524, &x545);
-  uint32_t x548; uint8_t x549 = addcarryx_u32(x546, x525, x527, &x548);
-  uint32_t x551; addcarryx_u32(0x0, x549, x528, &x551);
-  uint32_t x554; uint8_t x555 = addcarryx_u32(0x0, x481, x506, &x554);
-  uint32_t x557; uint8_t x558 = addcarryx_u32(x555, x484, x530, &x557);
-  uint32_t x560; uint8_t x561 = addcarryx_u32(x558, x487, x533, &x560);
-  uint32_t x563; uint8_t x564 = addcarryx_u32(x561, x490, x536, &x563);
-  uint32_t x566; uint8_t x567 = addcarryx_u32(x564, x493, x539, &x566);
-  uint32_t x569; uint8_t x570 = addcarryx_u32(x567, x496, x542, &x569);
-  uint32_t x572; uint8_t x573 = addcarryx_u32(x570, x499, x545, &x572);
-  uint32_t x575; uint8_t x576 = addcarryx_u32(x573, x502, x548, &x575);
-  uint32_t x578; uint8_t x579 = addcarryx_u32(x576, x504, x551, &x578);
-  uint32_t x582;  uint32_t x581 = mulx_u32(x554, 0xffffffff, &x582);
-  uint32_t x585;  uint32_t x584 = mulx_u32(x554, 0xffffffff, &x585);
-  uint32_t x588;  uint32_t x587 = mulx_u32(x554, 0xffffffff, &x588);
-  uint32_t x591;  uint32_t x590 = mulx_u32(x554, 0xffffffff, &x591);
-  uint32_t x593; uint8_t x594 = addcarryx_u32(0x0, x582, x584, &x593);
-  uint32_t x596; uint8_t x597 = addcarryx_u32(x594, x585, x587, &x596);
-  uint32_t x599; uint8_t x600 = addcarryx_u32(x597, x588, 0x0, &x599);
-  uint8_t x601 = (0x0 + 0x0);
-  uint32_t _5; uint8_t x604 = addcarryx_u32(0x0, x554, x581, &_5);
-  uint32_t x606; uint8_t x607 = addcarryx_u32(x604, x557, x593, &x606);
-  uint32_t x609; uint8_t x610 = addcarryx_u32(x607, x560, x596, &x609);
-  uint32_t x612; uint8_t x613 = addcarryx_u32(x610, x563, x599, &x612);
-  uint32_t x615; uint8_t x616 = addcarryx_u32(x613, x566, x600, &x615);
-  uint32_t x618; uint8_t x619 = addcarryx_u32(x616, x569, x601, &x618);
-  uint32_t x621; uint8_t x622 = addcarryx_u32(x619, x572, x554, &x621);
-  uint32_t x624; uint8_t x625 = addcarryx_u32(x622, x575, x590, &x624);
-  uint32_t x627; uint8_t x628 = addcarryx_u32(x625, x578, x591, &x627);
-  uint8_t x629 = (x628 + x579);
-  uint32_t x632;  uint32_t x631 = mulx_u32(x15, x19, &x632);
-  uint32_t x635;  uint32_t x634 = mulx_u32(x15, x21, &x635);
-  uint32_t x638;  uint32_t x637 = mulx_u32(x15, x23, &x638);
-  uint32_t x641;  uint32_t x640 = mulx_u32(x15, x25, &x641);
-  uint32_t x644;  uint32_t x643 = mulx_u32(x15, x27, &x644);
-  uint32_t x647;  uint32_t x646 = mulx_u32(x15, x29, &x647);
-  uint32_t x650;  uint32_t x649 = mulx_u32(x15, x31, &x650);
-  uint32_t x653;  uint32_t x652 = mulx_u32(x15, x30, &x653);
-  uint32_t x655; uint8_t x656 = addcarryx_u32(0x0, x632, x634, &x655);
-  uint32_t x658; uint8_t x659 = addcarryx_u32(x656, x635, x637, &x658);
-  uint32_t x661; uint8_t x662 = addcarryx_u32(x659, x638, x640, &x661);
-  uint32_t x664; uint8_t x665 = addcarryx_u32(x662, x641, x643, &x664);
-  uint32_t x667; uint8_t x668 = addcarryx_u32(x665, x644, x646, &x667);
-  uint32_t x670; uint8_t x671 = addcarryx_u32(x668, x647, x649, &x670);
-  uint32_t x673; uint8_t x674 = addcarryx_u32(x671, x650, x652, &x673);
-  uint32_t x676; addcarryx_u32(0x0, x674, x653, &x676);
-  uint32_t x679; uint8_t x680 = addcarryx_u32(0x0, x606, x631, &x679);
-  uint32_t x682; uint8_t x683 = addcarryx_u32(x680, x609, x655, &x682);
-  uint32_t x685; uint8_t x686 = addcarryx_u32(x683, x612, x658, &x685);
-  uint32_t x688; uint8_t x689 = addcarryx_u32(x686, x615, x661, &x688);
-  uint32_t x691; uint8_t x692 = addcarryx_u32(x689, x618, x664, &x691);
-  uint32_t x694; uint8_t x695 = addcarryx_u32(x692, x621, x667, &x694);
-  uint32_t x697; uint8_t x698 = addcarryx_u32(x695, x624, x670, &x697);
-  uint32_t x700; uint8_t x701 = addcarryx_u32(x698, x627, x673, &x700);
-  uint32_t x703; uint8_t x704 = addcarryx_u32(x701, x629, x676, &x703);
-  uint32_t x707;  uint32_t x706 = mulx_u32(x679, 0xffffffff, &x707);
-  uint32_t x710;  uint32_t x709 = mulx_u32(x679, 0xffffffff, &x710);
-  uint32_t x713;  uint32_t x712 = mulx_u32(x679, 0xffffffff, &x713);
-  uint32_t x716;  uint32_t x715 = mulx_u32(x679, 0xffffffff, &x716);
-  uint32_t x718; uint8_t x719 = addcarryx_u32(0x0, x707, x709, &x718);
-  uint32_t x721; uint8_t x722 = addcarryx_u32(x719, x710, x712, &x721);
-  uint32_t x724; uint8_t x725 = addcarryx_u32(x722, x713, 0x0, &x724);
-  uint8_t x726 = (0x0 + 0x0);
-  uint32_t _6; uint8_t x729 = addcarryx_u32(0x0, x679, x706, &_6);
-  uint32_t x731; uint8_t x732 = addcarryx_u32(x729, x682, x718, &x731);
-  uint32_t x734; uint8_t x735 = addcarryx_u32(x732, x685, x721, &x734);
-  uint32_t x737; uint8_t x738 = addcarryx_u32(x735, x688, x724, &x737);
-  uint32_t x740; uint8_t x741 = addcarryx_u32(x738, x691, x725, &x740);
-  uint32_t x743; uint8_t x744 = addcarryx_u32(x741, x694, x726, &x743);
-  uint32_t x746; uint8_t x747 = addcarryx_u32(x744, x697, x679, &x746);
-  uint32_t x749; uint8_t x750 = addcarryx_u32(x747, x700, x715, &x749);
-  uint32_t x752; uint8_t x753 = addcarryx_u32(x750, x703, x716, &x752);
-  uint8_t x754 = (x753 + x704);
-  uint32_t x757;  uint32_t x756 = mulx_u32(x17, x19, &x757);
-  uint32_t x760;  uint32_t x759 = mulx_u32(x17, x21, &x760);
-  uint32_t x763;  uint32_t x762 = mulx_u32(x17, x23, &x763);
-  uint32_t x766;  uint32_t x765 = mulx_u32(x17, x25, &x766);
-  uint32_t x769;  uint32_t x768 = mulx_u32(x17, x27, &x769);
-  uint32_t x772;  uint32_t x771 = mulx_u32(x17, x29, &x772);
-  uint32_t x775;  uint32_t x774 = mulx_u32(x17, x31, &x775);
-  uint32_t x778;  uint32_t x777 = mulx_u32(x17, x30, &x778);
-  uint32_t x780; uint8_t x781 = addcarryx_u32(0x0, x757, x759, &x780);
-  uint32_t x783; uint8_t x784 = addcarryx_u32(x781, x760, x762, &x783);
-  uint32_t x786; uint8_t x787 = addcarryx_u32(x784, x763, x765, &x786);
-  uint32_t x789; uint8_t x790 = addcarryx_u32(x787, x766, x768, &x789);
-  uint32_t x792; uint8_t x793 = addcarryx_u32(x790, x769, x771, &x792);
-  uint32_t x795; uint8_t x796 = addcarryx_u32(x793, x772, x774, &x795);
-  uint32_t x798; uint8_t x799 = addcarryx_u32(x796, x775, x777, &x798);
-  uint32_t x801; addcarryx_u32(0x0, x799, x778, &x801);
-  uint32_t x804; uint8_t x805 = addcarryx_u32(0x0, x731, x756, &x804);
-  uint32_t x807; uint8_t x808 = addcarryx_u32(x805, x734, x780, &x807);
-  uint32_t x810; uint8_t x811 = addcarryx_u32(x808, x737, x783, &x810);
-  uint32_t x813; uint8_t x814 = addcarryx_u32(x811, x740, x786, &x813);
-  uint32_t x816; uint8_t x817 = addcarryx_u32(x814, x743, x789, &x816);
-  uint32_t x819; uint8_t x820 = addcarryx_u32(x817, x746, x792, &x819);
-  uint32_t x822; uint8_t x823 = addcarryx_u32(x820, x749, x795, &x822);
-  uint32_t x825; uint8_t x826 = addcarryx_u32(x823, x752, x798, &x825);
-  uint32_t x828; uint8_t x829 = addcarryx_u32(x826, x754, x801, &x828);
-  uint32_t x832;  uint32_t x831 = mulx_u32(x804, 0xffffffff, &x832);
-  uint32_t x835;  uint32_t x834 = mulx_u32(x804, 0xffffffff, &x835);
-  uint32_t x838;  uint32_t x837 = mulx_u32(x804, 0xffffffff, &x838);
-  uint32_t x841;  uint32_t x840 = mulx_u32(x804, 0xffffffff, &x841);
-  uint32_t x843; uint8_t x844 = addcarryx_u32(0x0, x832, x834, &x843);
-  uint32_t x846; uint8_t x847 = addcarryx_u32(x844, x835, x837, &x846);
-  uint32_t x849; uint8_t x850 = addcarryx_u32(x847, x838, 0x0, &x849);
-  uint8_t x851 = (0x0 + 0x0);
-  uint32_t _7; uint8_t x854 = addcarryx_u32(0x0, x804, x831, &_7);
-  uint32_t x856; uint8_t x857 = addcarryx_u32(x854, x807, x843, &x856);
-  uint32_t x859; uint8_t x860 = addcarryx_u32(x857, x810, x846, &x859);
-  uint32_t x862; uint8_t x863 = addcarryx_u32(x860, x813, x849, &x862);
-  uint32_t x865; uint8_t x866 = addcarryx_u32(x863, x816, x850, &x865);
-  uint32_t x868; uint8_t x869 = addcarryx_u32(x866, x819, x851, &x868);
-  uint32_t x871; uint8_t x872 = addcarryx_u32(x869, x822, x804, &x871);
-  uint32_t x874; uint8_t x875 = addcarryx_u32(x872, x825, x840, &x874);
-  uint32_t x877; uint8_t x878 = addcarryx_u32(x875, x828, x841, &x877);
-  uint8_t x879 = (x878 + x829);
-  uint32_t x882;  uint32_t x881 = mulx_u32(x16, x19, &x882);
-  uint32_t x885;  uint32_t x884 = mulx_u32(x16, x21, &x885);
-  uint32_t x888;  uint32_t x887 = mulx_u32(x16, x23, &x888);
-  uint32_t x891;  uint32_t x890 = mulx_u32(x16, x25, &x891);
-  uint32_t x894;  uint32_t x893 = mulx_u32(x16, x27, &x894);
-  uint32_t x897;  uint32_t x896 = mulx_u32(x16, x29, &x897);
-  uint32_t x900;  uint32_t x899 = mulx_u32(x16, x31, &x900);
-  uint32_t x903;  uint32_t x902 = mulx_u32(x16, x30, &x903);
-  uint32_t x905; uint8_t x906 = addcarryx_u32(0x0, x882, x884, &x905);
-  uint32_t x908; uint8_t x909 = addcarryx_u32(x906, x885, x887, &x908);
-  uint32_t x911; uint8_t x912 = addcarryx_u32(x909, x888, x890, &x911);
-  uint32_t x914; uint8_t x915 = addcarryx_u32(x912, x891, x893, &x914);
-  uint32_t x917; uint8_t x918 = addcarryx_u32(x915, x894, x896, &x917);
-  uint32_t x920; uint8_t x921 = addcarryx_u32(x918, x897, x899, &x920);
-  uint32_t x923; uint8_t x924 = addcarryx_u32(x921, x900, x902, &x923);
-  uint32_t x926; addcarryx_u32(0x0, x924, x903, &x926);
-  uint32_t x929; uint8_t x930 = addcarryx_u32(0x0, x856, x881, &x929);
-  uint32_t x932; uint8_t x933 = addcarryx_u32(x930, x859, x905, &x932);
-  uint32_t x935; uint8_t x936 = addcarryx_u32(x933, x862, x908, &x935);
-  uint32_t x938; uint8_t x939 = addcarryx_u32(x936, x865, x911, &x938);
-  uint32_t x941; uint8_t x942 = addcarryx_u32(x939, x868, x914, &x941);
-  uint32_t x944; uint8_t x945 = addcarryx_u32(x942, x871, x917, &x944);
-  uint32_t x947; uint8_t x948 = addcarryx_u32(x945, x874, x920, &x947);
-  uint32_t x950; uint8_t x951 = addcarryx_u32(x948, x877, x923, &x950);
-  uint32_t x953; uint8_t x954 = addcarryx_u32(x951, x879, x926, &x953);
-  uint32_t x957;  uint32_t x956 = mulx_u32(x929, 0xffffffff, &x957);
-  uint32_t x960;  uint32_t x959 = mulx_u32(x929, 0xffffffff, &x960);
-  uint32_t x963;  uint32_t x962 = mulx_u32(x929, 0xffffffff, &x963);
-  uint32_t x966;  uint32_t x965 = mulx_u32(x929, 0xffffffff, &x966);
-  uint32_t x968; uint8_t x969 = addcarryx_u32(0x0, x957, x959, &x968);
-  uint32_t x971; uint8_t x972 = addcarryx_u32(x969, x960, x962, &x971);
-  uint32_t x974; uint8_t x975 = addcarryx_u32(x972, x963, 0x0, &x974);
-  uint8_t x976 = (0x0 + 0x0);
-  uint32_t _8; uint8_t x979 = addcarryx_u32(0x0, x929, x956, &_8);
-  uint32_t x981; uint8_t x982 = addcarryx_u32(x979, x932, x968, &x981);
-  uint32_t x984; uint8_t x985 = addcarryx_u32(x982, x935, x971, &x984);
-  uint32_t x987; uint8_t x988 = addcarryx_u32(x985, x938, x974, &x987);
-  uint32_t x990; uint8_t x991 = addcarryx_u32(x988, x941, x975, &x990);
-  uint32_t x993; uint8_t x994 = addcarryx_u32(x991, x944, x976, &x993);
-  uint32_t x996; uint8_t x997 = addcarryx_u32(x994, x947, x929, &x996);
-  uint32_t x999; uint8_t x1000 = addcarryx_u32(x997, x950, x965, &x999);
-  uint32_t x1002; uint8_t x1003 = addcarryx_u32(x1000, x953, x966, &x1002);
-  uint8_t x1004 = (x1003 + x954);
-  uint32_t x1006; uint8_t x1007 = subborrow_u32(0x0, x981, 0xffffffff, &x1006);
-  uint32_t x1009; uint8_t x1010 = subborrow_u32(x1007, x984, 0xffffffff, &x1009);
-  uint32_t x1012; uint8_t x1013 = subborrow_u32(x1010, x987, 0xffffffff, &x1012);
-  uint32_t x1015; uint8_t x1016 = subborrow_u32(x1013, x990, 0x0, &x1015);
-  uint32_t x1018; uint8_t x1019 = subborrow_u32(x1016, x993, 0x0, &x1018);
-  uint32_t x1021; uint8_t x1022 = subborrow_u32(x1019, x996, 0x0, &x1021);
-  uint32_t x1024; uint8_t x1025 = subborrow_u32(x1022, x999, 0x1, &x1024);
-  uint32_t x1027; uint8_t x1028 = subborrow_u32(x1025, x1002, 0xffffffff, &x1027);
-  uint32_t _9; uint8_t x1031 = subborrow_u32(x1028, x1004, 0x0, &_9);
-  uint32_t x1032 = cmovznz_u32(x1031, x1027, x1002);
-  uint32_t x1033 = cmovznz_u32(x1031, x1024, x999);
-  uint32_t x1034 = cmovznz_u32(x1031, x1021, x996);
-  uint32_t x1035 = cmovznz_u32(x1031, x1018, x993);
-  uint32_t x1036 = cmovznz_u32(x1031, x1015, x990);
-  uint32_t x1037 = cmovznz_u32(x1031, x1012, x987);
-  uint32_t x1038 = cmovznz_u32(x1031, x1009, x984);
-  uint32_t x1039 = cmovznz_u32(x1031, x1006, x981);
-  out[0] = x1039;
-  out[1] = x1038;
-  out[2] = x1037;
-  out[3] = x1036;
-  out[4] = x1035;
-  out[5] = x1034;
-  out[6] = x1033;
-  out[7] = x1032;
-}
-
-// NOTE: the following functions are generated from fiat-crypto, from the same
-// template as their 64-bit counterparts above, but the correctness proof of
-// the template was not composed with the correctness proof of the
-// specialization pipeline. This is because Coq unexplainedly loops on trying
-// to synthesize opp and sub using the normal pipeline.
-
-static void fe_sub(uint32_t out[8], const uint32_t in1[8], const uint32_t in2[8]) {
-  const uint32_t x14 = in1[7];
-  const uint32_t x15 = in1[6];
-  const uint32_t x13 = in1[5];
-  const uint32_t x11 = in1[4];
-  const uint32_t x9 = in1[3];
-  const uint32_t x7 = in1[2];
-  const uint32_t x5 = in1[1];
-  const uint32_t x3 = in1[0];
-  const uint32_t x28 = in2[7];
-  const uint32_t x29 = in2[6];
-  const uint32_t x27 = in2[5];
-  const uint32_t x25 = in2[4];
-  const uint32_t x23 = in2[3];
-  const uint32_t x21 = in2[2];
-  const uint32_t x19 = in2[1];
-  const uint32_t x17 = in2[0];
-  uint32_t x31; uint8_t x32 = subborrow_u32(0x0, x3, x17, &x31);
-  uint32_t x34; uint8_t x35 = subborrow_u32(x32, x5, x19, &x34);
-  uint32_t x37; uint8_t x38 = subborrow_u32(x35, x7, x21, &x37);
-  uint32_t x40; uint8_t x41 = subborrow_u32(x38, x9, x23, &x40);
-  uint32_t x43; uint8_t x44 = subborrow_u32(x41, x11, x25, &x43);
-  uint32_t x46; uint8_t x47 = subborrow_u32(x44, x13, x27, &x46);
-  uint32_t x49; uint8_t x50 = subborrow_u32(x47, x15, x29, &x49);
-  uint32_t x52; uint8_t x53 = subborrow_u32(x50, x14, x28, &x52);
-  uint32_t x54 = cmovznz_u32(x53, 0x0, 0xffffffff);
-  uint32_t x56; uint8_t x57 = addcarryx_u32(0x0, x31, (x54 & 0xffffffff), &x56);
-  uint32_t x59; uint8_t x60 = addcarryx_u32(x57, x34, (x54 & 0xffffffff), &x59);
-  uint32_t x62; uint8_t x63 = addcarryx_u32(x60, x37, (x54 & 0xffffffff), &x62);
-  uint32_t x65; uint8_t x66 = addcarryx_u32(x63, x40, 0x0, &x65);
-  uint32_t x68; uint8_t x69 = addcarryx_u32(x66, x43, 0x0, &x68);
-  uint32_t x71; uint8_t x72 = addcarryx_u32(x69, x46, 0x0, &x71);
-  uint32_t x74; uint8_t x75 = addcarryx_u32(x72, x49, ((uint8_t)x54 & 0x1), &x74);
-  uint32_t x77; addcarryx_u32(x75, x52, (x54 & 0xffffffff), &x77);
-  out[0] = x56;
-  out[1] = x59;
-  out[2] = x62;
-  out[3] = x65;
-  out[4] = x68;
-  out[5] = x71;
-  out[6] = x74;
-  out[7] = x77;
-}
-
-// fe_op sets out = -in
-static void fe_opp(uint32_t out[8], const uint32_t in1[8]) {
-  const uint32_t x12 = in1[7];
-  const uint32_t x13 = in1[6];
-  const uint32_t x11 = in1[5];
-  const uint32_t x9 = in1[4];
-  const uint32_t x7 = in1[3];
-  const uint32_t x5 = in1[2];
-  const uint32_t x3 = in1[1];
-  const uint32_t x1 = in1[0];
-  uint32_t x15; uint8_t x16 = subborrow_u32(0x0, 0x0, x1, &x15);
-  uint32_t x18; uint8_t x19 = subborrow_u32(x16, 0x0, x3, &x18);
-  uint32_t x21; uint8_t x22 = subborrow_u32(x19, 0x0, x5, &x21);
-  uint32_t x24; uint8_t x25 = subborrow_u32(x22, 0x0, x7, &x24);
-  uint32_t x27; uint8_t x28 = subborrow_u32(x25, 0x0, x9, &x27);
-  uint32_t x30; uint8_t x31 = subborrow_u32(x28, 0x0, x11, &x30);
-  uint32_t x33; uint8_t x34 = subborrow_u32(x31, 0x0, x13, &x33);
-  uint32_t x36; uint8_t x37 = subborrow_u32(x34, 0x0, x12, &x36);
-  uint32_t x38 = cmovznz_u32(x37, 0x0, 0xffffffff);
-  uint32_t x40; uint8_t x41 = addcarryx_u32(0x0, x15, (x38 & 0xffffffff), &x40);
-  uint32_t x43; uint8_t x44 = addcarryx_u32(x41, x18, (x38 & 0xffffffff), &x43);
-  uint32_t x46; uint8_t x47 = addcarryx_u32(x44, x21, (x38 & 0xffffffff), &x46);
-  uint32_t x49; uint8_t x50 = addcarryx_u32(x47, x24, 0x0, &x49);
-  uint32_t x52; uint8_t x53 = addcarryx_u32(x50, x27, 0x0, &x52);
-  uint32_t x55; uint8_t x56 = addcarryx_u32(x53, x30, 0x0, &x55);
-  uint32_t x58; uint8_t x59 = addcarryx_u32(x56, x33, ((uint8_t)x38 & 0x1), &x58);
-  uint32_t x61; addcarryx_u32(x59, x36, (x38 & 0xffffffff), &x61);
-  out[0] = x40;
-  out[1] = x43;
-  out[2] = x46;
-  out[3] = x49;
-  out[4] = x52;
-  out[5] = x55;
-  out[6] = x58;
-  out[7] = x61;
-}
-
-#endif
 
 // utility functions, handwritten
 
@@ -840,22 +60,28 @@
 
 #define NLIMBS 4
 typedef uint64_t limb_t;
-#define cmovznz_limb cmovznz_u64
 typedef uint64_t fe[NLIMBS];
 #else // 64BIT; else 32BIT
 
 #define NLIMBS 8
 typedef uint32_t limb_t;
-#define cmovznz_limb cmovznz_u32
 typedef uint32_t fe[NLIMBS];
 
 #endif // 64BIT
 
+#define fe_add fiat_p256_add
+#define fe_sub fiat_p256_sub
+#define fe_opp fiat_p256_opp
+
+#define fe_mul fiat_p256_mul
+#define fe_sqr fiat_p256_square
+
+#define fe_tobytes fiat_p256_to_bytes
+#define fe_frombytes fiat_p256_from_bytes
+
 static limb_t fe_nz(const limb_t in1[NLIMBS]) {
-  limb_t ret = 0;
-  for (int i = 0; i < NLIMBS; i++) {
-    ret |= in1[i];
-  }
+  limb_t ret;
+  fiat_p256_nonzero(&ret, in1);
   return ret;
 }
 
@@ -867,33 +93,11 @@
 
 static void fe_cmovznz(limb_t out[NLIMBS], limb_t t, const limb_t z[NLIMBS],
                        const limb_t nz[NLIMBS]) {
-  for (int i = 0; i < NLIMBS; i++) {
-    out[i] = cmovznz_limb(t, z[i], nz[i]);
-  }
-}
-
-static void fe_sqr(fe out, const fe in) {
-  fe_mul(out, in, in);
-}
-
-static void fe_tobytes(uint8_t out[NBYTES], const fe in) {
-  for (int i = 0; i<NBYTES; i++) {
-    out[i] = (uint8_t)(in[i/sizeof(in[0])] >> (8*(i%sizeof(in[0]))));
-  }
-}
-
-static void fe_frombytes(fe out, const uint8_t in[NBYTES]) {
-  for (int i = 0; i<NLIMBS; i++) {
-    out[i] = 0;
-  }
-  for (int i = 0; i<NBYTES; i++) {
-    out[i/sizeof(out[0])] |= ((limb_t)in[i]) << (8*(i%sizeof(out[0])));
-  }
+  fiat_p256_selectznz(out, !!t, z, nz);
 }
 
 static void fe_from_montgomery(fe x) {
-  static const limb_t kOne[NLIMBS] = {1, 0};
-  fe_mul(x, x, kOne);
+  fiat_p256_from_montgomery(x, x);
 }
 
 static void fe_from_generic(fe out, const EC_FELEM *in) {
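
Review note on the `fe_cmovznz` hunk above: the old loop passed the full limb `t` to `cmovznz_limb`, while the new call passes `!!t`. The sketch below illustrates why that normalization matters when the flag parameter is a one-byte type such as `fiat_p256_uint1`. All `sketch_*` names are hypothetical and only mirror `fiat_p256_cmovznz_u32` as it appears in this diff. The body of `fiat_p256_selectznz` is not shown here, so the limb-wise loop is an assumption about its behavior, not a copy of it.

```c
#include <stdint.h>

#define SKETCH_NLIMBS 8 /* same limb count as the 32-bit NLIMBS in the diff */

/* Hypothetical stand-in for fiat_p256_cmovznz_u32: returns z when flag == 0
 * and nz otherwise, using a mask instead of a branch on flag. */
static uint32_t sketch_cmovznz_u32(uint8_t flag, uint32_t z, uint32_t nz) {
  uint32_t mask = (uint32_t)0 - (uint32_t)(!!flag); /* 0 or 0xffffffff */
  return (mask & nz) | (~mask & z);
}

/* Assumed shape of a limb-wise select: apply the masked select to each limb. */
static void sketch_selectznz(uint32_t out[SKETCH_NLIMBS], uint8_t flag,
                             const uint32_t z[SKETCH_NLIMBS],
                             const uint32_t nz[SKETCH_NLIMBS]) {
  for (int i = 0; i < SKETCH_NLIMBS; i++) {
    out[i] = sketch_cmovznz_u32(flag, z[i], nz[i]);
  }
}

/* Wrapper in the style of the patched fe_cmovznz: t is a full limb, so it is
 * collapsed to 0 or 1 before being narrowed to the one-byte flag type. */
static void sketch_fe_cmovznz(uint32_t out[SKETCH_NLIMBS], uint32_t t,
                              const uint32_t z[SKETCH_NLIMBS],
                              const uint32_t nz[SKETCH_NLIMBS]) {
  sketch_selectznz(out, (uint8_t)(!!t), z, nz);
}

/* Same wrapper without the normalization, kept only to show the hazard:
 * a value such as 0x100 is nonzero but narrows to 0 as a uint8_t. */
static void sketch_fe_cmovznz_unnormalized(uint32_t out[SKETCH_NLIMBS],
                                           uint32_t t,
                                           const uint32_t z[SKETCH_NLIMBS],
                                           const uint32_t nz[SKETCH_NLIMBS]) {
  sketch_selectznz(out, (uint8_t)t, z, nz);
}
```

With `t = 0x100`, `sketch_fe_cmovznz` selects `nz`, whereas the unnormalized variant selects `z` because the conversion to `uint8_t` discards the only set bit.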
diff --git a/third_party/fiat/p256_32.c b/third_party/fiat/p256_32.c
new file mode 100644
index 0000000..faaa0b0
--- /dev/null
+++ b/third_party/fiat/p256_32.c
@@ -0,0 +1,3220 @@
+/* Autogenerated */
+/* curve description: p256 */
+/* requested operations: (all) */
+/* m = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff (from "2^256 - 2^224 + 2^192 + 2^96 - 1") */
+/* machine_wordsize = 32 (from "32") */
+/*                                                                    */
+/* NOTE: In addition to the bounds specified above each function, all */
+/*   functions synthesized for this Montgomery arithmetic require the */
+/*   input to be strictly less than the prime modulus (m), and also   */
+/*   require the input to be in the unique saturated representation.  */
+/*   All functions also ensure that these two properties are true of  */
+/*   return values.                                                   */
+
+#include <stdint.h>
+typedef unsigned char fiat_p256_uint1;
+typedef signed char fiat_p256_int1;
+
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffff]
+ *   arg3: [0x0 ~> 0xffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_p256_addcarryx_u32(uint32_t* out1, fiat_p256_uint1* out2, fiat_p256_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  uint64_t x1 = ((arg1 + (uint64_t)arg2) + arg3);
+  uint32_t x2 = (uint32_t)(x1 & UINT32_C(0xffffffff));
+  fiat_p256_uint1 x3 = (fiat_p256_uint1)(x1 >> 32);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffff]
+ *   arg3: [0x0 ~> 0xffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_p256_subborrowx_u32(uint32_t* out1, fiat_p256_uint1* out2, fiat_p256_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  int64_t x1 = ((arg2 - (int64_t)arg1) - arg3);
+  fiat_p256_int1 x2 = (fiat_p256_int1)(x1 >> 32);
+  uint32_t x3 = (uint32_t)(x1 & UINT32_C(0xffffffff));
+  *out1 = x3;
+  *out2 = (fiat_p256_uint1)(0x0 - x2);
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0xffffffff]
+ *   arg2: [0x0 ~> 0xffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ *   out2: [0x0 ~> 0xffffffff]
+ */
+static void fiat_p256_mulx_u32(uint32_t* out1, uint32_t* out2, uint32_t arg1, uint32_t arg2) {
+  uint64_t x1 = ((uint64_t)arg1 * arg2);
+  uint32_t x2 = (uint32_t)(x1 & UINT32_C(0xffffffff));
+  uint32_t x3 = (uint32_t)(x1 >> 32);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffff]
+ *   arg3: [0x0 ~> 0xffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ */
+static void fiat_p256_cmovznz_u32(uint32_t* out1, fiat_p256_uint1 arg1, uint32_t arg2, uint32_t arg3) {
+  fiat_p256_uint1 x1 = (!(!arg1));
+  uint32_t x2 = ((fiat_p256_int1)(0x0 - x1) & UINT32_C(0xffffffff));
+  uint32_t x3 = ((x2 & arg3) | ((~x2) & arg2));
+  *out1 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_mul(uint32_t out1[8], const uint32_t arg1[8], const uint32_t arg2[8]) {
+  uint32_t x1 = (arg1[1]);
+  uint32_t x2 = (arg1[2]);
+  uint32_t x3 = (arg1[3]);
+  uint32_t x4 = (arg1[4]);
+  uint32_t x5 = (arg1[5]);
+  uint32_t x6 = (arg1[6]);
+  uint32_t x7 = (arg1[7]);
+  uint32_t x8 = (arg1[0]);
+  uint32_t x9;
+  uint32_t x10;
+  fiat_p256_mulx_u32(&x9, &x10, x8, (arg2[7]));
+  uint32_t x11;
+  uint32_t x12;
+  fiat_p256_mulx_u32(&x11, &x12, x8, (arg2[6]));
+  uint32_t x13;
+  uint32_t x14;
+  fiat_p256_mulx_u32(&x13, &x14, x8, (arg2[5]));
+  uint32_t x15;
+  uint32_t x16;
+  fiat_p256_mulx_u32(&x15, &x16, x8, (arg2[4]));
+  uint32_t x17;
+  uint32_t x18;
+  fiat_p256_mulx_u32(&x17, &x18, x8, (arg2[3]));
+  uint32_t x19;
+  uint32_t x20;
+  fiat_p256_mulx_u32(&x19, &x20, x8, (arg2[2]));
+  uint32_t x21;
+  uint32_t x22;
+  fiat_p256_mulx_u32(&x21, &x22, x8, (arg2[1]));
+  uint32_t x23;
+  uint32_t x24;
+  fiat_p256_mulx_u32(&x23, &x24, x8, (arg2[0]));
+  uint32_t x25;
+  fiat_p256_uint1 x26;
+  fiat_p256_addcarryx_u32(&x25, &x26, 0x0, x21, x24);
+  uint32_t x27;
+  fiat_p256_uint1 x28;
+  fiat_p256_addcarryx_u32(&x27, &x28, x26, x19, x22);
+  uint32_t x29;
+  fiat_p256_uint1 x30;
+  fiat_p256_addcarryx_u32(&x29, &x30, x28, x17, x20);
+  uint32_t x31;
+  fiat_p256_uint1 x32;
+  fiat_p256_addcarryx_u32(&x31, &x32, x30, x15, x18);
+  uint32_t x33;
+  fiat_p256_uint1 x34;
+  fiat_p256_addcarryx_u32(&x33, &x34, x32, x13, x16);
+  uint32_t x35;
+  fiat_p256_uint1 x36;
+  fiat_p256_addcarryx_u32(&x35, &x36, x34, x11, x14);
+  uint32_t x37;
+  fiat_p256_uint1 x38;
+  fiat_p256_addcarryx_u32(&x37, &x38, x36, x9, x12);
+  uint32_t x39;
+  fiat_p256_uint1 x40;
+  fiat_p256_addcarryx_u32(&x39, &x40, x38, 0x0, x10);
+  uint32_t x41;
+  uint32_t x42;
+  fiat_p256_mulx_u32(&x41, &x42, x23, UINT32_C(0xffffffff));
+  uint32_t x43;
+  uint32_t x44;
+  fiat_p256_mulx_u32(&x43, &x44, x23, UINT32_C(0xffffffff));
+  uint32_t x45;
+  uint32_t x46;
+  fiat_p256_mulx_u32(&x45, &x46, x23, UINT32_C(0xffffffff));
+  uint32_t x47;
+  uint32_t x48;
+  fiat_p256_mulx_u32(&x47, &x48, x23, UINT32_C(0xffffffff));
+  uint32_t x49;
+  fiat_p256_uint1 x50;
+  fiat_p256_addcarryx_u32(&x49, &x50, 0x0, x45, x48);
+  uint32_t x51;
+  fiat_p256_uint1 x52;
+  fiat_p256_addcarryx_u32(&x51, &x52, x50, x43, x46);
+  uint32_t x53;
+  fiat_p256_uint1 x54;
+  fiat_p256_addcarryx_u32(&x53, &x54, x52, 0x0, x44);
+  uint32_t x55;
+  fiat_p256_uint1 x56;
+  fiat_p256_addcarryx_u32(&x55, &x56, 0x0, x47, x23);
+  uint32_t x57;
+  fiat_p256_uint1 x58;
+  fiat_p256_addcarryx_u32(&x57, &x58, x56, x49, x25);
+  uint32_t x59;
+  fiat_p256_uint1 x60;
+  fiat_p256_addcarryx_u32(&x59, &x60, x58, x51, x27);
+  uint32_t x61;
+  fiat_p256_uint1 x62;
+  fiat_p256_addcarryx_u32(&x61, &x62, x60, x53, x29);
+  uint32_t x63;
+  fiat_p256_uint1 x64;
+  fiat_p256_addcarryx_u32(&x63, &x64, x62, 0x0, x31);
+  uint32_t x65;
+  fiat_p256_uint1 x66;
+  fiat_p256_addcarryx_u32(&x65, &x66, x64, 0x0, x33);
+  uint32_t x67;
+  fiat_p256_uint1 x68;
+  fiat_p256_addcarryx_u32(&x67, &x68, x66, x23, x35);
+  uint32_t x69;
+  fiat_p256_uint1 x70;
+  fiat_p256_addcarryx_u32(&x69, &x70, x68, x41, x37);
+  uint32_t x71;
+  fiat_p256_uint1 x72;
+  fiat_p256_addcarryx_u32(&x71, &x72, x70, x42, x39);
+  uint32_t x73;
+  fiat_p256_uint1 x74;
+  fiat_p256_addcarryx_u32(&x73, &x74, x72, 0x0, 0x0);
+  uint32_t x75;
+  uint32_t x76;
+  fiat_p256_mulx_u32(&x75, &x76, x1, (arg2[7]));
+  uint32_t x77;
+  uint32_t x78;
+  fiat_p256_mulx_u32(&x77, &x78, x1, (arg2[6]));
+  uint32_t x79;
+  uint32_t x80;
+  fiat_p256_mulx_u32(&x79, &x80, x1, (arg2[5]));
+  uint32_t x81;
+  uint32_t x82;
+  fiat_p256_mulx_u32(&x81, &x82, x1, (arg2[4]));
+  uint32_t x83;
+  uint32_t x84;
+  fiat_p256_mulx_u32(&x83, &x84, x1, (arg2[3]));
+  uint32_t x85;
+  uint32_t x86;
+  fiat_p256_mulx_u32(&x85, &x86, x1, (arg2[2]));
+  uint32_t x87;
+  uint32_t x88;
+  fiat_p256_mulx_u32(&x87, &x88, x1, (arg2[1]));
+  uint32_t x89;
+  uint32_t x90;
+  fiat_p256_mulx_u32(&x89, &x90, x1, (arg2[0]));
+  uint32_t x91;
+  fiat_p256_uint1 x92;
+  fiat_p256_addcarryx_u32(&x91, &x92, 0x0, x87, x90);
+  uint32_t x93;
+  fiat_p256_uint1 x94;
+  fiat_p256_addcarryx_u32(&x93, &x94, x92, x85, x88);
+  uint32_t x95;
+  fiat_p256_uint1 x96;
+  fiat_p256_addcarryx_u32(&x95, &x96, x94, x83, x86);
+  uint32_t x97;
+  fiat_p256_uint1 x98;
+  fiat_p256_addcarryx_u32(&x97, &x98, x96, x81, x84);
+  uint32_t x99;
+  fiat_p256_uint1 x100;
+  fiat_p256_addcarryx_u32(&x99, &x100, x98, x79, x82);
+  uint32_t x101;
+  fiat_p256_uint1 x102;
+  fiat_p256_addcarryx_u32(&x101, &x102, x100, x77, x80);
+  uint32_t x103;
+  fiat_p256_uint1 x104;
+  fiat_p256_addcarryx_u32(&x103, &x104, x102, x75, x78);
+  uint32_t x105;
+  fiat_p256_uint1 x106;
+  fiat_p256_addcarryx_u32(&x105, &x106, x104, 0x0, x76);
+  uint32_t x107;
+  fiat_p256_uint1 x108;
+  fiat_p256_addcarryx_u32(&x107, &x108, 0x0, x89, x57);
+  uint32_t x109;
+  fiat_p256_uint1 x110;
+  fiat_p256_addcarryx_u32(&x109, &x110, x108, x91, x59);
+  uint32_t x111;
+  fiat_p256_uint1 x112;
+  fiat_p256_addcarryx_u32(&x111, &x112, x110, x93, x61);
+  uint32_t x113;
+  fiat_p256_uint1 x114;
+  fiat_p256_addcarryx_u32(&x113, &x114, x112, x95, x63);
+  uint32_t x115;
+  fiat_p256_uint1 x116;
+  fiat_p256_addcarryx_u32(&x115, &x116, x114, x97, x65);
+  uint32_t x117;
+  fiat_p256_uint1 x118;
+  fiat_p256_addcarryx_u32(&x117, &x118, x116, x99, x67);
+  uint32_t x119;
+  fiat_p256_uint1 x120;
+  fiat_p256_addcarryx_u32(&x119, &x120, x118, x101, x69);
+  uint32_t x121;
+  fiat_p256_uint1 x122;
+  fiat_p256_addcarryx_u32(&x121, &x122, x120, x103, x71);
+  uint32_t x123;
+  fiat_p256_uint1 x124;
+  fiat_p256_addcarryx_u32(&x123, &x124, x122, x105, (fiat_p256_uint1)x73);
+  uint32_t x125;
+  uint32_t x126;
+  fiat_p256_mulx_u32(&x125, &x126, x107, UINT32_C(0xffffffff));
+  uint32_t x127;
+  uint32_t x128;
+  fiat_p256_mulx_u32(&x127, &x128, x107, UINT32_C(0xffffffff));
+  uint32_t x129;
+  uint32_t x130;
+  fiat_p256_mulx_u32(&x129, &x130, x107, UINT32_C(0xffffffff));
+  uint32_t x131;
+  uint32_t x132;
+  fiat_p256_mulx_u32(&x131, &x132, x107, UINT32_C(0xffffffff));
+  uint32_t x133;
+  fiat_p256_uint1 x134;
+  fiat_p256_addcarryx_u32(&x133, &x134, 0x0, x129, x132);
+  uint32_t x135;
+  fiat_p256_uint1 x136;
+  fiat_p256_addcarryx_u32(&x135, &x136, x134, x127, x130);
+  uint32_t x137;
+  fiat_p256_uint1 x138;
+  fiat_p256_addcarryx_u32(&x137, &x138, x136, 0x0, x128);
+  uint32_t x139;
+  fiat_p256_uint1 x140;
+  fiat_p256_addcarryx_u32(&x139, &x140, 0x0, x131, x107);
+  uint32_t x141;
+  fiat_p256_uint1 x142;
+  fiat_p256_addcarryx_u32(&x141, &x142, x140, x133, x109);
+  uint32_t x143;
+  fiat_p256_uint1 x144;
+  fiat_p256_addcarryx_u32(&x143, &x144, x142, x135, x111);
+  uint32_t x145;
+  fiat_p256_uint1 x146;
+  fiat_p256_addcarryx_u32(&x145, &x146, x144, x137, x113);
+  uint32_t x147;
+  fiat_p256_uint1 x148;
+  fiat_p256_addcarryx_u32(&x147, &x148, x146, 0x0, x115);
+  uint32_t x149;
+  fiat_p256_uint1 x150;
+  fiat_p256_addcarryx_u32(&x149, &x150, x148, 0x0, x117);
+  uint32_t x151;
+  fiat_p256_uint1 x152;
+  fiat_p256_addcarryx_u32(&x151, &x152, x150, x107, x119);
+  uint32_t x153;
+  fiat_p256_uint1 x154;
+  fiat_p256_addcarryx_u32(&x153, &x154, x152, x125, x121);
+  uint32_t x155;
+  fiat_p256_uint1 x156;
+  fiat_p256_addcarryx_u32(&x155, &x156, x154, x126, x123);
+  uint32_t x157;
+  fiat_p256_uint1 x158;
+  fiat_p256_addcarryx_u32(&x157, &x158, x156, 0x0, x124);
+  uint32_t x159;
+  uint32_t x160;
+  fiat_p256_mulx_u32(&x159, &x160, x2, (arg2[7]));
+  uint32_t x161;
+  uint32_t x162;
+  fiat_p256_mulx_u32(&x161, &x162, x2, (arg2[6]));
+  uint32_t x163;
+  uint32_t x164;
+  fiat_p256_mulx_u32(&x163, &x164, x2, (arg2[5]));
+  uint32_t x165;
+  uint32_t x166;
+  fiat_p256_mulx_u32(&x165, &x166, x2, (arg2[4]));
+  uint32_t x167;
+  uint32_t x168;
+  fiat_p256_mulx_u32(&x167, &x168, x2, (arg2[3]));
+  uint32_t x169;
+  uint32_t x170;
+  fiat_p256_mulx_u32(&x169, &x170, x2, (arg2[2]));
+  uint32_t x171;
+  uint32_t x172;
+  fiat_p256_mulx_u32(&x171, &x172, x2, (arg2[1]));
+  uint32_t x173;
+  uint32_t x174;
+  fiat_p256_mulx_u32(&x173, &x174, x2, (arg2[0]));
+  uint32_t x175;
+  fiat_p256_uint1 x176;
+  fiat_p256_addcarryx_u32(&x175, &x176, 0x0, x171, x174);
+  uint32_t x177;
+  fiat_p256_uint1 x178;
+  fiat_p256_addcarryx_u32(&x177, &x178, x176, x169, x172);
+  uint32_t x179;
+  fiat_p256_uint1 x180;
+  fiat_p256_addcarryx_u32(&x179, &x180, x178, x167, x170);
+  uint32_t x181;
+  fiat_p256_uint1 x182;
+  fiat_p256_addcarryx_u32(&x181, &x182, x180, x165, x168);
+  uint32_t x183;
+  fiat_p256_uint1 x184;
+  fiat_p256_addcarryx_u32(&x183, &x184, x182, x163, x166);
+  uint32_t x185;
+  fiat_p256_uint1 x186;
+  fiat_p256_addcarryx_u32(&x185, &x186, x184, x161, x164);
+  uint32_t x187;
+  fiat_p256_uint1 x188;
+  fiat_p256_addcarryx_u32(&x187, &x188, x186, x159, x162);
+  uint32_t x189;
+  fiat_p256_uint1 x190;
+  fiat_p256_addcarryx_u32(&x189, &x190, x188, 0x0, x160);
+  uint32_t x191;
+  fiat_p256_uint1 x192;
+  fiat_p256_addcarryx_u32(&x191, &x192, 0x0, x173, x141);
+  uint32_t x193;
+  fiat_p256_uint1 x194;
+  fiat_p256_addcarryx_u32(&x193, &x194, x192, x175, x143);
+  uint32_t x195;
+  fiat_p256_uint1 x196;
+  fiat_p256_addcarryx_u32(&x195, &x196, x194, x177, x145);
+  uint32_t x197;
+  fiat_p256_uint1 x198;
+  fiat_p256_addcarryx_u32(&x197, &x198, x196, x179, x147);
+  uint32_t x199;
+  fiat_p256_uint1 x200;
+  fiat_p256_addcarryx_u32(&x199, &x200, x198, x181, x149);
+  uint32_t x201;
+  fiat_p256_uint1 x202;
+  fiat_p256_addcarryx_u32(&x201, &x202, x200, x183, x151);
+  uint32_t x203;
+  fiat_p256_uint1 x204;
+  fiat_p256_addcarryx_u32(&x203, &x204, x202, x185, x153);
+  uint32_t x205;
+  fiat_p256_uint1 x206;
+  fiat_p256_addcarryx_u32(&x205, &x206, x204, x187, x155);
+  uint32_t x207;
+  fiat_p256_uint1 x208;
+  fiat_p256_addcarryx_u32(&x207, &x208, x206, x189, x157);
+  uint32_t x209;
+  uint32_t x210;
+  fiat_p256_mulx_u32(&x209, &x210, x191, UINT32_C(0xffffffff));
+  uint32_t x211;
+  uint32_t x212;
+  fiat_p256_mulx_u32(&x211, &x212, x191, UINT32_C(0xffffffff));
+  uint32_t x213;
+  uint32_t x214;
+  fiat_p256_mulx_u32(&x213, &x214, x191, UINT32_C(0xffffffff));
+  uint32_t x215;
+  uint32_t x216;
+  fiat_p256_mulx_u32(&x215, &x216, x191, UINT32_C(0xffffffff));
+  uint32_t x217;
+  fiat_p256_uint1 x218;
+  fiat_p256_addcarryx_u32(&x217, &x218, 0x0, x213, x216);
+  uint32_t x219;
+  fiat_p256_uint1 x220;
+  fiat_p256_addcarryx_u32(&x219, &x220, x218, x211, x214);
+  uint32_t x221;
+  fiat_p256_uint1 x222;
+  fiat_p256_addcarryx_u32(&x221, &x222, x220, 0x0, x212);
+  uint32_t x223;
+  fiat_p256_uint1 x224;
+  fiat_p256_addcarryx_u32(&x223, &x224, 0x0, x215, x191);
+  uint32_t x225;
+  fiat_p256_uint1 x226;
+  fiat_p256_addcarryx_u32(&x225, &x226, x224, x217, x193);
+  uint32_t x227;
+  fiat_p256_uint1 x228;
+  fiat_p256_addcarryx_u32(&x227, &x228, x226, x219, x195);
+  uint32_t x229;
+  fiat_p256_uint1 x230;
+  fiat_p256_addcarryx_u32(&x229, &x230, x228, x221, x197);
+  uint32_t x231;
+  fiat_p256_uint1 x232;
+  fiat_p256_addcarryx_u32(&x231, &x232, x230, 0x0, x199);
+  uint32_t x233;
+  fiat_p256_uint1 x234;
+  fiat_p256_addcarryx_u32(&x233, &x234, x232, 0x0, x201);
+  uint32_t x235;
+  fiat_p256_uint1 x236;
+  fiat_p256_addcarryx_u32(&x235, &x236, x234, x191, x203);
+  uint32_t x237;
+  fiat_p256_uint1 x238;
+  fiat_p256_addcarryx_u32(&x237, &x238, x236, x209, x205);
+  uint32_t x239;
+  fiat_p256_uint1 x240;
+  fiat_p256_addcarryx_u32(&x239, &x240, x238, x210, x207);
+  uint32_t x241;
+  fiat_p256_uint1 x242;
+  fiat_p256_addcarryx_u32(&x241, &x242, x240, 0x0, x208);
+  uint32_t x243;
+  uint32_t x244;
+  fiat_p256_mulx_u32(&x243, &x244, x3, (arg2[7]));
+  uint32_t x245;
+  uint32_t x246;
+  fiat_p256_mulx_u32(&x245, &x246, x3, (arg2[6]));
+  uint32_t x247;
+  uint32_t x248;
+  fiat_p256_mulx_u32(&x247, &x248, x3, (arg2[5]));
+  uint32_t x249;
+  uint32_t x250;
+  fiat_p256_mulx_u32(&x249, &x250, x3, (arg2[4]));
+  uint32_t x251;
+  uint32_t x252;
+  fiat_p256_mulx_u32(&x251, &x252, x3, (arg2[3]));
+  uint32_t x253;
+  uint32_t x254;
+  fiat_p256_mulx_u32(&x253, &x254, x3, (arg2[2]));
+  uint32_t x255;
+  uint32_t x256;
+  fiat_p256_mulx_u32(&x255, &x256, x3, (arg2[1]));
+  uint32_t x257;
+  uint32_t x258;
+  fiat_p256_mulx_u32(&x257, &x258, x3, (arg2[0]));
+  uint32_t x259;
+  fiat_p256_uint1 x260;
+  fiat_p256_addcarryx_u32(&x259, &x260, 0x0, x255, x258);
+  uint32_t x261;
+  fiat_p256_uint1 x262;
+  fiat_p256_addcarryx_u32(&x261, &x262, x260, x253, x256);
+  uint32_t x263;
+  fiat_p256_uint1 x264;
+  fiat_p256_addcarryx_u32(&x263, &x264, x262, x251, x254);
+  uint32_t x265;
+  fiat_p256_uint1 x266;
+  fiat_p256_addcarryx_u32(&x265, &x266, x264, x249, x252);
+  uint32_t x267;
+  fiat_p256_uint1 x268;
+  fiat_p256_addcarryx_u32(&x267, &x268, x266, x247, x250);
+  uint32_t x269;
+  fiat_p256_uint1 x270;
+  fiat_p256_addcarryx_u32(&x269, &x270, x268, x245, x248);
+  uint32_t x271;
+  fiat_p256_uint1 x272;
+  fiat_p256_addcarryx_u32(&x271, &x272, x270, x243, x246);
+  uint32_t x273;
+  fiat_p256_uint1 x274;
+  fiat_p256_addcarryx_u32(&x273, &x274, x272, 0x0, x244);
+  uint32_t x275;
+  fiat_p256_uint1 x276;
+  fiat_p256_addcarryx_u32(&x275, &x276, 0x0, x257, x225);
+  uint32_t x277;
+  fiat_p256_uint1 x278;
+  fiat_p256_addcarryx_u32(&x277, &x278, x276, x259, x227);
+  uint32_t x279;
+  fiat_p256_uint1 x280;
+  fiat_p256_addcarryx_u32(&x279, &x280, x278, x261, x229);
+  uint32_t x281;
+  fiat_p256_uint1 x282;
+  fiat_p256_addcarryx_u32(&x281, &x282, x280, x263, x231);
+  uint32_t x283;
+  fiat_p256_uint1 x284;
+  fiat_p256_addcarryx_u32(&x283, &x284, x282, x265, x233);
+  uint32_t x285;
+  fiat_p256_uint1 x286;
+  fiat_p256_addcarryx_u32(&x285, &x286, x284, x267, x235);
+  uint32_t x287;
+  fiat_p256_uint1 x288;
+  fiat_p256_addcarryx_u32(&x287, &x288, x286, x269, x237);
+  uint32_t x289;
+  fiat_p256_uint1 x290;
+  fiat_p256_addcarryx_u32(&x289, &x290, x288, x271, x239);
+  uint32_t x291;
+  fiat_p256_uint1 x292;
+  fiat_p256_addcarryx_u32(&x291, &x292, x290, x273, x241);
+  uint32_t x293;
+  uint32_t x294;
+  fiat_p256_mulx_u32(&x293, &x294, x275, UINT32_C(0xffffffff));
+  uint32_t x295;
+  uint32_t x296;
+  fiat_p256_mulx_u32(&x295, &x296, x275, UINT32_C(0xffffffff));
+  uint32_t x297;
+  uint32_t x298;
+  fiat_p256_mulx_u32(&x297, &x298, x275, UINT32_C(0xffffffff));
+  uint32_t x299;
+  uint32_t x300;
+  fiat_p256_mulx_u32(&x299, &x300, x275, UINT32_C(0xffffffff));
+  uint32_t x301;
+  fiat_p256_uint1 x302;
+  fiat_p256_addcarryx_u32(&x301, &x302, 0x0, x297, x300);
+  uint32_t x303;
+  fiat_p256_uint1 x304;
+  fiat_p256_addcarryx_u32(&x303, &x304, x302, x295, x298);
+  uint32_t x305;
+  fiat_p256_uint1 x306;
+  fiat_p256_addcarryx_u32(&x305, &x306, x304, 0x0, x296);
+  uint32_t x307;
+  fiat_p256_uint1 x308;
+  fiat_p256_addcarryx_u32(&x307, &x308, 0x0, x299, x275);
+  uint32_t x309;
+  fiat_p256_uint1 x310;
+  fiat_p256_addcarryx_u32(&x309, &x310, x308, x301, x277);
+  uint32_t x311;
+  fiat_p256_uint1 x312;
+  fiat_p256_addcarryx_u32(&x311, &x312, x310, x303, x279);
+  uint32_t x313;
+  fiat_p256_uint1 x314;
+  fiat_p256_addcarryx_u32(&x313, &x314, x312, x305, x281);
+  uint32_t x315;
+  fiat_p256_uint1 x316;
+  fiat_p256_addcarryx_u32(&x315, &x316, x314, 0x0, x283);
+  uint32_t x317;
+  fiat_p256_uint1 x318;
+  fiat_p256_addcarryx_u32(&x317, &x318, x316, 0x0, x285);
+  uint32_t x319;
+  fiat_p256_uint1 x320;
+  fiat_p256_addcarryx_u32(&x319, &x320, x318, x275, x287);
+  uint32_t x321;
+  fiat_p256_uint1 x322;
+  fiat_p256_addcarryx_u32(&x321, &x322, x320, x293, x289);
+  uint32_t x323;
+  fiat_p256_uint1 x324;
+  fiat_p256_addcarryx_u32(&x323, &x324, x322, x294, x291);
+  uint32_t x325;
+  fiat_p256_uint1 x326;
+  fiat_p256_addcarryx_u32(&x325, &x326, x324, 0x0, x292);
+  uint32_t x327;
+  uint32_t x328;
+  fiat_p256_mulx_u32(&x327, &x328, x4, (arg2[7]));
+  uint32_t x329;
+  uint32_t x330;
+  fiat_p256_mulx_u32(&x329, &x330, x4, (arg2[6]));
+  uint32_t x331;
+  uint32_t x332;
+  fiat_p256_mulx_u32(&x331, &x332, x4, (arg2[5]));
+  uint32_t x333;
+  uint32_t x334;
+  fiat_p256_mulx_u32(&x333, &x334, x4, (arg2[4]));
+  uint32_t x335;
+  uint32_t x336;
+  fiat_p256_mulx_u32(&x335, &x336, x4, (arg2[3]));
+  uint32_t x337;
+  uint32_t x338;
+  fiat_p256_mulx_u32(&x337, &x338, x4, (arg2[2]));
+  uint32_t x339;
+  uint32_t x340;
+  fiat_p256_mulx_u32(&x339, &x340, x4, (arg2[1]));
+  uint32_t x341;
+  uint32_t x342;
+  fiat_p256_mulx_u32(&x341, &x342, x4, (arg2[0]));
+  uint32_t x343;
+  fiat_p256_uint1 x344;
+  fiat_p256_addcarryx_u32(&x343, &x344, 0x0, x339, x342);
+  uint32_t x345;
+  fiat_p256_uint1 x346;
+  fiat_p256_addcarryx_u32(&x345, &x346, x344, x337, x340);
+  uint32_t x347;
+  fiat_p256_uint1 x348;
+  fiat_p256_addcarryx_u32(&x347, &x348, x346, x335, x338);
+  uint32_t x349;
+  fiat_p256_uint1 x350;
+  fiat_p256_addcarryx_u32(&x349, &x350, x348, x333, x336);
+  uint32_t x351;
+  fiat_p256_uint1 x352;
+  fiat_p256_addcarryx_u32(&x351, &x352, x350, x331, x334);
+  uint32_t x353;
+  fiat_p256_uint1 x354;
+  fiat_p256_addcarryx_u32(&x353, &x354, x352, x329, x332);
+  uint32_t x355;
+  fiat_p256_uint1 x356;
+  fiat_p256_addcarryx_u32(&x355, &x356, x354, x327, x330);
+  uint32_t x357;
+  fiat_p256_uint1 x358;
+  fiat_p256_addcarryx_u32(&x357, &x358, x356, 0x0, x328);
+  uint32_t x359;
+  fiat_p256_uint1 x360;
+  fiat_p256_addcarryx_u32(&x359, &x360, 0x0, x341, x309);
+  uint32_t x361;
+  fiat_p256_uint1 x362;
+  fiat_p256_addcarryx_u32(&x361, &x362, x360, x343, x311);
+  uint32_t x363;
+  fiat_p256_uint1 x364;
+  fiat_p256_addcarryx_u32(&x363, &x364, x362, x345, x313);
+  uint32_t x365;
+  fiat_p256_uint1 x366;
+  fiat_p256_addcarryx_u32(&x365, &x366, x364, x347, x315);
+  uint32_t x367;
+  fiat_p256_uint1 x368;
+  fiat_p256_addcarryx_u32(&x367, &x368, x366, x349, x317);
+  uint32_t x369;
+  fiat_p256_uint1 x370;
+  fiat_p256_addcarryx_u32(&x369, &x370, x368, x351, x319);
+  uint32_t x371;
+  fiat_p256_uint1 x372;
+  fiat_p256_addcarryx_u32(&x371, &x372, x370, x353, x321);
+  uint32_t x373;
+  fiat_p256_uint1 x374;
+  fiat_p256_addcarryx_u32(&x373, &x374, x372, x355, x323);
+  uint32_t x375;
+  fiat_p256_uint1 x376;
+  fiat_p256_addcarryx_u32(&x375, &x376, x374, x357, x325);
+  uint32_t x377;
+  uint32_t x378;
+  fiat_p256_mulx_u32(&x377, &x378, x359, UINT32_C(0xffffffff));
+  uint32_t x379;
+  uint32_t x380;
+  fiat_p256_mulx_u32(&x379, &x380, x359, UINT32_C(0xffffffff));
+  uint32_t x381;
+  uint32_t x382;
+  fiat_p256_mulx_u32(&x381, &x382, x359, UINT32_C(0xffffffff));
+  uint32_t x383;
+  uint32_t x384;
+  fiat_p256_mulx_u32(&x383, &x384, x359, UINT32_C(0xffffffff));
+  uint32_t x385;
+  fiat_p256_uint1 x386;
+  fiat_p256_addcarryx_u32(&x385, &x386, 0x0, x381, x384);
+  uint32_t x387;
+  fiat_p256_uint1 x388;
+  fiat_p256_addcarryx_u32(&x387, &x388, x386, x379, x382);
+  uint32_t x389;
+  fiat_p256_uint1 x390;
+  fiat_p256_addcarryx_u32(&x389, &x390, x388, 0x0, x380);
+  uint32_t x391;
+  fiat_p256_uint1 x392;
+  fiat_p256_addcarryx_u32(&x391, &x392, 0x0, x383, x359);
+  uint32_t x393;
+  fiat_p256_uint1 x394;
+  fiat_p256_addcarryx_u32(&x393, &x394, x392, x385, x361);
+  uint32_t x395;
+  fiat_p256_uint1 x396;
+  fiat_p256_addcarryx_u32(&x395, &x396, x394, x387, x363);
+  uint32_t x397;
+  fiat_p256_uint1 x398;
+  fiat_p256_addcarryx_u32(&x397, &x398, x396, x389, x365);
+  uint32_t x399;
+  fiat_p256_uint1 x400;
+  fiat_p256_addcarryx_u32(&x399, &x400, x398, 0x0, x367);
+  uint32_t x401;
+  fiat_p256_uint1 x402;
+  fiat_p256_addcarryx_u32(&x401, &x402, x400, 0x0, x369);
+  uint32_t x403;
+  fiat_p256_uint1 x404;
+  fiat_p256_addcarryx_u32(&x403, &x404, x402, x359, x371);
+  uint32_t x405;
+  fiat_p256_uint1 x406;
+  fiat_p256_addcarryx_u32(&x405, &x406, x404, x377, x373);
+  uint32_t x407;
+  fiat_p256_uint1 x408;
+  fiat_p256_addcarryx_u32(&x407, &x408, x406, x378, x375);
+  uint32_t x409;
+  fiat_p256_uint1 x410;
+  fiat_p256_addcarryx_u32(&x409, &x410, x408, 0x0, x376);
+  uint32_t x411;
+  uint32_t x412;
+  fiat_p256_mulx_u32(&x411, &x412, x5, (arg2[7]));
+  uint32_t x413;
+  uint32_t x414;
+  fiat_p256_mulx_u32(&x413, &x414, x5, (arg2[6]));
+  uint32_t x415;
+  uint32_t x416;
+  fiat_p256_mulx_u32(&x415, &x416, x5, (arg2[5]));
+  uint32_t x417;
+  uint32_t x418;
+  fiat_p256_mulx_u32(&x417, &x418, x5, (arg2[4]));
+  uint32_t x419;
+  uint32_t x420;
+  fiat_p256_mulx_u32(&x419, &x420, x5, (arg2[3]));
+  uint32_t x421;
+  uint32_t x422;
+  fiat_p256_mulx_u32(&x421, &x422, x5, (arg2[2]));
+  uint32_t x423;
+  uint32_t x424;
+  fiat_p256_mulx_u32(&x423, &x424, x5, (arg2[1]));
+  uint32_t x425;
+  uint32_t x426;
+  fiat_p256_mulx_u32(&x425, &x426, x5, (arg2[0]));
+  uint32_t x427;
+  fiat_p256_uint1 x428;
+  fiat_p256_addcarryx_u32(&x427, &x428, 0x0, x423, x426);
+  uint32_t x429;
+  fiat_p256_uint1 x430;
+  fiat_p256_addcarryx_u32(&x429, &x430, x428, x421, x424);
+  uint32_t x431;
+  fiat_p256_uint1 x432;
+  fiat_p256_addcarryx_u32(&x431, &x432, x430, x419, x422);
+  uint32_t x433;
+  fiat_p256_uint1 x434;
+  fiat_p256_addcarryx_u32(&x433, &x434, x432, x417, x420);
+  uint32_t x435;
+  fiat_p256_uint1 x436;
+  fiat_p256_addcarryx_u32(&x435, &x436, x434, x415, x418);
+  uint32_t x437;
+  fiat_p256_uint1 x438;
+  fiat_p256_addcarryx_u32(&x437, &x438, x436, x413, x416);
+  uint32_t x439;
+  fiat_p256_uint1 x440;
+  fiat_p256_addcarryx_u32(&x439, &x440, x438, x411, x414);
+  uint32_t x441;
+  fiat_p256_uint1 x442;
+  fiat_p256_addcarryx_u32(&x441, &x442, x440, 0x0, x412);
+  uint32_t x443;
+  fiat_p256_uint1 x444;
+  fiat_p256_addcarryx_u32(&x443, &x444, 0x0, x425, x393);
+  uint32_t x445;
+  fiat_p256_uint1 x446;
+  fiat_p256_addcarryx_u32(&x445, &x446, x444, x427, x395);
+  uint32_t x447;
+  fiat_p256_uint1 x448;
+  fiat_p256_addcarryx_u32(&x447, &x448, x446, x429, x397);
+  uint32_t x449;
+  fiat_p256_uint1 x450;
+  fiat_p256_addcarryx_u32(&x449, &x450, x448, x431, x399);
+  uint32_t x451;
+  fiat_p256_uint1 x452;
+  fiat_p256_addcarryx_u32(&x451, &x452, x450, x433, x401);
+  uint32_t x453;
+  fiat_p256_uint1 x454;
+  fiat_p256_addcarryx_u32(&x453, &x454, x452, x435, x403);
+  uint32_t x455;
+  fiat_p256_uint1 x456;
+  fiat_p256_addcarryx_u32(&x455, &x456, x454, x437, x405);
+  uint32_t x457;
+  fiat_p256_uint1 x458;
+  fiat_p256_addcarryx_u32(&x457, &x458, x456, x439, x407);
+  uint32_t x459;
+  fiat_p256_uint1 x460;
+  fiat_p256_addcarryx_u32(&x459, &x460, x458, x441, x409);
+  uint32_t x461;
+  uint32_t x462;
+  fiat_p256_mulx_u32(&x461, &x462, x443, UINT32_C(0xffffffff));
+  uint32_t x463;
+  uint32_t x464;
+  fiat_p256_mulx_u32(&x463, &x464, x443, UINT32_C(0xffffffff));
+  uint32_t x465;
+  uint32_t x466;
+  fiat_p256_mulx_u32(&x465, &x466, x443, UINT32_C(0xffffffff));
+  uint32_t x467;
+  uint32_t x468;
+  fiat_p256_mulx_u32(&x467, &x468, x443, UINT32_C(0xffffffff));
+  uint32_t x469;
+  fiat_p256_uint1 x470;
+  fiat_p256_addcarryx_u32(&x469, &x470, 0x0, x465, x468);
+  uint32_t x471;
+  fiat_p256_uint1 x472;
+  fiat_p256_addcarryx_u32(&x471, &x472, x470, x463, x466);
+  uint32_t x473;
+  fiat_p256_uint1 x474;
+  fiat_p256_addcarryx_u32(&x473, &x474, x472, 0x0, x464);
+  uint32_t x475;
+  fiat_p256_uint1 x476;
+  fiat_p256_addcarryx_u32(&x475, &x476, 0x0, x467, x443);
+  uint32_t x477;
+  fiat_p256_uint1 x478;
+  fiat_p256_addcarryx_u32(&x477, &x478, x476, x469, x445);
+  uint32_t x479;
+  fiat_p256_uint1 x480;
+  fiat_p256_addcarryx_u32(&x479, &x480, x478, x471, x447);
+  uint32_t x481;
+  fiat_p256_uint1 x482;
+  fiat_p256_addcarryx_u32(&x481, &x482, x480, x473, x449);
+  uint32_t x483;
+  fiat_p256_uint1 x484;
+  fiat_p256_addcarryx_u32(&x483, &x484, x482, 0x0, x451);
+  uint32_t x485;
+  fiat_p256_uint1 x486;
+  fiat_p256_addcarryx_u32(&x485, &x486, x484, 0x0, x453);
+  uint32_t x487;
+  fiat_p256_uint1 x488;
+  fiat_p256_addcarryx_u32(&x487, &x488, x486, x443, x455);
+  uint32_t x489;
+  fiat_p256_uint1 x490;
+  fiat_p256_addcarryx_u32(&x489, &x490, x488, x461, x457);
+  uint32_t x491;
+  fiat_p256_uint1 x492;
+  fiat_p256_addcarryx_u32(&x491, &x492, x490, x462, x459);
+  uint32_t x493;
+  fiat_p256_uint1 x494;
+  fiat_p256_addcarryx_u32(&x493, &x494, x492, 0x0, x460);
+  uint32_t x495;
+  uint32_t x496;
+  fiat_p256_mulx_u32(&x495, &x496, x6, (arg2[7]));
+  uint32_t x497;
+  uint32_t x498;
+  fiat_p256_mulx_u32(&x497, &x498, x6, (arg2[6]));
+  uint32_t x499;
+  uint32_t x500;
+  fiat_p256_mulx_u32(&x499, &x500, x6, (arg2[5]));
+  uint32_t x501;
+  uint32_t x502;
+  fiat_p256_mulx_u32(&x501, &x502, x6, (arg2[4]));
+  uint32_t x503;
+  uint32_t x504;
+  fiat_p256_mulx_u32(&x503, &x504, x6, (arg2[3]));
+  uint32_t x505;
+  uint32_t x506;
+  fiat_p256_mulx_u32(&x505, &x506, x6, (arg2[2]));
+  uint32_t x507;
+  uint32_t x508;
+  fiat_p256_mulx_u32(&x507, &x508, x6, (arg2[1]));
+  uint32_t x509;
+  uint32_t x510;
+  fiat_p256_mulx_u32(&x509, &x510, x6, (arg2[0]));
+  uint32_t x511;
+  fiat_p256_uint1 x512;
+  fiat_p256_addcarryx_u32(&x511, &x512, 0x0, x507, x510);
+  uint32_t x513;
+  fiat_p256_uint1 x514;
+  fiat_p256_addcarryx_u32(&x513, &x514, x512, x505, x508);
+  uint32_t x515;
+  fiat_p256_uint1 x516;
+  fiat_p256_addcarryx_u32(&x515, &x516, x514, x503, x506);
+  uint32_t x517;
+  fiat_p256_uint1 x518;
+  fiat_p256_addcarryx_u32(&x517, &x518, x516, x501, x504);
+  uint32_t x519;
+  fiat_p256_uint1 x520;
+  fiat_p256_addcarryx_u32(&x519, &x520, x518, x499, x502);
+  uint32_t x521;
+  fiat_p256_uint1 x522;
+  fiat_p256_addcarryx_u32(&x521, &x522, x520, x497, x500);
+  uint32_t x523;
+  fiat_p256_uint1 x524;
+  fiat_p256_addcarryx_u32(&x523, &x524, x522, x495, x498);
+  uint32_t x525;
+  fiat_p256_uint1 x526;
+  fiat_p256_addcarryx_u32(&x525, &x526, x524, 0x0, x496);
+  uint32_t x527;
+  fiat_p256_uint1 x528;
+  fiat_p256_addcarryx_u32(&x527, &x528, 0x0, x509, x477);
+  uint32_t x529;
+  fiat_p256_uint1 x530;
+  fiat_p256_addcarryx_u32(&x529, &x530, x528, x511, x479);
+  uint32_t x531;
+  fiat_p256_uint1 x532;
+  fiat_p256_addcarryx_u32(&x531, &x532, x530, x513, x481);
+  uint32_t x533;
+  fiat_p256_uint1 x534;
+  fiat_p256_addcarryx_u32(&x533, &x534, x532, x515, x483);
+  uint32_t x535;
+  fiat_p256_uint1 x536;
+  fiat_p256_addcarryx_u32(&x535, &x536, x534, x517, x485);
+  uint32_t x537;
+  fiat_p256_uint1 x538;
+  fiat_p256_addcarryx_u32(&x537, &x538, x536, x519, x487);
+  uint32_t x539;
+  fiat_p256_uint1 x540;
+  fiat_p256_addcarryx_u32(&x539, &x540, x538, x521, x489);
+  uint32_t x541;
+  fiat_p256_uint1 x542;
+  fiat_p256_addcarryx_u32(&x541, &x542, x540, x523, x491);
+  uint32_t x543;
+  fiat_p256_uint1 x544;
+  fiat_p256_addcarryx_u32(&x543, &x544, x542, x525, x493);
+  uint32_t x545;
+  uint32_t x546;
+  fiat_p256_mulx_u32(&x545, &x546, x527, UINT32_C(0xffffffff));
+  uint32_t x547;
+  uint32_t x548;
+  fiat_p256_mulx_u32(&x547, &x548, x527, UINT32_C(0xffffffff));
+  uint32_t x549;
+  uint32_t x550;
+  fiat_p256_mulx_u32(&x549, &x550, x527, UINT32_C(0xffffffff));
+  uint32_t x551;
+  uint32_t x552;
+  fiat_p256_mulx_u32(&x551, &x552, x527, UINT32_C(0xffffffff));
+  uint32_t x553;
+  fiat_p256_uint1 x554;
+  fiat_p256_addcarryx_u32(&x553, &x554, 0x0, x549, x552);
+  uint32_t x555;
+  fiat_p256_uint1 x556;
+  fiat_p256_addcarryx_u32(&x555, &x556, x554, x547, x550);
+  uint32_t x557;
+  fiat_p256_uint1 x558;
+  fiat_p256_addcarryx_u32(&x557, &x558, x556, 0x0, x548);
+  uint32_t x559;
+  fiat_p256_uint1 x560;
+  fiat_p256_addcarryx_u32(&x559, &x560, 0x0, x551, x527);
+  uint32_t x561;
+  fiat_p256_uint1 x562;
+  fiat_p256_addcarryx_u32(&x561, &x562, x560, x553, x529);
+  uint32_t x563;
+  fiat_p256_uint1 x564;
+  fiat_p256_addcarryx_u32(&x563, &x564, x562, x555, x531);
+  uint32_t x565;
+  fiat_p256_uint1 x566;
+  fiat_p256_addcarryx_u32(&x565, &x566, x564, x557, x533);
+  uint32_t x567;
+  fiat_p256_uint1 x568;
+  fiat_p256_addcarryx_u32(&x567, &x568, x566, 0x0, x535);
+  uint32_t x569;
+  fiat_p256_uint1 x570;
+  fiat_p256_addcarryx_u32(&x569, &x570, x568, 0x0, x537);
+  uint32_t x571;
+  fiat_p256_uint1 x572;
+  fiat_p256_addcarryx_u32(&x571, &x572, x570, x527, x539);
+  uint32_t x573;
+  fiat_p256_uint1 x574;
+  fiat_p256_addcarryx_u32(&x573, &x574, x572, x545, x541);
+  uint32_t x575;
+  fiat_p256_uint1 x576;
+  fiat_p256_addcarryx_u32(&x575, &x576, x574, x546, x543);
+  uint32_t x577;
+  fiat_p256_uint1 x578;
+  fiat_p256_addcarryx_u32(&x577, &x578, x576, 0x0, x544);
+  uint32_t x579;
+  uint32_t x580;
+  fiat_p256_mulx_u32(&x579, &x580, x7, (arg2[7]));
+  uint32_t x581;
+  uint32_t x582;
+  fiat_p256_mulx_u32(&x581, &x582, x7, (arg2[6]));
+  uint32_t x583;
+  uint32_t x584;
+  fiat_p256_mulx_u32(&x583, &x584, x7, (arg2[5]));
+  uint32_t x585;
+  uint32_t x586;
+  fiat_p256_mulx_u32(&x585, &x586, x7, (arg2[4]));
+  uint32_t x587;
+  uint32_t x588;
+  fiat_p256_mulx_u32(&x587, &x588, x7, (arg2[3]));
+  uint32_t x589;
+  uint32_t x590;
+  fiat_p256_mulx_u32(&x589, &x590, x7, (arg2[2]));
+  uint32_t x591;
+  uint32_t x592;
+  fiat_p256_mulx_u32(&x591, &x592, x7, (arg2[1]));
+  uint32_t x593;
+  uint32_t x594;
+  fiat_p256_mulx_u32(&x593, &x594, x7, (arg2[0]));
+  uint32_t x595;
+  fiat_p256_uint1 x596;
+  fiat_p256_addcarryx_u32(&x595, &x596, 0x0, x591, x594);
+  uint32_t x597;
+  fiat_p256_uint1 x598;
+  fiat_p256_addcarryx_u32(&x597, &x598, x596, x589, x592);
+  uint32_t x599;
+  fiat_p256_uint1 x600;
+  fiat_p256_addcarryx_u32(&x599, &x600, x598, x587, x590);
+  uint32_t x601;
+  fiat_p256_uint1 x602;
+  fiat_p256_addcarryx_u32(&x601, &x602, x600, x585, x588);
+  uint32_t x603;
+  fiat_p256_uint1 x604;
+  fiat_p256_addcarryx_u32(&x603, &x604, x602, x583, x586);
+  uint32_t x605;
+  fiat_p256_uint1 x606;
+  fiat_p256_addcarryx_u32(&x605, &x606, x604, x581, x584);
+  uint32_t x607;
+  fiat_p256_uint1 x608;
+  fiat_p256_addcarryx_u32(&x607, &x608, x606, x579, x582);
+  uint32_t x609;
+  fiat_p256_uint1 x610;
+  fiat_p256_addcarryx_u32(&x609, &x610, x608, 0x0, x580);
+  uint32_t x611;
+  fiat_p256_uint1 x612;
+  fiat_p256_addcarryx_u32(&x611, &x612, 0x0, x593, x561);
+  uint32_t x613;
+  fiat_p256_uint1 x614;
+  fiat_p256_addcarryx_u32(&x613, &x614, x612, x595, x563);
+  uint32_t x615;
+  fiat_p256_uint1 x616;
+  fiat_p256_addcarryx_u32(&x615, &x616, x614, x597, x565);
+  uint32_t x617;
+  fiat_p256_uint1 x618;
+  fiat_p256_addcarryx_u32(&x617, &x618, x616, x599, x567);
+  uint32_t x619;
+  fiat_p256_uint1 x620;
+  fiat_p256_addcarryx_u32(&x619, &x620, x618, x601, x569);
+  uint32_t x621;
+  fiat_p256_uint1 x622;
+  fiat_p256_addcarryx_u32(&x621, &x622, x620, x603, x571);
+  uint32_t x623;
+  fiat_p256_uint1 x624;
+  fiat_p256_addcarryx_u32(&x623, &x624, x622, x605, x573);
+  uint32_t x625;
+  fiat_p256_uint1 x626;
+  fiat_p256_addcarryx_u32(&x625, &x626, x624, x607, x575);
+  uint32_t x627;
+  fiat_p256_uint1 x628;
+  fiat_p256_addcarryx_u32(&x627, &x628, x626, x609, x577);
+  uint32_t x629;
+  uint32_t x630;
+  fiat_p256_mulx_u32(&x629, &x630, x611, UINT32_C(0xffffffff));
+  uint32_t x631;
+  uint32_t x632;
+  fiat_p256_mulx_u32(&x631, &x632, x611, UINT32_C(0xffffffff));
+  uint32_t x633;
+  uint32_t x634;
+  fiat_p256_mulx_u32(&x633, &x634, x611, UINT32_C(0xffffffff));
+  uint32_t x635;
+  uint32_t x636;
+  fiat_p256_mulx_u32(&x635, &x636, x611, UINT32_C(0xffffffff));
+  uint32_t x637;
+  fiat_p256_uint1 x638;
+  fiat_p256_addcarryx_u32(&x637, &x638, 0x0, x633, x636);
+  uint32_t x639;
+  fiat_p256_uint1 x640;
+  fiat_p256_addcarryx_u32(&x639, &x640, x638, x631, x634);
+  uint32_t x641;
+  fiat_p256_uint1 x642;
+  fiat_p256_addcarryx_u32(&x641, &x642, x640, 0x0, x632);
+  uint32_t x643;
+  fiat_p256_uint1 x644;
+  fiat_p256_addcarryx_u32(&x643, &x644, 0x0, x635, x611);
+  uint32_t x645;
+  fiat_p256_uint1 x646;
+  fiat_p256_addcarryx_u32(&x645, &x646, x644, x637, x613);
+  uint32_t x647;
+  fiat_p256_uint1 x648;
+  fiat_p256_addcarryx_u32(&x647, &x648, x646, x639, x615);
+  uint32_t x649;
+  fiat_p256_uint1 x650;
+  fiat_p256_addcarryx_u32(&x649, &x650, x648, x641, x617);
+  uint32_t x651;
+  fiat_p256_uint1 x652;
+  fiat_p256_addcarryx_u32(&x651, &x652, x650, 0x0, x619);
+  uint32_t x653;
+  fiat_p256_uint1 x654;
+  fiat_p256_addcarryx_u32(&x653, &x654, x652, 0x0, x621);
+  uint32_t x655;
+  fiat_p256_uint1 x656;
+  fiat_p256_addcarryx_u32(&x655, &x656, x654, x611, x623);
+  uint32_t x657;
+  fiat_p256_uint1 x658;
+  fiat_p256_addcarryx_u32(&x657, &x658, x656, x629, x625);
+  uint32_t x659;
+  fiat_p256_uint1 x660;
+  fiat_p256_addcarryx_u32(&x659, &x660, x658, x630, x627);
+  uint32_t x661;
+  fiat_p256_uint1 x662;
+  fiat_p256_addcarryx_u32(&x661, &x662, x660, 0x0, x628);
+  uint32_t x663;
+  fiat_p256_uint1 x664;
+  fiat_p256_subborrowx_u32(&x663, &x664, 0x0, x645, UINT32_C(0xffffffff));
+  uint32_t x665;
+  fiat_p256_uint1 x666;
+  fiat_p256_subborrowx_u32(&x665, &x666, x664, x647, UINT32_C(0xffffffff));
+  uint32_t x667;
+  fiat_p256_uint1 x668;
+  fiat_p256_subborrowx_u32(&x667, &x668, x666, x649, UINT32_C(0xffffffff));
+  uint32_t x669;
+  fiat_p256_uint1 x670;
+  fiat_p256_subborrowx_u32(&x669, &x670, x668, x651, 0x0);
+  uint32_t x671;
+  fiat_p256_uint1 x672;
+  fiat_p256_subborrowx_u32(&x671, &x672, x670, x653, 0x0);
+  uint32_t x673;
+  fiat_p256_uint1 x674;
+  fiat_p256_subborrowx_u32(&x673, &x674, x672, x655, 0x0);
+  uint32_t x675;
+  fiat_p256_uint1 x676;
+  fiat_p256_subborrowx_u32(&x675, &x676, x674, x657, 0x1);
+  uint32_t x677;
+  fiat_p256_uint1 x678;
+  fiat_p256_subborrowx_u32(&x677, &x678, x676, x659, UINT32_C(0xffffffff));
+  uint32_t x679;
+  fiat_p256_uint1 x680;
+  fiat_p256_subborrowx_u32(&x679, &x680, x678, x661, 0x0);
+  uint32_t x681;
+  fiat_p256_cmovznz_u32(&x681, x680, x663, x645);
+  uint32_t x682;
+  fiat_p256_cmovznz_u32(&x682, x680, x665, x647);
+  uint32_t x683;
+  fiat_p256_cmovznz_u32(&x683, x680, x667, x649);
+  uint32_t x684;
+  fiat_p256_cmovznz_u32(&x684, x680, x669, x651);
+  uint32_t x685;
+  fiat_p256_cmovznz_u32(&x685, x680, x671, x653);
+  uint32_t x686;
+  fiat_p256_cmovznz_u32(&x686, x680, x673, x655);
+  uint32_t x687;
+  fiat_p256_cmovznz_u32(&x687, x680, x675, x657);
+  uint32_t x688;
+  fiat_p256_cmovznz_u32(&x688, x680, x677, x659);
+  out1[0] = x681;
+  out1[1] = x682;
+  out1[2] = x683;
+  out1[3] = x684;
+  out1[4] = x685;
+  out1[5] = x686;
+  out1[6] = x687;
+  out1[7] = x688;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_square(uint32_t out1[8], const uint32_t arg1[8]) {
+  uint32_t x1 = (arg1[1]);
+  uint32_t x2 = (arg1[2]);
+  uint32_t x3 = (arg1[3]);
+  uint32_t x4 = (arg1[4]);
+  uint32_t x5 = (arg1[5]);
+  uint32_t x6 = (arg1[6]);
+  uint32_t x7 = (arg1[7]);
+  uint32_t x8 = (arg1[0]);
+  uint32_t x9;
+  uint32_t x10;
+  fiat_p256_mulx_u32(&x9, &x10, x8, (arg1[7]));
+  uint32_t x11;
+  uint32_t x12;
+  fiat_p256_mulx_u32(&x11, &x12, x8, (arg1[6]));
+  uint32_t x13;
+  uint32_t x14;
+  fiat_p256_mulx_u32(&x13, &x14, x8, (arg1[5]));
+  uint32_t x15;
+  uint32_t x16;
+  fiat_p256_mulx_u32(&x15, &x16, x8, (arg1[4]));
+  uint32_t x17;
+  uint32_t x18;
+  fiat_p256_mulx_u32(&x17, &x18, x8, (arg1[3]));
+  uint32_t x19;
+  uint32_t x20;
+  fiat_p256_mulx_u32(&x19, &x20, x8, (arg1[2]));
+  uint32_t x21;
+  uint32_t x22;
+  fiat_p256_mulx_u32(&x21, &x22, x8, (arg1[1]));
+  uint32_t x23;
+  uint32_t x24;
+  fiat_p256_mulx_u32(&x23, &x24, x8, (arg1[0]));
+  uint32_t x25;
+  fiat_p256_uint1 x26;
+  fiat_p256_addcarryx_u32(&x25, &x26, 0x0, x21, x24);
+  uint32_t x27;
+  fiat_p256_uint1 x28;
+  fiat_p256_addcarryx_u32(&x27, &x28, x26, x19, x22);
+  uint32_t x29;
+  fiat_p256_uint1 x30;
+  fiat_p256_addcarryx_u32(&x29, &x30, x28, x17, x20);
+  uint32_t x31;
+  fiat_p256_uint1 x32;
+  fiat_p256_addcarryx_u32(&x31, &x32, x30, x15, x18);
+  uint32_t x33;
+  fiat_p256_uint1 x34;
+  fiat_p256_addcarryx_u32(&x33, &x34, x32, x13, x16);
+  uint32_t x35;
+  fiat_p256_uint1 x36;
+  fiat_p256_addcarryx_u32(&x35, &x36, x34, x11, x14);
+  uint32_t x37;
+  fiat_p256_uint1 x38;
+  fiat_p256_addcarryx_u32(&x37, &x38, x36, x9, x12);
+  uint32_t x39;
+  fiat_p256_uint1 x40;
+  fiat_p256_addcarryx_u32(&x39, &x40, x38, 0x0, x10);
+  uint32_t x41;
+  uint32_t x42;
+  fiat_p256_mulx_u32(&x41, &x42, x23, UINT32_C(0xffffffff));
+  uint32_t x43;
+  uint32_t x44;
+  fiat_p256_mulx_u32(&x43, &x44, x23, UINT32_C(0xffffffff));
+  uint32_t x45;
+  uint32_t x46;
+  fiat_p256_mulx_u32(&x45, &x46, x23, UINT32_C(0xffffffff));
+  uint32_t x47;
+  uint32_t x48;
+  fiat_p256_mulx_u32(&x47, &x48, x23, UINT32_C(0xffffffff));
+  uint32_t x49;
+  fiat_p256_uint1 x50;
+  fiat_p256_addcarryx_u32(&x49, &x50, 0x0, x45, x48);
+  uint32_t x51;
+  fiat_p256_uint1 x52;
+  fiat_p256_addcarryx_u32(&x51, &x52, x50, x43, x46);
+  uint32_t x53;
+  fiat_p256_uint1 x54;
+  fiat_p256_addcarryx_u32(&x53, &x54, x52, 0x0, x44);
+  uint32_t x55;
+  fiat_p256_uint1 x56;
+  fiat_p256_addcarryx_u32(&x55, &x56, 0x0, x47, x23);
+  uint32_t x57;
+  fiat_p256_uint1 x58;
+  fiat_p256_addcarryx_u32(&x57, &x58, x56, x49, x25);
+  uint32_t x59;
+  fiat_p256_uint1 x60;
+  fiat_p256_addcarryx_u32(&x59, &x60, x58, x51, x27);
+  uint32_t x61;
+  fiat_p256_uint1 x62;
+  fiat_p256_addcarryx_u32(&x61, &x62, x60, x53, x29);
+  uint32_t x63;
+  fiat_p256_uint1 x64;
+  fiat_p256_addcarryx_u32(&x63, &x64, x62, 0x0, x31);
+  uint32_t x65;
+  fiat_p256_uint1 x66;
+  fiat_p256_addcarryx_u32(&x65, &x66, x64, 0x0, x33);
+  uint32_t x67;
+  fiat_p256_uint1 x68;
+  fiat_p256_addcarryx_u32(&x67, &x68, x66, x23, x35);
+  uint32_t x69;
+  fiat_p256_uint1 x70;
+  fiat_p256_addcarryx_u32(&x69, &x70, x68, x41, x37);
+  uint32_t x71;
+  fiat_p256_uint1 x72;
+  fiat_p256_addcarryx_u32(&x71, &x72, x70, x42, x39);
+  uint32_t x73;
+  fiat_p256_uint1 x74;
+  fiat_p256_addcarryx_u32(&x73, &x74, x72, 0x0, 0x0);
+  uint32_t x75;
+  uint32_t x76;
+  fiat_p256_mulx_u32(&x75, &x76, x1, (arg1[7]));
+  uint32_t x77;
+  uint32_t x78;
+  fiat_p256_mulx_u32(&x77, &x78, x1, (arg1[6]));
+  uint32_t x79;
+  uint32_t x80;
+  fiat_p256_mulx_u32(&x79, &x80, x1, (arg1[5]));
+  uint32_t x81;
+  uint32_t x82;
+  fiat_p256_mulx_u32(&x81, &x82, x1, (arg1[4]));
+  uint32_t x83;
+  uint32_t x84;
+  fiat_p256_mulx_u32(&x83, &x84, x1, (arg1[3]));
+  uint32_t x85;
+  uint32_t x86;
+  fiat_p256_mulx_u32(&x85, &x86, x1, (arg1[2]));
+  uint32_t x87;
+  uint32_t x88;
+  fiat_p256_mulx_u32(&x87, &x88, x1, (arg1[1]));
+  uint32_t x89;
+  uint32_t x90;
+  fiat_p256_mulx_u32(&x89, &x90, x1, (arg1[0]));
+  uint32_t x91;
+  fiat_p256_uint1 x92;
+  fiat_p256_addcarryx_u32(&x91, &x92, 0x0, x87, x90);
+  uint32_t x93;
+  fiat_p256_uint1 x94;
+  fiat_p256_addcarryx_u32(&x93, &x94, x92, x85, x88);
+  uint32_t x95;
+  fiat_p256_uint1 x96;
+  fiat_p256_addcarryx_u32(&x95, &x96, x94, x83, x86);
+  uint32_t x97;
+  fiat_p256_uint1 x98;
+  fiat_p256_addcarryx_u32(&x97, &x98, x96, x81, x84);
+  uint32_t x99;
+  fiat_p256_uint1 x100;
+  fiat_p256_addcarryx_u32(&x99, &x100, x98, x79, x82);
+  uint32_t x101;
+  fiat_p256_uint1 x102;
+  fiat_p256_addcarryx_u32(&x101, &x102, x100, x77, x80);
+  uint32_t x103;
+  fiat_p256_uint1 x104;
+  fiat_p256_addcarryx_u32(&x103, &x104, x102, x75, x78);
+  uint32_t x105;
+  fiat_p256_uint1 x106;
+  fiat_p256_addcarryx_u32(&x105, &x106, x104, 0x0, x76);
+  uint32_t x107;
+  fiat_p256_uint1 x108;
+  fiat_p256_addcarryx_u32(&x107, &x108, 0x0, x89, x57);
+  uint32_t x109;
+  fiat_p256_uint1 x110;
+  fiat_p256_addcarryx_u32(&x109, &x110, x108, x91, x59);
+  uint32_t x111;
+  fiat_p256_uint1 x112;
+  fiat_p256_addcarryx_u32(&x111, &x112, x110, x93, x61);
+  uint32_t x113;
+  fiat_p256_uint1 x114;
+  fiat_p256_addcarryx_u32(&x113, &x114, x112, x95, x63);
+  uint32_t x115;
+  fiat_p256_uint1 x116;
+  fiat_p256_addcarryx_u32(&x115, &x116, x114, x97, x65);
+  uint32_t x117;
+  fiat_p256_uint1 x118;
+  fiat_p256_addcarryx_u32(&x117, &x118, x116, x99, x67);
+  uint32_t x119;
+  fiat_p256_uint1 x120;
+  fiat_p256_addcarryx_u32(&x119, &x120, x118, x101, x69);
+  uint32_t x121;
+  fiat_p256_uint1 x122;
+  fiat_p256_addcarryx_u32(&x121, &x122, x120, x103, x71);
+  uint32_t x123;
+  fiat_p256_uint1 x124;
+  fiat_p256_addcarryx_u32(&x123, &x124, x122, x105, (fiat_p256_uint1)x73);
+  uint32_t x125;
+  uint32_t x126;
+  fiat_p256_mulx_u32(&x125, &x126, x107, UINT32_C(0xffffffff));
+  uint32_t x127;
+  uint32_t x128;
+  fiat_p256_mulx_u32(&x127, &x128, x107, UINT32_C(0xffffffff));
+  uint32_t x129;
+  uint32_t x130;
+  fiat_p256_mulx_u32(&x129, &x130, x107, UINT32_C(0xffffffff));
+  uint32_t x131;
+  uint32_t x132;
+  fiat_p256_mulx_u32(&x131, &x132, x107, UINT32_C(0xffffffff));
+  uint32_t x133;
+  fiat_p256_uint1 x134;
+  fiat_p256_addcarryx_u32(&x133, &x134, 0x0, x129, x132);
+  uint32_t x135;
+  fiat_p256_uint1 x136;
+  fiat_p256_addcarryx_u32(&x135, &x136, x134, x127, x130);
+  uint32_t x137;
+  fiat_p256_uint1 x138;
+  fiat_p256_addcarryx_u32(&x137, &x138, x136, 0x0, x128);
+  uint32_t x139;
+  fiat_p256_uint1 x140;
+  fiat_p256_addcarryx_u32(&x139, &x140, 0x0, x131, x107);
+  uint32_t x141;
+  fiat_p256_uint1 x142;
+  fiat_p256_addcarryx_u32(&x141, &x142, x140, x133, x109);
+  uint32_t x143;
+  fiat_p256_uint1 x144;
+  fiat_p256_addcarryx_u32(&x143, &x144, x142, x135, x111);
+  uint32_t x145;
+  fiat_p256_uint1 x146;
+  fiat_p256_addcarryx_u32(&x145, &x146, x144, x137, x113);
+  uint32_t x147;
+  fiat_p256_uint1 x148;
+  fiat_p256_addcarryx_u32(&x147, &x148, x146, 0x0, x115);
+  uint32_t x149;
+  fiat_p256_uint1 x150;
+  fiat_p256_addcarryx_u32(&x149, &x150, x148, 0x0, x117);
+  uint32_t x151;
+  fiat_p256_uint1 x152;
+  fiat_p256_addcarryx_u32(&x151, &x152, x150, x107, x119);
+  uint32_t x153;
+  fiat_p256_uint1 x154;
+  fiat_p256_addcarryx_u32(&x153, &x154, x152, x125, x121);
+  uint32_t x155;
+  fiat_p256_uint1 x156;
+  fiat_p256_addcarryx_u32(&x155, &x156, x154, x126, x123);
+  uint32_t x157;
+  fiat_p256_uint1 x158;
+  fiat_p256_addcarryx_u32(&x157, &x158, x156, 0x0, x124);
+  uint32_t x159;
+  uint32_t x160;
+  fiat_p256_mulx_u32(&x159, &x160, x2, (arg1[7]));
+  uint32_t x161;
+  uint32_t x162;
+  fiat_p256_mulx_u32(&x161, &x162, x2, (arg1[6]));
+  uint32_t x163;
+  uint32_t x164;
+  fiat_p256_mulx_u32(&x163, &x164, x2, (arg1[5]));
+  uint32_t x165;
+  uint32_t x166;
+  fiat_p256_mulx_u32(&x165, &x166, x2, (arg1[4]));
+  uint32_t x167;
+  uint32_t x168;
+  fiat_p256_mulx_u32(&x167, &x168, x2, (arg1[3]));
+  uint32_t x169;
+  uint32_t x170;
+  fiat_p256_mulx_u32(&x169, &x170, x2, (arg1[2]));
+  uint32_t x171;
+  uint32_t x172;
+  fiat_p256_mulx_u32(&x171, &x172, x2, (arg1[1]));
+  uint32_t x173;
+  uint32_t x174;
+  fiat_p256_mulx_u32(&x173, &x174, x2, (arg1[0]));
+  uint32_t x175;
+  fiat_p256_uint1 x176;
+  fiat_p256_addcarryx_u32(&x175, &x176, 0x0, x171, x174);
+  uint32_t x177;
+  fiat_p256_uint1 x178;
+  fiat_p256_addcarryx_u32(&x177, &x178, x176, x169, x172);
+  uint32_t x179;
+  fiat_p256_uint1 x180;
+  fiat_p256_addcarryx_u32(&x179, &x180, x178, x167, x170);
+  uint32_t x181;
+  fiat_p256_uint1 x182;
+  fiat_p256_addcarryx_u32(&x181, &x182, x180, x165, x168);
+  uint32_t x183;
+  fiat_p256_uint1 x184;
+  fiat_p256_addcarryx_u32(&x183, &x184, x182, x163, x166);
+  uint32_t x185;
+  fiat_p256_uint1 x186;
+  fiat_p256_addcarryx_u32(&x185, &x186, x184, x161, x164);
+  uint32_t x187;
+  fiat_p256_uint1 x188;
+  fiat_p256_addcarryx_u32(&x187, &x188, x186, x159, x162);
+  uint32_t x189;
+  fiat_p256_uint1 x190;
+  fiat_p256_addcarryx_u32(&x189, &x190, x188, 0x0, x160);
+  uint32_t x191;
+  fiat_p256_uint1 x192;
+  fiat_p256_addcarryx_u32(&x191, &x192, 0x0, x173, x141);
+  uint32_t x193;
+  fiat_p256_uint1 x194;
+  fiat_p256_addcarryx_u32(&x193, &x194, x192, x175, x143);
+  uint32_t x195;
+  fiat_p256_uint1 x196;
+  fiat_p256_addcarryx_u32(&x195, &x196, x194, x177, x145);
+  uint32_t x197;
+  fiat_p256_uint1 x198;
+  fiat_p256_addcarryx_u32(&x197, &x198, x196, x179, x147);
+  uint32_t x199;
+  fiat_p256_uint1 x200;
+  fiat_p256_addcarryx_u32(&x199, &x200, x198, x181, x149);
+  uint32_t x201;
+  fiat_p256_uint1 x202;
+  fiat_p256_addcarryx_u32(&x201, &x202, x200, x183, x151);
+  uint32_t x203;
+  fiat_p256_uint1 x204;
+  fiat_p256_addcarryx_u32(&x203, &x204, x202, x185, x153);
+  uint32_t x205;
+  fiat_p256_uint1 x206;
+  fiat_p256_addcarryx_u32(&x205, &x206, x204, x187, x155);
+  uint32_t x207;
+  fiat_p256_uint1 x208;
+  fiat_p256_addcarryx_u32(&x207, &x208, x206, x189, x157);
+  uint32_t x209;
+  uint32_t x210;
+  fiat_p256_mulx_u32(&x209, &x210, x191, UINT32_C(0xffffffff));
+  uint32_t x211;
+  uint32_t x212;
+  fiat_p256_mulx_u32(&x211, &x212, x191, UINT32_C(0xffffffff));
+  uint32_t x213;
+  uint32_t x214;
+  fiat_p256_mulx_u32(&x213, &x214, x191, UINT32_C(0xffffffff));
+  uint32_t x215;
+  uint32_t x216;
+  fiat_p256_mulx_u32(&x215, &x216, x191, UINT32_C(0xffffffff));
+  uint32_t x217;
+  fiat_p256_uint1 x218;
+  fiat_p256_addcarryx_u32(&x217, &x218, 0x0, x213, x216);
+  uint32_t x219;
+  fiat_p256_uint1 x220;
+  fiat_p256_addcarryx_u32(&x219, &x220, x218, x211, x214);
+  uint32_t x221;
+  fiat_p256_uint1 x222;
+  fiat_p256_addcarryx_u32(&x221, &x222, x220, 0x0, x212);
+  uint32_t x223;
+  fiat_p256_uint1 x224;
+  fiat_p256_addcarryx_u32(&x223, &x224, 0x0, x215, x191);
+  uint32_t x225;
+  fiat_p256_uint1 x226;
+  fiat_p256_addcarryx_u32(&x225, &x226, x224, x217, x193);
+  uint32_t x227;
+  fiat_p256_uint1 x228;
+  fiat_p256_addcarryx_u32(&x227, &x228, x226, x219, x195);
+  uint32_t x229;
+  fiat_p256_uint1 x230;
+  fiat_p256_addcarryx_u32(&x229, &x230, x228, x221, x197);
+  uint32_t x231;
+  fiat_p256_uint1 x232;
+  fiat_p256_addcarryx_u32(&x231, &x232, x230, 0x0, x199);
+  uint32_t x233;
+  fiat_p256_uint1 x234;
+  fiat_p256_addcarryx_u32(&x233, &x234, x232, 0x0, x201);
+  uint32_t x235;
+  fiat_p256_uint1 x236;
+  fiat_p256_addcarryx_u32(&x235, &x236, x234, x191, x203);
+  uint32_t x237;
+  fiat_p256_uint1 x238;
+  fiat_p256_addcarryx_u32(&x237, &x238, x236, x209, x205);
+  uint32_t x239;
+  fiat_p256_uint1 x240;
+  fiat_p256_addcarryx_u32(&x239, &x240, x238, x210, x207);
+  uint32_t x241;
+  fiat_p256_uint1 x242;
+  fiat_p256_addcarryx_u32(&x241, &x242, x240, 0x0, x208);
+  uint32_t x243;
+  uint32_t x244;
+  fiat_p256_mulx_u32(&x243, &x244, x3, (arg1[7]));
+  uint32_t x245;
+  uint32_t x246;
+  fiat_p256_mulx_u32(&x245, &x246, x3, (arg1[6]));
+  uint32_t x247;
+  uint32_t x248;
+  fiat_p256_mulx_u32(&x247, &x248, x3, (arg1[5]));
+  uint32_t x249;
+  uint32_t x250;
+  fiat_p256_mulx_u32(&x249, &x250, x3, (arg1[4]));
+  uint32_t x251;
+  uint32_t x252;
+  fiat_p256_mulx_u32(&x251, &x252, x3, (arg1[3]));
+  uint32_t x253;
+  uint32_t x254;
+  fiat_p256_mulx_u32(&x253, &x254, x3, (arg1[2]));
+  uint32_t x255;
+  uint32_t x256;
+  fiat_p256_mulx_u32(&x255, &x256, x3, (arg1[1]));
+  uint32_t x257;
+  uint32_t x258;
+  fiat_p256_mulx_u32(&x257, &x258, x3, (arg1[0]));
+  uint32_t x259;
+  fiat_p256_uint1 x260;
+  fiat_p256_addcarryx_u32(&x259, &x260, 0x0, x255, x258);
+  uint32_t x261;
+  fiat_p256_uint1 x262;
+  fiat_p256_addcarryx_u32(&x261, &x262, x260, x253, x256);
+  uint32_t x263;
+  fiat_p256_uint1 x264;
+  fiat_p256_addcarryx_u32(&x263, &x264, x262, x251, x254);
+  uint32_t x265;
+  fiat_p256_uint1 x266;
+  fiat_p256_addcarryx_u32(&x265, &x266, x264, x249, x252);
+  uint32_t x267;
+  fiat_p256_uint1 x268;
+  fiat_p256_addcarryx_u32(&x267, &x268, x266, x247, x250);
+  uint32_t x269;
+  fiat_p256_uint1 x270;
+  fiat_p256_addcarryx_u32(&x269, &x270, x268, x245, x248);
+  uint32_t x271;
+  fiat_p256_uint1 x272;
+  fiat_p256_addcarryx_u32(&x271, &x272, x270, x243, x246);
+  uint32_t x273;
+  fiat_p256_uint1 x274;
+  fiat_p256_addcarryx_u32(&x273, &x274, x272, 0x0, x244);
+  uint32_t x275;
+  fiat_p256_uint1 x276;
+  fiat_p256_addcarryx_u32(&x275, &x276, 0x0, x257, x225);
+  uint32_t x277;
+  fiat_p256_uint1 x278;
+  fiat_p256_addcarryx_u32(&x277, &x278, x276, x259, x227);
+  uint32_t x279;
+  fiat_p256_uint1 x280;
+  fiat_p256_addcarryx_u32(&x279, &x280, x278, x261, x229);
+  uint32_t x281;
+  fiat_p256_uint1 x282;
+  fiat_p256_addcarryx_u32(&x281, &x282, x280, x263, x231);
+  uint32_t x283;
+  fiat_p256_uint1 x284;
+  fiat_p256_addcarryx_u32(&x283, &x284, x282, x265, x233);
+  uint32_t x285;
+  fiat_p256_uint1 x286;
+  fiat_p256_addcarryx_u32(&x285, &x286, x284, x267, x235);
+  uint32_t x287;
+  fiat_p256_uint1 x288;
+  fiat_p256_addcarryx_u32(&x287, &x288, x286, x269, x237);
+  uint32_t x289;
+  fiat_p256_uint1 x290;
+  fiat_p256_addcarryx_u32(&x289, &x290, x288, x271, x239);
+  uint32_t x291;
+  fiat_p256_uint1 x292;
+  fiat_p256_addcarryx_u32(&x291, &x292, x290, x273, x241);
+  uint32_t x293;
+  uint32_t x294;
+  fiat_p256_mulx_u32(&x293, &x294, x275, UINT32_C(0xffffffff));
+  uint32_t x295;
+  uint32_t x296;
+  fiat_p256_mulx_u32(&x295, &x296, x275, UINT32_C(0xffffffff));
+  uint32_t x297;
+  uint32_t x298;
+  fiat_p256_mulx_u32(&x297, &x298, x275, UINT32_C(0xffffffff));
+  uint32_t x299;
+  uint32_t x300;
+  fiat_p256_mulx_u32(&x299, &x300, x275, UINT32_C(0xffffffff));
+  uint32_t x301;
+  fiat_p256_uint1 x302;
+  fiat_p256_addcarryx_u32(&x301, &x302, 0x0, x297, x300);
+  uint32_t x303;
+  fiat_p256_uint1 x304;
+  fiat_p256_addcarryx_u32(&x303, &x304, x302, x295, x298);
+  uint32_t x305;
+  fiat_p256_uint1 x306;
+  fiat_p256_addcarryx_u32(&x305, &x306, x304, 0x0, x296);
+  uint32_t x307;
+  fiat_p256_uint1 x308;
+  fiat_p256_addcarryx_u32(&x307, &x308, 0x0, x299, x275);
+  uint32_t x309;
+  fiat_p256_uint1 x310;
+  fiat_p256_addcarryx_u32(&x309, &x310, x308, x301, x277);
+  uint32_t x311;
+  fiat_p256_uint1 x312;
+  fiat_p256_addcarryx_u32(&x311, &x312, x310, x303, x279);
+  uint32_t x313;
+  fiat_p256_uint1 x314;
+  fiat_p256_addcarryx_u32(&x313, &x314, x312, x305, x281);
+  uint32_t x315;
+  fiat_p256_uint1 x316;
+  fiat_p256_addcarryx_u32(&x315, &x316, x314, 0x0, x283);
+  uint32_t x317;
+  fiat_p256_uint1 x318;
+  fiat_p256_addcarryx_u32(&x317, &x318, x316, 0x0, x285);
+  uint32_t x319;
+  fiat_p256_uint1 x320;
+  fiat_p256_addcarryx_u32(&x319, &x320, x318, x275, x287);
+  uint32_t x321;
+  fiat_p256_uint1 x322;
+  fiat_p256_addcarryx_u32(&x321, &x322, x320, x293, x289);
+  uint32_t x323;
+  fiat_p256_uint1 x324;
+  fiat_p256_addcarryx_u32(&x323, &x324, x322, x294, x291);
+  uint32_t x325;
+  fiat_p256_uint1 x326;
+  fiat_p256_addcarryx_u32(&x325, &x326, x324, 0x0, x292);
+  uint32_t x327;
+  uint32_t x328;
+  fiat_p256_mulx_u32(&x327, &x328, x4, (arg1[7]));
+  uint32_t x329;
+  uint32_t x330;
+  fiat_p256_mulx_u32(&x329, &x330, x4, (arg1[6]));
+  uint32_t x331;
+  uint32_t x332;
+  fiat_p256_mulx_u32(&x331, &x332, x4, (arg1[5]));
+  uint32_t x333;
+  uint32_t x334;
+  fiat_p256_mulx_u32(&x333, &x334, x4, (arg1[4]));
+  uint32_t x335;
+  uint32_t x336;
+  fiat_p256_mulx_u32(&x335, &x336, x4, (arg1[3]));
+  uint32_t x337;
+  uint32_t x338;
+  fiat_p256_mulx_u32(&x337, &x338, x4, (arg1[2]));
+  uint32_t x339;
+  uint32_t x340;
+  fiat_p256_mulx_u32(&x339, &x340, x4, (arg1[1]));
+  uint32_t x341;
+  uint32_t x342;
+  fiat_p256_mulx_u32(&x341, &x342, x4, (arg1[0]));
+  uint32_t x343;
+  fiat_p256_uint1 x344;
+  fiat_p256_addcarryx_u32(&x343, &x344, 0x0, x339, x342);
+  uint32_t x345;
+  fiat_p256_uint1 x346;
+  fiat_p256_addcarryx_u32(&x345, &x346, x344, x337, x340);
+  uint32_t x347;
+  fiat_p256_uint1 x348;
+  fiat_p256_addcarryx_u32(&x347, &x348, x346, x335, x338);
+  uint32_t x349;
+  fiat_p256_uint1 x350;
+  fiat_p256_addcarryx_u32(&x349, &x350, x348, x333, x336);
+  uint32_t x351;
+  fiat_p256_uint1 x352;
+  fiat_p256_addcarryx_u32(&x351, &x352, x350, x331, x334);
+  uint32_t x353;
+  fiat_p256_uint1 x354;
+  fiat_p256_addcarryx_u32(&x353, &x354, x352, x329, x332);
+  uint32_t x355;
+  fiat_p256_uint1 x356;
+  fiat_p256_addcarryx_u32(&x355, &x356, x354, x327, x330);
+  uint32_t x357;
+  fiat_p256_uint1 x358;
+  fiat_p256_addcarryx_u32(&x357, &x358, x356, 0x0, x328);
+  uint32_t x359;
+  fiat_p256_uint1 x360;
+  fiat_p256_addcarryx_u32(&x359, &x360, 0x0, x341, x309);
+  uint32_t x361;
+  fiat_p256_uint1 x362;
+  fiat_p256_addcarryx_u32(&x361, &x362, x360, x343, x311);
+  uint32_t x363;
+  fiat_p256_uint1 x364;
+  fiat_p256_addcarryx_u32(&x363, &x364, x362, x345, x313);
+  uint32_t x365;
+  fiat_p256_uint1 x366;
+  fiat_p256_addcarryx_u32(&x365, &x366, x364, x347, x315);
+  uint32_t x367;
+  fiat_p256_uint1 x368;
+  fiat_p256_addcarryx_u32(&x367, &x368, x366, x349, x317);
+  uint32_t x369;
+  fiat_p256_uint1 x370;
+  fiat_p256_addcarryx_u32(&x369, &x370, x368, x351, x319);
+  uint32_t x371;
+  fiat_p256_uint1 x372;
+  fiat_p256_addcarryx_u32(&x371, &x372, x370, x353, x321);
+  uint32_t x373;
+  fiat_p256_uint1 x374;
+  fiat_p256_addcarryx_u32(&x373, &x374, x372, x355, x323);
+  uint32_t x375;
+  fiat_p256_uint1 x376;
+  fiat_p256_addcarryx_u32(&x375, &x376, x374, x357, x325);
+  uint32_t x377;
+  uint32_t x378;
+  fiat_p256_mulx_u32(&x377, &x378, x359, UINT32_C(0xffffffff));
+  uint32_t x379;
+  uint32_t x380;
+  fiat_p256_mulx_u32(&x379, &x380, x359, UINT32_C(0xffffffff));
+  uint32_t x381;
+  uint32_t x382;
+  fiat_p256_mulx_u32(&x381, &x382, x359, UINT32_C(0xffffffff));
+  uint32_t x383;
+  uint32_t x384;
+  fiat_p256_mulx_u32(&x383, &x384, x359, UINT32_C(0xffffffff));
+  uint32_t x385;
+  fiat_p256_uint1 x386;
+  fiat_p256_addcarryx_u32(&x385, &x386, 0x0, x381, x384);
+  uint32_t x387;
+  fiat_p256_uint1 x388;
+  fiat_p256_addcarryx_u32(&x387, &x388, x386, x379, x382);
+  uint32_t x389;
+  fiat_p256_uint1 x390;
+  fiat_p256_addcarryx_u32(&x389, &x390, x388, 0x0, x380);
+  uint32_t x391;
+  fiat_p256_uint1 x392;
+  fiat_p256_addcarryx_u32(&x391, &x392, 0x0, x383, x359);
+  uint32_t x393;
+  fiat_p256_uint1 x394;
+  fiat_p256_addcarryx_u32(&x393, &x394, x392, x385, x361);
+  uint32_t x395;
+  fiat_p256_uint1 x396;
+  fiat_p256_addcarryx_u32(&x395, &x396, x394, x387, x363);
+  uint32_t x397;
+  fiat_p256_uint1 x398;
+  fiat_p256_addcarryx_u32(&x397, &x398, x396, x389, x365);
+  uint32_t x399;
+  fiat_p256_uint1 x400;
+  fiat_p256_addcarryx_u32(&x399, &x400, x398, 0x0, x367);
+  uint32_t x401;
+  fiat_p256_uint1 x402;
+  fiat_p256_addcarryx_u32(&x401, &x402, x400, 0x0, x369);
+  uint32_t x403;
+  fiat_p256_uint1 x404;
+  fiat_p256_addcarryx_u32(&x403, &x404, x402, x359, x371);
+  uint32_t x405;
+  fiat_p256_uint1 x406;
+  fiat_p256_addcarryx_u32(&x405, &x406, x404, x377, x373);
+  uint32_t x407;
+  fiat_p256_uint1 x408;
+  fiat_p256_addcarryx_u32(&x407, &x408, x406, x378, x375);
+  uint32_t x409;
+  fiat_p256_uint1 x410;
+  fiat_p256_addcarryx_u32(&x409, &x410, x408, 0x0, x376);
+  uint32_t x411;
+  uint32_t x412;
+  fiat_p256_mulx_u32(&x411, &x412, x5, (arg1[7]));
+  uint32_t x413;
+  uint32_t x414;
+  fiat_p256_mulx_u32(&x413, &x414, x5, (arg1[6]));
+  uint32_t x415;
+  uint32_t x416;
+  fiat_p256_mulx_u32(&x415, &x416, x5, (arg1[5]));
+  uint32_t x417;
+  uint32_t x418;
+  fiat_p256_mulx_u32(&x417, &x418, x5, (arg1[4]));
+  uint32_t x419;
+  uint32_t x420;
+  fiat_p256_mulx_u32(&x419, &x420, x5, (arg1[3]));
+  uint32_t x421;
+  uint32_t x422;
+  fiat_p256_mulx_u32(&x421, &x422, x5, (arg1[2]));
+  uint32_t x423;
+  uint32_t x424;
+  fiat_p256_mulx_u32(&x423, &x424, x5, (arg1[1]));
+  uint32_t x425;
+  uint32_t x426;
+  fiat_p256_mulx_u32(&x425, &x426, x5, (arg1[0]));
+  uint32_t x427;
+  fiat_p256_uint1 x428;
+  fiat_p256_addcarryx_u32(&x427, &x428, 0x0, x423, x426);
+  uint32_t x429;
+  fiat_p256_uint1 x430;
+  fiat_p256_addcarryx_u32(&x429, &x430, x428, x421, x424);
+  uint32_t x431;
+  fiat_p256_uint1 x432;
+  fiat_p256_addcarryx_u32(&x431, &x432, x430, x419, x422);
+  uint32_t x433;
+  fiat_p256_uint1 x434;
+  fiat_p256_addcarryx_u32(&x433, &x434, x432, x417, x420);
+  uint32_t x435;
+  fiat_p256_uint1 x436;
+  fiat_p256_addcarryx_u32(&x435, &x436, x434, x415, x418);
+  uint32_t x437;
+  fiat_p256_uint1 x438;
+  fiat_p256_addcarryx_u32(&x437, &x438, x436, x413, x416);
+  uint32_t x439;
+  fiat_p256_uint1 x440;
+  fiat_p256_addcarryx_u32(&x439, &x440, x438, x411, x414);
+  uint32_t x441;
+  fiat_p256_uint1 x442;
+  fiat_p256_addcarryx_u32(&x441, &x442, x440, 0x0, x412);
+  uint32_t x443;
+  fiat_p256_uint1 x444;
+  fiat_p256_addcarryx_u32(&x443, &x444, 0x0, x425, x393);
+  uint32_t x445;
+  fiat_p256_uint1 x446;
+  fiat_p256_addcarryx_u32(&x445, &x446, x444, x427, x395);
+  uint32_t x447;
+  fiat_p256_uint1 x448;
+  fiat_p256_addcarryx_u32(&x447, &x448, x446, x429, x397);
+  uint32_t x449;
+  fiat_p256_uint1 x450;
+  fiat_p256_addcarryx_u32(&x449, &x450, x448, x431, x399);
+  uint32_t x451;
+  fiat_p256_uint1 x452;
+  fiat_p256_addcarryx_u32(&x451, &x452, x450, x433, x401);
+  uint32_t x453;
+  fiat_p256_uint1 x454;
+  fiat_p256_addcarryx_u32(&x453, &x454, x452, x435, x403);
+  uint32_t x455;
+  fiat_p256_uint1 x456;
+  fiat_p256_addcarryx_u32(&x455, &x456, x454, x437, x405);
+  uint32_t x457;
+  fiat_p256_uint1 x458;
+  fiat_p256_addcarryx_u32(&x457, &x458, x456, x439, x407);
+  uint32_t x459;
+  fiat_p256_uint1 x460;
+  fiat_p256_addcarryx_u32(&x459, &x460, x458, x441, x409);
+  uint32_t x461;
+  uint32_t x462;
+  fiat_p256_mulx_u32(&x461, &x462, x443, UINT32_C(0xffffffff));
+  uint32_t x463;
+  uint32_t x464;
+  fiat_p256_mulx_u32(&x463, &x464, x443, UINT32_C(0xffffffff));
+  uint32_t x465;
+  uint32_t x466;
+  fiat_p256_mulx_u32(&x465, &x466, x443, UINT32_C(0xffffffff));
+  uint32_t x467;
+  uint32_t x468;
+  fiat_p256_mulx_u32(&x467, &x468, x443, UINT32_C(0xffffffff));
+  uint32_t x469;
+  fiat_p256_uint1 x470;
+  fiat_p256_addcarryx_u32(&x469, &x470, 0x0, x465, x468);
+  uint32_t x471;
+  fiat_p256_uint1 x472;
+  fiat_p256_addcarryx_u32(&x471, &x472, x470, x463, x466);
+  uint32_t x473;
+  fiat_p256_uint1 x474;
+  fiat_p256_addcarryx_u32(&x473, &x474, x472, 0x0, x464);
+  uint32_t x475;
+  fiat_p256_uint1 x476;
+  fiat_p256_addcarryx_u32(&x475, &x476, 0x0, x467, x443);
+  uint32_t x477;
+  fiat_p256_uint1 x478;
+  fiat_p256_addcarryx_u32(&x477, &x478, x476, x469, x445);
+  uint32_t x479;
+  fiat_p256_uint1 x480;
+  fiat_p256_addcarryx_u32(&x479, &x480, x478, x471, x447);
+  uint32_t x481;
+  fiat_p256_uint1 x482;
+  fiat_p256_addcarryx_u32(&x481, &x482, x480, x473, x449);
+  uint32_t x483;
+  fiat_p256_uint1 x484;
+  fiat_p256_addcarryx_u32(&x483, &x484, x482, 0x0, x451);
+  uint32_t x485;
+  fiat_p256_uint1 x486;
+  fiat_p256_addcarryx_u32(&x485, &x486, x484, 0x0, x453);
+  uint32_t x487;
+  fiat_p256_uint1 x488;
+  fiat_p256_addcarryx_u32(&x487, &x488, x486, x443, x455);
+  uint32_t x489;
+  fiat_p256_uint1 x490;
+  fiat_p256_addcarryx_u32(&x489, &x490, x488, x461, x457);
+  uint32_t x491;
+  fiat_p256_uint1 x492;
+  fiat_p256_addcarryx_u32(&x491, &x492, x490, x462, x459);
+  uint32_t x493;
+  fiat_p256_uint1 x494;
+  fiat_p256_addcarryx_u32(&x493, &x494, x492, 0x0, x460);
+  uint32_t x495;
+  uint32_t x496;
+  fiat_p256_mulx_u32(&x495, &x496, x6, (arg1[7]));
+  uint32_t x497;
+  uint32_t x498;
+  fiat_p256_mulx_u32(&x497, &x498, x6, (arg1[6]));
+  uint32_t x499;
+  uint32_t x500;
+  fiat_p256_mulx_u32(&x499, &x500, x6, (arg1[5]));
+  uint32_t x501;
+  uint32_t x502;
+  fiat_p256_mulx_u32(&x501, &x502, x6, (arg1[4]));
+  uint32_t x503;
+  uint32_t x504;
+  fiat_p256_mulx_u32(&x503, &x504, x6, (arg1[3]));
+  uint32_t x505;
+  uint32_t x506;
+  fiat_p256_mulx_u32(&x505, &x506, x6, (arg1[2]));
+  uint32_t x507;
+  uint32_t x508;
+  fiat_p256_mulx_u32(&x507, &x508, x6, (arg1[1]));
+  uint32_t x509;
+  uint32_t x510;
+  fiat_p256_mulx_u32(&x509, &x510, x6, (arg1[0]));
+  uint32_t x511;
+  fiat_p256_uint1 x512;
+  fiat_p256_addcarryx_u32(&x511, &x512, 0x0, x507, x510);
+  uint32_t x513;
+  fiat_p256_uint1 x514;
+  fiat_p256_addcarryx_u32(&x513, &x514, x512, x505, x508);
+  uint32_t x515;
+  fiat_p256_uint1 x516;
+  fiat_p256_addcarryx_u32(&x515, &x516, x514, x503, x506);
+  uint32_t x517;
+  fiat_p256_uint1 x518;
+  fiat_p256_addcarryx_u32(&x517, &x518, x516, x501, x504);
+  uint32_t x519;
+  fiat_p256_uint1 x520;
+  fiat_p256_addcarryx_u32(&x519, &x520, x518, x499, x502);
+  uint32_t x521;
+  fiat_p256_uint1 x522;
+  fiat_p256_addcarryx_u32(&x521, &x522, x520, x497, x500);
+  uint32_t x523;
+  fiat_p256_uint1 x524;
+  fiat_p256_addcarryx_u32(&x523, &x524, x522, x495, x498);
+  uint32_t x525;
+  fiat_p256_uint1 x526;
+  fiat_p256_addcarryx_u32(&x525, &x526, x524, 0x0, x496);
+  uint32_t x527;
+  fiat_p256_uint1 x528;
+  fiat_p256_addcarryx_u32(&x527, &x528, 0x0, x509, x477);
+  uint32_t x529;
+  fiat_p256_uint1 x530;
+  fiat_p256_addcarryx_u32(&x529, &x530, x528, x511, x479);
+  uint32_t x531;
+  fiat_p256_uint1 x532;
+  fiat_p256_addcarryx_u32(&x531, &x532, x530, x513, x481);
+  uint32_t x533;
+  fiat_p256_uint1 x534;
+  fiat_p256_addcarryx_u32(&x533, &x534, x532, x515, x483);
+  uint32_t x535;
+  fiat_p256_uint1 x536;
+  fiat_p256_addcarryx_u32(&x535, &x536, x534, x517, x485);
+  uint32_t x537;
+  fiat_p256_uint1 x538;
+  fiat_p256_addcarryx_u32(&x537, &x538, x536, x519, x487);
+  uint32_t x539;
+  fiat_p256_uint1 x540;
+  fiat_p256_addcarryx_u32(&x539, &x540, x538, x521, x489);
+  uint32_t x541;
+  fiat_p256_uint1 x542;
+  fiat_p256_addcarryx_u32(&x541, &x542, x540, x523, x491);
+  uint32_t x543;
+  fiat_p256_uint1 x544;
+  fiat_p256_addcarryx_u32(&x543, &x544, x542, x525, x493);
+  uint32_t x545;
+  uint32_t x546;
+  fiat_p256_mulx_u32(&x545, &x546, x527, UINT32_C(0xffffffff));
+  uint32_t x547;
+  uint32_t x548;
+  fiat_p256_mulx_u32(&x547, &x548, x527, UINT32_C(0xffffffff));
+  uint32_t x549;
+  uint32_t x550;
+  fiat_p256_mulx_u32(&x549, &x550, x527, UINT32_C(0xffffffff));
+  uint32_t x551;
+  uint32_t x552;
+  fiat_p256_mulx_u32(&x551, &x552, x527, UINT32_C(0xffffffff));
+  uint32_t x553;
+  fiat_p256_uint1 x554;
+  fiat_p256_addcarryx_u32(&x553, &x554, 0x0, x549, x552);
+  uint32_t x555;
+  fiat_p256_uint1 x556;
+  fiat_p256_addcarryx_u32(&x555, &x556, x554, x547, x550);
+  uint32_t x557;
+  fiat_p256_uint1 x558;
+  fiat_p256_addcarryx_u32(&x557, &x558, x556, 0x0, x548);
+  uint32_t x559;
+  fiat_p256_uint1 x560;
+  fiat_p256_addcarryx_u32(&x559, &x560, 0x0, x551, x527);
+  uint32_t x561;
+  fiat_p256_uint1 x562;
+  fiat_p256_addcarryx_u32(&x561, &x562, x560, x553, x529);
+  uint32_t x563;
+  fiat_p256_uint1 x564;
+  fiat_p256_addcarryx_u32(&x563, &x564, x562, x555, x531);
+  uint32_t x565;
+  fiat_p256_uint1 x566;
+  fiat_p256_addcarryx_u32(&x565, &x566, x564, x557, x533);
+  uint32_t x567;
+  fiat_p256_uint1 x568;
+  fiat_p256_addcarryx_u32(&x567, &x568, x566, 0x0, x535);
+  uint32_t x569;
+  fiat_p256_uint1 x570;
+  fiat_p256_addcarryx_u32(&x569, &x570, x568, 0x0, x537);
+  uint32_t x571;
+  fiat_p256_uint1 x572;
+  fiat_p256_addcarryx_u32(&x571, &x572, x570, x527, x539);
+  uint32_t x573;
+  fiat_p256_uint1 x574;
+  fiat_p256_addcarryx_u32(&x573, &x574, x572, x545, x541);
+  uint32_t x575;
+  fiat_p256_uint1 x576;
+  fiat_p256_addcarryx_u32(&x575, &x576, x574, x546, x543);
+  uint32_t x577;
+  fiat_p256_uint1 x578;
+  fiat_p256_addcarryx_u32(&x577, &x578, x576, 0x0, x544);
+  uint32_t x579;
+  uint32_t x580;
+  fiat_p256_mulx_u32(&x579, &x580, x7, (arg1[7]));
+  uint32_t x581;
+  uint32_t x582;
+  fiat_p256_mulx_u32(&x581, &x582, x7, (arg1[6]));
+  uint32_t x583;
+  uint32_t x584;
+  fiat_p256_mulx_u32(&x583, &x584, x7, (arg1[5]));
+  uint32_t x585;
+  uint32_t x586;
+  fiat_p256_mulx_u32(&x585, &x586, x7, (arg1[4]));
+  uint32_t x587;
+  uint32_t x588;
+  fiat_p256_mulx_u32(&x587, &x588, x7, (arg1[3]));
+  uint32_t x589;
+  uint32_t x590;
+  fiat_p256_mulx_u32(&x589, &x590, x7, (arg1[2]));
+  uint32_t x591;
+  uint32_t x592;
+  fiat_p256_mulx_u32(&x591, &x592, x7, (arg1[1]));
+  uint32_t x593;
+  uint32_t x594;
+  fiat_p256_mulx_u32(&x593, &x594, x7, (arg1[0]));
+  uint32_t x595;
+  fiat_p256_uint1 x596;
+  fiat_p256_addcarryx_u32(&x595, &x596, 0x0, x591, x594);
+  uint32_t x597;
+  fiat_p256_uint1 x598;
+  fiat_p256_addcarryx_u32(&x597, &x598, x596, x589, x592);
+  uint32_t x599;
+  fiat_p256_uint1 x600;
+  fiat_p256_addcarryx_u32(&x599, &x600, x598, x587, x590);
+  uint32_t x601;
+  fiat_p256_uint1 x602;
+  fiat_p256_addcarryx_u32(&x601, &x602, x600, x585, x588);
+  uint32_t x603;
+  fiat_p256_uint1 x604;
+  fiat_p256_addcarryx_u32(&x603, &x604, x602, x583, x586);
+  uint32_t x605;
+  fiat_p256_uint1 x606;
+  fiat_p256_addcarryx_u32(&x605, &x606, x604, x581, x584);
+  uint32_t x607;
+  fiat_p256_uint1 x608;
+  fiat_p256_addcarryx_u32(&x607, &x608, x606, x579, x582);
+  uint32_t x609;
+  fiat_p256_uint1 x610;
+  fiat_p256_addcarryx_u32(&x609, &x610, x608, 0x0, x580);
+  uint32_t x611;
+  fiat_p256_uint1 x612;
+  fiat_p256_addcarryx_u32(&x611, &x612, 0x0, x593, x561);
+  uint32_t x613;
+  fiat_p256_uint1 x614;
+  fiat_p256_addcarryx_u32(&x613, &x614, x612, x595, x563);
+  uint32_t x615;
+  fiat_p256_uint1 x616;
+  fiat_p256_addcarryx_u32(&x615, &x616, x614, x597, x565);
+  uint32_t x617;
+  fiat_p256_uint1 x618;
+  fiat_p256_addcarryx_u32(&x617, &x618, x616, x599, x567);
+  uint32_t x619;
+  fiat_p256_uint1 x620;
+  fiat_p256_addcarryx_u32(&x619, &x620, x618, x601, x569);
+  uint32_t x621;
+  fiat_p256_uint1 x622;
+  fiat_p256_addcarryx_u32(&x621, &x622, x620, x603, x571);
+  uint32_t x623;
+  fiat_p256_uint1 x624;
+  fiat_p256_addcarryx_u32(&x623, &x624, x622, x605, x573);
+  uint32_t x625;
+  fiat_p256_uint1 x626;
+  fiat_p256_addcarryx_u32(&x625, &x626, x624, x607, x575);
+  uint32_t x627;
+  fiat_p256_uint1 x628;
+  fiat_p256_addcarryx_u32(&x627, &x628, x626, x609, x577);
+  uint32_t x629;
+  uint32_t x630;
+  fiat_p256_mulx_u32(&x629, &x630, x611, UINT32_C(0xffffffff));
+  uint32_t x631;
+  uint32_t x632;
+  fiat_p256_mulx_u32(&x631, &x632, x611, UINT32_C(0xffffffff));
+  uint32_t x633;
+  uint32_t x634;
+  fiat_p256_mulx_u32(&x633, &x634, x611, UINT32_C(0xffffffff));
+  uint32_t x635;
+  uint32_t x636;
+  fiat_p256_mulx_u32(&x635, &x636, x611, UINT32_C(0xffffffff));
+  uint32_t x637;
+  fiat_p256_uint1 x638;
+  fiat_p256_addcarryx_u32(&x637, &x638, 0x0, x633, x636);
+  uint32_t x639;
+  fiat_p256_uint1 x640;
+  fiat_p256_addcarryx_u32(&x639, &x640, x638, x631, x634);
+  uint32_t x641;
+  fiat_p256_uint1 x642;
+  fiat_p256_addcarryx_u32(&x641, &x642, x640, 0x0, x632);
+  uint32_t x643;
+  fiat_p256_uint1 x644;
+  fiat_p256_addcarryx_u32(&x643, &x644, 0x0, x635, x611);
+  uint32_t x645;
+  fiat_p256_uint1 x646;
+  fiat_p256_addcarryx_u32(&x645, &x646, x644, x637, x613);
+  uint32_t x647;
+  fiat_p256_uint1 x648;
+  fiat_p256_addcarryx_u32(&x647, &x648, x646, x639, x615);
+  uint32_t x649;
+  fiat_p256_uint1 x650;
+  fiat_p256_addcarryx_u32(&x649, &x650, x648, x641, x617);
+  uint32_t x651;
+  fiat_p256_uint1 x652;
+  fiat_p256_addcarryx_u32(&x651, &x652, x650, 0x0, x619);
+  uint32_t x653;
+  fiat_p256_uint1 x654;
+  fiat_p256_addcarryx_u32(&x653, &x654, x652, 0x0, x621);
+  uint32_t x655;
+  fiat_p256_uint1 x656;
+  fiat_p256_addcarryx_u32(&x655, &x656, x654, x611, x623);
+  uint32_t x657;
+  fiat_p256_uint1 x658;
+  fiat_p256_addcarryx_u32(&x657, &x658, x656, x629, x625);
+  uint32_t x659;
+  fiat_p256_uint1 x660;
+  fiat_p256_addcarryx_u32(&x659, &x660, x658, x630, x627);
+  uint32_t x661;
+  fiat_p256_uint1 x662;
+  fiat_p256_addcarryx_u32(&x661, &x662, x660, 0x0, x628);
+  uint32_t x663;
+  fiat_p256_uint1 x664;
+  fiat_p256_subborrowx_u32(&x663, &x664, 0x0, x645, UINT32_C(0xffffffff));
+  uint32_t x665;
+  fiat_p256_uint1 x666;
+  fiat_p256_subborrowx_u32(&x665, &x666, x664, x647, UINT32_C(0xffffffff));
+  uint32_t x667;
+  fiat_p256_uint1 x668;
+  fiat_p256_subborrowx_u32(&x667, &x668, x666, x649, UINT32_C(0xffffffff));
+  uint32_t x669;
+  fiat_p256_uint1 x670;
+  fiat_p256_subborrowx_u32(&x669, &x670, x668, x651, 0x0);
+  uint32_t x671;
+  fiat_p256_uint1 x672;
+  fiat_p256_subborrowx_u32(&x671, &x672, x670, x653, 0x0);
+  uint32_t x673;
+  fiat_p256_uint1 x674;
+  fiat_p256_subborrowx_u32(&x673, &x674, x672, x655, 0x0);
+  uint32_t x675;
+  fiat_p256_uint1 x676;
+  fiat_p256_subborrowx_u32(&x675, &x676, x674, x657, 0x1);
+  uint32_t x677;
+  fiat_p256_uint1 x678;
+  fiat_p256_subborrowx_u32(&x677, &x678, x676, x659, UINT32_C(0xffffffff));
+  uint32_t x679;
+  fiat_p256_uint1 x680;
+  fiat_p256_subborrowx_u32(&x679, &x680, x678, x661, 0x0);
+  uint32_t x681;
+  fiat_p256_cmovznz_u32(&x681, x680, x663, x645);
+  uint32_t x682;
+  fiat_p256_cmovznz_u32(&x682, x680, x665, x647);
+  uint32_t x683;
+  fiat_p256_cmovznz_u32(&x683, x680, x667, x649);
+  uint32_t x684;
+  fiat_p256_cmovznz_u32(&x684, x680, x669, x651);
+  uint32_t x685;
+  fiat_p256_cmovznz_u32(&x685, x680, x671, x653);
+  uint32_t x686;
+  fiat_p256_cmovznz_u32(&x686, x680, x673, x655);
+  uint32_t x687;
+  fiat_p256_cmovznz_u32(&x687, x680, x675, x657);
+  uint32_t x688;
+  fiat_p256_cmovznz_u32(&x688, x680, x677, x659);
+  out1[0] = x681;
+  out1[1] = x682;
+  out1[2] = x683;
+  out1[3] = x684;
+  out1[4] = x685;
+  out1[5] = x686;
+  out1[6] = x687;
+  out1[7] = x688;
+}
+
+/* fiat_p256_add: out1 = (arg1 + arg2) mod p, p the P-256 prime; eight little-endian 32-bit limbs.
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_add(uint32_t out1[8], const uint32_t arg1[8], const uint32_t arg2[8]) {
+  uint32_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_addcarryx_u32(&x1, &x2, 0x0, (arg2[0]), (arg1[0]));
+  uint32_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_addcarryx_u32(&x3, &x4, x2, (arg2[1]), (arg1[1]));
+  uint32_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_addcarryx_u32(&x5, &x6, x4, (arg2[2]), (arg1[2]));
+  uint32_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_addcarryx_u32(&x7, &x8, x6, (arg2[3]), (arg1[3]));
+  uint32_t x9;
+  fiat_p256_uint1 x10;
+  fiat_p256_addcarryx_u32(&x9, &x10, x8, (arg2[4]), (arg1[4]));
+  uint32_t x11;
+  fiat_p256_uint1 x12;
+  fiat_p256_addcarryx_u32(&x11, &x12, x10, (arg2[5]), (arg1[5]));
+  uint32_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_addcarryx_u32(&x13, &x14, x12, (arg2[6]), (arg1[6]));
+  uint32_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_addcarryx_u32(&x15, &x16, x14, (arg2[7]), (arg1[7]));
+  uint32_t x17;
+  fiat_p256_uint1 x18;
+  fiat_p256_subborrowx_u32(&x17, &x18, 0x0, x1, UINT32_C(0xffffffff));
+  uint32_t x19;
+  fiat_p256_uint1 x20;
+  fiat_p256_subborrowx_u32(&x19, &x20, x18, x3, UINT32_C(0xffffffff));
+  uint32_t x21;
+  fiat_p256_uint1 x22;
+  fiat_p256_subborrowx_u32(&x21, &x22, x20, x5, UINT32_C(0xffffffff));
+  uint32_t x23;
+  fiat_p256_uint1 x24;
+  fiat_p256_subborrowx_u32(&x23, &x24, x22, x7, 0x0);
+  uint32_t x25;
+  fiat_p256_uint1 x26;
+  fiat_p256_subborrowx_u32(&x25, &x26, x24, x9, 0x0);
+  uint32_t x27;
+  fiat_p256_uint1 x28;
+  fiat_p256_subborrowx_u32(&x27, &x28, x26, x11, 0x0);
+  uint32_t x29;
+  fiat_p256_uint1 x30;
+  fiat_p256_subborrowx_u32(&x29, &x30, x28, x13, 0x1);
+  uint32_t x31;
+  fiat_p256_uint1 x32;
+  fiat_p256_subborrowx_u32(&x31, &x32, x30, x15, UINT32_C(0xffffffff));
+  uint32_t x33;
+  fiat_p256_uint1 x34;
+  fiat_p256_subborrowx_u32(&x33, &x34, x32, x16, 0x0);
+  uint32_t x35;
+  fiat_p256_cmovznz_u32(&x35, x34, x17, x1);
+  uint32_t x36;
+  fiat_p256_cmovznz_u32(&x36, x34, x19, x3);
+  uint32_t x37;
+  fiat_p256_cmovznz_u32(&x37, x34, x21, x5);
+  uint32_t x38;
+  fiat_p256_cmovznz_u32(&x38, x34, x23, x7);
+  uint32_t x39;
+  fiat_p256_cmovznz_u32(&x39, x34, x25, x9);
+  uint32_t x40;
+  fiat_p256_cmovznz_u32(&x40, x34, x27, x11);
+  uint32_t x41;
+  fiat_p256_cmovznz_u32(&x41, x34, x29, x13);
+  uint32_t x42;
+  fiat_p256_cmovznz_u32(&x42, x34, x31, x15);
+  out1[0] = x35;
+  out1[1] = x36;
+  out1[2] = x37;
+  out1[3] = x38;
+  out1[4] = x39;
+  out1[5] = x40;
+  out1[6] = x41;
+  out1[7] = x42;
+}
+
+/* fiat_p256_sub: out1 = (arg1 - arg2) mod p; p is added back when the limb subtraction borrows.
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_sub(uint32_t out1[8], const uint32_t arg1[8], const uint32_t arg2[8]) {
+  uint32_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_subborrowx_u32(&x1, &x2, 0x0, (arg1[0]), (arg2[0]));
+  uint32_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_subborrowx_u32(&x3, &x4, x2, (arg1[1]), (arg2[1]));
+  uint32_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_subborrowx_u32(&x5, &x6, x4, (arg1[2]), (arg2[2]));
+  uint32_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_subborrowx_u32(&x7, &x8, x6, (arg1[3]), (arg2[3]));
+  uint32_t x9;
+  fiat_p256_uint1 x10;
+  fiat_p256_subborrowx_u32(&x9, &x10, x8, (arg1[4]), (arg2[4]));
+  uint32_t x11;
+  fiat_p256_uint1 x12;
+  fiat_p256_subborrowx_u32(&x11, &x12, x10, (arg1[5]), (arg2[5]));
+  uint32_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_subborrowx_u32(&x13, &x14, x12, (arg1[6]), (arg2[6]));
+  uint32_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_subborrowx_u32(&x15, &x16, x14, (arg1[7]), (arg2[7]));
+  uint32_t x17;
+  fiat_p256_cmovznz_u32(&x17, x16, 0x0, UINT32_C(0xffffffff));
+  uint32_t x18;
+  fiat_p256_uint1 x19;
+  fiat_p256_addcarryx_u32(&x18, &x19, 0x0, (x17 & UINT32_C(0xffffffff)), x1);
+  uint32_t x20;
+  fiat_p256_uint1 x21;
+  fiat_p256_addcarryx_u32(&x20, &x21, x19, (x17 & UINT32_C(0xffffffff)), x3);
+  uint32_t x22;
+  fiat_p256_uint1 x23;
+  fiat_p256_addcarryx_u32(&x22, &x23, x21, (x17 & UINT32_C(0xffffffff)), x5);
+  uint32_t x24;
+  fiat_p256_uint1 x25;
+  fiat_p256_addcarryx_u32(&x24, &x25, x23, 0x0, x7);
+  uint32_t x26;
+  fiat_p256_uint1 x27;
+  fiat_p256_addcarryx_u32(&x26, &x27, x25, 0x0, x9);
+  uint32_t x28;
+  fiat_p256_uint1 x29;
+  fiat_p256_addcarryx_u32(&x28, &x29, x27, 0x0, x11);
+  uint32_t x30;
+  fiat_p256_uint1 x31;
+  fiat_p256_addcarryx_u32(&x30, &x31, x29, (fiat_p256_uint1)(x17 & 0x1), x13);
+  uint32_t x32;
+  fiat_p256_uint1 x33;
+  fiat_p256_addcarryx_u32(&x32, &x33, x31, (x17 & UINT32_C(0xffffffff)), x15);
+  out1[0] = x18;
+  out1[1] = x20;
+  out1[2] = x22;
+  out1[3] = x24;
+  out1[4] = x26;
+  out1[5] = x28;
+  out1[6] = x30;
+  out1[7] = x32;
+}
+
+/* fiat_p256_opp: out1 = (-arg1) mod p, computed as 0 - arg1 with p added back on borrow.
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_opp(uint32_t out1[8], const uint32_t arg1[8]) {
+  uint32_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_subborrowx_u32(&x1, &x2, 0x0, 0x0, (arg1[0]));
+  uint32_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_subborrowx_u32(&x3, &x4, x2, 0x0, (arg1[1]));
+  uint32_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_subborrowx_u32(&x5, &x6, x4, 0x0, (arg1[2]));
+  uint32_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_subborrowx_u32(&x7, &x8, x6, 0x0, (arg1[3]));
+  uint32_t x9;
+  fiat_p256_uint1 x10;
+  fiat_p256_subborrowx_u32(&x9, &x10, x8, 0x0, (arg1[4]));
+  uint32_t x11;
+  fiat_p256_uint1 x12;
+  fiat_p256_subborrowx_u32(&x11, &x12, x10, 0x0, (arg1[5]));
+  uint32_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_subborrowx_u32(&x13, &x14, x12, 0x0, (arg1[6]));
+  uint32_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_subborrowx_u32(&x15, &x16, x14, 0x0, (arg1[7]));
+  uint32_t x17;
+  fiat_p256_cmovznz_u32(&x17, x16, 0x0, UINT32_C(0xffffffff));
+  uint32_t x18;
+  fiat_p256_uint1 x19;
+  fiat_p256_addcarryx_u32(&x18, &x19, 0x0, (x17 & UINT32_C(0xffffffff)), x1);
+  uint32_t x20;
+  fiat_p256_uint1 x21;
+  fiat_p256_addcarryx_u32(&x20, &x21, x19, (x17 & UINT32_C(0xffffffff)), x3);
+  uint32_t x22;
+  fiat_p256_uint1 x23;
+  fiat_p256_addcarryx_u32(&x22, &x23, x21, (x17 & UINT32_C(0xffffffff)), x5);
+  uint32_t x24;
+  fiat_p256_uint1 x25;
+  fiat_p256_addcarryx_u32(&x24, &x25, x23, 0x0, x7);
+  uint32_t x26;
+  fiat_p256_uint1 x27;
+  fiat_p256_addcarryx_u32(&x26, &x27, x25, 0x0, x9);
+  uint32_t x28;
+  fiat_p256_uint1 x29;
+  fiat_p256_addcarryx_u32(&x28, &x29, x27, 0x0, x11);
+  uint32_t x30;
+  fiat_p256_uint1 x31;
+  fiat_p256_addcarryx_u32(&x30, &x31, x29, (fiat_p256_uint1)(x17 & 0x1), x13);
+  uint32_t x32;
+  fiat_p256_uint1 x33;
+  fiat_p256_addcarryx_u32(&x32, &x33, x31, (x17 & UINT32_C(0xffffffff)), x15);
+  out1[0] = x18;
+  out1[1] = x20;
+  out1[2] = x22;
+  out1[3] = x24;
+  out1[4] = x26;
+  out1[5] = x28;
+  out1[6] = x30;
+  out1[7] = x32;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_from_montgomery(uint32_t out1[8], const uint32_t arg1[8]) {
+  uint32_t x1 = (arg1[0]);
+  uint32_t x2;
+  uint32_t x3;
+  fiat_p256_mulx_u32(&x2, &x3, x1, UINT32_C(0xffffffff));
+  uint32_t x4;
+  uint32_t x5;
+  fiat_p256_mulx_u32(&x4, &x5, x1, UINT32_C(0xffffffff));
+  uint32_t x6;
+  uint32_t x7;
+  fiat_p256_mulx_u32(&x6, &x7, x1, UINT32_C(0xffffffff));
+  uint32_t x8;
+  uint32_t x9;
+  fiat_p256_mulx_u32(&x8, &x9, x1, UINT32_C(0xffffffff));
+  uint32_t x10;
+  fiat_p256_uint1 x11;
+  fiat_p256_addcarryx_u32(&x10, &x11, 0x0, x6, x9);
+  uint32_t x12;
+  fiat_p256_uint1 x13;
+  fiat_p256_addcarryx_u32(&x12, &x13, x11, x4, x7);
+  uint32_t x14;
+  fiat_p256_uint1 x15;
+  fiat_p256_addcarryx_u32(&x14, &x15, 0x0, x8, x1);
+  uint32_t x16;
+  fiat_p256_uint1 x17;
+  fiat_p256_addcarryx_u32(&x16, &x17, x15, x10, 0x0);
+  uint32_t x18;
+  fiat_p256_uint1 x19;
+  fiat_p256_addcarryx_u32(&x18, &x19, x17, x12, 0x0);
+  uint32_t x20;
+  fiat_p256_uint1 x21;
+  fiat_p256_addcarryx_u32(&x20, &x21, x13, 0x0, x5);
+  uint32_t x22;
+  fiat_p256_uint1 x23;
+  fiat_p256_addcarryx_u32(&x22, &x23, x19, x20, 0x0);
+  uint32_t x24;
+  fiat_p256_uint1 x25;
+  fiat_p256_addcarryx_u32(&x24, &x25, 0x0, (arg1[1]), x16);
+  uint32_t x26;
+  fiat_p256_uint1 x27;
+  fiat_p256_addcarryx_u32(&x26, &x27, x25, 0x0, x18);
+  uint32_t x28;
+  fiat_p256_uint1 x29;
+  fiat_p256_addcarryx_u32(&x28, &x29, x27, 0x0, x22);
+  uint32_t x30;
+  uint32_t x31;
+  fiat_p256_mulx_u32(&x30, &x31, x24, UINT32_C(0xffffffff));
+  uint32_t x32;
+  uint32_t x33;
+  fiat_p256_mulx_u32(&x32, &x33, x24, UINT32_C(0xffffffff));
+  uint32_t x34;
+  uint32_t x35;
+  fiat_p256_mulx_u32(&x34, &x35, x24, UINT32_C(0xffffffff));
+  uint32_t x36;
+  uint32_t x37;
+  fiat_p256_mulx_u32(&x36, &x37, x24, UINT32_C(0xffffffff));
+  uint32_t x38;
+  fiat_p256_uint1 x39;
+  fiat_p256_addcarryx_u32(&x38, &x39, 0x0, x34, x37);
+  uint32_t x40;
+  fiat_p256_uint1 x41;
+  fiat_p256_addcarryx_u32(&x40, &x41, x39, x32, x35);
+  uint32_t x42;
+  fiat_p256_uint1 x43;
+  fiat_p256_addcarryx_u32(&x42, &x43, 0x0, x36, x24);
+  uint32_t x44;
+  fiat_p256_uint1 x45;
+  fiat_p256_addcarryx_u32(&x44, &x45, x43, x38, x26);
+  uint32_t x46;
+  fiat_p256_uint1 x47;
+  fiat_p256_addcarryx_u32(&x46, &x47, x45, x40, x28);
+  uint32_t x48;
+  fiat_p256_uint1 x49;
+  fiat_p256_addcarryx_u32(&x48, &x49, x23, 0x0, 0x0);
+  uint32_t x50;
+  fiat_p256_uint1 x51;
+  fiat_p256_addcarryx_u32(&x50, &x51, x29, 0x0, (fiat_p256_uint1)x48);
+  uint32_t x52;
+  fiat_p256_uint1 x53;
+  fiat_p256_addcarryx_u32(&x52, &x53, x41, 0x0, x33);
+  uint32_t x54;
+  fiat_p256_uint1 x55;
+  fiat_p256_addcarryx_u32(&x54, &x55, x47, x52, x50);
+  uint32_t x56;
+  fiat_p256_uint1 x57;
+  fiat_p256_addcarryx_u32(&x56, &x57, 0x0, x24, x2);
+  uint32_t x58;
+  fiat_p256_uint1 x59;
+  fiat_p256_addcarryx_u32(&x58, &x59, x57, x30, x3);
+  uint32_t x60;
+  fiat_p256_uint1 x61;
+  fiat_p256_addcarryx_u32(&x60, &x61, 0x0, (arg1[2]), x44);
+  uint32_t x62;
+  fiat_p256_uint1 x63;
+  fiat_p256_addcarryx_u32(&x62, &x63, x61, 0x0, x46);
+  uint32_t x64;
+  fiat_p256_uint1 x65;
+  fiat_p256_addcarryx_u32(&x64, &x65, x63, 0x0, x54);
+  uint32_t x66;
+  uint32_t x67;
+  fiat_p256_mulx_u32(&x66, &x67, x60, UINT32_C(0xffffffff));
+  uint32_t x68;
+  uint32_t x69;
+  fiat_p256_mulx_u32(&x68, &x69, x60, UINT32_C(0xffffffff));
+  uint32_t x70;
+  uint32_t x71;
+  fiat_p256_mulx_u32(&x70, &x71, x60, UINT32_C(0xffffffff));
+  uint32_t x72;
+  uint32_t x73;
+  fiat_p256_mulx_u32(&x72, &x73, x60, UINT32_C(0xffffffff));
+  uint32_t x74;
+  fiat_p256_uint1 x75;
+  fiat_p256_addcarryx_u32(&x74, &x75, 0x0, x70, x73);
+  uint32_t x76;
+  fiat_p256_uint1 x77;
+  fiat_p256_addcarryx_u32(&x76, &x77, x75, x68, x71);
+  uint32_t x78;
+  fiat_p256_uint1 x79;
+  fiat_p256_addcarryx_u32(&x78, &x79, 0x0, x72, x60);
+  uint32_t x80;
+  fiat_p256_uint1 x81;
+  fiat_p256_addcarryx_u32(&x80, &x81, x79, x74, x62);
+  uint32_t x82;
+  fiat_p256_uint1 x83;
+  fiat_p256_addcarryx_u32(&x82, &x83, x81, x76, x64);
+  uint32_t x84;
+  fiat_p256_uint1 x85;
+  fiat_p256_addcarryx_u32(&x84, &x85, x55, 0x0, 0x0);
+  uint32_t x86;
+  fiat_p256_uint1 x87;
+  fiat_p256_addcarryx_u32(&x86, &x87, x65, 0x0, (fiat_p256_uint1)x84);
+  uint32_t x88;
+  fiat_p256_uint1 x89;
+  fiat_p256_addcarryx_u32(&x88, &x89, x77, 0x0, x69);
+  uint32_t x90;
+  fiat_p256_uint1 x91;
+  fiat_p256_addcarryx_u32(&x90, &x91, x83, x88, x86);
+  uint32_t x92;
+  fiat_p256_uint1 x93;
+  fiat_p256_addcarryx_u32(&x92, &x93, x91, 0x0, x1);
+  uint32_t x94;
+  fiat_p256_uint1 x95;
+  fiat_p256_addcarryx_u32(&x94, &x95, x93, 0x0, x56);
+  uint32_t x96;
+  fiat_p256_uint1 x97;
+  fiat_p256_addcarryx_u32(&x96, &x97, x95, x60, x58);
+  uint32_t x98;
+  fiat_p256_uint1 x99;
+  fiat_p256_addcarryx_u32(&x98, &x99, x59, x31, 0x0);
+  uint32_t x100;
+  fiat_p256_uint1 x101;
+  fiat_p256_addcarryx_u32(&x100, &x101, x97, x66, x98);
+  uint32_t x102;
+  fiat_p256_uint1 x103;
+  fiat_p256_addcarryx_u32(&x102, &x103, 0x0, (arg1[3]), x80);
+  uint32_t x104;
+  fiat_p256_uint1 x105;
+  fiat_p256_addcarryx_u32(&x104, &x105, x103, 0x0, x82);
+  uint32_t x106;
+  fiat_p256_uint1 x107;
+  fiat_p256_addcarryx_u32(&x106, &x107, x105, 0x0, x90);
+  uint32_t x108;
+  fiat_p256_uint1 x109;
+  fiat_p256_addcarryx_u32(&x108, &x109, x107, 0x0, x92);
+  uint32_t x110;
+  fiat_p256_uint1 x111;
+  fiat_p256_addcarryx_u32(&x110, &x111, x109, 0x0, x94);
+  uint32_t x112;
+  fiat_p256_uint1 x113;
+  fiat_p256_addcarryx_u32(&x112, &x113, x111, 0x0, x96);
+  uint32_t x114;
+  fiat_p256_uint1 x115;
+  fiat_p256_addcarryx_u32(&x114, &x115, x113, 0x0, x100);
+  uint32_t x116;
+  fiat_p256_uint1 x117;
+  fiat_p256_addcarryx_u32(&x116, &x117, x101, x67, 0x0);
+  uint32_t x118;
+  fiat_p256_uint1 x119;
+  fiat_p256_addcarryx_u32(&x118, &x119, x115, 0x0, x116);
+  uint32_t x120;
+  uint32_t x121;
+  fiat_p256_mulx_u32(&x120, &x121, x102, UINT32_C(0xffffffff));
+  uint32_t x122;
+  uint32_t x123;
+  fiat_p256_mulx_u32(&x122, &x123, x102, UINT32_C(0xffffffff));
+  uint32_t x124;
+  uint32_t x125;
+  fiat_p256_mulx_u32(&x124, &x125, x102, UINT32_C(0xffffffff));
+  uint32_t x126;
+  uint32_t x127;
+  fiat_p256_mulx_u32(&x126, &x127, x102, UINT32_C(0xffffffff));
+  uint32_t x128;
+  fiat_p256_uint1 x129;
+  fiat_p256_addcarryx_u32(&x128, &x129, 0x0, x124, x127);
+  uint32_t x130;
+  fiat_p256_uint1 x131;
+  fiat_p256_addcarryx_u32(&x130, &x131, x129, x122, x125);
+  uint32_t x132;
+  fiat_p256_uint1 x133;
+  fiat_p256_addcarryx_u32(&x132, &x133, 0x0, x126, x102);
+  uint32_t x134;
+  fiat_p256_uint1 x135;
+  fiat_p256_addcarryx_u32(&x134, &x135, x133, x128, x104);
+  uint32_t x136;
+  fiat_p256_uint1 x137;
+  fiat_p256_addcarryx_u32(&x136, &x137, x135, x130, x106);
+  uint32_t x138;
+  fiat_p256_uint1 x139;
+  fiat_p256_addcarryx_u32(&x138, &x139, x131, 0x0, x123);
+  uint32_t x140;
+  fiat_p256_uint1 x141;
+  fiat_p256_addcarryx_u32(&x140, &x141, x137, x138, x108);
+  uint32_t x142;
+  fiat_p256_uint1 x143;
+  fiat_p256_addcarryx_u32(&x142, &x143, x141, 0x0, x110);
+  uint32_t x144;
+  fiat_p256_uint1 x145;
+  fiat_p256_addcarryx_u32(&x144, &x145, x143, 0x0, x112);
+  uint32_t x146;
+  fiat_p256_uint1 x147;
+  fiat_p256_addcarryx_u32(&x146, &x147, x145, x102, x114);
+  uint32_t x148;
+  fiat_p256_uint1 x149;
+  fiat_p256_addcarryx_u32(&x148, &x149, x147, x120, x118);
+  uint32_t x150;
+  fiat_p256_uint1 x151;
+  fiat_p256_addcarryx_u32(&x150, &x151, x119, 0x0, 0x0);
+  uint32_t x152;
+  fiat_p256_uint1 x153;
+  fiat_p256_addcarryx_u32(&x152, &x153, x149, x121, (fiat_p256_uint1)x150);
+  uint32_t x154;
+  fiat_p256_uint1 x155;
+  fiat_p256_addcarryx_u32(&x154, &x155, 0x0, (arg1[4]), x134);
+  uint32_t x156;
+  fiat_p256_uint1 x157;
+  fiat_p256_addcarryx_u32(&x156, &x157, x155, 0x0, x136);
+  uint32_t x158;
+  fiat_p256_uint1 x159;
+  fiat_p256_addcarryx_u32(&x158, &x159, x157, 0x0, x140);
+  uint32_t x160;
+  fiat_p256_uint1 x161;
+  fiat_p256_addcarryx_u32(&x160, &x161, x159, 0x0, x142);
+  uint32_t x162;
+  fiat_p256_uint1 x163;
+  fiat_p256_addcarryx_u32(&x162, &x163, x161, 0x0, x144);
+  uint32_t x164;
+  fiat_p256_uint1 x165;
+  fiat_p256_addcarryx_u32(&x164, &x165, x163, 0x0, x146);
+  uint32_t x166;
+  fiat_p256_uint1 x167;
+  fiat_p256_addcarryx_u32(&x166, &x167, x165, 0x0, x148);
+  uint32_t x168;
+  fiat_p256_uint1 x169;
+  fiat_p256_addcarryx_u32(&x168, &x169, x167, 0x0, x152);
+  uint32_t x170;
+  uint32_t x171;
+  fiat_p256_mulx_u32(&x170, &x171, x154, UINT32_C(0xffffffff));
+  uint32_t x172;
+  uint32_t x173;
+  fiat_p256_mulx_u32(&x172, &x173, x154, UINT32_C(0xffffffff));
+  uint32_t x174;
+  uint32_t x175;
+  fiat_p256_mulx_u32(&x174, &x175, x154, UINT32_C(0xffffffff));
+  uint32_t x176;
+  uint32_t x177;
+  fiat_p256_mulx_u32(&x176, &x177, x154, UINT32_C(0xffffffff));
+  uint32_t x178;
+  fiat_p256_uint1 x179;
+  fiat_p256_addcarryx_u32(&x178, &x179, 0x0, x174, x177);
+  uint32_t x180;
+  fiat_p256_uint1 x181;
+  fiat_p256_addcarryx_u32(&x180, &x181, x179, x172, x175);
+  uint32_t x182;
+  fiat_p256_uint1 x183;
+  fiat_p256_addcarryx_u32(&x182, &x183, 0x0, x176, x154);
+  uint32_t x184;
+  fiat_p256_uint1 x185;
+  fiat_p256_addcarryx_u32(&x184, &x185, x183, x178, x156);
+  uint32_t x186;
+  fiat_p256_uint1 x187;
+  fiat_p256_addcarryx_u32(&x186, &x187, x185, x180, x158);
+  uint32_t x188;
+  fiat_p256_uint1 x189;
+  fiat_p256_addcarryx_u32(&x188, &x189, x181, 0x0, x173);
+  uint32_t x190;
+  fiat_p256_uint1 x191;
+  fiat_p256_addcarryx_u32(&x190, &x191, x187, x188, x160);
+  uint32_t x192;
+  fiat_p256_uint1 x193;
+  fiat_p256_addcarryx_u32(&x192, &x193, x191, 0x0, x162);
+  uint32_t x194;
+  fiat_p256_uint1 x195;
+  fiat_p256_addcarryx_u32(&x194, &x195, x193, 0x0, x164);
+  uint32_t x196;
+  fiat_p256_uint1 x197;
+  fiat_p256_addcarryx_u32(&x196, &x197, x195, x154, x166);
+  uint32_t x198;
+  fiat_p256_uint1 x199;
+  fiat_p256_addcarryx_u32(&x198, &x199, x197, x170, x168);
+  uint32_t x200;
+  fiat_p256_uint1 x201;
+  fiat_p256_addcarryx_u32(&x200, &x201, x153, 0x0, 0x0);
+  uint32_t x202;
+  fiat_p256_uint1 x203;
+  fiat_p256_addcarryx_u32(&x202, &x203, x169, 0x0, (fiat_p256_uint1)x200);
+  uint32_t x204;
+  fiat_p256_uint1 x205;
+  fiat_p256_addcarryx_u32(&x204, &x205, x199, x171, x202);
+  uint32_t x206;
+  fiat_p256_uint1 x207;
+  fiat_p256_addcarryx_u32(&x206, &x207, 0x0, (arg1[5]), x184);
+  uint32_t x208;
+  fiat_p256_uint1 x209;
+  fiat_p256_addcarryx_u32(&x208, &x209, x207, 0x0, x186);
+  uint32_t x210;
+  fiat_p256_uint1 x211;
+  fiat_p256_addcarryx_u32(&x210, &x211, x209, 0x0, x190);
+  uint32_t x212;
+  fiat_p256_uint1 x213;
+  fiat_p256_addcarryx_u32(&x212, &x213, x211, 0x0, x192);
+  uint32_t x214;
+  fiat_p256_uint1 x215;
+  fiat_p256_addcarryx_u32(&x214, &x215, x213, 0x0, x194);
+  uint32_t x216;
+  fiat_p256_uint1 x217;
+  fiat_p256_addcarryx_u32(&x216, &x217, x215, 0x0, x196);
+  uint32_t x218;
+  fiat_p256_uint1 x219;
+  fiat_p256_addcarryx_u32(&x218, &x219, x217, 0x0, x198);
+  uint32_t x220;
+  fiat_p256_uint1 x221;
+  fiat_p256_addcarryx_u32(&x220, &x221, x219, 0x0, x204);
+  uint32_t x222;
+  uint32_t x223;
+  fiat_p256_mulx_u32(&x222, &x223, x206, UINT32_C(0xffffffff));
+  uint32_t x224;
+  uint32_t x225;
+  fiat_p256_mulx_u32(&x224, &x225, x206, UINT32_C(0xffffffff));
+  uint32_t x226;
+  uint32_t x227;
+  fiat_p256_mulx_u32(&x226, &x227, x206, UINT32_C(0xffffffff));
+  uint32_t x228;
+  uint32_t x229;
+  fiat_p256_mulx_u32(&x228, &x229, x206, UINT32_C(0xffffffff));
+  uint32_t x230;
+  fiat_p256_uint1 x231;
+  fiat_p256_addcarryx_u32(&x230, &x231, 0x0, x226, x229);
+  uint32_t x232;
+  fiat_p256_uint1 x233;
+  fiat_p256_addcarryx_u32(&x232, &x233, x231, x224, x227);
+  uint32_t x234;
+  fiat_p256_uint1 x235;
+  fiat_p256_addcarryx_u32(&x234, &x235, 0x0, x228, x206);
+  uint32_t x236;
+  fiat_p256_uint1 x237;
+  fiat_p256_addcarryx_u32(&x236, &x237, x235, x230, x208);
+  uint32_t x238;
+  fiat_p256_uint1 x239;
+  fiat_p256_addcarryx_u32(&x238, &x239, x237, x232, x210);
+  uint32_t x240;
+  fiat_p256_uint1 x241;
+  fiat_p256_addcarryx_u32(&x240, &x241, x233, 0x0, x225);
+  uint32_t x242;
+  fiat_p256_uint1 x243;
+  fiat_p256_addcarryx_u32(&x242, &x243, x239, x240, x212);
+  uint32_t x244;
+  fiat_p256_uint1 x245;
+  fiat_p256_addcarryx_u32(&x244, &x245, x243, 0x0, x214);
+  uint32_t x246;
+  fiat_p256_uint1 x247;
+  fiat_p256_addcarryx_u32(&x246, &x247, x245, 0x0, x216);
+  uint32_t x248;
+  fiat_p256_uint1 x249;
+  fiat_p256_addcarryx_u32(&x248, &x249, x247, x206, x218);
+  uint32_t x250;
+  fiat_p256_uint1 x251;
+  fiat_p256_addcarryx_u32(&x250, &x251, x249, x222, x220);
+  uint32_t x252;
+  fiat_p256_uint1 x253;
+  fiat_p256_addcarryx_u32(&x252, &x253, x205, 0x0, 0x0);
+  uint32_t x254;
+  fiat_p256_uint1 x255;
+  fiat_p256_addcarryx_u32(&x254, &x255, x221, 0x0, (fiat_p256_uint1)x252);
+  uint32_t x256;
+  fiat_p256_uint1 x257;
+  fiat_p256_addcarryx_u32(&x256, &x257, x251, x223, x254);
+  uint32_t x258;
+  fiat_p256_uint1 x259;
+  fiat_p256_addcarryx_u32(&x258, &x259, 0x0, (arg1[6]), x236);
+  uint32_t x260;
+  fiat_p256_uint1 x261;
+  fiat_p256_addcarryx_u32(&x260, &x261, x259, 0x0, x238);
+  uint32_t x262;
+  fiat_p256_uint1 x263;
+  fiat_p256_addcarryx_u32(&x262, &x263, x261, 0x0, x242);
+  uint32_t x264;
+  fiat_p256_uint1 x265;
+  fiat_p256_addcarryx_u32(&x264, &x265, x263, 0x0, x244);
+  uint32_t x266;
+  fiat_p256_uint1 x267;
+  fiat_p256_addcarryx_u32(&x266, &x267, x265, 0x0, x246);
+  uint32_t x268;
+  fiat_p256_uint1 x269;
+  fiat_p256_addcarryx_u32(&x268, &x269, x267, 0x0, x248);
+  uint32_t x270;
+  fiat_p256_uint1 x271;
+  fiat_p256_addcarryx_u32(&x270, &x271, x269, 0x0, x250);
+  uint32_t x272;
+  fiat_p256_uint1 x273;
+  fiat_p256_addcarryx_u32(&x272, &x273, x271, 0x0, x256);
+  uint32_t x274;
+  uint32_t x275;
+  fiat_p256_mulx_u32(&x274, &x275, x258, UINT32_C(0xffffffff));
+  uint32_t x276;
+  uint32_t x277;
+  fiat_p256_mulx_u32(&x276, &x277, x258, UINT32_C(0xffffffff));
+  uint32_t x278;
+  uint32_t x279;
+  fiat_p256_mulx_u32(&x278, &x279, x258, UINT32_C(0xffffffff));
+  uint32_t x280;
+  uint32_t x281;
+  fiat_p256_mulx_u32(&x280, &x281, x258, UINT32_C(0xffffffff));
+  uint32_t x282;
+  fiat_p256_uint1 x283;
+  fiat_p256_addcarryx_u32(&x282, &x283, 0x0, x278, x281);
+  uint32_t x284;
+  fiat_p256_uint1 x285;
+  fiat_p256_addcarryx_u32(&x284, &x285, x283, x276, x279);
+  uint32_t x286;
+  fiat_p256_uint1 x287;
+  fiat_p256_addcarryx_u32(&x286, &x287, 0x0, x280, x258);
+  uint32_t x288;
+  fiat_p256_uint1 x289;
+  fiat_p256_addcarryx_u32(&x288, &x289, x287, x282, x260);
+  uint32_t x290;
+  fiat_p256_uint1 x291;
+  fiat_p256_addcarryx_u32(&x290, &x291, x289, x284, x262);
+  uint32_t x292;
+  fiat_p256_uint1 x293;
+  fiat_p256_addcarryx_u32(&x292, &x293, x285, 0x0, x277);
+  uint32_t x294;
+  fiat_p256_uint1 x295;
+  fiat_p256_addcarryx_u32(&x294, &x295, x291, x292, x264);
+  uint32_t x296;
+  fiat_p256_uint1 x297;
+  fiat_p256_addcarryx_u32(&x296, &x297, x295, 0x0, x266);
+  uint32_t x298;
+  fiat_p256_uint1 x299;
+  fiat_p256_addcarryx_u32(&x298, &x299, x297, 0x0, x268);
+  uint32_t x300;
+  fiat_p256_uint1 x301;
+  fiat_p256_addcarryx_u32(&x300, &x301, x299, x258, x270);
+  uint32_t x302;
+  fiat_p256_uint1 x303;
+  fiat_p256_addcarryx_u32(&x302, &x303, x301, x274, x272);
+  uint32_t x304;
+  fiat_p256_uint1 x305;
+  fiat_p256_addcarryx_u32(&x304, &x305, x257, 0x0, 0x0);
+  uint32_t x306;
+  fiat_p256_uint1 x307;
+  fiat_p256_addcarryx_u32(&x306, &x307, x273, 0x0, (fiat_p256_uint1)x304);
+  uint32_t x308;
+  fiat_p256_uint1 x309;
+  fiat_p256_addcarryx_u32(&x308, &x309, x303, x275, x306);
+  uint32_t x310;
+  fiat_p256_uint1 x311;
+  fiat_p256_addcarryx_u32(&x310, &x311, 0x0, (arg1[7]), x288);
+  uint32_t x312;
+  fiat_p256_uint1 x313;
+  fiat_p256_addcarryx_u32(&x312, &x313, x311, 0x0, x290);
+  uint32_t x314;
+  fiat_p256_uint1 x315;
+  fiat_p256_addcarryx_u32(&x314, &x315, x313, 0x0, x294);
+  uint32_t x316;
+  fiat_p256_uint1 x317;
+  fiat_p256_addcarryx_u32(&x316, &x317, x315, 0x0, x296);
+  uint32_t x318;
+  fiat_p256_uint1 x319;
+  fiat_p256_addcarryx_u32(&x318, &x319, x317, 0x0, x298);
+  uint32_t x320;
+  fiat_p256_uint1 x321;
+  fiat_p256_addcarryx_u32(&x320, &x321, x319, 0x0, x300);
+  uint32_t x322;
+  fiat_p256_uint1 x323;
+  fiat_p256_addcarryx_u32(&x322, &x323, x321, 0x0, x302);
+  uint32_t x324;
+  fiat_p256_uint1 x325;
+  fiat_p256_addcarryx_u32(&x324, &x325, x323, 0x0, x308);
+  uint32_t x326;
+  uint32_t x327;
+  fiat_p256_mulx_u32(&x326, &x327, x310, UINT32_C(0xffffffff));
+  uint32_t x328;
+  uint32_t x329;
+  fiat_p256_mulx_u32(&x328, &x329, x310, UINT32_C(0xffffffff));
+  uint32_t x330;
+  uint32_t x331;
+  fiat_p256_mulx_u32(&x330, &x331, x310, UINT32_C(0xffffffff));
+  uint32_t x332;
+  uint32_t x333;
+  fiat_p256_mulx_u32(&x332, &x333, x310, UINT32_C(0xffffffff));
+  uint32_t x334;
+  fiat_p256_uint1 x335;
+  fiat_p256_addcarryx_u32(&x334, &x335, 0x0, x330, x333);
+  uint32_t x336;
+  fiat_p256_uint1 x337;
+  fiat_p256_addcarryx_u32(&x336, &x337, x335, x328, x331);
+  uint32_t x338;
+  fiat_p256_uint1 x339;
+  fiat_p256_addcarryx_u32(&x338, &x339, 0x0, x332, x310);
+  uint32_t x340;
+  fiat_p256_uint1 x341;
+  fiat_p256_addcarryx_u32(&x340, &x341, x339, x334, x312);
+  uint32_t x342;
+  fiat_p256_uint1 x343;
+  fiat_p256_addcarryx_u32(&x342, &x343, x341, x336, x314);
+  uint32_t x344;
+  fiat_p256_uint1 x345;
+  fiat_p256_addcarryx_u32(&x344, &x345, x337, 0x0, x329);
+  uint32_t x346;
+  fiat_p256_uint1 x347;
+  fiat_p256_addcarryx_u32(&x346, &x347, x343, x344, x316);
+  uint32_t x348;
+  fiat_p256_uint1 x349;
+  fiat_p256_addcarryx_u32(&x348, &x349, x347, 0x0, x318);
+  uint32_t x350;
+  fiat_p256_uint1 x351;
+  fiat_p256_addcarryx_u32(&x350, &x351, x349, 0x0, x320);
+  uint32_t x352;
+  fiat_p256_uint1 x353;
+  fiat_p256_addcarryx_u32(&x352, &x353, x351, x310, x322);
+  uint32_t x354;
+  fiat_p256_uint1 x355;
+  fiat_p256_addcarryx_u32(&x354, &x355, x353, x326, x324);
+  uint32_t x356;
+  fiat_p256_uint1 x357;
+  fiat_p256_addcarryx_u32(&x356, &x357, x309, 0x0, 0x0);
+  uint32_t x358;
+  fiat_p256_uint1 x359;
+  fiat_p256_addcarryx_u32(&x358, &x359, x325, 0x0, (fiat_p256_uint1)x356);
+  uint32_t x360;
+  fiat_p256_uint1 x361;
+  fiat_p256_addcarryx_u32(&x360, &x361, x355, x327, x358);
+  uint32_t x362;
+  fiat_p256_uint1 x363;
+  fiat_p256_subborrowx_u32(&x362, &x363, 0x0, x340, UINT32_C(0xffffffff));
+  uint32_t x364;
+  fiat_p256_uint1 x365;
+  fiat_p256_subborrowx_u32(&x364, &x365, x363, x342, UINT32_C(0xffffffff));
+  uint32_t x366;
+  fiat_p256_uint1 x367;
+  fiat_p256_subborrowx_u32(&x366, &x367, x365, x346, UINT32_C(0xffffffff));
+  uint32_t x368;
+  fiat_p256_uint1 x369;
+  fiat_p256_subborrowx_u32(&x368, &x369, x367, x348, 0x0);
+  uint32_t x370;
+  fiat_p256_uint1 x371;
+  fiat_p256_subborrowx_u32(&x370, &x371, x369, x350, 0x0);
+  uint32_t x372;
+  fiat_p256_uint1 x373;
+  fiat_p256_subborrowx_u32(&x372, &x373, x371, x352, 0x0);
+  uint32_t x374;
+  fiat_p256_uint1 x375;
+  fiat_p256_subborrowx_u32(&x374, &x375, x373, x354, 0x1);
+  uint32_t x376;
+  fiat_p256_uint1 x377;
+  fiat_p256_subborrowx_u32(&x376, &x377, x375, x360, UINT32_C(0xffffffff));
+  uint32_t x378;
+  fiat_p256_uint1 x379;
+  fiat_p256_addcarryx_u32(&x378, &x379, x361, 0x0, 0x0);
+  uint32_t x380;
+  fiat_p256_uint1 x381;
+  fiat_p256_subborrowx_u32(&x380, &x381, x377, (fiat_p256_uint1)x378, 0x0);
+  uint32_t x382;
+  fiat_p256_cmovznz_u32(&x382, x381, x362, x340);
+  uint32_t x383;
+  fiat_p256_cmovznz_u32(&x383, x381, x364, x342);
+  uint32_t x384;
+  fiat_p256_cmovznz_u32(&x384, x381, x366, x346);
+  uint32_t x385;
+  fiat_p256_cmovznz_u32(&x385, x381, x368, x348);
+  uint32_t x386;
+  fiat_p256_cmovznz_u32(&x386, x381, x370, x350);
+  uint32_t x387;
+  fiat_p256_cmovznz_u32(&x387, x381, x372, x352);
+  uint32_t x388;
+  fiat_p256_cmovznz_u32(&x388, x381, x374, x354);
+  uint32_t x389;
+  fiat_p256_cmovznz_u32(&x389, x381, x376, x360);
+  out1[0] = x382;
+  out1[1] = x383;
+  out1[2] = x384;
+  out1[3] = x385;
+  out1[4] = x386;
+  out1[5] = x387;
+  out1[6] = x388;
+  out1[7] = x389;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffff]
+ */
+static void fiat_p256_nonzero(uint32_t* out1, const uint32_t arg1[8]) {
+  uint32_t x1 = ((arg1[0]) | ((arg1[1]) | ((arg1[2]) | ((arg1[3]) | ((arg1[4]) | ((arg1[5]) | ((arg1[6]) | ((arg1[7]) | (uint32_t)0x0))))))));
+  *out1 = x1;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ *   arg3: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_selectznz(uint32_t out1[8], fiat_p256_uint1 arg1, const uint32_t arg2[8], const uint32_t arg3[8]) {
+  uint32_t x1;
+  fiat_p256_cmovznz_u32(&x1, arg1, (arg2[0]), (arg3[0]));
+  uint32_t x2;
+  fiat_p256_cmovznz_u32(&x2, arg1, (arg2[1]), (arg3[1]));
+  uint32_t x3;
+  fiat_p256_cmovznz_u32(&x3, arg1, (arg2[2]), (arg3[2]));
+  uint32_t x4;
+  fiat_p256_cmovznz_u32(&x4, arg1, (arg2[3]), (arg3[3]));
+  uint32_t x5;
+  fiat_p256_cmovznz_u32(&x5, arg1, (arg2[4]), (arg3[4]));
+  uint32_t x6;
+  fiat_p256_cmovznz_u32(&x6, arg1, (arg2[5]), (arg3[5]));
+  uint32_t x7;
+  fiat_p256_cmovznz_u32(&x7, arg1, (arg2[6]), (arg3[6]));
+  uint32_t x8;
+  fiat_p256_cmovznz_u32(&x8, arg1, (arg2[7]), (arg3[7]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+  out1[4] = x5;
+  out1[5] = x6;
+  out1[6] = x7;
+  out1[7] = x8;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff]]
+ */
+static void fiat_p256_to_bytes(uint8_t out1[32], const uint32_t arg1[8]) {
+  uint32_t x1 = (arg1[7]);
+  uint32_t x2 = (arg1[6]);
+  uint32_t x3 = (arg1[5]);
+  uint32_t x4 = (arg1[4]);
+  uint32_t x5 = (arg1[3]);
+  uint32_t x6 = (arg1[2]);
+  uint32_t x7 = (arg1[1]);
+  uint32_t x8 = (arg1[0]);
+  uint32_t x9 = (x8 >> 8);
+  uint8_t x10 = (uint8_t)(x8 & UINT8_C(0xff));
+  uint32_t x11 = (x9 >> 8);
+  uint8_t x12 = (uint8_t)(x9 & UINT8_C(0xff));
+  uint8_t x13 = (uint8_t)(x11 >> 8);
+  uint8_t x14 = (uint8_t)(x11 & UINT8_C(0xff));
+  uint8_t x15 = (uint8_t)(x13 & UINT8_C(0xff));
+  uint32_t x16 = (x7 >> 8);
+  uint8_t x17 = (uint8_t)(x7 & UINT8_C(0xff));
+  uint32_t x18 = (x16 >> 8);
+  uint8_t x19 = (uint8_t)(x16 & UINT8_C(0xff));
+  uint8_t x20 = (uint8_t)(x18 >> 8);
+  uint8_t x21 = (uint8_t)(x18 & UINT8_C(0xff));
+  uint8_t x22 = (uint8_t)(x20 & UINT8_C(0xff));
+  uint32_t x23 = (x6 >> 8);
+  uint8_t x24 = (uint8_t)(x6 & UINT8_C(0xff));
+  uint32_t x25 = (x23 >> 8);
+  uint8_t x26 = (uint8_t)(x23 & UINT8_C(0xff));
+  uint8_t x27 = (uint8_t)(x25 >> 8);
+  uint8_t x28 = (uint8_t)(x25 & UINT8_C(0xff));
+  uint8_t x29 = (uint8_t)(x27 & UINT8_C(0xff));
+  uint32_t x30 = (x5 >> 8);
+  uint8_t x31 = (uint8_t)(x5 & UINT8_C(0xff));
+  uint32_t x32 = (x30 >> 8);
+  uint8_t x33 = (uint8_t)(x30 & UINT8_C(0xff));
+  uint8_t x34 = (uint8_t)(x32 >> 8);
+  uint8_t x35 = (uint8_t)(x32 & UINT8_C(0xff));
+  uint8_t x36 = (uint8_t)(x34 & UINT8_C(0xff));
+  uint32_t x37 = (x4 >> 8);
+  uint8_t x38 = (uint8_t)(x4 & UINT8_C(0xff));
+  uint32_t x39 = (x37 >> 8);
+  uint8_t x40 = (uint8_t)(x37 & UINT8_C(0xff));
+  uint8_t x41 = (uint8_t)(x39 >> 8);
+  uint8_t x42 = (uint8_t)(x39 & UINT8_C(0xff));
+  uint8_t x43 = (uint8_t)(x41 & UINT8_C(0xff));
+  uint32_t x44 = (x3 >> 8);
+  uint8_t x45 = (uint8_t)(x3 & UINT8_C(0xff));
+  uint32_t x46 = (x44 >> 8);
+  uint8_t x47 = (uint8_t)(x44 & UINT8_C(0xff));
+  uint8_t x48 = (uint8_t)(x46 >> 8);
+  uint8_t x49 = (uint8_t)(x46 & UINT8_C(0xff));
+  uint8_t x50 = (uint8_t)(x48 & UINT8_C(0xff));
+  uint32_t x51 = (x2 >> 8);
+  uint8_t x52 = (uint8_t)(x2 & UINT8_C(0xff));
+  uint32_t x53 = (x51 >> 8);
+  uint8_t x54 = (uint8_t)(x51 & UINT8_C(0xff));
+  uint8_t x55 = (uint8_t)(x53 >> 8);
+  uint8_t x56 = (uint8_t)(x53 & UINT8_C(0xff));
+  uint8_t x57 = (uint8_t)(x55 & UINT8_C(0xff));
+  uint32_t x58 = (x1 >> 8);
+  uint8_t x59 = (uint8_t)(x1 & UINT8_C(0xff));
+  uint32_t x60 = (x58 >> 8);
+  uint8_t x61 = (uint8_t)(x58 & UINT8_C(0xff));
+  uint8_t x62 = (uint8_t)(x60 >> 8);
+  uint8_t x63 = (uint8_t)(x60 & UINT8_C(0xff));
+  out1[0] = x10;
+  out1[1] = x12;
+  out1[2] = x14;
+  out1[3] = x15;
+  out1[4] = x17;
+  out1[5] = x19;
+  out1[6] = x21;
+  out1[7] = x22;
+  out1[8] = x24;
+  out1[9] = x26;
+  out1[10] = x28;
+  out1[11] = x29;
+  out1[12] = x31;
+  out1[13] = x33;
+  out1[14] = x35;
+  out1[15] = x36;
+  out1[16] = x38;
+  out1[17] = x40;
+  out1[18] = x42;
+  out1[19] = x43;
+  out1[20] = x45;
+  out1[21] = x47;
+  out1[22] = x49;
+  out1[23] = x50;
+  out1[24] = x52;
+  out1[25] = x54;
+  out1[26] = x56;
+  out1[27] = x57;
+  out1[28] = x59;
+  out1[29] = x61;
+  out1[30] = x63;
+  out1[31] = x62;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff], [0x0 ~> 0xffffffff]]
+ */
+static void fiat_p256_from_bytes(uint32_t out1[8], const uint8_t arg1[32]) {
+  uint32_t x1 = ((uint32_t)(arg1[31]) << 24);
+  uint32_t x2 = ((uint32_t)(arg1[30]) << 16);
+  uint32_t x3 = ((uint32_t)(arg1[29]) << 8);
+  uint8_t x4 = (arg1[28]);
+  uint32_t x5 = ((uint32_t)(arg1[27]) << 24);
+  uint32_t x6 = ((uint32_t)(arg1[26]) << 16);
+  uint32_t x7 = ((uint32_t)(arg1[25]) << 8);
+  uint8_t x8 = (arg1[24]);
+  uint32_t x9 = ((uint32_t)(arg1[23]) << 24);
+  uint32_t x10 = ((uint32_t)(arg1[22]) << 16);
+  uint32_t x11 = ((uint32_t)(arg1[21]) << 8);
+  uint8_t x12 = (arg1[20]);
+  uint32_t x13 = ((uint32_t)(arg1[19]) << 24);
+  uint32_t x14 = ((uint32_t)(arg1[18]) << 16);
+  uint32_t x15 = ((uint32_t)(arg1[17]) << 8);
+  uint8_t x16 = (arg1[16]);
+  uint32_t x17 = ((uint32_t)(arg1[15]) << 24);
+  uint32_t x18 = ((uint32_t)(arg1[14]) << 16);
+  uint32_t x19 = ((uint32_t)(arg1[13]) << 8);
+  uint8_t x20 = (arg1[12]);
+  uint32_t x21 = ((uint32_t)(arg1[11]) << 24);
+  uint32_t x22 = ((uint32_t)(arg1[10]) << 16);
+  uint32_t x23 = ((uint32_t)(arg1[9]) << 8);
+  uint8_t x24 = (arg1[8]);
+  uint32_t x25 = ((uint32_t)(arg1[7]) << 24);
+  uint32_t x26 = ((uint32_t)(arg1[6]) << 16);
+  uint32_t x27 = ((uint32_t)(arg1[5]) << 8);
+  uint8_t x28 = (arg1[4]);
+  uint32_t x29 = ((uint32_t)(arg1[3]) << 24);
+  uint32_t x30 = ((uint32_t)(arg1[2]) << 16);
+  uint32_t x31 = ((uint32_t)(arg1[1]) << 8);
+  uint8_t x32 = (arg1[0]);
+  uint32_t x33 = (x32 + (x31 + (x30 + x29)));
+  uint32_t x34 = (x33 & UINT32_C(0xffffffff));
+  uint32_t x35 = (x4 + (x3 + (x2 + x1)));
+  uint32_t x36 = (x8 + (x7 + (x6 + x5)));
+  uint32_t x37 = (x12 + (x11 + (x10 + x9)));
+  uint32_t x38 = (x16 + (x15 + (x14 + x13)));
+  uint32_t x39 = (x20 + (x19 + (x18 + x17)));
+  uint32_t x40 = (x24 + (x23 + (x22 + x21)));
+  uint32_t x41 = (x28 + (x27 + (x26 + x25)));
+  uint32_t x42 = (x41 & UINT32_C(0xffffffff));
+  uint32_t x43 = (x40 & UINT32_C(0xffffffff));
+  uint32_t x44 = (x39 & UINT32_C(0xffffffff));
+  uint32_t x45 = (x38 & UINT32_C(0xffffffff));
+  uint32_t x46 = (x37 & UINT32_C(0xffffffff));
+  uint32_t x47 = (x36 & UINT32_C(0xffffffff));
+  out1[0] = x34;
+  out1[1] = x42;
+  out1[2] = x43;
+  out1[3] = x44;
+  out1[4] = x45;
+  out1[5] = x46;
+  out1[6] = x47;
+  out1[7] = x35;
+}
+
diff --git a/third_party/fiat/p256_64.c b/third_party/fiat/p256_64.c
new file mode 100644
index 0000000..8e449c6
--- /dev/null
+++ b/third_party/fiat/p256_64.c
@@ -0,0 +1,1211 @@
+/* Autogenerated */
+/* curve description: p256 */
+/* requested operations: (all) */
+/* m = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff (from "2^256 - 2^224 + 2^192 + 2^96 - 1") */
+/* machine_wordsize = 64 (from "64") */
+/*                                                                    */
+/* NOTE: In addition to the bounds specified above each function, all */
+/*   functions synthesized for this Montgomery arithmetic require the */
+/*   input to be strictly less than the prime modulus (m), and also   */
+/*   require the input to be in the unique saturated representation.  */
+/*   All functions also ensure that these two properties are true of  */
+/*   return values.                                                   */
+
+#include <stdint.h>
+typedef unsigned char fiat_p256_uint1;
+typedef signed char fiat_p256_int1;
+typedef signed __int128 fiat_p256_int128;
+typedef unsigned __int128 fiat_p256_uint128;
+
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffffffffffff]
+ *   arg3: [0x0 ~> 0xffffffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_p256_addcarryx_u64(uint64_t* out1, fiat_p256_uint1* out2, fiat_p256_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  fiat_p256_uint128 x1 = ((arg1 + (fiat_p256_uint128)arg2) + arg3);
+  uint64_t x2 = (uint64_t)(x1 & UINT64_C(0xffffffffffffffff));
+  fiat_p256_uint1 x3 = (fiat_p256_uint1)(x1 >> 64);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffffffffffff]
+ *   arg3: [0x0 ~> 0xffffffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ *   out2: [0x0 ~> 0x1]
+ */
+static void fiat_p256_subborrowx_u64(uint64_t* out1, fiat_p256_uint1* out2, fiat_p256_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  fiat_p256_int128 x1 = ((arg2 - (fiat_p256_int128)arg1) - arg3);
+  fiat_p256_int1 x2 = (fiat_p256_int1)(x1 >> 64);
+  uint64_t x3 = (uint64_t)(x1 & UINT64_C(0xffffffffffffffff));
+  *out1 = x3;
+  *out2 = (fiat_p256_uint1)(0x0 - x2);
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0xffffffffffffffff]
+ *   arg2: [0x0 ~> 0xffffffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ *   out2: [0x0 ~> 0xffffffffffffffff]
+ */
+static void fiat_p256_mulx_u64(uint64_t* out1, uint64_t* out2, uint64_t arg1, uint64_t arg2) {
+  fiat_p256_uint128 x1 = ((fiat_p256_uint128)arg1 * arg2);
+  uint64_t x2 = (uint64_t)(x1 & UINT64_C(0xffffffffffffffff));
+  uint64_t x3 = (uint64_t)(x1 >> 64);
+  *out1 = x2;
+  *out2 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [0x0 ~> 0xffffffffffffffff]
+ *   arg3: [0x0 ~> 0xffffffffffffffff]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ */
+static void fiat_p256_cmovznz_u64(uint64_t* out1, fiat_p256_uint1 arg1, uint64_t arg2, uint64_t arg3) {
+  fiat_p256_uint1 x1 = (!(!arg1));
+  uint64_t x2 = ((fiat_p256_int1)(0x0 - x1) & UINT64_C(0xffffffffffffffff));
+  uint64_t x3 = ((x2 & arg3) | ((~x2) & arg2));
+  *out1 = x3;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_mul(uint64_t out1[4], const uint64_t arg1[4], const uint64_t arg2[4]) {
+  uint64_t x1 = (arg1[1]);
+  uint64_t x2 = (arg1[2]);
+  uint64_t x3 = (arg1[3]);
+  uint64_t x4 = (arg1[0]);
+  uint64_t x5;
+  uint64_t x6;
+  fiat_p256_mulx_u64(&x5, &x6, x4, (arg2[3]));
+  uint64_t x7;
+  uint64_t x8;
+  fiat_p256_mulx_u64(&x7, &x8, x4, (arg2[2]));
+  uint64_t x9;
+  uint64_t x10;
+  fiat_p256_mulx_u64(&x9, &x10, x4, (arg2[1]));
+  uint64_t x11;
+  uint64_t x12;
+  fiat_p256_mulx_u64(&x11, &x12, x4, (arg2[0]));
+  uint64_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_addcarryx_u64(&x13, &x14, 0x0, x9, x12);
+  uint64_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_addcarryx_u64(&x15, &x16, x14, x7, x10);
+  uint64_t x17;
+  fiat_p256_uint1 x18;
+  fiat_p256_addcarryx_u64(&x17, &x18, x16, x5, x8);
+  uint64_t x19;
+  fiat_p256_uint1 x20;
+  fiat_p256_addcarryx_u64(&x19, &x20, x18, 0x0, x6);
+  uint64_t x21;
+  uint64_t x22;
+  fiat_p256_mulx_u64(&x21, &x22, x11, UINT64_C(0xffffffff00000001));
+  uint64_t x23;
+  uint64_t x24;
+  fiat_p256_mulx_u64(&x23, &x24, x11, UINT32_C(0xffffffff));
+  uint64_t x25;
+  uint64_t x26;
+  fiat_p256_mulx_u64(&x25, &x26, x11, UINT64_C(0xffffffffffffffff));
+  uint64_t x27;
+  fiat_p256_uint1 x28;
+  fiat_p256_addcarryx_u64(&x27, &x28, 0x0, x23, x26);
+  uint64_t x29;
+  fiat_p256_uint1 x30;
+  fiat_p256_addcarryx_u64(&x29, &x30, x28, 0x0, x24);
+  uint64_t x31;
+  fiat_p256_uint1 x32;
+  fiat_p256_addcarryx_u64(&x31, &x32, 0x0, x25, x11);
+  uint64_t x33;
+  fiat_p256_uint1 x34;
+  fiat_p256_addcarryx_u64(&x33, &x34, x32, x27, x13);
+  uint64_t x35;
+  fiat_p256_uint1 x36;
+  fiat_p256_addcarryx_u64(&x35, &x36, x34, x29, x15);
+  uint64_t x37;
+  fiat_p256_uint1 x38;
+  fiat_p256_addcarryx_u64(&x37, &x38, x36, x21, x17);
+  uint64_t x39;
+  fiat_p256_uint1 x40;
+  fiat_p256_addcarryx_u64(&x39, &x40, x38, x22, x19);
+  uint64_t x41;
+  fiat_p256_uint1 x42;
+  fiat_p256_addcarryx_u64(&x41, &x42, x40, 0x0, 0x0);
+  uint64_t x43;
+  uint64_t x44;
+  fiat_p256_mulx_u64(&x43, &x44, x1, (arg2[3]));
+  uint64_t x45;
+  uint64_t x46;
+  fiat_p256_mulx_u64(&x45, &x46, x1, (arg2[2]));
+  uint64_t x47;
+  uint64_t x48;
+  fiat_p256_mulx_u64(&x47, &x48, x1, (arg2[1]));
+  uint64_t x49;
+  uint64_t x50;
+  fiat_p256_mulx_u64(&x49, &x50, x1, (arg2[0]));
+  uint64_t x51;
+  fiat_p256_uint1 x52;
+  fiat_p256_addcarryx_u64(&x51, &x52, 0x0, x47, x50);
+  uint64_t x53;
+  fiat_p256_uint1 x54;
+  fiat_p256_addcarryx_u64(&x53, &x54, x52, x45, x48);
+  uint64_t x55;
+  fiat_p256_uint1 x56;
+  fiat_p256_addcarryx_u64(&x55, &x56, x54, x43, x46);
+  uint64_t x57;
+  fiat_p256_uint1 x58;
+  fiat_p256_addcarryx_u64(&x57, &x58, x56, 0x0, x44);
+  uint64_t x59;
+  fiat_p256_uint1 x60;
+  fiat_p256_addcarryx_u64(&x59, &x60, 0x0, x49, x33);
+  uint64_t x61;
+  fiat_p256_uint1 x62;
+  fiat_p256_addcarryx_u64(&x61, &x62, x60, x51, x35);
+  uint64_t x63;
+  fiat_p256_uint1 x64;
+  fiat_p256_addcarryx_u64(&x63, &x64, x62, x53, x37);
+  uint64_t x65;
+  fiat_p256_uint1 x66;
+  fiat_p256_addcarryx_u64(&x65, &x66, x64, x55, x39);
+  uint64_t x67;
+  fiat_p256_uint1 x68;
+  fiat_p256_addcarryx_u64(&x67, &x68, x66, x57, (fiat_p256_uint1)x41);
+  uint64_t x69;
+  uint64_t x70;
+  fiat_p256_mulx_u64(&x69, &x70, x59, UINT64_C(0xffffffff00000001));
+  uint64_t x71;
+  uint64_t x72;
+  fiat_p256_mulx_u64(&x71, &x72, x59, UINT32_C(0xffffffff));
+  uint64_t x73;
+  uint64_t x74;
+  fiat_p256_mulx_u64(&x73, &x74, x59, UINT64_C(0xffffffffffffffff));
+  uint64_t x75;
+  fiat_p256_uint1 x76;
+  fiat_p256_addcarryx_u64(&x75, &x76, 0x0, x71, x74);
+  uint64_t x77;
+  fiat_p256_uint1 x78;
+  fiat_p256_addcarryx_u64(&x77, &x78, x76, 0x0, x72);
+  uint64_t x79;
+  fiat_p256_uint1 x80;
+  fiat_p256_addcarryx_u64(&x79, &x80, 0x0, x73, x59);
+  uint64_t x81;
+  fiat_p256_uint1 x82;
+  fiat_p256_addcarryx_u64(&x81, &x82, x80, x75, x61);
+  uint64_t x83;
+  fiat_p256_uint1 x84;
+  fiat_p256_addcarryx_u64(&x83, &x84, x82, x77, x63);
+  uint64_t x85;
+  fiat_p256_uint1 x86;
+  fiat_p256_addcarryx_u64(&x85, &x86, x84, x69, x65);
+  uint64_t x87;
+  fiat_p256_uint1 x88;
+  fiat_p256_addcarryx_u64(&x87, &x88, x86, x70, x67);
+  uint64_t x89;
+  fiat_p256_uint1 x90;
+  fiat_p256_addcarryx_u64(&x89, &x90, x88, 0x0, x68);
+  uint64_t x91;
+  uint64_t x92;
+  fiat_p256_mulx_u64(&x91, &x92, x2, (arg2[3]));
+  uint64_t x93;
+  uint64_t x94;
+  fiat_p256_mulx_u64(&x93, &x94, x2, (arg2[2]));
+  uint64_t x95;
+  uint64_t x96;
+  fiat_p256_mulx_u64(&x95, &x96, x2, (arg2[1]));
+  uint64_t x97;
+  uint64_t x98;
+  fiat_p256_mulx_u64(&x97, &x98, x2, (arg2[0]));
+  uint64_t x99;
+  fiat_p256_uint1 x100;
+  fiat_p256_addcarryx_u64(&x99, &x100, 0x0, x95, x98);
+  uint64_t x101;
+  fiat_p256_uint1 x102;
+  fiat_p256_addcarryx_u64(&x101, &x102, x100, x93, x96);
+  uint64_t x103;
+  fiat_p256_uint1 x104;
+  fiat_p256_addcarryx_u64(&x103, &x104, x102, x91, x94);
+  uint64_t x105;
+  fiat_p256_uint1 x106;
+  fiat_p256_addcarryx_u64(&x105, &x106, x104, 0x0, x92);
+  uint64_t x107;
+  fiat_p256_uint1 x108;
+  fiat_p256_addcarryx_u64(&x107, &x108, 0x0, x97, x81);
+  uint64_t x109;
+  fiat_p256_uint1 x110;
+  fiat_p256_addcarryx_u64(&x109, &x110, x108, x99, x83);
+  uint64_t x111;
+  fiat_p256_uint1 x112;
+  fiat_p256_addcarryx_u64(&x111, &x112, x110, x101, x85);
+  uint64_t x113;
+  fiat_p256_uint1 x114;
+  fiat_p256_addcarryx_u64(&x113, &x114, x112, x103, x87);
+  uint64_t x115;
+  fiat_p256_uint1 x116;
+  fiat_p256_addcarryx_u64(&x115, &x116, x114, x105, x89);
+  uint64_t x117;
+  uint64_t x118;
+  fiat_p256_mulx_u64(&x117, &x118, x107, UINT64_C(0xffffffff00000001));
+  uint64_t x119;
+  uint64_t x120;
+  fiat_p256_mulx_u64(&x119, &x120, x107, UINT32_C(0xffffffff));
+  uint64_t x121;
+  uint64_t x122;
+  fiat_p256_mulx_u64(&x121, &x122, x107, UINT64_C(0xffffffffffffffff));
+  uint64_t x123;
+  fiat_p256_uint1 x124;
+  fiat_p256_addcarryx_u64(&x123, &x124, 0x0, x119, x122);
+  uint64_t x125;
+  fiat_p256_uint1 x126;
+  fiat_p256_addcarryx_u64(&x125, &x126, x124, 0x0, x120);
+  uint64_t x127;
+  fiat_p256_uint1 x128;
+  fiat_p256_addcarryx_u64(&x127, &x128, 0x0, x121, x107);
+  uint64_t x129;
+  fiat_p256_uint1 x130;
+  fiat_p256_addcarryx_u64(&x129, &x130, x128, x123, x109);
+  uint64_t x131;
+  fiat_p256_uint1 x132;
+  fiat_p256_addcarryx_u64(&x131, &x132, x130, x125, x111);
+  uint64_t x133;
+  fiat_p256_uint1 x134;
+  fiat_p256_addcarryx_u64(&x133, &x134, x132, x117, x113);
+  uint64_t x135;
+  fiat_p256_uint1 x136;
+  fiat_p256_addcarryx_u64(&x135, &x136, x134, x118, x115);
+  uint64_t x137;
+  fiat_p256_uint1 x138;
+  fiat_p256_addcarryx_u64(&x137, &x138, x136, 0x0, x116);
+  uint64_t x139;
+  uint64_t x140;
+  fiat_p256_mulx_u64(&x139, &x140, x3, (arg2[3]));
+  uint64_t x141;
+  uint64_t x142;
+  fiat_p256_mulx_u64(&x141, &x142, x3, (arg2[2]));
+  uint64_t x143;
+  uint64_t x144;
+  fiat_p256_mulx_u64(&x143, &x144, x3, (arg2[1]));
+  uint64_t x145;
+  uint64_t x146;
+  fiat_p256_mulx_u64(&x145, &x146, x3, (arg2[0]));
+  uint64_t x147;
+  fiat_p256_uint1 x148;
+  fiat_p256_addcarryx_u64(&x147, &x148, 0x0, x143, x146);
+  uint64_t x149;
+  fiat_p256_uint1 x150;
+  fiat_p256_addcarryx_u64(&x149, &x150, x148, x141, x144);
+  uint64_t x151;
+  fiat_p256_uint1 x152;
+  fiat_p256_addcarryx_u64(&x151, &x152, x150, x139, x142);
+  uint64_t x153;
+  fiat_p256_uint1 x154;
+  fiat_p256_addcarryx_u64(&x153, &x154, x152, 0x0, x140);
+  uint64_t x155;
+  fiat_p256_uint1 x156;
+  fiat_p256_addcarryx_u64(&x155, &x156, 0x0, x145, x129);
+  uint64_t x157;
+  fiat_p256_uint1 x158;
+  fiat_p256_addcarryx_u64(&x157, &x158, x156, x147, x131);
+  uint64_t x159;
+  fiat_p256_uint1 x160;
+  fiat_p256_addcarryx_u64(&x159, &x160, x158, x149, x133);
+  uint64_t x161;
+  fiat_p256_uint1 x162;
+  fiat_p256_addcarryx_u64(&x161, &x162, x160, x151, x135);
+  uint64_t x163;
+  fiat_p256_uint1 x164;
+  fiat_p256_addcarryx_u64(&x163, &x164, x162, x153, x137);
+  uint64_t x165;
+  uint64_t x166;
+  fiat_p256_mulx_u64(&x165, &x166, x155, UINT64_C(0xffffffff00000001));
+  uint64_t x167;
+  uint64_t x168;
+  fiat_p256_mulx_u64(&x167, &x168, x155, UINT32_C(0xffffffff));
+  uint64_t x169;
+  uint64_t x170;
+  fiat_p256_mulx_u64(&x169, &x170, x155, UINT64_C(0xffffffffffffffff));
+  uint64_t x171;
+  fiat_p256_uint1 x172;
+  fiat_p256_addcarryx_u64(&x171, &x172, 0x0, x167, x170);
+  uint64_t x173;
+  fiat_p256_uint1 x174;
+  fiat_p256_addcarryx_u64(&x173, &x174, x172, 0x0, x168);
+  uint64_t x175;
+  fiat_p256_uint1 x176;
+  fiat_p256_addcarryx_u64(&x175, &x176, 0x0, x169, x155);
+  uint64_t x177;
+  fiat_p256_uint1 x178;
+  fiat_p256_addcarryx_u64(&x177, &x178, x176, x171, x157);
+  uint64_t x179;
+  fiat_p256_uint1 x180;
+  fiat_p256_addcarryx_u64(&x179, &x180, x178, x173, x159);
+  uint64_t x181;
+  fiat_p256_uint1 x182;
+  fiat_p256_addcarryx_u64(&x181, &x182, x180, x165, x161);
+  uint64_t x183;
+  fiat_p256_uint1 x184;
+  fiat_p256_addcarryx_u64(&x183, &x184, x182, x166, x163);
+  uint64_t x185;
+  fiat_p256_uint1 x186;
+  fiat_p256_addcarryx_u64(&x185, &x186, x184, 0x0, x164);
+  uint64_t x187;
+  fiat_p256_uint1 x188;
+  fiat_p256_subborrowx_u64(&x187, &x188, 0x0, x177, UINT64_C(0xffffffffffffffff));
+  uint64_t x189;
+  fiat_p256_uint1 x190;
+  fiat_p256_subborrowx_u64(&x189, &x190, x188, x179, UINT32_C(0xffffffff));
+  uint64_t x191;
+  fiat_p256_uint1 x192;
+  fiat_p256_subborrowx_u64(&x191, &x192, x190, x181, 0x0);
+  uint64_t x193;
+  fiat_p256_uint1 x194;
+  fiat_p256_subborrowx_u64(&x193, &x194, x192, x183, UINT64_C(0xffffffff00000001));
+  uint64_t x195;
+  fiat_p256_uint1 x196;
+  fiat_p256_subborrowx_u64(&x195, &x196, x194, x185, 0x0);
+  uint64_t x197;
+  fiat_p256_cmovznz_u64(&x197, x196, x187, x177);
+  uint64_t x198;
+  fiat_p256_cmovznz_u64(&x198, x196, x189, x179);
+  uint64_t x199;
+  fiat_p256_cmovznz_u64(&x199, x196, x191, x181);
+  uint64_t x200;
+  fiat_p256_cmovznz_u64(&x200, x196, x193, x183);
+  out1[0] = x197;
+  out1[1] = x198;
+  out1[2] = x199;
+  out1[3] = x200;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_square(uint64_t out1[4], const uint64_t arg1[4]) {
+  uint64_t x1 = (arg1[1]);
+  uint64_t x2 = (arg1[2]);
+  uint64_t x3 = (arg1[3]);
+  uint64_t x4 = (arg1[0]);
+  uint64_t x5;
+  uint64_t x6;
+  fiat_p256_mulx_u64(&x5, &x6, x4, (arg1[3]));
+  uint64_t x7;
+  uint64_t x8;
+  fiat_p256_mulx_u64(&x7, &x8, x4, (arg1[2]));
+  uint64_t x9;
+  uint64_t x10;
+  fiat_p256_mulx_u64(&x9, &x10, x4, (arg1[1]));
+  uint64_t x11;
+  uint64_t x12;
+  fiat_p256_mulx_u64(&x11, &x12, x4, (arg1[0]));
+  uint64_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_addcarryx_u64(&x13, &x14, 0x0, x9, x12);
+  uint64_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_addcarryx_u64(&x15, &x16, x14, x7, x10);
+  uint64_t x17;
+  fiat_p256_uint1 x18;
+  fiat_p256_addcarryx_u64(&x17, &x18, x16, x5, x8);
+  uint64_t x19;
+  fiat_p256_uint1 x20;
+  fiat_p256_addcarryx_u64(&x19, &x20, x18, 0x0, x6);
+  uint64_t x21;
+  uint64_t x22;
+  fiat_p256_mulx_u64(&x21, &x22, x11, UINT64_C(0xffffffff00000001));
+  uint64_t x23;
+  uint64_t x24;
+  fiat_p256_mulx_u64(&x23, &x24, x11, UINT32_C(0xffffffff));
+  uint64_t x25;
+  uint64_t x26;
+  fiat_p256_mulx_u64(&x25, &x26, x11, UINT64_C(0xffffffffffffffff));
+  uint64_t x27;
+  fiat_p256_uint1 x28;
+  fiat_p256_addcarryx_u64(&x27, &x28, 0x0, x23, x26);
+  uint64_t x29;
+  fiat_p256_uint1 x30;
+  fiat_p256_addcarryx_u64(&x29, &x30, x28, 0x0, x24);
+  uint64_t x31;
+  fiat_p256_uint1 x32;
+  fiat_p256_addcarryx_u64(&x31, &x32, 0x0, x25, x11);
+  uint64_t x33;
+  fiat_p256_uint1 x34;
+  fiat_p256_addcarryx_u64(&x33, &x34, x32, x27, x13);
+  uint64_t x35;
+  fiat_p256_uint1 x36;
+  fiat_p256_addcarryx_u64(&x35, &x36, x34, x29, x15);
+  uint64_t x37;
+  fiat_p256_uint1 x38;
+  fiat_p256_addcarryx_u64(&x37, &x38, x36, x21, x17);
+  uint64_t x39;
+  fiat_p256_uint1 x40;
+  fiat_p256_addcarryx_u64(&x39, &x40, x38, x22, x19);
+  uint64_t x41;
+  fiat_p256_uint1 x42;
+  fiat_p256_addcarryx_u64(&x41, &x42, x40, 0x0, 0x0);
+  uint64_t x43;
+  uint64_t x44;
+  fiat_p256_mulx_u64(&x43, &x44, x1, (arg1[3]));
+  uint64_t x45;
+  uint64_t x46;
+  fiat_p256_mulx_u64(&x45, &x46, x1, (arg1[2]));
+  uint64_t x47;
+  uint64_t x48;
+  fiat_p256_mulx_u64(&x47, &x48, x1, (arg1[1]));
+  uint64_t x49;
+  uint64_t x50;
+  fiat_p256_mulx_u64(&x49, &x50, x1, (arg1[0]));
+  uint64_t x51;
+  fiat_p256_uint1 x52;
+  fiat_p256_addcarryx_u64(&x51, &x52, 0x0, x47, x50);
+  uint64_t x53;
+  fiat_p256_uint1 x54;
+  fiat_p256_addcarryx_u64(&x53, &x54, x52, x45, x48);
+  uint64_t x55;
+  fiat_p256_uint1 x56;
+  fiat_p256_addcarryx_u64(&x55, &x56, x54, x43, x46);
+  uint64_t x57;
+  fiat_p256_uint1 x58;
+  fiat_p256_addcarryx_u64(&x57, &x58, x56, 0x0, x44);
+  uint64_t x59;
+  fiat_p256_uint1 x60;
+  fiat_p256_addcarryx_u64(&x59, &x60, 0x0, x49, x33);
+  uint64_t x61;
+  fiat_p256_uint1 x62;
+  fiat_p256_addcarryx_u64(&x61, &x62, x60, x51, x35);
+  uint64_t x63;
+  fiat_p256_uint1 x64;
+  fiat_p256_addcarryx_u64(&x63, &x64, x62, x53, x37);
+  uint64_t x65;
+  fiat_p256_uint1 x66;
+  fiat_p256_addcarryx_u64(&x65, &x66, x64, x55, x39);
+  uint64_t x67;
+  fiat_p256_uint1 x68;
+  fiat_p256_addcarryx_u64(&x67, &x68, x66, x57, (fiat_p256_uint1)x41);
+  uint64_t x69;
+  uint64_t x70;
+  fiat_p256_mulx_u64(&x69, &x70, x59, UINT64_C(0xffffffff00000001));
+  uint64_t x71;
+  uint64_t x72;
+  fiat_p256_mulx_u64(&x71, &x72, x59, UINT32_C(0xffffffff));
+  uint64_t x73;
+  uint64_t x74;
+  fiat_p256_mulx_u64(&x73, &x74, x59, UINT64_C(0xffffffffffffffff));
+  uint64_t x75;
+  fiat_p256_uint1 x76;
+  fiat_p256_addcarryx_u64(&x75, &x76, 0x0, x71, x74);
+  uint64_t x77;
+  fiat_p256_uint1 x78;
+  fiat_p256_addcarryx_u64(&x77, &x78, x76, 0x0, x72);
+  uint64_t x79;
+  fiat_p256_uint1 x80;
+  fiat_p256_addcarryx_u64(&x79, &x80, 0x0, x73, x59);
+  uint64_t x81;
+  fiat_p256_uint1 x82;
+  fiat_p256_addcarryx_u64(&x81, &x82, x80, x75, x61);
+  uint64_t x83;
+  fiat_p256_uint1 x84;
+  fiat_p256_addcarryx_u64(&x83, &x84, x82, x77, x63);
+  uint64_t x85;
+  fiat_p256_uint1 x86;
+  fiat_p256_addcarryx_u64(&x85, &x86, x84, x69, x65);
+  uint64_t x87;
+  fiat_p256_uint1 x88;
+  fiat_p256_addcarryx_u64(&x87, &x88, x86, x70, x67);
+  uint64_t x89;
+  fiat_p256_uint1 x90;
+  fiat_p256_addcarryx_u64(&x89, &x90, x88, 0x0, x68);
+  uint64_t x91;
+  uint64_t x92;
+  fiat_p256_mulx_u64(&x91, &x92, x2, (arg1[3]));
+  uint64_t x93;
+  uint64_t x94;
+  fiat_p256_mulx_u64(&x93, &x94, x2, (arg1[2]));
+  uint64_t x95;
+  uint64_t x96;
+  fiat_p256_mulx_u64(&x95, &x96, x2, (arg1[1]));
+  uint64_t x97;
+  uint64_t x98;
+  fiat_p256_mulx_u64(&x97, &x98, x2, (arg1[0]));
+  uint64_t x99;
+  fiat_p256_uint1 x100;
+  fiat_p256_addcarryx_u64(&x99, &x100, 0x0, x95, x98);
+  uint64_t x101;
+  fiat_p256_uint1 x102;
+  fiat_p256_addcarryx_u64(&x101, &x102, x100, x93, x96);
+  uint64_t x103;
+  fiat_p256_uint1 x104;
+  fiat_p256_addcarryx_u64(&x103, &x104, x102, x91, x94);
+  uint64_t x105;
+  fiat_p256_uint1 x106;
+  fiat_p256_addcarryx_u64(&x105, &x106, x104, 0x0, x92);
+  uint64_t x107;
+  fiat_p256_uint1 x108;
+  fiat_p256_addcarryx_u64(&x107, &x108, 0x0, x97, x81);
+  uint64_t x109;
+  fiat_p256_uint1 x110;
+  fiat_p256_addcarryx_u64(&x109, &x110, x108, x99, x83);
+  uint64_t x111;
+  fiat_p256_uint1 x112;
+  fiat_p256_addcarryx_u64(&x111, &x112, x110, x101, x85);
+  uint64_t x113;
+  fiat_p256_uint1 x114;
+  fiat_p256_addcarryx_u64(&x113, &x114, x112, x103, x87);
+  uint64_t x115;
+  fiat_p256_uint1 x116;
+  fiat_p256_addcarryx_u64(&x115, &x116, x114, x105, x89);
+  uint64_t x117;
+  uint64_t x118;
+  fiat_p256_mulx_u64(&x117, &x118, x107, UINT64_C(0xffffffff00000001));
+  uint64_t x119;
+  uint64_t x120;
+  fiat_p256_mulx_u64(&x119, &x120, x107, UINT32_C(0xffffffff));
+  uint64_t x121;
+  uint64_t x122;
+  fiat_p256_mulx_u64(&x121, &x122, x107, UINT64_C(0xffffffffffffffff));
+  uint64_t x123;
+  fiat_p256_uint1 x124;
+  fiat_p256_addcarryx_u64(&x123, &x124, 0x0, x119, x122);
+  uint64_t x125;
+  fiat_p256_uint1 x126;
+  fiat_p256_addcarryx_u64(&x125, &x126, x124, 0x0, x120);
+  uint64_t x127;
+  fiat_p256_uint1 x128;
+  fiat_p256_addcarryx_u64(&x127, &x128, 0x0, x121, x107);
+  uint64_t x129;
+  fiat_p256_uint1 x130;
+  fiat_p256_addcarryx_u64(&x129, &x130, x128, x123, x109);
+  uint64_t x131;
+  fiat_p256_uint1 x132;
+  fiat_p256_addcarryx_u64(&x131, &x132, x130, x125, x111);
+  uint64_t x133;
+  fiat_p256_uint1 x134;
+  fiat_p256_addcarryx_u64(&x133, &x134, x132, x117, x113);
+  uint64_t x135;
+  fiat_p256_uint1 x136;
+  fiat_p256_addcarryx_u64(&x135, &x136, x134, x118, x115);
+  uint64_t x137;
+  fiat_p256_uint1 x138;
+  fiat_p256_addcarryx_u64(&x137, &x138, x136, 0x0, x116);
+  uint64_t x139;
+  uint64_t x140;
+  fiat_p256_mulx_u64(&x139, &x140, x3, (arg1[3]));
+  uint64_t x141;
+  uint64_t x142;
+  fiat_p256_mulx_u64(&x141, &x142, x3, (arg1[2]));
+  uint64_t x143;
+  uint64_t x144;
+  fiat_p256_mulx_u64(&x143, &x144, x3, (arg1[1]));
+  uint64_t x145;
+  uint64_t x146;
+  fiat_p256_mulx_u64(&x145, &x146, x3, (arg1[0]));
+  uint64_t x147;
+  fiat_p256_uint1 x148;
+  fiat_p256_addcarryx_u64(&x147, &x148, 0x0, x143, x146);
+  uint64_t x149;
+  fiat_p256_uint1 x150;
+  fiat_p256_addcarryx_u64(&x149, &x150, x148, x141, x144);
+  uint64_t x151;
+  fiat_p256_uint1 x152;
+  fiat_p256_addcarryx_u64(&x151, &x152, x150, x139, x142);
+  uint64_t x153;
+  fiat_p256_uint1 x154;
+  fiat_p256_addcarryx_u64(&x153, &x154, x152, 0x0, x140);
+  uint64_t x155;
+  fiat_p256_uint1 x156;
+  fiat_p256_addcarryx_u64(&x155, &x156, 0x0, x145, x129);
+  uint64_t x157;
+  fiat_p256_uint1 x158;
+  fiat_p256_addcarryx_u64(&x157, &x158, x156, x147, x131);
+  uint64_t x159;
+  fiat_p256_uint1 x160;
+  fiat_p256_addcarryx_u64(&x159, &x160, x158, x149, x133);
+  uint64_t x161;
+  fiat_p256_uint1 x162;
+  fiat_p256_addcarryx_u64(&x161, &x162, x160, x151, x135);
+  uint64_t x163;
+  fiat_p256_uint1 x164;
+  fiat_p256_addcarryx_u64(&x163, &x164, x162, x153, x137);
+  uint64_t x165;
+  uint64_t x166;
+  fiat_p256_mulx_u64(&x165, &x166, x155, UINT64_C(0xffffffff00000001));
+  uint64_t x167;
+  uint64_t x168;
+  fiat_p256_mulx_u64(&x167, &x168, x155, UINT32_C(0xffffffff));
+  uint64_t x169;
+  uint64_t x170;
+  fiat_p256_mulx_u64(&x169, &x170, x155, UINT64_C(0xffffffffffffffff));
+  uint64_t x171;
+  fiat_p256_uint1 x172;
+  fiat_p256_addcarryx_u64(&x171, &x172, 0x0, x167, x170);
+  uint64_t x173;
+  fiat_p256_uint1 x174;
+  fiat_p256_addcarryx_u64(&x173, &x174, x172, 0x0, x168);
+  uint64_t x175;
+  fiat_p256_uint1 x176;
+  fiat_p256_addcarryx_u64(&x175, &x176, 0x0, x169, x155);
+  uint64_t x177;
+  fiat_p256_uint1 x178;
+  fiat_p256_addcarryx_u64(&x177, &x178, x176, x171, x157);
+  uint64_t x179;
+  fiat_p256_uint1 x180;
+  fiat_p256_addcarryx_u64(&x179, &x180, x178, x173, x159);
+  uint64_t x181;
+  fiat_p256_uint1 x182;
+  fiat_p256_addcarryx_u64(&x181, &x182, x180, x165, x161);
+  uint64_t x183;
+  fiat_p256_uint1 x184;
+  fiat_p256_addcarryx_u64(&x183, &x184, x182, x166, x163);
+  uint64_t x185;
+  fiat_p256_uint1 x186;
+  fiat_p256_addcarryx_u64(&x185, &x186, x184, 0x0, x164);
+  uint64_t x187;
+  fiat_p256_uint1 x188;
+  fiat_p256_subborrowx_u64(&x187, &x188, 0x0, x177, UINT64_C(0xffffffffffffffff));
+  uint64_t x189;
+  fiat_p256_uint1 x190;
+  fiat_p256_subborrowx_u64(&x189, &x190, x188, x179, UINT32_C(0xffffffff));
+  uint64_t x191;
+  fiat_p256_uint1 x192;
+  fiat_p256_subborrowx_u64(&x191, &x192, x190, x181, 0x0);
+  uint64_t x193;
+  fiat_p256_uint1 x194;
+  fiat_p256_subborrowx_u64(&x193, &x194, x192, x183, UINT64_C(0xffffffff00000001));
+  uint64_t x195;
+  fiat_p256_uint1 x196;
+  fiat_p256_subborrowx_u64(&x195, &x196, x194, x185, 0x0);
+  uint64_t x197;
+  fiat_p256_cmovznz_u64(&x197, x196, x187, x177);
+  uint64_t x198;
+  fiat_p256_cmovznz_u64(&x198, x196, x189, x179);
+  uint64_t x199;
+  fiat_p256_cmovznz_u64(&x199, x196, x191, x181);
+  uint64_t x200;
+  fiat_p256_cmovznz_u64(&x200, x196, x193, x183);
+  out1[0] = x197;
+  out1[1] = x198;
+  out1[2] = x199;
+  out1[3] = x200;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_add(uint64_t out1[4], const uint64_t arg1[4], const uint64_t arg2[4]) {
+  uint64_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_addcarryx_u64(&x1, &x2, 0x0, (arg2[0]), (arg1[0]));
+  uint64_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_addcarryx_u64(&x3, &x4, x2, (arg2[1]), (arg1[1]));
+  uint64_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_addcarryx_u64(&x5, &x6, x4, (arg2[2]), (arg1[2]));
+  uint64_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_addcarryx_u64(&x7, &x8, x6, (arg2[3]), (arg1[3]));
+  uint64_t x9;
+  fiat_p256_uint1 x10;
+  fiat_p256_subborrowx_u64(&x9, &x10, 0x0, x1, UINT64_C(0xffffffffffffffff));
+  uint64_t x11;
+  fiat_p256_uint1 x12;
+  fiat_p256_subborrowx_u64(&x11, &x12, x10, x3, UINT32_C(0xffffffff));
+  uint64_t x13;
+  fiat_p256_uint1 x14;
+  fiat_p256_subborrowx_u64(&x13, &x14, x12, x5, 0x0);
+  uint64_t x15;
+  fiat_p256_uint1 x16;
+  fiat_p256_subborrowx_u64(&x15, &x16, x14, x7, UINT64_C(0xffffffff00000001));
+  uint64_t x17;
+  fiat_p256_uint1 x18;
+  fiat_p256_subborrowx_u64(&x17, &x18, x16, x8, 0x0);
+  uint64_t x19;
+  fiat_p256_cmovznz_u64(&x19, x18, x9, x1);
+  uint64_t x20;
+  fiat_p256_cmovznz_u64(&x20, x18, x11, x3);
+  uint64_t x21;
+  fiat_p256_cmovznz_u64(&x21, x18, x13, x5);
+  uint64_t x22;
+  fiat_p256_cmovznz_u64(&x22, x18, x15, x7);
+  out1[0] = x19;
+  out1[1] = x20;
+  out1[2] = x21;
+  out1[3] = x22;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ *   arg2: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_sub(uint64_t out1[4], const uint64_t arg1[4], const uint64_t arg2[4]) {
+  uint64_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_subborrowx_u64(&x1, &x2, 0x0, (arg1[0]), (arg2[0]));
+  uint64_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_subborrowx_u64(&x3, &x4, x2, (arg1[1]), (arg2[1]));
+  uint64_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_subborrowx_u64(&x5, &x6, x4, (arg1[2]), (arg2[2]));
+  uint64_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_subborrowx_u64(&x7, &x8, x6, (arg1[3]), (arg2[3]));
+  uint64_t x9;
+  fiat_p256_cmovznz_u64(&x9, x8, 0x0, UINT64_C(0xffffffffffffffff));
+  uint64_t x10;
+  fiat_p256_uint1 x11;
+  fiat_p256_addcarryx_u64(&x10, &x11, 0x0, (x9 & UINT64_C(0xffffffffffffffff)), x1);
+  uint64_t x12;
+  fiat_p256_uint1 x13;
+  fiat_p256_addcarryx_u64(&x12, &x13, x11, (x9 & UINT32_C(0xffffffff)), x3);
+  uint64_t x14;
+  fiat_p256_uint1 x15;
+  fiat_p256_addcarryx_u64(&x14, &x15, x13, 0x0, x5);
+  uint64_t x16;
+  fiat_p256_uint1 x17;
+  fiat_p256_addcarryx_u64(&x16, &x17, x15, (x9 & UINT64_C(0xffffffff00000001)), x7);
+  out1[0] = x10;
+  out1[1] = x12;
+  out1[2] = x14;
+  out1[3] = x16;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_opp(uint64_t out1[4], const uint64_t arg1[4]) {
+  uint64_t x1;
+  fiat_p256_uint1 x2;
+  fiat_p256_subborrowx_u64(&x1, &x2, 0x0, 0x0, (arg1[0]));
+  uint64_t x3;
+  fiat_p256_uint1 x4;
+  fiat_p256_subborrowx_u64(&x3, &x4, x2, 0x0, (arg1[1]));
+  uint64_t x5;
+  fiat_p256_uint1 x6;
+  fiat_p256_subborrowx_u64(&x5, &x6, x4, 0x0, (arg1[2]));
+  uint64_t x7;
+  fiat_p256_uint1 x8;
+  fiat_p256_subborrowx_u64(&x7, &x8, x6, 0x0, (arg1[3]));
+  uint64_t x9;
+  fiat_p256_cmovznz_u64(&x9, x8, 0x0, UINT64_C(0xffffffffffffffff));
+  uint64_t x10;
+  fiat_p256_uint1 x11;
+  fiat_p256_addcarryx_u64(&x10, &x11, 0x0, (x9 & UINT64_C(0xffffffffffffffff)), x1);
+  uint64_t x12;
+  fiat_p256_uint1 x13;
+  fiat_p256_addcarryx_u64(&x12, &x13, x11, (x9 & UINT32_C(0xffffffff)), x3);
+  uint64_t x14;
+  fiat_p256_uint1 x15;
+  fiat_p256_addcarryx_u64(&x14, &x15, x13, 0x0, x5);
+  uint64_t x16;
+  fiat_p256_uint1 x17;
+  fiat_p256_addcarryx_u64(&x16, &x17, x15, (x9 & UINT64_C(0xffffffff00000001)), x7);
+  out1[0] = x10;
+  out1[1] = x12;
+  out1[2] = x14;
+  out1[3] = x16;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_from_montgomery(uint64_t out1[4], const uint64_t arg1[4]) {
+  uint64_t x1 = (arg1[0]);
+  uint64_t x2;
+  uint64_t x3;
+  fiat_p256_mulx_u64(&x2, &x3, x1, UINT64_C(0xffffffff00000001));
+  uint64_t x4;
+  uint64_t x5;
+  fiat_p256_mulx_u64(&x4, &x5, x1, UINT32_C(0xffffffff));
+  uint64_t x6;
+  uint64_t x7;
+  fiat_p256_mulx_u64(&x6, &x7, x1, UINT64_C(0xffffffffffffffff));
+  uint64_t x8;
+  fiat_p256_uint1 x9;
+  fiat_p256_addcarryx_u64(&x8, &x9, 0x0, x4, x7);
+  uint64_t x10;
+  fiat_p256_uint1 x11;
+  fiat_p256_addcarryx_u64(&x10, &x11, 0x0, x6, x1);
+  uint64_t x12;
+  fiat_p256_uint1 x13;
+  fiat_p256_addcarryx_u64(&x12, &x13, x11, x8, 0x0);
+  uint64_t x14;
+  fiat_p256_uint1 x15;
+  fiat_p256_addcarryx_u64(&x14, &x15, 0x0, (arg1[1]), x12);
+  uint64_t x16;
+  uint64_t x17;
+  fiat_p256_mulx_u64(&x16, &x17, x14, UINT64_C(0xffffffff00000001));
+  uint64_t x18;
+  uint64_t x19;
+  fiat_p256_mulx_u64(&x18, &x19, x14, UINT32_C(0xffffffff));
+  uint64_t x20;
+  uint64_t x21;
+  fiat_p256_mulx_u64(&x20, &x21, x14, UINT64_C(0xffffffffffffffff));
+  uint64_t x22;
+  fiat_p256_uint1 x23;
+  fiat_p256_addcarryx_u64(&x22, &x23, 0x0, x18, x21);
+  uint64_t x24;
+  fiat_p256_uint1 x25;
+  fiat_p256_addcarryx_u64(&x24, &x25, x9, 0x0, x5);
+  uint64_t x26;
+  fiat_p256_uint1 x27;
+  fiat_p256_addcarryx_u64(&x26, &x27, x13, x24, 0x0);
+  uint64_t x28;
+  fiat_p256_uint1 x29;
+  fiat_p256_addcarryx_u64(&x28, &x29, x15, 0x0, x26);
+  uint64_t x30;
+  fiat_p256_uint1 x31;
+  fiat_p256_addcarryx_u64(&x30, &x31, 0x0, x20, x14);
+  uint64_t x32;
+  fiat_p256_uint1 x33;
+  fiat_p256_addcarryx_u64(&x32, &x33, x31, x22, x28);
+  uint64_t x34;
+  fiat_p256_uint1 x35;
+  fiat_p256_addcarryx_u64(&x34, &x35, x23, 0x0, x19);
+  uint64_t x36;
+  fiat_p256_uint1 x37;
+  fiat_p256_addcarryx_u64(&x36, &x37, x33, x34, x2);
+  uint64_t x38;
+  fiat_p256_uint1 x39;
+  fiat_p256_addcarryx_u64(&x38, &x39, x37, x16, x3);
+  uint64_t x40;
+  fiat_p256_uint1 x41;
+  fiat_p256_addcarryx_u64(&x40, &x41, 0x0, (arg1[2]), x32);
+  uint64_t x42;
+  fiat_p256_uint1 x43;
+  fiat_p256_addcarryx_u64(&x42, &x43, x41, 0x0, x36);
+  uint64_t x44;
+  fiat_p256_uint1 x45;
+  fiat_p256_addcarryx_u64(&x44, &x45, x43, 0x0, x38);
+  uint64_t x46;
+  uint64_t x47;
+  fiat_p256_mulx_u64(&x46, &x47, x40, UINT64_C(0xffffffff00000001));
+  uint64_t x48;
+  uint64_t x49;
+  fiat_p256_mulx_u64(&x48, &x49, x40, UINT32_C(0xffffffff));
+  uint64_t x50;
+  uint64_t x51;
+  fiat_p256_mulx_u64(&x50, &x51, x40, UINT64_C(0xffffffffffffffff));
+  uint64_t x52;
+  fiat_p256_uint1 x53;
+  fiat_p256_addcarryx_u64(&x52, &x53, 0x0, x48, x51);
+  uint64_t x54;
+  fiat_p256_uint1 x55;
+  fiat_p256_addcarryx_u64(&x54, &x55, 0x0, x50, x40);
+  uint64_t x56;
+  fiat_p256_uint1 x57;
+  fiat_p256_addcarryx_u64(&x56, &x57, x55, x52, x42);
+  uint64_t x58;
+  fiat_p256_uint1 x59;
+  fiat_p256_addcarryx_u64(&x58, &x59, x53, 0x0, x49);
+  uint64_t x60;
+  fiat_p256_uint1 x61;
+  fiat_p256_addcarryx_u64(&x60, &x61, x57, x58, x44);
+  uint64_t x62;
+  fiat_p256_uint1 x63;
+  fiat_p256_addcarryx_u64(&x62, &x63, x39, x17, 0x0);
+  uint64_t x64;
+  fiat_p256_uint1 x65;
+  fiat_p256_addcarryx_u64(&x64, &x65, x45, 0x0, x62);
+  uint64_t x66;
+  fiat_p256_uint1 x67;
+  fiat_p256_addcarryx_u64(&x66, &x67, x61, x46, x64);
+  uint64_t x68;
+  fiat_p256_uint1 x69;
+  fiat_p256_addcarryx_u64(&x68, &x69, 0x0, (arg1[3]), x56);
+  uint64_t x70;
+  fiat_p256_uint1 x71;
+  fiat_p256_addcarryx_u64(&x70, &x71, x69, 0x0, x60);
+  uint64_t x72;
+  fiat_p256_uint1 x73;
+  fiat_p256_addcarryx_u64(&x72, &x73, x71, 0x0, x66);
+  uint64_t x74;
+  uint64_t x75;
+  fiat_p256_mulx_u64(&x74, &x75, x68, UINT64_C(0xffffffff00000001));
+  uint64_t x76;
+  uint64_t x77;
+  fiat_p256_mulx_u64(&x76, &x77, x68, UINT32_C(0xffffffff));
+  uint64_t x78;
+  uint64_t x79;
+  fiat_p256_mulx_u64(&x78, &x79, x68, UINT64_C(0xffffffffffffffff));
+  uint64_t x80;
+  fiat_p256_uint1 x81;
+  fiat_p256_addcarryx_u64(&x80, &x81, 0x0, x76, x79);
+  uint64_t x82;
+  fiat_p256_uint1 x83;
+  fiat_p256_addcarryx_u64(&x82, &x83, 0x0, x78, x68);
+  uint64_t x84;
+  fiat_p256_uint1 x85;
+  fiat_p256_addcarryx_u64(&x84, &x85, x83, x80, x70);
+  uint64_t x86;
+  fiat_p256_uint1 x87;
+  fiat_p256_addcarryx_u64(&x86, &x87, x81, 0x0, x77);
+  uint64_t x88;
+  fiat_p256_uint1 x89;
+  fiat_p256_addcarryx_u64(&x88, &x89, x85, x86, x72);
+  uint64_t x90;
+  fiat_p256_uint1 x91;
+  fiat_p256_addcarryx_u64(&x90, &x91, x67, x47, 0x0);
+  uint64_t x92;
+  fiat_p256_uint1 x93;
+  fiat_p256_addcarryx_u64(&x92, &x93, x73, 0x0, x90);
+  uint64_t x94;
+  fiat_p256_uint1 x95;
+  fiat_p256_addcarryx_u64(&x94, &x95, x89, x74, x92);
+  uint64_t x96;
+  fiat_p256_uint1 x97;
+  fiat_p256_addcarryx_u64(&x96, &x97, x95, x75, 0x0);
+  uint64_t x98;
+  fiat_p256_uint1 x99;
+  fiat_p256_subborrowx_u64(&x98, &x99, 0x0, x84, UINT64_C(0xffffffffffffffff));
+  uint64_t x100;
+  fiat_p256_uint1 x101;
+  fiat_p256_subborrowx_u64(&x100, &x101, x99, x88, UINT32_C(0xffffffff));
+  uint64_t x102;
+  fiat_p256_uint1 x103;
+  fiat_p256_subborrowx_u64(&x102, &x103, x101, x94, 0x0);
+  uint64_t x104;
+  fiat_p256_uint1 x105;
+  fiat_p256_subborrowx_u64(&x104, &x105, x103, x96, UINT64_C(0xffffffff00000001));
+  uint64_t x106;
+  fiat_p256_uint1 x107;
+  fiat_p256_subborrowx_u64(&x106, &x107, x105, 0x0, 0x0);
+  uint64_t x108;
+  fiat_p256_cmovznz_u64(&x108, x107, x98, x84);
+  uint64_t x109;
+  fiat_p256_cmovznz_u64(&x109, x107, x100, x88);
+  uint64_t x110;
+  fiat_p256_cmovznz_u64(&x110, x107, x102, x94);
+  uint64_t x111;
+  fiat_p256_cmovznz_u64(&x111, x107, x104, x96);
+  out1[0] = x108;
+  out1[1] = x109;
+  out1[2] = x110;
+  out1[3] = x111;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [0x0 ~> 0xffffffffffffffff]
+ */
+static void fiat_p256_nonzero(uint64_t* out1, const uint64_t arg1[4]) {
+  uint64_t x1 = ((arg1[0]) | ((arg1[1]) | ((arg1[2]) | ((arg1[3]) | (uint64_t)0x0))));
+  *out1 = x1;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [0x0 ~> 0x1]
+ *   arg2: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ *   arg3: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_selectznz(uint64_t out1[4], fiat_p256_uint1 arg1, const uint64_t arg2[4], const uint64_t arg3[4]) {
+  uint64_t x1;
+  fiat_p256_cmovznz_u64(&x1, arg1, (arg2[0]), (arg3[0]));
+  uint64_t x2;
+  fiat_p256_cmovznz_u64(&x2, arg1, (arg2[1]), (arg3[1]));
+  uint64_t x3;
+  fiat_p256_cmovznz_u64(&x3, arg1, (arg2[2]), (arg3[2]));
+  uint64_t x4;
+  fiat_p256_cmovznz_u64(&x4, arg1, (arg2[3]), (arg3[3]));
+  out1[0] = x1;
+  out1[1] = x2;
+  out1[2] = x3;
+  out1[3] = x4;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff]]
+ */
+static void fiat_p256_to_bytes(uint8_t out1[32], const uint64_t arg1[4]) {
+  uint64_t x1 = (arg1[3]);
+  uint64_t x2 = (arg1[2]);
+  uint64_t x3 = (arg1[1]);
+  uint64_t x4 = (arg1[0]);
+  uint64_t x5 = (x4 >> 8);
+  uint8_t x6 = (uint8_t)(x4 & UINT8_C(0xff));
+  uint64_t x7 = (x5 >> 8);
+  uint8_t x8 = (uint8_t)(x5 & UINT8_C(0xff));
+  uint64_t x9 = (x7 >> 8);
+  uint8_t x10 = (uint8_t)(x7 & UINT8_C(0xff));
+  uint64_t x11 = (x9 >> 8);
+  uint8_t x12 = (uint8_t)(x9 & UINT8_C(0xff));
+  uint64_t x13 = (x11 >> 8);
+  uint8_t x14 = (uint8_t)(x11 & UINT8_C(0xff));
+  uint64_t x15 = (x13 >> 8);
+  uint8_t x16 = (uint8_t)(x13 & UINT8_C(0xff));
+  uint8_t x17 = (uint8_t)(x15 >> 8);
+  uint8_t x18 = (uint8_t)(x15 & UINT8_C(0xff));
+  uint8_t x19 = (uint8_t)(x17 & UINT8_C(0xff));
+  uint64_t x20 = (x3 >> 8);
+  uint8_t x21 = (uint8_t)(x3 & UINT8_C(0xff));
+  uint64_t x22 = (x20 >> 8);
+  uint8_t x23 = (uint8_t)(x20 & UINT8_C(0xff));
+  uint64_t x24 = (x22 >> 8);
+  uint8_t x25 = (uint8_t)(x22 & UINT8_C(0xff));
+  uint64_t x26 = (x24 >> 8);
+  uint8_t x27 = (uint8_t)(x24 & UINT8_C(0xff));
+  uint64_t x28 = (x26 >> 8);
+  uint8_t x29 = (uint8_t)(x26 & UINT8_C(0xff));
+  uint64_t x30 = (x28 >> 8);
+  uint8_t x31 = (uint8_t)(x28 & UINT8_C(0xff));
+  uint8_t x32 = (uint8_t)(x30 >> 8);
+  uint8_t x33 = (uint8_t)(x30 & UINT8_C(0xff));
+  uint8_t x34 = (uint8_t)(x32 & UINT8_C(0xff));
+  uint64_t x35 = (x2 >> 8);
+  uint8_t x36 = (uint8_t)(x2 & UINT8_C(0xff));
+  uint64_t x37 = (x35 >> 8);
+  uint8_t x38 = (uint8_t)(x35 & UINT8_C(0xff));
+  uint64_t x39 = (x37 >> 8);
+  uint8_t x40 = (uint8_t)(x37 & UINT8_C(0xff));
+  uint64_t x41 = (x39 >> 8);
+  uint8_t x42 = (uint8_t)(x39 & UINT8_C(0xff));
+  uint64_t x43 = (x41 >> 8);
+  uint8_t x44 = (uint8_t)(x41 & UINT8_C(0xff));
+  uint64_t x45 = (x43 >> 8);
+  uint8_t x46 = (uint8_t)(x43 & UINT8_C(0xff));
+  uint8_t x47 = (uint8_t)(x45 >> 8);
+  uint8_t x48 = (uint8_t)(x45 & UINT8_C(0xff));
+  uint8_t x49 = (uint8_t)(x47 & UINT8_C(0xff));
+  uint64_t x50 = (x1 >> 8);
+  uint8_t x51 = (uint8_t)(x1 & UINT8_C(0xff));
+  uint64_t x52 = (x50 >> 8);
+  uint8_t x53 = (uint8_t)(x50 & UINT8_C(0xff));
+  uint64_t x54 = (x52 >> 8);
+  uint8_t x55 = (uint8_t)(x52 & UINT8_C(0xff));
+  uint64_t x56 = (x54 >> 8);
+  uint8_t x57 = (uint8_t)(x54 & UINT8_C(0xff));
+  uint64_t x58 = (x56 >> 8);
+  uint8_t x59 = (uint8_t)(x56 & UINT8_C(0xff));
+  uint64_t x60 = (x58 >> 8);
+  uint8_t x61 = (uint8_t)(x58 & UINT8_C(0xff));
+  uint8_t x62 = (uint8_t)(x60 >> 8);
+  uint8_t x63 = (uint8_t)(x60 & UINT8_C(0xff));
+  out1[0] = x6;
+  out1[1] = x8;
+  out1[2] = x10;
+  out1[3] = x12;
+  out1[4] = x14;
+  out1[5] = x16;
+  out1[6] = x18;
+  out1[7] = x19;
+  out1[8] = x21;
+  out1[9] = x23;
+  out1[10] = x25;
+  out1[11] = x27;
+  out1[12] = x29;
+  out1[13] = x31;
+  out1[14] = x33;
+  out1[15] = x34;
+  out1[16] = x36;
+  out1[17] = x38;
+  out1[18] = x40;
+  out1[19] = x42;
+  out1[20] = x44;
+  out1[21] = x46;
+  out1[22] = x48;
+  out1[23] = x49;
+  out1[24] = x51;
+  out1[25] = x53;
+  out1[26] = x55;
+  out1[27] = x57;
+  out1[28] = x59;
+  out1[29] = x61;
+  out1[30] = x63;
+  out1[31] = x62;
+}
+
+/*
+ * Input Bounds:
+ *   arg1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff]]
+ * Output Bounds:
+ *   out1: [[0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff], [0x0 ~> 0xffffffffffffffff]]
+ */
+static void fiat_p256_from_bytes(uint64_t out1[4], const uint8_t arg1[32]) {
+  uint64_t x1 = ((uint64_t)(arg1[31]) << 56);
+  uint64_t x2 = ((uint64_t)(arg1[30]) << 48);
+  uint64_t x3 = ((uint64_t)(arg1[29]) << 40);
+  uint64_t x4 = ((uint64_t)(arg1[28]) << 32);
+  uint64_t x5 = ((uint64_t)(arg1[27]) << 24);
+  uint64_t x6 = ((uint64_t)(arg1[26]) << 16);
+  uint64_t x7 = ((uint64_t)(arg1[25]) << 8);
+  uint8_t x8 = (arg1[24]);
+  uint64_t x9 = ((uint64_t)(arg1[23]) << 56);
+  uint64_t x10 = ((uint64_t)(arg1[22]) << 48);
+  uint64_t x11 = ((uint64_t)(arg1[21]) << 40);
+  uint64_t x12 = ((uint64_t)(arg1[20]) << 32);
+  uint64_t x13 = ((uint64_t)(arg1[19]) << 24);
+  uint64_t x14 = ((uint64_t)(arg1[18]) << 16);
+  uint64_t x15 = ((uint64_t)(arg1[17]) << 8);
+  uint8_t x16 = (arg1[16]);
+  uint64_t x17 = ((uint64_t)(arg1[15]) << 56);
+  uint64_t x18 = ((uint64_t)(arg1[14]) << 48);
+  uint64_t x19 = ((uint64_t)(arg1[13]) << 40);
+  uint64_t x20 = ((uint64_t)(arg1[12]) << 32);
+  uint64_t x21 = ((uint64_t)(arg1[11]) << 24);
+  uint64_t x22 = ((uint64_t)(arg1[10]) << 16);
+  uint64_t x23 = ((uint64_t)(arg1[9]) << 8);
+  uint8_t x24 = (arg1[8]);
+  uint64_t x25 = ((uint64_t)(arg1[7]) << 56);
+  uint64_t x26 = ((uint64_t)(arg1[6]) << 48);
+  uint64_t x27 = ((uint64_t)(arg1[5]) << 40);
+  uint64_t x28 = ((uint64_t)(arg1[4]) << 32);
+  uint64_t x29 = ((uint64_t)(arg1[3]) << 24);
+  uint64_t x30 = ((uint64_t)(arg1[2]) << 16);
+  uint64_t x31 = ((uint64_t)(arg1[1]) << 8);
+  uint8_t x32 = (arg1[0]);
+  uint64_t x33 = (x32 + (x31 + (x30 + (x29 + (x28 + (x27 + (x26 + x25)))))));
+  uint64_t x34 = (x33 & UINT64_C(0xffffffffffffffff));
+  uint64_t x35 = (x8 + (x7 + (x6 + (x5 + (x4 + (x3 + (x2 + x1)))))));
+  uint64_t x36 = (x16 + (x15 + (x14 + (x13 + (x12 + (x11 + (x10 + x9)))))));
+  uint64_t x37 = (x24 + (x23 + (x22 + (x21 + (x20 + (x19 + (x18 + x17)))))));
+  uint64_t x38 = (x37 & UINT64_C(0xffffffffffffffff));
+  uint64_t x39 = (x36 & UINT64_C(0xffffffffffffffff));
+  out1[0] = x34;
+  out1[1] = x38;
+  out1[2] = x39;
+  out1[3] = x35;
+}
+