Add support for SIKE/p503 post-quantum KEM

Based on Microsoft's implementation available on github:
Source: https://github.com/Microsoft/PQCrypto-SIDH
Commit: 77044b76181eb61c744ac8eb7ddc7a8fe72f6919

Following changes has been applied

* In intel assembly, use MOV instead of MOVQ:
  Intel instruction reference in the Intel Software Developer's Manual
  volume 2A, the MOVQ has 4 forms. None of them mentions moving
  literal to GPR, hence "movq $rax, 0x0" is wrong. Instead, on 64bit
  system, MOV can be used.

* Some variables were wrongly zero-initialized (as per C99 spec).

* Rewrite x86_64 assembly to AT&T format.

* Move assembly for x86_64 and aarch64 to perlasm.

* Changes to aarch64 assembly, to avoid using x18 platform register.
  Assembly also correctly constructs linked list of stack-frames as
  described in AAPCS64, 5.2.3.

* Move constant values to .RODATA segment, as keeping them in .TEXT
  segment is not compatible with XOM.

* Fixes issue in arm64 code related to the fact that compiler doesn't
  reserve enough space for the linker to relocate address of a global
  variable when used by 'ldr' instructions. Solution is to use 'adrp'
  followed by 'add' instruction. Relocations for 'adrp' and 'add'
  instructions is generated by prefixing the label with :pg_hi21:
  and :lo12: respectively.

* Enable MULX and ADX. Code from MS doesn't support PIC. MULX can't
  reference global variable directly. Instead RIP-relative addressing
  can be used. This improves performance around 10%-13% on SkyLake

* Check if CPU supports BMI2 and ADOX instruction at runtime. On AMD64
  optimized implementation of montgomery multiplication and reduction
  have 2 implementations - faster one takes advantage of BMI2
  instruction set introduced in Haswell and ADOX introduced in
  Broadwell. Thanks to OPENSSL_ia32cap_P it can be decided at runtime
  which implementation to choose. As CPU configuration is static by
  nature, branch predictor will be correct most of the time and hence
  this check very often has no cost.

* Reuse some utilities from boringssl instead of reimplementing them.
  This includes things like:
  * definition of a limb size (use crypto_word_t instead of digit_t)
  * use functions for checking in constant time if value is 0 and/or
    less then
  * #define's used for conditional compilation

* Use SSE2 for conditional swap on vector registers. Improves
  performance a little bit.

* Fix f2elm_t definition. Code imported from MSR defines f2elm_t type as
  a array of arrays. This decays to a pointer to an array (when passing
  as an argument). In C, one can't assign const pointer to an array with
  non-const pointer to an array. Seems it violates 6.7.3/8 from C99
  (same for C11). This problem occures in GCC 6, only when -pedantic
  flag is specified and it occures always in GCC 4.9 (debian jessie).

* Fix definition of eval_3_isog. Second argument in eval_3_isog mustn't be
  const. Similar reason as above.

* Use HMAC-SHA256 instead of cSHAKE-256 to avoid upstreaming cSHAKE
  and SHA3 code.

* Add speed and unit tests for SIKE.

Some speed results:

Skylake (64-bit):

Did 408 SIKE/P503 generate operations in 1002573us (407.0 ops/sec)
Did 275 SIKE/P503 encap operations in 1070570us (256.9 ops/sec)
Did 264 SIKE/P503 decap operations in 1098955us (240.2 ops/sec)

Skylake (32-bit):

Did 9 SIKE/P503 generate operations in 1051620us (8.6 ops/sec)
Did 5 SIKE/P503 encap operations in 1038251us (4.8 ops/sec)
Did 5 SIKE/P503 decap operations in 1103617us (4.5 ops/sec)

Change-Id: I22f0bb1f9edff314a35cd74b48e8c4962568e330
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35204
Reviewed-by: Adam Langley <alangley@gmail.com>
16 files changed
tree: f68bcb6c62b28c8d8e05d89f7e1fb531a2f36950
  1. .clang-format
  2. .github/
  3. .gitignore
  4. API-CONVENTIONS.md
  5. BREAKING-CHANGES.md
  6. BUILDING.md
  7. CMakeLists.txt
  8. CONTRIBUTING.md
  9. FUZZING.md
  10. INCORPORATING.md
  11. LICENSE
  12. PORTING.md
  13. README.md
  14. STYLE.md
  15. codereview.settings
  16. crypto/
  17. decrepit/
  18. fipstools/
  19. fuzz/
  20. go.mod
  21. include/
  22. sources.cmake
  23. ssl/
  24. third_party/
  25. tool/
  26. util/
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

There are other files in this directory which might be helpful: