Fix missing vzeroupper in poly_Rq_mul()

poly_Rq_mul() uses ymm registers, so it must execute vzeroupper before
returning. Otherwise the dirty upper halves of the ymm registers cause
SSE/AVX transition penalties that slow down subsequent legacy SSE code.

Change-Id: Id85e4ede05c612e0edf4c92a298531dd4c358bf4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/77229
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
diff --git a/crypto/hrss/asm/poly_rq_mul.S b/crypto/hrss/asm/poly_rq_mul.S
index 2b99d0e..abbc4e3 100644
--- a/crypto/hrss/asm/poly_rq_mul.S
+++ b/crypto/hrss/asm/poly_rq_mul.S
@@ -8475,6 +8475,7 @@
 vpaddw 2752(%r8), %ymm11, %ymm11
 vpand mask_mod8192(%rip), %ymm11, %ymm11
 vmovdqu %ymm11, 1320(%rdi)
+vzeroupper
 pop %r12
 .cfi_restore r12
 pop %rbp