Import sha512-armv8.pl transforms from upstream NEON code.

We currently have two aarch64 SHA-256 implementations: one using
general-purpose registers and one using the SHA-256 extensions.
Upstream's 866e505e0d663158b0fe63a7fb7455eebacc6470 added a NEON
version.

This CL syncs the transforms at the bottom of the file, to avoid
potential mistranslations in future imports. It doesn't change the
output for our current assembly.

Skips the NEON implementation itself for now. It only helps
processors without SHA-256 instructions. While Android does not
actually mandate the cryptography extensions on ARMv8, most devices
have it.

Additionally, this file does CPU dispatch in assembly, without taking
advantage of static information. We'd end up shipping both fallback
SHA-256 implementations. This is particularly silly because NEON is
mandatory in ARMv8-A anyway. (Does anyone build us on -R or -M? Probably
not?)

(If we later have a reason to import it, the binary size cost isn't that
significant. Moreover, the NEON fallback is actually slightly smaller
than the non-NEON fallback, so if we move CPU dispatch to C, importing
may even be worthwhile.)

Change-Id: I3c8ca6e77e4e6d1299f975c407cbcf4c9c240523
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/50805
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
diff --git a/crypto/fipsmodule/sha/asm/sha512-armv8.pl b/crypto/fipsmodule/sha/asm/sha512-armv8.pl
index ae803a9..01154c2 100644
--- a/crypto/fipsmodule/sha/asm/sha512-armv8.pl
+++ b/crypto/fipsmodule/sha/asm/sha512-armv8.pl
@@ -449,12 +449,15 @@
 
 foreach(split("\n",$code)) {
 
-	s/\`([^\`]*)\`/eval($1)/geo;
+	s/\`([^\`]*)\`/eval($1)/ge;
 
-	s/\b(sha256\w+)\s+([qv].*)/unsha256($1,$2)/geo;
+	s/\b(sha256\w+)\s+([qv].*)/unsha256($1,$2)/ge;
 
-	s/\.\w?32\b//o		and s/\.16b/\.4s/go;
-	m/(ld|st)1[^\[]+\[0\]/o	and s/\.4s/\.s/go;
+	s/\bq([0-9]+)\b/v$1.16b/g;		# old->new registers
+
+	s/\.[ui]?8(\s)/$1/;
+	s/\.\w?32\b//		and s/\.16b/\.4s/g;
+	m/(ld|st)1[^\[]+\[0\]/	and s/\.4s/\.s/g;
 
 	print $_,"\n";
 }