Document and test stance on non-canonical base64

From RFC 4648:

3.5. Canonical Encoding

   The padding step in base 64 and base 32 encoding can, if improperly
   implemented, lead to non-significant alterations of the encoded data.
   For example, if the input is only one octet for a base 64 encoding,
   then all six bits of the first symbol are used, but only the first
   two bits of the next symbol are used.  These pad bits MUST be set to
   zero by conforming encoders, which is described in the descriptions
   on padding below.  If this property do not hold, there is no
   canonical representation of base-encoded data, and multiple base-
   encoded strings can be decoded to the same binary data.  If this
   property (and others discussed in this document) holds, a canonical
   encoding is guaranteed.

   In some environments, the alteration is critical and therefore
   decoders MAY chose to reject an encoding if the pad bits have not
   been set to zero.  The specification referring to this may mandate a
   specific behaviour.

OpenSSL's decoder has always accepted non-canonical encodings, and
accepting them still appears to be the prevalent practice in 2024. In
particular, Go's encoding/base64 package requires you to opt into strict
mode (which encoding/pem does not use). Bouncy Castle and NSS also accept
such encodings.
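
To make the pad-bit issue concrete, here is a minimal sketch, not the
BoringSSL decoder, of decoding a final "xx==" quantum. The helper names
b64_value and decode_one_octet and the strict flag are hypothetical and
exist only for illustration. The point is that the low four bits of the
second symbol never reach the output, so a lenient decoder maps several
inputs to the same octet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: maps a base64 alphabet character to its 6-bit value,
// or returns -1 for any other character.
static int b64_value(char c) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const char *p = c != '\0' ? strchr(kAlphabet, c) : NULL;
  return p != NULL ? (int)(p - kAlphabet) : -1;
}

// Hypothetical helper: decodes a final quantum of the form "xx==" into one
// octet. Returns 1 on success and 0 on failure. If |strict| is non-zero, the
// four unused low bits of the second symbol must be zero (RFC 4648, 3.5).
static int decode_one_octet(const char *in, int strict, uint8_t *out) {
  int a = b64_value(in[0]);
  int b = b64_value(in[1]);
  if (a < 0 || b < 0 || in[2] != '=' || in[3] != '=') {
    return 0;
  }
  if (strict && (b & 0x0f) != 0) {
    return 0;  // Non-zero pad bits: the encoding is not canonical.
  }
  // Only the top two bits of the second symbol contribute to the octet.
  *out = (uint8_t)((a << 2) | (b >> 4));
  return 1;
}
```

With strict set to 0, "dw==", "dx==" and "d/==" all decode to 'w'. With
strict set to 1, only "dw==" is accepted.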

So add a comment to the code noting that this is a deliberate, if perhaps
begrudging, choice, and encode it in the regression tests with a few test
cases that are more obviously degenerate than the current non-canonical
forms.

Also, group the test vectors straight from RFC 4648 section 10 together.

Change-Id: Ibcc22b7feed86fe1cb0fd51a1d61ec0c60dc8672
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/68247
Auto-Submit: Theo Buehler <theorbuehler@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
diff --git a/crypto/base64/base64.c b/crypto/base64/base64.c
index 666f832..26ad974 100644
--- a/crypto/base64/base64.c
+++ b/crypto/base64/base64.c
@@ -307,6 +307,10 @@
                                    (in[2] == '=') << 1 |
                                    (in[3] == '=');
 
+  // In presence of padding, the lowest bits of v are unused. Canonical encoding
+  // (RFC 4648, section 3.5) requires that these bits all be set to zero. Common
+  // PEM parsers accept noncanonical base64, adding to the malleability of the
+  // format. This decoder follows OpenSSL's and Go's PEM parsers and accepts it.
   switch (padding_pattern) {
     case 0:
       // The common case of no padding.
diff --git a/crypto/base64/base64_test.cc b/crypto/base64/base64_test.cc
index 6484dc6..f246605 100644
--- a/crypto/base64/base64_test.cc
+++ b/crypto/base64/base64_test.cc
@@ -45,8 +45,8 @@
   const char *encoded;
 };
 
-// Test vectors from RFC 4648.
 static const Base64TestVector kTestVectors[] = {
+    // Test vectors from RFC 4648, section 10.
     {canonical, "", ""},
     {canonical, "f", "Zg==\n"},
     {canonical, "fo", "Zm8=\n"},
@@ -54,12 +54,31 @@
     {canonical, "foob", "Zm9vYg==\n"},
     {canonical, "fooba", "Zm9vYmE=\n"},
     {canonical, "foobar", "Zm9vYmFy\n"},
-    {valid, "foobar", "Zm9vYmFy\n\n"},
-    {valid, "foobar", " Zm9vYmFy\n\n"},
-    {valid, "foobar", " Z m 9 v Y m F y\n\n"},
+
     {invalid, "", "Zm9vYmFy=\n"},
     {invalid, "", "Zm9vYmFy==\n"},
     {invalid, "", "Zm9vYmFy===\n"},
+
+    // valid non-canonical encodings due to arbitrary whitespace
+    {valid, "foobar", "Zm9vYmFy\n\n"},
+    {valid, "foobar", " Zm9vYmFy\n\n"},
+    {valid, "foobar", " Z m 9 v Y m F y\n\n"},
+    {valid, "foobar", "Zm9vYmFy\r\n"},
+
+    // The following "valid" encodings are arguably invalid, but they are
+    // commonly accepted by parsers, in particular by OpenSSL.
+    {valid, "v", "dv==\n"},
+    {canonical, "w", "dw==\n"},
+    {valid, "w", "dx==\n"},
+    {valid, "w", "d+==\n"},
+    {valid, "w", "d/==\n"},
+    {invalid, "", "d===\n"},
+    {canonical, "w`", "d2A=\n"},
+    {valid, "w`", "d2B=\n"},
+    {valid, "w`", "d2C=\n"},
+    {valid, "w`", "d2D=\n"},
+    {canonical, "wa", "d2E=\n"},
+
     {invalid, "", "Z"},
     {invalid, "", "Z\n"},
     {invalid, "", "ab!c"},