Clarify origin of optimized computation of Montgomery n0.

An earlier reference, Dussé and Kaliski's "A Cryptographic Library for
the Motorola DSP56000", describes an algorithm for the optimized
computation of n0 that is very similar to the one in the "Montgomery
Multiplication" paper cited in the comments. Add a reference to it.

Henry S. Warren, Jr. pointed out that his "Montgomery Multiplication"
paper is not a chapter of his book, but a supplement to the book.
Correct the reference to it.
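
For context, the computation that both references describe can be
sketched roughly as follows. This is a minimal illustration of the
xbinGCD-style approach for r = 2^64, not the code in montgomery_inv.c:
the function name neg_inv_mod_r_sketch is hypothetical, and unlike the
real implementation this sketch branches on its data, so it is not
constant-time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only. Returns v such that
 * u*2^64 - v*n == 1 for some u, i.e. v == -n^(-1) mod 2^64.
 * |n| must be odd. Not constant-time. */
static uint64_t neg_inv_mod_r_sketch(uint64_t n) {
  assert(n & 1);

  const uint64_t alpha = UINT64_C(1) << 63; /* r/2, where r = 2^64 */
  const uint64_t beta = n;
  uint64_t u = 1, v = 0;

  /* Invariant: after i iterations, 2^(64-i) == u*2*alpha - v*beta.
   * While i < 64 the left side is even and beta is odd, so v is even
   * and the shifts of v below are exact. */
  for (int i = 0; i < 64; i++) {
    if ((u & 1) == 0) {
      /* u is even: halve both coefficients. */
      u >>= 1;
      v >>= 1;
    } else {
      /* u = (u + beta) / 2, computed without overflowing 64 bits. */
      u = ((u ^ beta) >> 1) + (u & beta);
      v = (v >> 1) + alpha;
    }
  }

  /* Now u*2^64 - v*n == 1, so v*n == -1 (mod 2^64). */
  return v;
}
```

For example, neg_inv_mod_r_sketch(3) yields 0x5555555555555555, since
3 * 0x5555555555555555 + 1 wraps to 0 modulo 2^64.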

Change-Id: Iadeb148c61ce646d1262ccba0207a31ebdad63e9
Reviewed-on: https://boringssl-review.googlesource.com/10480
Reviewed-by: Adam Langley <agl@google.com>
diff --git a/crypto/bn/montgomery_inv.c b/crypto/bn/montgomery_inv.c
index ba085ab..28db62b 100644
--- a/crypto/bn/montgomery_inv.c
+++ b/crypto/bn/montgomery_inv.c
@@ -83,9 +83,11 @@
  * such that u*r - v*n == 1. |r| is the constant defined in |bn_mont_n0|. |n|
  * must be odd.
  *
- * This is derived from |xbinGCD| in the "Montgomery Multiplication" chapter of
- * "Hacker's Delight" by Henry S. Warren, Jr.:
- * http://www.hackersdelight.org/MontgomeryMultiplication.pdf.
+ * This is derived from |xbinGCD| in Henry S. Warren, Jr.'s "Montgomery
+ * Multiplication" (http://www.hackersdelight.org/MontgomeryMultiplication.pdf).
+ * It is very similar to the MODULAR-INVERSE function in Stephen R. Dussé's and
+ * Burton S. Kaliski Jr.'s "A Cryptographic Library for the Motorola DSP56000"
+ * (http://link.springer.com/chapter/10.1007%2F3-540-46877-3_21).
  *
  * This is inspired by Joppe W. Bos's "Constant Time Modular Inversion"
  * (http://www.joppebos.com/files/CTInversion.pdf) so that the inversion is