delocation: large memory model support.

Large memory models on x86-64 allow the code/data of a shared object /
executable to be larger than 2GiB. This is typically impossible because
x86-64 code frequently uses int32 offsets from RIP.

Consider the following program:

    int getpid();

    int main() {
        return getpid();
    }

This is turned into the following assembly under a large memory model:

.L0$pb:
	leaq	.L0$pb(%rip), %rax
	movabsq	$_GLOBAL_OFFSET_TABLE_-.L0$pb, %rcx
	addq	%rax, %rcx
	movabsq	$getpid@GOT, %rdx
	xorl	%eax, %eax
	jmpq	*(%rcx,%rdx)            # TAILCALL

And, with relocations:

   0:	48 8d 05 f9 ff ff ff 	lea    -0x7(%rip),%rax        # 0 <main>
   7:	48 b9 00 00 00 00 00 	movabs $0x0,%rcx
   e:	00 00 00
			9: R_X86_64_GOTPC64	_GLOBAL_OFFSET_TABLE_+0x9
  11:	48 01 c1             	add    %rax,%rcx
  14:	48 ba 00 00 00 00 00 	movabs $0x0,%rdx
  1b:	00 00 00
			16: R_X86_64_GOT64	getpid
  1e:	31 c0                	xor    %eax,%eax
  20:	ff 24 11             	jmpq   *(%rcx,%rdx,1)

We can see that, in the large memory model, function calls involve
loading the address of _GLOBAL_OFFSET_TABLE_ (using `movabs`, which
takes a 64-bit immediate) and then indexing into it. Both cause
relocations.

If we link the binary and disassemble we get:

0000000000001120 <main>:
    1120:	48 8d 05 f9 ff ff ff 	lea    -0x7(%rip),%rax        # 1120 <main>
    1127:	48 b9 e0 2e 00 00 00 	movabs $0x2ee0,%rcx
    112e:	00 00 00
    1131:	48 01 c1             	add    %rax,%rcx
    1134:	48 ba d8 ff ff ff ff 	movabs $0xffffffffffffffd8,%rdx
    113b:	ff ff ff
    113e:	31 c0                	xor    %eax,%eax
    1140:	ff 24 11             	jmpq   *(%rcx,%rdx,1)

Thus the _GLOBAL_OFFSET_TABLE_ symbol is at 0x1120+0x2ee0 = 0x4000.
That's the address of the .got.plt section. But the offset “into” the
table is -0x40, putting it at 0x3fd8, in .got:

Idx Name          Size      VMA               LMA               File off  Algn
 18 .got          00000030  0000000000003fd0  0000000000003fd0  00002fd0  2**3
 19 .got.plt      00000018  0000000000004000  0000000000004000  00003000  2**3

And, indeed, there's a dynamic relocation to setup that address:

OFFSET           TYPE              VALUE
0000000000003fd8 R_X86_64_GLOB_DAT  getpid@GLIBC_2.2.5

Accessing data or BSS works the same: the address of the variable is
stored relative to _GLOBAL_OFFSET_TABLE_.

This is a bit of a pain because we want to delocate the module into a
single .text segment so that it moves through linking unaltered. If we
took the obvious path and built our own offset table then it would need
to contain absolute addresses, but they are only available at runtime
and .text segments aren't supposed to be run-time patched. (That's why
.rela.dyn is a separate segment.) If we use a different segment then
we have the same problem as with the original offset table: the offset
to the segment is unknown when compiling the module.

Trying to pattern match this two-step lookup to do extensive rewriting
seems fragile: I'm sure the compilers will move things around and
interleave other work in time, if they don't already.

So, in order to handle movabs trying to load _GLOBAL_OFFSET_TABLE_ we
define a symbol in the same segment, but outside of the hashed region of
the module, that contains the offset from that position to
_GLOBAL_OFFSET_TABLE_:

.boringssl_got_delta:
    .quad _GLOBAL_OFFSET_TABLE_-.boringssl_got_delta

Then a movabs of $_GLOBAL_OFFSET_TABLE_-.Lfoo turns into:

movq .boringssl_got_delta(%rip), %destreg
addq $.boringssl_got_delta-.Lfoo, %destreg

This works because it's calculating
_GLOBAL_OFFSET_TABLE_ - got_delta + (got_delta - .Lfoo)

When that value is added to .Lfoo, as the original code will do, the
correct address results. Also it doesn't need an extra register because
we know that 32-bit offsets are sufficient for offsets within the
module.

As for the offsets within the offset table, we have to load them from
locations outside of the hashed part of the module to get the
relocations out of the way. Again, no extra registers are needed.

Change-Id: I87b19a2f8886bd9f7ac538fd55754e526bcf3097
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/42324
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
6 files changed
tree: b20532dff2dc62be730ea76d6c40ad6cde1f755d
  1. .github/
  2. crypto/
  3. decrepit/
  4. fuzz/
  5. include/
  6. ssl/
  7. third_party/
  8. tool/
  9. util/
  10. .clang-format
  11. .gitignore
  12. API-CONVENTIONS.md
  13. BREAKING-CHANGES.md
  14. BUILDING.md
  15. CMakeLists.txt
  16. codereview.settings
  17. CONTRIBUTING.md
  18. FUZZING.md
  19. go.mod
  20. go.sum
  21. INCORPORATING.md
  22. LICENSE
  23. PORTING.md
  24. README.md
  25. SANDBOXING.md
  26. sources.cmake
  27. STYLE.md
README.md

BoringSSL

BoringSSL is a fork of OpenSSL that is designed to meet Google's needs.

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

Programs ship their own copies of BoringSSL when they use it and we update everything as needed when deciding to make API changes. This allows us to mostly avoid compromises in the name of compatibility. It works for us, but it may not work for you.

BoringSSL arose because Google used OpenSSL for many years in various ways and, over time, built up a large number of patches that were maintained while tracking upstream OpenSSL. As Google's product portfolio became more complex, more copies of OpenSSL sprung up and the effort involved in maintaining all these patches in multiple places was growing steadily.

Currently BoringSSL is the SSL library in Chrome/Chromium, Android (but it's not part of the NDK) and a number of other apps/programs.

Project links:

There are other files in this directory which might be helpful: