Wanted to give a little update.

LD64: (1) I've found that with a few small changes to the code, LD64 no longer needs OpenSSL on OS X 10.4+, which greatly simplifies the dependency tree.

Attached is my patch.

(2) From looking at the change log, it seems that GCC 4.0+ anonymous symbols were removed in an earlier version, so it is no wonder that I was having problems with some binaries. I did some preliminary work on at least fixing the Bus Error, but this did not completely solve the issue since the files that caused the Bus Error ended up corrupted and won't load (attached is a separate patch in this regard just in case it may be helpful as a clue).

My guess is that this issue is caused by the removal of GCC 4.0 support, but I haven't expended the effort to look deeper into the problem since it seems that it is not an issue for others (who are using newer compilers) and for myself I've already found a different solution. But, if there is a huge interest, I may revisit re-adding the GCC 4 support back to LD64-97 at a later date and looking deeper into that Bus Error bug if that doesn't completely fix the problem.

However, for my own purposes, since I'm still using GCC 4.0 and just wanted 64-bit PowerPC memory support, I switched to version 62.1 of LD64, which works well and I was able to compile and run TenFourKit.

(3) I've also made a small patch (copy attached) to LD64 version 62 to remove all of the external dependencies. So if you download the official Apple version of LD64-62.1 and then apply my patch, you should be able to compile it as 64-bit PowerPC without issue and without external dependencies (if this is not the case, let me know).


OpenSSL: I've hit an important snag and think I will take a break for a bit.

There is clearly an issue with OpenSSL's assembler code as I've described with the .L/L local symbols. In the GNU AS manual (all versions from 1994-present), it suggests pretty clearly that these are intended for GCC (and other similar tools) and not really for hand-written assembler.

And for those using PowerPC Linux and BSD be forewarned that this problem in the assembler optimizations may affect your system, too (!) as described below, but as you will also see further below there's a major snag to fixing it...

The problems are technical but simple: The .L/L local symbols are resolved to addresses and then discarded by GNU AS (i.e. address translation happens) before code segments are relocated. All versions of the GNU AS manual I've seen also warn that actual machine code may appear in a different order than what's written in the assembler files without making any guarantees or promises about whether/when this will occur. This causes a problem when the branch instruction is in a different code segment than the offset it's branching to.

For example:

Code:
02A   Ltest: move D0,D1
.....
300   bra Ltest
....

Gets translated to:

Code:
02A  move D0, D1
.....
300  bra #$FD28(PC)  // i.e. 02A

However, if "Ltest:" and "bra Ltest" are in different sections and relocated after address translation, you end up with something like:

Code:
02A  ori #$00, D0   // i.e. some other block of code or even non-code (may not necessarily be "ori #$00,D0" - this is just an example)
.....
300  bra #$FD28(PC)     // i.e. 02A
....
34A  move D0, D1   // where it should branch

Casually, I'd like to call this a "shearing" of the branch since both the branch and its destination are "sheared" by moving relatively in opposite directions from one another.

And since the symbols are discarded by GNU AS (and even LD if you happened to use a different assembler), there is no way for the assembler / linker / OS to fix the broken reference.
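To make the "shearing" concrete, here is a minimal sketch in Python (not real assembler or linker code). It reuses the addresses from the example above, and it ignores the 68K detail that the displacement is measured from PC+2. The function and variable names are my own, purely for illustration.

```python
# Hypothetical addresses taken from the example in this post. The 68K detail
# that the displacement is measured from PC+2 is ignored to keep this short.

def displacement(branch_addr, target_addr):
    """PC-relative distance that gets baked into the branch instruction."""
    return target_addr - branch_addr

# At assembly time the local label is resolved and then thrown away.
branch_addr, target_addr = 0x300, 0x02A
baked_in = displacement(branch_addr, target_addr)

# Later the target's section is moved on its own. With no symbol left,
# nothing knows that the baked-in displacement has to change.
moved_target_addr = 0x34A
lands_at = branch_addr + baked_in                      # where the CPU really goes
needed = displacement(branch_addr, moved_target_addr)  # what it should have been

print(hex(lands_at), hex(moved_target_addr), needed - baked_in)
```

The branch still lands at the old address, and the error is exactly the distance the target moved, which is the information that was lost along with the symbol.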

However, two segments are not always independently relocated, so sometimes you will end up with the code branching where it's supposed to and other times you won't. This inconsistency is why the OpenSSL team hasn't noticed a problem (yet) in PowerPC Linux.

And since the .L/L local symbols are largely undocumented, I've come to the conclusion from talking with the OpenSSL team and through my own compilation and disassembly that the warnings provided by GNU AS are minimal if there actually are any at all (since they are not expecting you to use them and GCC outputs GNU AS errors to the Terminal).

This is of course aside from any other potential issues which may exist from using the .L/L local symbols, since much of their behaviour is left undocumented in the GNU AS manual, or other issues that have yet to be found in the OpenSSL assembler code (considering the circumstances and its implications on code quality).

There are more .L/L local symbol branches than the ones I've covered in this thread so far. But to be clear, it is only the OpenSSL assembler optimizations which are affected, and these can be turned off at compilation time using the "no-asm" configuration parameter.

For reference, my proxy program is compiled with "no-asm" and using 32-bit PowerPC only and thus does not use the OpenSSL assembler optimizations.

The snag to fixing OpenSSL's code:

When I tried to communicate the problem to the OpenSSL team, the response missed the mark. The person reviewing the code and I are at a small impasse: he doesn't want to make the change to the main branch, and he has declined to have a technical discussion on the issue or to explain why he wants to keep the local symbols (or even use them in the first place). His only comment was about "not wanting to pollute the code with a global symbol", which seems to be a possible misunderstanding on his part about what local symbols, local labels, and non-local symbols are and how they all work.

His main response is that he doesn't want to change it for anything but OS X because he hasn't "noticed a problem" in PowerPC Linux (despite him noting that he hasn't necessarily tested every line of code) and other various statements which show to me that while he may be a very talented cryptographer and likely a good C/C++ programmer, he simply doesn't understand the issue and doesn't seem receptive.

He also hasn't seemed receptive to alternatives that would accommodate his view of symbols.

Nonetheless, to his full credit he is at least not opposed to having a fix for OS X 10.4 specifically, although a CLA would need to be signed and he seems to say that it would depend on also accepting the mpsuzuki patches (which were submitted in 2022(!) and thus their status is not clear to me).

Considering their response, I'm not confident in the quality of OpenSSL's PowerPC assembler code (including on Linux) and don't have the stamina to perform a full review of it.

So, I've determined that it is probably the safest just to turn off the assembler optimizations for now and worry about them later. After all, it's clear their strongest language is C/C++ and for something where security is paramount, you want the most secure version, even if it's at the expense of a little speed.

However, that said, and on the positive, I note that:
(a) the assembler optimizations don't yield that much of a performance benefit (at least not according to the code comments), so we aren't losing much;
(b) OpenSSL is no longer needed for building LD (with the above minor change);

And again (for those like me who are using it):
(c) my A.proxy program's included OpenSSL doesn't use the assembler optimizations at all (it's compiled with no-asm), so it should be unaffected.

Noting that we now seem to have a solution for 64-bit PowerPC LD64 (for Tiger and Leopard) and that OpenSSL works best with the assembler optimizations turned off, I think I'm now going to shift gears and see if I can fix MacIPX in the Classic Environment (for Command and Conquer). If anybody is interested in the MacIPX project, I've been mostly posting in this thread:

 

Attachments

  • ld64-62.patch.txt (7.9 KB)
  • ld64-97-re_bus_error.patch.txt (1.3 KB)
  • ld64-97-remove_openssl.patch.txt (2 KB)
Sorry, I haven't been following the GCC thread.

From what I see in the GNU thread, it sounds like it's a missing feature in LD64-97, which could potentially be retrofitted.
 
To clarify for others, the problem with compiling GCC 14 sounds like it is an issue with a displacement branch instruction not being long enough to reach the code, specifically a branch from __ZN10hash_tableI19default_hash_traitsIP11cgraph_edgeELb0E11xcallocatorED1Ev to __Z8ggc_freePv .

To illustrate this problem, using an example, consider the following PowerPC assembly code (with relative addresses on the left):

Code:
0000:0000  b  label
.....
00DF:FF00 label: sc

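When translating to machine code, the addresses for the branch and the label need to be determined. This is because the CPU does not understand "b label", but instead needs it to be converted to an instruction like "b +/-displacement" (010010 xxxxxxxxxxxxxxxxxxxxxxxx 00), where 010010 is the 6-bit opcode, xxxx... is a 24-bit signed integer representing how far away the destination is from the branch, and the last two bits are the absolute-address and link flags. The displacement is counted in 4-byte words (the CPU appends two zero bits), so the reach is roughly -32 MB to +32 MB. (The addresses in my examples are only meant to illustrate the idea.)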

However, when "label" is further away than what will fit in the 24 bits, the branch cannot be translated using that instruction. Instead, a workaround needs to happen.
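As a rough illustration of the encoding and its limit, here is a small Python sketch. It assumes the standard PowerPC I-form layout (6-bit opcode 18, 24-bit word displacement, then the AA and LK bits). The helper names in_range and encode_b are made up for this post and are not taken from LD64.

```python
B_OPCODE = 18              # primary opcode shared by b/bl (I-form branch)
LI_MASK = 0x03FFFFFC       # the 24-bit displacement field, already word-aligned
REACH = 1 << 25            # 2**23 words * 4 bytes = 32 MB in each direction

def in_range(disp):
    """True if a byte displacement fits in the 24-bit signed word field."""
    return disp % 4 == 0 and -REACH <= disp < REACH

def encode_b(disp, link=False):
    """Encode a relative 'b' (or 'bl' when link=True) for a byte displacement."""
    if not in_range(disp):
        raise ValueError("branch out of range: %#x" % disp)
    return (B_OPCODE << 26) | (disp & LI_MASK) | (1 if link else 0)

print(hex(encode_b(8)))              # branch forward by two instructions
print(hex(encode_b(-4)))             # branch back by one instruction
print(in_range(0x01FFFFFC), in_range(0x02000000))
```

Anything for which in_range returns False is the case where the linker has to fall back to one of the workarounds.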

One simple option could be to have a branch(es) in the middle that simply "trampoline" the branch to the proper location, such as:

Code:
0000:0000  b  trampoline
......
007F:FF00   trampoline: b label
.....
00DF:FF00   label: sc
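For anyone who wants to play with the trampoline idea, here is a minimal Python sketch that works out how many intermediate branches would be needed. It assumes a reach of +/- 32 MB per branch and simply steps as far as possible each time. It does not check whether there is actually free space at those addresses, which a real linker would have to do. The names reachable and island_chain are hypothetical.

```python
REACH = 1 << 25   # assumed reach of one relative branch, in bytes

def reachable(src, dst):
    """True if a single relative branch at src can reach dst."""
    return -REACH <= dst - src < REACH

def island_chain(src, dst):
    """Addresses of the trampolines needed between src and dst (may be empty)."""
    hops = []
    cur = src
    while not reachable(cur, dst):
        # Step as far as one branch allows, in the direction of the target.
        cur += (REACH - 4) if dst > cur else -REACH
        hops.append(cur)
    return hops

print([hex(a) for a in island_chain(0x00000000, 0x03000000)])
print([hex(a) for a in island_chain(0x05000000, 0x00000000)])
```

A target that is already in reach gives an empty list, so the same routine covers the normal case too.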

Another option could be to use a register. On X86 and M68K there are many instructions which can do this. However, since PowerPC is a RISC instruction set, there only seem to be two instructions which allow for this: one is branch to link register (blr) and the other is branch to count register (bctr).

Such a solution would look something like (in this case using the count register):

Code:
0000:0000
......   <--- insert code for saving any relevant register data here (if needed)
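1000:0000   lis  R12, 0x10DF        // upper 16 bits of the destination (R12 as scratch, since R1 is the stack pointer)
1000:0004   ori  R12, R12, 0xFF0C   // OR in the lower 16 bits (ori does not sign-extend, and lis must come first since it overwrites the whole register)
1000:0008   mtctr R12      // or mtlr and blr, which has the advantage that there's a mflr, but the disadvantage that you must save the link register and restore it after the branch.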
1000:000C   bctr
......
10DF:FF0C   label:
..... <-- insert code for restoring any relevant register data here (if needed)
10DF:FF1C   sc
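A 32-bit destination is normally built from two 16-bit halves, with lis loading the upper half and ori filling in the lower half. Here is a small Python sketch of that split. The lis and ori functions only model what the instructions do to a 32-bit register, and far_branch_sequence and the choice of r12 as the scratch register are my own assumptions for illustration.

```python
def split_address(addr):
    """Split a 32-bit address into the halves used by lis and ori."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF

def lis(imm):
    """Model of lis: immediate goes to the upper half, lower half is cleared."""
    return (imm << 16) & 0xFFFFFFFF

def ori(value, imm):
    """Model of ori: OR a zero-extended 16-bit immediate into the register."""
    return value | (imm & 0xFFFF)

def far_branch_sequence(addr, reg=12):
    """Assembler text for a register-indirect jump to addr via CTR."""
    hi, lo = split_address(addr)
    return [
        "lis r%d, 0x%04X" % (reg, hi),
        "ori r%d, r%d, 0x%04X" % (reg, reg, lo),
        "mtctr r%d" % reg,
        "bctr",
    ]

hi, lo = split_address(0x10DFFF0C)
print(hex(ori(lis(hi), lo)))
print("\n".join(far_branch_sequence(0x10DFFF0C)))
```

The order matters: running lis second would wipe out whatever was loaded into the lower half first.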

In either case, this is something that would need to be added to LD, but shouldn't be too difficult, just somewhat time consuming and meticulous work.

Alternatively (and much simpler), you could change the code of GCC to fix the problem so you don't need to rely on a branch to +/- 32 MB (especially since there only seems to be one file that's affected?):

You could figure out where in the source code the two symbols are, and make a copy of "__Z8ggc_freePv" using a different name and place it right beside (or at least closer) to the segment that branches to it. Alternatively, you could inline the code for __Z8ggc_freePv, if it is not too large and save a branch. This could even be done using a #define macro, which could allow you to keep the code itself in one location in case you wanted to modify it later.
 
Alternatively (and much simpler), you could change the code of GCC to fix the problem so you don't need to rely on a branch to +/- 32 MB

We probably could, but there are issues with that:

– [this is trivial:] the bug in the linker doesn’t vanish, so the problem will likely resurface elsewhere, and then something else has to be patched again;
– having a patched compiler makes debugging harder, since upstream would not expect such changes (and won’t bother to reproduce);
– I am just not qualified enough to mess with a primary compiler beyond something absolutely necessary and well-understood.

There is some chance the issue might be addressed in the new pre-release of Iain’s xtools; the problem though is that Iain does not have suitable hardware to test ppc64 builds, since his Quad died. I will try the new xtools soon and update on this.

P. S. If you have interest in the matter (and unless the new xtools fix the problem), it would be cool if you could take a look at the issue in detail and maybe help to fix it, whenever you have time. It is not something I can do myself :(
 
I've got quite a bit on my plate at the moment, but I can at the very least provide some more information about how to approach fixing it (in case anyone wants to give it a try in the meantime) and perhaps we can revisit after you know whether Iain's patched version solves the issue.

The code the linker would need to generate to patch the branches is very simple assembler code, similar to what I've posted above. However, the meticulous part is that patching the code may change the distance between other branches in-between, and so these would need to be adjusted/corrected and properly tested.

For instance, consider the following example:

Code:
0FFF:E000   alpha:  li   R3, 0x0002
......
1000:0000   beta:  b  kappa
......
1000:1000   gamma:  bctr
.....
10D8:EE00   delta:  blt  alpha
.....
10DF:FF0C   kappa:  sc

For the branch at beta to reach the code at kappa, you need to add a patch to the code. However, doing this will insert instructions between delta and alpha and hence change the distance.

This is why symbol tables are handy. As long as the branch and destination are in the symbol table at link-time, it should be relatively straightforward to locate and recalculate it without a need to disassemble code.
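Here is a minimal Python sketch of why the symbols help, using the labels from the example above. The tables and the island address 0x10002000 are hypothetical, and this is not how LD64 stores things internally. The point is only that once every branch is recorded as "from label X to label Y", the distances can be recomputed after code has been inserted.

```python
# Label addresses from the example above, and which label branches to which.
labels = {
    "alpha": 0x0FFFE000,
    "beta":  0x10000000,
    "gamma": 0x10001000,
    "delta": 0x10D8EE00,
    "kappa": 0x10DFFF0C,
}
branches = {"beta": "kappa", "delta": "alpha"}

def displacements(labels, branches):
    """Byte distance of every symbolic branch, recomputed from the labels."""
    return {src: labels[dst] - labels[src] for src, dst in branches.items()}

def insert_code(labels, at, size=4):
    """Shift every label at or after 'at' to make room for 'size' new bytes."""
    return {name: addr + size if addr >= at else addr
            for name, addr in labels.items()}

before = displacements(labels, branches)
shifted = insert_code(labels, at=0x10002000)   # assumed spot for one island
after = displacements(shifted, branches)

for src in branches:
    print(src, hex(before[src]), "->", hex(after[src]))
```

Both branches cross the inserted instruction, so both distances change by four bytes, one growing forwards and the other growing backwards.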

Doing this still requires a meticulous eye for detail and it still requires going through the LD64 code in detail to know how things are stored in memory and where things are manipulated, which can be time consuming (and in my case, I would still need to either get LD64-97 to support GCC 4 or install a newer compiler, to test with more than hand-written assembler code on my computer).

However, what happens to branches not in the symbol table?

There's a reason that GNU AS warns both not to use the .L/L local labels (which get discarded before reaching the linker) and that code may be relocated.

For instance, think about the branch at label gamma in the above example assembler code:

Unlike the other branches (where the branch distance is coded directly in the instruction), at gamma, bctr gets its destination from whatever CTR happens to be at run-time. Sometimes this can be determined in advance by tracing the code backwards (or starting backwards and tracing forwards). However, many other times it can be challenging to near-impossible for a computer program to figure out with 100% accuracy all of the possible branch destinations.

For instance, there are entire research papers published on jump table detection with varying levels of accuracy, but none of them yet seem to be 100% accurate at detecting all formats of standard jump tables, let alone when people do unexpected things with hand assembly:

Thus, it is perfectly reasonable for a linker to decline entertaining the idea of adjusting branches which are not in the symbol table (even if it may be possible to handle some of the simpler cases) to avoid misleading people on what to expect when they write such code. In any case, it is generally not recommended to write things like "b +0x200" directly in assembler rather than using labels.

Fortunately, the makers of GCC should be aware of this problem and are not as likely to generate code for the linker that is incompatible with whatever approach that GNU LD takes to handle these long branches in its PowerPC implementation.

If I were to approach this problem, I'd probably start with the following:

1. Locate where the warning message is printed in the code via grep.
2. See how it calculates the branch distance and see where the branch is stored in memory and see where in the code address translation occurs (which should all likely be close or at least on the same general call stack as the code that prints the warning).
3. See if you can determine the distance to the end of the code block (this would seem a reasonably safe position to put a branch island as long as it is within the +/- reach of the 24-bit signed displacement), which is probably stored in one of LD64's internal structures.
4. Write some code to determine if the distance from the branch island (immediately after the code block) to the destination is also within the +/- reach of the 24-bit signed displacement.
5. And if both 3 & 4 are true (i.e. the branch can be done in two 24-bit jumps), then you want the code to do the following:
(a) replace the original branch with a branch to where you want to place the next branch, i.e. just after the end of the code block (this can be done by taking the instruction and using an AND mask, such as FC000003 for b/bl, to keep the opcode and the AA/LK bits, and then taking the new byte displacement, AND masking it with 03FFFFFC, and OR masking it with the result).
(b) move the code after the end of the segment to make room for the branch island
(c) place a single branch island instruction right after the end of the code segment to link up with the destination
(d) adjust the offsets of any branches that cross over the branch island.
6. If it cannot be done in two 24-bit jumps, just throw an error for now and then worry about adding it at a later date, since it is more important to get the thing working perfectly first with a simple case and add to it than trying to do it all at once and potentially ending up with code that is harder to debug.
7. Test with some simple cases and if it works then incrementally move onto any further steps (e.g. testing with GCC and/or implementing more complicated branch islands for longer jumps and other branch formats like bcc).
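The core of steps 3 to 6 can be sketched in Python as follows. This is only a model on a byte buffer, not LD64 code. It assumes the island slot has already been made free (so it skips moving code and fixing up the other branches), it assumes big-endian 32-bit instructions, and every name in it is hypothetical.

```python
import struct

REACH = 1 << 25            # reach of a relative b/bl in bytes (+/- 32 MB)
KEEP_MASK = 0xFC000003     # keeps the opcode and the AA/LK bits
LI_MASK = 0x03FFFFFC       # the displacement field
PLAIN_B = 0x48000000       # 'b' with zero displacement, AA=0, LK=0

def fits(disp):
    """True if a byte displacement can be encoded in one b/bl."""
    return disp % 4 == 0 and -REACH <= disp < REACH

def retarget(insn, disp):
    """Return insn with its displacement field replaced by disp."""
    if not fits(disp):
        raise ValueError("displacement out of range: %#x" % disp)
    return (insn & KEEP_MASK) | (disp & LI_MASK)

def patch_via_island(code, branch_off, island_off, dest_off):
    """Route the branch at branch_off to dest_off through one island."""
    hop1 = island_off - branch_off
    hop2 = dest_off - island_off
    if not (fits(hop1) and fits(hop2)):
        # Step 6: give up for now rather than attempt longer chains.
        raise ValueError("cannot be done in two 24-bit jumps")
    (insn,) = struct.unpack_from(">I", code, branch_off)
    struct.pack_into(">I", code, branch_off, retarget(insn, hop1))
    # The island is always a plain 'b', so a 'bl' still returns to its caller.
    struct.pack_into(">I", code, island_off, retarget(PLAIN_B, hop2))

code = bytearray(16)
struct.pack_into(">I", code, 0, 0x48000001)   # a 'bl' whose target is unset
patch_via_island(code, branch_off=0, island_off=8, dest_off=12)
print(code.hex())
```

Starting with a tiny buffer like this makes it easy to check the patched words by hand before trying it on anything real.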

(Edited for clarity)
 