
Nermal

Moderator
Original poster
Staff member
Dec 7, 2002
21,188
4,850
New Zealand
This isn't strictly a Mac/Apple programming question, but there's no "generic" category!

I was introduced to Ghidra a few days ago. If I understand correctly, it can take a binary (whether for Windows, Mac OS, Linux or a handful of others) and disassemble it. It can then produce a C representation of the code. While the resulting code isn't particularly readable (since it's missing all the function and variable names, among other things) it's apparently possible to recompile the result back into a working binary.

I do have a couple of questions though. First and foremost, does anyone know whether it can produce C versions of binaries that weren't originally written in C? If something was originally produced in assembly, is Ghidra able to create equivalent C code? Or is it limited to binaries that came from a C compiler?

Secondly, is it possible to feed in an existing plain-text assembly file? From what I've seen so far, it'll produce an assembly listing from a binary, but what if I already have a listing?

I'm probably a little over my head here :)
 
I've not used Ghidra specifically, but I can still say a few things about this.

First, though it doesn't address either of your questions directly: a compiled binary can retain variable and function names, in which case a decompiler can show them. The code would still be very hard to read, but at least you'd have that help. Whether the names survive depends on how the binary was compiled in the first place (for example, whether the symbols were stripped).

Then to your questions,

1)
It is irrelevant what language it was originally written in. Once it's machine code, it's machine code, and working back to C from there doesn't depend on where it came from. (In theory you can also translate from one language to another anyway.) There are some caveats. If it's a non-compiled language or a VM language like Java, I doubt Ghidra can do much. There are also languages where the code relies heavily on the runtime; I don't know how something like Ghidra would react to that, or whether it would assume calls to the standard C library. Still, it should be possible to get C out no matter the initial language. The initial language shouldn't be visible in compiled machine code, unless we get into details like known calling conventions or familiar calls to library functions.

As for your second question, no clue, but you could always just run an assembler on the file first and then feed the resulting binary to Ghidra. It's only one more command to go through.
 
I realised that I forgot to explain what I was actually trying to accomplish.

I have access to some old (late 80s/early 90s) assembly code, and I was hoping that Ghidra (or a similar tool) would be able to produce a C version of it. Over time this could be refactored to something a bit more modern.

It's all Arm code and unfortunately Ghidra can't read Arm Image Format (AIF). It therefore appears that your "assemble it first" technique won't work, at least not using the original assembler. GCC might be able to make an ELF out of it though...

It looks like this could warrant a bit more investigation. Thanks for your comments :)
 
It's all Arm code and unfortunately Ghidra can't read Arm Image Format (AIF). It therefore appears that your "assemble it first" technique won't work, at least not using the original assembler. GCC might be able to make an ELF out of it though...

Are we sure Ghidra can disassemble ARM instructions in the first place?
 
It claims to handle Armv4 through v8.

Hm. Well if it doesn't take the AIF object files, maybe you could also try and look at a different disassembler that may just be able to deal with them directly.

I have never worked with ARM directly. I say directly because I've written Java-based Android apps, which run on ARM, as well as some iOS apps, but I've never dealt with ARM in any real way, so I'm a bit out of my depth on this one. I'm not sure you can easily just convert it to ELF. That said, I did once have a tool that could disassemble ELF and let me repackage it as Mach-O, so it might be possible to get it done semi-easily. I can't assist much more on this, I'm afraid, but I hope someone with more expertise can step in and help with the mission if you don't find a good path on your own :)
 
A Raspberry Pi is based on ARM, and they're pretty low cost and easy to bring up. If you can get the files onto it, even just copying them to the SD card before booting the Pi, then there's a lot you can accomplish using just an ssh command-line.

I have a couple of Pis here that I've been playing with, and the Buster Lite version of Raspbian with a wifi or ethernet connection and ssh gives a workable system. The executables are ELF files, e.g.
Code:
file /bin/sync
/bin/sync: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=811b9678ddb3b02e2ac9af07428df17902127eef, stripped
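
If you want to check the same thing without the file command, here's a rough Python sketch that reads the first few header bytes and reports roughly what file printed above. The function names (describe_elf, describe_file) are made up for illustration, and it only looks at the magic number, class, byte order and machine fields from the ELF specification. It won't recognise AIF or anything else, and it makes no claim about what Ghidra itself checks.

```python
import struct

# e_machine values from the ELF specification
EM_ARM = 40        # 32-bit ARM
EM_AARCH64 = 183   # 64-bit ARM


def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of a file, loosely like `file` does.

    Hypothetical helper for illustration only.
    """
    # Every ELF file starts with the 4-byte magic 0x7f 'E' 'L' 'F'.
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not ELF"

    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")

    # Byte 5 (EI_DATA): 1 = little-endian (LSB), 2 = big-endian (MSB).
    endian_code = header[5]
    order = {1: "LSB", 2: "MSB"}.get(endian_code, "unknown byte order")

    # e_type and e_machine are two 16-bit fields starting at offset 16.
    fmt = ">HH" if endian_code == 2 else "<HH"
    _e_type, e_machine = struct.unpack_from(fmt, header, 16)
    machine = {EM_ARM: "ARM", EM_AARCH64: "AArch64"}.get(
        e_machine, f"machine {e_machine}"
    )

    return f"ELF {bits} {order}, {machine}"


def describe_file(path: str) -> str:
    """Read just enough of the file at `path` to describe it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

On the Pi above, describe_file("/bin/sync") should report a 32-bit LSB ARM ELF, matching the first part of the file output.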
 