The first programs were written in binary/hexadecimal, and only later did we invent programming languages and compilers to translate human-readable code into binary machine code.
So why can’t we just do the same thing in reverse? I hear a lot about devices, from audio streamers to footwear, rendered useless by abandonware. Couldn’t a very smart person (or AI) just take the existing program and turn it back into source code?
It is not. idk who told you it was.
Disassembling an executable is trivial to do. Everything is open source if you can read assembly. Obfuscation be damned.
The hard part isn’t reading assembly. The hard part is figuring out why it’s doing what it’s doing with no comments or function names or anything useful to help.
This is like saying if you can read English you can understand an advanced math or physics paper written in English without having any knowledge or context of those subjects.
I’ve used a decompiler to peek at the source code of an app written in Visual Basic I wanted to recreate as a browser addon. It was mostly successful but some variable and function names were messed up.
Variable names, class names, package structure, method names, etc. won’t normally be preserved in the disassembled code. They are meaningless to the CPU, which only sees a series of memory addresses. In cases where you do see a method name, it’s likely a call into an existing library (not a syscall, which is a request to the OS kernel), because the name has to survive for the call to be resolved. I’m not familiar with VB, but at least in .NET and .NET Framework, this would be something like System.Collections.Generic providing the implementation for List<string>: when .Sort() is called, the call goes into that compiled .dll.
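To make the "names are meaningless to the CPU" point concrete, here is a toy sketch in Python. It is not a real compiler or decompiler: StripNames and strip_names are made-up helpers that just replace every identifier with a numbered slot, roughly the way a compiler replaces names with addresses. The point is only that two differently named programs end up indistinguishable afterwards.

```python
import ast


class StripNames(ast.NodeTransformer):
    """Toy stand-in for a compiler: rename every identifier to v0, v1, ...
    in order of first appearance (hypothetical helper, not a real tool)."""

    def __init__(self):
        self.slots = {}

    def visit_Name(self, node):
        # Reuse the slot if we have seen this name before, else allocate one.
        slot = self.slots.setdefault(node.id, f"v{len(self.slots)}")
        return ast.copy_location(ast.Name(id=slot, ctx=node.ctx), node)


def strip_names(source: str) -> str:
    """Return the source with all identifiers replaced by numbered slots."""
    tree = StripNames().visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse needs Python 3.9+


readable = "price = 20\ntax = price * 0.2\ntotal = price + tax"
cryptic = "a = 20\nb = a * 0.2\nc = a + b"

print(strip_names(readable))
print(strip_names(readable) == strip_names(cryptic))
```

Both snippets come out as the same text, so nothing in the stripped version tells you the original was about prices and tax. A decompiler is in the same position: it can give you the structure back, but the names have to be guessed by a human.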
You could chuck it at an AI to reverse compile it into something readable.
Instead of just getting the downvotes, I’ll explain why that wouldn’t work.
Lastly, other comments have pointed out decompiled code is extremely expensive to analyze. The output from decompiling any sizable program would easily exceed the input limits of existing LLMs.
Thanks. I was thinking that you could have an AI “looking over the shoulder” of a compiler, seeing what comes out for the code going into it. Basically training it to spot sequences in compiled code in order to guess the instructions that compiled into that code.
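A rough sketch of what that "looking over the shoulder" data collection could look like, using Python's own bytecode as a stand-in for machine code (an assumption made for convenience; real native code is much messier). compile_pair and the samples list are made up for illustration, and this only builds the (source, compiled) pairs, it doesn't train anything.

```python
import dis


def compile_pair(source: str):
    """Compile one snippet and return (source, list of opcode names)."""
    code = compile(source, "<sample>", "exec")
    opnames = [ins.opname for ins in dis.get_instructions(code)]
    return source, opnames


# Hypothetical training samples: source going in, opcodes coming out.
samples = ["x = 1 + 2", "y = [n * n for n in range(3)]", "print('hi')"]
dataset = [compile_pair(s) for s in samples]

for src, ops in dataset:
    print(repr(src), "->", ops)
```

Even in this tiny case the mapping isn't one-to-one: CPython will typically fold 1 + 2 into a constant at compile time, so the addition never shows up in the output, and the exact opcodes differ between Python versions. A model trained on pairs like these can only guess at a plausible source, not recover the original.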
Well, decompiling is only one step in the reverse engineering process. I would recommend taking a look at the Legend of Zelda: Ocarina of Time decompilation project. They reverse engineered the whole thing, which took years and was a team effort.
In the end they got perfectly readable source code, fully documented. And the most amazing thing is, when compiled with the right compiler and the right flags, it recreates the original ROM perfectly.
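Checking that kind of "matching" build boils down to comparing the rebuilt file against the original byte for byte, usually via a hash. A minimal sketch, assuming you have both files on disk (sha1_of and builds_match are made-up helper names, and this is not the project's actual tooling):

```python
import hashlib
from pathlib import Path


def sha1_of(path) -> str:
    """Hash the whole file; fine for ROM-sized files that fit in memory."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def builds_match(original, rebuilt) -> bool:
    """True only if the rebuilt file is byte-identical to the original."""
    return sha1_of(original) == sha1_of(rebuilt)
```

You would call it as something like builds_match("original.z64", "build/rebuilt.z64"), where both paths are placeholders. A single different byte changes the hash, which is why the compiler version and flags have to be exactly right.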
I would also recommend a YouTuber called Kaze. He’s been working on Mario 64 for years, re-writing large parts of the engine to get some pretty cool stuff going.