DARPA: Translating All C to Rust (TRACTOR)

Aatube@kbin.melroy.org · 3 months ago

solrize@lemmy.world · 3 months ago

Maybe it would be easier to translate to Ada? That is, for C code that doesn’t make heavy use of malloc/free. The idea of Rust’s borrow checker, as I understand it, is to statically track references to malloc’d memory so that you never use-after-free or double-free. If your C code uses malloc in uncontrolled ways, then massaging it to satisfy a borrow checker sounds horribly difficult, and you should either give up or run it under a heavily instrumented environment like valgrind. If (as is typical of embedded code) it just works on some fixed memory buffers and doesn’t do much runtime allocation, then there isn’t much for a borrow checker to look after, so you can use a safe language (Ada) that doesn’t have borrow checking.
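
For anyone who hasn't seen the two cases side by side, here is a minimal Rust sketch. The names `consume` and `write_at` are made up for illustration and nothing here comes from TRACTOR. The first function takes ownership of a heap allocation, so using it afterwards is rejected at compile time; the second only touches a fixed-size buffer and relies on bounds checking rather than ownership tracking.

```rust
// Illustration only: `consume` and `write_at` are hypothetical names.

/// Takes ownership of a heap allocation. The memory is released when `buf`
/// goes out of scope at the end of this function, roughly where C would call `free`.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
}

/// Writes into a fixed-size buffer with no heap allocation involved.
/// `get_mut` returns `None` for an out-of-range index instead of writing past the end.
fn write_at(buf: &mut [u8; 8], index: usize, value: u8) -> bool {
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

fn main() {
    let heap = vec![1u8, 2, 3];
    let n = consume(heap);
    // Uncommenting the next line fails to compile because `heap` was moved
    // into `consume`; this is the compile-time analogue of a use-after-free.
    // println!("{}", heap.len());
    println!("consumed {n} bytes");

    let mut fixed = [0u8; 8];
    println!("in range: {}", write_at(&mut fixed, 3, 42));
    println!("out of range: {}", write_at(&mut fixed, 8, 42));
}
```

Even in the fixed-buffer case the language still does some work (the bounds check), which is the kind of safety Ada also provides without a borrow checker.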

Disclaimer: I don’t use Rust at the moment. Someday. I do like Ada despite its verbosity, but it’s not that great at managing dynamic memory. It is starting to take on Rust influences to help with that.

astronaut_sloth@mander.xyz · 3 months ago

I think this is an interesting idea. If they’re able to pull it off, I think it will cement the usefulness of LLMs. I have my doubts, but it’s worth trying. I’d imagine that the LLM is specially tuned to be more adept at this task. Your bog-standard GPT-4 or Claude will probably be unreliable.

ByteOnBikes@slrpnk.net · 3 months ago

Having built code converters that auto-migrate code to a later version of the same language, I’m incredibly worried. We still had to manually verify everything.

I’m hopeful though that this does become the wave of the future. There’s some serious legacy shit out there that doesn’t have enough of a financial gain to revisit and rewrite.

astronaut_sloth@mander.xyz · 3 months ago

Yeah, they’ll probably have to check everything. Though I wonder if even just checking that everything is good to go would save time compared with manually rewriting it all. While it may not be a smashing success, it could still prove useful.

I dunno, I’m interested to see how this plays out.

IllNess@infosec.pub · 3 months ago

I’m gonna guess this is going to be a major pain to debug.

technocrit@lemmy.dbzer0.com · 3 months ago (edited)

I’m gonna guess this is going to be a major ~~pain~~ profit to debug.

Some “AI” grifters gonna be showering in that state paper.

simple@lemm.ee · 3 months ago

Translating entire codebases with LLMs? What could POSSIBLY go wrong?

I also don’t see how it would ever be possible to directly translate C to Rust. They’re so fundamentally different that things are bound to not work the same.

FaceDeer@fedia.io · 3 months ago

What could go wrong with using human programmers to convert it?

If you’re going to insist on perfection for something like this then you’re probably never going to get anything done. Convert the program and then test and debug it just like you’d do with any newly written code. The idea is to make it easier to do that, not to make it so you don’t have to do it at all.
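
One way the "convert, then test" loop could look, as a rough sketch: record input/output pairs from the original C build and replay them against the Rust port. Everything here is hypothetical, including the `checksum` routine and the `mismatches` helper; the expected values would come from running the real C code, not from this example.

```rust
// Illustration only: `checksum` stands in for some routine ported from C,
// and the expected values stand in for outputs recorded from the C build.

/// Hypothetical port: an additive byte checksum. `wrapping_add` mirrors the
/// wraparound that unsigned 8-bit arithmetic has in C.
fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

/// Replays recorded (input, expected output) pairs against the port and
/// returns the indices of the cases where the port disagrees.
fn mismatches(cases: &[(Vec<u8>, u8)]) -> Vec<usize> {
    cases
        .iter()
        .enumerate()
        .filter(|(_, (input, expected))| checksum(input) != *expected)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let cases = vec![
        (vec![1, 2, 3], 6),
        (vec![255, 1], 0),  // wraps around
        (vec![10, 20], 31), // deliberately wrong, to show a reported mismatch
    ];
    println!("mismatching cases: {:?}", mismatches(&cases));
}
```

This doesn't prove the port is correct; it only catches disagreements on the inputs you thought to record, same as any other test suite.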

IllNess@infosec.pub · 3 months ago

I don’t even understand how they are going to get around the memory safety they are doing this translation for. Watch them have to break the safety features of Rust just to make certain programs work.
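
To make the worry concrete, here is a sketch of what a line-by-line translation of a C function like `int sum(const int *p, size_t n)` might look like next to an idiomatic one. Both function names are made up, and this is not what TRACTOR produces; it only shows that Rust's `unsafe` lets raw-pointer code compile with the checks switched off.

```rust
// Illustration only: `sum_raw` and `sum_slice` are hypothetical names.

/// Literal translation: raw pointer plus length, as in the C signature.
/// The caller must guarantee that `p` points to `n` readable `i32` values;
/// the compiler cannot check this.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0i32;
    for i in 0..n {
        // Nothing verifies that `p.add(i)` stays inside the allocation.
        total = total.wrapping_add(unsafe { *p.add(i) });
    }
    total
}

/// Idiomatic version: the slice carries its own length, so no `unsafe` is needed.
/// `wrapping_add` is a choice made here to avoid overflow panics in debug builds.
fn sum_slice(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

fn main() {
    let data = [1, 2, 3, 4];
    // The unsafe call compiles, but its correctness rests entirely on the caller.
    let raw = unsafe { sum_raw(data.as_ptr(), data.len()) };
    println!("raw: {raw}, slice: {}", sum_slice(&data));
}
```

A translator that emits only the first form has changed the syntax but kept the C memory model; getting to the second form is the hard part.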

FaceDeer@fedia.io · 3 months ago

I would expect that’s part of the point: if a C program can’t be converted to a language that doesn’t allow memory violations, that probably indicates there are execution pathways that result in memory violations.

fubarx@lemmy.ml · 3 months ago

I’ve tried multiple times to convert existing code or create new code using LLMs. The first attempts are OK, but once you start refining the prompts, they all go off the rails.

Most of the time, the generated code uses old or deprecated libraries or APIs. You point that out and they correct it. But a few iterations later, you’re refining something else and the old, deprecated calls come back. Once again, you point it out and it gets corrected.

Forget trying to correct it yourself by hand, because now it’s diverged from the LLM context. And this can happen in multiple places in the code. Rinse. Repeat.

At some point you just give up. Either it’s wrong or it will be wrong in different ways later. You have to read through every line to find strange, divergent errors. Over and over. It gets exhausting.

At the end, it feels like maybe you could have done it faster yourself, but the time has already been sunk.