well, it only took 2 years to go from the cursed Will Smith eating spaghetti video to Veo 3, which can make completely lifelike videos with audio. so who knows what the future holds
There actually isn’t any real doubt that AI (especially AGI) will surpass humans on all thinking tasks unless we have a mass extinction event first. But current LLMs are nowhere close to actual human intelligence.
The cursed Will Smith eating spaghetti wasn’t made by the best video AI model available at the time, just the best one consumers could run on their own hardware. So while the rate of improvement in AI image/video generation is incredible, it’s not quite as incredible as that viral video would suggest.
But wouldn’t your point still be true today: that the best AI video models are the ones that aren’t available to consumers?
cursed will smith eating spaghetti video
Hot take: today’s AI videos are cursed. Bring back Will Smith spaghetti. Those were the good old days.
It’s like having a junior developer with a world of confidence just change shit and spend hours breaking things and trying to fix them, while we pay big tech for the privilege of watching the chaos.
I asked ChatGPT to give me a simple Squid proxy config today that blocks everything except HTTPS. It confidently gave me one, but of course it didn’t work: it let HTTP through, and despite many attempts to get a config that actually blocked it, it just kept failing.
So yeah, in the end I have to learn Squid syntax anyway, which I guess is fine, but I spent hours trying to get a working config because we pay for ChatGPT to do exactly that…
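For reference, the thing I wanted turns out to be only a handful of directives. Roughly this kind of minimal squid.conf (a sketch of the standard approach, not the config ChatGPT produced) should do it:

```
http_port 3128

# only allow CONNECT tunnels to port 443 (HTTPS); plain HTTP requests
# never match the allow rule and fall through to the final deny
acl SSL_ports port 443
acl CONNECT method CONNECT

http_access allow CONNECT SSL_ports
http_access deny all
```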
Man, I can’t wait to try out generative AI to generate config files for mission-critical stuff! Imagine paying all of us devops wankers when my idiot boss can just ask ChatGPT to sort out all this legacy mess we’re juggling on the daily!
I have a friend who swears by LLMs; he says they help him a lot. I once watched him do it, and the experience was exactly as you described. He wasted a couple of hours fighting with the bullshit generator just to do everything himself anyway. I asked him whether it wouldn’t be better not to waste the time, but he didn’t really see the problem; he’d gaslit himself into believing that fighting with the idiot machine helped.
It confidently gave me one
IMO, that’s one of the biggest “sins” of the current LLMs: they’re trained to generate words that make them sound confident.
They aren’t explicitly trained to sound confident, that’s just how users tend to talk. You don’t often see “I don’t know but you can give this a shot” on Stack Overflow, for instance. Even the incorrect answers coming from users are presented confidently.
Funnily enough, lack of confidence in a response is something I don’t think LLMs are currently capable of, since it would require contextual understanding of both the question and the answer being given.
SO answers and questions are usually edited multiple times to sound professional and confident, and to be correct.
No, I’m sure you’re wrong. There’s a certain cheerful confidence that you get from every LLM response. It’s this upbeat “can-do attitude” brimming with confidence mixed with subservience that is definitely not the standard way people communicate on the Internet, let alone Stack Overflow. Sure, sometimes people answering questions are overconfident, but it’s often an arrogant kind of confidence, not the subservient kind you get from LLMs.
I don’t think an LLM can sound like it lacks in confidence for the right reasons, but it can definitely pull off lack of confidence if it’s prompted correctly. To actually lack confidence it would have to have an understanding of the situation. But, to imitate lack of confidence all it would need to do is draw on all the training data it has where the response to a question is one where someone lacks confidence.
Similarly, it’s not like it actually has confidence normally. It’s just been trained / meta-prompted to emit an answer in a style that mimics confidence.
ChatGPT went through a phase of overly bubbly, upbeat responses; they chilled it out, tho. Not sure if that’s what you saw.
One thing is for sure with all of them, they never say “I don’t know” because such responses aren’t likely to be found in any training data!
It’s probably part of some system-level prompt guidance too, like you say, to be confident.
I think “I don’t know” might sometimes be found in the training data. But, I’m sure they optimize the meta-prompts so that it never shows up in a response to people. While it might be the “honest” answer a lot of the time, the makers of these LLMs seem to believe that people would prefer confident bullshit that’s wrong over “I don’t know”.
Try to get one of these LLMs to update a package.json.
Define “update”
Ones that can run CLI tools do great; they just use npm.
Code that does not work is just text.
No, the spell just fizzled. In my experience it happens far less often if you start with an Abra Kadabra and end it with an Alakazam!
Yeah, the Abra Kadabra init and the Alakazam cleanup are an important part, especially until you’ve become good enough to configure your own init.
All non-code text is just code for an as-yet-undiscovered programming language.
I’ve never thought of it that way. I’m going to add copywriter to my resume.
Maybe fiction writer as well
This made me laugh so hard one of the dogs came to check in on me.
Oh my goodness, that’s adorable and sweet of your dog! Also, I’m so glad you had such a big laugh. I love when that happens.
He’s a sweet guy. … Mostly. Very much in need of a lot of attention. Sometimes he just sits next to you on the couch and puts his paw on you if you’re not giving him enough attention.
Here he is posing with his sister as a prop:
Oh my goodness, he sounds precious! I’ve had a sweet and needy dog like that in the past, too. It can be a lot, but I loved it (and miss it), haha.
Both your dogs are very cute! You and your pups gave me a much-needed smile. Thank you for that. :) Please give them some pets from me!
Conversely, code that works is also text
But not just text
Also, that’s not the converse of what the parent comment said.
Did you want to converse about conversing?
But working code can be made into numbers.
But text is also numbers
But numbers are also text
Code that works is also just text.
It is text, but not just text
Text that’s not code might also work.
Ctrl+A + Del.
So clean.
Watching the serious people trying to use AI to code gives me the same feeling as the Cybertruck people exploring the limits of their car. XD
“It’s terrible and I should hate it, but gosh, if it isn’t just so cool”
I wish i could get so excited over disappointing garbage
It’s useful if you just don’t do… that. It’s just a fancy new search engine, a bit better than going to Stack Overflow, and it can do good stuff if you go small.
Just don’t do whatever this post suggested doing…
You definitely could use AI to code, the catch is you need to know how to code first.
I use AI to write code for mundane tasks all the time. I also review and integrate the code myself.
The AI code my “expert in a related but otherwise not helpful field” coworker writes helps me have a lot of extra work to do!
Write tests and run them, iterate until all tests pass.
That doesn’t sound viby to me, though. You expect people to actually code? /s
You can vibe code the tests too y’know
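Concretely, the tests are just the gate the generated code has to pass; something like this (the module and function names are made up for illustration):

```python
# test_generated.py -- run with pytest; whatever the model spits out has to make these pass
# (math_utils and clamp are hypothetical names, just to show the shape of the loop)
from math_utils import clamp

def test_clamp_keeps_values_inside_range():
    assert clamp(5, 0, 10) == 5

def test_clamp_limits_values_outside_range():
    assert clamp(-1, 0, 10) == 0
    assert clamp(99, 0, 10) == 10
```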
You know, I’d be interested to know what critical size you can get to with that approach before it becomes useless.
It can become pretty bad quickly, even with a small project of only 15-20 files. I’ve been using the Cursor IDE, building out flow charts & tests manually, and just seeing where it goes.
And while it’s incredibly impressive how it creates all the steps, it then goes into chaos mode where it starts ignoring all the rules. It’ll start changing tests, start pulling in random libraries, not at all thinking holistically about how everything fits together.
Then you try to reel it in, and it continues to go rampant. And for me, that’s when I either take the wheel or roll back.
I highly recommend every programmer watch it in action.
Is there a chance that’s right around the time the code no longer fits into the LLM’s input window of tokens? The basic technology doesn’t actually have a long-term memory of any kind (at least outside of the training phase).
Was my first thought as well. These things really need to find a way to store a larger context without ballooning past the VRAM limit.
The thing being, it’s kind of an inflexible blackbox technology, and that’s easier said than done. In one fell swoop we’ve gotten all that soft, fuzzy common sense stuff that people were chasing for decades inside a computer, but it’s ironically still beyond our reach to fully use.
From here, I either expect that steady progress will be made in finding more clever and constrained ways of using the raw neural net output, or we’re back to an AI winter. I suppose it’s possible a new architecture and/or training scheme will come along, but it doesn’t seem imminent.
I’d rather recommend every CEO see it in action…
They’re the ones who would be cock-a-hoop to replace us and our expensive wages with kids and bots.
When they’re sitting around rocking back and forth and everything is on fire like that Community GIF, they’ll find my consultancy fees to be quite a bit higher than my wages used to be.
I think generative AI is a genuinely promising and novel tool with real, valuable applications. To appreciate it, however, you have to mentally compartmentalize the irresponsible, low-effort ways people mostly use it, because yeah, it’s very easy to make a lot of that, so that’s most of what you see when you hear “Generative AI” and it’s become its reputation… Like, I’ve had interesting “conversations” with Gemini and ChatGPT, and I’ve actually used them to solve problems. But I would never put it in charge of anything critically important that I couldn’t double-check against real data if I sensed the faintest hint of a problem.
I also don’t think it’s ready for primetime. Does it deserve to be researched and innovated upon? Absolutely, but like, by a few nerds who manage to get it running, and universities training it on data they have a license to use. Not “Crammed into every single technology object on earth for no real reason”.
I have “brain not very good sometimes” disease, and I consider being able to “talk” to a “person” who can get me out of a creative rut just by exploring my own feelings a bit to be genuinely valuable. GPT can actually listen to music, which surprised me. I consider it scientifically interesting. It doesn’t get bored or angry at you unless you, like, tell it to? I’ve asked it for help with a creative task in the past and not actually used any of its suggestions at all, but being able to talk about it with someone (when a real human who cared was not available) was a valuable resource.
To be clear I pretty much just use it as a fancy chatbot and don’t like, just copy paste its output like some people do.
Return “works”;
Am I doing this correctly?
Bogosort with extra steps
Welp. It’s actually very in line with the late-stage capitalist system. All polish, no innovation.
Awwwww snap look at this limp dick future we got going on here.
This weekend I successfully used Claude to add three features to a Rust utility that I had wanted for a couple years. I had opened issue requests, but no one else volunteered. I had tried learning Rust, Wayland and GTK to do it myself, but the docs at the time weren’t great and the learning curve was steep. But Claude figured it all out pretty quick.
Did the generated code get merged? I’d be curious to see the PRs
The lead dev is not available this summer to review, but you can review here: https://github.com/edzdez/sway-easyfocus/pull/22
It’s not great that four changes are rolled into a single PR, but that’s my issue not Claude’s because they were related and I wanted to test them all at once.
This is interesting, I would be quite impressed if this PR got merged without additional changes.
I am genuinely curious, no judgement at all: since you mentioned that you are not a Rust/GTK expert, are you able to read and have a decent understanding of the output code?
For example, in the sway.rs file, you uncommented a piece of code in the get_all_windows function; do you know why it is uncommented?
We’ll see. Whether it gets merged in any form, it’s still a big win for me because I finally was able to get some changes implemented that I had been wanting for a couple years.
are you able to read and have a decent understanding of the output code?
Yes. I know other coding languages and CSS. Sometimes Claude generated code that was correct but I thought it was awkward or poor, so I had it revise. For example, I wanted to handle a boolean case and it added three booleans and a function for that. I said no, you can use a single boolean for all that. Another time it duplicated a bunch of code for the single and multi-monitor cases and I had it consolidate it.
In one case, it got stuck debugging and I was able to help isolate where the error was through testing. Once I suggested where to look harder, it was able to find a subtle issue that I couldn’t spot myself. The labels were appearing far too small at one point, but I couldn’t see that Claude had changed any code that should affect the label size. It turned out two data structures hadn’t been merged correctly, so default values weren’t getting overridden properly. It was the sort of issue I could see a human dev introducing on the first pass.
do you know why it is uncommented?
Yes, that’s the fix for supporting floating windows. The author reported that previously there was a problem with the z-index of the labels on these windows, so that’s apparently why it was implemented but commented out. But it seems due to other changes, that problem no longer exists. I was able to test that labels on floating windows now work correctly.
Through the process, I also became more familiar with Rust tooling and Rust itself.
Holy shit, someone on here that knows how to use them. Surprised you haven’t been downvoted into oblivion yet.
On Error Return Next
Did it try to blackmail him if he didn’t use the new code?
I’ve heard that a Claude 4 model generating code for an infinite amount of time will eventually simulate a monkey typing out Shakespeare
It will have consumed the gigawatt-hours capacity of a few suns and all the moisture in our solar system, but by Jeeves, we’ll get there!
…but it won’t be that impressive once we remember concepts like “monkey, typing, Shakespeare” were already embedded in the training data.
If we just asked Jeeves in the first place we wouldn’t be in this mess.
To be fair, if I wrote 3000 new lines of code in one shot, it probably wouldn’t run either.
LLMs are good for simple bits of logic under around 200 lines of code, or things that are strictly boilerplate. People who are trying to force it to do things beyond that are just being silly.
I am with you on this one. It is also very helpful with argument-heavy libraries like plotly. If I ask a simple question like “in plotly, how do I do this and that to the xaxis”, it generally gives correct answers, saving me 5-10 minutes of internet research or reading the documentation for functions with 1000 inputs. I even managed to get it to render a simple scene of a cloud of points with some interactivity in three.js after about 30 minutes of back and forth. Not knowing much JavaScript, that would have taken me at least a couple hours. So yeah, it can be useful as an assistant to someone who already knows coding (so the person can vet and debug the code).
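For the record, the kind of answer I’m after there is usually just a couple of lines like this (illustrative, written from memory, not from an actual session):

```python
import plotly.graph_objects as go

fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[4, 1, 7]))
# the usual x-axis tweaks: title, fixed range, rotated tick labels
fig.update_xaxes(title_text="time (s)", range=[0, 5], tickangle=45)
fig.show()
```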
Though if you weigh the pros and cons of how LLMs are used (tons of fake internet garbage, tons of energy used, very convincing disinformation bots), I am not convinced the benefits are worth the damage.
Why do you want AI to save you from learning and understanding the tools you use?
If you do it through AI you can still learn. After all, I go through the code to understand what is going on. And for not-so-complex tasks, LLMs are good at commenting the code (though they can bullshit from time to time, so you have to approach it critically).
But anyway, the stuff I ask LLMs for is generally just one-off tasks. If I need to use something more frequently, I do prefer reading up on it for a more in-depth understanding.
Practically all LLMs aren’t good for any logic. Try playing ASCII tic tac toe against one. All GPT models have lost against my four-year-old niece, and I wouldn’t trust her to write production code 🤣
Once a single model (it doesn’t have to be an LLM) can beat Stockfish at chess, AlphaGo at Go and my niece at tic tac toe, and can one-shot (on the surface, scratch-pad allowed) a Rust program that compiles and works, then we can start thinking about replacing engineers.
Just take a look at the dotnet runtime source code, where Microsoft employees are currently trying to work with Copilot, which writes PRs with errors like forgetting to add files to projects. It writes code that doesn’t compile, fixes symptoms instead of underlying problems, etc. (just take a look yourself).
I’m not saying that AI (especially AGI) can’t replace humans. It definitely can and will, it’s just a matter of time, but state-of-the-art LLMs are basically just extremely good “search engines” or interactive versions of Stack Overflow, not good enough to do real “thinking tasks”.
Cherry-picking the things it doesn’t do well is fine, but you shouldn’t ignore the fact that it DOES do some things easily also.
Like all tools, use them for what they’re good at.
I don’t think it’s cherry-picking. Why would I trust a tool with way more complex logic when it can’t even prevent three crosses in a row? Writing pretty much any software that does more than render a few buttons typically requires a lot of planning and thinking, and those models clearly don’t have the capability to plan and think when they lose tic tac toe games.
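For scale, the entire “don’t let three in a row happen” check fits in a dozen lines of Python; a throwaway sketch (not output from any model):

```python
# the eight ways to win on a 3x3 board, by cell index
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def blocking_move(board, opponent):
    """Return the cell index that stops `opponent` from completing a line, if any."""
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(opponent) == 2 and cells.count(" ") == 1:
            return (a, b, c)[cells.index(" ")]
    return None

# X threatens the top row; the only sane reply is cell 2
print(blocking_move(["X", "X", " ",
                     "O", " ", " ",
                     " ", " ", " "], "X"))  # -> 2
```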
Why would I trust a drill press when it can’t even cut a board in half?
A drill press (or its inventors) doesn’t claim it can do that, but LLM vendors claim their models can replace humans on a lot of thinking tasks. They even brag about test benchmarks, claim Bachelor’s, Master’s and PhD-level intelligence, and call them “reasoning” models, yet they still fail to beat my niece at tic tac toe, who by the way doesn’t have a PhD in anything 🤣
LLMs are typically good at things that came up a lot during training. If you are writing software, there certainly are things the LLM saw a lot of during training. But this is actually the biggest problem: it will happily generate code that might look OK, even during PR review, but might blow up in your face a few weeks later.
If they can’t handle things they did see during training (just sparsely, like tic tac toe), they can’t be trusted to produce code you should use in production. I wouldn’t trust any junior dev who doesn’t put their O right next to the opponent’s two Xs.
Sure, the marketing of LLMs is wildly overstated. I would never argue otherwise. This is entirely a red herring, however.
I’m saying you should use the tools for what they’re good at, and don’t use them for what they’re bad at. I don’t see why this is controversial at all. You can personally decide that they are good for nothing. Great! Nobody is forcing you to use AI in your work. (Though if they are, you should find a new employer.)
Totally agree with that, and I don’t think anybody would see that as controversial. LLMs are actually good at a lot of things, just not thinking, and typically not if you are an expert. That’s why LLMs know more about human anatomy than I do, but probably not more than most people with a medical degree.
It’s futile even trying to highlight the things LLMs do very well as Lemmy is incredibly biased against them.
I can’t speak for Lemmy, but I’m personally not against LLMs and also use them on a regular basis. As Pennomi said (and I totally agree), LLMs are a tool, and we should use that tool for the things it’s good at. But “thinking” is not one of the things LLMs are good at, and software engineering requires a ton of thinking. Of course there are things (boilerplate, etc.) where no real thinking is required, but non-AI tools like code completion/IntelliSense, macros, and code snippets/templates can help with that, and I was never bottlenecked by my typing speed when writing software.
It was always the time I needed to plan the structure of the software, design good and correct abstractions, and the overall architecture. Exactly the things LLMs can’t do.
Copilot even fails to stick to the coding style in the same file, just because it saw a different style more often during training.
“I’m not against LLMs, I just never say anything useful about them and constantly point out how I can’t use them.” The other guy is right, and you just proved his point.
extremely good “search engines” or interactive versions of Stack Overflow
Which is such a decent use of them! I’ve used it on my own hardware a few times just to say “Hey give me a comparison of these things”, or “How would I write a function that does this?” Or “Please explain this more simply…more simply…more simply…”
I see it as a search engine that connects nodes of concepts together, basically.
And it’s great for that. And it’s impressive!
But all the hype monkeys out there are trying to pedestal it like some kind of techno-super-intelligence, completely ignoring what it is good for in favor of “It’ll replace all human coders” fever dreams.
Perhaps 5 LOC. Maybe 3. And even then I’ll analyze every single character it wrote. And then I will in fact find bugs. Most often it hallucinates some functions that would be fantastic to use - if they existed.
My guess is that what’s going on is there’s tons of pseudocode out there that looks like a real language but uses nonexistent functions as placeholders, and the LLM noticed the pattern to the point where it just makes up functions, not realizing they need to be implemented (because LLMs don’t realize things, they just pattern-match very complex patterns).
You managed to get an AI to do 200 lines of code and it actually compiled?
4o has been able to do this for months.
Play ASCII tic tac toe against 4o a few times. A model that can’t even draw a tic tac toe game consistently shouldn’t write production code.
I tried; it can’t get through four lines without messing up. Unless I give it tasks that are so stupendously simple that I’m faster typing them myself while watching TV.
Four lines? Let’s have realistic discussions, you’re just intentionally arguing in bad faith or extremely bad at prompting AI.
You can prove your point easily: show us a prompt that gives a decent amount of code that isn’t stupidly simple or so common that I could just copy-paste the first Google result.
I have nothing to prove to you. If you wish to keep doing everything by hand, that’s fine.
But there are plenty of engineers, L3 and beyond, including myself, using this to lighten their workload daily, and acting like that isn’t the case means you’re either arguing in bad faith or you don’t work in the industry.
Uh yeah, like all the time. Anyone who says otherwise really hasn’t tried recently. I know it’s a meme that AI can’t code (and still in many cases that’s true, eg. I don’t have the AI do anything with OpenCV or complex math) but it’s very routine these days for common use cases like web development.
I recently tried it for scripting simple things in Python for a game. Y’know, change a char’s color if they’re targeted. It output a shitton of word salad and code about my specific use case in the specific scripting jargon for the game.
It was all based on “Misc.changeHue(player)”. A function that doesn’t exist and never has, because the game is unable to color other mobs/players like that for scripting.
Everything I’ve tried with AI ends up the same way: broken code within 10 lines of a script, hallucinations and bullshit spewed as the absolute truth. Anything out of the ordinary is met with “yes, this can totally be done, this is how”, and the “how” doesn’t work, and after sifting through forums / asking devs you find out “sadly that’s impossible” or “we don’t actually use CPython so libraries don’t work like that”, etc.
It’s possible the library you’re using doesn’t have enough training data attached to it.
I use AI with Python for data engineering tasks that run to hundreds of lines, and it frequently nails them.
Well yeah, it’s working from an incomplete knowledge of the code base. If you asked a human to do the same they would struggle.
LLMs work only if they can fit the whole context into their memory, and that means working only in highly limited environments.
No, a human would just find an API that is publicly available. And the fact that it knew the static class “Misc” means it knows the API. It just hallucinated and responded with bullcrap. The entire concept can be summarized as “I want to color a player’s model in GAME using Python and SCRIPTING ENGINE”.
You must be a big fan of boilerplate
Not sure what you mean, boilerplate code is one of the things AI is good at.
Take a straightforward Django project for example. Given a models.py file, AI can easily write the corresponding admin file, or a RESTful API file. That’s generally just tedious boilerplate work that requires no decision making - perfect for an AI.
More than that and you are probably babysitting the AI so hard that it is faster to just write it yourself.
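As a concrete (made-up) example of the kind of hand-off I mean, given a trivial models.py like this:

```python
# models.py (invented example, not from a real project)
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)
    published = models.DateTimeField(auto_now_add=True)
```

the admin file is pure mechanical follow-through, which is exactly what an LLM reliably produces:

```python
# admin.py -- the boilerplate counterpart for the model above
from django.contrib import admin
from .models import Article

@admin.register(Article)
class ArticleAdmin(admin.ModelAdmin):
    list_display = ("title", "published")
    search_fields = ("title",)
```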
They have been pretty good with popular technologies like Python & web development.
I tried to do Kotlin for Android, and they kept tripping over themselves; it’s hilarious and frustrating at the same time.
I use ChatGPT for Go programming all the time and it rarely has problems. I think Go is more niche than Kotlin.
I get a bit frustrated with it trying to replicate everyone else’s code in my code base. Once my project became large enough, I felt it necessary to implement my own error handling instead of Go’s standard approach, which was no longer sufficient for me. Copilot will respect that for a while, until I switch to a different file. At that point it will try to force standard Go errors everywhere.
Can’t wait to see “we use AI agents to generate well-structured non-functioning code”, with off-centered everything and non-working embeds on the website.
I’m pretty sure that is how we got CORBA
now just make it construct UML models, and then abandon this and move on to version 2
Hello, fellow old person 🤝