Plot twist: it was the user
That’s not even a plot twist, that’s expected user behavior
Random guess: a PHP error caused Apache to log a ridiculous number of errors to /var/log, and on this system that isn’t its own partition, so /var filled up, crashing MySQL. The user wiped /var/log to free up space.
That’s not far off from something that happened to me a few years ago. My computer suddenly started struggling one day, and I quickly figured out that my hard drive had 500 gigs or so of extra data somewhere. I had to find a tool that would let me see how much space a given folder was taking up, and eventually I found an absolutely HUMONGOUS error log file. After I cleared it out, the file rapidly filled up again when I used a program I’d been using all the time. I think it was Minecraft or something. Anyway, my duct tape solution was to just make that log file read-only, since the error in question didn’t actually affect anything else.
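For anyone hunting a runaway log the same way, here is a minimal sketch of the "how much space is each folder taking up" step. It is not the tool from the comment above, just an illustration; `dir_size` and `largest_children` are names made up for this example, and it assumes unreadable files can simply be skipped.

```python
import os

def dir_size(path):
    """Total bytes of regular files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Assumption: files we cannot stat are skipped, not fatal.
                pass
    return total

def largest_children(path, top=10):
    """Return (size_in_bytes, path) for the biggest direct children of path."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = dir_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
            else:
                continue
            sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]
```

Calling `largest_children` on a suspect directory and then again on whichever child comes out on top walks you down to the offending file fairly quickly.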
/var/log has been deleted, you say…
I think we all know what this means, don’t we?
Hint
ls -ld /var/log
drwxrwxr-x 18 root syslog 4096 Aug 11 08:13 /var/log
That seems so obvious I think we’re missing something
Whatever, we have a suspect.
Bring in GDB to do the interrogation! And perhaps also call Nice, he can play the good cop…
Forgive my ignorance, but since Apache is running as root, couldn’t PHP inherit its permissions?
The Apache main process runs as root. When it receives a request, it spawns a child process that doesn’t run as root. PHP runs as the same user as the Apache child process.
Or PHP runs in its own fastcgi like process under a different account.
I have no clue. Root nuked the logs? Why? OOM killer does not do that.
Well, there is only one who could have erased all traces of the SIGKILL…
And only the SIGKILLER would have had reason to do so…
Ahh ok, so it is the obvious one.
It looks like the OOM killer has struck again.
Mariadb did it with the candlestick in the library.
This process has been murdered mysteriously.
Oracle.
It was Java, coaxing the Linux OOM killer into doing the job
Alright who’s running the database on the same machine as the server…👀
If you can do this, do it. It’s a huge boost to performance thanks to infinitely lower latency.
And infinitely lower reliability because you can’t have failovers (well, you can, but people who run everything on the same host won’t). It’s fine for something non-critical, but I wouldn’t do it with anything that pays the bills.
I work for a company that has operated like this for 20 years. The system goes down sometimes, but we can fix it in less than an hour. At worst the users get a longer coffee break.
A single click in the software can often generate 500 SQL queries, so if you go from 0.05 ms to 1 ms latency you add half a second to clicks in the UI and that would piss our users off.
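To spell out the arithmetic behind that half second, a quick sketch using the numbers above. The function name is made up for illustration, and it assumes the 500 queries run strictly one after another, so each one pays the full round trip.

```python
def added_click_latency_ms(queries, local_rtt_ms, remote_rtt_ms):
    """Extra wall-clock time per click when every query pays a longer round trip."""
    return queries * (remote_rtt_ms - local_rtt_ms)

# Figures from the comment: 500 queries, 0.05 ms on the same host, 1 ms over the network.
extra_ms = added_click_latency_ms(500, 0.05, 1.0)
print(f"{extra_ms:.0f} ms added per click")
```

That comes to 475 ms, so "half a second" is about right.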
Definitely not saying this is the best way to operate at all times. But SQL has a huge problem with false dependencies between queries and APIs that make it very difficult to pipeline queries, so my experience has been that I/O-bound applications easily become extremely sensitive to latency.
A single click in the software can often generate 500 SQL queries, so if you go from 0.05 ms to 1 ms latency you add half a second to clicks
Those queries don’t all have to be executed sequentially though, do they? Usually if you have that many queries, at least some of them are completely independent of the others and thus can execute concurrently.
You don’t even need threading for that, just non-blocking IO and ideally an event loop.
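A rough sketch of what that looks like with an event loop. The "queries" here are simulated with `asyncio.sleep`, not a real database driver, and `fake_query`, `run_sequential`, `run_concurrent` and `timed` are names invented for the example. It also assumes the queries really are independent of each other.

```python
import asyncio
import time

async def fake_query(latency_s):
    """Stand-in for one round trip to the database (no real I/O happens)."""
    await asyncio.sleep(latency_s)
    return latency_s

async def run_sequential(n, latency_s):
    # Each query waits for the previous one, so the latencies add up.
    for _ in range(n):
        await fake_query(latency_s)

async def run_concurrent(n, latency_s):
    # All queries are in flight at once on a single thread.
    await asyncio.gather(*(fake_query(latency_s) for _ in range(n)))

def timed(coro_fn, *args):
    """Run a coroutine function to completion and return elapsed seconds."""
    start = time.perf_counter()
    asyncio.run(coro_fn(*args))
    return time.perf_counter() - start
```

Comparing `timed(run_sequential, 20, 0.01)` with `timed(run_concurrent, 20, 0.01)` shows the sequential run taking roughly 20 round trips and the concurrent one roughly a single round trip.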
The catch is that they all need to run in the same transaction to be unaffected by other things going on in the database and to make updates atomic. A single transaction means a single connection, and ODBC/JDBC has no way of multiplexing or pipelining queries over a single connection.
It’s probably theoretically possible to run some things in different transactions. But with all the different layers and complexity of the code (including third party components and ORMs like Hibernate), understanding all the failure modes and possible concurrency issues becomes intractable.
I’m going to guess quite a few people here work at businesses where “sometimes breaks, but fixed in less than an hour” isn’t good enough for reliability.
Yeah, if you need even 99.9% uptime, the most downtime you can accept in a year is just under nine hours.
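For reference, a small sketch of where that yearly budget comes from. `downtime_budget_hours` is a made-up helper, and it assumes a 365-day year with no scheduled maintenance excluded from the count.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year

def downtime_budget_hours(uptime_percent):
    """Hours per year a service may be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)

for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {downtime_budget_hours(target):.2f} hours of downtime per year")
```

At 99.9% that is about 8.76 hours a year, so a handful of one-hour outages already uses most of it.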
Most businesses don’t require that kind of uptime though. If I killed our servers for a couple of hours between 02:00 and 04:00 every night, probably nobody would notice for at least a year if it weren’t for the alerts we’d get.
Most beginner selfhosters.
and most every cPanel (and every other web host panel) box on the planet.
web, ftp, database, mail, dns, and more. all on one machine.
Most sites hosted on cPanel are relatively small and never need to horizontally scale. People running apps large enough to require multiple servers usually try to optimize their environment by reducing overhead (and cPanel adds quite a bit of it) and tend to not need a GUI for server admin.
For distributed that feed back to a centralized DB? Me. All the dang time.
I’m not brave enough to do development while connected to the production database
Systemd. SQL is now in Systemd.
Don’t spoil it. That’s the secret in Episode 5.
It was the kubelet after MySQL failed his liveness probes
Ok, now I need an 8-season animated show and at least 2 direct-to-TV movies of this
Best I can do is a Netflix series that gets cancelled halfway through season 2 and a fan-made animation spoof on YouTube
As long as the animation is done by Don Hertzfeldt, you have yourself a goddamn deal!
Coming this fall: “My ~~anus~~ memory is ~~bleeding~~ leaking!”
I did it like this: 🔫 BANG WhooOooOoopty doOoO
Will there be a follow-up?
It was the BOFH
It was taking away resources from the coffee cam. Had to go.