I am currently out of town, and my server went down. All my services go through nginx, and they suddenly started returning 502 errors. SSH won’t let me in either. I had my sister reboot the server, and it still doesn’t work. I apologize for the lack of details, but that is all I know, and I can’t access the logs. I’ve cleared my cache and used a VPN in case fail2ban got me. I recently got a TP-Link router, so it could be something to do with that, but it was working fine for a while. I will have her do another reboot, and if that doesn’t work I will have her power off and unplug the server in case it was hacked.

Edit: I have absolutely no clue why, but it works now. I literally did nothing. As far as I know, my sister hasn’t touched it today. It just started working. Computers, man…

Edit 2: Actually she said she did something. Not sure what, but it works now.

  • xantoxis@lemmy.world
    2 months ago

    Some troubleshooting thoughts:

    What exactly do you mean when you say SSH is “down”? Is it:

    1. Connection refused. (fail2ban’s activity could result in a connection refused, but a VPN should have avoided that problem, as you said.)
    2. Connection timeout. Probably a failure at the port-forwarding level.
    3. Connection succeeded but closed. This can happen for a few reasons, such as the system being in an early boot-up state. There’s usually a message in this case.
    4. Connection succeeded but auth rejected. This can happen if your OS failed to boot but came up in a fallback state of some kind.
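If it helps to tell cases 1 to 3 apart without a full SSH client, here’s a minimal Python sketch that opens a raw TCP connection and waits for the server’s identification line (an SSH server normally speaks first, with something like `SSH-2.0-OpenSSH_...`). The function name `probe_ssh` and its return labels are my own invention, not part of any tool. Case 4 (auth rejected) can’t be detected this way; for that you’d still need `ssh -v`.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a raw TCP connection attempt to an SSH port (hypothetical helper).

    Returns "refused", "timeout", "closed", "banner: <text>" or "error: <reason>".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing listening, or a firewall/fail2ban rule that rejects the connection.
        return "refused"
    except socket.timeout:
        # No answer at all: packets silently dropped, e.g. a broken port forward.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return "error: " + str(exc)
    with sock:
        try:
            # An SSH server sends its identification line before the client says anything.
            data = sock.recv(256)
        except socket.timeout:
            return "timeout"
        except ConnectionResetError:
            return "closed"
    if not data:
        # TCP handshake succeeded, but the peer hung up without sending a banner.
        return "closed"
    return "banner: " + data.decode(errors="replace").strip()
```

Run it from outside your home network (e.g. over the VPN), substituting your own address and port: `print(probe_ssh("your.home.ip", 22))`.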

    Knowing which one of these it is can give you a lot more information about what’s wrong:

    System can’t get past initial boot = Maybe your NAS is unplugged? Maybe your home DNS cache is down?

    Connection refused = either fail2ban, or possibly your home IP has changed and you’re trying to connect to somebody else’s computer (nginx is very popular, after all; it’s not impossible that somebody else at your ISP is running it). It can also be a port-forwarding failure, meaning something’s wrong with your router.

    Connection succeeded + closed is similar to “can’t get past initial boot”

    Auth rejected might give you a fallback option if you can figure out a default username/password, although you should hope that’s not the case because it means anyone else can also get in when your system is in fallback.

    Very few of these things are actually fixable remotely, btw. I suggest having your sister unplug everything related to your setup, one device at a time: internet router, Raspberry Pi, NAS, your VM host, etc. Give each one a minute to cool down. Hardware, particularly cheap hardware, tends to fail when it gets hot; this can take a while to happen, and, well, it’s been hot.
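If you can get a shell on the box, one way to check the heat theory on Linux is to read the kernel’s thermal zones under `/sys/class/thermal`, which report temperatures in millidegrees Celsius. A minimal sketch, assuming a Linux host; `read_thermal_zones` is a hypothetical name, and which zones exist (and what they’re called) depends entirely on the hardware. Some machines expose none at all.

```python
from pathlib import Path


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict[str, float]:
    """Return {"thermal_zoneN:<type>": degrees Celsius} for every readable zone."""
    temps: dict[str, float] = {}
    # A missing base directory simply yields no matches.
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            # The kernel reports millidegrees, e.g. "48000" means 48.0 °C.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone unreadable or not reporting right now
        temps[f"{zone.name}:{kind}"] = millideg / 1000
    return temps


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} °C")
```

Logging this every few minutes (cron or a systemd timer) would show whether temperatures climb before things fall over.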

    Here are a few things with a high likelihood of failing when you’re away from home:

    • heat, as previously mentioned.
    • running out of disk space. Maybe you’re logging too much; throw some more disk in there and tune down the logging. This can definitely affect SSH, and definitely won’t be fixed by a reboot.
    • OOM failures (or other resource leaks). This isn’t likely to affect your bare-metal SSH, but it could. Some things leak memory, and this can lead to the OS killing processes in a cascade. In this scenario, though, you’d probably be able to connect for the first few minutes after a reboot.
    • shitty cabling. Sometimes stuff just falls out of the socket, if it wasn’t plugged in perfectly to begin with. (Heat can also contribute to this one.)
    • reliance on a cloud service that’s currently down. (This can include: you didn’t pay the bill.) Hopefully your OS boot doesn’t fail due to a cloud service, but I’ve definitely seen setups that could.
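For the disk-space case, a quick check using Python’s standard `shutil.disk_usage`. The function name, the mount points and the 90% threshold below are placeholders, not recommendations. Note that a filesystem can also run out of inodes while still showing free space, which this doesn’t catch.

```python
import shutil


def disk_report(paths: list[str], warn_pct: float = 90.0) -> list[tuple[str, float, bool]]:
    """For each path, return (path, percent of its filesystem used, over-threshold flag)."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = 100.0 * usage.used / usage.total if usage.total else 0.0
        report.append((path, round(pct, 1), pct >= warn_pct))
    return report


if __name__ == "__main__":
    # Hypothetical mount points; adjust to your own layout.
    for path, pct, warn in disk_report(["/", "/var"]):
        print(f"{path}: {pct}% used" + ("  <-- nearly full" if warn else ""))
```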
    • shnizmuffin@lemmy.inbutts.lol
      2 months ago

      running out of disk space

      This would be my first guess. Nothing shuts down arbitrary services quite like a full /var/log.
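To see what’s actually eating `/var/log`, a small sketch (the name `largest_files` is hypothetical) that walks the tree and lists the biggest files. Run it as root, or unreadable files will be skipped. On systemd machines, `journalctl --disk-usage` reports the journal’s share separately.

```python
import os


def largest_files(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Walk `root` and return the `top` biggest files as (size in bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count symlink targets twice
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # permission denied, or the file was rotated away mid-walk
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    for size, path in largest_files("/var/log"):
        print(f"{size / 1_048_576:10.1f} MiB  {path}")
```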

    • HumanPerson@sh.itjust.worksOP
      2 months ago

      It says connection closed. There is no message beyond that. I think it is likely that it is failing to boot. I might video call my sister and have her try to boot it so I can see any errors.

      Edit: Also, thanks very much for your response. It was very detailed and informative.

      • Shimitar@feddit.it
        2 months ago

        Connection closed means somebody is listening to the port and failing/not willing to reply.

        So either some network middleman is closing your connection (SSH should be on a port above 1024 to be safe from ISP throttling), your SSH server is severely strained (OOM, disk full…), or F2B is kicking in.

  • adr1an@programming.dev
    2 months ago

    502 (Bad Gateway) means nginx couldn’t get a valid response from the app behind it, so the app is broken or not running. For example, if it were a Flask app, it might be dying on startup with an unhandled exception. If this is happening to many services or apps simultaneously, it is concerning. Turning it off sounds wise at this point.
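Since nginx answers 502 on behalf of a backend it can’t talk to, one quick test is whether each backend port accepts a TCP connection at all. A minimal sketch; `down_upstreams` and the service names and ports are hypothetical, so take the real addresses from your `proxy_pass` lines. A port that accepts connections can still produce a 502 if the app behind it sends back garbage or hangs up, so treat an empty result as “listening”, not “healthy”.

```python
import socket


def down_upstreams(upstreams: dict[str, tuple[str, int]], timeout: float = 2.0) -> list[str]:
    """Return the names of upstreams whose port refuses or ignores a TCP connection."""
    down = []
    for name, (host, port) in upstreams.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # the port accepted a connection; the app is at least listening
        except OSError:
            # Refused, timed out, or unreachable: nginx would likely return 502 here.
            down.append(name)
    return down


if __name__ == "__main__":
    # Hypothetical services; copy the real host:port pairs from your nginx config.
    services = {"app_a": ("127.0.0.1", 8080), "app_b": ("127.0.0.1", 8096)}
    print(down_upstreams(services) or "all upstreams accepting connections")
```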

  • cybersandwich@lemmy.world
    2 months ago

    If it’s working again all of a sudden, I would lean towards f2b. I don’t know what your ban time is, but if f2b got tripped, it would explain why you couldn’t get in yesterday but can today (assuming your ban expires after 24 hours or so).
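One way to confirm or rule out fail2ban after the fact is to look for Ban/Unban actions against your own public IP in its log (commonly `/var/log/fail2ban.log`). A rough sketch: the line format matched below is an assumption based on typical fail2ban output and may need adjusting for your version, `ban_events` is a hypothetical name, and the IP is a documentation placeholder. A Ban followed by an Unban roughly one ban time later, lining up with when you were locked out, would support this theory.

```python
import re
from datetime import datetime

# Assumed shape of a fail2ban action line (spacing varies between versions), e.g.
# 2024-06-01 12:34:56,789 fail2ban.actions        [812]: NOTICE  [sshd] Ban 203.0.113.7
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+fail2ban\.actions\s+"
    r"\[\d+\]:\s+\w+\s+\[(?P<jail>[^\]]+)\]\s+(?P<event>Ban|Unban)\s+(?P<ip>\S+)"
)


def ban_events(lines, ip: str) -> list[tuple[datetime, str, str]]:
    """Return (time, jail, "Ban" or "Unban") for every action line mentioning `ip`."""
    events = []
    for line in lines:
        m = LINE.match(line)
        if m and m["ip"] == ip:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
            events.append((ts, m["jail"], m["event"]))
    return events
```

Usage: `with open("/var/log/fail2ban.log") as f: print(ban_events(f, "203.0.113.7"))`. If you suspect you’re banned right now, `fail2ban-client status sshd` lists the currently banned IPs.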

  • ThrowawayOnLemmy@lemmy.world
    2 months ago

    Does your router have an app or some way of letting you remotely see whether the server is even showing up on your home network? It could be a physical disconnect, an Ethernet port failure, or maybe a NIC failure. A reboot wouldn’t help if the issue were something like that.

    Edit: Actually, having re-read your post and thought about this again, what I said wouldn’t make sense…

    You could have some sort of corruption causing an error in the appdata, preventing it from running. Might be a RAM issue.

    • HumanPerson@sh.itjust.worksOP
      2 months ago

      It has a network connection; I can get the nginx error page, but the services themselves are down. What’s really weird is that everything is down, even SSH.

      • ThrowawayOnLemmy@lemmy.world
        2 months ago

        I edited my original post right when you replied, my bad.

        I dunno if you can do that much remotely, honestly. I kinda feel like something might have corrupted? What kinda system are you using? Any more details you can provide?

  • chiisana@lemmy.chiisana.net
    2 months ago

    Are you by chance using something like Cloudflare? It’s possible that your public IP changed during the reboot, so your “gateway” can no longer reach your router at the old IP.

    In other words: it’s always DNS?
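A quick way to test the “it’s always DNS” theory is to compare what your hostname resolves to with your router’s current WAN address. A minimal sketch; the function names, hostname and IP are placeholders. If the record is proxied through Cloudflare (orange cloud), it resolves to Cloudflare’s addresses rather than yours, so this check only makes sense for DNS-only records.

```python
import socket


def resolved_ipv4(hostname: str) -> set[str]:
    """All IPv4 addresses the local resolver currently returns for `hostname`."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}  # info[4] is the (address, port) sockaddr


def dns_points_home(hostname: str, public_ip: str) -> bool:
    """True if `hostname` resolves to `public_ip` (your router's current WAN address)."""
    try:
        return public_ip in resolved_ipv4(hostname)
    except socket.gaierror:
        return False  # the name doesn't resolve at all


if __name__ == "__main__":
    # Hypothetical values: your domain, and the WAN IP shown in your router's admin page.
    print(dns_points_home("home.example.com", "203.0.113.7"))
```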