They said it started with a power outage, but I thought Linode data centers ran on UPS and didn't have power outages? And, given the length of this outage, the problem seems more serious.
The power outage affected their HVAC system, so it sounds like the entire DC had to be shut down, and now they're bringing everything back up from a black start.
Over 22 hours of downtime for the one VPS I have in that region.
My infrastructure is redundant and spread out among hosting providers and DCs so there's no real impact, but I'm pretty sure this is the longest outage I've ever had with any provider. And the communication has been so disappointing. Four hours to say it's a power/HVAC issue? Updates since then that basically just say they're still working on it.
We are approaching 24 hours of downtime. I'm still among those affected, and I'm starting to wonder if the situation is worse than they're letting on.
I woke up to a few hundred messages from Icinga - thankfully my phone is on do-not-disturb overnight. Some of my servers in Newark are up and responding, some are not.
Happy Sunday! Cleaning up the automatically-created maintenance/alert tickets generated by this is going to be a fun time.
Yes, it has gotten worse as time progressed. Some k8s services started to fail, which is how I noticed something was wrong. Then the k8s control plane was up and down. Then the k8s control plane went down completely. Now I can't even connect to any of my non-k8s servers over SSH.
Yes, my servers and all associated services are down: DNS, email, websites. It's a major outage at the whole Newark datacenter, which is their main one, no less.
Thankfully all my nodes are back up. But the DNS server was down so long that Namecheap deregistered it as a personal DNS server, so I am working with them to get it back up.
These are secondary effects, not Linode's services being down directly, but problems caused by the length of the outage.
Thankfully all my nodes are back online and accessible via SSH, but there are secondary effects from the length of the outage that I'm still dealing with, such as my DNS server being deregistered by Namecheap from serving as a personal DNS server (their name for that feature). A consequence of this is that my email is currently not working either, since it affects the hostname my email server operates under while serving multiple other domains. I am still grateful that it seems I suffered no data loss, thank God!
Also, I've taken it upon myself to PROPERLY implement a better, redundant backup strategy (since I was mainly relying on Linode's service, but now I feel I should go beyond that). I am using restic to back up to a Backblaze bucket via the S3 interface. The nice thing is I can put all hosts into the same bucket and restic will organize snapshots by host while still deduplicating across all of them. Not sure how much that'll net me, but it's nice to have.
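For anyone considering the same setup, the gist is only a few commands. This is a rough sketch with the bucket name, endpoint, and credentials as placeholders (restic also has a native b2: backend if you'd rather skip the S3 interface):

    # Backblaze B2 credentials for the S3-compatible endpoint
    export AWS_ACCESS_KEY_ID=<b2-key-id>
    export AWS_SECRET_ACCESS_KEY=<b2-application-key>
    export RESTIC_REPOSITORY=s3:https://s3.us-west-004.backblazeb2.com/my-backup-bucket
    export RESTIC_PASSWORD=<repository-password>

    restic init                              # once, to create the shared repository
    restic backup /etc /home /var/mail       # run on each host; snapshots record the hostname
    restic snapshots --host "$(hostname)"    # list only this host's snapshots; dedup is repository-wide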
I'm in the exact same boat: mail server hosted on Linode with Namecheap as my registrar. I couldn't figure out why my DNS stopped working despite my Linode finally coming back up, and... now I know where to look.
Of 60-something Linodes in Newark across a few accounts (we don't use LKE, Node Balancers, etc.):
- Many came back up yesterday. Most of the rest came back up this morning.
- All but two are back online. One of those is "Powered off" but can't be turned on because "Linode busy". The other is online but unreachable, same behavior as most of them during the outage.
- Three required me to put them in Rescue Mode and run fsck.ext4 -F /dev/sda to get them back online (rough sequence below).
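For anyone in the same spot, the sequence was roughly: boot the Linode into Rescue Mode from the Cloud Manager, connect over Lish, and run fsck against the unmounted disk (assuming it shows up as /dev/sda in rescue mode, as it did for me):

    # in Rescue Mode (Finnix), the disk is attached but not mounted
    fsck.ext4 -F /dev/sda
    # then reboot back into the normal configuration profile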
"The issue is related to heating/cooling complications in the data center due to a power outage . The power outage has been fixed and we are working quickly to bring our services back online."
It is still ongoing and impacting multiple services and regions. Looking at the status page history, it seems they've also been facing a different issue with Kubernetes for the last couple of days.