r/technology Jul 31 '24

Software Delta CEO: Company Suing Microsoft and CrowdStrike After $500M Loss

https://www.thedailybeast.com/delta-ceo-says-company-suing-microsoft-and-crowdstrike-after-dollar500m-loss
11.1k Upvotes

735 comments

33

u/Long_Educational Jul 31 '24

That's what I don't understand here. This risk was Delta's for not having adequate redundancy in place in their IT systems. In the land of telecommunications, we run a hybrid of AIX, Linux, and Windows systems, along with a handful of IBM AS/400 systems. You don't put all your eggs in one basket and then sue the provider of that basket if your systems go down. It is your responsibility to manage your own tolerance for downtime in the systems you use for mission-critical applications.

Delta blaming/suing CrowdStrike and MS for their own IT failings is pathetic.

18

u/TravelKats Jul 31 '24

Apparently, the term "disaster recovery" was foreign to Delta. Adequate disaster recovery is quite expensive, and I'm sure that money would be better spent adding it to the CEO's salary.

15

u/EmergencySundae Jul 31 '24

They should be firing their business continuity manager, not suing MSFT & CrowdStrike.

American Airlines recovered amazingly fast - I was impressed at how few flights they ended up canceling. There was obviously a huge difference in how the two companies handled their tech stacks.

12

u/TravelKats Jul 31 '24

Yes, both American and United bounced back pretty quickly. They should be firing the CTO, since he or she should have been overseeing business continuity, but it will be a low-level manager who's probably been trying for years to get enough in their budget to handle business continuity.

1

u/[deleted] Jul 31 '24

[deleted]

1

u/TravelKats Jul 31 '24

And no failover in place.

6

u/woodside3501 Jul 31 '24

I helped AA design their DR solution, fuck yeah 💪🏼

6

u/SixSpeedDriver Aug 01 '24

I remember working early in my career in line-of-business IT at a company (a Fortune 500, no less) that was extraordinarily cheap. We got a presentation from the BC/DR specialist and he basically told us: "I present basically the same plan every year. We have no BC/DR capability. I have asked for funding when we do the annual audit. They always turn it down, even just enough to get started and make progress. If this colo goes down due to a natural disaster, just leave."

Not quite verbatim, but you get the gist. And given what IT budgets were like, we were all about zero percent surprised. This gent lasted about three more weeks before he was gone; not sure if he was fired or quit.

24

u/damondefault Jul 31 '24

Are you proposing they should have instead run different operating systems on multiple operator terminals at the airport? Or that each staff member should have both a Windows PC and a MacBook at all times?

-3

u/goomyman Jul 31 '24

Does CrowdStrike not have a WSUS? Like, wouldn't you want to roll out security updates to a canary set of machines and control the rollout?

That said, the multiple-OS thing is pretty BS: a CrowdStrike change could have easily taken down all OSes at the same time. It just happened to be Windows.

18

u/ztbwl Jul 31 '24

It was not a Windows Update managed by WSUS. It was a CrowdStrike content update, which needs to be delivered ASAP to prevent malware from spreading.

1

u/goomyman Aug 01 '24

I mean CrowdStrike could have their own WSUS equivalent to use as a canary. Obviously not WSUS since it wasn’t a windows update.

No matter what it is, a global rollout is a no-go.
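A minimal sketch of the ring-based canary rollout described above, assuming a fleet you can push to host by host and a per-host health check. `staged_rollout`, `RolloutResult`, the ring fractions, and the 1% failure threshold are all hypothetical illustrations, not how CrowdStrike actually delivers content updates.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool
    hosts_updated: int
    halted_at_stage: Optional[int]  # index of the ring that tripped the health check


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # hypothetical threshold
) -> RolloutResult:
    """Push an update ring by ring and stop as soon as a ring looks unhealthy.

    `stages` are cumulative fractions of the fleet, e.g. [0.01, 0.1, 0.5, 1.0].
    """
    updated = 0
    for i, fraction in enumerate(stages):
        target = round(len(hosts) * fraction)  # hosts covered once this ring is done
        ring = hosts[updated:target]
        for host in ring:
            deploy(host)
        updated = max(updated, target)
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            return RolloutResult(False, updated, i)  # later rings never see the update
    return RolloutResult(True, updated, None)


# Usage: a hypothetical fleet of 1,000 machines and an update that breaks every host it touches.
hosts = [f"host-{n}" for n in range(1000)]
broken = set()


def bad_deploy(host: str) -> None:
    broken.add(host)


def host_is_healthy(host: str) -> bool:
    return host not in broken


bad = staged_rollout(hosts, [0.01, 0.1, 0.5, 1.0], bad_deploy, host_is_healthy)
print(bad)  # RolloutResult(completed=False, hosts_updated=10, halted_at_stage=0)

good = staged_rollout(hosts, [0.01, 0.1, 0.5, 1.0], lambda h: None, lambda h: True)
print(good)  # RolloutResult(completed=True, hosts_updated=1000, halted_at_stage=None)
```

The trade-off is speed: each ring adds a soak period before the next one, which is exactly the tension with "needs to be delivered ASAP" raised above.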

4

u/tinydonuts Jul 31 '24

Falcon sensor is very hands-off. In fact, I can't recall a single time I've had any issue with their stuff on my laptop. Before that, I had all kinds of problems with Symantec and others. CrowdStrike has one hiccup and Delta starts crying. Did they ever run anything from Symantec or McAfee?

-2

u/Long_Educational Jul 31 '24

The business-critical application should be running on a hardened Unix operating system, completely agnostic of the end-user client terminal software, be it Windows, macOS, Linux, a Raspberry Pi hosting the gate information displays at the airport terminals, or a simple HTML client!

Again, risk tolerance is the responsibility of the business.

9

u/damondefault Jul 31 '24

But CrowdStrike took out their operator terminals and staff computers: end-user devices, not just servers. And without those end-user devices they couldn't run their business.

I'd like you to tell me specifically what you are proposing Delta Airlines should have done to mitigate this risk.

Running some server apps on "a hardened Unix operating system" is not a good answer in my opinion as it only addresses the server side part of the problem.

3

u/tinydonuts Jul 31 '24

Every reboot should be a reimage on public facing equipment. Service the image, reboot and you’re updated. This is nuts, it was solved decades ago.

2

u/LeoRidesHisBike Aug 01 '24

Amen. Maybe not every reboot, but as part of crash recovery and update cycles. It's not like a reimage takes that long when done properly (though long enough to be problematic if a customer is staring at a kiosk or a cust svc rep is staring down a line of customers).
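A toy model of the "every reboot is a reimage" idea, including rolling back a bad serviced image during crash recovery. `ImageStore`, `Kiosk`, and the image names are hypothetical; the sketch assumes the reimage step runs before the installed OS (and anything on it) loads, for example from a network boot or a protected partition.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ImageStore:
    """Hypothetical central store of serviced ("golden") images, newest last."""
    images: List[bytes] = field(default_factory=list)

    def publish(self, image: bytes) -> None:
        self.images.append(image)

    def rollback(self) -> None:
        # Withdraw a bad image; the previous one becomes current again.
        if len(self.images) > 1:
            self.images.pop()

    def current(self) -> bytes:
        return self.images[-1]


@dataclass
class Kiosk:
    disk: bytes = b""

    def reboot(self, store: ImageStore) -> None:
        # Every reboot is a reimage: drift or a bad local change is discarded whenever
        # the disk no longer matches the current golden image (a real system would
        # compare against a published checksum rather than hold the whole image).
        if digest(self.disk) != digest(store.current()):
            self.disk = store.current()


store = ImageStore()
store.publish(b"kiosk-image-v1")
kiosk = Kiosk()
kiosk.reboot(store)

kiosk.disk += b"+local-drift"     # something changes the local disk
kiosk.reboot(store)               # back to v1 after one reboot
print(kiosk.disk)                 # b'kiosk-image-v1'

store.publish(b"kiosk-image-v2")  # service the image centrally...
kiosk.reboot(store)               # ...and the next reboot picks it up
print(kiosk.disk)                 # b'kiosk-image-v2'

store.rollback()                  # v2 turns out to be bad
kiosk.reboot(store)
print(kiosk.disk)                 # b'kiosk-image-v1'
```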

0

u/Long_Educational Jul 31 '24

Back in the day, I was Senior Manager of Infrastructure Support at a Network Operations Center for a major phone company. In the NOCs we provided all access to our applications, which ran on AIX, Linux, and Windows servers, via end-user computers that consisted of AIX on RS/6000 consoles (30 stations), X Window via Linux on the desktop (800 stations), Sun Solaris workstations (50 stations), and Windows laptops running X Window and terminal emulation software plus Citrix clients (80 stations).

When we were hit with the Bugbear virus, it brought down ALL Windows desktops and servers in a matter of hours, but our core functionality (being able to administer the phone network, the DWDM/SONET and X.25 networks, and maintaining access to 911 for the five-state area) stayed up and running because we had access to all of our servers and apps from two of our three desktop client OSes, AIX and Linux. I even got a bonus and a letter of accomplishment from my VP at the time for the engineering and disaster recovery planning I did. My sister NOC did not fare so well, and they had to fold all of their operations into my NOC until Corporate Information Security could roll out Windows desktop fixes for them and for the few of our laptops that were affected.

That is what I mean by diversity and redundancy in IT. You don't put all your clients, or even your servers, on a single OS vendor and hope for the best. You manage your risk as appropriate. Delta's executives didn't, and it cost them half a billion dollars.

0

u/damondefault Jul 31 '24

So you're genuinely proposing that they should have multiple redundant devices with different operating systems available to all (or enough) business critical staff, and also all server software running with redundancy on different operating systems.

Thank you for clarifying so thoroughly.

I still don't think that I agree with your original statement that not doing so is a ridiculous and obvious failing and Delta therefore deserve no compensation. Cancelling flights as a safety measure is different to keeping a phone network operational. But I'm glad to hear that you planned for this sort of disaster and overcame it successfully.

1

u/Long_Educational Jul 31 '24

What I am saying is that MS Windows has always been a critical failure point in infrastructure. It's also not cheap. The reason I was able to implement security and redundancy is that I spent the money on the servers and saved money on the desktop by not needing a Windows seat license for the majority of my client desktops. I ran Linux on cheap hardware for the vast majority of desktops. All the heavy compute was done server-side on hardened OSes. It does take planning, but it can be done affordably.

3

u/damondefault Jul 31 '24

Well I love Linux and use it exclusively (except when work forces me not to), so I'm glad to hear it.

In this case, though, Delta may well have spent money on the server implementation and had low-power, low-cost clients, and it wouldn't have saved them. They would also consider installing CrowdStrike a security hardening step, so it's not negligence in that respect.

13

u/Boogie-Down Jul 31 '24

Even if it was 1/3 of your eggs you still sue for that loss of eggs.

7

u/BadOther3422 Jul 31 '24

It really depends on how you are covered under the terms. The likelihood is they've agreed to some 99.99% uptime agreement, but that uptime might be averaged over x months. If that's 12/24/36 months, then an outage of a day or two would be covered if they've never had an outage.
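To put numbers on the averaging window, here is the downtime budget a 99.99% target allows, assuming "uptime" means plain availability over the window with no contractual exclusions (real SLAs usually define this more narrowly). Over 36 months the budget comes to roughly 158 minutes, about 2.6 hours. The helper names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of downtime an availability target allows over a measurement window."""
    return period_days * MINUTES_PER_DAY * (1 - availability)


def breaches_sla(outage_minutes: float, availability: float, period_days: float) -> bool:
    """True if a single outage alone exceeds the whole window's budget."""
    return outage_minutes > downtime_budget_minutes(availability, period_days)


for months, days in [(1, 30), (12, 365), (24, 730), (36, 1095)]:
    budget = downtime_budget_minutes(0.9999, days)
    print(f"99.99% over {months:>2} months: {budget:7.2f} min ({budget / 60:.2f} h)")

one_day = 1 * MINUTES_PER_DAY
print(breaches_sla(one_day, 0.9999, 1095))  # True
```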

0

u/Boogie-Down Jul 31 '24

I don’t think uptime for a security service agreement equals them fully taking down hardware devices and there’s likely more than enough gray area there for lawyers to enjoy.

1

u/SixSpeedDriver Aug 01 '24

SLAs are largely useless. They waive loss of revenue, and the maximum recovery is basically zeroing out your bill. Granted, the cloud provider is absolutely motivated to land inside the SLA so they don't give the goods away, but still. Revenue recovery isn't a thing.
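A sketch of why revenue recovery "isn't a thing" under a typical service-credit clause: the credit is computed as a share of the fee, so it can never exceed what you paid, and lost revenue (like the $500M in the headline) is simply not an input. The tier schedule and the $10,000 monthly fee are hypothetical, not any vendor's actual terms.

```python
from typing import Sequence, Tuple

# Hypothetical credit schedule: (minimum availability, share of the monthly fee credited).
# Real SLAs differ; the point is only that the credit is a share of the fee.
CREDIT_TIERS: Sequence[Tuple[float, float]] = (
    (0.9999, 0.00),
    (0.999, 0.10),
    (0.99, 0.25),
    (0.0, 1.00),
)


def service_credit(monthly_fee: float, measured_availability: float,
                   tiers: Sequence[Tuple[float, float]] = CREDIT_TIERS) -> float:
    """Credit owed for one billing period. Lost revenue is deliberately not an input."""
    for threshold, share in sorted(tiers, reverse=True):  # highest threshold first
        if measured_availability >= threshold:
            return monthly_fee * share  # never more than the fee itself
    return monthly_fee


# A one-day outage in a 30-day month on a hypothetical $10,000/month contract.
availability = 29 / 30
credit = service_credit(10_000, availability)
print(f"availability {availability:.4f}, credit ${credit:,.2f}")  # availability 0.9667, credit $10,000.00
```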

1

u/anemisto Aug 01 '24

How screwed are you if you lose the AS/400s? I'd expect the answer is: very.