What Caused the Global IT Outage?

Addressing a global crisis... and it's not just this outage

Jul 23, 2024

Yesterday, we saw what happens when software deployed on millions of endpoints across the globe to prevent malware… BECAME THE MALWARE.
—Liam Sutton

***

What Happened

On July 19, 2024 at 04:09 UTC, computer systems across much of the globe crashed, leaving affected machines stuck on the blue screen of death (BSOD). According to Crowdstrike, the company responsible for the crash:

Customers running Falcon sensor for Windows version 7.11 and above, that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC… were susceptible to a system crash.

The incident affected machines running Microsoft 365 enterprise software, which comprised at least 49 million terminals. Interos reported that this disrupted 674,476 businesses globally, 41% of them based in the United States and the rest, in order of volume, in Britain, France, Spain, and a smattering of other countries. The outage disturbed businesses in virtually every sector, as well as governments. In the US, 82% of state governments and 48% of the largest cities use the impacted software.

The Cause

One of Crowdstrike’s core services is “Endpoint Detection and Response” (EDR). In simple terms, this is a process that looks for threats on user devices like an office worker’s computer (the ‘endpoint’). When a protected device fires up, channel files look for new malicious activity. While channel files themselves are not system files (despite their .sys extension), they operate in the same space that controls the boot level process.

In this case, the affected Crowdstrike channel file (C-00000291) controlled how the security element identified and managed named pipe execution. Named pipes are used to “provide communication between processes on the same computer or between processes on different computers across a network.” They are considered critical and are thus routinely monitored for suspicious activity. Because the Crowdstrike file disrupted communication at the protected kernel layer, systems crashed and displayed the infamous blue screen. Worse, because the issue occurred at the boot level, it forced the endpoint into a “boot loop” in which the system starts, crashes, restarts, and endlessly repeats this cycle.
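For readers unfamiliar with the term, here is a minimal sketch of what communication through a named pipe looks like. It uses a POSIX FIFO (os.mkfifo) as a stand-in, so it runs on Linux or macOS only; Windows named pipes, the kind the channel file dealt with, use a different API, and nothing here reflects how Crowdstrike's software works. The function name and pipe name are hypothetical.

```python
import os
import tempfile
import threading


def send_through_named_pipe(message: str) -> str:
    """Pass `message` from a writer thread to a reader via a POSIX FIFO."""
    with tempfile.TemporaryDirectory() as workdir:
        pipe_path = os.path.join(workdir, "demo_pipe")  # hypothetical name
        os.mkfifo(pipe_path)  # create the named pipe as a filesystem entry

        def writer() -> None:
            # Opening for write blocks until a reader opens the other end.
            with open(pipe_path, "w") as pipe:
                pipe.write(message)

        thread = threading.Thread(target=writer)
        thread.start()

        # The reader needs to know only the pipe's name to receive the data.
        with open(pipe_path, "r") as pipe:
            received = pipe.read()  # returns once the writer closes its end

        thread.join()
        return received


if __name__ == "__main__":
    print(send_through_named_pipe("hello from another process"))
```

Because any process that knows the pipe's name can try to connect to it, it is easy to see why security tools keep a close eye on this channel.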

The Fix

Although procedurally simple, the fix created considerable headaches for IT departments. In short, techs needed to reboot every affected computer in recovery mode, locate the corrupted channel file, delete it, and reboot normally. For companies with thousands of machines, one can imagine the tedium of this process.
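To make the "locate and delete" step concrete, here is a rough sketch in Python. It is illustrative only: a machine in recovery mode will typically not have Python available, and in practice techs did this by hand or with their own tooling. The file pattern comes from the channel file number above; the directory in the usage comment follows Crowdstrike's public remediation guidance rather than anything in this article, and the function names are hypothetical.

```python
from pathlib import Path

# Pattern built from the channel file number cited above (C-00000291, .sys).
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_channel_files(driver_dir: Path) -> list[Path]:
    """Return every file in `driver_dir` that matches the faulty pattern."""
    return sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))


def remove_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """List the matching files and delete them unless `dry_run` is set."""
    matches = find_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()  # delete the channel file
    return matches


if __name__ == "__main__":
    # Assumed location, per Crowdstrike's public guidance; adjust as needed.
    target = Path(r"C:\Windows\System32\drivers\CrowdStrike")
    for found in remove_channel_files(target, dry_run=True):
        print("would delete:", found)
```

The dry run is the default on purpose: deleting the wrong file from a drivers directory is its own kind of outage.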

Cybersecurity Impact

Numerous threat actors have taken the opportunity to exploit businesses victimized by this failure. In most cases, these criminals present ostensible fixes that are actually disguised malware, delivered through the usual methods of phishing and other social engineering. Some have adopted the even bolder approach of soliciting victims directly, posing as independent contractors out to help correct the problem for companies facing waiting lists or other delays. In addition, many URLs have popped up online that appear to be associated with Crowdstrike. Clicking on these links brings the user to a legitimate-looking page containing downloadable ‘fixes’ (malware).

One attacker (or group) uploaded a malicious ZIP archive named crowdstrike-hotfix.zip to a Mexico-based online malware-scanning service. The archive contains an executable posing as a utility that, when executed, loads HijackLoader. HijackLoader can adapt its injection method to bypass security barriers. In other words, even if the malicious code is detected, the loader can alter its process to evade the methods the security system uses to prevent malware infiltration. For a detailed analysis on how this is done, check out this blog.

Although some tech commentators have attempted to characterize the Crowdstrike incident as “not a security incident,” the pervasiveness of its effect has instigated a security crisis nonetheless.

Reverberations

This time, the digital cataclysm was caused by well-intentioned people who made a mistake. That meant the fix came relatively quickly; CrowdStrike knew what had gone wrong. But we may not be so lucky next time.
—Brian Klaas

Brian Klaas argues in The Atlantic that the 21st century has evolved in such a way as to heavily prioritize efficiency over resilience. He uses the historical evolution of human society as a comparative illustration. When, millennia ago, humans lived in small, geographically isolated groups that never crossed paths, a disaster in one group remained localized. A plague infecting one group, for instance, would never infect another, and thus a global pandemic could not occur.

Today, however, things are quite different. We saw this when a massive container ship grounded itself in the Suez Canal, disrupting supply chains across the world. Crowdstrike is the digital version of that stranded ship.

Cybersecurity specialists know that creating a layered defense is far more effective than a single, robust firewall. No matter the efficacy of a single point of defense, catastrophe requires only one instance of ineffectiveness, whereas maintaining the health of a system demands absolute perfection. The mathematical odds make clear which will eventually win. As Klaas writes, “Slightly more efficient systems that might someday fail catastrophically are a bad bet for humanity.”
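The odds in question are easy to work out. The sketch below assumes, for simplicity, that each attempt (or each defensive layer) fails independently with the same probability; real systems are messier, and the numbers are made up for illustration. The function names are hypothetical.

```python
def chance_of_at_least_one_failure(p_fail: float, opportunities: int) -> float:
    """Probability that a single defense fails at least once.

    Assumes each opportunity fails independently with probability `p_fail`.
    """
    return 1.0 - (1.0 - p_fail) ** opportunities


def chance_all_layers_fail(p_fail: float, layers: int) -> float:
    """Probability that every one of `layers` independent defenses fails."""
    return p_fail ** layers


if __name__ == "__main__":
    # A defense that holds 99.9% of the time, tested 1,000 times.
    print(chance_of_at_least_one_failure(0.001, 1000))
    # Three mediocre layers, each of which fails 10% of the time.
    print(chance_all_layers_fail(0.10, 3))
```

Under these assumptions, a defense that holds 99.9% of the time still has roughly a 63% chance of failing at least once across 1,000 attempts, while three independent layers that each fail one time in ten all fail together only about one time in a thousand.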

Exacerbating the risky position in which the global community has positioned itself is the largely “unpersoned” and unaccountable infrastructure supporting it. Cory Doctorow describes this as the “vanishingly small number of people with the authority to undo the determinations of the Kafka-as-a-service systems that underpin the big platforms.”

He articulates the situation this way:

Microsoft – like Google, Apple, Facebook and the other giants – routinely disconnects users for reasons it refuses to explain, and offers no meaningful appeal. Microsoft tells its business customers, "force your clients to get a Microsoft account in order to maintain communications security" but also reserves the right to unilaterally ban those clients from having a Microsoft account.

Doctorow provides numerous examples illustrating how customers have little choice but to engage with these companies, which maintain exclusive control over every element of the interaction, including forced exclusion. Irrespective of whether the user “purchases” something, the user remains the product. This means that no matter how important the associated system is to the user’s livelihood, these companies are free to destroy it, leaving the user with no recourse. He adds:

Crowdstrike demonstrated what happens to users when a cloud provider accidentally terminates their account, but while we're thinking about reducing the likelihood of such accidents, we should really be thinking about what happens when you get Crowdstruck on purpose.

The Crowdstrike incident captured the attention of regulators only because of the profundity of its impact, not the nature of it. Put another way, regulators to date have shown little interest or aptitude in protecting individual consumers (however many billions of them there are) from just this kind of blackout, whether accidental or purposeful. Only once businesses suffered the affront was someone like the company’s CEO hauled before the US Congress.

The idea that petulant babies like Elon Musk or Mark Zuckerberg can effectively destroy anyone’s life with impunity should anger everyone. But even if the motivation is more banal, driven by pecuniary interests instead, this is not a sustainable model. Returning to Doctorow:

If they know we can't leave for a competitor, if they know we can't sue them, if they know that a tech rival can't give us a tool to get our data out of their silos, then the expected cost of mistreating us goes down. That makes it economically rational to seek out ever-more trivial sources of income that impose ever-more miserable conditions on us.

These companies are collectively worth trillions of dollars. There is zero excuse for them to sacrifice everyone’s digital and economic well-being to earn a few extra pennies for shareholders. Regulatory authorities are equally to blame, beholden as they are to these companies’ financial support in the form of campaign contributions, PACs, and other democratically destructive forms of influence.

Take Action

Society should demand action and dismiss the disingenuous protestations of tech executives. A good start would be to hold shareholders and executives individually and strictly liable for any economic damage caused by their companies’ products. Strict liability means they pay for the damage regardless of “fault.” This would keep recovery from bogging down in lawsuits dragged out indefinitely by these companies’ vast legal departments. (Note that they have no problem with profligate spending in this area.)

Governments should also require human-staffed customer support proportional to the size of the company’s customer base. This would create jobs while providing better service to customers mistreated by unfeeling algorithms that lack any ability to make reasoned decisions.

Lastly, the world needs a serious overhaul of the unethical use of Terms of Service agreements. Companies should not have the power to force users into one-sided “contracts” based on their need to engage with a required service. See a detailed argument about this here.

Finally

I have been a tech enthusiast, user and, in some areas, expert, for quite a while now. We live in the most interesting epoch of human development in history when it comes to this sector. But my enthusiasm wanes. The progression of and benefit from tech have been decidedly one-sided. This paradigm is destroying the environment, threatening people’s well-being, prohibiting real innovation, and concentrating power in the hands of a tiny and undeserving few.

People are enamored of the idea of traveling among the stars, or sharing space with robots that can perform the most distasteful jobs necessary to keep society operating. But this dream has mostly been a sham, a shiny object dangled in front of our collective eyes to hypnotize us into believing that a cadre of thieves and criminals has anything but its own interests in mind. Society is quickly being pushed into a stark decision: destroy them or let them destroy us. Unless that is the point we truly wish to reach, the need to act is now.

***

Do you think I am being too harsh calling them thieves and criminals? Read about Elon Musk’s history and decide for yourself.

Law and Crime

The Case for Criminally Charging the World’s Richest Man

Robert Vanwey

August 3, 2023

