Without Backup Plans, Global IT Outages Will Happen Again

The "2038 Problem" underscores the growing complexity of technological infrastructure due to increased reliance on interconnected systems (Shutterstock).

Elements of Friday’s global IT outage, which grounded planes and hit services from banking to healthcare, have occurred before, and until more contingencies are built into networks and organizations put better backup plans in place, such outages will happen again.
Friday’s outage was caused by an update that US cybersecurity firm CrowdStrike pushed to its clients early on Friday morning and that conflicted with Microsoft’s Windows operating system, rendering devices around the world inoperable, Reuters reported.
CrowdStrike has one of the largest shares of the highly competitive cybersecurity market that provides such tools, leading some industry analysts to question whether control over such operationally critical software should remain in the hands of just a handful of companies.
But the outage has also raised concerns among experts that many organizations are not well-prepared to implement contingency plans when a single point of failure such as an IT system, or a piece of software within it, goes down.
At the same time, there are also more solvable digital disasters looming on the horizon, with perhaps the biggest global IT challenge since the Millennium Bug, the “2038 Problem”, just under 14 years away. This time, the world is far more dependent on computers.
“It’s easy to jump at the idea that this is disastrous and therefore suggest there must be a more diverse market and, in an ideal world, that’s what we’d have,” said Ciaran Martin, former head of Britain’s National Cyber Security Centre (NCSC), part of the country's GCHQ intelligence agency.
“We're actually good at managing the safety aspects of tech when it comes to cars, trains, planes, and machines. What we're bad at is then providing services,” he added.
“Look at what happened to the London health system a few weeks ago - they were hacked, and that led to loads of canceled operations, which is physically dangerous,” he said, referring to a recent ransomware incident which affected Britain’s National Health Service (NHS).
Organizations need to look around their IT systems, Martin said, and ensure there are enough failsafes and redundancies in those systems to stay operational in the event of an outage.
Friday’s outage happened amid a perfect storm, with both Microsoft and CrowdStrike owning huge shares of a market which relies on both of their products.
“I'm sure the regulators globally are looking at this. There is limited competition globally for operating systems, for example, and also for the large scale cybersecurity products like the ones CrowdStrike provides,” said Nigel Phair, a cybersecurity professor at Australia’s Monash University.
Friday's outage hit airlines particularly hard, as many scrambled to check in and board passengers who relied upon digital tickets to fly. Some travelers posted photos on social media of hand-written boarding cards provided by airline staff. Others were only able to fly if they had printed out their ticket.
“I think it's very important for organizations of all shapes and sizes to really look at their risk management and look at an all-hazards approach,” Phair said.
EPOCHALYPSE NOW
Friday’s outage will not be the last time the world is reminded of its dependency on computers and IT products for basic services to function. In about 14 years' time, the world will be faced with a time-based computer issue similar to the Millennium Bug called the “2038 Problem”.
The Millennium Bug, or “Y2K”, happened because early computers saved expensive memory space by storing only the last two digits of the year, meaning many systems were unable to distinguish between the years 1900 and 2000, leading to critical errors.
The cost to mitigate the problem in the years before 2000 ran up a global bill of hundreds of billions of dollars.
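The two-digit shortcut behind the Millennium Bug can be illustrated with a minimal Python sketch. The function names and the 1965 example date are hypothetical, chosen only for illustration, and are not taken from any real system.

```python
def to_two_digit(year: int) -> int:
    """Keep only the last two digits of a year, as many early systems did."""
    return year % 100


def naive_years_elapsed(start_yy: int, end_yy: int) -> int:
    """Compute an interval from two-digit years alone (the flawed approach)."""
    return end_yy - start_yy


# 1900 and 2000 collapse to the same stored value.
same_value = to_two_digit(1900) == to_two_digit(2000)

# Hypothetical record created in 1965 and checked in 2000.
naive_result = naive_years_elapsed(to_two_digit(1965), to_two_digit(2000))
correct_result = 2000 - 1965

print(same_value)      # True
print(naive_result)    # -65
print(correct_result)  # 35
```

With only two digits stored, the interval comes out as minus 65 years instead of 35, the kind of error that had to be found and fixed before 2000.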
The 2038 problem, or "Epochalypse", which begins at 0314 GMT on Jan. 19, 2038, is, in essence, the same problem.
Many computers count the passage of time by measuring the number of seconds since midnight on Jan. 1, 1970, also known as the “Epoch”.
Those seconds are stored as a finite sequence of zeros and ones, or “bits”, but on many computers the counter has only enough bits to count up to a maximum value, which will be reached in 2038.
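The arithmetic behind that date can be checked with a short Python sketch. It assumes the counter is held in a signed 32-bit integer, the common case behind the 2038 problem, although the article does not state the width. The helper names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# The "Epoch": midnight on Jan. 1, 1970, in UTC (equivalent to GMT here).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit integer can hold (assumed counter width).
INT32_MAX = 2**31 - 1


def from_epoch_seconds(seconds: int) -> datetime:
    """Convert a count of seconds since the Epoch to a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """Simulate storing a value in a signed 32-bit integer that wraps around."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


# Last moment a signed 32-bit counter can represent.
last_moment = from_epoch_seconds(INT32_MAX)

# One second later the counter wraps to a large negative number.
wrapped = wrap_int32(INT32_MAX + 1)
after_overflow = from_epoch_seconds(wrapped)

print(last_moment)     # 2038-01-19 03:14:07+00:00
print(wrapped)         # -2147483648
print(after_overflow)  # 1901-12-13 20:45:52+00:00
```

Under that assumption, a system that does not handle the overflow would jump from January 2038 back to December 1901, which is why the problem resembles the Millennium Bug.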
“We currently have a situation where there's huge global disruption, because we cannot cope administratively,” said Ciaran Martin, the former NCSC head.
“We can cope in terms of safety, but we can't cope in terms of service provision when key networks go down.”



8.5 Million Computers Running Windows Affected by Faulty Update from CrowdStrike

A technician works on an information display near United Airlines gates at Chicago O'Hare International Airport in Chicago, Friday, July 19, 2024, after a faulty CrowdStrike update caused a major internet outage for computers running Microsoft Windows. (AP Photo/Carolyn Kaster)

As the world continues to recover from massive business and travel disruptions caused by a faulty software update from cybersecurity firm CrowdStrike, malicious actors are trying to exploit the situation for their own gain.
Government cybersecurity agencies across the globe and even CrowdStrike CEO George Kurtz are warning businesses and individuals around the world about new phishing schemes that involve malicious actors posing as CrowdStrike employees or other tech specialists offering to assist those recovering from the outage.
“We know that adversaries and bad actors will try to exploit events like this,” Kurtz said in a statement. “I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives.”
According to The Associated Press, Britain’s National Cyber Security Centre said it has noticed an increase in phishing attempts around this event.
Microsoft said 8.5 million devices running its Windows operating system were affected by the faulty cybersecurity update Friday that led to worldwide disruptions. That’s less than 1% of all Windows-based machines, Microsoft cybersecurity executive David Weston said in a blog post Saturday.
He also said such a significant disturbance is rare but “demonstrates the interconnected nature of our broad ecosystem.”
What's happening with air travel?
By late morning Saturday on the US East Coast, airlines around the world had canceled more than 1,500 flights, far fewer than the 5,100-plus cancellations on Friday, according to figures from tracking service FlightAware.
Two-thirds of Saturday’s canceled flights occurred in the United States, where carriers scrambled to get planes and crews back into position after massive disruptions the day before. According to travel-data provider Cirium, US carriers canceled about 3.5% of their scheduled flights for Saturday. Only Australia was hit harder.
Canceled flights were running at about 1% in the United Kingdom, France and Brazil and about 2% in Canada, Italy and India among major air-travel markets.
Robert Mann, a former airline executive and now a consultant in the New York area, said it was unclear exactly why US airlines were suffering disproportionate cancellations, but possible causes include a greater degree of outsourcing of technology and more exposure to Microsoft operating systems that received the faulty upgrade from CrowdStrike.
How are healthcare systems holding up?
Healthcare systems affected by the outage faced clinic closures, canceled surgeries and appointments, and restricted access to patient records.
Cedars-Sinai Medical Center in Los Angeles, Calif., said “steady progress has been made” to bring its servers back online and thanked its patients for being flexible during the crisis.
“Our teams will be working actively through the weekend as we continue to resolve remaining issues in preparation for the start of the work week,” the hospital wrote in a statement.
In Austria, a leading organization of doctors said the outage exposed the vulnerability of relying on digital systems. Harald Mayer, vice president of the Austrian Chamber of Doctors, said the outage showed that hospitals need to have analog backups to protect patient care.
The organization also called on governments to impose high standards in patient data protection and security, and on health providers to train staff and put systems in place to manage crises.
“Happily, where there were problems, these were kept small and short-lived and many areas of care were unaffected” in Austria, Mayer said.
The Schleswig-Holstein University Hospital in northern Germany, which canceled all elective procedures Friday, said Saturday that systems were gradually being restored and that elective surgery could resume by Monday.