System Issues and Public Panic

Air travel in the United States was severely interrupted by a cyber incident last week. I heard about it on the news as I walked my dog and spent the entirety of that walk framing how I would speak to the incident at the opening of my class later that day. I was teaching a beginner-level class during the day, and they would hear about it first. It was not until the following evening that I would be teaching my more advanced class on cyber security.

We, the general public, might never know the complete truth about the incident. Nor should we. Throughout my life I have worn many hats, but the two most consistent ones have been that of an IT Professional and that of a Security Specialist. To be clear, the latter is often mistaken for what it is not. While I do know a great deal about cybersecurity, I do not claim to be a cybersecurity expert; rather, I am an IT Professional with a strong mind for security. I have friends who are cybersecurity experts, and frankly they are both much smarter and much more devious than I am. They also have particular skillsets that I do not claim to be a master of. In any event, it is with these eyes that I view every incident.

Over the past several years the world has become suspicious and skeptical, seeing conspiracy and lies under every rock and behind every tree. Too many people hear hoofbeats and immediately look for zebras… and when they see horses, they try to find someone to blame for painting the zebra to look like a horse. Well, in the world of IT, a world in which I have been firmly ensconced for most of my post-military life, there are people (like me) whose job it is to seek out those zebras, and to be able to see the difference between a real horse and a zebra masked to look like one.

When an agency like the Federal Aviation Administration (FAA) has computer issues, the immediate reaction from most is, ‘This must be a cybercrime! This might even be terrorism!’ How the world has changed… the morning of September 11, 2001, when the first plane hit the first tower, most people were crying, ‘What a terrible accident!’ I do not know if every security specialist in the world knew immediately that it was terrorism… I can almost guarantee you that every Israeli security specialist did.

In that initial news report from CBC radio, I listened to some details that, to me, indicated that the incident likely was not a malicious attack. The spokesperson for the FAA said that they needed to reboot the systems, and that everything should be back up and working by 9:00am. For context, this was during the 8:00am edition of The World This Hour. Assuming the newsroom considered this a big story (which indeed it was), they would have been using a quote that was less than an hour old. In fact, from what I could tell, the systems were indeed back online by 9:00am. It probably took the entire day for the backlog to be cleared, but that is an issue with airports and not with the FAA systems.

What are the implications of the FAA systems going down due to either human error or unattributable systems failure? If it was human error, there will be a reckoning. That might mean disciplinary action, better education, or possibly something as extreme as termination. If it was a systems failure, the system component that failed will be replaced. The country will not stop flying, and once these steps are taken, the system will be more resilient.

What are the implications of the FAA systems going down due to a cyber-attack? They will improve their security, they will patch whatever holes there were in their security, and then cybersecurity experts who have a lot more patience for minutiae and attention to detail than I ever have will spend the next several months scouring every single line of code in every one of their systems (the live systems as well as the pre-production and test/dev ones) to determine if the downtime was caused by an intrusion that might have established a backdoor and a command-and-control foothold in these systems. These are people who do not mess around. The entire world will be afraid to fly once again, and there will be widespread panic among future passengers (and stories for weeks from passengers who have flown previously who are amazed that they were on an airplane and actually landed safely). There will be calls for heads to roll, from the head of the agency at the top (and possibly the politicians who hired him) to the guy who delivered pizza to the patch management team the night before. All things travel-related will get more expensive. None of these reactions from the general public will make any sense, but they will be no less real.

There is an old adage that says: “Do not ascribe to malice that which can be explained by incompetence.” This makes a lot of sense. However, in the world of information technology, there does not have to be incompetence or human error for things to go wrong. I have heard a lot of speculation over the past few days about what might have happened, but my initial speculation was this: a patch was applied during the regular maintenance window, and when the systems were rebooted, something went wrong. This can be caused by a bug in the patch, but it can also be caused by any number of things, including pesky file/data corruption. Had it been a bug in the patch, it should have been caught in the dev/test stage before being applied to the production systems… but as anyone who has ever been responsible for patch management for a large organization knows, that is not always the case. In either event, when the issue (which now the FAA is indeed claiming was database corruption) was discovered, they immediately (using the definition of that term that cautious IT Professionals who know what might be on the line if they make a mistake will use) restored from backup, rebooted the systems, and were up in exactly the timeframe that they projected. Over the following few days, their team (overseen by myriad uninvolved experts) would perform what is called a Root Cause Analysis (RCA) to determine what happened, and they would use the results of that to make the adjustments to their systems to ensure that this issue would not happen again.

If there is no incompetence (and I do not immediately believe there is), and there is no conspiracy to hide the truth (and it would not actually be a conspiracy, but more on that in a minute), then maybe that adage can be amended to say: ‘Do not rush to ascribe to malice that which can be explained by other factors.’ Yes, something went wrong with the systems. Every IT Professional in the world knows that there need not be either malice or human error for systems to fail. What did cause the outage? We might never know. As both an IT Professional and a Security Specialist (with extensive training in the needs of national security) I think it is often necessary for the organizations that keep us safe to keep secrets from the public. Any issue regarding the safety of air travel is by necessity an issue of national security, so the FAA should be keeping secrets from us. That is not so that we will become suspicious of them, but rather because the integrity of their systems and internal processes will always require a level of secrecy, lest malicious actors discover a way to compromise them through the unintended disclosure of information… or through the disclosure of information that alone would be innocuous, but that becomes dangerous when put together with other information either gleaned or stolen.

As an IT Professional with a strong head for security, my initial thought when I heard the systems were down on Wednesday turned to the possibility of a cyber-attack. It did not take very long for me to begin doubting that. Had it been a malicious attack, the systems would have been kept down much longer than they were. No matter what level of disruption to air travel it might have caused, it is extremely unlikely the FAA would have brought their systems back online so fast. They would have not only needed to eliminate the immediate threat, they would have also spent as much time as was required to ensure that the threat was indeed completely eliminated, including that search for the elusive backdoor and C&C foothold. Additionally, as anyone who has ever been infected with malware or the target of a hacker will tell you, rebooting your system does not fix those… ever.

For the rest of the week, LinkedIn and Facebook (and myriad other platforms) were replete with polls and opinion and speculation about what happened. One poll, conducted by a contact of mine whose LinkedIn title reads: Cybersecurity Promoter | Risk Assessor | Vulnerability Patcher | Security Awareness Trainer | IT Professional, asked: “The FAA grounded all flights due to a software outage. Malicious, or not malicious?” The response of 140 votes was 59% Malicious, 41% Not malicious. There were a lot of comments on the thread, and a lot of interesting responses (from one of which I quoted the adage about ascribing to malice). As with most discussion among serious IT Professionals, there was little mention of conspiracy. It was about malware and software bugs and hardware failures and the need for redundancies. These are serious people, and we need to ask serious questions.

There are zebras in this world. However, there are many more horses than there are zebras. There is no good reason to paint a zebra to look like a horse (although I was once at an event where a live zebra had been painted in the colours of the rainbow). Likewise, there is a need for some people who will be suspicious of every horse. These people are necessary so that when a zebra is disguised, it can be detected and quarantined and remediated. What is not helpful is for hordes of people to be panicky and shouting about conspiracies when the professionals are trying to confirm whether the horse is really a zebra, cleaning it to look like what it is supposed to, and then coating it to ensure that nobody could ever paint it again. I think by now we can agree that I have gone too far with this analogy, and there will be no more discussion of zebras… or equines of any sort.

Information systems are not simple. Albert Einstein said, ‘Everything should be made as simple as possible… but no simpler.’ The systems that are involved in communicating with thousands of airplanes across six time zones about conditions at thousands of airports cannot be simple. There are more “moving” parts (most of which do not actually move) than most of us could possibly imagine. With that in mind, what is shocking is frankly not that the systems went down, but that they do not go down more often. I shared a meme on Facebook a couple of weeks ago that read: ‘You can do 99 things for someone, and all they remember is the one thing you didn’t do.’ This could be reworded to: ‘You can do 99 things right, and all they will remember is the one thing you did wrong (or that went wrong).’ For all of the complaints about air travel, the system that runs it is a marvel that works perfectly most of the time. I am not talking about the airlines or the airports or weather-related incidents and a hundred other things that do not operate as well, but rather the federal infrastructure (both in the USA and in Canada, as well as internationally under the auspices of the International Civil Aviation Organization) that those airlines and airports operate within. Those systems hardly ever fail. When they do, the implications can be harmless, or they can be staggering… but it is not something for lay people to determine because most of them do not (and cannot) understand what is involved.

What is the real answer? Were Wednesday’s failures caused by an attack, a mistake, or a faulty component? I do not know if the public will ever know, but I do know that the next time I plan to fly to the United States I will not be deterred by this incident. I will continue to trust the system that has not done ninety-nine things right but rather millions of them. There will, of course, be people who believe that I am out of my mind… that there is a giant conspiracy to mask the fact that air travel to and within the United States has been compromised, and that to fly into any airport there would be madness, and that the government is hiding the truth from us. Many of these are the same people who believed and spread conspiracies such as the one about 5G cell towers causing COVID-19, but there are also people who are more rational who will be hesitant to fly. I am not saying that they are crazy, only that I do not agree with them… which is a horse of a different colour.

Fly safe, folks!

