Process integrity as central to ICS security.

Oct 24, 2019

Joe Slowik, Principal Adversary Hunter at Dragos, spoke at SecurityWeek's 2019 ICS Cyber Security Conference on Tuesday. Before taking up the nature of attacks on industrial control systems (ICS), he wished to dispel some loose thinking around the topic generally. ICS are not smart buildings, still less are they the whole of the Internet of Things. When we talk about ICS, he emphasized, we should be clear that we're talking about control of industrial processes. And in that regard, an overemphasis on availability can be misleading. Availability is necessary, but it's not sufficient, Slowik argued. It's not enough that a process be available; it must be controllable. You have to be able to stop a process when you need to, for example, particularly when safety demands it. So any discussion of ICS security should begin with process integrity.

An outline of ICS attacks.

The popular conception of an ICS attack is that it turns off the power, blows up the plant, destroys the centrifuges, and so on. That’s true, Slowik said, but this picture of an attack obscures other, more subtle and sophisticated actions: degradation of a process, introduction of defects, undermining of process safety, and, above all, doing these things in a way that’s difficult to detect.

An attack follows a common sequence. First, there are preparatory actions, and these are then followed by denial, degradation, or destruction, "the three Ds." Just because someone got into your network doesn’t mean that it was an ICS attack. "If it doesn’t fit the three Ds, then it’s not an ICS attack." Thus he would exclude such preparatory actions as reconnaissance of a power grid from attacks proper.
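
Slowik's rule of thumb lends itself to a small sketch. The Python below is only an illustration and was not presented in the talk: the action labels and the `classify` function are hypothetical names chosen for the example.

```python
# Illustrative only: the labels below are assumptions, not a real taxonomy.
THREE_DS = {"denial", "degradation", "destruction"}
PREPARATORY = {"reconnaissance", "initial_access",
               "credential_harvesting", "lateral_movement"}


def classify(observed_actions):
    """Label an intrusion by whether it reached any of the three Ds."""
    actions = set(observed_actions)
    if actions & THREE_DS:
        return "ics_attack"
    if actions & PREPARATORY:
        return "preparatory_only"
    return "unclassified"


# Reconnaissance of a grid without any effect on the process.
grid_recon = ["initial_access", "reconnaissance"]
# An intrusion that ends with breakers being opened.
breaker_event = ["initial_access", "lateral_movement", "denial"]

print(classify(grid_recon))
print(classify(breaker_event))
```

Under these assumptions the reconnaissance case is labeled as preparatory only, while the breaker event counts as an ICS attack because it ends in denial.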

An attack typically unfolds by breaching an IT network, identifying points of contact with the industrial control systems, then enumerating and categorizing the control system environment, moving into that environment, and then delivering effects on target.

Stuxnet, Crashoverride, and Trisis are good examples of process integrity attacks, and Slowik took them up in order to illustrate the threat.

Stuxnet: "the world's first digital weapon."

Stuxnet was deployed against Iranian uranium enrichment facilities at Natanz. In the popular press, the conception of Stuxnet was that it deployed many zero days, destroyed centrifuges, and in the end put a stop to Iranian nuclear activity. In fact, its operation was more subtle. It increased operational variability in the centrifuge cascade, thereby increasing failure rates. It modified process telemetry to hide these defects, and created a difficult-to-diagnose uncertainty in the uranium enrichment process.

Thus, Slowik argued, Stuxnet certainly had a direct impact, but it focused on indirect impacts. Operators couldn’t trust the equipment, and leaders no longer trusted the scientists at Natanz or their supply chain. Uranium was still being enriched, but with greater difficulty, and Stuxnet altered the calculus of those in charge of Iran's nuclear program. The attack increased the cost of enrichment, emphasized the risks of the activity, and probably facilitated negotiation of JCPOA, the Joint Comprehensive Plan of Action among Iran, China, France, Germany, Russia, the United Kingdom, and the United States.

Crashoverride: a Russian hit on the Ukrainian power grid.

The attackers behind Crashoverride twice penetrated industrial control systems in the Ukrainian power grid. In the 2016 event they worked in a localized section of the grid: they installed malware on computers communicating with field devices, scheduled malware execution to open breakers at the target transmission site, and then performed a limited wipe and system-disabling event on infected machines. The attack also targeted protective relays with a post-attack denial-of-service exploit.

In 2015 the attackers manually interacted with control systems, and the wiper was intended to impede recovery. In 2016 their interactions were encoded in malware, and there was an attempt to impact protection systems. The 2016 attack seemed less successful, and in any case power was restored more rapidly than it had been in 2015, but in other respects 2016 was a more interesting attack. The attackers realized that the wiper should be used to deny control over systems. This didn't work in practice, but the approach was designed to create much more widespread damage. It anticipated a rush to physical restoration, and induced a loss of view into transmission protections. Had it functioned as it was apparently intended to, it would have led the personnel involved in recovery to restore power to an unprotected line, thereby creating an unsafe state at the moment of restoration.

Thus Crashoverride ultimately anticipated a rush to recovery, and sought to create an unsafe state at the time of restoration and thereby to produce a physically destructive impact.

Triton/Trisis: lethal, or at least indifferent to possible harm.

Triton/Trisis, a 2017 attack on a Saudi petrochemical plant, was the most recent and in some respects the most disturbing of the three attacks. Press coverage of the incident emphasized its disruption of plant operations, and the headlines called it "malware that can kill." Its actual implications were again, Slowik argued, more subtle and complex. An interaction with safety instrumented systems (SIS) introduced an in-memory rootkit that allowed adversary access. That access could be used to arbitrarily modify the SIS, with SIS integrity compromised to unknown effect. But there were in general three possible conditions it could have achieved. It could have induced a denial-of-service condition in the plant by recording safe conditions as unsafe. It could have been used for destruction (and this is the possibility that drew most attention, and is also the likeliest intention of the attackers) by recording unsafe conditions as safe. Or it could have been used to directly trip the SIS for a variety of other reasons.
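
The three possibilities can be laid out as a small truth table. The following Python is a hedged sketch of that reasoning only: `sis_outcome` and its parameters are hypothetical, and nothing here reflects how a real safety controller, or Trisis itself, is implemented.

```python
from itertools import product


def sis_outcome(actual_safe: bool, reported_safe: bool,
                forced_trip: bool = False) -> str:
    """Map the real process state and the state the SIS records to an outcome."""
    if forced_trip:
        # Third case in the text: the adversary trips the SIS directly.
        return "direct trip: attacker shuts the process down on command"
    if actual_safe and not reported_safe:
        return "denial of service: safe condition recorded as unsafe, spurious trip"
    if not actual_safe and reported_safe:
        return "destruction risk: unsafe condition recorded as safe, no trip"
    if actual_safe:
        return "nominal: process safe, SIS idle"
    return "nominal: process unsafe, SIS trips as designed"


# Enumerate every combination of actual and recorded state.
for actual, reported in product([True, False], repeat=2):
    print(f"actual_safe={actual!s:5} reported_safe={reported!s:5} "
          f"-> {sis_outcome(actual, reported)}")
```

The two rows where the recorded state disagrees with the actual state are the integrity failures: one halts a healthy process, the other lets a hazardous one continue.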

The attackers gained access to an IT network and harvested credentials from it, pivoted through the OT control networks, and used the captured credentials to gain sufficient access to attack safety systems.

An assessment of the record, and some thoughts for the future.

A brief assessment of the three attacks suggests that Stuxnet basically worked. Crashoverride largely failed: the disruption was smaller than what the attackers probably desired. Trisis failed: it caused target devices to fail by tripping safety systems.

Immediate direct effects are the least flexible and least likely to scale. Delayed direct effects can produce their impact at the time of the adversaries’ choosing. Integrity attacks undermine confidence in processes while potentially producing other impacts.

Integrity attacks have been seen across different sectors. Attacks on manufacturing are difficult to understand and diagnose. Such incidents can introduce defects, add hard-to-diagnose errors, increase the likelihood of product failure, and manipulate testing tolerance for equipment quality control.

Attacks on electrical power generation and distribution can generate frequency instability, disable protective relays, or translate loss of frequency control into physical damage. They might, like Trisis, affect safety. As Crashoverride did, they might undermine the ability to protect equipment or personnel. Or, in the case of Aurora events, they could affect system reliability.

Cyber threat detection, defense, and mitigation.

Slowik concluded with a summary of appropriate defensive measures. "We need traditional IT-based defenses. We also need process-monitoring and analysis, and resilience and recovery investment." And we need to bring an OT, engineering perspective to those defenses.

There are some challenges defenders face. "How do you do forensic analysis on your safety environment? If you need to restore, do you have a last known-good condition, and a recent known-good condition?" Slowik argued that we need more focus on root-cause analysis. We need to be able to identify indications of an ICS breach, correlate IT intrusion data to anomalous process data, deploy knowledge to investigate process disruptions, and facilitate post-recovery analysis.
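
One of those tasks, correlating IT intrusion data with anomalous process data, can be sketched as a simple time-window join. This is a minimal illustration under stated assumptions: the event descriptions, the 30-minute window, and the `correlate` function are all hypothetical, and a real deployment would draw on its own log sources and tuning.

```python
from datetime import datetime, timedelta


def correlate(it_events, process_anomalies, window=timedelta(minutes=30)):
    """Pair each process anomaly with IT events that preceded it within `window`."""
    pairs = []
    for anomaly_time, anomaly in process_anomalies:
        for event_time, event in it_events:
            lag = anomaly_time - event_time
            if timedelta(0) <= lag <= window:
                pairs.append((event, anomaly, lag))
    return pairs


# Hypothetical sample data, invented for the example.
it_events = [
    (datetime(2019, 10, 22, 9, 0), "new admin logon on engineering workstation"),
    (datetime(2019, 10, 22, 14, 0), "phishing email quarantined"),
]
process_anomalies = [
    (datetime(2019, 10, 22, 9, 20), "setpoint change outside schedule"),
    (datetime(2019, 10, 22, 12, 0), "pressure sensor drift"),
]

pairs = correlate(it_events, process_anomalies)
for event, anomaly, lag in pairs:
    print(f"{event} -> {anomaly} (lag {lag})")
```

With this sample data only the unscheduled setpoint change is linked to an IT event, the logon 20 minutes earlier. The sensor drift has no IT event in its window and would be left for ordinary process troubleshooting.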

We can, Slowik warned, expect continued adversary interest in industrial control systems. We can expect increased acceptance of physical damage on the part of adversaries, and we can expect defenders to embrace logical and process monitoring in an ICS-focused defense.