From 09f918187d53ed94b8fe6c34188fd0334e986429 Mon Sep 17 00:00:00 2001 From: jaseg Date: Wed, 5 Oct 2022 18:33:58 +0200 Subject: Final review pass WIP --- paper/safety-reset-paper.tex | 82 +++++++++++++++++++++----------------------- 1 file changed, 39 insertions(+), 43 deletions(-) diff --git a/paper/safety-reset-paper.tex b/paper/safety-reset-paper.tex index 6ad01d9..b1904e8 100644 --- a/paper/safety-reset-paper.tex +++ b/paper/safety-reset-paper.tex @@ -124,7 +124,7 @@ Conference}{December 5--9}{Austin, TX, USA} In this paper propose a broadcast channel based on the modulation of grid frequency through which utility operators can issue commands to devices at the consumer premises both during an attack for mitigation and in its wake to aid recovery. Our proposed grid frequency modulation (GFM) channel is independent of other telecommunication networks. - It is resilient towards localized blackouts and it is operational immediately as soon as power is restored. + It is resilient towards localized blackouts and it is operational immediately after power is restored. Based on our GFM broadcast channel we propose a ``safety reset'' system to mitigate an ongoing attack by disabling a device's network interfaces and resetting its control functions. It can also be used in the wake of an attack to aid @@ -160,11 +160,11 @@ In this paper, we propose a novel, resilient, grid-wide communication technique modulation} (GFM) that can be used to broadcast short messages to all devices connected to the electrical grid. The grid frequency modulation channel is robust and can be used even during an ongoing attack. Based on our channel we propose the \emph{safety reset} controller, an attack mitigation technique that is compatible with most smart meter and IoT -device designs. A safety reset controller is a separate controller integrated to the device that awaits an out-of-band +device designs.
A safety reset controller is a separate controller integrated with the device that awaits an out-of-band reset command transmitted through GFM. Upon reception of the reset command, it puts the device into a safe state (e.g. -\emph{heater off} or \emph{light on}) that interrupts attacker control over the device. The safety reset controller is -separated from the system's main application controller and itself does not have any conventional network connections to -reduce attack surface and cost. +\emph{heater off} or \emph{light on}) that interrupts attacker control over the device. To reduce attack surface and +cost, the safety reset controller is separated from the system's main application controller and does not have any +conventional network interfaces. The grid frequency modulation channel can be operated by transmission system operators (TSOs) even during black-start recovery procedures and it bridges the gap between the TSO's private control network and consumer devices that can not @@ -177,9 +177,9 @@ The frequency behavior of the electrical grid can be analyzed by examining the g oscillators coupled through the grid via the electromotive force~\cite{rogers01,wcje+12}. The generators and motors that are electromagnetically coupled through the grid's transmission lines and transformers run synchronously with each other, with only minor localized variations in their rotation angle. The dynamic behavior of grid frequency is a direct -product of this electromechanical coupling: With increasing load, frequency drops because shafts move slower under +product of this electromechanical coupling: With increasing load, frequency drops because turbines move slower under higher torque, and consequentially with decreasing load frequency rises. 
Industrial control systems keep frequency close -to its nominal value over time spans of minutes or hours, but at shorter time frames the combined inertia of all +to its nominal value over time spans of minutes or hours, but over shorter time spans the combined inertia of all grid-connected generators and motors is what regulates frequency. Grid frequency modulation works by quickly modulating the power of a large, grid-connected load or generator. When this @@ -239,12 +239,13 @@ grid~\cite{zlmz+21,kgma21,smp18,hcb19}. In this paper, we focus on assisting the recovery procedure after a successful attack because we estimate that this approach will yield a better return of investment in overall grid stability versus resources spent on security -measures. Previous work on IoT and Smart Grid security has focused on the prevention of attacks though firmware security -measures. While research on prevention is important, we estimate that its practical impact will be limited by the -diversity of implementations found in the field~\cite{nbck+19,zlmz+21,smp18}. We predict that it would be a Sisyphean -task to secure the firmware of sufficiently many devices to deny an attacker the critical mass needed to cause trouble. -Even if all flaws in the firmware of a broad range of devices would be fixed, users still have to update. In smart grid -and IoT devices, this presents a difficult problem since user awareness is low~\cite{nbck+19}. +measures compared to bug hunting in device firmware. Previous work on IoT and Smart Grid security has focused on the +prevention of attacks through firmware security measures. While research on prevention is important, we estimate that its +practical impact will be limited by the diversity of implementations found in the field~\cite{nbck+19,zlmz+21,smp18}. We +predict that it would be a Sisyphean task to secure the firmware of a number of devices sufficient to deny an +attacker the critical mass needed to cause trouble.
Even if all flaws in the firmware of a broad range of devices would +be fixed, users still have to update. In smart grid and IoT devices, this presents a difficult problem since user +awareness is low~\cite{nbck+19}. \subsection{Attacker model} @@ -254,10 +255,9 @@ According to the above criteria, our attacker model has the following key featur \item The attacker cannot compromise the utility operators' SCADA systems. \item The attacker can compromise and subsequently control a large number of target devices at the customer's premises such as smart meters or large IoT devices such as air conditioners or central heating systems. - \item Target devices can be designed to include a separate firmware and factory reset function that the attacker - cannot circumvent. In the simplest case, this could be a separate microcontroller that is connected to the - device's application processor's programming port. - \item The attacker aims for maximum disruption as opposed to e.g. data extraction. + \item Devices that may become targets of attacks can be designed to include a separate firmware and factory reset + function that the attacker cannot circumvent. In the simplest case, this could be a separate microcontroller + that is connected to an in-system programming interface of the device's application processor. \end{itemize} \subsection{Contents} @@ -297,7 +297,7 @@ This work contains the following contributions: \section{Background on the electrical grid} \subsection{Components and interactions} -The electrical grid transmits alternating current electrical power from generators to loads. Any device that is +The electrical grid transmits electrical power from generators to loads through alternating current. Any device that is connected to the grid must run \emph{synchronous} with the grid, i.e.\ it must produce or consume power following the grid's voltage waveform. In generators and motors, the electromotive force acts to synchronize the device with the grid. 
Connecting a generator that has not been synchronized to the grid leads to large currents flowing through the @@ -324,24 +324,20 @@ resistance to their source of mechanical power, or \emph{prime mover}, which wou and faster. Similarly, if consumption outpaced production, the increased mechanical load would slow down generators, ultimately leading to a collapse. -On top of the grid's inherent mechanical inertia, several tiers of control systems are layered to stabilize mains -frequency during day-to-day operations. Fast-acting automatic primary control stabilizes temporary frequency excursions, -while slower automatic secondary control and manual tertiary control re-adjust device's operating points back to their -nominal values after they have shifted due to primary control action. - In day-to-day operation, the frequency of the electrical grid is maintained at a fixed, stable level through several -layers of control systems. +layers of control systems on top of the grid's inherent mechanical inertia. Fast-acting automatic primary control +stabilizes temporary frequency excursions, while slower automatic secondary control and manual tertiary control +re-adjust devices' operating points back to their nominal values after they have shifted due to primary control action. \subsection{Black-start recovery} -The recovery from a large-scale power outage is a complex operational challenge. Large outages are caused by cascading -failures. Since all consumers and producers that are connected to the electrical grid are physically coupled through the -electromotive force, a fault in one part of the grid affects all devices connected across the grid. To function, the -grid relies on a delicate balance between electricity generation, transmission and consumption. When this balance is -disturbed, cascading failures can occur. A transmission line shutting off can lead other, nearby lines to overload and -shut off.
Due to the electromechanical coupling of all machines connected to the grid, a generator or consumer suddenly -shutting off causes a transient in the grid's frequency. If the frequency goes too far out of bounds, protection devices -take power plants and large industrial loads offline. +To function, the grid relies on a delicate balance between electricity generation, transmission and consumption. When +this balance is disturbed, cascading failures can occur, and because it must be maintained at all times, +the recovery from a large-scale power outage is a complex operational challenge. Since all consumers and producers that +are connected to the electrical grid are physically coupled through the electromotive force, a fault in one part of the +grid affects all devices connected across the grid. A transmission line shutting off can lead other, nearby lines to +overload and shut off, and a generator or consumer suddenly shutting off causes a transient in the grid's frequency. If +the frequency goes too far out of bounds, protection devices take power plants and large industrial loads offline. The recovery from a large-scale outage requires the grid's operators to bring generators and loads back online one by one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting @@ -353,8 +349,8 @@ dedicated radio links. To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by themselves without any external power are brought online. With their help, other power stations and consumers are gradually brought online until a part of the grid has been restored to nominal operation. This process can be performed -simultaneously in different parts of the grid. After these \emph{islands} have been restored, they can then be joined to -restore the grid to its normal state. +simultaneously in different parts of the grid.
After these \emph{islands} have been restored, they can then be +synchronized and re-joined to restore the grid to its normal state. \subsection{Demand-side response and Smart Metering} @@ -385,15 +381,15 @@ concrete blocking radio signals. For these reasons, in some AMI deployments, pow chosen for the meters' uplink. Since the early days of the electrical grid, powerline communication has been used to control devices spread throughout -the grid from a central transmitter~\cite{rs48}. PLC systems super-impose a modulated high-frequency signal on top of +the grid from a central transmitter~\cite{rs48}. PLC systems super-impose a modulated higher-frequency signal on top of the grid voltage. When the carrier frequency of this modulation is in the audible frequency range, low data rates can be transmitted over distances of several tens of kilometers. By using a radio frequency carrier, higher data rates can be achieved across shorter distances\cite{pvyh03}. Audio frequency PLC, called ``ripple control'', is still -used today by utilities to enable demand-side response, by remotely switching on and off water heaters to avoid times of +used today by utilities for demand-side response, remote-controlling special water heaters to avoid times of peak electricity demand. -Usually, such powerline communication systems are uni-directional but they are instance of bi-directional powerline -communication for smart meter reading such as the italian smart meter deployment~\cite{ec03,rs48,gungor01,agf16}. +Powerline communication systems are usually uni-directional, but there are instances of bi-directional powerline +communication for smart meter reading~\cite{ec03,rs48,gungor01,agf16}. \section{Related work} \label{sec_related_work} @@ -438,17 +434,17 @@ security indicates that even if significant effort is spent on the security of I with desktop, server and smartphone security, significant vulnerabilities are likely to remain for some time to come. 
In this instance, market forces do not align with the interest of the public at large. Vulnerabilities remain likely, especially in code implementing complex network protocols such as TLS~\cite{georgiev01}, which may even be mandated by -national standards in some devices such as smart electricity meters. +national standards in some devices such as smart meters. %\subsection{Reliably resetting an IoT or Smart Grid device} \subsection{Oscillations in the electrical grid} Common to the attacks on the electrical grid proposed in the papers discussed above is their approach of overloading -parts of the grid. However, scenarios have been proposed that go beyond a simple overload condition, and in which an -attacker exploits the physical characteristics of the grid to cause oscillations of increasing amplitude, ultimately -triggering a cascade of protection mechanisms. The purpose of this type of attack is to use a small controllable load to -cause outsized damage. +parts of the grid. However, scenarios have been proposed that go beyond a simple overload condition, in which an +attacker instead carefully exploits the physical characteristics of the grid to cause oscillations of increasing +amplitude, ultimately triggering a cascade of protection mechanisms. The purpose of this type of attack is to use a +small controllable load to cause outsized damage. Electro-mechanical oscillation modes between different geographical areas of an electrical grid are a well-known phenomenon. In their book~\cite{rogers01}, Rogers and Graham provide an in-depth analysis of these oscillations and