From 2fd97644bd27654151899f95c5faa54933516d44 Mon Sep 17 00:00:00 2001 From: jaseg Date: Tue, 28 Jun 2022 17:46:36 +0200 Subject: paper: add crypto description --- paper/safety-reset-paper.tex | 239 ++++++++++++++++++++++++++----------------- 1 file changed, 145 insertions(+), 94 deletions(-) diff --git a/paper/safety-reset-paper.tex b/paper/safety-reset-paper.tex index e3c3a41..732387a 100644 --- a/paper/safety-reset-paper.tex +++ b/paper/safety-reset-paper.tex @@ -69,7 +69,7 @@ approaches. A core issue with post-attack mitigation is that network connections between the utility and devices on consumer premises may not work due to the attack. Thus, mitigation strategies that involve devices on the consumer premises will need an out-of-band communication channel. -In this paper, we propose a novel, resilient, grid-wide communication technique based on \empH{grid frequency +In this paper, we propose a novel, resilient, grid-wide communication technique based on \emph{grid frequency modulation} (GFM) that can be used to broadcast short messages to all devices connected to the electrical grid. The grid frequency modulation channel is robust and can be used even during an ongoing attack. Based on our channel we propose the \emph{safety reset} controller, an attack mitigation technique that is compatible with most smart meter and IoT @@ -112,15 +112,15 @@ traditional PLC, any large industrial load that allows for fast computer control \label{fig_intro_flowchart} \end{figure} -Figure~\ref{fig_intro_flowchart} shows an overview of our concept, where a large aluminium smelter has been temporarily -re-purposed as a GFM transmitter. Two scenarios for its application are before or during a cyberattack, to stop an -attack on the electrical grid in its tracks, and after an attack while power is being restored to prevent a repeated -attack. 
In both scenarios, our concept is independent of telecommunication networks (such as the internet or cellular -networks) as well as broadcast systems (such as cable television or terrestrial broadcast radio) while requiring only -inexpensive signal processing hardware and no external antennas (such as are needed for satellite communication). A grid -frequency-based system can function as long as power is still available, or as soon as power is restored after the -attack. One powerful function this allows is ``flushing out`` an attacker from compromised smart meters after an attack, -before restoring smart meter internet connectivity. +Figure~\ref{fig_intro_flowchart} shows an overview of our concept using a smart meter as the target device and a large +aluminium smelter temporarily re-purposed as a GFM transmitter. Two scenarios for its application are before or during +a cyberattack, to stop an attack on the electrical grid in its tracks, and after an attack while power is being restored +to prevent a repeated attack. In both scenarios, our concept is independent of telecommunication networks (such as the +internet or cellular networks) as well as broadcast systems (such as cable television or terrestrial broadcast radio) +while requiring only inexpensive signal processing hardware and no external antennas (such as are needed for satellite +communication). A grid frequency-based system can function as long as power is still available, or as soon as power is +restored after the attack. One powerful function this allows is ``flushing out'' an attacker from compromised smart +meters after an attack, before restoring smart meter internet connectivity. 
Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter, load bank or photovoltaic farm would allow for the transmission of a crytographically secured safety reset signal within @@ -131,16 +131,16 @@ of decoding such signals on a resource-constrained microcontroller. Consumer devices are increasingly becoming \emph{smart}. Large numbers of IoT devices are connected through the public internet, and in several countries internet-connected Smart Meters can disconnect entire households from the grid in -case of unpaid bills. The increasing proliferation of smart devices on the consumer side presents an opportunity to grid -operators, who rely on forecasts for the cost-optimized control of generation and power flow. The core of the -\emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption forecasts, and in -some cases can even adjust parameters of large devices like water heaters to smooth out load spikes. +case of unpaid bills~\cite{anderson01}. The increasing proliferation of smart devices on the consumer side presents an +opportunity to grid operators, who rely on forecasts for the cost-optimized control of generation and power flow. The +core of the \emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption +forecasts, and in some cases can even adjust parameters of large devices like water heaters to smooth out load spikes. However, this increased degree of visibility and control comes with an increased IT security risk. In this paper we focus on scenarios where an attacker compromises a large number of grid-connected remote-controllable devices. This may -be simple smart home devices such as IoT light bulbs, but it may also include Smart Meters that are outfitted with a -remote disconnect switch as is common in some countries. 
By rapidly switching large numbers of such devices in a -coordinated manner, the attacker has the opportunity to de-stabilize the electrical +be simple smart home devices such as IoT-connected air conditioners, but it may also include Smart Meters that are +outfitted with a remote disconnect switch as is common in some countries. By rapidly switching large numbers of such +devices in a coordinated manner, the attacker has the opportunity to de-stabilize the electrical grid~\cite{zlmz+21,kgma21,smp18,hcb19}. In this paper, we focus on assisting the recovery procedure after a succesful attack because we estimate that this @@ -169,22 +169,22 @@ This work contains the following contributions: \item We carry out extensive simulations of our systems to determine its performance characteristics. \end{enumerate} -\subsection{Notation} +%\subsection{Notation} % FIXME drop or rework this section ; actually update notation to be consistent throughout -To a computer scientist there is one confusing aspect to the theory of grid frequency modulation. GFM can be seen as a -frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is -modulated on top of a carrier signal at $f_c = \SI{50}{\hertz}$ in case of the European electrical grid. The frequency -deviation $f_\Delta$ that the modulated carrier deviates from its nominal value of $f_m$ is very small at only a few -milli-Hertz. - -When grid frequency is measured by first digitizing the mains voltage waveform, then de-modulating digitally, the FM's -signal-to-noise ratio (SNR) is very high and is dominated by the ADC's quantization noise and nearby mains voltage noise -sources such as resistive droop due to large inrush current of nearby machines. - -Note that both the carrier signal at $f_c$ and the modulation signal at $f_m$ both have unit Hertz. 
To disambiguate -them, in this paper we will use \textbf{bold} letters to refer to the carrier waveform $\mathbf{U}$ or frequency -$\mathbf{f_c}$ as well as its deviation $\mathbf{f_\Delta}$, and we will use normal weight for the actual modulation -signal and its properties such as $f_m$. +%To a computer scientist there is one confusing aspect to the theory of grid frequency modulation. GFM can be seen as a +%frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is +%modulated on top of a carrier signal at $f_c = \SI{50}{\hertz}$ in case of the European electrical grid. The frequency +%deviation $f_\Delta$ that the modulated carrier deviates from its nominal value of $f_m$ is very small at only a few +%milli-Hertz. +% +%When grid frequency is measured by first digitizing the mains voltage waveform, then de-modulating digitally, the FM's +%signal-to-noise ratio (SNR) is very high and is dominated by the ADC's quantization noise and nearby mains voltage noise +%sources such as resistive droop due to large inrush current of nearby machines. +% +%Note that both the carrier signal at $f_c$ and the modulation signal at $f_m$ both have unit Hertz. To disambiguate +%them, in this paper we will use \textbf{bold} letters to refer to the carrier waveform $\mathbf{U}$ or frequency +%$\mathbf{f_c}$ as well as its deviation $\mathbf{f_\Delta}$, and we will use normal weight for the actual modulation +%signal and its properties such as $f_m$. \section{Background on the electrical grid} \subsection{Components and interactions} @@ -216,7 +216,13 @@ resistance to their source of mechanical power, or \emph{prime mover}, which wou and faster. Similarly, if consumption outpaced production, the increased mechanical load would slow down generators, ultimately leading to a collapse. -The frequency of the electrical grid is maintained at a fixed, stable level through several layers of measures. 
+On top of the grid's inherent mechanical inertia, several tiers of control systems are layered to stabilize mains +frequency during day-to-day operations. Fast-acting automatic primary control stabilizes temporary frequency excursions, +while slower automatic secondary control and manual tertiary control re-adjust devices' operating points back to their +nominal values after they have shifted due to primary control action. + +In day-to-day operation, the frequency of the electrical grid is maintained at a +fixed, stable level through several layers of control systems. \subsection{Black-start recovery} @@ -264,19 +270,18 @@ meters, smart meters can provide near-realtime data that the utility can use for \subsection{Powerline Communication (PLC)} -A core issue in smart metering is the communication channel from the meter to the greater world. Smart meters are -cost-constrained devices, which limits the use of landline internet or cellular conenctions. Additionally, electricity -meters are often installed in basements, far away from the customer's router and with soil and concrete blocking radio -signals. For these reasons, in some AMI deployments, powerline communication (PLC) has been chosen for the meters' -uplink. +A core issue in smart metering and demand-side response is the communication channel from the meter to the greater +world. Smart meters are cost-constrained devices, which limits the use of landline internet or cellular connections. +Additionally, electricity meters are often installed in basements, far away from the customer's router and with soil and +concrete blocking radio signals. For these reasons, in some AMI deployments, powerline communication (PLC) has been +chosen for the meters' uplink. Since the early days of the electrical grid, powerline communication has been used to control devices spread throughout the grid from a central transmitter~\cite{rs48}. PLC systems super-impose a modulated high-frequency signal on top of the grid voltage. 
When the carrier frequency of this modulation is in the audible frequency range, low data rates can be transmitted over distances of several tens of kilometers. By using a radio frequency carrier, higher data rates can be achieved across shorter distances. Audio frequency PLC, called ``ripple control'', is still used today by utilities to -enable ``demand-side response'', i.e.\ the remote switching of loads such as water heaters to avoid times of peak -electricity demand. +enable demand-side response, by remotely switching on and off water heaters to avoid times of peak electricity demand. Usually, such powerline communication systems are uni-directional but they are instance of bi-directional powerline communication for smart meter reading such as the italian smart meter deployment~\cite{ec03,rs48,gungor01,agf16}. @@ -287,7 +292,7 @@ communication for smart meter reading such as the italian smart meter deployment \subsection{IoT and Smart Grid security} The security of IoT devices as well as the smart grid has received extensive attention in the -literature~\cite{nbck+19,acsc20,smp18,ykll17,anderson01,anderson02,zlmz21,kgma21,hcb19,mpdm10,lzlw+20,chl20,lam21,olkd20,yomu+20,}. +literature~\cite{nbck+19,acsc20,smp18,ykll17,anderson01,anderson02,zlmz+21,kgma21,hcb19,mpdm+10,lzlw+20,chl20,lam21,olkd20,yomu+20}. The challenges of IoT device security and the security of smart meters and other smart grid devices are similar because smart grid devices are essentially IoT devices in a particularly sensitive location~\cite{acsc20}. In both device types, the challenge is that securing embedded firmware is difficult, and adding network interfaces and cost constraints only @@ -333,8 +338,8 @@ cause outsized damage. Electro-mechanical oscillation modes between different geographical areas of an electrical grid are a well-known phenomenon. In their book~\cite{rogers01}, Rogers and Graham provide an in-depth analysis of these oscillations and -their mitigation. 
In~\cite{grebe01}, Grebe, Kabouris, López Barba et al.\ analyzed modeskj inherent to the -continental european grid. A report on an event where an oscillation on one such mode caused a problem can be found in +their mitigation. In~\cite{grebe01}, Grebe, Kabouris, López Barba et al.\ analyzed modes inherent to the +continental European grid. A report on an event where an oscillation on one such mode caused a problem can be found in \cite{entsoe01}. In~\cite{zlmz+21}, Zou, Liu, Ma et al.\ analyzed the possibility of a modal attack in which electric vehicle chargers @@ -401,17 +406,17 @@ receiver hardware complexity. To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel at very small scales in microgrids before~\cite{urtasun01} and has not yet been considered for large-scale application. -Compared to traditional channels such as DSL, LTE or LoraWAN, grid frequency as a communication channel has a resiliency -advantage: If there is power, a grid frequency modulation system is operational. Both DSL and LTE systems not only -require power at their base stations, but also require centralized infrastructure to operate. Mesh networks such as -LoraWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be available, but for -longer distances LoraWAN relies on the public internet for its network backbone. Additionally, systems such as DSL, LTE -and LoraWAN are built around a point-to-point communication model and usually do not support a generic broadcast -primitive. During times when a large number of devices must be reached simultaneously this can lead to congestion of -cellular towers and servers. Therefore, during an ongoing cyberattack, grid frequency is promising as a communication -channel because only a single transmitter facility must be operational for it to function, and this single transmitter -can reach all connected devices simultaneously. 
After a power outage, it can resume operation as soon as electrical -power is restored, even while the public internet and mobile networks are still offline. It is unaffected by +Compared to traditional channels such as Fiber To The Home (FTTH), 5G or LoraWAN, grid frequency as a communication +channel has a resiliency advantage: If there is power, a grid frequency modulation system is operational. Both FTTH and +5G systems not only require power at their base stations, but also require centralized infrastructure to operate. Mesh +networks such as LoraWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be +available, but for longer distances LoraWAN relies on the public internet for its network backbone. Additionally, +systems such as FTTH, 5G and LoraWAN are built around a point-to-point communication model and usually do not support a +generic broadcast primitive. During times when a large number of devices must be reached simultaneously this can lead to +congestion of cellular towers and servers. Therefore, during an ongoing cyberattack, grid frequency is promising as a +communication channel because only a single transmitter facility must be operational for it to function, and this single +transmitter can reach all connected devices simultaneously. After a power outage, it can resume operation as soon as +electrical power is restored, even while the public internet and mobile networks are still offline. It is unaffected by cyberattacks that target telecommunication networks. \subsection{Characterizing Grid Frequency} @@ -458,7 +463,7 @@ this $1/f$ behavior, the spectrum shows several sharp peaks at time intervals wi $\SI{10}{\second}$, $\SI{60}{\second}$ or multiples of $\SI{300}{\second}$. These peaks are due to loads turning on- or off depending on wall-clock time, and demand forecasting not being able to precisely match the amplitude of these large changes in load. 
Besides the narrow peaks caused by this effect we can also observe two wider bumps at -$\SI{7.0}{\second}$ and $\SI{4.7}{\second}$. These bumps closely correlate with continental european synchonous area's +$\SI{7.0}{\second}$ and $\SI{4.7}{\second}$. These bumps closely correlate with the continental European synchronous area's oscillation modes at $\SI{0.15}{\hertz}$ (east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}. \section{Grid Frequency Modulation} @@ -470,10 +475,10 @@ thyristor rectifier bank. Compared to this baseline solution, hardware and maint by repurposing a large industrial load as a transmitter. Going through a list of energy-intensive industries in Europe~\cite{ec01}, we found that an aluminium smelter would be a good candidate. In aluminium smelting, aluminium is electrolytically extracted from alumina solution. High-voltage mains power is -transformed, rectified and fed into about 100 series-connected electrolytic cells forming a \emph{potline}. Inside these -pots, alumina is dissolved in molten cryolite electrolyte at about \SI{1000}{\degreeCelsius} and electrolysis is -performed using a current of tens or hundreds of Kiloampère. The resulting pure aluminium settles at the bottom of the -cell and is tapped off for further processing. +transformed, rectified and fed into approximately 100 series-connected electrolytic cells forming a \emph{potline}. +Inside these pots, alumina is dissolved in molten cryolite electrolyte at approximately \SI{1000}{\degreeCelsius} and +electrolysis is performed using a current of tens or hundreds of kiloamperes. The resulting pure aluminium settles at the +bottom of the cell and is tapped off for further processing. Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power outages has been carefully characterized. 
Power outages of tens of minutes up to two hours reportedly do not cause @@ -502,10 +507,10 @@ parts of the plant, as this is commonplace during routine maintenance activities Given the grid characteristics we measured using our custom waveform recorder and using a model of our transmitter, we can derive parameters for the modulation of our broadcast system. The overall network power-frequency characteristic of -the continental European synchronous area is about $\SI{25}{\giga\watt\per\hertz}$~\cite{entsoe02}. Thus, the main -challenge for a GFM system will be poor signal-to-noise ratio (SNR) due to low transmission power. A second layer of -modulation yielding some modulation gain beyond the basic amplitude modulation of the transmitter will be necessary to -achieve sufficient overall SNR. +the continental European synchronous area is approximately $\SI{25}{\giga\watt\per\hertz}$~\cite{entsoe02}. Thus, the +main challenge for a GFM system will be poor signal-to-noise ratio (SNR) due to low transmission power. A second layer +of modulation yielding some modulation gain beyond the basic amplitude modulation of the transmitter will be necessary +to achieve sufficient overall SNR. The grid's frequency noise has significant localized peaks that might interfere with this modulation. Further complicating things are the oscillation modes. A GFM system must be designed to avoid exciting these modes. However, @@ -595,15 +600,56 @@ correction~\cite{mackay01} and some cryptography. The goal of our PoC cryptograp sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional constraint of our setting is that due to the extremely slow communication channel all messages should be kept as short as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and -Winternitz One-time Signature (OTS) schemes. Informally, the private key is a random bitstring. 
The public key is -generated by recursively applying a hash function to this key a number of times. Each smart meter reset command is then -authorized by disclosing subsequent elements of this series. Unwinding the hash chain from the public key at the end of -the chain towards the private key at its beginning, at each step a receiver can validate the current command by checking -that it corresponds to the previously unknown input of the current step of the hash chain. Replay attacks are prevented -by recording the most recent valid command. Keys revocation is supported by designating the last key in the chain as a -\emph{revocation key} upon whose reception the client devices advance their local hash ratchet without taking further -action. This simple scheme does not afford much functionality but it results in very short messages and removes the -need for computationally expensive public key cryptography inside the smart meter. +Winternitz One-time Signature (OTS) schemes~\cite{lamport02,merkle01}. Informally, the private key is a random +bitstring. The public key is generated by recursively applying a hash function to this key a number of times. Each smart +meter reset command is then authorized by disclosing subsequent elements of this series. Unwinding the hash chain from +the public key at the end of the chain towards the private key at its beginning, at each step a receiver can validate +the current command by checking that it corresponds to the previously unknown input of the current step of the hash +chain. Replay attacks are prevented by the device memorizing the most recent valid command. Key revocation is supported +by designating the last key in the chain as a \emph{revocation key} upon whose reception the client devices advance +their local hash ratchet without taking further action. 
This simple scheme does not afford much functionality but it +results in very short messages and removes the need for computationally expensive public key cryptography inside the +smart meter. + +Formally, we can describe our simple cryptographic protocol as follows. Given an $m$-bit cryptographic hash function $H +: \{0,1\}^*\rightarrow\{0,1\}^m$ and a private key $k_0 \in \{0,1\}^m$, we construct the public key as +$k_{n_\text{total}} = H^{n_\text{total}}(k_0)$ where $H^n(x)$ denotes the $n$-fold application of $H$ to +$x$, i.e.\ $\underbrace{H(H(\hdots H(}_{n\;\text{times}}x)))$. $n_\text{total}$ is the total number of signatures that the system can +issue. It must be chosen with adequate safety margin to account for unpredictable future use of the +system. The choice of $n_\text{total}$ is of no consequence when a device checks reset authorization, but key generation +time grows linearly with $n_\text{total}$ since $H$ needs to be applied $n_\text{total}$ times. In practice, given the +speed of modern computers, values of $n_\text{total} > 10^9$ should pose no problem during key generation. For public +key $k_{n_\text{total}}$, the system can authorize up to $n_\text{total}$ commands by successively disclosing the $k_i$ +starting at $i=n_\text{total}-1$ and counting down until finally disclosing $k_0$. Since we only want to transmit a single bit of +information, we do not need any payload. Instead, we simply send a message $m = (k_i)$ consisting solely of $k_i$. The +receiver of a message $m$ can check that the message is a legitimate command by checking $\exists i