-rw-r--r-- | paper/safety-reset-paper.tex | 94 |
1 files changed, 47 insertions, 47 deletions
diff --git a/paper/safety-reset-paper.tex b/paper/safety-reset-paper.tex index 0faff8c..6ad01d9 100644 --- a/paper/safety-reset-paper.tex +++ b/paper/safety-reset-paper.tex @@ -127,7 +127,7 @@ Conference}{December 5--9}{Austin, TX, USA} It is resilient towards localized blackouts and it is operational immediately as soon as power is restored. Based on our GFM broadcast channel we propose a ``safety reset'' system to mitigate an ongoing attack by disabling a - device's network interfaces and restting its control functions. It can also be used in the wake of an attack to aid + device's network interfaces and resetting its control functions. It can also be used in the wake of an attack to aid recovery by shutting down non-essential loads to reduce strain on the grid. To validate our proposed design, we conducted simulations based on measured grid frequency behavior. Based on these @@ -208,13 +208,13 @@ modifications. Figure~\ref{fig_intro_flowchart} shows an overview of our concept using a smart meter as the target device and a large aluminium smelter temporarily re-purposed as a GFM transmitter. Two scenarios for its application are before or during -a cyberattack, to stop an attack on the electrical grid in its tracks, and after an attack while power is being restored -to prevent a repeated attack. In both scenarios, our concept is independent of telecommunication networks (such as the -internet or cellular networks) as well as broadcast systems (such as cable television or terrestrial broadcast radio) -while requiring only inexpensive signal processing hardware and no external antennas (such as are needed for satellite -communication). A grid frequency-based system can function as long as power is still available, or as soon as power is -restored after the attack. One powerful function this allows is ``flushing out`` an attacker from compromised smart -meters after an attack, before restoring smart meter internet connectivity. 
+a cyber attack, to stop an attack on the electrical grid in its tracks, and after an attack while power is being +restored to prevent a repeated attack. In both scenarios, our concept is independent of telecommunication networks (such +as the internet or cellular networks) as well as broadcast systems (such as cable television or terrestrial broadcast +radio) while requiring only inexpensive signal processing hardware and no external antennas (such as are needed for +satellite communication). A grid frequency-based system can function as long as power is still available, or as soon as +power is restored after the attack. One powerful function this allows is ``flushing out`` an attacker from compromised +smart meters after an attack, before restoring smart meter internet connectivity. Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter, load bank or photovoltaic farm would allow for the transmission of a cryptographically secured safety reset signal @@ -237,8 +237,8 @@ outfitted with a remote disconnect switch as is common in some countries. By rap devices in a coordinated manner, the attacker has the opportunity to de-stabilize the electrical grid~\cite{zlmz+21,kgma21,smp18,hcb19}. -In this paper, we focus on assisting the recovery procedure after a succesful attack because we estimate that this -approach will yield a better return of investement in overall grid stability versus resources spent on security +In this paper, we focus on assisting the recovery procedure after a successful attack because we estimate that this +approach will yield a better return of investment in overall grid stability versus resources spent on security measures. Previous work on IoT and Smart Grid security has focused on the prevention of attacks though firmware security measures. 
While research on prevention is important, we estimate that its practical impact will be limited by the diversity of implementations found in the field~\cite{nbck+19,zlmz+21,smp18}. We predict that it would be a Sisyphean @@ -347,7 +347,7 @@ The recovery from a large-scale outage requires the grid's operators to bring ge one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting them down again. To coordinate this process, transmission system operators cannot rely on the public internet or cellular networks, as they may not work during a large-scale power outage. Instead, they maintain private communication -infrastructure using dedicated lines rented from telecommunciations providers, fibers run along transmission lines, and +infrastructure using dedicated lines rented from telecommunication providers, fibers run along transmission lines, and dedicated radio links. To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by @@ -379,7 +379,7 @@ meters, smart meters can provide near-realtime data that the utility can use for \subsection{Powerline Communication (PLC)} A core issue in smart metering and demand-side response is the communication channel from the meter to the greater -world. Smart meters are cost-constrained devices, which limits the use of landline internet or cellular conenctions. +world. Smart meters are cost-constrained devices, which limits the use of landline internet or cellular connections. Additionally, electricity meters are often installed in basements, far away from the customer's router and with soil and concrete blocking radio signals. For these reasons, in some AMI deployments, powerline communication (PLC) has been chosen for the meters' uplink. @@ -446,7 +446,7 @@ national standards in some devices such as smart electricity meters. 
Common to the attacks on the electrical grid proposed in the papers discussed above is their approach of overloading parts of the grid. However, scenarios have been proposed that go beyond a simple overload condition, and in which an -attacker exploits the physcial characteristics of the grid to cause oscillations of increasing amplitude, ultimately +attacker exploits the physical characteristics of the grid to cause oscillations of increasing amplitude, ultimately triggering a cascade of protection mechanisms. The purpose of this type of attack is to use a small controllable load to cause outsized damage. @@ -496,7 +496,7 @@ electrical grid. \section{Grid Frequency as a Communication Channel} -During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker +During a large-scale cyber attack, availability of internet and cellular connectivity cannot be relied upon. An attacker may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical grid. Powerline communication systems will likely be unaffected by an attack, but at a range of no more than several tens of kilometers, covering the entire grid would require a large upfront infrastructure investment for transmitters. @@ -509,7 +509,7 @@ equations describing their control systems' interaction with the machine's physi by aggregating these approximations into a large system of differential equations. As a consequence, small signal changes in generation/consumption power balance cause an approximately proportional change in frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. 
The slope of this first-order approximation is known as -\emph{Power Frequency Charactersistic}, and in case of the continental European synchronous area happens to be about +\emph{Power Frequency Characteristic}, and in case of the continental European synchronous area happens to be about \SI{25}{\giga\watt\per\hertz} according to the European electricity grid authority, ENTSO-E. If we modulate the power consumption of a large load, this modulation will result in a small change in frequency @@ -525,17 +525,17 @@ at very small scales in microgrids before~\cite{urtasun01} and has not yet been Compared to traditional channels such as Fiber To The Home (FTTH), 5G or LoraWAN, grid frequency as a communication channel has a resiliency advantage. It can start transmission as soon as a power island with a connected transmitter is -powered up, while communciation networks such as FTTH or 5G are still rebooting, or might be waiting for parts of their +powered up, while communication networks such as FTTH or 5G are still rebooting, or might be waiting for parts of their centralized infrastructure that are connected to different power islands to come back online. Mesh networks such as LoraWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be available, but for longer distances LoraWAN relies on the public internet for its network backbone. Additionally, systems such as FTTH, 5G and LoraWAN are built around a point-to-point communication model and usually do not support a global broadcast primitive. During times when a large number of devices must be reached simultaneously this can lead to congestion of -cellular towers and servers. Therefore, during an ongoing cyberattack, grid frequency is promising as a communication +cellular towers and servers. 
Therefore, during an ongoing cyber attack, grid frequency is promising as a communication channel because only a single transmitter facility must be operational for it to function, and this single transmitter can reach all connected devices simultaneously. After a power outage, it can resume operation as soon as electrical power is restored, even while the public internet and mobile networks are still offline. It is unaffected by -cyberattacks that target telecommunication networks. +cyber attacks that target telecommunication networks. \subsection{Characterizing Grid Frequency} \label{grid-freq-characterization} @@ -622,7 +622,7 @@ the $\SI{30}{\second}$ requirement posed by local standards for primary control. that for their system, an effective thermal energy storage capacity of $\SI{7.7}{\giga\watt\hour}$ is possible if all plants of a single operator are used. Given the maximum modulation depth of $\SI{100}{\percent}$ for up to one hour that is mentioned by the authors, this results in an effective modulation power of $\SI{7.7}{\giga\watt}$. Over a longer -timespan of $\SI{48}{\hour}$, they have demonstrated a $\SI{33}{\percent}$ modulation depth which would correspond to a +time span of $\SI{48}{\hour}$, they have demonstrated a $\SI{33}{\percent}$ modulation depth which would correspond to a modulation power of $\SI{2.5}{\giga\watt}$. We conclude that a modulation of part of an aluminium smelter's power consumption is possible at no significant production impact and at low infrastructure cost. Aluminium smelters are already connected to the grid in a way that they do not pose a danger to other nearby consumers when they turn off or on @@ -648,7 +648,7 @@ utility's dedicated SCADA network. In an emergency, the command center can then through their gps-backed frequency standards, two transmitters will then constructively interfere as soon as they are connected to the same power island. 
-\subsection{Parametrizing Modulation for GFM} +\subsection{Parameterizing Modulation for GFM} Given the grid characteristics we measured using our custom waveform recorder and using a model of our transmitter, we can derive parameters for the modulation of our broadcast system. The overall network power-frequency characteristic of @@ -718,7 +718,7 @@ durations move our signals' bandwidth into the lower-noise region from $\SI{0.2} \caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.} \Description{A plot of symbol error rate versus amplitude in millihertz. The plot shows four lines, one each for 5 bit, 6 bit, 7 bit and 8 bit. All four lines form smooth step functions, plateauing at a symbol error rate of 1.0 for - low amplitudes and falliing to a symbol error rate of 0.0 for high amplitudes. The low-amplitude plateau is widest + low amplitudes and falling to a symbol error rate of 0.0 for high amplitudes. The low-amplitude plateau is widest for 5 bit and narrowest for 8 bit. The falloff is steepest for 8 bit, and slowest for 5 bit. For 8 bit, a symbol error rate of 0.5 is crossed at about 0.4 millihertz. For 7 bit at about 0.6 millihertz, for 6 bit at 0.8 millihertz and for 5 bit at 1.3 millihertz. For 7 and 8 bit, symbol error rate settles at zero above 1.0 millihertz. 
For 5 bit @@ -770,7 +770,7 @@ durations move our signals' bandwidth into the lower-noise region from $\SI{0.2} \label{fig_ser_chip} \end{figure} -\subsection{Parametrizing a proof-of-concept ``Safety Reset'' System Based on GFM} +\subsection{Parameterizing a proof-of-concept ``Safety Reset'' System Based on GFM} %FIXME introduce scenario Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept smart meter emergency @@ -780,13 +780,13 @@ sender of an emergency reset broadcast to authorize a reset command to all liste constraint of our setting is that due to the extremely slow communication channel all messages should be kept as short as possible. The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and Winternitz One-time Signature (OTS) schemes~\cite{lamport02,merkle01}. Informally, the private key is a random -bitstring. The public key is generated by recursively applying a hash function to this key a number of times. Each smart -meter reset command is then authorized by disclosing subsequent elements of this series. Unwinding the hash chain from -the public key at the end of the chain towards the private key at its beginning, at each step a receiver can validate -the current command by checking that it corresponds to the previously unknown input of the current step of the hash -chain. Replay attacks are prevented by the device memorizing the most recent valid command. This simple scheme does not -afford much functionality but it results in very short messages and removes the need for computationally expensive -public key cryptography inside the smart meter. +bit string. The public key is generated by recursively applying a hash function to this key a number of times. Each +smart meter reset command is then authorized by disclosing subsequent elements of this series. 
Unwinding the hash chain +from the public key at the end of the chain towards the private key at its beginning, at each step a receiver can +validate the current command by checking that it corresponds to the previously unknown input of the current step of the +hash chain. Replay attacks are prevented by the device memorizing the most recent valid command. This simple scheme +does not afford much functionality but it results in very short messages and removes the need for computationally +expensive public key cryptography inside the smart meter. Formally, we can describe our simple cryptographic protocol as follows. Given an $m$-bit cryptographic hash function $H : \{0,1\}^*\rightarrow\{0,1\}^m$ and a private key $k_0 \in \{0,1\}^m$, we construct the public key as @@ -814,12 +814,12 @@ becomes idempotent, and the utility can repeat the command until sufficiently ma e.g.\ performed a safety reset. In our protocol, we define two commands, \emph{reset} and \emph{disarm}. We assign \emph{reset} and \emph{disarm} to the -$k_i$ alternatingly. For odd $i$, $k_i$ is a reset command and for even $i$, $k_i$ is a \emph{disarm} command. To -trigger a safety reset, the utility transmits the next unused $k_{2i+1}$. The utility may transmit this command repeatedly -to also reset devices that have come online only after earlier transmissions have started. After a sufficient number of -devices have performed a safety reset, the utility then transmits the next disarm command, $k_{2i}$. When devices -receive the disarm command, they still update the last received command, but they do not perform any other action. The -initial private key, $k_0$, is a \emph{disarm} key. +$k_i$ in an alternating way. For odd $i$, $k_i$ is a reset command and for even $i$, $k_i$ is a \emph{disarm} command. +To trigger a safety reset, the utility transmits the next unused $k_{2i+1}$. 
The utility may transmit this command +repeatedly to also reset devices that have come online only after earlier transmissions have started. After a sufficient +number of devices have performed a safety reset, the utility then transmits the next disarm command, $k_{2i}$. When +devices receive the disarm command, they still update the last received command, but they do not perform any other +action. The initial private key, $k_0$, is a \emph{disarm} key. The reason for interleaving two commands in this way is to prevent a specific attack scenario in which an attacker first observes a safety reset command being transmitted, and then at a later time gains access to a large load that could act @@ -876,12 +876,12 @@ the meter's display after boot-up. \includegraphics[width=0.45\textwidth]{prototype_schema} \caption{The signal processing chain of our demonstrator.} \Description{A diagram showing the signal processing flow. The diagram shows a number of steps going from grid - voltage waveform to trigger decition. The diagram begins with the DMA-assisted ADC capture. At this point, the + voltage waveform to trigger decision. The diagram begins with the DMA-assisted ADC capture. At this point, the signal is a clean analog sine wave. The next step is grid frequency estimation, after which the signal is a noise-like ragged line. After grid frequency estimation follows DSSS demodulation, which itself is made up of three steps. The first step of DSSS demodulation is convolution, which produces a small noise signal with a large peak somewhere in the middle. The peak is roughly ten times the amplitude of the noise and has two prominent negative - sidelobes to the left and right. The following step, CWT peak contrast enhancement, clenas up this signal and + side lobes to the left and right. The following step, CWT peak contrast enhancement, cleans up this signal and removes the side-lobes leaving only the positive peak sticking out of the background noise. 
The final step of DSSS demodulation is maximum likelihood estimation, which produces a vector of n plus k discrete elements. After DSSS demodulation, this vector is passed through Reed-Solomon error correction, which transforms it into a vector of now @@ -897,7 +897,7 @@ clock. Since we did not have an aluminium smelter ready, we decided to feed our an emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller would take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC from a $\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its -grid-referenced capacitive dropper power supply and connected it to our reset controlller's debug USB power supply. +grid-referenced capacitive dropper power supply and connected it to our reset controller's debug USB power supply. We performed several successful experiments using a signature truncated at 120 bit and a 5 bit DSSS sequence. Taking the sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error @@ -935,12 +935,12 @@ implementation has no issues processing data in real-time due to the low samplin In this paper we have developed an end-to-end design for a safety reset system that provides these capabilities. Our novel broadcast data transmission system is based on intentional modulation of global grid frequency. Our system is -independent of normal communication networks and can operate during a cyberattack. We have shown the practical viability -of our end-to-end design through simulations. Using our purpose-designed grid frequency recorder, we can capture and -process real-time grid frequency data in an electrically safe way. We used data captured this way as the basis for -simulations of our proposed grid frequency modulation communication channel. 
In these simulations, our system has proven -feasible. From our simulations we conclude that a large consumer such as an aluminium smelter at a small cost can be -modified to act as an on-demand grid frequency modulation transmitter. +independent of normal communication networks and can operate during a cyber attack. We have shown the practical +viability of our end-to-end design through simulations. Using our purpose-designed grid frequency recorder, we can +capture and process real-time grid frequency data in an electrically safe way. We used data captured this way as the +basis for simulations of our proposed grid frequency modulation communication channel. In these simulations, our system +has proven feasible. From our simulations we conclude that a large consumer such as an aluminium smelter at a small cost +can be modified to act as an on-demand grid frequency modulation transmitter. We have demonstrated our modulation system in a small-scale practical demonstration. For this demonstration, we have developed a simple cryptographic protocol ready for embedded implementation in resource-constrained systems that allows @@ -952,7 +952,7 @@ an utility and an operator of a multi-megawatt load. \subsection{Discussion} During an emergency in the electrical grid, the ability to communicate to large numbers of end-point devices is a -valuable tool for restoring normal operation. When a resilient communcation channel is available, loads such as smart +valuable tool for restoring normal operation. When a resilient communication channel is available, loads such as smart meters and IoT devices can be equipped with a supervisor circuit that allows for a remote ``safety reset'' that puts the device into a safe operating state. Using this safety reset, an attacker that uses compromised smart meters or IoT devices to attack grid stability can be interrupted before the can conclude their attack. 
During recovery from an @@ -966,8 +966,8 @@ use complex Systems on Chip (SoCs), a safety reset controller could be integrate itself at little added complexity. In summary, we expect safety reset controllers to be commercially viable. Safety reset controllers can be adapted to most IoT device and smart meter designs. Because they are independent from -other pubilc utilities such as the internet or cellular networks, we believe in their potential as a last line of -defense providing resilience under large-scale cyberattacks. The next steps towards a practical implementation will be +other public utilities such as the internet or cellular networks, we believe in their potential as a last line of +defense providing resilience under large-scale cyber attacks. The next steps towards a practical implementation will be a practical demonstration of broadcast data transmission through grid frequency modulation using a megawatt-scale controllable load as well as further optimization of the modulation and data encoding as well as the demodulator implementation.
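
Reviewer note on the hash-chain paragraphs touched by this diff: the scheme they describe (a private key $k_0$, a public key obtained by hashing it repeatedly, and alternating \emph{reset}/\emph{disarm} commands revealed by unwinding the chain) can be illustrated with a minimal sketch. It rests on several assumptions that the diff context does not confirm: SHA-256 as the hash function, truncation to 120 bit as in the reported experiment, the helper names `H`, `make_chain` and `Receiver`, and a hypothetical `max_skip` bound on how many chain steps a receiver may have missed. It is not the paper's implementation.

```python
import hashlib

M_BYTES = 15  # 120-bit truncation, as in the experiment mentioned in the paper


def H(x: bytes) -> bytes:
    """Truncated hash. SHA-256 is an assumption; the diff does not name H."""
    return hashlib.sha256(x).digest()[:M_BYTES]


def make_chain(k0: bytes, n: int) -> list:
    """Return [k_0, ..., k_n] with k_{i+1} = H(k_i); k_n acts as the public key."""
    chain = [k0]
    for _ in range(n):
        chain.append(H(chain[-1]))
    return chain


class Receiver:
    """Hypothetical receiver that remembers the most recent valid chain element."""

    def __init__(self, public_key: bytes, n: int, max_skip: int = 8):
        self.last = public_key    # most recent valid element, initially k_n
        self.index = n            # chain index of self.last
        self.max_skip = max_skip  # assumed limit on missed commands

    def receive(self, candidate: bytes):
        """Return 'reset', 'disarm', or None for an invalid or replayed message."""
        x = candidate
        for step in range(1, min(self.max_skip, self.index) + 1):
            x = H(x)
            if x == self.last:
                # candidate is a preimage 'step' links further down the chain
                self.last = candidate
                self.index -= step
                # odd index: reset command, even index: disarm command
                return "reset" if self.index % 2 == 1 else "disarm"
        return None
```

In this sketch a repeated or older chain element is rejected because hashing it never reproduces the stored element, which corresponds to the replay protection described in the text. A device that already acted on a reset therefore ignores retransmissions, while a device that came online late still accepts them.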