\documentclass[sigconf,anonymous]{acmart}
\usepackage[binary-units]{siunitx}
\DeclareSIUnit{\baud}{Bd}
\DeclareSIUnit{\year}{a}
\usepackage{graphicx,color}
\usepackage{subcaption}
\usepackage{array}
\usepackage{hyperref}
\usepackage{enumitem}
\renewcommand{\floatpagefraction}{.8}
\newcommand{\degree}{\ensuremath{^\circ}}
\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
\newcommand{\partnum}[1]{\texttt{#1}}

\begin{document}
% https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/publications/entsoe/Operation_Handbook/Policy_1_Appendix%20_final.pdf

\begin{abstract}
The dependence of the electrical grid on networked control systems is steadily rising. While utilities are defending their side of the grid effectively through rigorous IT security measures such as physically separated control networks, the increasing number of networked devices on the consumer side, such as smart meters or large IoT-connected appliances such as air conditioners, is much harder to secure due to the heterogeneity of these devices. We consider a crisis scenario in which an attacker compromises a large number of consumer-side devices and modulates their electrical load to destabilize the grid and cause an electrical outage~\cite{ctap+11,wu01,zlmz+21,kgma21,smp18,hcb19}. In this paper, we propose a broadcast channel based on the modulation of grid frequency through which utility operators can issue commands to devices at the consumer premises both during an attack for mitigation and in its wake to aid recovery. Our proposed grid frequency modulation (GFM) channel is independent of other telecommunication networks. It is resilient towards localized blackouts and is operational as soon as power is restored. Based on our GFM broadcast channel, we propose a ``safety reset'' system to mitigate an ongoing attack by disabling a device's network interfaces and resetting its control functions.
It can also be used in the wake of an attack to aid recovery by shutting down non-essential loads to reduce strain on the grid. To validate our proposed design, we conducted simulations based on measured grid frequency behavior. Based on these simulations, we performed an experimental validation on simulated grid voltage waveforms using a smart meter equipped with a prototype safety reset system based on an inexpensive commodity microcontroller.
\end{abstract}

\date{}
\title{\large\bf Ripples in the Pond:\\Transmitting Information through Grid Frequency Modulation}
\author{{\rm Jan Sebastian Götte}\\TU Darmstadt \and {\rm Liran Katzir}\\Tel Aviv University\and {\rm Björn Scheuermann}\\TU Darmstadt}
%\institute{TU Darmstadt\\ Communication Networks Lab\\ \email{safetyreset@jaseg.de}
%\and Tel Aviv University\\ Faculty of Engineering\\ \email{lirankat@tau.ac.il}
%\and TU Darmstadt\\ Communication Networks Lab\\ \email{scheuermann@informatik.hu-berlin.de}}
\maketitle
%\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
%things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}

\section{Introduction}

With the rollout of the smart grid, the IT security of electrical infrastructure has attracted increased attention in recent years. Smart Grid security has two major components: the security of central SCADA systems, and the security of equipment at the consumer premises such as smart meters and IoT devices. While there is previous work on both sides, their interactions have not yet received much attention. We consider the previously proposed scenario where a large number of compromised consumer devices is used alone or in conjunction with an attack on the grid's central SCADA systems to destabilize the grid by rapidly modulating the total connected load~\cite{ctap+11,wu01,zlmz+21,kgma21,smp18,hcb19}.
Several devices have been identified as likely targets for such an attack, including smart meters with integrated remote disconnect switches~\cite{ctap+11,anderson01}, large IoT-connected appliances~\cite{smp18,hcb19,chl20,olkd20} and electric vehicle chargers~\cite{kgma21,zlmz+21,olkd20}. Such attacks are hard to mitigate, and existing literature focuses on hardening grid control systems~\cite{kgma21,lzlw+20,lam21,zlmz+21} and device firmware~\cite{mpdm+10,smp18,zb20,yomu+20} to prevent compromise. Despite the infeasibility of perfect firmware security, there is little research on \emph{post-compromise} mitigation approaches. A core issue with post-attack mitigation is that network connections such as internet and cellular networks between the utility and devices on consumer premises may not work due to the attack. Thus, mitigation strategies that involve devices on the consumer premises will need an out-of-band communication channel.

In this paper, we propose a novel, resilient, grid-wide communication technique based on \emph{grid frequency modulation} (GFM) that can be used to broadcast short messages to all devices connected to the electrical grid. The grid frequency modulation channel is robust and can be used even during an ongoing attack. Based on our channel, we propose the \emph{safety reset} controller, an attack mitigation technique that is compatible with most smart meter and IoT device designs. A safety reset controller is a separate controller integrated into the device that awaits an out-of-band reset command transmitted through GFM. Upon reception of the reset command, it puts the device into a safe state (e.g.\ \emph{relay on} or \emph{light on}) that interrupts attacker control over the device. To reduce attack surface and cost, the safety reset controller is separated from the system's main application controller and does not itself have any conventional network connections.
The grid frequency modulation channel can be operated by transmission system operators (TSOs) even during black-start recovery procedures, and it bridges the gap between the TSO's private control network and consumer devices that cannot economically be equipped with other resilient communication techniques such as satellite transceivers. To demonstrate our proposed channel, we have implemented a system that transmits error-corrected and cryptographically secured commands through an emulated grid frequency-modulated voltage waveform to an off-the-shelf smart meter equipped with a prototype safety reset controller based on a small off-the-shelf microcontroller.

The frequency behavior of the electrical grid can be analyzed by examining the grid as a large collection of mechanical oscillators coupled through the grid via the electromotive force~\cite{rogers01,wcje+12}. The generators and motors that are electromagnetically coupled through the grid's transmission lines and transformers run synchronously with each other, with only minor localized variations in their rotation angle. The dynamic behavior of grid frequency is a direct product of this electromechanical coupling: with increasing load, frequency drops because shafts move slower under higher torque, and conversely, frequency rises with decreasing load. Industrial control systems keep frequency close to its nominal value over time spans of minutes or hours, but on shorter time scales the combined inertia of all grid-connected generators and motors is what regulates frequency.

Grid frequency modulation works by quickly modulating the power of a large, grid-connected load or generator. When this modulation is at low amplitude and high frequency, it is below the thresholds set for the grid's automated control systems and monitoring systems, and it will directly affect frequency according to the grid's inertia.
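
As a minimal sketch of this inertial response, and assuming the aggregated swing equation found in standard power system texts rather than a model derived in this paper, a small power imbalance $\Delta P$ changes the grid frequency $f$ around its nominal value $f_0$ approximately as
\[
  \frac{\mathrm{d}f}{\mathrm{d}t} \approx \frac{f_0}{2 H S_n}\,\Delta P ,
\]
where $H$ (in seconds) is the aggregate inertia constant and $S_n$ is the total rated power of the synchronously connected machines. The symbols $f_0$, $H$, $S_n$ and $\Delta P$ are illustrative and are not used elsewhere in this paper; $\Delta P$ is taken as positive when generation exceeds load. The relation only illustrates why a fast, small change in load translates directly into a change in frequency before slower control systems respond.
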
GFM differs from traditional Powerline Communication (PLC) systems in that it reaches every device within one synchronous area as the signal is embedded into the fundamental grid frequency. Traditional PLC uses a superimposed voltage, which is quickly attenuated across long distances. Practically speaking, using GFM a single large transmitter can cover an entire synchronous area, while in traditional PLC hundreds or thousands of smaller transmitters would be necessary. Unlike traditional PLC, any large industrial load that allows for fast computer control can act as a GFM transmitter.

\begin{figure}
  \centering
  \includegraphics[width=0.4\textwidth]{flowchart}
  \caption{Structural overview of our concept. 1 - Government authority or utility operations center. 2 - Emergency radio link. 3 - Aluminium smelter. 4 - Electrical grid. 5 - Target smart meter.}
  \label{fig_intro_flowchart}
\end{figure}

Figure~\ref{fig_intro_flowchart} shows an overview of our concept, where a large aluminium smelter has been temporarily re-purposed as a GFM transmitter. We envision two application scenarios: before or during a cyberattack, to stop an attack on the electrical grid in its tracks, and after an attack, while power is being restored, to prevent a repeated attack. In both scenarios, our concept is independent of telecommunication networks (such as the internet or cellular networks) as well as broadcast systems (such as cable television or terrestrial broadcast radio) while requiring only inexpensive signal processing hardware and no external antennas (such as are needed for satellite communication). A grid frequency-based system can function as long as power is still available, or as soon as power is restored after the attack. One powerful function this allows is ``flushing out'' an attacker from compromised smart meters after an attack, before restoring smart meter internet connectivity.
Using simulations, we have determined that control of a \SI{25}{\mega\watt} load such as a large aluminium smelter, load bank or photovoltaic farm would allow for the transmission of a cryptographically secured safety reset signal within 15~minutes. We have designed and constructed a proof-of-concept prototype receiver that demonstrates the feasibility of decoding such signals on a resource-constrained microcontroller.

\subsection{Motivation}

Consumer devices are increasingly becoming \emph{smart}. Large numbers of IoT devices are connected through the public internet, and in several countries internet-connected Smart Meters can disconnect entire households from the grid in case of unpaid bills. The increasing proliferation of smart devices on the consumer side presents an opportunity to grid operators, who rely on forecasts for the cost-optimized control of generation and power flow. The core of the \emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption forecasts, and in some cases can even adjust parameters of large devices like water heaters to smooth out load spikes. However, this increased degree of visibility and control comes with an increased IT security risk.

In this paper, we focus on scenarios where an attacker compromises a large number of grid-connected remote-controllable devices. These may be simple smart home devices such as IoT light bulbs, but they may also include Smart Meters that are outfitted with a remote disconnect switch, as is common in some countries. By rapidly switching large numbers of such devices in a coordinated manner, the attacker has the opportunity to destabilize the electrical grid~\cite{zlmz+21,kgma21,smp18,hcb19}. We focus on assisting the recovery procedure after a successful attack because we estimate that this approach will yield a better return on investment in overall grid stability versus resources spent on security measures.
Previous work on IoT and Smart Grid security has focused on the prevention of attacks through firmware security measures. While research on prevention is important, we estimate that its practical impact will be limited by the diversity of implementations found in the field~\cite{nbck+19,zlmz+21,smp18}. We predict that it would be a Sisyphean task to secure the firmware of sufficiently many devices to deny an attacker the critical mass needed to cause trouble. Even if all flaws in the firmware of a broad range of devices were fixed, users would still have to update. In smart grid and IoT devices, this presents a difficult problem since user awareness is low~\cite{nbck+19}.

\subsection{Contents}

Starting from a high-level architecture, we have carried out simulations of our concept's performance under real-world conditions using measured grid frequency data. Based on these simulations, we implemented an end-to-end prototype of our proposed safety reset controller as part of a realistic smart meter demonstrator. Finally, we experimentally validated our results based on a simulated mains voltage signal, and we conclude with an outline of further steps towards a practical implementation. This work contains the following contributions:
\begin{enumerate}[topsep=4pt]
  \item We introduce Grid Frequency Modulation (GFM) as a communication primitive.
  % FIXME done before in that one paper
  \item We elaborate the fundamental physics underlying GFM and theorize on the constraints of a practical implementation.
  \item We design a communication system based on GFM.
  \item We carry out extensive simulations of our system to determine its performance characteristics.
\end{enumerate}

\subsection{Notation}
% FIXME drop or rework this section ; actually update notation to be consistent throughout

To a computer scientist there is one confusing aspect to the theory of grid frequency modulation.
GFM can be seen as a frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is modulated on top of a carrier signal at $f_c = \SI{50}{\hertz}$ in the case of the European electrical grid. The frequency deviation $f_\Delta$ of the modulated carrier from its nominal value of $f_c$ is very small at only a few millihertz. When grid frequency is measured by first digitizing the mains voltage waveform, then demodulating digitally, the FM's signal-to-noise ratio (SNR) is very high and is dominated by the ADC's quantization noise and nearby mains voltage noise sources such as resistive droop due to large inrush currents of nearby machines.

Note that the carrier signal at $f_c$ and the modulation signal at $f_m$ are both measured in hertz. To disambiguate them, in this paper we will use \textbf{bold} letters to refer to the carrier waveform $\mathbf{U}$ or frequency $\mathbf{f_c}$ as well as its deviation $\mathbf{f_\Delta}$, and we will use normal weight for the actual modulation signal and its properties such as $f_m$.

\section{Background on the electrical grid}

\subsection{Components and interactions}

The electrical grid transmits alternating current electrical power from generators to loads. Any device that is connected to the grid must run ``synchronously'' with the grid, i.e.\ it must produce or consume power following the grid's voltage waveform. In generators and motors, the electromotive force acts to synchronize the device with the grid. Connecting a generator that has not been synchronized to the grid leads to large currents flowing through the generator's windings, inducing extreme forces that can mechanically destroy the generator. Similarly, if the inverters of a solar power station tried to fight the grid, the grid would win and the inverters' power semiconductors would release their magic smoke. Originally, all power sources on the grid were synchronous rotating generators.
Today, the shift towards renewable energies and the introduction of high-voltage DC links has led to some of the grid's generating capacity being replaced with inverters that electronically emulate the grid's voltage waveform to efficiently convert a DC input to the grid's alternating current.

The generators and loads on the grid are linked through a complex network of transmission lines. Transformers are used to couple between transmission lines operating at different voltage levels, and several types of switches allow utilities to steer power flow throughout this network. Through the electromotive force, all synchronous generators connected to the grid are electromechanically coupled. Transmission lines introduce a (small) phase delay to the electric fields traversing the grid, but besides local differences in phase, all parts of the grid are synchronous.

\subsection{Grid frequency behavior}

On the electrical grid, generation and consumption of energy must be precisely matched at all times for the grid to stay at a constant, synchronous frequency. If generation outpaced consumption, generators would provide less mechanical resistance to their source of mechanical power, or \emph{prime mover}, which would lead the generators to spin faster and faster. Similarly, if consumption outpaced generation, the increased mechanical load would slow down generators, ultimately leading to a collapse. The frequency of the electrical grid is maintained at a fixed, stable level through several layers of measures.

\subsection{Black-start recovery}

The recovery from a large-scale power outage is a complex operational challenge. Large outages are caused by cascading failures. Since all consumers and producers that are connected to the electrical grid are physically coupled through the electromotive force, a fault in one part of the grid affects all devices connected across the grid.
To function, the grid relies on a delicate balance between electricity generation, transmission and consumption. When this balance is disturbed, cascading failures can occur. A transmission line shutting off can lead other, nearby lines to overload and shut off. Due to the electromechanical coupling of all machines connected to the grid, a generator or consumer suddenly shutting off causes a transient in the grid's frequency. If the frequency goes too far out of bounds, protection devices take power plants and large industrial loads offline.

The recovery from a large-scale outage requires the grid's operators to bring generators and loads back online one by one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting them down again. To coordinate this process, transmission system operators cannot rely on the public internet or cellular networks, as they may not work during a large-scale power outage. Instead, they maintain private communication infrastructure using dedicated lines rented from telecommunications providers, fibers run along transmission lines, and dedicated radio links.

To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by themselves without any external power are brought online. With their help, other power stations and consumers are gradually brought online until a part of the grid has been restored to nominal operation. This process can be performed simultaneously in different parts of the grid. After these \emph{islands} have been restored, they can then be joined to restore the grid to its normal state.

\subsection{Demand-side response and Smart Metering}

Maintaining the balance between electricity generation and consumption under varying load conditions is critical. Utilities can access different energy sources, each of which has its own trade-off in response speed versus energy cost.
For instance, the availability of wind and solar power cannot be controlled at all, while hydroelectric power plants can quickly regulate the speed and power output of their turbines. Combined with the complex layout of the grid's infrastructure such as transmission lines, these economic factors lead to a complex optimization problem, the quality of whose solution directly manifests itself in the utility's bottom line.

For decades, one solution to this issue has been demand-side response (DSR)~\cite{rs48}. In DSR, large loads such as water heaters are centrally controlled by the utility to switch on outside of peak demand. Since the precise timing of these loads is of no consequence to their users, users are happy to get slightly better prices from their utility while utilities gain a degree of control allowing them to optimize their network's performance. As part of the smart grid vision, DSR will be utilized in a larger fraction of consumer devices.

A core component of the smart grid is the rollout of ``Advanced Metering Infrastructure'' (AMI), colloquially known as smart meters. Smart meters are electricity meters that use a real-time communication interface to automatically transmit high-resolution measurements to the utility. In contrast to the yearly reading schedule of traditional electricity meters, smart meters can provide near-real-time data that the utility can use for more accurate load forecasting.

\subsection{Powerline Communication (PLC)}

A core issue in smart metering is the communication channel from the meter to the greater world. Smart meters are cost-constrained devices, which limits the use of landline internet or cellular connections. Additionally, electricity meters are often installed in basements, far away from the customer's router and with soil and concrete blocking radio signals. For these reasons, in some AMI deployments, powerline communication (PLC) has been chosen for the meters' uplink.
Since the early days of the electrical grid, powerline communication has been used to control devices spread throughout the grid from a central transmitter~\cite{rs48}. PLC systems superimpose a modulated high-frequency signal on top of the grid voltage. When the carrier frequency of this modulation is in the audible frequency range, low data rates can be transmitted over distances of several tens of kilometers. By using a radio frequency carrier, higher data rates can be achieved across shorter distances. Audio frequency PLC, called ``ripple control'', is still used today by utilities to enable ``demand-side response'', i.e.\ the remote switching of loads such as water heaters to avoid times of peak electricity demand. Usually, such powerline communication systems are uni-directional, but there are instances of bi-directional powerline communication for smart meter reading such as the Italian smart meter deployment~\cite{ec03,rs48,gungor01,agf16}.

\section{Related work}
\label{sec_related_work}

\subsection{IoT and Smart Grid security}

The security of IoT devices as well as the smart grid has received extensive attention in the literature~\cite{nbck+19,acsc20,smp18,ykll17,anderson01,anderson02,zlmz+21,kgma21,hcb19,mpdm10,lzlw+20,chl20,lam21,olkd20,yomu+20}. The challenges of IoT device security and the security of smart meters and other smart grid devices are similar because smart grid devices are essentially IoT devices in a particularly sensitive location~\cite{acsc20}. In both device types, the challenge is that securing embedded firmware is difficult, and adding network interfaces and cost constraints only makes the task harder.

In~\cite{smp18}, Soltan, Mittal and Poor investigated an attack scenario where an attacker first gains control over a large number of high-wattage devices through an IoT security vulnerability, then uses this control to cause rapid load spikes.
The researchers performed computer simulations for a range of parameters and concluded that given sufficiently many compromised devices, an attacker can cause issues up to a large-scale blackout. In~\cite{hcb19}, Huang, Cardenas and Baldick raised a counter-point to the conclusions of Soltan et al., finding that limitations of the simulations in~\cite{smp18} have led them to over-estimate the severity of an attack. Using a more accurate model, they confirmed that such attacks can cause problems such as localized blackouts and the decay of the grid into islands, but they found that overall the electrical grid is less vulnerable than previously assumed and that large-scale blackouts in particular are very unlikely, primarily due to the action of protection systems such as load shedding and over-frequency protection.

From the literature, we get the overall impression that both IoT and Smart Grid security are challenging. Both lag behind the security standard of state-of-the-art desktop, server and smartphone operating systems. Reasons for this are the relatively recent nature of the IoT software ecosystem and the large number of independent implementations. A unique challenge to Smart Grid security is that due to the fragmentation of markets along national borders, certain devices such as smart meters or DSR implementations exist in large monocultures.

Compared to IoT and Smart Grid devices, the embedded firmware foundations of modern smartphones have received more attention both from the industry and from academia. Pinto and Santos in~\cite{pinto01} conducted a survey of implementations based on ARM's TrustZone embedded virtualization architecture and found a significant number of reported vulnerabilities across different implementations.
For instance, Rosenberg in~\cite{rosenberg01} found critical issues in Qualcomm's QSEE hypervisor, and Kanonov and Wool in~\cite{kanonov01} identified a number of design weaknesses and security vulnerabilities in Samsung's competing KNOX virtualization product.

To us, the state of the field of embedded security indicates that even if significant effort is spent on the security of IoT and Smart Grid devices to catch up with desktop, server and smartphone security, significant vulnerabilities are likely to remain for some time to come. In this instance, market forces do not align with the interest of the public at large. Vulnerabilities remain likely, especially in code implementing complex network protocols such as TLS~\cite{georgiev01}, which may even be mandated by national standards in some devices such as smart electricity meters.

\subsection{Oscillations in the electrical grid}

Common to the attacks on the electrical grid proposed in the papers discussed above is their approach of overloading parts of the grid. However, scenarios have been proposed that go beyond a simple overload condition, and in which an attacker exploits the physical characteristics of the grid to cause oscillations of increasing amplitude, ultimately triggering a cascade of protection mechanisms. The purpose of this type of attack is to use a small controllable load to cause outsized damage.

Electro-mechanical oscillation modes between different geographical areas of an electrical grid are a well-known phenomenon. In their book~\cite{rogers01}, Rogers and Graham provide an in-depth analysis of these oscillations and their mitigation. In~\cite{grebe01}, Grebe, Kabouris, López Barba et al.\ analyzed modes inherent to the continental European grid. A report on an event where an oscillation on one such mode caused a problem can be found in~\cite{entsoe01}.
In~\cite{zlmz+21}, Zou, Liu, Ma et al.\ analyzed the possibility of a modal attack in which electric vehicle chargers rapidly modulate their power to force an oscillation of a poorly damped wide-area electromechanical mode. Using mathematical analysis, small-scale simulations and practical experiments, they validated the attack scenario and developed a countermeasure that can be implemented as part of generator control systems and that, when activated, can suppress forced oscillations of wide-area electromechanical modes.

On the device side of the smart grid, research has concentrated on smart meter security. Smart meters are architecturally similar to IoT devices~\cite{zheng01,ifixit01}, but come with different challenges. Similar to a high-power IoT device, an attacker could use a built-in off-switch as part of an attack, a scenario that was investigated by Anderson and Fuloria in~\cite{anderson01}. Unique to smart meters, however, an attacker could also use their control to manipulate the meter's energy accounting, quickly leading to potentially severe financial impact on the meter's operating utility company. This scenario has received research attention~\cite{anderson02,mcdaniel01}, and this is where industry incentives are the strongest.

Smart electricity meters are consumer devices built down to a price, and manufacturers' firmware security R\&D budgets are limited by the high degree of market fragmentation that is caused by mutually incompatible national smart metering standards. Landis+Gyr, a large utility meter manufacturer, states in its 2019 annual report that it invested \SI{36}{\percent} of its total R\&D budget in embedded software while spending only \SI{24}{\percent} on hardware R\&D~\cite{landisgyr01,landisgyr02}, which indicates tension between firmware security and the manufacturers' bottom line.
\subsection{Proposed Countermeasures}

In~\cite{kgma21}, the authors propose an extension to grid control algorithms aimed at increasing the grid's robustness towards forced oscillations. In~\cite{smp18}, the authors propose that utility operators use a detailed attacker model to engineer additional safety margins into the grid while minimizing the economic inefficiency of these measures. On the IoT side, they note that due to the wide implementation diversity, the problem cannot be solved by individual measures and propose additional fundamental research on IoT device security. In~\cite{hcb19}, the authors conclude that simple demand attacks where compromised loads suddenly increase demand are adequately mitigated by existing safety measures, in particular \emph{Under-Frequency Load Shedding} (UFLS). As part of UFLS, during a contingency the utility will progressively disconnect loads according to set priorities until the generation/consumption balance has been restored and a blackout has been averted. UFLS is already deployed in any large electrical grid.
% FIXME more sources!

\section{Grid Frequency as a Communication Channel}

During a large-scale cyberattack, availability of internet and cellular connectivity cannot be relied upon. An attacker may already have disabled such systems in a separate attack, or they may go down along with parts of the electrical grid. Powerline communication systems will likely be unaffected by an attack, but at a range of no more than several tens of kilometers, covering the entire grid would require a large upfront infrastructure investment for transmitters. We propose to approach the problem of broadcasting an emergency signal to all grid-connected devices such as smart meters or IoT appliances within a synchronous area by using grid frequency as a communication channel. Despite the technological complexity of the grid, the physics underlying its response to changes in load and generation is surprisingly simple.
Individual machines (loads and generators) can be approximated by a small number of differential equations describing their control systems' interaction with the machine's physics, and the entire grid can be modelled by aggregating these approximations into a large system of differential equations. As a consequence, small-signal changes in the generation/consumption power balance cause an approximately proportional change in frequency~\cite{kundur01,crastan03,entsoe02,entsoe04}. The slope of this first-order approximation is known as the \emph{Power Frequency Characteristic}, and in the case of the continental European synchronous area it is about \SI{25}{\giga\watt\per\hertz} according to the European electricity grid authority, ENTSO-E.

If we modulate the power consumption of a large load, this modulation will result in a small change in frequency according to this characteristic. For example, a load step of \SI{25}{\mega\watt} would shift the grid frequency by about \SI{1}{\milli\hertz} under this approximation. As long as we stay within the operational limits set by ENTSO-E~\cite{entsoe02,entsoe03}, this change will not degrade the operation of other parts of the grid. The advantages of grid frequency modulation are that a single transmitter can cover an entire synchronous area and that receiver hardware complexity is low. To the best of the authors' knowledge, grid frequency modulation has only ever been proposed as a communication channel at very small scales in microgrids~\cite{urtasun01} and has not yet been considered for large-scale application.

Compared to traditional channels such as DSL, LTE or LoRaWAN, grid frequency as a communication channel has a resiliency advantage: if there is power, a grid frequency modulation system is operational. Both DSL and LTE systems not only require power at their base stations, but also require centralized infrastructure to operate.
Mesh networks such as LoRaWAN can cover short distances up to $\SI{20}{\kilo\meter}$ without requiring infrastructure to be available, but for longer distances LoRaWAN relies on the public internet for its network backbone. Additionally, systems such as DSL, LTE and LoRaWAN are built around a point-to-point communication model and usually do not support a generic broadcast primitive. During times when a large number of devices must be reached simultaneously, this can lead to congestion of cellular towers and servers. Therefore, during an ongoing cyberattack, grid frequency is promising as a communication channel because only a single transmitter facility must be operational for it to function, and this single transmitter can reach all connected devices simultaneously. After a power outage, it can resume operation as soon as electrical power is restored, even while the public internet and mobile networks are still offline. It is unaffected by cyberattacks that target telecommunication networks. \subsection{Characterizing Grid Frequency} \label{grid-freq-characterization} Before analyzing grid frequency as a communication channel, we developed a device that allows us to collect ground truth for our analysis by safely recording the grid voltage waveform. Our system consists of an \texttt{STM32F030F4P6} ARM Cortex M0 microcontroller that records mains voltage using its internal 12-bit ADC and transmits measured values through a galvanically isolated USB/serial bridge to a host computer. We derive our system's sampling clock from a crystal oven to avoid frequency measurement noise due to thermal drift of a regular crystal: \SI{1}{ppm} of crystal drift would cause a grid frequency error of $\SI{50}{\micro\hertz}$. We compared our oven-stabilized clock against a GPS 1 pps reference and found that over a time span of 20 minutes both stayed stable within 5 ppb of each other, which corresponds to the drift specification of a typical crystal oven.
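The sensitivity to clock drift stated above can be checked with a short worked example. As a sketch, assume a nominal grid frequency of $f_0 = \SI{50}{\hertz}$ and a frequency estimate that scales linearly with the sampling clock; the symbols $f_0$, $\delta$ and $\Delta f$ are introduced here for illustration only. A relative clock error $\delta$ then maps to a frequency error of approximately
\[
  \Delta f \approx f_0 \, \delta = \SI{50}{\hertz} \times 10^{-6} = \SI{50}{\micro\hertz} .
\]
Under the same assumption, the 5 ppb agreement with the GPS reference would correspond to a frequency error of roughly $\SI{50}{\hertz} \times 5 \times 10^{-9} = \SI{0.25}{\micro\hertz}$.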
In utility SCADA systems, Phasor Measurement Units (PMUs) are used to precisely measure grid frequency among other parameters. Details on the inner workings of commercial phasor measurement units are scarce but there is a large amount of academic research on their measurement algorithms. PMUs employ complex signal analysis algorithms to provide fast and precise measurements even when given a heavily distorted input signal~\cite{narduzzi01,derviskadic01,belega01}. In our application, we do not need the same level of precision. For the sake of simplicity, we use the universal frequency estimation approach of Gasior and Gonzalez~\cite{gasior01}. In this algorithm, the windowed input signal is processed using a Discrete Fourier Transform (DFT), then the signal's fundamental frequency is interpolated by fitting a wavelet to the largest peak in the DFT result. The bias parameter of this curve fit is an accurate estimation of the signal's fundamental frequency. This algorithm is similar to the interpolated DFT algorithm referenced by phasor measurement literature~\cite{borkowski01}. \begin{figure} \centering \includegraphics[width=0.45\textwidth]{../notebooks/fig_out/freq_meas_spectrum_new} \caption{The spectrum of grid frequency variations measured over 24 hours. The raw spectrum is shown in gray, and a smoothed spectrum is shown in red. The blue line is inversely proportional to frequency and illustrates the $1/f$ nature of the spectrum. Distinctive peaks in the spectrum are marked with red crosses, and their locations are given on the bottom of the diagram.} \label{fig_freq_spec} \end{figure} Using our grid frequency recorder, we performed a two-day measurement series of grid frequency. Figure~\ref{fig_freq_spec} shows the frequency spectrum of grid frequency over this two-day span. In this spectrum, we observe a number of features. Across the frequency range, we observe a broad $1/f$ noise. 
Below a period of $\SI{10}{\second}$, this $1/f$ noise dips to a flat noise floor. We estimate that this low-noise region is caused by the self-regulating effect of loads.
%FIXME citation
Above a $\SI{10}{\second}$ period, primary control is activated and thus the $1/f$ noise we observe is the result of the interaction between primary control and consumer demand. On top of this $1/f$ behavior, the spectrum shows several sharp peaks at time intervals with a ``round'' number such as $\SI{10}{\second}$, $\SI{60}{\second}$ or multiples of $\SI{300}{\second}$. These peaks are due to loads turning on or off depending on wall-clock time, and demand forecasting not being able to precisely match the amplitude of these large changes in load. Besides the narrow peaks caused by this effect we can also observe two wider bumps at $\SI{7.0}{\second}$ and $\SI{4.7}{\second}$. These bumps closely correlate with the continental European synchronous area's oscillation modes at $\SI{0.15}{\hertz}$ (east-west) and $\SI{0.25}{\hertz}$ (north-south)~\cite{grebe01}.

\section{Grid Frequency Modulation}
A transmitter for grid frequency modulation would be a controllable load of several megawatts that is located centrally within the grid. A baseline implementation would be a spool of wire submerged in a body of cooling liquid (such as a small lake) which is powered from a thyristor rectifier bank. Compared to this baseline solution, hardware and maintenance investment can be decreased by repurposing a large industrial load as a transmitter. Going through a list of energy-intensive industries in Europe~\cite{ec01}, we found that an aluminium smelter would be a good candidate. In aluminium smelting, aluminium is electrolytically extracted from alumina solution. High-voltage mains power is transformed, rectified and fed into about 100 series-connected electrolytic cells forming a \emph{potline}.
Inside these pots, alumina is dissolved in molten cryolite electrolyte at about \SI{1000}{\degreeCelsius} and electrolysis is performed using a current of tens or hundreds of kiloamperes. The resulting pure aluminium settles at the bottom of the cell and is tapped off for further processing. Aluminium smelters are operated around the clock, and due to the high financial stakes their behavior under power outages has been carefully characterized. Power outages of tens of minutes up to two hours reportedly do not cause problems in aluminium potlines~\cite{eisma01,oye01}. Recently, even techniques for intentional power modulation without affecting cell lifetime or product quality have been developed to take advantage of variable energy prices~\cite{duessel01,eisma01,depree01}. An aluminium plant's power supply is controlled to constantly keep all smelter cells under optimal operating conditions. Modern power supply systems employ large banks of diodes or thyristors to rectify low-voltage AC to DC to be fed into the potline~\cite{ayoub01}. Potline voltage is controlled through a combination of a tap changer and a transductor. Individual cell voltages are controlled by changing the physical distance between anode and cathode. In this setup, power can be electronically modulated using the thyristor rectifier. Since the system does not have any mechanical inertia, high modulation rates are possible. In~\cite{depree01}, the authors describe a setup where a large aluminium smelter in continental Europe is used as primary control reserve for frequency regulation. In this setup, a rise time of $\SI{15}{\second}$ was achieved to meet the $\SI{30}{\second}$ requirement posed by local standards for primary control. In their conclusion, the authors note that for their system, an effective thermal energy storage capacity of $\SI{7.7}{\giga\watt\hour}$ is possible if all plants of a single operator are used.
Given the maximum modulation depth of $\SI{100}{\percent}$ for up to one hour that is mentioned by the authors, this results in an effective modulation power of $\SI{7.7}{\giga\watt}$. Over a longer timespan of $\SI{48}{\hour}$, they have demonstrated a $\SI{33}{\percent}$ modulation depth which would correspond to a modulation power of $\SI{2.5}{\giga\watt}$. We conclude that modulating part of an aluminium smelter's power consumption is possible without significant production impact and at low infrastructure cost. Aluminium smelters are already connected to the grid in a way that they do not pose a danger to other nearby consumers when they turn off or on parts of the plant, as this is commonplace during routine maintenance activities. \subsection{Parametrizing Modulation for GFM} Given the grid characteristics we measured using our custom waveform recorder and using a model of our transmitter, we can derive parameters for the modulation of our broadcast system. The overall network power-frequency characteristic of the continental European synchronous area is about $\SI{25}{\giga\watt\per\hertz}$~\cite{entsoe02}. Thus, the main challenge for a GFM system will be poor signal-to-noise ratio (SNR) due to low transmission power. A second layer of modulation yielding some modulation gain beyond the basic amplitude modulation of the transmitter will be necessary to achieve sufficient overall SNR. The grid's frequency noise has significant localized peaks that might interfere with this modulation. Further complicating things are the oscillation modes. A GFM system must be designed to avoid exciting these modes. However, since these modes are not static, a modulation method that is designed around a specific assumption of their location would not be future-proof. Given these concerns, the optimal second-level modulation technique for GFM is a spread-spectrum technique.
By spreading signal energy throughout a wide band, the impact of local noise spikes is minimized and the risk of mode excitation is reduced, since spread-spectrum techniques minimize energy in any particular sub-band. The spread-spectrum technique that we chose is Direct Sequence Spread Spectrum (DSSS) for its simple implementation and good overall performance. DSSS chip timing should be as fast as the transmitter's physics allow to exploit the low-noise region between $\SI{0.2}{\hertz}$ and $\SI{2.0}{\hertz}$ in Figure~\ref{fig_freq_spec}. Going past $\approx\SI{2}{\hertz}$ would complicate frequency measurement at the receiver side. \subsubsection{Direct Sequence Spread Spectrum (DSSS) modulation} Direct Sequence Spread Spectrum modulation is a common spread-spectrum technique that forms the basis of a number of radio systems, most prominently all global navigation satellite systems (GNSS). As a spread-spectrum technique, DSSS spreads out the signal's energy across a broad spectral range. This decreases the susceptibility of a DSSS signal to narrowband interference. In GNSS, this allows the rejection of other nearby RF sources. In our use case, this makes the signal immune to the many narrow peaks in the grid frequency's noise spectrum that are caused by UTC-synchronized control systems (cf.~Fig.~\ref{fig_freq_spec}). In addition to better interference immunity, DSSS has two other important characteristics: It provides \emph{modulation gain}, i.e.~it allows a trade-off between data rate and receiver sensitivity, and it allows for Code Division Multiple Access (CDMA). In CDMA, multiple DSSS-modulated signals can be sent simultaneously through a shared channel with less impact on the resulting signal-to-noise ratio (SNR) than would be the case for other modulation techniques. A DSSS signal is made up of pseudo-random \emph{symbols}, which in turn are made up of individual physical layer bits called \emph{chips}.
Chips are encoded in the signal using a lower-layer modulation such as phase-shift keying (e.g.~in GPS) or frequency-shift keying (in this work). In DSSS, a \emph{code} is a library of symbols that are constructed to have minimal cross-correlation, meaning they are near-orthogonal. A transmitter sends a symbol by transmitting its particular pseudo-random chip sequence at a chosen polarity, conveying one bit of information. A receiver demodulates the signal by directly correlating the incoming physical-layer signal with the symbol's chip pattern, which results in a positive or negative peak depending on symbol polarity when a symbol is received. Increasing the DSSS sequence length by a factor of $2$ improves SNR by a factor of $\sqrt{2}$, assuming an additive white Gaussian noise (AWGN) channel. At the same time, when doubling the sequence length, common DSSS code construction methods provide twice the number of distinctive symbols, allowing for twice the number of CDMA participants. Trading twice the sequence length (and transmission time) for approximately $\SI{1.5}{\decibel}$ in SNR is a steep trade-off, but it is necessary in systems where transmitter power cannot be increased further and the resulting signal has a marginally low SNR. \subsubsection{DSSS parametrization} To find the parameters for our DSSS modulation, we simulated a proof-of-concept modulator and demodulator using data captured from our grid frequency sensor. Our simulations covered a range of combinations of modulation amplitude, DSSS sequence bit depth, chip duration and detection threshold. Figure~\ref{fig_ser_nbits} shows our simulation results for symbol error rate (SER) as a function of modulation amplitude with Gold sequences of several bit depths. From these graphs we conclude that the range of practical modulation amplitudes starts at approximately $\SI{1}{\milli\hertz}$, which corresponds to a modulation power of approximately $\SI{25}{\mega\watt}$~\cite{entsoe02}.
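The correspondence between modulation amplitude and modulation power can be sketched with the linear power-frequency characteristic. Writing $\lambda$ for the characteristic and $\Delta f$ for the modulation amplitude (both symbols are introduced here for illustration), and assuming that the first-order approximation holds at this amplitude, the required change in load is approximately
\[
  \Delta P \approx \lambda \, \Delta f = \SI{25}{\giga\watt\per\hertz} \times \SI{1}{\milli\hertz} = \SI{25}{\mega\watt} .
\]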
Figure~\ref{fig_ser_thf} shows SER against detection threshold relative to background noise. Figure~\ref{fig_ser_chip} shows SER against chip duration for a given fixed symbol length. As expected from looking at our measured grid frequency noise spectrum, performance is best for short chip durations and worsens for longer chip durations since shorter chip durations move our signals' bandwidth into the lower-noise region from $\SI{0.2}{\hertz}$ to $\SI{2}{\hertz}$.
%FIXME introduce term "chip" somewhere

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{../notebooks/fig_out/dsss_gold_nbits_overview}
\caption{Symbol Error Rate as a function of modulation amplitude for Gold sequences of several lengths.}
\label{fig_ser_nbits}
\end{figure}

\begin{figure}
\centering
\hspace*{-5mm}\includegraphics[width=0.5\textwidth]{../notebooks/fig_out/dsss_thf_amplitude_5678}
\vspace*{-5mm}
\caption{SER vs.\ Amplitude and detection threshold. Detection threshold is set as a factor of background noise level.}
\label{fig_ser_thf}
\end{figure}

\begin{figure}
\centering
\hspace*{-5mm}\includegraphics[width=0.5\textwidth]{../notebooks/fig_out/chip_duration_sensitivity_6}
\vspace*{-5mm}
\caption{SER vs.\ DSSS chip duration.}
\label{fig_ser_chip}
\end{figure}

\subsection{Parametrizing a proof-of-concept ``Safety Reset'' System Based on GFM}
%FIXME introduce scenario
Taking these modulation parameters as a starting point, we proceeded to create a proof-of-concept smart meter emergency reset system. On top of the modulation described in the previous paragraphs we layered simple Reed-Solomon error correction~\cite{mackay01} and some cryptography. The goal of our PoC cryptographic implementation was to allow the sender of an emergency reset broadcast to authorize a reset command to all listening smart meters. An additional constraint of our setting is that due to the extremely slow communication channel all messages should be kept as short as possible.
The solution we chose for our PoC is a simplistic hash chain using the approach from the Lamport and Winternitz One-time Signature (OTS) schemes. Informally, the private key is a random bitstring. The public key is generated by recursively applying a hash function to this key a number of times. Each smart meter reset command is then authorized by disclosing subsequent elements of this series. Unwinding the hash chain from the public key at the end of the chain towards the private key at its beginning, at each step a receiver can validate the current command by checking that it corresponds to the previously unknown input of the current step of the hash chain. Replay attacks are prevented by recording the most recent valid command. Key revocation is supported by designating the last key in the chain as a \emph{revocation key} upon whose reception the client devices advance their local hash ratchet without taking further action. This simple scheme does not afford much functionality but it results in very short messages and removes the need for computationally expensive public key cryptography inside the smart meter.
% FIXME add more precise/formal description of crypto
% FIXME add description of targeting/scope function?
% FIXME somewhere above describe entire reset system architecture????!!!
% FIXME add description of disarm message (replay protection)

\subsection{Experimental results}
\begin{figure} \centering \includegraphics[width=0.45\textwidth]{prototype.jpg} \caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to the reset microcontroller.
The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter} audio cable for the simulated mains voltage input.} \label{fig_proto_pic} \end{figure} For a realistic proof of concept, we decided to implement our signal processing chain from DSSS demodulator through error correction up to our simple cryptography layer in microcontroller firmware and demonstrate this firmware on actual smart meter hardware, shown in Figure~\ref{fig_proto_pic}. In our proof of concept, a safety reset controller is connected to the main application microcontroller of a smart meter. The reset controller is tasked with listening for authenticated reset commands on the voltage waveform, and on reception of such a command resetting the smart meter application controller by flashing a known-good firmware image to its memory. The signal processing chain of our PoC is shown in Figure~\ref{fig_demo_sig_schema}. To interoperate with existing implementations of SHA-512 and Reed-Solomon decoding, this implementation was written in the C programming language. To demonstrate an application close to a field implementation, we chose an Easymeter \texttt{Q3DA1002} smart meter as our reset target. This model is popular in the German market and readily available second-hand. The meter consists of three isolated metering ASICs connected to a data logging and display PCB through infrared optical links. To demonstrate the safety reset's firmware reset functionality, we connected our safety reset microcontroller to the Texas Instruments \texttt{MSP430} microcontroller on the meter's display and data logging board through the JTAG debug interface that the board's vendor had conveniently left accessible.
We ported part of \texttt{mspdebug}\footnote{\url{https://dlbeer.co.nz/mspdebug/}} to drive the meter microcontroller's JTAG interface and wrote a piece of demonstrator code that overwrites the meter's firmware with one that displays an identifying string on the meter's display after boot-up. \begin{figure} \centering \includegraphics[width=0.45\textwidth]{prototype_schema} \caption{The signal processing chain of our demonstrator.} \label{fig_demo_sig_schema} \end{figure} To measure grid frequency in our demonstrator, we ported the code we used in Section~\ref{grid-freq-characterization}, again using the voltage measured using the microcontroller's internal ADC but using a regular crystal instead of a crystal oven for the microcontroller's system clock. Since we did not have an aluminium smelter ready, we decided to feed our proof-of-concept reset controller with an emulated grid voltage sine wave from a computer's headphone jack. Where in a real application this microcontroller would take ADC readings of input mains voltage divided down by a long resistive divider chain, we instead feed the ADC from a $\SI{3.5}{\milli\meter}$ audio input. For operational safety, we disconnected the meter microcontroller from its grid-referenced capacitive dropper power supply and connected it to our reset controller's debug USB power supply. We performed several successful experiments using a signature truncated to 120 bits and a 5-bit DSSS sequence. Taking the sign bit into account, the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error correction at a 2:1 ratio, inflating total message length to 30 DSSS symbols. At the \SI{1}{\second} chip duration that we also used in our simulations, this equates to an overall transmission duration of approximately \SI{15}{\minute}.
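As a rough plausibility check, the message length and transmission duration can be reconstructed as follows. This sketch rests on two assumptions that are not spelled out in the text: that each DSSS symbol carries 5 bits through the choice of sequence plus one bit through its polarity, and that a 5-bit Gold sequence consists of $2^5 - 1 = 31$ chips. The symbols $N_\mathrm{sig}$, $N_\mathrm{msg}$ and $T$ are introduced here for illustration, and the 2:1 Reed-Solomon ratio is read as two data symbols per parity symbol.
\begin{align*}
  N_\mathrm{sig} &= \frac{120\ \text{bit}}{(5 + 1)\ \text{bit per symbol}} = 20\ \text{symbols}, \\
  N_\mathrm{msg} &= \tfrac{3}{2} \times 20 = 30\ \text{symbols}, \\
  T &= 30 \times 31 \times \SI{1}{\second} = \SI{930}{\second} \approx \SI{15.5}{\minute} .
\end{align*}
Under these assumptions the result is consistent with the transmission duration of approximately \SI{15}{\minute} stated above.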
To give the demodulator some time to settle and to produce more realistic conditions of signal reception, we padded the modulated signal with unmodulated noise on both ends. \section{Lessons learned} For our proof of concept, before settling on the commercial smart meter we first tried to use an \texttt{EVM430-F6779} smart meter evaluation kit made by Texas Instruments. This evaluation kit did not turn out well for two main reasons. One, it shipped with half the case missing and no cover for the terminal blocks. Because of this, some work was required to get it electrically safe. Even after mounting it in an electrically safe manner, the safety reset controller prototype would also have to be galvanically isolated to not pose an electrical safety risk since the main MCU is not isolated from the grid and the JTAG port is also galvanically coupled. The second issue we ran into was that the development board is based around a specific microcontroller from TI's \texttt{MSP430} series that is incompatible with common JTAG programmers. Our initial assumption that a development kit would be easier to program than a commercial meter did not prove to be true. Contrary to our expectations, the commercial meter had JTAG enabled, allowing us to easily read out its stock firmware without either reverse-engineering vendor firmware update files or circumventing code protection measures. The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it proved not to be too complex and all we wanted to know we found with just a few hours of digging in Ghidra\footnote{\url{https://ghidra-sre.org/}}. In the firmware development phase, our approach of testing every module individually (e.g.~DSSS demodulator, Reed-Solomon decoder, grid frequency estimation) proved useful, particularly for debugging.
The modular architecture allowed us to directly compare our demodulator implementation to our Jupyter/Python prototype, where we found that our C implementation outperformed the Python prototype. Despite the algorithm's complexity, the microcontroller C implementation has no issues processing data in real-time due to the low sampling rate necessary. \section{Conclusion} \label{sec_conclusion} \subsection{Applicability to IoT devices} \subsection{Discussion} During an emergency in the electrical grid, the ability to communicate to large numbers of end-point devices is a valuable tool for restoring normal operation. When a resilient communication channel is available, loads such as smart meters and IoT devices can be equipped with a supervisor circuit that allows for a remote ``safety reset'' that puts the device into a safe operating state. Using this safety reset, an attacker that uses compromised smart meters or IoT devices to attack grid stability can be interrupted before they can conclude their attack. During recovery from an outage, a safety reset can be used to reduce stress on the system during a black start by temporarily disabling non-essential loads such as air conditioners. In this paper we have developed an end-to-end design for a safety reset system that provides these capabilities. Our novel broadcast data transmission system is based on intentional modulation of global grid frequency. Our system is independent of normal communication networks and can operate during a cyberattack. We have shown the practical viability of our end-to-end design through simulations. Using our purpose-designed grid frequency recorder, we can capture and process real-time grid frequency data in an electrically safe way. We used data captured this way as the basis for simulations of our proposed grid frequency modulation communication channel. In these simulations, our system has proven feasible.
From our simulations we conclude that a large consumer such as an aluminium smelter can be modified at a small cost to act as an on-demand grid frequency modulation transmitter. We have demonstrated our modulation system in a small-scale practical demonstration. For this demonstration, we have developed a simple cryptographic protocol ready for embedded implementation in resource-constrained systems that allows triggering a safety reset with a response time of less than 30 minutes. In this demonstration we use simulated grid frequency data to trigger a commercial microcontroller to perform a firmware reset of an off-the-shelf smart meter. The next step in our evaluation will be to conduct an experimental evaluation of our modulation scheme in collaboration with a utility and an operator of a multi-megawatt load. The safety reset controller does not require any peripherals except for an ADC. Thus we expect code size to be the main factor affecting per-unit cost in an in-field deployment of our concept. At around \SI{64}{\kilo\byte}, our demonstrator firmware implementation is viable on low-end microcontrollers. Thus, we expect safety reset controllers to be commercially viable. Source code and EDA designs are available at the public repository listed at the end of this document. \begin{acks} This work has been co-funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center. \end{acks} \bibliographystyle{plain} \bibliography{\jobname} \center{ \footnotesize \center{This is version \texttt{\input{version.tex}\unskip} of this paper, generated on \today. The git repository can be found at:} \center{\url{https://git.jaseg.de/safety-reset.git}} } \end{document}