From 6fac195a973ead5e2c79ecc523b19830c8268be4 Mon Sep 17 00:00:00 2001
From: jaseg
Date: Thu, 7 Apr 2022 17:59:50 +0200
Subject: WIP

---
 paper/safety-reset-paper.tex | 337 ++++++++++++++++++--------------------------------
 1 file changed, 138 insertions(+), 199 deletions(-)

diff --git a/paper/safety-reset-paper.tex b/paper/safety-reset-paper.tex
index fc62bc8..3b7a93e 100644
--- a/paper/safety-reset-paper.tex
+++ b/paper/safety-reset-paper.tex
@@ -1,4 +1,6 @@
-\documentclass[runningheads]{llncs}
+\documentclass[letterpaper,twocolumn,10pt]{article}
+\usepackage{usenix}
+
 \usepackage[T1]{fontenc}
 \usepackage[
     backend=biber,
@@ -32,165 +34,64 @@
 
 % https://eepublicdownloads.entsoe.eu/clean-documents/pre2015/publications/entsoe/Operation_Handbook/Policy_1_Appendix%20_final.pdf
 
+\date{}
 \title{Ripples in the Pond: Transmitting Information through Grid Frequency Modulation}
-\titlerunning{Ripples in the Pond: Transmitting Information through Grid Frequency}
 \author{Jan Sebastian Götte \and Liran Katzir \and Björn Scheuermann}
-\institute{TU Darmstadt\\ Communication Networks Lab\\ \email{safetyreset@jaseg.de}
-\and Tel Aviv University\\ Faculty of Engineering\\ \email{lirankat@tau.ac.il}
-\and TU Darmstadt\\ Communication Networks Lab\\ \email{scheuermann@informatik.hu-berlin.de}}
+%\institute{TU Darmstadt\\ Communication Networks Lab\\ \email{safetyreset@jaseg.de}
+%\and Tel Aviv University\\ Faculty of Engineering\\ \email{lirankat@tau.ac.il}
+%\and TU Darmstadt\\ Communication Networks Lab\\ \email{scheuermann@informatik.hu-berlin.de}}
 \maketitle
 
-\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
-things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}
+%\keywords{Security, privacy and resilience in critical infrastructures \and Security and privacy in ``internet of
+%things'' \and Cyber-physical systems \and Hardware security \and Network Security \and Energy systems \and Signal theory}
 
 \begin{abstract}
-    With the rollout of the smart grid, the IT security of electrical infrastructure has attracted increased attention
-    in the last years. Smart Grid IT security has two major components: The security of central SCADA systems, and
-    the security of equipment at the consumer premises such as smart meters and IoT devices. While there is previous
-    work on both sides, their interactions have not yet received much attention.
-
-    In this paper, we consider the previously proposed scenario where a large number of compromised consumer devices is
-    used alone or in conjunction with an attack on the grid's central SCADA systems to destabilize the grid by rapidly
-    modulating the total connected load. Such attacks might include IoT devices, but they might also target Smart
-    Meters, which in many parts of the world now contain remote-controlled disconnect switches. Such attacks are hard to
-    mitigate, and existing literature focuses on hardening device firmware to prevent compromise. Although perfect
-    firmware security is not practically achievable, there is little research on \emph{post-compromise} mitigation
-    approaches. A core issue of any post-attack mitigation is that the devices normal network connection may not work
-    due to the attack and as such an out-of-band communication channel is necessary.
-
-    We propose a \emph{safety reset} controller that is controlled through a novel, resilient, grid-wide powerline
-    communication technique. Our safety reset controller can be fitted into any Smart Meter or IoT device. Its purpose
-    is to await an out-of-band command to put the device into a safe state (e.g. \emphp{relay on} or \emph{light on})
-    that interrupts attacker control over the device. The safety reset controller is separated from the system's main
-    application controller and does not have any conventional network connections to reduce attack surface and cost.
-
-    Our proposed resilient communication channel is a grid-wide broadcast channel based on modulating grid frequency. It
-    can be operated by transmission system operators (TSOs) even during black-start recovery procedures and in this
-    situation bridges the gap between the TSO's private network and the consumer devices. To demonstrate our proposed
-    channel, we have implemented a system that transmits error-corrected and cryptographically secured commands.
-
-    Our approach differs from traditional Powerline Communication (PLC) systems in that it reaches every device within
-    the same synchronous area as the signal is embedded into the fundamental grid frequency. Traditional PLC uses a
-    superimposed voltage, which is quickly attenuated across long distances.
-
-    Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter,
-    load bank or photovoltaic farm would allow for the transmission of a crytographically secured \emph{reset} signal
-    within $15$ minutes. We have designed and constructed a proof-of-concept prototype receiver that demonstrates the
-    feasibility of decoding such signals on a resource-constrained microcontroller.
+    Previous work has explored the scenario of an attacker compromising a large number of Smart Meters that are equipped
+    with remote disconnect switches, and using these remote-controllable switches to cause a large-scale outage.
+    That work focuses on attack prevention. In this paper, we instead look at recovery after a successful
+    attack. To transmission system operators (TSOs), the major challenge after such a Smart Meter-triggered outage is
+    that the attacker will likely persist through the outage, and compromised Smart Meters will resume malicious
+    activity after their power is restored. In the event of such an attack, TSOs would need a way to remotely put these
+    compromised devices into a \emph{safe} mode of operation.
+
+    Given that public telecommunications networks, including the internet, cellular networks, and LoRa base stations, may
+    also be disrupted during a large-scale blackout, the challenging aspect of this remote \emph{Safety Reset} is the
+    communication channel between the TSO and the smart meters. For this purpose, in this paper we propose a simple yet
+    effective communication channel based on modulating grid frequency by modulating the power of a connected load or
+    generator. Our proposed communication channel (1) requires minimal infrastructure, (2) has a reach spanning the
+    entire power grid, and (3) is fully independent of other telecommunication networks and functions even under severe
+    disruption of the grid.
 \end{abstract}
 
 \section{Introduction}
-% FIXME This is meh.
-% Maybe *start* with "the recovery from a blackout bla bla..."?
-The power grids of the world are some of the most complex man-made technological systems. Their operation is essential
-for modern human life and with the proliferation of ransomware and state-sponsored attacks their IT security has come
-under close scrutiny. To grid operators, there are two main challenges that complicate IT security efforts. First, all
-parts of the electrical grid are physically coupled and faults can have consequences far from their source. Second, many
-of the networked devices used in grid applications are special-purpose devices built in low volumes, which limits the
-amount of engineering effort that could have been spent on their firmware security.
-
-We expect that a serious compromise can never fully be ruled out since the combined attack surface of a large number of
-diverse devices is too large to effectively secure, and perimeter security measures are only effective to a point when
-devices are spread out across a vast geographical area. Thus, in this paper we focus not on the prevention of an attack,
-but on the recovery from one.
-%The IT security of the power grid is a complicated issue. Transmission system operators are faced with multiple
-%challenges.
-
-%First, the grid is composed of myriad different devices that are interconnected on a contintental scale. Since all these
-%devices are physically coupled, faults in one system can have ripple effects far away. In other critical infrastructure
-%such as the water supply, transportation or the public health system, a number of fundamentally independent sub-systems
-%are only linked at an organizational level, which means faults due to either natural disasters or hacking attacks are
-%likely to be localized. In contrast, a transmission system operator has to make sure no faults happen anywhere in the
-%system for the system to be stable. Ensuring faultless operation across thousands of devices is hard.
-
-%Like any other complex technological system, the components that make up the power grid are increasingly being outfitted
-%with networked computer systems for monitoring and control.
-%They have to secure a large and diverse fleet of networked systems, many of which are special-purpose devices customized
-%for this particular application. Small production quantities
-%mean that the limit of economically achievable security is already low. Coupled with the high complexity of each of
-%these devices, this results in
-
-\subsection{The digitalization of the grid}
-In the power grid, as in many other engineered systems, we can observe an ongoing diffusion of information systems into
-the domain of industrial control. Automation of these control systems has already been practiced for the better part of a
-century. Throughout the 20th century this automation was mostly limited to core components of the grid. Generators in
-power stations are computer-controlled according to electromechanical and economic models. Switching in substations is
-automated to allow for fast failure recovery. Human operators are still vital to these systems, but their tasks have
-shifted from pure operation to engineering, maintenance and surveillance~\cite{crastan03,anderson02}.
-
-With the turn of the century came a large-scale trend in power systems to move from a model of centralized generation,
-built around massive large-scale fossil and nuclear power plants, towards a more heterogenous model of smaller-scale
-generators working together. In this new model large-scale fossil power plants still serve a major role, but new
-factors come into play. One such factor is the advance of renewable energies. The large-scale use of wind and solar
-power in particular seems unavoidable for continued human life on this planet. For the electrical
-grid these systems constitute a significant challenge. Fossil-fueled power plants can be controlled in a precise and
-quick way to match energy consumption. This tracking of consumption with production is vital to the stability of the
-grid. Renewable energies such as wind and solar power do not provide the same degree of controllability, and they
-introduce a larger degree of uncertainty due to the unpredictability of the forces of nature~\cite{crastan03}.
-
-Along with this change in dynamic behavior, renewable energies have brought forth the advance of distributed generation.
-In distributed generation end customers that previously only consumed energy have started to feed energy into the grid
-from small solar installations on their property. Distributed generation is a chance for customers to gain autonomy and
-shift from a purely passive role to being active participants of the electricity market~\cite{crastan03}.
-
-% FIXME the following paragraph is weird.
-
-To match this new landscape unpredictable renewable resources and of decentralized generation, the utility industry has
-had to adapt itself in major ways. One aspect of this adaptation that is particularly visible to energy consumers is the
-computerization of end-user energy metering. Despite the widespread use of industrial control systems inside the
-electrical grid and the far-reaching diffusion of computers into people's everyday lives, the energy meter has long been
-one of the last remnants of an offline, analog time. Until the 2010s many households were still served through
-electromechanical Ferraris-style meters that have their origin in the late 19th
-century~\cite{borlase01,ukgov04,bnetza02}. Today, under the umbrella term \emph{Smart Metering}, the shift towards fully
-computerized, often networked meters is well underway. The roll out of these \emph{Smart Meters} has not been very
-smooth overall with some countries severely lagging behind. As a safety-critical technology, smart metering technology
-is usually standardized on a per-country basis.
-
-\subsection{Perfect firmware security}
-% FIXME join these paragraphs
-This leads to an inhomogenous landscape with---in some
-instances---wildly incompatible systems. Often vendors only serve a single country or have separate models of a meter
-for each country. This complex standardization landscape and market situation has led to a proliferation of highly
-complex, custom-coded microcontroller firmware. The complexity and scale of this---often network-connected---firmware
-makes for a ripe substrate for bugs to surface.
-
-A remotely exploitable flaw inside the firmware of a component of a smart metering system could have consequences
-ranging from impaired billing functionality to an existential threat to grid stability~\cite{anderson01,anderson02}. In a
-country where meters commonly include disconnect switches for purposes such as prepaid tariffs, a coordinated attack
-could at worst cause widespread activation of grid safety systems through oscillations caused by repeated cycling of
-megawatts of load capacity at just the wrong frequency~\cite{wu01}.
-
-Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
-complexity of smart meter firmware makes firmware security extremely labor intensive. The diverse standardization
-landscape makes a coordinated, comprehensive response unlikely.
-
-In this paper, we introduce \emph{Grid Frequency Modulation}, a new communication channel that can be used for grid-wide
-broadcast without relying on any other telecommunication networks being operational. Grid Frequency Modulation uses
-Direct Sequence Spread Spectrum (DSSS) modulation carried out on grid frequency through a large controllable load such
-as an aluminium smelter. Note that Grid Frequency Modulation is \emph{changing the grid frequency itself}. This is
-fundamentally different in both generation and detection from systems such as traditional PLC that superimpose a signal
-on grid voltage, but leave the underlying grid frequency itself unaffected. As it requires high-fidelity control over a
-large load or producer connected to the grid, Grid Frequency Modulation provides a degree of implicit sender
-authentication.
-
-To illustrate the utility of Grid Frequency Modulation we propose a pragmatic solution to the---in our opinion
-likely---scenario of a large-scale compromise of smart meter firmware. Instead of improving firmware security or
-resiliency of public telecommunication infrastructure, both of which are hard problems, we introduce the \emph{safety
-reset controller} as a fail-safe that allows an utility to flush an attacker out of their deployed smart meters even
-during large-scale disruption of telecommunication networks. In our concept the components of the smart meter that are
-threatened by remote compromise are equipped with a physically separate microcontroller that listens for a ``reset''
-command transmitted through the electrical grid's frequency and on reception forcibly resets the smart meter's entire
-firmware through a low-level programming interface such as JTAG to a known-good state that disables all network
-functionality to prevent re-compromise and lock out the attackers until the device can be programmed with a patched
-firmware by a service technician. As part of our prototype reset controller we have developed a simple cryptographic
-command protocol based on the Lamport and Winternitz One-time Signature (OTS) schemes that our prototype reset
-controller uses to receive an authenticated command to re-flashe the smart meter's main microcontroller over the
-standard JTAG interface. The safety reset controller is an off-the-shelf microcontroller much smaller than the one used
-for the meter's main application controller. To receive grid frequency-modulated commands, it measures grid frequency
-from a voltage waveform acquired using its internal analog-to-digital-converter (ADC) directly connected to the mains
-voltage input through a resistive divider chain. By using of an off-the-shelf microcontroller we keep the implementation
-overhead of our solution low in engineering cost compared to an ASIC. By keeping its firmware small, we can use a
-simpler and less expensive microcontroller, keeping per-unit implementation cost low.
+With the rollout of the smart grid, the IT security of electrical infrastructure has attracted increased attention in
+recent years. Smart Grid security has two major components: the security of central SCADA systems, and the security
+of equipment at the consumer premises such as smart meters and IoT devices. While there is previous work on both sides,
+their interactions have not yet received much attention.
+
+In this paper, we consider the previously proposed scenario where a large number of compromised consumer devices are used
+alone or in conjunction with an attack on the grid's central SCADA systems to destabilize the grid by rapidly modulating
+the total connected load. Previous work considered compromised smart meters with integrated remote disconnect switches
+as likely candidates for such an attack, but the same attack can also be performed using compromised IoT devices. Such
+attacks are hard to mitigate, and existing literature focuses on hardening device firmware to prevent compromise.
+Despite the infeasibility of perfect firmware security, there is little research on \emph{post-compromise} mitigation
+approaches. A core issue with post-attack mitigation is that the devices' normal network connections may not work due to
+the attack, so an out-of-band communication channel is necessary.
+
+We propose a \emph{safety reset} controller that is controlled through a novel, resilient, grid-wide powerline
+communication technique. Our safety reset controller can be fitted into any Smart Meter or IoT device. Its purpose is to
+await an out-of-band command to put the device into a safe state (e.g.\ \emph{relay on} or \emph{light on}) that
+interrupts attacker control over the device. The safety reset controller is separated from the system's main application
+controller and has no conventional network connection, which reduces both attack surface and cost.
+
+We further propose a resilient grid-wide broadcast channel based on modulating grid frequency. This channel can be operated by
+transmission system operators (TSOs) even during black-start recovery procedures and in this situation bridges the gap
+between the TSO's private network and the consumer devices. To demonstrate our proposed channel, we have implemented a
+system that transmits error-corrected and cryptographically secured commands.
+
+Our approach differs from traditional Powerline Communication (PLC) systems in that it reaches every device within one
+synchronous area, since the signal is embedded into the fundamental grid frequency. Traditional PLC uses a superimposed
+voltage, which is quickly attenuated across long distances.
 
 \begin{figure}
     \centering
@@ -208,6 +109,59 @@ broadcast radio). A grid frequency-based system can function as long as power
 is restored after the attack. One powerful function this allows is ``flushing out`` an attacker from compromised smart
 meters after an attack, before restoring smart meter internet connectivity.
 
+Using simulations we have determined that control of a $\SI{25}{\mega\watt}$ load such as a large aluminium smelter,
+load bank or photovoltaic farm would allow for the transmission of a cryptographically secured \emph{reset} signal within
+$15$ minutes. We have designed and constructed a proof-of-concept prototype receiver that demonstrates the feasibility
+of decoding such signals on a resource-constrained microcontroller.
+
+\subsection{Motivation}
+
+Consumer devices are increasingly becoming \emph{smart}. Large numbers of IoT devices are connected through the public
+internet, and in several countries internet-connected Smart Meters can disconnect entire households from the grid in
+case of unpaid bills. The increasing proliferation of smart devices on the consumer side presents an opportunity to grid
+operators, who rely on forecasts for the cost-optimized control of generation and power flow. The core of the
+\emph{Smart Grid} vision is that utilities can now gather detailed data for more accurate consumption forecasts, and in
+some cases can even adjust parameters of large devices like water heaters to smooth out load spikes.
+
+However, this increased degree of visibility and control comes with an increased IT security risk. In this paper we
+focus on scenarios where an attacker compromises a large number of grid-connected remote-controllable devices. These may
+be simple smart home devices such as IoT light bulbs, but they may also include Smart Meters that are outfitted with a
+remote disconnect switch as is common in some countries. By rapidly switching large numbers of such devices in a
+coordinated manner, the attacker could destabilize the electrical grid. % FIXME citation
+
+Previous work on IoT and Smart Grid security has focused on the prevention of attacks through firmware security measures.
+While research on prevention is undoubtedly important, we expect that its practical impact will be limited by the vast
+diversity of implementations found in the field combined with the slow update cycles inherent to non-functional firmware
+enhancements for consumer devices. We predict that it would be a Sisyphean task to secure sufficiently many devices
+to deny an attacker the critical mass needed to cause trouble. For this reason, in this paper we focus on recovery after
+an attack.
+
+\subsection{Black-start recovery}
+
+The recovery from a large-scale power outage is a complex operational challenge. Large outages are usually caused by cascading
+failures. Since all consumers and producers that are connected to the electrical grid are physically coupled through the
+electromotive force, a fault in one part of the grid affects all devices connected across the grid. To function, the
+grid relies on a delicate balance between electricity generation, transmission and consumption. When this balance is
+disturbed, cascading failures can occur. A transmission line shutting off can cause other nearby lines to overload and
+shut off. Due to the electromechanical coupling of all machines connected to the grid, a generator or consumer suddenly
+shutting off causes a transient in the grid's frequency. If the frequency goes too far out of bounds, protection devices
+take power plants and large industrial loads offline.
+
+The recovery from a large-scale outage requires the grid's operators to bring generators and loads back online one by
+one while continuously maintaining balance between generation and consumption to avoid their protection devices shutting
+them down again. To coordinate this process, transmission system operators cannot rely on the public internet or
+cellular networks, as they may not work during a large-scale power outage. Instead, they maintain private communication
+infrastructure using dedicated lines rented from telecommunications providers, fibers run along transmission lines, and
+dedicated radio links.
+
+To start from a complete outage, first a number of \emph{black start}-capable power stations that can start by
+themselves without any external power are brought online. With their help, other power stations and consumers are
+gradually brought online until a part of the grid has been restored to nominal operation. This process can be performed
+simultaneously in different parts of the grid. After these \emph{islands} have been restored, they can then be joined to
+restore the grid to its normal state.
+
+\subsection{Contents}
+
 Starting from a high level architecture, we have carried out simulations of our concept's performance under real-world
 conditions. Based on these simulations we implemented an end-to-end prototype of our proposed safety reset controller as
 part of a realistic smart meter demonstrator. Finally, we experimentally validated our results and we will conclude with
@@ -222,7 +176,7 @@ This work contains the following contributions:
     \item We carry out extensive simulations of our systems to determine its performance characteristics.
 \end{enumerate}
 
-\section{Notation}
+\subsection{Notation}
 
 To a computer scientist there is one confusing aspect to the theory of grid frequency modulation. GFM can be seen as a
 frequency modulation (FM) with a baseband signal in the band below approximately $f_m = \SI{5}{\hertz}$ that is
@@ -242,53 +196,38 @@ signal and its properties such as $f_m$.
 
 \section{Related work}
 \label{sec_related_work}
-% FIXME: intro here
-
-\subsection{Security and Privacy in the Smart Grid}
-
-The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
-are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
-general also apply to the components of a smart grid. Where programmers have been struggling for decades now with issues
-such as input validation~\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as
-well~\cite{mo01, lee01}. Only, in smart grid we have two complicating factors present: many components are embedded
-systems, and as such inherently hard to update. Also, the smart grid and its control algorithms act as a large partially
-distributed system making problems such as input validation or authentication harder~\cite{blaze01} and adding a host of
-distributed systems problems on top~\cite{lamport01}.
-
-Given that the electrical grid is essential infrastructure in our modern civilization, these problems amount to
-significant issues. Attacks on the electrical grid may have grave consequences~\cite{anderson01,lee01} while the long
-replacement cycles of various components make the system slow to adapt. Thus, components for the smart grid need to be
-built to a much higher standard of security than most consumer devices to ensure they live up to well-funded attackers
-even decades down the road. This requirement intensifies the challenges of embedded security and distributed systems
-security among others that are inherent in any modern complex technological system. The safety-critical nature of the
-modern smart metering ecosystem in particular was quickly recognized~\cite{anderson01}.
-
-A point we will not consider in much depth in this work is theft of electricity. While in publications aimed towards the
-general public the introduction of smart metering is always motivated with potential cost savings and ecological
-benefits, in industry-internal publications the reduction of electricity theft is often cited as an
-incentive~\cite{czechowski01}. Likewise, academic publications tend to either focus on other benefits such as generation
-efficiency gains through better forecasting or rationalize the consumer-unfriendly aspects of smart metering with social
-benefits~\cite{mcdaniel01}. They do not usually point out revenue protection mechanisms as
-incentives~\cite{anderson01,anderson02}.
-
-A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
-consumption of a whole household, this data is highly sensitive~\cite{markham01}. This counterintuitive fact was
-initially overlooked in smart meter deployments leading to outrage, delays and reduced features~\cite{cuijpers01}. The
-root cause of this problem is that given sufficient time resolution these aggregate measurements contain ample
-entropy. Through disaggregation algorithms, individual loads can be identified and through pattern matching even complex
-usage patterns can be discerned with alarming accuracy~\cite{greveler01} in the same way that similar privacy issues
-arise in many other areas of modern life through other kinds of pervasive tracking and surveillance~\cite{zuboff01}.
-
-Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the smart
-grid ecosystem. Smart meters are used both for highly-granular load measurement and in some countries also for load
-switching~\cite{zheng01}. Smart electricity meters are effectively consumer devices. They are built down to a certain
-price point that is measured by the burden it puts on consumers and that is divided by the relatively small market
-served by a single smart meter implementation. Such cost requirements can preclude security features such as the use of
-a standard hardened software environment on a high powered embedded system. Landis+Gyr, a large manufacturer that makes
-most of its revenue from utility meters state in their 2019 annual report that they invested \SI{36}{\percent} of their
-total R\&D budget on embedded software while spending only \SI{24}{\percent} on hardware
-R\&D~\cite{landisgyr01,landisgyr02}, indicating a significant tension between firmware security and a smart meter
-vendor's bottom line.
+Previous work has analyzed Smart Grid security from numerous angles and made several suggestions towards its
+improvement. Apart from the critical role that Smart Grid devices play, they are computer systems like many
+others. Thus, for IT security purposes the Smart Grid is simply an aggregation of embedded control and measurement
+devices that are part of a large control system. These devices share the same security concerns that apply to embedded
+systems in general.
+
+\subsection{Smart Meter Security}
+
+Programmers have been struggling for decades now with issues such as input validation~\cite{leveson01}, and the same
+issue raises security concerns in smart grid scenarios as well~\cite{mo01, lee01}. In the smart grid, however, there are
+two complicating factors: many components are embedded systems, and as such inherently hard to update. Further,
+the smart grid and its control algorithms act as a large (partially) distributed system, making problems such as
+input validation or authentication harder~\cite{blaze01} and adding a host of distributed systems problems on
+top~\cite{lamport01}.
+
+Given that the electrical grid is essential infrastructure, these issues are significant. Attacks on the electrical grid
+may have grave consequences~\cite{anderson01,lee01} while the long replacement cycles of various components make the
+system slow to adapt. Thus, components for the smart grid need to be built to a higher standard of security than e.g.\
+IoT devices to withstand well-funded attackers decades down the road. Another implication of their long service life
+is that their agility w.r.t.\ post-hoc mitigations through firmware updates is limited.
+
+%Another fundamental challenge in smart grid implementations is the central role of smart electricity meters in the
+%smart grid ecosystem. Smart meters are used both for highly-granular load measurement and in some countries also for
+%load switching~\cite{zheng01}.
+Smart electricity meters are effectively consumer devices built down to a certain price point. The small market served
+by a single smart meter implementation limits how much effort a vendor can spend on firmware security. Landis+Gyr, a
+large manufacturer that makes most of its revenue from utility meters, states in its 2019 annual report that it
+invested \SI{36}{\percent} of its total R\&D budget on embedded software while spending only \SI{24}{\percent} on
+hardware R\&D~\cite{landisgyr01,landisgyr02}, indicating significant tension between firmware security and the vendor's
+bottom line.
+
+% FIXME more sources!
 
 \subsection{The state of the art in embedded security}
 
-- 
cgit
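The paper text in this patch describes Grid Frequency Modulation as Direct Sequence Spread Spectrum (DSSS) modulation of the grid frequency, driven by a large controllable load. The following is a minimal Python sketch of DSSS spreading and correlation-based despreading only, not the authors' implementation. Several parameters are illustrative assumptions: the 31-chip maximal-length sequence (a hypothetical 5-bit LFSR with feedback taps 5 and 3), the unit chip amplitude, the white Gaussian noise model, and perfect chip synchronization. A real receiver would first have to estimate grid frequency from the sampled mains voltage, and real grid frequency noise is neither white nor Gaussian.

```python
import random


def mseq_chips(taps=(5, 3), seed=(1, 0, 0, 0, 0)):
    """Return one period of a maximal-length sequence as +1/-1 chips.

    Illustrative choice (not from the paper): a 5-bit Fibonacci LFSR whose
    feedback XORs register positions 5 and 3, giving a period of 2**5 - 1 = 31.
    """
    reg = list(seed)
    chips = []
    for _ in range(2 ** len(reg) - 1):
        chips.append(1 if reg[-1] else -1)  # emit the oldest register bit as a chip
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        reg = [feedback] + reg[:-1]  # shift the register by one position
    return chips


def spread(bits, chips):
    """DSSS spreading: send the chip sequence for a 1 bit and its negation for a 0 bit."""
    return [(1 if b else -1) * c for b in bits for c in chips]


def despread(samples, chips):
    """Correlate each chip-length window with the chip sequence; the sign decides the bit."""
    n = len(chips)
    bits = []
    for start in range(0, len(samples) - n + 1, n):
        corr = sum(s * c for s, c in zip(samples[start:start + n], chips))
        bits.append(1 if corr > 0 else 0)
    return bits


def simulate(bits, amplitude=1.0, noise_sigma=1.0, rng_seed=0):
    """Spread, add white Gaussian noise (a simplifying assumption), then despread."""
    rng = random.Random(rng_seed)
    chips = mseq_chips()
    received = [amplitude * x + rng.gauss(0.0, noise_sigma) for x in spread(bits, chips)]
    return despread(received, chips)


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    chips = mseq_chips()
    print(len(chips), sum(chips))
    print(simulate(data) == data)
```

Correlating over N chips adds the wanted signal coherently but the noise only incoherently, so the per-bit signal-to-noise power ratio improves by roughly a factor of N. This is why a spread-spectrum signal can still be recovered when the per-chip amplitude is comparable to the background noise. The cost is a data rate N times lower than the chip rate.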