\documentclass[12pt,a4paper,notitlepage]{report} \usepackage[utf8]{inputenc} \usepackage[a4paper,textwidth=17cm, top=2cm, bottom=3.5cm]{geometry} \usepackage[T1]{fontenc} \usepackage[ backend=biber, style=numeric, natbib=true, url=true, doi=true, eprint=false ]{biblatex} \addbibresource{safety_reset.bib} \usepackage{amssymb,amsmath} \usepackage{listings} \usepackage{eurosym} \usepackage{wasysym} \usepackage{amsthm} \usepackage{tabularx} \usepackage{multirow} \usepackage{multicol} \usepackage{tikz} \usetikzlibrary{arrows} \usetikzlibrary{backgrounds} \usetikzlibrary{calc} \usetikzlibrary{decorations.markings} \usetikzlibrary{decorations.pathreplacing} \usetikzlibrary{fit} \usetikzlibrary{patterns} \usetikzlibrary{positioning} \usetikzlibrary{shapes} \usepackage{hyperref} \usepackage{tabularx} \usepackage{commath} \usepackage{graphicx,color} \usepackage{subcaption} \usepackage{float} \usepackage{footmisc} \usepackage{array} \usepackage[underline=false]{pgf-umlsd} \usetikzlibrary{calc} %\usepackage[pdftex]{graphicx,color} %\usepackage{epstopdf} % Needed for murks.tex \usepackage{setspace} \usepackage[draft=false,babel,tracking=true,kerning=true,spacing=true]{microtype} % optischer Randausgleich etc. % For german quotation marks \newcommand{\foonote}[1]{\footnote{#1}} \newcommand{\degree}{\ensuremath{^\circ}} \newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}} \begin{document} % Beispielhafte Nutzung der Vorlage für die Titelseite (bitte anpassen): \input{murks} \titel{FIXME} % Titel der Arbeit \typ{Masterarbeit} % Typ der Arbeit: Diplomarbeit, Masterarbeit, Bachelorarbeit \grad{Master of Science (M. Sc.)} % erreichter Akademischer Grad % z.B.: Master of Science (M. Sc.), Master of Education (M. Ed.), Bachelor of Science (B. Sc.), Bachelor of Arts (B. A.), Diplominformatikerin \autor{Jan Sebastian Götte} \gebdatum{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Geburtsdatum des Autors \gebort{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Geburtsort des Autors \gutachter{Prof. Dr. Björn Scheuermann}{FIXME} % Erst- und Zweitgutachter der Arbeit \mitverteidigung % entfernen, falls keine Verteidigung erfolgt \makeTitel \selbstaendigkeitserklaerung{31.03.2020} \newpage % Hier folgt die eigentliche Arbeit (bei doppelseitigem Druck auf einem neuen Blatt): \tableofcontents \newpage \chapter{Introduction} \section{Structure and operation of the electrical grid} \subsection{Structure of the electrical grid} \subsubsection{Generators and loads} \subsubsection{Transformers} \subsubsection{Tie lines} \subsection{Operational concerns} \subsubsection{Modelling the electrical grid} \subsubsection{Generator controls} \subsubsection{Load shedding} \subsubsection{System stability} \subsubsection{Power System Stabilizers} \subsubsection{Smart metering} \section{Smart meter technology} \subsubsection{Common components} Smart meters usually are built around a standard microcontroller. \label{sm-cpu} \subsubsection{Cryptographic coprocessors} \subsubsection{Physical structure} \subsubsection{Physical installation} \section{Regulatory frameworks around the world} \subsection{International standards} \subsection{Regulations in Europe} \subsection{The regulatory situation in Germany} \subsection{The regulatory situation in France} \subsection{The regulatory situation in the UK} \subsection{The regulatory situation in Italy} \subsection{The regulatory situation in northern America} \subsection{The regulatory situation in Japan} \subsection{Common themes} \section{Security in smart grids} The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that are part of a large control system. This implies that all the same security concerns that apply to embedded systems in general also apply to most components of a smart grid in some way. Where programmers have been struggling for decades now with input validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as well\cite{mo01, lee01}. Only, in smart grid we have two complicating factors present: Many components are embedded systems, and as such inherently hard to update. Also, the smart grid and its control algorithms act as a large (partially-)distributed system, making problems such as input validation or authentication difficult to implement\cite{blaze01} and adding a host of distributed systems problems on top\cite{lamport01}. Given that the electrical grid is a major piece of essential infrastructure in modern civilization, these problems amount to significant issues in practice. Attacks on the electrical grid may have grave consequences\cite{lee01} all the while the long maintenance cycles of various components make the system slow to adapt. Thus, components for the smart grid need to be built to a much higher standard of security than most consumer devices to ensure they live up to well-funded attackers even decades down the road. This requirement intensifies the challenges of embedded security and distributed systems security among others that are inherent in any modern complex technological system. A point we will not consider in much depth is theft of electricity. A large part of the motivation of the introduction of smart meters seems to be % TODO weak statement to reduce the level of fraud by consumers. Academic papers tend to either focus on other benefits such as generation efficiency gains through better forecasting or try to rationalize the funamentally anti-consumer nature of smart metering with strenuous claims of ``enormous social benefits''\cite{mcdaniel01}. We will entirely focus on grid stability and discard electricity theft in the context of this paper for two reasons: One, billing inaccuracies of electricity companies are of very low urgency compared to grid stability, and the one is a precondition for the other. Two, utility companies can already put strong bounds on the amount of theft by simply cross-refrencing meter readings against trusted readings from upstream sections of the grid. This capability works even without smart meters and only gains speed from smart meters, just as the old exploit of bypassing the meter with a section of wire can't be prevented like this. Due to these bounds on its volume, electricity theft using smart meter hacking would not scale. Hackers would simply be rooted up one by one with no damage to consumers and very limmited damage to utility companies. Damage in these scenarios would be a far cry from the efficiency of an exponentially growing botnet. \subsection{Smart grid components as embedded devices} A fundamental challenge in smart grid implementations is the central role smart electricity meters play. Smart meters are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}. Smart electricity meters are effectively consumer devices. They are built down to a certain price point that is measured by the burden it puts on consumers and that is generally fixed by regulatory authorities. % FIXME cite This requirement precludes some hardware features such as the use of a standard hardened software environment on a high-powerded embedded system (such as a hypervirtualized embedded linux setup) that would both increase resilience against attacks and simplify updates. Combined with the small market sizes in smart grid deployments \footnote{ Most vendors of smart electricity meters only serve a handful of markets. For the most part, smart meter development cost lies in the meter's software % TODO cite? There exist multiple competing standards applicable to various parts of a smart electricity meter. In addition, most countries have their own certification regimen\cite{cenelec01}. This complexity creates a large development burden for new market entrants\cite{perez01}. } this produces a high cost pressure on the software development process for smart electricity meters. \subsection{The state of the art in embedded security} Embedded security generally is much harder than security of higher-level systems. This is due to a combination of the unique constraints of embedded devices (hard to update, usually small quantity) and their lack of capabilities (processing power, memory protection functions, user interface devices). Even very well-funded companies continue to have serious problems securing their embedded systems. A spectacular example of this difficulty is the recently-exposed flaw in Apple's iPhone SoC first-stage ROM bootloader\footnote{ Modern system-on-chips integrate one or several CPUs with a multitude of peripherals, from memory and DMA controllers over 3D graphics accelerators down to general-purpose IO modules for controlling things like indicator LEDs. Most SoCs boot from one of several boot devices such as flash memory, ethernet or USB according to a configuration set e.g. by connecting some SoC pins a certain way or set by device-internal write-only fuse bits. Physically, one of the processing cores of the SoC (usually one of the main CPU cores) is connected such that it is taken out of reset before all other devices, and is tasked with switching on and configuring all other devices of the SoC. In order to run later intialization code or more advanced bootloaders, this core on startup runs a very small piece of code hard-burned into the SoC in the factory. This ROM loader initializes the most basic peripherals such as internal SRAM memory and selects a boot device for the next bootloader stage. Apple's ROM loader performs some authorization checks, to ensure no unauthorized software is loaded. The present flaw allows an attacker to circumvent these checks, booting code not authorized by Apple on a USB-connected iPhone, compromising Apple's chain of trust from ROM loader to userland right at its root. }, that allows a full compromise of any iPhone before the iPhone X. iPhone 8, one of the affected models, is still being manufactured and sold by Apple today\footnote{ i.e. at the time this paragraph was written, on %FIXME }. In another instance, Samsung put a flaw in their secure-world firmware used for protection of sensitive credentials in their mobile phone SoCs in % FIXME year % . If both of these very large companies have trouble securing parts of their secure embedded software stacks measuring a mere few hundred bytes in Apple's case or a few kilobytes in Samsung's, what is a smart electricity meter manufacturer to do? For their mass-market phones, these two companies have R\&D budgets that dwarf some countries' national budgets. % FIXME hyperbole? % FIXME cite Since thorough formal verification of code is not yet within reach for either large-scale software development or code heavy in side-effects such as embedded firmware or industrial control software\cite{pariente01} the two most effective measures for embedded security is reducing the amount of code on one hand, and labour-intensively checking and double-checking this code on the other hand. A smart electricity manufacturer does not have a say in the former since it is bound by the official regulations it has to comply with, and will almost certainly not have sufficient resources for the latter. % FIXME expand? % FIXME cite some figures on code size in smart meter firmware? \subsection{Attack avenues in the smart grid} If we model the smart grid as a control system responding to changes in inputs by regulating outputs, on a very high level we can see two general categories of attacks: Attacks that directly change the state of the outputs, and attacks that try to influence the outputs indirectly by changing the system's view of its inputs. The former would be an attack such as one that shuts down a power plant to decrease generation capacity. The latter would be an attack such as one that forges grid frequency measurements where they enter a power plant's control systems to provoke increasing oscillation in the amount of power generated by the plant according to the control systems' directions. % FIXME cite % FIXME expand \subsubsection{Communication channel attacks} Communication channel attacks are attacks on the communication links between smart grid components. This could be attacks on IP-connected parts of the core network or attacks on shared busses between smart meters and IP gateways in substations. Generally, these attacks can be mitigated by securing the aforementioned communication links using modern cryptography. IP links can be protected using TLS, and more low-level busses can be protected using more lightweight Noise-based protocols. % FIXME cite Cryptographic security transforms an attackers ability to manipulate communication contents into a mere denial of service attack. Thus, in addition to cryptographic security safety under DoS conditions must be ensured to ensure continued system performance under attacks. This safety property is identical with the safety required to withstand random outages of components, such as communications link outages due to physical damage from storms, flooding etc. % FIXME cite papers on attack impact, on coutermeasures and on attack realization In general, attacks at the meter level may be hard to weaponize % may be -> weak statement? since meters are used mostly for billing and forecasting purposes % FIXME cite and for more critical grid control purposes there exist several additional layers of sensors above smart meters that limit how much an attacker can falsify smart meter readings without the manipulation being obvious. In order for an attack to have more far-reaching consequences the attacker would need to compromise additional grid infrastructure\cite{kim01,kosut01}. \subsubsection{Exploiting centralized control systems} The type of smart grid attack most often cited in popular discourse, and to the author's knowledge % FIXME verify, cite the only type that has so far been conducted in practice, is a direct attack on centralized control systems. In this attack, computer components of control systems are compromised by the same techniques used to compromise any other kind of computer system such as exploiting insecure services running on internet-exposed ports and using one compromised system to compromised other systems connected with it through an ostensably secure internal network. These attacks are very powerful as they yield the attacker direct control over whatever outputs the control systems are controlling. If an attacker manages to compromise a power stations control computers, they may be able to influence generation output or even cause an emergency shutdown. % FIXME Despite their potentially large impact, these attacks are only moderately interesting from a scientific perspective. For one, their mitigation mostly consists of a straightforward application of security practices well-known for decades. Though there is room for the implementation of genuinely new, application-specific security systems in this field, the general state of the art is lacking behind the rest of the computer industry such that the low-hanging fruit should take priority. % FIXME cite this bold claim very properly In addition, given political will these systems can readily be secured since there is only a comparatively small number of them and driving a technician to every one of them in turn to install some security update is perfectly feasible. \subsubsection{Control function exploits} Control function exploits are attacks on the mathematical control loops used by the centralized control system. One example of such an attack would be resonance attacks as described in \textcite{wu01}. In this kind of attack, inputs from peripheral sensors indicating grid load to the centralized control system are carefully modified to cause a disproportionally large oscillation in control system action. This type of attack relies on complex resonance effects that arise when mechanical generators are electrically coupled. These resonances, coloquially called ``modes'' are well-studied in power system engineering\cite{rogers01,grebe01,entsoe01}. % FIXME: refer to section on stability control above here Even disregarding modern attack scenarios, for stability electrical grids are designed with measures in place to dampen any resonances inherent to grid structure. Still, requiring an accurate grid model these resonances are hard to analyze and unlikely to be noiticed under normal operating conditions. Mitigation of these attacks is most easily done by on the one hand ensuring unmodified sensor inputs to the control systems in the first place, and on the other hand carefully designing control systems not to exhibit exploitable behavior such as oscillations. % FIXME cite mitigation approaches \subsubsection{Endpoint exploits} One rather interesting attack on smart grid systems is one exploiting the grid's endpoint devices such as smart electricity meters\footnote{ Though potentially this could also aim at other kinds of devices distributed on a large scale such as sensors in unmanned substations. % FIXME cite verify } These meters are deployed on a massive scale, with several thousand meters deployed for every substation. % FIXME cite (this should be straightforward) Thus, once compromised restoration to an uncompromised state can be potentially very difficult if it requires physical access to thousands of devices hidden inaccessible in private homes. By compromising smart electricity meters, an attacker can trivially forge the distributed energy measurements these devices perform. In a best-case scenario, this might only affect billing and lead to customers being under- or over-charged if the attack is not noticed in time. However, in a less ideal scenario the energy measurements taken by these devices migth be used to inform the grid centralized control systems % FIXME cite and a falsification of these measurements might lead to inefficiency. In some countries and for some customers, these smart meters have one additional function that is highly useful to an attacker: They contain high-current load switches to disconnect the entire household or business in case electricity bills are left unpaid for a certain period. In countries that use these kinds of systems, the load disconnect is often simply hooked up to one of the smart merter's central microcontroller's general-purpose IO pins, allowing anyone compromising this microcontroller's firmware to actuate the load switch at will. % FIXME validate cite add pictures Given control over a large number of network-connected smart meters, an attacker might thus be able to cause large-scale disruptions of power consumption by repeatedly disconnecting and re-connecting a large number of consumers. % FIXME cite some analysis of this Combined with an attack method such as the resonance attack from \textcite{wu01} that was mentioned above, this scenario poses a serious danger to grid stability. % FIXME add small-scale load shedding for heaters etc. \subsection{Attacker models in the smart grid} \subsection{Practical attacks} \subsection{Practical threats} \subsection{Conclusion, or why we are doomed} We can conclude that a compromise of a large number of smart electricity meters cannot be ruled out. The complexity of network-connected smart meter firmware makes it exceedingly unlikely that it is in fact flawless. Large-scale deployments of these devices under some circumstances such as where they are used with load disconnect relays make them an attractive target for attackers interested in causing grid instability. The attacker model for these devices very definitely includes enemy states, who have considerable resources at their disposal. For a reasonable guarantee that no large-scale compromises of hard- and software built today will happen over a span of some decades, we would have to radically simplify its design and limit attack surface. Unfortunately, the complexity of smart electricity meter implementations mostly stems from the large list of requirements these devices have to conform with. Additionally, standards have already been written and changes that reduce scope or functionality have become exceedingly unlikely at this point. A general observation with smart grid systems of any kind is that they comprise a zealous departure of the decentralized control structure of yesterday's dumb grid and the advent of centralization at an enormous scale. This modern, centralized infrastructure has been carefully designed to defend against malicious actors%FIXME cite and all involved parties have an interest in keeping it secure. Still, like in any other system this centralization also makes a very attractive target for attackers since an attacker can likewise employ this centralized control to their goals. Fundamentally, decentralized systems tend to make attacks of any kind a lot more costly and one might question whether security has truly been gained during smart grid rollout. % FIXME hot take maybe \chapter{Restoring endpoint safety in an age of smart devices} If as layed out in the previous paragraph we cannot rule out a large-scale compromise of smart energy meters, we have to rephrase our claim to security. If we cannot rule out exploitation, we have to limit its impact. If we assume that we cannot strip any functionality from smart meters since it may be required by standards or for enormous social benefits\cite{mcdaniel01} % FIXME is sarcasm ok here? all we can do is to flush out an attacker once they are in. In a worst-case scenario an attacker would gain unconstrained code execution e.g. by exploiting a flaw in a network protocol implentation. Since smart meters use standard microcontrollers that do not have advanced memory protection functions (see pg. \ref{sm-cpu}), at this point we can assume the attacker has full control over the main microcontroller. With this control they can actuate the load switch if present, transmit data through the device's communication interfaces or use the user interface components such as LEDs and the LCD. Using the self-programming capabilities of modern flash microcontrollers, an attacker may even gain persistency without much trouble. Note that in systems separating cryptographic functions into some form of cryptographic module such as systems used in Germany % TODO list other countries as well? FIXME cite BSI standard requiring this we can be optimistic and assume the attacker has not in fact compromised this cryptographic co-processor yet and does not have access to any cryptographic secrets yet. Given that the attacker has complete control over the meter's core microcontroller and given that due to cost constraints we are bound to use whatever microcontroller the meter OEM has chosen for their design, we cannot rely on software running on the core mircocontroller to restore system integrity. Our solution to this problem is to add another, very small microcontroller to the smart meter design. This microcontroller will contain a small piece of software to receive cryptographically authenticated commands from utility companies and on demand reset the meter's core microcontroller to a known-good state. We have to assume the code in the core controller's flash memory has been compromised, so our only option to flush out an attacker is to re-program the core microcontroller in its entirety. We propose using JTAG to re-program the core microcontroller % TODO get terminology consistent. Is "core microcontroller" a good term here? with a known-good firmware image read from a sufficiently large SPI flash connected to the reset controller. JTAG is supported by most microcontrollers complex enough to end up in a smart meter design % TODO colloquialism and given adequate documentation JTAG programming functionality can be ported to new microcontrollers with relatively little work. On the microcontroller side our solution requires the JTAG interface to be activated (i.e. not fused-shut) and for our solution to work core microcontroller firmware must not be able to permanently disable the JTAG interface from within. In microcontrollers that do not yet provide this functionality this is a minor change that could be added to a custom microcontroller variant at low cost. On most microcontrollers keeping JTAG open should not interfere with code readout protection. Code secrecy should be of no concern\cite{schneier01} here but besides security manufacturers have strong preferences about this due to fear of copyright infringement. \section{The theory of endpoint safety} \label{sec_criteria} In order to gain anything by adding our reset controller to the smart meter's already complex design we must satisfy two interrelated conditions. \begin{enumerate} \item \textsc{security} means our reset controller itself does not have any remotely exploitable flaws \item \textsc{safety} menas our reset controller will perform its job as intended \end{enumerate} Note that our \textsc{security} property includes only remote exploitation, and excludes any form of hardware attack. Even though most smart meters provide some level of physical security, we do not wish to make any assumptions on this. In the following section we will elaborate our attacker model and it will become apparent that sufficient physical security to defend against all attackers in our model would be infeasible, and thus we will design our overall system to remain secure even assuming some number of physically compromised devices. % FIXME expand \subsection{Attack characteristics} The attacker model these two conditions must hold under is as follows. We assume three angles of attack: Attacks by the customer themselves, attacks by an insider within the metering systems controlling utility company and lastly attacks from third parties. Examples for these third parties are hobbyist hackers or outside cyber-criminals on the one hand, but also other companies participating in the smart grid infrastructure besides the utility company such as intermediary providers of meter-reading services. Due to the critical nature of the electrical grid, we have to include hostile state actors in our attacker model. When acting directly, these would be classified as third-party attackers by the above schema, but they can reasonably be expected to be able to assume either of the other two roles as well e.g. through infiltration or bribery. \textcite{fraunholz01} in their elaboration of their generalized attacker model give some classification of attackers and provide a nice taxonomy of attacker properties. In their threat/capability rating, criminals are still considered to have higher threat rating than state-sponsored attackers. The New York Times reported in 2016 that some states recruit their hacking personnel in part from cyber-criminals. If this report is true, in a worst-case scenario we have to assume a state-sponsored attacker to be the worst of both types. Comparing this against the other attacker types in \textcite{fraunholz01}, this state-sponsored attacker is strictly worse than any other type in both variables. We are left with a highly-skilled, very well-funded, highly intentional and motivated attacker. Based on the above classification of attack angles and our observations on state-sponsored attacks, we can adapt \textcite{fraunholz01} to our problem, yielding the following new attacker types: \begin{enumerate} \item \textbf{Utility company insiders controlled by a state actor} We can ignore the other internal threats described in \textcite{fraunholz01} since an insider cooperating with a state actor is strictly worse in every respect. \item \textbf{State-sponsored external attackers} A state actor can obviously directly attack the system through the internet. \item \textbf{Customers controlled by a state actor} A state actor can very well compromise some customers for their purposes. They might either physically infiltrate the system posing as legitimate customers, or they might simply deceive or bribe existing customers into cooperation. \item \textbf{Regular customers} Though a hostile state actor might gain control of some number of customers through means such as voluntary cooperation, bribery, infiltration, they are limited in attack scale since they do not want to arouse premature attention. Though regular customers may not have the motivation, skill or resources of a state-sponsored attacker, potentially large numbers of them may try to attack a system out of financial incentives. To allow for this possibility, we consider regular customers separate from state actors posing as customers in some way. \end{enumerate} \subsection{Overall structural system security} Considering overall security, we first introduce the \emph{reset authority}, a trusted party acting as the single authority for issuing reset commands in our system. In practice this trusted party may be part of the utility company, part of an external regulatory body or a hybrid setup requiring both to cooperate. We assume this party will be designed to be secure against all of the above attacker types. The precise design of this trusted party is out of scope for this work but we will list some practical suggestions on how to achieve security below. % FIXME do the list % FIXME put up a large box on this limitation Using an asymmetric cryptographic design centered around the \emph{reset authority}, we rule out all attacks except for denial-of-service attacks on our system by any of the four attacker types. All reset commands in our system originate from the \emph{reset authority} and are cryptographically secured to provide authentication and tamper detection. Under this model, attacks on the electrical grid components between the \emph{reset authority} and the customer device degrade into man-in-the-middle attacks. To ensure the \textsc{safety} criterion from \ref{sec_criteria} holds we must % FIXME check whether this \ref displays as intended make sure our cryptography is secure against man-in-the-middle attacks and we must try to harden the system against denial-of-service attacks by the attacker types listed above. Given our attacker model we cannot fully guard against this sort of attack but we can at least choose a commmunication channel that is resilient against denial of service attacks under the above model. Finally, we have to consider the issue of hardware security. We will solve the problem of physical attacks on some small number of devices by simply not programming any secret information into these devices. This also simplifies hardware production. From consideration in this work we explicitly rule out any form of supply-chain attack as out-of-scope. % FIXME include considerations on production testing somewhere (is the device working? is the right key programmed?) \subsection{Complex microcontroller firmware} The \textsc{security} property from \ref{sec_criteria} is in a large part reliant on the security of our reset controller firmware. The best method to increase firmware security is to reduce attack surface by limiting external interfaces as much as possible and by reducing code complexity as much as possible. % FIXME formalize this as something like "Design Goal DG-023-42-1" ? If we avoid the complexity of most modern microcontroller firmware we gain another benefit beyond implicitly reduced attack surface: If the resulting design is small enough we may attempt formal verification of our security property. Though formal verification tools are not yet suitable for highly complex tasks they are already barely adequate for small amounds of code and simple interfaces. \subsection{Modern microcontroller hardware} Microcontrollers have gained enormously in both performance/efficiency as well as in peripheral support. Alas, these gains have largely been driven by insatiable customer demand for faster, more powerful chips and for a long time security has not been considered important outside of some specific niches such as smartcards. Traditionally a microcontroller would spend its entire lifetime without ever being exposed to any networks. Though this trend has been reversing with the increasing adoption of internet-of-things things % FIXME is this pun ok? and more advanced security features have started appearing in general-purpose microcontrollers, most still lack even basic functionality found in processors for computers or smartphones. One of the components lacking from most microcontrollers is strong memory protection or even a memory mapping unit as it is found in all modern computer processors and SoCs for applications such as smartphones. Without an MPU/MPU some mitigations for memory safety violations cannot be implemented. This and the absence of virtualization tools such as ARM's TrustZone make hardening microcontroller firmware a big task. It is very important to ensure memory safety in microcontroller firmware through tools such as defensive coding, extensive testing and formal verification. In our design we achieve simplicity on two levels: One, we isolate the very complex metering firmware from our reset controller by having both run on separate microcontrollers. Two, we keep the reset controller firmware itself extremely simple to reduce attack surface there. \subsection{Regulatory and economical constraints} \subsection{Safety vs. Security: Opting for restoration instead of prevention} \subsection{Technical outline of a safety reset} \section{Communication channels on the grid} \subsection{Powerline communication systems and their use} \subsection{Proprietary wireless systems} \subsection{Landline IP} \subsection{IP-based wireless systems} \subsection{Frequency modulation as a communication channel} For our system, we chose grid frequency modulation (henceforth GFC) as a low-bandwidth uni-directional communications channel. Compared to traditional PLC GFC requires no additional hardware, works reliably throughout the grid and is harder to manipulate by a malicious actor. % FIXME \cite{urtasun01} \subsubsection{The frequency dependance of grid frequency} \subsubsection{Control systems coupled to grid frequency} \subsubsection{Avoiding dangerous modes} \subsubsection{Overall system parameters} \subsubsection{An outline of practical implementation} \section{From grid frequency to a reliable communications channel} \subsection{Channel properties} \subsection{Modulation and its parameters} \subsection{Error-correcting codes} \subsection{Cryptographic security} \chapter{Practical implementation} \section{Cryptographic validation} \section{Data collection for channel validation} \subsection{Frequency sensor hardware design} \subsection{Frequency sensor measurement results} \section{Channel simulation and parameter validation} \section{Implementation of a demonstrator unit} \section{Experimental results} \section{Lessons learned} \chapter{Future work} \section{Technical standardization} The description of a safety reset system provided in this work could be translated into a formalized technical standard with relatively low effort. Our system is very simple compared to e.g. a full smart meter communication standard and thus can conceivably be described in a single, concise document. The much more complicated side of standardization would be the standardization of the backend operation including key management, coordination and command authorization. \section{Regulatory adoption} Since the proposed system adds significant cost and development overhead at no immediate benefit to either consumer or utility company it is unlikely that it would be adopted voluntarily. Market forces limit what long-term planning utility companies can do. An advanced mitigation such as this one might be out of their reach on their own and might require regulatory intervention to be implemented. To regulatory authorities a system such as this one provides a powerful primitive to guard against attacks. Due to the low-level approach our system might allow a regulatory authority to restore meters to a safe state without the need of fine-grained control of implementation details such as application network protocols. A regulatory authority might specify that all smart meters must use a standardized reset controller that on command resets to a minimal firmware image that disables external communication, continues basic billing functions and enables any disconnect switches. This system would enable the \emph{reset authority} to directly preempt a large-scale attack irrespective of implementation details of the various smart meter implementations. Cryptographic key management for the smart reset system is not much different to the management of highly privileged signing keys as they are used in many other systems already. If the safety reset system is implemented with a regulatory authority as the \emph{reset authority} they would likely be able to find a public entity that is already managing root keys for other government systems to also manage safety reset keys. Availability and security requirements of safety reset keys do not differ significantly from those for other types of root keys. \section{Practical implementation} \section{Zones of trust} In our design, we opted for a safety reset controller % FIXME is "safety reset" the proper name here? We need some sort of branding, but is this here really about "safety"? in form of a separate micocontroller entirely separate from whatever application microcontroller the smart meter design is already using. This design nicely separates the meter into an untrusted application (the core microcontroller) and the trusted reset controller. Since the interface between the two is simple and logically one-way, it can be validated to a high standard of security. Despite these security benefits, the cost of such a separate hardware device might prove high in a mass-market rollout. In this case, one might attempt to integrate the reset controller into the core microcontroller in some way. Primarily, there would be two ways to accomplish this. One is a solution that physically integrates an additional microcontroller core into the main application microcontroller package either as a submodule on the same die or as a separate die in a multi-chip module (MCM) with the main application microcontroller. A full-custom solution integrating both on a single die might be a viable path for very large-scale deployments, but will most likely be too expensive in tooling costs alone to justify its use. More likely for a medium- to large-scale deployment (millions of meters) would be a MCM integrating an off-the-shelf smart metering microcontroller die with the reset controller running on another, much smaller off-the-shelf microcontroller die. This solution might potentially save some cost compared to a solution using a discrete microcontroller for the reset controller. The more likely approach to reducing cost overhead of the reset controller would be to employ virtualization technologies such as ARM's TrustZone in order to incorporate the reset controller firmware into the application firmware on the same chip without compromising the reset controller's security or disturbing the application firmware's operation. TrustZone is a virtualization technology that provides a hardware-assisted privileged execution domain on at least one of the microcontrollers cores. In traditional virtualization setups a privileged hypervisor is managing several unprivileged applications sharing resources between them. Separation between applications in this setup is longitudinal between adjacent virtual machines. Two applications would both be running in unprivileged mode sharing the same cpu and the hypervisor would merely schedule them, configure hardware resource access and coordinate communication. This longitudinal virtualization simplifies application development since from the application's perspective the virtual machine looks very similar to a physical one. In addition, in general this setup reciprocally isolates two applications with neither one being able to gain control over the other. In contrast to this, a TrustZone-like system in general does not provide several application virtual machines and longitudinal separation. Instead, it provides lateral separation between two domains: The unprivileged application firmware and a privileged hypervisor. Application firmware may communicate with the hypervisor through defined interfaces but due to TrustZone's design it need not even be aware of the hypervisor's existence. This makes a perfect fit for our reset controller. The reset controller firmware would be running in privileged mode and without exposing any communication interfaces to application firmware. The application firmware would be running in unprivileged mode without any modification. The main hurdles to the implementation to a system like this are the requirement for a microcontroller providing this type of virtualization on the one hand and the complexity of correctly employing this virtualization on the other hand. Virtualization systems such as TrustZone are still orders of magnitude more complex to correctly configure than it is to simply use separate hardware and secure the interfaces in between. \chapter{Conclusion} \newpage \appendix \chapter{Acknowledgements} \newpage \chapter{References} \nocite{*} \printbibliography \newpage \chapter{Demonstrator schematics and code} \chapter{Economic viability of countermeasures} \section{Attack cost} \section{Countermeasure cost} % FIXME maybe include a standard for the technical side of a safety reset system here, e.g. in the style of an IETF draft? \end{document}