summaryrefslogtreecommitdiff
path: root/ma/safety_reset.tex
diff options
context:
space:
mode:
Diffstat (limited to 'ma/safety_reset.tex')
-rw-r--r--ma/safety_reset.tex2804
1 files changed, 0 insertions, 2804 deletions
diff --git a/ma/safety_reset.tex b/ma/safety_reset.tex
deleted file mode 100644
index 765dcad..0000000
--- a/ma/safety_reset.tex
+++ /dev/null
@@ -1,2804 +0,0 @@
-\documentclass[12pt,a4paper,notitlepage]{report}
-\usepackage[ngerman, english]{babel}
-\usepackage[utf8]{inputenc}
-\usepackage[a4paper, top=2cm, bottom=3.5cm, left=3cm, right=4cm]{geometry}
-% Matti remarkable tablet special size
-%\usepackage[paperwidth=15cm, paperheight=244mm, top=1cm, bottom=1cm, left=5mm, right=5mm]{geometry}
-\usepackage[T1]{fontenc}
-\usepackage[
- backend=biber,
- style=numeric,
- natbib=true,
- url=false,
- doi=true,
- eprint=false
- ]{biblatex}
-\addbibresource{safety_reset.bib}
-\usepackage{amssymb,amsmath}
-\usepackage{listings}
-\usepackage{eurosym}
-\usepackage{wasysym}
-\usepackage{amsthm}
-\usepackage{tabularx}
-\usepackage{multirow}
-\usepackage{multicol}
-\usepackage{tikz}
-\usepackage{mathtools}
-\DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
-\DeclarePairedDelimiter{\paren}{(}{)}
-
-\usetikzlibrary{arrows}
-\usetikzlibrary{chains}
-\usetikzlibrary{backgrounds}
-\usetikzlibrary{calc}
-\usetikzlibrary{decorations.markings}
-\usetikzlibrary{decorations.pathreplacing}
-\usetikzlibrary{fit}
-\usetikzlibrary{patterns}
-\usetikzlibrary{positioning}
-\usetikzlibrary{shapes}
-
-\usepackage[binary-units]{siunitx}
-\DeclareSIUnit{\baud}{Bd}
-\usepackage{hyperref}
-\usepackage{tabularx}
-\usepackage{commath}
-\usepackage{graphicx,color}
-\usepackage{ccicons}
-\usepackage{subcaption}
-\usepackage{float}
-\usepackage{footmisc}
-\usepackage{array}
-\usepackage[underline=false]{pgf-umlsd}
-\usetikzlibrary{calc}
-%\usepackage[pdftex]{graphicx,color}
-\usepackage{epstopdf}
-\usepackage{pdfpages}
-\usepackage{minted} % pygmentized source code
-% Needed for murks.tex
-\usepackage{setspace}
-\usepackage[draft=false,babel,tracking=true,kerning=true,spacing=true]{microtype} % optischer Randausgleich etc.
-% For german quotation marks
-
-\usepackage{fltpage}
-
-\renewcommand{\floatpagefraction}{.8}
-\newcommand{\degree}{\ensuremath{^\circ}}
-\newcolumntype{P}[1]{>{\centering\arraybackslash}p{#1}}
-
-\usepackage{fancyhdr}
-\fancyhf{}
-\fancyfoot[C]{\thepage}
-\newcommand{\includenotebook}[2]{
- \fancyhead[C]{Included Jupyter notebook: #1}
- \includepdf[pages=1,
- pagecommand={\thispagestyle{fancy}\section{#1}\label{#2_notebook}}
- ]{resources/#2.pdf}
- \includepdf[pages=2-,
- pagecommand={\thispagestyle{fancy}}
- ]{resources/#2.pdf}
-}
-
-\begin{document}
-\selectlanguage{ngerman}
-\input{murks}
-\titelen{A Post-Attack Recovery Architecture for Smart Electricity Meters}
-\titelde{Eine Architektur zur Kontrollwiederherstellung nach Angriffen auf Smart Metering in Stromnetzen}
-\typ{Masterarbeit}
-\grad{Master of Science (M. Sc.)}
-\autor{Jan Sebastian Götte}
-\gebdatum{\rule{2cm}{12pt}} % Geburtsdatum des Autors
-\gebort{\rule{3cm}{12pt}} % Geburtsort des Autors
-\ifdefined\includeprivatedata
-\input{private-data.tex}{}{}
-\fi
-\gutachter{Prof. Dr. Björn Scheuermann}{Prof. Dr.-Ing. Eckhard Grass}
-\mitverteidigung
-\makeTitel
-\selbstaendigkeitserklaerung{\today}
-\vfill
-\selectlanguage{english}
-{\center{
-\begin{minipage}[t][10cm][b]{\textwidth}
- \center{\ccbysa}
-
- \center{This work is licensed under a Creative-Commons ``Attribution-ShareAlike 4.0 International'' license. The
- full text of the license can be found at:}
-
- \center{\url{https://creativecommons.org/licenses/by-sa/4.0/}}
-
- \center{For alternative licensing options, source files, questions or comments please contact the author at
- \texttt{masterarbeit@jaseg.de}}.
-
- \center{This is version \texttt{\input{version.tex}\unskip} generated on \today. The git repository can be found at:}
-
- \center{\url{https://git.jaseg.de/master-thesis.git}}
-\end{minipage}
-}}
-\newpage
-
-% Hier folgt die eigentliche Arbeit (bei doppelseitigem Druck auf einem neuen Blatt):
-\tableofcontents
-\newpage
-
-\chapter{Introduction}
-
-In the power grid, as in many other engineered systems, we can observe an ongoing diffusion of information systems into
-industrial control systems. Automation of these control systems has already been practiced for the better part of a
-century. Throughout the 20th century this automation was mostly limited to core components of the grid. Generators in
-power stations are computer-controlled according to electromechanical and economic models. Switching in substations is
-automated to allow for fast failure recovery. Human operators are still vital to these systems, but their tasks have
-shifted from pure operation to engineering, maintenance and surveillance\cite{crastan03,anderson02}.
-
-With the turn of the century came a large-scale trend in power systems to move from a model of centralized generation,
-built around massive large-scale fossil and nuclear power plants, towards a more heterogenous model of smaller-scale
-generators working together. In this new model large-scale fossil power plants still serve a major role, but two new
-factors come into play. One is the advance of renewable energies. The large-scale use of wind and solar power in
-particular from a current standpoint seems unavoidable for our continued existence on this planet. For the electrical
-grid these systems constitute a significant challenge. Fossil-fueled power plants can be controlled in a precise and
-quick way to match energy consumption. This tracking of consumption with production is vital to the stability of the
-grid. Renewable energies such as wind and solar power do not provide the same degree of controllability, and they
-introduce a larger degree of uncertainty due to the unpredictability of the forces of nature\cite{crastan03}.
-
-Along with this change in dynamic behavior, renewable energies have brought forth the advance of distributed generation.
-In distributed generation end-customers that previously only consumed energy have started to feed energy into the grid
-from small solar installations on their property. Distributed generation is a chance for customers to gain autonomy and
-shift from a purely passive role to being active participants of the electricity market\cite{crastan03}.
-
-To match this new landscape of decentralized generation and unpredictable renewable resources the utility industry has
-had to adapt itself in major ways. One aspect of this adaptation that is particularly visible to ordinary people is the
-computerization of end-user energy metering. Despite the widespread use of industrial control systems inside the
-electrical grid and the far-reaching diffusion of computers into people's everyday lives the energy meter has long been
-one of the last remnants of an offline, analog time. Until the 2010s many households were still served through
-electromechanical Ferraris-style meters that have their origin in the late 19th
-century\cite{borlase01,ukgov04,bnetza02}. Today under the umbrella term \emph{Smart Metering} the shift towards fully
-computerized, often networked meters is well underway. The roll out of these \emph{Smart Meters} has not been very
-smooth overall with some countries severely lagging behind. As a safety-critical technology, smart metering technology
-is usually standardized on a per-country basis. This leads to an inhomogenous landscape with--in some instances--wildly
-incompatible systems. Often vendors only serve a single country or have separate models of a meter for each country.
-This complex standardization landscape and market situation has led to a proliferation of highly complex, custom-coded
-microcontroller firmware. The complexity and scale of this--often network-connected--firmware makes for a ripe substrate
-for bugs to surface.
-
-A remotely exploitable flaw inside a smart meter's firmware\footnote{
- There are several smart metering architectures that ascribe different roles to the component called \emph{smart
- meter}. Coarsely divided into two camps these are systems where all metering and communication functions reside
- within one physical unit and systems where metering and communication functions are separated into two units called
- the \emph{smart meter} and the \emph{smart meter gateway}\cite{stuber01}. An example for the former are setups in
- the USA, an example of the latter is the setup in Germany. For clarity, in this introductory chapter we use
- \emph{smart meter} to describe the entire system at the customer premises including both the meter and a potential
- gateway.
-} could have consequences ranging from impaired billing functionality to an existential threat to grid
-stability\cite{anderson01,anderson02}. In a country where meters commonly include disconnect switches for purposes such
-as prepaid tariffs a coördinated attack could at worst cause widespread activation of grid safety systems by repeatedly
-connecting and disconnecting megawatts of load capacity in just the wrong moments\cite{wu01}.
-
-Mitigation of these attacks through firmware security measures is unlikely to yield satisfactory results. The enormous
-complexity of smart meter firmware makes firmware security extremely labor-intensive. The diverse standardization
-landscape makes a coördinated, comprehensive response unlikely.
-
-In this thesis, instead of focusing on the very hard task of improving firmware security we introduce a pragmatic
-solution to the--in our opinion likely--scenario of a large-scale compromise of smart meter firmware. In our proposal
-the components of the smart meter that are threatened by remote compromise are equipped with a physically separate
-\emph{safety reset controller} that listens for a reset command transmitted through the electrical grid's frequency and
-on reception forcibly resets the smart meter's entire firmware to a known-good state. Our safety reset controller
-receives commands through Direct Sequence Spread Spectrum (DSSS) modulation carried out on grid frequency through a
-large controllable load such as an aluminum smelter. After forward error correction and cryptographic verification it
-re-flashes the meter's main microcontroller over the standard JTAG interface.
-
-In this thesis, starting from a high level architecture we have carried out extensive simulations of our proposal's
-performance under real-world conditions. Based on these simulations we implemented an end-to-end prototype of our
-proposed safety reset controller as part of a realistic smart meter demonstrator. Finally we experimentally validated
-our results and we will conclude with an outline of further steps towards a practical implementation.
-
-\chapter{Fundamentals}
-
-\section{Structure and operation of the electrical grid}
-
-Since this thesis is filed under \emph{computer science} we will provide a very brief overview of some basic concepts of
-modern power grids.
-
-\subsection{Structure of the electrical grid}
-
-The electrical grid is composed of a large number of systems such as distribution systems, power stations and substations
-interconnected by long transmission lines. Mostly due to ohmic losses\footnote{
- Power dissipation of a resistor of resistance $R [\Omega]$ given current $I [A]$ is $P_\text{loss} [W] =
- U_\text{drop} \cdot I = I^2 \cdot R$. Fixing power $P_\text{transmitted} [W] = U_\text{line} \cdot I$ this yields a
- dependency on line voltage $U_\text{line} [V]$ of $P_\text{loss} =
- \left(\frac{P_\text{transmitted}}{U_\text{line}}\right)^2 \cdot R$. Thus, ignoring other losses a $2\times$ increase
- in transmission voltage halves current and cuts ohmic losses to a quarter. In practice the economics are much more
- complicated due to the cost of better insulation for higher-voltage parts and the cost of power factor compensation.
-}
-the efficiency of transmission of electricity through long transmission lines increases with the square of
-voltage\cite{crastan01,simon01}. % simon01: p. 425, 9.4.1.1, crastan p.55, 3.1
-In practice economic considerations take into account a reduction of the considerable transmission losses (about
-\SI{6}{\percent} in case of Germany\cite{destatis01}) as well as the cost of equipment such as additional transformers
-and the cost increase for the increased voltage rating of components such as transmission lines. Overall these
-considerations have led to a hierarchical structure where large amounts of energy are transmitted over very long
-distances (up to thousands of kilometers) at very high voltages (upwards of \SI{200}{\kilo\volt}) and voltages get lower
-the closer one gets to end-customer premises. In Germany at the local level a substation will distribute
-\SIrange{10}{30}{\kilo\volt} to large industrial consumers and small transformer substations which converting this to
-the \SI{400}{\volt} three-phase AC households are usually hooked up with\cite{crastan01}.
-
-\subsubsection{Transmission lines, bus bars and tie lines}
-
-The number one component of the electrical grid are transmission lines. Short transmission lines that tightly couple
-parts of a substation are called \emph{bus bars}. Transmission lines that couple otherwise independent grid segments are
-called \emph{tie lines}. A tie line often connects grid segments operated by two different operators e.g.\ across a
-country border.
-
-In mathematical analysis \emph{short} transmission lines can be approximated as a simple lumped-component
-RLC\footnote{Resistor-inductor-capacitor.} circuit. In longer lines the effect of wave propagation along the line has to
-be taken into consideration. In the lumped model the transmission line is represented by a circuit of one or two
-inductors, one or two capacitors and some resistors. This representation simplifies analysis. For \emph{long}
-transmission lines above \SI{50}{\kilo\meter} (cable) or \SI{250}{\kilo\meter} (overhead lines) this approximation
-breaks down and wave propagation along the line's length has to be taken into account. The resulting model is what RF
-engineering calls a transmission line and models the line's parasitics\footnote{Stray capacitance, ohmic resistance and
-stray inductance.} as being uniformly distributed along the length of the line. To approximate this model in
-lumped-element evaluations the line is represented as a long chain of small lumped-component RLC sections. This complex
-structure makes simulation and analysis more difficult in comparison to short lines\cite{crastan01}.
-
-Almost all transmission lines used in the transmission and distribution grid use three-phase alternating current (AC).
-Long-distance overland lines are usually implemented as overhead lines due to their low cost and ease of maintenance.
-Underground cables are much more expensive because of their insulation and are only used when overhead lines cannot be
-used for reasons such as safety or aesthetics. In specialized applications such as long, high-power undersea cables
-high-voltage DC (HVDC) is used. In HVDC converter stations at both ends of the line convert between three-phase AC and
-the line's DC voltage. These converter stations are controlled electronically and do not exhibit any of the mechanical
-inertia that is characteristic for rotating generators in a power plant. Since HVDC re-synthesizes three-phase AC from
-DC at the receiving end of the line it can be used to couple non-synchronous grids. This allows for additional degrees
-of control over the transmission of power compared to a regular transmission line. These technical benefits are offset
-by high initial cost (mostly due to the converter stations) leading to HVDC being used in specific situations
-only\cite{crastan03}.
-
-\subsubsection{Generators}
-
-Traditionally all generators in the power grid were synchronous machines. A synchronous machine is a generator whose
-copper coils are wound and connected in such a way that during normal operation its rotation is synchronous with the
-grid frequency. Grid frequency and generator rotation speed are bidirectionally electromechanically coupled. If a
-generator's angle of rotation would lag behind the grid it would receive electrical energy from the grid and convert it
-into mechanical energy, acting as a motor--When the machine leads it acts as a generator and is braked. Small
-deviations between rotational speed and grid frequency will be absorbed by the electromechanical coupling between both.
-Maintaining optimal synchronization over time is the task of complex control systems inside power stations' speed
-governors\cite{simon01,crastan01}.
-
-Nowadays besides traditional rotating generators the grid also contains a large amount of electronically controlled
-inverters. These inverters are used in photovoltaic installations and other setups where either DC or non-synchronous AC
-is to be fed into the grid. Setups like these behave differently to rotating generators. In particular \emph{inertia} in
-these setups is either absent or a software parameter. This potentially reduces their overload capacity compared to
-rotating generators. The fundamentally different nature of electronically controlled inverters has to be taken into
-account in planning and regulation\cite{crastan03}.
-
-\subsubsection{Switchgear}
-
-In the electrical grid switches perform various roles. The ones a computer scientist would recognize are used for
-routing electricity between transmission lines and transformers and can be classified into ones that can be switched
-under load (called load switches) and ones that can not (called disconnectors). The latter are used to ensure parts of
-the network are free from voltage e.g.\ during maintenance. The former are used to re-route flows of electrical
-currents. A major difference in their construction is that in contrast to disconnectors load switches have built-in
-components that extinguish the high-power arc discharge that forms when the circuit is interrupted under load\footnote{
- While an arc discharge is considered a fault condition in most low-voltage systems including computers, in energy
- systems it is often part of normal operation.
-}. Beyond this there are circuit breakers. Circuit breakers are safety devices that even under failure conditions can
-still switch at several times the circuit's nominal current. They are activated automatically on conditions such as
-overcurrent or overvoltage. Finally, fuses can be considered non-resettable switches. The fuse in a computer power
-supply is barely more than a glass tube with some wire in it that is designed to melt at the designated current. In
-energy systems fuses are often much more complex devices that in some cases utilize explosives to quickly and decisively
-open the circuit and extinguish the resulting arc discharge\cite{nelles01,crastan01,simon01}.
-% disconnect switches, fuses, breakers -> crastan 1 (ch. 8)
-
-\subsubsection{Transformers}
-
-Along with transmission lines transformers are one of the main components most people will be thinking of when talking
-about the electrical grid. Transformers connect grid segments at different voltage levels with one another. In the
-distribution grid transformers are used to provide standard end-user voltage levels to the customer (e.g. 230/400V in
-Europe) from a \SIrange{10}{25}{\kilo\volt} feeder. In places that use overhead wiring to connect customer households
-this is the role of the pole-mounted gray devices the size of a small refrigerator that are characteristic for these
-systems. Transformers can also be used to convert between buses without a fourth neutral conductor and buses with one.
-
-Transformers are large and heavy devices consisting of thick copper wire or copper foil windings arranged around a core
-made from thin stacked, insulated iron sheets. The entire core sits within a large metal enclosure that is filled with
-liquid (usually a specialized oil) for both cooling and electrical insulation. This cooling liquid is cooled by radiator
-fins on the transformer enclosure itself or an external heat exchanger. Depending on the design cooling may rely on
-natural convection within the cooling liquid or on electrical pumps\cite{crastan01,simon01}.
-
-Transformers come in a large variety of coil and wiring configurations. There exist autotransformers where the secondary
-is part of the primary (or vice-versa) that are used to translate between voltage levels without galvanic isolation at
-lower cost. Transformers used in parts of the electrical grid often have several taps and include \emph{tap changers}. A
-tap changer is a system of mechanical switches that can be used to switch between several discrete transformer ratios to
-adjust secondary voltage under load\cite{simon01}. Tap changers are used in the distribution grid to maintain the
-specified voltage tolerances at the customer's connection.
-
-\subsubsection{Instrument transformers}
-
-While operating on the exact same physical principles instrument transformers are very different from regular
-transformers in an energy system. Instrument transformers are specialized low-power transformers that are used as
-transducers to measure voltage or current at very high voltages. They are part of the control and protection systems of
-substations\cite{crastan01}.
-
-\subsubsection{Chokes}
-
-Chokes are large inductors. In power grid applications their construction is similar to the construction of a
-transformer with the exception that they only have a single winding on the core. They are used for a variety of
-purposes. A frequent use is as a series inductor on one of the phases or the neutral connection to limit transient fault
-currents. In addition to this inductors are also used to tune LC circuits. One such use are Petersen coils, large
-inductors in series with the earth connection at a transformer's star point that are used to quickly extinguish arcs
-between phase and ground on a transmission line. The Petersen coil forms a parrallel LC resonant circuit with the
-transmission line's earth capacitance. Tuning this circuit through adjusting the Petersen coil reduces earth fault
-current to a level low enough to quickly extinguish the arc\cite{simon01}.
-
-\subsubsection{Power factor correction}
-
-Power factor is a power engineering term that is used to describe how close the current waveform of a load is to that of
-a purely resistive load. Given sinusoidal input voltage $V(t) = V_\text{pk} \sin \paren{\omega_\text{nom} t}$ with
-$\omega_\text{nom} = 2 \pi f_\text{nom} = 2 \pi \cdot \SI{50}{\hertz}$ being the nominal angular frequency, the current
-waveform of a resistor with resistance $R \left[\Omega\right]$ according to Ohm's law would be $I(t) = \frac{V(t)}{R} =
-\frac{1}{R} V_\text{pk} \sin\paren{\omega_\text{nom} t}$. In this case voltage and current are perfectly in phase, i.e.
-the current at time $t$ is linear in voltage at constant factor $\frac{1}{R}$.
-
-In contrast to this idealized scenario reality provides us with two common issues: One, the load may be reactive. This
-means its current waveform is an ideal sinusoid, but there is a phase difference between mains voltage and load current
-like so: $I(t) = \frac{V(t)}{R} = \frac{1}{\left|Z\right|} V_\text{pk} \sin\paren{\omega_\text{nom} t + \varphi}$. $Z$
-is the load's complex impedance combining inductive, capacitive and resistive components and $\varphi$ is the phase
-difference between the resulting current waveform and the mains voltage waveform. Examples of such loads are motors and
-the inductive ballasts in old fluorescent lighting fixtures.
-
-The second potential issue are loads with a non-sinusoidal current waveform. There are many classes of these but the
-most common one are the switching-mode power supplies (SMPS) used in most modern electronic devices.. Most SMPS have an
-input stage consisting of a bridge rectifier followed by a capacitor that provide high-voltage DC power to the following
-switch-mode convert circuit. This rectifier-capacitor input stage under normal load draws a high current only at the
-very peak of the input voltage sinusoid and draws almost zero current for most of the period.
-
-These two cases are measured by \emph{displacement power factor} and \emph{distortion power factor} that when combined
-yield the overall true power factor. The power factor is a key quantity in the design and operation of the power grid.
-As a variable in the operation of electrical grids it is also referred to as \emph{VAR} after its is unit Volt-Ampère
-Reactive. A high power factor (close to $1.0$, i.e.\ an in-phase sinusoidal current waveform) yields lowest
-transmission and generation losses. If reactive power generation and consumption are mismatched and power factor is
-low, high currents develop that lead to high transmission losses. For this reason grids include circuits to compensate
-reactive power imbalances\cite{crastan01}. These circuits can be as simple as inductors or capacitors connected to a
-power line but often can be switched to adapt to changing load conditions. Static var compensators are particularly
-fast-acting reactive power compensation devices whose purpose is to maintain a constant bus voltage\cite{rogers01}.
-
-\subsubsection{Loads}
-
-Lastly, there is the loads that the electrical grid serves. Loads range from mains-powered indicator lights in devices
-such as light switches or power strips weighing in at mere Milliwatts to large smelters in industrial metal production
-that can consume a fraction of a gigawatt all on their own.
-
-\subsection{Operational concerns}
-\subsubsection{Modelling the electrical grid}
-
-Modelling performs an important role in the engineering of a reliable power infrastructure. The grid is a complex,
-highly dynamic system. To maintain operational parameters such as voltage, grid frequency and currents inside their
-specified ranges complex control systems are necessary. To design and parametrize such control systems simulations are a
-valuable tool. Using model calculations the effects of control systems on operational variables such as transmission
-efficiency or generation losses can be estimated. Model simulations can be used to identify structural issues such as
-potential points of congestion. The same models can then be used to engineer solutions to such issues, e.g.\ by
-simulating the effect of a new transmission line.
-
-There are several aspects under which the grid or parts of the grid can be simulated. There are static analysis methods
-such as modal analysis that yield information on problematic electromechanical oscillations by computing the eigenvalues
-of a large system of differential equations describing the collective behavior of all components of the grid. Modal
-analysis is one example of simulations used in grid planning. Modal analysis is used in decisions to install additional
-stabilization systems in a particular location. In contrast to static analysis, transient simulations calculate an
-approximation of the time-domain behavior of some variable of interest under a given model. Transient simulations are
-used e.g.\ in the design of control systems. Finally, power flow equations describe the flow of electrical energy
-throughout the network from generator to load. Numerical solutions these equations are used to optimize control
-parameters to increase overall efficiency.
-
-% TODO decide what of this to keep.
-% \subsubsection{Generator controls}
-% \subsubsection{Load shedding}
-% \subsubsection{System stability}
-% \subsubsection{Power System Stabilizers}
-
-\section{Smart meter technology}
-
-Smart meters were a concept pushed by utility companies throughout the early 21st century. Smart metering is one component of the
-larger societal shift towards digitally interconnected technology. Old analog meters required that service personnel
-physically come to read the meter. \emph{Smart} meters automatically transmit their readings through modern
-technologies. Utility companies were very interested in this move not only because of the cost savings for meter reading
-personnel: An always-connected meter also allows several entirely new use cases that have not been possible before. One
-often-cited one is utilizing the new high-resolution load data to improve load forecasting to allow for greater
-generation efficiency. Computerizing the meter also allows for new fee models where electricity cost is no longer fixed
-over time but adapts to market conditions. Models such as prepayment electricity plans where the customer is
-automatically disconnected until they pay their bill are significantly aided by a fully electronic system that can be
-controlled and monitored remotely\cite{anderson02}. A remotely controllable disconnect switch can also be used to coerce
-customers in situations where that was not previously economically possible\footnote{
- The Swiss association of electrical utility companies in Section 7.2 Paragraph (2)a of their 2010 white paper on the
- introduction of smart metering\cite{vseaes01} cynically writes that remotely controllable disconnect switches ``lead
- a new tenant to swiftly register'' with the utility company. This white paper completely vanished from their website
- some time after publication, but the internet archive has a copy.
-}. Figure \ref{fig_smgw_schema} shows a schema of a smart metering installation in a typical household\cite{stuber01}.
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{resources/smgw_usage_scenario}
- \vspace*{1cm}
- \caption{A typical usage scenario of a smart metering system in a typical home. This diagram shows a gateway
- connected to multiple smart meters through its local metrological network (LMN) and a multitude of devices on the
- customer's home area network (HAN). A solar inverter and an electric car are connected through a controllable local
- systems (CLS) adaptor.}
- \label{fig_smgw_schema}
-\end{figure}
-
-To the customer the utility of a smart meter is largely limited to the convenience of being able to read it without
-going to their basement. In the long term it is said that there will be second-order savings to the customer since
-electricity prices adapting to the market situation along with this convenience will lead them to consume less
-electricity and to consume it in a way that is more amenable to utilities, both leading to reduced
-cost\cite{borlase01,bmwi03,anderson02}.
-
-Traditional Ferraris counters with their distinctive rotating aluminum disc are simple electromechanical devices. Since
-they do not include any semiconductors or other high technology that might be prone to failure a cheap Ferraris-style
-meter can last decades. In contrast to this, smart meters are complex high technology. They are vastly more expensive to
-develop in the first place since they require the development and integration of large amounts of complex, custom
-firmware. Once deployed, their lifetime is limited by this complexity. Complex semiconductor devices tend to fail, and
-firmware that needs to communicate with the outside world tends to not age well\cite{borkar01}. This combination of
-higher unit cost and lower expected lifetime leads to increased costs per household. This cost is usually shared between
-utility and customer.
-
-As part of its smart metering rollout the German government in 2013 had a study conducted on the economies of smart
-meter installations. This study came to the conclusion that for the majority of households computerizing an existing
-Ferraris meter is uneconomical. For larger consumers or new installations the higher cost of installation over time is
-expected to be offset by the resulting savings in electricity cost\cite{bmwi03}.
-
-\subsection{Smart metering and Human-Computer Interaction}
-
-A fundamental aspect in realizing many of the cost and energy savings promised by the smart metering revolution is that
-it requires a paradigm shift in consumer interaction. Previously most consumers would only confront their energy use
-when they receive their monthly or yearly electricity bill. A large part of the cost savings smart meters promise over
-traditional metering infrastructure\footnote{ We are excluding savings from Demand-Side Response (DSR) implemented
-through smart meters here: Traditional ripple control systems already allowed for these\cite{dzung01}, and due to the
-added cost of high-power relays many smart meters do not include such features. } critically depend on the consumer
-regularly interacting with the meter through an in-home display or app, then changing their behavior. We live in an era
-where our attention is already highly contested. A myriad of apps and platforms compete for our attention through our
-smart phones and other devices. Introducing an entirely new service exerting cognitive pressure into this already
-complex battleground is a large endeavour. On the one hand it is not clear how this new service would compete with
-everything else. On the other hand if it does manage to capture our attention and lead us to modify our behavior, what
-are the side effects? For instance an in-home display might increase financial anxiety in economically disadvantaged
-customers.
-
-Human Computer Interaction research has touched the topic of smart metering several times and has many insights to offer
-for technologists\cite{pierce01,rodden01,lupton01,costanza01,fell01}. An issue pointed out in \cite{rodden01} is that at
-least in some countries consumers fundamentally distrust their utility companies. This trust issue is exacerbated by
-smart meters being unilaterally forced onto consumers by utility companies. Much of the success of smart metering's
-ubiquitous promises of energy savings depends on consumer coöperation. Here, the aforementioned trust issue calls into
-question smart metering's chances of long-term success.
-
-As \cite{pierce01} pointed out smart metering developments could benefit greatly from early involvement of HCI research.
-A systematic analysis of non-technical aspects can prevent issues such as privacy implications initially being
-overlooked in the dutch deployment\cite{cuijpers01}. It is not clear that current standardization practice encompasses
-an in-depth consideration of the role of consumers in the socio-technological environment posed by this new technology.
-Standardization is often narrowly focused on technological aspects with little input beyond the occassional public
-consultation at the time the new standards are being implemented into law. This corporate-driven approach to
-technological progress being forced through national standardization bears a risk of failing to meet its advertised
-consumer benefits.
-
-\subsection{Common components}
-\label{sm-cpu}
-
-Smart meters usually are built around an off-the-shelf microcontroller (microcontroller unit, MCU). Some meters use
-specialized smart metering system-on-chips (SoCs)\cite{ifixit01} while others use standard microcontrollers with core
-metering functions implemented in external circuitry (cf.\ Section \ref{sec-easymeter} where we detail the meter in our
-demonstration setup). Specialized SoCs usually contain a segment LCD driver along with some high-resolution
-analog-to-digital converters for the actual measurement functions. In many smart meter designs the metering SoC is
-connected to another full-featured SoC acting as the modem. At a casual glance this might seem to be a security measure,
-but it is be more likely that this is done to ease integration of one metering platform with several different
-communication stacks (e.g.\ proprietary sub-gigahertz wireless, power line communication (PLC) or Ethernet). In these
-architectures there is a clear line of functional demarcation between the metering SoC and the modem. As evidenced by
-over-the-air software update functionality (see e.g.\ \cite{honeywell01}) this does not however extend to an actual
-security boundary.
-
-Energy usage is calculated by measuring both voltage and current at high resolution and then integrating the
-measurements. Current measurements are usually made with either a current transformer or a shunt in a four-wire
-configuration. Voltage is measured by dividing input AC down with a resistor chain. Both are integrated digitally using
-the MCU's time base as a reference.
-
-Whereas legacy electromechanical energy meters only provided a display of aggregate energy use through a decimal counter
-as well as an indirect indication of power through a rotating wheel one of the selling points of smart meters is their
-ability to calculate advanced statistics on energy use. These statistics are supposed to help customers better target
-energy conservation measures\cite{bmwi03}.
-
-Smart meters can perform additional functions in addition to pure measurement and data aggregation. One is to serve as a
-gateway between the utility company's control systems and large controllable loads in the consumer's household for
-Demand-Side Management (DSM)\cite{borlase01}. In DSM the utility company can control when exactly a high-power device
-such as a water storage heater is switched on. To the customer the precise timing does not matter since the storage
-heater is set so that it has enough hot water in its reservoir at all times. The utility company however can use this
-degree of control to reduce load variations during peak times. The efficiency gains realized with this system translate
-into lower electricity prices for DSM-enabled loads for the customer. Traditionally DSM was realized on a local level
-using ripple control systems. In ripple control control data is coded by modulating a carrier at a low frequency such as
-\SI{400}{\hertz} on top of the regular mains voltage. These systems require high-power transmitters at tens of kilowatts
-and still can only bridge regional distances\cite{dzung01}.
-
-Another important additional function is that some smart meters can be used to remotely disconnect consumer households
-with outstanding bills. Using euphemisms such as \emph{utility revenue protection}\cite{kamstrup01} or \emph{reducing
-nontechnical losses}\cite{brown01} while cynically claiming \emph{Consumer Empowerment}\cite{kamstrup01} these systems
-allow an utility company to remotely disconnect a customer at any time\cite{anderson01}. Whereas before smart metering
-this required either additional hardware or an expensive site visit by a qualified technician smart meters have ushered
-in an era of frictionless control\footnote{ Note that in some countries such as the UK non-networked mechanical
-prepayment meters did exist. In such systems the user inserts coins into a coin slot that activates a disconnect switch
-at the household's main electricity connection. These systems were non-networked and did not allow for remote control.
-A disadvantage of such systems compared to modern \emph{smart} systems are the high cost of the coin acceptor and the
-overhead of site visits required to empty the coin box\cite{anderson02}. }.
-
-\subsection{Cryptographic coprocessors}
-
-Just like in legacy electricity meters in smart meters physical security is still a key component of the overall system
-design. Since in both types of meter cost depends on physical quantities being measured at the customer premises
-customers can save cost in case they are able to falsify the meter's measurements without being
-detected\cite{anderson02}. For this reason both types of meters employ countermeasures against physical intrusion.
-Compared to high-risk devices such as card payment processing terminals or ATMs the tamper proofing used in smart meters
-is only basic\cite{anderson02}. Common measures include sealing the case by irreversibly ultrasonically welding the
-front and back plastic shells together or the use of security seals on the lid covering the input and output screw
-terminals. The common low-tech attack of using magnets to saturate the current transformer's ferrite cores is detected
-using hall sensors\cite{anderson02,anderson03,itron01,hager01,easymeter01}. German smart metering standards specify the
-use of a smartcard-like security module to provide transport encryption and other cryptographic
-services\cite{bsi-tr-03109-2,bsi-tr-03109-2-a}. During our literature review we did not find many references to similar
-requirements in other national standards, though this does not mean that individual manufacturers do not use smartcards
-for engineering reasons or due to pressure from utilities. The limited documentation on meter internals that we did find
-such as \cite{ifixit01,bigclive01,eevblog01} suggests where no such regulation exists manufacturers and utilities likely
-choose to forego such advanced measures and instead settle on simple software implementations.
-
-\subsection{Physical structure and installation}
-
-Smart meters are installed like traditional electricity meters. In Japan this means they are usually installed on an
-exterior wall and need to be resistant against weather and extreme environmental conditions (direct sunlight, high
-temperature, high humidity). In Germany the meter is always installed either indoors or in an outdoor utility closet
-that is sealed to keep out the weather. In most countries the meter is connected through large integrated screw
-terminals. In the US meters compliant with the domestic ANSI C12 standard are round and plug into a large socket that is
-wired into the house or apartment's electrical connection.
-
-Modern smart meters are usually made with plastic cases. Ferraris meters often used cases stamped from sheet metal with
-glass windows on them. Smart meters now look much more like other modern electronic devices. A common construction style
-is to separate the case into front and back halves with both clipped or ultrasonically welded together. Ultrasonic
-welding gives a robust, airtight interface that cannot easily be separated and reconnected without leaving visible
-traces, which helps with tamper evidence properties. As an industry-standard process common in various consumer goods
-ultrasonic welding is a cheap and accessible technology\cite{easymeter01,ifixit01}.
-
-Communication interfaces sometimes are brought out through regular electromechanical connectors but often also are
-optical interfaces. A popular style here is to use a regular UART connected to an LED/phototransistor optocoupler
-mounted on the side of the case. The user interface is usually limited to an LCD display. For cost and ingress
-protection smart meters rarely use mechanical buttons. Some smart meters use a phototransistor mounted behind the
-faceplate that can be activated with a flashlight as a crude contact-less input device\cite{easymeter01}.
-
-All meters provide several options for security seals to be installed to detect opening of the meter or access to its
-terminal block. The shape and type of these security seals varies. Factory-installed seals are used to detect tampering
-of the meter itself while seals made by the utility during meter installation are used to guard the meter's terminal
-block and detect attempts at by-passing\cite{czechowski01}.
-
-\section{Regulatory frameworks around the world}
-
-Smart metering regulation varies from country to country as it is tightly coupled to the overall regulation of the
-electrical grid. The standardization of the physical form factor and metrological parameters of a meter is usually
-separate from the standardization of its \emph{smart} functionality. Most countries base the standard for their meters'
-outwards-facing communication interface on a family of standards unified under the IEC as DLMS/COSEM. Employing this
-base protocol ountry-specific standardization only covers which precise variant of it is spoken and what features are
-supported.
-
-\subsection{International standards}
-
-The family of standards one encounters most in smart metering applications are IEC 62056 specifying the Device Language
-Message Specification (DLMS) and the Companion Specification for Electronic Metering (COSEM). DLMS/COSEM are
-application-layer standards describing a request/response schema similar to HTTP. DLMS/COSEM are mapped onto a
-multitude of wire protocols. They can be spoken over TCP/IP or mapped onto low-speed UART serial interfaces
-\cite{sato01,stuber01}. Besides DLMS/COSEM there are a multitude of standards usually specifying how DLMS/COSEM are to
-be applied.
-
-DLMS/COSEM show some amount of feature creep. They do not adhere to the age-old systems design adage that a tool should
-\emph{do one thing and do it well}. Instead they try to capture the convex hull of all possible applications. This led
-to a complicated design that requires extensive additional specification and testing to maintain interoperability. In
-particular in the area of transport security it becomes evident that the IEC as an electrical engineering standards body
-stretched their area of expertise where resorting to established standard protocols would have led to a better
-outcome\cite{weith01}. Compared to industry-standard transport security the IEC standards provide a simplistic key
-management framework based on a static shared key with unlimited lifetime and provide sub-optimal transport security
-properties (e.g.\ lack of forward-secrecy)\cite{khurana01,sato01}.
-
-\subsection{The regulatory situation in selected countries}
-
-In this section we will give an overview of the situation in a number of countries. This list of countries is not
-representative and notably does not include any developing countries and is geographically biased. We selected these
-countries for illustration only and based our selection in a large part on the availability of information in a language
-we can read. We will conclude this section with a summary of common themes.
-
-\subsubsection{Germany}
-
-Germany standardized smart metering on a national level. Apart from the calibration standards applying to any type of
-meter smart meters are covered by a set of communications and security standards developed by the German Federal Office
-for Information Security (BSI). Germany mandates smart meter installations for newly constructed buildings and during
-major renovations but does not require most legacy residential installations to be upgraded. This is a consequence of a
-2013 cost-benefit analysis that found these upgrades to be uneconomical for the majority of residential
-customers\cite{bmwi03,bmwi1,bmwe01,brown01}.
-
-The German standards strictly separate between metering and communication functions. Both are split into separate
-devices, the \emph{meter} and the \emph{gateway} (called \emph{smart meter gateway} in full and often abbreviated
-\emph{SMGW}). One or several meters connect to a gateway through a COSEM-derived protocol. The communication interface
-between meter and gateway can optionally be physically unidirectional. An unidirectional interface eliminates any
-possibility of meter firmware compromise. The gateway contains a cryptographic security module similar to a
-smartcard\cite{mahlknecht01} that is entrusted with signing of measurements and maintaining an authenticated and
-encrypted communication channel with its authorities. Security of the system is certified according to a Common Criteria
-process.
-
-The German specification does not include any support for disconnect switches as they are common in some other countries
-outside of demand-side management. It only does not prohibit the installation of one behind the smart meter
-installation. This makes it theoretically possible for a utility company to still install a disconnect switch to
-disconnect a customer, but this would be a spearate installation from the smart meter. In Germany there are significant
-barriers that have to be met before a utility company may cut power to a household\cite{delaw01}. The elision of a
-disconnect switch means attacks on German meters will be limited in influence to billing irregularities and attacks
-using DSM equipment such as water storage heaters that represent only a fraction of overall load.
-
-\subsubsection{The Netherlands}
-The Netherlands were early to take initiative to roll out smart metering after its recognition by the European
-Commission in 2006\cite{cuijpers01,ec04}. After overcoming political issuses the Netherlands were above the European
-median in 2018, having replaced almost half of all meters\cite{cuijpers01,ec03}. Dutch smart meters are standardized by
-a consortium of distribution system operators. They integrate gateway and metrology functions into one device. The
-utility-facing interface is a IEC DLMS/COSEM-based interface over cellular radio such as GPRS or LTE\cite{aubel01}. Like
-e.g.\ the German standard, the Dutch standard precisely specifies all communication interfaces of the
-meter\cite{dsmrp3}. Another parallel is that the Dutch standard also does not cover any functionality for remotely
-disconnecting a household. This absence of a disconnect switch limits attacks on Dutch smart meters, too to causing
-billing irregularities.
-
-\subsubsection{The UK}
-
-The UK is currently undergoing a smart metering rollout. Meters in the UK are nationally standardized to provide both
-Zigbee ZSE-based and IEC DLMS/COSEM connectivity. UK smart metering specifications are shared between electrical and gas
-meters. Different to other countries' specifications the UK national specifications require electrical meters to have an
-integrated disconnect switch and gas meters to have an integrated valve. In Northern Ireland most consumers use prepaid
-electricity contracts\cite{anderson02}. Prepayment and credit functionality are also specified in the UK's national
-smart metering standard, as is remote firmware update functionality\cite{ukgov02}. Outside communications in these
-standards is performed through a gateway (there called \emph{communications hub}) that can be shared between several
-meters \cite{ukgov01,ukgov02,ukgov03,brown01,sato01}. The combination of both gas and electricity metering into one
-family of standards and the exceptionally large set of \emph{required} features make the UK regulations the maximalist
-option among the regulations in this section. The mandatory inclusion of both disconnect switches and remote
-connectivity up to remote firmware update make it an interesting attack target\cite{anderson01}.
-
-\subsubsection{Italy}
-
-Italy was among the first countries to legally mandate the widespread installation of smart meters in households. Italy
-in 2006 and 2007 by law set a starting date for the rollout in 2008\cite{brown01}. The Italian electricity market was
-recently privatized. While the wholesale market and transmission network privatization has advanced the vast majority of
-retail customers continued to use the incumbent distribution system operator ENEL as their supplier\cite{ec03}. This
-dominant position allowed ENEL to orchestrate the large-scale rollout of smart meters in Italy. Almost every meter in
-Italy had been replaced by a smart meter by 2018\cite{ec03}. An unique feature of the Italian smart metering
-infrastructure is that it relies on Power Line Communication (PLC) to bridge distances between meters and cellular radio
-gateways\cite{gungor01}.
-
-\subsubsection{Japan}
-
-Japan is currently rolling out smart metering infrastructure. Compared to other countries in Japan significant
-standardization effort has been spent on smart home integration\cite{usitc01,sato01,brown01}. Japan has domestic
-standards under its Japanese Industrial Standards organization (JIS) that determine metrology and physical dimensions.
-Tokyo utility company TEPCO is currently rolling out a deployment that is based on the IEC DLMS/COSEM standards suite
-for remote meter reading in conjuction with the Japanese ECHONET home-area network protocol. Smart meters are
-connected to TEPCO's backend systems through the customer's internet connection, sub-gigahertz radio based on 802.15.4
-framing, regular landline internet or PLC\cite{toshiba01,sato01}.
-
-A unique point in the Japanese utility metering landscape is that the current practice is monthly manual readings. In
-Japan residential utility meters are usually mounted outside the building on an exterior wall and every month someone
-with a mirror on a long stick will come and read the meter. The meter reader then makes a thermal paper print-out of the
-updated utility bill and puts it into the resident's post box. This practice gives consumers good control over their
-consumption but does incur significant personnel overhead.
-
-\subsubsection{The USA}
-
-In the USA the rollout of smart meters has been promoted by law as early as 2005. The US electricity market is highly
-complex with states having significant authority to decide on their own policies\cite{brown01}. Originally different
-from the IEC standards used in large fraction of the rest of the world the USA developed their own domestic set of
-standards for smart meters under the Americal National Standards Institute (ANSI)\cite{sato01}. Today ANSI is converging
-with the IEC on the protcol layer. An obvious feature of ANSI-standard meters is that they are round and plug into a
-wall-mounted socket while IEC devices are usually rectangular and connected directly to the mains wiring through large
-screw terminals\cite{ifixit01}.
-
-\subsection{Common themes}
-
-Researching the current situation around the world for the above sections we were able to distill some common themes.
-First, smart metering is slowly advancing on a global scale and despite significant reservations from privacy-conscious
-people and consumer advocates it seems it is here to stay. Still, there are some notable exceptions of countries that
-have decided to scale-back an ongoing rollout effort after subsequent analysis showed economical or other
-issues\footnote{cf.\ the Netherlands and Germany}.
-
-\subsubsection{The introduction of smart metering}
-
-The smart meter rollout is largely driven by utility companies. Utility companies field a variety of arguments for the
-rollout. The most prominent argument is a general increase in energy-efficiency along with a reduction of emissions.
-This argument is based on the estimation that smart metering will increase private customers' awareness of their own
-consumption and this will lead them to reduce their consumption. The second highly popular argument for smart metering
-is that it is necessary for the widespread adoption of renewable energies. This argument again builds on the trend
-towards green energy to rationalize smart metering. Interestingly this argument is often formulated as an inevitability
-instead of a choice.
-
-Academic reception of smart metering is dyed with an almost unanimous enthusiasm. In particular smart meter
-communication infrastructure has received a large amount of research
-attention\cite{dzung01,gungor01,kabalci01,lloret01,mahmood01,yan01,anderson01,anderson02}. Outside of human-computer
-interaction claims that smart meters will reduce customer energy consumption have often been uncritically accepted.
-
-\subsubsection{Standardization and reality of smart devices}
-
-Regulators, utilities and academics meet in their enthusiasm on the issue of smart home integration of smart metering. A
-feature of many concepts is that the meter acts as the centerpiece of a modern, fully integrated smart
-home\cite{aubel01,geelen01,bsi-tr-03109-1,abdallah01}. The smart meter serves as a communication hub between a new class
-of grid-aware loads and the utility company's control center. Large (usually thermal) loads such as dishwashers,
-refrigerators and air conditioners are expected to intelligently adapt their heating/cooling cycles to better match
-the grid's supply. A frequent scenario is one in which the meter bills the customer using near-real time pricing, and
-supplies large loads in the customer's household with this pricing information. These loads then intelligently schedule
-their operation to minimize cost\cite{sato01}. At the time between 2000 and 2005 when smart metering proposals were
-first advanced this vision might have been an effect of the \emph{law of the instrument}\cite{kaplan01,anderson02}. Back
-then outside of specialty applications household devices were not usually networked\cite{merz01}. Smart meters at the
-time may have seemed to be the obvious choice for a smart home communications hub.
-
-From today's perspective, this idea is obviously outdated. Smart \emph{things} now have found their way into many homes.
-Only these things are directly interconnected through the internet--foregoing the home-area network (HAN) technologies
-anticipated by smart metering pioneers. The simple reason for this is that nowadays anyone has Wifi, and Wifi
-transceivers have become inexpensive enough to disappear in the bill of materials (BOM) cost of a large home device such
-as a washing machine. Smart meters are usually situated in the basement--physically far away from most of one's devices.
-This makes connecting them to said devices awkward and connecting them via the local Wifi lends the question why the
-smart devices should not simply use the internet directly.
-
-Connecting things to a smart meter through a local bus is academically appealing. It promises cost-savings from a
-simpler physical layer (such as ZigBee instead of Wifi) and it neatly separates concerns into home infrastructure and
-the regular internet. Communication between smart meter and devices never leaves the house. This promises tolerance to
-utility backend systems breaking. It also physically keeps communication inside the house, bypassing the utility's eyes
-improving both customer privacy and agency. The presently popular model of a device as simple as a light bulb proxying
-its every action through a manufacturer's servers somewhere on the public internet is in stark contrast to this
-scenario. Alas, the reason that this model is as popular is that in most cases it simply works. Device manufacturers
-integrate one of many off-the-shelf Wifi modules. The resulting device will work anywhere on earth\footnote{For some
-places channel assignments may have to be updated. This is a configuration-level change and in some devices can be done
-by the end-user during provisioning.}. A HAN-connected device would have several variants with different modems for
-different standards. Some might work across countries, but some might not. And in some countries there might not even be
-a standard for smart grid HANs.
-
-Looking at the situation like this begs the question why this realization has not yet found its way into mainstream
-acceptance by smart metering implementors. The customer-facing functionality promised through smart meters would be
-simple to implement as part of a now-standard \emph{Internet of Things} application. An in-home display that shows
-real time energy consumption and cost statistics would simply be an Android tablet fetching summarized data from the
-utility's billing backend. Custom hardware for this purposes seems anachronistic today. Demand-side response by large
-loads would be as simple as an HTTPS request with a token identifying the customer's contract that returns the
-electricity price the meter is currently charging along with a recommendation to switch on or off. It seems the smart
-home has already arrived while smart metering is still getting off the starting blocks\cite{anderson02}.
-% TODO is this too critical? Is maybe the modern smart home compatible with smart meters? Is maybe the local-only path
-% of data, avoiding utility clouds a design feature? (may be true in DE, NL, probably not anywhere else)
-
-\section{Security in smart distribution grids}
-
-The smart grid in practice is nothing more or less than an aggregation of embedded control and measurement devices that
-are part of a large control system. This implies that all the same security concerns that apply to embedded systems in
-general also apply to most components of a smart grid. Where programmers have been struggling for decades now with input
-validation\cite{leveson01}, the same potential issue raises security concerns in smart grid scenarios as well\cite{mo01,
-lee01}. Only, in smart grid we have two complicating factors present: Many components are embedded systems, and as such
-inherently hard to update. Also, the smart grid and its control algorithms act as a large (partially-)distributed
-system making problems such as input validation or authentication harder\cite{blaze01} and adding a host of distributed
-systems problems on top\cite{lamport01}.
-
-Given that the electrical grid is essential infrastructure in our modern civilization, these problems amount to
-significant issues in practice. Attacks on the electrical grid may have grave consequences\cite{anderson01,lee01} while
-the long maintenance cycles of various components make the system slow to adapt. Thus, components for the smart grid
-need to be built to a much higher standard of security than most consumer devices to ensure they live up to well-funded
-attackers even decades down the road. This requirement intensifies the challenges of embedded security and distributed
-systems security among others that are inherent in any modern complex technological system. The safety-critical nature
-of the modern smart metering ecosystem in particular was quickly recognized by security experts\cite{anderson01}.
-
-A point we will not consider in much depth in this work is theft of electricity. An incentive for the introduction of
-smart metering that is frequently cited in utility industry publications outside of a general public's view is the
-reduction of electricity theft\cite{czechowski01}. Academic publications tend to either focus on other benefits such as
-generation efficiency gains through better forecasting or rationalize the consumer-unfriendly aspects of smart metering
-with ``enormous social benefits''\cite{mcdaniel01}. They do not usually point out the economical incentive such
-\emph{revenue protection} mechanisms provide\cite{anderson01,anderson02}.
-
-\subsection{Privacy in the smart grid}
-
-A serious issue in smart metering setups is customer privacy. Even though the meter ``only'' collects aggregate energy
-consumption of a whole household this data is highly sensitive\cite{markham01}. This counterintuitive fact was initially
-overlooked in smart meter deployments leading to outrage, delays and reduced features\cite{cuijpers01}. The root cause
-of this problem is that given sufficient timing resolution these aggregate measurements contain ample entropy. Through
-disaggregation algorithms individual loads can be identified and through pattern matching even complex usage patterns
-can be discerned with alarming accuracy\cite{greveler01}. Similar privacy issues arise in many other areas of modern
-life through pervasive tracking and surveillance\cite{zuboff01}. What makes the case of smart metering worse is that
-even the fig leaf of consent such practices often hide behind does not apply here. If a citizen does not consent to
-Google's privacy policy Google says they can choose not to use their service. In today's world this may not be a free
-choice thereby invalidating this argument but it is at least technically possible. Smart metering on the other hand is
-mandated by law and depending on the law a customer unwilling to accept the accompanying privacy violation may not be
-able to evade it\cite{bmwi04}.
-
-\subsection{Smart grid components as embedded devices}
-
-A fundamental challenge in smart grid implementations is the central role smart electricity meters play. Smart meters
-are used both for highly-granular load measurement and (in some countries) load switching\cite{zheng01}. Smart
-electricity meters are effectively consumer devices. They are built down to a certain price point that is measured by
-the burden it puts on consumers. The cost of a smart meter is ultimately limited by it being a major factor in the
-economies of a smart meter rollout\cite{bmwi03}. Cost requirements preclude some hardware features such as the use of a
-standard hardened software environment on a high powered embedded system (such as a hypervirtualized embedded linux
-setup) that would both increase resilience against attacks and simplify updates. Combined with the small market sizes in
-smart grid deploymentsthis results in a high cost pressure on the software development process for smart electricity
-meters. Most vendors of smart electricity meters only serve a handful of markets. A large fraction of smart meter
-development cost lies in the meter's software. Landis+Gyr, a large manufacturer that makes most of its revenue from
-utility meters in their 2019 annual report write that they \SI{36}{\percent} of their total R\&D budget on embedded
-software (firmware) while spending only \SI{24}{\percent} on hardware R\&D\cite{landisgyr01,landisgyr02}. There exist
-multiple competing standards applicable to various parts of a smart electricity meter and most countries have their own
-certification regimen\cite{cenelec01}. This complexity creates a large development burden for new market
-entrants\cite{perez01}.
-
-\subsection{The state of the art in embedded security}
-
-Embedded software security generally is much harder than security of higher-level systems. This is due to a combination
-of the unique constraints of embedded devices: Among others they are hard to update and usually produced in small
-quantities. They also lack capabilities compared to full computers. Processing power is limited and memory protection
-functions are spartan. Even well-funded companies continue to have trouble securing their embedded
-systems. A spectacular example of this difficulty is the recently-exposed flaw in Apple's iPhone SoC first-stage ROM
-bootloader\footnote{
- Modern system-on-chips integrate one or several CPUs with a multitude of peripherals, from memory and DMA
- controllers over 3D graphics accelerators down to general-purpose IO modules for controlling things like indicator
- LEDs. Most SoCs boot from one of several boot devices such as flash memory, Ethernet or USB according to a
- configuration set by pin-strapping configuration IOs or through write-only fuse bits.
-
- Physically, one of the processing cores of the SoC (usually one of the main CPU cores) is connected such that it is
- taken out of reset before all other devices, and is tasked with enabling and configuring all other peripherals of
- the SoC. In order to run later intialization code or more advanced bootloaders, this core on startup runs a very
- small piece of code hard-burned into the SoC in the factory. This ROM loader initializes the most basic peripherals
- such as internal SRAM memory and selects a boot device for the next bootloader stage.
-
- Apple's ROM loader measures only a few hundred bytes. It performs authorization checks to ensure only software
- authorized by Apple is booted. The present flaw allows an attacker to circumvent these checks and boot their own
- code on a USB-connected iPhone. This compromises Apple's chain of trust from ROM loader to userland right at its
- root. Since this is a flaw in the factory-programmed first stage read-only boot code of the SoC it cannot be patched
- in the field.
-}, that allows a full compromise of any iPhone before the iPhone X. iPhone 8, one of the affected models, was still
-being manufactured and sold by Apple until April 2020. In another instance in 2016 researchers found multiple flaws in
-the secure-world firmware used by Samsung in their mobile phone SoCs. The flaws they found were both severe
-architectural flaws such as secret user input being passed through untrusted userspace processes without any protection
-and shocking cryptographic flaws such as
-CVE-2016-1919\footnote{\url{http://cve.circl.lu/cve/CVE-2016-1919}}\cite{kanonov01}. And Samsung is not the only large
-multinational corporation having trouble securing their secure world firmware implementation. In 2014 researchers found
-an embarrassing integer overflow flaw in the low-level code handling untrusted input in Qualcomm's QSEE
-firmware\cite{rosenberg01}. For an overview of ARM TrustZone including a survey of academic work and past security
-vulnerabilities of TrustZone-based firmware see \cite{pinto01}.
-
-For their mass-market phones these companies have R\&D budgets that dwarf some countries' national budgets. If even
-they have trouble securing their secure embedded software stacks, what is a smart meter manufacturer to do? If a
-standard as in case of the German one requires IP gateways to speak TLS, a protocol that is notoriously tricky to
-implement correctly\cite{georgiev01}, the manufacturer is short on options to secure their product.
-
-Since thorough formal verification of code is not yet within reach for either large-scale software development or code
-heavy in side-effects such as embedded firmware or industrial control software\cite{pariente01} the two most effective
-measures for embedded security are reducing the amount of code on one hand, and labor-intensively reviewing and testing
-this code on the other hand. A smart meter manufacturer does not have a say in the former since it is bound by the
-official regulations it has to comply with, and will likely not have sufficient resources for the latter. We are left
-with an impasse: Manufacturers in this field likely do not have the security resources to keep up with complex standards
-requirements. At the same time they have no option to reduce the scope of their implementation to alleviate the burden
-on firmware security.
-
-\subsection{Attack avenues in the smart grid}
-
-If we model the smart grid as a control system responding to changes in inputs by regulating outputs, on a very high
-level we can see two general categories of attacks: Attacks that directly change the state of the outputs, and attacks
-that try to influence the outputs indirectly by changing the system's view of its inputs. The former would be an attack
-such as shutting down a power plant to decrease generation capacity\cite{lee01}. The latter would be an attack such as
-forging grid frequency measurements where they enter a power plant's control systems to provoke the control systems to
-oscillate\cite{kosut01,wu01,kim01}.
-
-\subsubsection{Communication channel attacks}
-
-Communication channel attacks are attacks on the communication links between smart grid components. This could be
-attacks on IP-connected parts of the core network or attacks on shared busses between smart meters and IP gateways in
-substations. Generally, these attacks can be mitigated by securing the aforementioned communication links using modern
-cryptography. IP links can be protected using TLS, and more low-level busses can be protected using more lightweight
-Noise\cite{perrin01}-based protocols.
-
-Cryptographic security transforms an attackers ability to read and manipulate communication contents into a mere denial
-of service attack. Thus, in addition to cryptographic security safety under DoS conditions must be ensured for continued
-system performance under attacks. This safety property is identical with the safety required to withstand random outages
-of components, such as communication link outages due to physical damage from storms, flooding etc\cite{sato01}. In
-general attacks at the meter level are hard to weaponize. Meters primarily serve billing purposes. The use of smart
-meter data for load forecasting is not yet common practice. Once it is this data will only be used to refine existing
-forecasting models that are based on aggregate data collected at higher vantage points in the distribution grid. This
-combination of smart metering data with more trusted aggregate data from sensors within the grid infrastructure limits
-the potential impact of a data falsification attack on smart meters. It also allows the utility to identify potentially
-corrupt meter readings and thus detect manipulation above a certain threshold. In order for an attack to have more
-far-reaching consequences the attacker would need to compromise additional grid infrastructure\cite{kim01,kosut01}.
-
-\subsubsection{Exploiting centralized control systems}
-
-The type of smart grid attack most often cited in popular discourse, and to the author's knowledge the only type that
-has so far been carried out in practice, is a direct attack on centralized control systems. In this attack, computer
-components of control systems are compromised by the same techniques used to compromise any other kind of computer
-system such as spearfishing, exploiting insecure services running on internet-exposed ports and using one compromised
-system to compromise other systems on the same ostensably secure internal network. These attacks are very powerful as
-they yield the attacker direct control over whatever outputs the compromised control systems are controlling. If an
-attacker manages to compromise the right set of control computers, they may even be able to cause physical
-damage\cite{lee01}.
-
-Despite their potentially large impact, these attacks are only moderately interesting from a scientific perspective. For
-one, their mitigation mostly consists of a straightforward application of decades-old security best practices. Though
-there is room for the implementation of genuinely new, power systems-specific security systems in this field, the general
-state of the art is lacking behind other fields of embedded security. From this background low-hanging fruit should take
-priority\cite{heise02}. Given political will these systems can readily be fortified. There is only a comparatively
-small number of them and having a technician drive to every one of them in turn to install a firmware security update is
-feasible.
-
-\subsubsection{Control function exploits}
-
-Control function exploits are attacks on the mathematical control loops used by the centralized control system. One
-example of this type of attack are resonance attacks as described in \cite{wu01}. In this kind of attack, inputs from
-peripheral sensors indicating grid load to the centralized control system are carefully modified to cause a
-disproportionately large oscillation in control system action. This type of attack relies on complex resonance effects
-that arise when mechanical generators are electrically coupled. These resonances, colloquially called ``modes'', are
-well-studied in power system engineering\cite{rogers01,grebe01,entsoe01,crastan03}. Even disregarding modern attack
-scenarios, for stability electrical grids are designed with measures in place to dampen any resonances inherent to grid
-structure. These resonances are hard to analyze since they require an accurate grid model and they are unlikely to be
-noticed under normal operating conditions.
-
-Mitigation of these attacks can be achieved by ensuring unmodified sensor inputs to the control systems in the first
-place. Carefully designing control systems not to exhibit exploitable behavior such as oscillations is also possible but
-harder.
-
-\subsubsection{Endpoint exploits}
-
-The one to us rather interesting attack on smart grid systems is someone exploiting the grid's endpoint devices such as
-smart electricity meters. These meters are deployed on a massive scale, with at least one meter per household on
-average\footnote{Households rarely share a meter but some households may have a separate meter for detached properties
-such as a detached garage or basement.}. Once compromised, restoration to an uncompromised state can be difficult if it
-requires physical access to thousands of devices in hard-to-access locations.
-
-By compromising smart electricity meters, an attacker can forge the distributed energy measurements these devices
-perform. In a best-case scenario, this might only affect billing and lead to customers being under- or over-charged if
-the attack is not noticed in time. In a less ideal scenario falsified energy measurements reported by these devices
-could impede the correct operation of centralized control systems.
-
-In some countries such as the UK smart meters have one additional function that is highly useful to an attacker: They
-contain high-current disconnect switches to disconnect the entire household or business in case electricity bills are
-left unpaid for a certain period. In countries that use these kinds of systems on a widespread level, the load
-disconnect switch is controlled by the smart meter's central microcontroller. This allows anyone compromising this
-microcontroller's firmware to actuate the disconnect switch at will. Given control over a large number of
-network-connected smart meters, an attacker might thus be able to cause large-scale disruptions of power
-consumption\cite{anderson01,temple01}. Combined with an attack method such as the resonance attack from \cite{wu01}
-that was mentioned above, this scenario poses a serious threat to grid stability.
-
-In places where Demand-Side Management (DSM) is common this functionality may be abused in a similar way. In DSM the
-smart metering system directly controls power to certain devices such as heaters. The utility can remotely control the
-turn-on and turn-off of these devices to smoothen out the load curve. In exchange the customer is billed a lower price
-for the energy consumed by these loads. DSM was traditionally done in a federated fashion usually through low-frequency
-PLC over the distribution grid\cite{dzung01}. Smart metering systems no longer require large, resource-intensive
-transmitters in substations and bear the potential for a rollout of such technology on a much wider scale than before.
-This leads to a potentially significant role of DSM systems in the impact calculation of an attack on a smart metering
-system. DSM does not control as much load capacity as remote disconnect switches do but the attacks cited in the above
-paragraph still fundamentally apply.
-
-\subsection{Practical threats}
-
-As a highly integrated system the electrical grid is vulnerable to attacks from several angles. One way to classify
-attacks is by their motivation. Along this axis we found the following motives:
-
-\begin{description}
- \item[Service disruption.] An attack aimed at disrupting service could e.g.\ aim at causing a blackout. It could
- also take aim in a more subtle way targeting a degradation of parameters such as power quality (voltage,
- frequency and waveform). It could target a particular customer, geographic area or all parts of the grid.
- Possible motivations range from a tennage hacker's boredom to actual cyberwar\cite{cleveland01,lee01}.
- \item[Commercial disruption.] Simple commercial motives already motivate a wide variety of attacks on grid
- infrastructure\cite{czechowski01}. Though generally mostly harmless from a cypersecurity point of view there are
- instances where these attacks put the lives of both the attacker and bystanders at grave risk\cite{anderson01}.
- Such attacks generally aim at the meter itself but a more sophisticated attacker might also target the
- utility's backend computer bureaucracy.
- \item[Data extraction.] The smart grid collects large amounts of data on both individual consumers and on an
- aggregate level. The privacy risk in individual consumer's data is obvious. On the web
- data collection practices ranging from questionable to flat-out illegal have widely proliferated for various
- purposes including election manipulation\cite{heise03}. Assuming criminals in this field would eschew
- fertile ground such as this due to legal or ethical concerns is optimistic. Taking the risk to individual
- customer's data out of the equation even aggregate data is still highly attractive to some. Aggregate real-time
- electricity usage data is a potential source on timely information on matters such as national social events
- (through TV set energy consumption\cite{greveler01}) or the state of the economy.
-\end{description}
-
-A factor to consider in all these cases is that one actor's attacks have the potential to weaken system security
-overall. An attacker might add new backdoors to gain persistence or they might disable existing mitigations to enable
-further steps of their attack.
-
-In this paper we will largely concentrate on attacks of the first type because they both have the most serious
-consequences and the most motivated attackers. Attackers that may want to disrupt service include nation state's
-cyberwar operations. This type of attacker is both highly skilled and highly funded.
-
-\subsection{Conclusion or, why we are doomed}
-
-We can conclude that a compromise of a large number of smart electricity meters cannot be ruled out. The complexity of
-network-connected smart meter firmware makes it exceedingly unlikely that it is in fact flawless. Large-scale
-deployments of these devices sometimes with disconnect relays make them an attractive target for attackers interested in
-causing grid instability. The attacker model for these devices includes nation states, who have considerable resources
-at their disposal.
-
-For a reasonable guarantee that no large-scale compromises of hard- and software built today will happen over a span of
-some decades, we would have to radically simplify its design and limit attack surface. Unfortunately, the complexity of
-smart electricity meter implementations mostly stems from the large list of requirements these devices have to conform
-with. Alas, the standards have already been written, political will has been cast into law and changes that reduce scope
-or functionality have become exceedingly unlikely at this point.
-
-A general observation with smart grid systems of any kind is that they comprise a departure from the federated
-control structure of yesterday's ``dumb'' grid and the advent of centralization to an enormous scale. This modern,
-centralized infrastructure has been carefully designed to defend against malicious actors and all involved parties have
-an interest in keeping it secure but in centralized systems scaling attacks is inherently easier than in decentralized
-systems\cite{anderson02}. An attacker can employ centralized control to their advantage. From this perspective the
-centralization of smart metering control systems--sometimes up to a national level\cite{anderson01,anderson02}--poses a
-security risk.
-
-\chapter{Restoring endpoint safety in an age of smart devices}
-
-As laid out in the previous section we cannot fully rule out a large-scale compromise of smart energy meters at some
-point in the long-term future. Instead we have to rephrase our claim to security. We cannot rule out exploitation: We
-have to limit its impact. Assuming that we cannot strip any functionality from smart meters all we can do is to flush
-out an attacker once they are in. Mitigation replaces prevention.
-
-In a worst-case scenario an attacker would gain unconstrained code execution e.g.\ by exploiting a flaw in a network
-protocol implentation. Smart meters use standard microcontrollers that do not have advanced memory protection functions
-(cf.\ Section \ref{sm-cpu}). We can assume the attacker has full control over the main microcontroller given any such
-flaw. With this control they can actuate the disconnect switch if present. They can transmit data through the device's
-communication interfaces or use the user interface components such as LEDs and the LCD. Using the self-programming
-capabilities of flash microcontrollers an attacker could even gain persistency. Note that in systems separating
-cryptographic functions into some form of cryptographic module\footnote{such as systems used in
-Germany\cite{bsi-tr-03109}.} we can be optimistic and assume the attacker has not yet compromised this cryptographic
-co-processor.
-
-With the meter's core microcontroller under attacker control we cannot use this microcontroller to restore control over
-the system. We have no way of ensuring the attacker does not simply delete a security mechanism we include in the core
-microcontroller's firmware. Theoretically a secure boot implementation could be used to ensure meters boot into a safe
-state after temporary power loss but we cannot rely on secure boot being present on every smart meter application
-controller. Nowadays secure boot is a standard feature in many SoC aimed at smartphones or smart TVs but it is still
-very uncommon in microcontrollers.
-
-Our solution to this problem is to add another smaller microcontroller to the smart meter design. This microcontroller
-will contain a small piece of software that receives cryptographically authenticated commands from utility companies. On
-demand it can reset the meter's core microcontroller to a known-good state. To reliably flush out an attacker from a
-compromised core microcontroller we re-program the core microcontroller in its entirety. We propose using JTAG to
-re-program the core microcontroller with a known-good firmware image read from a sufficiently large SPI flash connected
-to the reset controller. JTAG is supported by most microcontrollers complex enough to be used in a smart meter design.
-JTAG programming functionality can be ported to a new microcontroller with relatively little work.
-
-Our solution requires the core mircocontroller's JTAG interface to be activated (i.e. not fused-shut). For our solution
-to work the core microcontroller firmware must not be able to permanently disable the JTAG interface by itself. In
-microcontrollers that do not yet provide this functionality this is a minor change that could be added to a custom
-microcontroller variant at low cost. On most microcontrollers keeping JTAG open should not interfere with code readout
-protection\footnote{Readout protection usually forces a device to erase its program and data memories before allowing
-JTAG access.}. Code secrecy should be of no concern\cite{schneier01} here but some manufacturers have strong preferences
-due to a fear of copyright infringement.
-
-\section{The theory of endpoint safety}
-\label{sec_criteria}
-
-In order to gain anything by adding our reset controller to the smart meter's already complex design we must satisfy two
-interrelated conditions.
-\begin{enumerate}
-\item \emph{security} means our reset controller itself does not have any remotely exploitable flaws
-\item \emph{safety} menas our reset controller will perform its job as intended
-\end{enumerate}
-
-Note that our \emph{security} property includes only remote exploitation, and excludes any form of hardware attack.
-Even though most smart meters provide some level of physical security, we do not wish to make any assumptions on this.
-In the following section we will elaborate our attacker model and it will become apparent that sufficient physical
-security to defend against all attackers in our model would be infeasible, and thus we will design our overall system
-to remain secure even if we assume some number of physically compromised devices.
-
-\subsection{Attack characteristics}
-The attacker model the two above conditions must hold under is as follows. We assume three angles of attack: Attacks by the
-customer themselves, attacks by an insider within the metering systems controlling utility company and lastly attacks
-from third parties. Examples for these third parties are hobbyist hackers or outside cybercriminals on the one hand,
-but also other companies participating in the smart grid infrastructure besides the utility company such as intermediary
-providers of meter-reading services.
-
-Due to the critical nature of the electrical grid, we have to include hostile state actors in our attacker model. When
-acting directly, these would be classified as third-party attackers by the above schema, but they can reasonably be
-expected to be able to assume either of the other two roles as well e.g. through infiltration or bribery. In the
-generalized attacker model in \cite{fraunholz01} the authors give a classification of attacker types and provide a nice
-taxonomy of attacker properties. In their threat/capability rating, criminals are still considered to have higher threat
-rating than state-sponsored attackers. The New York Times reported in 2016 that some states recruit their hacking
-personnel in part from cybercriminals. If this report is true, in a worst-case scenario we have to assume a
-state-sponsored attacker to be the worst of both types. Comparing this against the other attacker types in
-\cite{fraunholz01}, this state-sponsored attacker is strictly worse than any other type in both variables. We are left
-with a highly-skilled, very well-funded, highly intentional and motivated attacker.
-
-Based on the above classification of attack angles and our observations on state-sponsored attacks, we can adapt
-\cite{fraunholz01} to our problem, yielding the following new attacker types:
-
-\begin{enumerate}
- \item \textbf{Utility company insiders controlled by a state actor.}
- We can ignore the other internal threats described in \cite{fraunholz01} since an insider coöperating with a
- state actor is strictly worse in every respect.
- \item \textbf{State-sponsored external attackers.}
- A state actor can directly attack the system through the internet and with proper operations security they do
- not risk exposure or capture.
- \item \textbf{Customers controlled by a state actor.}
- A state actor can very well compromise some customers for their purposes. They might either physically
- infiltrate the system posing as legitimate customers, or they might simply deceive or bribe existing customers
- into coöperation.
- \item \textbf{Regular customers.}
- A hostile state actor might gain control of some number of customers through means such as voluntary
- coöperation, bribery or infiltration but this limits the scale of an attack since an attacker has to avoid
- arousing premature attention. Though regular customers may not have the motivation, skill or resources of a
- state-sponsored attacker, potentially large numbers of them may try to attack a system out of financial
- incentives\cite{anderson01,czechowski01}. To allow for this possibility, we consider regular customers separate
- from state actors posing as customers.
-\end{enumerate}
-
-\subsection{Overall structural system security}
-
-Considering overall security, we first introduce the reset authority, a trusted party acting as the single authority for
-issuing reset commands in our system. In practice this trusted party may be part of the utility company, part of an
-external regulatory body or a hybrid setup requiring both to coöperate. We assume this party will be designed to be
-secure against all of the above attacker types. The precise design of this trusted party is out of scope for this work
-but we will provide some practical suggestions on how to achieve security below in Section \ref{sec-regulation}.
-
-Using an asymmetric cryptographic design centered around the reset authority, we rule out all attacks except for
-denial-of-service attacks on our system by any of the four attacker types. All reset commands in our system originate
-from the reset authority and are cryptographically secured to provide authentication and tamper detection. Under this
-model attacks on the electrical grid components between the reset authority and the customer device degrade into denial
-of service attacks. To ensure the \emph{safety} criterion from Section \ref{sec_criteria} holds we must make sure our
-cryptography is secure against man-in-the-middle attacks and we must try to harden the system against denial-of-service
-attacks by the attacker types listed above. Given our attacker model we cannot fully guard against this sort of attack
-but we can at least choose a communication channel that is resilient under the above model.
-
-Finally, we have to consider the issue of hardware security. We will solve the problem of physical attacks by simply not
-programming any secret information into devices. This also simplifies hardware production. We consider supply-chain
-attacks out-of-scope for this work.
-
-\subsection{Complex microcontroller firmware}
-
-The \emph{security} property from \ref{sec_criteria} is in a large part reliant on the security of our reset
-controller firmware. The best method to increase firmware security is to reduce attack surface by limiting external
-interfaces as much as possible and by reducing code complexity as much as possible. If we avoid the complexity of most
-modern microcontroller firmware we gain another benefit beyond implicitly reduced attack surface: If the resulting
-design is small enough we may even succeed in formal verification of our security properties. Though formal
-verification tools are not yet suitable for highly complex tasks they are already adequate for small amounts of code and
-simple interfaces.
-
-\subsection{Modern microcontroller hardware}
-
-Microcontrollers have gained enormously in both performance and efficiency as well as in peripheral support. Alas, these
-gains have largely been driven by insatiable customer demand for faster, more powerful chips and for the longest time
-security has not been considered important outside of some specific niches such as smartcards. A few years ago a
-microcontroller would spend its entire lifetime without ever being exposed to any networks\cite{anderson02}. Though this
-trend has been reversing with the increasing adoption of internet-of-things things and more advanced security features
-have started appearing in general-purpose microcontrollers, most still lack even basic functionality found in processors
-for computers or smartphones.
-
-One of the components lacking from most microcontrollers is strong memory protection or even a memory mapping unit as it
-is found in all modern computer processors and SoCs for applications such as smartphones. Without an MPU (Memory
-Protection Unit) or MMU (Memory Management Unit) many memory safety mitigations cannot be implemented. This and the
-absence of virtualization tools such as ARM's TrustZone make hardening microcontroller firmware a big task. It is very
-important to ensure memory safety in microcontroller firmware through tools such as defensive coding, extensive testing
-and formal verification.
-
-In our design we achieve simplicity on two levels: One, we isolate the very complex metering firmware from our reset
-controller by having both run on separate microcontrollers. Two, we keep the reset controller firmware itself extremely
-simple to reduce attack surface there. Our protocol only has one message type and no state machine.
-
-\subsection{Safety vs. security: Opting for restoration instead of prevention}
-
-By implementing our reset system as a physically separate microcontroller we sidestep most security issues around the
-main application microcontroller. There are some simple measures that can be taken to harden its firmware.
-Implementing industry best practices such as memory protection or stack canaries will harden the system and increase the
-cost of an attack but it will not yield a system that we can be confident enough in to say it is fully secure. The
-complexity of the main application controller firmware makes fully securing the system a formidable effort--and one that
-would have to be repeated by every meter vendor for every one of their code bases.
-
-In contrast to this our reset system does not provide any additional security. Any attack that could occur without it
-can still occur with it in place. What it provides is a fail-safe mechanism that can quickly immobilize a malicious
-actor mid-attack. It does this in a way that can be adapted to any meter architecture and any microcontroller platform
-with low effort since it relies on established standard interfaces such as JTAG and SWD. Concentrating research and
-development resources on a single platform like this allows for a system that is more economical to implement across
-device series and across vendors.
-
-Attack resilience in the power grid can benefit from a safety-focused approach. The greater threat such an attack poses
-is not the temporary denial of service of utility metering functions. Even in a highly integrated smart grid as
-envisioned by utility companies these measurement functions are used by utility companies to increase efficiency and
-reduce cost but are not necessary for the grid to function at all. Thus if we can provide mere \emph{safety} with a
-fail-safe semantic instead of unattainable perfect \emph{security} we have gained resilience against a large class of
-realistic attack scenarios.
-
-\subsection{Technical outline of a safety reset system}
-
-There are several ways our system could be practically implemented. The most basic way is to add a separate
-microcontroller connected to the meter's main application MCU and optionally other embedded microcontrollers such as
-modems. This discrete chip could either be placed on the metering board itself or it could be placed on a separate PCB
-connected to the programming interface(s) of the metering board. In certain cases the latter might allow its use in
-otherwise unmodified legacy designs.
-
-The safety reset controller would be a much simpler MCU than the meter's main application controller. Its software can
-be kept simple leading to low program flash and RAM requirements. Since it does not need to address rich periphery such
-as external parallel memory, LCDs etc.\ it can be a physically small, low-pin count device. If the main application
-controller is supposed to be reset to a full factory image with little or no reduced functionality its firmware image
-size is certainly too large for the reset controller's embedded flash. Thus a realistic setup would likely use an
-external SPI flash chip to store this image.
-
-The most likely interfaces to reset the main application controller and possibly other microcontrollers such as modem
-chips would be the controller's integrated programming port such as JTAG. Parallel high-voltage flash programming has
-come to be uncommon in modern microcontrollers and most nowadays use some form of a serial interface. There exist a
-variety of serial programming and debug interfaces but JTAG has grown to be by far the most broadly supported one and
-has largely displaced vendor-specific debug interfaces except for very small devices.
-
-The kind of microcontroller that would likely be used as the main application controller in a smart meter application
-will almost certainly support JTAG. These microcontrollers are high pin-count devices since they need to connect to a
-large set of peripherals such as the LCD and the large program flash makes it likely for a proper debugging interface to
-be present. The one remaining issue in this coarse technical outline is what communication interface should be used to
-transmit the trigger command to the reset controller. In the following section we will give an overview on communication
-interfaces established in energy metering applications and evaluate each of them for our purpose.
-
-\section{Communication channels on the grid}
-
-There is a number of well-established technologies for communication on or along power lines. We can distinguish three
-basic system categories: Systems using separate wires (such as DSL over landline telephone wiring), wireless radio
-systems (such as LTE) and \emph{power line communication} (PLC) systems that reüse the existing mains wiring and
-superimpose data transmissions onto the 50 Hz mains sine\cite{gungor01,kabalci01}.
-
-For our scenario, we will ignore short-range communication systems. There exists a large number of \emph{wideband}
-power line communication systems that are popular with consumers for bridging Ethernet segments between parts of an
-apartment or house. These systems transmit up to several hundred megabits per second over distances up to several tens
-of meters\cite{kabalci01}. Technologically, these wideband PLC systems are very different from \emph{narrowband}
-systems used by utilities for load management among other applications and they are not relevant to our analysis.
-
-\subsection{Power line communication (PLC) systems and their use}
-
-In long-distance communications for applications such as load management, PLC systems are attractive since they allow
-re-using the existing wiring infrastructure and have been used as early as in the 1930s\cite{hovi01}. Narrowband PLC
-systems are a potentially low-cost solution to the problem of transmitting data at small bandwidth over distances of
-several hundred meters up to tens of kilometers.
-
-Narrowband PLC systems transmit on the order of Kilobits per second or slower. A common use of this sort of system are
-\emph{ripple control} systems. These systems superimpose a low-frequency signal at some few hundred Hertz carrier
-frequency on top of the 50Hz mains sine. This low-frequency signal is used to encode switching commands for
-non-essential residential or industrial loads. Ripple control systems provide utilities with the ability to actively
-control demand while promising savings in electricity cost to consumers\cite{dzung01}.
-
-In any PLC system there is a strict trade-off between bandwidth, power and distance. Higher bandwidth requires higher
-power and reduces maximum transmission distance. Where ripple control systems usually use few transmitters to cover
-the entire grid of a regional distribution utility, higher bandwidth bidirectional systems used for automatic meter
-reading (AMR) in places such as Italy or France require repeaters within a few hundred meters of a transmitter.
-
-\subsection{Landline and wireless IP-based systems}
-
-Especially in automated meter reading (AMR) infrastructure the cost-benefit trade-off of power line systems does not
-always work out for utilities. A common alternative in these systems is to use the public internet for communication.
-Using the public internet has the advantage of low initial investment on the part of the utility company as well as
-quick commissioning. Disadvantages compared to a PLC system are potentially higher operational costs due to recurring
-fees to network providers as well as lower reliability. Being integrated into power grid infrastructure, a PLC system's
-failure modes are highly correlated with the overall grid. Put briefly, if the PLC interface is down, there is a good
-chance that power is out, too. In contrast general internet services exhibit a multitude of failures that are entirely
-uncorrelated to power grid stability. For purposes such as meter reading for billing purposes, this stability is
-sufficient. However for systems that need to hold up in crisis situations such as the recovery system we are
-contemplating in this thesis, the public internet may not provide sufficient reliability.
-
-\subsection{Short-range wireless systems}
-
-Smart meters contain copious amounts of firmware but still pale in comparison to the complexity of full-scale computers
-such as smartphones. For short-range communication between a meter and a cellular radio gateway mounted nearby or
-between a meter and a meter reading operator in a vehicle on the street a protocol such as Wifi (IEEE 802.11) is too
-complex. Absent widely-used standards in this space proprietary radio protocols grew attractive. These are often based
-on some standardized lower-level protocol such as ZigBee (IEEE 802.15) but entirely home-grown ones also exist. To the
-meter manufacturer a proprietary radio protocol has several advantages. It is easy to implement and requires no external
-certification. It can be customized to its specific application. In addition it provides vendor lock-in to customers
-sharing infrastructure such as a cellular radio gateway between multiple devices. In other fields a lack of
-standardization has led to a proliferation of proprietary protocols and a fragmented protocol landscape. This is a large
-problem since the consumer cannot easily integrate products made by different manufacturers into one system. In advanced
-metering infrastructure this is unlikely to be a disadvantage since usually there is only one distribution grid
-operator for an area. Shared resources such as a cellular radio gateway would most likely only be shared within a
-single building and usually they are all operated by the same provider.
-
-Systems in Europe commonly support Wireless M-Bus, an European standardized protocol\cite{silabs01} that operates on
-several ISM bands\footnote{
- Frequency bands that can be used for \emph{Industrial, Scientific and Medical} applications by anyone and that do
- not require obtaining a license for transmitter operation. Manufacturers can use whatever protocol they like on
- these bands as long as they obtain certification that their transmitters obey certain spectral and power
- limitations.
-}. ZigBee is another popular standard and some vendors additionally support their own proprietary protcols\footnote{
- For an example see \cite{honeywell01}.
-}.
-
-\subsection{Frequency modulation as a communication channel}
-
-For our system, we chose grid frequency modulation (henceforth GFM) as a low-bandwidth unidirectional broadcast
-communication channel. Compared to traditional PLC, GFM requires only a small amount of additional equipment, works
-reliably throughout the grid and is harder to manipulate by a malicious actor.
-
-Grid frequency in Europe's synchronous areas is nominally 50 Hertz, but there are small load-dependent variations from
-this nominal value. Any device connected to the power grid (or even just within physical proximity of power wiring) can
-reliably and accurately measure grid frequency at low hardware overhead. By intentionally modifying grid frequency, we
-can create a very low-bandwidth broadcast communication channel. Grid frequency modulation has only ever been proposed
-as a communication channel at very small scales in microgrids before\cite{urtasun01} and to our knowledge has not yet
-been considered for large-scale application.
-
-Advantages of using grid frequency for communication are low receiver hardware complexity as well as the fact that a
-single transmitter can cover an entire synchronous area. Though the transmitter has to be very large and powerful the
-setup of a single large transmitter faces lower bureaucratic hurdles than integration of hundreds of smaller ones into
-hundreds of local systems that each have autonomous governance.
-
-\subsubsection{The frequency dependency of grid frequency}
-
-Despite the awesome complexity of large power grids the physics underlying their response to changes in load and
-generation is surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of
-differential equations and the entire grid can be modelled by aggregating these approximations into a large system of
-nonlinear differential equations. Evaluating these systems it has been found that in large power grids small signal
-steady state changes in generation/consumption power balance cause an approximately linear change in
-frequency\cite{kundur01,crastan03,entsoe02,entsoe04}. \emph{Small signal} here describes changes in power balance that
-are small compared to overall grid power. \emph{Steady state} describes changes over a time frame of multiple waveform
-cycles as opposed to transient events that only last a few milliseconds.
-
-This approximately linear relationship allows the specification of a coefficient with unit \si{\watt\per\hertz} linking
-power differential $\Delta P$ and frequency differential $\Delta f$. In this thesis we are using the European power
-grid as our model system. We are using data provided by ENTSO-E (formerly UCTE), the governing association of European
-transmission system operators. In our calculations we use data for the continental European synchronous area, the
-largest synchronous area. $\frac{\Delta P}{\Delta f}$, called \emph{Overall Network Power Frequency Characteristic} by
-ENTSO-E is around \SI{25}{\giga\watt\per\hertz}.
-
-We can derive general design parameter for any system utilizing grid frequency as a communication channel from the
-policies of ENTSO-E\cite{entsoe02,entsoe03}. Any such system should stay below a modulation amplitude of
-\SI{100}{\milli\hertz} which is the threshold defined in the ENTSO-E incidents classification scale for a Scale 0-1
-(from ``Anomaly'' to ``Noteworthy Incident'' scale) frequency degradation incident\cite{entsoe02} in the continental
-Europe synchronous area.
-
-\subsubsection{Control systems coupled to grid frequency}
-
-The ENTSO-E Operations Handbook Policy 1 chapter\cite{entsoe02} defines the activation threshold of primary control to
-be \SI{20}{\milli\hertz}. Ideally, a modulation system would stay well below this threshold to avoid fighting the
-primary control reserve. Modulation line rate should likely be on the order of a few hundred Millibaud. Modulation at
-these rates would outpace primary control action which is specified by ENTSO-E as acting within between ``a few
-seconds'' and \SI{15}{\second}.
-
-Keeping modulation amplitude below this threshold would help to avoid spuriously triggering these control functions.
-The effective \emph{Network Power Frequency Characteristic} of primary control in the European grid is reported by
-ENTSO-E at around \SI{20}{\giga\watt\per\hertz}. This works out to an upper bound on modulation power of
-\SI{20}{\mega\watt\per\milli\hertz}.
-
-\subsubsection{An outline of practical transmitter implementation}
-
-In its most basic form a transmitter for grid frequency modulation would be a very large controllable load connected to
-the power grid at a suitable vantage point. A spool of wire submerged in a body of cooling liquid such as a small lake
-along with a thyristor rectifier bank would likely suffice to perform this function during occasional cybersecurity
-incidents. We can however decrease hardware and maintenance investment even further compared to this rather
-uncultivated solution by repurposing regular large industrial loads as transmitters in an emergency situation. For some
-preliminary exploration we went through a list of energy-intensive industries in Europe\cite{ec01}. The most
-electricity-intensive industries in this list are primary aluminum and steel production. In primary production raw ore
-is converted into raw metal for further refinement such as casting, rolling or extrusion. In steelmaking iron is
-smolten in an electric arc furnace. In aluminum smelting aluminum is electrolytically extracted from alumina. Both
-processes involve large amounts of electricity with electricity making up \SI{40}{\percent} of production costs. Given
-these circumstances a steel mill or aluminum smelter would be good candidates as transmitters in a grid frequency
-modulation system.
-
-In aluminum smelting high-voltage mains is transformed, rectified and fed into about 100 series-connected electrolytic
-cells forming a \emph{potline}. Inside these pots alumina is dissolved in molten cryolite electrolyte at about
-\SI{1000}{\degreeCelsius} and electrolysis is performed using a current of tens or hundreds of Kiloampère. The resulting
-pure aluminum settles at the bottom of the cell and is tapped off for further processing.
-
-Like steelworks, aluminum smelters are operated night and day without interruption. Aside from metallurgical issues the
-large thermal mass and enormous heating power requirements do not permit power cycling. Due to the high costs of
-production inefficiencies or interruptions the behavior of aluminum smelters under power outages is a
-well-characterized phenomenon in the industry. The recent move away from nuclear power and towards renewable energy has
-lead to an increase in fluctuations of electricity price throughout the day. These electricity price fluctuations have
-provided enough economic incentive to aluminum smelters to develop techniques to modulate smelter power consumption
-without affecting cell lifetime or product quality\cite{duessel01,eisma01}. Power outages of tens of minutes up to two
-hours reportedly do not cause problems in aluminum potlines and are in fact part of routine operation for purposes such
-as electrode changes\cite{eisma01,oye01}.
-
-The power supply system of an aluminum plant is managed through a highly-integrated control system as keeping all cells
-of a potline under optimal operating conditions is challenging. Modern power supply systems employ large banks of diodes
-or SCRs\footnote{SCRs, also called thyristors, are electronic devices that are often used in high-power switching
-applications. They are normally-off devices that act like diodes when a current is fed into their control terminal.} to
-rectify low-voltage AC to DC to be fed into the potline\cite{ayoub01}. The potline voltage can be controlled almost
-continuously through a combination of a tap changer and a transductor. The individual cell voltages can be controlled by
-changing the anode to cathode distance (ACD) by physically lowering or raising the anode. The potline power supply is
-connected to the high voltage input and to the potline through isolators and breakers.
-
-In an aluminum smelter most of the power is sunk into resistive losses and the electrolysis process. As such an
-aluminum smelter does not have any significant electromechanical inertia compared to the large rotating machines used
-in other industries. Depending on the capabilities of the rectifier controls high slew rates are possible, permitting
-modulation at high\footnote{Aluminum smelter rectifiers are \emph{pulse rectifiers}. This means instead of simply
-rectifying the incoming three-phase voltage they use a special configuration of transformer secondaries and in some
-cases additional coils to produce a large number of equally spaced phases (e.g.\ six) from a standard three-phase input.
-Where a direct-connected three-phase rectifier would draw current in six pulses per mains voltage cycle a pulse
-rectifier draws current in more, smaller pulses to increase power factor. For example a 12-pulse rectifier will draw
-current in 12 pulses per cycle. In the best case an SCR pulse rectifier switched at zero crossing should allow
-\SIrange{0}{100}{\percent} load changes from one rectifier pulse to the next, i.e. within a fraction of a single cycle.}
-data rates.
-
-\subsubsection{Avoiding dangerous modes}
-
-Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
-feedback loops to ensure voltage, load and frequency regulation. Multiple components are coupled through transmission
-lines that themselves exhibit complex dynamic behavior. The overall system is generally stable, but may exhbit
-instabilities to particular small-signal stimuli\cite{kundur01,crastan03}. These instabilities, called \emph{modes},
-occur when due to mis-tuning of parameters or physical constraints the overall system exhibits oscillation at a
-particular frequency. \cite{kundur01} separates these modes into four categories:
-
-\begin{description}
- \item[Local modes] where a single power station oscillates in some parameter,
- \item[Interarea modes] where subsections of the overall grid oscillate with respect to each other due to weak
- coupling between them,
- \item[Control modes] caused by imperfectly tuned control systems and
- \item[Torsional modes] that originate from electromechanical oscillations in the generator itself.
-\end{description}
-
-The oscillation frequencies associated with each of these modes are usually between a few tens of Millihertz and a few
-Hertz\cite{grebe01,entsoe01,crastan03}. It is hard to predict the particular modes of a power system at the scale of the
-central European interconnected system. Theoretical analysis and simulation may give rough indications but cannot yield
-conclusive results. Due to the obvious danger as well as high economical impact due to inefficiencies experimental
-measurements are infeasible. Modes are highly dependent on the power grid's structure and will change with changes in
-the power grid over time. For all of these reasons, a grid frequency modulation system must be designed very
-conservatively without relying on the absence (or presence) of modes at particular frequencies. A concrete design
-guideline that we can derive from this situation is that the frequency spectrum of any grid frequency modulation system
-should not exhibit large peaks and should avoid a concentration of spectral energy in small frequency bands.
-
-\subsubsection{Overall system parameters}
-
-In conclusion we end up with the following tunable parameters for a grid frequency modulation based on a large
-controllable load:
-
-\begin{description}
- \item[Modulation amplitude.] Amplitude is proportionally related to modulation power. In a practical setup we might
- realize a modulation power up to a few hundred \si{\mega\watt} which would yield a few tens of \si{\milli\hertz}
- of frequency amplitude.
- \item[Modulation preemphasis and slew-rate control.] Preemphasis might be necessary to ensure an adequate
- Signal-to-Noise ratio (SNR) at the receiver. Slew-rate control and other shaping measures might be necessary to
- reduce the impact of these sudden load changes on the transmitter's primary function (say, aluminum smelting)
- and to prevent disturbances to other grid components.
- \item[Modulation frequency.] For a practical implementation a careful study would be necessary to determine the
- optimal frequency band for operation. On one hand we need to prevent disturbances to the grid such as the
- excitation of local or inter-area modes. On the other hand we need to optimize Signal-to-Noise ratio (SNR)
- and data rate to achieve optimal latency between transmission start and reset completion and to reduce the
- overall burden on both transmitter and grid.
- \item[Further modulation parameters.] The modulation itself has numerous parameters that are discussed in Section
- \ref{mod_params} below.
-\end{description}
-
-\section{From grid frequency to a reliable communication channel}
-Based on the physical properties oulined above we will provide the theoretical groundwork for a practical communication
-system based on grid frequency modulation.
-
-\subsection{Channel properties}
-In this section we will explore how we can construct a reliable communication channel from the analog primitive we
-have outlined in the previous section. Our load control approach to grid frequency modulation leads to a channel with the
-following properties.
-
-\begin{description}
- \item[Slow-changing.] Accurate grid frequency measurements take several periods of the mains sine wave. Faster
- sampling rates can be achieved with more complex specialized synchrophasor estimation algorithms but this will
- result in a trade-off between sampling rate and accuracy\cite{belega01}.
- \item[Analog.] Grid frequency is an analog signal.
- \item[Noisy.] While stable over long periods of time thanks to power stations' Load-Frequency Control
- systems\cite{entsoe04} there are considerable random short-term variations. Our modulation amplitude is limited
- by technical and economic constraints so we have to find a system that will work at poor SNRs.
- \item[Polarized.] Grid frequency measurements have an inherent sense of polarity that we can use in our modulation
- scheme.
-\end{description}
-
-\subsection{Modulation and its parameters}
-\label{mod_params}
-
-In this section we will analyze what makes for a good set of parameters for a modulation scheme fitting grid frequency
-modulation.
-
-As described before the grid's oscillatory modes mean that we should avoid any modulation technique that would
-concentrate energy in a small bandwidth. Taking this principle to its extreme provides us with a useful pointer towards
-techniques that might work well: Spread-spectrum techniques. By employing spread-spectrum modulation we can produce
-close to ideal frequency-domain behavior. Modulation energy is spread almost flatly across the modulation
-bandwidth\cite{goiser01}. At the same time we achieve modulation gain which increases system sensitivity. This
-modulation gain potentially allows us to use a weaker stimulus allowing for a further reduction of the probability of
-disturbance to the overall system. Spread-spectrum techniques also inherently allow us to trade-off receiver sensitivity
-for data rate. This tunability is a useful parameter in the overall system design.
-
-Spread spectrum covers a whole family of techniques that are comprehensively explained in \cite{goiser01}.
-\cite{goiser01} divides spread spectrum techniques into the coarse categories of \emph{Direct Sequence Spread Spectrum},
-\emph{Frequency Hopping Spread Spectrum} and \emph{Time Hopping Spread Spectrum}.
-
-In \cite{goiser01} a BPSK or similar modulation is assumed underlying the spread-spectrum technique. Our grid frequency
-modulation channel effectively behaves more like a DC-coupled wire than a traditional radio channel: Any change in
-excitation will cause a proportional change in the receiver's measurement. Using our FFT-based measurement methodology
-we get a real-valued signed quantity. In this way grid frequency modulation is similar to a channel using coherent
-modulation. We can utilize both signal strength and polarity in our modulation.
-
-For our purposes we can discount both Time and Frequency Hopping Spread Spectrum techniques. Time hopping helps to
-reduce interference between multiple transmitters but does not help with SNR any more than Direct Sequence does since
-all it does is allowing other transmitters to transmit. Our system is strictly limited to a single transmitter so we do
-not gain anything through Time Hopping.
-
-Frequency Hopping Spread Spectrum techniques require a carrier. Grid frequency modulation itself is very limited in
-peak frequency deviation $\Delta f$. Frequency hopping could only be implemented as a second modulation on top of GFM,
-but this would not yield any benefits while increasing system complexity and decreasing data bandwidth.
-
-Direct Sequence Spread Spectrum is the only remaining approach for our application. Direct Sequence Spread Spectrum
-works by directly modulating a long pseudo-random bit sequence onto the channel. The receiver must know the same
-pseudo-random bit sequence and continuously calculates the correlation between the received signal and the pseudo-random
-template sequence mapped from binary $[0, 1]$ to bipolar $[1, -1]$. The pseudo-random sequence has an approximately equal
-number of $0$ and $1$ bits. The positive contribution of the $+1$ terms of the correlation template approximately cancel
-out with the $-1$ terms when multiplied with an uncorrelated signal such as white Gaussian noise.
-
-By using a family of pseudo-random sequences with low cross-correlation channel capacity can be increased. Either the
-transmitter can encode data in the choice of sequence or multiple transmitters can use the same channel at once. The
-longer the pseudo-random sequence, the lower its cross-correlation with noise or other pseudo-random sequences of the
-same length. Choosing a long sequence we increase modulation gain while decreasing bandwidth. For any given application
-the sweet spot will be the shortest sequence that is long enough to yield sufficient SNR for subsequent processing
-layers such as channel coding.
-
-A popular code used in many DSSS systems are Gold codes. A set of Gold codes has small cross-correlations. For some
-value $n$ a set of Gold codes contains $2^n + 1$ sequences of length $2^n - 1$. Gold codes are generated from two
-different maximum length sequences generated by linear feedback shift registers (LFSRs). For any bit count $n$ there are
-certain empirically determined preferred pairs of LFSRs that produce Gold codes with especially good cross-correlation.
-The $2^n + 1$ gold codes are defined as the XOR sum of both LFSR sequences shifted from $0$ to $2^n-1$ bit as well as
-the two individual LFSR sequences. Given LFSR sequences \texttt{a} and \texttt{b} in numpy notation this is
-\mintinline{python}{[a, b] + [ a ^ np.roll(b, shift) for shift in len(b) ]}.
-
-In DSSS modulation the individual bits of the DSSS sequence are called \emph{chips}. Chip duration determines modulation
-bandwidth\cite{goiser01}. In our system we are directly modulating DSSS chips on mains frequency without an underlying
-modulation such as BPSK as it is commonly used in DSSS systems.
-
-\subsection{Error-correcting codes}
-
-To reduce reception error rate we have to layer channel coding on top of the DSSS modulation. The messages we expect to
-transmit are at least a few tens of bits long. We are highly constrained in SNR due to limited transmission power and
-with lower SNR comes higher BER (Bit Error Rate). At a fixed BER, packet error rate grows exponentially with
-transmission length so for our relatively long transmissions we would realistically get unacceptable error rates.
-
-Error correcting codes are a very broad field with many options for specialization. Since we are implementing only an
-advanced prototype in this thesis we chose to spend only limited resources on optimization and settled on a basic
-Reed-Solomon code. We have no doubt that applying a more state-of-the-art code we could gain further improvements in
-code overhead and decoding speed among others\cite{mackay01}. Since message length in our system limits system response
-time but we do not have a fixed target we can tolerate some degree of overhead. Decoding speed is of very low concern
-to us because our data rate is extremely low. We derived our implementation by adapting and optimizing an existing open
-source decoder that we validated on an open source encoder implementation. We generate test signals using a Python tool
-on the host.
-
-\subsection{Cryptographic security}
-\label{sec-crypto}
-Above the communication base layer elaborated in the previous section we have to layer a cryptographic protocol to
-ensure system security. We want to avoid a case where a third party could interfere with our system or even subvert this
-safety system itself for an attack. From a protocol security perspective the system we are looking for can informally
-be modelled as consisting of three parties: the trusted \emph{transmitter}, one of a large number of untrusted
-\emph{receivers}, and an \emph{attacker}. These three play according to the following rules:
-
-\begin{description}
- \item[Access.] Both transmitter and attacker can transmit any bit sequence.
- \item[Indistinguishability.] The receiver receives any transmission by either but cannot distinguish between them.
- \item[Kerckhoff's principle.] Since the protocol design is public and anyone can get access to an electricity meter
- the attacker knows anything any receiver might know\cite{kerckhoff01,kerckhoff02}.
- \item[Priority.] The transmitter is stronger than an attacker and will ``win'' during simultaneous transmission.
- \item[Seeding.] Both transmitter and receiver can be seeded out-of-band with some information on each other such as
- public key fingerprints.
-\end{description}
-
-We are not considering situations where an attacker attempts to jam an ongoing transmission. In practice there are
-several avenues to prevent such attempts. Compromised large loads that are being abused by the attacker can be manually
-disconnected by the utility. Error-correcting codes can be used to provide resiliency against small-scale disturbances.
-Finally, the transmitter can be designed to have high enough power to be able to override any likely attacker.
-
-With the above properties in mind our goal is to find a cryptographic primitive that has the following properties:
-\begin{description}
- \item[Authentication.] The transmitter can produce a message bit sequence that a certain subset of receivers can
- identify as being generated by the transmitter. On reception of this sequence, all addressed receivers perform a
- safety reset.
- \item[Unforgeability.] The attacker cannot forge a message, i.e.\ find a bit sequence other than one of the
- transmitter's previous messages that a receiver would accept. This implies that the attacker also cannot create
- a new distinct message from a previously transmitted message.
- \item[Brevity.] The message should be short. Our communication channel is outrageously slow compared to anything
- else used in modern telecommunications and every bit counts.
-\end{description}
-
-On a protocol level we also have to ensure \emph{idempotence}. Our system should have an at-most-once semantic. This
-means for a given message each receiver either performs exactly one safety reset or none at all, even if the message is
-re-transmitted by either the transmitter or an attacker. We cannot achieve the ideal exactly-once semantic wit pure
-protocol gymnastics since we are using an unidirectional lossy communication primitive. A receiver might be offline
-(e.g.\ due to a local power outage) and then would not hear the transmission even if our broadcast primitive was
-reliable. Since there is no back channel, the transmitter has no way of telling when that happens. The practical impact
-of this can be mitigated by the transmitter repeating the message a number of times.
-
-It follows from the unforgeability requirement that we can trivially reach idempotence at the protocol level by keeping
-a database of all previous messages and only accepting new messages. By considering this in our cryptographic design we
-can reduce the storage overhead of this ``database''.
-
-Along with the indistinguishability property the access requirement implies that we need a cryptographic
-signature\cite{lamport01}. However, we have relaxed constraints on this signature compared to standard cryptographic
-practice\cite{anderson04}. While cryptographic signatures need to work over arbitrary inputs, all we want to ``sign''
-here is the instruction to perform a safety reset. This is the only message we might ever want to transmit so our
-message space has only one element. The information content of our message thus is 0 bit! All the information we want to
-transmit is already encoded \emph{in the fact that we are transmitting} and we do not require a further payload to be
-transmitted: We can omit the entirety of the message and just transmit whatever ``signature'' we
-produce\cite{haller01,rfc1760}. This is useful to conserve transmission bits so our transmission does not take an
-exceedingly long time over our extremely slow communication channel.
-
-We can modify this construction to allow for a small number of bits of information content in our message (say two or
-three instead of zero) at no transmission overhead by transmitting the cryptographic signature as usual but simply
-omitting the message. The message contains only a few bits of information and we are dealing with minutes of
-transmission time so the receiver can reconstruct the message through brute-force. Though this trade-off between
-computation and data transmission might seem inelegant it does work for our extremely slow link for up to a few bits of
-information.
-
-There is an important limitation in the rules of our setup above: The attacker can always record the reset bit sequence
-the transmitter transmits and replay that same sequence later. Even without cryptography we can trivially prevent an
-attacker from violating the at-most-once criterion. If every receiver memorizes all bit sequences that have been
-transmitted so far it can detect replays. With this mitigation by replaying an older authentic transmission an attacker
-can cause receivers that were offline during the original transmission to reset at a later point. Considering our goal
-is to reset them in the first place this should not pose a threat to the system's safety or security.
-
-A possible scenario would be that an attacker first causes enough havoc for authorities to trigger a safety reset. The
-attacker would record the trigger transmission. We can assume most meters were reset during the attack. Due to this the
-attacker cannot cause a significant number of additional resets immediately afterwards. However, the attacker could
-wait several years for a number of new meters to be installed that might not yet have updated firmware that includes the
-last transmission. This means the attacker could cause them to reset by replaying the original sequence.
-
-A possible mitigation for this risk would be to introduce one bit of information into the trigger message that is
-ignored by the replay protection mechanism. This \emph{enable} bit would be $1$ for the actual reset trigger message.
-After the attack the transmitter would then perform scheduled transmissions of a ``disarm'' message that has this bit
-set to $0$. This message informs all new meters and meters that were offline during the original transmission of the
-original transmission for replay protection without actually performing any further resets.
-
-We could use any of several traditional asymmetric cryptographic primitives to produce these signatures. The
-comparatively high computational effort required for signature verification would not be an issue. Transmissions take
-several minutes anyway and we can afford to spend some tens of seconds even in signature verification. Transmission
-length and by proxy system latency would be determined by the length of the signature. For RSA signature length is the
-modulus length (i.e. larger than \SI{1000}{bit} for very basic contemporary security). For elliptic curve-based systems
-curve length is approximately twice the security level and signature size is twice the curve length because two curve
-points need to be encoded\cite{anderson02}. For contemporary security this results in more than 300 bit transmission
-length. We can exploit our unique setting's low message entropy to improve on this by basing our scheme on a
-cryptographic hash function used as a one-way pseudo-random function (PRF). Hash-based signature schemes date back to
-the very beginnings of cryptographic signatures\cite{anderson04,diffie01,lamport02}. Today, in general applications
-schemes based on asymmetric cryptography are preferred but hash-based signature systems have their applications in
-certain use cases. One example of such a scheme is the TESLA scheme\cite{perrig01} that is the basis for navigation
-message authentication in the European Galileo global navigation satellite system. Here, a system based purely on
-asymmetric primitives would result in too much computation and communication overhead\cite{ec05}. In the following
-sections we will introduce the foundations of hash-based signatures before deriving our authentication scheme.
-
-\subsubsection{Lamport signatures}
-
-1979, Lamport in \cite{lamport02} introduced a signature scheme that is based only on a one-way function such as a
-cryptographic hash function. The basic observation is that by choosing a random secret input to a one-way function and
-publishing the output, one can later prove knowledge of the input simply by publishing it. In the following paragraphs
-we will describe a construction of a one-time signature scheme based on this observation. The scheme we describe is the
-one usually called a ``Lamport Signature'' in modern literature but is slightly different from the variant described in
-the 1979 paper. For our purposes we can consider both to be equivalent.
-
-\paragraph{Setup.} In a Lamport signature, for an n-bit hash function $H$ the signer generates a private key $s =
-\left(s_{b, i} | b\in\left\{0, 1\right\}, 0\le i<n\right)$ of $2n$ random strings of length $n$. The signer publishes a
-public key $p = \left(p_{b, i} = H\left(s_{b, i}\right), b\in\left\{0, 1\right\}, 0\le i<n\right)$ that is simply the
-list of hashes of each of the random strings that make up the private key.
-
-\paragraph{Signing.} To sign a message $m$, the signer publishes the signature $\sigma = \left(\sigma_i = k_{H(m)_i,
-i}\right)$ where $H(m)_i$ is the $i$-th bit of $H$ applied to $m$. That is, for the $i$-th bit of the message's hash
-$H(m)$ the signer publishes either of $p_{0, i}$ or $p_{1, i}$ depending on the hash bit's value, keeping the other
-entry of $P$ secret.
-
-\paragraph{Verification.} The verifier can compute $H(m)$ themselves and check the corresponding entries $\sigma_i =
-k_{H(m)_i}$ of $S$ correctly evaluate to $p_{b, i} = H\left(s_{b, i}\right)$ from $P$ under $H$.
-
-The above scheme is a one-time signature scheme only. After one signature has been published for a given key, the
-corresponding key must not be reüsed for other signatures. This is intuitively clear as we are effectively publishing
-part of the private key as the signature, and if we were to publish a signature for another message an attacker could
-derive additional signatures by ``mixing'' the two published signatures.
-
-\subsubsection{Winternitz signatures}
-
-An improvement to basic Lamport signatures as described above are Winternitz signatures as detailed in
-\cite{merkle01,dods01}. Winternitz signatures reduce public key length as well as signature length for hash length $n$
-from $2n$ to $\mathcal O \left(n/t\right)$ for some choice of parameter $t$ (usually a small number such as 4).
-
-\paragraph{Setup.} The signer generates a private key $s = \left(s_i\right)$ consisting of $\ceil{\frac{n}{t}}$ random
-bit strings. The signer publishes a public key $p = \left(H^{2^t}\left(s_i\right)\right)$ where each element
-$H^{2^t}\left(s_i\right)$ is the $2^t$-fold recursive application of $H$ to $s_i$.
-
-\paragraph{Signing.} The signer splits $m$ padded to a multiple of $t$ bits into $\ceil{\frac{n}{t}}$ chunks $m_i$ of
-$t$ bit each. The signer publishes the signature $\sigma = \left( \sigma_i = H^{m_i}\left(s_i\right) \right)$.
-
-\paragraph{Verification.} The verifier can calculate for each $\sigma_i = H^{m_i}\left(s_i\right)$ that $H^{2^t -
-m_i}\left(\sigma_i\right) = H^{2^t - m_i}\left(H^{m_i}\left(s_i\right)\right) = H^{2^t - m_i + m_i} \left(s_i\right) =
-p_i$.
-
-To prevent an attacker from forging additional signatures from one signature by calculating $\sigma_i' =
-H\left(\sigma_i\right)$ matching $m_i' = m_i + 1$, this scheme is usually paired with a simple checksum as described in
-\cite{merkle01}.
-
-\subsubsection{Using hash-based signatures for trigger authentication}
-
-Applying these concepts the most basic trigger authentication scheme possible would be to simply generate a random
-secret key bit string $s$ and publish $p = H(s)$ for some hash function $H$. To activate the trigger, $\sigma = s$ is
-published and receivers verify that $H(\sigma) = p = H(s)$. This simplistic scheme has one main disadvantage: It is a
-fundamentally one-time construction. To prevent an attacker from re-triggering a receiver a second time by replaying a
-valid trigger $\sigma$ all receivers have to blacklist any ``used'' $\sigma$. Alas, this means we can only ever trigger
-a receiver \emph{once}. The good part is that any receiver that missed this trigger can still be triggered later, but
-the bad part is that once $s$ is burned we are out of options. The trivial solution to this would be to simply provision
-each receiver with a whole list of public keys in advance. This however takes $n$ times the amount of space for $n$-fold
-retriggerability and for each one we have to memorize separately whether it has been used up. Luckily we can easily
-derive a scheme that yields $n$-fold retriggerability and naturally memorizes replay state while using no more space
-than the original scheme by taking some inspiration from Winternitz signatures.
-
-In this improved scheme the secret key $s$ is still a random bit string. The public key is $p = H^n(s)$ for $n$-times
-retriggerability. The $i$-th time the trigger is activated, $\sigma_i = H^{n-i}(s)$ is published, and every receiver
-can verify that $\sigma_{i-1} = H\left(\sigma_i\right)$ with $\sigma_0 = p$. In case a receiver missed one or more
-previous triggers it continues computing $H\left(H\left(\sigma_i\right)\right)$ and
-$H\left(H\left(H\left(\sigma_i\right)\right)\right)$ and so on until either reaching the $n$-th recursion
-level--indicating an invalid signature--or finding $H^n\left(\sigma_i\right) = \sigma_j$ with $\sigma_j$ being the last
-signature this receiver recorded or $p$ in case there is none.
-
-This scheme provides replay protection since the receiver memorizes the last signature they acted on. Public key length
-is equal to the length of the hash function $H$ used. Even for our embedded systems use case $n$ can realistically be up
-to $\mathcal O\left(10^3\right)$, which is enough for our purposes. This use of a hash chain for event authentication is
-identical to the one in the S/KEY one-time password system\cite{anderson04,haller01,rfc1760}.
-% 1990ies crypto yeah!
-
-The ``disarm'' message we discussed above for replay protection can be integrated into this scheme by encoding the
-``enable'' bit into the least significant bit of $n$ in our $H^n$ construction. In the chain of valid signatures every
-second one would be a disarm signature: Reset and disarm signatures would alternate in this scheme. By skipping a disarm
-signature two resets can still be triggered directly after one another.
-
-In practice it may be useful to have some control over which meters reset. An attack exploiting a particular network
-protocol implementation flaw might only affect one series of meters made by one manufacturer. Resetting \emph{all}
-meters may be too much in this case. A simple solution for this is to define addressable subsets of meters. ``All
-meters'' along with ``meters made by manufacturer $x$'' and ``meters of model $y$'' are good choices for such scopes. On
-the cryptographic level the protocol state is simply duplicated for each scope. This incurs memory and computation
-overhead linear in the number of scopes but device memory requirements are small at a few bytes only and computation is
-of no concern due to the very slow channel so this simple solution is adequate. The transmitter has to either store
-copies of all scope's keys or derive these keys from a root key using the scope's identifier. Keys are small and the
-transmitter would be using a regular server or hardware security module for key management so either easily feasible.
-
-A diagram of the key structure in this key management scheme is shown in Figure \ref{fig:sig_key_chain}. The
-transmitter key management is shown in Figure \ref{fig:tx_scope_key_illu}. This scheme is simplistic but suffices for
-our prototype in Section \ref{sec-prototype} and may even be useful in a practical implementation. During
-standardization of a safety reset system the key management system would most likely have to be customized to the
-particular application's requirements. Developing an universal solution is outside the scope of this work.
-
-\begin{figure}
- \centering
- \begin{minipage}[c]{0.5\textwidth}
- \includegraphics{resources/signature_key_chain}
- \end{minipage}
- \begin{minipage}[c]{0.45\textwidth}
- \caption{
- The hash chain between secret transmitter key and public device key. Each step represents one invocation of the
- hash function. To generate a new chain a random transmitter key is generated, then hashed $n$ times to
- generate the corresponding device key. A new trigger message can be generated by generating the key at depth
- $m-1$ where $m$ is the height of the last used trigger, or $n$ initially. Every second trigger message is a
- disarm message and every second one a reset message. Depending on which is needed either one may be skipped.
- }
- \label{fig:sig_key_chain}
- \end{minipage}
-\end{figure}
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{resources/transmitter_scope_key_illustration}
- \caption{
- An illustration of a key management system using a common master key. First, the transmitter derives one secret
- key for each addressable group from the master key. Then public device keys are generated like in Figure
- \ref{fig:sig_key_chain}. Finally for each device the manufacturer picks the group public keys matching the
- device. In this example one device is a series A meter made by manufacturer B so it gets provisioned with the
- keys for the ``all devices'', ``manufacturer B'' and ``series A'' groups. The other device is also made by
- manufacturer B but is a series C device so it gets provisioned with the ``all devices'', ``manufacturer B'' and
- ``series C'' device keys. In this example the transmitter stores (or is able to derive) all six shown
- group keys, but each device only needs to store the three applying to it--one for each of the three scopes ``all
- devices'', ``manufacturer'' and ``series''.
- }
- \label{fig:tx_scope_key_illu}
-\end{figure}
-
-\chapter{Practical implementation}
-
-To validate the practical feasibility of the theoretical concepts we laid out in the previous chapter we decided to
-build a prototype of a safety reset controller. In this section we describe the reasoning behind the components of this
-prototype and the engineering that went into its firmware. The prototype consists of a smart meter whose application
-microcontroller is reset by a microcontroller on an external circuit board. We lay out how we extensively
-tested all parts of our firmware implementation. We conclude with results of a practical end-to-end experiment
-exercising every part of our prototype.
-
-\section{Data collection for channel validation}
-
-To design a solid system we needed to parametrize mains frequency variations under normal conditions. To set modulation
-amplitude as well as parameters of our modulation scheme we need a frequency spectrum of mains frequency variations
-(that is $\mathcal F\left(f(V(t))\right)$: Taking mains frequency $f(x)$ as a variable, the frequency spectrum of that
-variable, as opposed to the frequency spectrum of mains voltage $V(t)$ itself).
-
-\subsection{Grid frequency estimation}
-\label{frequency_estimation}
-
-In commercial power systems Phasor Measurement Units (PMUs, also called \emph{synchrophasors}) are used to precisely
-measure parameters of the mains voltage waveform, one of which is grid frequency. PMUs are used as part of SCADA systems
-controlling transmission networks to characterize the operational state of the network.
-
-From a superficial viewpoint measuring grid frequency might seem like a simple problem. Take the mains voltage waveform,
-measure time between two rising-edge (or falling-edge) zero-crossings and take the inverse $f = t^{-1}$. In practice,
-phasor measurement units are significantly more complex than this. This discrepancy is due to the combination of both
-high precision and quick response that is demanded from these units. High precision is necessary since variations of
-mains frequency under normal operating conditions are quite small--in the range of \SIrange{5}{10}{\milli\hertz} over
-short intervals of time. Relative to the nominal \SI{50}{\hertz} this is a derivation of less than \SI{100}{ppm}.
-Relative to the corresponding period of \SI{20}{\milli\second} this means a time derivation of about $2 \mu\text{s}$
-from cycle to cycle. From this it is already obvious why a simplistic measurement cannot yield the required precision
-for manageable averaging times: We would need either an ADC sampling rate in the order of megabits per second or for a
-reconstruction through interpolated readings an impractically high ADC resolution.
-
-Detail on the inner workings of commercial phasor measurement units is scarce but given their essential role to SCADA
-systems there is a large amount of academic research on such algorithms\cite{narduzzi01,derviskadic01,belega01}. A
-popular approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high
-sampling rate (e.g. \SI{10}{\kilo\hertz}) and then perform analysis on the frequency-domain data to precisely locate the
-peak at \SI{50}{\hertz}. A key observation here is that FFT bin size is going to be much larger than required frequency
-resolution. This fundamental limitation follows from the Nyquist criterion\cite{shannon01}
-and if we had to process an \emph{arbitrary} signal this would severely limit our practical measurement accuracy
-\footnote{
- Some software packages providing FFT or STFT primitives such as scipy\cite{virtanen01} allow the user to
- super-sample FFT output by specifying an FFT width larger than input data length, padding the input data with zeros
- on both sides. Note that in line with the Nyquist theorem this \emph{does not} actually provide finer output
- resolution but instead just amounts to an interpolation between output bins. Depending on the downstream analysis
- algorithm it may still be sensible to use this property of the DFT for interpolation, but in general it will be
- computationally expensive compared to other interpolation methods and in any case it will not yield any better
- frequency resolution aside from a potential numerical advantage\cite{gasior02}.
-}.
-For this reason all approaches to grid frequency estimation are based on a model of the voltage waveform. Nominally
-this waveform is a perfect sine at $f=\SI{50}{\hertz}$. In practice it is a sine at $f\approx\SI{50}{\hertz}$
-superimposed with some aperiodic noise (e.g. irregular spikes from inductive loads being energized) as well as harmonic
-distortion that is caused by topologically nearby devices with power factor $\cos \theta \neq 1.0$. Under a continuous
-fourier transform over a long period the frequency spectrum of a signal distorted like this will be a low noise floor
-depending mainly on aperiodic noise on which a comb of harmonics as well as some sub-harmonics of $f \approx
-f_\text{nom} = \SI{50}{\hertz}$ is riding. The main peak at $f \approx f_\text{nom}$ will be very strong with the
-harmonics being approximately an order of magnitude weaker in energy and the noise floor being at least another order of
-magnitude weaker. See Figure \ref{mains_voltage_spectrum} for a measured spectrum. This domain knowledge about the
-expected frequency spectrum of the signal can be employed in a number of interpolation techniques to reconstruct the
-precise frequency of the spectrum's main component despite distortions and the comparatively coarse STFT resolution.
-
-Published grid frequency estimation algorithms such as \cite{narduzzi01,derviskadic01} are rather sophisticated and use
-a combination of techniques to reduce numerical errors in FFT calculation and peak fitting. Given that we do not need
-reference standard-grade accuracy for our application we chose to start with a very basic algorithm instead. We chose to
-use a general approach to estimate the precise fundamental frequency of an arbitrary signal that was published by
-experimental physicists Gasior and Gonzalez at CERN\cite{gasior01}. This approach assumes a general sinusoidal signal
-superimposed with harmonics and broadband noise. Applicable to a wide spectrum of practical signal analysis tasks it is
-a reasonable first-degree approximation of the much more sophisticated estimation algorithms developed specifically for
-power systems. Some algorithms use components such as kalman filters\cite{narduzzi01} that require a physical model.
-As a general algorithm \cite{gasior01} does not require this kind of application-specific tuning, eliminating one source
-of error.
-
-The Gasior and Gonzalez algorithm\cite{gasior01} passes the windowed input signal through a DFT, then interpolates the
-signal's fundamental frequency by fitting a wavelet such as a Gaussian to the largest peak in the DFT results. The bias
-parameter of this curve fit is an accurate estimation of the signal's fundamental frequency. This algorithm is similar
-to the simpler interpolated DFT algorithm used as a reference in much of the synchrophasor estimation
-literature\cite{borkowski01}. The three-term variant of the maximum side lobe decay window often used there is a
-Blackman window with parameter $\alpha = \frac{1}{4}$. Analysis has shown\cite{belega01} that the interpolated DFT
-algorithm is worse than algorithms involving more complex models under some conditions but that there is \emph{no free
-lunch} meaning that more complex perform worse when the input signal deviates from their models.
-
-\subsection{Frequency sensor hardware design}
-
-\label{sec-fsensor}
-Our safety reset controller will have to measure mains frequency to later demodulate a reset signal transmitted through
-it. Since we have decided to do our own frequency measurement system here we can reüse this frequency measurement setup
-as a prototype for the frequency measurement component of the demodulation system we will develop later. Since we do
-not plan to do a large-scale field deployment of our measurement setup we can keep the hardware implementation simple by
-moving most of the signal processing to a regular computer and concentrating our hardware efforts on raw signal capture.
-
-\begin{figure}
- \begin{center}
- \begin{tikzpicture}[start chain = going below, node distance = 12mm and 50mm, every join/.style = {norm}]
- \tikzset{
- base/.style = {draw, on chain, on grid, align=center, minimum height = 4ex, font=\footnotesize},
- text/.style = {base},
- component/.style = {base, rectangle, text width=40mm},
- coord/.style = {coordinate, on chain, on grid, node distance=6mm and 25mm}
- }
- \node[text centered] (input) {Single phase mains input};
- \node[component] (safety) [below = of input] {Input protection};
- \node[coord] (safety-anchor) [below = of safety] {};
- \node[component] (analog) [below = of safety-anchor] {Analog signal processing};
- \node[component] (powersupply) [left = of analog] {Power supply};
- \node[component] (adc) [below = of analog] {ADC};
- \node[component] (micro) [below = of adc] {Microcontroller};
- \node[component] (isol) [below = of micro] {Galvanic digital isolation};
- \node[coord] (isol-left) [left = 6cm of isol.west] {};
- \node[coord] (isol-right) [right = 1cm of isol.east] {};
- \node[component] (usb) [below = of isol] {USB interface};
-
- \draw[->] (input.south) -- (safety.north);
- \draw[-] (safety.south) -- (safety-anchor);
- \draw[->] (safety-anchor) -| (powersupply.north);
- \draw[->] (safety-anchor) -| (analog.north);
- \draw[->] (powersupply.south) |- (adc.west);
- \draw[->] (powersupply.south) |- (micro.west);
- \draw[->] (analog.south) -- (adc.north);
- \draw[->] (adc.south) -- (micro.north);
- \draw[->] (micro.south) -- (isol.north);
- \draw[->] (isol.south) -- (usb.north);
-
- \draw[dashed] (isol.west) -- (isol-left.east);
- \draw[dashed] (isol.east) -- (isol-right.west);
- \end{tikzpicture}
- \end{center}
- \caption{Frequency sensor hardware block diagram.}
- \label{fmeas-sens-diag}
-\end{figure}
-
-An overall block diagram of our system is shown in Figure \ref{fmeas-sens-diag}. The microcontroller we chose is an
-\texttt{STM32F030F4P6} ARM Cortex M0 microcontroller made by ST Microelectronics. The ADC in Figure
-\ref{fmeas-sens-diag} in our implementation is the integrated 12-bit ADC of this microcontroller, which is sufficient
-for our purposes. The USB interface is a simple USB to serial converter IC (\texttt{CH340G}) and the galvanic digital
-isolation is accomplished with a pair of high speed optocouplers on its \texttt{RX} and \texttt{TX} lines. The analog
-signal processing is a simple voltage divider using high power resistors to get the required creepage along with some
-high frequency filter capacitors and an op-amp buffer. The power supply is an off-the-shelf mains-input power module.
-The system is implemented on a single two-layer PCB that is housed in an off-the-shelf industrial plastic case fitted
-with a printed label and a few status lights on its front. The schematics of our system can be found in Appendix
-\ref{sec-app-freq-sens-schematics}.
-
-\subsection{Clock accuracy considerations}
-
-Our measurement hardware will sample line voltage at some sampling rate $f_S$, e.g.\ \SI{1}{\kilo\hertz}. All downstream
-processing is limited in accuracy by the accuracy of $f_S$\footnote{
-We are not considering the effect of clock jitter. We are highly oversampling the signal and the FFT done in our
-downstream processing will average out small jitter effects leaving only frequency stability to worry about. }. We
-generate our sampling clock in hardware by clocking the ADC from one of the microcontroller's timer blocks clocked from
-the microcontroller's system clock. This means our ADC's sampling window will be synchronized cycle-accurate to the
-microcontroller's system clock.
-
-Our downstream estimation of mains frequency by nature is relative to our sampling frequency $f_S$. In the setup
-described above this means we have to make sure our system clock is stable. A frequency deviation of \SI{1}{ppm} in our
-system clock causes a proportional grid frequency measurement error of $\Delta f = f_\text{nom} \cdot 10^{-6} =
-\SI{50}{\micro\hertz}$. In a worst-case scenario where our system is clocked from a particularly bad crystal that
-exhibits \SI{100}{ppm} of instabilities over our measurement period we end up with an error of \SI{5}{\milli\hertz}.
-This is well within our target measurement range, so we need a more stable clock source. Ideally we want to avoid
-writing our own clock conditioning code where we try to change an oscillators operating frequency to match some
-reference. Clock conditioning algorithms are complex\cite{ti01} and in our case post processing of measurement data and
-simply adding an offset is simpler and less error-prone.
-
-Our solution to these problems is to use a crystal oven\footnote{
- A crystal oven is a crystal oscillator closely thermally coupled to a heater and temperature sensor and enclosed in
- a thermally isolated case. The heater is controlled to hold the crystal oscillator at a near constant temperature
- some tens of degrees Celsius above ambient temperature. Ambient temperature variations will be absorbed by the
- temperature control. This yields a crystal frequency that is almost completely unaffected by ambient temperature
- variations below the oven temperature and whose main remaining instability is aging.
-}as our main system clock source. Crystal ovens are expensive compared to ordinary crystal oscillators. Since any
-crystal oven will be much more accurate than a standard room-temperature crystal we chose to reduce cost by using one
-recycled from old telecommunications equipment.
-
-To verify clock accuracy we routed an externally accessible SMA connector to a microcontroller pin that is routed to one
-of the microcontroller's timer inputs. By connecting a GPS 1pps signal to this pin and measuring its period we can
-calculate our system's Allan variance\footnote{
- Allan variance is a measure of frequency stability between two clocks.
-}, thereby measuring both clock stability and clock accuracy.
-We ran a 4 hour test of our frequency sensor that generated the histogram shown in Figure \ref{ocxo_freq_stability}.
-These results show that while we get a systematic error of about \SI{10}{ppm} due to manufacturing tolerances the
-random error at less than \SI{10}{ppb} is smaller than that of a room-temperature crystal oscillator by 3-4 orders of
-magnitude. Since we are interested in grid frequency variations over time but not in the absolute value of grid
-frequency the systematic error is of no consequence to us. The random error at \SI{3.66}{ppb} corresponds to a
-frequency measurement error of about \SI{0.2}{\micro\hertz}, well below what we can achieve at reasonable sampling rates
-and ADC resolution.
-
-\begin{figure}
- \centering
- \includegraphics{../lab-windows/fig_out/ocxo_freq_stability}
- \caption{OCXO Frequency derivation from its nominal \SI{19.440}{\mega\hertz} frequency measured against a GPS
- receiver's 1pps reference output.}
- \label{ocxo_freq_stability}
-\end{figure}
-
-\subsection{Firmware implementation}
-
-The firmware uses one of the microcontroller's timers clocked from an external crystal oscillator to produce an
-\SI{1}{\milli\second} tick that the internal ADC is triggered from for a sample rate of \SI{1}{\kilo sps}. Higher sample
-rates would be possible but reliable data transmission over the opto-isolated serial interface might prove challenging
-and \SI{1}{\kilo sps} already corresponds to $20$ samples per cycle at $f_\text{nominal}$. This figure exceeds the
-Nyquist criterion by a factor of ten and is plenty for accurate measurements.
-
-The ADC measurements are read using DMA and written into a circular buffer. Using DMA controller features this
-circular buffer is split in back and front halves with one being written to and the other being read at the same time.
-Buffer contents are moved from the ADC DMA buffer into a packet-based reliable UART interface as they come in. The UART
-packet interface keeps two ring buffers: One byte-based ring buffer for transmission data and one ring buffer pointer
-structure that keeps track of ADC data packet boundaries in the byte-based ring buffer. Every time a chunk of data is
-available from the ADC the data is framed into the byte-based ring buffer and the packet boundaries are logged in the
-packet pointer ring buffer. If the UART transmitter is idle at this time a DMA-backed transmission of the oldest packet
-in the packet ring buffer is triggered at this point. Data is framed using Consistent Overhead Byte Stuffing
-(COBS)\footnote{
-COBS is a framing technique that allows encoding $n$ bytes of arbitrary data into exactly $n+1$ bytes with no embedded
-$0$ bytes that can then be delimited using $0$ bytes. COBS is simple to implement and allows both one pass decoding and
-encoding. The encoder either needs to be able to read up to \SI{256}{\byte} ahead or needs a buffer of \SI{256}{\byte}.
-COBS is very robust in that it allows self-synchronization. At any point a receiver can reliably synchronize itself
-against a COBS data stream by waiting for the next $0$ byte. The constant overhead allows precise bandwidth and buffer
-planning and provides constant, good efficiency close to the theoretical maximum.}\cite{cheshire01} along with a
-CRC-32 checksum for error checking. When the host receives a new packet with a valid checksum it returns an
-acknowledgement packet to the sensor. When the sensor receives the acknowledgement, the acknowledged packet is dropped
-from the transmission packet ring buffer. When the host detects an incorrect checksum it simply stays quiet and waits for
-the sensor to resume with retransmission when the next ADC buffer has been received.
-
-The serial interface logic presents most of the complexity of the sensor firmware. This complexity is necessary since
-we need reliable, error-checked transmission to the host. Though rare, bit errors on a serial interface do happen and
-data corruption is unacceptable. The packet layer queueing on the sensor is necessary since the host is not a realtime
-system and unpredictable latency spikes of several hundred milliseconds are possible.
-
-The host in our recording setup is a Raspberry Pi 3 model B running a Python script. The Python script handles serial
-communication and logs data and errors into an SQLite database file. SQLite has been chosen for its simple yet flexible
-interface and its good tolerance of system resets due to unexpected power loss. Overall our setup performed adequately
-with IO contention on the Raspberry PI/Linux side causing only 16 skipped sample packets over a 68 hour recording span.
-
-\subsection{Frequency sensor measurement results}
-
-\begin{figure}
- \centering
- \begin{minipage}[c]{0.48\textwidth}
- \includegraphics{resources/grid_meas_device_front.jpg}
- \end{minipage}
- \begin{minipage}[c]{0.48\textwidth}
- \includegraphics{resources/grid_meas_device_open.jpg}
- \end{minipage}
- \vspace*{3mm}
- \caption{
- The finished grid frequency sensor device. The large yellow part on the bottom left is the crystal oven. The
- large black part is the power supply module. The microcontroller is on the bottom right of the device and the
- measurement circuit is in its middle. The device connects to the data recording computer via galvanically
- isolated USB on the bottom and to a regular wall socket through the IEC connector on the top of the device.
- }
- \label{pic_freq_sensor}
-\end{figure}
-
-Our completed frequency sensor can be seen in Figure \ref{pic_freq_sensor}. The raw voltage waveform data we captured
-with it has been processed in the Jupyter Lab environment\cite{kluyver01} and grid frequency estimates are extracted as
-described in Section \ref{frequency_estimation} using the Gasior and Gonzalez\cite{gasior01} technique. The Jupyter
-notebook we used for frequency measurement is included with the supplementary materials to this thesis. In Figure
-\ref{freq_meas_feedback} we fed back to the frequency estimator its own output giving us an indication of its numerical
-performance. The result was \SI{1.3}{\milli\hertz} of RMS noise over a \SI{3600}{\second} simulation time. This
-indicates performance is good enough for our purposes. In addition to this we validated our algorithm's performance by
-applying it to the test waveforms from \cite{wright01}. In this test we got errors of \SI{4.4}{\milli\hertz} for the
-\emph{noise} test waveform, \SI{0.027}{\milli\hertz} for the \emph{interharmonics} test waveform and
-\SI{46}{\milli\hertz} for the \emph{amplitude and phase step} test waveform. Full results can be found in Figure
-\ref{freq_meas_rocof_reference}.
-
-Figures \ref{freq_meas_trace} and \ref{freq_meas_trace_mag} show our measurement results over a 24-hour and a 2-hour
-window respectively.
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_feedback}
- \caption{
- The frequency estimation algorithm applied to a synthetic noise-less mains waveform generated from its own
- output. This feedback simulation gives an indication of numerical errors in our estimation algorithm. The top
- four graphs show a comparison of the original trace (blue) and the re-calculated trace (orange). The bottom
- trace shows the difference between the two. As we can tell both traces agree very well with an overall RMS
- deviation of about \SI{1.3}{\milli\hertz}. The bottom trace shows deviation growing over time. This is an effect
- of numerical errors in our ad hoc waveform generator.
- }
- \label{freq_meas_feedback}
-\end{figure}
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_rocof_reference}
- \caption{
- Performance of our frequency estimation algorithm under the test suite specified in \cite{wright01}. Shown are
- standard deviation and variance measurements as well as time-domain traces of absolute differences.
- }
- \label{freq_meas_rocof_reference}
-\end{figure}
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_trace_24h}
- \caption{Trace of grid frequency over a 24 hour time span. One clearly visible feature are large positive and negative
- transients at full hours. Times shown are UTC. Note that the European continental synchronous area that this
- sensor is placed in covers several time zones which may result in images of daily load peaks appearing in 1 hour
- intervals. Figure \ref{freq_meas_trace_mag} contains two magnified intervals from this plot.}
- \label{freq_meas_trace}
-\end{figure}
-
-\begin{figure}
- \begin{subfigure}{\textwidth}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_trace_2h_1}
- \caption{A 2 hour window centered on 00:00 UTC.}
- \end{subfigure}
- \begin{subfigure}{\textwidth}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_trace_2h_2}
- \caption{A 2 hour window centered on 18:30 UTC.}
- \end{subfigure}
- \caption{Two magnified 2 hour windows of the trace from Figure \ref{freq_meas_trace}.}
- \label{freq_meas_trace_mag}
-\end{figure}
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{../lab-windows/fig_out/mains_voltage_spectrum}
- \caption{Power spectral density of the mains voltage trace in Figure \ref{freq_meas_trace}. Data was captured using
- our frequency measurement sensor (\ref{sec-fsensor}) and FFT-processed after applying a Blackman window. The
- vertical lines indicate \SI{50}{\hertz} and odd harmonics. We can see the expected peak at \SI{50}{\hertz} along
- with smaller peaks at odd harmonics. We can also see a number of spurious tones both between harmonics and at low
- frequencies. We can also see bands containing high noise energy around \SI{0.1}{\hertz}. This graph shows a high
- signal-to-noise ratio that is not very demanding on our frequency estimation algorithm.
- }
- \label{mains_voltage_spectrum}
-\end{figure}
-
-\section{Channel simulation and parameter validation}
-\label{sec-ch-sim}
-
-To validate all layers of our communication stack from modulation scheme to cryptography we built a prototype
-implementation in Python. Implementing all components in a high level language builds up familiarity with the concepts
-while taking away much of the implementation complexity. For our demonstrator we will not be able to use Python since
-our target platform is an inexpensive low-end microcontroller. Our demonstrator firmware will have to be written in a
-low-level language such as C or Rust. For prototyping these languages lack flexibility compared to Python.
-
-To validate our modulation scheme we first performed a series of simulations on our Python demodulator prototype
-implementation. To simulate a modulated grid frequency signal we added noise to a synthetic modulation signal. For most
-simulations we used measured frequency data gathered with our frequency sensor. We only have a limited amount of capture
-data. Re-using segments of this data as background noise in multiple simulation runs could lead to our simulation
-results depending on individual features of this particular capture that would be common between all runs. To estimate
-the impact of this problem we re-ran some of our simulations with artificial random noise synthesized with a power
-spectral density matching that of our capture. To do this, we first measured our capture's PSD, then fitted a
-low-resolution spline to the PSD curve in log-log coördinates. We then generated white noise, multiplied the resampled
-spline with the DFT of the synthetic noise and performed an iDFT on the result. The resulting time-domain signal is our
-synthetic grid frequency data. Figure \ref{freq_meas_spectrum} shows the PSD of our measured grid frequency signal. The
-red line indicates the low-resolution log-log spline interpolation used for shaping our artificial noise. Figure
-\ref{simulated_noise_spectrum} shows the PSD of our simulated signal overlaid with the same spline as a red line and
-shows time-domain traces of both simulated (blue) and reference signals (orange) at various time scales. Visually both
-signals look very similar, suggesting that we have found a good synthetic approximation of our measurements.
-
-\begin{figure}
- \centering
- \hspace*{-1.2cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/freq_meas_spectrum}
- \caption{Power spectral density of the 24 hour grid frequency trace in Figure \ref{freq_meas_trace} with some notable
- peaks annotated with the corresponding period in seconds. The $\frac{1}{f}$ line indicates a pink noise spectrum.
- Around a period of \SI{20}{\second} the PSD starts to fall off at about $\frac{1}{f^3}$ until we can make out some
- bumps at periods around $2$ and \SI{3}{\second}. Starting at at around \SI{1}{Hz} we can see a white noise floor in
- the order of \si{\micro\hertz^2\per\hertz}.
- % TODO: where does this noise floor come from? Is it a fundamental property of the grid? Is it due to limitations of
- % our measurement setup (such as ocxo stability/phase noise) ???
- }
- \label{freq_meas_spectrum}
-\end{figure}
-
-\begin{figure}
- \centering
- \hspace*{-1.2cm}
- \includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/simulated_noise_spectrum}
- \caption{Synthetic grid frequency in comparison with measured data. The topmost graph shows the synthetic spectrum
- compared to the spline approximation of the measured spectrum (red line). The other graphs show time-domain
- synthetic data (blue) in comparison with measured data (orange).
- }
- \label{simulated_noise_spectrum}
-\end{figure}
-
-In our simulations, we manipulated four main variables of our modulation scheme and demodulation algorithm and observed
-their impact on symbol error rate (SER):
-
-\begin{description}
- \item[Modulation amplitude.] Higher amplitude corresponds to a lower SER.
- \item[Modulation bit count.] Higher bit count $n$ means longer transmissions but yields higher theoretical decoding
- gain, and should increase demodulator sensitivity. Ultimately, we want to find a sweet spot of manageable
- transmission length at good demodulator sensitivity.
- \item[Decimation or DSSS chip duration.] The chip time determines where in the grid frequency spectrum (Figure
- \ref{freq_meas_spectrum}) our modulated signal is located. Given our noise spectrum (Figure
- \ref{freq_meas_spectrum}) lower chip durations (shifting our signal upwards in the spectrum) should yield lower
- in-band background noise which should correspond to lower symbol error rates.
- \item[Demodulation correlator peak threshold factor.] The first step of our prototype demodulation algorithm is to
- calculate the correlation between all $2^n+1$ Gold sequences and our signal and to identify peaks corresponding
- to the input data containing a correctly aligned Gold sequence. The threshold factor determines peaks of which
- magnitude compared to baseline noise levels are considered in the following maximum likelihood estimation (MLE)
- decoding (cf.\ Figure \ref{fig_demo_sig_schema}).
-\end{description}
-
-Our results indicate that symbol error rate is a good proxy of demodulation performance. With decreasing signal-to-noise
-ratio, margins in various parts of the demodulator decrease which statistically leads to an increased symbol error rate.
-Our simulations yield smooth, reproducible SER curves with adequately low error bounds. This shows SER is related
-monotonically to the signal-to-noise margins inside our demodulator prototype.
-
-\subsection{Sensitivity as a function of sequence length}
-
-A basic parameter of our DSSS modulation is the length of the Gold codes used. The length of a Gold code is exponential
-in the code's bit count. Figure \ref{dsss_gold_nbits_overview} shows a plot of the symbol error rate of our demodulator
-prototype depending on amplitude for each of five, six, seven and eight bit Gold sequences. In regions where symbol
-error rate is neither clipping at $0$ nor at $1$ we can see the expected dependency that a $n+1$ bit Gold sequence at
-roughly twice the length yields roughly one half the SER. We can also observe a saturation effect: At low amplitudes,
-increasing the correlation length does not yield much benefit in SER anymore. In particular at a signal amplitude of
-\SI{2.5}{\milli\hertz} even with asymptotically infinite sequence length our demodulator would still not be able to
-produce a good demodulation. This is likely due to numerical errors in our demodulator. Since Gold codes of more than 7
-bit would yield unacceptably long transmission times this does not pose a problem in practice.
-
-Figure \ref{dsss_gold_nbits_sensitivity} for each bit count shows the minimum signal amplitude at which our demodulator
-crossed below $\text{SER}=0.5$. If we have sufficient transmitter power to allocate selecting either a 5 bit or a 6 bit
-Gold code yields sufficient performance at manageable data rates.
-
-\begin{figure}
- \centering
- \includegraphics[width=0.6\textwidth]{../lab-windows/fig_out/dsss_gold_nbits_overview}
- \caption{
- Symbol Error Rate (SER) as a function of transmission amplitude. The line represents the mean of several
- measurements for each parameter set. The shaded areas indicate one standard deviation from the mean. Background
- noise for each trial is a random segment of measured grid frequency. Background noise amplitude is the same for
- all trials. Shown are four traces for four different DSSS sequence lengths. Using a 5-bit gold code, one DSSS
- symbol measures 31 chips. 6 bit per symbol are 63 chips, 7 bit are 127 chips and 8 bit 255 chips. This
- simulation uses a decimation of 10, which corresponds to an $1 \text{s}$ chip length at our $10 \text{Hz}$ grid
- frequency sampling rate. At 5 bit per symbol, one symbol takes $31 \text{s}$ and one bit takes $6.2 \text{s}$
- amortized. At 8 bit one symbol takes $255 \text{s} = 4 \text{min} 15 \text{s}$ and one bit takes $31.9 \text{s}$
- amortized. Here, slower transmission speed buys coding gain. All else being equal this allows for a decrease
- in transmission power.
- }
- \label{dsss_gold_nbits_overview}
-\end{figure}
-
-\begin{figure}
- \centering
- \begin{minipage}[c]{0.5\textwidth}
- \hspace*{-1cm}\includegraphics[width=1.1\textwidth]{../lab-windows/fig_out/dsss_gold_nbits_sensitivity}
- \end{minipage}\begin{minipage}[c]{0.45\textwidth}
- \caption{
- Amplitude at an SER of 0.5\ in mHz depending on symbol length. Here we can observe an increase of sensitivity
- with increasing symbol length, but we can clearly see diminishing returns above 6 bit (63 chips). Considering
- that each bit roughly doubles overall transmission time for a given data length it seems lower bit counts are
- preferrable if the required transmitter power can be realized.
- }
- \label{dsss_gold_nbits_sensitivity}
- \end{minipage}
-\end{figure}
-
-\subsection{Sensitivity versus peak detection threshold factor}
-
-One of the high level parameters of our demodulation algorithm is the \emph{threshold factor}. This parameter is
-an implementation detail specific to our algorithm and not general to all possible DSSS demodulation algorithms. After
-correlating the input signal against the template Gold sequences our algorithm runs a single channel discrete wavelet
-transform (DWT) on the correlator output to better discriminate peaks from background noise. The output of this DWT is
-then normalized against a running average and then fed into a simple threshold detector. The threshold of this detector
-is our threshold factor. This threshold is the ratio that a correlation peak after DWT has to stand out from long-term
-average background noise to be considered a peak.
-
-The threshold factor is an empirically determined unitless parameter. Low threshold factors yield many false positives
-that in the extreme ultimately overload our MLE estimator's capacity to discard them. Moderate numbers of false
-positives do not pose much of a challenge to our MLE since these spurious peaks have a random time distribution and are
-easily discarded by our MLE's detection of sequences of equally-spaced symbols. High threshold factors lead the
-algorithm to completely ignore some valid peaks. To some degree this can be compensated by our later interpolation step
-for missing peaks but in the extreme will also break demodulation. In our simulations good values lie in the range from
-$4.0$ to $5.5$.
-
-Figure \ref{dsss_thf_amplitude_5678} contains plots of demodulator sensitivity like the one in Figure
-\ref{dsss_gold_nbits_overview}. This time there is one color-coded trace for each threshold factor between $1.5$ and
-$10.0$ in steps of $0.5$. We can see a clear dependency of demodulation performance from threshold factor with both very
-low and very high values breaking the demodulator. The runaway traces that we can see at low threshold factors are
-artifacts of an implementation issue with our prototype code. We later fixed this issue in the demonstrator firmware
-in Section \ref{sec-demo-fw-impl}. For comparison purposes this issue do not matter.
-
-\begin{figure}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/dsss_thf_amplitude_5678}
- \caption{
- SER vs.\ amplitude graph similar to Figure \ref{dsss_gold_nbits_overview} with one color-coded traces for
- threshold factors between $1.5$ and $10.0$. Each graph shows traces for a single DSSS symbol length.
- }
- \label{dsss_thf_amplitude_5678}
-\end{figure}
-
-If we again look at the intercept points where the amplitude traces cross $\text{SER}=0.5$ in these graphs we get the
-plots in Figure \ref{dsss_thf_sensitivity_all_bits}. From this we can conclude that the range between $4.0$ and $5.0$ will
-yield adequate threshold factors for our use case.
-
-\begin{figure}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.1\textwidth]{../lab-windows/fig_out/dsss_thf_sensitivity_5678}
- \caption{
- Graphs of amplitude at $SER=0.5$ for each symbol length as well as asymptotic SER for large amplitudes. Areas
- shaded red indicate that $SER=0.5$ was not reached for any amplitude in the simulated range. The bumps in the 7
- bit and 8 bit graphs are due to the convergence problem we identified above and do not exist in our demonstrator
- implementation. We see that smaller symbol lengths favor lower threshold factors, and that optimal threshold
- factors for all symbol lengths are between $4.0$ and $5.0$.
- }
- \label{dsss_thf_sensitivity_all_bits}
-\end{figure}
-
-\subsection{Chip duration and bandwidth}
-
-A parameter of any DSSS system is the frequency band used for transmission. Instead of specifying absolute frequencies
-in our simulations we expressed DSSS bandwidth through chip duration and Gold sequence length. In our prototype, chip
-duration is specified in grid frequency sampling periods to ease implementation without loss of generalization.
-
-Figure \ref{chip_duration_sensitivity} shows the dependence of symbol error rate at a fixed good threshold factor from
-chip duration. The color bars indicate both chip duration translated to seconds real-time and the resulting symbol
-duration at the given Gold code length. In the lower graphs we show the trace of amplitude at $\text{SER}=0.5$ over chip
-duration like we did in Figure \ref{dsss_thf_sensitivity_all_bits} for threshold factor. In both graphs we can see a
-faint optimum for very short chips with a decrease of sensitivity for long chips. This effect is due to longer chips
-moving the signal band into noisier spectral regions (cf.\ Figure \ref{freq_meas_spectrum}).
-
-\begin{FPfigure}
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_5}
- \vspace*{-1cm}
- \label{chip_duration_sensitivity_5}
- \caption{
- 5 bit Gold code.
- }
- \end{subfigure}
-%\end{figure}
-%\begin{figure}
-% \ContinuedFloat
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_6}
- \vspace*{-1cm}
- \label{chip_duration_sensitivity_6}
- \caption{
- 6 bit Gold code.
- }
- \end{subfigure}
- \caption{
- Dependence of demodulator sensitivity on DSSS chip duration. Due to computational constraints this simulation is
- limited to 5 bit and 6 bit DSSS sequences. There is a clearly visible sensitivity maximum at short chip
- lengths around $0.2 \text{s}$. Short chip durations shift the entire transmission band up in frequency. In
- Figure \ref{freq_meas_spectrum} we can see that noise energy is mostly concentrated at lower frequencies, so
- shifting our signal up in frequency will reduce the amount of noise the decoder sees behind the correlator by
- shifting the band of interest into a lower-noise spectral region. For a practical implementation chip duration
- is limited by physical factors such as the maximum modulation slew rate ($\frac{\text{d}P}{\text{d}t}$) that can
- be technically realized and the maximum Rate-Of-Change-Of-Frequency (ROCOF, $\frac{\text{d}f}{\text{d}t}$) that
- the grid can tolerate.
- }
- \label{chip_duration_sensitivity}
-\end{FPfigure}
-
-In the previous graphs we have used random clips of measured grid frequency noise as noise in our simulations. Comparing
-between a simulation using measured noise and synthetic noise generated as we outlined in the beginning of Section
-\ref{sec-ch-sim} we get the plots in Figure \ref{chip_duration_sensitivity_cmp}. We can see that while not perfect our
-simulated noise is an adequate approximation of reality: Our prototype demodulator shows no significant difference in
-behavior between measured and simulated noise. Simulated noise causes slightly worse performance for long chips. Overall
-the results for both are very close in absolute value.
-
-\begin{FPfigure}
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_cmp_meas_6}
- \vspace*{-1cm}
- \label{chip_duration_sensitivity_cmp_meas_6}
- \caption{
- Simulation using baseline frequency data from actual measurements.
- }
- \end{subfigure}
-%\end{figure}
-%\begin{figure}
-% \ContinuedFloat
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}\includegraphics[width=1.2\textwidth]{../lab-windows/fig_out/chip_duration_sensitivity_cmp_synth_6}
- \vspace*{-1cm}
- \label{chip_duration_sensitivity_cmp_synth_6}
- \caption{
- Simulation using synthetic frequency data.
- }
- \end{subfigure}
- \caption{
- Chip duration/sensitivity simulation results like in Figure \ref{chip_duration_sensitivity} compared between a
- simulation using measured frequency data like in the previous graphs and one using artificially generated noise.
- There is little visible difference indicating that we have found a good model of reality in our noise
- synthesizer, but also that real grid frequency behaves like a frequency-shaped Gaussian noise process.
- }
- \label{chip_duration_sensitivity_cmp}
-\end{FPfigure}
-
-\section{Implementation of a demonstrator unit}
-\label{sec-prototype}
-
-To demonstrate the viability of our reset architecture we decided to implement a demonstrator system. In this
-demonstrator we use JTAG to reset part of a commodity smart meter from an externally-connected reset controller. The
-reset controller receives its commands over the grid frequency modulation system we outlined in this thesis. To keep
-implementation cost low the reset controller is fed a simulation of a modulated grid frequency signal through a standard
-\SI{3.5}{\milli\meter} audio jack\footnote{
- By generously cutting two PCB traces the meter we chose to use can be easily modified to provide galvanic separation
- between grid and main application microcontroller. With this modification we have to supply power to its main
- application MCU externally along with the JTAG interface but now the modified meter is electrically safe.
-}. Measurement of actual grid frequency instead would simply require a voltage divider and depending on the setup an
-analog optoisolator.
-
-\subsection{Selecting a smart meter for demonstration purposes}
-\label{sec-easymeter}
-
-\begin{figure}[h!]
- \centering
- \begin{subfigure}{\textwidth}
- \centering
- \includegraphics[width=0.6\textwidth]{resources/easymeter_board_composite.jpg}
- \label{easymeter_display_board_composite}
- \caption{
- \footnotesize
- Optical composite image of the display and data logging board in the top of the case. The six pins at the
- top are the SPI chip-on-glass segment LCD. Of the eight pads on the left six are unused and two carry the
- auxiliary power supply from the measurement board below. The bottom right section contains the
- \si{\kilo\watt\hour} impulse LED and the angled IR communication LED. The flying wires
- connect to the 14-pin JTAG and serial debug header.
- }
- \end{subfigure}
- \begin{subfigure}{\textwidth}
- \vspace{1cm}
- \centering
- \includegraphics[width=0.8\textwidth]{resources/easymeter_baseboard_composite.jpg}
- \label{easymeter_measurement_board_composite}
- \caption{
- \footnotesize
- Composite microfocus x-ray image of the potted measurement module in the bottom of the case. The ovals on
- the top left and right are power supply and data jumper connections for external modules such as SMGW
- interfaces. The bright parts at the bottom are the massive screw terminals with integrated current shunts.
- The circuitry right of the three independent measurement channels is the power supply circuit for the
- display board.
- }
- \end{subfigure}
-
- \caption{
- Composite images of the circuit boards inside the EasyMeter Q3DA1002 smart electricity meter used in our
- demonstration.
- }
- \label{easymeter_composites}
-\end{figure}
-
-\begin{figure}[h!]
- \centering
- \begin{subfigure}{0.45\textwidth}
- \centering
- \includegraphics[width=\textwidth]{resources/easymeter_baseboard_channel.jpg}
- \label{easymeter_channel_xray}
- \caption{Microfocus x-ray of one channel's data acquisition circuit.}
- \end{subfigure}\hspace*{5mm}
- \begin{subfigure}{0.45\textwidth}
- \centering
- \includegraphics[width=\textwidth]{resources/easymeter_baseboard_powersupply.jpg}
- \label{easymeter_powersupply_xray}
- \caption{Microfocus x-ray of the auxiliary power supply.}
- \end{subfigure}
-
- \caption{
- Microfocus x-rays of major sections of the EasyMeter Q3DA1002 measurement board.
- }
- \label{easymeter_detail_xrays}
-\end{figure}
-
-For our demonstrator to make sense we wanted to select a realistic reset target. In Germany where this thesis was
-written a standards-compliant setup would consist of a comparatively feature-limited smart meter and a smart meter
-gateway (SMGW) containing all of the complex bidirectional protocol logic such as wireless or landline IP connectivity.
-The realistic target for a setup in this architecture would be the components of an SMGW such as its communication modem
-or main application processor. In the German architecture the smart meter does not even have to have a bi-directional
-data link to the SMGW effectively mitigating any attack vector for remote compromise.
-
-Despite these considerations we still chose to reset the application MCU inside smart meter for two reasons. One is that
-SMGWs are much rarer on the second-hand market. The other is that SMGWs are a particular feature of the German
-standardization landscape and in many other countries functions of an SMGW such as wireless protocol handling are
-integrated into the meter itself (see e.g.\ \cite{honeywell01}).
-
-In the end we settled on a Q3DA1002 three phase 60A meter made by German manufacturer EasyMeter. This meter is typical
-of what would be found in an average German household and can be acquired very inexpensively as new old stock on online
-marketplaces.
-
-The meter consists of a plastic enclosure with a transparent polycarbonate top part and a gray ABS bottom part that are
-ultrasonically welded together. In the bottom part of the case a PCB we call the \emph{measurement} board is potted in
-epoxide resin (see Figure \ref{easymeter_composites}). This PCB contains three separate energy measurement ASICs for the
-three phases (see Figure \ref{easymeter_detail_xrays}). It also contains a capacitive dropper power supply for the meter
-circuitry and external modules such as a SMGW. The measurement board through three infrared links (one per phase)
-communicates with a smaller unpotted PCB we call the \emph{display} board in the top of the case. This PCB handles
-measurement logging and aggregation, controls a small segment LCD displaying totals and handles the externally
-accessible \si{\kilo\watt\hour} impulse LED and serial IR links.
-
-The measurement board does not contain any logging or outside communication interfaces. All of that is handled on the
-display board by a Texas Instruments \texttt{MSP430F2350} application MCU. This is a 16-bit RISC MCU with
-\SI{16}{\kilo\byte} flash and \SI{2}{\kilo\byte} SRAM\footnote{
- At first glance the microcontroller might seem overkill for such a simple application, but most of its
- \SI{16}{\kilo\byte} program flash is in fact used. A casual glance with Ghidra shows that a large part of program
- flash is expended on keeping multiple redundant copies of energy consumption aggregates including error recovery in
- case of data corruption and some effort has even been made to guard against data corruption using simple
- non-cryptographic checksums. Another large part of the MCU's firmware handles data transmission over the meter's
- externally accessible IR link through Smart Message Language\cite{bsi-tr-03109-1-IVb}.
-}. There is an I2C EEPROM that is used in conjunction with the microcontroller's internal \SI{256}{\byte} data flash to
-keep redundant copies of energy consumption aggregates. On the side of the display board there is a 14-pin header
-containing both a standard TI MSP430 JTAG pinout and a UART serial interface for debugging. Conveniently, the JTAG port
-was left enabled by fuse in our particular production unit.
-
-We chose to use this \texttt{MSP430} series application MCU as our reset target. Though in this particular unit remote
-compromise is impossible due to a lack of bidirectional communication links some of its sister models do contain
-bidirectional communication links\cite{easymeter01} making compromise through communication interfaces an at least
-theoretical possibility. In other countries, meters with a similar architecture to the Q3DA1002 include complex protocol
-logic as part of the meter itself or have bidirectional links to it\cite{honeywell01,ifixit01,bigclive01,eevblog01}. As
-an example, the Honeywell REX2 uses a Maxim Integrated \texttt{71M6541} main application microcontroller along with a
-Texas Instruments \texttt{CC1000} series radio transceiver and is advertised to support both over-the-air firmware
-upgrade and a remotely accessible disconnect switch.
-
-\subsection{Firmware implementation}
-\label{sec-demo-fw-impl}
-
-We based our safety reset demonstrator firmware on the grid frequency sensor firmware we developed in Section
-\ref{sec-fsensor}. We implemented DSSS demodulation by translating the Python prototype code we developed in Section
-\ref{sec-ch-sim} to embedded C code. After validating the C translation in extensive simulations we integrated our code
-with a Reed-Solomon implementation and a libsodium-based implementation of the cryptographic protocol we designed in
-Section \ref{sec-crypto}. To reprogram the target \texttt{MSP430} microcontroller we ported the low-level bitbang JTAG
-driver of \texttt{mspdebug}\footnote{\url{https://github.com/dlbeer/mspdebug}}. See Figure \ref{fig_demo_sig_schema} for
-a schematic overview of signal processing in our demonstrator.
-
-For all computation-heavy high level modules of our firmware such as the DSSS demodulator or the grid frequency
-estimator we wrote test fixtures that allow the same code that runs on the microcontroller to be executed on the host
-for testing. These test fixtures are very simple C programs that load input data from a file or the command line, run
-the algorithm and print results on standard output. To enable automatic testing of a large parameter set we run these
-test fixtures repeatedly from a set of Python scripts sweeping parameters.
-
-\begin{figure}
- \centering
- \includegraphics[width=\textwidth]{resources/prototype_schema}
- \caption{The signal processing chain of our demonstrator.}
- \label{fig_demo_sig_schema}
-\end{figure}
-
-\section{Grid frequency modulation emulation}
-
-To emulate a modulated grid frequency signal we superimposed a DSSS-modulated signal at the proper amplitude with
-synthetic grid frequency noise generated according to the measurements we took in Section \ref{sec-fsensor}. In this
-primitive simulation we do not simulate the precise impulse response of the grid to a DSSS-modulated stimulus signal.
-Our results still serve to illustrate the possibility of data transmission in this manner this impulse response can be
-compensated for at the transmitter by selecting appropriate modulation parameters (e.g. chip rate and amplitude) and at
-the receiver by equalization with a matched filter.
-
-\section{Experimental results}
-
-\begin{figure}
- \centering
- \includegraphics[width=0.6\textwidth]{resources/prototype.jpg}
- \caption{The completed prototype setup. The board on the left is the safety reset microcontroller. It is connected
- to the smart meter in the middle through an adapter board. The top left contains a USB hub with debug interfaces to
- the reset microcontroller. The cables on the bottom left are the debug USB cable and the \SI{3.5}{\milli\meter}
- audio cable for the simulated mains voltage input.}
- \label{fig_proto_pic}
-\end{figure}
-
-After extensive simulations and testing of the individual modules of our solution we proceeded to conduct a real-world
-experiment. We tried the demonstrator setup in Figure \ref{fig_proto_pic} using an emulated noisy DSSS signal in
-real-time. Our experiment went without any issues and the firmware implementation correctly reset the demonstrator's
-meter. We were happy to see that our extensive testing paid off: The demonstrator setup worked on its first try.
-
-Our experiment consisted of the demonstrator prototype with the meter flashed with its factory firmware connected to a
-microcontroller development board acting as the safety reset controller. The safety reset controller is connected to a
-laptop's audio output through an adapter board. The laptop plays back an emulated grid voltage waveform that the safety
-reset microcontroller measures and analyzes as it would when directly connected to the mains. When the microcontroller
-receives a reset sequence that is a valid signature using a development key incorporated into its firmware through JTAG
-it re-programs the smart meter with a modified firmware image that displays a success message on the meter's LCD.
-
-We used a signature truncated at 120 bit in our experiment. We chose a 5 bit DSSS sequence. Taking the sign bit into
-account the length of the encoded signature is 20 DSSS symbols. On top of this we used Reed-Solomon error correction at
-a 2:1 ratio inflating total message length to 30 DSSS symbols. At the \SI{1}{\second} chip rate we used in other
-simulations as well this equates to an overall transmission duration of approximately \SI{15}{\minute}. To give the
-demodulator some time to settle and to produce more realistic conditions of signal reception we padded the modulated
-signal unmodulated noise on both ends.
-
-\section{Lessons learned}
-
-Before settling on the commercial smart meter we first tried to use an \texttt{EVM430-F6779} smart meter evaluation kit
-made by Texas Instruments. This evaluation kit did not turn out well for two main reasons. One, it shipped with half the
-case missing and no cover for the terminal blocks. Because of this some work was required to get it electrically safe.
-Even after mounting it in an electrically safe manner the safety reset controller prototype would also have to be
-galvanically isolated to not pose an electrical safety risk since the main MCU is not isolated from the grid and the
-JTAG port is also galvanically coupled. The second issue we ran into was that the \texttt{EVM430-F6779} is based around
-an \texttt{MSP430F6779} microcontroller. This microcontroller is a rather large part within the \texttt{MSP430} series
-and uses a new revision of the CPU core and associated JTAG peripheral that are incompatible with all \texttt{MSP430}
-programmers we tried to use on it. \texttt{mspdebug} does not have support for it and porting TI's own JTAG programmer
-reference sources did not yield any results either. Finally we tried an USB-based programmer made by TI themselves that
-turned out to either have broken firmware or a hardware defect, leading to it frequently reënumerating on the USB.
-
-Overall our initial assumption that a development kit would certainly be easier to program than a commercial meter did
-not prove to be true. Contrary to our expectations the commercial meter had JTAG enabled allowing us to easily read out
-its stock firmware without needing to reverse-engineer vendor firmware update files or circumventing code protection
-measures. The fact that its firmware was only available in its compiled binary form was not much of a hindrance as it
-proved not to be too complex and all we wanted to know could be found out with just a few hours of digging in Ghidra.
-
-In the firmware development phase our approach of testing every module individually (e.g. DSSS demodulator, Reed-Solomon
-decoder, grid frequency estimation) proved to be very useful. In particular debugging benefited greatly from being able
-to run several thousand tests within seconds. In case of our DSSS demodulator this modular testing and simulation
-architecture allowed us to simulate thousands of runs of our implementation on test data and directly compare it to our
-Jupyter/Python prototype (see Figure \ref{fw_proto_comparison}). Since we spent more time polishing our embedded C
-implementation it turned out to perform better than our Python prototype. At the same time it shows fundamentally
-similar response to its parameters. One significant bug we fixed in the embedded C version was the Python version's
-tendency towards incorrect decodings at even very large amplitudes.
-
-\begin{figure}
- \centering
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}
- \includegraphics[trim={0 4cm 0 0},clip,width=1.2\textwidth]{../lab-windows/fig_out/dsss_thf_amplitude_56_jupyter_impl}
- \caption{Python prototype.}
- \end{subfigure}
- \begin{subfigure}{\textwidth}
- \centering
- \hspace*{-1cm}
- \includegraphics[trim={0 4cm 0 0},clip,width=1.2\textwidth]{../lab-windows/fig_out/dsss_thf_amplitude_56_fw_impl}
- \caption{Embedded C implementation.}
- \end{subfigure}
-
- \caption{
- Symbol error rate plots versus threshold factor for both our Python prototype (above) and our firmware
- implementation of our demodulation algorithm. Note the slightly different threshold factor color scales. Cf.\
- Figure \ref{dsss_thf_amplitude_5678}.
- }
- \label{fw_proto_comparison}
-\end{figure}
-
-In accordance with our initial estimations we did not run into any code space nor computation bottlenecks for chosing
-floating point emulation instead of porting over our algorithms to fixed point calculations. The extremely slow sampling
-rate of our systems makes even heavyweight processing such as FFT or our brute-force dynamic programming approach to
-DSSS demodulation possible well within our performance constraints.
-
-Since we are only building a prototype we did not optimize firmware code size at all. The compiled code size of our
-firmware implementation is slightly larger than we would like at around \SI{64}{\kilo\byte} for our firmware image
-including everything except the target microcontroller firmware image. See appendix \ref{symbol_size_chart} for a graph
-illustrating the contribution of various parts of the signal processing toolchain to this total. Overall the most
-heavy-weight operations by far are the SHA512 implementation from libsodium and the FFT from ARM's CMSIS signal
-processing library. Especially the SHA512 implementation has large potential for size optimization because it is highly
-optimized for speed using extensive manual loop unrolling.
-
-\chapter{Future work}
-
-\section{Precise grid characterization}
-
-We based our simulations on a linear relationship between the generation/consumption power imbalance and grid frequency.
-Our literature study suggests that this is an appropriate first order approximation\cite{crastan03}. We kept the
-modulation bandwidth in our simulations inside a \SIrange{1000}{100}{\milli\hertz} frequency band that we reason is most
-likely to exhibit this linear behavior in practice. At lower frequencies primary control kicks in. With the frequency
-delta thresholds specified for primary control systems\cite{entsoe04} this would lead to significant non-linear
-effects. At higher frequencies grid frequency estimation at the receiver becomes more complex since the margins of the
-FFT transform shrink. Higher frequencies also come close to modes of mechanical oscillation in generators that usually
-lie at \SI{5}{\hertz} and above\cite{crastan03}.
-
-An analysis of the above concerns can be performed using dynamic grid simulation models\cite{semerow01,entsoe05}.
-Presumably out of security concerns these models are only available under non-disclosure agreements. Integrating
-NDA-encumbered results stemming from such a model in an open-source publication such as this one poses a logistical
-challenge which is why we decided to leave this topic for a separate future work.
-
-After detailed model simulation we ultimately aim to validate our results experimentally. Assuming linear grid behavior
-even under very small disturbances a small-scale experiment is an option. Such a small-scale experiment would require
-very long integration times: Given a frequency characteristic of \SI{30}{\giga\watt\per\hertz} a stimulus of
-\SI{10}{\kilo\watt} yields $\Delta f = \SI{0.33}{\micro\hertz}$. At an estimated \SI{20}{\milli\hertz} of RMS noise over
-a bandwidth of interest this results in an SNR slightly better than \SI{-50}{\decibel}. The correlation time necessary
-to offset this with DSSS processing gain at a chip rate of \SI{1}{\baud} would be in the order of days. With such long
-correlation times clock stability starts to become a problem as during correlation transmitter and receiver must
-maintain close phase alignment with respect to one chip period. A phase difference requirement of less than
-\SI{10}{\degree}over this period of time would translate into clock stability better than \SI{10}{ppm}. Though certainly
-not impossible to achieve this does pose an engineering challenge.
-
-A way to reduce clock alignment might be to use grid frequency itself as a reference. Instead of keying the DSSS
-modulator/demodulator on a local crystal oscillator, chip timings would be described in fractions of a mains voltage
-cycle. This would track grid frequency variations synchronously at both ends and would maintain phase alignment even
-over long periods of time at cost of a slight increase in system complexity. The receiver would then measure differences
-between consecutive chips instead of their absolute values.
-
-\section{Technical standardization}
-
-The description of a safety reset system provided in this work could be translated into a formalized technical standard.
-Our system is simple compared to e.g.\ a full smart meter communication standard and thus can conceivably be
-described in a single, concise document. The complicated side of standardization would be the standardization of the
-backend operation including key management, coördination and command authorization.
-
-\section{Regulatory adoption}
-\label{sec-regulation}
-
-Since the proposed system adds significant cost and development overhead at no immediate benefit to either consumer or
-utility company it is unlikely that it would be adopted voluntarily. Market forces limit what long-term planning utility
-companies can do. An advanced mitigation such as this one might be out of their reach on their own and might require
-regulatory intervention to be implemented. To regulatory authorities a system such as this one provides a primitive to
-guard against attacks. Due to the low-level approach our system might allow a regulatory authority to restore meters to
-a safe state without the need of fine-grained control of implementation details such as application network protocols.
-
-A regulatory authority might specify that all smart meters must use a standardized reset controller that on command
-resets to a minimal firmware image that disables external communication, continues basic billing functions and enables
-any disconnect switches. This system would enable the regulatory authority to directly preempt a large-scale attack
-irrespective of implementation details of the various smart meter implementations.
-
-Cryptographic key management for the smart reset system is not much different to the management of highly privileged
-signing keys as they are used in many other systems such as TLS already. If the safety reset system is implemented by a
-regulatory authority they would likely be able to find a public entity that is already managing root keys for other
-government systems to also manage safety reset keys. Availability and security requirements of safety reset keys do not
-differ significantly from those for other types of root keys.
-
-\section{Zones of trust}
-
-In our design, we opted for a safety reset controller in form of a separate micocontroller entirely separate from
-whatever application microcontroller the smart meter design is already using. This design nicely separates the meter
-into an untrusted application on the core microcontroller and the trusted reset controller. Since the interface between
-the two is simple and one-way, it can be validated to a high standard of security.
-
-Despite these security benefits, the cost of such a separate hardware device might prove high in a mass-market rollout.
-In this case, one might attempt to integrate the reset controller into the core microcontroller in some way. Primarily,
-there would be two ways to accomplish this. One is a solution that physically integrates an additional microcontroller
-core into the main application microcontroller package either as a module on the same die or as a separate die in a
-multi-chip module (MCM) with the main application microcontroller. A custom solution integrating both on a single die
-might be a viable path for very large-scale deployments but will most likely be too expensive in tooling costs alone to
-justify its use. More likely for a medium- to large-scale deployment of millions of meters would be a MCM integrating an
-off-the-shelf smart metering microcontroller die with the reset controller running on another, much smaller
-off-the-shelf microcontroller die. This solution might potentially save some cost compared to a solution using a
-discrete microcontroller for the reset controller.
-
-The more likely approach to reducing cost overhead of the reset controller would be to employ virtualization
-technologies such as ARM's TrustZone in order to incorporate the reset controller firmware into the application firmware
-on the same processor core without compromising the reset controller's security or disturbing the application firmware's
-operation.
-
-TrustZone is a virtualization technology that provides a hardware-assisted privileged execution domain. In traditional
-virtualization setups a privileged hypervisor is managing several unprivileged applications that share resources between
-them. Separation between applications in this setup is longitudinal between adjacent virtual machines. Two applications
-would both be running in unprivileged mode sharing the same CPU and the hypervisor would merely schedule them, configure
-hardware resource access and coördinate communication. This longitudinal virtualization simplifies application
-development since from the application's perspective the virtual machine looks very similar to a physical one. In
-addition, in general this setup can be used to reciprocally isolate two applications with neither one being able to gain
-control over the other.
-
-In contrast to this, a TrustZone-like system in general does not provide several application virtual machines and
-longitudinal separation. Instead, it provides lateral separation between two domains: The unprivileged application
-firmware and a privileged hypervisor. Application firmware may communicate with the hypervisor through defined
-interfaces but due to TrustZone's design it need not even be aware of the hypervisor's existence. This makes a perfect
-fit for our reset controller. The reset controller firmware would be running in privileged mode and without exposing any
-communication interfaces to application firmware. The application firmware would be running in unprivileged mode
-without any modification. The main hurdles to the implementation to a system like this are the requirement for a
-microcontroller providing this type of virtualization on the one hand and the complexity of correctly employing this
-virtualization on the other hand. Virtualization systems such as TrustZone are still orders of magnitude more complex to
-correctly configure than it is to simply use separate hardware and secure the interfaces in between.
-
-\chapter{Conclusion}
-
-In this thesis we have developed an end-to-end design of a reset system to restore smart meters to a safe operating
-state during an ongoing large-scale cyberattack. We have laid out the fundamentals of smart metering infrastructure and
-elaborated the need for an out of band method to reset a meter's firmware due to the large attack surface of this
-complex firmware. To allow our system to be triggered even in the middle of a cyberattack we have developed a broadcast
-data transmission system based on intentional modulation of the global grid frequency. We have developed the theoretical
-foundations of the process based on an established model of inertial grid frequency response to load variations and
-shown the viability of our end-to-end design through extensive simulations. To put these simulations on a solid
-foundation we have developed a grid frequency measurement methodology comprising of a custom-designed hardware device
-for electrically safe data capture and a set of software tools to archive and process captured data. Our simulations
-show good behavior of our broadcast communication system and give an indication that coöperating with a large consumer
-such as an aluminum smelter would be a feasible way to set up a transmitter with very low hardware overhead. Based on
-our broadcast primitive we have developed a cryptographic protocol ready for embedded implementation in
-resource-constrained systems that allows triggering all or a selected subset of devices within a quick response time of
-less than 30 minutes. Finally, we have experimentally validated our system using simulated grid frequency data in a
-demonstrator setup based on a commercial microcontroller as our safety reset controller and an off-the-shelf smart
-meter. We have laid out a path for further research and standardization related to our system. Our code and electronics
-designs are available at the public repository listed on the second page of this document.
-
-\newpage
-
-%\nocite{*} TODO: check unused references
-\printbibliography[heading=bibintoc]
-\newpage
-
-\appendix
-
-\chapter{Frequency sensor schematics}
-\label{sec-app-freq-sens-schematics}
-\fancyhead[C]{Frequency sensor schematics (1/3)}
-\fancyfoot[C]{}
-\fancyhead[R]{\thepage}
-\includepdf[fitpaper,landscape,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg1.pdf}
-\fancyhead[C]{Frequency sensor schematics (2/3)}
-\includepdf[fitpaper,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg2.pdf}
-\fancyhead[C]{Frequency sensor schematics (3/3)}
-\includepdf[fitpaper,landscape,pagecommand={\thispagestyle{fancy}}]{resources/platform-export-pg3.pdf}
-\fancyfoot[C]{\thepage}
-
-\chapter{Demonstrator firmware symbol size map}
-\emph{Please find this appendix enclosed in the pouch on the inside of the back cover.}
-\label{symbol_size_chart}
-\includepdf[fitpaper]{resources/safetyreset-symbol-sizes.pdf}
-
-\ifdefined\includenotebooks
-\chapter{Transcripts of Jupyter notebooks used in this thesis}
-
-\includenotebook{Grid frequency estimation}{grid_freq_estimation}
-\includenotebook{Grid frequency estimation validation against ROCOF test suite}{freq_meas_validation_rocof_testsuite}
-\includenotebook{Frequency sensor clock stability analysis}{gps_clock_jitter_analysis}
-\includenotebook{DSSS modulation experiments}{dsss_experiments-ber}
-\fi
-
-\ifdefined\includefirmwaresources
-\chapter{Firmware source code excerpts}
-\section{DMA-backed ADC capture (adc.c)}
-\inputminted[fontsize=\footnotesize,linenos,firstline=18,lastline=115,breaklines]{C}{../gm_platform/fw/adc.c}
-
-\section{Frequency sensor packetized serial interface}
-\subsection{serial.c}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/serial.c}
-\subsection{packet\_interface.c}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/packet_interface.c}
-\subsection{cobs.c}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../gm_platform/fw/cobs.c}
-\subsection{Host data logging utility (tw\_test.py)}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{python}{../gm_platform/fw/tw_test.py}
-
-\section{Frequency estimation (freq\_meas.c)}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/freq_meas.c}
-\section{DSSS demodulation (dsss\_demod.c)}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/dsss_demod.c}
-\section{Cryptographic protocol handling}
-\subsection{protocol.c}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/protocol.c}
-\subsection{crypto.c}
-\inputminted[fontsize=\footnotesize,linenos,breaklines]{C}{../controller/fw/src/crypto.c}
-\fi
-
-
-% TODO
-%\chapter{Economic viability of countermeasures}
-%\section{Attack cost}
-%\section{Countermeasure cost}
-%\section{Conclusion}
-
-\end{document}