Diffstat (limited to 'ma/safety_reset.tex')
-rw-r--r-- ma/safety_reset.tex 543
1 files changed, 517 insertions, 26 deletions
diff --git a/ma/safety_reset.tex b/ma/safety_reset.tex
index 5a256af..154fbda 100644
--- a/ma/safety_reset.tex
+++ b/ma/safety_reset.tex
@@ -20,8 +20,12 @@
\usepackage{multirow}
\usepackage{multicol}
\usepackage{tikz}
+\usepackage{mathtools}
+\DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
+\DeclarePairedDelimiter{\paren}{(}{)}
\usetikzlibrary{arrows}
+\usetikzlibrary{chains}
\usetikzlibrary{backgrounds}
\usetikzlibrary{calc}
\usetikzlibrary{decorations.markings}
@@ -42,7 +46,7 @@
\usepackage[underline=false]{pgf-umlsd}
\usetikzlibrary{calc}
%\usepackage[pdftex]{graphicx,color}
-%\usepackage{epstopdf}
+\usepackage{epstopdf}
% Needed for murks.tex
\usepackage{setspace}
\usepackage[draft=false,babel,tracking=true,kerning=true,spacing=true]{microtype} % optischer Randausgleich etc.
@@ -56,15 +60,15 @@
% Beispielhafte Nutzung der Vorlage für die Titelseite (bitte anpassen):
\input{murks}
-\titel{FIXME} % Titel der Arbeit
-\typ{Masterarbeit} % Typ der Arbeit: Diplomarbeit, Masterarbeit, Bachelorarbeit
-\grad{Master of Science (M. Sc.)} % erreichter Akademischer Grad
-% z.B.: Master of Science (M. Sc.), Master of Education (M. Ed.), Bachelor of Science (B. Sc.), Bachelor of Arts (B. A.), Diplominformatikerin
+\titelen{A Post-Attack Recovery Architecture for Smart Electricity Meters}
+\titelde{Eine Architektur zur Kontrollwiederherstellung nach Angriffen auf Smart Metering in Stromnetzen}
+\typ{Masterarbeit}
+\grad{Master of Science (M. Sc.)}
\autor{Jan Sebastian Götte}
-\gebdatum{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Geburtsdatum des Autors
-\gebort{Aus datenschutzrechtlichen Gründen nicht abgedruckt} % Geburtsort des Autors
-\gutachter{Prof. Dr. Björn Scheuermann}{FIXME} % Erst- und Zweitgutachter der Arbeit
-\mitverteidigung % entfernen, falls keine Verteidigung erfolgt
+\gebdatum{Aus Datenschutzgründen nicht abgedruckt} % Geburtsdatum des Autors
+\gebort{Aus Datenschutzgründen nicht abgedruckt} % Geburtsort des Autors
+\gutachter{Prof. Dr. Björn Scheuermann}{Prof. Dr.-Ing. Eckhard Grass}
+\mitverteidigung % entfernen, falls keine Verteidigung erfolgt %FIXME
\makeTitel
\selbstaendigkeitserklaerung{31.03.2020}
\newpage
@@ -99,13 +103,13 @@ Smart meters usually are built around a standard microcontroller. \label{sm-cpu}
\section{Regulatory frameworks around the world}
\subsection{International standards}
-\subsection{Regulations in Europe}
-\subsection{The regulatory situation in Germany}
-\subsection{The regulatory situation in France}
-\subsection{The regulatory situation in the UK}
-\subsection{The regulatory situation in Italy}
-\subsection{The regulatory situation in northern America}
-\subsection{The regulatory situation in Japan}
+\subsection{The regulatory situation in selected countries}
+\subsubsection{Germany}
+\subsubsection{France}
+\subsubsection{The UK}
+\subsubsection{Italy}
+\subsubsection{Northern America}
+\subsubsection{Japan}
\subsection{Common themes}
\section{Security in smart grids}
@@ -213,7 +217,7 @@ Communication channel attacks are attacks on the communication links between sma
attacks on IP-connected parts of the core network or attacks on shared busses between smart meters and IP gateways in
substations. Generally, these attacks can be mitigated by securing the aforementioned communication links using modern
cryptography. IP links can be protected using TLS, and more low-level busses can be protected using more lightweight
-Noise-based protocols. % FIXME cite
+Noise\cite{perrin01}-based protocols.
Cryptographic security transforms an attackers ability to manipulate communication contents into a mere denial of
service attack. Thus, in addition to cryptographic security safety under DoS conditions must be ensured to ensure
continued system performance under attacks. This safety property is identical with the safety required to withstand
@@ -452,7 +456,7 @@ Microcontrollers have gained enormously in both performance/efficiency as well a
gains have largely been driven by insatiable customer demand for faster, more powerful chips and for a long time
security has not been considered important outside of some specific niches such as smartcards. Traditionally a
microcontroller would spend its entire lifetime without ever being exposed to any networks. Though this trend has been
-reversing with the increasing adoption of internet-of-things things % FIXME is this pun ok?
+reversing with the increasing adoption of internet-of-things things
and more advanced security features have started appearing in general-purpose microcontrollers, most still lack even
basic functionality found in processors for computers or smartphones.
@@ -470,41 +474,528 @@ simple to reduce attack surface there.
\subsection{Safety vs. Security: Opting for restoration instead of prevention}
-\subsection{Technical outline of a safety reset}
+\subsection{Technical outline of a safety reset system}
\section{Communication channels on the grid}
-\subsection{Powerline communication systems and their use}
+
+There are a number of well-established technologies for communication on or along power lines. We can distinguish three
+basic system categories: systems using separate wires (such as DSL over landline telephone wiring), wireless radio
+systems (such as LTE) and \emph{powerline communication} (PLC) systems that re-use the existing mains wiring and
+superimpose data transmissions on the 50 Hz mains sine\cite{gungor01,kabalci01}.
+
+For our scenario, we will ignore short-range communication systems. There exists a large number of \emph{wideband}
+powerline communication systems that are popular with consumers for bridging ethernet between parts of an apartment or
+house. These systems transmit at up to several hundred megabits per second over distances of up to several tens of
+meters\cite{kabalci01}. Technologically, these wideband PLC systems are very different from the \emph{narrowband}
+systems used by utilities for load management among other applications, and they are not relevant to our analysis.
+
+\subsection{Powerline communication (PLC) systems and their use}
+In long-distance communications for applications such as load management, PLC systems are attractive since they allow
+re-using the existing wiring infrastructure. They have been in use since as early as the 1930s\cite{hovi01}. Narrowband
+PLC systems are a potentially low-cost solution to the problem of transmitting data at low bandwidth over distances of
+several hundred meters up to tens of kilometers.
+
+Narrowband PLC systems transmit on the order of kilobits per second or slower. A common application of this sort of
+system is \emph{ripple control}. Ripple control systems superimpose a low-frequency signal with a carrier frequency of
+a few hundred hertz on top of the 50 Hz mains sine. This low-frequency signal is used to encode switching commands for
+non-essential residential or industrial loads. Ripple control systems provide utilities with the ability to actively
+control demand while promising small savings in electricity cost to consumers\cite{dzung01}.
+
+In any PLC system there is a strict tradeoff between bandwidth, power and distance. Higher bandwidth requires higher
+power and reduces maximum transmission distance. Where ripple control systems usually use a few transmitters to cover
+the entire grid of a regional distribution utility, higher-bandwidth bidirectional systems used for automatic meter
+reading (AMR) in places such as Italy or France require repeaters within a few hundred meters of a transmitter.
+
+\subsection{Landline and wireless IP-based systems}
+Especially in automatic meter reading (AMR) infrastructure the cost-benefit tradeoff of powerline systems does not
+always work out for utilities. A common alternative in these systems is to use the public internet for communication.
+Using the public internet has the advantage of low initial investment on the part of the utility company as well as
+quick commissioning. Disadvantages compared to a PLC system are potentially higher operational costs due to recurring
+fees to network providers as well as lower reliability. Being integrated into power grid infrastructure, a PLC system's
+failure modes are highly correlated with those of the overall grid. Put briefly, if the PLC interface is down, there is
+a good chance that power is out, too. In contrast, general internet services exhibit a multitude of failures that are
+entirely uncorrelated with power grid stability.
+
+For applications such as meter reading for billing, this level of reliability is sufficient. However, for systems that
+need to hold up in crisis situations such as the recovery system we are contemplating in this thesis, the public
+internet may not provide sufficient reliability.
+
\subsection{Proprietary wireless systems}
-\subsection{Landline IP}
-\subsection{IP-based wireless systems}
+% FIXME
+
\subsection{Frequency modulation as a communication channel}
-For our system, we chose grid frequency modulation (henceforth GFC) as a low-bandwidth uni-directional communications channel.
-Compared to traditional PLC GFC requires no additional hardware, works reliably throughout the grid and is harder to
-manipulate by a malicious actor.
-% FIXME \cite{urtasun01}
+For our system, we chose grid frequency modulation (henceforth GFM) as a low-bandwidth uni-directional broadcast
+communications channel. Compared to traditional PLC, GFM requires only a small amount of additional hardware, works
+reliably throughout the grid and is harder for a malicious actor to manipulate.
+
+Grid frequency in Europe's synchronous areas is nominally 50 hertz, but there are small load-dependent variations from
+this nominal value. Any device connected to the power grid (or even just within physical proximity of power wiring) can
+reliably and accurately measure grid frequency at low hardware overhead. By intentionally modifying grid frequency, we
+can create a very low-bandwidth broadcast communication channel. Grid frequency modulation has so far only been
+proposed as a communications channel at very small scales in microgrids\cite{urtasun01} and to our knowledge has not
+yet been considered for large-scale application.
+
+Advantages of using grid frequency for communication are low receiver hardware complexity as well as the fact that a
+single transmitter can cover an entire synchronous area. Though the transmitter has to be very large and powerful, setup
+of a single large transmitter faces lower bureaucratic hurdles than integration of hundreds of smaller ones into
+hundreds of local systems each with autonomous governance.
\subsubsection{The frequency dependance of grid frequency}
+% FIXME find a solid citation on this
+
\subsubsection{Control systems coupled to grid frequency}
+
\subsubsection{Avoiding dangerous modes}
+
+Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
+feedback loops to ensure voltage, load and frequency regulation. Multiple components are coupled through transmission
+lines that themselves exhibit complex dynamic behavior. The overall system is generally stable, but may exhibit some
+instabilities in response to particular small-signal stimuli. These instabilities, called \emph{modes}, occur when, due
+to mis-tuning of parameters or physical constraints, the overall system exhibits oscillation at particular frequencies.
+\textcite{kundur01} splits these into four categories:
+
+\begin{description}
+ \item[Local modes] where a single power station oscillates in some parameter
+ \item[Interarea modes] where subsections of the overall grid oscillate w.r.t.\ each other due to weak coupling
+ between them
+ \item[Control modes] caused by imperfectly tuned control systems
+ \item[Torsional modes] that originate from electromechanical oscillations in the generator itself
+\end{description}
+
+The oscillation frequencies associated with each of these modes are usually between a few tens of millihertz and a few
+hertz, see for example \textcite{grebe01} and \textcite{entsoe01}. It is hard to predict the particular modes of a
+power system at the scale of the central European interconnected system. Theoretical analysis and simulation may give
+rough indications but cannot yield conclusive results. Due to the obvious danger as well as the high economic impact of
+the resulting inefficiencies, experimental measurements are infeasible. Finally, modes are highly dependent on the power
+grid's structure and will change with changes in the power grid over time. For all of these reasons, a grid frequency
+modulation system must be designed very conservatively without relying on the absence (or presence) of modes at
+particular frequencies. A concrete design guideline that we can derive from this situation is that the frequency
+spectrum of any grid frequency modulation system should not exhibit any notable peaks and should avoid a concentration
+of spectral energy in certain frequency ranges. On the one hand this rules out some modulation schemes. On the other
+hand it provides us with a useful pointer towards those techniques that might work well: spread-spectrum techniques. By
+employing spread-spectrum modulation we can produce an almost ideal frequency-domain behavior while at the same time
+achieving some modulation gain, increasing system sensitivity.
+
+By using spread-spectrum techniques we can spread the energy of our modulation over the widest possible
+bandwidth\cite{goiser01}. The coding gain that spread-spectrum techniques yield potentially allows for a weaker
+excitation, thereby further reducing the probability of disturbance to the overall system. Spread-spectrum techniques
+also inherently allow a tradeoff between receiver sensitivity and data rate, which is a highly useful parameter to have
+for the overall system design.
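+
+As a rough illustration of this tradeoff, assume a direct-sequence scheme in which every data bit is spread over $N_c$
+chips. Both the choice of direct-sequence spreading and the symbols $N_c$ and $G_p$ are assumptions made for this
+example only and do not fix the modulation at this point. The idealized gain of such a scheme is
+\begin{equation*}
+    G_p = 10 \log_{10}\left(N_c\right) \,\text{dB} ,
+\end{equation*}
+so that a hypothetical sequence length of $N_c = 31$ would yield $G_p \approx 14.9\,\text{dB}$ while dividing the data
+rate by $31$ at a fixed chip rate. Doubling $N_c$ adds roughly $3\,\text{dB}$ of gain and halves the data rate again.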
+
\subsubsection{Overall system parameters}
\subsubsection{An outline of practical implementation}
\section{From grid frequency to a reliable communications channel}
\subsection{Channel properties}
+
+
\subsection{Modulation and its parameters}
+
+
\subsection{Error-correcting codes}
+
+
\subsection{Cryptographic security}
+Informally the system we are looking for can be modelled as consisting of three parties: The trusted
+\textsc{Transmitter}, one of a large number of untrusted \textsc{Receivers}, and an \textsc{Attacker} according to the
+following rules:
+
+\begin{enumerate}
+ \item \textsc{Transmitter} and \textsc{Attacker} can both transmit any bit sequence
+ \item \textsc{Receiver} receives any transmission by either \textsc{Transmitter} or \textsc{Attacker} but cannot
+ distinguish between the two on the signal level
+ \item \textsc{Attacker} knows anything a \textsc{Receiver} might know
+ \item \textsc{Transmitter} is stronger than \textsc{Attacker} and will ``win'' in simultaneous transmission
+ \item Both \textsc{Transmitter} and \textsc{Receiver} can be seeded with some information on each other such as
+ public key fingerprints.
+\end{enumerate}
+
+We are not interested in congestion scenarios where an attacker attempts to disrupt an ongoing transmission by the
+transmitter. In practice there are several avenues to prevent such attempts including the following. Compromised loads
+that are being abused by the attacker can be manually disconnected by the utility. Error-correcting codes can be used to
+provide resiliency against small-scale disturbances. Finally, the transmitter can be designed to have high enough power
+to be able to override any likely attacker.
+
+Our goal is to find a cryptographic primitive that has the following properties:
+\begin{enumerate}
+ \item \textsc{Transmitter} can produce a transmission bit sequence $\mathbf{s}$ (or equivalently a set of such
+ sequences) that \textsc{Receiver} can uniquely identify as being generated by \textsc{Transmitter}:
+ $\mathcal{R}\left(\mathbf{s}\right) = 1$. Upon reception of this sequence, \textsc{Receiver} performs the safety
+ reset.
+ \item \textsc{Attacker} cannot forge $\mathbf{s}$, that is find $\mathbf{s}'$ such that
+ $\mathbf{s} \neq \mathbf{s}' \land \mathcal{R}\left(\mathbf{s}'\right) = 1$
+ \item Our system conforms to an at-most-once semantic. That is, upon transmission of a valid bit sequence coded for
+ a particular \textsc{Receiver} or set of receivers each one either performs exactly one safety reset or none at
+ all. We cannot achieve an exactly-once semantic since we are using a unidirectional lossy communication
+ primitive. More colloquially, \textsc{Receiver} might be offline due to a localized power outage and might thus
+ not hear \textsc{Transmitter} even if our broadcast primitive is reliable. The practical impact of this
+ limitation can be mitigated by \textsc{Transmitter} simply repeating itself until the desired effect has been achieved.
+\end{enumerate}
+
+An important limitation from the rules of our setup above is that \textsc{Attacker} can always record the bit sequence
+\textsc{Transmitter} transmits and replay that same sequence later. Before considering any cryptographic approaches we
+can make the preliminary observation that we can trivially prevent \textsc{Attacker} from violating the
+at-most-once criterion by simply requiring \textsc{Receiver} to memorize all bit sequences that have been transmitted
+thus far and only reacting to new bit sequences. This means an attacker might be able to cause offline receivers to
+reset at a later point, but considering our goal is to reset them in the first place this would not pose a danger to the
+system.
+% FIXME elaborate why this is not a threat, and possible mitigations
+
+It seems we need a cryptographic primitive that looks somewhat like a signature. Different from a signature however,
+we have somewhat relaxed constraints here: While cryptographic signatures need to work over arbitrary inputs, all we
+want to ``sign'' here is the instruction to perform a safety reset. Since this is the only message we might ever want to
+transmit, our message space has only one entry and thus the informational content of our message is 0 bit! All the
+information we want to transmit is already encoded \emph{in the fact that we are transmitting}, and we do not require
+any further payload to be transmitted. This means we can omit the entirety of the message and just transmit whatever
+``signature'' we produce. This is useful since we have to conserve transmission bits so our transmissions do not take
+an exceedingly long time over our extremely slow communication channel.
+
+We could use any of several traditional asymmetric cryptographic primitives to produce these signatures. The
+comparatively high computational effort required for signature verification would not be an issue. Transmissions take
+several minutes anyway and we can afford to spend some tens of seconds even in signature verification. Transmission
+length and by proxy system latency would be determined by the length of the signature. For RSA, signature length is the
+modulus length (i.e.\ larger than 1000 bit for even basic contemporary security). For elliptic curve-based systems,
+signature size is approximately twice the curve length (i.e.\ approximately 300 bit for contemporary security). However,
+we can do better than this: we can exploit the unusual property of our setting that our effective message entropy is
+0 bit to derive a more efficient scheme.
+
+\subsubsection{Lamport signatures}
+
+In 1979, \textcite{lamport02} introduced a signature scheme that is based only on a one-way function such as a
+cryptographic hash function. The basic observation is that by choosing a random secret input to a one-way function and
+publishing the output, one can later prove knowledge of the input by simply publishing it. In the following paragraphs
+we will describe a construction of a one-time signature scheme based on this observation. The scheme we describe is the
+one usually called a ``Lamport Signature'' in modern literature and is slightly different from the variant described in
+the 1979 paper, but for our purposes we can consider both to be equivalent.
+
+\paragraph{Setup.} In a Lamport signature, for an $n$-bit hash function $H$ the signer generates a private key $s =
+\left(s_{b, i} \mid b\in\left\{0, 1\right\}, 0\le i<n\right)$ of $2n$ random strings of length $n$. The signer publishes a
+public key $p = \left(p_{b, i} = H\left(s_{b, i}\right) \mid b\in\left\{0, 1\right\}, 0\le i<n\right)$ that is simply the
+list of hashes of each of the random strings that make up the private key.
+
+\paragraph{Signing.} To sign a message $m$, the signer publishes the signature $\sigma = \left(\sigma_i = s_{H(m)_i,
+i}\right)$ where $H(m)_i$ is the $i$-th bit of $H$ applied to $m$. That is, for the $i$-th bit of the message's hash
+$H(m)$ the signer publishes either $s_{0, i}$ or $s_{1, i}$ depending on the hash bit's value, keeping the other
+entry of $s$ secret.
+
+\paragraph{Verification.} The verifier can compute $H(m)$ themselves and check that each signature entry $\sigma_i$
+hashes to the corresponding public key entry, that is $H\left(\sigma_i\right) = p_{H(m)_i, i}$ for all $0\le i<n$.
+
+The above scheme is a one-time signature scheme only. After one signature has been published for a given key, the
+corresponding key must not be re-used for other signatures. This is intuitively clear as we are effectively publishing
+part of the private key as the signature, and if we were to publish a signature for another message an attacker could
+derive additional signatures by ``mixing'' the two published signatures.
+
+\subsubsection{Winternitz signatures}
+Winternitz signatures as detailed in \textcite{merkle01} and \textcite{dods01} are an improvement over basic Lamport
+signatures as described above. For hash length $n$, Winternitz signatures reduce public key length as well as signature
+length from $\mathcal O\left(n\right)$ to $\mathcal O \left(n/t\right)$ hash values for some choice of parameter $t$
+(usually a small number such as 4).
+
+\paragraph{Setup.} The signer generates a private key $s = \left(s_i\right)$ consisting of $\ceil{\frac{n}{t}}$ random
+bit strings. The signer publishes a public key $p = \left(H^{2^t}\left(s_i\right)\right)$ where each element
+$H^{2^t}\left(s_i\right)$ is the $2^t$-fold recursive application of $H$ to $s_i$.
+
+\paragraph{Signing.} The signer splits $m$ padded to a multiple of $t$ bits into $\ceil{\frac{n}{t}}$ chunks $m_i$ of
+$t$ bit each. The signer publishes the signature $\sigma = \left( \sigma_i = H^{m_i}\left(s_i\right) \right)$.
+
+\paragraph{Verification.} The verifier can calculate for each $\sigma_i = H^{m_i}\left(s_i\right)$ that $H^{2^t -
+m_i}\left(\sigma_i\right) = H^{2^t - m_i}\left(H^{m_i}\left(s_i\right)\right) = H^{2^t - m_i + m_i} \left(s_i\right) =
+p_i$.
+
+To prevent an attacker from forging additional signatures from one signature by calculating $\sigma_i' =
+H\left(\sigma_i\right)$ matching $m_i' = m_i + 1$, this scheme is usually paired with a simple checksum as described in
+\textcite{merkle01}.
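+
+To put these key and signature sizes into perspective, the following back-of-the-envelope calculation assumes a hash
+length of $n = 256$ bit and a Winternitz parameter of $t = 4$. Both values are illustrative assumptions, and the
+checksum chunks are left out. For the Lamport scheme described above this gives
+\begin{align*}
+    \text{public key:} \quad & 2n \cdot n = 2 \cdot 256 \cdot 256\,\text{bit} = 131072\,\text{bit} \\
+    \text{signature:} \quad & n \cdot n = 256 \cdot 256\,\text{bit} = 65536\,\text{bit}
+\end{align*}
+whereas for the Winternitz scheme public key and signature each consist of $\ceil{\frac{n}{t}} = 64$ hash values:
+\begin{equation*}
+    \ceil*{\frac{n}{t}} \cdot n = 64 \cdot 256\,\text{bit} = 16384\,\text{bit}
+\end{equation*}
+The price for this reduction is computation: under these assumptions, generating the public key takes
+$64 \cdot 2^t = 1024$ evaluations of $H$.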
+
+\subsubsection{Using hash-based signatures for trigger authentication}
+The most basic possible trigger authentication scheme would be to simply generate a random bit string secret key $s$ and
+publish $p = H(s)$ for some hash function $H$. To activate the trigger, $\sigma = s$ would be published and listeners
+could verify that $H(\sigma) = p = H(s)$. This simplistic scheme has one main disadvantage: It is a fundamentally
+one-time construction. To prevent an attacker from re-triggering a listener a second time by replaying a valid trigger
+$\sigma$ all listeners have to blacklist any ``used'' $\sigma$. Alas, this means we can only ever trigger a listener
+\emph{once}. The good part is that any listener that missed this trigger can still be triggered later, but the bad part
+is that once $s$ is burned we are out of options. The trivial solution to this would be to simply provide each listener
+with a whole list of public keys in advance. This however takes $n$ times the amount of space for $n$-fold
+retriggerability. Luckily we can easily derive a scheme that yields $n$-fold retriggerability while using no more
+space than the original scheme by taking some inspiration from Winternitz signatures above.
+
+In this scheme the secret key $s$ is still a random bit string. The public key is $p = H^n(s)$ for $n$-fold
+retriggerability. The $i$-th time the trigger is activated, $\sigma_i = H^{n-i}(s)$ is published, and every listener can
+verify that $\sigma_{i-1} = H\left(\sigma_i\right)$ with $\sigma_0 = p$. In case a listener missed one or more previous
+triggers it can simply continue computing $H\left(H\left(\sigma_i\right)\right)$,
+$H\left(H\left(H\left(\sigma_i\right)\right)\right)$ and so on until either reaching the $n$-th recursion level
+(indicating an invalid signature) or finding $H^k\left(\sigma_i\right) = \sigma_j$ for some $k \le n$, with $\sigma_j$
+being the last signature this listener recorded, or $p$ in case there is none.
+
+This scheme provides replay protection through listeners memorizing the last signature they acted on. Public key
+length is equal to the length of the hash function $H$ used. Even for our embedded systems use case $n$ can
+realistically be up to $\mathcal O\left(10^3\right)$, which is easily enough for our application.
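+
+As a small worked example of this chain construction, assume $n = 3$, a value chosen only to keep the example short.
+The transmitter holds $s$ and every listener is provisioned with $p$:
+\begin{align*}
+    p = \sigma_0 &= H^3(s) = H\left(H\left(H\left(s\right)\right)\right) \\
+    \sigma_1 &= H^2(s) = H\left(H\left(s\right)\right) \\
+    \sigma_2 &= H^1(s) = H\left(s\right) \\
+    \sigma_3 &= H^0(s) = s
+\end{align*}
+A listener that acted on $\sigma_1$ accepts $\sigma_2$ since $H\left(\sigma_2\right) = \sigma_1$. A listener that
+missed $\sigma_1$ and still stores $p$ finds $H\left(\sigma_2\right) \neq p$, hashes once more and accepts since
+$H\left(H\left(\sigma_2\right)\right) = H^3(s) = p$. A replayed $\sigma_1$ is rejected by any listener that stores
+$\sigma_2$, since repeated hashing of $\sigma_1$ leads towards $p$ and, short of a weakness in $H$, never reaches
+$\sigma_2$. After $\sigma_3 = s$ has been published the chain is exhausted.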
+
+% FIXME here and in previous ~2 pages get transmitter/receiver and sender/listener terminology straight. Also perhaps do
+% some sort of scenario definition introducing those terms somewhere.
+
\chapter{Practical implementation}
\section{Cryptographic validation}
\section{Data collection for channel validation}
+
+To design a solid system we needed to characterize mains frequency variations under normal conditions. To set modulation
+amplitude as well as parameters of our modulation scheme we need a frequency spectrum of mains frequency variations
+(that is $\mathcal F\left(f(V(t))\right)$: taking mains frequency $f$ as a time-dependent variable, the frequency
+spectrum of that variable, as opposed to the frequency spectrum of mains voltage $V(t)$ itself).
+
+\subsection{Grid Frequency Estimation}
+\label{frequency_estimation}
+In commercial power systems Phasor Measurement Units (PMUs) are used to precisely measure parameters of a mains voltage
+waveform. One of the parameters PMUs measure is mains frequency. PMUs are used as part of SCADA systems controlling
+transmission networks to characterize the operational state of the network.
+
+From a superficial viewpoint measuring mains frequency might seem like a simple problem. Take the mains voltage
+waveform, measure time between two rising-edge (or falling-edge) zero-crossings and take the inverse $f = t^{-1}$. In
+practice, phasor measurement units are significantly more complex than this. This discrepancy is due to the unhealthy
+% FIXME is this pun ok?
+combination of both high precision and quick response that is demanded from these units. High precision is necessary
+since variations of mains frequency under normal operating conditions are quite small--in the range of $5$--$10 \text{mHz}$
+over short intervals of time. Relative to the nominal $50 \text{Hz}$ this is a deviation of $100$ to $200 \text{ppm}$.
+Relative to the corresponding $20 \text{ms}$ period that means a time deviation of about $2$ to $4 \mu\text{s}$ from
+cycle to cycle. From this it is already obvious why a simplistic measurement cannot yield the required precision for
+manageable averaging times--we would need either an ADC sampling rate on the order of megasamples per second or, for a
+reconstruction through interpolated readings, an impractically high ADC resolution.
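+
+The cycle-to-cycle figures above follow from a first-order estimate. With period $T = f^{-1}$, a small frequency
+deviation $\Delta f$ changes the period by approximately
+\begin{equation*}
+    \left|\Delta T\right| \approx \frac{\Delta f}{f^2} .
+\end{equation*}
+The symbols $\Delta f$ and $\Delta T$ are introduced here for illustration only. For $f = 50 \text{Hz}$, a deviation of
+$\Delta f = 5 \text{mHz}$ corresponds to a relative deviation of $\frac{\Delta f}{f} = 10^{-4} = 100 \text{ppm}$ and to
+$\left|\Delta T\right| \approx \frac{5 \cdot 10^{-3} \text{Hz}}{2500 \text{Hz}^2} = 2 \mu\text{s}$. For
+$\Delta f = 10 \text{mHz}$ both values double to $200 \text{ppm}$ and $4 \mu\text{s}$.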
+
+Detail on the inner workings of commercial phasor measurement units is scarce but given their essential role to SCADA
+systems there is a large amount of academic research on such algorithms\cite{narduzzi01,derviskadic01}. A popular
+approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high sampling rate
+(e.g. $10 \text{kHz}$) and then perform some analysis on the frequency-domain data to precisely locate the strong peak
+around $50 \text{Hz}$. A key observation here is that FFT bin size is going to be much larger than required frequency
+resolution. This fundamental limitation follows from the Nyquist criterion and if we had to process %FIXME maybe cite?
+an \emph{arbitrary} signal this would highly limit our practical measurement accuracy
+\footnote{
+ Some software packages providing FFT or STFT primitives such as scipy\cite{virtanen01} allow the user to
+ super-sample FFT output by specifying an FFT width larger than input data length, padding the input data with zeros
+ on both sides. Note that in line with Nyquist this \emph{does not} actually provide finer output resolution but
+ instead just amounts to an interpolation between output bins. Depending on the downstream analysis algorithm it may
+ still be sensible to use this property of the DFT for interpolation, but in general it will be computationally
+ expensive compared to other interpolation methods and in any case it will not yield any better frequency resolution
+ aside from a hypothetical numerical advantage\cite{gasior01}.
+}.
+For this reason all approaches to mains frequency estimation are based on a model of the mains voltage waveform.
+Nominally, this waveform would be a perfect sine at $f = 50 \text{Hz}$. In practice it is a sine at $f \approx 50
+\text{Hz}$ superimposed with some aperiodic noise (e.g. irregular spikes from inductive loads being energized) as well
+as harmonic distortion that is caused by grid-topologically nearby devices with power factor
+\footnote{
+ Power factor is a power engineering term that is used to describe how close the current waveform of a load is to
+ that of a purely resistive load. Given sinusoidal input voltage $V(t) = V_\text{pk} \sin \paren{\omega_\text{nom}
+ t}$ with $\omega_\text{nom} = 2 \pi f_\text{nom} = 2 \pi \cdot 50 \text{Hz}$ being the nominal angular frequency,
+ the current waveform of a resistor with resistance $R \left[\Omega\right]$ according to Ohm's law would be $I(t) =
+ \frac{V(t)}{R} = \frac{1}{R} V_\text{pk} \sin\paren{\omega_\text{nom} t}$. In this case voltage and current are
+ perfectly in phase, i.e. the current at time $t$ is linear in voltage at constant factor $\frac{1}{R}$.
+
+ In contrast to this idealized scenario reality provides us with two common issues: One, the load may be reactive.
+ This means its current waveform is an ideal sinusoid, but there is a phase difference between mains voltage and load
+ current like so: $I(t) = \frac{1}{\left|Z\right|} V_\text{pk} \sin\paren{\omega_\text{nom} t +
+ \varphi}$. Here, $Z$ is the load's complex impedance combining inductive, capacitive and resistive components and
+ $\varphi$ is the phase difference between the resulting current waveform and the mains voltage waveform. Common
+ examples of such loads are motors and the inductive ballasts in old fluorescent lighting fixtures.
+
+ The second potential issue is loads with a non-sinusoidal current waveform. There are many classes of these but the
+ most common one is switching-mode power supplies. Most SMPS for modern electronic devices have an input stage
+ consisting of a bridge rectifier followed by a capacitor that provides high-voltage DC power to the following
+ switch-mode converter circuit. This rectifier-capacitor input stage under normal load draws a high current only at the
+ very peak of the input voltage sinusoid and draws almost zero current for most of the period.
+
+ These two cases are quantified by the \emph{displacement power factor} and the \emph{distortion power factor}, which
+ combined yield the overall true power factor. The power factor is a key quantity in the design and operation of the
+ power grid since a high power factor (close to $1.0$, i.e. an in-phase sinusoidal current waveform) yields the lowest
+ transmission and generation losses.
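+
+ As a sketch of how the two factors combine, assume a sinusoidal mains voltage and write $I_{1,\text{rms}}$ for the RMS
+ value of the fundamental component of the load current, $I_\text{rms}$ for the RMS value of the total load current,
+ $\mathit{THD}_I$ for the total harmonic distortion of the current and $\varphi$ for the phase shift of the fundamental.
+ The symbols $\lambda$, $I_{1,\text{rms}}$, $I_\text{rms}$ and $\mathit{THD}_I$ are introduced here for illustration
+ only. The true power factor $\lambda$ is then the product of the displacement and distortion terms:
+ \begin{equation*}
+ \lambda = \cos\varphi \cdot \frac{I_{1,\text{rms}}}{I_\text{rms}} = \frac{\cos\varphi}{\sqrt{1 + \mathit{THD}_I^2}}
+ \end{equation*}
+ A linear reactive load has $\mathit{THD}_I = 0$ and thus $\lambda = \cos\varphi$, while a distorting load whose
+ fundamental is in phase with the voltage has $\varphi = 0$ and thus $\lambda = I_{1,\text{rms}} / I_\text{rms}$.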
+}
+$\cos \theta \neq 1.0$. Under a continuous Fourier transform over a long period the frequency spectrum of a signal
+distorted like this will be a low noise floor depending mainly on aperiodic noise on which a comb of harmonics as well
+as some sub-harmonics of $f \approx f_\text{nom} = 50 \text{Hz}$ rides. The main peak at $f \approx f_\text{nom}$ will be very
+strong with the harmonics being approximately an order of magnitude weaker in energy and the noise floor being at least
+another order of magnitude weaker. See figure \ref{mains_voltage_spectrum} for a measured spectrum. This domain
+knowledge about the expected frequency spectrum of the signal can be employed in a number of interpolation techniques to
+re-construct the precise frequency of the spectrum's main component despite comparatively coarse STFT resolution and
+despite numerous distortions.
+
+\begin{figure}
+ \centering
+ \includegraphics{../lab-windows/fig_out/mains_voltage_spectrum}
+ \caption{Fourier transform of an 8 hour capture of mains voltage. Data was captured using our frequency measurement
+ sensor described in section \ref{sec-fsensor} and FFT'ed after applying a Blackman window. Vertical lines indicate
+ $50 \text{Hz}$ and odd harmonics.}
+ \label{mains_voltage_spectrum}
+\end{figure}
+
+Published grid frequency estimation algorithms such as \textcite{narduzzi01} or \textcite{derviskadic01} are rather
+sophisticated and use a combination of techniques to reduce numerical errors in FFT calculation and peak fitting. Given
+that we do not need reference-standard-grade accuracy for our application we chose to start with a very basic algorithm
+instead. We chose a general approach developed by experimental physicists at CERN that is described by
+\textcite{gasior01}. This approach assumes a general sinusoidal signal superimposed with harmonics and broadband noise.
+Being applicable to a wide range of practical signal analysis tasks, it is a reasonable first-order approximation of the
+much more sophisticated estimation algorithms developed specifically for power systems. Some of these algorithms have
+components such as Kalman filters\cite{narduzzi01} that require a physical model. As a general-purpose algorithm, the
+approach from \textcite{gasior01} does not require this kind of application-specific tuning, eliminating one source of
+error.
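+
+As a minimal sketch of the kind of peak interpolation this enables, assume a windowed FFT of $N$ samples taken at
+sampling rate $f_S$, with $A_k$ denoting the magnitude of bin $k$ and $k$ being the index of the largest bin near
+$f_\text{nom}$. The symbols $N$, $A_k$, $\delta$ and $\hat{f}$ are introduced here for illustration, and fitting a
+parabola through the logarithms of three adjacent bins is one common choice shown purely as an example. The fractional
+bin offset $\delta$ of the peak and the resulting frequency estimate $\hat{f}$ are
+\begin{align*}
+    \delta &= \frac{\ln A_{k+1} - \ln A_{k-1}}{2 \paren{2 \ln A_k - \ln A_{k+1} - \ln A_{k-1}}} &
+    \hat{f} &= \paren{k + \delta} \frac{f_S}{N}
+\end{align*}
+For a symmetric peak with $A_{k+1} = A_{k-1}$ this yields $\delta = 0$ and $\hat{f}$ falls onto the bin center. If the
+right neighbour is stronger than the left one, $\delta$ is positive and the estimate moves towards bin $k+1$.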
+
\subsection{Frequency sensor hardware design}
+\label{sec-fsensor}
+Our safety reset controller % FIXME is this the right term?
+will have to measure mains frequency to later demodulate a reset signal transmitted through it. Since we have decided to
+build our own frequency measurement system, we can use this measurement setup as a prototype for the frequency
+measurement subcomponent of the demodulation system we will develop later. Since we do not plan a large-scale field
+deployment of our measurement setup, we can keep the hardware implementation simple by moving most of the signal
+processing to a regular computer and concentrating our hardware efforts on raw signal capture.
+
+\begin{figure}
+ \begin{center}
+ \begin{tikzpicture}[start chain = going below, node distance = 12mm and 50mm, every join/.style = {norm}]
+ \tikzset{
+ base/.style = {draw, on chain, on grid, align=center, minimum height = 4ex, font=\footnotesize},
+ text/.style = {base},
+ component/.style = {base, rectangle, text width=40mm},
+ coord/.style = {coordinate, on chain, on grid, node distance=6mm and 25mm}
+ }
+ \node[text centered] (input) {Single-Phase Mains Input};
+ \node[component] (safety) [below = of input] {Input Protection};
+ \node[coord] (safety-anchor) [below = of safety] {};
+ \node[component] (analog) [below = of safety-anchor] {Analog Signal Processing};
+ \node[component] (powersupply) [left = of analog] {Power supply};
+ \node[component] (adc) [below = of analog] {ADC};
+ \node[component] (micro) [below = of adc] {Microcontroller};
+ \node[component] (isol) [below = of micro] {Galvanic Digital Isolation};
+ \node[coord] (isol-left) [left = 6cm of isol.west] {};
+ \node[coord] (isol-right) [right = 1cm of isol.east] {};
+ \node[component] (usb) [below = of isol] {USB interface};
+
+ \draw[->] (input.south) -- (safety.north);
+ \draw[-] (safety.south) -- (safety-anchor);
+ \draw[->] (safety-anchor) -| (powersupply.north);
+ \draw[->] (safety-anchor) -| (analog.north);
+ \draw[->] (powersupply.south) |- (adc.west);
+ \draw[->] (powersupply.south) |- (micro.west);
+ \draw[->] (analog.south) -- (adc.north);
+ \draw[->] (adc.south) -- (micro.north);
+ \draw[->] (micro.south) -- (isol.north);
+ \draw[->] (isol.south) -- (usb.north);
+
+ \draw[dashed] (isol.west) -- (isol-left.east);
+ \draw[dashed] (isol.east) -- (isol-right.west);
+ \end{tikzpicture}
+ \end{center}
+ \caption{Frequency sensor hardware diagram}
+ \label{fmeas-sens-diag}
+\end{figure}
+
+An overall block diagram of our system is shown in fig. \ref{fmeas-sens-diag}. The microcontroller we chose is an
+\texttt{STM32F030F4P6} ARM Cortex-M0 microcontroller made by ST Microelectronics. The ADC in fig. \ref{fmeas-sens-diag}
+in our design is the integrated 12-bit ADC of this microcontroller, which is sufficient for our purposes. The USB
+interface is a simple USB to serial converter IC (\texttt{CH340G}) and the galvanic digital isolation is accomplished
+with a pair of high-speed optocouplers on its \texttt{RX} and \texttt{TX} lines. The analog signal processing is a
+simple voltage divider using high-power resistors to get the required creepage along with some high-frequency filter
+capacitors and an op-amp buffer. The power supply is an off-the-shelf mains-input power module. The system is
+implemented on a single two-layer PCB that is housed in an off-the-shelf industrial plastic case fitted with a printed
+label and a few status lights on its front.
+
+\subsection{Clock accuracy considerations}
+
+Our measurement hardware will sample line voltage at some sampling rate $f_S$, e.g.\ $1 \text{kHz}$. All downstream
+processing is limited in accuracy by the accuracy of $f_S$\footnote{
+We are not considering the effects of clock jitter. We are highly oversampling the signal and the FFT done in our
+downstream processing will eliminate small jitter effects leaving only frequency stability to worry about. }. We
+generate our sampling clock in hardware by clocking the ADC from one of the microcontroller's timer blocks clocked from
+the microcontroller's system clock. This means our ADC's sampling window will be synchronized cycle-accurate to the
+microcontroller's system clock.
+
+Our downstream measurement of mains frequency by nature is relative to our sampling frequency $f_S$. In the setup
+described above this means we have to make sure our system clock is fairly stable. A frequency deviation of $1
+\text{ppm}$ in our system clock causes a proportional grid frequency measurement error of $\Delta f = f_\text{nom} \cdot
+10^{-6} = 50 \mu\text{Hz}$. In a worst case where our system is clocked from a particularly bad crystal that exhibits
+$100 \text{ppm}$ of instability over our measurement period we end up with an error of $5 \text{mHz}$. This is well
+within our target measurement range, so we need a more stable clock source. Ideally we want to avoid writing our own
+clock conditioning code where we try to change an oscillator's operating frequency to match some reference. Clock
+conditioning algorithms are highly complex, and in our case post-processing the measurement data and simply adding an
+offset is simpler and less error-prone.
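+
+The proportionality can be made explicit with a short sketch. Assume the sampling clock actually runs at
+$f_S \paren{1 + \varepsilon}$ while the downstream processing assumes the nominal $f_S$. The relative clock error
+$\varepsilon$ and the symbols $f$ and $\hat{f}$ are introduced here for illustration. A grid frequency $f$ then shows up
+in the processed data as
+\begin{equation*}
+    \hat{f} = \frac{f}{1 + \varepsilon} \approx f \paren{1 - \varepsilon}
+    \qquad\Rightarrow\qquad
+    \left|\Delta f\right| = \left|\hat{f} - f\right| \approx \left|\varepsilon\right| f_\text{nom}
+\end{equation*}
+For $\varepsilon = 10^{-4}$ this gives $\left|\Delta f\right| \approx 10^{-4} \cdot 50 \text{Hz} = 5 \text{mHz}$.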
+
+Our solution to these problems is to use a crystal oven\footnote{
+ A crystal oven is a crystal oscillator thermally coupled closely to a heater and temperature sensor and enclosed in
+ a thermally isolated case. The heater is controlled to hold the crystal oscillator at a near-constant temperature
+ a few tens of degrees above ambient. Any ambient temperature variations will be absorbed by the temperature control.
+ This yields a crystal frequency that is almost completely unaffected by ambient temperature variations below the
+ oven temperature and whose main remaining instability is aging.
+} as our main system clock source. Crystal ovens are expensive compared to ordinary crystal oscillators. Since any
+crystal oven will be much more accurate than a standard room-temperature crystal we chose to reduce cost by using one
+recycled from old telecommunications equipment.
+
+To verify clock accuracy we connected an externally accessible SMA connector to a microcontroller pin that is routed to
+one of the microcontroller's timer inputs. By connecting a GPS 1pps signal to this pin and measuring its period we can
+calculate our system's Allan variance\footnote{
+ Allan variance is a measure of frequency stability between two clocks.
+}, thereby measuring both clock stability and clock accuracy.
+We ran a 4-hour test of our frequency sensor that generated the histogram shown in figure \ref{ocxo_freq_stability}.
+These results show that while we get a systematic error of about $10 \text{ppm}$ due to manufacturing tolerances, the
+random error at less than $10 \text{ppb}$ is smaller than that of a room-temperature crystal oscillator by 3--4 orders of
+magnitude. Since we are interested in grid frequency variations over time but not in the absolute value of grid
+frequency the systematic error is of no consequence to us. The random error at $3.66 \text{ppb}$ corresponds to a
+frequency measurement error of about $0.2 \mu\text{Hz}$, well below what we can achieve at reasonable sampling rates and
+ADC resolution.
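+
+As a worked check of these figures, taking the values quoted above as relative clock errors and scaling them by
+$f_\text{nom} = 50 \text{Hz}$:
+\begin{align*}
+    \text{random error:} \quad & 3.66 \cdot 10^{-9} \cdot 50 \text{Hz} \approx 0.18 \mu\text{Hz} \\
+    \text{systematic offset:} \quad & 10 \cdot 10^{-6} \cdot 50 \text{Hz} = 0.5 \text{mHz}
+\end{align*}
+The systematic offset is constant over the measurement and does not affect the observed frequency variations.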
+
+\begin{figure}
+ \centering
+ \includegraphics{../lab-windows/fig_out/ocxo_freq_stability}
+ \caption{OCXO frequency deviation from nominal $19.440 \text{MHz}$ measured against GPS 1pps}
+ \label{ocxo_freq_stability}
+\end{figure}
+
+\subsection{Firmware implementation}
+
+The firmware uses one of the microcontroller's timers clocked from an external crystal oscillator to produce a $1
+\text{ms}$ tick that triggers the internal ADC for a sample rate of $1 \text{ksps}$. Higher sample rates would
+be possible but reliable data transmission over the opto-isolated serial interface might prove challenging, and $1
+\text{ksps}$ already corresponds to $20$ samples per cycle at $f_\text{nom}$. This is $10\times$ the Nyquist rate and
+should be plenty for accurate measurements.
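+
+The arithmetic behind these numbers is short:
+\begin{align*}
+    \frac{f_S}{f_\text{nom}} &= \frac{1000 \text{Hz}}{50 \text{Hz}} = 20 &
+    \frac{f_S}{2 f_\text{nom}} &= \frac{1000 \text{Hz}}{100 \text{Hz}} = 10
+\end{align*}
+To get a feeling for the load on the serial link, the following rough estimate assumes that each 12-bit sample is sent
+as two bytes and that the UART needs $10$ bit times per byte (8N1 framing). Both are assumptions made for this sketch
+and not properties stated for the actual firmware, and framing, checksum and acknowledgement overhead are not included.
+\begin{equation*}
+    1000 \,\tfrac{\text{samples}}{\text{s}} \cdot 2 \,\tfrac{\text{bytes}}{\text{sample}} \cdot
+    10 \,\tfrac{\text{bit}}{\text{byte}} = 20 \,\tfrac{\text{kbit}}{\text{s}}
+\end{equation*}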
+
+The ADC measurements are read using DMA and written into a circular buffer. Using some DMA controller features this
+circular buffer is split in back and front halves with one being written to and the other being read at the same time.
+Buffer contents are moved from the ADC DMA buffer into a packet-based reliable UART interface as they come in. The UART
+packet interface keeps two ringbuffers: One byte-based ringbuffer for transmission data and one ringbuffer pointer
+structure that keeps track of ADC data packet boundaries in the byte-based ringbuffer. Every time a chunk of data is
+available from the ADC the data is framed into the byte-based ringbuffer and the packet boundaries are logged in the
+packet pointer ringbuffer. If the UART transmitter is idle at this time a DMA-backed transmission of the oldest packet
+in the packet ringbuffer is triggered at this point. Data is framed using Consistent Overhead Byte Stuffing
+(COBS)\footnote{
+COBS is a framing technique that allows encoding $n$ bytes of arbitrary data into at most $n + \ceil{n/254}$ bytes with
+no embedded $0$-bytes, i.e.\ exactly $n+1$ bytes for packets of up to $254$ bytes. The encoded packets can then be
+delimited using $0$-bytes. COBS is simple to implement and allows both one-pass decoding and encoding. The encoder
+either needs to be able to read up to $254 \text{bytes}$ ahead or needs a buffer of that size. COBS is very robust in
+that it allows self-synchronization. At any point a receiver can reliably synchronize itself against a COBS data stream
+by waiting for the next $0$-byte. The small, bounded overhead allows precise bandwidth and buffer planning and provides
+constant, good efficiency close to the theoretical maximum.
+}\cite{cheshire01} along with a CRC-32 checksum for error checking. When the host receives a new packet with a
+valid checksum it returns an acknowledgement packet to the sensor. When the sensor receives the acknowledgement, the
+acknowledged packet is dropped from the transmission packet ringbuffer. When the host detects an incorrect checksum it
+simply stays quiet and waits for the sensor to resume with retransmission when the next ADC buffer has been received.
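+
+As a small worked example of COBS framing, consider the hypothetical four-byte payload \texttt{11 22 00 33}
+(hexadecimal, chosen here for illustration and not taken from actual sensor data). The checksum is left out of this
+example.
+\begin{center}
+    \begin{tabular}{ll}
+        payload & \texttt{11 22 00 33} \\
+        COBS-encoded & \texttt{03 11 22 02 33} \\
+        on the wire & \texttt{03 11 22 02 33 00} \\
+    \end{tabular}
+\end{center}
+The leading \texttt{03} points three bytes ahead to the position of the zero byte in the payload. That zero is itself
+replaced by the code \texttt{02}, which points two bytes ahead to the end of the packet. The trailing \texttt{00} is the
+packet delimiter, so the four payload bytes grow by exactly one byte of encoding overhead plus the delimiter.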
+
+% FIXME make actual error rate measurements
+
+The serial interface logic accounts for most of the complexity of the sensor firmware. This complexity is necessary since
+we need reliable, error-checked transmission to the host. Though rare, bit errors on a serial interface do happen and
+data corruption is unacceptable. The packet-layer queueing on the sensor is necessary since the host is not a realtime
+system and unpredictable latency spikes of several hundred milliseconds are possible.
+
+The host in our recording setup is a Raspberry Pi 3 model B running a Python script. The Python script handles serial
+communication and logs data and errors into an SQLite database file. SQLite was chosen for its simple yet flexible
+interface and its good tolerance of system resets due to unexpected power loss.
+
\subsection{Frequency sensor measurement results}
+Captured raw waveform data is processed in the Jupyter Lab environment\cite{kluyver01} and grid frequency estimates are
+extracted as described in sec. \ref{frequency_estimation} using the \textcite{gasior01} technique.
+
+% FIXME comparison against reference measurements?
+
\section{Channel simulation and parameter validation}
+
\section{Implementation of a demonstrator unit}
\section{Experimental results}