Diffstat (limited to 'ma/safety_reset.tex')
-rw-r--r--ma/safety_reset.tex189
1 files changed, 138 insertions, 51 deletions
diff --git a/ma/safety_reset.tex b/ma/safety_reset.tex
index 8da7960..0fa3cb9 100644
--- a/ma/safety_reset.tex
+++ b/ma/safety_reset.tex
@@ -35,6 +35,7 @@
\usetikzlibrary{positioning}
\usetikzlibrary{shapes}
+\usepackage[binary-units]{siunitx}
\usepackage{hyperref}
\usepackage{tabularx}
\usepackage{commath}
@@ -486,10 +487,13 @@ controller by having both run on separate microcontrollers. Two, we keep the res
simple to reduce attack surface there.
\subsection{Regulatory and economic constraints}
-\subsection{Safety vs. Security: Opting for restoration instead of prevention}
+%FIXME
+\subsection{Safety vs. Security: Opting for restoration instead of prevention}
+%FIXME
\subsection{Technical outline of a safety reset system}
+%FIXME
\section{Communication channels on the grid}
@@ -557,10 +561,93 @@ of a single large transmitter faces lower bureaucratic hurdles than integration
hundreds of local systems, each with autonomous governance.
\subsubsection{The frequency dependence of grid frequency}
-% FIXME find a solid citation on this
+
+Despite the enormous complexity of large power grids, the physics underlying their response to changes in load and
+generation is surprisingly simple. Individual machines (loads and generators) can be approximated by a small number of
+differential equations, and the entire grid can be modelled by aggregating these approximations into a large system of
+linear differential equations. Analyses of such systems show that in large power grids, small-signal steady-state
+changes in the balance between generation and consumption cause a linear change in
+frequency\cite{kundur01,entsoe02,entsoe04}. \emph{Small signal} here describes changes in power balance that are small
+compared to overall grid power. \emph{Steady state} describes changes over a timeframe of multiple cycles, as opposed to
+transient events that only last a few milliseconds.
+
+This approximately linear relationship allows the specification of a coefficient linking $\Delta P$ and $\Delta f$ with
+unit \si{\watt\per\hertz}. In this thesis we use the European power grid as our model system, with data provided by
+ENTSO-E (European Network of Transmission System Operators for Electricity, formerly UCTE), the association of European
+transmission system operators. In our calculations we use data for the Continental Europe synchronous area, the largest
+synchronous area in Europe. ENTSO-E calls $\frac{\Delta P}{\Delta f}$ the \emph{Overall Network Power Frequency
+Characteristic}; for this area it is around \SI{25}{\giga\watt\per\hertz}.
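As a rough illustration of this linear relationship, the Python sketch below converts a sustained change in load into the resulting steady-state frequency deviation. It assumes the round figure of 25 GW/Hz quoted above holds as a constant for small deviations; the function name and the sign convention (added load lowers frequency) are our own choices for illustration.

```python
# Rough steady-state model: delta_f = -delta_P_load / lambda.
# Assumption: lambda is the ~25 GW/Hz Overall Network Power Frequency
# Characteristic of the Continental Europe synchronous area quoted in the text.
LAMBDA_W_PER_HZ = 25e9


def steady_state_deviation_hz(delta_load_w: float,
                              lam_w_per_hz: float = LAMBDA_W_PER_HZ) -> float:
    """Frequency deviation caused by a small, sustained change in load.

    Positive delta_load_w means additional load (or lost generation),
    which lowers grid frequency; negative values raise it.
    """
    return -delta_load_w / lam_w_per_hz


if __name__ == "__main__":
    for p in (100e6, 500e6, 1e9):
        dev_mhz = steady_state_deviation_hz(p) * 1e3
        print(f"{p / 1e6:6.0f} MW extra load -> {dev_mhz:+.1f} mHz")
```

Under these assumptions, a sustained 1 GW load step corresponds to a deviation of roughly -40 mHz.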
+
+We can derive general design parameters for any system utilizing grid frequency as a communications channel from the
+policies of ENTSO-E\cite{entsoe02,entsoe03}. % FIXME introduce ENTSO-E on first use
+Any such system should probably stay below a modulation amplitude of \SI{100}{\milli\hertz}, which is the threshold the
+ENTSO-E incident classification scale defines for a frequency degradation incident of scale 0 to 1 (``Anomaly'' to
+``Noteworthy Incident'') in the Continental Europe synchronous area\cite{entsoe03}.
\subsubsection{Control systems coupled to grid frequency}
+The ENTSO-E Operations Handbook, Policy 1, defines the activation threshold of primary control as
+\SI{20}{\milli\hertz}. Ideally, a modulation system would stay well below this threshold to avoid fighting the primary
+control reserve. The modulation line rate should probably be on the order of a few hundred millibaud.
+% FIXME is using "probably" here and in the previous paragraph ok?
+At such rates, modulation would outpace primary control action, which ENTSO-E specifies as acting within a time frame
+between ``a few seconds'' and \SI{15}{\second}.
+
+ENTSO-E reports the effective \emph{Network Power Frequency Characteristic} of primary control in the European grid at
+around \SI{20}{\giga\watt\per\hertz}. Keeping the modulation amplitude below the \SI{20}{\milli\hertz} activation
+threshold would help to avoid spuriously triggering these control functions. In terms of power, this characteristic
+works out to about \SI{20}{\mega\watt} of modulation power per \si{\milli\hertz} of modulation amplitude.
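To make the unit bookkeeping explicit, the following Python sketch multiplies each frequency threshold by a power frequency characteristic to get a rough load-swing budget, and converts a few example line rates into symbol durations for comparison with the primary control time frame. Pairing the 100 mHz threshold with the overall characteristic and the 20 mHz threshold with the primary control characteristic is our own assumption, as is the steady-state linear model; the results are orders of magnitude, not design limits.

```python
# Rough steady-state power budgets for a grid frequency modulation transmitter.
# Assumption: delta_P = delta_f * lambda holds for these small deviations.
OVERALL_LAMBDA_W_PER_HZ = 25e9   # Overall Network Power Frequency Characteristic
PRIMARY_LAMBDA_W_PER_HZ = 20e9   # effective characteristic of primary control
INCIDENT_THRESHOLD_HZ = 100e-3   # incident classification threshold from the text
PRIMARY_ACTIVATION_HZ = 20e-3    # primary control activation threshold


def power_budget_w(amplitude_hz: float, lam_w_per_hz: float) -> float:
    """Load swing that corresponds to a frequency deviation of amplitude_hz."""
    return amplitude_hz * lam_w_per_hz


def symbol_period_s(line_rate_baud: float) -> float:
    """Duration of one modulation symbol at the given line rate."""
    return 1.0 / line_rate_baud


if __name__ == "__main__":
    incident = power_budget_w(INCIDENT_THRESHOLD_HZ, OVERALL_LAMBDA_W_PER_HZ)
    primary = power_budget_w(PRIMARY_ACTIVATION_HZ, PRIMARY_LAMBDA_W_PER_HZ)
    print(f"100 mHz incident threshold: ~{incident / 1e6:.0f} MW")
    print(f"20 mHz primary control threshold: ~{primary / 1e6:.0f} MW")
    for rate in (0.1, 0.3, 0.5):
        print(f"{rate * 1e3:.0f} mBd -> {symbol_period_s(rate):.1f} s per symbol")
```

Under these assumptions the 20 mHz primary control threshold corresponds to a load swing of about 400 MW, and at 300 mBd each symbol lasts about 3.3 s.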
+
+\subsubsection{Practical transmitter implementation}
+
+In its most basic form, a transmitter for grid frequency modulation would be a very large controllable load connected to
+the power grid at a suitable vantage point. A spool of wire submerged in a body of cooling water (such as a small lake
+with a fence around it) along with a thyristor rectifier bank would likely suffice to perform this function during
+occasional cybersecurity incidents. We can, however, reduce hardware and maintenance investment even further compared
+to this rather crude solution by repurposing existing large industrial loads as transmitters in an emergency. For a
+preliminary exploration we went through a list of energy-intensive industries in Europe\cite{ec01}. The most
+electricity-intensive industries in this list are primary aluminium and steel production. In primary production, raw ore
+is converted into raw metal for further refinement such as casting, rolling or extrusion. In steelmaking, iron is melted
+in an electric arc furnace. In aluminium smelting, aluminium is electrolytically extracted from alumina. Both processes
+consume large amounts of electricity, with electricity making up \SI{40}{\percent} of production costs. This makes steel
+mills and aluminium smelters good candidates for transmitters in a grid frequency modulation system.
+
+In aluminium smelting, high-voltage AC from the grid is transformed, rectified and fed into about 100 series-connected
+cells (\emph{pots}) forming a \emph{potline}. Inside the pots, alumina is dissolved in a molten cryolite electrolyte at
+about \SI{1000}{\degreeCelsius}, and electrolysis is performed using a current of tens or hundreds of kiloamperes. The
+resulting pure aluminium settles at the bottom of the cell and is tapped off for further processing.
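To get a feeling for whether a single potline is a meaningful transmitter, the sketch below estimates its electrical power as cells × cell voltage × line current and converts a full on/off swing into a frequency deviation using the 25 GW/Hz figure from above. The cell voltage of 4.2 V and the line current of 300 kA are illustrative assumptions (the text only states tens to hundreds of kiloamperes), not data for any particular smelter.

```python
# Back-of-the-envelope potline power. All parameter values passed in below are
# illustrative assumptions, not measurements of a specific smelter.
LAMBDA_W_PER_HZ = 25e9  # overall network power frequency characteristic (from the text)


def potline_power_w(cells: int, cell_voltage_v: float, line_current_a: float) -> float:
    """Series-connected cells all carry the same line current."""
    return cells * cell_voltage_v * line_current_a


def deviation_mhz(power_w: float, lam_w_per_hz: float = LAMBDA_W_PER_HZ) -> float:
    """Steady-state frequency deviation (mHz) if power_w were switched entirely."""
    return power_w / lam_w_per_hz * 1e3


if __name__ == "__main__":
    p = potline_power_w(cells=100, cell_voltage_v=4.2, line_current_a=300e3)
    print(f"potline power: {p / 1e6:.0f} MW, full swing: {deviation_mhz(p):.2f} mHz")
```

With these assumed values a single potline draws roughly 126 MW, and switching it fully would shift frequency by only about 5 mHz, inside the 20 mHz primary control threshold discussed above.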
+
+Like steelworks, aluminium smelters are operated night and day without interruption. Aside from metallurgical issues, the
+large thermal mass and enormous heating power requirements do not permit power-cycling. Because production
+inefficiencies and interruptions are costly, the behavior of aluminium smelters during power outages is fairly well
+characterized in the industry. The recent move away from nuclear power and towards renewable energy has led to larger
+fluctuations of the electricity price throughout the day. These price fluctuations have given aluminium smelters
+enough economic incentive to develop techniques that modulate smelter power consumption without affecting cell lifetime
+or the output product\cite{duessel01,eisma01}. Power outages of tens of minutes up to two hours reportedly do not cause
+problems in aluminium potlines and are in fact part of routine operation for purposes such as electrode
+changes\cite{eisma01,oye01}.
+
+The power supply system of an aluminium plant is managed by a highly integrated control system, since keeping all cells
+of a potline under optimal operating conditions is challenging. Modern power supply systems employ large banks of diodes
+or SCRs to rectify low-voltage AC into the DC that is fed into the potline\cite{ayoub01}. The potline voltage can be
+controlled almost continuously through a combination of a tap changer and a transductor. The individual cell voltages
+can be controlled by changing the anode-to-cathode distance (ACD), i.e.\ by physically raising or lowering the anode.
+The potline power supply is connected to the high-voltage input and to the potline through isolators and breakers.
+
+In an aluminium smelter, most of the power is sunk into resistive losses and the electrolysis process. Consequently, an
+aluminium smelter has no significant electromechanical inertia, unlike the large rotating machines used in other
+industries. Depending on the capabilities of the rectifier controls, high slew rates should be possible, permitting
+modulation at high\footnote{Aluminium smelter rectifiers are \emph{pulse rectifiers}. This means that instead of
+simply rectifying the incoming three-phase voltage, they use a special configuration of transformer secondaries and in
+some cases additional coils to produce a larger number of equally spaced phases (such as 6). Where a direct-connected
+three-phase bridge rectifier draws current in 6 pulses per cycle, a pulse rectifier draws current in more, smaller
+pulses to increase power factor. For example, a 12-pulse rectifier draws current in 12 pulses per cycle. In the best
+case, an SCR pulse rectifier switched at zero crossing should allow \SIrange{0}{100}{\percent} load changes from one
+rectifier pulse to the next, i.e.\ within a fraction of a single cycle.} data rates.
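The footnote's claim about load changes within a fraction of a cycle can be checked with simple arithmetic: a p-pulse rectifier on a 50 Hz grid draws a current pulse every 20 ms / p. The sketch below computes this interval and how many pulses fit into one symbol at an assumed line rate of 300 mBd, our own pick from the "few hundred millibaud" range above; function names are illustrative.

```python
F_NOM_HZ = 50.0  # nominal mains frequency


def pulse_interval_s(pulses_per_cycle: int, f_nom_hz: float = F_NOM_HZ) -> float:
    """Time between successive current pulses of a p-pulse rectifier."""
    return 1.0 / (f_nom_hz * pulses_per_cycle)


def pulses_per_symbol(pulses_per_cycle: int, line_rate_baud: float,
                      f_nom_hz: float = F_NOM_HZ) -> float:
    """Number of rectifier pulses (i.e. control opportunities) per modulation symbol."""
    return (1.0 / line_rate_baud) / pulse_interval_s(pulses_per_cycle, f_nom_hz)


if __name__ == "__main__":
    for p in (6, 12):
        print(f"{p:2d}-pulse: {pulse_interval_s(p) * 1e3:.2f} ms between pulses, "
              f"{pulses_per_symbol(p, 0.3):.0f} pulses per 300 mBd symbol")
```

Even for a 6-pulse rectifier there are on the order of a thousand control opportunities per symbol, so switching granularity should not be what limits the data rate.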
+
+% FIXME validate this \subsubsection with an expert
+
\subsubsection{Avoiding dangerous modes}
Modern power systems are complex electromechanical systems. Each component is controlled by several carefully tuned
@@ -836,18 +923,18 @@ waveform, measure time between two rising-edge (or falling-edge) zero-crossings
practice, phasor measurement units are significantly more complex than this. This discrepancy is due to the unhealthy
% FIXME is this pun ok?
combination of both high precision and quick response that is demanded from these units. High precision is necessary
-since variations of mains frequency under normal operating conditions are quite small--in the range of $5-10 \text{mHz}$
-over short intervals of time. Relative to the nominal $50 \text{Hz}$ this is a derivation of less than $100 \text{ppm}$.
-Relative to the corresponding $20 \text{ms}$ period that means a time derivation of about $2 \mu\text{s}$ from cycle to
-cycle. From this it is already obvious why a simplistic measurement cannot yield the required precision for manageable
-averaging times--we would need either a ADC sampling rate in the order of megabits or for a reconstruction through
-interpolated readings an impractically high ADC resolution.
+since variations of mains frequency under normal operating conditions are quite small, in the range of
+\SIrange{5}{10}{\milli\hertz} over short intervals of time. Relative to the nominal \SI{50}{\hertz} this is a deviation of
+only \SIrange{100}{200}{ppm}. Relative to the corresponding \SI{20}{\milli\second} period, that means a time deviation of
+only about \SIrange{2}{4}{\micro\second} from cycle to cycle. From this it is already obvious why a simplistic measurement
+cannot yield the required precision for manageable averaging times: we would need either an ADC sampling rate on the
+order of megasamples per second or, for a reconstruction through interpolated readings, an impractically high ADC resolution.
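The figures in the paragraph above follow from first-order arithmetic: with T = 1/f, a small offset Δf changes the period by roughly ΔT ≈ T·Δf/f. The Python sketch below (function names are ours) reproduces them and adds one illustrative quantity of our own: the sample rate at which one sample period equals ΔT, a lower bound for naive zero-crossing timing.

```python
F_NOM_HZ = 50.0  # nominal mains frequency


def relative_deviation_ppm(delta_f_hz: float, f_nom_hz: float = F_NOM_HZ) -> float:
    """Frequency deviation relative to nominal, in parts per million."""
    return delta_f_hz / f_nom_hz * 1e6


def period_deviation_s(delta_f_hz: float, f_nom_hz: float = F_NOM_HZ) -> float:
    """First-order change of the cycle period: |dT| ~= T * |df| / f with T = 1/f."""
    return (1.0 / f_nom_hz) * (delta_f_hz / f_nom_hz)


def direct_timing_rate_hz(delta_f_hz: float, f_nom_hz: float = F_NOM_HZ) -> float:
    """Sample rate whose sample spacing equals that period change.

    This is only the rate needed to resolve the deviation at all by timing a
    single zero crossing to the nearest sample; any useful margin needs more.
    """
    return 1.0 / period_deviation_s(delta_f_hz, f_nom_hz)


if __name__ == "__main__":
    for df in (5e-3, 10e-3):
        print(f"{df * 1e3:4.1f} mHz: {relative_deviation_ppm(df):5.0f} ppm, "
              f"{period_deviation_s(df) * 1e6:.1f} us per cycle, "
              f"{direct_timing_rate_hz(df) / 1e3:.0f} kS/s for one-sample resolution")
```

For 5 mHz this gives 100 ppm, 2 µs and 500 kS/s; resolving a fraction of that step pushes the requirement into the megasamples-per-second range.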
Details on the inner workings of commercial phasor measurement units are scarce, but given their essential role in SCADA
systems there is a large amount of academic research on such algorithms\cite{narduzzi01,derviskadic01}. A popular
approach to these systems is to perform a Short-Time Fourier Transform (STFT) on ADC data sampled at high sampling rate
-(e.g. $10 \text{kHz}$) and then perform some analysis on the frequency-domain data to precisely locate the strong peak
-around $50 \text{Hz}$. A key observation here is that FFT bin size is going to be much larger than required frequency
+(e.g.\ \SI{10}{\kilo\hertz}) and then perform some analysis on the frequency-domain data to precisely locate the strong
+peak around \SI{50}{\hertz}. A key observation here is that the FFT bin width, $f_S/N$ for $N$ samples taken at
+sampling rate $f_S$, is going to be much larger than the required frequency
resolution. This fundamental limitation follows from the Nyquist criterion, and if we had to process % FIXME maybe cite?
an \emph{arbitrary} signal this would severely limit our practical measurement accuracy
\footnote{
@@ -860,13 +947,13 @@ an \emph{arbitrary} signal this would highly limit our practical measurement acc
aside from a hypothetical numerical advantage\cite{gasior02}.
}.
For this reason all approaches to mains frequency estimation are based on a model of the mains voltage waveform.
-Nominally, this waveform would be a perfect sine at $f = 50 \text{Hz}$. In practice it is a sine at $f \approx 50
-\text{Hz}$ superimposed with some aperiodic noise (e.g. irregular spikes from inductive loads being energized) as well
-as harmonic distortion that is caused by grid-topologically nearby devices with power factor
+Nominally, this waveform would be a perfect sine at $f=\SI{50}{\hertz}$. In practice it is a sine at
+$f\approx\SI{50}{\hertz}$ with superimposed aperiodic noise (e.g.\ irregular spikes from inductive loads being
+energized) as well as harmonic distortion caused by grid-topologically nearby devices with power factor
\footnote{
Power factor is a power engineering term that is used to describe how close the current waveform of a load is to
that of a purely resistive load. Given sinusoidal input voltage $V(t) = V_\text{pk} \sin \paren{\omega_\text{nom}
- t}$ with $\omega_\text{nom} = 2 \pi f_\text{nom} = 2 \pi \cdot 50 \text{Hz}$ being the nominal angular frequency,
+ t}$ with $\omega_\text{nom} = 2 \pi f_\text{nom} = 2 \pi \cdot \SI{50}{\hertz}$ being the nominal angular frequency,
the current waveform of a resistor with resistance $R \left[\Omega\right]$ according to Ohm's law would be $I(t) =
\frac{V(t)}{R} = \frac{1}{R} V_\text{pk} \sin\paren{\omega_\text{nom} t}$. In this case voltage and current are
perfectly in phase, i.e. the current at time $t$ is linear in voltage at constant factor $\frac{1}{R}$.
@@ -891,19 +978,19 @@ as harmonic distortion that is caused by grid-topologically nearby devices with
}
$\cos \theta \neq 1.0$. Under a continuous Fourier transform over a long period, the frequency spectrum of a signal
distorted like this will be a low noise floor, depending mainly on aperiodic noise, on which a comb of harmonics as well
-as some sub-harmonics of $f \approx f_\text{nom} = 50Hz$ rides. The main peak at $f \approx f_\text{nom}$ will be very
-strong with the harmonics being approximately an order of magnitude weaker in energy and the noise floor being at least
-another order of magnitude weaker. See figure \ref{mains_voltage_spectrum} for a measured spectrum. This domain
-knowledge about the expected frequency spectrum of the signal can be employed in a number of interpolation techniques to
-re-construct the precise frequency of the spectrum's main component despite comparatively coarse STFT resolution and
-despite numerous distortions.
+as some sub-harmonics of $f \approx f_\text{nom} = \SI{50}{\hertz}$ rides. The main peak at $f \approx f_\text{nom}$
+will be very strong, with the harmonics being approximately an order of magnitude weaker in energy and the noise floor
+being at least another order of magnitude weaker. See figure \ref{mains_voltage_spectrum} for a measured spectrum. This
+domain knowledge about the expected frequency spectrum of the signal can be employed in a number of interpolation
+techniques to reconstruct the precise frequency of the spectrum's main component despite comparatively coarse STFT
+resolution and despite numerous distortions.
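As a sketch of one such interpolation technique (not necessarily the one used in our own processing pipeline), the Python/NumPy code below applies a Hann window to a block of samples, locates the strongest FFT bin and fits a parabola through the logarithms of that bin and its two neighbours. With a 1 s block at 1 kHz the bin width is 1 Hz, yet for a clean signal the interpolated estimate lands within a few hundredths of a bin; the remaining error is a known systematic bias of this estimator for the Hann window. The test signal (weak third harmonic plus white noise) is an assumption for illustration, not measured data.

```python
import numpy as np


def estimate_frequency(samples: np.ndarray, fs: float) -> float:
    """Estimate the dominant frequency of `samples` (sampled at `fs` Hz).

    Hann window + FFT, then a parabola through the log-magnitudes of the peak
    bin and its two neighbours. The estimate has a small window-dependent bias.
    """
    n = len(samples)
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(n)))
    # Skip the first and last bins so that both neighbours of the peak exist.
    k = int(np.argmax(spectrum[1:-1])) + 1
    a, b, c = np.log(spectrum[k - 1:k + 2])
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)  # fractional bin offset
    return float((k + offset) * fs / n)


if __name__ == "__main__":
    fs = 1000.0                      # 1 ksps
    t = np.arange(1000) / fs         # 1 s block -> 1 Hz FFT bins
    rng = np.random.default_rng(0)
    f_true = 50.3
    x = (np.sin(2 * np.pi * f_true * t)
         + 0.05 * np.sin(2 * np.pi * 3 * f_true * t)   # weak 3rd harmonic
         + 0.01 * rng.standard_normal(t.size))          # white noise
    print(f"estimate: {estimate_frequency(x, fs):.4f} Hz (true {f_true} Hz)")
```

Taking the plain peak bin would be off by 0.3 Hz here. Longer blocks shrink the bin width, and with it the absolute interpolation error, at the cost of time resolution.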
\begin{figure}
\centering
\includegraphics{../lab-windows/fig_out/mains_voltage_spectrum}
\caption{Fourier transform of a 24-hour capture of mains voltage. Data was captured using our frequency measurement
sensor described in section \ref{sec-fsensor} and Fourier-transformed after applying a Blackman window. Vertical lines indicate
- $50 \text{Hz}$ and odd harmonics.}
+ \SI{50}{\hertz} and odd harmonics.}
\label{mains_voltage_spectrum}
\end{figure}
@@ -978,7 +1065,7 @@ label and a few status lights on its front.
\subsection{Clock accuracy considerations}
-Our measurement hardware will sample line voltage at some sampling rate $f_S$, e.g.\ $1 \text{kHz}$. All downstream
+Our measurement hardware will sample line voltage at some sampling rate $f_S$, e.g.\ \SI{1}{\kilo\hertz}. All downstream
processing is limited in accuracy by the accuracy of $f_S$\footnote{
We are not considering the effects of clock jitter. We are highly oversampling the signal and the FFT done in our
downstream processing will eliminate small jitter effects leaving only frequency stability to worry about. }. We
@@ -987,10 +1074,10 @@ the microcontroller's system clock. This means our ADC's sampling window will be
microcontroller's system clock.
Our downstream measurement of mains frequency by nature is relative to our sampling frequency $f_S$. In the setup
-described above this means we have to make sure our system clock is fairly stable. A frequency derivation of $1
-\text{ppm}$ in our system clock causes a proportional grid frequency measurement error of $\Delta f = f_\text{nom} \cdot
-10^{-6} = 50 \mu\text{Hz}$. In a worst-case where our system is clocked from a particularly bad crystal that exhibits
-$100 \text{ppm}$ of instabilities over our measurement period we end up with an error of $5 \text{mHz}$. This is well
+described above this means we have to make sure our system clock is fairly stable. A frequency deviation of \SI{1}{ppm}
+in our system clock causes a proportional grid frequency measurement error of $\Delta f = f_\text{nom} \cdot
+10^{-6} = \SI{50}{\micro\hertz}$. In a worst case where our system is clocked from a particularly bad crystal that
+exhibits \SI{100}{ppm} of instability over our measurement period, we end up with an error of \SI{5}{\milli\hertz}. This
is of the same magnitude as the frequency variations we intend to measure, so we need a more stable clock source. Ideally we want to avoid writing our own
clock conditioning code where we try to change an oscillator's operating frequency to match some reference. Clock
conditioning algorithms are highly complex and in our case post-processing of measurement data and simply adding and
@@ -1012,27 +1099,27 @@ calculate our system's Allan variance\footnote{
Allan variance is a measure of frequency stability between two clocks.
}, thereby measuring both clock stability and clock accuracy.
We ran a 4 hour test of our frequency sensor that generated the histogram shown in figure \ref{ocxo_freq_stability}.
-These results show that while we get a systematic error of about $10 \text{ppm}$ due to manufacturing tolerances the
-random error at less than $10 \text{ppb}$ is smaller than that of a room-temperature crystal oscillator by 3-4 orders of
+These results show that while we get a systematic error of about \SI{10}{ppm} due to manufacturing tolerances, the
+random error at less than \SI{10}{ppb} is smaller than that of a room-temperature crystal oscillator by 3--4 orders of
magnitude. Since we are interested in grid frequency variations over time but not in the absolute value of grid
-frequency the systematic error is of no consequence to us. The random error at $3.66 \text{ppb}$ corresponds to a
-frequency measurement error of about $0.2 \mu\text{Hz}$, well below what we can achieve at reasonable sampling rates and
-ADC resolution.
+frequency the systematic error is of no consequence to us. The random error at \SI{3.66}{ppb} corresponds to a
+frequency measurement error of about \SI{0.2}{\micro\hertz}, well below what we can achieve at reasonable sampling rates
+and ADC resolution.
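The proportional error model used in this subsection, Δf = f_nom · ε for a fractional sampling clock error ε, is easy to tabulate. This sketch reproduces the three figures quoted above (1 ppm, 100 ppm and 3.66 ppb); like the footnote on clock jitter, it deliberately ignores jitter. The function name is our own.

```python
F_NOM_HZ = 50.0  # nominal mains frequency


def freq_error_hz(clock_error: float, f_nom_hz: float = F_NOM_HZ) -> float:
    """Grid frequency error caused by a fractional sampling clock error (1e-6 = 1 ppm)."""
    return f_nom_hz * clock_error


if __name__ == "__main__":
    cases = {
        "1 ppm": 1e-6,
        "100 ppm crystal": 100e-6,
        "OCXO random error 3.66 ppb": 3.66e-9,
    }
    for label, err in cases.items():
        print(f"{label:>28}: {freq_error_hz(err) * 1e6:10.3f} uHz")
```

The OCXO case comes out at about 0.18 µHz, which the text rounds to 0.2 µHz.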
\begin{figure}
\centering
\includegraphics{../lab-windows/fig_out/ocxo_freq_stability}
- \caption{OCXO Frequency derivation from nominal $19.440 \text{MHz}$ measured against GPS 1pps}
+ \caption{OCXO frequency deviation from nominal \SI{19.440}{\mega\hertz} measured against GPS 1PPS}
\label{ocxo_freq_stability}
\end{figure}
\subsection{Firmware implementation}
-The firmware uses one of the microcontroller's timers clocked from an external crystal oscillator to produce an $1
-\text{ms}$ tick that the internal ADC is triggered from for a sample rate of $1 \text{ksps}$. Higher sample rates would
-be possible but reliable data transmission over the opto-isolated serial interface might prove challenging and $1
-\text{ksps}$ corresponds to $20$ samples per cycle at $f_\text{nominal}$. This is $10\times$ nyquist and should be
-plenty for accurate measurements.
+The firmware uses one of the microcontroller's timers clocked from an external crystal oscillator to produce a
+\SI{1}{\milli\second} tick that triggers the internal ADC, for a sample rate of \SI{1}{ksps}. Higher sample rates would
+be possible, but reliable data transmission over the opto-isolated serial interface might then prove challenging.
+A rate of \SI{1}{ksps} corresponds to $20$ samples per cycle at $f_\text{nom}$, which is $10\times$ the Nyquist rate
+and should be plenty for accurate measurements.
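The timing arithmetic behind the 1 ms tick can be sketched as follows. For illustration only, we assume that the timer is clocked directly from the 19.440 MHz OCXO shown in the figure above and that it is a 16-bit counter; neither detail is stated in this section, and real firmware would configure prescaler and reload registers in a vendor-specific way. The function `tick_parameters` is hypothetical.

```python
def tick_parameters(f_timer_hz: float, f_sample_hz: float,
                    f_nom_hz: float = 50.0, counter_bits: int = 16) -> dict:
    """Timer period for a sampling tick, plus oversampling figures.

    Hypothetical model: the timer counts directly at f_timer_hz (no prescaler)
    and wraps after `period_counts` counts.
    """
    counts = f_timer_hz / f_sample_hz
    if counts != int(counts):
        raise ValueError("sample rate does not divide the timer clock evenly")
    counts = int(counts)
    if counts > 2 ** counter_bits:
        raise ValueError("period does not fit the counter; a prescaler would be needed")
    return {
        "period_counts": counts,
        "samples_per_cycle": f_sample_hz / f_nom_hz,
        "times_nyquist_rate": f_sample_hz / (2.0 * f_nom_hz),
    }


if __name__ == "__main__":
    print(tick_parameters(f_timer_hz=19.44e6, f_sample_hz=1e3))
```

With the assumed 19.440 MHz timer clock, a 1 ms tick is exactly 19440 counts, which fits a 16-bit counter without a prescaler; the 20 samples per cycle and the factor of 10 over the Nyquist rate match the figures in the text.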
The ADC measurements are read using DMA and written into a circular buffer. Using some DMA controller features, this
circular buffer is split into front and back halves, one of which is written to while the other is being read.
@@ -1045,14 +1132,14 @@ in the packet ringbuffer is triggered at this point. Data is framed using Consis
(COBS)\footnote{
COBS is a framing technique that encodes $n$ bytes of arbitrary data into $n+1$ bytes for $n \leq 254$ (in general, the
overhead is at most one byte per 254 bytes of data) containing no $0$-bytes, so frames can be delimited using $0$-bytes.
COBS is simple to implement and allows both one-pass decoding and
-encoding. The encoder either needs to be able to read up to $256 \text{bytes}$ ahead or needs a buffer of $256
-\text{bytes}$. COBS is very robust in that it allows self-synchronization. At any point a receiver can reliably
-synchronize itself against a COBS data stream by waiting for the next $0$-byte. The constant overhead allows precise
-bandwidth and buffer planning and provides constant, good efficiency close to the theoretical maximum.
-}\cite{cheshire01} along with a CRC-32 checksum for error checking. When the host receives a new packet with a
-valid checksum it returns an acknowledgement packet to the sensor. When the sensor receives the acknowledgement, the
-acknowledged packet is dropped from the transmission packet ringbuffer. When the host detects an incorrect checksum it
-simply stays quiet and waits for the sensor to resume with retransmission when the next ADC buffer has been received.
+encoding. The encoder either needs to be able to read up to \SI{254}{\byte} ahead or needs a buffer of that size.
+COBS is very robust in that it allows self-synchronization: at any point, a receiver can reliably synchronize itself
+against a COBS data stream by waiting for the next $0$-byte. The small, bounded overhead allows precise bandwidth and
+buffer planning and keeps efficiency close to the theoretical maximum.}\cite{cheshire01} along with a CRC-32 checksum
+for error checking. When the host receives a new packet with a valid checksum, it returns an acknowledgement packet to
+the sensor. When the sensor receives the acknowledgement, the acknowledged packet is dropped from the transmission
+packet ringbuffer. When the host detects an incorrect checksum, it simply stays quiet and waits for the sensor to
+resume retransmission once the next ADC buffer has been received.
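A compact Python sketch of this framing scheme is shown below, e.g. as a starting point for a host-side test harness. It is not our firmware code: the little-endian CRC byte order and the choice to COBS-encode payload and CRC together are assumptions, and `zlib.crc32` (the common IEEE 802.3 CRC-32) stands in for whatever CRC-32 variant the sensor actually uses. This simple encoder may spend one extra byte when the data ends exactly on a 254-byte boundary, which the decoder below still accepts.

```python
import zlib
from typing import Optional


def cobs_encode(data: bytes) -> bytes:
    """Consistent Overhead Byte Stuffing: the output contains no 0x00 bytes."""
    out = bytearray()
    block = bytearray()  # pending run of non-zero bytes
    for byte in data:
        if byte == 0:
            # A zero ends the current block; the code byte implies the zero.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                # Maximum-length block: code 0xFF means "no implied zero".
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)  # final block (possibly empty)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Inverse of cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("unexpected 0x00 inside COBS data")
        block = encoded[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS block")
        out += block
        i += code
        if code < 0xFF and i < len(encoded):
            out.append(0)  # restore the zero implied by this block
    return bytes(out)


def frame_packet(payload: bytes) -> bytes:
    """Append CRC-32 (little-endian, assumed), COBS-encode, add 0x00 delimiter."""
    crc = zlib.crc32(payload).to_bytes(4, "little")
    return cobs_encode(payload + crc) + b"\x00"


def unframe_packet(frame: bytes) -> Optional[bytes]:
    """Return the payload of a valid frame, or None if malformed or CRC fails."""
    if not frame.endswith(b"\x00"):
        return None
    try:
        raw = cobs_decode(frame[:-1])
    except ValueError:
        return None
    if len(raw) < 4:
        return None
    payload, crc = raw[:-4], raw[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != crc:
        return None
    return payload
```

On the host, a receiver would accumulate bytes until it sees `0x00` and pass each frame to `unframe_packet`; a `None` result simply means no acknowledgement is sent, which fits the retransmission scheme described above.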
% FIXME make actual error rate measurements
@@ -1095,9 +1182,9 @@ interface and its good tolerance of system resets due to unexpected power loss.
\centering
\includegraphics{../lab-windows/fig_out/mains_voltage_spectrum}
\caption{Power spectral density of the mains voltage trace in fig. \ref{freq_meas_trace}. We can see the expected
- peak at $50 \text{Hz}$ along with smaller peaks at odd harmonics. We can also see a number of spurious tones both
- between harmonics and at low frequencies, as well as some bands containing high noise energy around $0.1
- \text{Hz}$. This graph demonstrates a high signal-to-noise ratio that is not very demanding on our frequency
+ peak at \SI{50}{\hertz} along with smaller peaks at odd harmonics. We can also see a number of spurious tones both
+ between harmonics and at low frequencies, as well as some bands containing high noise energy around
+ \SI{0.1}{\hertz}. This graph demonstrates a high signal-to-noise ratio that is not very demanding on our frequency
estimation algorithm.
}
\label{mains_voltage_spectrum}
@@ -1108,9 +1195,9 @@ interface and its good tolerance of system resets due to unexpected power loss.
\includegraphics[width=\textwidth]{../lab-windows/fig_out/freq_meas_spectrum}
\caption{Power spectral density of the 24 hour grid frequency trace in fig. \ref{freq_meas_trace} with some notable
peaks annotated with the corresponding period in seconds. The $\frac{1}{f}$ line indicates a pink noise spectrum.
- Around a period of $20 \text{s}$ the PSD starts to fall off at about $\frac{1}{f^3}$ until we can make out some
- bumps at periods around $2$ and $3 \text{s}$. Starting at at around $1 \text{Hz}$ we can see a white noise floor in
- the order of $\frac{\mu\text{Hz}^2}{\text{Hz}}$.
+ Around a period of \SI{20}{\second} the PSD starts to fall off at about $\frac{1}{f^3}$ until we can make out some
+ bumps at periods around \SI{2}{\second} and \SI{3}{\second}. Starting at around \SI{1}{\hertz} we can see a white
+ noise floor on the order of \si{\micro\hertz\squared\per\hertz}.
% TODO: where does this noise floor come from? Is it a fundamental property of the grid? Is it due to limitations of
% our measurement setup (such as ocxo stability/phase noise) ???
}