Diffstat (limited to 'paper/ihsm_paper.tex')
-rw-r--r-- paper/ihsm_paper.tex | 96
1 files changed, 96 insertions, 0 deletions
diff --git a/paper/ihsm_paper.tex b/paper/ihsm_paper.tex
index 0188b48..4886693 100644
--- a/paper/ihsm_paper.tex
+++ b/paper/ihsm_paper.tex
@@ -352,6 +352,102 @@ Using longitudinal gaps in the mesh, our setup allows direct air cooling of regu
powerful processing capabilities that greatly increase the maximum possible power dissipation of the payload. In an
evolution of our design, the spinning mesh could even be designed to \emph{be} a cooling fan.
+\subsection{Long-term Operation}
+
+As with other HSMs, practical use may require an IHSM to run continuously for a decade or even longer. As in any other
+setup utilizing HSMs, a setup including IHSMs must be designed so that the failure of a small number of IHSMs
+does not compromise the system's security or reliability. Neither IHSMs nor traditional HSMs can withstand fire or
+flooding, so while a breach of security can be ruled out, a catastrophic failure of the device and erasure of data
+cannot~\cite{heise2021ovh}. Traditionally, this problem is solved by storing all secrets in multiple, geographically
+redundant HSMs~\cite{thales2015hsmha}. Providing fault tolerance is easier for IHSMs since they are based
+on general-purpose computer hardware and run general-purpose operating systems, and thus allow state-of-the-art
+database replication techniques to be applied. One example of this approach is a 2019 technology
+demonstration~\cite{signal2019} created by signal.org, the organization running the Signal secure messenger app. In
+this demonstration, signal.org implemented the Raft consensus algorithm~\cite{ongaro2019} inside Intel SGX to
+replicate state between redundant instances.
+
+There are three main categories of challenges to an IHSM's longevity: failure of components of the IHSM due to age and
+wear, failure of the external power supply, and spurious triggering of the intrusion alarm by changes in the IHSM's
+environment. In the following paragraphs we evaluate the practical impact of each of these categories.
+
+\paragraph{Component failure.}
+The failure modes of an IHSM's components are the same as in any other computer system, and the same generic mitigation
+techniques apply. The expected lifetime of electronic components can be increased by using higher-spec components and by
+reducing thermal, mechanical and electrical stress. To reduce vibration stress on both rotor and stator, the rotor must
+be balanced. The main mechanical failure mode of an IHSM is failure of the shaft bearings. This failure mode can be
+mitigated by incorporating knowledge from other long-lived rotating devices such as cooling fans. A final
+noteworthy mechanical failure mode of an IHSM is dust buildup on the optical components of the communication link. This
+failure mode can be mitigated by routing cooling airflow such that it does not go past the communication link's optical
+components, as well as by filtering cooling air at the device's intakes.
+
+\paragraph{Power failure.}
+\label{sec-power-failure}
+After engineering an IHSM's components to survive years of continuous operation, the next major failure mode to be
+considered is power loss. Traditional HSMs solve the need for an always-on backup power supply by carrying large backup
+batteries. The low static power consumption of a traditional HSM's simple tamper detection circuitry allows for the use
+of non-replaceable backup batteries. An IHSM, in contrast, would likely require a rechargeable backup battery since its
+motor requires more power than the mesh monitoring circuit of a traditional HSM. In principle, a conventional
+Uninterruptible Power Supply (UPS) can be used, but in practice a productized IHSM might have a small, simple UPS
+integrated into its case. Conservatively assuming an average operating power consumption of $\SI{10}{\watt}$ for an
+IHSM's motor, a single large laptop battery with a capacity of $\SI{100}{\watt\hour}$~\cite{faa2018} could already keep
+the motor running for 10 hours continuously. If a built-in battery is undesirable, or if power outages of more than a
+few seconds at a time are unlikely (e.g.\ because the IHSM is connected to an external UPS or generator), the IHSM's
+rotor itself can be used as a flywheel that stores enough energy to bridge outages of up to several seconds. By
+designing the rotor to have low friction loss and high mass (e.g.\ by coupling it to an actual metal flywheel), longer
+power outages can be bridged. % FIXME
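+
+As a rough plausibility check of these figures, the battery runtime and the energy stored in the rotor can be
+estimated as follows. The motor power and battery capacity are the values quoted above. The rotor mass
+$m = \SI{1}{\kilo\gram}$, radius $r = \SI{0.1}{\meter}$, solid-disc shape and rotation speed of $\SI{1000}{rpm}$
+(i.e.\ $\omega \approx \SI{104.7}{\radian\per\second}$) are illustrative assumptions only, not measurements.
+\[
+  t_\mathrm{bat} = \frac{E_\mathrm{bat}}{P} = \frac{\SI{100}{\watt\hour}}{\SI{10}{\watt}} = \SI{10}{\hour},
+  \qquad
+  E_\mathrm{rot} = \frac{1}{2} I \omega^2 = \frac{1}{4} m r^2 \omega^2 \approx \SI{27}{\joule}
+\]
+Under these assumptions, $E_\mathrm{rot} / P \approx \SI{2.7}{\second}$ is an upper bound on the outage that the rotor
+alone could bridge, since only part of this energy is usable before the rotor slows below its minimum operating speed.
+Because $E_\mathrm{rot}$ grows with $m r^2$, added flywheel mass at a large radius extends this time.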
+
+\paragraph{Spurious alarms.}
+A spurious alarm would be as catastrophic as a failure of a critical component of an IHSM. For this reason, the
+likelihood of such an alarm must be minimized. In principle, there are two possible causes for a spurious alarm.
+One is a component failure such as a mesh trace breaking under vibration. This failure mode can be mitigated in the same
+way other component failures are mitigated. The second possible cause is that the device is accelerated in excess of the
+range expected by its designers. There are several reasons why an IHSM might move during normal operation. The
+IHSM may have to be transported between datacenters or relocated within a datacenter. Other vibrating machinery such as
+backup generators or large hard disk storage arrays may conduct vibration through the rack the IHSM is mounted in
+into the IHSM. People working in the datacenter might bump the IHSM. Vibrations from nearby traffic such as trains may
+couple through the ground into the datacenter and into the IHSM. Finally, earthquakes will couple through any reasonable
+amount of vibration damping.
+
+There are two key points to note on vibration damping. First, the instantaneous mechanical power of a vibrating motion
+is proportional to the square of its amplitude when fixing frequency and the cube of its frequency when fixing
+amplitude, while its peak acceleration grows only linearly with amplitude and with the square of frequency. This means
+that to reach a given peak acceleration, a low-frequency vibration needs a much larger amplitude and more power than a
+high-frequency one. This observation interacts with the second key point we want to note here:
+An ideal vibration damper works better the higher the frequency, and has a lower cutoff frequency below which it no
+longer attenuates vibration transmission~\cite{kelly1993,beards1996,dixon2007}. In conclusion, these two observations
+mean that if we wish to reduce the likelihood of false detections by our IHSM tamper alarm, we can effectively achieve
+this goal by damping high-frequency shock and vibration, as low-frequency shock or vibration components will not reach
+accelerations large enough to cause a false alarm.
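+
+The scaling behind the first point can be made explicit for a purely sinusoidal vibration, which is a simplifying
+assumption on our part. For a mass $m$ moving as $x(t) = A \sin(\omega t)$, where the symbols $m$, $A$ and $\omega$ are
+introduced here only for illustration, peak acceleration and peak instantaneous power are
+\[
+  a_\mathrm{peak} = A \omega^2, \qquad
+  P_\mathrm{peak} = \max_t \left| m \ddot{x} \dot{x} \right| = \frac{1}{2} m A^2 \omega^3
+  \quad\Longrightarrow\quad
+  P_\mathrm{peak} = \frac{m \, a_\mathrm{peak}^2}{2 \omega}
+\]
+At a fixed peak acceleration, the required amplitude thus falls with $1/\omega^2$ and the required peak power with
+$1/\omega$. In this idealized model, a slow vibration would need both a very large displacement and a large amount of
+power to reach the accelerations that a fast vibration reaches with little of either.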
+
+To put the above relations into perspective, consider that at a rotation speed of $\SI{1000}{rpm}$, we can expect an
+IHSM's tamper sensor to measure an acceleration of about $\SI{100}{g}$. Even the strongest earthquakes rarely reach a
+Peak Ground Acceleration (PGA) of $\SI{0.1}{g}$~\cite{yoshimitsu1990}. The highest measured PGA of the 2011 Tohoku
+earthquake was approximately $\SI{3}{g}$. Since earthquake vibrations are low-frequency and happen across a large
+geographic area, they dissipate a tremendous amount of mechanical power even at an absolute acceleration that may seem
+low at first glance. For the purposes of our tamper detection system, however, we can largely ignore them. As
+another point of reference, consider a car crash. An acceleration above $\SI{10}{g}$ corresponds to a crash at roughly
+$\SI{30}{\kilo\meter\per\hour}$~\cite{ika2002}. Thus, an IHSM's tamper detection subsystem will be able to clearly
+distinguish attempts to stop the IHSM's rotation, which change the measured acceleration by about $\SI{100}{g}$, from
+external accelerations. Any external acceleration that comes close in order of magnitude to the operating centrifugal
+acceleration at the periphery of an IHSM's rotor would likely destroy the IHSM.
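+
+The $\SI{100}{g}$ figure is consistent with the centripetal acceleration $a = \omega^2 r$ of a sensor on the rotor. As
+a minimal worked example, assume the sensor sits at a radius of $r = \SI{0.09}{\meter}$, an illustrative value chosen
+here only because it reproduces the figure above:
+\[
+  \omega = \frac{2\pi \cdot 1000}{\SI{60}{\second}} \approx \SI{104.7}{\radian\per\second}, \qquad
+  a = \omega^2 r \approx \left( \SI{104.7}{\radian\per\second} \right)^2 \cdot \SI{0.09}{\meter}
+    \approx \SI{987}{\meter\per\second\squared} \approx \SI{100}{g}
+\]
+Since $a$ grows with the square of the rotation speed, doubling the speed at the same radius would roughly quadruple
+the margin over the external accelerations discussed above.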
+
+\subsection{Transportation}
+
+While unintentional acceleration is unlikely to cause false alarms in an IHSM when simple vibration damping is employed,
+there is an issue with intentionally moving an IHSM: the IHSM's rotor stores significant rotational energy and will
+respond to tipping with a precession force. This could become an issue when a larger IHSM is transported, e.g.\ between
+the manufacturer's premises and its destination data center. One solution to this problem is to transport the IHSM
+elastically mounted inside a shipping box that is weighted to resist precession forces. To reduce the amount of
+precession, the IHSM should be transported with its axis of rotation pointing upwards and its speed of rotation set to
+the lower end of the range permitted by its application's security requirements. The IHSM's software could offer a
+temporary ``shipping mode'' that slows down the rotor and raises the tamper-sensing accelerometer's
+thresholds.
+
+During shipping, the IHSM will require a continuous power supply. The most practical solution to this challenge is to
+ship the IHSM along with a small backup battery. Following our conservative estimate in Section~\ref{sec-power-failure},
+a 48-hour shipping window, as offered by many courier services, could easily be bridged with the equivalent of
+5--10 laptop batteries. If a built-in battery backup is not necessary in the IHSM's application, these batteries
+could be connected as an external device that is disconnected and sent back to the IHSM's manufacturer after the IHSM
+has been installed.
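+
+As a back-of-the-envelope check of this battery count, reusing the $\SI{10}{\watt}$ motor power and
+$\SI{100}{\watt\hour}$ battery capacity assumed in Section~\ref{sec-power-failure} and neglecting conversion losses:
+\[
+  E_\mathrm{ship} = \SI{10}{\watt} \cdot \SI{48}{\hour} = \SI{480}{\watt\hour}, \qquad
+  N = \left\lceil \frac{\SI{480}{\watt\hour}}{\SI{100}{\watt\hour}} \right\rceil = 5
+\]
+Under these assumptions, five batteries are the minimum, which matches the lower end of the range given above. The
+symbols $E_\mathrm{ship}$ and $N$ are introduced here for illustration only.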
+
\section{Attacks}
\label{sec_attacks}