---
title: "Theia Attack Resistance and Digital Identity"
date: 2020-09-09T15:00:00+02:00
draft: true
---

.. raw:: html

    <figure class="header">
        <img src="images/succulents.jpg">
        <figcaption>Photo by <a href="https://unsplash.com/@timbennettcreative">Tim Bennett</a> on
        <a href="https://unsplash.com/">Unsplash</a></figcaption>
    </figure> 


Theia in Cyberspace
===================

In informatics, the term *distributed system* is used to describe the aggregate behavior of a complex network made up of
individual computers. For decades, computer scientists have been trying, with some success, to figure out how exactly the
individual computers that make up such a distributed system need to be programmed for the resulting amalgamation to
behave in a predictable, maybe even a desirable way. Though seemingly simple on its surface, this problem has a
surprising depth to it, enough to keep a whole research field busy. One particular as-of-yet unsolved problem is
resistance against *theia attacks* (or "sybil" attacks in older terminology).

   Named after the 1973 book by Flora Rheta Schreiber on dissociative identity disorder, a sybil attack is an
   attack where one computer in a distributed system pretends to be multiple computers to gain an advantage. From your
   author's standpoint, naming a type of computer security attack after a medical condition was an unfortunate choice.
   For this reason this post uses the term *Theia attack* to refer to the same concept. Theia is a Greek goddess of light
   and glitter, and the name alludes to the attacker performing something akin to an optical illusion, causing the
   attacked to perceive multiple distinct images that in the end are all only reflections of the same attacker.

The core insight of computer science research on theia attacks is that there cannot be any purely technological way of
preventing such an attack. Any practical countermeasure must be grounded in some authority or ground truth that is
external to the system, bridging from technology to its social or political context.

Looking around, we can see a parallel between this question ("which computer is a real computer?") and a social issue
that has recently been growing in importance: Just like computers can pretend to be other computers, they can also
pretend to be humans. As can humans. Be it within the context of election manipulation or down-to-earth astroturfing_,
the recurring issue is that in today's online communities, it is hard for an individual to tell which of their online
acquaintances are who they seem to be. Different platforms attempt different solutions to this problem, and all fail in
some way or another. Facebook employs good old snitching, turning people against each other and asking them "Do you know
this person?". Twitter is more laid-back and avoids this Stasi_ methodology in favor of requiring a working mobile phone
number from its subjects, essentially delegating identity verification to the phone company's check of its
subscribers' national passports.

.. the preceding is a simplified representation of these platforms' practices. In particular, Facebook uses several
   methods depending on the case. I think this abbreviated discussion should be ok for the sake of the argument. I am
   not 100% certain of the accuracy of the statement though. Does fb still do the snitching thing? Is
   twitter usually content with a phone number?

Trusting Crypto-Anarchist Authorities
=====================================

Beyond these centralistic solutions to the problem, crypto-anarchists and anarcho-capitalists have been brewing some
interesting novel approaches to online identity based on *blockchain* distributed ledger technology. Distributed
ledgers are a distributed systems design pattern that yields a system that works like an append-only logbook.
Participants can create new entries in this logbook, but no one, neither the original author nor other participants, can
retroactively change a logbook entry once it has been written. In the blockchain model, past entries are essentially
written into stone. This near-perfect immutability is what opens them up to a number of use cases, such as cryptographic
pseudo-currencies [#cryptocurrency]_.
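
To make the "written into stone" property a little more concrete, here is a minimal sketch of the hash-chaining idea
behind such ledgers, assuming a single local logbook. Each entry stores the SHA-256 hash of its predecessor, so
retroactively rewriting an entry breaks every later link. The ``Logbook`` class is hypothetical, and the sketch
deliberately leaves out the distributed part (participants agreeing on one chain, e.g. via proof-of-work), which is what
actually stops a single party from simply recomputing all hashes after a change.

.. code:: python

    import hashlib
    import json
    from dataclasses import dataclass, field

    GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


    @dataclass
    class Logbook:
        """Toy append-only log: every entry commits to the hash of its predecessor."""
        entries: list = field(default_factory=list)

        @staticmethod
        def _digest(prev_hash: str, payload: str) -> str:
            # Hash the previous entry's hash together with this entry's payload.
            data = json.dumps([prev_hash, payload]).encode("utf-8")
            return hashlib.sha256(data).hexdigest()

        def append(self, payload: str) -> None:
            prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
            self.entries.append({
                "payload": payload,
                "prev": prev_hash,
                "hash": self._digest(prev_hash, payload),
            })

        def verify(self) -> bool:
            # Recompute every link; a retroactive edit shows up as a mismatch.
            prev_hash = GENESIS
            for entry in self.entries:
                if entry["prev"] != prev_hash:
                    return False
                if entry["hash"] != self._digest(prev_hash, entry["payload"]):
                    return False
                prev_hash = entry["hash"]
            return True


    log = Logbook()
    log.append("Alice pays Bob 5")
    log.append("Bob pays Carol 2")
    print(log.verify())  # True
    log.entries[0]["payload"] = "Alice pays Bob 500"  # rewrite history
    print(log.verify())  # False

On its own, the chain only makes tampering *detectable* to anyone who remembers an earlier hash; someone able to rewrite
the whole list could recompute every hash, which is why the consensus layer matters.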

An overview of a variety of these unconventional blockchain identity verification approaches can be found in `this
unpublished 2020 survey by Siddarth, Ivliev, Siri and Berman <https://arxiv.org/ftp/arxiv/papers/2008/2008.05300.pdf>`_.
They walk their readers through a number of different projects that try to answer the question "Is this human who they
claim to be?" using joint socio-technological approaches. The following few sections outline a small selection of
them. The conclusion of this post will be a commentary on these approaches, and on the underlying problem of identity in
a digital world.

.. BrightID

In one scheme, identity is determined by "notary" computers that aggregate large amounts of information on a user's
social contacts. These computers then run an algorithm derived from the SybilGuard_, SybilLimit_ and SybilInfer_ lineage
of random-walk based algorithms. These algorithms assume that authentic social graphs are fast-mixing, small-world
graphs: Everyone knows everyone else through a friend's friend's friend. They also assume that there is an upper bound
on how many connections with authentic users (so-called *attack edges*) an attacker can forge: Anyone who is not
embedded into the graph well enough is cut out. In this way, they put an upper limit on the number of theia identities
an attacker can assume given a certain number of connections to real people.
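
The random-walk intuition behind this family can be illustrated with a short sketch. It swaps in a simpler,
deterministic relative from the same line of research, SybilRank-style trust propagation, rather than reproducing
SybilGuard, SybilLimit or SybilInfer themselves: trust starts on a trusted seed, is spread along friendships for only
about log2(n) rounds, and nodes are then ranked by trust per edge. The function name ``trust_ranking`` and the toy graph
are hypothetical.

.. code:: python

    import math
    from collections import defaultdict


    def trust_ranking(edges, seeds):
        """Rank nodes by degree-normalised trust (toy SybilRank-style sketch)."""
        graph = defaultdict(set)
        for u, v in edges:
            graph[u].add(v)
            graph[v].add(u)
        nodes = sorted(graph)

        # All trust initially sits on the trusted seed nodes.
        trust = {v: 0.0 for v in nodes}
        for s in seeds:
            trust[s] = 1.0 / len(seeds)

        # Early-terminated power iteration: in each round, every node splits its
        # trust evenly among its neighbours. Stopping after about log2(n) rounds
        # leaves little time for trust to leak across the few attack edges.
        for _ in range(math.ceil(math.log2(len(nodes)))):
            new_trust = {v: 0.0 for v in nodes}
            for u in nodes:
                share = trust[u] / len(graph[u])
                for v in graph[u]:
                    new_trust[v] += share
            trust = new_trust

        # Normalise by degree so that merely having many edges is not rewarded.
        score = {v: trust[v] / len(graph[v]) for v in nodes}
        return sorted(nodes, key=score.get, reverse=True), score


    # Honest region: four friends who all know each other.
    honest = [0, 1, 2, 3]
    edges = [(a, b) for i, a in enumerate(honest) for b in honest[i + 1:]]
    # Sybil region: three fake identities run by one attacker ...
    sybils = [4, 5, 6]
    edges += [(4, 5), (4, 6), (5, 6)]
    # ... connected to the honest region by a single attack edge.
    edges.append((3, 4))

    ranking, score = trust_ranking(edges, seeds=[0])
    print(min(score[h] for h in honest) > max(score[s] for s in sybils))  # True

Every additional attack edge lets more trust flow into the Sybil region within the same number of rounds, which is
exactly the assumption questioned in the next paragraph.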

Disregarding the catastrophic privacy issues of storing large amounts of data on social relationships on someone else's
computer, this second assumption is where this model unfortunately breaks down. Applying common sense, it is completely
realistic for an attacker to forge a large number of social connections: This is precisely what most of social media
marketing is about! A more malicious angle on this would be to consider how in meatspace [#meatspacefn]_ multi-level
marketing schemes succeed in coaxing people to abuse their social graphs, with disastrous consequences for their own
well-being and that of others. Similar schemes would certainly be possible in cyberspace as well. An additional point to
consider is that the upper limit SybilGuard_ and others place on the number of fake identities is not particularly
strict: SybilGuard accepts up to O(√n log n) fake identities per attack edge in a network of n honest users, and even
the tighter SybilLimit still accepts O(log n) per attack edge. An attacker could still get away with a sizeable number
of false identities before getting caught by any such algorithm.

.. Duniter

In another scheme, identity is awarded to anyone who can convince several people already in the network to vouch for
them, and who is at most a few degrees removed from one of several pre-determined celebrities. Apart from again being
vulnerable to conmen and other scammers, this system has the glaring flaw of roundly refusing to recognize any person
who is not willing or able to engage with several of its members. Combined with the system's informal requirement that
members only vouch for people they have physically met, this makes it a nonstarter in a cyberspace that has grown
specifically *because* it transcends national borders and physical distance, two of the most serious obstacles to
in-person communication.
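
The membership rule described above boils down to a certification count plus a breadth-first-search distance check.
The following is a simplified illustration, not the actual rule set of any deployed system: the thresholds
``MIN_CERTS`` and ``MAX_DISTANCE`` are hypothetical values, and the "one of several referents" condition simply follows
the description in this post.

.. code:: python

    from collections import deque

    MIN_CERTS = 3     # hypothetical: vouches needed from existing members
    MAX_DISTANCE = 2  # hypothetical: max. vouching steps to a referent


    def distance_to_referent(certs, start, referents):
        """Breadth-first search along vouching edges to the nearest referent."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if node in referents:
                return dist
            for voucher in certs.get(node, ()):
                if voucher not in seen:
                    seen.add(voucher)
                    queue.append((voucher, dist + 1))
        return None  # no referent reachable at all


    def may_join(candidate, certs, members, referents):
        """certs maps each identity to the set of identities vouching for it."""
        vouchers = certs.get(candidate, set()) & members
        if len(vouchers) < MIN_CERTS:
            return False
        dist = distance_to_referent(certs, candidate, referents)
        return dist is not None and dist <= MAX_DISTANCE


    referents = {"celebrity"}
    members = {"celebrity", "ann", "ben", "cat"}
    certs = {
        "ann": {"celebrity"},
        "ben": {"celebrity"},
        "cat": {"ann"},
        "newcomer": {"ann", "ben", "cat"},
        "loner": {"cat"},
    }
    print(may_join("newcomer", certs, members, referents))  # True
    print(may_join("loner", certs, members, referents))     # False: one voucher only

The first check is what shuts out anyone who cannot engage with several members; the second ties membership to
proximity to a hand-picked set of referents.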

.. Idena Network

The last scheme I will outline in this post is based around a set of `Turing tests`_, that is, quizzes designed to tell
man and machine apart. In this system, all participants have to simultaneously undergo a Turing test once a fortnight.
The idea is that this limits the number of theia identities an attacker can assume, since they can only solve so many
Turing tests at the same time. The system uses a particular type of picture classification-based Turing test and does
not seem to be designed with the blind or mentally disabled in mind: Accessibility concerns are nowhere to be found in
the so-called "manifesto" published by its creators. But even ignoring that, the system obviously fails at an even more
basic level: The idea that everyone takes a Turing test at the same time only works in a world without time zones. Or
jobs, for that matter. Also, it assumes that an attacker cannot simply hire a small army of people someplace else to
fool the system.
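
The bound this scheme relies on, and why hired helpers undo it, fits in a few lines of arithmetic. This is a
deliberately crude model with made-up numbers, not a description of the actual protocol: if the simultaneous session
lasts ``window_s`` seconds and passing one test takes ``solve_s`` seconds, one person can keep at most
``window_s // solve_s`` identities alive, and the bound grows linearly with every person the attacker hires.

.. code:: python

    def max_identities(window_s: int, solve_s: int, workers: int) -> int:
        """Upper bound on identities an attacker can keep alive per session.

        Hypothetical model: tests are solved one after another, and every
        identity needs exactly one passed test per session.
        """
        tests_per_worker = window_s // solve_s
        return tests_per_worker * workers


    # Made-up numbers: a 2-minute session and 2 minutes per test.
    print(max_identities(window_s=120, solve_s=120, workers=1))   # 1
    print(max_identities(window_s=120, solve_s=120, workers=50))  # 50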

.. _SybilLimit: https://www.comp.nus.edu.sg/~yuhf/yuh-sybillimit.pdf
.. _SybilGuard: http://www.math.cmu.edu/~adf/research/SybilGuard.pdf
.. _SybilInfer: https://www.princeton.edu/~pmittal/publications/sybilinfer-ndss09.pdf
.. _`Turing Tests`: https://en.wikipedia.org/wiki/Turing_test

Identity between Cyberspace and Meatspace
=========================================

A common thread in these solutions, from the Facebook-esque Stasi_ methods to the crypto-anarchist challenge-response
utopias, is that they all approach digital identity as a question of Objective Truth™ that can unanimously be decided at
a system level, or that can be externalized to the next larger system such as the state. Alas, the important question
remains unasked:

    What *is* identity?

The answer to this question certainly depends on the system being examined. For example, an important reason the
capitalist corporations mentioned above require knowledge about their users' identity is to generate plausible
statistics for the advertisers that form their customer base, similar to how a farmer will keep statistics on yield and
quality for the buyers of his crop. Against this background, a full decoupling of platform accounts from a notion of
legal identity seems at odds with the platforms' business model, and we will have to adjust our expectations for reform
accordingly.

Another feature all of the systems mentioned above share is a social component. For this common case of social
systems, I want to make a suggestion on how we can approach digital identity in a more practical, less
discriminatory [#discriminatory]_ manner than any of the methods we discussed above. I think that both using people's
social connections and proxying the decisions of external authorities such as the state are poor ways of deciding who
is a person and who is not. To illustrate this point, let us think about how many digital identities a human being
might have. First, consider the case of n=0, someone who simply wants no business with the system at all. For
simplicity, let us assume that we have solved this issue of consent, i.e. every person who is identified by the system
consents to this practice. For n=1, the approaches outlined above all provide some approximate solution. States may not
grant every human sufficient ID (e.g. children, the mentally disabled or prisoners might be left out), and the social
systems might fail to recognize people who simply do not have any friends, but otherwise their approximations hold.
Maybe. But what about n=2, n=3, ...? None of these systems adequately considers cases where a human being might
legitimately, and non-maliciously, wish to hold multiple digital identities.

Consider a hypothetical lesbian conservative politician. An active social media presence is a core component of a
modern politician's career. At the same time, "conservative homophobe" is still well within the realm of tautology, and
it would be legitimate for this politician not to wish to disclose a large part of their private life to the world at
large. They might have a separate online identity for matters related to it. For this politician, the social
relationship-based systems referenced above would either incorporate outing as a design feature, or they would force
the politician to choose one of their two identities: To choose between private life and career. When deferring to
the state as the decider over personhood, at the very least the platform's operator would know about the outrageously
sensitive link between the politician's online identities. Clearly, no such solution can be considered socially just.

Let us try not to get caught up in saving the world at this point. The issue of conservative homophobia is out of the
scope of our consideration, and it is not one that anyone can solve in the near future. Magical realism aside, least of
all can some piece of technology bring about this change. There is a case for legitimate uses of multiple, separate
digital identities, and we do not have a technical or political answer to it. All hope is not lost yet, though. We can
easily cut this Gordian knot by acknowledging an unspoken assumption that underlies all social relationships between
real people, beyond the Procrustean bed of computer systems or organizational structures these relationships are cast
into.

    As a function of social interaction, digital identities correspond to roles_ in sociological terminology, and are
    not at all the same as personhood_. Roles are subjective and arise from a relationship between people, and a single
    person might legitimately perform different roles depending on context.

When computer scientists or programmers create new systems, there is always an (often implicit) modelling stage.
Formally, during this stage a domain expert and a modeller with a computer science background come together, each
contributing their knowledge to form a model that is both appropriate for real-world use and practical from an
engineering point of view. In practice, both roles are frequently filled by the same person, who is often also the
programmer of the thing. This leads to many computer systems using poor models. A typical example of this issue is a
system requiring a person's name that uses three input fields labelled "First Name", "Middle Initial" and "Last Name".
These systems are often created by US-American programmers, who are used to this naming schema from their lived
experience. Unfortunately, this schema breaks down for those few billion people who put their family name first, who
have more than one middle name, or who have multiple given names and do not normally use the first one of those.

Once a system creator's implicit assumptions have been encoded into the system like this, it is often very hard to get
out of that situation. A pattern to use during careful modelling is to keep the model flexible to account for unforeseen
corner cases. For example, when modelling a system requiring a person's name, one would have to ask what the name is
used for. The most sensible decision may be to simply ask the user for their name twice: Once in first name/last name
format for e.g. tax purposes, and once with a free-form text field for e.g. displaying on their account page.
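
A minimal sketch of that two-field approach as a Python data class. The field names and the length limit are
hypothetical; the point is that the free-form display name is the primary field, while the structured parts are optional
and only collected where some downstream process insists on them.

.. code:: python

    from dataclasses import dataclass
    from typing import Optional

    MAX_DISPLAY_NAME_LEN = 100  # hypothetical limit; choose what your UI can handle


    @dataclass
    class PersonName:
        # Free-form name for display: no assumptions about order, number of
        # parts or script.
        display_name: str
        # Structured parts, collected only where some downstream process (e.g. a
        # tax form) insists on them. Either may be left empty.
        given_names: Optional[str] = None
        family_name: Optional[str] = None

        def __post_init__(self) -> None:
            name = self.display_name.strip()
            if not 1 <= len(name) <= MAX_DISPLAY_NAME_LEN:
                raise ValueError(
                    f"display name must be 1 to {MAX_DISPLAY_NAME_LEN} characters"
                )
            self.display_name = name


    # Family name first, more than one given name:
    an = PersonName("Nguyễn Văn An", given_names="Văn An", family_name="Nguyễn")
    # Burmese names have no family name at all; the structured fields stay empty.
    burmese = PersonName("Kyaw Zin Htet")
    print(burmese.family_name)  # None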

While many systems already use a somewhat flexible model for names, e.g. by having a *handle* or *nickname* separate
from the *display name*, "social" systems are still often stuck with an identity model built around the concept of a
single, rigid identity. In practice, people perform different roles_ in different circumstances. Ask different people
about a person's identity, and you will get wildly different answers. A person's identity as perceived by others is
coupled to their relationship more than to some underlying biological or administrative truth. Beyond the hypothetical
politician above, this is evident in subtle ways in almost all our everyday relationships: Some people may know me by
my legal name, some by my online nickname. To some I may be a computer scientist, to some a flatmate. None of my friends
and acquaintances have ever wanted to see my passport, or asked to take my DNA to ascertain that I am a distinct human
being from the other humans they know. Likewise, identifying me by my social connections is impractical, as it would
require an exceedingly weird amount of what can only be described as snooping. Yet this concept of a single,
consistent, global, true identity is exactly what all technological solutions to the identity problem have been trying
to achieve so far.

Building Bridges
================

I think I can offer you one main takeaway from the discussion above.

    When modelling social systems, focus on relationships, not identity.

Rephrased as more actionable points: If you are designing a social digital system, do the following:

0. Early in the design stages, take the time to consider fundamental modelling issues like this one. If you don't, you
   will likely get stuck with a sub-optimal model that will be hard to get rid of.
1. Where possible, be flexible. Allow people to choose their own identifier. Don't require them to use their real names;
   they may not wish to disclose those, or their names may not be in a format that is useful to you (they may be too
   long, too short, too ubiquitous, written in characters you did not anticipate, etc.). A free-form text field with a
   reasonable length limit is a good approach here.
2. Do not use credit cards or phone numbers to identify people. Many people have neither, and scammers can simply buy
   this data in bulk on the darknet.
3. Allow people to create multiple identities [#accountswitchopsec]_, and acknowledge the role of social relationships in
   your interaction features. People have very legitimate reasons to separate areas of their lives, and it is not for
   you or your computer to decide who is who to whom. If your thing requires a global search function, reconsider the
   data protection aspects of your system. If you want to encourage social functions in the face of bots and trolls,
   make it easy for people to share their identities out-of-band, such as through a QR code or a copy-and-pasteable
   short link. If you require someone's legal name or address for billing purposes, unify these identities behind the
   scenes, if at all, and allow them to act as fully independent identities in public.
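
The third point can be sketched as a data model in which public interactions only ever happen between personas, while
the link to a legal identity stays private. This is a minimal illustration under stated assumptions: ``Account``,
``Persona``, ``public_profile``, ``accept_invite`` and the in-memory ``directory`` are hypothetical names, and a real
system would additionally need persistent storage, authentication and rate limiting.

.. code:: python

    import secrets
    from dataclasses import dataclass, field


    @dataclass
    class Persona:
        # Everything other users can ever see: a self-chosen, free-form name.
        display_name: str
        # Unguessable token for sharing this persona out-of-band,
        # e.g. rendered as a QR code or a short link.
        invite_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
        # Tokens of the personas this persona is connected to.
        contacts: set = field(default_factory=set)


    @dataclass
    class Account:
        # Private billing data; never part of any public view.
        legal_name: str
        personas: list = field(default_factory=list)

        def new_persona(self, display_name: str) -> Persona:
            persona = Persona(display_name)
            self.personas.append(persona)
            return persona


    def public_profile(persona: Persona) -> dict:
        """The only view other users get; it holds no reference to the Account."""
        return {"display_name": persona.display_name}


    def accept_invite(me: Persona, token: str, directory: dict) -> None:
        """Connect two personas once a token has been shared out-of-band.

        The directory maps tokens to personas. There is deliberately no lookup
        by name, so personas cannot be found through a global search.
        """
        other = directory[token]
        me.contacts.add(other.invite_token)
        other.contacts.add(me.invite_token)


    alex = Account(legal_name="Alex Example")
    campaign = alex.new_persona("Alex Example (campaign)")
    private = alex.new_persona("lex")
    sam = Account(legal_name="Sam Sample").new_persona("sam")
    directory = {p.invite_token: p for p in (campaign, private, sam)}

    accept_invite(sam, private.invite_token, directory)  # token shared in person
    print(public_profile(private))  # {'display_name': 'lex'}
    print(campaign.contacts)        # set()

The account-to-persona mapping still exists on the server, so it deserves at least the same protection as the billing
data it is attached to.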

This change of perspective comes with its share of user experience challenges, but also with the promise of a more
human, more dignified online experience. Perhaps we can find a way to adapt cyberspace to humans, instead of continuing
to try it the other way around.

.. _astroturfing: https://en.wikipedia.org/wiki/Astroturfing
.. _Stasi: https://en.wikipedia.org/wiki/Stasi

.. [#cryptocurrency] Pseudo-currencies in that, while they provide some aspects of a regular currency such as ownership
        and transactions, they lack most others. Traditional currencies are backed by states, regulated by central banks
        tasked with maintaining their stability, and ultimately embedded in a system of accountability through law
        enforcement, courts and political elections.

.. [#discriminatory] Discriminatory as in discriminating against minorities, but also as in deciding what is and what is
        not.

.. [#accountswitchopsec] This does mean that you should not actively prevent people from creating multiple accounts. It
        does not necessarily entail building a proper user interface around this practice. If you do the latter, e.g. by
        offering a "switch identity" button or an identity drop-down menu on a post submission form, you can easily
        encourage slip-ups that might disclose the connection between two identities, and you make it possible for
        someone hacking a single login to learn about this connection as well.

.. [#meatspacefn] Meatspace_ is where people physically are, as opposed to cyberspace.

.. _Meatspace: https://dictionary.cambridge.org/dictionary/english/meatspace
.. _roles: https://en.wikipedia.org/wiki/Role
.. _personhood: https://en.wikipedia.org/wiki/Personhood