author | jaseg <git@jaseg.de> | 2020-11-06 18:43:45 +0100 |
---|---|---|
committer | jaseg <git@jaseg.de> | 2020-11-06 18:43:45 +0100 |
commit | f053b9ded4bead8e508e3fdd4d6a7c7b6933ffc1 (patch) | |
tree | 938e8244127f8d6c81b44a0b904c5452fd882bbb /content/posts/sybil-resistance-identity | |
parent | 01a54b71560dc6055fe7b2e89534a8e85e85bc39 (diff) | |
Start splitting up sybil post
Diffstat (limited to 'content/posts/sybil-resistance-identity')
-rw-r--r-- | content/posts/sybil-resistance-identity/index-old.rst | 244 | ||||
-rw-r--r-- | content/posts/sybil-resistance-identity/index.rst | 302 |
2 files changed, 317 insertions, 229 deletions
diff --git a/content/posts/sybil-resistance-identity/index-old.rst b/content/posts/sybil-resistance-identity/index-old.rst new file mode 100644 index 0000000..6f1bee3 --- /dev/null +++ b/content/posts/sybil-resistance-identity/index-old.rst @@ -0,0 +1,244 @@ +--- +title: "Theia Attack Resistance and Digital Identity" +date: 2020-09-09T15:00:00+02:00 +--- + +.. raw:: html + + <figure class="header"> + <img src="images/succulents.jpg"> + <figcaption>Photo by <a href="https://unsplash.com/@timbennettcreative">Tim Bennett</a> on <a href="https://unsplash.com/">Unsplash</a></figcaption> + </figure> + + +Theia in Cyberspace +=================== + +In informatics, the term *distributed system* is used to describe the aggregate behavior of a complex network made up of +individual computers. For decades, computer scientists to some success have been trying to figure out how exactly the +individual computers that make up such a distributed system need to be programmed for the resulting amalgamation to +behave in a predictable, maybe even a desirable way. Though seemingly simple on its surface, this problem has a +surprising depth to it that has yielded research questions for a whole field for several decades now. One particular +as-of-yet unsolved problem is resistance against *theia attacks* (or "sybil" attacks in older terminology). + + Named after the 1973 book by Flora Rheta Schreiber on dissociative identity disorder, a sybil attack is an + attack where one computer in a distributed system pretends to be multiple computers to gain an advantage. From your + author's standpoint, naming a type of computer security attack after a medical condition was an unfortunate choice. + For this reason this post uses the term *Theia attack* to refer to the same concept. 
Theia is a greek godess of light + and glitter and the name alludes to the attacker performing something alike an optical illusion, causing the attacked + to perceive multiple distinct images that in the end are all only reflections of the same attacker. + +The core insight of computer science research on theia attacks is that there cannot be any technological way of +preventing such an attack, and any practical countermeasure must be grounded in some authority or ground truth that is +external to the systems—bridging from technology to its social or political context. + +Looking around, we can see a parallel between this question ("which computer is a real computer?") and a social issue +that recently has been growing in importance: Just like computers can pretend to be other computers, they can also +pretend to be humans. As can humans. Be it within the context of election manipulation or down-to-earth astroturfing_ +the recurring issue is that in today's online communities, it is hard for an individual to tell who of their online +acquaintances are who they seem to be. Different platforms attempt different solutions to this problem, and all fail in +some way or another. Facebook employs good old snitching, turning people against each other and asking them "Do you know +this person?". Twitter is more laid-back and avoids this Stasi_ methodology in favor of requiring a working mobile phone +number from its subjects, essentially short-circuiting identity verification to the phone company's check of their +subscriber's national passport. + +.. the preceding is a simplified representation of these platform's practices. In particular facebook uses several + methods depending on the case. I think this abbreviated discussion should be ok for the sake of the argument. I am + not 100% certain on the accuracy on the accuracy of the statement though. Does fb still do the snitching thing? Is + twitter usually content with a phone number? 
+ +Trusting Crypto-Anarchist Authorities +===================================== + +Beyond these centralistic solutions to the problem, crypto-anarchists and anarcho-capitalists have been brewing on some +interesting novel approaches to online identity based on *blockchain* distributed ledger technology. Distributed +ledgers are a distributed systems design pattern that yields a system that works like an append-only logbook. +Participants can create new entries in this logbook, but no one—neither the original author, nor other participants—can +retroactively change a logbook entry once it has been written. In the blockchain model, past entries are essentially +written into stone. This near-perfect immutability is what opens them for a number of use cases from cryptographic +pseudo-currencies [#cryptocurrency]_. + +An overview over a variety of these unconventional blockchain identity verification approaches can be found in `this +unpublished 2020 survey by Siddarth, Ivliev, Siri and Berman <https://arxiv.org/ftp/arxiv/papers/2008/2008.05300.pdf>`_. +They walk their readers through a number of different projects that try to solve the question "Is this human who they +pretend to be?" using joint socio-technological approaches. In the following few sections, you may find a short outline +of a small selection of them. The conlusion of this post will be a commentary on these approaches, and on the underlying +problem of identity in a digital world. + +.. BrightID + +In one scheme, identity is determined by "notary" computers that aggregate large amounts of information on a user's +social contacts. These computers then run an algorithm derived from the SybilGuard_, SybilLimit_ and SybilInfer_ lineage +of random-walk based algorithms. These algorithms assume that authentic social graphs are small world graphs: Everyone +knows everyone else through a friend's friend's friend. 
They also assume that there is an upper bound on how many +connections with authentic users an attacker can forge: Anyone who is not embedded into the graph well enough is cut +out. Like this, they put an upper limit on the number of theia identites an attacker can assume given a certian number +of connections to real people. + +Disregarding the catastrophic privacy issues of storing large amounts of data on social relationships on someone else's +computer, this second assumption is where this model unfortunately breaks down. Applying common sense, it is completely +realistic for an attacker to forge a large number of social connections: This is precisely what most of social media +marketing is about! A more malicious angle on this would be to consider how in meatspace [#meatspacefn]_ multi-level +marketing schemes are successful in coaxing people to abuse their social graphs to disastrous consequences to the +well-being of themselves and others. Similar schemes would certainly be possible in cyberspace as well. An additional +point to consider is that the upper limit SybilGuard_ and others place on the number of fake identities one can have is +simply not that strict at all. An attacker could still get away with a reasonable number of false identities before +getting caught by any such algorithm. + +.. Duniter + +In another scheme, identity is awarded to anyone who can convince several people already in the network to vouch for +them, and who is at most a few degrees removed from one of several pre-determined celebrities. Apart from again being +vulnerable to conmen and other scammers, this system has the glaring flaw of roundly refusing to recognize any person +who is not willing or able to engage with multiple of its members. 
Along with the system's informal requirement for +members to only vouch for people they have physically met this leads to a nonstarter in a cyberspace that grown +specifically *because* it transcends national borders and physical distance—two most serious obstacles to in-person +communication. + +.. Idena Network + +The last scheme I will outline in this post is based around a set of `Turing tests`_; that is, quizzes that are designed +to tell apart man and machine. In this system, all participants have to simultaneously undergo a Turing test once in a +fortnight. The idea is that this limits the number of theia identities an attacker can assume since they can only solve +that many Turing tests at the same time. The system uses a particular type of picture classification-based Turing test +and does not seem to be designed with the blind or mentally disabled in mind with accessibility concerns nowhere to be +found in the so-called "manifesto" published by its creators. But even ignoring that, the system obviously fails at an +even more basic level: The idea that everyone takes a Turing test at the same time only works in a world without time +zones. Or jobs for that matter. Also, it assumes that an attacker cannot simply hire a small army of people someplace +else to fool the system. + +.. _SybilLimit: https://www.comp.nus.edu.sg/~yuhf/yuh-sybillimit.pdf +.. _SybilGuard: http://www.math.cmu.edu/~adf/research/SybilGuard.pdf +.. _SybilInfer: https://www.princeton.edu/~pmittal/publications/sybilinfer-ndss09.pdf +.. 
_`Turing Tests`: https://en.wikipedia.org/wiki/Turing_test + +Identity between Cyberspace and Meatspace +========================================= + +A common thread in these solutions, from the Facebook'esque Stasi_ methods to the crypto-anarchist challenge-response +utopias, is that they all approach digital identity as a question of Objective Truth™ that can unanimously be decided at +a system level—or that can be externalized to the next larger system such as the state. Alas, the important question +remains unasked: + + What *is* identity? + +The answer to this question certainly depends on the system being examined. For example, an important reason the +capitalist corporations mentioned above require knowledge about their users' identity is to generate plausible +statistics for the advertisers that form their customer base, similar to how a farmer will keep statics on yield and +quality for the buyers of his crop. With this background, a full decoupling of platform accounts from a notion of legal +identity seems at odds with the platform's business model—and we will have to adjust our expectations for reform +accordingly. + +A common thread among all systems mentioned above is that they all have a social component to them. For this common use +case of social systems, I want to make a suggestion on how we can approach digital identity in a more practical, less +discriminatory [#discriminatory]_ manner than any of the methods we discussed above. I think both using people's social +connections and proxying the decisions of external authorities such as the state are bad systems to decide who is a +person and who is not. I will now illustrate this point a bit. Let us think about how many digital identities a human +beign might have. First, consider the case of n=0, someone who simply wants no business with the system at all. For +simplicity, let us assume that we have solved this issue of consent, i.e. 
every person who is identified by the system +consents to this practice. For n=1, the approaches outlined above all provide some approximate solution. States may not +grant every human sufficient ID (e.g. children, the mentally disabled or prisoners might be left out), and the social +systems might fail to catch people who simply do not have any friends, but otherwise their approximations hold. Maybe. +But what about n=2, n=3, ...? None of these systems adequately consider cases where a human being might legitimately +wish to hold multiple digital identities, non-maliciously. + +Consider a hypothetical lesbian, conservative politician. An active social media presence is a core component of a +modern politician's carreer. At the same time, "conservative homophobe" is still well within the realm of tautology and +it would be legitimate for this politician to wish to not disclose a large fraction of their private life to the world +at large. They might have a separate online identity for matters related to it. For this politician, the social +relationship-based systems referenced above would either incorporate outing as a design feature, or they would force +the politician to choose either of their two identities: To choose between private life and carreer. When deferring to +the state as the decider over personhood, at least the platform's operator would know about the outrageously sensitive +link between the politician's online identities. Clearly, no such solution can be considered socially just. + +Let us try not to be caught up on saving the world at this point. The issue of conservative homophobia is out of the +scope of our consideration, and it is not one that anyone can solve in the near future. Magical realism aside, least of +all can some technological thing beckon this change. There is a case for legitimate uses of multiple, separate digital +identities, and we do not have a technical or political answer to it. All hope is not lost yet, though. 
We can easily +undo this gordian knot by acknowledging an unspoken assumption that underlies any social relationships between real +people, past the procrustean bed of computer systems or organizational structures these relationships are cast into. + + As a function of social interaction, digital identities conform to roles_ in sociological terminology, and are not + at all the same as personhood_. Roles are subjective and arise from a relationship between people, and a single + person might legitimately perform different roles depending on context. + +When computer scientists or programmers are creating new systems, there always is an (often implicit) modelling stage. +Formally, during this stage a domain expert and a modeller with a computer science background come together, each +contributing their knowledge to form a model that is both appropriate for real-world use and practical from an +engineering point of view. In practice, these two roles are often necessarily fulfilled by the same person, who is often +also the programmer of the thing. This leads to many computer systems using poor models. A typical example of this issue +are systems requiring a person's name that use three input fields labelled "First Name", "Middle Initial" and "Last +Name". These systems are often created by US-American programmers, who are used to this naming schema from their lived +experience. Unfortunately, this schema breaks down for those few billion people who use their last name first, who have +more than one middle name, or who have multiple given names and do not normally use the first one of those. + +Once a system creator's implicit assumptions have been encoded into the system like this, it is often very hard to get +out of that situation. A pattern to use during careful modelling is to keep the model flexible to account for unforeseen +corner cases. For example, when modelling a system requiring a person's name, one would have to ask what the name is +used for. 
It may be the most sensible decision to simply ask the user for their name twice: Once in first name/last name +format for e.g. tax purposes, and once with a free-form text field for e.g. displaying on their account page. + +While for names, many systems already use some form of flexible model by e.g. having a *handle* or *nickname* separate +from the *display name*, "social" systems still often are stuck with an identity model based around a concept of a +single, rigid identity. In practice, people perform different roles_ in different circumstances. When asking for a +person's identity, one would get wildly different answers from different people. A person's identity as perceived by +others is coupled to their relationship more than to some underlying, biological or administrative truth. Thinking back +to the straw man politician above, this is evident in subtle ways in almost all our everyday relationships: Some people +may know me by my legal name, some by my online nickname. To some I may be a computer scientist, to some a flatmate. +None of my friends and acquaintances have ever wanted to see my passport, or asked to take my DNA to ascertain that I am +a distinct human being from the other humans they know. Likewise, identifying me by my social connections is impractical +as it would require an exceedingly weird amount of what can only be described as snooping. Yet, this concept of a +single, consistent, global, true identity is exactly what up to now all technological solutions to the identity problem +are trying to achieve. + +Building Bridges +================ + +I think I can offer you one main take-aways from the discussion above. + + During modelling social systems, focus on relationships—not identity. + +Rephrased into more actionable points, as someone designing a social digital system, do the following: + +0. Early in the design stages, take the time to consider fundamental modelling issues like this one. 
If you don't, you + will likely get stuck with a sub-optimal model that will be hard to get rid of. +1. Where possible, be flexible. Allow people to chose their own identifier. Don't require them to use their real names, + they may not wish to disclose those or they may not be in a format that is useful to you (they may be too long, too + short, too ubiquituous, in foreign characters etc.). A free-form text field with a reasonable length limit is a good + approach here. +2. Do not use credit cards or phone numbers to identify people. There are many people who do not have either, and + scammers can simply buy this data in bulk on the darknet. +3. Allow people to create multiple identites [#accountswitchopsec]_, and acknowledge the role of social relationships in + your interaction features. People have very legitimate reasons to separate areas of their lifes, and it is not for + you or your computer to decide who is who to whom. If your thing requires a global search function, re-consider the + data protection aspects of your system. If you want to encourage social functions in the face of bots and trolls, + make it easy for people to share their identities out-of-band, such as through a QR code or a copy-and-pasteable + short link. If you require someone's legal name or address for billing purposes, unify these identities behind the + scenes if at all and allow them to act as if fully independent in public. + +While change of perspective comes with its share of user experience challenges, but also with a promise for a more +human, more dignified online experience. Perhaps we can find a way to adapt cyberspace to humans, instead of continuing +trying it the other way around. + +.. _astroturfing: https://en.wikipedia.org/wiki/Astroturfing +.. _Stasi: https://en.wikipedia.org/wiki/Stasi + +.. [#cryptocurrency] Pseudo-currencies in that, while they provide some aspects of a regular currency such as ownership + and transactions, they lack most others. 
Traditional currencies are backed by states, regulated by central banks + tasked with maintaining their stability and ultimately provide accountability through law enforcement, courts + and political elections. + +.. [#discriminatory] Discriminatory as in discriminating against minorities, but also as in deciding what is and what is + not. + +.. [#accountswitchopsec] This does mean that you should not actively prevent people from creating multiple accounts. It + does not necessarily entail building a proper user interface around this practice. If you do the latter, e.g. by + offering a "switch identity" button or an identiy drop-down menu on a post submission form, you can easily + encourage slip-ups that might disclose the connection between two identities, and you make it possible for + someone hacking a single login to learn about this connection as well. + +.. [#meatspacefn] Meatspace_ is where people physically are, as opposed to cyberspace + +.. _Meatspace: https://dictionary.cambridge.org/dictionary/english/meatspace +.. _roles: https://en.wikipedia.org/wiki/Role +.. _personhood: https://en.wikipedia.org/wiki/Personhood diff --git a/content/posts/sybil-resistance-identity/index.rst b/content/posts/sybil-resistance-identity/index.rst index 6f1bee3..6e5acd4 100644 --- a/content/posts/sybil-resistance-identity/index.rst +++ b/content/posts/sybil-resistance-identity/index.rst @@ -1,5 +1,5 @@ --- -title: "Theia Attack Resistance and Digital Identity" +title: "Identity between Cyberspace and Meatspace" date: 2020-09-09T15:00:00+02:00 --- @@ -10,235 +10,79 @@ date: 2020-09-09T15:00:00+02:00 <figcaption>Photo by <a href="https://unsplash.com/@timbennettcreative">Tim Bennett</a> on <a href="https://unsplash.com/">Unsplash</a></figcaption> </figure> - -Theia in Cyberspace -=================== - -In informatics, the term *distributed system* is used to describe the aggregate behavior of a complex network made up of -individual computers. 
For decades, computer scientists to some success have been trying to figure out how exactly the -individual computers that make up such a distributed system need to be programmed for the resulting amalgamation to -behave in a predictable, maybe even a desirable way. Though seemingly simple on its surface, this problem has a -surprising depth to it that has yielded research questions for a whole field for several decades now. One particular -as-of-yet unsolved problem is resistance against *theia attacks* (or "sybil" attacks in older terminology). - - Named after the 1973 book by Flora Rheta Schreiber on dissociative identity disorder, a sybil attack is an - attack where one computer in a distributed system pretends to be multiple computers to gain an advantage. From your - author's standpoint, naming a type of computer security attack after a medical condition was an unfortunate choice. - For this reason this post uses the term *Theia attack* to refer to the same concept. Theia is a greek godess of light - and glitter and the name alludes to the attacker performing something alike an optical illusion, causing the attacked - to perceive multiple distinct images that in the end are all only reflections of the same attacker. - -The core insight of computer science research on theia attacks is that there cannot be any technological way of -preventing such an attack, and any practical countermeasure must be grounded in some authority or ground truth that is -external to the systems—bridging from technology to its social or political context. - -Looking around, we can see a parallel between this question ("which computer is a real computer?") and a social issue -that recently has been growing in importance: Just like computers can pretend to be other computers, they can also -pretend to be humans. As can humans. 
Be it within the context of election manipulation or down-to-earth astroturfing_ -the recurring issue is that in today's online communities, it is hard for an individual to tell who of their online -acquaintances are who they seem to be. Different platforms attempt different solutions to this problem, and all fail in -some way or another. Facebook employs good old snitching, turning people against each other and asking them "Do you know -this person?". Twitter is more laid-back and avoids this Stasi_ methodology in favor of requiring a working mobile phone -number from its subjects, essentially short-circuiting identity verification to the phone company's check of their -subscriber's national passport. - -.. the preceding is a simplified representation of these platform's practices. In particular facebook uses several - methods depending on the case. I think this abbreviated discussion should be ok for the sake of the argument. I am - not 100% certain on the accuracy on the accuracy of the statement though. Does fb still do the snitching thing? Is - twitter usually content with a phone number? - -Trusting Crypto-Anarchist Authorities -===================================== - -Beyond these centralistic solutions to the problem, crypto-anarchists and anarcho-capitalists have been brewing on some -interesting novel approaches to online identity based on *blockchain* distributed ledger technology. Distributed -ledgers are a distributed systems design pattern that yields a system that works like an append-only logbook. -Participants can create new entries in this logbook, but no one—neither the original author, nor other participants—can -retroactively change a logbook entry once it has been written. In the blockchain model, past entries are essentially -written into stone. This near-perfect immutability is what opens them for a number of use cases from cryptographic -pseudo-currencies [#cryptocurrency]_. 
- -An overview over a variety of these unconventional blockchain identity verification approaches can be found in `this -unpublished 2020 survey by Siddarth, Ivliev, Siri and Berman <https://arxiv.org/ftp/arxiv/papers/2008/2008.05300.pdf>`_. -They walk their readers through a number of different projects that try to solve the question "Is this human who they -pretend to be?" using joint socio-technological approaches. In the following few sections, you may find a short outline -of a small selection of them. The conlusion of this post will be a commentary on these approaches, and on the underlying -problem of identity in a digital world. - -.. BrightID - -In one scheme, identity is determined by "notary" computers that aggregate large amounts of information on a user's -social contacts. These computers then run an algorithm derived from the SybilGuard_, SybilLimit_ and SybilInfer_ lineage -of random-walk based algorithms. These algorithms assume that authentic social graphs are small world graphs: Everyone -knows everyone else through a friend's friend's friend. They also assume that there is an upper bound on how many -connections with authentic users an attacker can forge: Anyone who is not embedded into the graph well enough is cut -out. Like this, they put an upper limit on the number of theia identites an attacker can assume given a certian number -of connections to real people. - -Disregarding the catastrophic privacy issues of storing large amounts of data on social relationships on someone else's -computer, this second assumption is where this model unfortunately breaks down. Applying common sense, it is completely -realistic for an attacker to forge a large number of social connections: This is precisely what most of social media -marketing is about! 
A more malicious angle on this would be to consider how in meatspace [#meatspacefn]_ multi-level -marketing schemes are successful in coaxing people to abuse their social graphs to disastrous consequences to the -well-being of themselves and others. Similar schemes would certainly be possible in cyberspace as well. An additional -point to consider is that the upper limit SybilGuard_ and others place on the number of fake identities one can have is -simply not that strict at all. An attacker could still get away with a reasonable number of false identities before -getting caught by any such algorithm. - -.. Duniter - -In another scheme, identity is awarded to anyone who can convince several people already in the network to vouch for -them, and who is at most a few degrees removed from one of several pre-determined celebrities. Apart from again being -vulnerable to conmen and other scammers, this system has the glaring flaw of roundly refusing to recognize any person -who is not willing or able to engage with multiple of its members. Along with the system's informal requirement for -members to only vouch for people they have physically met this leads to a nonstarter in a cyberspace that grown -specifically *because* it transcends national borders and physical distance—two most serious obstacles to in-person -communication. - -.. Idena Network - -The last scheme I will outline in this post is based around a set of `Turing tests`_; that is, quizzes that are designed -to tell apart man and machine. In this system, all participants have to simultaneously undergo a Turing test once in a -fortnight. The idea is that this limits the number of theia identities an attacker can assume since they can only solve -that many Turing tests at the same time. 
The system uses a particular type of picture classification-based Turing test -and does not seem to be designed with the blind or mentally disabled in mind with accessibility concerns nowhere to be -found in the so-called "manifesto" published by its creators. But even ignoring that, the system obviously fails at an -even more basic level: The idea that everyone takes a Turing test at the same time only works in a world without time -zones. Or jobs for that matter. Also, it assumes that an attacker cannot simply hire a small army of people someplace -else to fool the system. - -.. _SybilLimit: https://www.comp.nus.edu.sg/~yuhf/yuh-sybillimit.pdf -.. _SybilGuard: http://www.math.cmu.edu/~adf/research/SybilGuard.pdf -.. _SybilInfer: https://www.princeton.edu/~pmittal/publications/sybilinfer-ndss09.pdf -.. _`Turing Tests`: https://en.wikipedia.org/wiki/Turing_test - -Identity between Cyberspace and Meatspace -========================================= - -A common thread in these solutions, from the Facebook'esque Stasi_ methods to the crypto-anarchist challenge-response -utopias, is that they all approach digital identity as a question of Objective Truth™ that can unanimously be decided at -a system level—or that can be externalized to the next larger system such as the state. Alas, the important question -remains unasked: - - What *is* identity? - -The answer to this question certainly depends on the system being examined. For example, an important reason the -capitalist corporations mentioned above require knowledge about their users' identity is to generate plausible -statistics for the advertisers that form their customer base, similar to how a farmer will keep statics on yield and -quality for the buyers of his crop. With this background, a full decoupling of platform accounts from a notion of legal -identity seems at odds with the platform's business model—and we will have to adjust our expectations for reform -accordingly. 
- -A common thread among all systems mentioned above is that they all have a social component to them. For this common use -case of social systems, I want to make a suggestion on how we can approach digital identity in a more practical, less -discriminatory [#discriminatory]_ manner than any of the methods we discussed above. I think both using people's social -connections and proxying the decisions of external authorities such as the state are bad systems to decide who is a -person and who is not. I will now illustrate this point a bit. Let us think about how many digital identities a human -beign might have. First, consider the case of n=0, someone who simply wants no business with the system at all. For -simplicity, let us assume that we have solved this issue of consent, i.e. every person who is identified by the system -consents to this practice. For n=1, the approaches outlined above all provide some approximate solution. States may not -grant every human sufficient ID (e.g. children, the mentally disabled or prisoners might be left out), and the social -systems might fail to catch people who simply do not have any friends, but otherwise their approximations hold. Maybe. -But what about n=2, n=3, ...? None of these systems adequately consider cases where a human being might legitimately -wish to hold multiple digital identities, non-maliciously. - -Consider a hypothetical lesbian, conservative politician. An active social media presence is a core component of a -modern politician's carreer. At the same time, "conservative homophobe" is still well within the realm of tautology and -it would be legitimate for this politician to wish to not disclose a large fraction of their private life to the world -at large. They might have a separate online identity for matters related to it. 
For this politician, the social -relationship-based systems referenced above would either incorporate outing as a design feature, or they would force -the politician to choose either of their two identities: To choose between private life and career. When deferring to -the state as the decider over personhood, at least the platform's operator would know about the outrageously sensitive -link between the politician's online identities. Clearly, no such solution can be considered socially just. - -Let us try not to be caught up on saving the world at this point. The issue of conservative homophobia is out of the -scope of our consideration, and it is not one that anyone can solve in the near future. Magical realism aside, least of -all can some technological thing beckon this change. There is a case for legitimate uses of multiple, separate digital -identities, and we do not have a technical or political answer to it. All hope is not lost yet, though. We can easily -undo this Gordian knot by acknowledging an unspoken assumption that underlies any social relationships between real -people, past the procrustean bed of computer systems or organizational structures these relationships are cast into. - - As a function of social interaction, digital identities conform to roles_ in sociological terminology, and are not - at all the same as personhood_. Roles are subjective and arise from a relationship between people, and a single - person might legitimately perform different roles depending on context. - -When computer scientists or programmers are creating new systems, there always is an (often implicit) modelling stage. -Formally, during this stage a domain expert and a modeller with a computer science background come together, each -contributing their knowledge to form a model that is both appropriate for real-world use and practical from an -engineering point of view. 
In practice, these two roles are often necessarily fulfilled by the same person, who is often -also the programmer of the thing. This leads to many computer systems using poor models. A typical example of this issue -are systems requiring a person's name that use three input fields labelled "First Name", "Middle Initial" and "Last -Name". These systems are often created by US-American programmers, who are used to this naming schema from their lived -experience. Unfortunately, this schema breaks down for those few billion people who use their last name first, who have -more than one middle name, or who have multiple given names and do not normally use the first one of those. - -Once a system creator's implicit assumptions have been encoded into the system like this, it is often very hard to get -out of that situation. A pattern to use during careful modelling is to keep the model flexible to account for unforeseen -corner cases. For example, when modelling a system requiring a person's name, one would have to ask what the name is -used for. It may be the most sensible decision to simply ask the user for their name twice: Once in first name/last name -format for e.g. tax purposes, and once with a free-form text field for e.g. displaying on their account page. - -While for names, many systems already use some form of flexible model by e.g. having a *handle* or *nickname* separate -from the *display name*, "social" systems still often are stuck with an identity model based around a concept of a -single, rigid identity. In practice, people perform different roles_ in different circumstances. When asking for a -person's identity, one would get wildly different answers from different people. A person's identity as perceived by -others is coupled to their relationship more than to some underlying, biological or administrative truth. 
Thinking back -to the straw man politician above, this is evident in subtle ways in almost all our everyday relationships: Some people -may know me by my legal name, some by my online nickname. To some I may be a computer scientist, to some a flatmate. -None of my friends and acquaintances have ever wanted to see my passport, or asked to take my DNA to ascertain that I am -a distinct human being from the other humans they know. Likewise, identifying me by my social connections is impractical -as it would require an exceedingly weird amount of what can only be described as snooping. Yet, this concept of a -single, consistent, global, true identity is exactly what up to now all technological solutions to the identity problem -are trying to achieve. - -Building Bridges -================ - -I think I can offer you one main takeaway from the discussion above. - - When modelling social systems, focus on relationships, not identity. - -Rephrased into more actionable points, as someone designing a social digital system, do the following: - -0. Early in the design stages, take the time to consider fundamental modelling issues like this one. If you don't, you - will likely get stuck with a sub-optimal model that will be hard to get rid of. -1. Where possible, be flexible. Allow people to choose their own identifier. Don't require them to use their real names, - they may not wish to disclose those or they may not be in a format that is useful to you (they may be too long, too - short, too ubiquitous, in foreign characters etc.). A free-form text field with a reasonable length limit is a good - approach here. -2. Do not use credit cards or phone numbers to identify people. There are many people who do not have either, and - scammers can simply buy this data in bulk on the darknet. -3. Allow people to create multiple identities [#accountswitchopsec]_, and acknowledge the role of social relationships in - your interaction features. 
People have very legitimate reasons to separate areas of their lives, and it is not for - you or your computer to decide who is who to whom. If your thing requires a global search function, re-consider the - data protection aspects of your system. If you want to encourage social functions in the face of bots and trolls, - make it easy for people to share their identities out-of-band, such as through a QR code or a copy-and-pasteable - short link. If you require someone's legal name or address for billing purposes, unify these identities behind the - scenes if at all and allow them to act as if fully independent in public. - -This change of perspective comes with its share of user experience challenges, but also with a promise for a more -human, more dignified online experience. Perhaps we can find a way to adapt cyberspace to humans, instead of continuing +Identity in Cyberspace +====================== + +.. Identity is a frequent problem +.. Easy solutions abound +.. Precise modelling is uncommon +.. True identity is sensitive, hard to handle +.. +.. Often, conversational features emphasized -> true identity is unnecessary +.. Social role theory +.. Call to action + +Most computer systems that interface with humans have a concept of user identity. The data structures used for its +storage vary, but usually one *account* corresponds to one human *user*. In many applications, the system operator tries +to ensure that one user cannot create multiple accounts. In online social networks, astroturfing_ and trolling are easier +to fight when limits are imposed on account creation. In online stores, fraud prevention means the store operator needs +their customers' legal identity and the operator must be able to ban offending customers. In mobile messaging systems, +users have to be able to find each other by some identifier such as name or phone number, and this identifier has to be +unique and hard to forge. 
+ +Today, systems that allow anyone to create an account have largely converged on requiring either an email address or a +mobile phone number. Email addresses are used by systems that are less vulnerable to abuse and that are used on laptop +or desktop computers. Mobile phone numbers are abundantly used in smartphone apps, as well as in systems more prone to +abuse such as online social networks or ecommerce. Both are easily verified using a confirmation email or SMS. + +When designing or programming an online system, it is uncommon that the precise real-world semantics of accounts are +modelled. Most computer systems use ad-hoc data models. During their creation, their programmers' implicit assumptions +about the world are encoded into these data models. Most of the time this works fine, but it does lead to significant +blind spots that can make systems break down for a fraction of their users. + +Lives in Meatspace +================== + +A consequence of the proliferation of phone numbers being used to identify people is that most people will not be able +to create multiple accounts. *"That's the point!"* you might say, but while we want to prevent scammers, spammers and +bored schoolchildren from messing with our systems, everybody else may have legitimate reasons to have more than one +account. + +We can apply sociology's model of roles_ to understand this issue. In sociology, a role is the comprehensive pattern of +rules and expectations that govern an individual's behavior corresponding to their social position. A key fact is that +most people occupy multiple roles. A parent may also be a company employee or a wife and perform accordingly given the +circumstances. Systems that tie digital identity to legal personhood through the contracts behind phone numbers impede +their users' attempts at role separation. Effects of this are e.g. that nowadays employers routinely screen applicants' +social media accounts for unacceptable content. 
+ +While this role conflict merely amounts to a minor inconvenience to most, there are many to whom it poses an existential +problem. Consider an LGBT+ person living in a repressive country or a politically conservative person living in a +very liberal city. Both have legitimate reasons to strictly separate parts of their private lives from others. For both, +much is at stake. Yet, both will have to practically circumvent most online systems' registration barriers to implement +this separation. + +Trusting the User +================= + +While there is no single solution to these issues, there are several possible mitigations. The first and most important +one is to systematically think about the system's data model when creating it. Which assumptions about the real world +are inherent in it? Are these assumptions likely to cause issues? Ad-hoc models are easily created, but hard to get rid +of when they start causing problems. + +A general guideline on identity should be that hindering trolls by requiring things like phone numbers or credit card +numbers is very likely to also be an obstacle to many entirely legitimate uses. Captchas_ or invitation links can help +to keep out the trolls. Another approach is to limit the damage a troll can cause with things like effective moderation +systems, reputation systems or by limiting the reach of newly created accounts. + +Outside of e-commerce, actually tying a digital account to a real-world identity is very rarely necessary. The value of +a messenger app is not in the names in its contacts list, but the conversations behind these names. When two people meet +each other on the street, their interaction is shaped by a myriad of social factors, but *not* by them showing each other +their photo ID. + +Humans with their messy identities do not fit today's cyberspace well. Let's adapt cyberspace to humans, instead of trying it the other way around. .. _astroturfing: https://en.wikipedia.org/wiki/Astroturfing -.. 
_Stasi: https://en.wikipedia.org/wiki/Stasi - -.. [#cryptocurrency] Pseudo-currencies in that, while they provide some aspects of a regular currency such as ownership - and transactions, they lack most others. Traditional currencies are backed by states, regulated by central banks - tasked with maintaining their stability and ultimately provide accountability through law enforcement, courts - and political elections. - -.. [#discriminatory] Discriminatory as in discriminating against minorities, but also as in deciding what is and what is - not. - -.. [#accountswitchopsec] This does mean that you should not actively prevent people from creating multiple accounts. It - does not necessarily entail building a proper user interface around this practice. If you do the latter, e.g. by - offering a "switch identity" button or an identity drop-down menu on a post submission form, you can easily - encourage slip-ups that might disclose the connection between two identities, and you make it possible for - someone hacking a single login to learn about this connection as well. - -.. [#meatspacefn] Meatspace_ is where people physically are, as opposed to cyberspace - -.. _Meatspace: https://dictionary.cambridge.org/dictionary/english/meatspace .. _roles: https://en.wikipedia.org/wiki/Role -.. _personhood: https://en.wikipedia.org/wiki/Personhood +.. _Captchas: https://link.springer.com/content/pdf/10.1007/3-540-39200-9_18.pdf + |