Private Contact Discovery

2019/06/22

Private Contact Discovery

Private Contact Discovery (PCD) is the formal name for the problem modern smartphone messenger applications have on installation: Given a user's address book, find out which of their contacts also use the same messenger without the messenger's servers learning anything about the user's address book. The widespread non-private way to do this is to simply upload the user's address book to the app's operator's servers and do an SQL JOIN keyed on the phone number field against the database of registered users. People have tried sprinkling some hashes over these phone numbers in an attempt to improve privacy, but obviously running a brute-force preimage attack given a domain of maybe a few billion valid inputs is not cryptographically hard.

Private Contact Discovery can be phrased in terms of Private Set Intersection (PSI), the cryptographic problem of having two parties holding one set each find the intersection of their sets without disclosing any other information. PSI has been an active field of research for a while and already yielded useful results for some use cases. Alas, none of those results were truly practical yet for usage in PCD in a typical messenger application. They would require too much CPU time or too much data to be transferred.

At USENIX Security 2019, Researchers from technical universities Graz and Darmstadt published a paper titled Private Contact Discovery at Scale (eprint | PDF). In this paper, they basically optimize the hell out of existing cryptographic solutions to private contact discovery, jumping from a still-impractical state of the art right to practicality. Their scheme allows a client with 1k contacts to run PCD against a server with 1B contacts in about 3s on a phone. The main disadvantage of their scheme is that it requires the client to in advance download a compressed database of all users, that clocks in at about 1GB for 1B users.

I found this paper very interesting for its immediate practical applicability. As an excuse to dig into the topic some more, I gave a short presentation at my university lab's research seminar on this paper (slides: PDF | ODP).

Even if you're not working on secure communication systems on a day-to-day basis this paper might interest you. If you're working with social account information of any kind I can highly recommend giving it a look. Not only might your users benefit from improved privacy, but your company might be able to avoid a bunch of data protection and accountability issues by simply not producing as much sensitive data in the first place.