Privacy is part of the zeitgeist of this moment. Our headlines are dominated by high-profile privacy scandals, and there is no indication that this is going to slow down anytime soon. On the other hand, advances in data science and the creation of incredibly powerful new platforms offer increasingly large benefits to their users. We’re struggling to reconcile the growing demand for our data with growing calls for privacy.

But it is becoming increasingly clear that it is a false paradigm that we must give up our privacy to gain the benefits of our data. A new family of privacy-preserving technologies is emerging to shatter that paradigm. In this article I’ll provide a brief introduction to federated learning, homomorphic encryption, trusted execution environments, and zero-knowledge proofs, and try to articulate why I think they are important.

Federated learning

Federated learning enables computers to collaboratively learn while keeping their data on their devices. Instead of sending data to one centralized machine, local computers train a shared algorithm on the data they have locally and share only information on what they’ve learned. These learnings, or rather changes to the shared algorithm, are aggregated and used to improve the shared algorithm, which is then disseminated back to participating computers for further training. Critically, this decouples the process of training an AI/ML algorithm from the need to centralize data in the cloud, which potentially exposes individuals’ privacy.
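To make the train-locally, share-only-updates loop concrete, here is a minimal sketch in plain Python. It assumes a toy one-variable linear model and a simple size-weighted average of the updates (in the spirit of federated averaging). The function names, learning rate, and synthetic data are all illustrative assumptions, not taken from any particular framework.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Runs on a device: train on local data, return only the change in weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one local example
            w -= lr * err * x
            b -= lr * err
    # The raw (x, y) pairs never leave this function; only the delta does.
    return (w - weights[0], b - weights[1])

def aggregate(weights, updates, sizes):
    """Runs on the server: average the updates, weighted by local dataset size."""
    total = sum(sizes)
    dw = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    db = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (weights[0] + dw, weights[1] + db)

def federated_training(device_datasets, rounds=50):
    weights = (0.0, 0.0)  # the shared model: y = w * x + b
    sizes = [len(d) for d in device_datasets]
    for _ in range(rounds):
        # Each device starts from the current shared model and trains locally.
        updates = [local_update(weights, d) for d in device_datasets]
        # The server sees only the updates, then redistributes the new model.
        weights = aggregate(weights, updates, sizes)
    return weights

# Hypothetical setup: three devices, each holding private samples of y = 2x + 1.
rng = random.Random(0)

def make_device_data(n):
    return [(x, 2 * x + 1) for x in (rng.uniform(-1, 1) for _ in range(n))]

devices = [make_device_data(20) for _ in range(3)]
w, b = federated_training(devices)
print(f"learned w={w:.3f}, b={b:.3f}")  # should land close to w=2, b=1
```

Real deployments layer more on top of this (device sampling, compression, and further protection of the updates themselves), but the division of labor is the same: training happens where the data lives, and only model changes travel.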

Federated learning is how the keyboard on your phone is able to learn from you and provide you with typing corrections and predictions. An added benefit of this model is that local machines hold a copy of the shared algorithm, which lets them use it instantly and seamlessly. Typing predictions would be a lot less useful if they had to be sent to the cloud, processed, and sent back to a machine. The same is true of other applications, such as self-driving cars.

tl;dr: AI/Machine learning, but you get to keep your data on your local machine.

Fully homomorphic encryption

Fully homomorphic encryption lets us perform arbitrary computation on encrypted data. It works like this:

  1. Alice encrypts her data and sends it to Bob.
  2. Bob performs some computation on Alice’s encrypted data, such as running it through an algorithm.
  3. Bob passes the encrypted result back to Alice. In this process Bob does not learn anything about the unencrypted data or the results of its computation.
  4. Alice decrypts the result, getting the same outcome as if her unencrypted data had been put through Bob’s algorithm.
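The four steps above can be sketched with a toy example. An important caveat: the code below is not fully homomorphic. It uses the Paillier cryptosystem, which only supports addition of ciphertexts and multiplication by a known constant, because it fits in a few lines of plain Python. The tiny hard-coded primes and the function names are illustrative assumptions and offer no real security.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key generation. p and q are tiny demo primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # standard simple choice of generator
    mu = pow(lam, -1, n)     # modular inverse (Python 3.8+); valid when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)   # fresh randomness for every ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_encrypted(pub, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)

# Step 1: Alice encrypts her data and sends only ciphertexts (and the public key).
pub, priv = keygen()
ca, cb = encrypt(pub, 20), encrypt(pub, 22)

# Step 2: Bob computes 3*a + b without ever seeing a or b.
c_result = add_encrypted(pub, scale_encrypted(pub, ca, 3), cb)

# Steps 3 and 4: Bob returns the encrypted result; Alice decrypts it.
result = decrypt(pub, priv, c_result)
print(result)  # 82, the same as computing 3*20 + 22 on the plaintexts
```

Fully homomorphic schemes extend this idea so that both addition and multiplication of encrypted values are possible, which is what makes arbitrary computation available to Bob.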

Fully homomorphic encryption enables users to outsource computation to untrusted parties without giving up their privacy. Imagine being able to encrypt your genome, send it to an untrusted third party with a trained AI, and get a valuable result back, all without anyone else seeing your data. That’s not a hypothetical anymore: an MVP of this was built at a hackathon.

tl;dr: running AI/ML algorithms on encrypted data

Trusted execution environments

Trusted execution environments, or TEEs, are magical pieces of hardware which keep prying eyes from seeing inside them. You can run applications inside a TEE, and other applications, even the operating system, can’t learn what is going on inside it or tamper with its state. As such, TEEs provide guarantees of integrity and confidentiality.

TEEs are fairly practical today. A drawback is that they require specialized hardware to run, though mobile phone manufacturers have used that in their favor to get TEEs into the hands of consumers, where they can be used for secure biometric authentication and other secure applications. Other popular use cases are private machine learning, secure key generation, and solving secure computation problems like Yao’s Millionaires’ Problem.

tl;dr: magical hardware with integrity and privacy guarantees

Zero-knowledge proofs

Zero-knowledge proofs allow us to validate the truth of a statement without revealing any of the underlying data behind it. One of the more popular constructions of zero-knowledge proofs is the zk-SNARK. The statements they prove are structured like the following:

  • For a public value x;
  • I know some secret value w;
  • Such that condition D holds on x and w.

To give a concrete example:

  • For x = 3869;
  • I know two integers p, q, both greater than 1;
  • Such that p*q = 3869.

Or, for a richer example:

  • For the chess board configurations A and B;
  • I know a sequence of chess moves S;
  • Such that starting from A and applying S, all moves are legal and the final configuration is B.

Any third-party observer would be able to verify the validity of these statements without the disclosure of the underlying secret (w, the pair p and q, or the sequence S). Zero-knowledge proofs open up a wide-ranging new landscape of possibilities. They are already being used alongside blockchains to allow transactions to be verified without revealing the sender, receiver, or transaction amount. The MediLedger project has extended this from money to drugs moving along a supply chain.
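The "public x, secret w, condition D" structure can be sketched in code. Note that the example below is not a zk-SNARK. It is Schnorr's much simpler interactive identification protocol, which proves knowledge of a secret exponent w such that g^w = x (mod p) and is zero-knowledge against a verifier who picks the challenge honestly. The tiny group parameters and all names are illustrative assumptions with no real security.

```python
import secrets

# Toy group: P is prime, Q = (P - 1) / 2 is prime, and G generates the
# subgroup of order Q. Real deployments use far larger parameters.
P, Q, G = 2039, 1019, 4

def public_value(w):
    """The public statement x for the secret w: x = G^w mod P."""
    return pow(G, w, P)

class Prover:
    def __init__(self, w):
        self.w = w  # the secret; it is never sent to the verifier

    def commit(self):
        self.r = secrets.randbelow(Q)  # fresh random nonce for each run
        return pow(G, self.r, P)       # commitment t = G^r

    def respond(self, c):
        return (self.r + c * self.w) % Q  # response s = r + c*w mod Q

def verify(x, t, c, s):
    """Condition checked by the verifier: G^s == t * x^c (mod P)."""
    return pow(G, s, P) == (t * pow(x, c, P)) % P

def run_protocol(prover, x):
    t = prover.commit()        # 1. prover sends a commitment
    c = secrets.randbelow(Q)   # 2. verifier replies with a random challenge
    s = prover.respond(c)      # 3. prover answers using the secret
    return verify(x, t, c, s)  # 4. verifier checks without learning w

secret_w = 777
x = public_value(secret_w)
print(run_protocol(Prover(secret_w), x))  # an honest prover is always accepted
```

A prover who does not know w can only pass by guessing the challenge in advance, so the chance of fooling the verifier shrinks as the group grows. zk-SNARKs generalize this idea to arbitrary conditions D and remove the back-and-forth interaction.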

tl;dr: proving you know some secret without revealing what that is

Why this matters

Think about it this way: if you were a developer, what would you make if you could give your users full guarantees of privacy? If you were a user, what would you participate in if you knew your privacy wouldn’t be violated? What can we not do today because we can’t guarantee participants’ privacy? These technologies give us a “building block” of privacy that will unlock new applications, businesses, and ways of collaborating.