Internet censorship
Domain fronting
The near future of circumvention

David Fifield <david@bamsoftware.com>

https://www.bamsoftware.com/talks/msudenver-censor/

I pitched this talk as "The near future of Internet censorship circumvention," but I realized that there's a bit of background required for that to make sense. So I'm going to start with some background topics, but that's okay for you because it's really fascinating background. What I want to do today is 1) describe the censorship threat model, what I mean by "censorship" and "circumvention," and why the problem is harder than it may seem at first; 2) talk about domain fronting, a circumvention technique that meets the challenges of the model; and then finally 3) talk about potential upcoming developments and how they may change things. And then I'd like to leave plenty of time for questions, because this is a topic that tends to provoke lots of questions, especially if it's the first time you've seen it.

Censorship model

This is the model I always have in mind. We have a censored client, who resides inside a censor-controlled network. The censor controls all the network links, routers, etc. inside the network, and in particular the routers at the perimeter of the network. We usually allow the censor quite a lot of power. For example, we don't really limit it computationally: it can inspect all the packets it sees, block any traffic for any reason, and inject or replay packets. But we do assume that the client's own computer is trustworthy, and that the censor doesn't control links outside its perimeter.

The challenge is, you're a censored client, and you need to send out a message, and you know that both the contents of your message and the destination address will be blocked by the censor. How do you do it? Well, if direct communication with the destination is blocked, then the only alternative is indirect communication. So at minimum, you have to route your message through some third party, which we generally call a "proxy." And second, if you suppose that the client and destination share a key, you can encrypt the contents and prevent the censor from blocking on that basis. These two basic ideas—in a variety of forms—underlie essentially all principled circumvention designs.
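Here is a toy sketch of those two ideas in Python. Everything in it is an assumption made for illustration: the names blocked.example and proxy.example, the keyword filter, and especially the XOR one-time pad, which only stands in for real encryption.

```python
import secrets

# Hypothetical censor policy, invented for this illustration.
BLOCKED_ADDRESSES = {"blocked.example"}
BLOCKED_KEYWORDS = [b"forbidden"]


def censor_allows(destination: str, payload: bytes) -> bool:
    """The censor sees the destination address and the bytes on the wire."""
    if destination in BLOCKED_ADDRESSES:
        return False
    return not any(word in payload for word in BLOCKED_KEYWORDS)


def xor(data: bytes, key: bytes) -> bytes:
    """Toy one-time pad; a stand-in for real encryption, not a recommendation."""
    return bytes(d ^ k for d, k in zip(data, key))


message = b"forbidden news"
key = secrets.token_bytes(len(message))  # assumed shared by client and destination

# 1. Direct communication: blocked on the destination address.
direct = censor_allows("blocked.example", message)
# 2. Through a proxy, unencrypted: blocked on the contents.
proxy_plain = censor_allows("proxy.example", message)
# 3. Through a proxy, encrypted: neither rule matches.
proxy_encrypted = censor_allows("proxy.example", xor(message, key))

print(direct, proxy_plain, proxy_encrypted)
```

Note what the sketch leaves out: the censor only has to add proxy.example to its blocklist for case 3 to fail, which is the scaling problem.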

What do I mean by "principled"? That's one of the difficulties of this line of research: even if you do empirical testing, it's easy to convince yourself that an idea that is in fact broken actually works. It's easy to run a small proxy and keep it secret among a close circle of friends; the challenge is in scaling. Even if you have a large network of proxies, how do you inform your legitimate users of the addresses of those proxies without also informing the censor? The censor can do anything a normal user can do. This "insider attack" is one of the reasons why the problem is challenging. We prefer to avoid security through obscurity: the more we can assume the censor knows about how the system works, the better.

I'm going to next show you a circumvention technique called domain fronting, which remains difficult to block despite not relying on any secrecy. But first we need to take a walk down memory lane to discuss the twisted history of HTTPS, and the protocol evolution that leads us to today.

HTTP/1.0

[Diagrams: HTTP/1.0 without TLS; HTTP/1.0 with TLS]

Go back to early HTTP, HTTP/1.0, circa 1995. How does it work? The client sends a GET, and the name of the page it wants, and the server sends back a status code and the content.

And how do we implement HTTPS, that is, add a layer of TLS? We just prepend a TLS handshake. The client initiates the handshake, and the server responds with its certificate. There are some more steps after that to exchange keys, but those details don't matter here. After the handshake, HTTP/1.0 proceeds exactly as before, but now it's under a layer of TLS encryption.

HTTP/1.1 and virtual hosting

[Diagrams: HTTP/1.1 without TLS; HTTP/1.1 with TLS]

Skip ahead to HTTP/1.1. One of its enhancements was support for virtual hosting. Virtual hosting is when one web server, at one IP address, serves several domains. The only way that can work is if the client informs the server of which domain it wants. It does that using a new mandatory Host header.
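As a rough sketch of how a server might dispatch on the Host header, here is a small Python model. The site table and the function names are hypothetical, and the parsing is deliberately simplified (it ignores ports and header folding, for instance).

```python
from typing import Optional

# Hypothetical sites sharing one server and one IP address.
SITES = {
    "example.com": "<title>Example</title>",
    "example.org": "<title>Another site</title>",
}


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request that names the desired domain."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_host(request: bytes) -> Optional[str]:
    """Return the value of the Host header, or None if it is missing."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("ascii")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


def serve(request: bytes) -> str:
    """Pick which site to serve based only on the Host header."""
    host = parse_host(request)
    if host is None:
        # HTTP/1.1 requires a Host header; its absence is a bad request.
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    body = SITES.get(host)
    if body is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    return f"HTTP/1.1 200 OK\r\n\r\n{body}"


print(serve(build_request("example.com")))
print(serve(build_request("example.org")))
```

The two requests arrive at the same server and differ only in the Host line, yet they receive different pages.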

How does TLS interact with virtual hosting? The client initiates the handshake as before, but then—what certificate is the server supposed to send? We have a chicken-and-egg problem: the client cannot send its desired host until the handshake is complete, but the handshake cannot complete without the server knowing the desired host. So for a long time, about ten years, virtual hosting was just incompatible with HTTPS. If you wanted an HTTPS server, it had to be on its own dedicated IP address.

HTTPS with SNI

SNI: Server Name Indication (TLS extension)

The resolution to the impasse was an extension to TLS, called SNI for "server name indication." It solves the problem in about the simplest, stupidest way possible: the client just sends its desired domain in plaintext as part of its initial message.

https://example.com/path   (example.com: not secret; /path: secret)

SNI solves the problem of HTTPS virtual hosting. But it's a bummer that it just dumps the destination domain name in plaintext; it means that HTTPS by itself is not very helpful for circumvention. We get encryption to hide our message contents, but the censor can still passively observe which domain we are connecting to, out of the potentially many that the web server supports.
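One way to see the plaintext name for yourself, without touching the network, is to have Python's ssl module write its first handshake message into a memory buffer and then search the raw bytes. This is only a sketch: the helper name client_hello_bytes is mine, and example.com is just a placeholder.

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Return the first bytes a TLS client would send for server_name."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # would hold bytes from the server (none here)
    outgoing = ssl.MemoryBIO()  # collects bytes the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that will never arrive.
        pass
    return outgoing.read()


hello = client_hello_bytes("example.com")
# The name appears as plain ASCII inside the unencrypted ClientHello.
print(b"example.com" in hello)
```

No key or decryption is involved in the search, and a censor watching the wire is in the same position.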

I'll point out a peculiarity of the HTTPS with SNI diagram. Notice how the name "example.com" appears, redundantly, in two different places. This is kind of a historical accident; it might have happened differently, but as it turns out today we have this redundancy. The question is, what happens if the two names are not the same?

Domain fronting

$ wget -q -O - https://www.google.com/ --header "Host: www.android.com" | grep "<title>"
    <title>Android</title>
$ curl -s https://www.google.com/ -H "Host: www.android.com" | grep "<title>"
    <title>Android</title>

I'm not aware of any standard that says exactly what should happen, and in fact implementations differ. But a common implementation choice leads to you getting the TLS certificate of the name you requested in SNI, but the HTTP contents requested by the Host header. We call the mismatching of these names "domain fronting." You can see how this would be useful for circumvention. A web server may host both a blocked and a non-blocked site. You can front access to the blocked site using the name of the non-blocked site. To the censor, such access is indistinguishable from ordinary access to the non-blocked site. The censor can block the circumventing traffic only by blocking some other site.
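The behavior can be modeled in a few lines of Python. This is a sketch of the common implementation choice described above, not of any particular server; the domain names and the tables are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical server hosting one non-blocked and one blocked site.
CERTIFICATES = {
    "allowed.example": "certificate for allowed.example",
    "blocked.example": "certificate for blocked.example",
}
CONTENT = {
    "allowed.example": "contents of allowed.example",
    "blocked.example": "contents of blocked.example",
}


@dataclass
class Request:
    sni: str   # sent in plaintext during the TLS handshake
    host: str  # Host header, sent inside the encrypted channel


def front_end(request: Request) -> tuple:
    """Certificate is chosen by SNI; content is chosen by the Host header."""
    return CERTIFICATES[request.sni], CONTENT[request.host]


def censor_allows(request: Request, blocklist: set) -> bool:
    """The censor can only inspect the plaintext SNI."""
    return request.sni not in blocklist


blocklist = {"blocked.example"}
direct = Request(sni="blocked.example", host="blocked.example")
fronted = Request(sni="allowed.example", host="blocked.example")

print(censor_allows(direct, blocklist))   # the SNI names the blocked site
print(censor_allows(fronted, blocklist))  # the SNI names the non-blocked site
print(front_end(fronted))
```

From the censor's side, the fronted request and an ordinary request to allowed.example present the same SNI, so blocking one means blocking the other.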

And virtual hosting is extremely common these days, in the form of CDNs (content delivery networks). CDNs serve as a gateway to thousands of web sites, and most CDNs do (or did) work with domain fronting. The upshot of all this is that we can access any blocked web site, as long as it is on a CDN that also hosts a non-blocked site that the censor is unwilling to block.

But we can actually do a little better. In place of the blocked site, let's become a customer of the CDN and run our own proxy. Now, the client can use domain fronting to reach our proxy, and our proxy can serve as the last mile towards whatever other destination the client may desire.

This basic design, or some variation on it, is now used—or was used until recently—with great success by Tor, Signal, and various circumvention software.

Earlier history (2014–2017)

Recently, though, things have started to change. While domain fronting remained essentially unblocked by the world's censors, a few months ago Amazon and Google—two of the services we had been using—announced that they were going to stop supporting domain fronting—that is, start enforcing a match between the SNI and the Host header—effectively breaking circumvention systems. The reasons why they did so are murky and not really important to this discussion. But the general feeling is that domain fronting is not going to be a viable technique for much longer.
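To make "enforcing a match" concrete, here is a hypothetical check a front-end server could apply. How any particular provider actually implements or reports the mismatch is not something I know; the function below is only an illustration, and it assumes domain names rather than IP address literals.

```python
def names_match(sni: str, host_header: str) -> bool:
    """Return True if the Host header names the same domain as the SNI."""
    # Drop an optional ":port" suffix from the Host header.
    host_name = host_header.split(":", 1)[0]
    # Domain names are case-insensitive.
    return sni.lower() == host_name.lower()


# An ordinary request passes; a fronted request is rejected.
print(names_match("allowed.example", "allowed.example:443"))
print(names_match("allowed.example", "blocked.example"))
```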

Encrypted SNI and the future

Now we are at the forward-looking part of the talk. Hope is on the horizon in the form of a proposal for encrypted SNI in TLS 1.3. It's exactly what it sounds like. For someone like me, encrypted SNI almost sounds too good to be true. It solves all our circumvention problems. It's domain fronting without the hacks. I expect that encrypted SNI will be really useful to us, but also that it will displace the battlefield. I wrote an essay recently with some of my reflections; these are some of the highlights.

What will censors do once they have to move out of their comfort zone, when encrypted SNI makes their typical border-firewall techniques ineffective? We're already starting to see some examples of this phenomenon. For example, recently the government of China pressured Apple to remove VPN apps from the App Store. VPNs can be used for circumvention. The fact that the censor had to go to Apple shows a certain weakness, something that it couldn't accomplish strictly in-house. But it is also emblematic of censors having to reach out.

Until now we have assumed—because we had to—that network service providers like CDNs don't cooperate with the censor. This assumption is going to be tested. Network middlemen like CDNs are going to become attractive targets of censorship. While encrypted SNI will overall be good, it will also increase a trend towards centralization. A lot of power is going to be centralized in a few hands, and we'll increasingly rely on providers to act rightly.

One of the ways in which encrypted SNI may fail is if e.g. commercial firewalls start to block it. For this reason, I think that circumvention designers should not be the first to adopt it; rather, it would be better if it appears in browsers first, so it's well established and tolerated by network middleboxes by the time circumvention starts using it.

Snowflake